COVID-19 board
sgenzer
Altair Employee
Hello all RapidMiner community members -
I have been hesitant to post a new discussion about COVID-19 as there is so much already out there, but I am sincerely concerned about the well-being of our RapidMiner family. I am also very interested to hear if anyone out there is (a) working on any data science COVID-19 projects, and/or (b) leading or participating in any service projects that are helping COVID-19 patients or research in your local community.
So please use this discussion board to share, discuss, and support one another. I sincerely hope you are well during this very difficult time, and my deepest sympathies to those who are either ill or directly affected by friends or family that are suffering.
Scott
8
Comments
-
For anyone who wants to help: Kaggle is hosting a special challenge for COVID-19. I'm not sure whether you can use RapidMiner for it, but any input there would be helpful, I'm sure.
https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/
4 -
Dear RM friends,
I used to love the quote "don't waste the hours of daylight on what you can do at night". Years and more than 1000 24h/36h shifts as an anesthesiologist and emergency physician later, I try to live an ascetic life, or a structured one if you will, where sleep, healthy food and minimal sensory input are essential whenever they are accessible. In reality, my day starts between 5 and 6 am, when I train my two German shepherds, fully skilled personal protection working dogs that are trained to be teddy bears when they are not working or training (22 of 24 hours). Not that I live in a dangerous region or am paranoid, but accomplishing difficult tasks with another intelligent creature is very satisfying. Probably a similar satisfaction to interacting with you all.
Back to the current reality:
- The COVID-19 pandemic affects our view on health, disease and the manufacturability of life, even for me, who is confronted with end of life multiple times every day.
- This pandemic required essential and complex structural changes to patient flows in the hospital, finding a balance between protecting fragile patients who are not COVID-positive, from young pregnant women to critically ill patients with abdominal sepsis, AND caring for all positive or potentially positive patients, who need exactly the same care plus extra invasive care when they are COVID-19 positive and become critically ill.
- Nobody has any clue what the next days will look like in terms of the number of admitted patients testing positive for COVID-19. Here we are looking to the data science communities to better predict the spread of the disease, taking into account that the reported numbers of COVID-19 patients are far below the number actually infected, because many patients have no symptoms but can potentially spread the virus, and that the trigger to test for COVID-19 differs geographically AND changes over time.
- We try to deploy our skilled nurses and doctors strategically, realizing that we are all exposed and that 10 weeks is a realistic duration for a flu epidemic. Even with the best personal protective equipment (PPE), consuming massive amounts of hand sanitizer, gloves, FFP-2 and FFP-3 masks, goggles, face shields and biohazard clothing (beekeeper-like) that makes you sweat like hell, we are more exposed than anyone else at this moment, and we need to stay healthy for some weeks while keeping our families free from infection.
- There are several efforts to develop and test new drugs or vaccines, but there is no hope that we will be able to use them, even as compassionate use or whatever.
- We all failed in that our premature GDPR, HIPAA, etc. guidelines were not adapted to find a better balance between privacy and the global value of being able to study the spread at an early phase, based on smartphone sensors, GPS and social media content, in order to prevent further spreading. Examples such as the Twitter Ebola spread analysis would be welcome.
- Additionally, the majority of the population is saturated with questions, where chatbots could be helpful. (They exist, but more translations and locally adapted versions are welcome.)
- We (my colleagues and I) all spend more time in the hospital than at home, but everyone is proud to provide the care needed. They are learning to hold meetings via Zoom etc., a step forward, however funny this must sound to you all.
Maybe more reflections in the near future if I find some time.
Btw, my wife just arrived home (she is also an anesthesiologist and was on call). She was rather emotional about how this will end.
Cheers, Sven
Keep the spirit high and use your time to solve real-world problems!
3 -
I work for a logistics consulting firm that has a contract with DLA. We just completed a study of the domestic capacity for reagents. The study concluded that normal capacity was not at risk for the continental U.S. Surge, meaning the ability to produce large quantities quickly, was not studied and now we are being questioned on our conclusions. Keeping in mind that capacity and surge are two different things, we are continuing to study data trends along the supply chain and provide answers to our client, DLA. That said, the level of data science involved from our specific firm has been minimal. Many other government organizations are providing superb graphics and models that we are passing along. Our primary concern is how the disrupted supply chains ultimately affect the U.S. warfighter. Thus, it's not the reagents or the related medical equipment that is of concern. It's how the shutdown and quarantine protocols affect production and delivery for all of the OTHER supply chains critical to readiness. What data sets are others looking at for how the shutdown is affecting logistics?
2 -
Dear all,
DocMusher: your post is impressive, thanks for sharing!
My wife and I both work for big companies with global production of goods for multiple customers. Both companies still handle the topic mostly manually, consolidating data from several stakeholders by hand in order to roughly identify potential countermeasures, align with governments or refocus strategies.
Why?
Data science needs well-designed models and structures. You can easily perform data analysis on SAP data, but what answers will you find, and what measures/actions do you derive from the analysis? Countermeasures/actions for this unforeseen situation still need human intelligence, creativity and flexibility, which should be available in good management (but is not necessarily available ;-))...
************
I started to collaborate with @mbs (see other post: https://community.rapidminer.com/discussion/56951/huge-field-trial-regarding-global-economy-ecology-and-society#latest).
We want to analyse global data in multiple areas in order to evaluate positive and negative effects of COVID-19 on ecology, economy and society. As an outcome, my wish is to publish a paper with results that are beneficial for society (e.g. key messages that are easy to read and have an impact on daily life...).
We came to the following interim conclusions:
1. We have to wait until more data are available, at least 150 days.
--> However, some data are only available on a daily basis, so we are collecting some individual data separately.
2. We have to ask the right questions in order to find the right answers (sounds simple, but is essential).
--> What is possible to derive from the data, what is logical?
3. It is difficult to get data from different sources.
--> I am willing to get access to Statista or Trading Economics, but it would also help to check other sources or find partners who can deliver information such as consumption figures, surveys, newspaper articles etc. But I can't spend too much time on diligence work...
4. And maybe the biggest challenge is to distinguish between "natural fluctuations of the economy", direct effects of COVID-19 and indirect effects, i.e. the classical question of causality...
--> Therefore I am thinking of a sub-model approach, i.e. dividing the overall topic into sub-systems with different topics, and also adding fixed relations (Y=a*X) and logical relations (if 'A' and 'B', then 'C').
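To make the sub-model idea in point 4 a bit more tangible, here is a minimal Python sketch of how fixed relations (Y=a*X) and logical relations (if 'A' and 'B', then 'C') could be written down and evaluated. All quantity names and the coefficient 0.5 are hypothetical placeholders for illustration only; they do not come from any real data or from our project.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Value = Union[float, bool]


@dataclass
class FixedRelation:
    """Fixed relation Y = a * X between two named quantities."""
    source: str    # name of X
    target: str    # name of Y
    factor: float  # the coefficient a

    def apply(self, facts: Dict[str, Value]) -> None:
        # Only derive Y when X is actually known.
        if self.source in facts:
            facts[self.target] = self.factor * float(facts[self.source])


@dataclass
class LogicalRelation:
    """Logical relation: if every condition is True, the conclusion is True."""
    conditions: List[str]
    conclusion: str

    def apply(self, facts: Dict[str, Value]) -> None:
        if all(facts.get(name) is True for name in self.conditions):
            facts[self.conclusion] = True


def evaluate(facts: Dict[str, Value], relations: list) -> Dict[str, Value]:
    """Apply the relations once, in list order, to a copy of the facts."""
    result = dict(facts)
    for relation in relations:
        relation.apply(result)
    return result


# Hypothetical sub-system: names and the factor 0.5 are made up.
relations = [
    FixedRelation("flights_per_day", "aviation_co2_tonnes", 0.5),
    LogicalRelation(["lockdown", "home_office"], "less_commuting"),
]
observed = {"flights_per_day": 100.0, "lockdown": True, "home_office": True}
result = evaluate(observed, relations)
print(result)
```

The relations are applied in a single pass, so a relation that depends on the output of another one has to come later in the list. Each sub-system could get its own list of relations.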
Who else is willing to collaborate within this activity?
The minimum outcome is personal learning on how to deal with this kind of complex data analysis problem. The maximum outcome is a paper that might have an impact on our society.
Please send me a PM and let's align on HOW to collaborate!
Jan
1 -
I am always interested in any initiative with impact for now or for a next wave, but enabling flows, data gathering, predictive analysis, etc. really should result in a measurable impact, otherwise it's only interesting. I am in when ROI (valued as "reducing damage control") is the primary goal.
Thanks
Sven
1 -
Sven, I fully understand your comment. You are "hands-on" and focused on COVID-19.
However, ROI can be short-term and selective or long-term and holistic.
I am a scientist and I want long-term change. Based on this single disruptive impact of COVID, I want to find understanding, a clear rationale and EXAMPLES of cause-effect evidence for how we as human beings should change our behavior in order to act sustainably in the future. Furthermore, I want to find ways to derive valuable information from data, the essential task of data science (knowing that there are plenty of theories already available).
For concrete action, there are plenty of tools, such as this article showing how to predict the spread of COVID with a Kalman filter: https://towardsdatascience.com/using-kalman-filter-to-predict-corona-virus-spread-72d91b74cc8
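For readers who have not met a Kalman filter before, here is a minimal scalar sketch in Python. It is a generic local-level (random walk) filter and not a reproduction of the method in the linked article; the model choice, the noise variances and the sample numbers are all assumptions made purely for illustration.

```python
from typing import List, Tuple


def kalman_local_level(
    observations: List[float],
    process_var: float,
    obs_var: float,
    init_mean: float,
    init_var: float,
) -> List[Tuple[float, float]]:
    """Scalar Kalman filter for a random-walk state observed with noise.

    Returns one (mean, variance) pair per observation. Under this model the
    filtered mean is also the one-step-ahead forecast of the next value.
    """
    mean, var = init_mean, init_var
    filtered = []
    for z in observations:
        # Predict: a random walk keeps the mean, but uncertainty grows.
        var = var + process_var
        # Update: blend prediction and observation via the Kalman gain.
        gain = var / (var + obs_var)
        mean = mean + gain * (z - mean)
        var = (1.0 - gain) * var
        filtered.append((mean, var))
    return filtered


# Hypothetical noisy series (e.g. daily new cases); not real data.
daily_new_cases = [12.0, 15.0, 11.0, 18.0, 21.0, 19.0]
estimates = kalman_local_level(
    daily_new_cases, process_var=4.0, obs_var=9.0,
    init_mean=daily_new_cases[0], init_var=9.0,
)
for z, (m, v) in zip(daily_new_cases, estimates):
    print(f"observed={z:5.1f}  filtered={m:6.2f}  variance={v:5.2f}")
```

A local-level model lags behind a series that grows quickly, so for cumulative case counts a model with an explicit trend or growth term would probably be a better fit.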
1 -
Fully agree. Keep me in the loop and contact me wherever I can help; I am in.
Sven
1 -
@DocMusher and @User23311,
Hello
Great discussion
I agree with you. Let's continue it by private message.
Thank you
mbs
1 -
@User23311
Very interesting and useful. If you need anything related to Data science, do let me know. I am more than happy to help with this project.
1 -
How to measure social distancing within legal boundaries?
https://www.dailymail.co.uk/sciencetech/article-8125355/US-government-talks-Facebook-Google-track-coronavirus.html
https://www.newscientist.com/article/2238136-google-may-help-uk-officials-track-coronavirus-social-distancing/
1 -
1
-
@DocMusher
Well, this is not big data, but giant data...
This is way above our level regarding power.
But I was wondering what you can do with the publicly available data that simple folks like us can access, combining those diverse and meaningful data into new insights/findings.
2 -
RM's Point map is very useful for plotting Corona cases around the world. I think you can achieve more with the Python Basemap library, but with much more effort. I used the JHU dataset and appended all status types ("confirmed", "death" and "recovered").
The data is updated daily. We can model the rate of transmission using the built-in neural net block and get an idea of what's going to happen soon.
Wish you all healthy days. DE.
4 -
Hey @dedeer, nice work. I saw on your profile that you were searching for a Twitter connector for sentiment analysis. Is this solved?
BR,
Martin
1 -
@dedeer
Indeed nice work, would you mind sharing your process here?
If someone is interested in a general epidemic calculator, this looked interesting, although I lack the time to review it in depth. I think @mschmitz has the background to assess its correctness from a math point of view.
Cheers, stay healthy, use this period as an opportunity to value what is important.
Sven
2 -
Just in case anyone needs it, this is how you pull Johns Hopkins data from GitHub:
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.6.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="open_file" compatibility="9.6.000" expanded="true" height="68" name="Open File" width="90" x="45" y="34">
<parameter key="resource_type" value="URL"/>
<parameter key="filename" value="https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"/>
<parameter key="url" value="http://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"/>
</operator>
<operator activated="true" class="read_csv" compatibility="9.6.000" expanded="true" height="68" name="Read CSV" width="90" x="179" y="34">
<parameter key="column_separators" value=","/>
<parameter key="trim_lines" value="false"/>
<parameter key="use_quotes" value="true"/>
<parameter key="quotes_character" value="&quot;"/>
<parameter key="escape_character" value="\"/>
<parameter key="skip_comments" value="false"/>
<parameter key="comment_characters" value="#"/>
<parameter key="starting_row" value="1"/>
<parameter key="parse_numbers" value="true"/>
<parameter key="decimal_character" value="."/>
<parameter key="grouped_digits" value="false"/>
<parameter key="grouping_character" value=","/>
<parameter key="infinity_representation" value=""/>
<parameter key="date_format" value=""/>
<parameter key="first_row_as_names" value="true"/>
<list key="annotations"/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="read_all_values_as_polynominal" value="false"/>
<list key="data_set_meta_data_information"/>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
<operator activated="true" class="store" compatibility="9.6.000" expanded="true" height="68" name="Store" width="90" x="313" y="34">
<parameter key="repository_entry" value="../data/confirmed cases"/>
</operator>
<operator activated="true" class="open_file" compatibility="9.6.000" expanded="true" height="68" name="Open File (2)" width="90" x="45" y="136">
<parameter key="resource_type" value="URL"/>
<parameter key="filename" value="https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv"/>
<parameter key="url" value="http://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv"/>
</operator>
<operator activated="true" class="read_csv" compatibility="9.6.000" expanded="true" height="68" name="Read CSV (2)" width="90" x="179" y="136">
<parameter key="column_separators" value=","/>
<parameter key="trim_lines" value="false"/>
<parameter key="use_quotes" value="true"/>
<parameter key="quotes_character" value="&quot;"/>
<parameter key="escape_character" value="\"/>
<parameter key="skip_comments" value="false"/>
<parameter key="comment_characters" value="#"/>
<parameter key="starting_row" value="1"/>
<parameter key="parse_numbers" value="true"/>
<parameter key="decimal_character" value="."/>
<parameter key="grouped_digits" value="false"/>
<parameter key="grouping_character" value=","/>
<parameter key="infinity_representation" value=""/>
<parameter key="date_format" value=""/>
<parameter key="first_row_as_names" value="true"/>
<list key="annotations"/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="read_all_values_as_polynominal" value="false"/>
<list key="data_set_meta_data_information"/>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
<operator activated="true" class="store" compatibility="9.6.000" expanded="true" height="68" name="Store (2)" width="90" x="313" y="136">
<parameter key="repository_entry" value="../data/deaths"/>
</operator>
<connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
<connect from_op="Read CSV" from_port="output" to_op="Store" to_port="input"/>
<connect from_op="Store" from_port="through" to_port="result 1"/>
<connect from_op="Open File (2)" from_port="file" to_op="Read CSV (2)" to_port="file"/>
<connect from_op="Read CSV (2)" from_port="output" to_op="Store (2)" to_port="input"/>
<connect from_op="Store (2)" from_port="through" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="84"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
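For anyone who prefers to do the same outside RapidMiner, here is a rough Python equivalent using only the standard library. The file names are the ones from the process above and may have been renamed in the repository since. The column layout (Province/State, Country/Region, Lat, Long, then one column per date) is an assumption about the CSV format, and the function names are my own.

```python
import csv
import io
import urllib.request
from typing import Dict, List

BASE_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_time_series/"
)
# File names as used in the RapidMiner process above.
FILES = {
    "confirmed": "time_series_19-covid-Confirmed.csv",
    "deaths": "time_series_19-covid-Deaths.csv",
}
# Assumed identifier columns; every other column is treated as a date.
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def fetch_csv_text(status: str, timeout: float = 30.0) -> str:
    """Download one time-series file and return it as text."""
    with urllib.request.urlopen(BASE_URL + FILES[status], timeout=timeout) as resp:
        return resp.read().decode("utf-8-sig")


def wide_to_long(csv_text: str, status: str) -> List[Dict[str, object]]:
    """Turn the wide table (one column per date) into one record per date."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            # Skip identifier columns, surplus fields and missing values.
            if column is None or column in ID_COLUMNS or not value:
                continue
            records.append({
                "province": row["Province/State"],
                "country": row["Country/Region"],
                "date": column,
                "status": status,
                "count": int(float(value)),
            })
    return records


if __name__ == "__main__":
    # Needs network access; appends both status types into one long table.
    table = []
    for status in FILES:
        table.extend(wide_to_long(fetch_csv_text(status), status))
    print(len(table), "records")
```

Keeping the status as a column makes it easy to append confirmed cases and deaths into a single table, similar to what was described earlier in the thread.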
0 -
hi all -
I am just amazed and awed by this discussion. I think I speak for everyone at RapidMiner saying that you are all in our thoughts. Huge respect for @DocMusher for his amazing work and powerful message.
Scott
0 -
Hey @mschmitz
Thank you very much for your follow up. I am very new in the community and it is great that RM has a very powerful one.
I figured out a way using the Twitter and Aylien (not supported anymore, I guess) connections and the Aylien Extract Sentiment block to quickly look up the sentiment (objective/subjective) of the retweets in the 2 hours after a tweet by the health minister.
I just want to add text processing by building a document from the 100 most recent tweets, updated every hour, and look at some word vectors about Covid-19 over this period.
Any suggestions for improvement?
Thanks all, stay healthy.
1 -
@DocMusher
Thank you very much for the link; it is great for applying math models. I think I can put them into a Generate Attributes block and create a forecast set for each country and/or state.
Here is my design view for ETL.
You can append and run it to play with the dataset in the Results view visualization tool. I filtered for China and confirmed cases, then selected the date and date-related attributes and applied a transpose. Output as below:
Now we can easily visualize the rate like this:
From the first day of a confirmed case until the last date. As we can observe, there is strong resistance at some point, correlated with the population of the state, where confirmed cases begin decelerating. We can add text data on the precautions taken by the authorities of the respective country and, plus or minus a couple of days, measure how well they worked.
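The filter, select and transpose steps described above can be sketched in plain Python as follows. This is only an illustration, not the RapidMiner process itself: it assumes the wide JHU layout (Province/State, Country/Region, Lat, Long, then one column per date), and the sample numbers are invented.

```python
import csv
import io
from typing import List, Tuple

# Assumed identifier columns; all remaining columns are dates.
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def country_series(csv_text: str, country: str) -> List[Tuple[str, int]]:
    """Filter one country, sum its provinces, return (date, cumulative)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    date_columns = [c for c in (reader.fieldnames or []) if c not in ID_COLUMNS]
    totals = [0] * len(date_columns)
    for row in reader:
        if row["Country/Region"] != country:
            continue
        for i, column in enumerate(date_columns):
            value = row.get(column)
            if value:  # missing values are counted as zero
                totals[i] += int(float(value))
    return list(zip(date_columns, totals))


def daily_new(series: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Day-over-day difference of a cumulative series."""
    return [
        (date, current - previous)
        for (_, previous), (date, current) in zip(series, series[1:])
    ]


# Invented sample in the assumed layout; not real case numbers.
sample = (
    "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20\n"
    "Hubei,China,30.9,112.2,10,30,40\n"
    "Beijing,China,40.1,116.4,1,2,5\n"
    ",Italy,43.0,12.0,0,0,2\n"
)
cumulative = country_series(sample, "China")
print(cumulative)
print(daily_new(cumulative))
```

When the daily differences start shrinking while the cumulative count is still rising, that is the deceleration mentioned above.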
It is very premature at the moment. I have another design with a neural network block to estimate the next days' confirmed cases.
Any suggestions?
Stay healthy.
1 -
-
@DocMusher
It is excellent! Thanks for sharing.
0 -
Hi @dedeer,
"I just want to add text processing by creating a document of the most recent 100 tweets every hour updated and see some words' vectors during this period about Covid-19."
You need a RapidMiner Server for this, which is part of the educational program. Do you have one?
Best,
Martin
1 -
RM Server can be installed on your laptop. Then of course your laptop needs to be running to execute the process every day. You can find installation guides and so on at: https://docs.rapidminer.com/latest/server/
Best,
Martin
1 -
-
I recently received this answer to my social distancing question. The critical issue here is the balance of individual location privacy vs. public health common good. The latter takes precedence over the former during major public health emergencies and pandemics. In Israel, for example, they had to invoke emergency spy powers (https://www.bbc.co.uk/news/technology-51930681) to develop their COVID-19 HaMagen app (https://www.standardmedia.co.ke/article/2001365512/israel-launches-app-alerting-users-of-exposure-to-coronavirus). In Canada, Toronto is using cellphone location data from telecoms to encourage social distancing (https://nationalpost.com/technology/city-of-toronto-gathering-cellphone-location-data-from-telecoms-in-bid-to-slow-spread-of-covid-19-tory/wcm/f916f892-b47d-43a4-85aa-2a214e136ee0). Other countries are doing the same/developing similar apps; see some relevant news links at http://healthcybermap.org/WHO_COVID19/#10
P.S. A description of the corresponding app in use in China is available at https://tinyurl.com/wrccsfw (video: https://youtu.be/3K3fy5eKeuM?t=722)
2 -
wow @DocMusher I had not seen these apps before. Thank you. Fascinating.
On a similar note, I was reflecting this morning that this COVID-19 pandemic will provide data for at least 20 years of PhD theses in every field imaginable...
3 -
I would appreciate it if people from different countries could write to me directly or post here whether the tuberculosis vaccine is obligatory in their country and, if not, in which year it ceased to be obligatory. There is a hypothesis that this vaccination alleviates the course of the illness.
0 -
Social distancing, google mobility:
https://www.google.com/covid19/mobility/
https://globalnews.ca/news/6775542/google-mobility-reports-a-slippery-slope-cyber-security-expert/
FYI
0 -