How to Analyze Time Data Per Person
I am very new to RapidMiner and have the following task to do:
I have data collected from activity trackers for different individuals. The trackers show step count, heart rate, and blood pressure and how they change every second. I want to use the step count data to predict blood pressure using different machine learning models. However, I am struggling to set up the data, because of the millions of time data corresponding to one person ( I have a total of 20 people). Any suggestions?
Best Answers
Hi @n_alkassab, and welcome to the RapidMiner Community!
To help you, let's break this problem down into a few steps:
- Getting the repository prepared.
- Adding data to the repository.
- Training your models with time series data.
Why? Because every person is different, so training a single model for everyone might not work well. One model per person is the approach I used, at least.
Getting the repository prepared.
To begin, you need your data split into two example sets: one for the people in your study and one for the measurements. The plan is to iterate over the patient example set, read the measurements example set, filter it by patient ID, and train a model for each patient.
I would create a new repository with this shape:
Figure 1: Data, Processes and Models folders, because we will have one model per person.
Once you have these, you can import your data. I created a simple CSV with Patient ID, Patient Name, Date, Systolic, Diastolic, Pulse. You can find that example attached to this answer. Of course, it's not the same data you have, but it will help us set up the rest of the example.
Adding data to the repository.
I imported my data to the repository under the name of Original Patient Data. You can use the Read CSV or Read Excel operators, but for this little example, I wanted my data inside the RapidMiner repository.
Then you should obtain a list of patients and a list of measurements separately. I built a process for this, named it Processes/01 Prepare Patient Data and saved it.
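In RapidMiner this preparation step is built from operators, but the logic of splitting the raw records into a patient list and a measurements table can be sketched in plain Python. The records below are hypothetical values shaped like the example CSV, not your real data:

```python
# Sketch of the "01 Prepare Patient Data" step, assuming records shaped
# like the example CSV (Patient ID, Patient Name, Date, Systolic,
# Diastolic, Pulse). Hypothetical sample values for illustration only.
records = [
    {"Patient ID": 1, "Patient Name": "Alice", "Date": "2020-01-01",
     "Systolic": 120, "Diastolic": 80, "Pulse": 70},
    {"Patient ID": 1, "Patient Name": "Alice", "Date": "2020-01-02",
     "Systolic": 122, "Diastolic": 81, "Pulse": 72},
    {"Patient ID": 2, "Patient Name": "Bob", "Date": "2020-01-01",
     "Systolic": 135, "Diastolic": 88, "Pulse": 75},
]

# Patient list: unique (ID, name) pairs.
patients = sorted({(r["Patient ID"], r["Patient Name"]) for r in records})

# Measurements: everything except the name, which already lives on the
# patient list.
measurements = [{k: v for k, v in r.items() if k != "Patient Name"}
                for r in records]

print(patients)           # [(1, 'Alice'), (2, 'Bob')]
print(len(measurements))  # 3
```

The key idea is the same as in the RapidMiner process: deduplicate to get one row per patient, and keep the heavy per-second measurements in a separate set keyed by patient ID.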
Figure 2: How to prepare data. The process is called "01 Prepare Patient Data" and is also attached.
Training your models
Finally, to train your models, you should make use of the Loop Examples operator in combination with the Extract Macro operator. Here is a picture:
Inside the Loop Examples operator, I have this:
Basically, what I do is extract the Patient ID and Patient Name from a macro, read all the measurements, filter the examples for each patient, select only the attributes I need for my model, train the model with that data, and store the results. In this case, I save the cluster model visualization generated from clustering each patient's data. I wouldn't want to take the joy of building things from you.
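The loop above (Loop Examples + Extract Macro + Filter Examples + train + store) can be sketched in Python. Note this is only an illustration of the control flow: the "model" here is a trivial per-patient mean predictor standing in for whatever learner you pick in RapidMiner, and the values are made up:

```python
from statistics import mean

# Hypothetical per-second measurements for two patients.
measurements = [
    {"pid": 1, "steps": 100, "systolic": 120},
    {"pid": 1, "steps": 150, "systolic": 124},
    {"pid": 2, "steps": 80,  "systolic": 135},
    {"pid": 2, "steps": 90,  "systolic": 137},
]
patients = [1, 2]

models = {}
for pid in patients:  # Loop Examples + Extract Macro: one pass per patient
    # Filter Examples: keep only this patient's rows.
    subset = [m for m in measurements if m["pid"] == pid]
    # Stand-in "training": predict this patient's mean systolic pressure.
    # In RapidMiner this would be your learner followed by a Store operator.
    models[pid] = mean(m["systolic"] for m in subset)

print(models)  # {1: 122, 2: 136}
```

The result mirrors the repository layout in the answer: one stored model per patient ID, each trained only on that patient's filtered data.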
This process is called 02 Train Models, and it saves a model for each patient in the Models directory of your newly created repository.
From this working sample, you should be able to train your own models and make whatever adjustments you need. I attached the repository too, so you can see how everything fits together.
Hope this helps,
Rodrigo.
Hi @n_alkassab,
BTW, here are some further adjustments you can make:
- Store your data once it is filtered, so you don't have to work with millions of records, just the ones you need.
- Store your data in a relational database so you don't have to redo everything every single time. I always recommend PostgreSQL for these things.
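To make the database suggestion concrete, here is a sketch of persisting the pre-filtered measurements so later runs query only one patient's rows instead of re-filtering millions of records. The answer recommends PostgreSQL; this sketch uses SQLite from Python's standard library only so it runs without a server, but the SQL has the same shape:

```python
import sqlite3

# In-memory SQLite as a stand-in for the recommended PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE measurements (
        patient_id INTEGER,
        taken_at   TEXT,
        systolic   INTEGER,
        diastolic  INTEGER,
        pulse      INTEGER
    )
""")

# Hypothetical rows; in practice you would bulk-load the filtered data once.
rows = [
    (1, "2020-01-01T00:00:00", 120, 80, 70),
    (2, "2020-01-01T00:00:00", 135, 88, 75),
]
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?, ?, ?)", rows)

# Later runs fetch just one patient's measurements with an indexed lookup.
patient_1 = conn.execute(
    "SELECT * FROM measurements WHERE patient_id = ?", (1,)
).fetchall()
print(patient_1)  # [(1, '2020-01-01T00:00:00', 120, 80, 70)]
```

With PostgreSQL you would add an index on `patient_id` so each per-patient query stays fast even with millions of rows.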
All the best,
Rodrigo.
Answers
Thank you so much! You saved me a ton of time. I really appreciate it.