Prediction Model + Result Analysis
asiddiq
New Altair Community Member
Dear all,
I have a data set (24 columns, 5,100 rows) covering the period 2010 to 2018 with the following attributes: dengue fever data (district name, gender, nationality, and the week and year in which each case was recorded) and air quality data (temperature, humidity, rainfall, and others). I would like to create a prediction model that involves the following steps:
1. Dimensionality reduction
2. Clustering
3. Linear regression
4. Time series analysis
I have tried a simple design and got the following result, but I'm not sure whether my work is right or not.
Best Answer
Okay, I think you can use the following steps after importing the data.
- Feature Engineering
In this step you can capture seasons and use the 'Year' variable to generate nominal attributes such as the common vaccines and medicines used. You might want to aggregate cases by district or zone if you want to predict location risk areas.
Conversely, for predicting future patients, what is your underlying goal? For example, predicting future patients on the basis of season and district, or predicting the overall number of future patients in a particular month, quarter, or year.
- Dimensionality Reduction (PCA) can be used to replace highly correlated variables with a principal component or two. You should store the preprocessing model, as you will need it later when scoring new data. However, keep in mind that if you apply PCA to your key attributes, it may be difficult to understand their impact on the model at the interpretability stage.
- Clustering can be used to replace missing values. This can be done with the Impute Missing Values operator. You can use k-NN, among other algorithms, inside the subprocess.
- Linear regression can be done with a GLM (Generalized Linear Model). You can also use Optimize Parameters (Grid) to identify the best regularization parameter alpha. Finally, you can use Explain Predictions and Model Simulator to understand how the model depends on the various attributes.
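Outside RapidMiner, the feature engineering step above (capturing a seasonal attribute and aggregating cases by district) could look roughly like the Python sketch below. The records, the district names, and the use of calendar quarters as a stand-in for seasons are all assumptions for illustration; real seasons depend on the local climate.

```python
from collections import defaultdict

# Hypothetical records: one (district, year, week) tuple per reported case.
cases = [
    ("North", 2010, 3),
    ("North", 2010, 3),
    ("North", 2010, 30),
    ("South", 2010, 30),
    ("South", 2011, 45),
]


def week_to_quarter(week):
    """Map a week number (1-53) to a calendar quarter (1-4).

    Quarters are only a rough proxy for seasons (an assumption).
    """
    return min((week - 1) // 13 + 1, 4)


def aggregate_cases(records):
    """Count cases per (district, year, quarter)."""
    counts = defaultdict(int)
    for district, year, week in records:
        counts[(district, year, week_to_quarter(week))] += 1
    return dict(counts)


counts = aggregate_cases(cases)
for key in sorted(counts):
    print(key, counts[key])
```

The aggregated counts per district and period are the kind of target you could then model, depending on whether the goal is risk areas or overall patient numbers.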
Answers
Hi @asiddiq,
The result you shared shows the linear regression model, with the coefficient of each variable as well as its importance. Since you have dengue fever data, are you trying to predict how many people will suffer from it, based on a time series prediction? I am unable to follow your motivation for dimensionality reduction and clustering. Can you please elaborate?
Also, judging from your problem statement, feature engineering in terms of seasonality and weather patterns would be an essential step in developing a predictive model.
I would like to predict future patients and future location risk areas. Dimensionality reduction and clustering work together to replace the missing values using the k-nearest-neighbour method. Is that clear?
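For readers following along, the k-nearest-neighbour idea mentioned here can be sketched in plain Python as below. This is only an illustration of the general technique, not what any RapidMiner operator does internally; the function name `knn_impute`, the sample weather rows, and the choice of k are all hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is the root-mean-square difference over the columns that
    both rows have observed. Only original (non-imputed) values are used.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # Skip the row itself and rows that lack this column too.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical weekly records: (temperature, humidity, rainfall).
rows = [
    (30.0, 70.0, 5.0),
    (31.0, 72.0, 6.0),
    (20.0, 40.0, 0.0),
    (30.5, 71.0, None),  # rainfall missing
]
filled = knn_impute(rows, k=2)
print(filled[3])
```

In practice the attributes would usually be scaled first, since otherwise the column with the largest numeric range dominates the distance.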