
Prediction Model + Result Analysis

asiddiq

Dear all,
I have a dataset (24 columns, 5,100 rows) covering the period from 2010 to 2018. It contains the following attributes: dengue fever data (district name, gender, nationality, and the week and year in which the case was recorded) and air quality data (temperature, humidity, rainfall, and others). I would like to create a prediction model that involves the following steps:
1. Dimensionality reduction
2. Clustering
3. Linear regression
4. Time series analysis

I have tried a simple design and got the following result, but I'm not sure whether my work is right or not.

Image: https://us.v-cdn.net/6038102/uploads/editor/rj/69p483jlyg7m.png

Tags: AI Studio, Clustering, Time Series, Regression, Sampling, Predictions + Scoring

hbajpai

Hi @asiddiq,

The result you shared shows the linear regression model, including the coefficients of your variables and their importance. Since you have dengue fever data, are you trying to predict how many people will suffer from it, based on a time series prediction? I am unable to follow your motivation for dimensionality reduction and clustering. Can you please elaborate?
Also, given your problem statement, feature engineering around seasonality and weather patterns would be an essential step in developing a predictive model.
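One common way to turn weekly dengue counts into a time series regression problem is to use lagged values (previous weeks' cases and weather) as features, so that the target for week t is predicted from weeks t-1, t-2, and so on. The sketch below is a plain-Python illustration under that assumption, not the process from this thread; `make_lag_features`, the choice of rainfall as the weather signal, the two-week lag, and the sample numbers are all hypothetical.

```python
def make_lag_features(weekly_cases, weekly_rainfall, n_lags):
    """Build (features, target) rows for one-week-ahead forecasting.

    Hypothetical helper: both inputs are chronological weekly series of equal
    length. Row t uses the previous n_lags weeks of cases and rainfall as
    features and the case count of week t as the target.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    if len(weekly_cases) != len(weekly_rainfall):
        raise ValueError("series must have the same length")
    rows = []
    for t in range(n_lags, len(weekly_cases)):
        features = weekly_cases[t - n_lags:t] + weekly_rainfall[t - n_lags:t]
        rows.append((features, weekly_cases[t]))
    return rows


# Hypothetical weekly counts and rainfall for one district.
weekly_cases = [3, 5, 8, 6, 4, 7]
weekly_rainfall = [10, 0, 20, 5, 0, 15]
rows = make_lag_features(weekly_cases, weekly_rainfall, n_lags=2)
# rows[0] == ([3, 5, 10, 0], 8): cases and rainfall of weeks 1-2 predict week 3.
```

Lagging the weather variables as well as the case counts reflects the assumption that rainfall and temperature act on dengue incidence with a delay; the number of lags is something to tune rather than a fixed choice.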

asiddiq

I would like to predict future patients and future high-risk locations. The dimensionality reduction and clustering work together to replace missing values using the k-nearest neighbors method. Is that clear?

hbajpai (accepted answer)

Okay, I think you can use the following steps after importing the data.

Feature Engineering
In this step you can capture seasons and use the 'Year' variable to generate nominal attributes such as the common vaccines and medicines used. You might want to consider aggregating cases by district/zone if you want to predict high-risk locations.
For predicting future patients, on the other hand, what is your underlying goal? For example: predicting future patients based on season and district, or predicting the overall number of future patients in a particular month, quarter, or year?
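The season capture and district aggregation described above can be sketched as follows. This is a hedged illustration in plain Python, not the AI Studio process itself: the record layout (dicts with `district`, `year`, `week`), the helper names, and especially the week-to-season mapping (an approximation of Northern Hemisphere meteorological seasons) are assumptions that should be replaced with whatever seasons make sense for the region in the data.

```python
from collections import Counter


def week_to_season(week):
    """Map a week number (1-53) to a season.

    Assumption: approximate Northern Hemisphere meteorological seasons
    (Dec-Feb winter, Mar-May spring, Jun-Aug summer, Sep-Nov autumn).
    """
    if not 1 <= week <= 53:
        raise ValueError(f"week out of range: {week}")
    if week <= 9 or week >= 49:
        return "winter"
    if week <= 22:
        return "spring"
    if week <= 35:
        return "summer"
    return "autumn"


def aggregate_cases(records):
    """Count cases per (district, year, week) and tag each group with its season."""
    counts = Counter((r["district"], r["year"], r["week"]) for r in records)
    return [
        {"district": d, "year": y, "week": w, "season": week_to_season(w), "cases": n}
        for (d, y, w), n in sorted(counts.items())
    ]


# Hypothetical case records: one dict per reported case.
records = [
    {"district": "A", "year": 2015, "week": 30},
    {"district": "A", "year": 2015, "week": 30},
    {"district": "B", "year": 2015, "week": 2},
]
summary = aggregate_cases(records)
```

Note that weeks with no reported cases do not appear in the output; for a time series model you would normally fill those (district, week) combinations with zero so every district has a complete weekly series.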
Dimensionality reduction (PCA) can be used to replace over-correlated variables with one or two principal components (PCs). You should store the preprocessing model, as you will need it later at inference time to re-score new data. However, be aware that if you apply PCA to your key attributes, it may be difficult to understand their impact on the model at the interpretability stage.
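To illustrate why the preprocessing model has to be stored, here is a minimal plain-Python sketch of a one-component PCA on standardized columns (for example correlated weather variables such as temperature and humidity). It assumes PCA on the correlation matrix and finds the first component by power iteration; the function names and sample values are hypothetical, and this is not how the PCA operator is implemented internally. The key point is that `transform_pca_1` reuses the means, standard deviations, and component learned at training time instead of refitting on new data.

```python
import math


def fit_pca_1(rows, n_iter=200):
    """Fit a one-component PCA on standardized columns using power iteration.

    Returns the fitted preprocessing model (column means, standard deviations
    and the first principal direction) so the identical transformation can be
    re-applied to new data at scoring time.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(d)]
    if any(s == 0 for s in stds):
        raise ValueError("constant columns cannot be standardized")
    z = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]
    # Covariance of the standardized data, i.e. the correlation matrix.
    corr = [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(d)] for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(corr[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return {"means": means, "stds": stds, "component": v}


def transform_pca_1(model, rows):
    """Project rows onto the stored first principal component (the PC1 score)."""
    m, s, v = model["means"], model["stds"], model["component"]
    return [sum((r[j] - m[j]) / s[j] * v[j] for j in range(len(v))) for r in rows]


# Hypothetical training rows: [temperature, humidity].
train = [[30.0, 60.0], [32.0, 64.0], [34.0, 68.0]]
model = fit_pca_1(train)
train_scores = transform_pca_1(model, train)
# Scoring new data reuses the stored model rather than refitting.
new_scores = transform_pca_1(model, [[33.0, 66.0]])
```

Power iteration from an all-ones start vector works as long as that vector is not orthogonal to the top eigenvector, which is the usual case for positively correlated weather variables; a library PCA is the sensible choice for real data.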
Clustering can be used to replace missing values. This can be done using the Impute Missing Values operator; you can use k-NN, among other algorithms, inside its subprocess.
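The idea behind k-NN imputation is that a missing value is filled in from the rows most similar to the incomplete one. The plain-Python sketch below assumes numeric columns with `None` marking a missing value, Euclidean distance over the columns the incomplete row does have, and the mean of the k nearest complete rows as the fill value; `knn_impute` and the sample data are hypothetical and not the operator's actual implementation.

```python
import math


def _distance(a, b, cols):
    """Euclidean distance between rows a and b over the given column indices."""
    return math.sqrt(sum((a[j] - b[j]) ** 2 for j in cols))


def knn_impute(rows, k=2):
    """Replace each None with the mean of that column over the k nearest complete rows."""
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row")
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        # Compare only on the columns this row actually has.
        present = [j for j, x in enumerate(r) if x is not None]
        neighbours = sorted(complete, key=lambda c: _distance(r, c, present))[:k]
        filled.append([
            x if x is not None else sum(c[j] for c in neighbours) / len(neighbours)
            for j, x in enumerate(r)
        ])
    return filled


# Hypothetical rows: [temperature, rainfall] with one missing rainfall value.
rows = [[1.0, 10.0], [2.0, 20.0], [10.0, 100.0], [1.5, None]]
filled = knn_impute(rows, k=2)
```

Because distances are computed on raw values, columns with large ranges dominate; in practice you would normalize the attributes before imputing.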
Linear regression can be achieved with the GLM (Generalized Linear Model) operator. You can also use Optimize Parameters (Grid) to identify the best value of the regularization parameter alpha. Finally, you can use Explain Predictions and the Model Simulator to understand how the model depends on the various attributes.
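The grid search over a regularization parameter can be sketched in plain Python with ridge (L2-penalized) linear regression standing in for the GLM. This is a hedged illustration: here `alpha` is the ridge penalty strength, the convention scikit-learn uses, whereas H2O's GLM, for example, uses alpha for the L1/L2 mix and lambda for the strength, so the GLM operator's own parameters may not map one-to-one. The helper names, the train/validation split, and the data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ridge(X, y, alpha):
    """Ridge regression on centered data; the intercept is not penalized."""
    n, d = len(X), len(X[0])
    xbar = [sum(r[j] for r in X) / n for j in range(d)]
    ybar = sum(y) / n
    Xc = [[r[j] - xbar[j] for j in range(d)] for r in X]
    yc = [v - ybar for v in y]
    # Normal equations (Xc^T Xc + alpha I) beta = Xc^T yc.
    xtx = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0) for j in range(d)]
           for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(d)]
    beta = solve(xtx, xty)
    intercept = ybar - sum(b * m for b, m in zip(beta, xbar))
    return intercept, beta


def predict(model, X):
    intercept, beta = model
    return [intercept + sum(b * x for b, x in zip(beta, r)) for r in X]


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def grid_search_alpha(X_train, y_train, X_val, y_val, alphas):
    """Return (best_alpha, best_model) by mean squared error on the validation set."""
    best = None
    for alpha in alphas:
        model = fit_ridge(X_train, y_train, alpha)
        score = mse(y_val, predict(model, X_val))
        if best is None or score < best[0]:
            best = (score, alpha, model)
    return best[1], best[2]


# Hypothetical data: earlier weeks for training, later weeks for validation.
X_train, y_train = [[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0]
X_val, y_val = [[4.0], [5.0]], [9.0, 11.0]
best_alpha, model = grid_search_alpha(X_train, y_train, X_val, y_val, [0.0, 1.0, 10.0])
```

With weekly dengue data, the validation set should come after the training period in time (as in the toy split above) rather than being sampled at random; otherwise the grid search rewards models that peek at the future.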

I hope this helps; please reach out if you have any questions. You can also share sample data and your process if you need further clarification.