Hi
For a Big Data university class, we have to do a predictive analysis of a data file. It is about people visiting a gym,
and my task is to build a model that can predict when the gym is too crowded.
I am at the very beginning of this class, so we are only working with nested holdout testing, cross-validation, and random forests.
We have to answer the following questions:
1.) Which usage patterns of the gym can you identify from a visual analysis of the data set?
2.) What is the generalisation performance of your "best" model? Does it tend to overfit strongly?
3.) Which differences can you observe between a decision tree and a random forest?
4.) When would you suggest that someone go to the gym?
Description of the data:
• Number of people
• timestamp (number of seconds since beginning of day)
• day_of_week (0 - 6)
• is_weekend (0 or 1)
• is_holiday (0 or 1)
• apparent_temperature (degrees Fahrenheit)
• temperature (degrees Fahrenheit)
• is_start_of_semester (0 or 1)
I know how to build the models (more or less), but I'm having a hard time reading the essential information out of them.
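For context, here is a minimal sketch of the kind of setup I mean, assuming Python with pandas and scikit-learn (an assumption, the class does not prescribe a tool). The data below is synthetic stand-in data, not the attached file; `number_people` is a hypothetical column name for the target, and the evening peak and the weekend coding (5 and 6) are made up for illustration. It compares a decision tree and a random forest using a held-out test set plus cross-validation on the training part:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the CSV (NOT the real data). Column names follow the
# description above; "number_people" is an assumed name for the target.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "timestamp": rng.integers(0, 86400, n),        # seconds since midnight
    "day_of_week": rng.integers(0, 7, n),
    "is_holiday": rng.integers(0, 2, n),
    "temperature": rng.normal(60, 10, n),
    "is_start_of_semester": rng.integers(0, 2, n),
})
# Assumption: days 5 and 6 are the weekend.
df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)
df["apparent_temperature"] = df["temperature"] + rng.normal(0, 2, n)

# Made-up target: an evening peak, fewer people on weekends, plus noise.
hour = df["timestamp"] / 3600
df["number_people"] = (
    30 * np.exp(-((hour - 18) ** 2) / 8)
    - 5 * df["is_weekend"]
    + rng.normal(0, 3, n)
).clip(lower=0)

X = df.drop(columns="number_people")
y = df["number_people"]

# Outer holdout: the test set is only used for the final score.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

def evaluate(model):
    """Return R^2 on the training data, in 5-fold CV, and on the test set."""
    cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2").mean()
    model.fit(X_train, y_train)
    return {
        "train": model.score(X_train, y_train),
        "cv": cv_r2,
        "test": model.score(X_test, y_test),
    }

results = {
    "tree": evaluate(DecisionTreeRegressor(random_state=0)),
    "forest": evaluate(RandomForestRegressor(n_estimators=100, random_state=0)),
}
for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

My understanding is that a large gap between the training score and the cross-validation or test score would point to overfitting, but I am not sure how to interpret the numbers beyond that.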
Any help is greatly appreciated.
Attached you'll find the CSV file.