How to Use/Model "Time Series" Data for College Athletics Finance Data
S_R_Webster
New Altair Community Member
Greetings all:
I am a student and new to the community, so please take it easy on me for this first go around. I have read some of the questions and responses to other time series questions but am still not finding an answer, or maybe just not understanding the answers given, or both. I don't have a model to share yet because that is where I am stuck to begin with. I would like to have three separate models to use for a predictive analysis project I have for a data science class. We only are concerned with training and prediction, not testing. We briefly learned how to run a simple linear regression, decision tree, and logistic regression model and I thought the data I had for my project could be used for all three. However, this was not including the fact that my data is based on time series and what we learned did not have that element in there. My data is based on roughly 77 different universities for about 13 years of data. The goal is to train the data on the first 12 years and then use the 13th year features for running the prediction and determining the target (in our case, amount of profit/loss for linear, and profitable-yes/no for decision tree and logistic regression). I am not sure what the best way to attack this problem is. Anything helps at this point. Thank you everyone in advance!
Best Answer
-
@S_R_Webster Great, so you are going to build a model that predicts whether a university will earn or lose money in a given year, taking into account the outcomes of previous years.
You have some interesting data to work with. You might experiment with binning some of your attributes. Please keep in mind that, in order to work, a model should only consider information that is available at the time of the prediction.
So to train your model to predict 2018, use only information that was available before 2018. Once you have trained and optimized your model, you can create a dataset that lets it predict the 2019 outcome.
This means that instead of taking 2017_Total Ticket Sales you'll work with something like PreviousYear_Total Ticket Sales, and you'll do all of that in the ETL.
You can also create attributes like Previous_Year_Profit/Loss and 2yearbefore_Profit/Loss; those attributes might capture each university's trend.
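Outside RapidMiner, the previous-year attributes described above could be sketched in plain Python as follows. This is only an illustration under assumptions: the rows are invented toy data, and the names `add_lag_features`, `ticket_sales`, and `profit_loss` are hypothetical, not columns or operators from the actual dataset or tool.

```python
# Toy rows invented for illustration; the real data has about 77 universities.
rows = [
    {"university": "A", "year": 2016, "ticket_sales": 10.0, "profit_loss": 1.0},
    {"university": "A", "year": 2017, "ticket_sales": 12.0, "profit_loss": -2.0},
    {"university": "A", "year": 2018, "ticket_sales": 11.0, "profit_loss": 3.0},
    {"university": "B", "year": 2017, "ticket_sales": 5.0, "profit_loss": 0.5},
    {"university": "B", "year": 2018, "ticket_sales": 6.0, "profit_loss": -1.0},
]


def add_lag_features(rows, columns, lags=(1, 2)):
    """Build rows whose features come only from earlier years of the same university."""
    # Index rows by (university, year) so an earlier year can be looked up directly.
    by_key = {(r["university"], r["year"]): r for r in rows}
    out = []
    for r in sorted(rows, key=lambda r: (r["university"], r["year"])):
        # Keep the identifiers and the current-year label only.
        new = {
            "university": r["university"],
            "year": r["year"],
            "profit_loss": r["profit_loss"],
        }
        for lag in lags:
            past = by_key.get((r["university"], r["year"] - lag))
            for col in columns:
                # None marks a missing earlier year (e.g. the first year on record).
                new[f"{col}_lag{lag}"] = past[col] if past else None
        out.append(new)
    return out


lagged = add_lag_features(rows, ["ticket_sales", "profit_loss"])
for row in lagged:
    print(row)
```

Rows whose lagged values are `None` (the earliest years of each university) would have to be dropped or imputed before training.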
Answers
-
@S_R_Webster Hi, I'll try to help. Could you send us an image of the dataset you are using, or name the columns you have?
After reading your file, go to File --> Print/Export_Image to obtain an image of your dataset.
The main thing you'll need to do is the ETL: converting your data into an example set from which a model can predict your label (the profit/loss amount, or yes/no). Time data can be transformed into attributes such as First Date, Age, TimeSinceX, or Time between Y and X, but without a sample of your data it is difficult to suggest anything specific.
For the Time Series you could take this little course
https://academy.rapidminer.com/learn/course/time-series-analytics/time-series-analytics/data-preparation-and-analysis
-
Thank you for responding!
I am not sure I can share the data set, since I don't know how the usage rights apply to sharing. One database is cafidatabase and the other data comes from sports-reference. Our columns, recorded per school per year, are:
- University
- University ID (self-made)
- Year (2005-2018)
- NCAA Football Win Percentage
- NCAA Football Simple Rating System (a strength of schedule measure)
- Total Football Spending
- Total Football Coaching Pay
- Total Recruiting Expenditures (aggregate for all athletics, men and women)
- Total Facility/Equipment Expenditures (aggregate for all athletics, men and women)
- Total Ticket Sales (aggregate for all athletics, men and women)
- Total Revenue (aggregate for all athletics, men and women)
- Total Expenditures (aggregate for all athletics, men and women)
- Profit/Loss, our target/label (aggregate for all athletics, men and women), determined by subtracting Total Expenditures from Total Revenue
Does this information help? Thank you for the link to the tutorial as well. I will check that out as soon as I have a chance. I don't necessarily want anyone to do the work for me, just lead me to the water so I can drink.
-
@S_R_Webster OK, based on your description we could do some things.
You already have a first dataset that you could work on with Automodel. I think you could remove University ID and University from the equation, and you'll have a simple dataset that uses Profit/Loss as the label.
That way you'll have a first model that takes all your numeric data into consideration.
I don't know if you have explored your data yet, but this needs to be the first step. Read your data and explore the statistics and graphs for each attribute.
- Do you have outliers in your data?
- What happens when you use the Year value as a color?
- Make a scatter plot with at least two of your attributes and see what you notice.
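For the outlier question, one common rule of thumb (not the only one, and not something specific to RapidMiner) is to flag values more than 1.5 times the interquartile range outside the quartiles. A minimal sketch, assuming a hypothetical helper `iqr_outliers` and invented ticket-sales figures:

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Invented ticket-sales figures (in millions) for illustration only.
ticket_sales = [4.0, 5.0, 5.5, 6.0, 6.5, 7.0, 40.0]
print(iqr_outliers(ticket_sales))
```

A flagged value is not necessarily an error; a very large athletics program may simply be different from the rest, so it is worth looking at which university and year it belongs to before removing it.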
Best regards and hope this helps you.
Automodel video https://academy.rapidminer.com/learn/video/auto-model-classification
Turbo Prep video https://academy.rapidminer.com/learn/video/turbo-prep-introduction
-
@MarcoBarradas I am still in the process of cleaning the data, fixing/estimating what little missing data I have, and looking for outliers and whatnot. To be honest, I was not sure whether this was actually a time series matter; I just knew I had 13 years of data and wanted to train on the first 12 years for each university, then take each university's feature values for the 13th year to predict its 13th-year target/label. I will check out the two links you provided later this afternoon when I get off work. Thanks again for following up and assisting with this, my deepest gratitude!
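The train-on-earlier-years, predict-the-last-year setup described here could look roughly like the sketch below. It is only a toy under assumptions: the rows are invented, the single feature (previous-year profit) and the helper `fit_line` are hypothetical, and an ordinary one-variable least-squares fit stands in for whatever linear regression operator is actually used.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented rows: (university, year, previous_year_profit, profit).
# The last year's profit is None because it is what we want to predict.
rows = [
    ("A", 2016, 1.0, 2.0),
    ("A", 2017, 2.0, 4.0),
    ("B", 2016, 3.0, 6.0),
    ("B", 2017, 4.0, 8.0),
    ("A", 2018, 5.0, None),
    ("B", 2018, 6.0, None),
]

# Split by time, not at random: every year before the last one is training data.
last_year = max(r[1] for r in rows)
train = [r for r in rows if r[1] < last_year]
to_predict = [r for r in rows if r[1] == last_year]

slope, intercept = fit_line([r[2] for r in train], [r[3] for r in train])

# Predicted profit/loss per university for the final year.
predictions = {r[0]: slope * r[2] + intercept for r in to_predict}
# Yes/no view of the same numbers; a classifier would instead be trained
# on a label derived as profit > 0.
profitable = {u: p > 0 for u, p in predictions.items()}
print(predictions, profitable)
```

The key design point is the split by year: rows from the final year never reach the training step, so the model only sees information from before the year it predicts.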