Prediction for next orders, any ideas?
Dear Community!
I have a .csv file with 100,000 rows and 439 columns. The spreadsheet represents customers' habits in using a specific service. Each row holds a customer ID followed by that customer's transaction dates, in the following format: 1 for Monday, 2 for Tuesday, etc. I need to predict the next transaction date for every customer, using these past records.
Here's an example of the database format:
customer_id transaction1 transaction2 ... transaction438
1 1 2 3 4 5 6 7 ... 745 746 747
2 2 7 16 20 21 23 28 ... 412
3 1 2 3 4 5 6 7 ... 285 322
4 5 7 8 12 14 19 21 ... 924 925 926
Any ideas which model I should use for this prediction to get the best accuracy?
NOTE: The database has a lot of missing values, depending on how frequently each customer orders.
Answers
-
This looks like some sort of sales projection analysis. I would look at the process I shared here: http://community.rapidminer.com/t5/RapidMiner-Studio/How-to-get-forecast-values-of-future-from-time-series-data/m-p/37698
You would need to replace some missing values using the Replace Missing Values operator, and you would need to install the Series extension from our marketplace. Is there seasonality involved?
0 -
It is a university homework assignment; we are learning the basics of RapidMiner. We did similar exercises earlier, but those had a label column in the training database. This time I have no clue how I could predict the outcome without that special column. I thought about some sort of pattern analysis, or converting the database to a range from 1 to 7 to simplify the problem, but I couldn't get to a real solution.
I think seasonality doesn't matter, because it's just an example.
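To make the two ideas above concrete (reducing the codes to a 1 to 7 range, and getting a label where none is given), here is a minimal Python sketch rather than a RapidMiner process. It assumes the codes are running day numbers with day 1 being a Monday, which is my reading of the example table and not something confirmed. The function names `to_weekday` and `make_labelled_example` are made up for illustration, and the sample row uses only the leading values of customer 2.

```python
def to_weekday(day_number):
    """Map a running day number to a weekday code 1..7.

    Assumption: day 1 is a Monday and the numbers simply keep counting.
    """
    return (day_number - 1) % 7 + 1


def make_labelled_example(transactions):
    """Hold out the last known transaction as the label.

    Missing values (None) are dropped first. Returns None when a customer
    has fewer than two transactions, since no label can be formed then.
    """
    days = [d for d in transactions if d is not None]
    if len(days) < 2:
        return None
    return days[:-1], days[-1]


# Leading values of customer 2 from the example, padded with missing values.
row = [2, 7, 16, 20, 21, 23, 28, None, None]
features, label = make_labelled_example(row)

print([to_weekday(d) for d in features])  # weekday codes of the history
print(label, to_weekday(label))           # held-out "next" transaction
```

With a label built this way, the task looks like the earlier exercises again: the history is the input and the held-out transaction is what the model should predict.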
0 -
If it's sales, you could sum up the values and compute Total Sales per month or week. You can use the dates as your ID and then the Total Sales as your Label.
0 -
Because the database contains the transaction days in coded form, not quantities, computing totals is not possible and would not make sense.
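What the day codes do allow is looking at the gaps between consecutive transactions. As a rough baseline only, and not something suggested by anyone in this thread, the next transaction day could be guessed as the last known day plus the customer's median gap. The Python sketch below assumes the codes are running day numbers; `predict_next_day` is a hypothetical helper name, and the sample row uses only the leading values of customer 4.

```python
from statistics import median


def predict_next_day(transactions):
    """Guess the next transaction day as last day + median gap.

    Missing values (None) are ignored. Returns None when fewer than two
    transactions are known, because no gap can be computed.
    """
    days = sorted(d for d in transactions if d is not None)
    if len(days) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(days, days[1:])]
    return days[-1] + round(median(gaps))


# Leading values of customer 4 from the example, padded with a missing value.
history = [5, 7, 8, 12, 14, 19, 21, None]
print(predict_next_day(history))
```

A baseline like this is mainly useful as a reference point: any trained model should at least beat it.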
0 -
AH! Did you try the Generalized Sequential Patterns operator?