Split data

Question

Hi, I have another question; hopefully someone can point me in the right direction. I am using the KNN operator and looping through a lot of data based on the name. I want to split my data, but by different amounts depending on where I am in the data. My decisions are made on a time basis: the top of my data holds the earliest observations and the bottom the latest.

I will try to show a simple example below.

Example: Name 1 appears in the data set 10 times. The first time it appears it has no previous results, so a KNN cannot be done and I would discard this example. The second time the name appears, I want to base the KNN on the example that happened before, so the top 10% of the data for Name 1 would go into creating the model. The current example would then go into Apply Model, and I would discard the other 80% (from this example's point of view it has not happened yet, so it is information I would not have had at the time). The third time the name appears, I would base the KNN on the two examples above, so the top 20% of the data would go into creating the model. The current example would then go into Apply Model, and I would discard 70%. I want to carry on doing this as per the table below.

Occurrence   Make Model   Apply Model   Discard
4th          30%          10%           60%
5th          40%          10%           50%
6th          50%          10%           40%
7th          60%          10%           30%
8th          70%          10%           20%
9th          80%          10%           10%
10th         90%          10%           0%

I should also note that names occur a different number of times: sometimes just once, other times over 20.

I was hoping to use the Split Data operator with a macro to split the data. I have the percentages in my data, but I am struggling to get the figures into my Split Data operator. This is my current process; I tried using macros but took them out because they did not work, and replaced them with an arbitrary ratio.

Any help would be much appreciated.

Thanks,
Oli
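The logic described above (train only on earlier rows for the same name, apply the model to the current row, ignore everything later) can be sketched outside RapidMiner. The Python below is a minimal, hypothetical illustration, not the Split Data operator or RapidMiner's own KNN: `split_fractions` and `expanding_knn` are made-up names, it assumes the rows are already in chronological order as (name, features, label) tuples, and it uses a plain Euclidean-distance majority vote as a stand-in for the KNN model. `split_fractions` reproduces the percentages in the table; `expanding_knn` shows that the same split can be expressed as "all earlier rows for this name", which needs no percentages at all.

```python
import math
from collections import Counter, defaultdict


def split_fractions(occurrence, total):
    """Hypothetical helper: (make model, apply model, discard) fractions for the
    1-based `occurrence` of a name that appears `total` times in time order."""
    model = (occurrence - 1) / total     # all earlier rows for this name
    apply = 1 / total                    # the current row
    discard = (total - occurrence) / total  # rows that have not happened yet
    return model, apply, discard


def expanding_knn(rows, k=3):
    """Hypothetical sketch. `rows` is assumed to be a chronologically ordered list
    of (name, features, label) tuples. Each row is predicted from earlier rows with
    the same name only; first occurrences have no history and are skipped.
    Returns a list of (row_index, predicted_label)."""
    history = defaultdict(list)  # name -> earlier (features, label) pairs
    predictions = []
    for i, (name, features, label) in enumerate(rows):
        past = history[name]
        if past:
            # k nearest earlier examples by Euclidean distance (math.dist, Python 3.8+)
            nearest = sorted(past, key=lambda p: math.dist(p[0], features))[:k]
            # simple majority vote; ties go to the label seen first among the nearest
            votes = Counter(lbl for _, lbl in nearest)
            predictions.append((i, votes.most_common(1)[0][0]))
        # only now does the current row become "history" for later rows
        past.append((features, label))
    return predictions


if __name__ == "__main__":
    print(split_fractions(4, 10))  # 4th of 10 occurrences
    rows = [
        ("Name 1", (1.0,), "win"),
        ("Name 2", (5.0,), "lose"),
        ("Name 1", (1.2,), "win"),
        ("Name 1", (9.0,), "lose"),
        ("Name 1", (8.5,), "win"),
    ]
    print(expanding_knn(rows, k=1))
```

Because the window grows one row at a time per name, a name that appears only once simply never gets a prediction, and a name that appears more than 20 times is handled the same way as one that appears 10 times.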

oli · Answer

Hi,

Just wondering whether anyone has any suggestions on this; all help is very much appreciated.

Thanks,

Oli