Newbie Needs Direction

New to this. Within my data set there are subsets of rows that share an ID. Each ID represents an independent event. How do I set up a process that first treats each ID independently and then applies models across the events?
Answers
-
Hello @dhc and welcome! It would help if you posted a small example of your data, so we can be sure we are interpreting your explanation correctly and understand the specific structure of your data.
In general, it sounds like you want one of two things. The first option is to pivot the data (the "Pivot" operator): the multiple sub-event rows are combined into a single row per unique event id, and the detailed data for each sub-event is kept in separate attributes. The second option applies if the attributes are all numeric and you only need summaries such as the sum, average, or count per event: you can compute those with the "Aggregate" operator.
Either way, you end up with a dataset that has exactly as many rows as you have unique event ids, and at that point you should be able to apply standard modeling techniques. For supervised learning, don't forget to define your label (outcome) attribute using the "Set Role" operator as well.
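For anyone who thinks more easily in code, here is a rough sketch of the "Aggregate" idea in pandas rather than RapidMiner. It is only an analogy: the column names (`race_id`, `odds`, `weight`) and the values are made up for illustration and are not taken from the poster's data.

```python
import pandas as pd

# Hypothetical long-format data: several rows per event id (assumed names).
rows = pd.DataFrame({
    "race_id": [101, 101, 101, 102, 102],
    "odds": [2.5, 4.0, 10.0, 3.0, 6.0],
    "weight": [55.0, 57.5, 54.0, 58.0, 56.0],
})

# Collapse each event to a single row of summary statistics,
# which is roughly what an aggregation grouped by the event id does.
per_race = rows.groupby("race_id").agg(
    runners=("odds", "count"),
    mean_odds=("odds", "mean"),
    total_weight=("weight", "sum"),
).reset_index()

print(per_race)
```

The result has one row per `race_id`, which is the shape a standard learner expects.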
@stevefarr you may want to move this to the product help section rather than community news.
Best,
1 -
Yes, I agree I should have started this in another topic. How do I move it?
Brian, thanks. Here is a screenshot (it doesn't show the label attribute).
I'm mining horse racing data. Each value in column A represents a race, so the remaining attributes are relevant in the context of that race only.
0 -
I just explored the Pivot operator. It looks like I need a unique identifier within each group, correct?
0 -
0
-
Correct, but you can easily create that using the "Generate ID" operator first, which will assign a unique id to every row, and then run the Pivot operator after that. And your problem should be solved!
0 -
The results were unwieldy. The IDs need to restart at 1 for each "primary key", so I added an attribute that serves that purpose. I'm not sure Pivot is the way to go. Anyway, thanks for the help. I'll keep trying.
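To make the "restart at 1 for each primary key" idea concrete, here is a minimal pandas sketch (again an analogy, not a RapidMiner process). The columns `race_id`, `horse`, `odds` and the helper column `slot` are hypothetical names chosen for the example.

```python
import pandas as pd

# Hypothetical data: one row per horse, several horses per race.
rows = pd.DataFrame({
    "race_id": [101, 101, 101, 102, 102],
    "horse": ["A", "B", "C", "D", "E"],
    "odds": [2.5, 4.0, 10.0, 3.0, 6.0],
})

# Number the rows 1, 2, 3, ... separately inside each race,
# instead of using one id that runs across the whole table.
rows["slot"] = rows.groupby("race_id").cumcount() + 1

# One row per race, one column per (attribute, slot) pair.
wide = rows.pivot(index="race_id", columns="slot", values=["horse", "odds"])

# Flatten the two-level column index into names such as "odds_1".
wide.columns = [f"{name}_{slot}" for name, slot in wide.columns]
wide = wide.reset_index()

print(wide)
```

With a per-race index the pivoted table gets one column per position in the race, rather than one column per row of the original data, which is probably why a globally unique id produced unwieldy results. Races with fewer runners simply have missing values in the higher-numbered columns.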
0