Handling multiple nominal values in one category
e4gle
Hello,
can I somehow handle (for instance, with a decision tree model) data with multiple nominal values (separated, let's say, by commas) under one category? For example, category name: tags, values: rapid, miner, datamining, etc.?
Thank you for your help.
land
Hi,
sorry, but I don't understand your question. Could you give an example? What do you mean by category?
Greetings,
Sebastian
e4gle
Well, maybe the use of the word "category" was unfortunate.
Let's say I have some files described by some attributes, like "name", "category", "location" and "tags".
I want to know if I can somehow handle this last attribute, "tags", so that it takes more than one nominal value.
For instance:
name - article1, category - sport, location - New York, tags - knicks, basketball, celtics
Is it clear enough now? I'm a beginner in data mining and may not express myself clearly.
IngoRM
Hi,
you have several options, and which one is best depends entirely on what you are planning to do with the data:
In general, you could use the operators "Split" and "Merge" to handle those multiple nominal values for one attribute.
Sometimes it might be better to handle this attribute with value type "text" and use the text processing operators, e.g. in order to determine how often certain tags are used.
In some cases, you might simply want to keep the tag collection as it is (maybe sort it) in order to calculate similarities etc. (although even in that case I would probably go for a text processing approach).
...
Which option is best depends on your goal, but in general you can handle this setting with "Split" and "Merge" and define a separating character like '#' or something else which does not occur in your tags.
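As a rough illustration of the splitting idea outside of RapidMiner, here is a minimal Python sketch. It assumes the tags arrive as one comma-separated string per example. The function names `split_tags_to_indicators` and `count_tags`, the `tag_` column prefix, and the second example row are made up for this illustration; they are not RapidMiner operators or parameters.

```python
from collections import Counter


def split_tags_to_indicators(rows, attribute="tags", separator=","):
    """Turn one multi-valued nominal attribute into one 0/1 column per value.

    `rows` is a list of dicts; the multi-valued attribute holds a string
    such as "knicks, basketball, celtics".
    """
    # First pass: split each string and collect the set of tags per row.
    tag_sets = []
    for row in rows:
        parts = row.get(attribute, "").split(separator)
        tag_sets.append({p.strip() for p in parts if p.strip()})

    # The vocabulary is every tag seen anywhere, sorted for a stable column order.
    vocabulary = sorted(set().union(*tag_sets))

    # Second pass: replace the original attribute with indicator columns.
    result = []
    for row, tags in zip(rows, tag_sets):
        new_row = {k: v for k, v in row.items() if k != attribute}
        for tag in vocabulary:
            new_row["tag_" + tag] = 1 if tag in tags else 0
        result.append(new_row)
    return result, vocabulary


def count_tags(rows, attribute="tags", separator=","):
    """Count in how many rows each tag is used."""
    counts = Counter()
    for row in rows:
        parts = row.get(attribute, "").split(separator)
        counts.update({p.strip() for p in parts if p.strip()})
    return counts


# Hypothetical example data: article1 follows the thread, article2 is invented.
rows = [
    {"name": "article1", "category": "sport", "tags": "knicks, basketball, celtics"},
    {"name": "article2", "category": "sport", "tags": "basketball, lakers"},
]
table, vocabulary = split_tags_to_indicators(rows)
tag_counts = count_tags(rows)
```

Each row in `table` now carries ordinary binary attributes, so the single "tags" column no longer needs special treatment. `count_tags` covers the "how often are certain tags used" question in the simplest possible way.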
Hope that helps at least a bit. Cheers,
Ingo
e4gle
And is there a classification method that would handle multiple values of this "tags" attribute? The problem is not in splitting the values of this attribute, but in finding a way to handle all of its values.
IngoRM
Hi,
well, what's the difference between a classification scheme which is able to handle this itself and preprocessing the data so that all classification schemes can handle it? Right, with the latter, the more modular option, you have many more options to choose from. So I would always go for well-thought-out preprocessing combined with a powerful and already existing classification method.
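To make the modular argument a bit more concrete: once the tags are turned into indicator columns, any learner that accepts ordinary binary attributes can use them. The sketch below is plain Python, not RapidMiner. It uses a toy one-level decision tree (a decision stump) as a stand-in for whichever existing learner you would actually pick. The data, the labels, and the names `to_indicators`, `train_stump` and `predict` are invented for illustration.

```python
from collections import Counter


def to_indicators(tag_strings, separator=","):
    """Preprocessing step: comma-separated tags -> list of 0/1 feature rows."""
    tag_sets = [
        {t.strip() for t in s.split(separator) if t.strip()} for s in tag_strings
    ]
    vocabulary = sorted(set().union(*tag_sets))
    rows = [[1 if tag in tags else 0 for tag in vocabulary] for tags in tag_sets]
    return rows, vocabulary


def majority(labels, default):
    """Most frequent label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def train_stump(features, labels):
    """Pick the single binary column whose split gives the fewest training errors."""
    overall = majority(labels, None)
    best = None
    for col in range(len(features[0])):
        label_if_1 = majority(
            [y for x, y in zip(features, labels) if x[col] == 1], overall
        )
        label_if_0 = majority(
            [y for x, y in zip(features, labels) if x[col] == 0], overall
        )
        errors = sum(
            (label_if_1 if x[col] == 1 else label_if_0) != y
            for x, y in zip(features, labels)
        )
        if best is None or errors < best[0]:
            best = (errors, col, label_if_1, label_if_0)
    _, col, label_if_1, label_if_0 = best
    return col, label_if_1, label_if_0


def predict(stump, row):
    """Classify one 0/1 feature row with a trained stump."""
    col, label_if_1, label_if_0 = stump
    return label_if_1 if row[col] == 1 else label_if_0


# Invented toy data: four tag strings with made-up class labels.
tags = [
    "knicks, basketball, celtics",
    "basketball, lakers",
    "election, senate",
    "senate, budget",
]
labels = ["sport", "sport", "politics", "politics"]

features, vocabulary = to_indicators(tags)
stump = train_stump(features, labels)
predictions = [predict(stump, row) for row in features]
```

The learner never sees the comma-separated strings, only the indicator columns, so the stump could be swapped for any other classifier without touching the preprocessing.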
Cheers,
Ingo