Handling multiple nominal values in one category
e4gle
New Altair Community Member
Hello,
Can I somehow handle (for instance, with a decision tree model) data with multiple nominal values (separated, let's say, by commas) under one category? For example, category name: tags, values: rapid, miner, datamining... etc.?
Thank You for Your help
Answers
-
Hi,
sorry, but I don't understand your question. Could you give an example? What do you mean by "category"?
Greetings,
Sebastian
-
Well, maybe my use of the word "category" was unfortunate.
Let's say I have some files described by some attributes, like "name", "category", "location" and "tags".
I want to know if I can somehow handle this last attribute, "tags", so that it can take more than one nominal value.
For instance:
name: article1, category: sport, location: New York, tags: knicks, basketball, celtics
Is it clear enough now? I'm a beginner in data mining and may not express myself clearly.
-
Hi,
you have several options, and which one is best depends entirely on what you are planning to do with the data:
- In general, you could use the operators "Split" and "Merge" to handle those multiple nominal values for one attribute.
- Sometimes it might be better to handle this attribute with value type "text" and use the text processing operators, e.g. in order to determine how often certain tags are used.
- In some cases, you might simply want to keep the tag collection as it is (maybe sort it) in order to calculate similarities etc. (although even in that case I would probably go for a text processing approach).
- ...
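To make the first two options concrete outside of RapidMiner, here is a minimal Python sketch. It does not use the "Split" operator or the text processing operators themselves; the function names, the `tag_` column prefix and the second sample record are made up for illustration. It splits the comma-separated tags, turns them into one 0/1 column per tag, and counts how often each tag is used:

```python
from collections import Counter


def split_tags(raw):
    """Split a comma-separated tag string into a list of clean, lowercased tags."""
    return [t.strip().lower() for t in raw.split(",") if t.strip()]


def to_indicator_rows(records, tag_field="tags"):
    """Replace the multi-valued tag field with one 0/1 column per distinct tag."""
    tag_lists = [split_tags(r[tag_field]) for r in records]
    vocabulary = sorted({t for tags in tag_lists for t in tags})
    rows = []
    for record, tags in zip(records, tag_lists):
        # Keep every other attribute unchanged.
        row = {k: v for k, v in record.items() if k != tag_field}
        present = set(tags)
        for tag in vocabulary:
            row["tag_" + tag] = 1 if tag in present else 0
        rows.append(row)
    return vocabulary, rows


# Hypothetical sample data modelled on the example in this thread.
records = [
    {"name": "article1", "category": "sport", "location": "New York",
     "tags": "knicks, basketball, celtics"},
    {"name": "article2", "category": "sport", "location": "Boston",
     "tags": "celtics, basketball"},
]

vocabulary, rows = to_indicator_rows(records)

# How often each tag is used across all records.
tag_counts = Counter(t for r in records for t in split_tags(r["tags"]))
```

After this step every row has only single-valued attributes, so a learner that expects one value per attribute can use it.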
Hope that helps at least a bit. Cheers,
Ingo
-
And is there a classification method that would handle multiple values of this "tags" attribute? The problem is not in splitting the values of this attribute, but in finding a way to handle all of its values.
-
Hi,
well, what's the difference between a classification scheme that can handle this itself and preprocessing the data so that all classification schemes can handle it? Right, with the latter, the more modular option, you have many more options to choose from. So I would always go for well-thought-out preprocessing combined with a powerful, already existing classification method.
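As a rough illustration of that modular idea, the Python sketch below first encodes the tags as 0/1 vectors and then hands them to a classifier that knows nothing about tags. The toy 1-nearest-neighbour function only stands in for whatever existing learner you prefer (a decision tree, for example); the training data, the function names and the choice of "category" as the label are assumptions made up for this example:

```python
def encode(tag_string, vocabulary):
    """Turn 'a, b, c' into a 0/1 vector over a fixed, ordered tag vocabulary."""
    present = {t.strip().lower() for t in tag_string.split(",") if t.strip()}
    return [1 if tag in present else 0 for tag in vocabulary]


def hamming(a, b):
    """Number of positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_1nn(train_vectors, train_labels, query):
    """Return the label of the training vector closest to the query."""
    best = min(range(len(train_vectors)),
               key=lambda i: hamming(train_vectors[i], query))
    return train_labels[best]


# Hypothetical training data: (tags, label).
train = [
    ("knicks, basketball, celtics", "sport"),
    ("celtics, basketball", "sport"),
    ("election, senate", "politics"),
    ("senate, budget", "politics"),
]

# Preprocessing step: build the vocabulary and encode every example.
vocabulary = sorted({t.strip().lower() for tags, _ in train for t in tags.split(",")})
X = [encode(tags, vocabulary) for tags, _ in train]
y = [label for _, label in train]

# Classification step: the learner only ever sees plain 0/1 vectors.
prediction = predict_1nn(X, y, encode("basketball, knicks", vocabulary))
```

Because the encoding is separate from the learner, `predict_1nn` could be swapped for any other method without touching the tag handling.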
Cheers,
Ingo