Handling multiple nominal values in one category

e4gle
e4gle New Altair Community Member
edited November 2024 in Community Q&A
Hello,
can I somehow handle (for instance, with a decision tree model) data with multiple nominal values (separated, let's say, by commas) under one category? For example, category name: tags; values: rapid, miner, datamining, etc.

Thank you for your help.

Answers

  • land
    land New Altair Community Member
    Hi,
    sorry, but I don't understand your question. Could you give an example? What do you mean by "category"?

    Greetings,
      Sebastian
  • e4gle
    e4gle New Altair Community Member
    Well, maybe my use of the word "category" was unfortunate.

    Let's say I have some files described by some attributes, like "name", "category", "location" and "tags".

    I want to know if I can somehow handle this last attribute, "tags", so that it can take more than one nominal value.

    For instance:
    name: article1, category: sport, location: New York, tags: knicks, basketball, celtics

    Is it clear enough now? I'm a beginner in data mining and may not express myself clearly.
  • IngoRM
    IngoRM New Altair Community Member
    Hi,

    you have several options, and which one is best depends entirely on what you are planning to do with the data:
    • In general, you could use the operators "Split" and "Merge" to handle those multiple nominal values for one attribute.
    • Sometimes it might be better to give this attribute the value type "text" and use the text processing operators, e.g. in order to determine how often certain tags are used.
    • In some cases, you might simply want to keep the tag collection as it is (maybe sorted) in order to calculate similarities etc. (although even in that case I would probably go for a text processing approach).
    • ...
    In general, you can handle this setting with "Split" and "Merge" by defining a separating character such as '#', or any other character that does not occur in your tags.

    Hope that helps at least a bit. Cheers,
    Ingo
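
The first two options above (splitting the multi-valued attribute, and counting how often tags are used) can be illustrated outside RapidMiner with a minimal Python sketch. This is not the "Split" operator itself: the function names `split_tags`, `to_indicator_rows` and `tag_frequencies` are hypothetical, the comma separator is taken from the question, and the second record is invented for illustration.

```python
from collections import Counter


def split_tags(raw, sep=","):
    """Split a delimited tag string into cleaned, non-empty, lowercase tags."""
    return [t.strip().lower() for t in raw.split(sep) if t.strip()]


def to_indicator_rows(records, tag_field="tags", sep=","):
    """Replace the tag string with one 0/1 column per distinct tag."""
    tag_lists = [split_tags(r[tag_field], sep) for r in records]
    vocabulary = sorted({t for tags in tag_lists for t in tags})
    rows = []
    for record, tags in zip(records, tag_lists):
        # Copy every attribute except the original multi-valued one.
        row = {k: v for k, v in record.items() if k != tag_field}
        present = set(tags)
        for tag in vocabulary:
            row["tag_" + tag] = 1 if tag in present else 0
        rows.append(row)
    return vocabulary, rows


def tag_frequencies(records, tag_field="tags", sep=","):
    """Count how many records use each tag."""
    return Counter(t for r in records for t in split_tags(r[tag_field], sep))


records = [
    # article1 comes from the thread; article2 is an invented example.
    {"name": "article1", "category": "sport", "location": "New York",
     "tags": "knicks, basketball, celtics"},
    {"name": "article2", "category": "sport", "location": "Boston",
     "tags": "celtics, basketball"},
]

vocabulary, rows = to_indicator_rows(records)
print(vocabulary)
print(rows[1])
print(tag_frequencies(records).most_common(2))
```

Each record ends up with one binary attribute per tag, so the "tags" attribute no longer needs to hold several values at once.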
  • e4gle
    e4gle New Altair Community Member
    And is there a classification method that would handle multiple values of this "tags" attribute? The problem is not in splitting the values of this attribute, but in finding a way to handle all of its values.
  • IngoRM
    IngoRM New Altair Community Member
    Hi,

    well, what's the difference between a classification scheme that can handle this itself and preprocessing the data so that all classification schemes can handle it? Right: with the latter, which is the more modular option, you have many more options to choose from. So I would always go for well-thought-out preprocessing combined with a powerful, already existing classification method.

    Cheers,
    Ingo
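
As a rough illustration of that "preprocess first, then use any existing learner" idea, the sketch below turns each tag string into a 0/1 vector and passes the vectors to a very simple 1-nearest-neighbour classifier with Hamming distance. The classifier is only a stand-in chosen to keep the example dependency-free; a decision tree or any other learner that accepts binary attributes could take its place. All names (`encode`, `hamming`, `predict_1nn`) and the training data are invented for illustration.

```python
def encode(tags, vocabulary, sep=","):
    """Turn a delimited tag string into a 0/1 vector over the vocabulary."""
    present = {t.strip().lower() for t in tags.split(sep) if t.strip()}
    return [1 if tag in present else 0 for tag in vocabulary]


def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_1nn(train_vectors, train_labels, query):
    """Return the label of the training vector closest to the query."""
    best = min(range(len(train_vectors)),
               key=lambda i: hamming(train_vectors[i], query))
    return train_labels[best]


# Invented training data: (tags, category) pairs.
training = [
    ("knicks, basketball, celtics", "sport"),
    ("basketball, playoffs", "sport"),
    ("election, senate", "politics"),
    ("senate, budget", "politics"),
]

# Preprocessing step: build the tag vocabulary and encode every example.
vocabulary = sorted({t.strip().lower()
                     for tags, _ in training
                     for t in tags.split(",") if t.strip()})
train_vectors = [encode(tags, vocabulary) for tags, _ in training]
train_labels = [label for _, label in training]

# Classification step: works on plain binary attributes only.
prediction = predict_1nn(train_vectors, train_labels,
                         encode("celtics, playoffs", vocabulary))
print(prediction)
```

The classifier never sees the original comma-separated string, which is the point of the modular approach: the multi-valued attribute is dealt with once in preprocessing, and the learner can be swapped freely.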