Grouping of classes

caste New Altair Community Member
edited November 5 in Community Q&A
Hi

Is it possible in RapidMiner to automatically collapse the classes of a learning set into a given number of classes according to their cardinality, so that the variance in class size is reduced? The goal is to improve the precision of methods such as SVM and KNN.

I have a learning set of 20,000 elements divided into more than 100 classes, with high variance in the number of elements per class, and I need to reduce them to 20 classes.

For example:

Class A - 3 elements
Class B - 4 elements
Class C - 8 elements

It would be nice to be able to reduce these to a given number of classes, e.g. 2, like this:

Class 1 - 7 elements (obtained from Classes A and B)
Class 2 - 8 elements (obtained from Class C)


Please help me! I'm trying operations research methods, but I have so little time...

Thank you!



Answers

  • land
    land New Altair Community Member
    Hmm,
    I'm not quite sure I understood you correctly. You want to merge the most similar classes to improve the precision? That would not improve performance on the problem; it would simply change the problem...
    But if you want to do this manually, you could use the MergeNominalValues operator. Perhaps you should also take a look at the parameterIteration operator and its examples in the meta directory of the example processes. It could save you a lot of typing.

    Greetings,
      Sebastian
  • caste
    caste New Altair Community Member
    I needed that because I'm building a hashing system to distribute a huge load of information. The semantic bonds are not that important, so I can collapse classes based on their weight in the context rather than on their names. This balancing helps the SVM recognition.

    Actually, I solved my problem with an operations research method: an implementation of the Assembly Line Balancing problem.

    Just a note: I tried the evolutionary parameter optimization from the examples, but even on the examples it took many hours, so I decided to change approach.

    Thanks for your help, and compliments on the software you built and on the choice to keep it open source: it is really great!
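
For readers with the same problem, the kind of balancing discussed in this thread can be sketched outside RapidMiner. The following is only a minimal illustration, assuming a simple greedy heuristic (largest class first, always into the currently lightest group). It is not the Assembly Line Balancing implementation mentioned above, and it is not a RapidMiner operator. The function names `balance_classes` and `group_sizes` are hypothetical, and the class sizes are the ones from the example in the question.

```python
import heapq


def balance_classes(class_sizes, n_groups):
    """Map each original class to one of n_groups merged classes.

    Greedy heuristic (an assumption, not an exact optimum): take the
    classes from largest to smallest and put each one into the group
    whose total size is currently the smallest.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")

    # Min-heap of (current total size, group index).
    heap = [(0, g) for g in range(n_groups)]
    heapq.heapify(heap)

    mapping = {}
    # Sort by size descending, then by name so the result is deterministic.
    for name, size in sorted(class_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        total, group = heapq.heappop(heap)
        mapping[name] = group
        heapq.heappush(heap, (total + size, group))
    return mapping


def group_sizes(class_sizes, mapping, n_groups):
    """Total number of elements that end up in each merged class."""
    totals = [0] * n_groups
    for name, group in mapping.items():
        totals[group] += class_sizes[name]
    return totals


# Class sizes from the example in the question.
sizes = {"A": 3, "B": 4, "C": 8}
mapping = balance_classes(sizes, 2)
totals = group_sizes(sizes, mapping, 2)

# Relabel a few examples with their merged class.
labels = ["A", "C", "B"]
merged = [mapping[label] for label in labels]
```

On the three-class example, this puts A and B together (7 elements) and leaves C alone (8 elements). The resulting mapping could then be applied by hand, for instance as the list of values to merge in the MergeNominalValues operator suggested above. As noted in the first answer, merging classes this way changes the classification problem itself, so it only makes sense when the class semantics do not matter.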