New to rapid-i, is this a suitable use?

jcims · April 2009

Hi there,

I am doing some analysis on a moderately sized data set.

Essentially I have about 15000 objects. Each object has 200 individual properties. Each property may have more than one value. All values are textual in nature.

I would like to be able to select certain properties (maybe 10 or 20 of the 200) and categorize/classify the objects (not sure of the nomenclature here) by unique combinations of these properties. In other words, I want to identify the full set of unique buckets required to contain all of the objects, where each bucket is identified by a unique combination of values of the selected properties.
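As an illustration only (plain Python, not a RapidMiner process), the bucketing described above amounts to a group-by on the selected properties. This sketch assumes each object is a dict mapping property names to sets of textual values; the property names and sample objects are hypothetical. Each property's values are sorted into a tuple so that the same set of values always produces the same bucket key.

```python
from collections import defaultdict

# Hypothetical objects: property name -> set of textual values.
objects = [
    {"id": "obj1", "color": {"red"}, "shape": {"round"}, "tags": {"a", "b"}},
    {"id": "obj2", "color": {"red"}, "shape": {"round"}, "tags": {"b", "a"}},
    {"id": "obj3", "color": {"blue"}, "shape": {"round"}, "tags": {"a"}},
]


def bucket_key(obj, selected):
    """Build a hashable key from the values of the selected properties.

    Values are sorted into tuples, so {"a", "b"} and {"b", "a"} give the
    same key; a missing property contributes an empty tuple.
    """
    return tuple(tuple(sorted(obj.get(prop, set()))) for prop in selected)


def bucket_objects(objects, selected):
    """Group object ids by their unique combination of selected values."""
    buckets = defaultdict(list)
    for obj in objects:
        buckets[bucket_key(obj, selected)].append(obj["id"])
    return dict(buckets)


buckets = bucket_objects(objects, ["color", "shape", "tags"])
for key, ids in buckets.items():
    print(key, ids)
```

The number of keys in `buckets` is the number of unique buckets needed to hold all objects for that choice of properties.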

I'm not sure if that's clear or not. Basically, we're trying to identify templates and/or patterns in the objects, and in doing so we need to identify which property value combinations are most popular and/or important.
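For the "most popular" part, a minimal plain-Python sketch (again not RapidMiner; the objects and property names are hypothetical) is to count how many objects share each combination of values and sort by that count. This only measures popularity; "importance" would need some additional criterion that is not specified here.

```python
from collections import Counter

# Hypothetical objects: property name -> set of textual values.
objects = [
    {"os": {"linux"}, "db": {"postgres"}},
    {"os": {"linux"}, "db": {"postgres"}},
    {"os": {"linux"}, "db": {"mysql"}},
    {"os": {"windows"}, "db": {"mssql"}},
]


def combination_counts(objects, selected):
    """Count objects per unique combination of values of the selected properties."""
    return Counter(
        tuple(tuple(sorted(obj.get(prop, set()))) for prop in selected)
        for obj in objects
    )


counts = combination_counts(objects, ["os", "db"])
# most_common() lists combinations from most to least frequent.
for combo, n in counts.most_common():
    print(n, combo)
```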

Any pointers on how I would perform this analysis in rapid-i (if it is even possible) would be appreciated.

Thanks!

steffen · April 2009

Hello, and welcome to RapidMiner!

OK, you describe the problem in a very general manner. I will try my best...

Essentially I have about 15000 objects. Each object has 200 individual properties. Each property may have more than one value. All values are textual in nature.

RapidMiner terms:
property = attribute or feature
object = example

Question:
Do you mean that the entry for a given example and attribute can be a series of values instead of a single value?
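If the answer is yes, one common workaround (an assumption here, not something confirmed in this thread) is to expand each multi-valued property into one binary attribute per distinct value, so that every example/attribute cell holds exactly one value. A minimal plain-Python sketch with hypothetical data:

```python
# Hypothetical objects: property name -> set of textual values.
objects = [
    {"tags": {"a", "b"}, "color": {"red"}},
    {"tags": {"b"}, "color": {"blue", "red"}},
]


def expand_multivalued(objects):
    """Turn multi-valued properties into binary 'property=value' columns."""
    # Collect every property=value pair seen in any object, in sorted order.
    columns = sorted(
        {f"{prop}={val}" for obj in objects for prop, vals in obj.items() for val in vals}
    )
    rows = []
    for obj in objects:
        present = {f"{prop}={val}" for prop, vals in obj.items() for val in vals}
        # 1 if this object has the value, 0 otherwise.
        rows.append([1 if col in present else 0 for col in columns])
    return columns, rows


columns, rows = expand_multivalued(objects)
print(columns)
for row in rows:
    print(row)
```

The trade-off is width: 200 properties with several distinct values each can produce many binary columns.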

I would like to be able to select certain properties (maybe 10 or 20 of the 200) and categorize/classify the objects (not sure of the nomenclature here) by unique combinations of these properties.

The first part of the sentence is called FeatureSelection in RapidMiner.
The second part is unclear: what do you mean by "unique combination"? I guess you mean the right thing, but I cannot confirm it without an example.

I'm not sure if that's clear or not. Basically, we're trying to identify templates and/or patterns in the objects, and in doing so we need to identify which property value combinations are most popular and/or important.

That is what data mining is all about. I guess what you want to do here is called "clustering". If you do not know what that means, you will have to study a data mining book (no excuses).

Kind regards,

Steffen

jcims · April 2009

Hi Steffen,

Thank you for your reply. The 'FeatureSelection' capability did seem to be what I'm after.

As you can tell, my vocabulary is quite limited in this domain, and your suggestion to read a book on Data Mining is probably the most important thing I can do at the moment. It is frustrating seeing all of this capability and not knowing how to use it.

Thanks again.
