New to rapid-i, is this a suitable use?
jcims
New Altair Community Member
Hi there,
I am doing a bit of analysis on a moderately sized data set.
Essentially, I have about 15,000 objects. Each object has 200 individual properties. Each property may have more than one value. All values are textual in nature.
I would like to select certain properties (maybe 10 or 20 of the 200) and categorize/classify the objects (I'm not sure of the nomenclature here) by unique combinations of these properties. In other words, I want to identify the full set of unique buckets required to contain all of the objects, where each bucket is identified by a unique combination of values of the selected properties.
I'm not sure whether that's clear. Basically, we're trying to identify templates and/or patterns in the objects, and in doing so we need to identify which property value combinations are most popular and/or important.
Any pointers on how I would perform this analysis in rapid-i (if it is even possible) would be appreciated.
Thanks!
Answers
Hello and welcome to RapidMiner,
OK, you describe the problem in very general terms. I will do my best...
RapidMiner terms:
"Essentially, I have about 15,000 objects. Each object has 200 individual properties. Each property may have more than one value. All values are textual in nature."
property = attribute or feature
object = example
Question:
Do you mean that the entry for a given example and attribute can be a series of values instead of a single value?
"I would like to be able to select certain properties (maybe 10 or 20 of the 200), and categorize/classify (not sure the nomenclature here) by unique combinations of these properties."
The first part of the sentence is called FeatureSelection in RapidMiner.
The second part is unclear: what do you mean by "unique combination"? I guess you mean the right thing, but I cannot confirm it without an example.
"I'm not sure if that's clear or not, basically we're trying to identify templates and/or patterns in the objects, and in doing so need to identify which property value combinations are most popular and/or important."
That is what Data Mining is all about. I guess that what you want to do here is called "clustering". If you do not know what that means, you will have to study a Data Mining book (no excuses).
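To make the question about "unique combination" concrete, here is one possible reading of it, sketched in plain Python rather than as a RapidMiner process: pick a few attributes, treat each distinct combination of their values as a bucket, and count the examples per bucket. This is simple group-and-count, not clustering. The function name `bucket_counts`, the attribute names, and the sample data are all made up for illustration, and representing a multi-valued attribute as a list of strings is an assumption.

```python
from collections import Counter


def bucket_counts(examples, selected):
    """Count examples per unique combination of values of the selected attributes.

    `examples` is a list of dicts mapping an attribute name to a list of
    textual values (an assumed representation of multi-valued attributes).
    """
    counts = Counter()
    for example in examples:
        # Sort each attribute's values so that value order does not create
        # spurious buckets; a missing attribute becomes an empty tuple.
        key = tuple(tuple(sorted(example.get(attr, ()))) for attr in selected)
        counts[key] += 1
    return counts


# Hypothetical sample data: three examples, three attributes.
examples = [
    {"color": ["red"], "shape": ["round"], "tag": ["a", "b"]},
    {"color": ["red"], "shape": ["square"], "tag": ["b", "a"]},
    {"color": ["blue"], "shape": ["round"], "tag": ["a"]},
]

counts = bucket_counts(examples, ["color", "tag"])

# List the buckets, most popular first.
for combo, n in counts.most_common():
    print(n, combo)
```

If this is what is meant, the number of buckets is `len(counts)`, and the most popular combinations come first in `counts.most_common()`.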
kind regards,
Steffen
Hi Steffen,
Thank you for your reply. The 'FeatureSelection' capability does seem to be what I'm after.
As you can tell, my vocabulary is quite limited in this domain, and your suggestion to read a book on Data Mining is probably the most important thing I can do at the moment. It is frustrating seeing all of this capability and not knowing how to use it.
Thanks again.