Difference between various data types
lmsasu
Hi all,
Quite a newbie question: in RM5, what is the difference between "numeric" and "real" when defining metadata? Where can I find quick help on topics like these? The "rapidminer-5.0-manual-english_v1.0.pdf" and "rapidminer-4.6-tutorial.pdf" do not cover these simple subjects.
Thanks,
Lucian
Answers
Hi,
Actually, numeric is the supertype of real and integer.
Likewise, nominal is the supertype of polynominal, text and binominal.
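As a rough illustration of that supertype relation, here is a minimal Python sketch. It is not RapidMiner code: the `PARENT` table and the `is_a` helper are hypothetical names, and the table lists only the types mentioned in this thread.

```python
# Illustrative only: parent map for the value types named in this thread.
# This is NOT RapidMiner's API; PARENT and is_a are hypothetical names.
PARENT = {
    "real": "numeric",
    "integer": "numeric",
    "polynominal": "nominal",
    "text": "nominal",
    "binominal": "nominal",
}


def is_a(value_type, expected):
    """Return True if value_type equals expected or is one of its subtypes."""
    current = value_type
    while current is not None:
        if current == expected:
            return True
        # Walk one step up the hierarchy; top-level types have no parent.
        current = PARENT.get(current)
    return False
```

Under this reading, something that asks for "numeric" would accept "real", while something that asks for "real" would reject plain "numeric".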
Greetings,
Sebastian
Thanks. Does any of the documentation specify this? Or am I supposed to read the code?
Lucian
Hi,
unless you want to take a look into the manual, that would be a perfect idea. Still, I would think reading the manual up to page 12 is easier...
http://sourceforge.net/projects/rapidminer/files/1.%20RapidMiner/5.0/rapidminer-5.0-manual-english_v1.0.pdf/download
Greetings,
Sebastian
Sebastian,
I have read page 12 of the manual, and I can't see a difference between nominal and polynominal, because both can handle categorical values. I mean, if you have the variable "color" (red, green and blue), you have a categorical variable, and therefore a nominal variable. Isn't the "poly" prefix redundant?
What would be the difference, algorithmically speaking, between them? (The same question applies to the numerical types.)
Thanks in advance.
Pablo
Hi Pablo,
you are right: from an algorithmic point of view there is currently no difference between "polynominal" and "nominal". As far as I know, all operators which can handle one of the two can automatically handle both (somebody please correct me if I forgot an operator where this would indeed make a difference). But who knows: maybe there will be such a difference later on for a new operator, and the ontology in use can be seen as a preparation for that. However, in today's practical processes you will be perfectly fine using either option; just make sure that all operators are happy.
The same is true for numerical value types, although I think that there actually is (or at least was) some algorithm which really relied on the input being "real" instead of "numerical"...
Cheers,
Ingo
Ingo, I think I found one, namely the cross distance operator. I have two lists of polynominal expressions and tried to match them against each other, but the results I get are not really helpful. Assuming I am not completely nuts, I think this hinges on the fact that the wordlist to data operator produces a polynominal attribute, while the cross distance operator accepts only nominal attribute types.
Hi,
actually I doubt this, because each polynominal attribute is a nominal attribute.
I think you are trying to compute the distance between "mule" and "donkey". What is the distance? There's only one sane answer: 1. And what's the distance between "mule" and "horse"? Yes, 1. "mule" and "mule" would be zero, in case you haven't already guessed...
RapidMiner currently provides only this distance measure between nominal values. So I doubt that a process comparing wordlists per row makes any sense at all.
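A minimal Python sketch of that 0/1 distance, applied to every pair of values from two lists. The function names are hypothetical, and this is only an illustration of the measure described above, not the implementation of any RapidMiner operator.

```python
def nominal_distance(a, b):
    """Distance between two nominal values: 0 if identical, 1 otherwise."""
    return 0 if a == b else 1


def cross_distances(left, right):
    """Pair every value in left with every value in right (hypothetical helper)."""
    return [(x, y, nominal_distance(x, y)) for x in left for y in right]
```

Note that the comparison is exact string equality, so values that differ only in case or surrounding whitespace would still get a distance of 1.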
Greetings,
Sebastian
Hi Sebastian, thanks for your answers in both threads. Yes, this would indeed help if all pairs are calculated, that is, each row of vector a against each row of vector b. Ideally, the operator I am looking for would give me a 1 in case of a match and a 0 otherwise. Is there something like this? To follow your example, I am looking for a mule-mule match! I have tried cross distances, but the results are completely strange: even where there should be a match, as I can see by comparing the lists myself, it gives me a distance of 1. So I guess I am not handling this operator correctly.
Best regards, Andre
Ingo Mierswa wrote:
Hi Pablo,
you are right: from an algorithmic point of view there is currently no difference between "polynominal" and "nominal". As far as I know, all operators which can handle one of both can automatically handle both (please correct me somebody if I forgot an operator where this would indeed make a difference). [...]
Cheers,
Ingo
Hi Ingo,
I have indeed noticed an operator where the distinction between nominal and polynominal makes a difference. I often build web mining processes where extracted data is incrementally written to a database (appended to a table). The same process is repeated after a few days to collect data that was missed during the first run (timeouts etc.) as well as recently added content.
To find only those examples, I import the relevant URLs (Read Excel) and load the already collected items from the database (Read Database). Both operators are followed by "Set Role" to set IDs. Finally, the "Set Minus" operator builds the desired example set. The attribute obtained from the database is usually nominal, and the one from the Excel file is of type polynominal. Process execution is interrupted because the "Set Minus" operator complains about incompatible types and requests an attribute of type polynominal. Since there is no convenient way of changing the attribute obtained from the database from nominal to polynominal, I always set nominal instead of polynominal for the "Read Excel" operator. This isn't much trouble for me, but it shows a case where there is a difference between the two types. I don't know whether it is necessary there...
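The logic of that last step, stripped of RapidMiner's type checks, can be sketched in Python as follows. The function name is hypothetical and plain Python strings carry no nominal/polynominal distinction, so this shows only the set difference on IDs, not the operator itself.

```python
def set_minus(candidates, collected):
    """Return the candidate IDs that are not among the already collected ones.

    Hypothetical helper: keeps the order of candidates and compares IDs
    by exact string equality.
    """
    seen = set(collected)
    return [item for item in candidates if item not in seen]
```

For example, with candidate URLs `a`, `b`, `c` and `b` already collected, the result would be `a` and `c`, which are the items still to be fetched.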
Regards
Matthias
Hi,
if you want to compare lists that partly overlap and count the entries they have in common, you can use set operations on the example sets: keep only the examples that occur in both (Intersect) and count the remaining examples. You can also extract the number of examples as a macro or performance value.
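Outside RapidMiner, the same idea can be sketched in a few lines of Python. The function name is hypothetical, and duplicates within a list are counted once, which is an assumption about what "same entries" should mean here.

```python
def count_common(list_a, list_b):
    """Count distinct entries that occur in both lists (set intersection)."""
    return len(set(list_a) & set(list_b))
```

A count of zero then means the two lists share no entry at all.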
Greetings,
Sebastian