Just starting with RapidMiner - Some Basic Design Questions
nulspace
New Altair Community Member
Hi there,
I've just begun using RapidMiner for a course in university - I'm certainly rusty with this type of software, so I'm facing a steep learning curve!
I understand the idea of the program, I've created a repository and have read in a .csv file, and now I'm trying to glean some information from my data.
Here is a breakdown of my .csv file:
10 columns of generic real attributes which I've called "att1" to "att10". Their roles are "regular" attributes.
2 columns for 3-class and 4-class labellings of the data, which I called label1 and label2. I also set their role in the import wizard to "label".
Question 1
When I create the dataSet in the repository and double click it, bringing up the Meta Data view of my "exampleSet", it shows me the 10 attribute columns and only one label column (the 4-class one). Why is that? Will a process only look at one column of labels in a dataset?
Question 2
This seems like it should be a simple question to answer, but I'm absolutely stuck: how would I go about calculating the mean and standard deviation of each real attribute? The mean I can actually see when I look at the data set metadata, but I'm stumped on how to find or display the standard deviation for each attribute. Any help on this would be greatly appreciated.
Lastly, what are the best resources for learning these basic skills?
Thanks,
nul
Answers
Arrrgh! Well, I've solved one of my problems. Quite a silly error on my part.
I've noticed (now) that in the meta data, it displays the STDev beside the avg. :-[ Oops.
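For anyone who wants to cross-check those values outside RapidMiner, here is a minimal Python sketch using only the standard library. The helper name `column_stats` and the sample rows are made up for illustration, and it assumes the sample standard deviation (n - 1 in the denominator); I have not verified which variant the Meta Data view reports.

```python
import csv
import io
import statistics


def column_stats(csv_text, columns):
    """Return {column: (mean, sample standard deviation)} for the given columns.

    Hypothetical helper for illustration; not part of RapidMiner.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    values = {name: [] for name in columns}
    for row in reader:
        for name in columns:
            values[name].append(float(row[name]))
    return {
        name: (statistics.mean(vals), statistics.stdev(vals))
        for name, vals in values.items()
    }


# Made-up sample data shaped like the file described above
# (only two of the ten attributes, to keep it short).
sample = """att1,att2,label1,label2
1.0,10.0,a,w
2.0,20.0,b,x
3.0,30.0,c,y
4.0,40.0,a,z
"""

stats = column_stats(sample, ["att1", "att2"])
for name, (mean, std) in stats.items():
    print(f"{name}: mean={mean:.3f}, stdev={std:.3f}")
```

To get the population standard deviation instead, `statistics.pstdev` can be swapped in for `statistics.stdev`.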
However, I still don't understand why only 1 of 2 label columns is displayed. Can anybody help me with that?
Thanks very much,
nul
nulspace wrote:
Arrrgh! Well, I've solved one of my problems. Quite a silly error on my part.
I've noticed (now) that in the meta data, it displays the STDev beside the avg. :-[ Oops.
However, I still don't understand why only 1 of 2 label columns is displayed. Can anybody help me with that?
Thanks very much,
nul
You can have only one label per example set.
If you are doing clustering or classification, give the other column the Cluster role instead.
Hope it helps.
Thank you, that makes sense.
By classifying those two data groups as "Clusters", can I still perform the same operators on them as if they were "labels" (Naive Bayes, for example)?