"SVM Start"

Question

Hello everyone,

I'm just taking my first steps with RapidMiner (I used Matlab before), so switching from writing commands to using a GUI is quite difficult for me. Unfortunately, I found no useful tutorials or topics here that deal with simple getting-started problems.

What I want to do: I analyze charisma in speech data and would like to divide my samples into charismatic and non-charismatic speakers. So far I have extracted 168 features for each of my 412 samples, plus a more or less subjective manual division that should be automated by using the features instead.

In Matlab, two steps were needed for an SVM. First, I generated training vectors by splitting the samples into two parts (training and test dataset). The first column of the training dataset (0/1) indicates whether a sample is charismatic or not. Matlab used the first part of the samples to build a model. Then the second half of the samples was predicted by applying the model, and I got a column vector with a 0/1 value for each sample, which I can compare with the manual assessments to determine the fit.

What I've done in RapidMiner: as in Matlab, I imported the training dataset (.xls), let RapidMiner guess the attribute types (real values; I changed the first column to 'label' and 'nominal') and pulled the data into the main process window. After that I took the first SVM operator (Modeling --> Classification --> SVM). I ran the process and obtained a model, which I saved in the repository. Then, as in the second step in Matlab, I chose 'Modeling --> Model application --> Apply Model' and added the test dataset as well as the stored model as inputs. Unfortunately, the result is not a classification (0/1) but a value x with -0.87 < x < 2.541 for each sample.

What I need help with: I assume something is wrong in the declaration of either the training or the test dataset, but I can't see what. So I need to know which settings are required to get a real two-class classification.

I hope the problem is understandable and not described in too much detail. Thanks for all comments.

tolau100 · Answer

Thanks for your advice. It really helped me figure out a bit more about how RM works and where I can look for further information myself.

Greets,
tolau

land · Answer

Hi,
welcome to RapidMiner. Before going into detail, I would suggest you take a look at the various videos available that give an introduction to RapidMiner's GUI approach.
Several are linked here:
http://rapid-i.com/content/view/189/212/
and you will come across some more on YouTube.

Additionally, there are some sample processes that come with RapidMiner. They are inside the sample repository. I would suggest taking a look at the 01_Learner and 03_Validation samples. They show how to do modeling and validation inside RapidMiner.

Normally it is advisable to simply load the complete data set as one repository entry and use the X-Validation operator to automatically split the data into folds. If you want to avoid cross-validation (which I would not recommend), you can use the Split Validation operator for such a fixed-split approach. In general it will be much easier this way...
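
To make the difference between the two approaches concrete, here is a minimal Python sketch of how the sample indices get divided. It is only an illustration of the general idea, not how the X-Validation or Split Validation operators are implemented; the function names `k_fold_indices` and `fixed_split_indices`, the choice of 10 folds, and the 50/50 ratio are assumptions made for this example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration only.
    """
    indices = list(range(n_samples))
    # Shuffle once so that each fold is a random subset of the data.
    random.Random(seed).shuffle(indices)
    # Spread samples as evenly as possible over the folds:
    # the first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def fixed_split_indices(n_samples, train_ratio=0.5):
    """Single fixed split: the first part trains, the rest tests."""
    cut = int(n_samples * train_ratio)
    return list(range(cut)), list(range(cut, n_samples))


# 412 samples as in the question; 10 folds is an assumed setting.
folds = list(k_fold_indices(412, 10))
train_half, test_half = fixed_split_indices(412)
```

With cross-validation every sample is used for testing exactly once and for training in all other folds, so the performance estimate does not depend on one particular half of the data. With the fixed split, the second half is never used for training and the first half is never tested.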

Greetings,
  Sebastian