"SVM or Regression from data in database - how to??"
noah977
New Altair Community Member
I'm VERY new to RM. Just installed it today
So far, I'm very impressed and a bit overwhelmed by all the options it has.
I was hoping someone could help me design a model/workflow in the GUI for a simple problem.
-My data is stored in MySQL (I do understand how to use DatabaseExampleSource to access the raw data)
-The input is 4 columns. The first is a unique ID, the next 2 are various features (numbers), the last column is the result.
Fields: ID, first_measure, Second_measure, resulting_score
Example Data: 1, 13.5, 57.2, 6.12312313
I would like to use RM to create a "predictor" for this data. Build a model based on many training examples. One thought is regression, the other is an SVM. I might also expand into a model with 50-60 features. In that case, it would be nice to use some kind of genetic algorithm to learn the best features and correlation for the most accurate prediction.
As I wrote above, I can connect to my database and select the data. I'm not sure what to do with the data once I have it.
Any advice?
Answers
Hi Noah,
noah977 wrote:
I'm VERY new to RM. Just installed it today
So far, I'm very impressed and a bit overwhelmed by all the options it has.

Congratulations on finding RM and on having taken the first steps. Of course, RM is a bit overwhelming at the beginning, but once you have toyed around for a while and understood the general principle of how to build a process, I am sure you will highly appreciate the vast possibilities RM offers for designing data mining processes.
But enough of advertising ..
noah977 wrote:
(I do understand how to use DatabaseExampleSource to access the raw data)
-The input is 4 columns. The first is a unique ID, the next 2 are various features (numbers), the last column is the result.
Fields: ID, first_measure, Second_measure, resulting_score
Example Data: 1, 13.5, 57.2, 6.12312313
I would like to use RM to create a "predictor" for this data. Build a model based on many training examples. One thought is regression, the other is an SVM. I might also expand into a model with 50-60 features. In that case, it would be nice to use some kind of genetic algorithm to learn the best features and correlation for the most accurate prediction.
As I wrote above, I can connect to my database and select the data. I'm not sure what to do with the data once I have it.

The first step of your task is to designate your ID and your resulting_score columns as special attributes, namely as (who would have thought) id and label, respectively. This can be done by setting the parameters [tt]id_attribute[/tt] and [tt]label_attribute[/tt] of the [tt]DatabaseExampleSource[/tt] operator to the appropriate column names. Note that this designation can also be done separately with the operator [tt]ChangeAttributeRole[/tt], using one instance for each attribute.
The second step is to simply place the [tt]LinearRegression[/tt] or e.g. the [tt]LibSVM[/tt] operator in the process. If you then run the process, it should give you a regression or SVM model, respectively.
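As a rough illustration of these two steps, a process along the following lines might work. This is only a sketch: the [tt]id_attribute[/tt] and [tt]label_attribute[/tt] parameters are the ones named above, while the [tt]Root[/tt] wrapper, the other parameter keys ([tt]database_url[/tt], [tt]username[/tt], [tt]query[/tt]), the connection values and the table name are assumptions that may differ in your RM version and setup.

```xml
<operator name="Root" class="Process" expanded="yes">
    <!-- Step 1: read the data and mark the id and label columns.
         Connection values and table name are placeholders (assumed). -->
    <operator name="DatabaseExampleSource" class="DatabaseExampleSource">
        <parameter key="database_url" value="jdbc:mysql://localhost:3306/your_database"/>
        <parameter key="username" value="your_user"/>
        <parameter key="query" value="SELECT ID, first_measure, Second_measure, resulting_score FROM your_table"/>
        <parameter key="id_attribute" value="ID"/>
        <parameter key="label_attribute" value="resulting_score"/>
    </operator>
    <!-- Step 2: learn a model from the labeled examples.
         Replace this operator with the LibSVM operator to train an SVM instead. -->
    <operator name="LinearRegression" class="LinearRegression">
    </operator>
</operator>
```

Building the same chain in the GUI and comparing it with the XML view is probably the easiest way to check the exact parameter names for your installation.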
The task of genetic feature selection is a bit more complicated. I strongly advise you to have a look at the RM built-in tutorial (i.e. the example processes that come with RM). It also includes examples of feature selection, and you should easily get an idea of how this works from them.
Hope that helps,
Tobias
Tobias,
Thank you for the quick answer.
I can't wait to get good with RM. I see so many great possibilities!
One additional question: Can I specify some details about a feature. For example, one of my features is the ID number of a category. We keep it as an Integer in our DB. I want to tell RM that it is not an actual number to average, etc, but just an identifier of a category. (I guess one way would be to translate it into a string "ID-1", etc. but I was hoping there was a nicer way to do this in RM.)
Thanks again!!!!
-N
Hi Noah,
noah977 wrote:
Thank you for the quick answer.
I can't wait to get good with RM. I see so many great possibilities!

Wow, great to see someone that eager to learn RM ... it makes me answer even well outside office hours!
noah977 wrote:
One additional question: Can I specify some details about a feature. For example, one of my features is the ID number of a category. We keep it as an Integer in our DB. I want to tell RM that it is not an actual number to average, etc, but just an identifier of a category. (I guess one way would be to translate it into a string "ID-1", etc. but I was hoping there was a nicer way to do this in RM.)

Nothing easier than that. Just use a [tt]Numerical2Polynominal[/tt] operator inside an [tt]AttributeSubsetPreprocessing[/tt] operator, with the attribute specified as a parameter. Here is the XML code snippet:

<operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
    <parameter key="attribute_name_regex" value="ID"/>
    <parameter key="condition_class" value="attribute_name_filter"/>
    <operator name="Numerical2Polynominal" class="Numerical2Polynominal">
    </operator>
</operator>

Regards,
Tobias