"SVM or Regression from data in database - how to??"

New Altair Community Member

Nov 22, 2008

Updated Nov 5, 2024 by Jocelyn

I'm VERY new to RM. Just installed it today.

So far, I'm very impressed and a bit overwhelmed by all the options it has.

I was hoping someone could help me design a model/workflow in the GUI for a simple problem.

-My data is stored in MySQL (I do understand how to use DatabaseExampleSource to access the raw data).

-The input is 4 columns. The first is a unique ID, the next 2 are various features (numbers), the last column is the result.
Fields: ID, first_measure, Second_measure, resulting_score
Example Data: 1, 13.5, 57.2, 6.12312313

I would like to use RM to create a "predictor" for this data. Build a model based on many training examples. One thought is regression, the other is an SVM. I might also expand into a model with 50-60 features. In that case, it would be nice to use some kind of genetic algorithm to learn the best features and correlation for the most accurate prediction.

As I wrote above, I can connect to my database and select the data. I'm not sure what to do with the data once I have it.

Any advice?


TobiasMalbrecht

New Altair Community Member

Nov 22, 2008

Hi Noah,

noah977 wrote:

I'm VERY new to RM. Just installed it today.

So far, I'm very impressed and a bit overwhelmed by all the options it has.


Congratulations on coming upon RM and having made the first steps. Of course, RM is a bit overwhelming at the beginning, but once you have toyed around a while and understood the general principle on how to build a process, I am sure you will highly appreciate the vast possibilities for designing data mining processes RM offers.

But enough of advertising ..

noah977 wrote:

(I do understand how to use DatabaseExampleSource to access the raw data).

-The input is 4 columns. The first is a unique ID, the next 2 are various features (numbers), the last column is the result.
Fields: ID, first_measure, Second_measure, resulting_score
Example Data: 1, 13.5, 57.2, 6.12312313

I would like to use RM to create a "predictor" for this data. Build a model based on many training examples. One thought is regression, the other is an SVM. I might also expand into a model with 50-60 features. In that case, it would be nice to use some kind of genetic algorithm to learn the best features and correlation for the most accurate prediction.

As I wrote above, I can connect to my database and select the data. I'm not sure what to do with the data once I have it.


The first step of your task is to designate your ID and your resulting_score as special attributes, namely as (who would have thought) id and label, respectively. This can be done by setting the parameters [tt]id_attribute[/tt] and [tt]label_attribute[/tt] of the [tt]DatabaseExampleSource[/tt] operator to the appropriate column names. Note that this designation can also be done separately with the operator [tt]ChangeAttributeRole[/tt], one for each attribute.

The second step is to simply place the [tt]LinearRegression[/tt] or e.g. the [tt]LibSVM[/tt] operator in the process. If you then run the process, it should give you a regression or SVM model, respectively.
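For illustration, the two steps above might look roughly like this as a process. This is only a sketch: the [tt]id_attribute[/tt] and [tt]label_attribute[/tt] parameters are the ones mentioned above, while the remaining parameter keys, the connection URL, the user name and the table name [tt]measurements[/tt] are assumptions that should be checked against the operator's parameter list in your RM version.

```xml
<operator name="Root" class="Process" expanded="yes">
    <!-- Step 1: read the data and mark the special attributes.
         Connection details and the table name are placeholders. -->
    <operator name="DatabaseExampleSource" class="DatabaseExampleSource">
        <parameter key="database_url" value="jdbc:mysql://localhost:3306/mydb"/>
        <parameter key="username" value="myuser"/>
        <parameter key="query" value="SELECT ID, first_measure, Second_measure, resulting_score FROM measurements"/>
        <parameter key="id_attribute" value="ID"/>
        <parameter key="label_attribute" value="resulting_score"/>
    </operator>
    <!-- Step 2: learn a model from the labelled examples.
         Swap in an SVM learner here to compare the two approaches. -->
    <operator name="LinearRegression" class="LinearRegression"/>
</operator>
```

Because ID is marked as id and resulting_score as label, only first_measure and Second_measure are used as regular input features by the learner.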

The task of genetic feature selection is a bit more complicated. I strongly advise you to have a look at the RM built-in tutorial (i.e. the example processes that come with RM). It includes examples of feature selection, and you should easily get an idea of how this works from them.
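As a rough orientation for what those example processes look like: a feature selection operator usually wraps a cross-validation, which in turn contains a learner and a chain that applies the model and measures its performance. The sketch below follows that pattern. The operator class names ([tt]GeneticAlgorithm[/tt], [tt]XValidation[/tt], [tt]OperatorChain[/tt], [tt]ModelApplier[/tt], [tt]RegressionPerformance[/tt]) and the performance parameter are given from memory of the bundled samples and are assumptions, so please compare them with the sample process in your installation before relying on them.

```xml
<!-- Place this after the operator that loads the data. -->
<operator name="GeneticAlgorithm" class="GeneticAlgorithm" expanded="yes">
    <!-- Each candidate feature subset is scored by cross-validation. -->
    <operator name="XValidation" class="XValidation" expanded="yes">
        <!-- Training part: the learner to evaluate. -->
        <operator name="LinearRegression" class="LinearRegression"/>
        <!-- Testing part: apply the model and measure the error. -->
        <operator name="ApplierChain" class="OperatorChain" expanded="yes">
            <operator name="ModelApplier" class="ModelApplier"/>
            <operator name="RegressionPerformance" class="RegressionPerformance">
                <parameter key="root_mean_squared_error" value="true"/>
            </operator>
        </operator>
    </operator>
</operator>
```

With only two features there is little to select, so a setup like this mainly pays off once the data grows to the 50-60 features mentioned in the question.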

Hope that helps,
Tobias

noah977

New Altair Community Member

Nov 22, 2008

Tobias,

Thank you for the quick answer.

I can't wait to get good with RM. I see so many great possibilities!

One additional question: Can I specify some details about a feature? For example, one of my features is the ID number of a category. We keep it as an Integer in our DB. I want to tell RM that it is not an actual number to average, etc., but just an identifier of a category. (I guess one way would be to translate it into a string "ID-1", etc., but I was hoping there was a nicer way to do this in RM.)

Thanks again!!!!

-N

TobiasMalbrecht

New Altair Community Member

Nov 22, 2008

Hi Noah,

noah977 wrote:

Thank you for the quick answer.

I can't wait to get good with RM. I see so many great possibilities!


Wow, great to see someone that eager to learn RM ... lets me answer even well outside office hours!

noah977 wrote:

One additional question: Can I specify some details about a feature? For example, one of my features is the ID number of a category. We keep it as an Integer in our DB. I want to tell RM that it is not an actual number to average, etc., but just an identifier of a category. (I guess one way would be to translate it into a string "ID-1", etc., but I was hoping there was a nicer way to do this in RM.)


Nothing easier than that. Just use a [tt]Numerical2Polynominal[/tt] operator inside an [tt]AttributeSubsetPreprocessing[/tt] operator, with the attribute to convert specified as a parameter. Here is the XML code snippet:


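<operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
    <!-- Regular expression selecting the attribute(s) to convert;
         replace "ID" with the name of your category column. -->
    <parameter key="attribute_name_regex" value="ID"/>
    <parameter key="condition_class" value="attribute_name_filter"/>
    <!-- Applied only to the selected attributes: numerical -> nominal. -->
    <operator name="Numerical2Polynominal" class="Numerical2Polynominal"/>
</operator>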

Regards,
Tobias