Simple preprocessing methods
Contemno
New Altair Community Member
Hello there,
I'm looking for simple preprocessing methods.
Maybe I'm just blind but I can't find anything that matches my criteria.
1. A simple recoding.
For example, produce an attribute B out of an existing attribute A (containing values from 1 to 5) by the following rules:
A: 1,2 --> B: 1
A: 3,4 --> B: 2
A: 5 --> B: 3
2. A simple condition-knot.
For example, produce an attribute B out of an existing attribute A (containing ages of humans) like this:
A: 1-18 / to 18 --> B: 1 or "young"
A: 19-40 --> B: 2 or "midage"
A: from 41 --> B: 3 or "old"
Thank you in advance.
Greetings from the Baltic Sea,
Sebastian L.
Answers
Hi Sebastian,
there is a simple operator called UserBasedDiscretization which does exactly what you are searching for. To solve your second problem you might edit the list as follows:
First line is called young and its upper limit is 18. So the interval will be negative infinity to 18.
Second line is called midage and its upper limit is 40. The interval will be > 18 and <= 40.
Third line is called old and its upper limit is a large value such as 2000, so it covers everything above 40.
This would look like this in XML:
<parameter key="young" value="18.0"/>
<parameter key="midage" value="40.0"/>
<parameter key="old" value="2000.0"/>
To solve the first problem you could make use of UserBasedDiscretization and another operator called NominalNumbers2Numerical. I think you can see where this leads: just enter a number like "1" or "2" as the new value and then use this operator to change that attribute into a numerical one, if you need it numerical. If you need to process changes on only one or a few of the attributes, use AttributeSubsetPreprocessing to select the attributes the inner operators should work on.
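For readers who want to see the idea outside of RapidMiner, here is a minimal Java sketch of binning by upper limits. It is not RapidMiner's implementation; the class name UpperLimitBinning and its methods are made up for illustration, and it assumes that each upper limit is inclusive (a value of exactly 18 counts as "young").

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: bins a numeric value by user-defined upper limits.
// Assumption: the upper limit of each bin is inclusive.
class UpperLimitBinning {
    // Maps each upper limit to the label of the bin that ends at that limit.
    private final TreeMap<Double, String> bins = new TreeMap<>();

    // Registers a bin; returns this so calls can be chained.
    UpperLimitBinning add(String label, double upperLimit) {
        bins.put(upperLimit, label);
        return this;
    }

    // Returns the label of the bin with the smallest upper limit >= value,
    // or null if the value lies above every limit.
    String label(double value) {
        Map.Entry<Double, String> entry = bins.ceilingEntry(value);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        // Second problem: ages to "young" / "midage" / "old".
        UpperLimitBinning age = new UpperLimitBinning()
                .add("young", 18.0).add("midage", 40.0).add("old", 2000.0);
        System.out.println(age.label(17) + " " + age.label(19) + " " + age.label(41));

        // First problem: recode 1,2 -> 1; 3,4 -> 2; 5 -> 3 by using numbers as labels,
        // then parse the label when a numerical value is needed.
        UpperLimitBinning recode = new UpperLimitBinning()
                .add("1", 2.0).add("2", 4.0).add("3", 5.0);
        int b = Integer.parseInt(recode.label(4));
        System.out.println(b);
    }
}
```

The second half mirrors the two-step suggestion above: first bin into labels that happen to be numbers, then convert those labels to a numerical value.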
Hope I could help,
Greetings Sebastian
Thx for your answer.
Unfortunately it's not working as it should.
When I use the knot Nominal2Numeric the values are changed completely.
A "48" may be changed to a "1", for example (not the recoding I described).
The problem is that without this knot the recoding isn't done on this value.
A second problem occurred: how can I delete rows with missing values (incomplete rows, in other words)?
You told me to use AttributeSubsetPreprocessing to select the attributes I need to process changes on.
But this knot is only able to select one attribute, isn't it?
Maybe there is a possibility to define more than one attribute in "attribute_name_regex"?
I need to define by name the attributes on which the following steps are processed.
Thx for any help.
Hi Sebastian,
hm, that's three questions in a single posting ... so, here we go ...
Contemno wrote:
Thx for your answer.
Unfortunately it's not working as it should.
When I use the knot Nominal2Numeric the values are changed completely.
A "48" may be changed to a "1", for example (not the recoding I described).
The problem is that without this knot the recoding isn't done on this value.
You might have missed that (the other) Sebastian recommended the NominalNumbers2Numeric operator, not the Nominal2Numeric operator! This should work as expected.
Contemno wrote:
A second problem occurred: how can I delete rows with missing values (incomplete rows, in other words)?
This can be done by filtering the example set with the operator called ExampleFilter and setting the condition_class parameter to "no_missing_attributes".
Contemno wrote:
You told me to use AttributeSubsetPreprocessing to select the attributes I need to process changes on.
But this knot is only able to select one attribute, isn't it?
Maybe there is a possibility to define more than one attribute in "attribute_name_regex"?
I need to define by name the attributes on which the following steps are processed.
The "attribute_name_regex" parameter does indeed allow regular expressions to define the attributes. Hence, the operators inside the AttributeSubsetPreprocessing are applied to all attributes matching the regular expression. If you want, e.g., to apply the inner operators to two attributes called age and weight, the corresponding regular expression which lets you choose these attributes is age|weight . You may find additional information on regular expressions in the RapidMiner tutorial, which is available in the documentation area of our website:
http://rapid-i.com/content/view/36/83/lang,de/
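To illustrate how an alternation such as age|weight picks out attributes, here is a small Java sketch. It only demonstrates the regular expression itself with java.util.regex; the class AttributeRegexSelection and the attribute list are made up for illustration, and it assumes the whole attribute name has to match the expression.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only: selects attribute names by regular expression.
// Assumption: a name is selected only if the WHOLE name matches the regex.
class AttributeRegexSelection {
    // Returns, in their original order, the names that fully match the regex.
    static List<String> select(List<String> names, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> selected = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("ID", "age", "weight", "average", "regio");
        // The | means "either the left or the right alternative".
        System.out.println(select(names, "age|weight"));
    }
}
```

Note that "average" is not selected under the full-match assumption, even though it contains the letters "age".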
Hope that helps to solve your problems,
regards,
Tobias
Thank you so much Tobias.
You helped me a lot. It's working very well now.
But there's another question. You wrote: "If you want e.g. to apply the inner operators on say two attributes called age and weight, the corresponding regular expression which lets you chose these attributes is age|weight ."
I'm not familiar with regular expressions. You gave an example with an " | " to combine two attributes.
The tutorial is a bit "meager" in this case. Is there any good explanation of all expressions (wildcards, ...)?
Here my case:
I've 56 attributes (e.g. ID, age, regio, ANT_U30, ANT_U35, ... , P_Expert, P_Vkude,...).
Now I wanna filter all attributes beginning with "ANT_" because there are twelve of them and I don't wanna write them all down separately.
In short a shortcut for "ANT_U20|ANT_U25|ANT_U30|ANT_U35|...".
Thx in advance.
Sebastian
Hello Sebastian
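The pattern you are looking for is: ANT_.*
where . stands for any single character and * means "repeat the preceding element zero or more times". (Written as ANT_* without the dot, the * would only repeat the underscore.)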
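If you want to try out a pattern before putting it into the operator, a few lines of Java are enough. This is just a sketch using String.matches, which requires the whole string to match; the class name PrefixPatternCheck is made up, and whether RapidMiner applies the expression in exactly this way is an assumption here.

```java
// Illustrative sketch only: compares two candidate patterns for the "ANT_" prefix.
// Assumption: the whole attribute name must match (String.matches semantics).
class PrefixPatternCheck {
    // True if the complete name matches the regular expression.
    static boolean matchesWhole(String regex, String name) {
        return name.matches(regex);
    }

    public static void main(String[] args) {
        String[] names = {"ANT_U30", "ANT_U35", "P_Expert", "ANT"};
        for (String name : names) {
            // "ANT_.*" : literal ANT_ followed by any characters.
            // "ANT_*"  : literal ANT followed by zero or more underscores.
            System.out.println(name
                    + "  ANT_.* -> " + matchesWhole("ANT_.*", name)
                    + "  ANT_* -> " + matchesWhole("ANT_*", name));
        }
    }
}
```

In regular expressions the * is a repetition operator, not a shell-style wildcard, which is why it is combined with the dot to mean "anything".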
To learn more about regular expressions:
basic concepts: http://en.wikipedia.org/wiki/Regular_expression
tutorial for regular expressions in Java: http://www.javaregex.com/tutorial.html (weird design, but the tutorial is nice)
hope this was helpful
greetings
Steffen