Compare attribute columns based on value ranges?
Hi,
I want to compare the values of two attribute columns from two different Excel files, e.g. radius1 and radius2.
I want to treat two values as equal (meaning their ID is the same) if they lie within a certain range of each other, e.g. radius1 = 1.77 and radius2 = 1.78.
As a formula: if radius1 is between 0.98*radius2 and 1.02*radius2, then they are equal!
Then I want to join all the rows whose entries match the formula above.
Is it somehow possible to identify equality based on ranges like this?
Answers
-
Hi!
If you don't have too much data, you could do a Cartesian Join, then use Generate Attributes for calculating the difference and then Filter Examples for only keeping the examples with a small difference.
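For readers who want to see the logic outside RapidMiner, here is a minimal Python sketch of the same three steps (cross join, compute the bounds, filter). The tables, the column names id1/id2 and the function name are made up for illustration, and the radii are assumed to be positive:

```python
from itertools import product

# Hypothetical stand-ins for the rows of the two Excel files.
table1 = [
    {"id1": 1, "radius1": 1.77},
    {"id1": 2, "radius1": 2.50},
    {"id1": 3, "radius1": 0.90},
]
table2 = [
    {"id2": "a", "radius2": 1.78},
    {"id2": "b", "radius2": 2.60},
    {"id2": "c", "radius2": 0.91},
]


def cartesian_tolerance_join(rows1, rows2, tol=0.02):
    """Pair every row of rows1 with every row of rows2 and keep the pairs
    where radius1 lies between (1 - tol) * radius2 and (1 + tol) * radius2."""
    joined = []
    for r1, r2 in product(rows1, rows2):  # the Cartesian join: len(rows1) * len(rows2) pairs
        lower = (1 - tol) * r2["radius2"]
        upper = (1 + tol) * r2["radius2"]
        if lower <= r1["radius1"] <= upper:  # the filter step
            joined.append({**r1, **r2})  # merged row with columns from both tables
    return joined


matches = cartesian_tolerance_join(table1, table2)
for row in matches:
    print(row)
```

With these sample values, only the pairs (1.77, 1.78) and (0.90, 0.91) pass the 2% test.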
If your example sets have many lines, Cartesian Join will create a huge data set. In that case, you might want to try this Generic Join approach with the built-in scripting:
http://datascientist.at/2016/06/generic-joins-in-rapidminer/#english
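To illustrate why a full Cartesian product is not strictly needed, here is a rough Python sketch of one common alternative: sort one column and use binary search to find the matching range for each row of the other table. This is not taken from the linked article and may differ from what it does; the data, column names and function name are hypothetical, and positive radii are assumed:

```python
from bisect import bisect_left, bisect_right

# Hypothetical stand-ins for the rows of the two Excel files.
table1 = [
    {"id1": 1, "radius1": 1.77},
    {"id1": 2, "radius1": 2.50},
    {"id1": 3, "radius1": 0.90},
]
table2 = [
    {"id2": "a", "radius2": 1.78},
    {"id2": "b", "radius2": 2.60},
    {"id2": "c", "radius2": 0.91},
]


def range_join(rows1, rows2, tol=0.02):
    """Join rows where radius1 is within +/- tol (relative) of radius2,
    without enumerating every combination of rows."""
    sorted1 = sorted(rows1, key=lambda r: r["radius1"])
    keys = [r["radius1"] for r in sorted1]  # sorted radius1 values for binary search
    joined = []
    for r2 in rows2:
        # Index range of radius1 values inside [(1 - tol) * radius2, (1 + tol) * radius2].
        start = bisect_left(keys, (1 - tol) * r2["radius2"])
        stop = bisect_right(keys, (1 + tol) * r2["radius2"])
        for r1 in sorted1[start:stop]:
            joined.append({**r1, **r2})
    return joined


matches = range_join(table1, table2)
for row in matches:
    print(row)
```

The sort costs O(n log n) and each lookup O(log n), so the work grows with the number of actual matches rather than with the product of the two table sizes.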
Regards,
Balázs
-
If you are only interested in a casewise comparison of radius1 and radius2 values, then @BalazsBarany's method works equally well without the Cartesian Join: just use Generate Attributes to calculate the difference and filter the examples that meet your threshold. But if you do want a pairwise comparison of all possible combinations of radius1 and radius2, I hope you have a small dataset! The combinations inflate pretty quickly :-)
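As a small Python sketch of the casewise variant, assuming the two columns are already aligned row by row (same order, same length) and the radii are positive; the sample values and the function name are made up:

```python
# Hypothetical, already aligned columns: row i of radii1 belongs to row i of radii2.
radii1 = [1.77, 2.50, 0.90]
radii2 = [1.78, 2.60, 0.91]


def casewise_matches(col1, col2, tol=0.02):
    """Return one True/False flag per row: is radius1 within tol (relative) of radius2?

    For positive radius2, abs(r1 - r2) <= tol * r2 is the same test as
    (1 - tol) * r2 <= r1 <= (1 + tol) * r2.
    """
    return [abs(r1 - r2) <= tol * abs(r2) for r1, r2 in zip(col1, col2)]


flags = casewise_matches(radii1, radii2)
print(flags)
```

This does n comparisons for n rows, whereas the pairwise version over two tables of n and m rows has to consider n * m combinations.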
Best,