[SOLVED] Filter Attributes
Analyticaltim
New Altair Community Member
Dear Rapid Community,
While this question is undeniably basic, I am at my wit's end as to how to solve it, so I turn to you.
I am working with a dataset of housing sales figures in NYC. One of my Attributes is called "NEIGHBORHOODS". I want to filter specific neighborhoods out of this larger dataset for exploration. Thus, I use the "Filter Examples" operator, select "attribute_value_filter", and use the string: "NEIGHBORHOOD=FORT GREENE" (note that all original data is in caps, hence the upper-case string). This string does not return the filtered data. Instead, in the Results window I get an ExampleSet with 0 examples, 0 special attributes, 3 regular attributes.
I have checked my spelling again and again, checked the data to make sure it is all there, and searched all over the internet to make sure my parameter string is correct. To no avail.
There is certainly something I am missing. Any help is much appreciated.
Yours,
Tim
Answers
-
Is the attribute defined as nominal or text?
-
Hi,
You're missing an S there ;D
One of my Attributes is called "NEIGHBORHOODS"
[...]
and use the string: "NEIGHBORHOOD=FORT GREENE"
On a more serious note, I just created an ExampleSet with such an attribute and tested your condition and it worked flawlessly for me (tried with attribute as nominal, polynominal and text). What version of RapidMiner are you using? Can you post your process setup here (go to the XML tab and just copy&paste)?
Regards,
Marco
-
Dear Marco,
You are quite right on the "S"!
The "NEIGHBORHOOD" attribute is currently typed as polynominal. Could this be the problem?
Below is the XML of my filter process.
Thanks again for all your help. RapidMiner Rocks!
Tim
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
    <process expanded="true" height="437" width="654">
      <operator activated="true" class="retrieve" compatibility="5.3.000" expanded="true" height="60" name="Retrieve Brooklyn Big with date" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Tim's Repository/Real Estate Work/Brooklyn Big with date"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="5.3.000" expanded="true" height="76" name="Filter Examples" width="90" x="246" y="30">
        <parameter key="condition_class" value="attribute_value_filter"/>
        <parameter key="parameter_string" value="NEIGHBORHOOD=FORT GREENE "/>
      </operator>
      <connect from_op="Retrieve Brooklyn Big with date" from_port="output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
-
Hi,
No, it should not matter. You have included a trailing whitespace at the end of your condition ("FORT GREENE ") though, so please make sure that's not the error.
Apart from that I don't know. It works for me, so I'm afraid I cannot help you any further without the actual data. If you could provide a minimal sample to me (you can use the Filter Example Range and Select Attributes operators to reduce it to the absolute minimum needed), I could have a look and check if there is a bug involved. If the data should not be publicly visible, you can contact me via PM.
Regards,
Marco
-
Dear Marco,
Below is some original data that I extracted with the "Filter Example Range" operator. Within this example range the problem persists for me as well. You are correct about the whitespace in the code. I was trying that to see if it was my problem, and it accidentally got into the XML I sent you. Sorry. The truncated dataset is below. I get the same problem with any neighborhood in this sample, e.g. Bath Beach or Carroll Gardens.
Thanks again for your help!
Tim
"NEIGHBORHOOD","SALE PRICE","SALE DATE"
"CARROLL GARDENS ",907278.0,10/9/12 12:00 AM
"CARROLL GARDENS ",1522283.0,8/22/12 12:00 AM
"CARROLL GARDENS ",885000.0,8/22/12 12:00 AM
"CARROLL GARDENS ",1508642.0,8/10/12 12:00 AM
"CARROLL GARDENS ",830000.0,8/7/12 12:00 AM
"CARROLL GARDENS ",1483413.0,8/30/12 12:00 AM
"BEDFORD STUYVESANT ",712775.0,9/27/12 12:00 AM
"BEDFORD STUYVESANT ",700000.0,10/24/12 12:00 AM
"BEDFORD STUYVESANT ",700000.0,10/24/12 12:00 AM
"BEDFORD STUYVESANT ",450000.0,11/14/12 12:00 AM
"BATH BEACH ",0.0,11/19/12 12:00 AM
"BATH BEACH ",0.0,11/12/12 12:00 AM
"BATH BEACH ",0.0,11/13/12 12:00 AM
"BATH BEACH ",0.0,11/13/12 12:00 AM
"BATH BEACH ",0.0,12/7/12 12:00 AM
"BATH BEACH ",0.0,11/7/12 12:00 AM
"BATH BEACH ",610000.0,6/28/12 12:00 AM
"BATH BEACH ",0.0,5/3/12 12:00 AM
"BATH BEACH ",0.0,3/26/12 12:00 AM
"BATH BEACH ",508000.0,8/24/12 12:00 AM
"BATH BEACH ",690000.0,11/14/12 12:00 AM
"BATH BEACH ",0.0,2/6/12 12:00 AM
"BATH BEACH ",800000.0,2/6/12 12:00 AM
"BATH BEACH ",420000.0,4/4/12 12:00 AM
"BATH BEACH ",500000.0,7/19/12 12:00 AM
-
Hi,
There we go. You're having trouble because of the whitespace at the end of each NEIGHBORHOOD name. Sadly, due to some restrictions, the parameter you entered gets trimmed, i.e. its leading and trailing whitespace is removed. The condition value "FORT GREENE" therefore never equals the stored value "FORT GREENE " with its trailing space, so no example matches. What you can do is remove the whitespace from your NEIGHBORHOOD attribute via the Generate Attributes operator. Just add it after you retrieve your data and before the Filter Examples operator. Then add a key/value pair to the function descriptions parameter as follows:
attribute name: NEIGHBORHOOD_NEW
function expressions: trim(NEIGHBORHOOD)
You can then filter on the NEIGHBORHOOD_NEW attribute and will finally get your desired results.
We plan to enhance the Filter Examples operator in the future, but until then I'm afraid the workaround is necessary in this case.
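For anyone who wants to see the mismatch outside RapidMiner, here is a minimal Python sketch. It only mimics the behaviour described above (condition value trimmed, data values left untouched) and is not how Filter Examples is implemented; the helper names `filter_equal` and `add_trimmed` are hypothetical, and the rows are a small subset of the sample posted in this thread.

```python
# A few rows from the posted sample; note the trailing space in each name.
rows = [
    {"NEIGHBORHOOD": "CARROLL GARDENS ", "SALE PRICE": 907278.0},
    {"NEIGHBORHOOD": "BATH BEACH ", "SALE PRICE": 610000.0},
    {"NEIGHBORHOOD": "BATH BEACH ", "SALE PRICE": 508000.0},
]


def filter_equal(rows, attribute, value):
    """Keep rows whose attribute equals the condition value.

    The condition value is trimmed first (assumption: this mirrors the
    trimming of the parameter string described in the thread).
    """
    wanted = value.strip()
    return [row for row in rows if row[attribute] == wanted]


def add_trimmed(rows, source, target):
    """Add a new attribute holding the trimmed value of an existing one."""
    return [{**row, target: row[source].strip()} for row in rows]


# Filtering on the raw attribute finds nothing, even if the trailing
# space is typed into the condition, because the condition is trimmed.
before = filter_equal(rows, "NEIGHBORHOOD", "BATH BEACH ")

# Equivalent of NEIGHBORHOOD_NEW = trim(NEIGHBORHOOD), then filter on it.
cleaned = add_trimmed(rows, "NEIGHBORHOOD", "NEIGHBORHOOD_NEW")
after = filter_equal(cleaned, "NEIGHBORHOOD_NEW", "BATH BEACH")

print(len(before), len(after))
```

The first filter returns no rows and the second returns the two Bath Beach rows, which is the same effect as filtering on NEIGHBORHOOD_NEW after Generate Attributes.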
Regards,
Marco
-
Marco!
My man! It worked like a dream! Thank you very much. You are tops! ;D
Thanks again for all your help.
RapidMiner Rocks!
Tim