Loop Examples Error: Too Few Examples
Hi,
I'm running a Loop Examples operator to identify different items in my dataset. I have 73 items that I'm looking for, which I've put into a macro. The macro reads a file of 74 lines, the first line being a header. When Loop Examples gets to line 74, instead of exiting the loop, it tells me that I have too few examples. I think it might be reading the header row as an example, so it's looking for one more example that doesn't exist.
I've told RapidMiner that I have a header row, and I've even tried changing some other parameters to see if that would make a difference, but it didn't. I couldn't find any documentation about this problem.
Thanks,
Karim
Here's the XML.
<?xml version="1.0" encoding="UTF-8"?><process version="7.5.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.5.003" expanded="true" name="Process">
<parameter key="logverbosity" value="status"/>
<process expanded="true">
<operator activated="false" class="read_excel" compatibility="7.5.003" expanded="true" height="68" name="Read Excel (2)" width="90" x="45" y="34">
<parameter key="excel_file" value="C:\Users\karim\Google Drive\InfoClin Analytics\Data Cleaning\Data Cleaning Algorithms\To Be Cleaned\1 Million Sample\Drug Database Aug 9 2017 v2.xlsx"/>
<parameter key="sheet_number" value="3"/>
<parameter key="imported_cell_range" value="A1:B74"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="METFORMINS.true.polynominal.attribute"/>
<parameter key="1" value="DINS.true.integer.attribute"/>
</list>
</operator>
<operator activated="false" class="extract_macro" compatibility="7.5.003" expanded="true" height="68" name="Extract Macro" width="90" x="179" y="34">
<parameter key="macro" value="DIN_Metformin"/>
<parameter key="macro_type" value="data_value"/>
<parameter key="attribute_name" value="DINS"/>
<parameter key="example_index" value="1"/>
<list key="additional_macros"/>
</operator>
<operator activated="false" class="generate_attributes" compatibility="7.5.003" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="112" y="340">
<list key="function_descriptions">
<parameter key="Metformin" value="if(contains(DIN,%{DIN_Numbers}),1,0)"/>
</list>
</operator>
<operator activated="false" class="concurrency:loop_values" compatibility="7.5.003" expanded="true" height="82" name="Loop Values" width="90" x="112" y="238">
<parameter key="attribute" value="DIN"/>
<parameter key="iteration_macro" value="DIN_Metformin"/>
<process expanded="true">
<operator activated="true" class="filter_examples" compatibility="7.5.003" expanded="true" height="103" name="Filter Examples" width="90" x="112" y="34">
<list key="filters_list">
<parameter key="filters_entry_key" value="DIN.equals.%{DIN_Metformin}"/>
</list>
<parameter key="filters_logic_and" value="false"/>
</operator>
<connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="false" class="generate_attributes" compatibility="7.5.003" expanded="true" height="82" name="Generate Attributes" width="90" x="112" y="442">
<list key="function_descriptions">
<parameter key="Name_New" value="lower(Name_orig)"/>
</list>
</operator>
<operator activated="true" class="retrieve" compatibility="7.5.003" expanded="true" height="68" name="Retrieve" width="90" x="112" y="136">
<parameter key="repository_entry" value="//Local Repository/processes/Medication Data for Processing"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="7.5.003" expanded="true" height="103" name="Filter Examples (4)" width="90" x="313" y="391">
<list key="filters_list">
<parameter key="filters_entry_key" value="DIN.equals.NULL"/>
<parameter key="filters_entry_key" value="DIN.equals.?"/>
</list>
<parameter key="filters_logic_and" value="false"/>
</operator>
<operator activated="true" class="sample" compatibility="7.5.003" expanded="true" height="82" name="Sample" width="90" x="514" y="136">
<parameter key="sample_size" value="10000"/>
<list key="sample_size_per_class"/>
<list key="sample_ratio_per_class"/>
<list key="sample_probability_per_class"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="7.5.003" expanded="true" height="103" name="Filter Examples (5)" width="90" x="648" y="136">
<list key="filters_list">
<parameter key="filters_entry_key" value="Name_New.contains.metform"/>
</list>
</operator>
<operator activated="true" class="sample" compatibility="7.5.003" expanded="true" height="82" name="Sample (2)" width="90" x="581" y="289">
<parameter key="sample_size" value="10000"/>
<list key="sample_size_per_class"/>
<list key="sample_ratio_per_class"/>
<list key="sample_probability_per_class"/>
</operator>
<operator activated="true" class="loop_examples" compatibility="7.5.003" expanded="true" height="103" name="Loop Examples" width="90" x="782" y="238">
<parameter key="iteration_macro" value="Loop"/>
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="7.5.003" expanded="true" height="68" name="Read Excel (3)" width="90" x="179" y="34">
<parameter key="excel_file" value="C:\Users\karim\Google Drive\InfoClin Analytics\Data Cleaning\Data Cleaning Algorithms\To Be Cleaned\1 Million Sample\Drug Database Aug 9 2017 v2.xlsx"/>
<parameter key="sheet_number" value="3"/>
<parameter key="imported_cell_range" value="B1:B74"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="1" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="DINS.true.integer.attribute"/>
</list>
</operator>
<operator activated="true" class="extract_macro" compatibility="7.5.003" expanded="true" height="68" name="Extract Macro (2)" width="90" x="313" y="34">
<parameter key="macro" value="DIN_Metformin"/>
<parameter key="macro_type" value="data_value"/>
<parameter key="attribute_name" value="DINS"/>
<parameter key="example_index" value="%{Loop}"/>
<list key="additional_macros"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="7.5.003" expanded="true" height="103" name="Filter Examples (2)" width="90" x="313" y="187">
<parameter key="parameter_string" value="DIN=%{DIN_Metformin}"/>
<parameter key="condition_class" value="attribute_value_filter"/>
<list key="filters_list">
<parameter key="filters_entry_key" value="DIN.equals.%{DIN_Metformin}"/>
</list>
<parameter key="filters_logic_and" value="false"/>
</operator>
<operator activated="true" class="append" compatibility="7.5.003" expanded="true" height="82" name="Append" width="90" x="581" y="289"/>
<connect from_port="example set" to_op="Filter Examples (2)" to_port="example set input"/>
<connect from_op="Read Excel (3)" from_port="output" to_op="Extract Macro (2)" to_port="example set"/>
<connect from_op="Filter Examples (2)" from_port="example set output" to_op="Append" to_port="example set 1"/>
<connect from_op="Filter Examples (2)" from_port="unmatched example set" to_port="example set"/>
<connect from_op="Append" from_port="merged set" to_port="output 1"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_example set" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Filter Examples (4)" to_port="example set input"/>
<connect from_op="Filter Examples (4)" from_port="example set output" to_op="Sample" to_port="example set input"/>
<connect from_op="Filter Examples (4)" from_port="unmatched example set" to_op="Sample (2)" to_port="example set input"/>
<connect from_op="Sample" from_port="example set output" to_op="Filter Examples (5)" to_port="example set input"/>
<connect from_op="Filter Examples (5)" from_port="example set output" to_port="result 1"/>
<connect from_op="Sample (2)" from_port="example set output" to_op="Loop Examples" to_port="example set"/>
<connect from_op="Loop Examples" from_port="output 1" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Hi @karim_keshavjee - I looked at your process. I notice that your Read Excel operator in the root process (the one that is grayed out) has the "first row as names" parameter checked, but the one inside the Loop Examples operator (not grayed out) does NOT have this parameter checked. Is this your problem?
Scott
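As a rough illustration of the header question raised above, here is a small Python sketch (not RapidMiner code) of how a 74-line column yields 73 or 74 examples depending on whether the first row is treated as names. The `read_column` helper, the `att1` placeholder name, and the sample values are all hypothetical, and the sketch does not model the annotations list in the actual process.

```python
# Hypothetical stand-in for column B of the lookup sheet: 1 header + 73 values.
rows = ["DINS"] + [str(1000 + i) for i in range(73)]


def read_column(rows, first_row_as_names):
    """Return (attribute_name, examples) for a single column of cells."""
    if first_row_as_names:
        # The first cell names the attribute; the rest are examples.
        return rows[0], rows[1:]
    # Without header handling, the header text becomes the first example.
    # "att1" is only a placeholder attribute name for this sketch.
    return "att1", list(rows)


name_on, examples_on = read_column(rows, first_row_as_names=True)
name_off, examples_off = read_column(rows, first_row_as_names=False)

print(len(examples_on), len(examples_off))  # 73 74
```

If the header is being read as data, the first "example" would be the text `DINS` rather than a number, which is easy to spot with a breakpoint after the Read Excel operator.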
hello @karim_keshavjee - ok I have looked at this again. There are a lot of rather unusual things going on here and it is very hard to unpack. Some observations:
- when you loop examples, you are looping the examples in "Medication Data for Processing". But when you extract the macro, you're doing it from "Drug Database...".
- in Filter Examples (4), you're only selecting those with NULL or ?.
- your Append operator inside the loop has only one connection
I would highly advise you to look at these issues. A good way to debug is to use breakpoints at each step along the way of your process so you can see what your dataset looks like.
Scott
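The first observation above can be sketched in plain Python (again, not RapidMiner code). It assumes the loop counter is 1-based and is used directly as the example index into a 73-row lookup table, while the loop itself runs over a much larger set. The names `extract_value` and `run_loop` are hypothetical, and the 10000 figure is simply the `sample_size` from the posted process, so the real looped set may be smaller.

```python
def extract_value(table, example_index):
    """Mimic a 1-based lookup that fails when the index exceeds the table size."""
    if example_index > len(table):
        raise IndexError(
            f"too few examples: wanted {example_index}, table has {len(table)}"
        )
    return table[example_index - 1]


def run_loop(n_iterations, table):
    """Return the first iteration at which the lookup fails, or None."""
    for i in range(1, n_iterations + 1):
        try:
            extract_value(table, i)
        except IndexError:
            return i
    return None


lookup = [1000 + i for i in range(73)]  # hypothetical 73 DIN codes
medication_rows = 10000                 # assumed size of the looped example set

print(run_loop(medication_rows, lookup))  # 74
print(run_loop(len(lookup), lookup))      # None
```

Under these assumptions the failure at iteration 74 comes from looping over the larger set rather than from the header row: the loop only stops cleanly when the number of iterations matches the size of the table being indexed.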
Hi Karim,
Apologies for the delay in responding. I got tied up with some work.
@sgenzer Thanks for looking into this. Please feel free to let me know if any further help is needed here.
Cheers,
hi @karim_keshavjee - there are lots of resources both online and built into RapidMiner to learn how to use Loop Examples and macros. Have you completed the tutorials? The one called "Data Handling" would be the one you want. In addition, the "Getting Started with RapidMiner" video series is extremely helpful.
Scott
Hi Karim,
Could you please share the XML code of the RapidMiner process you have built (the one in the screenshot you shared)?
This would help us recreate the process with the exact operator parameters you have set and check the error.
Cheers,