Send Example Set to Excel files based on Unique ID
miked
New Altair Community Member
Hi All,
I'm hoping to once again reach out to the Rapidminer community which has been so helpful. I have created a process where my end result is about 2000 rows which all have a unique ID associated with them. What I'd like to do is be able to write an excel file for each of these unique ID's. Specifically - loop over the process and based on a client ID...create an excel document that only show's that client's data. Excel or PDF would work. Thanks for any help.
I'm hoping to once again reach out to the Rapidminer community which has been so helpful. I have created a process where my end result is about 2000 rows which all have a unique ID associated with them. What I'd like to do is be able to write an excel file for each of these unique ID's. Specifically - loop over the process and based on a client ID...create an excel document that only show's that client's data. Excel or PDF would work. Thanks for any help.
1
Best Answers
-
ah ok no problem @miked - very similar to what I did before:
<?xml version="1.0" encoding="UTF-8"?><process version="9.6.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="-1"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="read_excel" compatibility="9.6.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34"> <parameter key="excel_file" value="/Users/scottgenzer-rapidminercommunity/Downloads/KPI Examples.xlsx"/> <parameter key="sheet_selection" value="sheet number"/> <parameter key="sheet_number" value="1"/> <parameter key="imported_cell_range" value="A1"/> <parameter key="encoding" value="SYSTEM"/> <parameter key="first_row_as_names" value="true"/> <list key="annotations"/> <parameter key="date_format" value=""/> <parameter key="time_zone" value="America/New_York"/> <parameter key="locale" value="English (United States)"/> <parameter key="read_all_values_as_polynominal" value="false"/> <list key="data_set_meta_data_information"> <parameter key="0" value="Brand.true.polynominal.attribute"/> <parameter key="1" value="Country.true.polynominal.attribute"/> <parameter key="2" value="Region.true.polynominal.attribute"/> <parameter key="3" value="Market.true.integer.attribute"/> <parameter key="4" value="Data_Year.true.integer.attribute"/> <parameter key="5" value="Data_Month.true.polynominal.attribute"/> <parameter key="6" value="Dealer_ID.true.polynominal.attribute"/> <parameter key="7" value="Data_Point.true.polynominal.attribute"/> <parameter key="8" value="KPI.true.polynominal.attribute"/> <parameter key="9" value="Data.true.real.attribute"/> <parameter key="10" value="Rank.true.integer.attribute"/> <parameter key="11" value="Value.true.real.attribute"/> <parameter key="12" value="Report Month.true.polynominal.attribute"/> </list> <parameter key="read_not_matching_values_as_missings" value="false"/> <parameter key="datamanagement" value="double_array"/> <parameter key="data_management" value="auto"/> </operator> <operator activated="true" class="concurrency:loop_values" compatibility="9.6.000" expanded="true" height="82" name="Loop Values" width="90" x="179" y="34"> <parameter key="attribute" value="Dealer_ID"/> <parameter key="iteration_macro" value="loop_value"/> <parameter key="reuse_results" value="false"/> <parameter key="enable_parallel_execution" value="true"/> <process expanded="true"> <operator activated="true" class="filter_examples" compatibility="9.6.000" expanded="true" height="103" name="Filter Examples" width="90" x="45" y="34"> <parameter key="parameter_expression" value=""/> <parameter key="condition_class" value="custom_filters"/> <parameter key="invert_filter" value="false"/> <list key="filters_list"> <parameter key="filters_entry_key" value="Dealer_ID.equals.%{loop_value}"/> </list> <parameter key="filters_logic_and" value="true"/> <parameter key="filters_check_metadata" value="true"/> </operator> <operator activated="true" class="write_excel" compatibility="9.6.000" expanded="true" height="103" name="Write Excel" width="90" x="179" y="34"> <parameter key="excel_file" value="DealerID%{loop_value}"/> <parameter key="file_format" value="xlsx"/> <enumeration key="sheet_names"/> <parameter key="sheet_name" value="RapidMiner Data"/> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="number_format" value="#.0"/> <parameter key="encoding" value="SYSTEM"/> </operator> <connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/> <connect from_op="Filter Examples" from_port="example set output" to_op="Write Excel" to_port="input"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="source_input 2" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> </process> </operator> <connect from_op="Read Excel" from_port="output" to_op="Loop Values" to_port="input 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> </process> </operator> </process>
Scott
5 -
ok got a wacky Java error when I ran it (goodness knows why) but if I turn off parallelization, it writes new Excel sheets for each dealer on my desktop.
<?xml version="1.0" encoding="UTF-8"?><process version="9.6.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="-1"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="read_excel" compatibility="9.6.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34"> <parameter key="excel_file" value="/Users/scottgenzer-rapidminercommunity/Downloads/KPI Examples.xlsx"/> <parameter key="sheet_selection" value="sheet number"/> <parameter key="sheet_number" value="1"/> <parameter key="imported_cell_range" value="A1"/> <parameter key="encoding" value="SYSTEM"/> <parameter key="first_row_as_names" value="true"/> <list key="annotations"/> <parameter key="date_format" value=""/> <parameter key="time_zone" value="America/New_York"/> <parameter key="locale" value="English (United States)"/> <parameter key="read_all_values_as_polynominal" value="false"/> <list key="data_set_meta_data_information"> <parameter key="0" value="Brand.true.polynominal.attribute"/> <parameter key="1" value="Country.true.polynominal.attribute"/> <parameter key="2" value="Region.true.polynominal.attribute"/> <parameter key="3" value="Market.true.integer.attribute"/> <parameter key="4" value="Data_Year.true.integer.attribute"/> <parameter key="5" value="Data_Month.true.polynominal.attribute"/> <parameter key="6" value="Dealer_ID.true.polynominal.attribute"/> <parameter key="7" value="Data_Point.true.polynominal.attribute"/> <parameter key="8" value="KPI.true.polynominal.attribute"/> <parameter key="9" value="Data.true.real.attribute"/> <parameter key="10" value="Rank.true.integer.attribute"/> <parameter key="11" value="Value.true.real.attribute"/> <parameter key="12" value="Report Month.true.polynominal.attribute"/> </list> <parameter key="read_not_matching_values_as_missings" value="false"/> <parameter key="datamanagement" value="double_array"/> <parameter key="data_management" value="auto"/> </operator> <operator activated="true" class="concurrency:loop_values" compatibility="9.6.000" expanded="true" height="82" name="Loop Values" width="90" x="179" y="34"> <parameter key="attribute" value="Dealer_ID"/> <parameter key="iteration_macro" value="loop_value"/> <parameter key="reuse_results" value="false"/> <parameter key="enable_parallel_execution" value="false"/> <process expanded="true"> <operator activated="true" class="filter_examples" compatibility="9.6.000" expanded="true" height="103" name="Filter Examples" width="90" x="45" y="34"> <parameter key="parameter_expression" value=""/> <parameter key="condition_class" value="custom_filters"/> <parameter key="invert_filter" value="false"/> <list key="filters_list"> <parameter key="filters_entry_key" value="Dealer_ID.equals.%{loop_value}"/> </list> <parameter key="filters_logic_and" value="true"/> <parameter key="filters_check_metadata" value="true"/> </operator> <operator activated="true" class="write_excel" compatibility="9.6.000" expanded="true" height="103" name="Write Excel" width="90" x="179" y="34"> <parameter key="excel_file" value="/Users/scottgenzer-rapidminercommunity/Desktop/DealerID%{loop_value}.xlsx"/> <parameter key="file_format" value="xlsx"/> <enumeration key="sheet_names"/> <parameter key="sheet_name" value="RapidMiner Data"/> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="number_format" value="#.0"/> <parameter key="encoding" value="SYSTEM"/> </operator> <connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/> <connect from_op="Filter Examples" from_port="example set output" to_op="Write Excel" to_port="input"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="source_input 2" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> </process> </operator> <connect from_op="Read Excel" from_port="output" to_op="Loop Values" to_port="input 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> </process> </operator> </process>
Scott5
Answers
-
hi @miked - I'd just do a Loop Examples and a Filter Example Range inside that filters for only that row. Like this:
<?xml version="1.0" encoding="UTF-8"?><process version="9.6.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="-1"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="retrieve" compatibility="9.6.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34"> <parameter key="repository_entry" value="//Samples/data/Iris"/> </operator> <operator activated="true" class="loop_examples" compatibility="9.6.000" expanded="true" height="82" name="Loop Examples" width="90" x="179" y="34"> <parameter key="iteration_macro" value="example"/> <process expanded="true"> <operator activated="true" class="filter_example_range" compatibility="9.6.000" expanded="true" height="82" name="Filter Example Range" width="90" x="45" y="34"> <parameter key="first_example" value="%{example}"/> <parameter key="last_example" value="%{example}"/> <parameter key="invert_filter" value="false"/> </operator> <operator activated="true" class="write_excel" compatibility="9.6.000" expanded="true" height="103" name="Write Excel" width="90" x="179" y="34"> <parameter key="excel_file" value="ROW%{example}"/> <parameter key="file_format" value="xlsx"/> <enumeration key="sheet_names"/> <parameter key="sheet_name" value="RapidMiner Data"/> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="number_format" value="#.0"/> <parameter key="encoding" value="SYSTEM"/> </operator> <connect from_port="example set" to_op="Filter Example Range" to_port="example set input"/> <connect from_op="Filter Example Range" from_port="example set output" to_op="Write Excel" to_port="input"/> <portSpacing port="source_example set" spacing="0"/> <portSpacing port="sink_example set" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> </process> </operator> <connect from_op="Retrieve Iris" from_port="output" to_op="Loop Examples" to_port="example set"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> </process> </operator> </process>
Scott2 -
Hi Scott,
That is a big help..thank you. I realize I should've been a bit more clear on what I needed to do so apologies for that. I've attached a sample set of data. Basically each "Dealer_ID" has a set of let's say 4 KPI's. I'd like to be able to have an excel file for each Dealer ID that contains their KPI's.
1 -
ah ok no problem @miked - very similar to what I did before:
<?xml version="1.0" encoding="UTF-8"?><process version="9.6.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="-1"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="read_excel" compatibility="9.6.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34"> <parameter key="excel_file" value="/Users/scottgenzer-rapidminercommunity/Downloads/KPI Examples.xlsx"/> <parameter key="sheet_selection" value="sheet number"/> <parameter key="sheet_number" value="1"/> <parameter key="imported_cell_range" value="A1"/> <parameter key="encoding" value="SYSTEM"/> <parameter key="first_row_as_names" value="true"/> <list key="annotations"/> <parameter key="date_format" value=""/> <parameter key="time_zone" value="America/New_York"/> <parameter key="locale" value="English (United States)"/> <parameter key="read_all_values_as_polynominal" value="false"/> <list key="data_set_meta_data_information"> <parameter key="0" value="Brand.true.polynominal.attribute"/> <parameter key="1" value="Country.true.polynominal.attribute"/> <parameter key="2" value="Region.true.polynominal.attribute"/> <parameter key="3" value="Market.true.integer.attribute"/> <parameter key="4" value="Data_Year.true.integer.attribute"/> <parameter key="5" value="Data_Month.true.polynominal.attribute"/> <parameter key="6" value="Dealer_ID.true.polynominal.attribute"/> <parameter key="7" value="Data_Point.true.polynominal.attribute"/> <parameter key="8" value="KPI.true.polynominal.attribute"/> <parameter key="9" value="Data.true.real.attribute"/> <parameter key="10" value="Rank.true.integer.attribute"/> <parameter key="11" value="Value.true.real.attribute"/> <parameter key="12" value="Report Month.true.polynominal.attribute"/> </list> <parameter key="read_not_matching_values_as_missings" value="false"/> <parameter key="datamanagement" value="double_array"/> <parameter key="data_management" value="auto"/> </operator> <operator activated="true" class="concurrency:loop_values" compatibility="9.6.000" expanded="true" height="82" name="Loop Values" width="90" x="179" y="34"> <parameter key="attribute" value="Dealer_ID"/> <parameter key="iteration_macro" value="loop_value"/> <parameter key="reuse_results" value="false"/> <parameter key="enable_parallel_execution" value="true"/> <process expanded="true"> <operator activated="true" class="filter_examples" compatibility="9.6.000" expanded="true" height="103" name="Filter Examples" width="90" x="45" y="34"> <parameter key="parameter_expression" value=""/> <parameter key="condition_class" value="custom_filters"/> <parameter key="invert_filter" value="false"/> <list key="filters_list"> <parameter key="filters_entry_key" value="Dealer_ID.equals.%{loop_value}"/> </list> <parameter key="filters_logic_and" value="true"/> <parameter key="filters_check_metadata" value="true"/> </operator> <operator activated="true" class="write_excel" compatibility="9.6.000" expanded="true" height="103" name="Write Excel" width="90" x="179" y="34"> <parameter key="excel_file" value="DealerID%{loop_value}"/> <parameter key="file_format" value="xlsx"/> <enumeration key="sheet_names"/> <parameter key="sheet_name" value="RapidMiner Data"/> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="number_format" value="#.0"/> <parameter key="encoding" value="SYSTEM"/> </operator> <connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/> <connect from_op="Filter Examples" from_port="example set output" to_op="Write Excel" to_port="input"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="source_input 2" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> </process> </operator> <connect from_op="Read Excel" from_port="output" to_op="Loop Values" to_port="input 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> </process> </operator> </process>
Scott
5 -
Hi Scott,
That's awesome..thanks so much! I see this then puts everything into a collection ...am I then able to write to multiple excel files from there? Right now only one Excel file is being generated. Still a newbie so any help is appreciated. Thanks
-Mike0 -
ok got a wacky Java error when I ran it (goodness knows why) but if I turn off parallelization, it writes new Excel sheets for each dealer on my desktop.
<?xml version="1.0" encoding="UTF-8"?><process version="9.6.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="-1"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="read_excel" compatibility="9.6.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34"> <parameter key="excel_file" value="/Users/scottgenzer-rapidminercommunity/Downloads/KPI Examples.xlsx"/> <parameter key="sheet_selection" value="sheet number"/> <parameter key="sheet_number" value="1"/> <parameter key="imported_cell_range" value="A1"/> <parameter key="encoding" value="SYSTEM"/> <parameter key="first_row_as_names" value="true"/> <list key="annotations"/> <parameter key="date_format" value=""/> <parameter key="time_zone" value="America/New_York"/> <parameter key="locale" value="English (United States)"/> <parameter key="read_all_values_as_polynominal" value="false"/> <list key="data_set_meta_data_information"> <parameter key="0" value="Brand.true.polynominal.attribute"/> <parameter key="1" value="Country.true.polynominal.attribute"/> <parameter key="2" value="Region.true.polynominal.attribute"/> <parameter key="3" value="Market.true.integer.attribute"/> <parameter key="4" value="Data_Year.true.integer.attribute"/> <parameter key="5" value="Data_Month.true.polynominal.attribute"/> <parameter key="6" value="Dealer_ID.true.polynominal.attribute"/> <parameter key="7" value="Data_Point.true.polynominal.attribute"/> <parameter key="8" value="KPI.true.polynominal.attribute"/> <parameter key="9" value="Data.true.real.attribute"/> <parameter key="10" value="Rank.true.integer.attribute"/> <parameter key="11" value="Value.true.real.attribute"/> <parameter key="12" value="Report Month.true.polynominal.attribute"/> </list> <parameter key="read_not_matching_values_as_missings" value="false"/> <parameter key="datamanagement" value="double_array"/> <parameter key="data_management" value="auto"/> </operator> <operator activated="true" class="concurrency:loop_values" compatibility="9.6.000" expanded="true" height="82" name="Loop Values" width="90" x="179" y="34"> <parameter key="attribute" value="Dealer_ID"/> <parameter key="iteration_macro" value="loop_value"/> <parameter key="reuse_results" value="false"/> <parameter key="enable_parallel_execution" value="false"/> <process expanded="true"> <operator activated="true" class="filter_examples" compatibility="9.6.000" expanded="true" height="103" name="Filter Examples" width="90" x="45" y="34"> <parameter key="parameter_expression" value=""/> <parameter key="condition_class" value="custom_filters"/> <parameter key="invert_filter" value="false"/> <list key="filters_list"> <parameter key="filters_entry_key" value="Dealer_ID.equals.%{loop_value}"/> </list> <parameter key="filters_logic_and" value="true"/> <parameter key="filters_check_metadata" value="true"/> </operator> <operator activated="true" class="write_excel" compatibility="9.6.000" expanded="true" height="103" name="Write Excel" width="90" x="179" y="34"> <parameter key="excel_file" value="/Users/scottgenzer-rapidminercommunity/Desktop/DealerID%{loop_value}.xlsx"/> <parameter key="file_format" value="xlsx"/> <enumeration key="sheet_names"/> <parameter key="sheet_name" value="RapidMiner Data"/> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="number_format" value="#.0"/> <parameter key="encoding" value="SYSTEM"/> </operator> <connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/> <connect from_op="Filter Examples" from_port="example set output" to_op="Write Excel" to_port="input"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="source_input 2" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> </process> </operator> <connect from_op="Read Excel" from_port="output" to_op="Loop Values" to_port="input 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> </process> </operator> </process>
Scott5