Create Association Process Error
I am trying to run a process to create association rules but keep getting "no results found". I am reading Excel files with Process Documents from Files, then passing the result to Numerical to Binominal, FP-Growth, and finally Create Association Rules. I have tried changing the number of files and the min support on FP-Growth, but every change still returns no results. Any suggestions on how to get the process to run? Thanks!
Answers
hello @jes_craig_94 - welcome to the community. We would be happy to help you. Can you please post your XML process (see instructions titled "Read Before Posting" on the right) and, if possible, the data files you are trying to use?
Scott
Here is the XML, and the files are attached. Thanks for the help @sgenzer!
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="text:process_document_from_file" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Files" width="90" x="179" y="136">
<list key="text_directories">
<parameter key="Groups" value="C:\Users\Jessica\Documents\MKT 861\Focus Groups"/>
</list>
<parameter key="vector_creation" value="Binary Term Occurrences"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="85"/>
<operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases" width="90" x="246" y="85"/>
<operator activated="true" class="text:filter_stopwords_german" compatibility="7.5.000" expanded="true" height="68" name="Filter Stopwords (German)" width="90" x="447" y="85"/>
<operator activated="true" class="text:filter_by_length" compatibility="7.5.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="581" y="85"/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
<connect from_op="Transform Cases" from_port="document" to_op="Filter Stopwords (German)" to_port="document"/>
<connect from_op="Filter Stopwords (German)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
<connect from_op="Filter Tokens (by Length)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="numerical_to_binominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Binominal (2)" width="90" x="380" y="85">
<parameter key="max" value="1.0"/>
</operator>
<operator activated="true" class="fp_growth" compatibility="7.6.001" expanded="true" height="82" name="FP-Growth" width="90" x="514" y="85">
<parameter key="find_min_number_of_itemsets" value="false"/>
<parameter key="min_support" value="0.1"/>
</operator>
<operator activated="true" class="create_association_rules" compatibility="7.6.001" expanded="true" height="82" name="Create Association Rules" width="90" x="648" y="85"/>
<connect from_port="input 1" to_op="Process Documents from Files" to_port="word list"/>
<connect from_op="Process Documents from Files" from_port="example set" to_op="Numerical to Binominal (2)" to_port="example set input"/>
<connect from_op="Numerical to Binominal (2)" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
<connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
<connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
<connect from_op="Create Association Rules" from_port="item sets" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
hello @jes_craig_94 - thanks for posting. OK it's pretty clear why you're getting no results. If you put a breakpoint after Process Documents from Files, you will see that you only have three examples in your example set, one from each file.
So RapidMiner does not have much to go on in order to create association rules. My hunch is that this is not what you intended: you probably wanted each row in your CSV to be a separate example, not each file. That operator, at least in my experience, is for when you have a large folder of individual text files that you want to analyze. If my hunch is correct, you are likely to be more successful with something like this:
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="concurrency:loop_files" compatibility="7.6.001" expanded="true" height="82" name="Loop Files" width="90" x="45" y="34">
<parameter key="directory" value="/Users/GenzerConsulting/OneDrive - RapidMiner/OneDrive Repository/random community stuff/jes_craig_94"/>
<parameter key="filter_type" value="regex"/>
<parameter key="filter_by_regex" value=".*\.csv"/>
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="7.6.001" expanded="true" height="68" name="Read CSV" width="90" x="246" y="34">
<parameter key="column_separators" value=","/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
</operator>
<connect from_port="file object" to_op="Read CSV" to_port="file"/>
<connect from_op="Read CSV" from_port="output" to_port="output 1"/>
<portSpacing port="source_file object" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" breakpoints="after" class="append" compatibility="7.6.001" expanded="true" height="82" name="Append" width="90" x="179" y="34"/>
<operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="313" y="34">
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="179" y="85"/>
<operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases" width="90" x="313" y="85"/>
<operator activated="true" class="text:filter_stopwords_german" compatibility="7.5.000" expanded="true" height="68" name="Filter Stopwords (German)" width="90" x="514" y="85"/>
<operator activated="true" class="text:filter_by_length" compatibility="7.5.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="648" y="85"/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
<connect from_op="Transform Cases" from_port="document" to_op="Filter Stopwords (German)" to_port="document"/>
<connect from_op="Filter Stopwords (German)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
<connect from_op="Filter Tokens (by Length)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="numerical_to_binominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Binominal (2)" width="90" x="447" y="34">
<parameter key="max" value="1.0"/>
</operator>
<operator activated="true" class="fp_growth" compatibility="7.6.001" expanded="true" height="82" name="FP-Growth" width="90" x="581" y="34">
<parameter key="find_min_number_of_itemsets" value="false"/>
<parameter key="min_support" value="0.1"/>
</operator>
<operator activated="true" class="create_association_rules" compatibility="7.6.001" expanded="true" height="82" name="Create Association Rules" width="90" x="715" y="85"/>
<connect from_op="Loop Files" from_port="output 1" to_op="Append" to_port="example set 1"/>
<connect from_op="Append" from_port="merged set" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="example set" to_op="Numerical to Binominal (2)" to_port="example set input"/>
<connect from_op="Numerical to Binominal (2)" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
<connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
<connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
<connect from_op="Create Association Rules" from_port="item sets" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
This process does not work (yet), as some ETL has to be done to get the CSV files to append properly. I will leave that to you.
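To give a rough idea of what that ETL has to achieve, here is a minimal Python sketch, written outside RapidMiner and purely as an illustration. It assumes the files do not all share the same columns and that appending needs them aligned by name. The function name `append_csv_tables`, the column names, and the sample comments are all made up for the example and do not come from the attached files.

```python
import csv
import io

def append_csv_tables(csv_texts):
    """Append CSV tables row by row, aligned on the union of their column names."""
    columns, examples = [], []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        rows = list(reader)
        # Remember every column name in first-seen order
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        examples.extend(rows)
    # Every CSV row becomes one example; columns a file lacks are filled with ""
    aligned = [{name: row.get(name, "") for name in columns} for row in examples]
    return columns, aligned

# Hypothetical file contents standing in for two focus group files
group_a = "speaker,comment\nP1,I like the price\nP2,The price is too high\n"
group_b = "comment,speaker,session\nGreat quality,P3,2\n"

columns, examples = append_csv_tables([group_a, group_b])
print(columns)        # ['speaker', 'comment', 'session']
print(len(examples))  # 3
```

The point is that the example count now follows the number of rows rather than the number of files, and every example carries the same set of columns before any text processing starts.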
Good luck!
Scott
@sgenzer thank you so much for the help! I was able to load the files and run to the breakpoint after "Append", but I am running into an issue with "Process Documents from Data": it is not returning any attributes. I am using the process you suggested and it still gives me no results. What should I expect to see? And do you have any idea what I could be doing wrong? I assume this data should produce some results, but I am unsure.
hmm...you have lots of examples now and they are all binominal now, right? Can you share your new process?
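To make that question concrete, here is a rough Python sketch, again outside RapidMiner, of what the last two steps need as input: many examples, each one reduced to the set of terms that are "true" in it. It uses brute-force pair counting for illustration only, which is not the FP-Growth algorithm itself, and the terms and function names are invented for the example.

```python
from itertools import combinations

def frequent_pairs(examples, min_support):
    """Return {pair: support} for term pairs found in at least min_support of the examples."""
    total = len(examples)
    counts = {}
    for terms in examples:
        for pair in combinations(sorted(terms), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}

def confidence(examples, premise, conclusion):
    """Share of the examples containing the premise that also contain the conclusion."""
    with_premise = [terms for terms in examples if premise in terms]
    if not with_premise:
        return 0.0
    return sum(conclusion in terms for terms in with_premise) / len(with_premise)

# Hypothetical binominal examples: each set holds the terms present in one row
examples = [
    {"price", "high"},
    {"price", "quality"},
    {"price", "high", "quality"},
    {"quality"},
]

pairs = frequent_pairs(examples, min_support=0.5)
print(pairs)                                 # {('high', 'price'): 0.5, ('price', 'quality'): 0.5}
print(confidence(examples, "high", "price")) # 1.0
```

If the example set reaching FP-Growth has no term attributes at all, there is nothing to count, so it is worth checking at a breakpoint what Process Documents from Data actually outputs.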
Scott