"FP Growth - is there a
Gary_Hearne
New Altair Community Member
I am running FP-Growth through Create Association Rules.
There is one item that occurs very frequently because it is simply very popular, and so it produces a lot of spurious or "uninteresting" rules. Is there any way to specify that an item should not be used, rather than insisting that an item must be used?
Best Answer
-
Update: I've realised that I can get the same effect by inserting one or more Replace operators between the example set and FP-Growth, and replacing the unwanted terms with nothing. But any more elegant solutions, or advice on more complex subsetting, would still be appreciated.
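As a rough illustration of that pre-filtering idea outside RapidMiner, here is a minimal Python sketch. It assumes each transaction is stored as a single string of items separated by `|`; the function name `strip_items` and the sample baskets are made up for illustration and are not part of any RapidMiner API.

```python
def strip_items(transactions, unwanted, sep="|"):
    """Remove unwanted items from separator-delimited transaction strings."""
    unwanted = set(unwanted)
    cleaned = []
    for row in transactions:
        # Split into whole items so only exact item matches are removed.
        items = [item.strip() for item in row.split(sep)]
        kept = [item for item in items if item and item not in unwanted]
        cleaned.append(sep.join(kept))
    return cleaned


# Hypothetical baskets; "bag" stands in for the overly popular item.
baskets = ["bag|milk|bread", "bag|milk", "bread|bag|eggs", "bag"]
print(strip_items(baskets, {"bag"}))
```

Matching whole items rather than doing a plain substring replace avoids clipping items whose names merely contain the unwanted term (for example `handbag`). Note that a basket holding only the unwanted item ends up empty.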
Answers
-
Hi @Gary_Hearne

Yes, you could do this. However, taking the items out before the FP-Growth calculation may change the item sets and rules that are found. It may still be a good idea for larger data sets, simply to reduce the run time.

Alternatively, you can use the operator Association Rules to ExampleSet from the Converters extension:
https://marketplace.rapidminer.com/UpdateServer/faces/product_details.xhtml?productId=rmx_converters

This operator takes all rules and converts them into a regular data set, which can then be filtered down with the operator Filter Examples. The process below shows a quick example (you need to install the extension first to make this work...).

Hope this helps,
Ingo

<?xml version="1.0" encoding="UTF-8"?>
<process version="9.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="UTF-8"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.2.000" expanded="true" height="68" name="Retrieve Golf" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Golf"/>
      </operator>
      <operator activated="true" class="discretize_by_frequency" compatibility="9.2.000" expanded="true" height="103" name="Discretize" width="90" x="179" y="34">
        <parameter key="return_preprocessing_model" value="false"/>
        <parameter key="create_view" value="false"/>
        <parameter key="attribute_filter_type" value="value_type"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="real"/>
        <parameter key="block_type" value="value_series"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_series_end"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="use_sqrt_of_examples" value="false"/>
        <parameter key="number_of_bins" value="3"/>
        <parameter key="range_name_type" value="long"/>
        <parameter key="automatic_number_of_digits" value="true"/>
        <parameter key="number_of_digits" value="-1"/>
      </operator>
      <operator activated="true" class="nominal_to_binominal" compatibility="9.2.000" expanded="true" height="103" name="Nominal to Binominal" width="90" x="313" y="34">
        <parameter key="return_preprocessing_model" value="false"/>
        <parameter key="create_view" value="false"/>
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="nominal"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="file_path"/>
        <parameter key="block_type" value="single_value"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="single_value"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="transform_binominal" value="false"/>
        <parameter key="use_underscore_in_name" value="false"/>
      </operator>
      <operator activated="true" class="concurrency:fp_growth" compatibility="9.2.000" expanded="true" height="82" name="FP-Growth" width="90" x="447" y="34">
        <parameter key="input_format" value="items in dummy coded columns"/>
        <parameter key="item_separators" value="|"/>
        <parameter key="use_quotes" value="false"/>
        <parameter key="quotes_character" value="&quot;"/>
        <parameter key="escape_character" value="\"/>
        <parameter key="trim_item_names" value="true"/>
        <parameter key="min_requirement" value="support"/>
        <parameter key="min_support" value="0.95"/>
        <parameter key="min_frequency" value="100"/>
        <parameter key="min_items_per_itemset" value="1"/>
        <parameter key="max_items_per_itemset" value="0"/>
        <parameter key="max_number_of_itemsets" value="1000000"/>
        <parameter key="find_min_number_of_itemsets" value="true"/>
        <parameter key="min_number_of_itemsets" value="100"/>
        <parameter key="max_number_of_retries" value="15"/>
        <parameter key="requirement_decrease_factor" value="0.9"/>
        <enumeration key="must_contain_list"/>
      </operator>
      <operator activated="true" class="create_association_rules" compatibility="9.2.000" expanded="true" height="82" name="Create Association Rules" width="90" x="581" y="34">
        <parameter key="criterion" value="confidence"/>
        <parameter key="min_confidence" value="0.5"/>
        <parameter key="min_criterion_value" value="0.8"/>
        <parameter key="gain_theta" value="2.0"/>
        <parameter key="laplace_k" value="1.0"/>
      </operator>
      <operator activated="true" class="converters:rules_2_example_set" compatibility="0.5.000" expanded="true" height="82" name="Association Rules to ExampleSet" width="90" x="715" y="34"/>
      <operator activated="true" class="filter_examples" compatibility="9.2.000" expanded="true" height="103" name="Filter Examples" width="90" x="849" y="34">
        <parameter key="parameter_expression" value=""/>
        <parameter key="condition_class" value="custom_filters"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list">
          <parameter key="filters_entry_key" value="Premises.does_not_contain.Outlook"/>
          <parameter key="filters_entry_key" value="Conclusion.does_not_contain.Outlook"/>
        </list>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <connect from_op="Retrieve Golf" from_port="output" to_op="Discretize" to_port="example set input"/>
      <connect from_op="Discretize" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
      <connect from_op="Nominal to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
      <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
      <connect from_op="Create Association Rules" from_port="rules" to_op="Association Rules to ExampleSet" to_port="rules input"/>
      <connect from_op="Association Rules to ExampleSet" from_port="example set" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="42"/>
    </process>
  </operator>
</process>