"Recognition of IDs in FP-Growth"
spoorthy9547
New Altair Community Member
Hello,
I am dealing with transaction data where each transaction has an ID. Every ID has three or more items belonging to that transaction, so in the Excel sheet an ID is repeated three times if the transaction contains three items. I need to find frequent itemsets in these transactions. I tried the FP-Growth algorithm, but I don't get the expected output. Is there a way to group an ID with all of its items?
Thanks,
Answers
Hi,
You'll need to pivot your data into binominal attributes first (one row per ID, one true/false attribute per item). If that means nothing to you, check out the examples; believe me, it saves time in the long run.
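The idea behind "pivot into binominals" can be sketched outside RapidMiner. The Python below is only an illustration, and the sample rows and function names are hypothetical (they are not RapidMiner operators): it groups the long-format rows by transaction ID, then pivots them so that each ID becomes one row with a true/false column per item, which is the layout FP-Growth works on.

```python
from collections import defaultdict

# Hypothetical sample rows in the long layout described in the question:
# one row per (transaction id, item); an id repeats once per item it contains.
rows = [
    (1, "bread"), (1, "milk"), (1, "eggs"),
    (2, "bread"), (2, "milk"), (2, "butter"),
    (3, "milk"), (3, "eggs"), (3, "butter"),
]


def group_by_id(rows):
    """Collect every item of a transaction under its id (one basket per id)."""
    baskets = defaultdict(set)
    for tid, item in rows:
        baskets[tid].add(item)
    return dict(baskets)


def to_binominal(baskets):
    """Pivot baskets into one row per id with a True/False column per item."""
    items = sorted({item for basket in baskets.values() for item in basket})
    table = {
        tid: {item: item in basket for item in items}
        for tid, basket in sorted(baskets.items())
    }
    return items, table


baskets = group_by_id(rows)
items, table = to_binominal(baskets)
for tid, flags in table.items():
    print(tid, flags)
```

In RapidMiner itself the same reshaping is done by the operators in the Market Basket Analysis template, not by custom code; the sketch only shows what the data should look like afterwards.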
Good luck!
Thanks for your reply!
I tried the Market Basket Analysis template, and everything works fine up to the Aggregate operator. My input is an Excel sheet with three columns: CustomerId, itemId and itemCount. CustomerId is an integer with the id role, itemId is nominal and itemCount is an integer. When the output of Aggregate goes into Pivot, it throws an error stating "the exampleset doesn't contain itemid and itemcount".
What should I do? Here is my XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.006">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
<description>Reads a data set containing of three columns: customerId, itemId, and itemCount. The item count is summed up per item and customer, pivoting is performed to have one attribute per item, and finally, association rules are generated.</description>
<process expanded="true" height="578" width="840">
<operator activated="true" class="read_excel" compatibility="5.2.006" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
<parameter key="excel_file" value="C:\Users\mc29546\Documents\SPOORTHY\Grill\EXCEL\test1.xlsx"/>
<parameter key="imported_cell_range" value="A1:C7"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="CustomerId.true.integer.id"/>
<parameter key="1" value="itemId.true.nominal.attribute"/>
<parameter key="2" value="itemCount.true.integer.attribute"/>
</list>
</operator>
<operator activated="true" class="set_macro" compatibility="5.2.006" expanded="true" height="76" name="Define Item Count" width="90" x="179" y="30">
<parameter key="macro" value="itemCountAttributeName"/>
<parameter key="value" value="itemCount"/>
</operator>
<operator activated="true" class="set_macro" compatibility="5.2.006" expanded="true" height="76" name="Define Customer" width="90" x="313" y="30">
<parameter key="macro" value="customerIdAttributeName"/>
<parameter key="value" value="CustomerId"/>
</operator>
<operator activated="true" class="set_macro" compatibility="5.2.006" expanded="true" height="76" name="Define Item" width="90" x="447" y="30">
<parameter key="macro" value="itemIdAttributeName"/>
<parameter key="value" value="itemId"/>
</operator>
<operator activated="true" class="aggregate" compatibility="5.1.006" expanded="true" height="76" name="Aggregate" width="90" x="45" y="210">
<list key="aggregation_attributes">
<parameter key="itemCount" value="sum"/>
</list>
<parameter key="group_by_attributes" value="CustomerId|itemId"/>
</operator>
<operator activated="true" breakpoints="after" class="pivot" compatibility="5.2.006" expanded="true" height="76" name="Pivot" width="90" x="179" y="210">
<parameter key="group_attribute" value="CustomerId"/>
<parameter key="index_attribute" value="itemId"/>
</operator>
<operator activated="false" class="replace_missing_values" compatibility="5.2.006" expanded="true" height="94" name="Replace Missing Values" width="90" x="447" y="300">
<parameter key="default" value="zero"/>
<list key="columns"/>
</operator>
<operator activated="true" class="set_role" compatibility="5.2.006" expanded="true" height="76" name="Set Role" width="90" x="313" y="210">
<parameter key="name" value="CustomerId"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles">
<parameter key="CustomerId" value="id"/>
<parameter key="itemId" value="regular"/>
<parameter key="itemCount" value="regular"/>
</list>
</operator>
<operator activated="true" class="numerical_to_binominal" compatibility="5.2.006" expanded="true" height="76" name="Numerical to Binominal" width="90" x="581" y="210">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="|sum(itemCount)"/>
</operator>
<operator activated="true" class="nominal_to_binominal" compatibility="5.2.006" expanded="true" height="94" name="Nominal to Binominal" width="90" x="727" y="225">
<parameter key="attributes" value="|itemId"/>
</operator>
<operator activated="true" class="fp_growth" compatibility="5.2.006" expanded="true" height="76" name="FP-Growth" width="90" x="581" y="75">
<parameter key="positive_value" value="true"/>
<parameter key="min_support" value="0.1"/>
</operator>
<operator activated="false" class="create_association_rules" compatibility="5.2.006" expanded="true" height="76" name="Create Association Rules" width="90" x="715" y="75"/>
<connect from_op="Read Excel" from_port="output" to_op="Define Item Count" to_port="through 1"/>
<connect from_op="Define Item Count" from_port="through 1" to_op="Define Customer" to_port="through 1"/>
<connect from_op="Define Customer" from_port="through 1" to_op="Define Item" to_port="through 1"/>
<connect from_op="Define Item" from_port="through 1" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_op="Pivot" to_port="example set input"/>
<connect from_op="Pivot" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
<connect from_op="Numerical to Binominal" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
<connect from_op="Nominal to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
<connect from_op="FP-Growth" from_port="frequent sets" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="180"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
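One plausible reading of the error, offered as an assumption rather than a confirmed diagnosis: after Aggregate, the summed column is called `sum(itemCount)` (the name the process itself uses in Numerical to Binominal), and after Pivot there is one column per item, so operators placed after Pivot that still refer to `itemId` or `itemCount` (the additional roles in Set Role, the `|itemId` filter in Nominal to Binominal) have nothing to match. The Python sketch below mimics those steps on hypothetical data; the `sum(itemCount)_<itemId>` column naming is an assumed pattern for Pivot's output, and the function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical rows shaped like the Excel sheet: (CustomerId, itemId, itemCount).
rows = [
    (1, "A", 2), (1, "B", 1), (1, "A", 1),
    (2, "B", 3), (2, "C", 1),
]


def aggregate_sum(rows):
    """Mimic Aggregate: sum itemCount grouped by (CustomerId, itemId)."""
    sums = defaultdict(int)
    for customer, item, count in rows:
        sums[(customer, item)] += count
    return dict(sums)


def pivot(sums):
    """Mimic Pivot: one row per customer, one column per item.

    Column names follow the ASSUMED pattern 'sum(itemCount)_<itemId>'.
    A customer who never bought an item gets None (a missing value).
    """
    items = sorted({item for _, item in sums})
    customers = sorted({customer for customer, _ in sums})
    return {
        c: {f"sum(itemCount)_{i}": sums.get((c, i)) for i in items}
        for c in customers
    }


def to_binominal(table):
    """Treat missing as 0 (as Replace Missing Values with 'zero' would),
    then map 0 -> False and any other count -> True."""
    return {
        c: {col: (v or 0) != 0 for col, v in cols.items()}
        for c, cols in table.items()
    }


sums = aggregate_sum(rows)
pivoted = pivot(sums)
print(pivoted[1])  # note: no plain 'itemId' or 'itemCount' column survives
print(to_binominal(pivoted)[2])
```

The missing values produced by the pivot are also why a step like the (currently disabled) Replace Missing Values operator may be needed before converting counts to true/false.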
Hi there,
There is an FP-Growth example in the Samples; I'd start there.
Good luck.
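The FP-Growth sample is the best reference for the real operator. To see what its output means, here is a brute-force Python sketch (a swapped-in technique, not FP-Growth itself, run on a hypothetical table) that keeps every itemset whose support, the fraction of transactions containing all of its items, is at least `min_support`. The `min_support` and `positive_value` names mirror the FP-Growth parameters in the process above; everything else is illustrative.

```python
from itertools import combinations

# Hypothetical binominal table: one row per transaction id, True = item present.
table = {
    1: {"bread": True, "butter": False, "eggs": True, "milk": True},
    2: {"bread": True, "butter": True, "eggs": False, "milk": True},
    3: {"bread": False, "butter": True, "eggs": True, "milk": True},
    4: {"bread": True, "butter": False, "eggs": False, "milk": False},
}


def frequent_itemsets(table, min_support, positive_value=True):
    """Brute-force support counting (not FP-Growth, but it finds the same sets).

    Support of an itemset = fraction of rows in which every item of the set
    equals positive_value.
    """
    n = len(table)
    items = sorted(next(iter(table.values())))
    baskets = [{i for i, v in row.items() if v == positive_value}
               for row in table.values()]
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            support = sum(1 for b in baskets if set(combo) <= b) / n
            if support >= min_support:
                result[combo] = support
                found = True
        if not found:
            break  # no frequent set of this size means none larger can be frequent
    return result


for itemset, support in frequent_itemsets(table, min_support=0.5).items():
    print(itemset, support)
```

Enumerating every combination grows exponentially with the number of items, which is why FP-Growth builds a compressed prefix tree instead; on a handful of items both approaches return the same frequent sets.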