"Preprocessing for FPGrowth"

guilhermecr
guilhermecr New Altair Community Member
edited November 2024 in Community Q&A
I am working on market basket analysis. I already generate the binomial format using other programs.

What RM operator can I use to transform the dataset from this format:

1,3
2,3,4
1,2,3

to this:

1,0,1,0
0,1,1,1
1,1,1,0

Thanks in advance :)

Answers

  • land
    land New Altair Community Member
    Hi,
    your data format is called sparse, because it stores only the indices of the non-zero columns. RapidMiner supports a sparse format, but it differs slightly from yours. If you can bring your data into the following format, you can easily load it:
    1:1 3:1
    2:1 3:1 4:1
    1:1 2:1 3:1

    If you then use the operator SparseFormatExampleSource with the parameter format set to no_label and the parameter dimension set to the number of dimensions (the highest index occurring in your file), it should load correctly.
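
    If you prefer to do the conversion outside RapidMiner, here is a minimal Python sketch of it. It assumes comma-separated, 1-based item indices as in the example in the question; the function names are made up for illustration and are not part of RapidMiner. It produces both the index:value lines shown above and the 0/1 matrix asked for in the question.

```python
def parse_baskets(lines):
    """Parse lines such as "1,3" into lists of 1-based item indices."""
    return [
        [int(token) for token in line.split(",") if token.strip()]
        for line in lines
        if line.strip()  # skip empty lines
    ]


def to_sparse_lines(baskets):
    """Render each basket as space-separated "index:1" pairs."""
    return [" ".join(f"{i}:1" for i in sorted(set(b))) for b in baskets]


def to_binary_rows(baskets):
    """Render each basket as a comma-separated 0/1 row.

    The number of columns is the highest index found in the data.
    """
    dimension = max(i for b in baskets for i in b)
    rows = []
    for b in baskets:
        present = set(b)
        rows.append(
            ",".join("1" if i in present else "0" for i in range(1, dimension + 1))
        )
    return rows


raw = ["1,3", "2,3,4", "1,2,3"]  # the three baskets from the question
baskets = parse_baskets(raw)
sparse_lines = to_sparse_lines(baskets)
binary_rows = to_binary_rows(baskets)

print("\n".join(sparse_lines))
print("\n".join(binary_rows))
```

    For a file that uses another separator (spaces, for example), the split(",") call would need to change accordingly.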

    Greetings,
      Sebastian
  • guilhermecr
    guilhermecr New Altair Community Member
    I am just starting with market basket analysis, so I have been practicing with datasets available on the internet.
    I have used the 'retail' data set available at http://fimi.cs.helsinki.fi/data/retail.dat, which is in the sparse format.

    But since I will get my own data from a friend's shop, my question is:

    What is the best format for a market basket analysis with RM?


    Thanks

    PS: I will probably use Apriori and FPGrowth.
