Mining binary data

neilduggan
neilduggan New Altair Community Member
edited November 5 in Community Q&A
Hi

I have some opinion poll data on internet usage (26 columns, i.e. questions, and approx. 800 rows, i.e. respondents) with answers of "true", "false" or blank for 24 of the questions. I've binned the "age" column into three bins, and the final column is location / state (which has ~40 values, currently numerical).

A couple of questions:

1. At the moment, the data is set up so that "sex" contains "true" for female and "false" for male. Should this be split into separate true / false columns for male and female?

2. What's the best RapidMiner operator to mine this data for trends (e.g. young / old women / men are more likely to XXX)? I've tried using "w-apriori" but it gives me very basic rules. I've also tried "FP-growth" + "Create Association Rules", which works slightly better but still not great. I've tried setting different attributes as the "label" and it makes some difference, but nothing major.

3. Is it possible to use RapidMiner to create rules in relation to the respondent's location as the data stands? Or do I need to create a column for each state with true / false for each respondent?

Apologies if these are stupid questions!!  :o

Thanks

Neil

Answers

  • neilduggan
    neilduggan New Altair Community Member
    Anyone?
  • fras
    fras New Altair Community Member
    To see some trends in your data, you should try one of RapidMiner's charts.
    If you would like to train a model, I would suggest a decision tree.
    But don't forget to set the role "label" on one of your 24 question columns.
  • neilduggan
    neilduggan New Altair Community Member
    Thanks fras, I'll give the decision tree a go.

    Do I need to change the way I've set up the "sex" column (and other columns)? Or is it OK the way it is?
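
The encoding questions in this thread (one true / false column versus one item per value, and a column per state) can be illustrated outside RapidMiner with a minimal Python sketch. It is not a RapidMiner process: the sample rows, the column name `uses_email`, and the helpers `to_items` and `rule_stats` are hypothetical and exist only for illustration. It assumes each non-blank answer is turned into an "attribute=value" item, so that both values of "sex" and every state code can appear in a rule, and it then computes support and confidence for one candidate rule.

```python
# Hypothetical sample rows; None marks a blank answer.
respondents = [
    {"sex": True, "age_bin": "young", "state": 12, "uses_email": True},
    {"sex": False, "age_bin": "old", "state": 7, "uses_email": False},
    {"sex": True, "age_bin": "young", "state": 12, "uses_email": True},
    {"sex": False, "age_bin": "young", "state": 3, "uses_email": None},
]


def to_items(row):
    """Turn one respondent into a set of 'attribute=value' items, skipping blanks."""
    items = set()
    for name, value in row.items():
        if value is None:
            continue  # a blank answer contributes no item
        items.add(f"{name}={value}")
    return items


def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / len(transactions)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


transactions = [to_items(r) for r in respondents]

# Candidate rule: young women -> use email (on the made-up rows above).
support, confidence = rule_stats(
    transactions, {"sex=True", "age_bin=young"}, {"uses_email=True"}
)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

With this item encoding, "sex=True" and "sex=False" are separate items, which is equivalent to having one true / false column per value, and each state code becomes its own item in the same way. For a decision tree, a single two-valued "sex" column already carries the same information.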