[SOLVED] Select attributes only shows metadata and no variables?

kasper2304
kasper2304 New Altair Community Member
edited November 2024 in Community Q&A
Hi out there.

I am working on a text mining project where I need to create a subset of variables for further dimensionality reduction before training my model. Having watched the videos online, I have come to the conclusion that the "select attributes" node is the one I have to use.

Here is what I have done so far.

I have created two folders on my hard drive, one containing positive cases and the other containing negative cases, giving me a total of 300 cases. Somehow RapidMiner ends up with two extra cases, which I believe are the "folders" themselves. I will have to remove those, but first things first.

I used "Process documents from files" and loaded the two directories with class names "1" and "0". Within the "process documents from files" node I have "transform cases", "tokenize", "filter stop words", "extract token number", "extract length", "aggregate token length", "stem snowball" and "filter tokens".
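For readers who think in code, the operator chain above can be roughly sketched in Python. This is only an illustration of the idea, not how RapidMiner implements it: the stop word list, the length limits, and the function name `preprocess` are assumptions made up for the example, and stemming is left out because it needs an external library.

```python
import re

# Hypothetical, tiny stop word list; RapidMiner uses its own built-in lists.
STOP_WORDS = {"the", "a", "is", "and", "of", "to"}

def preprocess(text, min_len=3, max_len=25):
    """Lowercase, tokenize on non-letters, drop stop words, filter by token length."""
    # "transform cases": lowercase everything
    lowered = text.lower()
    # "tokenize": split on any run of non-letter characters
    tokens = re.split(r"[^a-z]+", lowered)
    # "filter stop words" and "filter tokens" (by length); empty strings are dropped too
    return [t for t in tokens
            if t and t not in STOP_WORDS and min_len <= len(t) <= max_len]

print(preprocess("The quick brown Fox is a friend of the lazy dog."))
```

Each document that passes through a chain like this yields a list of tokens, and the word vector is then built from those lists.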

The settings of "process documents from files" node are:

use file extension as type = TRUE
create word vector = TRUE
add meta information = TRUE
prune method = PERCENTUAL
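
As a rough illustration of what percentual pruning does, the sketch below drops tokens that occur in too small or too large a share of the documents. The function name, the threshold values, and the inclusive boundaries are assumptions for the example; RapidMiner's exact behavior at the boundaries may differ.

```python
def prune_percentual(docs, below_percent, above_percent):
    """Keep tokens whose document frequency (in percent) lies within the given bounds.

    docs is a list of token lists, one per document.
    Boundaries are treated as inclusive here, which is an assumption.
    """
    n = len(docs)
    doc_freq = {}
    for tokens in docs:
        # Count each token at most once per document
        for t in set(tokens):
            doc_freq[t] = doc_freq.get(t, 0) + 1
    return sorted(t for t, df in doc_freq.items()
                  if below_percent <= 100.0 * df / n <= above_percent)

docs = [["good", "film"], ["bad", "film"], ["good", "plot"], ["film", "rare"]]
print(prune_percentual(docs, 30, 60))
```

With these four toy documents, "film" appears in 75% of them and "bad", "plot" and "rare" in 25% each, so only "good" (50%) survives the 30 to 60 percent window.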

This gives me around 150 variables, some of which I need to kick out before doing dimensionality reduction. For example, "names" does not make much sense to analyze in my case.

THE PROBLEM:

The problem arises when I use the "select attributes" node. In my world it should be straightforward to attach the node to my "process documents from files" node and then simply select or de-select the variables I want to continue with. BUT the only variables that are displayed when I try to use the subset option are four metadata attributes... In my world all 150 variables should be displayed... So is this a bug, or do I have some settings wrong somewhere?

Best
Kasper

Answers

  • Andrew2
    Andrew2 New Altair Community Member
    Hello

    The attribute names are determined from the data at run time, so the metadata can't get hold of them at design time. A workaround is to store the example set in the repository using "Store" and fetch it again using "Retrieve".

    regards

    Andrew
  • kasper2304
    kasper2304 New Altair Community Member
    Thanks Andrew.

    I was actually just about to try that workaround.

    Best
    Kasper
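
Andrew's point, that the attribute names only exist once the data has been read, can be illustrated with a small Python sketch. It is an analogy rather than RapidMiner code, and the function names `build_word_vector` and `select_attributes` are invented for the example.

```python
def build_word_vector(docs):
    """Return (attribute_names, rows); the names depend entirely on the input text."""
    # The column names are discovered from the documents themselves,
    # so nothing can list them before this function has actually run.
    names = sorted({t for tokens in docs for t in tokens})
    rows = [[tokens.count(name) for name in names] for tokens in docs]
    return names, rows

def select_attributes(names, rows, keep):
    """Keep only the columns whose names are in `keep`."""
    idx = [i for i, n in enumerate(names) if n in keep]
    return [names[i] for i in idx], [[r[i] for i in idx] for r in rows]

docs = [["good", "film", "good"], ["bad", "film"]]
names, rows = build_word_vector(docs)
print(names)
print(select_attributes(names, rows, {"good", "bad"}))
```

Storing the result and reading it back plays the same role as the "Store" and "Retrieve" workaround: once the table has been materialized, its column names are known and can be offered for selection.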
