Applying an operation to a large example set

mikeb New Altair Community Member
edited November 5 in Community Q&A
Hi,
I have an example set with 10,000 examples and 3,800 attributes. These are document file names and the TF-IDF values for 3,800 terms in those documents. I want to raise each TF-IDF value to the power of 0.75. Is there a simple, fast way to do this?

What I have tried is looping through the attributes and, for each one, generating a new attribute that holds the TF-IDF value raised to the power of 0.75. I then loop through the resulting collection and use the Recall, Join, and Remember operators to join each example set in the collection to the previously joined ones. The problem is that this slows down, and eventually stalls or crashes, as the iterations increase and the joined example set grows. So I am wondering whether there is a more efficient way to do the (seemingly) simple thing of applying one operation like this to every value in the example set.

I should also mention that I looked at the Generate Function Set operator.  This looks like what I want, except that the specific operation I want to do is not included as one of the choices in that operator.

Thanks in advance for your help.

Answers

  • Hello mikeb

    Groovy is the answer. Use the Script operator with this code.
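    ExampleSet exampleSet = operator.getInput(ExampleSet.class);

    // Loop over the regular attributes and overwrite each value in place
    for (Attribute attribute : exampleSet.getAttributes()) {
        // Skip non-numeric attributes, such as a nominal file-name column
        if (!attribute.isNumerical()) continue;
        String name = attribute.getName();
        for (Example example : exampleSet) {
            example[name] = (example[name]) ** 0.75;
        }
    }

    // Return the modified example set on the operator's output port
    return exampleSet;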
    I ran an experiment with 10,000 examples and 3,800 attributes, and it took 2 minutes on my laptop. Obviously, others' results may vary :)

    regards

    Andrew
  • mikeb New Altair Community Member
    Hi awchisholm,
    Thanks!  I think that will work for me.
    mikeb
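
The Groovy script in the answer depends on RapidMiner's classes, so it cannot be run outside the Script operator. As a rough standalone illustration of the same idea (one in-place pass over every cell, rather than generating new attributes and joining them back together), here is a minimal plain-Java sketch. The class name `PowerTransformSketch`, the method `powInPlace`, and the toy matrix are hypothetical and are not part of RapidMiner or of the original answer; the matrix simply stands in for a small documents-by-terms table of TF-IDF values.

```java
import java.util.Arrays;

public class PowerTransformSketch {

    /**
     * Raises every cell of the matrix to the given exponent, in place.
     * One pass over rows x columns; no intermediate copies or joins.
     */
    public static void powInPlace(double[][] matrix, double exponent) {
        for (double[] row : matrix) {
            for (int j = 0; j < row.length; j++) {
                row[j] = Math.pow(row[j], exponent);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical toy data: 2 documents x 3 terms (not real TF-IDF values).
        double[][] tfidf = {
            {0.0, 1.0, 16.0},
            {81.0, 0.0, 256.0}
        };

        powInPlace(tfidf, 0.75);

        // Print each transformed row; zeros stay zero, so sparse rows stay sparse.
        for (double[] row : tfidf) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

For the sizes in the question, a pass like this touches 10,000 × 3,800 = 38 million cells once each, whereas the loop-and-join approach rebuilds an ever larger example set on every iteration, which is a plausible reason for the slowdown described above.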