Filtering ExampleSet Keywords

Mireille
Mireille New Altair Community Member
edited November 2024 in Community Q&A

Hi!

I have created a process using the "Process Documents from Files" operator, and included Tokenize, Filter Stopwords, Filter Tokens, Transform Cases, Create n-Grams and Stem inside it. I also set the vector creation option to TF-IDF. Since I am trying to find keywords across dozens of documents, the resulting ExampleSet has over 5000 columns. Does anyone know how I could filter these results down to the 100 or so most relevant keywords?

Alternatively, is there a way to visualize all the keywords graphically, so that the most important ones stand out?

Any help would be greatly appreciated :)
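To show why the output gets so wide, here is a minimal Python sketch that roughly imitates the nested operator chain. It assumes simple tokenization on ASCII letters, a tiny hand-picked stopword list and a minimum token length of 3 (hypothetical stand-ins for the actual operator settings). Stemming is left out because it needs an external library. The TF-IDF variant used here (raw count × log(N/df)) is one common choice, and RapidMiner's exact formula and normalization may differ. Every distinct term or n-gram becomes its own column, which is how a few dozen documents can produce thousands of columns.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for the Filter Stopwords / Filter Tokens settings.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}
MIN_LEN = 3

def preprocess(text):
    # Tokenize on runs of ASCII letters and transform to lower case.
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", text)]
    # Filter stopwords, then filter tokens by minimum length.
    tokens = [t for t in tokens if t not in STOPWORDS and len(t) >= MIN_LEN]
    # Create n-grams: keep the unigrams and add bigrams joined with "_".
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams

def tfidf_matrix(docs):
    """Return (terms, rows): one row per document, one column per term."""
    counts = [Counter(preprocess(d)) for d in docs]
    # Iterating a Counter yields its distinct keys, so this counts documents.
    df = Counter(term for c in counts for term in c)
    n = len(docs)
    terms = sorted(df)
    rows = [[c[t] * math.log(n / df[t]) for t in terms] for c in counts]
    return terms, rows

docs = [
    "Text mining finds keywords in documents.",
    "Keywords summarize the documents.",
    "Mining gold is not text mining.",
]
terms, rows = tfidf_matrix(docs)
print(len(terms), "columns for", len(rows), "documents")
```

Even three short sentences already produce more columns than documents, because the bigrams multiply the vocabulary.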


Answers

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    Connect a WordList to Data operator to the wor (word list) port of your Process Documents operator, then use a Sort operator to sort the result in descending order. This gives you an ExampleSet of the most frequent words.
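    As a rough illustration of what that produces, the Python sketch below builds a word list with a total count and a document count per word, then sorts it by the total in descending order. The field names word, in_documents and total are assumptions chosen to mirror the word list output, and the input is assumed to be already tokenized and filtered.

```python
from collections import Counter

def word_list(tokenized_docs):
    """Build rows of (word, in_documents, total) from pre-tokenized documents."""
    total = Counter()    # occurrences across all documents
    in_docs = Counter()  # number of documents containing the word
    for tokens in tokenized_docs:
        total.update(tokens)
        in_docs.update(set(tokens))
    return [
        {"word": w, "in_documents": in_docs[w], "total": total[w]}
        for w in total
    ]

def top_words(rows, k):
    # Sort on "total", descending; ties are broken alphabetically by word.
    ranked = sorted(rows, key=lambda r: (-r["total"], r["word"]))
    return ranked[:k]

docs = [
    ["keyword", "extraction", "keyword"],
    ["keyword", "ranking"],
    ["ranking", "tfidf"],
]
for row in top_words(word_list(docs), 3):
    print(row["word"], row["in_documents"], row["total"])
```

    The key point is that the word list has one row per word, so sorting it is a single-column sort rather than a comparison across thousands of attributes.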

  • Mireille
    Mireille New Altair Community Member

    Thank you very much for your help!

     

    I am a little confused about how to use the Sort operator: it asks which attribute to sort by, but each word is listed as a separate attribute.

    Thanks again!

     

     

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    Sort the WordList to Data output by its total attribute.

  • Mireille
    Mireille New Altair Community Member

    Thank you,

     

    Is there a way to order the words by their TF-IDF weights rather than by their frequency?
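    One way to rank by TF-IDF instead of frequency is to collapse the wide document-by-term table into a single score per term, for example by summing each term's column over all documents, and then sort those scores. The sketch below assumes the weights are available as a list of per-document rows plus a list of column names (hypothetical inputs, e.g. data exported from the ExampleSet). Summing is only one choice; the maximum or mean per term are equally reasonable aggregations.

```python
def rank_terms_by_tfidf(terms, rows, k=100, agg=sum):
    """Score each term column with `agg` over all documents; return the top k."""
    # zip(*rows) transposes the document rows into one tuple per term column.
    scores = {term: agg(col) for term, col in zip(terms, zip(*rows))}
    # Highest score first; ties broken alphabetically by term.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

terms = ["data", "keyword", "mining", "the"]
rows = [  # hypothetical TF-IDF weights: one row per document
    [0.10, 0.80, 0.00, 0.0],
    [0.30, 0.00, 0.50, 0.0],
    [0.20, 0.40, 0.45, 0.0],
]
print(rank_terms_by_tfidf(terms, rows, k=2))
print(rank_terms_by_tfidf(terms, rows, k=2, agg=max))
```

    Summing favours terms that carry some weight in many documents, while max favours terms that are highly distinctive for a single document. A term that occurs in every document (like "the" above) gets a weight of zero under the log(N/df) variant and drops to the bottom either way.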
