How to filter tokens by total occurrences?

cc4699
edited November 2024 in Community Q&A

The prune method in the "Process Documents" operator filters tokens by the number of documents they appear in (document occurrences). How can I filter them by "Total Occurrences" instead?
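The two counts differ in how repeated tokens are treated. The minimal Python sketch below (an illustration, not RapidMiner's implementation) counts both for pre-tokenized documents and prunes by total occurrences. The function names `occurrence_counts` and `prune_by_total`, the `min_total` threshold, and the sample documents are hypothetical.

```python
from collections import Counter


def occurrence_counts(docs: list[list[str]]) -> tuple[Counter, Counter]:
    """Return (document occurrences, total occurrences) for every token."""
    doc_occ: Counter = Counter()
    total_occ: Counter = Counter()
    for tokens in docs:
        total_occ.update(tokens)      # every occurrence is counted
        doc_occ.update(set(tokens))   # counted at most once per document
    return doc_occ, total_occ


def prune_by_total(docs: list[list[str]], min_total: int) -> list[list[str]]:
    """Drop tokens whose total occurrences across all documents are below min_total."""
    _, total_occ = occurrence_counts(docs)
    keep = {tok for tok, n in total_occ.items() if n >= min_total}
    return [[tok for tok in tokens if tok in keep] for tokens in docs]


docs = [["data", "data", "data", "mining"], ["text", "mining"]]
doc_occ, total_occ = occurrence_counts(docs)
print(doc_occ["data"], total_occ["data"])  # 1 3
print(prune_by_total(docs, min_total=2))   # [['data', 'data', 'data', 'mining'], ['mining']]
```

With `min_total=2`, "data" survives because it occurs three times in total, whereas pruning on document occurrences with the same threshold would drop it, since it appears in only one document.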

Answers

  • Telcontar120

    You should be able to sort the output word list by total occurrences, identify the tokens you want to eliminate, and save those words in a small text file. Then you can use Filter Stopwords (Dictionary) to remove those tokens during document processing.

  • cc4699

    Thanks for the reply. I see what you are saying; however, the problem is that I cannot filter on, or otherwise use, the total occurrences field. Sorting seems like an unnecessary step, since I need to filter out the tokens below a certain threshold.

    Sorting is also not working properly. The attribute list of the Sort operator is not populated, but if I enter total_occurrences manually, it works, with a warning.
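The dictionary approach from the first answer, combined with a threshold filter, can be sketched in Python as below. It assumes the word list has already been exported as (word, total occurrences) pairs and that Filter Stopwords (Dictionary) reads a plain-text file with one word per line; the function name `write_stopword_file`, the sample pairs, the `min_total` threshold, and the file name are hypothetical.

```python
from pathlib import Path


def write_stopword_file(word_list: list[tuple[str, int]], min_total: int, path: Path) -> list[str]:
    """Write every token whose total occurrences are below min_total to `path`, one per line."""
    # Sorting only orders the file for easier review; the threshold test alone selects the tokens.
    rare = [word for word, total in sorted(word_list, key=lambda row: row[1]) if total < min_total]
    path.write_text("".join(word + "\n" for word in rare), encoding="utf-8")
    return rare


word_list = [("data", 3), ("text", 1), ("mining", 2), ("corpus", 1)]
print(write_stopword_file(word_list, min_total=2, path=Path("rare_tokens.txt")))
# ['text', 'corpus']
```

Because the threshold test does the selection, dropping the `sorted` call would not change which tokens are written, only their order in the file.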