"Filtering by term frequency"

samtpfote New Altair Community Member
edited November 5 in Community Q&A
Hello everybody,

I would like to get all terms in a collection of HTML documents that appear in more than 99% of the documents.

But how can I:
  -  get the number of documents in my collection and
  -  calculate, for each term, the ratio (number of documents containing the term) / (total number of documents)?

It would be really great if someone could help me!

Answers

  • MariusHelf New Altair Community Member
    Hello samtpfote,

    you can use WordList to Data to convert the word list output of Process Documents into a data set. Then you can combine Generate Attributes and Filter Examples to compute and extract all the information that you need.

    The total number of documents corresponds to the number of examples in the exa (example set) output of Process Documents. You can extract that number into a macro with the Extract Macro operator.
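
To make the calculation itself concrete, here is a minimal sketch in plain Python rather than in RapidMiner operators. It assumes each document is already tokenized into a list of terms; the function name `frequent_terms` and the parameter `min_fraction` are made up for this illustration and are not part of any RapidMiner API.

```python
from collections import Counter


def frequent_terms(documents, min_fraction=0.99):
    """Return {term: fraction of documents containing it} for every term
    whose fraction is strictly greater than min_fraction.

    documents: list of token lists, one list per document (assumed input format).
    """
    n_docs = len(documents)  # total number of documents in the collection
    if n_docs == 0:
        return {}

    doc_freq = Counter()
    for tokens in documents:
        # set() ensures a term is counted once per document,
        # no matter how often it occurs inside that document.
        doc_freq.update(set(tokens))

    result = {}
    for term, count in doc_freq.items():
        fraction = count / n_docs
        if fraction > min_fraction:
            result[term] = fraction
    return result


# Tiny hypothetical collection of three tokenized documents.
docs = [["the", "cat"], ["the", "dog"], ["the", "cat", "the"]]
common = frequent_terms(docs, min_fraction=0.99)
```

In the process described above, the count per term would come from the data set produced by WordList to Data and the document total from the macro, so the division and the threshold are the only parts left to do in Generate Attributes and Filter Examples.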

    If you have further questions, please come back!

    All the best,
    Marius