[SOLVED] Text mining: Does pruning make sense at all?

chaosbringer New Altair Community Member
edited November 5 in Community Q&A
Hi,
I have a question (of course):
The Process Documents from Text operator can create vectors using the TF-IDF measure.
It also allows pruning the text beforehand, e.g. based on how often terms occur.
So, does it make sense at all to prune frequent terms from the text when I want to use the TF-IDF measure?
Does pruning beforehand bias the resulting TF-IDF values?

Thank you very much,
Julian
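A small Python sketch may help with the bias question. It assumes one common TF-IDF variant (raw term count times log(N/df), followed by L2 normalization of each document vector); the exact formula used by Process Documents from Text may differ. `prune_vocab` and its `max_df` threshold are hypothetical stand-ins for the operator's pruning settings. Under these assumptions, the IDF of a surviving term does not change when other terms are pruned, but its normalized weight can: dropping a term that occurs in every document (IDF = 0) leaves the vectors unchanged, while dropping moderately frequent terms rescales the remaining weights.

```python
import math
from collections import Counter


def prune_vocab(docs, max_df):
    """Hypothetical pruning rule: keep terms whose document fraction df/N is <= max_df."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t for t, c in df.items() if c / n <= max_df}


def tfidf(docs, vocab):
    """Assumed TF-IDF variant: raw count * log(N / df), then L2-normalized per document."""
    n = len(docs)
    # Document frequencies come from the whole corpus; removing other terms never changes them.
    df = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        counts = Counter(t for t in d if t in vocab)  # pruned terms are skipped
        raw = {t: c * math.log(n / df[t]) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0  # avoid division by zero
        vectors.append({t: w / norm for t, w in raw.items()})
    return vectors


docs = [
    ["data", "mining", "text", "text"],
    ["data", "mining", "graphs"],
    ["data", "text"],
    ["data", "images"],
]

full = tfidf(docs, {t for d in docs for t in d})
no_data = tfidf(docs, prune_vocab(docs, max_df=0.9))  # drops "data" (in every document, IDF = 0)
heavy = tfidf(docs, prune_vocab(docs, max_df=0.4))    # also drops "mining" and "text"

print(round(full[1]["graphs"], 3))     # 0.894
print(round(no_data[1]["graphs"], 3))  # 0.894
print(round(heavy[1]["graphs"], 3))    # 1.0
```

Here the IDF of "graphs" is log(4) in all three runs; only the per-document normalization changes. Whether the same holds in RapidMiner depends on which TF-IDF variant and normalization the operator actually uses.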

Answers

  • MariusHelf New Altair Community Member
    Hi Julian,

    Often pruning does help, but there is no general answer. Just put the Process Documents operator inside a Parameter Optimization and experiment with the pruning settings until you get good results.

    Best, Marius
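As a rough Python analogue of that advice (not the RapidMiner operator itself), the sketch below runs a grid search over two hypothetical pruning thresholds, `min_df` and `max_df` (fractions of training documents a term must appear in), and scores each pair by leave-one-out accuracy of a simple nearest-centroid classifier on TF-IDF vectors. The TF-IDF variant, the classifier, and the toy corpus are all assumptions chosen for brevity; the pruned vocabulary and IDF are fitted on the training documents of each fold only.

```python
import math
from collections import Counter
from itertools import product


def fit_idf(train, min_df, max_df):
    """Hypothetical pruning + IDF: keep terms whose training document fraction is in [min_df, max_df]."""
    n = len(train)
    df = Counter(t for d in train for t in set(d))
    return {t: math.log(n / c) for t, c in df.items() if min_df <= c / n <= max_df}


def vectorize(doc, idf):
    """Raw count * IDF for kept terms, L2-normalized (assumed TF-IDF variant)."""
    raw = {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}
    norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
    return {t: w / norm for t, w in raw.items()}


def cosine(a, b):
    """Cosine similarity of two sparse dict vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def loo_accuracy(docs, labels, min_df, max_df):
    """Leave-one-out accuracy of a nearest-centroid classifier; pruning and IDF fitted per fold."""
    hits = 0
    for i in range(len(docs)):
        train = [d for j, d in enumerate(docs) if j != i]
        train_labels = [y for j, y in enumerate(labels) if j != i]
        idf = fit_idf(train, min_df, max_df)
        centroids = {}
        for d, y in zip(train, train_labels):
            # Sum of class vectors; cosine ignores scale, so this acts like a centroid.
            centroids.setdefault(y, Counter()).update(vectorize(d, idf))
        v = vectorize(docs[i], idf)
        predicted = max(centroids, key=lambda y: cosine(v, centroids[y]))
        hits += predicted == labels[i]
    return hits / len(docs)


def grid_search(docs, labels, min_grid, max_grid):
    """Try every (min_df, max_df) pair and keep the best-scoring one as (score, min_df, max_df)."""
    best = None
    for lo, hi in product(min_grid, max_grid):
        if lo > hi:
            continue  # this range would keep no terms
        score = loo_accuracy(docs, labels, lo, hi)
        if best is None or score > best[0]:
            best = (score, lo, hi)
    return best


docs = [
    ["price", "stock", "market", "report"],
    ["stock", "market", "shares", "report"],
    ["market", "price", "shares", "report"],
    ["goal", "match", "team", "report"],
    ["team", "match", "league", "report"],
    ["goal", "league", "team", "report"],
]
labels = ["finance"] * 3 + ["sports"] * 3

best = grid_search(docs, labels, min_grid=[0.0, 0.3], max_grid=[0.5, 1.0])
print("best (score, min_df, max_df):", best)
```

Because the thresholds are picked by the same leave-one-out score that is returned, that score is optimistic; wrapping the whole search in an outer validation loop (nested validation) gives a fairer estimate.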
  • chaosbringer New Altair Community Member
    Thank you for your answer.
    It seems to me that this is a bit like fishing for results (data dredging), but obviously I have to live with that. Thank you.

    Best,
    Julian