[SOLVED] Text mining: Does pruning make sense at all?
chaosbringer
New Altair Community Member
Hi,
I have a question (of course):
The "process document from text" operator can create vectors using the tf-idf measure.
Further, it allows pruning the text beforehand based on, e.g., the occurrence of terms.
So, does it make sense at all to prune frequent terms from the text when I want to use the tf-idf measure?
Does pruning beforehand bias the resulting tf-idf values?
Thank you very much,
Julian
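
To make the bias question concrete, here is a minimal sketch in plain Python (not RapidMiner; the toy corpus and the helper names `tfidf` and `prune_frequent` are made up for illustration). It assumes the common definition tf-idf = term count * log(N / df). Under that assumption, dropping frequent terms leaves the raw tf-idf values of the remaining terms unchanged, but the values do shift once each document vector is length-normalized. Whether the operator normalizes its vectors is not stated in this thread, so both cases are shown.

```python
import math
from collections import Counter

# Toy corpus, invented for this example.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]

def tfidf(docs, vocabulary, normalize):
    """tf-idf per document, restricted to `vocabulary` (hypothetical helper)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each kept term occurs.
    df = Counter(t for d in docs for t in set(d) if t in vocabulary)
    vectors = []
    for d in docs:
        counts = Counter(t for t in d if t in vocabulary)
        vec = {t: c * math.log(n_docs / df[t]) for t, c in counts.items()}
        if normalize:
            # Scale the vector to unit Euclidean length.
            norm = math.sqrt(sum(v * v for v in vec.values()))
            if norm > 0:
                vec = {t: v / norm for t, v in vec.items()}
        vectors.append(vec)
    return vectors

def prune_frequent(docs, max_df_ratio):
    """Keep only terms that occur in at most `max_df_ratio` of the documents."""
    n_docs = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t for t, k in df.items() if k / n_docs <= max_df_ratio}

full_vocab = {t for d in docs for t in d}
pruned_vocab = prune_frequent(docs, max_df_ratio=0.5)

raw_full = tfidf(docs, full_vocab, normalize=False)
raw_pruned = tfidf(docs, pruned_vocab, normalize=False)
norm_full = tfidf(docs, full_vocab, normalize=True)
norm_pruned = tfidf(docs, pruned_vocab, normalize=True)

# Compare the weight of "mat" in the first document across the four variants.
print(raw_full[0]["mat"], raw_pruned[0]["mat"])
print(norm_full[0]["mat"], norm_pruned[0]["mat"])
```

In this sketch the number of documents N and each term's document frequency are untouched by pruning, which is why the raw values agree. The normalized values differ because the pruned terms no longer contribute to the vector length.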
Answers
Hi Julian,
often pruning does help, but there is no general answer. Just put the Process Documents operator into a Parameter Optimization and experiment with the parameter settings until you get good results.
Best, Marius
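
Outside RapidMiner, the same idea of trying several pruning settings and comparing results can be sketched in plain Python. This is only an illustration under stated assumptions: the labelled toy corpus, the absolute document-frequency thresholds `min_df` and `max_df`, and the leave-one-out 1-nearest-neighbour score are all made up for the example and are not what the Parameter Optimization operator does internally.

```python
import math
from collections import Counter
from itertools import product

# Toy labelled corpus, invented for this example.
DATA = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("cheap offer buy today", "spam"),
    ("meeting agenda for today", "ham"),
    ("project meeting notes now", "ham"),
    ("agenda and notes for project", "ham"),
]

def build_vectors(token_lists, min_df, max_df):
    """tf-idf vectors keeping only terms with min_df <= df <= max_df."""
    n = len(token_lists)
    df = Counter(t for toks in token_lists for t in set(toks))
    keep = {t for t, k in df.items() if min_df <= k <= max_df}
    vectors = []
    for toks in token_lists:
        counts = Counter(t for t in toks if t in keep)
        vectors.append({t: c * math.log(n / df[t]) for t, c in counts.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def loo_accuracy(vectors, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    hits = 0
    for i, v in enumerate(vectors):
        others = (j for j in range(len(vectors)) if j != i)
        best = max(others, key=lambda j: cosine(v, vectors[j]))
        hits += labels[best] == labels[i]
    return hits / len(vectors)

def grid_search(data, min_dfs, max_dfs):
    """Score every (min_df, max_df) combination, best score first."""
    token_lists = [text.split() for text, _ in data]
    labels = [label for _, label in data]
    results = []
    for min_df, max_df in product(min_dfs, max_dfs):
        if min_df > max_df:
            continue  # empty vocabulary, nothing to evaluate
        vectors = build_vectors(token_lists, min_df, max_df)
        results.append(((min_df, max_df), loo_accuracy(vectors, labels)))
    return sorted(results, key=lambda r: r[1], reverse=True)

results = grid_search(DATA, min_dfs=[1, 2], max_dfs=[3, 6])
for (min_df, max_df), acc in results:
    print(f"min_df={min_df} max_df={max_df} accuracy={acc:.2f}")
```

Because the best setting is picked using the same score that is then reported, that score tends to be optimistic. Keeping a separate hold-out set that the search never sees is one common way to check the chosen pruning settings.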
Thank you for your answer.
It seems to me that this is a bit like fishing/dredging for data, but obviously I have to live with that. Thank you.
Best,
Julian