"Text pattern identification"
ratheesan
New Altair Community Member
Hello,
I have a text document related to insurance. The data contains phrases such as "No alcohol content" and "alcohol content". When I process these documents, RM counts all occurrences of "alcohol" together. How can I count the occurrences of "alcohol" that have the neighboring term "no"?
Thanks
Ratheesan
Answers
Hello Ratheesan,
You can use the RapidMiner text preprocessing operator TermNGramGenerator to count not only individual words but also word pairs and other multi-word terms. Alternatively, or in addition, you can place a TokenReplace operator before the StringTokenizer to map multi-word terms like "no alcohol" to one-word tokens:
Cheers,
<operator name="Root" class="Process" expanded="yes">
<operator name="TextInput" class="TextInput" expanded="yes">
<list key="texts">
</list>
<list key="namespaces">
</list>
<operator name="ToLowerCaseConverter" class="ToLowerCaseConverter">
</operator>
<operator name="Replace 'no alcohol' by 'noalcohol' to count it as one new word" class="TokenReplace">
<list key="replace_dictionary">
<parameter key="no alcohol" value="noalcohol"/>
</list>
</operator>
<operator name="StringTokenizer" class="StringTokenizer">
</operator>
<operator name="Consider pairs of words in addition to individual words" class="TermNGramGenerator">
</operator>
</operator>
</operator>
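For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of the same chain: lowercase the text, replace "no alcohol" with "noalcohol", tokenize, and then count single words plus word pairs. This is only an illustration and not how the RapidMiner operators are implemented; the function names (`preprocess`, `ngrams`, `count_terms`), the underscore used to join n-grams, and the sample sentences are all assumptions made for this example.

```python
import re
from collections import Counter


def preprocess(text, replacements):
    """Lowercase the text, apply phrase replacements, and split into word tokens."""
    text = text.lower()
    for phrase, token in replacements.items():
        # Replace whole-phrase matches only (word boundaries on both sides).
        # Assumes the words of the phrase are separated by single spaces.
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", token, text)
    # Simple tokenizer: keep runs of letters, drop everything else.
    return re.findall(r"[a-z]+", text)


def ngrams(tokens, max_n):
    """Return all n-grams of length 1..max_n, joined with underscores."""
    terms = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            terms.append("_".join(tokens[i:i + n]))
    return terms


def count_terms(texts, replacements, max_n=2):
    """Count every term (single words and n-grams) over all texts."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(preprocess(text, replacements), max_n))
    return counts


if __name__ == "__main__":
    # Hypothetical sample documents, invented for this illustration.
    texts = ["No alcohol content found.", "Alcohol content was high."]
    counts = count_terms(texts, {"no alcohol": "noalcohol"}, max_n=2)
    print(counts["noalcohol"], counts["alcohol"])
```

Because the replacement runs before tokenizing, "no alcohol" never contributes to the plain "alcohol" count, which is the separation asked for in the question.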
Ralf
Hello Ralf,
I really appreciate your help. It is working fine. I now get all word combinations: single words, two-word terms, three-word terms, and so on. However, it seems I can only control the maximum number of words. I need to extract only the combinations of three or more words. How can I achieve this?
Thanks
Ratheesan