Hi,
I have an issue with word counting: when I process the text, some words are like
How can I remove them?
Hello @rajbanokhan - can you please post your XML process so we can see it? Please see the "READ BEFORE POSTING" pane on the right-hand side of your Reply window for instructions.
Scott
I don't know about pruning, but I use these steps:
Tokenize Nonletters (Tokenize)
Tokenize Linguistic (Tokenize)
Filter Stopwords (English)
Filter Tokens (by Length)
Stem (Porter)
Transform Cases
Pruning is available via the Process Documents from Data operator. Your Tokenize, Stem, etc should all be inside that subprocess.
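For anyone wondering what pruning does conceptually, here is a minimal sketch in plain Python, outside RapidMiner. It assumes pruning by absolute document frequency, i.e. dropping tokens that appear in too few (or too many) documents. The function name and the `min_docs` / `max_docs` parameters are made up for illustration and are not RapidMiner parameter names.

```python
from collections import Counter

def prune_by_document_frequency(docs, min_docs=2, max_docs=None):
    """Keep only tokens that occur in at least min_docs and at most max_docs documents.

    docs is a list of token lists. The thresholds are hypothetical names
    chosen for this sketch, not RapidMiner settings.
    """
    if max_docs is None:
        max_docs = len(docs)
    # Count each token once per document (document frequency, not total count).
    doc_freq = Counter(token for doc in docs for token in set(doc))
    keep = {t for t, n in doc_freq.items() if min_docs <= n <= max_docs}
    return [[t for t in doc if t in keep] for doc in docs]

docs = [
    ["banana", "car", "xq"],
    ["banana", "car"],
    ["banana", "zz"],
]
pruned = prune_by_document_frequency(docs, min_docs=2)
print(pruned)
```

Tokens that show up in only one document (here the stray "xq" and "zz") are dropped, which is usually what removes one-off noise words from a word list.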
Yes, I applied the prune method and it works. Thank you so much!
Hi, I am doing text mining and I want the most frequent words to come first. For example, if
banana occurs 10 times and car occurs 8 times, the list should read:
banana 10
car 8
How do I sort the counts from highest to lowest?
If you use the WordList to Data operator, your word list becomes an example set, and then you can use the sort and filter operators.
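To illustrate the idea of "turn the word list into a table, then sort it by count", here is a small sketch in plain Python rather than RapidMiner. The function name and the sample tokens are assumptions for the example only.

```python
from collections import Counter

def word_counts_descending(tokens):
    """Return (word, count) pairs, highest count first; ties are ordered alphabetically."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Sample data matching the banana/car example from the question.
tokens = ["banana"] * 10 + ["car"] * 8 + ["apple"] * 3
for word, total in word_counts_descending(tokens):
    print(word, total)
```

The key point is that the sort runs on the count column of the word table, not on the label or metadata columns of the document table.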
Hi,
I am using the Sort operator for sorting, but the only options offered for the attribute name are:
label
meta data date
meta data file
meta data path
I selected the label option, but the result is not sorted by the most frequent value.
When I used WordList to Data followed by the Sort operator, the "attribute names" parameter did not show any options.
That is why I only use Process Documents from Files and Sort, but Sort doesn't sort my data.
I am doing text mining with the Process Documents from Files operator. When I run the process it gives me a list of words, but I don't want the whole list. I just want to select my own words from it. Suppose I want the words cat, dog, mouse, table and chair: how can I get only these words from the list?
Hi,
have a look at the "Filter Tokens Using Example Set" operator of the Operator Toolbox extension. This should do the trick.
Cheers,
Martin
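As a rough illustration of what filtering tokens against a list of wanted words means, here is a sketch in plain Python. It is not the Operator Toolbox implementation; the function name and the case-insensitive matching are assumptions made for this example.

```python
def keep_only(tokens, wanted):
    """Keep only the tokens that appear in the wanted list (case-insensitive)."""
    wanted_set = {w.lower() for w in wanted}
    return [t for t in tokens if t.lower() in wanted_set]

# The wanted words come from the question; the token list is made-up sample data.
wanted = ["cat", "dog", "mouse", "table", "chair"]
tokens = ["the", "cat", "sat", "on", "the", "chair", "near", "a", "Dog"]
print(keep_only(tokens, wanted))
```

In RapidMiner terms, the wanted words would sit in an example set (one word per row) and every other token would be discarded.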
Thank you for looking at my question,
but I can't find that operator.
Is it the Filter Examples operator or Filter Tokens (by Content)? Can you guide me? Filter Examples is not working: it shows an empty result (no words are shown).
I tried both, one at a time, and Filter Tokens (by Content) works. And thanks again for
Did you install the Operator Toolbox extension?
BR,
Hi sir,
I installed the text mining extension, but I didn't find this extension when searching the marketplace.
Hi, I didn't find the Operator Toolbox extension.
Hello @rajbanokhan, both of those extensions can be found in the marketplace. If you open RapidMiner Studio, you should see a menu at the top called "Extensions". Choose the first item, "Marketplace (Updates and Extensions)...", then search for "Text Processing" and "Operator Toolbox".
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.0.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.003" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="34">
        <parameter key="text" value="hi how i find or count the total number of words in one document and then in second and then third and so on?"/>
      </operator>
      <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="179" y="34"/>
      <operator activated="true" class="text:extract_token_number" compatibility="8.1.000" expanded="true" height="68" name="Extract Token Number" width="90" x="313" y="34"/>
      <connect from_op="Create Document" from_port="output" to_op="Tokenize" to_port="document"/>
      <connect from_op="Tokenize" from_port="document" to_op="Extract Token Number" to_port="document"/>
      <connect from_op="Extract Token Number" from_port="document" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
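The process above tokenizes one document and then extracts its token number. For comparison, the same per-document count can be sketched in plain Python. This assumes tokens are runs of letters (splitting on everything that is not a letter), which is only an approximation of the Tokenize step; the document names and texts are made-up sample data.

```python
import re

def token_count(text):
    """Count tokens, treating every run of letters as one token."""
    # [^\W\d_]+ matches letters only: word characters minus digits and underscore.
    return len(re.findall(r"[^\W\d_]+", text))

# Hypothetical documents; in practice these would be your own texts.
documents = {
    "doc1": "hi how are you?",
    "doc2": "count the words in each document",
}
counts = {name: token_count(text) for name, text in documents.items()}
for name, total in counts.items():
    print(name, total)
```

Running the count once per document gives one total for the first document, one for the second, and so on.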