Hello, I want to extract the five words with the highest TF-IDF from the output TF-IDF matrix. How should I do this? Thanks.
Also, how do I remove the '@' and '#' characters and URLs from a sentence in RapidMiner?
Hi @ahootanha,
To answer your first question, here is a process that does what you want:
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process"> <process expanded="true"> <operator activated="true" class="social_media:search_twitter" compatibility="8.1.000" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="34"> <parameter key="connection" value="dkk"/> <parameter key="query" value="tesla"/> </operator> <operator activated="true" class="nominal_to_text" compatibility="8.1.000" expanded="true" height="82" name="Nominal to Text" width="90" x="246" y="34"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="Text"/> </operator> <operator activated="true" class="select_attributes" compatibility="8.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="380" y="34"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="Text"/> </operator> <operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="514" y="34"> <list key="specify_weights"/> <process expanded="true"> <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="380" y="34"/> <connect from_port="document" to_op="Tokenize" to_port="document"/> <connect from_op="Tokenize" from_port="document" to_port="document 1"/> <portSpacing port="source_document" spacing="0"/> <portSpacing port="sink_document 1" spacing="0"/> <portSpacing port="sink_document 2" spacing="0"/> </process> </operator> <operator activated="true" class="text:wordlist_to_data" compatibility="8.1.000" expanded="true" height="82" name="WordList to Data" width="90" x="648" y="85"/> <operator activated="true" class="sort" compatibility="8.1.000" expanded="true" height="82" name="Sort" width="90" 
x="782" y="85"> <parameter key="attribute_name" value="total"/> <parameter key="sorting_direction" value="decreasing"/> </operator> <operator activated="true" class="filter_example_range" compatibility="8.1.000" expanded="true" height="82" name="Filter Example Range" width="90" x="916" y="85"> <parameter key="first_example" value="1"/> <parameter key="last_example" value="5"/> </operator> <connect from_op="Search Twitter" from_port="output" to_op="Nominal to Text" to_port="example set input"/> <connect from_op="Nominal to Text" from_port="example set output" to_op="Select Attributes" to_port="example set input"/> <connect from_op="Select Attributes" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/> <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/> <connect from_op="Process Documents from Data" from_port="word list" to_op="WordList to Data" to_port="word list"/> <connect from_op="WordList to Data" from_port="example set" to_op="Sort" to_port="example set input"/> <connect from_op="Sort" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/> <connect from_op="Sort" from_port="original" to_port="result 3"/> <connect from_op="Filter Example Range" from_port="example set output" to_port="result 2"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> <portSpacing port="sink_result 3" spacing="0"/> <portSpacing port="sink_result 4" spacing="0"/> </process> </operator></process>
I hope it helps,
Regards,
Lionel
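For readers who want to see the idea behind the process outside RapidMiner, here is a minimal Python sketch of ranking terms by TF-IDF and keeping the top five. It is an illustration only: the function name `top_tfidf_terms`, the sample documents, and the weighting formula (term frequency times `log(N / df)`, summed over documents) are assumptions, and RapidMiner's own TF-IDF values may be computed and normalized differently.

```python
import math
from collections import Counter


def top_tfidf_terms(docs, k=5):
    """Return the k (term, weight) pairs with the highest summed TF-IDF.

    Hypothetical helper: uses tf = count / document length and
    idf = log(N / document frequency), a common textbook variant.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    totals = Counter()
    for tokens in tokenized:
        if not tokens:
            continue  # skip empty documents to avoid dividing by zero
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[term])
            totals[term] += tf * idf

    # Highest weight first; ties are broken alphabetically.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


# Made-up sample documents, standing in for the Text attribute.
docs = [
    "tesla model launch",
    "tesla battery battery news",
    "tesla stock news",
]
top_terms = top_tfidf_terms(docs, k=5)
for term, weight in top_terms:
    print(f"{term}\t{weight:.4f}")
```

A term that occurs in every document (here `tesla`) gets an IDF of zero under this formula, so it drops to the bottom of the ranking even though it is the most frequent word.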
If you Tokenize on non-letters, all the special characters will be stripped from the resulting words that comprise the word vector.
@ahootanha what @Telcontar120 says is true. My suggestion is to use Specify Characters in the Tokenize operator to select what to split on. I do a lot of Twitter extraction and I don't want #hashtag to get wiped out by default, so I split on stuff like !.?"[ but not on #.
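The difference between the two tokenization choices discussed above can be sketched in Python. This is not RapidMiner code: the function names and the sample tweet are made up for illustration, and the split characters are the ones mentioned in the post plus a space (an assumption, so that words are still separated).

```python
import re


def tokenize_non_letters(text):
    """Split on every run of non-letter characters.

    Everything that is not A-Z or a-z acts as a separator, so '@', '#',
    digits and punctuation never reach the tokens.
    """
    return [t for t in re.split(r"[^A-Za-z]+", text) if t]


def tokenize_on_characters(text, split_chars=' !.?"['):
    """Split only on the listed characters.

    '#' and '@' are not in the list, so hashtags and mentions survive.
    """
    pattern = "[" + re.escape(split_chars) + "]+"
    return [t for t in re.split(pattern, text) if t]


tweet = "Loving the new #Tesla! Thanks @rapidminer."

print(tokenize_non_letters(tweet))
print(tokenize_on_characters(tweet))
```

The first call yields plain words only, while the second keeps `#Tesla` and `@rapidminer` intact, which is the behaviour described for Twitter data.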
Hello, thank you. But I am a beginner and did not understand where to use this code. How do I write a regular expression in the Filter Tokens operator? Please guide me. Thanks.
Can you give more guidance? And an example
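As a rough example of the kind of regular expression being asked about, the Python sketch below removes URLs and the '@' and '#' characters from a sentence. The patterns and the helper name `clean_tweet` are assumptions chosen for illustration. The same patterns should be usable wherever RapidMiner accepts a regular expression for replacement, but which operator and parameter to use is not confirmed in this thread.

```python
import re

# Assumed patterns: a URL is "http://" or "https://" followed by
# non-whitespace characters; the symbols to drop are '@' and '#'.
URL_PATTERN = re.compile(r"https?://\S+")
SYMBOL_PATTERN = re.compile(r"[@#]")


def clean_tweet(text):
    """Remove URLs first, then '@' and '#', then tidy up the spacing."""
    without_urls = URL_PATTERN.sub("", text)
    without_symbols = SYMBOL_PATTERN.sub("", without_urls)
    # Collapse the gaps left behind by the removed URLs.
    return " ".join(without_symbols.split())


sample = "Check #Tesla news from @user at https://example.com/a?b=1 now"
print(clean_tweet(sample))
```

URLs are removed before the symbols so that a '#' fragment inside a link disappears together with the link instead of leaving part of the URL behind.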
@ahootanha grab the process here: http://www.neuralmarkettrends.com/use-rapidminer-discover-twitter-content/
Hello, thank you very much. But I do not know where to use this code in my RapidMiner program. Please guide me and send me a screenshot of how the operators are set up. Thanks.
@ahootanha take a look at this thread: https://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Import-XML-code-to-process/m-p/32606#M23194
hello @ahootanha welcome to the community! Some quick recommendations for you (pretty much exactly what @Thomas_Ott was recommending)...
• Post your XML process here in this thread (see https://youtu.be/KkgB5QXWXJ8 and "Read Before Posting" on right when you reply)
• Attach your dataset if possible (use a fictionalized version if there are privacy concerns)
• Make sure you have all dependent extensions installed (see https://youtu.be/pjBqG3xtXx4)
Scott
Hello, I watched the YouTube links and installed all the packages, but I still cannot extract the ten most repeated words from the TF-IDF matrix. Please guide me. Thanks.
Should I run the program after entering the XML code? How do I do that?
hello @ahootanha I really need to see your data and your XML process in order to help. Can you please post both here in this thread?
Hello, thank you. I did not use any coding. I just entered the data and used the Process Documents operator (TF-IDF). Thank you for helping me. Please help.