"text mining visualization"
emolano
Hi all,
Help for a new user! I'm doing some text mining and want to visualize the word frequency. How can I do this?
Something like a tag cloud/word cloud would be nice.
This is what I have so far...
<operator name="Root" class="Process" expanded="yes">
    <description text="#ylt#h3#ygt#CRM Data Mining#ylt#/h3#ygt##ylt#p#ygt#.#ylt#/p#ygt#"/>
    <operator name="DatabaseExampleSource" class="DatabaseExampleSource">
        <parameter key="database_url" value="jdbc:mysql://test:3306/test"/>
        <parameter key="username" value="test"/>
        <parameter key="password" value="C2jgjgjh4JiellkjDOm4="/>
        <parameter key="query" value="SELECT `ID_NUM`, `SHORT_DESC`, `PLATFORM` FROM `PROBLEM` WHERE platform is not null;"/>
        <parameter key="label_attribute" value="PLATFORM"/>
        <parameter key="id_attribute" value="ID_NUM"/>
    </operator>
    <operator name="StringTextInput" class="StringTextInput" expanded="yes">
        <parameter key="filter_nominal_attributes" value="true"/>
        <parameter key="remove_original_attributes" value="true"/>
        <parameter key="vector_creation" value="TermOccurrences"/>
        <parameter key="output_word_list" value="C:\Documents and Settings\emolano\My Documents\rm_workspace\output"/>
        <list key="namespaces">
        </list>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
            <parameter key="min_chars" value="2"/>
        </operator>
        <operator name="ToLowerCaseConverter" class="ToLowerCaseConverter">
        </operator>
        <operator name="PorterStemmer" class="PorterStemmer">
        </operator>
        <operator name="TermNGramGenerator" class="TermNGramGenerator">
        </operator>
    </operator>
</operator>
... I get the word frequency but do not know how to visualize it...
Thanks
e
IngoRM
Hi,
I would suggest a parallel plot, at least if you have fewer than a few thousand terms. Alternatively, you could also use the CorpusBasedWeighting operator for each class and visualize the different weight vectors.
Cheers,
Ingo
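If the goal is only a quick look at the most frequent terms, a ranked bar chart can stand in for a tag cloud. The sketch below is plain Python outside RapidMiner, not one of the operators mentioned in this thread. The helper names `top_terms` and `text_bar_chart` and the term counts are made-up placeholders for whatever the exported word list contains.

```python
from collections import Counter

def top_terms(counts, n=10):
    """Return the n most frequent (term, count) pairs, highest count first."""
    return Counter(counts).most_common(n)

def text_bar_chart(pairs, width=40):
    """Render (term, count) pairs as horizontal bars scaled to the largest count."""
    if not pairs:
        return ""
    longest = max(len(term) for term, _ in pairs)
    peak = max(count for _, count in pairs) or 1  # avoid dividing by zero
    lines = []
    for term, count in pairs:
        bar = "#" * round(width * count / peak)
        lines.append(f"{term.ljust(longest)} {bar} {count}")
    return "\n".join(lines)

# Hypothetical stemmed terms and counts, standing in for the exported word list.
word_counts = {"instal": 42, "error": 35, "login": 21, "printer": 14, "crash": 7}

chart = text_bar_chart(top_terms(word_counts, n=5))
print(chart)
```

The same ranked pairs could be handed to any plotting or word cloud tool; the text bars just keep the example free of extra dependencies.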
derchief
Hi Ingo,
You said "the CorpusBasedWeighting for each class". How can I define such a class? In my case, the weights are all 0 or 1, which does not seem to give usable results.
I have two further related questions:
1) In my setting, I am loading some text files and get a list of words with values like "avg = 0.029 +/- 0.167". I don't understand exactly what this means. Can I use this information to group the words by their occurrence in the source files?
2) Most importantly, I would like to separate my text files into groups and visualize their analyses to compare them. As a tiny example, one group could be female texts and another male texts, and I would like to compare the usage of words or combinations of words (like: these are typical female phrases: ...). Is there a way to tell RapidMiner which text belongs to which group and to take this information into account?
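On question 1, a summary such as "avg = 0.029 +/- 0.167" most likely reads as the average of that term's values across all loaded documents, plus or minus their standard deviation. Here is a minimal Python sketch of that reading, outside RapidMiner. The 35-document column is hypothetical data chosen because it happens to reproduce these numbers, and using the population standard deviation is an assumption.

```python
import statistics

def summarize(column):
    """Format one term's column as 'avg = mean +/- standard deviation'."""
    mean = statistics.mean(column)
    spread = statistics.pstdev(column)  # population std dev; an assumption
    return f"avg = {mean:.3f} +/- {spread:.3f}"

# Hypothetical column: 35 documents, the term occurs once in exactly one of them.
occurrences = [1] + [0] * 34

summary = summarize(occurrences)
print(summary)  # avg = 0.029 +/- 0.167
```

Read this way, a small average with a comparatively large spread suggests a term that is absent from most documents and present in only a few.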
Cheers,
Chris
Setting:
<operator name="Root" class="Process" expanded="yes">
    <operator name="TextInput" class="TextInput" expanded="yes">
        <list key="texts">
            <parameter key="NoteLens" value="C:\Dokumente und Einstellungen\cniemann\Eigene Dateien\NoteLens Documents\store"/>
        </list>
        <parameter key="default_content_type" value="txt"/>
        <parameter key="default_content_encoding" value="UTF-8"/>
        <parameter key="default_content_language" value="german"/>
        <parameter key="vector_creation" value="TermOccurrences"/>
        <parameter key="id_attribute_type" value="short"/>
        <list key="namespaces">
        </list>
        <parameter key="create_text_visualizer" value="true"/>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="GermanStopwordFilter" class="GermanStopwordFilter">
        </operator>
        <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
        </operator>
    </operator>
    <operator name="CorpusBasedWeighting" class="CorpusBasedWeighting">
        <parameter key="normalize_weights" value="false"/>
        <parameter key="class_to_characterize" value="3"/>
    </operator>
</operator>
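The group comparison asked about in question 2 can be roughed out in Python, outside RapidMiner, to show the idea of weighting terms per class. This is only a sketch and not what CorpusBasedWeighting computes internally: it scores each term by its relative frequency in one group minus its relative frequency in all other groups. The group names, texts, and the helpers `term_counts` and `characteristic_terms` are hypothetical.

```python
from collections import Counter

def term_counts(texts):
    """Count lowercase whitespace-separated tokens over a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

def characteristic_terms(groups, target, n=3):
    """Rank the target group's terms by relative frequency there minus elsewhere."""
    target_counts = term_counts(groups[target])
    other_counts = Counter()
    for name, texts in groups.items():
        if name != target:
            other_counts.update(term_counts(texts))
    target_total = sum(target_counts.values()) or 1
    other_total = sum(other_counts.values()) or 1
    scores = {
        term: target_counts[term] / target_total - other_counts[term] / other_total
        for term in target_counts
    }
    # Highest score first; ties are broken alphabetically for a stable result.
    return sorted(scores, key=lambda t: (-scores[t], t))[:n]

# Hypothetical two-group corpus; names and texts are placeholders.
groups = {
    "group_a": ["garden party garden", "garden evening"],
    "group_b": ["football evening", "football match today"],
}

print(characteristic_terms(groups, "group_a", n=2))  # ['garden', 'party']
print(characteristic_terms(groups, "group_b", n=2))  # ['football', 'match']
```

Terms shared evenly between groups, such as "evening" here, score close to zero and drop to the bottom of each ranking.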