"text mining visulaization"

New Altair Community Member

May 6, 2009

Updated Nov 5, 2024 by Jocelyn

Hi all,
Help for a new user! I'm doing some text mining and want to visualize the word frequencies. How can I do this?
Something like a tag cloud/word cloud would be nice.
This is what I have so far...
<operator name="Root" class="Process" expanded="yes">
    <description text="#ylt#h3#ygt#CRM Data Mining#ylt#/h3#ygt##ylt#p#ygt#.#ylt#/p#ygt#"/>
    <operator name="DatabaseExampleSource" class="DatabaseExampleSource">
        <parameter key="database_url" value="jdbc:mysql://test:3306/test"/>
        <parameter key="username" value="test"/>
        <parameter key="password" value="C2jgjgjh4JiellkjDOm4="/>
        <parameter key="query" value="SELECT `ID_NUM`, `SHORT_DESC`, `PLATFORM` FROM `PROBLEM` WHERE platform is not null;"/>
        <parameter key="label_attribute" value="PLATFORM"/>
        <parameter key="id_attribute" value="ID_NUM"/>
    </operator>
    <operator name="StringTextInput" class="StringTextInput" expanded="yes">
        <parameter key="filter_nominal_attributes" value="true"/>
        <parameter key="remove_original_attributes" value="true"/>
        <parameter key="vector_creation" value="TermOccurrences"/>
        <parameter key="output_word_list" value="C:\Documents and Settings\emolano\My Documents\rm_workspace\output"/>
        <list key="namespaces">
        </list>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
            <parameter key="min_chars" value="2"/>
        </operator>
        <operator name="ToLowerCaseConverter" class="ToLowerCaseConverter">
        </operator>
        <operator name="PorterStemmer" class="PorterStemmer">
        </operator>
        <operator name="TermNGramGenerator" class="TermNGramGenerator">
        </operator>
    </operator>
</operator>

... I get the word frequencies but don't know how to visualize them...
Thanks
e



IngoRM

New Altair Community Member

May 12, 2009

Hi,

I would suggest a parallel plot, at least if you have fewer than a few thousand terms. Alternatively, you could use CorpusBasedWeighting for each class and visualize the different weight vectors.
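
If a tag-cloud style view is still wanted, the term weights can also be computed outside RapidMiner. The following is only a minimal Python sketch under stated assumptions, not a RapidMiner feature: the function names `term_frequencies` and `cloud_sizes`, the sample texts, and the font-size range are all made up for illustration. It counts tokens of at least two characters (mirroring the `min_chars` setting above, but without stopword filtering or stemming) and maps the counts to font sizes on a log scale.

```python
from collections import Counter
import math
import re

def term_frequencies(texts, min_chars=2):
    """Count lower-cased alphabetic tokens of at least min_chars characters."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if len(t) >= min_chars)
    return counts

def cloud_sizes(counts, min_pt=10, max_pt=40, top=50):
    """Map the most frequent terms to font sizes (in points) on a log scale."""
    common = counts.most_common(top)
    if not common:
        return {}
    lo = math.log(min(c for _, c in common))
    hi = math.log(max(c for _, c in common))
    span = (hi - lo) or 1.0  # avoid division by zero when all counts are equal
    return {
        term: round(min_pt + (math.log(c) - lo) / span * (max_pt - min_pt))
        for term, c in common
    }

# Hypothetical short descriptions standing in for the SHORT_DESC column.
docs = [
    "printer driver crash on login",
    "login page crash after update",
    "driver update fails on login",
]
sizes = cloud_sizes(term_frequencies(docs))
for term, pt in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {pt}pt")
```

The resulting term-to-size mapping can be handed to any renderer (HTML spans with a `font-size` style, for example). The log scale keeps a few very frequent terms from dwarfing everything else.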

Cheers,
Ingo

derchief

New Altair Community Member

May 18, 2009

Hi Ingo,

You said "the CorpusBasedWeighting for each class". How can I define such a class? In my case the weights are all 0 or 1, which does not seem to give usable results.

I have two further related questions:

1) In my setting, I am loading some txt files and get a list of words with values like "avg = 0.029 +/- 0.167". I don't understand exactly what this means. Can I use this information to group the words by their occurrence in the source files?

2) Most importantly, I would like to separate my txt files into groups and visualize their analyses to compare them. As a tiny example, one group could be female texts and one group male texts, and I would like to compare the usage of words or word combinations (like: these are typical female phrases: ...). Is there a way to tell RapidMiner which text belongs to which group and to take this information into account?
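
On question 1), one reading that is consistent with the quoted figures is "mean +/- standard deviation of the term's value across all documents". The thread does not confirm this, so treat it as an assumption. As a worked example in Python, a term that occurs once in exactly one document of a hypothetical 35-document corpus reproduces those numbers when the population standard deviation is used:

```python
from statistics import mean, pstdev

# Hypothetical corpus: 35 documents, the term occurs once in exactly one of them.
occurrences = [1] + [0] * 34

avg = mean(occurrences)         # 1/35
spread = pstdev(occurrences)    # population standard deviation, sqrt(34)/35
print(f"avg = {avg:.3f} +/- {spread:.3f}")  # prints "avg = 0.029 +/- 0.167"
```

Under that reading, a small average with a comparatively large spread simply marks a term that appears in very few documents.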

Cheers,
Chris

Setting:

<operator name="Root" class="Process" expanded="yes">
    <operator name="TextInput" class="TextInput" expanded="yes">
        <list key="texts">
            <parameter key="NoteLens" value="C:\Dokumente und Einstellungen\cniemann\Eigene Dateien\NoteLens Documents\store"/>
        </list>
        <parameter key="default_content_type" value="txt"/>
        <parameter key="default_content_encoding" value="UTF-8"/>
        <parameter key="default_content_language" value="german"/>
        <parameter key="vector_creation" value="TermOccurrences"/>
        <parameter key="id_attribute_type" value="short"/>
        <list key="namespaces">
        </list>
        <parameter key="create_text_visualizer" value="true"/>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="GermanStopwordFilter" class="GermanStopwordFilter">
        </operator>
        <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
        </operator>
    </operator>
    <operator name="CorpusBasedWeighting" class="CorpusBasedWeighting">
        <parameter key="normalize_weights" value="false"/>
        <parameter key="class_to_characterize" value="3"/>
    </operator>
</operator>
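
The group comparison asked about in question 2) can be illustrated independently of RapidMiner. The following Python sketch is only one simple way to do it, not what CorpusBasedWeighting computes. It assumes each text already carries a group label. The group names, sample texts, and the functions `relative_frequencies` and `typical_terms` are hypothetical. Terms are ranked by how much larger their share of tokens is in the target group than in all other groups combined.

```python
from collections import Counter
import re

def relative_frequencies(texts):
    """Share of all tokens in the given texts that each term accounts for."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}

def typical_terms(groups, target, top=5):
    """Rank terms by their frequency share in `target` minus their share elsewhere."""
    inside = relative_frequencies(groups[target])
    others = [t for name, texts in groups.items() if name != target for t in texts]
    outside = relative_frequencies(others)
    scores = {term: freq - outside.get(term, 0.0) for term, freq in inside.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Hypothetical labelled texts; group names and contents are made up.
groups = {
    "group_a": ["the garden was lovely", "a lovely quiet evening"],
    "group_b": ["the match was loud", "a loud crowded stadium"],
}
for term, score in typical_terms(groups, "group_a", top=3):
    print(f"{term}: {score:+.3f}")
```

Words shared equally by both groups (here "the", "was", "a") score zero, so they drop out of the ranking without a stopword list. Within RapidMiner, the rough equivalent would be to give each text a class label and run the weighting once per class, as suggested earlier in the thread.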
