Hello,
I am new to machine learning and its methods, so please excuse me if my question sounds stupid. I have some data (750 rows in an Excel sheet) on which I first do some preprocessing (stopword filtering, stemming, etc.). After I get the output from these operators, I want to do SOM clustering so that I can see the results and build a feature vector of the most important concepts in my document. However, all I see in the result from the SOM operator is differently colored "dots" spread across the map, and I have no idea what they mean. If anyone can help me with this, I would be very grateful.
Here is the code from my experiment:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExcelExampleSource" class="ExcelExampleSource">
        <parameter key="excel_file" value="C:\Documents and Settings\Lexusboy\My Documents\TCV\Postings_short.xls"/>
        <parameter key="first_row_as_names" value="true"/>
        <parameter key="id_column" value="4"/>
    </operator>
    <operator name="StringTextInput" class="StringTextInput" expanded="yes">
        <parameter key="filter_nominal_attributes" value="true"/>
        <parameter key="remove_original_attributes" value="true"/>
        <parameter key="default_content_language" value="english"/>
        <parameter key="vector_creation" value="BinaryOccurrences"/>
        <parameter key="return_word_list" value="true"/>
        <parameter key="output_word_list" value="C:\Documents and Settings\Lexusboy\My Documents\RapidMiner\pre processing\word_list"/>
        <parameter key="id_attribute_type" value="short"/>
        <list key="namespaces">
        </list>
        <parameter key="create_text_visualizer" value="true"/>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="GermanStopwordFilter" class="GermanStopwordFilter">
        </operator>
        <operator name="StopwordFilterFile" class="StopwordFilterFile" activated="no">
            <parameter key="file" value="C:\Documents and Settings\Lexusboy\My Documents\RapidMiner\pre processing\stopwords.txt"/>
        </operator>
        <operator name="GermanStemmer" class="GermanStemmer">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
            <parameter key="min_chars" value="3"/>
        </operator>
    </operator>
    <operator name="SOMDimensionalityReduction" class="SOMDimensionalityReduction">
        <parameter key="return_preprocessing_model" value="true"/>
        <parameter key="number_of_dimensions" value="1"/>
        <parameter key="training_rounds" value="50"/>
    </operator>
</operator>
Best Regards