How to view data
Hi,
I'm another new fan of this great piece of software.
I am currently working on a multi-class classification problem and have been able to get pretty decent performance.
I have the performance matrix, which shows the predicted class versus the true class, and the example set, which contains the list of attributes and so on.
But I want to see which example is assigned to which class, rather than just the overview that the performance matrix gives.
In the end, I am more interested in seeing the "live" result for each example than the statistical measures.
Any clue how I can view that?
Thanks
Hi Neil,
Thanks! I also wanted to say that it was the info from your blog that got me started (k-NN for document classification).
A quick question.
You used cosine similarity as the measure in k-NN, but I am not able to do so. :(
Whenever I try that, it says the attributes are not numeric, so I end up using the default measure.
I have watched that video many times in the hope that I missed something, but I am not able to spot it.
I reckon I have to specify something when I import the data file.
I have my data as a CSV file.
Thanks
In case it helps, here is my XML file:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.008" expanded="true" name="Process">
<parameter key="parallelize_main_process" value="true"/>
<process expanded="true" height="521" width="815">
<operator activated="true" class="read_csv" compatibility="5.1.008" expanded="true" height="60" name="Read CSV" width="90" x="112" y="75">
<parameter key="csv_file" value="/Users/mohitdeepsingh/Desktop/RMData/data_10k_cleanse.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Comment"/>
</list>
<parameter key="encoding" value="MacRoman"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="att1.true.attribute_value.attribute"/>
<parameter key="1" value="att2.true.attribute_value.attribute"/>
<parameter key="2" value="att3.true.attribute_value.attribute"/>
<parameter key="3" value="att4.true.attribute_value.attribute"/>
<parameter key="4" value="att5.true.attribute_value.attribute"/>
<parameter key="5" value="att6.true.attribute_value.attribute"/>
<parameter key="6" value="att7.true.attribute_value.attribute"/>
<parameter key="7" value="att8.true.attribute_value.attribute"/>
<parameter key="8" value="att9.true.attribute_value.attribute"/>
<parameter key="9" value="att10.true.attribute_value.attribute"/>
<parameter key="10" value="att11.true.attribute_value.attribute"/>
<parameter key="11" value="att12.true.attribute_value.attribute"/>
<parameter key="12" value="att13.true.attribute_value.label"/>
</list>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="5.1.002" expanded="true" height="76" name="Process Documents from Data" width="90" x="112" y="255">
<list key="specify_weights"/>
<process expanded="true" height="586" width="922">
<operator activated="true" class="web:extract_html_text_content" compatibility="5.1.002" expanded="true" height="60" name="Extract Content" width="90" x="112" y="30"/>
<operator activated="true" class="text:transform_cases" compatibility="5.1.002" expanded="true" height="60" name="Transform Cases" width="90" x="112" y="120"/>
<operator activated="true" class="text:replace_tokens" compatibility="5.1.002" expanded="true" height="60" name="Replace Tokens" width="90" x="112" y="255">
<list key="replace_dictionary"/>
</operator>
<operator activated="true" class="text:tokenize" compatibility="5.1.002" expanded="true" height="60" name="Tokenize" width="90" x="112" y="390"/>
<operator activated="true" class="text:filter_stopwords_english" compatibility="5.1.002" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="380" y="75"/>
<operator activated="true" class="text:stem_snowball" compatibility="5.1.002" expanded="true" height="60" name="Stem (Snowball)" width="90" x="380" y="210"/>
<operator activated="true" class="text:filter_by_length" compatibility="5.1.002" expanded="true" height="60" name="Filter Tokens (by Length)" width="90" x="380" y="345">
<parameter key="min_chars" value="2"/>
<parameter key="max_chars" value="99"/>
</operator>
<connect from_port="document" to_op="Extract Content" to_port="document"/>
<connect from_op="Extract Content" from_port="document" to_op="Transform Cases" to_port="document"/>
<connect from_op="Transform Cases" from_port="document" to_op="Replace Tokens" to_port="document"/>
<connect from_op="Replace Tokens" from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
<connect from_op="Filter Stopwords (English)" from_port="document" to_op="Stem (Snowball)" to_port="document"/>
<connect from_op="Stem (Snowball)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
<connect from_op="Filter Tokens (by Length)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.1.008" expanded="true" height="76" name="Select Attributes" width="90" x="246" y="435">
<parameter key="attribute_filter_type" value="no_missing_values"/>
</operator>
<operator activated="true" class="x_validation" compatibility="5.1.008" expanded="true" height="130" name="Validation" width="90" x="648" y="165">
<parameter key="number_of_validations" value="100"/>
<parameter key="use_local_random_seed" value="true"/>
<parameter key="parallelize_training" value="true"/>
<parameter key="parallelize_testing" value="true"/>
<process expanded="true" height="586" width="436">
<operator activated="true" class="k_nn" compatibility="5.1.008" expanded="true" height="76" name="k-NN" width="90" x="112" y="30">
<parameter key="k" value="5"/>
<parameter key="weighted_vote" value="true"/>
<parameter key="numerical_measure" value="CosineSimilarity"/>
<parameter key="divergence" value="SquaredEuclideanDistance"/>
</operator>
<connect from_port="training" to_op="k-NN" to_port="training set"/>
<connect from_op="k-NN" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true" height="586" width="436">
<operator activated="true" class="apply_model" compatibility="5.1.008" expanded="true" height="76" name="Apply Model" width="90" x="112" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="5.1.008" expanded="true" height="76" name="Performance" width="90" x="87" y="321"/>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
<portSpacing port="sink_averagable 3" spacing="0"/>
</process>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="model" to_port="result 1"/>
<connect from_op="Validation" from_port="training" to_port="result 2"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 3"/>
<connect from_op="Validation" from_port="averagable 2" to_port="result 4"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
</process>
</operator>
</process>
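The "attributes are not numeric" message suggests that some non-numeric attributes still reach k-NN. One possible workaround, sketched here under that assumption, is to keep only the numeric attributes before Validation. The operator name "Select Numeric" and its x/y position are made up for illustration, and the `value_type` filter settings should be checked against the Select Attributes parameter panel in your version. The label (att13) has a special role, so it would normally not be touched by this filter.

```xml
<!-- Sketch only: an extra Select Attributes step between the existing
     "Select Attributes" and "Validation" operators in the main process.
     "Select Numeric" and the x/y values are hypothetical. -->
<operator activated="true" class="select_attributes" compatibility="5.1.008" expanded="true" height="76" name="Select Numeric" width="90" x="380" y="435">
  <!-- Keep regular attributes whose value type is numeric -->
  <parameter key="attribute_filter_type" value="value_type"/>
  <parameter key="value_type" value="numeric"/>
</operator>
<!-- These two connections would replace the direct
     "Select Attributes" to "Validation" connection. -->
<connect from_op="Select Attributes" from_port="example set output" to_op="Select Numeric" to_port="example set input"/>
<connect from_op="Select Numeric" from_port="example set output" to_op="Validation" to_port="training"/>
```

Note that this simply drops the non-numeric columns. If they carry useful information, converting them (or declaring their types when importing the CSV) is the alternative to look at.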
You have to use the Apply Model operator with the trained model and the unlabelled data you want to predict.
Here's a fake example that shows this. It also shows the performance and highlights overfitting.
Regards,
Andrew
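As a rough sketch of that advice applied to the process posted earlier in the thread (this is not Andrew's example): a second Apply Model operator can be placed after Validation in the main process, fed with the model and the example set that Validation passes through. The operator and port names are the ones already used in the posted process; the name "Apply Model (2)" and its x/y position are made up for illustration.

```xml
<!-- Sketch only: goes in the main process, after the Validation operator.
     "Apply Model (2)" and the x/y values are hypothetical. -->
<operator activated="true" class="apply_model" compatibility="5.1.008" expanded="true" height="76" name="Apply Model (2)" width="90" x="782" y="165">
  <list key="application_parameters"/>
</operator>
<!-- Feed it the model and the example set coming out of Validation.
     These replace the existing connections from Validation's "model"
     and "training" ports to "result 1" and "result 2". -->
<connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_op="Validation" from_port="training" to_op="Apply Model (2)" to_port="unlabelled data"/>
<!-- The labelled output has one row per example, including its predicted class -->
<connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 1"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
```

The example set delivered to "result 1" should then show, for every row, the true label next to the predicted class. Because the model has already seen these examples, the predictions will look better than they would on new data, which is the overfitting effect mentioned above. For a realistic view, feed Apply Model with examples that were held out from training.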