"Retrieve KNN Distance Results"
Hi, is there an operator to extract the distance results from applying the k-NN lazy learner to labeled and scored data? I would like to see the underlying data driving the predictions.
Best Answer
-
Good ideas. It looks like I can get what I'm looking for through the Data to Similarity operator. Also, the graph outputs (spring) are especially helpful for visualizing the k-NN method.
1
Answers
-
I'm not 100% sure, but maybe the 'Cross Distances' operator might help you in this case?
I have never used it on real data myself, but it seems to offer the same distance measures as k-NN does.
0 -
Which distances are you looking for? Listing the distances to the k nearest neighbors themselves would be fine for a k of 1 to 3, but will look pretty messy once k reaches 50 or more.
Here's a sample process, though personally I'm not too keen on it.
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="136">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="split_data" compatibility="8.2.000" expanded="true" height="103" name="Split Data" width="90" x="179" y="136">
<enumeration key="partitions">
<parameter key="ratio" value="0.7"/>
<parameter key="ratio" value="0.3"/>
</enumeration>
</operator>
<operator activated="true" class="k_nn" compatibility="8.2.000" expanded="true" height="82" name="k-NN" width="90" x="313" y="238">
<parameter key="k" value="3"/>
</operator>
<operator activated="true" class="cross_distances" compatibility="8.2.000" expanded="true" height="103" name="Cross Distances" width="90" x="380" y="85">
<parameter key="only_top_k" value="true"/>
<parameter key="k" value="3"/>
</operator>
<operator activated="true" class="aggregate" compatibility="8.2.000" expanded="true" height="82" name="Aggregate" width="90" x="514" y="34">
<list key="aggregation_attributes">
<parameter key="distance" value="average"/>
</list>
<parameter key="group_by_attributes" value="request"/>
</operator>
<operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="581" y="289">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join" width="90" x="648" y="34">
<parameter key="use_id_attribute_as_key" value="false"/>
<list key="key_attributes">
<parameter key="id" value="request"/>
</list>
</operator>
<operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="715" y="136">
<parameter key="attribute_name" value="average(distance)"/>
<parameter key="target_role" value="distance_measure"/>
<list key="set_additional_roles"/>
</operator>
<connect from_op="Retrieve Iris" from_port="output" to_op="Split Data" to_port="example set"/>
<connect from_op="Split Data" from_port="partition 1" to_op="k-NN" to_port="training set"/>
<connect from_op="Split Data" from_port="partition 2" to_op="Cross Distances" to_port="request set"/>
<connect from_op="k-NN" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="k-NN" from_port="exampleSet" to_op="Cross Distances" to_port="reference set"/>
<connect from_op="Cross Distances" from_port="result set" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Cross Distances" from_port="request set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Aggregate" from_port="example set output" to_op="Join" to_port="right"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Join" to_port="left"/>
<connect from_op="Join" from_port="join" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
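For anyone who wants to sanity-check the numbers outside RapidMiner, the Cross Distances (top k) plus Aggregate (average) part of the sample process can be roughly sketched in plain Python. This is only an illustration under assumptions: it uses plain Euclidean distance on numeric attributes, the function names `top_k_distances` and `average_distance` are made up for this sketch, and the toy rows are invented rather than taken from Iris.

```python
import math


def top_k_distances(request_rows, reference_rows, k=3):
    """For each request row, return the k smallest Euclidean distances
    to the reference rows as (reference_index, distance) pairs."""
    results = []
    for request in request_rows:
        # Distance from this request row to every reference row.
        distances = [
            (ref_index, math.dist(request, reference))
            for ref_index, reference in enumerate(reference_rows)
        ]
        # Keep only the k closest reference rows.
        distances.sort(key=lambda pair: pair[1])
        results.append(distances[:k])
    return results


def average_distance(neighbours):
    """Mean distance over one request row's nearest neighbours."""
    return sum(distance for _, distance in neighbours) / len(neighbours)


# Invented toy data: reference rows play the role of the training partition,
# request rows the role of the partition that gets scored.
reference_rows = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 0.0)]
request_rows = [(0.0, 0.0), (6.0, 8.0)]

neighbours = top_k_distances(request_rows, reference_rows, k=2)
averages = [average_distance(n) for n in neighbours]

for row, near, avg in zip(request_rows, neighbours, averages):
    print(row, near, avg)
```

Each request row ends up with its k nearest reference rows and their distances, plus one averaged value, which is the kind of per-example figure the process joins onto the scored data.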