Strange Results With Local Outlier Factor

User: "Mickey"
Updated by Jocelyn
I am getting strange results with the Detect Outlier (LOF) operator: most of the "outlier" scores are around 0.15 instead of around 1.0.
However, LOF should be around 1.0 for most points, for the following reasons:
1) The LOF paper proves that LOF is around 1.0 for most points inside clusters.
2) It makes sense: LOF compares a point's local density with that of its neighbours, so inside a cluster, where those densities are similar, you'd expect a ratio of around 1 anyway.
3) My own implementation of a simpler variant of LOF (just the average of k-dist) does give values of around 1 for most points.
I tried this both on my own data and on data generated with RapidMiner, but the LOF from RapidMiner is around 0.15 in both cases.
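
To make the expectation concrete, here is a minimal sketch of LOF as I understand the definition from the paper, in plain Python. It is only an illustration and not RapidMiner's implementation: the function name `lof_scores`, the choice of k=10, the two Gaussian clusters and the three hand-placed far points are all my own assumptions, and it uses exactly k neighbours per point (ties in k-distance are ignored, which the paper's definition handles more carefully).

```python
import math
import random
import statistics


def lof_scores(points, k):
    """Brute-force LOF with exactly k neighbours per point (ties ignored)."""
    n = len(points)
    neighbours = []  # per point: list of (distance, index) for its k nearest
    k_dist = []      # per point: distance to its k-th nearest neighbour
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append(dists[:k])
        k_dist.append(dists[k - 1][0])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-dist(o), dist(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean of the neighbours' lrd divided by the point's own lrd.
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (k * lrd[i])
        for i in range(n)
    ]


# Illustrative data (assumed, not the RapidMiner generator): two Gaussian
# clusters plus three points placed far away from both.
rng = random.Random(0)
centres = [(0.0, 0.0), (10.0, 10.0)]
cluster_points = [
    (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
    for cx, cy in centres
    for _ in range(100)
]
far_points = [(30.0, -20.0), (-25.0, 25.0), (5.0, 40.0)]

scores = lof_scores(cluster_points + far_points, k=10)
cluster_median = statistics.median(scores[:len(cluster_points)])
outlier_scores = scores[len(cluster_points):]

print("median LOF inside clusters:", round(cluster_median, 3))
print("LOF of the far points:", [round(s, 2) for s in outlier_scores])
```

With this kind of data I would expect the median score inside the clusters to sit near 1 and the far points to score well above it, which is the behaviour I am not seeing from the operator.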

Here is the process XML to recreate the synthetic test:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="476" width="681">
      <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="45" y="165">
        <parameter key="target_function" value="gaussian mixture clusters"/>
        <parameter key="number_examples" value="1000"/>
        <parameter key="number_of_attributes" value="2"/>
      </operator>
      <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data (2)" width="90" x="45" y="255">
        <parameter key="number_examples" value="20"/>
        <parameter key="number_of_attributes" value="2"/>
      </operator>
      <operator activated="true" class="discretize_by_bins" expanded="true" height="94" name="Discretize" width="90" x="179" y="255">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="attributes" value="label"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="313" y="255">
        <parameter key="name" value="label"/>
        <parameter key="target_role" value="label"/>
      </operator>
      <operator activated="true" class="append" expanded="true" height="94" name="Append" width="90" x="447" y="165"/>
      <operator activated="true" class="detect_outlier_lof" expanded="true" height="76" name="Detect Outlier (LOF)" width="90" x="514" y="30"/>
      <connect from_op="Generate Data" from_port="output" to_op="Append" to_port="example set 1"/>
      <connect from_op="Generate Data (2)" from_port="output" to_op="Discretize" to_port="example set input"/>
      <connect from_op="Discretize" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Append" to_port="example set 2"/>
      <connect from_op="Append" from_port="merged set" to_op="Detect Outlier (LOF)" to_port="example set input"/>
      <connect from_op="Detect Outlier (LOF)" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
