LOF on Text Data

New Altair Community Member

Apr 10, 2019

Updated Nov 5, 2024 by Jocelyn

Hello Team,

I am fairly new to RM and am currently doing some research on online text.
In particular, I am trying to detect outliers in a set of documents using the LOF operator.
I am running into trouble: the LOF for each document is very close to 1, no matter how I set MinPtsUB and MinPtsLB.
Before applying the LOF operator, I represented each document as a vector of term frequencies (TF) and, separately, as a vector of TF-IDF values.
So I have two ExampleSets representing the corpus, one as a matrix of TF values and one as a matrix of TF-IDF values, to compare the results.
However, for both matrices I get LOF values that are equal or very close to 1, which does not make any sense to me.
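
For reference, here is a minimal sketch of how LOF values arise from term-frequency vectors like the ones described above, which may help to sanity-check the operator's output on a small corpus. It is not the RapidMiner implementation: the toy vocabulary, the document vectors, the function names and the single neighborhood size `k` (standing in for the MinPtsLB/MinPtsUB range) are all assumptions made for illustration, and plain Euclidean distance is used.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def lof_scores(points, k):
    """Local Outlier Factor for each point, using neighborhood size k.

    Assumes there are no duplicate points (a zero reachability
    distance would make the local density infinite).
    """
    n = len(points)
    d = [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]

    k_dist, neighbors = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        kd = d[i][order[k - 1]]          # distance to the k-th nearest neighbor
        k_dist.append(kd)
        # The k-neighborhood includes every point tied at the k-distance.
        neighbors.append([j for j in order if d[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbors relative to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]

# Hypothetical term counts over the toy vocabulary
# ["data", "mining", "model", "football"].
docs = [
    [3, 2, 1, 0],
    [2, 3, 1, 0],
    [3, 3, 2, 0],
    [2, 2, 1, 0],
    [4, 3, 2, 0],
    [0, 0, 0, 5],   # the only document about a different topic
]

scores = lof_scores(docs, k=2)
for i, s in enumerate(scores):
    print(f"doc {i}: LOF = {s:.2f}")
```

On this toy data the last document receives a LOF well above 1, while the five similar documents stay close to 1. A value near 1 means a point sits in a region about as dense as that of its neighbors, so LOF values near 1 for every document indicate that, under the chosen distance, no document is markedly more isolated than the documents around it.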

Could you tell me whether I am doing something wrong and, if so, what?

Best

Please find my XML enclosed:

<?xml version="1.0" encoding="UTF-8" ?>
<process version="9.2.000">
  <context>
  </context>
  <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.2.000" expanded="true" height="68" name="Retrieve PreppedTestData" width="90" x="112" y="34">
      </operator>
      <operator activated="true" class="select_attributes" compatibility="9.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="34">
      </operator>
      <operator activated="true" class="detect_outlier_lof" compatibility="9.2.000" expanded="true" height="82" name="Detect Outlier (LOF)" width="90" x="447" y="34">
      </operator>
      <operator activated="false" class="anomalydetection:Local Outlier Factor (LOF)" compatibility="2.4.001" expanded="true" height="103" name="Local Outlier Factor (LOF)" width="90" x="380" y="340">
      </operator>
      <operator activated="true" class="store" compatibility="9.2.000" expanded="true" height="68" name="Store" width="90" x="648" y="34">
      </operator>
      <operator activated="false" class="write_excel" compatibility="9.2.000" expanded="true" height="82" name="Write Excel" width="90" x="581" y="442">
      </operator>
      <operator activated="true" class="retrieve" compatibility="9.2.000" expanded="true" height="68" name="Retrieve PreppedTestData (2)" width="90" x="112" y="187">
      </operator>
      <operator activated="true" class="select_attributes" compatibility="9.2.000" expanded="true" height="82" name="Select Attributes (2)" width="90" x="246" y="187">
      </operator>
      <operator activated="true" class="detect_outlier_lof" compatibility="9.2.000" expanded="true" height="82" name="Detect Outlier (2)" width="90" x="447" y="187">
      </operator>
      </operator>
      <operator activated="true" class="store" compatibility="9.2.000" expanded="true" height="68" name="Store (2)" width="90" x="648" y="187">
      </operator>
      <operator activated="false" class="write_excel" compatibility="9.2.000" expanded="true" height="82" name="Write Excel (2)" width="90" x="648" y="595">