Benford's Law?

gfsi_j
New Altair Community Member
Hello,
I'm new to RapidMiner, and I'm looking to develop some processes for fraud detection. To that end, one thing I'm curious about is whether RapidMiner has any tools to apply Benford's Law to help find possibly fabricated data? I haven't been able to find any operators for that purpose, but perhaps I am looking in the wrong place.
Thanks!
Answers
Hi,
As far as I know, we do not have this as a native operator. There might be a simple way to build it with 3-4 operators.
Best,
Martin
Here's the process I use for Benford. I can't claim credit; I think this might originally be one of Tobias'. It's very handy in that it will accept any numerical attribute simply by tweaking the first macro.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.4.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="generate_transfer_data" compatibility="6.4.000" expanded="true" height="60" name="Generate Transfer Data" width="90" x="45" y="75"/>
<operator activated="true" class="set_macro" compatibility="6.4.000" expanded="true" height="76" name="Set Macro" width="90" x="179" y="30">
<parameter key="macro" value="ATTRIBUTE"/>
<parameter key="value" value="Amount"/>
</operator>
<operator activated="true" class="numerical_to_polynominal" compatibility="6.0.003" expanded="true" height="76" name="Numerical to Polynominal" width="90" x="313" y="30">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="%{ATTRIBUTE}"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="76" name="Generate Attributes" width="90" x="447" y="30">
<list key="function_descriptions">
<parameter key="digit" value="cut(%{ATTRIBUTE},0,1)"/>
<parameter key="digit_complex" value="floor(parse(%{ATTRIBUTE})/pow(10,floor(log(parse(%{ATTRIBUTE})))))"/>
</list>
</operator>
<operator activated="true" class="aggregate" compatibility="6.0.006" expanded="true" height="76" name="Aggregate" width="90" x="246" y="120">
<list key="aggregation_attributes">
<parameter key="digit" value="count (fractional)"/>
</list>
<parameter key="group_by_attributes" value="digit"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="6.4.000" expanded="true" height="94" name="Filter Examples" width="90" x="380" y="120">
<list key="filters_list">
<parameter key="filters_entry_key" value="digit.does_not_equal.0"/>
</list>
</operator>
<operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="514" y="120">
<list key="function_descriptions">
<parameter key="benford" value="log(1+1/parse(digit))"/>
</list>
</operator>
<connect from_op="Generate Transfer Data" from_port="output" to_op="Set Macro" to_port="through 1"/>
<connect from_op="Set Macro" from_port="through 1" to_op="Numerical to Polynominal" to_port="example set input"/>
<connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
<connect from_op="Generate Attributes (2)" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
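For anyone who wants to sanity-check the numbers outside RapidMiner, here is a rough Python sketch of the same idea: take the leading digit of each value, compute the observed share of each digit 1-9, and put it next to the Benford expectation log10(1 + 1/d). The function names `leading_digit` and `benford_table` and the sample values are made up for illustration. The sketch also makes two assumptions that may differ slightly from the process above: it uses the first significant digit (so 0.3 counts as 3), and it drops zeros before computing the fractions.

```python
import math
from collections import Counter
from decimal import Decimal


def leading_digit(value):
    """First significant digit of a finite number (0 only if the value is zero)."""
    # Decimal(str(...)) avoids floating-point rounding surprises such as
    # 0.3 / 0.1 evaluating to 2.9999999999999996.
    return Decimal(str(abs(value))).as_tuple().digits[0]


def benford_table(values):
    """Return (digit, observed_fraction, benford_fraction) for digits 1..9."""
    counts = Counter(leading_digit(v) for v in values)
    counts.pop(0, None)  # assumption: zeros carry no leading digit, so skip them
    total = sum(counts.values())
    rows = []
    for d in range(1, 10):
        observed = counts.get(d, 0) / total if total else 0.0
        expected = math.log10(1 + 1 / d)  # Benford's Law for the first digit
        rows.append((d, observed, expected))
    return rows


if __name__ == "__main__":
    sample = [123, 0.3, 1000, 19, 250, 0]  # hypothetical amounts
    for digit, observed, expected in benford_table(sample):
        print(f"{digit}  observed={observed:.3f}  benford={expected:.3f}")
```

A large gap between the observed and Benford columns is only a hint that the data deserves a closer look, not proof of fabrication, and a handful of values like the sample here is far too small to judge.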
And I got a new building block!
Thanks a lot, John.
Thanks very much, JEdward! That's very helpful.