Hi!
I've just started with RapidMiner and think it's amazing. I'm relatively new to data mining and machine learning, so I'm starting simple; please forgive me if this question is naive.
I created a process (XML below) to generate some nominal data so I could try to understand the "Transition Matrix" operator.
The process produces the following transition matrix:
value0 0.0 0.3386693346673337 0.0
value1 0.0 0.33316658329164583 0.0
value2 0.0 0.0 0.3281640820410205
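As a quick sanity check on those numbers, here is a small Python sketch (not part of my process). The figure of 1999 consecutive pairs is my own assumption, based on the 2000 generated examples. It sums the three non-zero entries and scales them back to whole counts:

```python
# Non-zero entries copied from the Transition Matrix output above.
entries = [0.3386693346673337, 0.33316658329164583, 0.3281640820410205]

# Assumption: 2000 examples in sequence give 1999 consecutive pairs.
num_pairs = 1999

# Sum of all non-zero entries in the matrix.
total = sum(entries)

# Scale each entry by the assumed number of pairs to see whether
# it corresponds to a whole-number count.
counts = [round(e * num_pairs) for e in entries]

print("sum of entries:", total)
print("implied counts:", counts, "sum:", sum(counts))
```

If that assumption holds, the non-zero entries add up to 1 across the whole matrix rather than within each row, which may be a clue about how the operator normalizes its counts.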
Now, I'm sure it's because I don't know what I'm looking at, but I wrote a quick Perl script to calculate what I thought was the same thing. Run on the same example set that generated the matrix above, it produced the following result:
value0 value1 value2
value0 0.325 0.360 0.315
value1 0.366 0.297 0.336
value2 0.323 0.341 0.335
So you can see that my Perl code reflects my (perhaps mistaken) understanding that each row of the transition matrix should total 1.
It's obvious to me that I don't understand the nuance in the description of the Transition Matrix operator:
This operator calculates the transition matrix of a specified attribute, i.e. the operator counts how often each possible nominal value follows after each other.
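To make my confusion concrete, here is a minimal Python sketch of two ways "counting how often each value follows another" could be turned into a matrix. The function names and the toy sequence are purely illustrative, and I am not claiming either one is what the operator actually does:

```python
from collections import Counter


def transition_counts(states):
    """Count how often each value is immediately followed by each value."""
    return Counter(zip(states, states[1:]))


def normalize_by_row(counts):
    """Divide each count by the total for its 'from' value (rows sum to 1)."""
    row_totals = Counter()
    for (src, _dst), n in counts.items():
        row_totals[src] += n
    return {pair: n / row_totals[pair[0]] for pair, n in counts.items()}


def normalize_by_total(counts):
    """Divide each count by the number of all transitions (matrix sums to 1)."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


# Toy sequence, made up for illustration only.
states = ["a", "b", "a", "a", "b"]
counts = transition_counts(states)
by_row = normalize_by_row(counts)
by_total = normalize_by_total(counts)

for pair in sorted(counts):
    print(pair, counts[pair], round(by_row[pair], 3), round(by_total[pair], 3))
```

With the first normalization every row totals 1, which is what I expected; with the second, only the matrix as a whole totals 1.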
Would some kind soul please put me out of my misery and explain what it is I am seeing when I look at the output of the Transition Matrix operator?
Many thanks!
Here's the XML for the process I created:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.014">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.014" expanded="true" name="Process">
<process expanded="true" height="325" width="145">
<operator activated="true" class="generate_nominal_data" compatibility="5.1.014" expanded="true" height="60" name="Generate Nominal Data" width="90" x="45" y="255">
<parameter key="number_examples" value="2000"/>
<parameter key="number_of_attributes" value="1"/>
<parameter key="number_of_values" value="3"/>
</operator>
<operator activated="true" class="write_csv" compatibility="5.1.014" expanded="true" height="60" name="Write CSV" width="90" x="151" y="254">
<parameter key="csv_file" value="C:\Documents and Settings\MikeN\My Documents\Mike\tmat.csv"/>
<parameter key="column_separator" value=","/>
<parameter key="quote_nominal_values" value="false"/>
</operator>
<operator activated="true" class="transition_matrix" compatibility="5.1.014" expanded="true" height="76" name="Transition Matrix" width="90" x="333" y="227">
<parameter key="attribute" value="att1"/>
</operator>
<connect from_op="Generate Nominal Data" from_port="output" to_op="Write CSV" to_port="input"/>
<connect from_op="Write CSV" from_port="through" to_op="Transition Matrix" to_port="example set"/>
<connect from_op="Transition Matrix" from_port="example set" to_port="result 1"/>
<connect from_op="Transition Matrix" from_port="transition matrix" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Here's my Perl script:
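#!/usr/bin/perl -w
use strict;

my $curr_state;      # state seen on the previous line
my %trans;           # $trans{from}{to} = number of from -> to transitions
my %state_counts;    # number of times each state occurs

<>;                  # skip the CSV header line
while (<>) {
    my ($state, undef) = split /,/;   # first column holds the state
    $state_counts{$state}++;
    if ($curr_state) {
        $trans{$curr_state}->{$state}++;
    }
    $curr_state = $state;
}

# Header row, then one row per "from" state; each count is divided by
# the number of occurrences of that "from" state.
print "\t", join("\t", (sort keys %state_counts)), "\n";
foreach $curr_state (sort keys %trans) {
    print $curr_state;
    foreach (sort keys %{$trans{$curr_state}}) {
        print "\t", sprintf("%0.3f", $trans{$curr_state}->{$_} / $state_counts{$curr_state});
        #print join(",",$curr_state,$_,$trans{$curr_state}->{$_}/$state_counts{$curr_state}),"\n";
    }
    print "\n";
}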