Transition Matrix Operator: simple question
Hi!
I've just started with RapidMiner and think it's amazing. Being relatively new to data mining and machine learning, I'm starting simple, so please forgive me if this question is naive.
I created a process (XML later) to generate some nominal data so I could try to understand the "transition matrix" operator.
The code results in the following transition matrix:
value0 0.0 0.3386693346673337 0.0
value1 0.0 0.33316658329164583 0.0
value2 0.0 0.0 0.3281640820410205
Now, I'm sure it's because I don't know what I'm looking at, but I wrote a quick Perl script to calculate what I thought was the same thing. Run on the same example set that generated the above transition matrix, it produced the following result:
        value0  value1  value2
value0  0.325   0.360   0.315
value1  0.366   0.297   0.336
value2  0.323   0.341   0.335
So you can see that my Perl code reveals my (perhaps mis-) understanding that each row of the transition matrix should total 1.
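To spell out what I expected (this is my own assumption about what a transition matrix should contain, not something the operator's documentation states): if n_ij is the number of times value i is immediately followed by value j, I expected each entry to be the row-normalized count:

```latex
\hat{P}_{ij} = \frac{n_{ij}}{\sum_{k} n_{ik}},
\qquad
\sum_{j} \hat{P}_{ij} = 1 \quad \text{for every row } i \text{ with } \sum_{k} n_{ik} > 0 .
```

The rows of my Perl output total approximately 1 (up to rounding), whereas the rows of the operator's output total roughly 0.34, 0.33 and 0.33.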
It's obvious to me that I don't understand the nuance in the description of the Transition Matrix operator:
"This operator calculates the transition matrix of a specified attribute, i.e. the operator counts how often each possible nominal value follows after each other."
Would some kind soul please put me out of my misery and explain what it is I am seeing when I look at the output of the Transition Matrix operator?
Many thanks!
Here's the XML for the process I created, followed by my Perl script:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.014">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.014" expanded="true" name="Process">
    <process expanded="true" height="325" width="145">
      <operator activated="true" class="generate_nominal_data" compatibility="5.1.014" expanded="true" height="60" name="Generate Nominal Data" width="90" x="45" y="255">
        <parameter key="number_examples" value="2000"/>
        <parameter key="number_of_attributes" value="1"/>
        <parameter key="number_of_values" value="3"/>
      </operator>
      <operator activated="true" class="write_csv" compatibility="5.1.014" expanded="true" height="60" name="Write CSV" width="90" x="151" y="254">
        <parameter key="csv_file" value="C:\Documents and Settings\MikeN\My Documents\Mike\tmat.csv"/>
        <parameter key="column_separator" value=","/>
        <parameter key="quote_nominal_values" value="false"/>
      </operator>
      <operator activated="true" class="transition_matrix" compatibility="5.1.014" expanded="true" height="76" name="Transition Matrix" width="90" x="333" y="227">
        <parameter key="attribute" value="att1"/>
      </operator>
      <connect from_op="Generate Nominal Data" from_port="output" to_op="Write CSV" to_port="input"/>
      <connect from_op="Write CSV" from_port="through" to_op="Transition Matrix" to_port="example set"/>
      <connect from_op="Transition Matrix" from_port="example set" to_port="result 1"/>
      <connect from_op="Transition Matrix" from_port="transition matrix" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
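#!/usr/bin/perl -w
use strict;

my $curr_state;      # state seen on the previous line
my %trans;           # $trans{from}{to} = number of times "to" directly follows "from"
my %state_counts;    # number of occurrences of each state

<>;                  # discard the first line (the CSV header)
while (<>) {
    chomp;
    my ($state, undef) = split /,/;    # the first column holds the attribute value
    $state_counts{$state}++;
    if (defined $curr_state) {
        $trans{$curr_state}->{$state}++;
    }
    $curr_state = $state;
}

my @states = sort keys %state_counts;
print "\t", join("\t", @states), "\n";
foreach my $from (sort keys %trans) {
    print $from;
    # Loop over every state so the columns stay aligned even when a
    # particular transition never occurs.
    # Note: the divisor also counts the final example, which has no
    # successor, so that state's row can total slightly less than 1.
    foreach my $to (@states) {
        my $count = $trans{$from}->{$to} || 0;
        print "\t", sprintf("%0.3f", $count / $state_counts{$from});
    }
    print "\n";
}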
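In case it helps pin down the difference, here is a minimal Python sketch of two ways the same transition counts could be normalized. It is not RapidMiner's implementation, and the function names (transition_counts, row_normalized, globally_normalized) are made up for illustration. The first divides each count by its row total, which is what I expected. The second divides by the total number of transitions, so the whole matrix sums to 1. The three non-zero values in the operator's output above do add up to 1, which is what the second scheme would give, but that is only a guess on my part and it does not explain the zeros.

```python
from collections import Counter


def transition_counts(states):
    """Count how often each value is immediately followed by each value."""
    return Counter(zip(states, states[1:]))


def row_normalized(states):
    """Divide each count by the number of transitions leaving the source value.

    Every row with at least one outgoing transition then sums to 1.
    """
    counts = transition_counts(states)
    values = sorted(set(states))
    row_totals = Counter()
    for (src, _dst), n in counts.items():
        row_totals[src] += n
    return {
        src: {
            dst: counts[(src, dst)] / row_totals[src] if row_totals[src] else 0.0
            for dst in values
        }
        for src in values
    }


def globally_normalized(states):
    """Divide each count by the total number of transitions.

    The whole matrix (not each row) then sums to 1.
    """
    counts = transition_counts(states)
    values = sorted(set(states))
    total = sum(counts.values())
    return {
        src: {dst: counts[(src, dst)] / total if total else 0.0 for dst in values}
        for src in values
    }


if __name__ == "__main__":
    # Made-up toy sequence, not the data from my process.
    sample = ["value0", "value1", "value1", "value2", "value0", "value1"]
    for name, matrix in (("row", row_normalized(sample)),
                         ("global", globally_normalized(sample))):
        print(name)
        for src, row in matrix.items():
            print(src, " ".join(f"{p:.3f}" for p in row.values()))
```

Feeding the att1 column of tmat.csv into both functions and comparing against the operator's output should show which normalization, if either, it matches.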