Transition Matrix Operator: simple question
chunga
New Altair Community Member
Hi!
I've just started with rapidminer and think it's amazing. Being relatively new to data mining and machine learning, I'm starting simple, and so please forgive me if this question is naive.
I created a process (XML later) to generate some nominal data so I could try to understand the "transition matrix" operator.
The code results in the following transition matrix:
value0 0.0 0.3386693346673337 0.0
value1 0.0 0.33316658329164583 0.0
value2 0.0 0.0 0.3281640820410205
Now, I'm sure it's because I don't know what I'm looking at, but I wrote a quick perl script to calculate what I thought was the same thing, and it produced the following result (from the same example set that generated the above transition matrix):
value0 value1 value2
value0 0.325 0.360 0.315
value1 0.366 0.297 0.336
value2 0.323 0.341 0.335
So you can see that my perl code reveals my (perhaps mis-) understanding that the rows of the transition matrix should total 1.
It's obvious to me that I don't understand the nuance in the description of the Transition Matrix operator:
This operator calculates the transition matrix of a specified attribute, i.e. the operator counts how often each possible nominal value follows after each other.
Would some kind soul please put me out of my misery and explain what it is I am seeing when I look at the output of the Transition Matrix operator?
Many thanks!
Here's the XML for the process I created:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.014">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.014" expanded="true" name="Process">
    <process expanded="true" height="325" width="145">
      <operator activated="true" class="generate_nominal_data" compatibility="5.1.014" expanded="true" height="60" name="Generate Nominal Data" width="90" x="45" y="255">
        <parameter key="number_examples" value="2000"/>
        <parameter key="number_of_attributes" value="1"/>
        <parameter key="number_of_values" value="3"/>
      </operator>
      <operator activated="true" class="write_csv" compatibility="5.1.014" expanded="true" height="60" name="Write CSV" width="90" x="151" y="254">
        <parameter key="csv_file" value="C:\Documents and Settings\MikeN\My Documents\Mike\tmat.csv"/>
        <parameter key="column_separator" value=","/>
        <parameter key="quote_nominal_values" value="false"/>
      </operator>
      <operator activated="true" class="transition_matrix" compatibility="5.1.014" expanded="true" height="76" name="Transition Matrix" width="90" x="333" y="227">
        <parameter key="attribute" value="att1"/>
      </operator>
      <connect from_op="Generate Nominal Data" from_port="output" to_op="Write CSV" to_port="input"/>
      <connect from_op="Write CSV" from_port="through" to_op="Transition Matrix" to_port="example set"/>
      <connect from_op="Transition Matrix" from_port="example set" to_port="result 1"/>
      <connect from_op="Transition Matrix" from_port="transition matrix" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Here's my perl script:
#!/usr/bin/perl -w
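use strict;

my $curr_state;     # state read on the previous line
my %trans;          # $trans{$from}{$to} = number of from->to transitions
my %state_counts;   # number of occurrences of each state

<>;                 # read and discard the first line (presumably the CSV header)
while (<>) {
    my ($state, undef) = split /,/;
    $state_counts{$state}++;
    if ($curr_state) {
        $trans{$curr_state}->{$state}++;
    }
    $curr_state = $state;
}

# Header row, then one row per source state
print "\t", join("\t", (sort keys %state_counts)), "\n";
foreach $curr_state (sort keys %trans) {
    print $curr_state;
    foreach (sort keys %{$trans{$curr_state}}) {
        # transition count divided by the number of occurrences of the source state
        print "\t", sprintf("%0.3f", $trans{$curr_state}->{$_} / $state_counts{$curr_state});
        #print join(",",$curr_state,$_,$trans{$curr_state}->{$_}/$state_counts{$curr_state}),"\n";
    }
    print "\n";
}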
Answers
-
I've looked at the code for the Transition Matrix operator.
While there may or may not be other small problems (e.g. why do I get so many 0's in my matrix?), I see that the Transition Matrix operator defines a matrix in which each entry [i,j] is the proportion of all transitions that go from state[i] to state[j], rather than what I would have found more interesting: the proportion of all transitions out of state[i] that go to state[j].
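To make the two definitions concrete, here is a short worked sketch. The symbols n_ij, N, M and P are notation introduced here for illustration and do not come from the operator's source; N = 1999 assumes that the 2000 generated examples yield 2000 - 1 transitions.

```latex
% n_{ij}: number of times state j directly follows state i
% N = \sum_{i,j} n_{ij}: total number of transitions
\begin{align*}
M_{ij} &= \frac{n_{ij}}{N}, & \sum_{i,j} M_{ij} &= 1
  && \text{(normalized over all transitions)} \\
P_{ij} &= \frac{n_{ij}}{\sum_{k} n_{ik}}, & \sum_{j} P_{ij} &= 1
  && \text{(normalized per row, Markov-chain sense)}
\end{align*}
% Check against the three non-zero entries of the operator output, with N = 1999:
\begin{align*}
0.3386693\ldots \times 1999 &\approx 677 \\
0.3331665\ldots \times 1999 &\approx 666 \\
0.3281640\ldots \times 1999 &\approx 656 \\
677 + 666 + 656 &= 1999
\end{align*}
```

The non-zero entries reported by the operator therefore look like whole-number counts divided by 1999, and they add up to 1 over the whole matrix rather than within each row, which is consistent with the first form.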
From the nature of the result, it might appear that there is another small problem:
From "com.rapidminer.tools.container.Tupel":
/**
 * This class can be used to build pairs of typed objects and sort them.
 * ATTENTION!!
 * This class is not usable for hashing since only the first version is used as
 * hash entry. To use a hash function on a tupel, use Pair!
 *
 * @author Sebastian Land
 */
From com.rapidminer.operator.visualisation.dependencies.TransitionMatrixOperator:
Map<Tupel<String, String>, Integer> transitions = new HashMap<Tupel<String, String>, Integer>();
So that explains why I get only 1 non-zero value in each row.
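A minimal sketch of how a key like that could collapse the map. It assumes (the Javadoc quoted above does not confirm this) that the key's equals() looks only at the first element, just as its hashCode() does. FirstOnlyKey, FullKey and distinctKeys are hypothetical names made up for this illustration; they are not RapidMiner classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class KeyCollapseDemo {

    /** Hypothetical key that is identified by its first element only (assumption). */
    static final class FirstOnlyKey {
        final String first;
        final String second;
        FirstOnlyKey(String first, String second) { this.first = first; this.second = second; }
        @Override public boolean equals(Object o) {
            return o instanceof FirstOnlyKey && first.equals(((FirstOnlyKey) o).first);
        }
        @Override public int hashCode() { return first.hashCode(); }
    }

    /** Hypothetical key that is identified by both elements. */
    static final class FullKey {
        final String first;
        final String second;
        FullKey(String first, String second) { this.first = first; this.second = second; }
        @Override public boolean equals(Object o) {
            return o instanceof FullKey
                    && first.equals(((FullKey) o).first)
                    && second.equals(((FullKey) o).second);
        }
        @Override public int hashCode() { return Objects.hash(first, second); }
    }

    /** Counts transitions in a HashMap and returns how many distinct keys it ends up with. */
    public static int distinctKeys(String[] sequence, boolean useBothElements) {
        Map<Object, Integer> transitions = new HashMap<>();
        for (int i = 1; i < sequence.length; i++) {
            Object key = useBothElements
                    ? new FullKey(sequence[i - 1], sequence[i])
                    : new FirstOnlyKey(sequence[i - 1], sequence[i]);
            transitions.merge(key, 1, Integer::sum);
        }
        return transitions.size();
    }

    public static void main(String[] args) {
        String[] sequence = {"a", "b", "a", "a", "b", "b"};
        // First-element-only keys: one map entry per source state.
        System.out.println(distinctKeys(sequence, false));
        // Full keys: one map entry per distinct (from, to) pair.
        System.out.println(distinctKeys(sequence, true));
    }
}
```

With first-element-only keys, every transition out of a given state lands in the same map entry, which would give one non-zero cell per row. If only hashCode() ignored the second element while equals() compared both, a HashMap would still keep the keys apart, so the collapse depends on the stated assumption about equals().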
It seems to me that TransitionMatrixOperator might have at least 1, and possibly 2 bugs.
What is the correct procedure to ask that it be looked into by someone more knowledgeable than me?
Many Thanks!
-
The following patch would (read that "might" -- I haven't compiled and tested it :-[ ) fix the "bugs" (or at least "divergence in expectations") that I mention above.
*** TransitionMatrixOperator.java 2011-12-04 14:04:21.312500000 -0600
--- TransitionMatrixOperator-fixed.java 2011-12-04 14:12:16.421875000 -0600
***************
*** 42,48 ****
import com.rapidminer.parameter.ParameterType;
import com.rapidminer.parameter.ParameterTypeAttribute;
import com.rapidminer.tools.Ontology;
! import com.rapidminer.tools.container.Tupel;
/**
* This operator calculates the transition matrix of a specified attribute,
--- 42,48 ----
import com.rapidminer.parameter.ParameterType;
import com.rapidminer.parameter.ParameterTypeAttribute;
import com.rapidminer.tools.Ontology;
! import com.rapidminer.tools.container.Pair;
/**
* This operator calculates the transition matrix of a specified attribute,
***************
*** 78,97 ****
throw new UserError(this, 119, attribute.getName(), "TransitionMatrix");
Set<String> values = new TreeSet<String>();
! Map<Tupel<String, String>, Integer> transitions = new HashMap<Tupel<String, String>, Integer>();
- int numberOfTransitions = exampleSet.size() - 1;
String lastValue = null;
for (Example example: exampleSet) {
String currentValue = example.getNominalValue(attribute);
values.add(currentValue);
!
if (lastValue != null) {
! Tupel<String, String> currentTupel = new Tupel<String, String>(lastValue, currentValue);
! if (transitions.containsKey(currentTupel))
! transitions.put(currentTupel, transitions.get(currentTupel) + 1);
else
! transitions.put(currentTupel, 1);
}
lastValue = currentValue;
}
--- 78,100 ----
throw new UserError(this, 119, attribute.getName(), "TransitionMatrix");
Set<String> values = new TreeSet<String>();
! Map<String,Integer> numberOfTransitions = new TreeMap<String,Integer>();
! Map<Pair<String, String>, Integer> transitions = new HashMap<Pair<String, String>, Integer>();
String lastValue = null;
for (Example example: exampleSet) {
String currentValue = example.getNominalValue(attribute);
values.add(currentValue);
! if(!numberOfTransitions.containsKey(currentValue)){
! numberOfTransitions.put(currentValue,0);
! }
! numberOfTransitions.put(currentValue, numberOfTransitions.get(currentValue) + 1);
if (lastValue != null) {
! Pair<String, String> currentPair = new Pair<String, String>(lastValue, currentValue);
! if (transitions.containsKey(currentPair))
! transitions.put(currentPair, transitions.get(currentPair) + 1);
else
! transitions.put(currentPair, 1);
}
lastValue = currentValue;
}
***************
*** 105,112 ****
}
NumericalMatrix matrix = new NumericalMatrix("Transition", valueArray, false);
! for(Entry<Tupel<String, String>, Integer> entry: transitions.entrySet()) {
! matrix.setValue(valuePositions.get(entry.getKey().getFirst()), valuePositions.get(entry.getKey().getSecond()), ((double) entry.getValue().intValue()) / numberOfTransitions);
}
exampleSetOutput.deliver(exampleSet);
--- 108,115 ----
}
NumericalMatrix matrix = new NumericalMatrix("Transition", valueArray, false);
! for(Entry<Pair<String, String>, Integer> entry: transitions.entrySet()) {
! matrix.setValue(valuePositions.get(entry.getKey().getFirst()), valuePositions.get(entry.getKey().getSecond()), ((double) entry.getValue().intValue()) / numberOfTransitions.get(entry.getKey().getFirst()));
}
exampleSetOutput.deliver(exampleSet);
-
Hi,
I have to admit I would expect the same behaviour you have described, but I am sure there is a reason it has been implemented this way.
Unfortunately I can't just change the way an operator behaves, even though this might not be the most used operator, because processes that depend on this operator would be corrupted. I've created a bug report at http://bugs.rapid-i.com/ and we will discuss it later with the team.
Thanks for your hint anyway!
Regards,
Nils
-
Could we possibly create a new operator, say called MarkovTransitionMatrix, that would calculate a transition matrix in the old-fashioned, Markov-chain sense?
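For reference, the calculation such an operator would perform could be sketched in plain Java roughly as follows. This is a standalone illustration, not RapidMiner code: the class and method names (TransitionMatrixSketch, countTransitions, normalizeByRow, normalizeByTotal) are hypothetical, and no RapidMiner API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class TransitionMatrixSketch {

    /** counts[i][j] = number of times states.get(j) directly follows states.get(i). */
    public static int[][] countTransitions(List<String> sequence, List<String> states) {
        int[][] counts = new int[states.size()][states.size()];
        for (int t = 1; t < sequence.size(); t++) {
            // Every value in the sequence is assumed to be present in the states list.
            int from = states.indexOf(sequence.get(t - 1));
            int to = states.indexOf(sequence.get(t));
            counts[from][to]++;
        }
        return counts;
    }

    /** Markov-chain sense: each count divided by the total of its own row. */
    public static double[][] normalizeByRow(int[][] counts) {
        double[][] result = new double[counts.length][];
        for (int i = 0; i < counts.length; i++) {
            int rowTotal = 0;
            for (int c : counts[i]) {
                rowTotal += c;
            }
            result[i] = new double[counts[i].length];
            for (int j = 0; j < counts[i].length; j++) {
                // A state with no outgoing transitions keeps an all-zero row.
                result[i][j] = rowTotal == 0 ? 0.0 : (double) counts[i][j] / rowTotal;
            }
        }
        return result;
    }

    /** Each count divided by the total number of transitions in the whole matrix. */
    public static double[][] normalizeByTotal(int[][] counts) {
        int total = 0;
        for (int[] row : counts) {
            for (int c : row) {
                total += c;
            }
        }
        double[][] result = new double[counts.length][];
        for (int i = 0; i < counts.length; i++) {
            result[i] = new double[counts[i].length];
            for (int j = 0; j < counts[i].length; j++) {
                result[i][j] = total == 0 ? 0.0 : (double) counts[i][j] / total;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sequence = Arrays.asList("a", "b", "a", "a", "b", "b");
        List<String> states = new ArrayList<>(new TreeSet<>(sequence)); // sorted distinct values
        int[][] counts = countTransitions(sequence, states);
        System.out.println(Arrays.deepToString(normalizeByRow(counts)));
        System.out.println(Arrays.deepToString(normalizeByTotal(counts)));
    }
}
```

For the toy sequence in main, the row-normalized rows are (1/3, 2/3) and (1/2, 1/2), while the total-normalized matrix sums to 1 as a whole. The sketch divides by the number of outgoing transitions of each state, not by the number of times the state occurs, so the final example (which has no successor) does not pull its row below 1.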
-
If you need that kind of operator before we take a look at the old operator, you can easily build your own extension and include it in RapidMiner.
Cheers,
Nils