Transition Matrix Operator: simple question

chunga New Altair Community Member
edited November 5 in Community Q&A
Hi!

I've just started with RapidMiner and think it's amazing. Being relatively new to data mining and machine learning, I'm starting simple, so please forgive me if this question is naive.

I created a process (XML below) that generates some nominal data so I could try to understand the "Transition Matrix" operator.

The process produces the following transition matrix:

        value0  value1               value2
value0  0.0     0.3386693346673337   0.0
value1  0.0     0.33316658329164583  0.0
value2  0.0     0.0                  0.3281640820410205
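
For reference, each non-zero value above is, up to floating-point rounding, an exact fraction with denominator 1999, which is the number of consecutive pairs (transitions) among the 2000 generated examples. The three values add up to 1 over the whole matrix, not per row:

```latex
\begin{aligned}
0.3386693346673337 &\approx \tfrac{677}{1999}, \qquad
0.33316658329164583 \approx \tfrac{666}{1999}, \qquad
0.3281640820410205 \approx \tfrac{656}{1999},\\[4pt]
\tfrac{677}{1999} + \tfrac{666}{1999} + \tfrac{656}{1999} &= \tfrac{1999}{1999} = 1
\end{aligned}
```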

Now, I'm sure it's because I don't know what I'm looking at, but I wrote a quick Perl script to calculate what I thought was the same thing, and it produced the following result (from the same example set that generated the transition matrix above):

        value0  value1  value2
value0  0.325  0.360  0.315
value1  0.366  0.297  0.336
value2  0.323  0.341  0.335

So you can see that my Perl code reflects my (perhaps mistaken) understanding that each row of the transition matrix should sum to 1.

It's obvious to me that I don't understand the nuance in the description of the Transition Matrix operator:
This operator calculates the transition matrix of a specified attribute, i.e. the operator counts how often each possible nominal value follows after each other.
Would some kind soul please put me out of my misery and explain what it is I am seeing when I look at the output of the Transition Matrix operator?

Many thanks!
Here's the XML for the process I created:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.014">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.014" expanded="true" name="Process">
    <process expanded="true" height="325" width="145">
      <operator activated="true" class="generate_nominal_data" compatibility="5.1.014" expanded="true" height="60" name="Generate Nominal Data" width="90" x="45" y="255">
        <parameter key="number_examples" value="2000"/>
        <parameter key="number_of_attributes" value="1"/>
        <parameter key="number_of_values" value="3"/>
      </operator>
      <operator activated="true" class="write_csv" compatibility="5.1.014" expanded="true" height="60" name="Write CSV" width="90" x="151" y="254">
        <parameter key="csv_file" value="C:\Documents and Settings\MikeN\My Documents\Mike\tmat.csv"/>
        <parameter key="column_separator" value=","/>
        <parameter key="quote_nominal_values" value="false"/>
      </operator>
      <operator activated="true" class="transition_matrix" compatibility="5.1.014" expanded="true" height="76" name="Transition Matrix" width="90" x="333" y="227">
        <parameter key="attribute" value="att1"/>
      </operator>
      <connect from_op="Generate Nominal Data" from_port="output" to_op="Write CSV" to_port="input"/>
      <connect from_op="Write CSV" from_port="through" to_op="Transition Matrix" to_port="example set"/>
      <connect from_op="Transition Matrix" from_port="example set" to_port="result 1"/>
      <connect from_op="Transition Matrix" from_port="transition matrix" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
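Here's my Perl script:

#!/usr/bin/perl -w
use strict;

# Reads the CSV written by Write CSV (first line is the header) and prints, for
# each pair of states, the number of from -> to transitions divided by the
# number of times "from" occurs in the first column.
my $curr_state;
my %trans;          # $trans{$from}{$to} = number of from -> to transitions
my %state_counts;   # number of times each state occurs
<>;                 # skip the header line
while (<>) {
  chomp;
  my ($state) = split /,/;
  $state_counts{$state}++;
  if (defined $curr_state) {
    $trans{$curr_state}{$state}++;
  }
  $curr_state = $state;
}

my @states = sort keys %state_counts;
print "\t", join("\t", @states), "\n";
foreach my $from (sort keys %trans) {
  print $from;
  foreach my $to (@states) {
    # print 0 for transitions that never occurred so the columns stay aligned
    my $count = $trans{$from}{$to} || 0;
    printf "\t%0.3f", $count / $state_counts{$from};
  }
  print "\n";
}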

Answers

  • chunga
    chunga New Altair Community Member
    I've looked at the code for the Transition Matrix operator.

    While there may or may not be other small problems (e.g. why do I get so many 0's in my matrix?), I see that the Transition Matrix operator defines a matrix in which each entry [i,j] is the proportion of all transitions that go from state[i] to state[j], rather than what I would have thought more interesting: the proportion of the transitions out of state[i] that go to state[j].
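
    The two definitions can be written side by side. The symbols here are introduced only for illustration: n_ij is the number of times value j immediately follows value i, and N is the total number of transitions (number of examples minus 1). T is what the operator computes; P is the Markov-chain (row-stochastic) matrix, defined for every i that has at least one outgoing transition:

```latex
\begin{aligned}
N &= \sum_{i,j} n_{ij} = (\text{number of examples}) - 1,\\[4pt]
T_{ij} &= \frac{n_{ij}}{N}
  \quad\text{(share of all transitions; } \textstyle\sum_{i,j} T_{ij} = 1\text{)},\\[4pt]
P_{ij} &= \frac{n_{ij}}{\sum_{l} n_{il}} = \frac{T_{ij}}{\sum_{l} T_{il}}
  \quad\text{(share of transitions out of } i\text{; } \textstyle\sum_{j} P_{ij} = 1\text{)}.
\end{aligned}
```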

    From the nature of the result, it appears that there is another small problem:

    From "com.rapidminer.tools.container.Tupel":
    /**
    * This class can be used to build pairs of typed objects and sort them.
    * ATTENTION!!
    * This class is not usable for hashing since only the first version is used as
    * hash entry. To use a hash function on a tupel, use Pair!
    *
    * @author Sebastian Land
    */
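    From com.rapidminer.operator.visualisation.dependencies.TransitionMatrixOperator:
    Map<Tupel<String, String>, Integer> transitions = new HashMap<Tupel<String, String>, Integer>();
    So that explains why I get only one non-zero value in each row: if Tupel's equality also looks only at the first value, every transition out of the same state is counted under a single key.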
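
    The collapse can be reproduced outside RapidMiner with a small Java sketch. FirstOnlyKey is a hypothetical stand-in that hashes and compares only its first element; that is an assumption about Tupel based on the javadoc quoted above, not a copy of the RapidMiner class. PairKey is a Java record (Java 16+), whose equals and hashCode use both components, which is the property the javadoc says Pair provides.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TupleKeyDemo {

    /**
     * Hypothetical stand-in for a tuple that hashes AND compares only its first
     * element (assumed Tupel behaviour, based on its javadoc; not RapidMiner code).
     */
    static final class FirstOnlyKey {
        final String first;
        final String second;

        FirstOnlyKey(String first, String second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof FirstOnlyKey && ((FirstOnlyKey) o).first.equals(first);
        }

        @Override
        public int hashCode() {
            return first.hashCode();
        }
    }

    /** Records derive equals/hashCode from both components (Java 16+). */
    record PairKey(String first, String second) {}

    /** Counts (previous, current) transitions using the first-only key. */
    static Map<FirstOnlyKey, Integer> countWithFirstOnlyKey(List<String> seq) {
        Map<FirstOnlyKey, Integer> counts = new HashMap<>();
        for (int k = 1; k < seq.size(); k++) {
            counts.merge(new FirstOnlyKey(seq.get(k - 1), seq.get(k)), 1, Integer::sum);
        }
        return counts;
    }

    /** Counts (previous, current) transitions using a key over both values. */
    static Map<PairKey, Integer> countWithPairKey(List<String> seq) {
        Map<PairKey, Integer> counts = new HashMap<>();
        for (int k = 1; k < seq.size(); k++) {
            counts.merge(new PairKey(seq.get(k - 1), seq.get(k)), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Transitions: a->b, b->a, a->a, a->b, b->b, b->a
        List<String> seq = List.of("a", "b", "a", "a", "b", "b", "a");
        System.out.println(countWithFirstOnlyKey(seq).size()); // 2 keys: one per source value
        System.out.println(countWithPairKey(seq).size());      // 4 keys: every distinct pair
    }
}
```

    With the first-only key, HashMap.merge finds the existing key for a source value and only increments its count, keeping the first key object it saw. Every transition out of that value therefore lands in a single cell, whose column is whichever successor happened to appear first.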

    It seems to me that TransitionMatrixOperator might have at least one, and possibly two, bugs.

    What is the correct procedure to ask that it be looked into by someone more knowledgeable than me?

    Many Thanks!

  • chunga
    chunga New Altair Community Member
    The following patch would (read that as "might": I haven't compiled or tested it :-[ ) fix the "bugs" (or at least the "divergence in expectations") that I mention above.
    *** TransitionMatrixOperator.java	2011-12-04 14:04:21.312500000 -0600
    --- TransitionMatrixOperator-fixed.java 2011-12-04 14:12:16.421875000 -0600
    ***************
    *** 42,48 ****
      import com.rapidminer.parameter.ParameterType;
      import com.rapidminer.parameter.ParameterTypeAttribute;
      import com.rapidminer.tools.Ontology;
    ! import com.rapidminer.tools.container.Tupel;
     
      /**
      * This operator calculates the transition matrix of a specified attribute,
    --- 42,48 ----
      import com.rapidminer.parameter.ParameterType;
      import com.rapidminer.parameter.ParameterTypeAttribute;
      import com.rapidminer.tools.Ontology;
    ! import com.rapidminer.tools.container.Pair;
     
      /**
      * This operator calculates the transition matrix of a specified attribute,
    ***************
    *** 78,97 ****
      throw new UserError(this, 119, attribute.getName(), "TransitionMatrix");
     
      Set<String> values = new TreeSet<String>();
    ! Map<Tupel<String, String>, Integer> transitions = new HashMap<Tupel<String, String>, Integer>();
     
    - int numberOfTransitions = exampleSet.size() - 1;
      String lastValue = null;
      for (Example example: exampleSet) {
      String currentValue = example.getNominalValue(attribute);
      values.add(currentValue);
    !
      if (lastValue != null) {
    ! Tupel<String, String> currentTupel = new Tupel<String, String>(lastValue, currentValue);
    ! if (transitions.containsKey(currentTupel))
    ! transitions.put(currentTupel, transitions.get(currentTupel) + 1);
      else
    ! transitions.put(currentTupel, 1);
      }
      lastValue = currentValue;
      }
    --- 78,100 ----
      throw new UserError(this, 119, attribute.getName(), "TransitionMatrix");
     
      Set<String> values = new TreeSet<String>();
    ! Map<String,Integer> numberOfTransitions = new HashMap<String,Integer>();
    ! Map<Pair<String, String>, Integer> transitions = new HashMap<Pair<String, String>, Integer>();
     
      String lastValue = null;
      for (Example example: exampleSet) {
      String currentValue = example.getNominalValue(attribute);
      values.add(currentValue);
    ! if(!numberOfTransitions.containsKey(currentValue)){
    !     numberOfTransitions.put(currentValue,0);
    ! }
    ! numberOfTransitions.put(currentValue, numberOfTransitions.get(currentValue) + 1);
      if (lastValue != null) {
    ! Pair<String, String> currentPair = new Pair<String, String>(lastValue, currentValue);
    ! if (transitions.containsKey(currentPair))
    ! transitions.put(currentPair, transitions.get(currentPair) + 1);
      else
    ! transitions.put(currentPair, 1);
      }
      lastValue = currentValue;
      }
    ***************
    *** 105,112 ****
      }
     
      NumericalMatrix matrix = new NumericalMatrix("Transition", valueArray, false);
    ! for(Entry<Tupel<String, String>, Integer> entry: transitions.entrySet()) {
    ! matrix.setValue(valuePositions.get(entry.getKey().getFirst()), valuePositions.get(entry.getKey().getSecond()), ((double) entry.getValue().intValue()) / numberOfTransitions);
      }
     
      exampleSetOutput.deliver(exampleSet);
    --- 108,115 ----
      }
     
      NumericalMatrix matrix = new NumericalMatrix("Transition", valueArray, false);
    ! for(Entry<Pair<String, String>, Integer> entry: transitions.entrySet()) {
    !     matrix.setValue(valuePositions.get(entry.getKey().getFirst()), valuePositions.get(entry.getKey().getSecond()), ((double) entry.getValue().intValue()) / numberOfTransitions.get(entry.getKey().getFirst()));
      }
     
      exampleSetOutput.deliver(exampleSet);
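
    For comparison, here is a self-contained sketch of the row-normalized (Markov-chain) matrix in plain Java rather than as a RapidMiner operator. The input is assumed to be a List<String> of nominal values in example order, and the class name MarkovTransitionMatrix is borrowed from the proposal in this thread; it is not an existing RapidMiner class. Unlike the patch, which counts every occurrence of a value, the row denominator here counts only values that have a successor, so every row with at least one outgoing transition sums exactly to 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class MarkovTransitionMatrix {

    /** Transition key; a record's equals/hashCode use both values (Java 16+). */
    record Transition(String from, String to) {}

    /** Distinct values in sorted order, like the operator's TreeSet of values. */
    static List<String> states(List<String> sequence) {
        return new ArrayList<>(new TreeSet<>(sequence));
    }

    /**
     * p[i][j] = (times states[j] directly follows states[i])
     *         / (transitions leaving states[i]).
     * A value that never has a successor gets an all-zero row.
     */
    static double[][] rowNormalized(List<String> sequence) {
        List<String> states = states(sequence);
        Map<Transition, Integer> transitions = new HashMap<>();
        Map<String, Integer> outgoing = new HashMap<>();
        for (int k = 1; k < sequence.size(); k++) {
            String from = sequence.get(k - 1);
            String to = sequence.get(k);
            transitions.merge(new Transition(from, to), 1, Integer::sum);
            outgoing.merge(from, 1, Integer::sum); // row denominator: only values with a successor
        }
        int n = states.size();
        double[][] p = new double[n][n];
        for (Map.Entry<Transition, Integer> e : transitions.entrySet()) {
            String from = e.getKey().from();
            int i = states.indexOf(from);
            int j = states.indexOf(e.getKey().to());
            p[i][j] = e.getValue().doubleValue() / outgoing.get(from);
        }
        return p;
    }

    public static void main(String[] args) {
        List<String> seq = List.of("a", "b", "a", "a", "b", "b", "a");
        System.out.println(Arrays.deepToString(rowNormalized(seq)));
    }
}
```

    For the sequence in main the states are [a, b]; the transitions out of a are a->b twice and a->a once, so row a is [1/3, 2/3], and by the same count row b is [2/3, 1/3].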
  • Nils_Woehler
    Nils_Woehler New Altair Community Member
    Hi,

    I have to admit I would expect the same behaviour you have described, but I am sure there is a reason it has been implemented this way.
    Unfortunately, I can't simply change how an operator works (even though this might not be the most-used operator), because processes
    that depend on it would break. I've created a bug report at http://bugs.rapid-i.com/ and we will discuss it later with the team.

    Thanks for your hint anyway!

    Regards,
    Nils
  • chunga
    chunga New Altair Community Member
    Could we possibly create a new operator, say called MarkovTransitionMatrix, that would calculate a transition matrix in the old-fashioned, Markov-chain sense?
  • Nils_Woehler
    Nils_Woehler New Altair Community Member
    If you need that kind of operator before we take a look at the old one, you can easily build your own extension
    and include it in RapidMiner.

    Cheers,
    Nils