🎉Community Raffle - Win $25

An exclusive raffle opportunity for active members like you! Complete your profile, answer questions and get your first accepted badge to enter the raffle.
Join and Win

Simple Crosstab with totals

User: "btibert"
New Altair Community Member
Updated by Jocelyn
I have seen other questions posted here and via Google Searches, but perhaps the solution was for prior RM versions where operators are either no longer available, or are now required paid licenses via 3rd party tools.

My thought was to simply Pivot the data, which works as expected, but the ability to add a total is lacking.  In my case, it's just a simple 2x2, which I can do with pivot, but the table is lacking totals. 

Could I add them up, sure?  But that feels clunky given how powerful, and intuitive, the tool is to use.

Is is not possible without paying for a license via an extension found in the Marketplace?  I am really going to enjoy using this as a teaching tool for my class, but it feels like I must be missing something obvious when generating simple 2x2 tables, with totals, is basically EDA 101.

Thanks,

Brock

Find more posts tagged with

Sort by:
1 - 14 of 141
    User: "MartinLiebig"
    Altair Employee
    Updated by MartinLiebig

    can you give me an example (for example in excel) how this might look like? Let me see, maybe I find the time to build it.

    Cheers,
    Martin


    User: "btibert"
    New Altair Community Member
    OP
    Thanks for the quick reply.  I was just playing around with the built in tutorial for lift charts, and modified the output to include pivot.  The attached screenshots show the process and the expected output.   From the example, I am just crossing the label with the predicted value into a 2x2 matrix.

    And to your other question, those examples are definitely more complicated than I am looking for out of the box.

    Cheers,

    Brock




    User: "sgenzer"
    Altair Employee
    hi @btibertb yeah that seems like one of those things where simply 'no one has ever asked before'. It can probably be done very simply with a few lines of code, or perhaps you can create a BuildingBlock for people to use?

    Scott
    Hey @btibert ,
    i've got 20 minutes to write a prototype in java for it. Are you interested polishing it and creating your own extension? The code for the operator is this:
    package com.rapidminer.extension.operator.blending;

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    import com.rapidminer.example.Attribute;
    import com.rapidminer.example.Example;
    import com.rapidminer.example.ExampleSet;
    import com.rapidminer.example.table.AttributeFactory;
    import com.rapidminer.example.utils.ExampleSetBuilder;
    import com.rapidminer.example.utils.ExampleSets;
    import com.rapidminer.operator.Operator;
    import com.rapidminer.operator.OperatorDescription;
    import com.rapidminer.operator.UserError;
    import com.rapidminer.operator.ports.InputPort;
    import com.rapidminer.operator.ports.OutputPort;
    import com.rapidminer.parameter.ParameterType;
    import com.rapidminer.parameter.ParameterTypeAttribute;
    import com.rapidminer.parameter.ParameterTypeBoolean;
    import com.rapidminer.parameter.ParameterTypeCategory;
    import com.rapidminer.parameter.ParameterTypeDouble;
    import com.rapidminer.parameter.ParameterTypeInt;
    import com.rapidminer.parameter.conditions.EqualStringCondition;
    import com.rapidminer.tools.Ontology;


    public class CrossTable extends Operator {

    InputPort exaInput = getInputPorts().createPort("exa");

    OutputPort crossOutput = getOutputPorts().createPort("cro");
    OutputPort crossOut = getOutputPorts().createPassThroughPort("ori");

    public final static String PARAMETER_X_ATTRIBTUE = "x_attribute";
    public final static String PARAMETER_Y_ATTRIBTUE = "y_attribute";

    public CrossTable(OperatorDescription description) {
    super(description);
    }

    public void doWork() throws UserError {
    ExampleSet exampleSet = exaInput.getData(ExampleSet.class);
    Attribute x = exampleSet.getAttributes().get(getParameterAsString(PARAMETER_X_ATTRIBTUE));
    Attribute y = exampleSet.getAttributes().get(getParameterAsString(PARAMETER_Y_ATTRIBTUE));
    Integer xSize = x.getMapping().getValues().size();
    Integer ySize = y.getMapping().getValues().size();

    double[][] values = new double[xSize + 1][ySize + 2];
    // Initialize with 0s
    for (int xCounter = 0; xCounter < xSize + 1; ++xCounter) {
    for (int yCounter = 1; yCounter < ySize + 2; ++yCounter) {
    values[xCounter][yCounter] = 0;
    }
    }
    // Add the names
    Attribute nameAttribute = AttributeFactory.createAttribute("name",Ontology.POLYNOMINAL);
    for (int xCounter = 0; xCounter < xSize; ++xCounter) {
    values[xCounter][0] = nameAttribute.getMapping().mapString(x.getMapping().getValues().get(xCounter));
    }
    values[xSize][0] = nameAttribute.getMapping().mapString("sum");

    // sum! :)
    for (Example exa : exampleSet) {
    int xIndex = x.getMapping().getValues().indexOf(exa.getNominalValue(x));
    int yIndex = y.getMapping().getValues().indexOf(exa.getNominalValue(y));

    values[xIndex][yIndex+1] += 1;
    }

    for (int xCounter = 0; xCounter < xSize; ++xCounter) {
    double ySum = 0;
    for (int yCounter = 0; yCounter < ySize; ++yCounter) {
    ySum += values[xCounter][yCounter+1];
    }
    values[xCounter][ySize+1] = ySum;
    }

    for (int yCounter = 0; yCounter < ySize; ++yCounter) {
    double xSum = 0;
    for (int xCounter = 0; xCounter < xSize; ++xCounter) {
    xSum += values[xCounter][yCounter+1];
    }
    values[xSize][yCounter+1] = xSum;
    }

    values[xSize][ySize+1] = exampleSet.size();

    List<Attribute> listOfAttributes = new ArrayList<Attribute>();
    listOfAttributes.add(nameAttribute);
    for (String value : y.getMapping().getValues()) {
    listOfAttributes.add(AttributeFactory.createAttribute("count(" + value + ")", Ontology.REAL));
    }
    listOfAttributes.add(AttributeFactory.createAttribute("sum", Ontology.REAL));


    ExampleSetBuilder builder = ExampleSets.from(listOfAttributes);
    for(double[] row : values){
    builder.addRow(row);
    }
    crossOutput.deliver(builder.build());
    }
    @Override
    public List<ParameterType> getParameterTypes() {
    List<ParameterType> types = new ArrayList<>();

    types.add(new ParameterTypeAttribute(PARAMETER_X_ATTRIBTUE,"Attribute to construct the horizontal axis.",exaInput));
    types.add(new ParameterTypeAttribute(PARAMETER_Y_ATTRIBTUE,"Attribute to construct the vertical axis.",exaInput));

    return types;
    }


    }

    Best,
    Martin

    User: "BalazsBaranyRM"
    New Altair Community Member
    Hi,

    you can do it in RapidMiner.


    It needs a few steps:


    Unfortunately, it's not a good building block (yet) because the first (grouping) attribute name is hard coded. 

    Generate Aggregation creates the "total" attribute with the sum of all numeric columns (attribute filter type=value_type, value type=numeric).

    Aggregate creates the sums with "use default aggregation", attribute filter type ... numeric, and default aggregation function = sum.

    Then we add the text "Total" with the attribute name from the un-aggregated example set and rename the aggregated attributes from "sum(whatever)" to just "whatever" using Rename by Replacing (replace what: sum.(.+).$ ; replace by: $1). 

    Maybe we could extract the attribute name from the incoming example set, save it as a macro and reuse that in the Generate Attributes step. Then it would be completely generic and a possible building block. But that's an exercise for the reader ;-)

    Regards,

    Balázs
    User: "BalazsBaranyRM"
    New Altair Community Member
    Accepted Answer
    Updated by BalazsBaranyRM
    Hi again,

    here's the Building Block.

    It selects the nominal attribute, Transposes the column, and extract the column name from it.

    It should work as long as you have this structure in your data (one nominal attribute, 2x2 or maybe more numerical attributes).

    Regards,

    Balázs
    User: "btibert"
    New Altair Community Member
    OP
    Wow, this is absolutely fantastic, thank you everyone.  Let me digest this (I am prepping for class tonight) but I am very encouraged by the ideas and feedback above.  Excellent stuff!
    User: "btibert"
    New Altair Community Member
    OP
    mschmitz  absolutely interested in learning how to do that, I am not quite there yet. 
    User: "btibert"
    New Altair Community Member
    OP
    @BalazsBarany how would I go about adding that to my local install?  Thanks for your help on this!
    User: "btibert"
    New Altair Community Member
    OP
    NVM, I figured that part out.  I was looking for a way to "install" via the interface, but realized I can drop the file inside the install folder.  Thanks again.
    @btibert
    Maybe my solution is not the easiest but it works:
    data.loc['Total']= data.sum()</code><?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="generate_nominal_data" compatibility="9.3.001" expanded="true" height="68" name="Generate Nominal Data" width="90" x="45" y="34">
            <parameter key="number_examples" value="100"/>
            <parameter key="number_of_attributes" value="1"/>
            <parameter key="number_of_values" value="5"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
          </operator>
          <operator activated="true" class="blending:pivot" compatibility="9.3.001" expanded="true" height="82" name="Pivot" width="90" x="179" y="34">
            <parameter key="group_by_attributes" value="att1"/>
            <parameter key="column_grouping_attribute" value="label"/>
            <list key="aggregation_attributes">
              <parameter key="label" value="count"/>
            </list>
            <parameter key="use_default_aggregation" value="false"/>
            <parameter key="default_aggregation_function" value="first"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="9.3.001" expanded="true" height="82" name="Rename by Replacing" width="90" x="313" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="att1"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="replace_what" value=".*_"/>
          </operator>
          <operator activated="true" class="replace_missing_values" compatibility="9.3.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="514" y="34">
            <parameter key="return_preprocessing_model" value="false"/>
            <parameter key="create_view" value="false"/>
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="default" value="zero"/>
            <list key="columns"/>
          </operator>
          <operator activated="true" class="generate_aggregation" compatibility="9.3.001" expanded="true" height="82" name="Generate Aggregation" width="90" x="648" y="34">
            <parameter key="attribute_name" value="Total_Column"/>
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="att1"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="aggregation_function" value="sum"/>
            <parameter key="concatenation_separator" value="|"/>
            <parameter key="keep_all" value="true"/>
            <parameter key="ignore_missings" value="true"/>
            <parameter key="ignore_missing_attributes" value="false"/>
          </operator>
          <operator activated="true" class="transpose" compatibility="9.3.001" expanded="true" height="82" name="Transpose" width="90" x="782" y="34"/>
          <operator activated="true" class="rename_by_example_values" compatibility="9.3.001" expanded="true" height="82" name="Rename by Example Values" width="90" x="916" y="34">
            <parameter key="row_number" value="1"/>
          </operator>
          <operator activated="true" class="parse_numbers" compatibility="9.3.001" expanded="true" height="82" name="Parse Numbers" width="90" x="1050" y="34">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="decimal_character" value="."/>
            <parameter key="grouped_digits" value="false"/>
            <parameter key="grouping_character" value=","/>
            <parameter key="infinity_representation" value=""/>
            <parameter key="unparsable_value_handling" value="fail"/>
          </operator>
          <operator activated="true" class="generate_aggregation" compatibility="9.3.001" expanded="true" height="82" name="Generate Aggregation (2)" width="90" x="1184" y="34">
            <parameter key="attribute_name" value="Total_Row"/>
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="att1"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="aggregation_function" value="sum"/>
            <parameter key="concatenation_separator" value="|"/>
            <parameter key="keep_all" value="true"/>
            <parameter key="ignore_missings" value="false"/>
            <parameter key="ignore_missing_attributes" value="false"/>
          </operator>
          <operator activated="true" class="transpose" compatibility="9.3.001" expanded="true" height="82" name="Transpose (2)" width="90" x="1050" y="238"/>
          <connect from_op="Generate Nominal Data" from_port="output" to_op="Pivot" to_port="input"/>
          <connect from_op="Pivot" from_port="output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="Generate Aggregation" to_port="example set input"/>
          <connect from_op="Generate Aggregation" from_port="example set output" to_op="Transpose" to_port="example set input"/>
          <connect from_op="Transpose" from_port="example set output" to_op="Rename by Example Values" to_port="example set input"/>
          <connect from_op="Rename by Example Values" from_port="example set output" to_op="Parse Numbers" to_port="example set input"/>
          <connect from_op="Parse Numbers" from_port="example set output" to_op="Generate Aggregation (2)" to_port="example set input"/>
          <connect from_op="Generate Aggregation (2)" from_port="example set output" to_op="Transpose (2)" to_port="example set input"/>
          <connect from_op="Transpose (2)" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    </pre><div>Another way is using the Python process<br>and use this line of code to add a row with the totals of each column<br><pre class="CodeBlock"><code>#
    


    in case you want to learn how to write operators: This tutorial is a great start https://docs.rapidminer.com/latest/developers/extensions/ . It is quite easy, if you have a bit of experience with Java (or similar langauges). My code will make way more sense if you did though. I know by the way that Katharina Morik, who supervised @IngoRM 's PhD thesis, is using extensions for her lectures, enabling students to implement their own versions of algorithms.

    Cheers,
    Martin

    User: "btibert"
    New Altair Community Member
    OP
    @mschmitz
    Thanks, I am mostly focused in R and python, but I will give it a review.
    User: "btibert"
    New Altair Community Member
    OP
    @BalazsBarany
    To circle back, this behaved exactly as expected for my example above.  Thanks!