Nominal statistics problems

alexman
alexman New Altair Community Member
edited November 5 in Community Q&A
Hi, I'm accessing a database that contains both numerical and nominal data. When I run the process in the RapidMiner GUI, I get results such as mode and average in the statistics column of the MetaDataView tab. The problem is that through the RapidMiner API I can't get statistics like mode and least (which the GUI does show me); only AVERAGE works.

      File f = new File("operadores2.xml");
      Process process = new Process(f);
      IOContainer ioc = process.run();
      ExampleSet ses = ioc.get(ExampleSet.class);
      ses.recalculateAllAttributeStatistics();
      ExampleTable ext = ses.getExampleTable();
      System.out.println(ses.getStatistics(ext.getAttribute(14), Statistics.AVERAGE));
This works, but ...

      File f = new File("operadores2.xml");
      Process process = new Process(f);
      IOContainer ioc = process.run();
      ExampleSet ses = ioc.get(ExampleSet.class);
      ses.recalculateAllAttributeStatistics();
      ExampleTable ext = ses.getExampleTable();
      System.out.println(ses.getStatistics(ext.getAttribute(7), Statistics.MODE));
This doesn't work (attribute number 7 is nominal).

What am I doing wrong?

Thanks

Answers

  • land
    land New Altair Community Member
    Hi,
    perhaps the documentation does not say it explicitly enough, but never use the ExampleTable unless you are going to construct a completely new data set. NEVER. In 99% of cases that's the wrong way.

    Here it causes problems because the statistics are based on the attributes inside an example set. An ExampleSet might not cover all rows or all columns of an ExampleTable. Since you should never operate on ExampleTables, there's no use in calculating statistics on them. So here's the way you should do it:

          File f = new File("operadores2.xml");
          Process process = new Process(f);
          IOContainer ioc = process.run();
          ExampleSet ses = ioc.get(ExampleSet.class);
          Attributes attributes = ses.getAttributes();
          int i = 0;
          for (Attribute attribute : attributes) {
                if (i == 14) {
                      // do whatever you want with the regular attribute at index 14
                      ses.recalculateAttributeStatistics(attribute);
                      if (attribute.isNominal())
                            return ses.getStatistics(attribute, Statistics.MODE);
                      else
                            return ses.getStatistics(attribute, Statistics.AVERAGE);
                }
                i++;
          }
    There is no easy way to access the x-th attribute, because in general we want to avoid doing that. Otherwise you would have to guarantee that the order of the attributes always stays the same. You should use the name of an attribute instead.

    Greetings,
      Sebastian
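
The point about names versus positions can be illustrated without RapidMiner. What follows is only a minimal plain-Java sketch: the `ColumnLookup` class, its methods, and the column names are made up for illustration and are not part of the RapidMiner API. It shows that a positional lookup lands on a different column once the order changes, while a lookup by name still finds the same data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not RapidMiner code.
public class ColumnLookup {

    // Returns the name of the column at the given 0-based position
    // in the table's current iteration order.
    static String nameAt(Map<String, double[]> table, int position) {
        List<String> names = new ArrayList<>(table.keySet());
        return names.get(position);
    }

    // Arithmetic mean of a column (NaN for an empty column).
    static double average(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        Map<String, double[]> original = new LinkedHashMap<>();
        original.put("width", new double[] {1020.0, 1022.0});
        original.put("height", new double[] {255.0, 257.0});

        // Same columns in a different order, e.g. after some preprocessing step.
        Map<String, double[]> reordered = new LinkedHashMap<>();
        reordered.put("height", original.get("height"));
        reordered.put("width", original.get("width"));

        // Position 0 refers to different columns in the two tables.
        System.out.println(nameAt(original, 0));  // prints "width"
        System.out.println(nameAt(reordered, 0)); // prints "height"

        // Looking up by name gives the same column either way.
        System.out.println(average(original.get("width")));  // prints 1021.0
        System.out.println(average(reordered.get("width"))); // prints 1021.0
    }
}
```

In RapidMiner itself the analogous step would be to ask the `Attributes` object for an attribute by its name rather than counting positions in the loop; check the API documentation of your version for the exact lookup method.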
  • alexman
    alexman New Altair Community Member
    Thanks a lot for your response, but I still have a problem.
    I printed the results to standard output and got this:

    0.0
    0.0
    0.0
    0.0
    0.0
    0.0
    0.0
    0.0
    1.0
    0.0
    1.0
    0.0
    0.0
    0.0
    0.0
    0.0
    0.0
    0.0
    0.0
    1020.0615384615385
    1276.0615384615385
    0.0
    1.0
    255.2923076923077
    0.0
    The 0.0 and 1.0 values belong to nominal attributes. In the GUI I get more information, for example (mode = Barcelona(52), least = Madrid(1)).

    My question is: is it possible to get the values Barcelona and 52, or not?

    thanks again
  • land
    land New Altair Community Member
    Hi,
    yes, that's possible. You have to use the nominal mapping to get the name of the mode value, and use the COUNT statistics to get the numbers. Here's how to do it:

          double value = 0; // the mode value as retrieved before
          Attribute attribute = exampleSet.getAttributes().iterator().next();
          String valueName = attribute.getMapping().mapIndex((int) value);
          double count = exampleSet.getStatistics(attribute, Statistics.COUNT);
    Greetings,
      Sebastian
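
Why the mode comes back as 0.0 or 1.0 becomes clearer once you picture how nominal values are stored: each distinct string gets an integer index, the data column holds those indices as doubles, and the mapping translates an index back into its name. The sketch below is a plain-Java stand-in written under that assumption; `NominalSketch` and its methods are hypothetical and only mimic the idea, they are not the RapidMiner classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a nominal mapping; not the RapidMiner class.
public class NominalSketch {
    private final List<String> names = new ArrayList<>();

    // Returns the index of a value name, registering it on first sight.
    int mapString(String name) {
        int index = names.indexOf(name);
        if (index < 0) {
            names.add(name);
            index = names.size() - 1;
        }
        return index;
    }

    // Translates an index back into the value name.
    String mapIndex(int index) {
        return names.get(index);
    }

    int size() {
        return names.size();
    }

    // Returns the index that occurs most often in the column
    // (the lowest index wins ties), or -1 if numberOfValues is 0.
    static int modeIndex(double[] column, int numberOfValues) {
        int[] counts = new int[numberOfValues];
        for (double v : column) {
            counts[(int) v]++;
        }
        int best = -1;
        for (int i = 0; i < counts.length; i++) {
            if (best < 0 || counts[i] > counts[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NominalSketch mapping = new NominalSketch();
        String[] raw = {"Barcelona", "Madrid", "Barcelona"};

        // Store the column as indices, the way a nominal column is held internally.
        double[] column = new double[raw.length];
        for (int i = 0; i < raw.length; i++) {
            column[i] = mapping.mapString(raw[i]);
        }

        int mode = modeIndex(column, mapping.size());
        System.out.println(mode);                   // prints 0
        System.out.println(mapping.mapIndex(mode)); // prints "Barcelona"
    }
}
```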
  • alexman
    alexman New Altair Community Member
    Hi again,

          Attributes attributes = es.getAttributes();
          for (Attribute attribute : attributes) {
                es.recalculateAttributeStatistics(attribute);
                if (attribute.isNominal()) {
                      double value = es.getStatistics(attribute, Statistics.MODE);
                      String valueName = attribute.getMapping().mapIndex((int) value);
                      double count = es.getStatistics(attribute, Statistics.COUNT);
                      System.out.println("Atributte name "+valueName+" times that appears "+count);
                }
                else {
                      System.out.println(es.getStatistics(attribute, Statistics.AVERAGE));
                }
          }

    and that's the output:

    ....
    Atributte name Windows times that appears NaN

    The problem is that it doesn't count how many times the value appears.

    thx
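
One possible reading of the NaN above: for a nominal attribute a count only makes sense per value, so a count requested without naming a value has no single answer. The plain-Java sketch below models just that behaviour. `ValueCount` and `countOf` are hypothetical helpers, not RapidMiner code. As far as I recall, RapidMiner's `ExampleSet` also offers a three-argument `getStatistics` that takes the value name as an extra parameter (something like `es.getStatistics(attribute, Statistics.COUNT, valueName)`), but treat that as an assumption and check it against the API documentation of your version.

```java
// Hypothetical illustration only; not RapidMiner code.
public class ValueCount {

    // Counts how often valueName occurs in the column. Returns NaN when
    // no value is named, mimicking a statistic that cannot be computed
    // without its parameter.
    static double countOf(String[] column, String valueName) {
        if (valueName == null) {
            return Double.NaN;
        }
        double count = 0;
        for (String v : column) {
            if (valueName.equals(v)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] column = {"Windows", "Linux", "Windows"};

        System.out.println(countOf(column, null));      // prints NaN
        System.out.println(countOf(column, "Windows")); // prints 2.0
    }
}
```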