Nominal statistics problems
alexman
New Altair Community Member
Hi, I'm accessing a database that holds both numerical and nominal data. When I run the process in the RapidMiner GUI, the statistics column of the MetaDataView tab shows values such as mode and average. With the RapidMiner API, however, I can only retrieve AVERAGE; statistics such as mode and least, which the GUI does show, are not returned.
What am I doing wrong?
thx
This works, but ...
File f = new File("operadores2.xml");
Process process = new Process(f);
IOContainer ioc = process.run();
ExampleSet ses = ioc.get(ExampleSet.class);
ses.recalculateAllAttributeStatistics();
ExampleTable ext = ses.getExampleTable();
System.out.println(ses.getStatistics(ext.getAttribute(14), Statistics.AVERAGE));
This doesn't work (attribute number 7 is nominal):
File f = new File("operadores2.xml");
Process process = new Process(f);
IOContainer ioc = process.run();
ExampleSet ses = ioc.get(ExampleSet.class);
ses.recalculateAllAttributeStatistics();
ExampleTable ext = ses.getExampleTable();
System.out.println(ses.getStatistics(ext.getAttribute(7), Statistics.MODE));
What am I doing wrong?
thx
Answers
Hi,
perhaps the documentation does not say it explicitly enough, but never use the ExampleTable unless you are going to construct a completely new data set. NEVER. In 99% of cases that is the wrong way.
Here it causes problems because:
The statistics are based on the attributes inside an example set. This is because an ExampleSet might not cover all rows or all columns of an ExampleTable. Since you should never operate on ExampleTables, there is no use in calculating statistics on them.
There is no easy way of accessing the x-th attribute, because in general we want to avoid doing that; otherwise you would have to guarantee that the order of attributes always stays constant. You should rather use the name of an attribute. If you really need positional access, here is the way you should do it:
File f = new File("operadores2.xml");
Process process = new Process(f);
IOContainer ioc = process.run();
ExampleSet ses = ioc.get(ExampleSet.class);
Attributes attributes = ses.getAttributes();
int i = 0;
for (Attribute attribute : attributes) {
    if (i == 14) {
        // do whatever you want with the regular attribute at position 14 (counting from 0)
        ses.recalculateAttributeStatistics(attribute);
        if (attribute.isNominal())
            return ses.getStatistics(attribute, Statistics.MODE);
        else
            return ses.getStatistics(attribute, Statistics.AVERAGE);
    }
    i++;
}
Greetings,
Sebastian
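The point about positions versus names can be illustrated without RapidMiner at all. The following is a minimal plain-Java sketch, not RapidMiner code: the class `AttributeLookupSketch`, the helper `positionOf`, and the column names are all hypothetical. It models a table and a view that covers only some of the table's columns, and shows that the same position refers to different columns while a name refers to the same one.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: plain lists of column names, not RapidMiner classes.
final class AttributeLookupSketch {

    /** Returns the position of a column name in the given column list, or -1 if absent. */
    static int positionOf(List<String> columns, String name) {
        return columns.indexOf(name);
    }

    public static void main(String[] args) {
        // Made-up underlying table with four columns.
        List<String> table = Arrays.asList("id", "city", "os", "price");
        // Made-up view over that table that leaves out "id" and "os".
        List<String> view = Arrays.asList("city", "price");

        // Position 1 means a different column in the table than in the view.
        System.out.println("table[1] = " + table.get(1));
        System.out.println("view[1]  = " + view.get(1));

        // A lookup by name refers to the same column in both.
        System.out.println("city in table at " + positionOf(table, "city"));
        System.out.println("city in view at " + positionOf(view, "city"));
    }
}
```

In RapidMiner itself, the name-based lookup is believed to be `ses.getAttributes().get("attributeName")`; that call is stated here as an assumption and is worth checking against the Javadoc of the version in use.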
Thanks a lot for your response, but I still have a problem.
I printed the results to standard output and got the values below.
The lines showing 0.0 or 1.0 are nominal attributes. In the GUI I get more information, for example (mode = Barcelona(52) least = Madrid(1)).
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
1.0
0.0
1.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
1020.0615384615385
1276.0615384615385
0.0
1.0
255.2923076923077
0.0
My question is: is it possible to get the values Barcelona and 52, or not?
Thanks again
Hi,
yes, that's possible. You have to use the nominal mapping to get the name of the mode value, and use the COUNT statistics to get the numbers. Here's how to do it:
double value = 0; // the mode value as retrieved before
Attribute attribute = exampleSet.getAttributes().iterator().next();
String valueName = attribute.getMapping().mapIndex((int) value);
double count = exampleSet.getStatistics(attribute, Statistics.COUNT);
Greetings,
Sebastian
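To make the idea of a nominal mapping concrete, here is a minimal plain-Java sketch. It does not use RapidMiner; the class `NominalMappingSketch`, its methods, and the sample data are hypothetical. It assumes a nominal column is stored as numeric indices plus a list that maps each index to a name, which is why a mode comes back as 0.0 or 1.0 until the index is translated.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a nominal column: indices stored as doubles, names in a mapping.
final class NominalMappingSketch {

    /** Counts how often each index in [0, mappingSize) occurs in the column. */
    static int[] countPerIndex(double[] column, int mappingSize) {
        int[] counts = new int[mappingSize];
        for (double v : column) {
            counts[(int) v]++;
        }
        return counts;
    }

    /** Index with the highest count (the mode); ties go to the lowest index. */
    static int modeIndex(int[] counts) {
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > counts[best]) best = i;
        }
        return best;
    }

    /** Index with the lowest count (the least frequent value); ties go to the lowest index. */
    static int leastIndex(int[] counts) {
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] < counts[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up mapping: index 0 -> Barcelona, 1 -> Madrid, 2 -> Valencia.
        List<String> mapping = Arrays.asList("Barcelona", "Madrid", "Valencia");
        // Made-up column values, stored as indices into the mapping.
        double[] column = {0, 0, 2, 0, 1, 2};

        int[] counts = countPerIndex(column, mapping.size());
        int mode = modeIndex(counts);
        int least = leastIndex(counts);

        // prints "mode = Barcelona(3)" and "least = Madrid(1)"
        System.out.println("mode = " + mapping.get(mode) + "(" + counts[mode] + ")");
        System.out.println("least = " + mapping.get(least) + "(" + counts[least] + ")");
    }
}
```

The lookup `mapping.get(mode)` plays the role that `attribute.getMapping().mapIndex((int) value)` plays in the snippet above.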
Hi again,
this is my code now:
Attributes attributes = es.getAttributes();
for (Attribute attribute : attributes) {
    es.recalculateAttributeStatistics(attribute);
    if (attribute.isNominal()) {
        double value = es.getStatistics(attribute, Statistics.MODE);
        String valueName = attribute.getMapping().mapIndex((int) value);
        double count = es.getStatistics(attribute, Statistics.COUNT);
        System.out.println("Atributte name "+valueName+" times that appears "+count);
    }
    else {
        System.out.println(es.getStatistics(attribute, Statistics.AVERAGE));
    }
}
The problem is that it doesn't count the number of times the value appears. This is the output:
....
Atributte name Windows times that appears NaN
Thx
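One possible explanation for the NaN, offered as an assumption rather than something confirmed in this thread: a count is a per-value statistic, so asking for it without saying which value is meant has no single answer. The plain-Java sketch below models that behaviour. The class `CountStatisticSketch` and its methods are hypothetical stand-ins, not RapidMiner classes, and returning NaN from the no-argument variant is a modelling choice made only to mirror the output above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a per-value count statistic; not the RapidMiner implementation.
final class CountStatisticSketch {
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    /** Records one occurrence of a nominal value. */
    void add(String value) {
        counts.merge(value, 1, Integer::sum);
    }

    /** Count without naming a value: there is no single answer, so this model returns NaN. */
    double count() {
        return Double.NaN;
    }

    /** Count for one named value; 0 if the value never occurred. */
    double count(String value) {
        return counts.getOrDefault(value, 0);
    }

    public static void main(String[] args) {
        CountStatisticSketch stats = new CountStatisticSketch();
        // Made-up sample data.
        stats.add("Windows");
        stats.add("Linux");
        stats.add("Windows");
        stats.add("Windows");

        System.out.println("count without a value: " + stats.count());        // prints NaN
        System.out.println("count of Windows: " + stats.count("Windows"));     // prints 3.0
    }
}
```

If the same reasoning applies to RapidMiner, the value name would have to be passed along with the statistic. `ExampleSet` is believed to offer a three-argument overload, used as `es.getStatistics(attribute, Statistics.COUNT, valueName)`, but that signature should be verified against the Javadoc before relying on it.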