Creating a new attribute

cherokee
cherokee New Altair Community Member
edited November 5 in Community Q&A
Hi!

I have a problem -- what an exceptional thing  ;)

This is too bad, I just don't know why the following code doesn't work. It is especially embarrassing as I already asked a very similar question for version 4.6 (http://rapid-i.com/rapidforum/index.php/topic,1028.0.html)! But I can neither remember the solution nor find the final code  :-[  :
ExampleTable table = exampleSet.getExampleTable();
table.addAttribute(resultAttr);
exampleSet.getAttributes().addRegular(resultAttr);

DataRowReader reader = table.getDataRowReader();

while (reader.hasNext()) {
    reader.next().set(resultAttr, 1.0d);
}
I always get an [tt]ArrayIndexOutOfBoundsException[/tt] at [tt]com.rapidminer.example.table.DoubleArrayDataRow.set[/tt]!

What am I doing wrong?

Best regards,
chero

P.S.: Just for completeness: I wrote my own ExampleSet class, so [tt]exampleSet[/tt] from above is of class [tt]Point2DExampleSet[/tt]. But all used methods are copied from [tt]SimpleExampleSet[/tt].

Answers

  • land
    land New Altair Community Member
    Hi,
    this is a little too complicated to answer with so little information. Did you set a breakpoint on the method and step through it to find the problem? Why isn't the memory table expanded for the new column?

    Greetings,
      Sebastian
  • cherokee
    cherokee New Altair Community Member
    Hi!

    Yes, I used a breakpoint. Stepping through the code showed that
    a) [tt]exampleSet.getAttributes().addRegular(resultAttr);[/tt] works fine
    b) inside this method [tt]numberOfAttributes[/tt] is smaller than [tt]columns[/tt]
    c) nevertheless [tt]reader.next().set(resultAttr, 1.0d);[/tt] throws the error

    Right now I'm using a workaround (manually creating a deep clone). But I cannot find the mistake. If you are interested I can give you the whole code including the custom example set. But I think this would be beyond the scope of the forum help.

    Best regards,
    chero
  • land
    land New Altair Community Member
    Hi Chero,
    that it would be. Something must stay reserved for our enterprise customers :)

    But I would advise stepping through the example table code line by line. This is the only place where enlarging the DataTableRow could have been forgotten.

    Greetings,
      Sebastian
  • cherokee
    cherokee New Altair Community Member
    Hi!

    I finally found my mistake. I will describe the problem here as I think it is a bug in the RM API, or at least a missing nice-to-have feature. Either way, it might help others who run into the same problem.

    First of all, the code above is absolutely correct. It works. The problem arises much earlier. As I wrote, I extended ExampleSet myself, and consequently I wrote a custom reader. That is where the problem lies. Here is the setup I used to parse my data and create the MemoryExampleTable:
    • Create the example table ([tt]MemoryExampleTable dataTable = new MemoryExampleTable(new ArrayList<Attribute> ())[/tt]). I had to use an empty list of attributes first due to project specific reasons. But this doesn't change the problem per se.
    • Parse the actual data and store it in a list ([tt]List<MyDataType> data[/tt]).
    • Add attributes to the example table ([tt]dataTable.addAttributes(someNewCreatedAttributes)[/tt]). The number of attributes is not known a priori but extracted from the MyDataTypes.
    • For each data row:
      • Create a new double array ([tt]double[] dataArray = new double[numberOfAttributes][/tt])
      • Fill the array with data
      • Create a new data row ([tt]DataRow row = new DoubleArrayDataRow(dataArray)[/tt]).
      • Add this data row to the example table ([tt]dataTable.addDataRow(row)[/tt]).
    To me this approach seemed natural and workable. The problem is the array allocation in the first sub-step! I chose the size of the double array to be exactly the size I needed to store my data. This doesn't work.

    When you add an attribute to an example table, the contained data rows are resized if necessary. But -- for efficiency reasons -- not by 1 but by 11. The memory example table stores the actual number of columns and assumes(!) that all data rows have (at least) this number of columns. You have to make your array at least as large as this number, but there is no way to access this number from outside the memory table. (In my case the table expected the data rows to have 121 columns but my array had a length of only 117  :( )
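
The numbers above (117 attributes, 121 expected columns) can be reproduced with a small stand-alone sketch. It does not use the RapidMiner classes; the class name, the method name and the constant value of 10 are assumptions reconstructed from this post ("grows by 11", "increment size - 1"), and it also assumes that attributes are added one at a time to a table that starts with an empty attribute list.

```java
public class ColumnGrowthSketch {
    // Assumed value: the post only says rows grow "by 11" and that the
    // private constant is one less than that effective block size.
    static final int INCREMENT = 10;

    /** Returns the row width a toy table expects after adding attributes one by one. */
    public static int columnsAfterAdding(int numberOfAttributes) {
        int columns = 0; // table was created with an empty attribute list
        for (int n = 1; n <= numberOfAttributes; n++) {
            if (n > columns) {
                // Grow in blocks instead of by a single column.
                columns = n + INCREMENT;
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        int attributes = 117;
        int columns = columnsAfterAdding(attributes);
        System.out.println(attributes + " attributes -> " + columns + " expected columns");
    }
}
```

Under these assumptions the expected width is always a multiple of 11, so 117 attributes end up with 121 expected columns, which matches the mismatch reported above.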

    As a workaround I changed the array to the size [tt]numberOfAttributes+11[/tt]. This works fine, but as the increment size (actually increment size - 1) is a private(!) constant of MemoryExampleTable, there is no guarantee that this will work in any other RM release!
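
A minimal sketch of why the padded array helps, again with plain arrays instead of the RapidMiner classes. The class and method names and the padding constant are hypothetical; the write into the array only stands in for what setting a value on a data row amounts to. Because the table already believes every row is 121 wide, adding attribute number 118 triggers no resize, and the write to index 117 goes straight into the array as it was created.

```java
public class RowPaddingSketch {
    // Mirrors the "+11" used in the workaround; an assumption, not a public API value.
    static final int ASSUMED_BLOCK = 11;

    /** Tries to write a value into the column of a newly added attribute. */
    public static boolean canStoreNewAttribute(double[] row, int newAttributeIndex) {
        try {
            row[newAttributeIndex] = 1.0d;
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            // The row is shorter than the table assumes.
            return false;
        }
    }

    public static void main(String[] args) {
        int numberOfAttributes = 117;
        int newAttributeIndex = numberOfAttributes; // 118th attribute, zero-based index 117

        double[] exactRow = new double[numberOfAttributes];
        double[] paddedRow = new double[numberOfAttributes + ASSUMED_BLOCK];

        System.out.println("exact size:  " + canStoreNewAttribute(exactRow, newAttributeIndex));
        System.out.println("padded size: " + canStoreNewAttribute(paddedRow, newAttributeIndex));
    }
}
```

The exactly sized row rejects the write while the padded row accepts it. The padding only helps as long as it is at least as large as the table's real block size, which is the fragility described above.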

    Possible solutions in the core code would be
    • to use [tt]ensureNumberOfColumns[/tt] when adding new DataRows
    • to create a public getter function for [tt]columns[/tt] in MemoryExampleTable
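
Both suggestions can be illustrated on a toy table. This is only a sketch of the idea, not a patch for MemoryExampleTable: the class [tt]PaddingTableSketch[/tt] and all of its members are hypothetical, and rows are plain double arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PaddingTableSketch {
    private final List<double[]> rows = new ArrayList<>();
    private final int columns; // width the table expects of every row

    public PaddingTableSketch(int columns) {
        this.columns = columns;
    }

    /** Second suggestion: let callers ask for the expected row width. */
    public int getColumns() {
        return columns;
    }

    /** First suggestion: widen rows that are too short at the moment they are added. */
    public void addDataRow(double[] row) {
        // Arrays.copyOf pads the new trailing cells with 0.0.
        rows.add(row.length < columns ? Arrays.copyOf(row, columns) : row);
    }

    public int rowLength(int index) {
        return rows.get(index).length;
    }

    public static void main(String[] args) {
        PaddingTableSketch table = new PaddingTableSketch(121);
        table.addDataRow(new double[117]);
        System.out.println("stored row length: " + table.rowLength(0));
    }
}
```

With either change the caller no longer needs to guess a private constant: the table widens short rows itself, or the caller sizes the array from [tt]getColumns()[/tt].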
    So. I'm sorry that I nearly wrote a novel about the problem.

    Best regards,
    chero