ModelApplier needs too much memory with high-dimensional data?

Legacy User
Legacy User New Altair Community Member
edited November 5 in Community Q&A
Hi again,


I have been experimenting with cross-validation for some time, using one of the templates that come with RapidMiner and the sparse toy data file. On the toy data, the standard XVal with a LibSVM classification learner + ModelApplier + Evaluator runs in less than 2 seconds.
Then I increased the dimensionality of the data from the current 25 features to something larger (e.g. 100000), simply by adding one additional feature with index 99999 and some value to each of my 10 sparse data vectors.
Unfortunately, the application (!) of the learned model to the test data now takes extremely long and uses an incredible amount of memory. When I do the same without RapidMiner, using a simple Perl script and the standard LibSVM implementation, the XVal is again done in seconds. Am I using the wrong ModelApplier or the wrong options?

Thank you so much,
Mome

Answers

  • land
    land New Altair Community Member
    Hi Mome,
    this might result from some internal conversions, but I'm not sure. Could you please send me the example data file and the process?

    Greetings,
      Sebastian
  • Legacy User
    Legacy User New Altair Community Member
    Sorry for the late reply; another project occupied all my time. Meanwhile, I found out that RapidMiner does indeed work very well. I found my stupid mistake:
    The SparseFormatExampleSource has a "DataManagement" parameter. When I store 1 million (very sparsely populated) attributes for thousands of samples using a double_array, I assume this leads to an extremely large (and extremely sparse) matrix. Choosing "boolean_sparse_array" instead worked well for my problem. I promise to read the operator description more carefully next time :D

    Thanks a lot
    Mome
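
For anyone who hits the same problem, the memory difference described above can be estimated with a rough back-of-the-envelope sketch. This is not how RapidMiner stores data internally. The function names, the 4-byte index size, the example count of 5000 and the 26 non-zero values per example are all assumptions chosen for illustration.

```python
# Rough memory estimate: dense double array vs. sparse storage.
# All sizes are illustrative assumptions, not RapidMiner internals.

BYTES_PER_DOUBLE = 8
BYTES_PER_INDEX = 4  # assumed 32-bit attribute index


def dense_bytes(n_examples, n_attributes):
    """Dense layout: every cell is stored, zero or not."""
    return n_examples * n_attributes * BYTES_PER_DOUBLE


def sparse_bytes(n_examples, nonzeros_per_example):
    """Sparse layout: only non-zero cells, as (index, value) pairs."""
    per_entry = BYTES_PER_INDEX + BYTES_PER_DOUBLE
    return n_examples * nonzeros_per_example * per_entry


def to_dense(sparse_row, n_attributes):
    """Expand one sparse row {index: value} into a full-length list."""
    row = [0.0] * n_attributes
    for index, value in sparse_row.items():
        row[index] = value
    return row


if __name__ == "__main__":
    n_examples = 5000           # assumed "thousands of samples"
    n_attributes = 1_000_000    # 1 million attributes, as in the post
    nonzeros = 26               # assumed non-zero values per example

    print("dense :", dense_bytes(n_examples, n_attributes), "bytes")
    print("sparse:", sparse_bytes(n_examples, nonzeros), "bytes")

    # A single extra feature at index 99999 forces a dense row of length 100000.
    row = to_dense({0: 1.0, 99999: 0.5}, 100_000)
    print("dense row length:", len(row))
```

Under these assumptions the dense layout needs about 40 GB, while the sparse layout needs under 2 MB, which matches the behaviour reported in the thread: the size of a dense array is driven by the highest attribute index, not by the number of values actually present.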