ModelApplier needs too much memory with high-dimensional data?
Legacy User
New Altair Community Member
Hi again,
I was playing around with the cross validation for some time, using one of the templates that come with RapidMiner and the sparse toy data file. On the toy data, the standard XVal with a LibSVM classification learner + ModelApplier + Evaluator runs in less than 2 seconds.
Then I changed the dimension of the data from the current 25 features to something larger (e.g. 100000), simply by adding one additional feature with the index 99999 and some value to each of my 10 sparse data vectors.
Unfortunately, the application (!) of the learned model to the test data now takes extremely long and uses incredible amounts of memory. When I do the same without RapidMiner, using a simple Perl script and the standard LibSVM implementation, the XVal is again done in seconds. Am I using the wrong ModelApplier or the wrong options?
Thank you so much,
Mome
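For readers who want to reproduce the change described above, here is a minimal sketch of it. It assumes LibSVM-style sparse lines (`label index:value ...`); the two sample lines, the value `1.0`, and the helper names `add_feature` and `max_index` are made up for illustration and are not taken from the toy data file.

```python
# Append one high-index feature to each sparse line ("label idx:val ...").
# The sample lines and the value 1.0 are hypothetical, for illustration only.

def add_feature(line, index=99999, value=1.0):
    """Return the sparse line with one extra index:value pair appended."""
    return f"{line.rstrip()} {index}:{value}"

def max_index(lines):
    """Largest feature index used anywhere in the data."""
    return max(int(tok.split(":")[0]) for ln in lines for tok in ln.split()[1:])

toy = ["+1 1:0.5 7:1.0 25:0.2", "-1 3:0.9 12:0.4"]
widened = [add_feature(ln) for ln in toy]

print(max_index(toy))      # 25
print(max_index(widened))  # 99999
```

The files barely grow, but any reader that sizes its columns from the largest index now sees a far wider data set.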
Answers
Hi Mome,
this might result from some internal conversions, but I'm not sure. Could you please send me the example data file and the process?
Greetings,
Sebastian
Sorry for the late reply; another project occupied all my time. Meanwhile, I found out that RapidMiner does indeed work very well. I found my stupid mistake:
The SparseFormatExampleSource has a "DataManagement" parameter. When I store 1 million attributes (a very sparse set) for thousands of samples using a double_array, I assume this leads to an extremely large (and extremely sparse) matrix. Choosing "boolean_sparse_array" instead worked well for my problem. I promise to read the operator description more carefully next time.
Thanks a lot
Mome
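A rough back-of-the-envelope sketch of why the dense setting hurts. The sample count, the number of non-zero entries per example, and the 12 bytes per sparse entry are all assumptions for illustration; the thread does not state how RapidMiner's sparse data management types are laid out internally.

```python
# Rough memory estimate: dense double array vs. sparse storage.
# All numbers are illustrative assumptions, not RapidMiner measurements.

def dense_bytes(n_examples, n_attributes, bytes_per_value=8):
    """Every cell is stored, zero or not (one 8-byte double each)."""
    return n_examples * n_attributes * bytes_per_value

def sparse_bytes(n_examples, nonzeros_per_example, bytes_per_entry=12):
    """Only non-zero cells are stored: assumed 4-byte index + 8-byte value."""
    return n_examples * nonzeros_per_example * bytes_per_entry

n_examples = 5_000        # "thousands of samples" (assumed)
n_attributes = 1_000_000  # 1 million attributes, as in the post
nonzeros = 26             # assumed non-zero entries per example

dense = dense_bytes(n_examples, n_attributes)
sparse = sparse_bytes(n_examples, nonzeros)
print(f"dense:  {dense / 1e9:.1f} GB")    # dense:  40.0 GB
print(f"sparse: {sparse / 1e6:.2f} MB")   # sparse: 1.56 MB
```

Under these assumptions the dense layout needs tens of gigabytes, while a sparse layout stays in the megabyte range, which would be consistent with the behaviour described above.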