Memory constraints when using a large data set
Mandar
New Altair Community Member
Hello Everyone,
I have been using RapidMiner to build models on a data set with 500K records, each with 1000 attributes. I read the data with the Read Database operator and then reduce the number of variables with the 'Remove Useless Attributes' operator. I still need to run additional operations to determine the importance of the attributes with respect to the target, but I keep running into memory constraints.
I have observed that RM reserves more memory after each operation and eventually fails with an error asking me to allocate more memory to the software. My machine has 16GB of RAM, of which I have allocated 13GB to RM, but because RM does not free the memory I keep facing the same issue. I have tried the 'Free Memory' operator, but it only frees the memory of unused objects, which does not help in my case. Is there a way to tell RM explicitly to free memory after an operation has completed? I would appreciate any help in this matter.
Thanks,
Mandar
Answers
If RapidMiner really does not free the memory, a possible solution is to split the single process into two. In the first process, you could determine the most important attributes with Remove Useless Attributes; in the second process, you could start by loading only those attributes. That will keep the memory footprint small right from the beginning.
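To make the two-pass idea concrete, here is a minimal sketch in Python rather than a RapidMiner process. It assumes the data sits in a SQL database (an in-memory SQLite database stands in for it here), and the table name `records`, the column names, and both helper functions are hypothetical. The "useless" test is a simple stand-in (a column with at most one distinct value is dropped) and is not meant to reproduce exactly what Remove Useless Attributes does.

```python
import sqlite3


def useful_columns(conn, table, columns):
    """Pass 1: return the columns that take more than one distinct value.

    Stand-in criterion only; identifiers are assumed to come from a
    trusted source, since they are interpolated into the SQL text.
    """
    keep = []
    for col in columns:
        # The database does the counting, so only one number per column
        # is transferred instead of the full column.
        (n_distinct,) = conn.execute(
            f'SELECT COUNT(DISTINCT "{col}") FROM "{table}"'
        ).fetchone()
        if n_distinct > 1:
            keep.append(col)
    return keep


def load_columns(conn, table, columns):
    """Pass 2: load only the selected columns into memory."""
    col_list = ", ".join(f'"{c}"' for c in columns)
    return conn.execute(f'SELECT {col_list} FROM "{table}"').fetchall()


# Tiny hypothetical data set: columns b and c are constant.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (a INTEGER, b INTEGER, c TEXT, target INTEGER)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [(1, 7, "x", 0), (2, 7, "x", 1), (3, 7, "x", 0)],
)

selected = useful_columns(conn, "records", ["a", "b", "c", "target"])
rows = load_columns(conn, "records", selected)
```

In RapidMiner terms, the second pass would correspond to giving Read Database a query that names only the selected attributes, so the full 1000-attribute table never has to be held in memory at once.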
Best regards,
Marius