Running concurrent processes
weslee3
New Altair Community Member
Can anyone confirm or deny that RM 5.3 is capable of processing multiple processes concurrently?
I am integrating RM in a Java project and am having an issue when attempting to run multiple RM Processes concurrently. Here is an overview of what I am doing:
1. Main thread: initialize RapidMiner
2. Create a runnable that executes a RM Process via a File
3. Submit 20 instances of this runnable to a thread pool
4. The 20 runs return different results in their IOContainers <-- THIS IS THE PROBLEM
NOTES:
- The Process contains only static data.
- The Process produces the same result on every run when running it serially.
- The Process uses the Series extension (rmx_series-5.3.0.jar)
- The Runnables do not share any data.
"a warning: Be careful with static references. RM 6 will probably allow multiple processes opened, and RapidAnalytics certainly does."
http://rapid-i.com/rapidforum/index.php/topic,2917.msg11719.html#msg11719
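For anyone trying to reproduce this, steps 2 to 4 can be sketched roughly as below. This is only a minimal harness, not the original project code: `ConcurrencyCheck`, `distinctResults`, and the stand-in task are hypothetical names, and the stand-in is a plain deterministic computation rather than a real RM Process. The idea is to replace the stand-in with whatever code runs the Process and turns its IOContainer into a comparable string, then count how many distinct results come back.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ConcurrencyCheck {

    // Runs `task` `runs` times on a pool of `threads` threads and
    // returns the set of distinct results that came back.
    static Set<String> distinctResults(Callable<String> task, int runs, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < runs; i++) {
                futures.add(pool.submit(task));
            }
            Set<String> distinct = new HashSet<>();
            for (Future<String> f : futures) {
                distinct.add(f.get()); // blocks until that run has finished
            }
            return distinct;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for "execute the RM Process and summarize
        // its IOContainer as a string". It is deterministic and shares no state.
        Callable<String> standIn = () -> {
            long sum = 0;
            for (int i = 1; i <= 1000; i++) {
                sum += i;
            }
            return Long.toString(sum);
        };

        Set<String> distinct = distinctResults(standIn, 20, 4);
        System.out.println("distinct results: " + distinct.size());
    }
}
```

With a task that is truly independent per run, the set should contain exactly one entry. More than one entry for a process that is reproducible serially points at state shared between runs.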
Answers
Hi,
RapidMiner 5.3 is not capable of that.
Regards,
Marco
Ah.. I was afraid that was going to be the case. Same problem, same basic setup, same symptoms. Very confusing. Many late nights getting different ExampleSets each run. Traumatizing.
Most of the new frontiers in training systems (like finding interesting bits in huge problem spaces) depend on how ubiquitous parallel processing has become.
To really access RM's power on machines with 4-16+ cores, when working on embarrassingly parallel problems, clean process separation seems an absolute necessity. I haven't dug deep enough into the code to see exactly where objects (iterators, RNGs, etc?) are being shared (assuming that is the issue), but I would like to express how useful it would be to clean up those dependencies in a future release of RapidMiner.
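To make the suspicion concrete, here is a toy sketch of how one shared object can produce exactly these symptoms. It does not use any RapidMiner code, and I am not claiming this is where the problem actually is: `SharedStateDemo`, the static `SHARED` generator, and the seed 42 are all made up for illustration. Each "run" reseeds a generator and sums some draws. With a shared static generator the result is reproducible serially but can change when runs overlap; with a generator per run it stays stable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedStateDemo {

    // Hypothetical shared state: a single generator used by every run.
    static final Random SHARED = new Random();

    // Reseeds the shared generator, then draws from it. Reproducible when
    // runs are serial; when runs overlap, other threads' reseeds and draws
    // interleave with this one, so the sum can differ.
    static long runWithSharedRng() {
        SHARED.setSeed(42L);
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += SHARED.nextInt(1000);
        }
        return sum;
    }

    // Same computation, but each run owns its generator.
    static long runWithOwnRng() {
        Random local = new Random(42L);
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += local.nextInt(1000);
        }
        return sum;
    }

    // Submits `task` `runs` times to a fixed pool and collects the results.
    static List<Long> runConcurrently(Callable<Long> task, int runs, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < runs; i++) {
                futures.add(pool.submit(task));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("serial, shared RNG:     "
                + runWithSharedRng() + ", " + runWithSharedRng());
        System.out.println("concurrent, shared RNG: "
                + runConcurrently(SharedStateDemo::runWithSharedRng, 20, 4));
        System.out.println("concurrent, own RNG:    "
                + runConcurrently(SharedStateDemo::runWithOwnRng, 20, 4));
    }
}
```

The concurrent shared-RNG line is timing dependent, so how many of its values differ will vary from machine to machine. The other two lines should show the same number throughout.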
Think of me at 2 AM, grasping for that breakthrough, the sense of achievement and pride welling up within me as the points of data plot a beautiful line... and having to fall asleep hurt, distraught and confused in the fetal position. Why RapidMiner, why?! I thought you were my friend!
Ahem. Anyway... right now I have to do some evil things, like starting up separate JVMs depending on how many cores are on the machine. It's cave-man threading. Ungabunga! Somehow I know we can do better!
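For the curious, the separate-JVM workaround looks roughly like the sketch below. Everything named here is hypothetical: `JvmLauncher`, the worker main class `RunOneProcess`, and the file names are placeholders, and the worker class (which would initialize RapidMiner and run one process file) is not shown. Note that `Process` in this code is `java.lang.Process`, an operating-system child process, not the RapidMiner class of the same name.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class JvmLauncher {

    // Builds the command line for one child JVM, reusing the java binary
    // and the classpath of the JVM we are currently running in.
    static List<String> buildCommand(String mainClass, String processFile) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        String classpath = System.getProperty("java.class.path");
        return new ArrayList<>(
                Arrays.asList(javaBin, "-cp", classpath, mainClass, processFile));
    }

    // Starts one child JVM per file, waits for all of them, and returns
    // how many exited with a non-zero status.
    static int runBatch(String mainClass, List<String> processFiles)
            throws IOException, InterruptedException {
        List<Process> children = new ArrayList<>();
        for (String file : processFiles) {
            ProcessBuilder pb = new ProcessBuilder(buildCommand(mainClass, file));
            pb.inheritIO(); // child output goes to our console
            children.add(pb.start());
        }
        int failures = 0;
        for (Process child : children) {
            if (child.waitFor() != 0) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // One child JVM per core is the "cave-man" sizing rule.
        int workers = Runtime.getRuntime().availableProcessors();
        System.out.println("would launch up to " + workers + " JVMs per batch");
        // "RunOneProcess" and "process-1.rmp" are placeholders, so this
        // only prints the command instead of starting it.
        System.out.println(buildCommand("RunOneProcess", "process-1.rmp"));
    }
}
```

Each child pays the full JVM and RapidMiner startup cost and results have to come back through files or stdout, which is the price of getting clean separation this way.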
If you have any ideas on where the problem(s) might be in more concrete terms, I am certainly all ears and would happily dive into the code to try and help solve the mystery...
Hi,
The RapidMiner 5 design was done quite some years ago, at a time when multiple processes and multi-threading in general were not yet that big a deal (sadly).
We will, however, rectify these shortcomings in the future.
Regards,
Marco