Which operator allows me to use all four logical processors for parallel computing?
jsramirezgo
New Altair Community Member
I got a RapidMiner Enterprise Medium license (which lets me use four logical processors); however, I don't know how to use it. That is, I don't know which operator in RapidMiner lets me run parallel computations on more than one logical processor.
I would really appreciate the help, because this is part of an academic research project and I urgently need help to continue my experiment.
Thanks!
Answers
Hi @jsramirezgo, most of the main modeling operators (and many of the ETL ones) are already optimized for parallelization, and there is literally nothing you need to do. When you run processes (or Auto Model), you should see all your processors kick in. If you do NOT see them kick in, please let us know.
Scott
Hi sgenzer, thanks a lot for your reply.
I have two questions about your reply:
1. How can I see whether all the processors kick in?
2. Can the "Execute Process" operator, which runs multiple processes, be considered a parallel computing operator?
Thanks!
Hi @jsramirezgo,
1. How you check CPU usage really depends on your computer. On my Mac I look at Activity Monitor. I think there are more technical ways to do this, but I'll let others who know more chime in. Maybe @Telcontar120? @rfuentealba? @lionelderkrikor?
2. The "Execute Process" operator simply executes another RapidMiner process from within the current one. If that process is optimized for parallelization, it will run in parallel.
Hope that helps?
Scott
Hi @jsramirezgo,
To answer Scott's question: I'm using Windows as my OS, and I use a Rainmeter skin on my desktop that displays the CPU usage (% Core 1, % Core 2, % Core 3, etc.) and the RAM usage in %.
Regards,
Lionel
You can also use the Log operator to capture CPU execution time and memory utilization.
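For readers who also script steps outside RapidMiner, the same kind of logging can be sketched in plain Python with the standard library. This is only an analogy, not the Log operator, and `log_step` is a hypothetical helper. It records wall-clock time, CPU time of the current Python process (`time.process_time`, which sums all its threads but not child processes), and peak memory allocated by Python objects (`tracemalloc`, which does not see memory used by native libraries). A CPU time noticeably larger than the wall time suggests the work ran on more than one core.

```python
import time
import tracemalloc


def log_step(name, func, *args, **kwargs):
    """Run func(*args, **kwargs) and return (result, record).

    Hypothetical helper: a plain-Python analogue of logging execution time
    and memory for one step; it is NOT RapidMiner's Log operator.
    """
    tracemalloc.start()  # note: stop() below also ends any tracing already active
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of this process, all threads
    try:
        result = func(*args, **kwargs)
    finally:
        cpu_s = time.process_time() - cpu_start
        wall_s = time.perf_counter() - wall_start
        # Peak bytes allocated by Python objects while tracing was on.
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    record = {"step": name, "wall_s": wall_s, "cpu_s": cpu_s, "peak_mem_bytes": peak_bytes}
    return result, record


if __name__ == "__main__":
    total, rec = log_step("sum_of_squares", lambda n: sum(i * i for i in range(n)), 100_000)
    print(rec)
```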
🤦♂️ #occamsrazor thx @Telcontar120 @lionelderkrikor
Hello guys.
Thanks a lot for your replies. I figured out how to check the processors, thank you!
Regarding Scott's answer, I have one more question that should finally resolve this:
How can I tell whether a process is optimized for parallelization?
Thanks!
Oh, that's easier: just change the parallel execution setting (the number of worker threads, in Preferences) and compare the execution times.
Excellent information, Scott. That answers my questions.
Thank you very much!
Sorry Scott, I realised it didn't work for me. In theory I understood what you said, and I set up parallel execution in Preferences. My hardware has 8 logical processors; however, when I ran my test with "worker threads for active process=0" I got a certain execution time, and when I tested with "worker threads for active process=4" I got the same execution time.
I thought the idea of parallel computing was to reduce time, but both scenarios had the same execution time. What went wrong? Do I need a specific operator?
I really appreciate your help. Sorry for being persistent; I just want to clear up my doubts about RM.
Hi @jsramirezgo!
Setting the preference to 0 means that all available/allowed processors are used. If you want to compare multiple cores against one core, set the preference to 0 (or 4) and then to 1.
Also, the loop operators that are parallelized have an "enable parallel execution" option, which lets you decide whether to execute the loop iterations in parallel or not.
Hope this helps!
Jan
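RapidMiner handles this internally, so the snippet below is only a plain-Python analogy of the experiment Jan suggests: run the same CPU-bound workload once with a single worker and once with all cores, then compare wall-clock times. The `resolve_workers` helper is hypothetical; it just mirrors the "0 means all available processors" convention described above, and `cpu_bound_task` is a toy workload.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor


def resolve_workers(setting):
    """Hypothetical helper mirroring the convention described above:
    0 means 'use all available cores', any positive number is taken as-is."""
    if setting < 0:
        raise ValueError("setting must be >= 0")
    if setting == 0:
        return os.cpu_count() or 1  # cpu_count() may return None
    return setting


def cpu_bound_task(n):
    """Toy CPU-heavy workload: sum of squares below n."""
    return sum(i * i for i in range(n))


def timed_run(setting, tasks):
    """Run all tasks with the resolved number of worker processes and
    return (results, elapsed wall-clock seconds)."""
    workers = resolve_workers(setting)
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(cpu_bound_task, tasks))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    tasks = [1_000_000] * 8
    _, one = timed_run(1, tasks)   # single worker
    _, all_ = timed_run(0, tasks)  # all available cores
    print(f"1 worker: {one:.2f}s | all cores: {all_:.2f}s")
```

Processes are used rather than threads because CPython's global interpreter lock keeps pure-Python CPU-bound threads from running in parallel. Comparing 0 against 1 (rather than 0 against 4) gives the clearest contrast, which is the point Jan makes above.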
Hey Jan!
Thanks a lot! Quite clear. Questions solved.
Regards.
Hello,
I'm late to this thread, so I'll add a few things that relate less to the parallel execution of operators and more to the parallel execution of processes.
I have a MacBook Pro for RapidMiner Studio, and I don't really care about parallel execution on it. However, for my world domination projects, I use 4 MacBook Pros with 12-core i9-9900 processors, each running an agent configured to handle up to 11 parallel tasks. If you have a setup like this, use Nagios. It uses the SNMP protocol to monitor the status of the machines, and due to the nature of that protocol, it doesn't consume much network throughput.
Note that I'm not running one task in parallel across all these computers (is that even feasible? I badly need it), but rather many different queued processes. More often than not, when I need this kind of power, I split my processes and use the Schedule Process operator to cascade them, or I send data to an API exposed through RapidMiner Server.
A real case: let's say I have 32,000 pages from a website to which I need to apply NLP. I convert these to examples, run a Loop Examples, pass the data for each page in a POST request to the API, and finish the process. This creates 32,000 requests to RapidMiner Server, which are handled by 44 processes. In my last development project, 42 tasks served by all 4 computers processed nearly 1,200 pages per minute, so the whole job took only about 30 minutes. On my good old MacBook Air, the same task took 7 hours.
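The arithmetic behind these numbers can be checked quickly. 4 machines with up to 11 tasks each give the 44 parallel slots, and 32,000 pages at about 1,200 pages per minute take roughly 27 minutes, in line with "only about 30 minutes". Against the 7 hours on the MacBook Air, that is a speedup of roughly 14x (using the rounded 30 minutes) to about 16x (using the computed 27). The helper names are just for illustration.

```python
def minutes_needed(pages, pages_per_minute):
    """Time to process `pages` at a sustained throughput."""
    return pages / pages_per_minute


def speedup(baseline_minutes, parallel_minutes):
    """How many times faster the parallel run is than the baseline."""
    return baseline_minutes / parallel_minutes


# Figures quoted in the post above.
slots = 4 * 11                               # 4 machines x up to 11 parallel tasks
cluster_min = minutes_needed(32_000, 1_200)  # roughly 26.7 minutes
laptop_min = 7 * 60                          # 7 hours on the MacBook Air

if __name__ == "__main__":
    print(slots, round(cluster_min, 1), round(speedup(laptop_min, cluster_min), 1))
```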
If anyone has a better suggestion for me, I'm all ears.
Too bad the MacBooks aren't mine
Just my two cents.
All the best,
Rodrigo.
Holy cow, @rfuentealba, looks like you have quite the rig!
Hi @rfuentealba,
Without knowing much about the use case, I would try to reduce the number of calls to the server. Each call generates a tremendous overhead (if the number of calls remains large, consider using the scoring agent).
One option is to do the web crawling in a separate process (possibly with an external tool), save the pages to a file or to the repository, and then have RM Server process the files/dataset in one or more scheduled processes.
Let me know if this helps; if you tell us more, maybe we can come up with more ideas.
Regards,
Sebastian
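One simple way to cut the number of server calls, in the spirit of this suggestion, is batching: send pages in chunks instead of one POST per page. With a batch size of 500, the 32,000 pages above need 64 calls instead of 32,000. This is a plain-Python sketch; the `batch` and `requests_needed` helpers, the placeholder page IDs, and the batch size of 500 are assumptions for illustration, and the right size depends on payload limits and per-call overhead on the server.

```python
import math


def batch(items, batch_size):
    """Split items into consecutive chunks of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def requests_needed(n_items, batch_size):
    """Number of server calls if each call carries one batch."""
    return math.ceil(n_items / batch_size)


if __name__ == "__main__":
    pages = [f"page-{i}" for i in range(32_000)]  # placeholder page IDs
    chunks = batch(pages, 500)                    # batch size is an assumption
    print(len(chunks), requests_needed(len(pages), 500))
```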
Hi @jsramirezgo,
IMHO the parallelism inside a given process is handled quite well by RapidMiner; you don't need to do anything (that is a great advantage compared to doing data science in a programming language). The kind of parallelism that would be most useful to you is running processes at the same time. Did you know that you can run processes in the background?
That way you can keep working while your experiments run, pretty neat!
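RapidMiner Studio offers background execution as a built-in feature, so no code is needed there. Purely as an analogy for readers who also work in Python, the same "submit now, collect later" pattern looks like this with the standard library's `concurrent.futures`. `run_experiment` and `start_in_background` are hypothetical names, and each experiment is simulated with a sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_experiment(name, seconds):
    """Hypothetical stand-in for a long-running experiment: it just waits."""
    time.sleep(seconds)
    return name


def start_in_background(jobs, max_workers=2):
    """Submit (name, seconds) jobs without waiting for them to finish.

    Returns the executor and the futures so the caller can keep working
    and collect results later.
    """
    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = [pool.submit(run_experiment, name, secs) for name, secs in jobs]
    return pool, futures


if __name__ == "__main__":
    pool, futures = start_in_background([("exp-a", 1.0), ("exp-b", 1.0)])
    print("Experiments submitted; doing other work in the meantime...")
    print([f.result() for f in futures])  # blocks only when results are needed
    pool.shutdown()
```

Threads are fine here because the simulated work only waits; CPU-heavy experiments written in pure Python would typically go to a process pool instead.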