Parallel Processing - Multiple server instances

Nikouy
Nikouy New Altair Community Member
edited November 5 in Community Q&A
Dear Community,

I am currently working on my MSc research thesis, which is based on RapidMiner and cloud deployment models, so I will probably be bugging you a bit as I research and questions arise :smile:.

One of the questions I intend to answer is: Can RapidMiner support parallel processing, or have multiple instances (different servers deployed) collaboratively mining the same data set?

I would appreciate it if you could help me answer these questions and point me in the right direction to explore the various options. While the RapidMiner platform may not directly support this setup, some workarounds, custom configurations, or data mining strategies might.

Any help or feedback is very welcome.

Thanks in advance,
Nicolas

Best Answer

  • BalazsBarany
    BalazsBarany New Altair Community Member
    Answer ✓
    Hi Nicolas,

    Studio and Server share the same execution engine. Therefore, parallel processes in Studio also work in parallel on Server.

    Distributing data mining algorithms across multiple independent systems is an active research topic. For most machine learning algorithms, it's not worth the effort, as network communication latency would make the entire process far less efficient than executing a smaller number of processes on one system.
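The latency argument can be made concrete with a rough back-of-the-envelope model. This is only a sketch under stated assumptions: the function names and all timing figures below are hypothetical and not measured on RapidMiner. It assumes an iterative algorithm whose workers must synchronize over the network once per iteration.

```python
def single_machine_time(iterations: int, compute_s: float) -> float:
    """Total time when every iteration runs on one system (no network cost)."""
    return iterations * compute_s


def distributed_time(iterations: int, compute_s: float,
                     workers: int, sync_latency_s: float) -> float:
    """Total time when each iteration's compute is split evenly across
    `workers`, but every iteration also pays one network synchronization."""
    return iterations * (compute_s / workers + sync_latency_s)


if __name__ == "__main__":
    # Hypothetical figures: 10,000 iterations, 2 ms of compute per iteration,
    # 5 ms of network latency per synchronization step.
    iterations, compute_s, latency_s = 10_000, 0.002, 0.005
    print(f"single machine: {single_machine_time(iterations, compute_s):.1f} s")
    for workers in (2, 4, 8):
        t = distributed_time(iterations, compute_s, workers, latency_s)
        print(f"{workers} workers: {t:.1f} s")
```

Under these assumed numbers, the synchronization cost alone exceeds the per-iteration compute time, so adding workers cannot catch up with the single system. The picture changes when each iteration is expensive relative to the latency.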

    There are exceptions, such as building neural networks using TensorFlow with the Deep Learning extension. This can use multiple systems, with or without GPUs, to work on one problem.

    There are also related operations that are very easy to parallelize. For example, you can score large datasets in parallel with existing models on multiple servers. You can use Radoop with a Hadoop cluster to parallelize preprocessing, filtering, etc., and to build a few model types.

    Regards,

    Balázs
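The parallel scoring idea mentioned above can be illustrated with a minimal sketch. It is not RapidMiner or Radoop code: `toy_model`, `split_into_chunks`, and `score_in_parallel` are hypothetical names, and local worker threads stand in for separate servers. The point is only that scoring with an existing model needs no communication between workers, so the dataset can be partitioned and the results concatenated.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Row = Sequence[float]


def toy_model(row: Row) -> int:
    """Hypothetical pre-trained model: predicts 1 when the feature sum is positive."""
    return 1 if sum(row) > 0 else 0


def split_into_chunks(rows: List[Row], n_chunks: int) -> List[List[Row]]:
    """Partition the rows into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def score_chunk(model: Callable[[Row], int], chunk: List[Row]) -> List[int]:
    """Score one partition independently of all the others."""
    return [model(row) for row in chunk]


def score_in_parallel(model: Callable[[Row], int], rows: List[Row],
                      n_workers: int = 4) -> List[int]:
    """Score each partition on its own worker and concatenate the results.
    Executor.map returns results in input order, so row order is preserved."""
    chunks = split_into_chunks(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        per_chunk = list(pool.map(lambda c: score_chunk(model, c), chunks))
    return [pred for chunk_preds in per_chunk for pred in chunk_preds]


if __name__ == "__main__":
    data = [[1, -2], [3, 4], [0, 0], [-1, 5], [2, -3]]
    print(score_in_parallel(toy_model, data, n_workers=2))
```

In a real deployment each partition would be sent to a different server holding a copy of the model. The structure stays the same because no partition depends on another.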

Answers

  • sgenzer
    sgenzer
    Altair Employee
    Hi @Nikouy, let me see if I can help you here.

    Can Rapidminer support parallel processing
    Absolutely. This is the default mode now. You can control this in Preferences:

    [Screenshot of the RapidMiner Studio Preferences dialog; image not preserved]

    or have multiple instances (different servers deployed) mining collaboratively the same data set?
    Absolutely. Please see https://docs.rapidminer.com/latest/server/overview/
  • Nikouy
    Nikouy New Altair Community Member
    Hi Scott,

    Thank you for your reply, I apologise if I wasn't clear enough with my questions.
    By parallel processing, I mean multiple server instances or Job Agents collaboratively mining the same data set in parallel, or working on the same execution/process at the same time. Is this possible?

    Also, I see that your screenshot is from RM Studio. Do these settings affect process execution when the process is executed on RM Server, or do they only control parallel processing in a local RM Studio instance?

    Thanks,
    Nicolas
