RapidMiner capabilities
Hi, I have a couple of questions about RapidMiner. Apologies if these have been asked before, but many of the answers seem to be from a few years ago.
- Does RapidMiner support parallel processing, or is it strictly linear? And once a process is built, can it work live and process data as it comes in, one record at a time?
- How would one run RapidMiner so that no data is stored on local machines?
- How would I interact with a web app through RapidMiner? E.g., if I have a model built elsewhere that is ready to be called, can I do this through, say, the Execute Python operator?
Thanks
Best Answer
-
Hi there, my assumption is that these questions are based on RapidMiner Server (since that is the forum we are in), as answers may vary for RapidMiner Studio.
- RapidMiner supports parallel processing for certain operators, but not all of them, so you have to check the specific operators to see how they work. The release notes describe many of the recent improvements in parallel processing and memory management. I would encourage you to look these over (focus on the release notes for the 7.x versions): https://docs.rapidminer.com/studio/releases/
- It's a little less clear what you mean by "can it work live and run data as it comes in one at a time". If you mean processing requests that arrive one record at a time via a web service, then the answer is certainly yes. If you mean database operations, it is a bit more complex: depending on exactly what you need, it can probably be set up to work this way, although that is not necessarily the most efficient approach.
- For RapidMiner Server, you can create connections to whatever external databases you want for reading and writing data. However, Server does require its own "local" database to store RapidMiner-created objects (such as datasets and models) as well as processes, logs, etc. That database can also be an external one; you would need to define that during the Server installation. So no actual datasets need to be stored locally.
- If you want to call an external model, or an external process of any sort, you can do so in several ways: through the Python or R scripting extensions, or by calling external web services or APIs via the Web Mining extension.
In short, RapidMiner Server is a very powerful platform and is almost certainly capable of handling your data science needs, although some of your requirements may call for a configuration that differs from the typical setup.
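To make the last point a bit more concrete, here is a rough sketch of what a script for the Execute Python operator could look like when the model lives behind a web app. The operator calls a function named rm_main and passes the input example set as a pandas DataFrame. Everything else here is an assumption made for illustration: the URL, the {"rows": ...} request body, the "predictions" field in the reply, and the helper names are hypothetical and would need to match whatever your web app actually exposes.

```python
import json
import urllib.request

# Hypothetical endpoint of the externally hosted model; replace with your own.
MODEL_URL = "http://localhost:5000/predict"


def build_payload(records):
    """Serialize a list of row dicts into the JSON body sent to the model."""
    return json.dumps({"rows": records}).encode("utf-8")


def http_post(url, body, timeout=10):
    """POST a JSON body and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def score_records(records, post=http_post, url=MODEL_URL):
    """Send rows to the model and return one prediction per row."""
    reply = post(url, build_payload(records))
    predictions = reply["predictions"]  # assumed name of the reply field
    if len(predictions) != len(records):
        raise ValueError("model returned a different number of predictions")
    return predictions


def rm_main(data):
    """Entry point called by Execute Python; `data` is a pandas DataFrame."""
    # Round-trip through to_json so every value is a plain JSON-friendly type.
    records = json.loads(data.to_json(orient="records"))
    data["prediction"] = score_records(records)
    return data
```

The HTTP call is passed in as a parameter (post) so the scoring logic can be tried out with a stand-in function before pointing it at a real service.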
Answers
-
Great answer Brian!
I should add that both RapidMiner Server and Studio can execute multiple processes in parallel, regardless of whether the operators inside them are parallelized. This is one of the most important features of Server!
-
Thanks for the response, that cleared a lot up for me.