"how to develop a new algorithm in RapidMiner?"
Obaeissa
New Altair Community Member
I have an idea for a new algorithm that I want to develop and test in RapidMiner. Should I use the extension template provided by RapidMiner, or is there another way?
Best Answers
-
Hello, @Obaeissa, welcome to the community.
The RapidMiner extension template is provided to you so that you don't have to connect with RapidMiner and import stuff from there. If you are proficient in Java, it is the most recommended way to implement your algorithm. You can also use the Apache Groovy programming language to implement it and run it as "Execute Script". However, I haven't seen much documentation about this (perhaps my good friends @mschmitz, @David_A and @land can give you some more tricks. Perhaps @IngoRM too).
If your idea for an algorithm is something you are trying for the first time, I would recommend creating a Python (or whatever language you feel comfortable with) implementation first, and then building a RapidMiner operator (or superoperator) based on that. At least that is what I did when I "invented" the Naïve Bayes algorithm (Yes, I did it almost 200 years after Sir Thomas Bayes, but I didn't know it until I saw my first data science books, so... sorry). If you go this route, make sure you use the Anaconda Python distribution and the Python Scripting extension, so that it is easier to test through RapidMiner.
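As a rough illustration of the prototype-first route, here is a minimal sketch that keeps the algorithm in plain Python functions and adds a thin `rm_main` wrapper, which is the entry point the Python Scripting extension calls with a pandas DataFrame. The categorical Naïve Bayes is only a stand-in for your own algorithm, and the column name `label` and the helper names `fit` and `predict` are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def fit(rows, labels, alpha=1.0):
    """Train a categorical naive Bayes model with Laplace smoothing `alpha` (> 0)."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    # Distinct values seen per feature, needed for the smoothing denominator.
    vocab = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return {"classes": class_counts, "values": value_counts,
            "vocab": vocab, "alpha": alpha, "n": len(labels)}


def predict(model, row):
    """Return the label with the highest log-posterior for one row."""
    best_label, best_score = None, -math.inf
    for label, count in model["classes"].items():
        score = math.log(count / model["n"])
        for i, value in enumerate(row):
            seen = model["values"][(i, label)][value]
            denom = count + model["alpha"] * len(model["vocab"][i])
            score += math.log((seen + model["alpha"]) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def rm_main(data):
    """Wrapper for the Python Scripting extension: DataFrame in, DataFrame out.

    Assumes a hypothetical column named 'label'; every other column is treated
    as a categorical feature. Predictions are made on the training rows only,
    which is enough to check that the wiring works.
    """
    features = [c for c in data.columns if c != "label"]
    rows = data[features].values.tolist()
    model = fit(rows, data["label"].tolist())
    data["prediction"] = [predict(model, r) for r in rows]
    return data
```

Because `fit` and `predict` do not depend on RapidMiner or pandas, they can be tested in any Python environment before the script is pasted into a process.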
BTW, write a paper about your algorithm. It is important to keep things as scientific as possible, not because it is a RapidMiner requirement but because data scientists like academic processes. Yes, you will hear @yyhuang saying that "a lot of academic data scientists haven't seen problems in real life", but creating an algorithm (rather than making use of it) is a totally different matter.
Hope this helps,
Rodrigo.
-
Hi @Obaeissa,
To add to Rodrigo's comment: I would definitely recommend always working with the language you are already comfortable with. If you know Java, there is simply no point in learning Python first; going straight to building a Java extension is most likely the simplest way for you. But if you already know R or Python, or even have an implementation there already, the first step should always be to integrate that. Just like Rodrigo has said.
So let's assume you do in fact know Java and want to go down the extension route. Then please use this documentation here:
If you are familiar with Java, Git, Gradle and your favorite IDE (IntelliJ, Eclipse) already, you should be able to be up and running in less than an hour...
On the freelancing: while I would certainly be able to code this for you, I have some doubts that you would be willing to pay my daily rate for that - so I hope that somebody else will step in here to help out in case you need it.
Hope this helps,
Ingo
Answers
-
Thank you for the advice. Actually, I'm a Ph.D. candidate in CS, and in my research I have developed a new algorithm that has been tested theoretically and mathematically but not yet coded. So I want to code the algorithm and validate the concept. If anyone, or any freelancer, can support me in doing so, that would be great because I'm running out of time (very tight timeline for submission).
-
@rfuentealba @IngoRM - Sorry for reopening this discussion; I was going through the community looking for a solution and found this post. My concern is this: I know Python, and I have some models and some basic preprocessing/ETL code written in Python. Now I want to make an operator for that. I have looked at how to make your own operator, and almost everywhere the way shown is via Java. How can I make an operator if I know Python but not Java?
-
Hi @pallav,
the first step is to create a process and use the Python Scripting extension to solve your problem.
When that process is working and you got the inputs, outputs and parameters right, you can use the Custom Operators extension to transform the process into an operator.
Custom Operators: https://marketplace.rapidminer.com/UpdateServer/faces/product_details.xhtml?productId=rmx_process_defined_operators
Tutorial: https://community.rapidminer.com/discussion/56338/tutorial-for-creating-custom-operators
After building the custom operator (one or many), you create the custom extension. It will be a normal RapidMiner extension (in your case depending on the Python Scripting extension), and you can put it on Server, give it to other people and even publish it on the Marketplace if it is helpful for others.
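To make "inputs, outputs and parameters" concrete, here is a minimal sketch of a preprocessing script laid out so that each of the three is explicit before the process is turned into an operator. The `PARAMS` dictionary, the column names and the helpers `min_max_scale` and `transform` are hypothetical names for this example; how such values are exposed as operator parameters depends on the Custom Operators extension and is not shown here.

```python
# Hypothetical tunables. In a real process these would be supplied through
# whatever parameter mechanism the custom operator exposes (an assumption).
PARAMS = {"columns": ["age", "income"], "lower": 0.0, "upper": 1.0}


def min_max_scale(values, lower=0.0, upper=1.0):
    """Rescale a list of numbers linearly into [lower, upper]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: map everything to the lower bound
        # instead of dividing by zero.
        return [lower for _ in values]
    span = upper - lower
    return [lower + (v - lo) / (hi - lo) * span for v in values]


def transform(table, params):
    """Pure function: dict of column name -> list in, a new dict out."""
    out = dict(table)
    for name in params["columns"]:
        out[name] = min_max_scale(table[name], params["lower"], params["upper"])
    return out


def rm_main(data):
    """Input: one pandas DataFrame. Output: the same DataFrame, rescaled."""
    table = {c: data[c].tolist() for c in data.columns}
    scaled = transform(table, PARAMS)
    for name in PARAMS["columns"]:
        data[name] = scaled[name]
    return data
```

Keeping every tunable value in one place makes it easier to see which settings the finished operator would need to expose.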
Regards,
Balázs
-
@pallav you may also want to reach out to @bhupendra_patil, as he is also building operators with Python, using a slightly different technique.
-
@sgenzer - Thanks a lot for the reference. @bhupendra_patil - It would be great if you could guide me. My major concern is how to define the parameters for the model I am making.