"how to develop a new algorithm in RapidMiner?"

Obaeissa
Obaeissa New Altair Community Member
edited November 2024 in Community Q&A
I have an idea for a new algorithm that I want to develop and test in RapidMiner. Should I use the extension template provided by RapidMiner, or is there another way?

Answers

  • rfuentealba
    rfuentealba New Altair Community Member
    Answer ✓
    Hello, @Obaeissa, welcome to the community.

    The RapidMiner extension template is provided so that you don't have to set up the connection to RapidMiner and import its libraries by hand. If you are proficient in Java, it is the recommended way to implement your algorithm. You can also implement it in the Apache Groovy programming language and run it through the "Execute Script" operator. However, I haven't seen much documentation about this (perhaps my good friends @mschmitz, @David_A and @land can give you some more tricks. Perhaps @IngoRM too).

    If your algorithm is something you are trying for the first time, I would recommend creating a Python implementation first (or one in whatever language you feel comfortable with), and then building a RapidMiner operator (or superoperator) based on that. At least that is what I did when I "invented" the Naïve Bayes algorithm (Yes, I did it almost 200 years after Sir Thomas Bayes, but I didn't know it until I saw my first data science books, so... sorry). If you go this route, make sure you use the Anaconda Python distribution and the Python Scripting extension, so that it is easier to test through RapidMiner.
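
    As a rough illustration of the prototype-first route, here is a minimal sketch of a categorical Naïve Bayes classifier in plain Python. It is only a stand-in for your own algorithm, chosen because it is mentioned above. The function names `fit` and `predict`, the `alpha` smoothing parameter and the toy weather data are all assumptions made for this example, and nothing in it is RapidMiner-specific.

```python
from collections import Counter, defaultdict
import math


def fit(rows, labels, alpha=1.0):
    """Count label and per-attribute value frequencies for categorical data."""
    label_counts = Counter(labels)
    # value_counts[(attribute index, label)][value] -> frequency
    value_counts = defaultdict(Counter)
    # distinct values seen per attribute, needed for Laplace smoothing
    vocab = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return {"labels": label_counts, "values": value_counts,
            "vocab": vocab, "alpha": alpha, "n": len(labels)}


def predict(model, row):
    """Return the label with the highest log-posterior for one row."""
    best_label, best_score = None, -math.inf
    for label, count in model["labels"].items():
        score = math.log(count / model["n"])  # log prior
        for i, value in enumerate(row):
            seen = model["values"][(i, label)][value]
            denom = count + model["alpha"] * len(model["vocab"][i])
            score += math.log((seen + model["alpha"]) / denom)  # smoothed likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Toy data, invented for this sketch.
    rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
    labels = ["no", "no", "yes", "yes"]
    model = fit(rows, labels)
    print(predict(model, ("rainy", "cool")))
```

    Keeping training and prediction as two small functions with plain inputs and outputs should make it easier to check the algorithm on its own before deciding how to wrap it as an operator.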

    BTW, write a paper about your algorithm. It is important to keep things as scientific as possible, not because it is a RapidMiner requirement but because data scientists like academic processes. Yes, you will hear @yyhuang saying that "a lot of academic data scientists haven't seen problems in real life", but creating an algorithm (rather than making use of it) is a totally different matter.

    Hope this helps,

    Rodrigo.

  • Obaeissa
    Obaeissa New Altair Community Member
    Thank you for the advice. Actually, I'm a Ph.D. candidate in CS, and in my research I have developed a new algorithm that has been validated theoretically and mathematically but not yet coded.  So I want to code the algorithm and validate the concept. If anyone can support me in doing so, or if any freelancer is available, that would be great, because I'm running out of time (very tight timeline for submission).
  • IngoRM
    IngoRM New Altair Community Member
    edited March 2019 Answer ✓
    To add to Rodrigo's comment: I would definitely recommend always working with the language you are already comfortable with.  If you know Java, there is simply no point in learning Python first; going straight to building a Java extension is most likely the simplest way for you.  But if you already know R or Python, or even have an implementation there already, the first step should always be to integrate that.  Just like Rodrigo has said.
    So let's assume you in fact do know Java and want to go down the extension route.  Then please use this documentation here:
    If you are already familiar with Java, Git, Gradle and your favorite IDE (IntelliJ, Eclipse), you should be able to be up and running in less than an hour...
    On the freelancing: while I would certainly be able to code this for you, I have some doubts that you would be willing to pay my daily rate for that ;) - so I hope that somebody else would step in here to help out in case you need it.
    Hope this helps,
    Ingo
  • land
    land New Altair Community Member
    Hello @Obaeissa
    If you need help implementing a RapidMiner extension in Java, we offer that on a regular basis. Please feel free to reach out via PM to discuss whether that is a viable option for you.

    Greetings,
     Sebastian
  • pallav
    pallav New Altair Community Member
    edited March 2020
    @rfuentealba @IngoRM - Sorry for reopening this discussion; I found this post while searching the community for a solution. I know Python, and I have a model and some basic preprocessing/ETL code written in Python that I now want to turn into an operator. Almost everything I have found about building operators goes through Java. How can I build an operator if I know Python but not Java?
  • BalazsBaranyRM
    BalazsBaranyRM New Altair Community Member
    Hi @pallav,

    The first step is to create a process and use the Python Scripting extension to solve your problem.

    When that process is working and you have the inputs, outputs and parameters right, you can use the Custom Operators extension to transform the process into an operator.

    Custom Operators: https://marketplace.rapidminer.com/UpdateServer/faces/product_details.xhtml?productId=rmx_process_defined_operators
    Tutorial: https://community.rapidminer.com/discussion/56338/tutorial-for-creating-custom-operators

    After building the custom operator (one or many), you create the custom extension. It will be a normal RapidMiner extension (in your case depending on the Python Scripting extension), and you can put it on Server, give it to other people and even publish it on the Marketplace if it is helpful for others.
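
    The first step above can be sketched roughly as follows: a script for the Execute Python operator of the Python Scripting extension, which calls a function named `rm_main` and passes each input as a pandas DataFrame. The column name `value`, the constants `COLUMN`, `LOWER` and `UPPER`, and the helper `rescale` are hypothetical, chosen only for illustration. How such values are bound to the parameters of a custom operator is not shown here; the tutorial linked above covers that part.

```python
# Hypothetical tunables: names and defaults are illustrative only.
COLUMN = "value"   # column to transform
LOWER = 0.0        # minimum of the target range
UPPER = 1.0        # maximum of the target range


def rescale(values, lower=LOWER, upper=UPPER):
    """Linearly map a list of numbers onto [lower, upper]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: avoid division by zero, map everything to lower.
        return [lower for _ in values]
    span = upper - lower
    return [lower + (v - lo) / (hi - lo) * span for v in values]


def rm_main(data):
    """Entry point called by Execute Python; `data` is a pandas DataFrame."""
    result = data.copy()
    # Add the rescaled values as a new column next to the original one.
    result[COLUMN + "_scaled"] = rescale(list(data[COLUMN]))
    return result
```

    Keeping the tunable values in one place at the top of the script, and the actual logic in a function that does not depend on RapidMiner, makes it easier to see which settings might later become operator parameters and to test the logic outside of a process.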

    Regards,

    Balázs
  • sgenzer
    sgenzer
    Altair Employee
    @pallav you may also want to reach out to @bhupendra_patil, as he is also building operators with Python, using a slightly different technique.
  • pallav
    pallav New Altair Community Member
    @sgenzer - Thanks a lot for the reference. @bhupendra_patil - It would be great if you could guide me. My main concern is how to define the parameters for the model I am building.