Integration of RapidMiner in Pentaho

Question

I know that Pentaho includes Weka for data mining tasks, but I am now using RapidMiner. How can I integrate a RapidMiner process into Pentaho?
Thank you for your help.

steffen · Answer

Hello Andrea

Well ... asking "how" is a bit funny, because we are not talking about something simple here.

In general, you cannot do this in an hour with some off-the-shelf tool (at least, I do not know of one). You have to build the integration yourself.

If you are able to write a plugin for Kettle and know how to extend RapidMiner, you can figure it out on your own. Without learning these things, you will simply not be able to succeed.

Here are some hints:
* Kettle has a completely different data flow architecture than RapidMiner. In short, Kettle is designed to process millions of rows using an iterator principle (i.e. it tries to process one row at a time), whereas RapidMiner also allows this but focuses more on processing huge datasets as a whole. This means that either your RapidMiner process can handle one row at a time (nice!), or it cannot, in which case it becomes a so-called "blocking step" in Kettle, i.e. a step that has to collect all incoming rows before it can produce output. The former case generally applies to model application, the latter to nearly all data mining processes I perform on a daily basis.
* Given this, could it be an option to do all the ETL work with Kettle and apply RapidMiner afterwards (with the data stored in a database)?
* An implementation could look like this: a step in Pentaho takes a RapidMiner process (as an XML file) as an argument, initialises RapidMiner, converts the input data for this step (either one row at a time, or all rows after collecting them in the step), passes it to the process and executes it. The result is then converted back to Kettle's row format. Clearly a lot of work, hm?
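
To make the difference between the two modes a bit more concrete, here is a minimal Java sketch of the buffering logic such a step would need. It is only an illustration under assumptions: MiningStepSketch, MiningProcess, onRow and onEndOfInput are hypothetical names, not classes or methods from Kettle or RapidMiner, and rows are simplified to double[] instead of Kettle's row format. The real work (loading the process XML, initialising RapidMiner, converting rows) is left out on purpose.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch only: MiningProcess and MiningStepSketch are hypothetical stand-ins,
// not classes from Kettle or RapidMiner.
final class MiningStepSketch {

    /** Stand-in for an initialised RapidMiner process: rows in, rows out. */
    interface MiningProcess {
        List<double[]> run(List<double[]> rows);
    }

    private final MiningProcess process;
    private final boolean rowAtATime;
    private final List<double[]> buffer = new ArrayList<>();

    MiningStepSketch(MiningProcess process, boolean rowAtATime) {
        this.process = process;
        this.rowAtATime = rowAtATime;
    }

    /** Called for every incoming row; returns the rows that can be emitted now. */
    List<double[]> onRow(double[] row) {
        if (rowAtATime) {
            return process.run(Collections.singletonList(row));
        }
        buffer.add(row); // blocking mode: hold the row back until the end
        return Collections.emptyList();
    }

    /** Called once when the input is exhausted. */
    List<double[]> onEndOfInput() {
        if (rowAtATime || buffer.isEmpty()) {
            return Collections.emptyList();
        }
        List<double[]> result = process.run(new ArrayList<>(buffer));
        buffer.clear();
        return result;
    }

    /** Row-wise example (think model application): appends the row sum as a score. */
    static List<double[]> appendSum(List<double[]> rows) {
        List<double[]> out = new ArrayList<>();
        for (double[] row : rows) {
            double sum = 0.0;
            for (double v : row) {
                sum += v;
            }
            double[] scored = Arrays.copyOf(row, row.length + 1);
            scored[row.length] = sum;
            out.add(scored);
        }
        return out;
    }

    /** Whole-dataset example: subtracts each column's mean, so it needs all rows.
     *  Assumes every row has the same number of columns. */
    static List<double[]> centerColumns(List<double[]> rows) {
        List<double[]> out = new ArrayList<>();
        if (rows.isEmpty()) {
            return out;
        }
        int width = rows.get(0).length;
        double[] mean = new double[width];
        for (double[] row : rows) {
            for (int i = 0; i < width; i++) {
                mean[i] += row[i];
            }
        }
        for (int i = 0; i < width; i++) {
            mean[i] /= rows.size();
        }
        for (double[] row : rows) {
            double[] centered = new double[width];
            for (int i = 0; i < width; i++) {
                centered[i] = row[i] - mean[i];
            }
            out.add(centered);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] input = {{1.0, 2.0}, {3.0, 6.0}};

        // Row-at-a-time: every input row produces output immediately.
        MiningStepSketch streaming = new MiningStepSketch(MiningStepSketch::appendSum, true);
        for (double[] row : input) {
            for (double[] out : streaming.onRow(row)) {
                System.out.println("streaming: " + Arrays.toString(out));
            }
        }

        // Blocking: nothing is emitted until all rows have been seen.
        MiningStepSketch blocking = new MiningStepSketch(MiningStepSketch::centerColumns, false);
        for (double[] row : input) {
            blocking.onRow(row);
        }
        for (double[] out : blocking.onEndOfInput()) {
            System.out.println("blocking: " + Arrays.toString(out));
        }
    }
}
```

In a real plugin the buffer is exactly the part that hurts: a blocking step has to keep every row in memory (or spill to disk) before RapidMiner sees anything, which is one more reason to consider running RapidMiner on the database after Kettle has finished.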

happy mining coding

steffen