RapidMiner 9.3 Python Notebooks
jacobcybulski
According to the release notes, RM 9.3 has better integration with Python, including integration with Jupyter notebooks (apparently through a new operator) to "seamlessly execute notebooks". As far as I can tell, this still uses the old "Execute Python" operator, which still requires rm_main. Is there any documentation or are there examples of the new API features, e.g. which Python package needs to be installed to call RM functions?
Jacob
Answers
-
Hi Jacob,
The new Python library is available on GitHub: https://github.com/rapidminer/python-rapidminer
Execute Python can now use an .ipynb file besides a .py file.
Yes, it still expects the rm_main method, which makes input and output data handling possible. We considered at least six alternative approaches, e.g. letting the user choose a name for each variable that holds the input data. All of them had their own drawbacks (e.g. the need to modify the script), so we kept the rm_main approach. It works quite well in a notebook once you get used to it: having this extra function does not affect how you use the same file in Jupyter. Note that you can use cell tagging to ignore or include cells when running Execute Python. And with the rapidminer Python library, you can get data from the repository directly while you are developing your code in Jupyter or a Python IDE.
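To illustrate the rm_main convention, here is a minimal sketch of a notebook cell or script that Execute Python could call. It assumes the operator passes the input as a pandas DataFrame and expects a DataFrame back; the `row_sum` column and the sample data are hypothetical and only there for illustration.

```python
import pandas as pd


def rm_main(data):
    """Entry point looked up by Execute Python; `data` is assumed to be a pandas DataFrame."""
    result = data.copy()  # leave the input untouched
    # Hypothetical transformation: row-wise sum over the numeric columns.
    result["row_sum"] = result.select_dtypes("number").sum(axis=1)
    return result


# Ad hoc check for interactive development in Jupyter or an IDE.
if __name__ == "__main__":
    sample = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5], "label": ["x", "y"]})
    print(rm_main(sample))
```

Because rm_main is an ordinary function, the same file can be run and tested by hand in Jupyter without any RapidMiner-specific setup.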
We'll publish some guidance, best practices and also update docs.rapidminer.com.
Stay tuned!
Feel free to let us know how you would use these features.
Best,
Peter
-
@phellinger Thanks Peter, it all worked well. I have managed to run the notebook in RM and to connect to RM from Python, which also worked well. I have only noticed that, while the connector is passed around between different calls to RM, it does not continue the RM session but seems to restart every time, negotiating the licenses and loading all extensions, so each separate call takes ages. Is there a way to fix this?
Jacob
P.S. I imagine that in general I'd be doing the calls from RM to Python.
-
@jacobcybulski
Great!
Calling Studio from Python has this overhead: it always starts a new session. It suits batch-like execution better than interactive use. Performing multiple operations in the same session is a feature we are considering.
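Given that each Studio call starts a session, one way to limit the cost from the Python side is to read each repository entry only once and reuse the result. The sketch below is an illustration, not part of the library: `load_once` is a hypothetical helper, and it assumes the connector object exposes a `read_resource(path)` method as described in the python-rapidminer README.

```python
def load_once(connector, paths):
    """Read each repository path a single time and return {path: data}.

    `connector` is assumed to offer read_resource(path); duplicate paths
    are skipped so no extra Studio session is started for them.
    """
    cache = {}
    for path in paths:
        if path not in cache:
            cache[path] = connector.read_resource(path)
    return cache


# Hypothetical usage (needs a local Studio installation; the constructor
# argument and the repository path are assumptions):
# import rapidminer
# connector = rapidminer.Studio("/path/to/studio")
# frames = load_once(connector, ["//Samples/data/Iris"])
```

This does not remove the startup cost of a call, it only avoids paying it more than once per repository entry.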
Using the Server class with a Server repository directly from Python, on the other hand, is very fast. For now we have focused on supporting collaboration there.
I am glad that you generally work in RM and call Python from there. You will most likely only need the rapidminer Python package if you are developing more complicated code in a coding tool and then want to use that code in an RM process.
Best,
Peter