How to read data from local repository?
DDresen
New Altair Community Member
Hey there,
I'm using the 'store' operator to store a csv file in my local repository. Everything works fine, and when I use the 'retrieve' operator it can access the created file just as it is supposed to. But when I use the 'execute python' operator, where the code tries to read the .csv using the path I copied from the file in my local repository via 'copy location to clipboard', it says that the file doesn't exist. How can I access the file from the 'execute python' operator?
If I can access it just by connecting the 'retrieve' operator to 'execute python', how do I refer to the output of 'retrieve' in the python code?
Best Answer
-
Hi @DDresen,

If I'm reading your question right, all you need to do is wire the output of your Retrieve operator into the first inp port of Execute Python.

Next, the signature of the mandatory rm_main function has to match the number of wired input ports. The inputs are passed to rm_main as its parameters, in the same order as the ports. They arrive as pandas DataFrames, so you can work with them inside rm_main on that assumption.

I attached a very rudimentary process that should demonstrate this.

Let me know if this helps.
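To make the wiring concrete, here is a minimal sketch of an Execute Python script with one wired input port. The parameter name `data` and the added `row_number` column are only illustrative assumptions, not part of the attached process.

```python
import pandas as pd


def rm_main(data):
    # `data` is the pandas DataFrame coming from the first wired input port,
    # i.e. the output of Retrieve in this setup.
    print(data.shape)

    # Work on a copy so the incoming frame is left untouched.
    result = data.copy()

    # Illustrative transformation only: add a running row number.
    result["row_number"] = range(len(result))

    # Whatever is returned is delivered at the operator's output port.
    return result
```

With two wired input ports the signature would take two parameters instead, for example `rm_main(data1, data2)`.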
Answers
-
Hi @DDresen,

Historically (before RM 9.7), RapidMiner used its own serialization format (.ioo). It is implemented in Java, and it was not possible, or at least not easy, to read those files with other systems or programming languages. You would need to use a Write CSV operator and then read the csv.

This changed with 9.7. In 9.7 we switched our serialization format for tables from .ioo to .rmhdf5tables. HDF5 is a standard file format, and we adopted it to make exactly your use case easier.

The way we store data in HDF5 is a bit different from what pandas expects. So, if I remember correctly, you need to use our own Python lib for this.

@tkenez can you help me with the details here?

~Martin
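For the Write CSV route mentioned above, a rough sketch of the Python side could look like this. The path is a hypothetical placeholder, and the semicolon separator is an assumption that should be adjusted to whatever the Write CSV operator was actually configured with.

```python
import pandas as pd

# Hypothetical location of the file produced by Write CSV; replace it with
# a real file system path (not a repository location).
EXPORT_PATH = "/path/to/exported.csv"


def load_exported_csv(path, sep=";"):
    # The separator is an assumption; match it to the Write CSV settings.
    return pd.read_csv(path, sep=sep)


def rm_main():
    # No input ports are wired here, so rm_main takes no parameters.
    return load_exported_csv(EXPORT_PATH)
```

The point of this variant is that the script reads a plain file from disk rather than a repository entry, which is why it needs a real file system path.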