Unable to read file from disk using Execute Python operator
Hello, I am trying to get a better understanding of how RM Server can interact with its environment. I created a test log file using an Execute Python operator within RM. I am now trying to use a different Execute Python operator to read the log file from disk (Linux) and then use a Store operator to store this data in the remote repository. All of this is running on the Linux RM Server.
What ends up happening is that RM writes an empty dataset. When I look in the server.log file I see multiple lines like this:
WARNING [com.rapidminer.operator.Operator] (scheduledprocess_1503585018370) Read CSV: Could not parse line 0 in input: com.rapidminer.tools.CSVParseException: Value quotes not closed at position 0. Last characters read: ,"
Here is my overall process:
[Images: overall process; Python code]
Is the data frame not being constructed properly? It appears that the Execute Python process is writing a temporary CSV file somewhere that RM is trying to read and is failing to do so.
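For reference, here is a minimal sketch of the kind of script described above: an Execute Python script whose `rm_main` function reads a log file into a pandas DataFrame, one row per line. The file path, the `read_log` helper, and the `log_line` column name are hypothetical. Replacing double quotes in the text is only a guess at a workaround, on the assumption that unbalanced quote characters in the log lines are what trips the CSV parser when the DataFrame is handed back to RM.

```python
import pandas as pd

# Hypothetical location of the test log file on the RM Server host.
LOG_PATH = "/tmp/test.log"


def read_log(path):
    """Return a DataFrame with one row per non-empty log line."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        lines = [line.rstrip("\r\n") for line in handle]
    # Assumption: stray double quotes are what break the CSV hand-off,
    # so swap them for single quotes before returning the data.
    cleaned = [line.replace('"', "'") for line in lines if line.strip()]
    return pd.DataFrame({"log_line": cleaned})


def rm_main():
    # Execute Python calls rm_main and converts the returned DataFrame
    # into an example set on the operator's output port.
    return read_log(LOG_PATH)
```

If the quotes turn out not to be the cause, the same helper can still be used to check what the DataFrame looks like before it leaves the script, for example by printing `read_log(LOG_PATH).head()`.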
Best Answer
-
Hello Scott, thanks for your reply. This is mainly in case I need to use a more detailed Python process for more complex data transformations, such as reading from or writing to a database within that script, and want to use Python's logging module to log to disk. There are some cases where detailed logging is necessary and RM is not going to be a good tool for doing that. I know that I can successfully log to disk from a script in an Execute Python operator, and I was finally able to read the file using the "Read Document" operator and then store it to the repository. It still seemed to me that my original approach should work, since the script returns a DataFrame object, but instead it throws a CSVParseException. Anyway, I will look at using "Read Document" instead for reading and analyzing log files in the future.
Thanks
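A minimal sketch of the logging setup described in this reply, using Python's standard `logging` module inside an Execute Python script. The log path, the `get_logger` helper, the logger name, and the placeholder where the transformation would go are all hypothetical.

```python
import logging

# Hypothetical log location; the RM Server user needs write access to it.
LOG_PATH = "/tmp/rm_python_process.log"


def get_logger(path, name="rm_process"):
    """Return a logger that appends timestamped records to `path`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Add the handler only once; the function may be called repeatedly
    # in the same interpreter.
    if not logger.handlers:
        handler = logging.FileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def rm_main(data):
    logger = get_logger(LOG_PATH)
    logger.info("Received %d rows", len(data))
    # ... more complex transformations or database calls would go here ...
    logger.info("Finished")
    return data
```

The resulting file is plain text, so it can be read back with the "Read Document" operator as described above.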
1
Answers
-
Hi @ccricha - good to have you here. I guess my first question is why are you using python scripts to read/write log files? There are very nice, easy-to-use operators built in to RapidMiner that will do this for you:
I have used these operators in RM processes running on an Ubuntu server running RM Server with no problems at all. Give it a try?
Scott
0