Prediction Difference between RM GUI Run and Native Java Run
Danyo83
New Altair Community Member
Hi Everybody,
I built a classification process that applies a Forward Selection with a Simple Validation. To get the predictions for the classified data, I saved the attribute weights to a file. Then I apply the very same model to the separate test data (the same test data as in the first FS process) to get the prediction labels. By the way, it would be a lot easier if one could access the prediction labels directly after the feature selection process; maybe you could add this too?
Anyway, I use CSV files as input data. Since I want to execute the very same process sequentially on all the CSV files in a folder, which all have the same structure, I wrote a simple Java program that runs a separate process with the model described above for each of the CSV files. I use the rapidminer.jar library to instantiate the processes from the model file, so I do not need to call the GUI.
However, the results of the GUI process and the Java process differ, although the same model and CSV files are used. I found out that running the GUI first and then the Java process multiple times leads to the same results. However, if I run another instance in the GUI, the RapidMiner library seems to fail, and all subsequent processes produce wrong classification predictions in the Java program. To sum up: when I run 10 files with the same process, only one will produce the same (correct) result as the GUI; the rest will be wrong. When I run the model on another CSV file in the GUI and then restart the process in Java, the only correct result is the one for the file that was executed in the GUI right before.
Shall I upload something?
I hope you have some explanation and advice
Thanks a lot in advance
Daniel
PS:
Yes, I initialize RapidMiner before each run:
RapidMiner.init();
I start each process by the model file:
Process p = new Process(new File(filename));
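The batch loop described in the post could look roughly like the following. This is only a sketch in plain Java: the class `CsvBatchRunner` and its methods are hypothetical names, and the RapidMiner call is left as a callback because the actual process setup is not shown in the post. It lists the CSV files of a folder in a stable order and hands each one to the callback.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

public class CsvBatchRunner {

    /** Returns all *.csv files directly inside dir, sorted by path for a stable run order. */
    static List<Path> listCsvFiles(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.csv")) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        Collections.sort(files);
        return files;
    }

    /** Calls runOne once per CSV file and returns how many files were handled. */
    static int runAll(Path dir, Consumer<Path> runOne) throws IOException {
        int count = 0;
        for (Path csv : listCsvFiles(dir)) {
            runOne.accept(csv);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        // Assumption: in the real program this callback would create the
        // process from the model file and feed it the given CSV file.
        int n = runAll(dir, csv -> System.out.println("would run process on " + csv));
        System.out.println(n + " file(s) handled");
    }
}
```

Sorting the file list keeps the run order the same from one start to the next, which makes it easier to compare the output of the Java runs against the GUI results file by file.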
Answers
Hi,
"By the way, it would be a lot easier, if one could access the prediction label directly after the Feature selection process, maybe you can fix this too???"
Sorry, I didn't get this. Maybe I would understand if I saw the process, but right now I have no idea what could be optimized here.
"Anyway, I use CSV files as Input data. Since I wanna execute the very same process sequentially to all my CSV files in a folder, which have the same structure, I built a simple JAVA program which runs a separate process with the above described model and each of the CSV files."
Or you could simply have used the operator "Loop Files" and executed the process on each file. That would have been a matter of 20 seconds instead.
"To sum up If when I run 10 files with the same process only one will produce the same (correct) result as of the GUI, the rest will be wrong. When I run the model with another CSV file with the GUI and then restart the process in JAVA, the only result which will be correct is the one which was executed by the GUI right before."
Although I must admit that I do not really believe this, since you invoke init() before each run, let's just rule this one out: to me this sounds as if the same global random number generator was used for all iterations, and hence the results differ. If this is indeed the case, please use a local random seed for all randomized operators and check whether this still happens.
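To illustrate the random-seed hypothesis, here is a minimal plain-Java sketch. It does not use RapidMiner at all: `SeedDemo`, `drawShared` and `drawLocal` are hypothetical names, and the seed values are arbitrary. It only shows the general effect that a generator shared across runs hands out a different part of its sequence to each run, while a generator created from a fixed local seed gives every run the same numbers.

```java
import java.util.Random;

public class SeedDemo {

    // One generator shared by every run in the JVM, standing in for a global RNG.
    static final Random SHARED = new Random(2001L);

    /** Stand-in for a randomized step (e.g. a shuffled split) that uses the shared generator. */
    static int drawShared() {
        return SHARED.nextInt(1000);
    }

    /** The same step, but with its own generator created from a fixed local seed. */
    static int drawLocal(long seed) {
        return new Random(seed).nextInt(1000);
    }

    public static void main(String[] args) {
        for (int run = 1; run <= 3; run++) {
            // The shared value depends on how many draws happened before this run;
            // the local value depends only on the seed.
            System.out.println("run " + run
                    + ": shared=" + drawShared()
                    + " local=" + drawLocal(1992L));
        }
    }
}
```

If the randomized operators in the process behave like `drawShared`, the outcome for a file depends on what ran before it in the same JVM; with a local seed, as in `drawLocal`, each file gets the same random numbers no matter where it sits in the batch.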
Hope that already helps,
Ingo