Collection of some ideas
spitfire_ch
New Altair Community Member
Hi,
I have been experimenting with RapidMiner for a couple of weeks now and am very impressed by the great possibilities it offers and by the very helpful team. Over time, some ideas came to my mind. I'll post them in a short list and let you decide whether one or two of them are any good:
- More information when viewing a decision tree model: in addition to the graphic representation of the label distribution in each node or leaf, it would be nice if one could hover over a node/leaf and see the distribution in absolute numbers (how many cases of each class of the training set fall into the current node/leaf).
- When doing parameter optimization, so far only the performance of the best combination is returned. It would be nice if one could also see how other combinations performed (e.g. the top n combinations, where n is a user-defined value). Maybe there are combinations very close to the best one that have other advantages which make them more desirable than the best one.
- I would always like to see the final model in the end. Currently, this is not possible with all operators: the Optimize Selection operator, for example, trains a model but does not let you see the final model without adding another model-training step that uses the selected attributes.
- Stacking using probabilities instead of / in addition to final labels. See http://rapid-i.com/rapidforum/index.php/topic,2744.0.html
- Stop subprocess button, allowing you to exit an "infinite loop" without canceling the entire process. See the end of the first post in http://rapid-i.com/rapidforum/index.php/topic,2745.0.html
- Difficult to implement and not so important: Graphical representations of more models, e.g. a 2D-Representation of SVM, displaying how the boundary separates the data. Something like here: http://kernelsvm.tripod.com/
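As a toy sketch of the stacking-on-probabilities idea (plain Python, not a RapidMiner process; the base models and the meta-model's weights are made-up stand-ins for trained learners): passing each base model's probability to the meta-learner preserves its confidence, which a hard label throws away.

```python
# Base "models" return P(class = 1); in real stacking these would be
# trained classifiers such as a decision tree and a naive Bayes model.
def base_model_a(x):
    return 0.9 if x > 0.5 else 0.2

def base_model_b(x):
    return 0.7 if x > 0.3 else 0.4

def stacked_features(x):
    # Label-based stacking would pass round(p) instead, discarding
    # each base model's confidence.
    return [base_model_a(x), base_model_b(x)]

def meta_model(features, weights=(0.6, 0.4)):
    # A trivial weighted combination standing in for a trained meta-learner.
    score = sum(w * p for w, p in zip(weights, features))
    return 1 if score >= 0.5 else 0

print(meta_model(stacked_features(0.8)))  # both base models lean towards class 1
```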
Hanspeter
Answers
Hi,
thanks for sending this list in. Please find some comments below:
spitfire_ch wrote:
> More information when viewing a decision tree model: in addition to the graphic representation of the label distribution in each node or leaf, it would be nice if one could hover over a node/leaf and see the distribution in absolute numbers (how many cases of each class of the training set fall into the current node/leaf).

This is actually already implemented - at least for the leaf nodes. If you keep the mouse over a leaf node, a tooltip window will pop up showing more information. The inner nodes only show the total number of examples in the subtree, not the class distribution up to that point. We could try to add the distribution numbers there as well.
spitfire_ch wrote:
> When doing parameter optimization, so far only the performance of the best combination is returned. It would be nice if one could also see how other combinations performed (e.g. the top n combinations, where n is a user-defined value).

This is already possible. Just use a Log operator inside the parameter optimization and log the parameter values together with the performance. In the Log operator, you can also specify a sorting type like top-k: simply define the number of interesting values there, as well as the sorting dimension (probably the performance) and the direction.
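Outside RapidMiner, the same "log every combination, keep the best k" idea looks roughly like this (Python sketch; the evaluate function is a made-up stand-in for training and validating a model with the given parameters):

```python
import heapq
import itertools

def evaluate(params):
    # Stand-in for training a model with these parameters and measuring
    # e.g. cross-validated accuracy; this toy function peaks at C=10, gamma=0.1.
    c, gamma = params
    return 1.0 - abs(c - 10) / 100 - abs(gamma - 0.1)

grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}
combos = list(itertools.product(grid["C"], grid["gamma"]))

# Log every combination with its performance, then keep the best k --
# analogous to a Log operator sorted by performance with a top-k sorting type.
k = 3
log = [(evaluate(p), p) for p in combos]
top_k = heapq.nlargest(k, log)
for perf, (c, gamma) in top_k:
    print(f"C={c}, gamma={gamma}: {perf:.3f}")
```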
spitfire_ch wrote:
> I would always like to see the final model in the end. Currently, this is not possible with all operators: the Optimize Selection operator, for example, trains a model but does not let you see the final model without adding another model-training step that uses the selected attributes.

The problem here is that a model is not always the result of a parameter optimization. Sometimes the result consists of several models (e.g. a model, some preprocessing models, and a word list for text processing); sometimes other results are generated, and sometimes no results are generated at all. It will be difficult to handle this in general while keeping compatibility, but I am open to suggestions here.
spitfire_ch wrote:
> Stacking using probabilities instead of / in addition to final labels. See http://rapid-i.com/rapidforum/index.php/topic,2744.0.html

Good point. See my comments in that thread.
spitfire_ch wrote:
> Stop subprocess button, allowing you to exit an "infinite loop" without canceling the entire process. See the end of the first post in http://rapid-i.com/rapidforum/index.php/topic,2745.0.html

Also a very useful point, but a bit difficult to handle in general. Please see my comments in the other thread.
spitfire_ch wrote:
> Difficult to implement and not so important: graphical representations of more models, e.g. a 2D representation of an SVM, displaying how the boundary separates the data. Something like here: http://kernelsvm.tripod.com/

I would also really like this, and I am sure that in the future we will add additional model visualizations like the ones described there.
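Even without a dedicated visualizer, this kind of 2D boundary picture can be approximated by evaluating a model's decision function on a grid. A sketch with a hand-made RBF "SVM" (the support vectors and coefficients below are invented for illustration, not learned from data):

```python
import math

# Two invented "support vectors" with opposite coefficients; a real SVM
# would learn these from the training data.
support_vectors = [((0.3, 0.3), +1.0), ((0.7, 0.7), -1.0)]

def decision(x, y, gamma=5.0):
    # RBF-kernel decision function: sum of alpha_i * K(sv_i, (x, y)).
    return sum(alpha * math.exp(-gamma * ((x - sx) ** 2 + (y - sy) ** 2))
               for (sx, sy), alpha in support_vectors)

# Colouring this grid by the sign of the decision function is exactly
# what a 2D boundary plot would display.
for j in range(9, -1, -1):
    row = "".join("+" if decision(i / 9, j / 9) >= 0 else "-" for i in range(10))
    print(row)
```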
Thanks again for sending this in! I hope that my suggestions for the first two points help you already right now.
Cheers,
Ingo
Hi,
Ingo wrote:
> This is actually already implemented - at least for the leaf nodes. If you keep the mouse over a leaf node, a tooltip window will pop up showing more information.

Oh, sorry, this is embarrassing. I've always noticed the tooltip in neural nets, but somehow not in decision trees. It seems I wasn't patient enough. I tried again and of course you're right - the information is right there! Sorry for that, and thank you for correcting me; I might never have realized this useful feature was already there.
Ingo wrote:
> The problem here is that a model is not always the result of a parameter optimization. Sometimes the result consists of several models (e.g. a model, some preprocessing models, and a word list for text processing); sometimes other results are generated, and sometimes no results are generated at all.

I am a bit confused here. Most optimization operators allow you to see the performance in the end. What model is that performance based on? Isn't it the model with the most optimized parameters / selection?
Thanks a ton for taking the time to answer my (sometimes rather stupid) questions and suggestions. This is really highly appreciated. Your support is exemplary - as is RapidMiner!
Kind regards
Hanspeter