Holiday Release of Operator Toolbox!
Hello everybody! I have the great honor to announce the next release of our Operator Toolbox extension. With this release we add some useful new functionality and build on our existing capabilities. But without further delay, let’s talk about the new things!
Detect Outliers (Univariate)
Often you want to figure out if a value is strange. RapidMiner already offers a lot of complex algorithms like LOF, CBLOF, rPCA and all the good stuff in the anomaly detection extension. The easy methods, like a traditional z-score, were not yet embedded into a single operator, so that’s what we did here! One operator to check for odd things:
The out port contains a table with all the information you need.
We added a column containing an overall outlier score, which aggregates the scores of the individual columns. Usually this is the average of the columns, but you can also get the max or the product.
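To make the aggregation concrete, here is a minimal Python sketch of the three options named above (average, max, product). The function name `overall_score` and the `mode` values are made up for illustration, and the sketch assumes the per-column scores are already non-negative magnitudes; the operator itself may handle signs or scaling differently.

```python
import math
import statistics


def overall_score(column_scores, mode="average"):
    """Combine the per-column outlier scores of one row into a single score.

    Hypothetical helper: the name and the mode strings are assumptions,
    not the operator's actual parameters.
    """
    if mode == "average":
        return statistics.fmean(column_scores)
    if mode == "max":
        return max(column_scores)
    if mode == "product":
        return math.prod(column_scores)
    raise ValueError(f"unknown aggregation mode: {mode!r}")


# One row with made-up scores for three columns.
row_scores = [1.0, 2.0, 4.0]
for mode in ("average", "max", "product"):
    print(mode, overall_score(row_scores, mode))
```

The max is the strictest choice for spotting a single extreme column, while the product only becomes large when several columns are unusual at once.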
The vis port uses an old friend of yours: the explain predictions object! You know it from the Explain Predictions operator or from AutoModel. In this case we use it not to visualize the influence factors for the score, but to visualize the outlierness of every single value:
As expected, an age of 0.9 is an outlier.
Lastly, the operator also gives you a preprocessing model, which allows you to apply the algorithms fitted on this data set to a different data set using the Apply Model operator.
This operator currently supports three methods to calculate the score:
- z-Score: How many standard deviations are you away from the mean? I.e. score = (x-mean)/std_dev
- Quartiles: How many interquartile ranges are you away from the median? I.e. score = (x-median)/iqr, where iqr is the difference between the 75th and the 25th percentile. This is very similar to the Tukey Test operator.
- Histogram: Use a histogram of the data. If the value falls into an infrequent bin, then it is an outlier. This is similar to the HBOS operator.
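The three methods can be sketched in a few lines of Python. This is an illustration of the formulas above, not the operator's implementation. Several details are assumptions: the population standard deviation, the percentile convention of `statistics.quantiles`, equal-width bins, and a histogram score defined as one minus the relative bin frequency. The sample ages are made up.

```python
import statistics
from collections import Counter


def z_scores(values):
    """score = (x - mean) / std_dev (population std_dev is an assumption)."""
    mean = statistics.fmean(values)
    std_dev = statistics.pstdev(values)
    return [(x - mean) / std_dev for x in values]


def quartile_scores(values):
    """score = (x - median) / iqr, with iqr = 75th minus 25th percentile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    median = statistics.median(values)
    iqr = q3 - q1
    return [(x - median) / iqr for x in values]


def histogram_scores(values, bins=10):
    """Assumed scoring: 1 - relative frequency of the value's bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def bin_index(x):
        # The maximum value is clamped into the last bin.
        return min(int((x - lo) / width), bins - 1)

    counts = Counter(bin_index(x) for x in values)
    return [1.0 - counts[bin_index(x)] / len(values) for x in values]


# Made-up ages; the first one is the suspicious value.
ages = [0.9, 25, 31, 38, 42, 47, 53, 60]
print(z_scores(ages))
print(quartile_scores(ages))
print(histogram_scores(ages, bins=5))
```

In all three variants the first value gets the most extreme score. Note that the z-score and quartile scores are signed, so you would typically compare their absolute values.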
Scan your Repository and your Processes
To manage your repository from processes we added the ability to scan it! The List Repository Object operator can be pointed at any directory and gives you all the objects in the folder structure.
This also allows you to execute every process in a folder by combining it with Loop Values and Execute Process. There are many other use cases if you combine this with loops.
The Scan Processes operator allows you to go deeper into processes. It gives you a list of all operators used in the processes of a folder.
One use case for this is to search for deprecated operators or operators you do not want to use anymore. Another use case is of course to analyze your own processes using machine learning!
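As a rough illustration of the first use case, the following Python sketch filters a list of process/operator pairs against a set of operators you no longer want. The row layout with the keys `process` and `operator` is hypothetical, as are the paths and operator names; the actual output table of Scan Processes may use different columns.

```python
from collections import defaultdict


def find_unwanted_operators(rows, unwanted):
    """Return {process path: sorted list of unwanted operators used in it}."""
    hits = defaultdict(set)
    for row in rows:
        if row["operator"] in unwanted:
            hits[row["process"]].add(row["operator"])
    return {process: sorted(ops) for process, ops in hits.items()}


# Hypothetical scan result: one row per operator used in a process.
scan_result = [
    {"process": "/demo/etl", "operator": "Old Operator A"},
    {"process": "/demo/etl", "operator": "Some Current Operator"},
    {"process": "/demo/scoring", "operator": "Old Operator B"},
    {"process": "/demo/scoring", "operator": "Old Operator A"},
    {"process": "/demo/report", "operator": "Some Current Operator"},
]

print(find_unwanted_operators(scan_result, {"Old Operator A", "Old Operator B"}))
```

Inside RapidMiner the same filtering could be done on the operator's output table with the usual filter operators.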
Store (Tagged)
Have you ever wondered who created an object? Or when? Or with what commit-id on a RM Project? The Store (Tagged) operator gives you exactly this option! If you store an object with Store (Tagged), all of this information is attached to the object as annotations.
This works for every object, not just for tables.
Read and Write SFTP support private keys now
The Read and Write SFTP operators have supported proxies and username/password authentication since the last toolbox version. We went a step further and added the ability to use private keys for authentication.