new features in the 4.4 release

ema · March 2009

Hi,
I got an email saying that the new RapidMiner 4.4 will be released soon.

I can't wait ...

What are the new features,
especially in clustering and classification?

land · March 2009

Hi,
here's a snapshot from the changes.txt in the current developer repository:

Changes from RapidMiner 4.3.2 to RapidMiner 4.4 [2009/??/??]
---------------------------------------------------------------

* New operators:

- ExampleSetSuperset
- ExampleSetUnion
- MacroConstruction
- CumulateSeries
- FastLargeMargin

* Parameters will now be adapted according to an operator
rename, for example the settings of operators like
the ProcessLog or the parameter optimization operators
are automatically corrected to the new operator names

* Graphs like the similarity graph display the strengths
of the edges now by their color

* Added new tree layout algorithm for the decision trees
preventing most overlapping, the old tighter version
is available as layout type "Tree (Tight)"

* Decision trees now show the subtree size as tool tip
for the inner nodes, the edges are now darker for
larger subtrees and brighter for smaller ones

* Tables like the (meta) data view now support a new
context menu for common table operations like column
sorting or row / column selection

* The New Operator dialog now also supports full text
search in the description texts of the operators

* RapidMiner now stores all parameter values in the
process files including the default values which ensures
a better compatibility with future versions. The XML tab,
however, only shows the values differing from the default

* Univariate and multivariate series windowing operators
now also support nominal attributes and even mixed
types in cases where the series is represented by
the examples (rows) of the data set

* The range statistics of nominal attributes in the
meta data view now show the values with the highest and
lowest occurrence counts, sort the values according
to the counts, and display only an excerpt of the
occurring values if large numbers of different values
exist

* List of recent files is now directly saved after opening
a new process and not only during shutdown

* Changes in the process setup are now allowed even during
process runtime, e.g. when waiting at a breakpoint

* Updated to latest version of Weka (as of February 26th, 2009)

* Bugfixes:

- Fixed bug in accuracy criterion for the revised decision
tree learner
- Fixed bug in parameter list of ValueSubgroupIterator
- Fixed bug in ExceptionHandling which sometimes led to
doubled outputs
- Fixed bug in ProcessBranch which sometimes led to
doubled outputs
- ViewAttributes did not add min and max statistics
so that those statistics were not calculated on
data table views

Changes from RapidMiner 4.3.1 to RapidMiner 4.3.2 [2009/02/17]
---------------------------------------------------------------

* New operators:

- LinearDiscriminantAnalysis
- QuadraticDiscriminantAnalysis
- RegularizedDiscriminantAnalysis
- DasyLabExampleSource
- FileIterator
- ExceptionHandling
- ChangeAttributeNamesReplace
- ChangeAttributeNames2Generic
- DateAdjust
- MinMaxBinDiscretization
- RainflowMatrix

* Deprecated operators:

- DirectoryIterator (use FileIterator instead)

* Renamed parameters:

- ExampleSetWriter:
quote_whitespace is now named quote_nominal_values

* ExampleSetMerge can now handle missing values

* RapidMiner now better supports counts for the input
and output types, which should considerably reduce the
number of warnings if operators like IOConsumer,
IOMultiplier or ExampleSetMerge (reducing several objects
of the same type to one of the same) are used

* FileIterator replaces DirectoryIterator and adds many
new features like recursive iteration, file name based
filtering, and a new macro for the parent path

* Centroid based clusterings now support assigning unseen
examples to the nearest cluster at apply time

* ProcessBranch now supports branching with respect
to the existence of an input object

* ClearProcessLog now also allows removing the complete
logging table

* The logging tables of the ProcessLog operator will now
not be generated during start up but during the first
operator usage (and also during the following if the
table was deleted in the meantime, e.g. in a loop)

* Added support for different time zones, users can now
define the preferred time zone in the settings dialog
and time conversion operators are now able to respect
this setting

* Dates and times are now displayed according to the
system's local settings

* New plotter: Block

* Added support for applying a log scale for the color
column for the Scatter plot and the new Block plotter

* Data tables like those generated by the process log
are now de-coupled from the table used for plotting,
which prevents rows from being removed from the data
table when the plotted rows are sampled

* A double click on the region between two columns in
the table header now automatically resizes the left
column to a fitting size (known from Windows programs)

* A double click on the same region while pressing CTRL
will resize all table columns according to the contents

* GuessValueTypes now only works on regular attributes
and provides a parameter for extending it to the special
attributes (work_on_special)

* AttributeFilter now also provides a new parameter
work_on_special

* The operator Replace now also allows empty replace_by
values

* The ExampleSetJoin operator now also works if the
id of the first example set is not part of the second

* Guess value types can now handle missing values

* CSVExampleSetWriter now supports the parameter quote_nominal

* All feature selection and weighting operators now also
provide the possibility to log the names of the features
of the current generation's best individual

* The Replace operator now supports capturing groups

* The file based example source operators (ExampleSource,
SimpleExampleSource, CSVExampleSource...) now better
support quoted strings and also escaped quotes (escaping
with \")

* Implementation details:

- The method Tools.quotedSplit(...) should now be used
instead of a regular split followed by the method
Tools.mergeQuotedSplits(...)

* Bugfixes:

- fixed bug in DBScan for empty cluster models
- fixed bug for simple sampling in cases where a local
random seed was used
- fixed bug in process logging to files which prevented
the writing of the first logged result
- fixed bug in PSO optimization for cases where the fitness
should be minimized instead of maximized
- fixed bug in binary performance measure which was not
delivering the fitness for specificity, sensitivity,
and youden index
- fixed bug in meta data table viewer in cases where huge
numbers of long nominal values existed which caused a
crash of the Java Virtual Machine in some cases

Changes from RapidMiner 4.3 to RapidMiner 4.3.1 [2009/01/12]
---------------------------------------------------------------

* New operators:

- RemoveDuplicates
- Cluster2Prediction
- DirectoryIterator
- TextObjectWriter
- TextObjectLoader
- TextExtractor
- SingleTextObjectInput
- TextCleaner
- TextObject2ExampleSet
- TextSegmenter
- AddAttribute
- SetData
- EMClustering
- AttributeWeights2ExampleSet
- TransitionGraph
- DatabaseExampleVisualizationOperator

* Revised decision tree learning which led to drastically
reduced runtimes and better tree models in terms of
generalization capabilities

* The bar chart now displays the category as label in the
domain axis

* Removed plotter: Bars 3D

* The IOObjectReader now allows the definition of the expected
output type

* The LiftParetoChart no longer re-applies the input model if
a predicted label already exists

* Added the ability to "explode" tiles of pie and ring charts

* Added several new options for the reporting operators of the
RapidMiner Enterprise Edition as well as true parameter handling
including type checks

* Updated to latest release of Jung

* Fixed GUI related memory leaks

* Implementation details:

- The class AttributeWeightsCreator was renamed to
ExampleSet2AttributeWeights

* Bugfixes:

- Fixed a combination of GUI and process thread related
memory leaks
- Fixed bug in Series Multiple Plotter which prevented
rescaling
- Pie and Bar charts used class limit instead of legend
limit in order to decide if the legend should be shown
- special format in ExampleSetWriter ignored quote
whitespace setting
- bug in XVPrediction fixed

Hope that satisfies your needs :P

Greetings,
Sebastian

ema · March 2009

Thank you very much... I can't wait

land · March 2009

Hi Ema,
then you could check out the developer version from the developer branch in CVS. A guide for checking out with Eclipse is on our website.

Greetings,
Sebastian

IngoRM · March 2009

The new version 4.4 will be released this week. So there are only a few days left to wait ;D

Cheers,
Ingo

ema · March 2009

Hi,
I downloaded the new RapidMiner...

I was wondering how to use Cluster2Prediction?

Thank you

land · March 2009

Hi Ema,
Cluster2Prediction enables you to use classification performance measures for clustering if label information is available. For example, think of a situation where you know, for a subset of your data, which examples have to be in the same cluster. You can then use any flat clustering algorithm and test whether it discovers your cluster structure. To achieve this, the operator matches the given cluster labels with the class labels in the best fitting way and converts the cluster attribute into a prediction attribute. You can then use the standard performance operators for classification to calculate the performance.
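
To make the matching idea more concrete, here is a minimal Python sketch. It is not the RapidMiner implementation: the function names `cluster_to_prediction` and `accuracy` are made up for illustration, and "best fitting" is assumed here to mean the one-to-one assignment of clusters to classes that maximizes the number of agreeing examples, found by brute force over all permutations (only practical for a handful of clusters).

```python
from collections import Counter
from itertools import permutations


def cluster_to_prediction(clusters, labels):
    """Map cluster ids to class labels so that agreement is maximal.

    Hypothetical helper: assumes a one-to-one matching, searched by
    brute force. Clusters left without a class are mapped to None.
    """
    cluster_ids = sorted(set(clusters))
    class_ids = sorted(set(labels))
    # How often each (cluster, class) pair occurs together.
    overlap = Counter(zip(clusters, labels))

    # Pad with None so every cluster gets an entry even if there are
    # more clusters than classes.
    padded = class_ids + [None] * max(0, len(cluster_ids) - len(class_ids))

    best_mapping, best_hits = {}, -1
    for perm in permutations(padded, len(cluster_ids)):
        hits = sum(overlap[(c, k)] for c, k in zip(cluster_ids, perm))
        if hits > best_hits:
            best_mapping, best_hits = dict(zip(cluster_ids, perm)), hits

    # The converted cluster attribute: one predicted class per example.
    predictions = [best_mapping[c] for c in clusters]
    return best_mapping, predictions


def accuracy(predictions, labels):
    """Fraction of examples whose prediction equals the known label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


if __name__ == "__main__":
    clusters = [0, 0, 0, 1, 1, 2, 2, 2]
    labels = ["a", "a", "b", "b", "b", "c", "c", "a"]
    mapping, predictions = cluster_to_prediction(clusters, labels)
    print(mapping, accuracy(predictions, labels))
```

Once the clusters are turned into predictions like this, any ordinary classification measure (accuracy above is just the simplest one) can be computed against the known labels.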

Greetings,
Sebastian

ema · March 2009

Hi,
Thank you very much.

It works great,

but I tried to use it with agglomerative clustering and
it is not working.

I tried to flatten the clustering and then use example2cluster,

but it still does not work ...

Thank you in advance

IngoRM · March 2009

Hi,

There seems to be a problem during the flattening of the agglomerative clustering. I am passing this topic on to Sebastian, who is our clustering expert.

Cheers,
Ingo
