comparing data mining tools
jesslyn
New Altair Community Member
I am currently evaluating RapidMiner, R, SAS Enterprise and Orange. Can someone provide some useful information?
Which software provides better features in terms of 1) scalability, 2) power and flexibility, 3) how well the tool accesses and manages data, 4) graphical user friendliness, and 5) visualization?
I've done some research and found that RapidMiner is better than the other 3.
I need someone to give me more information on this topic, as I am currently evaluating these tools. Thanks.
Answers
Hi,
We are glad you are interested in RapidMiner, but please don't double post. Your questions have been answered here: http://rapid-i.com/rapidforum/index.php/topic,5187.0.html
Best,
Nils
Hi Jesslyn,
I have already answered a couple of your questions here:
http://rapid-i.com/rapidforum/index.php/topic,5187.0.html
Let me add some information to the new ones:
1)scalability
The desktop version of RapidMiner runs, well, on your desktop, so the amount of data it can handle is limited by the memory of your desktop system. Things are of course much better for the server, RapidAnalytics, which usually runs on better hardware. There are also several extensions for improving the scalability of RapidMiner: a) an In-DB-Extension for executing processes directly in the database (for many processes, there is then literally no limit anymore), b) a Streaming Extension, which offers operators so that data is no longer loaded completely into memory, and c) the Radoop Extension, which allows running data transformation and modeling processes on distributed Hadoop clusters.
2)power and flexibility
This has been partly answered already. Right now, there is no other graphical data mining suite offering more operations and more options for combining them, including all necessary control structures such as loops, branches, macros (variables), etc. More can be found in the fact sheet.
3)how well the tools access and manage the data
Again, please have a look at the fact sheet. There are plenty of operators for connecting to data sources and transforming the data. Actually, many users of RapidMiner do not use it for data analysis at all but for ETL processes.
4) which is more graphical user friendly
Although this is a matter of taste, I would like to point out that the Rapid-I team has put a lot of effort into better supporting analysts, especially beginners. There are many features such as meta data propagation, quick fixes, error detection, online help, and operator recommendations to simplify the analyst's life. More, as you might guess already, can be found in the fact sheet.
5) visualization
And one last time: the most important visualization techniques are listed in the fact sheet. This is actually an area we are pretty proud of, since RapidMiner offers a really large number of different visualization techniques. There is also the new "Advanced Plotter" section (its documentation can be found in our download section).
Fact Sheet
You will probably find the following fact sheet for RapidMiner and RapidAnalytics interesting:
http://rapid-i.com/downloads/rapidminer/facts/rapidminer_rapidanalytics_fact_sheet.pdf
Cheers,
Ingo
Hi Ingo,
Thanks for replying. I understand that the RapidMiner Community Edition is free software and that there is a limit on how many rows or records it can handle. However, can I have an estimate of what that limit is? Millions of rows of data? 1 million, 2 million? Thanks.
Hello jesslyn
I don't believe there is an explicit limit in the Community Edition on the maximum number of rows that can be processed. There is, however, always a physical resource limit imposed by the machine you are running on. Whenever you hit these limits, there are plenty of approaches for working around them. For example, the stream database and loop batch operators let you process data in batches, at the expense of increased running time of course. The other option is to use RapidAnalytics and run processes remotely.
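To make the batching idea a bit more concrete, here is a minimal sketch in plain Python. It is not how the RapidMiner operators are implemented, just an illustration of the same trade-off: only one batch of rows is in memory at a time, and the result is accumulated across batches. The function names `iter_batches` and `batched_mean` and the sample data are made up for this example.

```python
import csv
import io
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size rows from any row iterator."""
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


def batched_mean(rows, column, batch_size=10_000):
    """Compute the mean of one numeric column, one batch at a time."""
    total = 0.0
    count = 0
    for batch in iter_batches(rows, batch_size):
        # Only this batch is held in memory; earlier batches are discarded.
        total += sum(float(row[column]) for row in batch)
        count += len(batch)
    return total / count if count else float("nan")


# Tiny in-memory stand-in for a large file (hypothetical data). With real
# data you would pass a csv.DictReader over an open file handle instead.
sample = io.StringIO("id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n")
print(batched_mean(csv.DictReader(sample), "value", batch_size=2))  # prints 30.0
```

This works nicely for anything that can be accumulated (counts, sums, means). Models that need to see all rows at once are harder to batch, which is where running on a bigger machine comes in.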
regards
Andrew