What is the maximum amount of rows

User: "Jeffersonjpa"
New Altair Community Member
Updated by Jocelyn
What is the maximum number of rows you have imported into RapidMiner? 10 million?

    User: "David_A"
    New Altair Community Member
    Accepted Answer
    You mean, what's the largest data set you can work with?
    That depends heavily on your available hardware (storage space, RAM, ...), but other than that there is no limit (provided you don't hit your license limit). On my travel laptop with only 8GB of RAM, I could easily create a test data set with 10 million rows of random data.
    But of course, once you actually start working with the data, the memory requirements and practical run time limits are more complex.
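
    As a rough illustration of why RAM is the practical limit, here is a back-of-envelope sketch in Python. It assumes every cell is stored as an 8-byte double in a dense table; that is an assumption for illustration only, not a description of how RapidMiner stores data internally, and nominal columns may well be stored differently. The function names are hypothetical.

```python
def estimate_dense_table_bytes(n_rows: int, n_cols: int, bytes_per_cell: int = 8) -> int:
    """Raw size of a dense table, assuming a fixed number of bytes per cell.

    The 8-byte default corresponds to double-precision numbers (an assumption).
    """
    return n_rows * n_cols * bytes_per_cell


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to gibibytes."""
    return n_bytes / 1024**3


if __name__ == "__main__":
    # 10 million rows at a few hypothetical column counts
    for n_cols in (10, 50, 200):
        size = estimate_dense_table_bytes(10_000_000, n_cols)
        print(f"10,000,000 rows x {n_cols} columns: about {to_gib(size):.2f} GiB")
```

    Under these assumptions, 10 million rows with 10 numeric columns is about 800 MB of raw data, which fits comfortably in 8GB of RAM, while 200 columns would need roughly 16 GB before any modelling starts. Operators that copy or transform the data need additional memory on top of that.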

    I hope that helps.
    User: "sgenzer"
    Altair Employee
    Accepted Answer
    Hi @Jeffersonjpa, I don't think you're really going to get an answer to this question :smiley: Almost all of our customers use proprietary data, so we are not able to give you what you're looking for. I can, however, share this example of just how powerful the platform is, given enough resources. It is from an unnamed commercial customer running real data:

    Dataset: 1.5m examples (rows), 49 attributes (columns) of which 5 were nominal and 44 were numerical
    Hardware: cluster of 64 AMD Opteron 6380 chipsets (16 cores each, 2.5GHz), 504GB RAM with 384GB swap

    Generalized Linear Model (GLM): runtime = 1 min 21 sec
    Deep Learning (H2O implementation): runtime = 7 min 29 sec

    User reported that all CPUs were "pegged" during this run with up to 180GB being consumed at times.

    Does this help? It's one example. You can have another data set with the same rows and columns that produces very different runtimes due to what those rows and columns contain. All I'm trying to share is that RapidMiner will use pretty much whatever resources you throw at it.

    Scott