Locking up during data import

drobertson123
edited November 5 in Community Q&A
Hello

I am hoping someone has some advice. I am new to RapidMiner, so I may be missing something, but this doesn't seem right.

I am seeing a consistent problem when I try to import data from a CSV file. The file contains roughly 5 million rows. Each row holds three comma-separated values: a date (for example, 12/3/2010), an integer representing the time of day, and a decimal value. Everything seems to go fine while I set up the import specification. When I click finish to run the actual import, however, the software freezes. If I leave it and come back, the program window is black, and it stays that way until I kill the RapidMiner process.
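To make the row format concrete, here is a minimal Python sketch that parses one such row and rejects rows that don't split into exactly three fields. It assumes the date is month/day/year (so 12/3/2010 is 3 December 2010) and that the time-of-day field is a plain integer; the sample value `930` and the function name `parse_row` are hypothetical, not taken from the post or from RapidMiner.

```python
from datetime import date, datetime
from decimal import Decimal


def parse_row(line: str) -> tuple[date, int, Decimal]:
    """Parse one 'date,time,value' row.

    Assumes a month/day/year date and an integer time of day
    (the exact time encoding is not stated in the question).
    """
    fields = line.strip().split(",")
    if len(fields) != 3:
        # A wrong separator (e.g. spaces or semicolons) shows up here.
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    raw_date, raw_time, raw_value = fields
    day = datetime.strptime(raw_date, "%m/%d/%Y").date()
    return day, int(raw_time), Decimal(raw_value)


print(parse_row("12/3/2010,930,1.2345"))
# (datetime.date(2010, 12, 3), 930, Decimal('1.2345'))
```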

In Task Manager, RapidMiner is using no CPU and not much RAM. The program just seems to be blocked.

Does anyone have any idea what is happening and what to do to fix it?


Thanks for the help.

Doug

Answers

  • haddock
    Greetings, Doug, and welcome!

    Assuming you've got enough RAM, in your position I'd narrow the problem down by:

    1. Breaking the data into chunks;
    2. Restricting the column separators allowed in the CSV read operator.

    This sort of problem can be caused simply by a wayward column separator, such as a space, and scrolling through five million lines to find it is not hugely thrilling!

    Happy hunting, hope that nails it down  ;D
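    Both steps above can be sketched outside RapidMiner, in plain Python, before re-importing. The sketch below splits a large CSV into numbered chunk files and then lists rows that don't split into exactly three fields with a chosen separator. The function names, chunk size, and file names are hypothetical; it also assumes one record per physical line (no quoted fields containing newlines), which should hold for simple date,time,value rows.

```python
import csv
import tempfile
from itertools import islice
from pathlib import Path


def split_csv(src: Path, out_dir: Path, rows_per_chunk: int = 500_000) -> list[Path]:
    """Copy src into numbered chunk files of at most rows_per_chunk lines each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks: list[Path] = []
    with src.open(newline="") as f:
        index = 0
        while True:
            block = list(islice(f, rows_per_chunk))  # next rows_per_chunk lines
            if not block:
                break
            path = out_dir / f"{src.stem}_part{index:03d}.csv"
            with path.open("w", newline="") as out:
                out.writelines(block)  # line endings kept exactly as read
            chunks.append(path)
            index += 1
    return chunks


def find_bad_rows(path: Path, delimiter: str = ",", expected: int = 3):
    """Yield (row_number, fields) for rows that don't split into `expected` fields."""
    with path.open(newline="") as f:
        for row_number, row in enumerate(csv.reader(f, delimiter=delimiter), start=1):
            if len(row) != expected:
                yield row_number, row


if __name__ == "__main__":
    # Tiny demo file: the second row uses spaces instead of commas.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "ticks.csv"
        src.write_text("12/3/2010,930,1.5\n12/3/2010 931 1.6\n12/3/2010,932,1.7\n")
        for part in split_csv(src, Path(tmp) / "parts", rows_per_chunk=2):
            print(part.name, list(find_bad_rows(part)))
```

    Importing one chunk at a time in RapidMiner then shows whether the freeze depends on the volume of data or on a particular malformed chunk.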
  • drobertson123
    Thanks for the advice.

    I have 8 GB of RAM on a Windows 7 system. I tried a smaller batch of data (4,000 records) and it worked. I am trying to pin down where the issue is, but I still can't find it.

    Is there a limit on the number of rows that can be imported? I work with large data sets, so it would be good to know what limits I have. Also, should I be raising the memory settings anywhere to get better performance on large data sets?

    I appreciate any advice you can give. This looks like a great tool, but I still have a lot to learn.

    Doug
  • haddock
    Hi there,

    As far as I know, the limits are imposed by the OS, and the amount of memory allowed can be tweaked in the startup scripts; but I'm on XP and Vista 64 and not familiar with 7.

    Good luck!

  • land
    Hi,
    I would suggest switching to the Results perspective while the process runs and watching the memory monitor. If memory consumption rises steadily until the monitor turns red and the GUI starts to hang, then you simply don't have enough memory.
    Of course, the hang might also be caused by wrong parameter settings in the importer, but that is unlikely if the import works with 4,000 samples.

    Greetings,
      Sebastian
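    A quick back-of-envelope check puts the question's numbers in perspective. The sketch below computes only a lower bound: it assumes every value is stored as one 8-byte double with no overhead at all. RapidMiner's real in-memory representation, parsing buffers, and GUI data views will need more than this (how much more is not stated in the thread), but the comparison with 8 GB of installed RAM suggests that, if the memory monitor does turn red, the ceiling is the memory the Java process is allowed to use rather than the physical RAM.

```python
def raw_payload_bytes(rows: int, columns: int, bytes_per_value: int = 8) -> int:
    """Lower bound on data size: each value as one 8-byte double, zero overhead."""
    return rows * columns * bytes_per_value


# 5 million rows x 3 values (date, time of day, decimal), as in the question.
payload = raw_payload_bytes(5_000_000, 3)
print(f"{payload:,} bytes ~ {payload / 2**20:.1f} MiB")
# 120,000,000 bytes ~ 114.4 MiB
```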
  • drobertson123
    Thanks, everyone, for the support.

    I figured out what the issue was. I am running Windows 7 x64, and I had the 32-bit version of RapidMiner installed. Even though the 32-bit version runs on 64-bit Windows, it seemed to cause many problems. Please watch out for this in the future.

    I now have the 64-bit version installed, and it works fine.

    I appreciate the good advice people gave.

    Thanks,
    Doug Robertson
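    The 32-bit versus 64-bit difference comes down to address space: a 32-bit process can address at most 2^32 bytes (4 GiB) no matter how much RAM is installed, and in practice the usable portion is smaller. A 32-bit Java heap on Windows is typically limited to well under 2 GB, though the exact figure depends on the JVM and OS and is not given in this thread. The sketch below does that arithmetic and, as an analogy only, reports the pointer width of the Python process running it; it does not inspect RapidMiner or its JVM.

```python
import struct


def address_space_gib(pointer_bits: int) -> float:
    """Total virtual address space reachable with pointers of this width, in GiB."""
    return 2 ** pointer_bits / 2 ** 30


# Pointer width of *this* Python process (not RapidMiner's Java process).
current_bits = struct.calcsize("P") * 8
print(f"this process is {current_bits}-bit")
print(f"32-bit ceiling: {address_space_gib(32):.0f} GiB")  # 4 GiB, far below 8 GB of RAM
```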