Database doesn't appear after importing
Jack1701
New Altair Community Member
Hello,
I'm trying to import a .csv database into RapidMiner. After I tell it where to store the data, it says it is importing the data to the location I specified, but once the importing window closes, the dataset doesn't appear.
The file is the 2014 .csv from the Stanford Database on Ideology, Money in Politics and Elections (public version 2.0), which I've already extracted from the .gz. I use the settings it recommends for the data format and no errors seem to appear; I have it replace errors with missing values and then try to place the data in the local repository. It says it is importing the data, but nothing happens when it looks like it has finished.
I don't understand what is going on or why it is doing this.
Thank you for the help,
Jack.
Best Answer
-
RM can handle huge data sets, but yeah, that's very large for a CSV. Normally data scientists would use a database (e.g. SQL) for data management instead of raw import/export of CSV files at that size. I have 16 GB on my machine as well and it was struggling to manage the file.
That said, your machine should not lock up under any circumstances. Let's first try opening the CSV in Excel and creating a new CSV or XLSX file with only the first, say, 10k rows. See if that imports OK.
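If Excel struggles with a file this size, one alternative is to stream just the first 10,000 rows into a test file with a short Python script. This is only a minimal sketch; the file names are placeholders, not the actual names from the dataset:

```python
import csv

# Stream the first 10,000 data rows (plus the header) into a smaller test
# file without loading the whole CSV into memory. File names are placeholders.
with open("dime_2014.csv", newline="", encoding="utf-8") as src, \
        open("dime_2014_first10k.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader))  # copy the header row
    for i, row in enumerate(reader):
        if i >= 10_000:
            break
        writer.writerow(row)
```

If that smaller file imports cleanly, the problem is the file size rather than the data format.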
Scott
Answers
-
Here is a screenshot from right before the import appears to stop, and another of what it shows after.
-
Sorry, I didn't realize that was abnormally large. I have 16.0 GB of RAM on my machine.
-
I opened it in Excel, which was able to load the first 1,048,575 rows, and RapidMiner was able to import those once I saved them as a separate .csv file (about 350 MB). Is there a way to get Excel to load different parts of the file so I can bring it in piece by piece, or is that the most Excel can do in this case?
-
So as you can see, Excel is a piece of cr@p when it comes to handling large data sets. My local installation (Office 365 Excel for Mac) only uses ONE logical processor, so it's not even parallelized. My advice would be to load the data set into a MySQL database and forget Excel.
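One rough sketch of that approach, assuming pandas, SQLAlchemy, and a MySQL driver such as PyMySQL are installed, is to push the CSV into MySQL in chunks so the whole file never has to fit in memory. The connection string, file name, table name, and chunk size below are made-up placeholders:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string, file, and table names; adjust for your setup.
engine = create_engine("mysql+pymysql://user:password@localhost/dime")

# Read the big CSV in chunks and append each chunk to a MySQL table,
# so the whole file never has to sit in memory at once.
for chunk in pd.read_csv("dime_2014.csv", chunksize=100_000, low_memory=False):
    chunk.to_sql("contributions_2014", engine, if_exists="append", index=False)
```

RapidMiner can then read the table through its Read Database operator instead of re-importing the CSV each time.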
Scott -
Ok, thank you so much for the help.