Database doesn't appear after importing
Jack1701
New Altair Community Member
Hello,
I'm trying to import a .csv database into RapidMiner. After I tell it where to store the data, it says it is importing the data to the location I specified, but once the importing window closes, the dataset doesn't appear.
The file is the 2014 .csv from the Stanford Database on Ideology, Money in Politics and Elections (public version 2.0), which I've already extracted from the .gz. I use the settings it recommends for the data format and no errors seem to appear; I have it replace errors with missing values and then try to place the data in the local repository. It says it is importing the data, but nothing happens when it looks like it has finished.
I don't understand what is going on or why it is doing this.
Thank you for the help,
Jack.
Best Answer
-
RM can handle huge data sets, but yeah, that's very large for a CSV. Normally data scientists would use a database (e.g. SQL) for data management instead of raw import/export of CSV files at that size. I have 16 GB on my machine as well and it was struggling to manage the file.
That said, your machine should not lock up under any circumstances. Let's first try opening the CSV in Excel and creating a new CSV or XLSX file with only the first, say, 10k rows. See if that imports OK.
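If Excel struggles with a file this size, one alternative is to stream just the first 10,000 rows into a test file with a short Python script. This is only a minimal sketch; the file names are placeholders, not the actual names from the dataset:

```python
import csv

# Stream the first 10,000 data rows (plus the header) into a smaller test
# file without loading the whole CSV into memory. File names are placeholders.
with open("dime_2014.csv", newline="", encoding="utf-8") as src, \
        open("dime_2014_first10k.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader))  # copy the header row
    for i, row in enumerate(reader):
        if i >= 10_000:
            break
        writer.writerow(row)
```

If that smaller file imports cleanly, the problem is the file size rather than the data format.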
Scott
Answers
-
Here is a screenshot from right before the import appears to stop, and another of what it shows after.
-
Sorry, I didn't realize that was abnormally large. I have 16.0 GB of RAM on my machine.
-
I opened it in Excel, which was able to load the first 1,048,575 rows, and RapidMiner was able to import those once I saved them as a separate .csv file (about 350 MB). Is there a way to get Excel to load different parts of the file so I can bring it in piece by piece, or is that the most Excel can do in this case?
-
So as you can see, Excel is a piece of cr@p when it comes to handling large data sets. My local installation (Office 365 Excel for Mac) only uses ONE logical processor, so it's not even parallelized. My advice would be to load the data set into a MySQL database and forget Excel.
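One rough sketch of that approach, assuming pandas, SQLAlchemy, and a MySQL driver such as PyMySQL are installed, is to push the CSV into MySQL in chunks so the whole file never has to fit in memory. The connection string, file name, table name, and chunk size below are made-up placeholders:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string, file, and table names; adjust for your setup.
engine = create_engine("mysql+pymysql://user:password@localhost/dime")

# Read the big CSV in chunks and append each chunk to a MySQL table,
# so the whole file never has to sit in memory at once.
for chunk in pd.read_csv("dime_2014.csv", chunksize=100_000, low_memory=False):
    chunk.to_sql("contributions_2014", engine, if_exists="append", index=False)
```

RapidMiner can then read the table through its Read Database operator instead of re-importing the CSV each time.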
Scott -
Ok, thank you so much for the help.