[SOLVED] Problem setting up data set meta data information

Unknown
edited November 5 in Community Q&A
I am trying to load a large CSV file (about 18 GB) into RapidMiner to build a classification model. The “import configuration wizard” seems to have difficulty loading the data, so I chose to use “Edit parameter list: data set meta data information” to set up the attribute and label information. However, the UI only allows me to set up this information column by column, and my CSV file has about 80,000 columns. How should I handle this kind of scenario? Thanks.

Answers

  • MariusHelf
    MariusHelf New Altair Community Member
    It may be a good idea to import the data into a database. MySQL, for example, offers command-line options to create a table directly from a CSV file. Once the data is in a database, it will also be easier to process with RapidMiner. Even if you have a powerful machine with sufficient RAM, performing any kind of operation on 18 GB of data will require a lot of patience. Usually you want to work on only a subset (sample) of the complete data, and the database can help you access those parts easily.
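
If a database is not at hand, the idea of working on a sample can also be sketched outside RapidMiner. The following is only an illustration and not part of the original advice. It assumes the CSV has a single header row and that a uniform random sample of rows is acceptable. The function name `sample_csv_rows` and its parameters are made up for this example. It uses reservoir sampling, so the file is streamed once and only `k` rows are held in memory at a time.

```python
import csv
import random


def sample_csv_rows(path, k, seed=None):
    """Return (header, rows), where rows is a uniform random sample of up to
    k data rows, read in a single pass over the file (reservoir sampling)."""
    rng = random.Random(seed)
    sample = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumption: the first line is a header row
        for i, row in enumerate(reader):
            if i < k:
                # Fill the reservoir with the first k rows.
                sample.append(row)
            else:
                # Row i replaces a reservoir slot with probability k / (i + 1).
                j = rng.randint(0, i)
                if j < k:
                    sample[j] = row
    return header, sample


def write_csv(path, header, rows):
    """Write the header and the sampled rows to a new, smaller CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

A call such as `write_csv("sample.csv", *sample_csv_rows("big.csv", 1000, seed=0))` (file names are placeholders) would produce a smaller file to try importing instead. It still has all 80,000 columns, so this reduces the row count only and does not address the column-by-column meta data setup.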

    Best regards,
    Marius