"Total records in imported data aggregates with old imported data"

jdude35 New Altair Community Member
edited November 2024 in Community Q&A

Hello, I am relatively new to RapidMiner. I am using the Rosette extension for text mining, and I have imported a number of practice datasets to see how the various text mining algorithms work. The problem is that whenever I import a new CSV file, no matter how I import it, the data is somehow combined with previously imported data. For example, I had one dataset with 1035 records. Later, I imported a dataset with only 100 records, and it shows up with 1135 records. Is there a step in the import process that is causing my data to accumulate like this?

Best Answer

  • sgenzer
    Altair Employee
    Answer ✓

    Hello @jdude35 - welcome to the RapidMiner User Community. We are happy to have you here.

     

    That is indeed odd. The number of rows you see after importing a CSV should match the number of rows in the file. Tell you what - instead of using the "Add Data" button, try using the "Read CSV" operator at the beginning of your process. It's a better way to go anyway.
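
    To confirm how many rows the file itself contains before comparing against what RapidMiner reports, a quick check outside RapidMiner can help. This is only a minimal sketch in plain Python; the function name `count_data_rows` and the sample text are made up for illustration, and it assumes a comma-separated file with a single header line.

```python
import csv
import io


def count_data_rows(csv_text, delimiter=",", has_header=True):
    """Count data rows in CSV text, skipping blank lines and an optional header."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    # csv.reader yields an empty list for a blank line; ignore those.
    rows = [row for row in reader if row]
    if has_header and rows:
        rows = rows[1:]
    return len(rows)


# Hypothetical sample standing in for a small practice dataset.
sample = "id,text\n1,first review\n2,second review\n3,third review\n"
print(count_data_rows(sample))
```

    For a real file, the text could be read with `open(path, newline="").read()` and passed in. If this count matches the file but the imported data shows more rows, the extra rows are coming from the import step rather than from the CSV.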

     


    Use the "Import Configuration Wizard" to set up the Read CSV operator, or just run the process and see what happens. Note that the default column separator for "Read CSV" is a semicolon (;), not a comma (,) as one might expect for a CSV file.
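
    To see why the separator matters, here is a small illustration in plain Python (not RapidMiner itself). It assumes a comma-separated sample; the helper name `columns_per_row` is hypothetical. Parsing the same text with the wrong separator leaves every line as one long field, which is roughly what a mismatched column separator looks like after import.

```python
import csv
import io


def columns_per_row(csv_text, delimiter):
    """Return the number of fields parsed from each non-empty line."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    return [len(row) for row in reader if row]


# Hypothetical comma-separated sample with a header line.
sample = "id,text\n1,first review\n2,second review\n"

print(columns_per_row(sample, ";"))  # wrong separator: one field per line
print(columns_per_row(sample, ","))  # matching separator: two fields per line
```

    If the imported data shows up as a single wide column, the column separator setting is the first thing to check.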

     

    If you still have trouble, I recommend copying your process XML (open the XML tab) and pasting it here in this thread using the </> button. This helps all of us troubleshoot together.


    Scott

     

Answers

  • jdude35
    New Altair Community Member

    Thank you. I was finally able to import a CSV without that strange data accumulation. I used the "Read CSV" operator with the Import Wizard, and it worked.