Combine two files

[Deleted User]
[Deleted User] New Altair Community Member
edited November 5 in Community Q&A
Hi,

Is it possible to combine two datasets (train + test) into one new dataset in which the training part has labels and the test part has none?

Thank you

Answers

  • varunm1
    varunm1 New Altair Community Member
    Answer ✓
    Hello @mbs

    If both datasets have the same attribute names, you can use the Append operator to append them. If the test or train dataset has extra columns compared to the other, you can use Append (Superset) instead. Regarding the testing data, do you have labels for it? If you do not have labels, you can use Filter Examples to separate the testing data and use it for prediction.

    If you have labels for the testing data, my suggestion is to use Generate Attributes to add a new attribute (with the same name) to both the training and testing sets. This new attribute can have the value "Tra" for training and "Tes" for testing. This way, we can filter them after appending using this new column.
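
    For readers who want to see the same idea outside RapidMiner, here is a minimal pandas sketch of the append-then-filter approach described above. It is only an illustration, not the RapidMiner process itself: the column names and values are made up, and `Data_type` is used as the name of the flag column holding "Tra" / "Tes". `pd.concat` keeps the union of columns, which roughly mirrors what a superset append does.

```python
import pandas as pd

# Hypothetical training data (with a label) and test data (no label, one extra column).
train = pd.DataFrame({"age": [25, 40], "income": [30000, 52000], "label": ["no", "yes"]})
test = pd.DataFrame({"age": [33, 51], "city": ["Oslo", "Lima"]})

# Add the same flag column to both sets before combining them.
train["Data_type"] = "Tra"
test["Data_type"] = "Tes"

# Stack the rows; columns missing on one side are filled with missing values.
combined = pd.concat([train, test], ignore_index=True, sort=False)

# Later, split the combined table back using the flag column.
train_again = combined[combined["Data_type"] == "Tra"]
test_again = combined[combined["Data_type"] == "Tes"]
```

    After the concat, the test rows have a missing value in `label` and the training rows have a missing value in `city`, so the combined table has labels only for the training part.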
  • [Deleted User]
    [Deleted User] New Altair Community Member
    edited October 2019
    @varunm1

    Thank you for the answer, but it is a bit complex. Could you please send me an example (process)?

    you saw my data before B)

    thank you
  • varunm1
    varunm1 New Altair Community Member
    Hello @mbs

    Here is the dummy data I created and the .rmp file that you can import into RapidMiner to see how it works. The Append (Superset) operator is in the Operator Toolbox, which you need to install from the marketplace.



  • [Deleted User]
    [Deleted User] New Altair Community Member
    @varunm1


    It has a Read Excel operator, and again I got an error :'(
  • [Deleted User]
    [Deleted User] New Altair Community Member
    Please combine both datasets.

    thanks
  • varunm1
    varunm1 New Altair Community Member
    You can import the data into a repository and then drag and drop it into the process instead of using the Read Excel operators. If this doesn't work out, my suggestion is to create a single Excel file with both train and test data and then import it into RapidMiner. You can then apply a filter to divide the training and test datasets. I attached a dummy Excel file with a new attribute that defines whether each sample belongs to training (Tra) or testing (Tes). This column is used to separate the data (Filter Examples).


  • [Deleted User]
    [Deleted User] New Altair Community Member
    edited October 2019
    Thank you.

    Does it have any label?

    The label is important for my work.
  • varunm1
    varunm1 New Altair Community Member
    edited October 2019
    The data I created has a label for training and missing values for testing. If you have labels in the testing data, that is fine: you are filtering out the test data using the column that says whether a row belongs to training or testing. See the "Data_type" column in the Excel sheet attached in the previous post; that column specifies which set each sample belongs to. Once you separate them and use Apply Model, it will take care of testing.
  • [Deleted User]
    [Deleted User] New Altair Community Member
    @varunm1

    Hi,

    I still see an error :(

    The error is about the name of some empty column in my data :/
  • varunm1
    varunm1 New Altair Community Member
    @mbs, it looks like something weird is happening in your Excel file. Hidden spaces may be causing an issue with your data import. Not sure, though.
  • [Deleted User]
    [Deleted User] New Altair Community Member
    edited October 2019
    @varunm1


    I don't have any spaces in the data.

    I tried it on my friend's laptop (RM version 9.2), but it still has the problem :'(
  • varunm1
    varunm1 New Altair Community Member
    Did you try my Excel files to check whether you get an error with them as well?
  • [Deleted User]
    [Deleted User] New Altair Community Member
    Your file is OK, but mine still has the problem.
  • varunm1
    varunm1 New Altair Community Member
    edited October 2019
    Unfortunately, I cannot help much with this without your files. I have thought through most of the options. My understanding is that the issue is a formatting error in that particular Excel file. By the way, did you create these Excel files manually, or did you get them as output from another system or software?

    Also, try creating dummy names for the columns and see how it works.
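
    As a rough illustration of the "hidden spaces" and "dummy names" suggestions, here is a small pandas sketch that trims whitespace from column names and gives empty header cells placeholder names. The helper `clean_column_names` and the `att1`, `att2`, ... naming are hypothetical, and this is not a diagnosis of the actual file in this thread. It assumes the sheet was loaded with pandas, which names empty header cells "Unnamed: N".

```python
import pandas as pd

def clean_column_names(df):
    """Return a copy of df with trimmed headers and dummy names for empty ones."""
    new_names = []
    for i, name in enumerate(df.columns):
        # Non-breaking spaces are a common invisible character in Excel headers.
        text = str(name).replace("\u00a0", " ").strip()
        # pandas labels empty header cells "Unnamed: N"; give them a dummy name.
        if text == "" or text.startswith("Unnamed:"):
            text = f"att{i + 1}"
        new_names.append(text)
    result = df.copy()
    result.columns = new_names
    return result

# Hypothetical headers: padded, empty, and with a trailing non-breaking space.
raw = pd.DataFrame([[1, 2, "yes"]], columns=[" age ", "Unnamed: 1", "label\u00a0"])
cleaned = clean_column_names(raw)
```

    This sketch does not handle the case where two cleaned names collide; a real file may need that checked as well.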
  • [Deleted User]
    [Deleted User] New Altair Community Member
    @varunm1

    Thank you for your help.

    I made this file manually, copied it to another Excel file, and fixed it, but now half of my data has a label and the other half doesn't. What is your suggestion in this situation?

    regards 

    mbs
  • varunm1
    varunm1 New Altair Community Member
    You want to use the unlabeled data for prediction, right? You can separate labeled and unlabeled data in RapidMiner using the Filter Examples operator. I mentioned this in my earlier post in this thread. You can use the labeled data for model building and the unlabeled data to make predictions from that model.
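
    The same split can be sketched in pandas by filtering on whether the label is missing. This is a minimal example under assumed data: the column names `feature_a`, `feature_b` and `label` are placeholders, not the poster's actual attributes.

```python
import pandas as pd

# Hypothetical combined table: "label" is filled in for only half of the rows.
combined = pd.DataFrame({
    "feature_a": [1.0, 2.0, 3.0, 4.0],
    "feature_b": [10, 20, 30, 40],
    "label": ["yes", "no", None, None],
})

# Rows with a label are used for model building.
has_label = combined["label"].notna()
labeled = combined[has_label].reset_index(drop=True)

# Rows without a label are the ones to predict; drop the empty label column.
unlabeled = combined[~has_label].drop(columns="label").reset_index(drop=True)
```

    A model would then be fitted on `labeled` and applied to `unlabeled`, which corresponds to building the model on the filtered training rows and running Apply Model on the rest.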
  • [Deleted User]
    [Deleted User] New Altair Community Member
    I know, but if you remember, I had a problem with the two datasets that I told you about, so I combined them.

    Is there any other way to do that now?

    thank you
  • varunm1
    varunm1 New Altair Community Member
    Answer ✓
    You told me that you already created a single file (manually) containing both unlabeled and labeled samples, am I right? If so, just import that new file and use Filter Examples as I said earlier. If you still have a problem with attribute names, delete that row and try to create new attribute names.
  • [Deleted User]
    [Deleted User] New Altair Community Member
    Yes, yes.

    Finally it works o:)

    Thank you very much @varunm1