Combine two files

[Deleted User] · October 2019

hi

Is it possible to combine two datasets (train + test) into one new dataset in which the training part has labels and the test part has none?

thank you

varunm1 · October 2019

Hello @mbs

If both datasets have the same attribute names, you can use the Append operator to combine them. If the train or test dataset has extra columns that the other one lacks, use Append (Superset) instead. Regarding the testing data, do you have labels for it? If you do not, you can use Filter Examples to separate the testing data and use it for prediction.

If you do have labels for the testing data, my suggestion is to use Generate Attributes to add a new attribute (with the same name) to both the training and testing sets. This attribute can have the value "Tra" for training and "Tes" for testing. That way, we can filter on this column after appending to separate the two sets again.
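For readers who want to see the logic outside RapidMiner, here is a minimal plain-Python sketch of the same idea: tag each set, append them over the superset of columns, then filter on the tag. The rows, the attribute names (`a1`, `a2`, `extra`), and the helper functions `tag` and `append_superset` are all made up for illustration; only the "Tra"/"Tes" values and the `Data_type` column name come from this thread. It is not what the RapidMiner operators do internally.

```python
# Hypothetical toy rows; attribute names are illustrative only.
train = [
    {"a1": 1.0, "a2": 0.5, "label": "yes"},
    {"a1": 2.0, "a2": 0.1, "label": "no"},
]
test = [{"a1": 1.5, "a2": 0.3, "extra": 7}]  # no label, one extra column


def tag(rows, value):
    """Add a Data_type marker to every row (the Generate Attributes step)."""
    return [{**row, "Data_type": value} for row in rows]


def append_superset(*tables):
    """Stack tables; attributes a row does not have become None (missing)."""
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    return [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]


combined = append_superset(tag(train, "Tra"), tag(test, "Tes"))

# The Filter Examples step: split the combined table again by the marker.
train_again = [r for r in combined if r["Data_type"] == "Tra"]
test_again = [r for r in combined if r["Data_type"] == "Tes"]
```

The test row ends up with a missing label and the training rows with a missing `extra` value, which is roughly what you would expect to see after a superset append.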

[Deleted User] · October 2019

@varunm1

Thank you for the answer, but it is a bit complex. Could you please send me an example (process)?

You have seen my data before.

thank you

varunm1 · October 2019

Hello @mbs

Here is some dummy data I created, along with the .rmp file, which you can import into RapidMiner and inspect. The Append (Superset) operator is in the Operator Toolbox extension, which you need to install from the Marketplace.

Image: https://us.v-cdn.net/6030995/uploads/editor/m0/4kmmrib7bbet.png

[Deleted User] · October 2019

@varunm1

It uses the Read Excel operator, and I got an error again.

[Deleted User] · October 2019

Please combine both datasets.

thanks

varunm1 · October 2019

You can import the data into a repository and then drag and drop it into the process instead of using Read Excel operators. If this doesn't work, my suggestion is to create a single Excel file with both the train and test data and then import it into RapidMiner. You can then apply a filter to divide it into training and test datasets. I attached a dummy Excel file with a new attribute that states whether a sample belongs to training (Tra) or testing (Tes). This column is used to separate the data (Filter Examples).

Image: https://us.v-cdn.net/6030995/uploads/editor/3m/3uycfh7h07px.png

[Deleted User] · October 2019

thank you

Does it have a label?

The label is important for my work.

varunm1 · October 2019

The data I created has a label for the training rows and missing values for the testing rows. If you have labels in your testing data, that is fine: you are filtering out the test data using the column that says whether a row belongs to training or testing. See the "Data_type" column in the Excel sheet attached to my previous post; that column specifies which set each sample belongs to. Once you separate them and use Apply Model, it will take care of the testing rows.
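As a rough illustration of that flow (split on "Data_type", build a model on the training rows, apply it to the testing rows), here is a small plain-Python sketch. The data, the attribute `x`, and the `predict` function are hypothetical, and a one-nearest-neighbour rule stands in for whatever learner you actually use in RapidMiner.

```python
# Hypothetical combined table: labels present for "Tra", missing for "Tes".
rows = [
    {"x": 1.0, "label": "low", "Data_type": "Tra"},
    {"x": 1.2, "label": "low", "Data_type": "Tra"},
    {"x": 9.0, "label": "high", "Data_type": "Tra"},
    {"x": 8.5, "label": None, "Data_type": "Tes"},
    {"x": 0.9, "label": None, "Data_type": "Tes"},
]

# Separate the two parts using the marker column.
training = [r for r in rows if r["Data_type"] == "Tra"]
testing = [r for r in rows if r["Data_type"] == "Tes"]


def predict(x, training_rows):
    """Stand-in model: copy the label of the closest training row."""
    nearest = min(training_rows, key=lambda r: abs(r["x"] - x))
    return nearest["label"]


# Roughly the Apply Model step: add a prediction to each testing row.
predictions = [{**r, "prediction": predict(r["x"], training)} for r in testing]
```

The missing labels in the testing rows are never used; only the training rows influence the prediction.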

[Deleted User] · October 2019

@varunm1

Hi,

I still see an error.

The names of some empty columns in my data are shown as errors.

varunm1 · October 2019

@mbs It looks like something odd is happening in your Excel file. Hidden spaces may be causing an issue with your data import. I am not sure, though.

[Deleted User] · October 2019

@varunm1

I don't have any spaces in my data.

I tried it on my friend's laptop (RM version 9.2), but it still has the problem.

varunm1 · October 2019

Did you try my Excel files to check whether the error occurs with them as well?

[Deleted User] · October 2019

Your file is OK, but mine still has the problem.

varunm1 · October 2019

Unfortunately, I cannot help much with this without your files. I have thought through most of the options. My understanding is that the issue is a formatting error in that particular Excel file. By the way, did you create these Excel files manually, or did you get them as output from another system or software?

Also, try replacing the column names with dummy names and see whether that works.
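If you want to check the header row outside Excel, a small sketch like the one below shows the kind of clean-up I mean: trim hidden whitespace (including non-breaking spaces) and give empty columns a dummy name. The function name `clean_headers` and the `att1`, `att2`, ... naming pattern are my own choices for illustration, not something RapidMiner requires.

```python
def clean_headers(headers):
    """Trim hidden whitespace and replace empty names with dummy names."""
    cleaned = []
    for index, name in enumerate(headers, start=1):
        # Treat None as empty and turn non-breaking spaces into plain spaces.
        name = (name or "").replace("\u00a0", " ").strip()
        cleaned.append(name if name else f"att{index}")
    return cleaned


# Hypothetical header row with a padded name, two empty cells,
# and a trailing non-breaking space.
raw = [" age ", "", None, "label\u00a0"]
fixed = clean_headers(raw)
```

Here `fixed` would be `["age", "att2", "att3", "label"]`.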

[Deleted User] · October 2019

@varunm1

Thank you for your help.

I created this file manually, copied it into another Excel file, and fixed it. But now half of my data has labels and the other half doesn't. What is your suggestion in this situation?

regards

mbs

varunm1 · October 2019

You want to use the unlabelled data for prediction, right? You can separate labelled and unlabelled data in RapidMiner using the Filter Examples operator. I mentioned this in an earlier post in this thread. You can use the labelled data to build the model and the unlabelled data to make predictions with that model.
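The split itself is simple. Here is a plain-Python sketch of separating rows by whether the label is missing; the rows and the `is_missing` helper are hypothetical, and treating an empty cell or "?" as missing is an assumption about how the values look in your file.

```python
# Hypothetical combined table; "" and "?" stand for cells that may look
# filled in but carry no real label (an assumption about the file).
rows = [
    {"x": 0.2, "label": "a"},
    {"x": 0.7, "label": None},
    {"x": 0.4, "label": ""},
    {"x": 0.9, "label": "b"},
    {"x": 0.1, "label": "?"},
]


def is_missing(value):
    """True when the label is absent, blank, or a '?' placeholder."""
    return value is None or str(value).strip() in ("", "?")


labeled = [r for r in rows if not is_missing(r["label"])]   # for model building
unlabeled = [r for r in rows if is_missing(r["label"])]     # for prediction
```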

[Deleted User] · October 2019

I know, but if you remember, I had a problem with the two separate datasets that I told you about, so that is why I combined them.

Is there any other way to do this now?

thank you

varunm1 · October 2019

You told me that you already created a single file (manually) containing both unlabelled and labelled samples. Am I right? If so, just import that new file and use Filter Examples as I said earlier. If you still have a problem with the attribute names, delete the header row and create new attribute names.

[Deleted User] · October 2019

Yes, yes.

It finally works.

thank you very much @varunm1
