Combine two files
[Deleted User]
New Altair Community Member
Answers
-
Hello @mbs
If both datasets have the same attribute names, you can use the Append operator to combine them. If the training or testing dataset has extra columns that the other one lacks, use Append (Superset) instead. Regarding the testing data, do you have labels for it? If you do not, you can use Filter Examples to separate the testing data and use it for prediction.
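For readers who find code easier to follow, the difference between the two append variants can be sketched in plain Python. This is only an analogy, not RapidMiner code: the helper names `append_strict` and `append_superset` and the sample rows are hypothetical, and the real operators may differ in detail.

```python
def append_strict(first, second):
    """Stack two tables (lists of dicts) that must share the same column names."""
    cols_first = set().union(*(row.keys() for row in first))
    cols_second = set().union(*(row.keys() for row in second))
    if cols_first != cols_second:
        # Report the columns that appear in only one of the two tables.
        raise ValueError(f"column mismatch: {sorted(cols_first ^ cols_second)}")
    return first + second


def append_superset(first, second):
    """Stack two tables, keeping every column and filling gaps with None."""
    columns = []
    for row in first + second:
        for name in row:
            if name not in columns:
                columns.append(name)
    return [{name: row.get(name) for name in columns} for row in first + second]


# Hypothetical sample data: the test rows have no "label" column.
train = [{"id": 1, "age": 34, "label": "yes"}]
test = [{"id": 2, "age": 51}]

combined = append_superset(train, test)
```

With these sample rows, `append_strict(train, test)` raises an error because of the extra `label` column, while `append_superset` keeps the column and leaves it empty for the test row.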
If you do have labels for the testing data, my suggestion is to use "generate attribute" to add a new attribute (with the same name) to both the training and testing sets. This attribute can have the value "Tra" for training and "Tes" for testing. That way, you can separate the two sets again after appending by filtering on this new column.
-
@varunm1
Thank you for the answer, but it is a bit complex. Could you please send me an example (process)?
You saw my data before.
Thank you.
-
Hello @mbs
Here is the dummy data I created and the .rmp file, which you can import into RapidMiner to take a look. The Append (Superset) operator is in the Operator Toolbox, which you need to install from the Marketplace.
-
Please combine both datasets.
Thanks.
-
You can import the data into a repository and then drag and drop it into the process instead of using Read Excel operators. If that doesn't work, my suggestion is to create a single Excel file with both the training and testing data and then import it into RapidMiner. You can then apply a filter to split it into training and testing datasets. I attached a dummy Excel file with a new attribute that defines whether each sample belongs to training (Tra) or testing (Tes). This column is used to separate the data (Filter Examples).
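The single-file approach can be illustrated with a small plain-Python sketch. It is an analogy for the filtering step, not RapidMiner code: the rows and the helper `filter_examples` are hypothetical, while the column name `Data_type` and the values "Tra" and "Tes" follow the attached dummy sheet.

```python
# Hypothetical rows standing in for the combined Excel sheet.
rows = [
    {"id": 1, "age": 34, "label": "yes", "Data_type": "Tra"},
    {"id": 2, "age": 51, "label": "no", "Data_type": "Tra"},
    {"id": 3, "age": 45, "label": None, "Data_type": "Tes"},
]


def filter_examples(table, column, value):
    """Keep only the rows whose `column` equals `value`."""
    return [row for row in table if row[column] == value]


training = filter_examples(rows, "Data_type", "Tra")
testing = filter_examples(rows, "Data_type", "Tes")
```

Because the split relies only on the marker column, it works the same way whether or not the testing rows carry a label.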
-
Thank you.
Does it have a label?
The label is important for my work.
-
The data I created has a label for the training rows and missing values for the testing rows. If you have labels in the testing data, that is fine: you are filtering out the test data using the column that says whether a row belongs to training or testing. See the "Data_type" column in the Excel sheet attached to the previous post; that column specifies which set each sample belongs to. Once you separate them and use Apply Model, it will take care of the testing data.
-
@varunm1
I don't have any spaces in the data.
I tried it on my friend's laptop (RM version 9.2), but it still has the problem.
-
Did you try my Excel files to check whether the error occurs with them as well?
-
Your file is OK, but mine still has the problem.
-
Unfortunately, I cannot help much with this without your files. I have thought through most of the options. My understanding is that the issue is a formatting error in that particular Excel file. By the way, did you create these Excel files manually, or did you get them as output from another system or software?
Also, try using dummy names for the columns and see how it works.
-
@varunm1
Thank you for your help.
I created this file manually, copied it into another Excel file, and fixed it, but now half of my data has labels and the other half does not. What do you suggest in this situation?
Regards,
mbs
-
You want to use the unlabeled data for prediction, right? You can separate the labeled and unlabeled data in RapidMiner using the Filter Examples operator. I mentioned this in an earlier post in this thread. You can use the labeled data to build the model and the unlabeled data to make predictions with that model.
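As a rough plain-Python analogy for this workflow (not RapidMiner code), the sketch below splits rows on whether the label is missing and then applies a stand-in model. The rows are hypothetical, and the majority-class "model" is only a placeholder for whatever learner you actually train.

```python
from collections import Counter

# Hypothetical combined data: some rows have a label, some do not.
rows = [
    {"id": 1, "age": 34, "label": "yes"},
    {"id": 2, "age": 51, "label": "no"},
    {"id": 3, "age": 29, "label": "yes"},
    {"id": 4, "age": 45, "label": None},
    {"id": 5, "age": 62, "label": None},
]

# Equivalent of filtering on "label is not missing" / "label is missing".
labeled = [row for row in rows if row["label"] is not None]
unlabeled = [row for row in rows if row["label"] is None]

# Placeholder model: always predict the most frequent label in the labeled data.
majority = Counter(row["label"] for row in labeled).most_common(1)[0][0]

# Apply the placeholder model to the unlabeled rows only.
predictions = [{**row, "prediction": majority} for row in unlabeled]
```

The point of the sketch is the order of operations: split first, build the model on the labeled part only, then score the unlabeled part.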
-
I know, but if you remember, I had a problem with the two data files, which I told you about, so that is why I combined them.
Is there any other way to do that now?
Thank you.
-
You told me that you already created a single file (manually) containing both unlabeled and labeled samples, am I right? If so, just import that new file and use Filter Examples as I said earlier. If you still have a problem with attribute names, delete that row and try creating new attribute names.