Copy Dataset Properties
btibert
This is an "is it possible, and if not, what's the best way to handle this" type of question.
My use case involves two files: a training file and a validation file. The training file is used to fit the model, and the validation file has the same columns except for the label. I am doing a fair amount of preprocessing and want to reuse that work.
I am hitting a roadblock because when I use Read CSV on the validation file, the guessed data type for a given column differs (train = polynominal, validation = integer). Even though I can carry the preprocessing steps forward via Apply Model, the column is not dummy encoded by the Nominal to Numerical operator I am carrying forward. As a result, applying the model to the validation set fails because the expected column is not present.
I know that I could fix the types manually on load or with an operator, but I am wondering whether there is a "copy data types" option for columns that share the same name. I would prefer that this kind of error not happen during my in-class data competitions, and with a dataset of 50 columns, my goal is to avoid having students set column types one by one.
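To make the failure concrete, and to show what a "copy data types by column name" step would amount to, here is a minimal Python sketch using only the standard library. The guessing rule (a column is "integer" if every non-empty value parses as an int, otherwise "polynominal") is a simplification assumed for illustration, not RapidMiner's actual detection logic; the column names, data, and helper functions are hypothetical.

```python
import csv
import io

# Simplified stand-in for automatic type detection (an assumption, not
# RapidMiner's real algorithm): a column is "integer" if every non-empty
# value parses as an int, otherwise "polynominal".
def _is_int(text):
    try:
        int(text)
        return True
    except ValueError:
        return False


def guess_type(values):
    non_empty = [v for v in values if v != ""]
    if non_empty and all(_is_int(v) for v in non_empty):
        return "integer"
    return "polynominal"


def guess_types(rows):
    # Assumes at least one row; column order follows the CSV header.
    return {name: guess_type([row[name] for row in rows]) for name in rows[0]}


# Converters for the two type names used in this sketch.
CONVERTERS = {"integer": int, "polynominal": str}


def copy_types(source_types, target_rows):
    """Convert every column that shares a name with a source column to the
    source's type; columns unknown to the source keep their raw string."""
    return [
        {
            name: (CONVERTERS[source_types[name]](value) if name in source_types else value)
            for name, value in row.items()
        }
        for row in target_rows
    ]


# Toy data: "level" contains a letter in the training file only.
TRAIN = "level,hours,label\nA,10,yes\n2,12,no\n"
VALID = "level,hours\n2,11\n3,9\n"

train_rows = list(csv.DictReader(io.StringIO(TRAIN)))
valid_rows = list(csv.DictReader(io.StringIO(VALID)))

train_types = guess_types(train_rows)
print(train_types)              # {'level': 'polynominal', 'hours': 'integer', 'label': 'polynominal'}
print(guess_types(valid_rows))  # {'level': 'integer', 'hours': 'integer'}
print(copy_types(train_types, valid_rows))
# [{'level': '2', 'hours': 11}, {'level': '3', 'hours': 9}]
```

Copying types this way still depends on the training file being guessed correctly; if the training file happened to contain only digits in "level", both files would agree on the wrong type.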
Accepted answers
BalazsBaranyRM
Hi,
"copy data types" means different things in different contexts.
In the case of CSV files, it's always best to set the data types explicitly to what you know is correct. Sooner or later you'll encounter a file whose content makes the automatic detection change its decision (as you already have).
To reuse this, create a process with just the Read CSV operator, configure it using the wizard, and then connect the process input (left side: "inp") to the "fil" input of the Read CSV operator. You can then use this process as a subprocess in another process, and it will read any file you send it in the same way. Use Open File in the calling process to access your file (training or validation file).
This approach could also be used with Read Excel.
When reading from databases, you can specify the data set structure in the query, and so on.
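A rough Python analogue of this "declare the types once, reuse the reader" idea is sketched below with the standard library. The SCHEMA dictionary plays the role of the wizard-configured Read CSV process, and the stream argument plays the role of the "fil" input; the schema, column names, and the read_with_schema helper are hypothetical.

```python
import csv
import io

# Hypothetical schema declared once by hand: the analogue of configuring
# Read CSV with the import wizard. Column names and types are made up.
SCHEMA = {"level": str, "hours": int, "label": str}


def read_with_schema(stream, schema=SCHEMA):
    """Read CSV rows from a text stream and convert each column with the
    declared type, so every file is read the same way regardless of content.

    Columns absent from the file (e.g. the label in the validation file) are
    simply missing from the result; a column not in the schema raises KeyError.
    An empty string in an int column raises ValueError instead of silently
    changing the column's type.
    """
    rows = []
    for raw in csv.DictReader(stream):
        row = {}
        for name, value in raw.items():
            if name not in schema:
                raise KeyError(f"column {name!r} is not declared in the schema")
            row[name] = schema[name](value)
        rows.append(row)
    return rows


# The same reader handles both files; with real files you would pass
# open(path, newline="") instead of a StringIO.
train = read_with_schema(io.StringIO("level,hours,label\nA,10,yes\n2,12,no\n"))
valid = read_with_schema(io.StringIO("level,hours\n2,11\n3,9\n"))
print(valid)  # [{'level': '2', 'hours': 11}, {'level': '3', 'hours': 9}]
```

Failing loudly on unexpected content is the design choice here: a surprising value stops the read rather than flipping a column from nominal to integer.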
As Martin wrote, when you want to apply the same preprocessing, use the "pre" output to get the preprocessing model, and ideally use Group Models to combine the preprocessing model and the actual prediction model into one integrated model.
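The grouping idea can be sketched in plain Python. This is not RapidMiner's implementation: DummyEncoder is a hypothetical stand-in for the preprocessing model that Nominal to Numerical emits on its "pre" port, NearestCentroid is a hypothetical toy classifier standing in for "your prediction model", and GroupedModel mimics what Group Models produces. Dummy column names such as "level = A" are illustrative. The point is that the categories are learned once, on the training data, so the validation data always gets the same columns even when a category never occurs in it.

```python
class DummyEncoder:
    """Hypothetical preprocessing model: learns the categories of each nominal
    column on the training data, then always emits the same dummy columns."""

    def __init__(self, nominal_columns):
        self.nominal_columns = nominal_columns
        self.categories = {}

    def fit(self, rows):
        for col in self.nominal_columns:
            self.categories[col] = sorted({row[col] for row in rows})
        return self

    def transform(self, rows):
        encoded = []
        for row in rows:
            new = {k: v for k, v in row.items() if k not in self.nominal_columns}
            for col in self.nominal_columns:
                for cat in self.categories[col]:
                    # Illustrative column naming; unseen categories get all zeros.
                    new[f"{col} = {cat}"] = 1 if row[col] == cat else 0
            encoded.append(new)
        return encoded


class NearestCentroid:
    """Hypothetical toy classifier standing in for 'your prediction model'."""

    def fit(self, rows, labels):
        self.features = sorted(rows[0])
        sums, counts = {}, {}
        for row, label in zip(rows, labels):
            acc = sums.setdefault(label, [0.0] * len(self.features))
            for i, name in enumerate(self.features):
                acc[i] += row[name]
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {lbl: [s / counts[lbl] for s in acc] for lbl, acc in sums.items()}
        return self

    def predict(self, rows):
        predictions = []
        for row in rows:
            x = [row[name] for name in self.features]  # KeyError if a column is missing
            predictions.append(
                min(self.centroids,
                    key=lambda lbl: sum((a - b) ** 2 for a, b in zip(x, self.centroids[lbl])))
            )
        return predictions


class GroupedModel:
    """Applies the preprocessing model, then the prediction model, as one unit."""

    def __init__(self, preprocessing, model):
        self.preprocessing = preprocessing
        self.model = model

    def predict(self, rows):
        return self.model.predict(self.preprocessing.transform(rows))


train = [
    {"level": "A", "hours": 10}, {"level": "A", "hours": 12},
    {"level": "B", "hours": 2}, {"level": "B", "hours": 4},
]
labels = ["yes", "yes", "no", "no"]
valid = [{"level": "A", "hours": 9}, {"level": "A", "hours": 3}]  # no "B" rows

encoder = DummyEncoder(["level"]).fit(train)                     # the "pre" output
model = NearestCentroid().fit(encoder.transform(train), labels)
grouped = GroupedModel(encoder, model)                           # Group Models
print(grouped.predict(valid))  # ['yes', 'no']
```

Fitting a fresh encoder on the validation rows alone would produce only a "level = A" column, and NearestCentroid.predict would then fail with a KeyError, the same kind of missing-column failure described in the question. Grouping does not, however, fix a column whose type was guessed differently at read time; that still has to be handled when the file is read.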
Regards,
Balázs
All comments
MartinLiebig
Hi,
Nominal to Numerical produces a preprocessing model. You can group it with your prediction model so that both are always applied together.
Best,
Martin
btibert
Thanks Martin, I did use that, but my question is more about the two files being read in with different data types to begin with. I can set them manually, but I have been trying to use operators explicitly for everything, which is why I was curious whether there is a "copy data types" from one raw file to another. By data types, I simply mean numeric, nominal, text, id, etc. Not a huge deal; I was just wondering, as I am the farthest thing from an expert on all of the tooling built into RM.
btibert
Got it, that makes sense in terms of how to do it. Thank you.