
How to select the right data for prediction?

User111113
New Altair Community Member
Updated by Jocelyn
Hi All,

I have about 2 years of historical data that I can probably use to predict responses.

For example, if I have to predict my response rate for Jan 2020, how can I tell how much data would be enough to come close to the actual rate?

- Should I look at how my data performed in Jan 2018 and Jan 2019, and maybe the last 4 months of 2019?

- Or should it be the last four months of 2019 and Jan 2019?

- Or should I use everything I have? I am not comfortable with that because of the many outliers.

When I compared actual and predicted values for the past few months, they were not close at all, because the predictions were done manually (on a piece of paper).

How do I select the right data?

Thank you.

    PaulMSimpson
    New Altair Community Member
    Accepted Answer
    Updated by PaulMSimpson
    Let me help you split your data on a date, as many months back as you prefer. I'm fairly new to RapidMiner, having done most of my data science work in R previously. Therefore, I don't know if what I'm about to show you is the simplest or best way to split a dataset on a date, but it does work. 

    First, you would need to create a third column that combines your month column, "/1/", and your year column, so that every record has an actual date value, such as 5/1/2018. I recommend using the Generate Attributes operator: click Edit List, add an attribute named "myDate", and in the function expressions field enter date_parse([yourMonthCol] + "/1/" + [yourYearCol]), substituting the names of your own month and year columns.
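For readers who think more easily in code, here is a rough Python sketch of the same idea: turning separate month and year values into a first-of-the-month date. It is not RapidMiner code. The sample records, the key names "month", "year" and "responses", and the helper add_first_of_month are all made up for illustration.

```python
from datetime import date

# Hypothetical records; the keys "month", "year" and "responses" are assumptions.
records = [
    {"month": 5, "year": 2018, "responses": 10},
    {"month": 12, "year": 2018, "responses": 14},
    {"month": 1, "year": 2019, "responses": 9},
]

def add_first_of_month(rows):
    """Return copies of the rows with a 'myDate' key set to day 1 of each row's month."""
    return [{**row, "myDate": date(row["year"], row["month"], 1)} for row in rows]

dated = add_first_of_month(records)
for row in dated:
    print(row["myDate"].isoformat(), row["responses"])
```

Using day 1 for every record mirrors the "/1/" in the expression: the exact day does not matter, it only needs to be consistent so the months sort and compare correctly.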

    Second, after your Retrieve operator, place a single Filter Examples operator. You only need one because you will pipe the "unm" port, which carries all unmatched records, to be your test data. I used the "expression" condition class with the date_before() function in the parameter expression. Its first parameter is the name of your date field, and the second is a date_parse() call that converts a string for your chosen split date into a date data type.
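The split itself can be sketched in plain Python as well. This is only an illustration of the logic, not what RapidMiner runs internally: rows dated before the cutoff play the role of the matched output (training data), and everything else plays the role of the "unm" output (test data). The function split_on_date, the sample rows and the cutoff date are hypothetical, and the comparison assumes that "before" means strictly before.

```python
from datetime import date

def split_on_date(rows, cutoff, date_key="myDate"):
    """Split rows into (before cutoff, on or after cutoff)."""
    train = [row for row in rows if row[date_key] < cutoff]
    test = [row for row in rows if not row[date_key] < cutoff]
    return train, test

# Hypothetical monthly records that already carry a first-of-month date.
rows = [
    {"myDate": date(2019, 8, 1), "responses": 11},
    {"myDate": date(2019, 9, 1), "responses": 13},
    {"myDate": date(2019, 10, 1), "responses": 12},
    {"myDate": date(2019, 11, 1), "responses": 15},
]

# Everything before October 2019 is training data; the rest is held out.
train, test = split_on_date(rows, date(2019, 10, 1))
print(len(train), "training rows,", len(test), "test rows")
```

Moving the cutoff earlier or later is how you try out "as many months back as you prefer": each choice gives a different training window that you can score against the same held-out months.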