RapidMiner Data Modeling CHALLENGE - $200 in cash and prizes

User: "sgenzer"
Altair Employee
Updated by Jocelyn

Hello RapidMiners -

I thought it would be fun (and useful) to have some Kaggle-like challenges here on the community forum, so I am sponsoring the very first RapidMiner Data Modeling Challenge.  :)  This is a real training data set that is in need of a good model.  Unlike the classic Iris data set, it has missing data, errors, etc.  Welcome to the real world.  Here's the challenge:

Goal: using RapidMiner 7.5, produce a model that predicts the label attribute of the attached training set "RMChallengeDataSet" from prior data in the series, with the highest accuracy possible.  Accuracy will be verified with the SLIDING WINDOW VALIDATION operator.  As it is a series of dates over an 18+ year span and no one wants to sit and watch their computer spin forever, I suggest the following parameters:

   training window width: 1000 (about three years' worth)

   training window step size: 3 (to cut down on iterations)

   test window width: 1 (I only want one day at a time)

   horizon: 1 (I want the next day)

   cumulative training: yes

   average performances only: yes
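
For anyone who wants to estimate run time before pressing Play, here is a minimal Python sketch of how windows with these parameters might be laid out over the 6726 examples. It is only an illustration (entries themselves must not use scripting), and it rests on my own assumptions about the parameters: indices are half-open ranges, a horizon of 1 places the test example immediately after the last training example, and cumulative training keeps the training start fixed at the first example. The function name `sliding_windows` is hypothetical, and RapidMiner's exact boundary handling may differ.

```python
from typing import Iterator, Tuple


def sliding_windows(n_examples: int, train_width: int, step: int,
                    test_width: int, horizon: int,
                    cumulative: bool) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (train_start, train_end, test_start, test_end) half-open ranges.

    Assumption (not taken from RapidMiner docs): horizon=1 means the test
    window starts right after the last training example.
    """
    train_start = 0
    train_end = train_width
    while True:
        # Offset the test window from the end of training by the horizon.
        test_start = train_end + horizon - 1
        test_end = test_start + test_width
        if test_end > n_examples:
            break  # no room left for a full test window
        yield train_start, train_end, test_start, test_end
        # Advance the end of the training window by the step size.
        train_end += step
        if not cumulative:
            # A rolling (non-cumulative) window also moves its start.
            train_start += step
```

With the suggested settings, `len(list(sliding_windows(6726, 1000, 3, 1, 1, True)))` is 1909, so under these assumptions expect roughly 1,900 train/test iterations. Because each test window holds a single example, each window's accuracy is either 0% or 100%, and averaging them gives the fraction of test days predicted correctly.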

It is a SERIES (every day from 1968 to 1986) with 6726 examples and 262 numerical attributes.  The label takes one of three values: A, B, or C.  You are welcome to do any feature selection, add attributes, etc., and use any model(s), as long as everything runs within RapidMiner and its publicly available extensions.  No scripting or APIs allowed.  The data are 1:1 hashed to protect the identity of the source; please do not try to reverse-engineer them.

Winner: the winner of the competition is the entrant who produces the highest accuracy, provided it is at least 60%, as shown by the standard Performance operator inside the Sliding Window Validation.  Why 60?  Because that's the highest I have gotten so far [honest disclaimer: I actually only got 60% accuracy with A/B labels, but I know you are all smarter than I am...]

Submission: all submissions for this challenge must be posted in THIS THREAD so they are open for all to see.  Just reply to this message with your process XML (please use the "insert code" item so the post stays compact) AND a screenshot of your performance.  You can post as many submissions as you want (within reason).

Determination of winner: hopefully the community will agree on the winner (all submissions are public), but in case of drama, I will be the sole judge and will verify the winner's submission.  If two or more submissions tie for the highest accuracy, the one submitted first wins.
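
Purely to illustrate the rule above (highest accuracy of at least 60%, ties broken by the earliest post), here is a short Python sketch. The `Submission` record and `pick_winner` function are hypothetical names made up for this example, and accuracy is assumed to be recorded as a percentage.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Submission:
    user: str
    accuracy: float      # percent, e.g. 61.5 (assumed unit)
    posted_at: datetime  # when the reply was posted to the thread


def pick_winner(submissions: List[Submission],
                threshold: float = 60.0) -> Optional[Submission]:
    """Return the winning submission, or None if nobody reaches the threshold."""
    eligible = [s for s in submissions if s.accuracy >= threshold]
    if not eligible:
        return None
    # Highest accuracy first; among equal accuracies, the earliest post wins.
    return min(eligible, key=lambda s: (-s.accuracy, s.posted_at))
```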

Who can enter: anyone who is a registered user on the RapidMiner Community Forum.  Yes, even you, @IngoRM!

Due date: all entries must be posted in this forum by June 15, 2017 at 21:00 EST.

Notification: I will give myself three days to independently verify the winner and then post to this thread.  I will then PM the winner to get a mailing address and mail a check for $100!

Good luck!


Scott