How to enrich a data set with columns from other data sets? Merger of three data sets

User: "Mike0985"
New Altair Community Member
Updated by Jocelyn
Hello RM community,
First of all, I'm an absolute beginner with RapidMiner, so please be patient with me. I took a basketball data set from Kaggle to get into RapidMiner. I have three data sets: "games_raw" for the games, "teams_raw" for the teams, and "ranking_raw" for the team rankings. I would like to work with the games data set, but there are some columns in the teams and ranking data sets that I would like to use to enrich it (see "games_adj" as the target data set). I built a process, but it seems too clumsy.

Do you have an idea how to build up the RM process a bit smarter and faster?

Thank you in advance!
Regards
Mike



    User: "YYH"
    Altair Employee
    Accepted Answer
    Hi @Mike0985,

    Thanks for sharing your use case! Sounds cool. The Join operator is useful for data blending and merging, but it only takes two inputs at a time, so you need several “Join” operators to combine multiple data sets. The workflow in your screenshot looks fine to me.
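
For readers who think in code, the "one join per pair of inputs" idea can be sketched outside RapidMiner in plain Python. This is only an illustration, not the RapidMiner process itself: the column names (`game_id`, `home_team_id`, `team_id`, `city`, `win_pct`), the sample rows, and the `left_join` helper are all assumptions made up for the example, and the helper assumes each key appears only once in the right-hand table.

```python
# Hypothetical miniature versions of the three tables (all names and values are assumptions).
games = [
    {"game_id": 1, "home_team_id": 10},
    {"game_id": 2, "home_team_id": 30},
]
teams = [
    {"team_id": 10, "city": "Boston"},
    {"team_id": 20, "city": "Chicago"},
]
ranking = [
    {"team_id": 10, "win_pct": 0.61},
    {"team_id": 30, "win_pct": 0.55},
]


def left_join(left, right, left_key, right_key, prefix=""):
    """Left join two lists of dicts.

    Every left row is kept. Columns of the matching right row are added
    under `prefix`; if there is no match, those columns are set to None.
    Assumes `right_key` values are unique in `right`.
    """
    index = {row[right_key]: row for row in right}
    right_cols = [c for c in (right[0] if right else {}) if c != right_key]
    joined = []
    for row in left:
        match = index.get(row[left_key])
        extra = {prefix + c: (match[c] if match else None) for c in right_cols}
        joined.append({**row, **extra})
    return joined


# Each join takes exactly two inputs, so three tables need two chained joins.
games_adj = left_join(games, teams, "home_team_id", "team_id", prefix="home_")
games_adj = left_join(games_adj, ranking, "home_team_id", "team_id", prefix="home_")

for row in games_adj:
    print(row)
```

Enriching the visiting team as well would take two more joins on a visitor key, which is why a process with several Join operators is expected rather than clumsy.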

    If you have several data sets that share the same structure (same column names, same column types), you can use the “Append” operator for a quick merge. Your input data sets do not share a structure, though, so Append is not a good fit here. Another code-free option is, of course, Turbo Prep. For beginners, I strongly recommend the online documentation and academy pages. https://academy.rapidminer.com/learn/video/turbo-prep-introduction
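
The difference between appending and joining can be shown with a small plain-Python sketch. It is not RapidMiner's Append operator, only an illustration of the same idea: `append_tables`, the column names, and the sample rows are hypothetical.

```python
def append_tables(*tables):
    """Stack tables (lists of dicts) row-wise.

    Raises ValueError if any row's column names differ from the first row's,
    mirroring the requirement that appended data sets share one structure.
    """
    rows = [row for table in tables for row in table]
    if not rows:
        return []
    expected = set(rows[0])
    for row in rows:
        if set(row) != expected:
            raise ValueError(f"column mismatch: {sorted(row)} vs {sorted(expected)}")
    return [dict(row) for row in rows]


# Two hypothetical game tables with identical columns: appending works.
games_a = [{"game_id": 1, "home_team_id": 10}]
games_b = [{"game_id": 2, "home_team_id": 20}, {"game_id": 3, "home_team_id": 10}]
all_games = append_tables(games_a, games_b)

# A games table and a teams table have different columns: appending fails,
# which is the situation where a join on a shared key is needed instead.
teams = [{"team_id": 10, "city": "Boston"}]
try:
    append_tables(games_a, teams)
    append_failed = False
except ValueError:
    append_failed = True
```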

    cheers,
    YY


    User: "Mike0985"
    New Altair Community Member
    OP
    Hello YY,

    Thanks for looking into my case and for confirming that my workflow looks fine. I saw the Append operator in RapidMiner, but as you said, it only works when the data sets have the same columns, so it does not work in my case.

    Regards,
    Mike