Football league prediction: top 3 teams

f_nyst New Altair Community Member
edited November 5 in Community Q&A

Hi everybody!

I'm working on my thesis and want to find out how predictable the final top three teams in a football league are (for example, the English Premier League).

I am new to RapidMiner and am not quite sure what to do. My data set consists of the league ranking per week over a period of 10 years. I would like to determine how accurately the top 3 teams can be predicted, and then analyze which team attributes may influence that (for example, the average weight of the players in a team).

 

Does anybody have some advice?

 

Thanks in advance for any help!

 

Regards, Frederique

Answers

  • MartinLiebig
    Altair Employee

    Hi,

     

    It's easy: take the German Bundesliga. Bayern München always wins :/

     

    Seriously, though: you need to build a team profile, e.g. the average market value, average age, etc. Based on this profile you can start to predict. Building the profile is the key step here.
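    As a rough illustration of what "building the profile" means in data terms, here is a minimal Python sketch. It assumes player-level records are available; the record layout, the team names and the numbers are all hypothetical. It collapses players into one row of averages per team, which is the kind of example set a learner can then work on.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical player-level records: (team, market value in EUR millions, age, weight in kg).
PLAYERS = [
    ("Team A", 30.0, 27, 78.0),
    ("Team A", 10.0, 23, 74.0),
    ("Team B", 5.0, 30, 80.0),
    ("Team B", 15.0, 26, 76.0),
]

def build_profiles(players):
    """Aggregate player rows into one profile (a dict of averages) per team."""
    by_team = defaultdict(list)
    for team, value, age, weight in players:
        by_team[team].append((value, age, weight))
    profiles = {}
    for team, rows in by_team.items():
        values, ages, weights = zip(*rows)
        profiles[team] = {
            "avg_market_value": mean(values),
            "avg_age": mean(ages),
            "avg_weight": mean(weights),
        }
    return profiles

for team, profile in build_profiles(PLAYERS).items():
    print(team, profile)
```

    Each team profile would then become one row of the example set, with the label (for example, whether the team finished in the top 3) joined on from the league table.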

     

    Best,

    Martin

  • f_nyst
    New Altair Community Member

    Haha!! ;-)

     

    Thanks for your reply! Once I have the team profiles, do you have suggestions on which operator to run on them? Naive Bayes?

     

    I also want to see, given the league ranking over the 38 rounds of a season (in the English Premier League, for example), after how many rounds a correct prediction comes out. In the example below: from which round on can an operator predict the top three (or bottom three) of round 38?

    Thanks so much in advance for any help

     

    Round | Rank 1            | Rank 2            | Rank 3
    ------+-------------------+-------------------+------------------
    01    | Chelsea           | Blackpool         | Manchester United
    02    | Chelsea           | Arsenal           | Manchester United
    03    | Chelsea           | Arsenal           | Manchester United
    04    | Chelsea           | Arsenal           | Manchester United
    05    | Chelsea           | Arsenal           | Manchester United
    06    | Chelsea           | Manchester United | Arsenal
    07    | Chelsea           | Manchester City   | Manchester United
    08    | Chelsea           | Manchester City   | Arsenal
    09    | Chelsea           | Arsenal           | Manchester United
    10    | Chelsea           | Arsenal           | Manchester United
    11    | Chelsea           | Manchester United | Arsenal
    12    | Chelsea           | Manchester United | Arsenal
    13    | Chelsea           | Arsenal           | Manchester United
    14    | Chelsea           | Manchester United | Arsenal
    15    | Manchester United | Chelsea           | Arsenal
    16    | Manchester United | Arsenal           | Chelsea
    17    | Manchester United | Arsenal           | Manchester City
    18    | Manchester United | Arsenal           | Chelsea
    19    | Manchester United | Arsenal           | Manchester City
    20    | Manchester United | Arsenal           | Manchester City
    21    | Manchester United | Arsenal           | Manchester City
    22    | Manchester United | Arsenal           | Manchester City
    23    | Manchester United | Arsenal           | Manchester City
    24    | Manchester United | Arsenal           | Manchester City
    25    | Manchester United | Arsenal           | Chelsea
    26    | Manchester United | Arsenal           | Manchester City
    27    | Manchester United | Arsenal           | Manchester City
    28    | Manchester United | Arsenal           | Chelsea
    29    | Manchester United | Arsenal           | Chelsea
    30    | Manchester United | Arsenal           | Chelsea
    31    | Manchester United | Arsenal           | Chelsea
    32    | Manchester United | Arsenal           | Chelsea
    33    | Manchester United | Chelsea           | Arsenal
    34    | Manchester United | Chelsea           | Arsenal
    35    | Manchester United | Chelsea           | Arsenal
    36    | Manchester United | Chelsea           | Arsenal
    37    | Manchester United | Chelsea           | Manchester City
    38    | Manchester United | Chelsea           | Manchester City
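    A rough way to quantify the "after how many rounds" question without any model is to treat the standings at round k as the prediction and compare them with round 38. The sketch below is plain Python, not a RapidMiner process; the run-length encoding and the function names are illustrative choices. It uses only the top three listed in the table, so the bottom-three version would need the full 20-row ranking.

```python
# Naive baseline: treat the top three at round k as the "prediction" of the final top three.
# The table is transcribed as runs of (consecutive rounds, (rank 1, rank 2, rank 3)).
CHE, MUN, ARS, MCI, BLA = "Chelsea", "Manchester United", "Arsenal", "Manchester City", "Blackpool"

RUNS = [
    (1, (CHE, BLA, MUN)),  # round 1
    (4, (CHE, ARS, MUN)),  # rounds 2-5
    (1, (CHE, MUN, ARS)),  # round 6
    (1, (CHE, MCI, MUN)),  # round 7
    (1, (CHE, MCI, ARS)),  # round 8
    (2, (CHE, ARS, MUN)),  # rounds 9-10
    (2, (CHE, MUN, ARS)),  # rounds 11-12
    (1, (CHE, ARS, MUN)),  # round 13
    (1, (CHE, MUN, ARS)),  # round 14
    (1, (MUN, CHE, ARS)),  # round 15
    (1, (MUN, ARS, CHE)),  # round 16
    (1, (MUN, ARS, MCI)),  # round 17
    (1, (MUN, ARS, CHE)),  # round 18
    (6, (MUN, ARS, MCI)),  # rounds 19-24
    (1, (MUN, ARS, CHE)),  # round 25
    (2, (MUN, ARS, MCI)),  # rounds 26-27
    (5, (MUN, ARS, CHE)),  # rounds 28-32
    (4, (MUN, CHE, ARS)),  # rounds 33-36
    (2, (MUN, CHE, MCI)),  # rounds 37-38
]

def expand(runs):
    """Turn the run-length list into one (rank 1, rank 2, rank 3) tuple per round."""
    return [top3 for count, top3 in runs for _ in range(count)]

def same_teams(a, b):
    return set(a) == set(b)  # same three teams, order ignored

def same_order(a, b):
    return a == b  # same teams in the same positions

def first_match(rounds, same):
    """First round (1-based) whose top three matches the final round at all."""
    final = rounds[-1]
    return next(i for i, top3 in enumerate(rounds, start=1) if same(top3, final))

def stable_from(rounds, same):
    """Earliest round (1-based) from which every later round matches the final round."""
    final = rounds[-1]
    start = len(rounds)
    for i in range(len(rounds) - 1, -1, -1):
        if not same(rounds[i], final):
            break
        start = i + 1
    return start

rounds = expand(RUNS)
print(len(rounds))                      # 38
print(first_match(rounds, same_teams))  # 7
print(stable_from(rounds, same_teams))  # 37
print(stable_from(rounds, same_order))  # 37
```

    For the rounds shown, the final top three first appear together at round 7 but only stay there for good from round 37. Any trained model should at least beat this "current table" baseline.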
  • MartinLiebig
    Altair Employee

    Hi,

     

    I guess you would build a process that predicts whether a team is likely to finish #1 or within #1-3 (not sure which works best). Afterwards you score a season and take the top 3 teams in terms of likelihood.
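    The scoring step can be sketched as follows, assuming the model outputs, for each team, a probability of finishing #1-3. The team names and probabilities are hypothetical, and the helper names are made up for illustration. The second function gives one simple reading of the "accuracy of the top 3" from the original question: the share of the actual top three that were picked, ignoring order.

```python
import heapq

def predicted_top3(probabilities, k=3):
    """Pick the k teams with the highest predicted likelihood of finishing in the top 3."""
    best = heapq.nlargest(k, probabilities.items(), key=lambda item: item[1])
    return [team for team, _ in best]

def top3_hit_rate(predicted, actual):
    """Share of the actual top 3 that the prediction got right (order ignored)."""
    return len(set(predicted) & set(actual)) / len(actual)

# Hypothetical model output for one scored season (probability of finishing #1-3).
scores = {"Team A": 0.81, "Team B": 0.64, "Team C": 0.12, "Team D": 0.47, "Team E": 0.05}
prediction = predicted_top3(scores)
print(prediction)                                                 # ['Team A', 'Team B', 'Team D']
print(top3_hit_rate(prediction, ["Team A", "Team D", "Team C"]))  # 0.6666666666666666
```

    Averaging this hit rate over the 10 seasons would give a single accuracy percentage; an order-sensitive score would be a stricter alternative.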

    Since this is a classification problem, you can use a lot of algorithms. Naive Bayes is one of them and a good start, but I guess you need stronger algorithms for decent results, e.g. Random Forest, SVM, or Deep Learning.
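    To make the classification framing concrete, here is a small hand-written Gaussian Naive Bayes in plain Python. This is a hedged illustration of the technique, not RapidMiner's Naive Bayes operator. The two features (average market value and average age) and all numbers are invented toy values. Each class gets a prior plus a per-feature mean and variance, and a new team profile is scored by combining those under the naive independence assumption.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance

def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Estimate a class prior plus a per-feature mean/variance for each class label."""
    rows_by_class = defaultdict(list)
    for features, label in zip(X, y):
        rows_by_class[label].append(features)
    model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(X),
            "means": [mean(col) for col in columns],
            # A small floor keeps a zero-variance feature from dividing by zero.
            "vars": [max(pvariance(col), var_floor) for col in columns],
        }
    return model

def predict_proba(model, features):
    """Posterior class probabilities, treating features as independent Gaussians per class."""
    log_scores = {}
    for label, p in model.items():
        score = math.log(p["prior"])
        for x, mu, var in zip(features, p["means"], p["vars"]):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        log_scores[label] = score
    # Normalise in log space (log-sum-exp) to avoid underflow.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# Toy team profiles: (average market value in EUR millions, average age). Values are invented.
X = [(60, 27), (55, 26), (65, 28), (15, 25), (20, 27), (10, 24), (25, 26)]
y = ["top3", "top3", "top3", "other", "other", "other", "other"]
model = fit_gaussian_nb(X, y)
print(predict_proba(model, (58, 27)))
```

    Swapping in a Random Forest or an SVM changes only the learner; the example set of team profiles and the "top3"/"other" label stay the same.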

     

    Cheers,

    Martin

  • f_nyst
    New Altair Community Member

    Thanks for your help!!

    One more question, hopefully the last one ;-)

    For these analyses I obviously need a target attribute. So far I thought it should be the attribute "ranking" (i.e. rank 1 to 20, as in the example above); however, it should predict the ranking after the last match. Should the target attribute therefore be "round 38" in the example of the English Premier League?
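    One possible framing, sketched under assumptions (the dictionary layout, the function name and the tiny five-team league are hypothetical): derive the label from round 38, e.g. whether a team finishes in the top 3, and take the attributes only from an earlier round k, so the final ranking never leaks into the inputs. Repeating this for different values of k shows how the predictability changes over the season.

```python
def make_examples(ranking_by_round, feature_round, final_round=38, top_k=3):
    """Build one example per team.

    Attribute: the team's rank after `feature_round`.
    Label (target): whether the team is in the top `top_k` after `final_round`.
    """
    final_top = set(ranking_by_round[final_round][:top_k])
    examples = []
    for rank, team in enumerate(ranking_by_round[feature_round], start=1):
        label = f"top{top_k}" if team in final_top else "other"
        examples.append({"team": team, "rank_at_round": rank, "label": label})
    return examples

# Hypothetical five-team league: team names listed in ranking order per round.
ranking = {
    1: ["A", "B", "C", "D", "E"],
    38: ["B", "D", "A", "C", "E"],
}
for row in make_examples(ranking, feature_round=1):
    print(row)
```

    In practice, the team profile attributes would be added as further columns next to `rank_at_round`.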

     

    Best regards,

     

    Frederique