Auto Dream11 Selector

by Pritish Jadhav, Mrunal Jadhav - Mon, 03 Dec 2018
Tags: #python #Linear Programming #Fantasy Sports #dream11 prediction #resource allocation

With all the buzz around the Indian Premier League (IPL), the FIFA World Cup and the Cricket World Cup 2019, fantasy sports websites like Dream11 are gaining traction. A fantasy sports portal lets sports fans like me be a part of the action. That said, it is very difficult to keep track of all the players and their performance across sports.


In this tutorial, I will try to automate the fantasy team selection process so that the expected points, and with them the chances of winning a fantasy league, are maximized.



Let's get right to it by loading and inspecting the player data for an upcoming match between Chennai Super Kings and Mumbai Indians.

In [1]:
## import python libraries

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "none"

# display and HTML live in IPython.display; importing them from
# IPython.core.display is deprecated
from IPython.display import display, HTML

import re
from ast import literal_eval

import pandas as pd
import numpy as np
import pulp

# notebook styling: hide the input prompts and use the full page width
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML("<style>.container { width:100% !important; }</style>"))

Data Description -

The input file for auto-selecting players has the following fields -

a. cost - Most fantasy websites (including Dream11) allocate a total budget that cannot be exceeded. This field is the price of a player, which counts against that budget.

b. last_5_matches_points - This field is a list of points accrued by a player in his last 5 matches.

c. player_category - This field gives the category of the player. In the case of cricket, the possible categories are wicket-keeper, batsman, all-rounder and bowler. For a sport like football, the possible categories are goalkeeper, defender, midfielder and striker. This field is important because every fantasy website restricts the number of players one can select from each of these categories.

d. player_name - This field gives the player's name as shown on the fantasy sports website.

e. team_name - This field gives the name of the player's team.
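
Because a CSV cell can only hold text, a list-valued field such as last_5_matches_points has to be parsed back into a Python list when the file is read. The snippet below is a minimal sketch of that step using `ast.literal_eval` from the standard library; `parse_points` is a hypothetical helper name, and the sample string simply mirrors one row of the data.

```python
from ast import literal_eval

def parse_points(cell_text):
    """Turn the text of a CSV cell such as "[25, 20, 15, 30, 18]" into a list."""
    # literal_eval only evaluates Python literals, never arbitrary code
    points = literal_eval(cell_text)
    if not isinstance(points, list):
        raise ValueError("expected a list of points, got %r" % (points,))
    return points

sample_cell = "[25, 20, 15, 30, 18]"
parsed = parse_points(sample_cell)
print(parsed, sum(parsed))
```

Passing a function like this (or `literal_eval` itself) as a converter to the CSV reader keeps the column usable as real lists rather than strings.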

In [4]:
raw_player_data = pd.read_csv('data/dream11_performance_data.csv', converters={"last_5_matches_points": literal_eval})
display(HTML('<font size=2>'+raw_player_data.head().to_html()+'</font>'))
   cost  last_5_matches_points  player_category  player_name   team_name
0   9.0  [25, 20, 15, 30, 18]   wicket_keeper    M.S. dhoni    CSK
1  10.0  [30, 10, 23, 22, 16]   batsman          Suresh Raina  CSK
2   8.5  [17, 13, 14, 27, 35]   wicket_keeper    kishan        MI
3  10.5  [15, 7, 9, 40, 3]      batsman          rohit_sharma  MI
4   9.5  [20, 18, 15, 22, 20]   batsman          lewis         MI

Algorithm Pseudo Code / Workflow -

  1. We will view the selection problem as an Integer Linear Programming (ILP) problem with constraints.

  2. The objective function of the ILP will be to maximize the expected points.

  3. The constraints will be defined as per the rules imposed by the fantasy website (in this case, Dream11).

  4. Once the ILP problem is defined, it can be solved easily using the PuLP library in Python.

For more mathematical details, I will be adding references at the end of the tutorial.
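
To make the workflow concrete: each player i gets a 0/1 decision x_i, the objective is the sum of points_i * x_i, and the constraints limit sums such as cost_i * x_i. The sketch below solves a tiny, entirely hypothetical instance by brute-force enumeration instead of an ILP solver, purely to show what "maximize points subject to constraints" means. The player pool, team size and budget are made-up numbers.

```python
from itertools import combinations

# Hypothetical toy pool: (name, expected_points, cost). Not real Dream11 data.
players = [("A", 30.0, 9.0), ("B", 25.0, 8.0), ("C", 20.0, 6.0), ("D", 10.0, 4.0)]
team_size = 2      # constraint: exactly two players selected
budget = 15.0      # constraint: total cost must not exceed the budget

best_team, best_points = None, float("-inf")
for team in combinations(players, team_size):   # every way to set two x_i to 1
    cost = sum(p[2] for p in team)
    points = sum(p[1] for p in team)
    if cost <= budget and points > best_points:  # feasible and better than before
        best_team, best_points = team, points

best_names = [p[0] for p in best_team]
print(best_names, best_points)
```

Enumeration only works for toy sizes. With 16 players and 11 slots plus category rules, an ILP solver reaches the optimum without listing every combination.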

As one can imagine, we cannot work directly with text data. Hence, we will use the "get_dummies" function in pandas to convert categorical data into one-hot encoded vectors.

For example, the single player_category column in the pandas dataframe will be split into as many columns as there are unique player categories. In this case, each player can be a wicket-keeper, batsman, all-rounder or bowler. As a result, the player category of each player will be represented by a 4-dimensional one-hot encoded vector.
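
As an illustration of what one-hot encoding does to the category column, here is a small pure-Python sketch. It is not how pandas implements it. `one_hot` is a hypothetical helper, and the alphabetical column order is an assumption made for the example.

```python
def one_hot(values):
    """Return (categories, rows); each row is a 0/1 vector with a single 1."""
    categories = sorted(set(values))  # fixed, alphabetical column order
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows

player_category = ["wicket_keeper", "batsman", "all_rounder", "bowler"]
columns, encoded = one_hot(player_category)

for value, vector in zip(player_category, encoded):
    print(value, vector)
```

Each category becomes its own 0/1 column, which is exactly the form needed later to count how many selected players fall into a category.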


In [5]:
def get_dummies(data, col_names=("player_category", "team_name")):
    # one-hot encode the given columns of `data` (not the global dataframe);
    # dtype=int keeps the 0/1 output on recent pandas versions
    dummies_data = pd.get_dummies(data, columns=list(col_names), dtype=int)
    return dummies_data

processed_player_data = get_dummies(raw_player_data)

display(HTML('<font size=2>'+processed_player_data.drop(['cost', 'last_5_matches_points'], axis = 1).head().to_html()+'</font>'))
   player_name   player_category_all_rounder  player_category_batsman  player_category_bowler  player_category_wicket_keeper  team_name_CSK  team_name_MI
0  M.S. dhoni    0                            0                        0                       1                              1              0
1  Suresh Raina  0                            1                        0                       0                              1              0
2  kishan        0                            0                        0                       1                              0              1
3  rohit_sharma  0                            1                        0                       0                              0              1
4  lewis         0                            1                        0                       0                              0              1

In this section, we will start crunching numbers so that a good objective function can be designed. One of the most important features that dictates whether a player should be included in a fantasy team is his recent performance. The column "last_5_matches_points" is a list of points accrued by a player in his last 5 matches.

Now, instead of taking a simple average of these points, we will use a weighted average where the weights decay with time, so that recent matches count for more than older ones.

This will ensure that the very recent form of a player is captured and leveraged by the selection algorithm. To better illustrate the importance of the weighted average, let's consider the following scenario.



Say the points for player 1 and player 2 in their last 5 matches are as follows (oldest match first) -

player 1 - [10, 20 , 30 , 40 , 50]
player 2 - [50, 40 , 30, 20 , 10]

The approach of computing simple averages will yield averaged scores as follows -
player 1 - 30
player 2 - 30

The above stats imply that both player 1 and player 2 accrue 30 points on average. However, it can clearly be seen that player 1 is growing in confidence and his performance is trending upward, whereas player 2's performance is trending downward. By taking a simple average, we lose an important insight.

Now, if we compute the averages using time-decayed weights, the averaged points are as follows -
player 1 - 33.93
player 2 - 26.07

As can be seen from the above numbers, even though both players have amassed the same number of points, the trend in performance is now captured, with player 1 getting a higher average than player 2. Such subtle differences will help us build a robust model.
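
The two averages can be reproduced with a few lines of standard-library Python. This is a sketch assuming exponentially decaying weights with a decay rate alpha = 0.20, where a match played k games ago (k = 1 for the most recent) gets weight exp(-alpha * k). `time_decayed_average` is an illustrative name.

```python
import math

def time_decayed_average(points, alpha=0.20):
    """Weighted mean where the most recent match (last item) weighs the most."""
    n = len(points)
    # oldest match gets exp(-alpha * n), most recent gets exp(-alpha * 1)
    weights = [math.exp(-alpha * (n - i)) for i in range(n)]
    return sum(w * p for w, p in zip(weights, points)) / sum(weights)

player_1 = [10, 20, 30, 40, 50]  # improving form
player_2 = [50, 40, 30, 20, 10]  # declining form

avg_1 = time_decayed_average(player_1)
avg_2 = time_decayed_average(player_2)
print(round(avg_1, 2), round(avg_2, 2))
```

A larger alpha makes the average react faster to recent matches, while alpha close to 0 brings it back to the simple average.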


So, let's compute the time-decayed weighted averages for the eligible players in the dataframe.

In [3]:
def compute_weighted_points(points_vector, alpha = 0.20):
    # exponentially decaying weights: the oldest match gets exp(-alpha * n),
    # the most recent match gets exp(-alpha)
    weights = np.exp(-alpha * np.arange(len(points_vector), 0, -1))
    exponential_weighted_average = np.average(np.array(points_vector), weights = weights)
    return exponential_weighted_average
In [39]:
processed_player_data['weighted_player_points'] = processed_player_data['last_5_matches_points'].apply(compute_weighted_points)
processed_player_data.reset_index(inplace = True)
display(processed_player_data[['player_name', 'last_5_matches_points', 'weighted_player_points']].head())
   player_name   last_5_matches_points  weighted_player_points
0  M.S. dhoni    [25, 20, 15, 30, 18]   21.457434
1  Suresh Raina  [30, 10, 23, 22, 16]   19.613900
2  kishan        [17, 13, 14, 27, 35]   23.303382
3  rohit_sharma  [15, 7, 9, 40, 3]      15.016017
4  lewis         [20, 18, 15, 22, 20]   19.193689

Now, let's define the constraints for selecting players as defined by Dream11. These constraints are to be honoured by the algorithm while it tries to maximize points.

For more information, check out the Dream11 FAQs.

In [41]:
max_players = 11
max_batsman = 5
max_allrounders = 3
max_bowlers = 5
max_keepers = 1
max_cost = 100
max_team1_players = 7
max_team2_players = 7
In [42]:
prob = pulp.LpProblem('Dreamteam', pulp.LpMaximize)
In [43]:
# define a 0/1 decision variable for each row in the input dataframe
# (1 = player is selected, 0 = player is left out)

decision_variables = []

for rownum, row in processed_player_data.iterrows():
    variable = pulp.LpVariable('x_{}'.format(rownum), lowBound = 0, upBound = 1, cat = 'Integer')
    decision_variables.append(variable)

print(decision_variables)
[x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_10, x_11, x_12, x_13, x_14, x_15]
In [44]:
# Create the objective function: total weighted points of the selected players
print(processed_player_data.columns)
total_points = pulp.lpSum(
    row['weighted_player_points'] * decision_variables[rownum]
    for rownum, row in processed_player_data.iterrows()
)
prob += total_points
Index([u'index', u'cost', u'last_5_matches_points', u'player_name',
       u'player_category_all_rounder', u'player_category_batsman',
       u'player_category_bowler', u'player_category_wicket_keeper',
       u'team_name_CSK', u'team_name_MI', u'weighted_player_points'],
      dtype='object')
In [45]:
# set constraints for player categories, squad size, budget and teams

def total_of(column = None):
    # sum of (column value * decision variable) over all players;
    # with no column, this is simply the number of selected players
    return pulp.lpSum(
        (row[column] if column else 1) * decision_variables[rownum]
        for rownum, row in processed_player_data.iterrows()
    )

total_keepers = total_of('player_category_wicket_keeper')
total_batsman = total_of('player_category_batsman')
total_allrounder = total_of('player_category_all_rounder')
total_bowler = total_of('player_category_bowler')
total_players = total_of()
total_cost = total_of('cost')
total_team1 = total_of('team_name_CSK')
total_team2 = total_of('team_name_MI')

prob += (total_keepers == max_keepers)
prob += (total_batsman <= max_batsman)
prob += (total_allrounder <= max_allrounders)
prob += (total_bowler <= max_bowlers)
prob += (total_players == max_players)
prob += (total_cost <= max_cost)
prob += (total_team1 <= max_team1_players)
prob += (total_team2 <= max_team2_players)



print(prob)
prob.writeLP('Dreamteam.lp')

optimization_result = prob.solve()
Dreamteam:
MAXIMIZE
21.4574342327*x_0 + 19.6138998634*x_1 + 18.507022734*x_10 + 16.1006207824*x_11 + 3.03595134142*x_12 + 30.6877000247*x_13 + 10.8823368063*x_14 + 18.6665650801*x_15 + 23.3033823875*x_2 + 15.0160173204*x_3 + 19.1936886525*x_4 + 35.6738860573*x_5 + 20.8026696164*x_6 + 20.8419816771*x_7 + 23.4099290007*x_8 + 12.0189162375*x_9 + 0.0
SUBJECT TO
_C1: x_0 + x_2 = 1

_C2: x_1 + x_3 + x_4 + x_5 + x_6 <= 5

_C3: x_10 + x_7 + x_8 + x_9 <= 3

_C4: x_11 + x_12 + x_13 + x_14 + x_15 <= 5

_C5: x_0 + x_1 + x_10 + x_11 + x_12 + x_13 + x_14 + x_15 + x_2 + x_3 + x_4
 + x_5 + x_6 + x_7 + x_8 + x_9 = 11

_C6: 9 x_0 + 10 x_1 + 9 x_10 + 9 x_11 + 8.5 x_12 + 8.5 x_13 + 8.5 x_14
 + 8 x_15 + 8.5 x_2 + 10.5 x_3 + 9.5 x_4 + 9 x_5 + 8.5 x_6 + 10.5 x_7 + 9 x_8
 + 9 x_9 <= 100

_C7: x_0 + x_1 + x_10 + x_14 + x_15 + x_5 + x_7 <= 7

_C8: x_11 + x_12 + x_13 + x_2 + x_3 + x_4 + x_6 + x_8 + x_9 <= 7

VARIABLES
0 <= x_0 <= 1 Integer
0 <= x_1 <= 1 Integer
0 <= x_10 <= 1 Integer
0 <= x_11 <= 1 Integer
0 <= x_12 <= 1 Integer
0 <= x_13 <= 1 Integer
0 <= x_14 <= 1 Integer
0 <= x_15 <= 1 Integer
0 <= x_2 <= 1 Integer
0 <= x_3 <= 1 Integer
0 <= x_4 <= 1 Integer
0 <= x_5 <= 1 Integer
0 <= x_6 <= 1 Integer
0 <= x_7 <= 1 Integer
0 <= x_8 <= 1 Integer
0 <= x_9 <= 1 Integer

In [48]:
# map each solver variable x_<n> back to its row number and solved value
variables = prob.variables()
df = pd.DataFrame({
    'index': [int(re.findall(r'(\d+)', v.name)[0]) for v in variables],
    'value': [v.varValue for v in variables],
})

df = df.sort_values(by = 'index')
result = pd.merge(processed_player_data, df, on = 'index')
result = result[result['value'] == 1].sort_values(by = 'weighted_player_points', ascending = False)
selected_cols_final = ['player_name', 'team_name_CSK', 'team_name_MI', 'weighted_player_points']
final_set_of_players_to_be_selected = result[selected_cols_final]

display(final_set_of_players_to_be_selected)

print("We can accrue an estimated points of %f"%(final_set_of_players_to_be_selected['weighted_player_points'].sum()))
    player_name   team_name_CSK  team_name_MI  weighted_player_points
5   rayadu        1              0             35.673886
13  markande      0              1             30.687700
8   hardik        0              1             23.409929
2   kishan        0              1             23.303382
7   watson        1              0             20.841982
6   surya         0              1             20.802670
1   Suresh Raina  1              0             19.613900
4   lewis         0              1             19.193689
15  chahar        1              0             18.666565
10  bravo         1              0             18.507023
11  bumrah        0              1             16.100621
We can accrue an estimated points of 246.801346
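
As a sanity check, the selected team can be verified against the limits by hand. The sketch below hard-codes the eleven selected players, with categories and costs read off the constraint printout (_C1 to _C8) and points taken from the result table. The tuple layout and variable names are only illustrative.

```python
from collections import Counter

# (name, category, team, cost, weighted_points) for the eleven selected players
selected = [
    ("rayadu", "batsman", "CSK", 9.0, 35.673886),
    ("markande", "bowler", "MI", 8.5, 30.687700),
    ("hardik", "all_rounder", "MI", 9.0, 23.409929),
    ("kishan", "wicket_keeper", "MI", 8.5, 23.303382),
    ("watson", "all_rounder", "CSK", 10.5, 20.841982),
    ("surya", "batsman", "MI", 8.5, 20.802670),
    ("Suresh Raina", "batsman", "CSK", 10.0, 19.613900),
    ("lewis", "batsman", "MI", 9.5, 19.193689),
    ("chahar", "bowler", "CSK", 8.0, 18.666565),
    ("bravo", "all_rounder", "CSK", 9.0, 18.507023),
    ("bumrah", "bowler", "MI", 9.0, 16.100621),
]

by_category = Counter(category for _, category, _, _, _ in selected)
by_team = Counter(team for _, _, team, _, _ in selected)
total_cost = sum(cost for _, _, _, cost, _ in selected)
total_points = sum(points for _, _, _, _, points in selected)

# one entry per rule, using the same limits as the optimization
checks = {
    "exactly 11 players": len(selected) == 11,
    "exactly 1 keeper": by_category["wicket_keeper"] == 1,
    "at most 5 batsmen": by_category["batsman"] <= 5,
    "at most 3 all-rounders": by_category["all_rounder"] <= 3,
    "at most 5 bowlers": by_category["bowler"] <= 5,
    "cost within 100": total_cost <= 100,
    "at most 7 per team": max(by_team.values()) <= 7,
}
for rule, ok in checks.items():
    print(rule, "OK" if ok else "VIOLATED")
print(total_cost, round(total_points, 4))
```

The team uses 99.5 of the 100 credits and the all-rounder limit is fully used, which suggests the budget and that category cap are the rules that actually bind here.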

There you go!! We have our Dream Team!!

All that needs to be done is to select the team in the app and start earning money!!

Before we wrap up this tutorial, I would like to highlight the features as well as the enhancement opportunities for the existing algorithm -

a. The existing algorithm is completely automated and outputs the team that maximises the estimated points under the Dream11 constraints.
b. In addition to that, it also conveys the estimated points that can be accrued through the selected team. Comparing this value with the points actually scored would help us check the accuracy of the system.
c. The algorithm is sensitive to player performance trends and adjusts accordingly.

Enhancements -
a. The input data needs to be stored in a database. Currently, I am relying on manual effort to fetch the required data.
b. The algorithm is not sensitive to injury news and other team updates. This is a significant miss, and we will have to rely on scraping sports websites and detecting such information through NLP. It is an open-ended question.
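
Detecting injury news is the hard part. Once a list of unavailable players exists, feeding it into the selector could be as simple as filtering the pool before the optimization is built. A minimal sketch under that assumption follows. The helper name, the placeholder players and the availability list are all hypothetical.

```python
def drop_unavailable(players, unavailable_names):
    """Return the players whose name is not on the unavailable list."""
    # compare names case-insensitively and ignore stray whitespace
    blocked = {name.strip().lower() for name in unavailable_names}
    return [p for p in players
            if p["player_name"].strip().lower() not in blocked]

# Placeholder pool; the unavailable list would come from whatever
# injury or team-news source ends up being used.
pool = [
    {"player_name": "player_a", "cost": 9.0},
    {"player_name": "player_b", "cost": 8.5},
    {"player_name": "player_c", "cost": 10.0},
]
available = drop_unavailable(pool, ["Player_B "])
available_names = [p["player_name"] for p in available]
print(available_names)
```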
