  • Publication
    Does the Estimation of the Propensity Score by Machine Learning Improve Matching Estimation? The Case of Germany's Programmes for Long Term Unemployed
    (North-Holland, 2020-08)
    Moczall, Andreas
    Wolff, Joachim
    Matching-type estimators using the propensity score are the major workhorse in active labour market policy evaluation. This work investigates whether machine learning algorithms for estimating the propensity score lead to more credible estimation of average treatment effects on the treated using a radius matching framework. Considering two popular methods, the results are ambiguous: We find that using LASSO-based logit models to estimate the propensity score delivers more credible results than conventional methods in small and medium-sized high-dimensional datasets. However, the usage of Random Forests to estimate the propensity score may lead to a deterioration of the performance in situations with a low treatment share. The application reveals a positive effect of the training programme on days in employment for long-term unemployed. While the choice of the “first stage” is highly relevant for settings with a low number of observations and few treated, machine learning and conventional estimation become more similar in larger samples and with higher treatment shares.
  • Publication
    Let's meet as usual: Do games played on non-frequent days differ? Evidence from top European soccer leagues
    Balancing the allocation of games in sports competitions is an important organizational task that can have serious financial consequences. In this paper, we examine data from 9,930 soccer games played in the top German, Spanish, French, and English soccer leagues between 2007/2008 and 2016/2017. Using a machine learning technique for variable selection and applying a semi-parametric analysis of radius matching on the propensity score, we find that in all four leagues attendance, measured as a share of stadium capacity, is lower in games played on non-frequent days than in games played on frequent days. In addition, we find that in all leagues except for the English Premier League, there is a significantly lower home advantage for the underdog teams on non-frequent days. Our findings suggest that the current schedule favors underdog teams with fewer home games on non-frequent days. Therefore, to increase the fairness of the competitions, it is necessary to adjust the allocation of the home games on non-frequent days in a way that eliminates any advantage driven by the schedule. These findings have implications for the stakeholders of the leagues, as well as for coaches and players.
  • Publication
    Predicting Match Outcomes in Football by an Ordered Forest Estimator
    Predicting the outcome of football (i.e. soccer) games based on past information is a non-standard predictive task because of the nature of the game outcome, as well as because of the importance of uncertainty (luck and unobservables). The game outcome consists of the scores of the two teams that are usually either collapsed into a goal-difference or further aggregated to reflect whether the game ended as a win for the home or away team, or as a draw. From a statistical perspective, such outcomes have bounded support and, thus, standard linear modelling can be expected to perform poorly. The large amount of uncertainty in the game outcomes due to just luck or due to game- or team-specific unobservables (e.g. hidden injuries of players, etc.) makes it imperative to use prediction methods that fully exploit the potential of the available information, as well as to uncover the uncertainty of a match outcome. The latter is also relevant when interest is not only in single games but also in a league table at the end of the season. Obviously, such league tables should capture the uncertainty for the single games accumulated over a season to be useful guides on what to expect. Recently, machine learning methods have shown their power in all sorts of prediction problems, in particular in situations where the relation of the variables capturing the information used to predict with the target of the prediction, i.e. here the outcome of the game, is non-linear. However, so far there has been only little development in gearing these methods explicitly towards the estimation of the probabilities of ordered outcomes, such as score differences and points, or just wins, draws, and losses. Lechner and Okasa (2019) propose adapting classical random forest estimation, which is known to have excellent predictive performance (e.g. Biau and Scornet (2016), Fernández-Delgado et al. (2014)) to the problem of predicting probabilities of ordered categorical outcomes, such as the win-draw-loss problem of a football game. In this chapter, we use their approach to predict game outcomes of the German Bundesliga 1 (BL1) based on more than ten years' data on game outcomes as well as extensive information about teams, their players, and their environment. These predictions are then used to obtain the final season rankings in a way that reflects and shows the magnitude of the inherent uncertainty of football games.
  • Publication