Abstract

Data and analytics have been part of the sports industry since as early as the 1870s, when the first boxscore in baseball was recorded. However, only recently have advanced data mining and machine learning techniques been used to facilitate the operations of sports franchises. While part of the reason is the ability to collect more fine-grained data, an equally important factor in this turn to analytics is the huge success and competitive advantage that early adopters enjoyed (popularized by the best-seller "Moneyball", which described the success that the Oakland Athletics had with analytics). Draft selection, game-day decision making and player evaluation are just a few of the applications where sports analytics plays a crucial role today. Apart from the sports clubs, other stakeholders in the industry (e.g., the leagues' offices and the media) also invest in analytics. The leagues increasingly rely on data to decide on potential rule changes. For instance, the most recent rule change in the NFL, i.e., the kickoff touchback, was the result of thorough data analysis of concussion instances. In this tutorial we will review the literature on data mining and machine learning techniques for sports analytics. We will introduce the audience to the design and methodologies behind advanced metrics such as the adjusted plus/minus for evaluating basketball players, spatial metrics for evaluating the ability of a player to spread the defense in basketball, and the Player Efficiency Rating (PER). We will also go in depth into advanced data mining methods, and in particular tensor mining, that can analyze heterogeneous data similar to those available in today's sports world.


Current Practices and State-of-the-Art

In this part of the tutorial we will present the status of analytics in the operations of sports franchises, ranging from business-side operations to game strategy and player development and evaluation. We will focus in particular on the different types of data that can be collected today. These range from biometric sensor data to tracking data captured by wearables and/or computer vision technologies. We will also briefly describe novel applications of data mining in sports that have provided a significant competitive advantage to the adopting entities.

Presentation slides: (ppt)


Advanced Statistics: Beyond the Boxscore

In this part of the tutorial we will describe in detail statistical frameworks that are fundamental in the field of sports analytics. In particular, we will present the Bradley-Terry regression model, used for probabilistically predicting the outcome of a comparison, as well as ranking systems such as the Elo rating. Furthermore, we will elaborate on the fundamentals of the statistical bootstrap, which is predominantly used when simulating sports matchups. Finally, we will discuss the design and computation of advanced statistics and metrics that have revolutionized the way teams and players are evaluated. In particular, we will describe in detail regression models such as the adjusted plus/minus for basketball. We will also discuss the Pythagorean theorem in sports, used to predict outcomes of games/seasons, as well as the famous study of the "hot hand".
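As a rough illustration of three of the models named above, the sketch below implements the Bradley-Terry win probability, a single Elo rating update, and the Pythagorean win expectation. It is a minimal sketch and not the tutorial's own material: the function names are hypothetical, and the constants (an Elo K-factor of 20, a rating scale of 400, a Pythagorean exponent of 2) are commonly used defaults assumed here for illustration; in practice they are tuned per sport.

```python
def bradley_terry_prob(strength_i: float, strength_j: float) -> float:
    """Probability that competitor i beats j, given positive strengths."""
    return strength_i / (strength_i + strength_j)


def elo_expected(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B (logistic curve in the rating gap)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return updated (rating_a, rating_b) after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k (assumed 20 here) controls how fast ratings react to results.
    """
    delta = k * (score_a - elo_expected(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def pythagorean_expectation(points_for: float, points_against: float,
                            exponent: float = 2.0) -> float:
    """Expected winning fraction from points scored and allowed.

    The exponent (assumed 2 here) is sport-specific and usually fitted to data.
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


if __name__ == "__main__":
    # Two equally rated teams; team A wins the game.
    print(elo_update(1500.0, 1500.0, 1.0))
    # A team that scored 800 points and allowed 700 over a season.
    print(pythagorean_expectation(800, 700))
```

Note that the Elo expected score has the same form as the Bradley-Terry probability when strengths are taken as 10 raised to the rating divided by the scale, which is why the two are often presented together.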

Presentation slides: (ppt)


Network Analysis for Sports

In this part of the tutorial we will discuss how network mining tools and methods have been used in a number of different sports. The main type of interaction analyzed is pass exchanges between teammates (e.g., in basketball and soccer). Passing networks can facilitate player and team strategy evaluation. However, they can also help quantify intangibles in sports such as team coherence. We will also discuss what other network structures can be analyzed, as well as the relation between Braess' paradox and sports networks, which can help explain the "Ewing Theory".
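To make the notion of a passing network concrete, the sketch below builds a weighted directed graph from a list of (passer, receiver) events and computes each player's out-strength and in-strength (passes made and received). This is a minimal sketch under stated assumptions: the event format, the function names and the player labels "A", "B", "C" are hypothetical, and the strength counts are only one of the simplest summaries of such a network.

```python
from collections import Counter, defaultdict


def build_passing_network(passes):
    """Map each directed edge (passer, receiver) to its number of passes."""
    return Counter(passes)


def player_strengths(network):
    """Return (out_strength, in_strength) dicts keyed by player."""
    out_strength = defaultdict(int)
    in_strength = defaultdict(int)
    for (passer, receiver), weight in network.items():
        out_strength[passer] += weight
        in_strength[receiver] += weight
    return dict(out_strength), dict(in_strength)


if __name__ == "__main__":
    # Hypothetical pass events for three players.
    events = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A"), ("B", "A")]
    network = build_passing_network(events)
    made, received = player_strengths(network)
    print(network)
    print(made, received)
```

The same edge-weight dictionary can be handed to a graph library to compute centrality or clustering measures, which is where team-level quantities such as coherence would typically be derived.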

Presentation slides: (ppt)


Spatio-temporal Analysis of Sports Trajectories

One of the most valuable types of sports data made available during the past few years is spatio-temporal trajectories, i.e., detailed location information of players through the course of a game. This type of data has enabled a paradigm shift in sports analytics. In this part of the tutorial we will start by introducing the machine learning literature that facilitates automated tracking of players through computer vision, as well as the application of state-of-the-art frameworks (e.g., deep learning) to spatio-temporal data, which has made its way into sports analytics. We will then present the new possibilities opened by these data as well as the research in data mining and machine learning that tackles the associated challenges. For example, we will introduce spatial methods that quantify the spatial spreading of a defense in basketball as well as models for court realty based on the usage of space. We will also present how ideas from complex systems, and in particular fractal dimensionality, can be used to analyze spatial sports data.
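One simple way to quantify how spread out a defense is at a given instant is the area of the convex hull of the defenders' positions. The sketch below computes it with Andrew's monotone chain algorithm and the shoelace formula. This is an illustrative proxy assumed here, not necessarily the metric covered in the tutorial; the function names and the coordinates are hypothetical.

```python
def convex_hull(points):
    """Convex hull of 2D points, counter-clockwise (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Z-component of (a - o) x (b - o); positive means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def defensive_spread(positions):
    """Convex hull area of defender positions at one time step."""
    return polygon_area(convex_hull(positions))


if __name__ == "__main__":
    # Hypothetical (x, y) court coordinates of five defenders.
    defenders = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 1.0)]
    print(defensive_spread(defenders))
```

Evaluating this per frame of tracking data gives a time series of spread values, which could then be compared across lineups or conditioned on which offensive players are on the court.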

Presentation slides: (ppt)