Difference between revisions of "Sports analytics"

From SI410
(sports gambling)
(draft)
Revision as of 20:05, 25 January 2023

Sports analytics are a collection of statistics or biometric data that can provide a team or individual a competitive advantage. Through the collection, refinement, and analysis of data, coaches and other staff members are able to inform athletes about their performance in order to assist decision making both during and prior to sporting events. The term "sports analytics" was popularized by the 2011 film, Moneyball, in which Oakland Athletics General Manager Billy Beane (played by Brad Pitt) relies heavily on the use of player analytics to build a competitive MLB team on a limited budget.

There are two main types of sports analytics: on-field analytics and off-field analytics. On-field analytics involves tracking key on-field metrics that may influence an athlete's methods and in-game strategy. It also involves tracking an athlete's biometric data and vitals to inform their training or performance levels. Off-field analytics deals with the business side of sports. It monitors key off-field metrics like ticket sales, merchandise sales, and fan engagement. Essentially, it provides shareholders with information that could lead to higher growth and profits.

Sport-specific analytics

Major League Baseball (MLB)

Sports analytics in baseball, also known as sabermetrics, is the application of statistical analysis to baseball in order to measure in-game activity. The term is derived from the acronym SABR, which stands for the Society for American Baseball Research, founded in 1971. The field was popularized by American baseball writer, historian, and statistician Bill James in the 1980s and has since been used by many Major League Baseball teams to assist in decision making. Sabermetrics can be used to measure a player's performance, a team's performance, and even the performance of individual pitches. It can also be used to make predictions about future performance and to identify undervalued players. Some common statistics that have become vital to the game include:

  • Batting average measures a player's batting performance by dividing the number of hits by the number of at-bats. As one of the most commonly discussed baseball statistics, it summarizes how often a batter gets a hit; split by pitch type, it can also show a player's tendencies when batting against different types of pitches. Batting average is expressed as a decimal to three decimal places. A player with a batting average of .300 is commonly said to be "batting three-hundred". Batting averages can be carried beyond the .001 measurement when more precision is needed. In this context, .001 is considered a "point", such that a .235 batter is 5 points higher than a .230 batter. A high batting average is considered an indicator of a good batter.
  • On-base percentage (OBP) is the percentage of plate appearances in which a player reaches base. Its significance as an offensive statistic is vital, as it identifies how often a batter can avoid being put out at the plate. It takes into consideration whether the batter gets a hit, walks, or is hit by a pitch.
  • Slugging average (SLG) is a calculation that showcases the number of bases a player earns per at-bat based on their hits. The higher the slugging average, the more likely the batter is to hit for extra bases (i.e., a double, triple, or home run). Batters now watch film and study the tendencies of pitchers in an attempt to increase their slugging average.
  • Walks plus Hits per Inning Pitched (WHIP) is a metric that measures how successful a pitcher is based on how many baserunners are allowed on both hits and walks. It also measures a pitcher's efficiency; a lower WHIP is better. Pitchers also watch film and study batters to help determine the type and location of the pitch to lower their WHIP.
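
The four statistics above reduce to simple ratios, which can be sketched in Python as follows. This is only an illustrative sketch: the class name BattingLine, its field names, and the function names are hypothetical, and the OBP denominator follows the standard definition (at-bats, walks, hit-by-pitches, and sacrifice flies), which is slightly more detailed than the description above.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season totals for one batter (field names are illustrative)."""
    at_bats: int
    singles: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.home_runs


def batting_average(b: BattingLine) -> float:
    # Hits divided by at-bats.
    return b.hits / b.at_bats


def on_base_percentage(b: BattingLine) -> float:
    # Times on base (hit, walk, hit by pitch) over the standard OBP denominator.
    reached = b.hits + b.walks + b.hit_by_pitch
    chances = b.at_bats + b.walks + b.hit_by_pitch + b.sacrifice_flies
    return reached / chances


def slugging_average(b: BattingLine) -> float:
    # Total bases per at-bat: a single counts 1, a double 2, a triple 3, a home run 4.
    total_bases = b.singles + 2 * b.doubles + 3 * b.triples + 4 * b.home_runs
    return total_bases / b.at_bats


def whip(walks_allowed: int, hits_allowed: int, innings_pitched: float) -> float:
    # Baserunners allowed via walks and hits, per inning pitched (lower is better).
    return (walks_allowed + hits_allowed) / innings_pitched


if __name__ == "__main__":
    # Made-up season line, not real player data.
    line = BattingLine(500, 100, 30, 5, 15, 50, 5, 5)
    print(f"AVG {batting_average(line):.3f}")
    print(f"OBP {on_base_percentage(line):.3f}")
    print(f"SLG {slugging_average(line):.3f}")
    print(f"WHIP {whip(50, 150, 200.0):.2f}")
```

Batting average, OBP, and SLG are conventionally printed to three decimal places, which is why the example formats them that way.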

National Basketball Association (NBA)

The field of basketball analytics has seen a large surge in popularity in the last decade, with many teams in the NBA using advanced statistical methods to analyze player rosters, team shot selection, and offensive/defensive performance. The use of analytics in basketball is based on the idea that traditional basketball statistics, such as points scored, assists, rebounds, and turnovers, do not fully capture a player's or team's performance. Popular metrics used by many teams include:

  • Player Efficiency Rating (PER) is a player metric developed by ESPN.com columnist John Hollinger. The PER sums up all a player's positive accomplishments, subtracts the negative accomplishments, and returns a per-minute rating of a player's performance.
  • Win Shares is used to estimate the number of wins contributed by a player.
  • Offensive and Defensive Rating are team-based metrics that rate how effective a team is on the offensive and defensive side. Offensive Rating is the number of points scored per 100 possessions and Defensive Rating is the number of points allowed per 100 possessions.
  • Box Plus/Minus (BPM) is a box-score-based estimate of a player's point differential per 100 possessions while on the court, compared to a league-average player.
  • True Shooting Percentage (TS%) is a measure of a player's shooting efficiency that takes into account field goals, three-point field goals, and free throws.
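
As a rough sketch, two of the metrics above can be computed directly from box-score totals. The function names are hypothetical, the possession count is assumed to be supplied by the caller (estimating possessions is a separate problem not covered here), and the 0.44 free-throw weight is the commonly used convention for True Shooting Percentage rather than something stated in this article.

```python
def rating_per_100(points: float, possessions: float) -> float:
    """Points per 100 possessions.

    Pass points scored for Offensive Rating, or points allowed for
    Defensive Rating.
    """
    return 100.0 * points / possessions


def true_shooting_pct(points: float, fga: float, fta: float) -> float:
    """True Shooting Percentage as a fraction between 0 and 1.

    fga = field goal attempts (two- and three-pointers), fta = free throw
    attempts. The 0.44 weight approximates how many free throw attempts
    use up a shooting possession.
    """
    return points / (2.0 * (fga + 0.44 * fta))


if __name__ == "__main__":
    # Made-up totals, not real game data.
    print(rating_per_100(points=110, possessions=100))
    print(round(true_shooting_pct(points=25, fga=15, fta=10), 3))
```

Because three-pointers and free throws are already reflected in the points total, TS% rewards efficient scoring regardless of how the points were produced.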

National Football League (NFL)

Sports analytics in the NFL is thought to have first appeared on the fan-made website Football Outsiders in 2003. The site pioneered American football's first comprehensive advanced metric, Defense-adjusted Value Over Average (DVOA), which compares the success of a player on each play to the league average. Variables including down, distance, location on field, current score differential, quarter, and strength of opponent all factor into DVOA. A few years later, Pro Football Focus launched a statistical database and featured a new player grading system and the following statistics:

  • Expected Points Added (EPA)
  • Win Probability Added (WPA)

National Hockey League (NHL)

The NHL has recorded game statistics since its inception, yet it is relatively new to analytics-based decision making. In 2014, the Toronto Maple Leafs hired Kyle Dubas as assistant general manager, the first member of management with a major analytical background. The three most commonly used basic statistics in the NHL include:

  • Corsi, also known as shot attempts, is the sum of shots on goal, missed shots, and blocked shots. It was named after coach Jim Corsi, but was actually created by a blogger who developed statistics to better measure the workload of a goaltender during a game. Today, Corsi For percentage (CF%) approximates the share of time a team or player possesses the puck. Most players have a CF% between 40% and 60%. Players or teams that have a CF% over 55% are considered by many to be elite.
  • Fenwick, also known as unblocked shot attempts, is a variation of Corsi that only counts shots on goal and missed shots; blocked shots, either for or against, are not included. Fenwick helps a team or player judge performances that use shot blocking in their game plan.
  • xG (expected goals) is a model that gives each shot attempt a value based on its type and shot location. It takes into account whether shots are coming off rebounds or rush chances. This metric compensates for some of the problems with Corsi, where every shot has equal value. The more high-quality shots a player attempts, the more likely they are to score.
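
The shot-attempt counts above can be sketched in a few lines of Python. The class and function names are hypothetical, and the "for percentage" is assumed to follow the usual convention of attempts for divided by attempts for plus attempts against.

```python
from dataclasses import dataclass


@dataclass
class ShotAttempts:
    """Shot attempts taken by one side (field names are illustrative)."""
    on_goal: int
    missed: int
    blocked: int

    def corsi(self) -> int:
        # All shot attempts: on goal, missed, and blocked.
        return self.on_goal + self.missed + self.blocked

    def fenwick(self) -> int:
        # Unblocked shot attempts only.
        return self.on_goal + self.missed


def for_percentage(attempts_for: int, attempts_against: int) -> float:
    # Share of all attempts taken by "our" side, as a percentage.
    return 100.0 * attempts_for / (attempts_for + attempts_against)


if __name__ == "__main__":
    # Made-up totals, not real game data.
    team = ShotAttempts(on_goal=30, missed=12, blocked=18)
    opponent = ShotAttempts(on_goal=25, missed=10, blocked=5)
    print("CF%", for_percentage(team.corsi(), opponent.corsi()))
    print("FF%", round(for_percentage(team.fenwick(), opponent.fenwick()), 1))
```

The same for_percentage helper serves both CF% and the Fenwick equivalent, since the two metrics differ only in which attempts are counted.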

Ethics

Data Privacy

Sports data is considered an intellectual property asset, and major sports organizations across the world continue to invest in data gathering and analysis. Leagues and teams have claimed ownership of sports data, with the business plan of selling their official data to data analytics companies and oddsmakers, or charging integrity or data rights fees to the gaming industry. In recent years, more intimate data like blood pressure, sugar levels, and sleep patterns has been collected, and ethical questions such as "who does the data belong to?" and "how critical is privacy?" are starting to surface. These questions do not currently seem to trouble investors, as more than 3,000 deals involving companies that deal with data in sports were signed between 2014 and 2019.

Data Validity

The reliability of sport tracking data is crucial, as it greatly impacts performance decisions. Incorrect readings or analysis can lead to over- or underestimation of performance capabilities, and harmful decisions may follow. For example, an athlete might push themselves mentally and physically to an extreme that is detrimental to their performance. Analytical models based on algorithms are also vulnerable to algorithm bias, where systematic errors may create "unfair" analysis for certain groups of people. The analysis of sport biometric data presents a unique challenge to algorithms: there is an overload of data that requires interpretation but an undersupply of historical, validated data to develop a valid algorithm.

Fair Play

With the advent of sports analytics and advancements in modern technology, players and teams have access to an unprecedented number of tools and amount of information to gain an advantage against their opponents. For example, many NBA teams now use high-quality motion tracking cameras positioned near the backboard that trace the ball as it enters the basket and note the shooter's position. The ball's arc, alignment, and depth are also tracked, which provides a player with far more detail about their shooting than a simple make-or-miss notation could ever tell. Players and teams who lack access to these resources or technology may fall short of the competition, and certain sports may fail to uphold a level playing field for all athletes. Some argue that

Gambling

Sports analytics have also had a significant impact on Online Sports Betting, as bettors now have access to more information to aid decision making. New avenues of gambling, like parlays and fantasy leagues, have led to the rise of new analytical tools. For example, companies and webpages can now provide fans with up-to-the-minute information for their betting needs.