New MOOC specialization offers glimpse into modern, data-driven sports world

Laurel Thomas, University of Michigan News

“Adapt or die.”

The line Brad Pitt delivers in the 2011 movie “Moneyball” is perhaps a tad dramatic, but the sentiment captures the turning point a little more than a decade ago, when many in the sports world began to embrace a more sophisticated, scientific, data-driven way of doing business.

From player recruitment and athlete training to fan marketing and amateur gambling, managers of the most successful franchises and those who follow their teams have increasingly come to rely on data to make decisions about our favorite sports.

An interdisciplinary group of University of Michigan faculty from the School of Kinesiology and the School of Information has come together to create the Sports Performance Analytics Specialization, a series of massive open online courses (MOOCs) that share the science behind sports analytics.

Working with the Center for Academic Innovation, the team will launch the first three of the five courses on May 17. These initial courses, offered on the Coursera platform, introduce sports analytics, take a closer look at the Moneyball sabermetrics approach, and show how to use various prediction models with sports data. They also include a discussion of responsible sports gambling.

The fourth course, set to launch in June, will focus on the growing use of wearable technology in sports performance, from consumer-level smart devices such as Fitbits and Apple Watches to the more sophisticated equipment used in sports practice and play. The fifth course, scheduled for July, covers machine learning: students will program statistical models that learn from historical data, such as match outcomes and sensor readings, in order to predict future results.


The faculty say the specialization fits well with the School of Information’s recent announcement of a gift from Activision Blizzard CEO Robert “Bobby” Kotick to establish an esports program that includes hiring an endowed chair and developing a minor by 2022.

In the first three courses, Stefan Szymanski, the Stephen J. Galetti Collegiate Professor of Sport Management at the U-M School of Kinesiology, and colleagues Wenche Wang and Youngho Park walk participants through the science of building and programming the algorithms and statistical models that turn sports data into useful insights. This includes the now famous Moneyball player analysis and an exercise that tests the validity of the “hot hand,” the basketball notion that a player who has been shooting well will keep repeating that success. The models draw on four sports: baseball, basketball, soccer and cricket. Learners are then encouraged to create their own statistical analyses.
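To give a sense of what testing the hot hand involves, here is a minimal Python sketch, not taken from the course materials, that compares a shooter’s make rate right after a made shot with the rate right after a miss. The helper names and the 1/0 shot sequence are hypothetical. A real analysis would use far more shots and regression models that account for other factors, and simple comparisons like this one can be misleading for short sequences.

```python
from typing import List, Tuple


def mean_or_nan(values: List[int]) -> float:
    """Average of a list of 1/0 outcomes, or NaN if the list is empty."""
    return sum(values) / len(values) if values else float("nan")


def conditional_make_rates(shots: List[int]) -> Tuple[float, float]:
    """Hypothetical helper: shots is a sequence of 1 (make) and 0 (miss).

    Returns (make rate after a make, make rate after a miss).
    """
    pairs = list(zip(shots, shots[1:]))  # (previous shot, current shot)
    after_make = [cur for prev, cur in pairs if prev == 1]
    after_miss = [cur for prev, cur in pairs if prev == 0]
    return mean_or_nan(after_make), mean_or_nan(after_miss)


# Made-up shot sequence, for illustration only.
shots = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
rate_after_make, rate_after_miss = conditional_make_rates(shots)
print(f"after a make: {rate_after_make:.2f}, after a miss: {rate_after_miss:.2f}")
# prints "after a make: 0.50, after a miss: 0.67"
```

If the hot hand were real, the first rate would tend to be noticeably higher than the second across many shooters and games; in this tiny made-up sample it is actually lower.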

“We show people how by modeling you could produce predictions that are pretty much in line with bookmakers. What that means is if you bet with these predictions, you win some and you lose some,” Szymanski said. “You won’t get rich. It will enable you to make sports gambling more interesting and better informed, but you should expect it’s a consumption activity. It’s not an investment. It’s a way of having fun.”
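Comparing a model with the bookmakers usually starts by turning betting odds into probabilities. The Python sketch below is an illustration rather than the course’s own code: it assumes decimal odds and removes the bookmaker’s margin by simple proportional scaling, which is only one of several possible methods. The odds are made-up numbers for a hypothetical soccer match.

```python
from typing import Dict


def implied_probabilities(decimal_odds: Dict[str, float]) -> Dict[str, float]:
    """Convert decimal odds to probabilities that sum to 1.

    1 / odds gives the raw implied probability of each outcome. These add up
    to more than 1 because of the bookmaker's margin (the overround), so each
    one is divided by their total (proportional normalization, an assumption).
    """
    raw = {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}
    overround = sum(raw.values())
    return {outcome: p / overround for outcome, p in raw.items()}


# Hypothetical odds for one soccer match, not real market data.
odds = {"home win": 2.0, "tie": 3.5, "away win": 4.0}
probs = implied_probabilities(odds)
for outcome, p in probs.items():
    print(f"{outcome}: {p:.3f}")
# home win: 0.483, tie: 0.276, away win: 0.241
```

A model whose probabilities for many matches land close to figures like these is making predictions roughly in line with the market, which is why, as Szymanski says, betting on them alone is unlikely to make anyone rich.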

The faculty say the five-course specialization offers something for many audiences. It’s an introduction to the field for students who might be thinking about working in sports analytics. The series assumes at least basic experience with Python, the programming language used for the modeling. Szymanski notes, however, that learners who work in R, a language used by many in the sports analytics field, can find guidance on adapting the exercises to it.

Another broad audience is sports fans who want to apply science to their March Madness brackets, fantasy football leagues, FanDuel betting, and the like.

“There are two groups of sports fans,” Szymanski said. “There’s the emotional, nostalgic, traditional fan who believes in the mystery of the whole thing. They just want to see the stars perform and that’s enough for them. So probably not them.

“Then there are the other people who say there’s something behind this; there’s a reason behind why these things happen. This isn’t all just random. There must be some explanation. And there must be some patterns that emerge, some predictability in this. And those are the people who are potentially interested in the data analysis.”

What sets the U-M courses apart from others is that the faculty teach the tools through numerous real examples, says Wang, assistant professor of sport management. She covers basic data preparation tools in Python, summary and descriptive analyses, and regression analysis, using the hot hand as one illustration.

“Unlike traditional statistics and data analysis courses, we use actual sports data and discuss various issues that one may come across when working with real-world data,” she said. “While there are some blog posts and forums by interested fans and others that provide some analytic examples using actual sports performance data, our course teaches learners more rigorous statistical and econometric tools beyond basic descriptive [analyses].”

Park, a lecturer in kinesiology, teaches how to apply the ordered logistic regression model.

“This model is particularly useful for predicting the outcome of a sporting contest as to whether my favorite team wins/loses/ties the match,” he said. “I want the learners to realize that the powerful forecasting model doesn’t necessarily have to be the most complex one. With a reliable/valid independent variable, we could fit the realistic and practical forecasting model to make accurate predictions.”
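The Python sketch below, a minimal illustration rather than material from the course, shows how a three-outcome ordered logit turns a single independent variable into loss, tie and win probabilities. It uses the common parameterization in which the probability of an outcome at or below category j is the logistic function of (cutpoint j minus the coefficient times the predictor). The predictor (a rating gap between two teams), the coefficient and the cutpoints are all hypothetical values, not fitted estimates; in practice they would be estimated from historical match data.

```python
import math
from typing import Dict


def logistic(z: float) -> float:
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(x: float, beta: float,
                        cut_loss: float, cut_tie: float) -> Dict[str, float]:
    """Outcome probabilities for a three-category ordered logit (loss < tie < win).

    Uses P(Y <= j) = logistic(cut_j - beta * x). The parameter values passed
    in below are hypothetical, not estimates from real data.
    """
    if cut_loss >= cut_tie:
        raise ValueError("cutpoints must be strictly increasing")
    p_at_most_loss = logistic(cut_loss - beta * x)  # P(loss)
    p_at_most_tie = logistic(cut_tie - beta * x)    # P(loss or tie)
    return {
        "loss": p_at_most_loss,
        "tie": p_at_most_tie - p_at_most_loss,
        "win": 1.0 - p_at_most_tie,
    }


# Hypothetical predictor: rating gap in favor of "my team" (0 = evenly matched).
for gap in (0.0, 1.0):
    probs = ordered_logit_probs(gap, beta=1.0, cut_loss=-0.5, cut_tie=0.5)
    print(gap, {k: round(v, 3) for k, v in probs.items()})
```

With these made-up values, evenly matched teams get equal win and loss probabilities, and a positive rating gap shifts probability toward a win. The whole model rests on one predictor, echoing Park’s point that a useful forecasting model need not be complex.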

The wearable technology course will focus on what the devices actually measure and how they can be used to gauge athlete stress and recovery, says Peter Bodary, clinical assistant professor of applied exercise science and movement science at the School of Kinesiology.

His team will spend time looking “under the hood” of the devices, and the course will explore actual datasets from college athletic teams, including minute-by-minute in-game data as well as daily metrics spanning entire seasons.

“On the training and recovery side of sports and sport performance, the wearables provide insights about the athletes and their training stress,” Bodary said. “The data generated can help coaches and trainers prevent athletes from under- and overtraining. Ultimately, there is a great desire to be able to use wearable data to help reduce player injuries by identifying, and then reducing, the factors that lead to a higher risk of injury.”

One way fans can engage with wearables is by using athlete data to predict outcomes, says Christopher Brooks, assistant professor of information at the School of Information, who will lead the final MOOC on machine learning.

For example, he says, one exercise will involve predicting boxing punches from the way the arms move or the spine is positioned. Another activity will show how to predict player or league success using modern machine learning methods.

“This is really state of the art and opens us up to the next decade, two decades, of sports analytics work,” Brooks said. “If they get really bitten by the bug and they want to start practicing some of this stuff—and whether they are practicing it to win the office pool, to win on DraftKings or to make sense of their own sensor data—these are the kinds of techniques they can use to do that, and this course will give them an introduction so they can start to think about what they would do next.”

He said those who want to work in the industry, whether for a team, as a pundit or for a news organization, will come away with a good sense of the skills needed, though they might also require advanced training, such as the School of Information’s online Master of Applied Data Science.

Sports Performance Analytics Specialization

The five courses

  • Foundations of Sports Analytics, begins May 17
  • Moneyball and Beyond, begins May 17
  • Prediction Models with Sports Data, begins May 17
  • Wearable Technologies and Sports Analytics, begins in June, date TBD
  • Introduction to Machine Learning in Sports Analytics, begins in July, date TBD