NHL Line Analysis

Investigating the relationship between top line usage and team performance

Data Analytics, Sports Analytics, Python (pandas, BS4, seaborn, sklearn), API

Project Overview

In this project, I investigated how heavily each NHL team relied on its "top line" during the 2020-2021 and 2021-2022 seasons, and what that reliance suggests about roster construction and coaching strategy.

Key Questions Explored

  • How does top line usage correlate with team performance?
  • Do successful teams have more balanced line distributions?
  • How do different coaching strategies manifest in line usage patterns?
  • Can we identify optimal line usage strategies based on roster composition?

A typical NHL team uses four "lines" of forwards, rotating them frequently throughout the game. Each line is a set of three players, and the "top" or "1st" line generally receives the largest share of playing time in a game, with each subsequent line receiving less.

Comparing the percentage of total playing time that each team gives to its top line with overall team performance highlights the tradeoff between leaning on star players and having a "deep" roster in which the lower lines share more of the playing time.
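
As a rough illustration of the metric described above, the sketch below computes the share of forward ice time that goes to each team's busiest line. The column names (`team`, `line`, `toi_seconds`), the team codes, the `top_line_share` helper, and every number are placeholders rather than values from the project's dataset. Treating the line with the most ice time as the "top" line is also an assumption.

```python
import pandas as pd

# Hypothetical per-line ice time totals in seconds (illustrative values only).
line_toi = pd.DataFrame({
    "team": ["AAA"] * 4 + ["BBB"] * 4,
    "line": [1, 2, 3, 4] * 2,
    "toi_seconds": [1200, 1000, 800, 600, 1000, 950, 900, 750],
})


def top_line_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of each team's forward ice time played by its busiest line."""
    totals = df.groupby("team")["toi_seconds"].sum()
    # Assumption: the "top" line is whichever line logged the most ice time.
    top = df.groupby("team")["toi_seconds"].max()
    return (top / totals).rename("top_line_share")


shares = top_line_share(line_toi)
print(shares)
```

In this toy data both teams log the same total ice time, but team AAA concentrates more of it in one line, so it ends up with the higher share.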

All data used for this project was scraped from the public NHL API, with analysis performed using Python and associated libraries including pandas, matplotlib, and seaborn.

Key Findings & Visualizations

Line Usage Distribution

Analysis of how teams distribute ice time across their forward lines.

Performance Correlation

Relationship between top line usage and overall team success.

Situational Deployment

How line usage changes based on game situations and score.

Coaching Strategy Comparison

Different approaches to line management across coaching styles.

Methodology & Data Sources

This analysis was conducted using Python, leveraging pandas for data manipulation, matplotlib and seaborn for visualization, and scikit-learn for statistical modeling. The dataset was compiled by accessing the NHL's public API endpoints.

Data Collection

  • Game events and shift data were collected for every regular season game each team played (56 per team in the shortened 2020-2021 season, 82 per team in 2021-2022).
  • Line combinations were identified using shift overlaps and on-ice events.
  • Time on ice was calculated from precise shift start and end timestamps.
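
The last two steps can be sketched in plain Python. This is a minimal illustration, not the project's actual pipeline: it assumes shift records have already been reduced to `(player, start_second, end_second)` tuples on a single game clock, and the player labels and the helper names `time_on_ice` and `shared_time_by_trio` are hypothetical. A line is approximated here as any trio of forwards with shared ice time, measured as the three-way overlap of their shifts.

```python
from collections import defaultdict
from itertools import combinations, product

# Hypothetical shift records: (player, start_second, end_second).
shifts = [
    ("A", 0, 45), ("B", 0, 45), ("C", 5, 45),
    ("D", 45, 90), ("E", 45, 90), ("F", 45, 85),
    ("A", 90, 130), ("B", 90, 130), ("C", 90, 130),
]


def time_on_ice(shifts):
    """Total seconds on ice per player, summed over all of their shifts."""
    toi = defaultdict(int)
    for player, start, end in shifts:
        toi[player] += end - start
    return dict(toi)


def shared_time_by_trio(shifts):
    """Seconds during which all three players of a trio were on the ice together."""
    by_player = defaultdict(list)
    for player, start, end in shifts:
        by_player[player].append((start, end))

    shared = {}
    for trio in combinations(sorted(by_player), 3):
        total = 0
        # A player's own shifts never overlap, so summing the three-way
        # intersection over every combination of shifts counts each second once.
        for combo in product(*(by_player[p] for p in trio)):
            overlap = min(end for _, end in combo) - max(start for start, _ in combo)
            total += max(0, overlap)
        if total > 0:
            shared[trio] = total
    return shared


toi = time_on_ice(shifts)
trios = shared_time_by_trio(shifts)
```

Enumerating every trio is fine for the dozen or so forwards in a single game. Special teams shifts would need to be filtered out first, which this sketch does not attempt.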

Analysis Approach

  • Line identification algorithms were developed to account for in-game line changes and special teams play.
  • Statistical correlations were measured between line usage metrics and team performance indicators.
  • Multilevel modeling was used to account for team-specific effects while identifying league-wide patterns.
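
The correlation step might look something like the sketch below, which computes a Pearson correlation between top line share and points percentage separately for each season. It does not reproduce the multilevel model. The column names, team codes, the `usage_performance_correlation` helper, and all values are made up for illustration and say nothing about the project's actual findings.

```python
import pandas as pd

# Hypothetical team-season table (placeholder values, not real results).
team_seasons = pd.DataFrame({
    "season": ["2020-2021"] * 4 + ["2021-2022"] * 4,
    "team": ["AAA", "BBB", "CCC", "DDD"] * 2,
    "top_line_share": [0.30, 0.33, 0.36, 0.39, 0.31, 0.34, 0.35, 0.38],
    "points_pct": [0.55, 0.61, 0.48, 0.52, 0.50, 0.58, 0.62, 0.47],
})


def usage_performance_correlation(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation of top line share vs. points percentage, per season."""
    return pd.Series({
        season: group["top_line_share"].corr(group["points_pct"])
        for season, group in df.groupby("season")
    })


corr_by_season = usage_performance_correlation(team_seasons)
print(corr_by_season)
```

Computing the correlation within each season keeps the shortened 2020-2021 schedule from being mixed with the full 2021-2022 one.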
