Project Overview
In this project, I investigated how heavily each NHL team depended on its "top line" during the 2020-2021 and 2021-2022 seasons, and what that dependence suggests about roster construction and coaching strategy.
Key Questions Explored
- How does top line usage correlate with team performance?
- Do successful teams have more balanced line distributions?
- How do different coaching strategies manifest in line usage patterns?
- Can we identify optimal line usage strategies based on roster composition?
A typical NHL team uses 4 "lines" of forwards, rotating them frequently throughout the game. Each line is a set of 3 players, and the "top" or "1st" line generally receives the largest share of playing time in a game, with the share decreasing for each subsequent line.
Comparing the percentage of total playing time that each team gives to its top line with overall team performance highlights the tradeoff between leaning on star players and having a "deep" roster where the lesser lines share more playing time.
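As a rough illustration of the metric described above, the sketch below computes a top line's share of forward ice time. It assumes the time on ice per line is already known, treats the most-used line as the "top" line, and uses made-up player IDs and numbers; the function name top_line_share is hypothetical and not taken from the project code.

```python
def top_line_share(line_toi):
    """Return the fraction of forward-line ice time played by the most-used line.

    line_toi maps a line (a frozenset of three player IDs) to seconds played.
    Assumption: the "top" line is whichever line logged the most time.
    """
    total = sum(line_toi.values())
    if total == 0:
        raise ValueError("no ice time recorded")
    return max(line_toi.values()) / total


# Illustrative numbers only, not real NHL data.
example = {
    frozenset({"A1", "A2", "A3"}): 1080,  # 18:00 of ice time
    frozenset({"B1", "B2", "B3"}): 960,   # 16:00
    frozenset({"C1", "C2", "C3"}): 780,   # 13:00
    frozenset({"D1", "D2", "D3"}): 540,   # 9:00
}
share = top_line_share(example)
print(f"Top line share: {share:.1%}")
```

A team that rolls four lines evenly would sit near 25% on this measure, so higher values indicate heavier reliance on the top line.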
All data used for this project was retrieved from the public NHL API, with analysis performed using Python and associated libraries including pandas, matplotlib, and seaborn.
Key Findings & Visualizations
Line Usage Distribution
Analysis of how teams distribute ice time across their forward lines.
Performance Correlation
Relationship between top line usage and overall team success.
Situational Deployment
How line usage changes based on game situations and score.
Coaching Strategy Comparison
Different approaches to line management across coaching styles.
Methodology & Data Sources
This analysis was conducted using Python, leveraging pandas for data manipulation, matplotlib and seaborn for visualization, and scikit-learn for statistical modeling. The dataset was compiled by accessing the NHL's public API endpoints.
Data Collection
- Game events and shift data were collected for every regular season game for each team (56 games per team in the shortened 2020-2021 season, 82 in 2021-2022).
- Line combinations were identified using shift overlaps and on-ice events.
- Time on ice was calculated from precise shift start and end timestamps.
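The collection steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes each shift has already been reduced to a (player, start, end) tuple for forwards only, with times in seconds from the start of the game, and the function names are hypothetical. Line combinations are approximated by sweeping over shift boundaries and crediting time to whichever three forwards are on the ice together.

```python
from collections import defaultdict


def toi_by_player(shifts):
    """Sum shift durations (end - start, in seconds) for each player."""
    totals = defaultdict(int)
    for player, start, end in shifts:
        totals[player] += end - start
    return dict(totals)


def toi_by_trio(shifts):
    """Credit ice time to each set of exactly three forwards on the ice together.

    Between two consecutive shift boundaries the on-ice group cannot change,
    so each such interval is assigned to the forwards whose shift covers it.
    """
    boundaries = sorted({t for _, start, end in shifts for t in (start, end)})
    totals = defaultdict(int)
    for left, right in zip(boundaries, boundaries[1:]):
        on_ice = frozenset(
            player for player, start, end in shifts
            if start <= left and end >= right
        )
        if len(on_ice) == 3:  # skip intervals with extra or missing forwards
            totals[on_ice] += right - left
    return dict(totals)


# Made-up shifts: line A starts, then line B comes on in a staggered change.
shifts = [
    ("A1", 0, 45), ("A2", 0, 45), ("A3", 0, 40),
    ("B1", 40, 90), ("B2", 45, 90), ("B3", 45, 90),
]
player_toi = toi_by_player(shifts)
trio_toi = toi_by_trio(shifts)
```

In this example the staggered change produces a 5-second mixed trio (A1, A2, B1), which is the kind of overlap a fuller line identification step would need to filter or merge.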
Analysis Approach
- Line identification algorithms were developed to account for in-game line changes and special teams play.
- Statistical correlations were measured between line usage metrics and team performance indicators.
- Multilevel modeling was used to account for team-specific effects while identifying league-wide patterns.
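To make the correlation step concrete, here is a small sketch that computes a Pearson correlation between top line share and a performance indicator. Pearson's r is an assumption, since the write-up does not say which correlation measure was used, and the values below are placeholders rather than results from the analysis. The multilevel model is not covered by this sketch.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values, one entry per team; not real NHL data.
top_line_shares = [0.30, 0.32, 0.34, 0.36]
points_pct = [0.60, 0.55, 0.58, 0.50]
r = pearson_r(top_line_shares, points_pct)
print(f"Pearson r = {r:.3f}")
```

With the data in a pandas DataFrame, Series.corr would give the same coefficient without the hand-written helper.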