Sports data ELT tool.

NFL-DATA

Sports data extraction, load, and transformation using Python and dbt.

Project setup

Set up and activate a virtual environment with the following command:

python -m venv venv

# Windows
./venv/Scripts/activate

# Linux
source venv/bin/activate

Install all the necessary packages using the following command:

python -m pip install -r requirements.txt

Environment variables

Secrets are managed with Infisical. Install the Infisical CLI to inject secrets at runtime.

Otherwise, use a .env file to specify the required variables. See .env.example.

Command reference

NFL-DATA CLI

Infisical and secrets

Using Infisical for secrets management is recommended. Commands can be wrapped in an Infisical call:

infisical run --command "python app.py ..."

Load strategy

All commands have the --load-strategy or -ls option to specify how NFL-DATA will load new JSON data into the warehouse:

  • replace clears out all previously loaded JSON data before inserting.
  • add loads JSON data without any additional checks; repeated loads will introduce multiple entries for the same object.
  • skip loads JSON data only if no other data exists for a given object.
  • day_replace replaces JSON data if the same object was previously loaded today.
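The semantics of the four strategies can be sketched in Python. The in-memory table, the load_json helper, and the per-object behavior of replace are all illustrative assumptions here, not NFL-DATA's actual implementation:

```python
from datetime import date

# Hypothetical in-memory stand-in for the landing zone table:
# each row records an object key, its payload, and the load date.
table = []

def load_json(table, key, payload, strategy, today=None):
    """Sketch of the --load-strategy semantics (illustrative only)."""
    today = today or date.today()
    existing = [r for r in table if r["key"] == key]
    if strategy == "replace":
        # Clear previously loaded data for this object, then insert.
        table[:] = [r for r in table if r["key"] != key]
    elif strategy == "add":
        pass  # No checks: repeated loads duplicate the object.
    elif strategy == "skip":
        if existing:
            return  # Only load when no data exists for this object.
    elif strategy == "day_replace":
        # Replace only rows for this object that were loaded today.
        table[:] = [r for r in table
                    if not (r["key"] == key and r["loaded_on"] == today)]
    table.append({"key": key, "payload": payload, "loaded_on": today})
```

For example, two add calls for the same key leave two rows, a subsequent skip is a no-op, and replace collapses them back to one.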

Modules

There are three available modules:

  • nfl for NFL data, including extraction from ESPN's API and modeling.
  • nba for NBA data, including extraction from ESPN's API and modeling.
  • sms for Super Mario Sluggers statfile processing.

All commands and subcommands can be followed by the --help flag to describe commands, arguments, and options.

python app.py nfl load-game --help

dbt

To use dbt commands, first change directory into the dbt subdirectory.

cd dbt/

From here, all dbt commands can be accessed. Take a look at the dbt Command Reference page for details.

Running a full build models any raw data newly loaded since the last full build:

dbt build

Core architecture

Data pipelines in this repository adopt the following basic structure:

    graph TD;
    A@{ shape: cloud, label: "External API"}-->B[("S3-compatible storage")]-->|to landing zone table|C[("Database")];
    D@{ shape: processes, label: "Manual file upload"}-->B
    C-->E("dbt modeling")-->C;

Data is ingested from an external API or from a manual file upload into an S3-compatible storage service (AWS S3, MinIO, etc.), which acts as a caching layer. The raw data, typically JSON, is then loaded from S3 into a landing zone table in a database.

The landing zone table has the following structure:

  • object_path: path to the object in S3
  • object_type: semantic categorization for the object (e.g. athlete, event)
  • raw_data: raw data as stored in S3

By caching in S3 and loading the raw data into the database, the data can be processed, normalized, and modeled in-database using dbt.
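The landing zone row described above can be sketched as a small Python helper. The function name build_landing_row and the convention of deriving object_type from the S3 path are assumptions for illustration; the real project may categorize objects differently:

```python
import json

def build_landing_row(object_path, raw_bytes):
    """Turn an S3 object into a landing zone row (hypothetical helper).

    object_type is derived from the path by an assumed convention,
    e.g. "nfl/event/401547.json" -> "event".
    """
    parts = object_path.split("/")
    object_type = parts[-2] if len(parts) >= 2 else "unknown"
    payload = json.loads(raw_bytes)  # validate that the payload is JSON
    return {
        "object_path": object_path,
        "object_type": object_type,
        "raw_data": json.dumps(payload),  # stored as raw JSON text
    }
```

A row built this way, e.g. build_landing_row("nfl/event/401547.json", b'{"id": 401547}'), is ready to insert into the landing zone table, after which dbt models take over in-database.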