CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a Python workspace for Jupyter notebooks that accompany blog posts on the parent Jekyll site (git-steven.github.io). Notebooks cover data visualization, PySpark, scikit-learn feature engineering, coupling metrics analysis, and architectural proposals (IoC/DI patterns, DIKW frameworks).

This directory lives inside the parent Jekyll repo but has its own Python toolchain (Poetry, .venv, pyproject.toml).

Common Commands

# Install dependencies (uses Poetry with pyproject.toml)
poetry install

# Start JupyterLab
bin/start                    # or: poetry run jupyter lab

# Run tests
poetry run pytest

# Run a single test file
poetry run pytest tests/test_foo.py

# Run a single test
poetry run pytest tests/test_foo.py::test_bar

# Lint
poetry run ruff check .

# Build Spark Docker image (for PySpark notebooks)
./build-spark.sh

# Start Spark cluster
docker compose up

# Shell into Spark container
./docker-sh.sh

Architecture

Notebook-Centric Layout

This is not a traditional Python application. Content is organized as standalone Jupyter notebooks at the root level, each supporting a blog post topic.

Key Dependencies (from pyproject.toml)

Spark Infrastructure

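PySpark notebooks use a Dockerized Spark cluster. Build the image with ./build-spark.sh, start the cluster with docker compose up, and open a shell in the container with ./docker-sh.sh (see Common Commands).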

Other Artifacts

Pytest Configuration Notes

The pytest configuration in pyproject.toml sets --cov=modgud and testpaths = ["tests"]. Neither a tests/ directory nor a modgud package currently exists in this workspace; both settings carry over from a sibling project. When adding tests here, update addopts to name the correct package, or remove the coverage flag.