The Hard Parts

Engineering reference

Failure modes.
Red flags.
Trade-offs.
Playbooks.

For senior engineers,
architects, and
engineering leaders.

Issue 01

Software fails
the same way.
Every time.

A field guide for staff engineers, tech leads, architects, and engineering managers dealing with recurring software delivery problems.

Sections

Entries

151

Use in

Incident reviews.
Architecture reviews.
AI adoption discussions.
Retros & decisions.

Essay 13 May 2026

AI Accelerates Old Failure Modes

AI did not invent the ordinary gaps in software delivery: incomplete specifications, rushed reviews, unclear ownership, or architecture that is harder to explain than we would like. It made those gaps easier to carry forward into working code, which is why engineering judgment matters more than ever.

Pick your way in

Start from the question you have

From Failure Modes

A glimpse of the catalog. Each entry walks through how the pattern starts, how it escalates, what it looks like at early, mid, and late stages, and what good responses look like.

FM-001 planning

The Friendly Rewrite

A rewrite framed as cleanup becomes a long-running replacement with no stable landing zone.

Freq · very common

FM-002 people

The Hero Trap

One person becomes the informal system of record for critical knowledge, decisions, and rescue work.

Freq · universal

FM-003 technical

Abstraction Addiction

The system grows more layers, indirection, and generic structure than current reality actually demands.

Freq · common

FM-004 process

Ticket Theater

Work tracking becomes performance for stakeholders instead of coordination for delivery.

Freq · very common

FM-005 leadership

The Invisible Deadline

A date exists socially or politically, but not explicitly enough for the team to manage the trade-offs honestly.

Freq · common

FM-015 ai

Autocomplete Architecture

Teams accept AI-suggested structures faster than they understand or own them, embedding design decisions nobody made consciously.

Freq · increasing

Browse all 31 Failure Modes →

From Tech Decisions

One decision per axis the catalog covers: architecture, delivery, team, quality, and AI systems. Each entry lays out two concrete options with their real conditions, costs, hidden costs, and failure modes when misapplied.

TD-01 architecture

Monolith vs Microservices

Usually a team-shape and operational-maturity decision disguised as an architecture preference.

Freq · very common

TD-10 product-delivery

Build vs Buy

Usually a control-vs-focus decision, not an engineering pride decision.

Freq · very common

TD-20 team-operations

Specialist Teams vs Cross-Functional Teams

Usually a coordination-vs-depth decision, not a modernity decision.

Freq · common

TD-25 quality-delivery

Test Pyramid vs Heavy End-to-End

Usually a feedback-speed vs system-confidence decision.

Freq · very common

TD-33 ai-systems

RAG vs Fine-Tuning

Usually a knowledge-grounding vs behavior-shaping decision.

Freq · increasing

TD-35 ai-systems

Human-in-the-Loop vs Full Automation

Usually a trust-boundary and consequence-of-error decision.

Freq · increasing

Browse all 38 Tech Decisions →

From Red Flags

One signal per layer the catalog covers: code, team, process, leadership, and AI. Each entry opens with what you would actually notice and walks through what it usually indicates and what to check next.

RF-03 Architectural

Changes always touch too many places

Even ordinary changes require edits across many files, layers, or services.

Freq · very common

RF-11 Behavioral

Everyone asks the same person

One person becomes the default source of truth, escalation path, or decision gateway for too many important areas.

Freq · universal

RF-19 Delivery

Work enters faster than it leaves

Incoming work volume consistently outpaces completion, so queues, context switching, and churn grow silently.

Freq · very common

RF-29 Communication

Reporting looks healthier than delivery feels

Dashboards, status updates, and leadership narratives stay calm and positive while the teams doing the work feel far more fragility and risk.

Freq · common

RF-37 AI Quality

AI-generated artifacts are trusted more than source material

Summaries, synthesized docs, or generated analyses start becoming the operational truth instead of pointers back to real sources.

Freq · increasing

RF-39 AI Quality

Benchmarks are discussed more than real user outcomes

Teams spend more time on benchmark scores and synthetic eval wins than on whether the system helps real users in real tasks.

Freq · increasing

Browse all 42 Red Flags →

From Engineering Playbook

One playbook per subcategory the catalog covers: delivery, team, architecture, operations, and AI adoption. Each entry opens with when to use it and when not to, then walks you through the steps, common mistakes, and signals that it actually landed.

EP-17 Delivery

Run a phased migration

Move from old to new in controlled slices, where each slice has explicit ownership, cutover criteria, rollback, and retirement of the old path.

tech lead

EP-37 Team

Repair trust after a painful incident

Repair trust by making the event intelligible, changing the conditions that produced it, and demonstrating through behavior that the team is safer, more honest, and more accountable than before.

engineering manager

EP-16 Architecture

Refactor a dangerous hotspot

Refactor a hotspot by targeting the specific reasons it is dangerous (high churn, poor testability, unclear ownership, or oversized responsibility) and improving it in narrow, repeatable steps.

maintainer

EP-25 Operations

Run an incident review that actually helps

Turn an incident review into a system-learning exercise that explains what happened, why it made sense at the time, what conditions enabled it, and what changes will reduce recurrence.

incident lead

EP-01 AI

Upgrade code review for AI-assisted work

Redesign review so that AI-assisted changes are judged by risk, understanding, and behavioral correctness, not by surface polish or author confidence.

tech lead

EP-05 AI

Evaluate an AI feature against real tasks

Evaluate the feature against real user jobs, realistic failure patterns, and operational constraints so the team learns whether the system actually helps, not just whether it performs well on curated examples.

evaluation owner

Browse all 40 Playbooks →