The Hard Parts
Engineering reference
- Failure modes.
- Red flags.
- Trade-offs.
- Playbooks.
For senior engineers, architects, and engineering leaders.
Software fails the same way. Every time.
A field guide for staff engineers, tech leads, architects, and engineering managers dealing with recurring software delivery problems.
Sections: 4
Entries: 151
Use in
- Incident reviews.
- Architecture reviews.
- AI adoption discussions.
- Retros & decisions.
AI Accelerates Old Failure Modes
AI did not invent the ordinary gaps in software delivery, such as incomplete specifications, rushed reviews, unclear ownership, or architecture that is harder to explain than we would like. It made those gaps easier to carry forward into working code. This is why engineering judgment matters more than ever.
Start from the question you have
- I’ve seen this project go wrong before. What am I looking at? See FM: Failure Modes (31 entries).
- What trade-off am I actually making here, and on what? See TD: Tech Decisions (38 entries).
- Something about this feels off. What am I noticing? See RF: Red Flags (42 entries).
- I know the situation. What’s the right way to run it? See EP: Engineering Playbook (40 entries).
From Failure Modes
A glimpse of the catalog. Each entry walks through how the pattern starts, how it escalates, what it looks like at early, mid, and late stages, and what good responses look like.
The Friendly Rewrite
A rewrite framed as cleanup becomes a long-running replacement with no stable landing zone.
The Hero Trap
One person becomes the informal system of record for critical knowledge, decisions, and rescue work.
Abstraction Addiction
The system grows more layers, indirection, and generic structure than current reality actually demands.
Ticket Theater
Work tracking becomes performance for stakeholders instead of coordination for delivery.
The Invisible Deadline
A date exists socially or politically, but not explicitly enough for the team to manage the trade-offs honestly.
Autocomplete Architecture
Teams accept AI-suggested structures faster than they understand or own them, embedding design decisions nobody made consciously.
From Tech Decisions
One decision per axis the catalog covers: architecture, delivery, team, quality, and AI systems. Each entry lays out two concrete options, the conditions under which each one fits, their costs and hidden costs, and how each fails when misapplied.
Monolith vs Microservices
Usually a team-shape and operational-maturity decision disguised as an architecture preference.
Build vs Buy
Usually a control-vs-focus decision, not an engineering pride decision.
Specialist Teams vs Cross-Functional Teams
Usually a coordination-vs-depth decision, not a modernity decision.
Test Pyramid vs Heavy End-to-End
Usually a feedback-speed vs system-confidence decision.
RAG vs Fine-Tuning
Usually a knowledge-grounding vs behavior-shaping decision.
Human-in-the-Loop vs Full Automation
Usually a trust-boundary and consequence-of-error decision.
From Red Flags
One signal per layer the catalog covers: code, team, process, leadership, and AI. Each entry opens with what you would actually notice, then walks through what it usually indicates and what to check next.
Changes always touch too many places
Even ordinary changes require edits across many files, layers, or services.
Everyone asks the same person
One person becomes the default source of truth, escalation path, or decision gateway for too many important areas.
Work enters faster than it leaves
Incoming work volume consistently outpaces completion, so queues, context switching, and churn grow silently.
Reporting looks healthier than delivery feels
Dashboards, status updates, and leadership narratives stay calm and positive while the teams doing the work feel far more fragility and risk.
AI-generated artifacts are trusted more than source material
Summaries, synthesized docs, or generated analyses start becoming the operational truth instead of pointers back to real sources.
Benchmarks are discussed more than real user outcomes
Teams spend more time on benchmark scores and synthetic eval wins than on whether the system helps real users in real tasks.
From Engineering Playbook
One playbook per subcategory the catalog covers: delivery, team, architecture, operations, and AI adoption. Each entry opens with when to use it and when not to, then walks you through the steps, common mistakes, and signals that it actually landed.
Run a phased migration
Move from old to new in controlled slices, where each slice has explicit ownership, cutover criteria, rollback, and retirement of the old path.
Repair trust after a painful incident
Repair trust by making the event intelligible, changing the conditions that produced it, and demonstrating through behavior that the team is safer, more honest, and more accountable than before.
Refactor a dangerous hotspot
Refactor a hotspot by targeting the specific reasons it is dangerous (high churn, poor testability, unclear ownership, or oversized responsibility) and improving it in narrow, repeatable steps.
Run an incident review that actually helps
Turn an incident review into a system-learning exercise that explains what happened, why it made sense at the time, what conditions enabled it, and what changes will reduce recurrence.
Upgrade code review for AI-assisted work
Redesign review so that AI-assisted changes are judged by risk, understanding, and behavioral correctness, not by surface polish or author confidence.
Evaluate an AI feature against real tasks
Evaluate the feature against real user jobs, realistic failure patterns, and operational constraints so the team learns whether the system actually helps, not just whether it performs well on curated examples.