Extreme Software Process for AI-driven development

Engineering rigor, calibrated to the project.

X-PRO.ai compiles a project profile into a library of plain-Markdown guardrails an AI coding agent reads while it builds. The same global catalog produces a throwaway-PoC ruleset or a regulated-system ruleset depending on the answers — and every choice to include, defer, or drop a practice is written down with its reason.

4 EA layers · 40 practices · T0–T3 calibrated tiers · 100% deterministic output · 5 reference frameworks

01 / The problem it solves

Best-practice catalogs tell you what excellent systems do. They never tell you how much.

AWS & Azure Well-Architected, Google SRE, 12-factor, DORA — they describe what great systems do. Applied literally to a two-week internal tool, they are wasteful. Applied loosely to a payment platform, they are dangerous. The hard part is not knowing the practices. It is deciding how much of each a given project warrants, and being honest about what you chose to skip.

Over-engineered: everything, everywhere

Multi-region, circuit breakers, full tactical DDD and GitOps on a PoC that lives for two weeks. The rigor is real, the cost is real, and none of it moves the project's actual risk.

Under-engineered: loose on the critical path

A payment surface shipped with home-grown auth, secrets in the repo, and no tested restore — because "best practices" were treated as an optional backlog item.

The X-PRO.ai stance

Treat calibration as a first-class, auditable computation. You profile the project once; a Tier is derived; the Tier modulates each practice from Required down to Discarded; and the trade-offs are recorded rather than left implicit. It is a trade-off engine — not a "best of all worlds" template.

02 / Mental model

One catalog plus your answers → one calibrated instance.

The catalog is the data; your answers select from it; the artifacts are the output. Change the catalog and every project that upgrades inherits the improvement. Change the answers and only that project's artifacts move.

Input · Answers
  • Criticality (5 questions × 0–3)
  • Complexity (4 questions × 0–2)
  • 4 layer batteries
  • Flags & constraints

Layer 0 · Tier engine
  • C × K matrix → base tier
  • Override ratchet (raise-only)
  • Per-dimension tiers
  • Conflict detection

Stage 3 · Gating
  • required_if (answer-driven)
  • tier_required / tier_recommended
  • fast_mvp modulation
  • Required · Recommended · Deferred · Discarded

Output · Artifacts
  • 00-PROJECT-PROFILE + 4 layers
  • TRADE-OFFS / DoD
  • AI-AGENT-RULES.md
  • → the agent reads these

The generator is deterministic: same answers + same catalog version = byte-identical output. The lock file records an answers_digest; if an answer changes, the digest diverges and the artifacts are flagged stale. That property is the prerequisite for auditing and versioning the output.
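The staleness check can be sketched as follows. This is a minimal sketch, not the project's code: it assumes the answers are a JSON-serializable mapping and that the digest is SHA-256 over canonical JSON. The page names neither the hash nor the encoding, and the helper functions `answers_digest` and `is_stale` are hypothetical.

```python
import hashlib
import json


def answers_digest(answers: dict) -> str:
    """Return a stable SHA-256 hex digest of the questionnaire answers.

    Assumption: answers are hashed as canonical JSON (sorted keys, fixed
    separators), so the order in which keys were written never matters.
    """
    canonical = json.dumps(answers, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_stale(lock: dict, answers: dict) -> bool:
    """Artifacts are stale when the digest recorded in the lock no longer matches."""
    return lock.get("answers_digest") != answers_digest(answers)


# Hypothetical answers; APP-Q5 and time_to_market are names used on this page.
answers = {"APP-Q5": "circuit_breaker_bulkhead", "time_to_market": "fast_mvp"}
lock = {"answers_digest": answers_digest(answers)}
print(is_stale(lock, answers))                                 # False
print(is_stale(lock, {**answers, "APP-Q5": "timeout_retry"}))  # True
```

Hashing a canonical form rather than the raw file is what makes "same answers" mean the same thing regardless of formatting or key order.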

03 / Layer 0 — the tier engine

Two axes, scored from short questionnaires, derive the Tier.

Criticality sums five questions (impact, sensitivity, blast radius, SLA, reversibility), each scored 0–3. Complexity sums four (domain, integrations, data, distribution), each scored 0–2. Each band pair maps to a base Tier through the matrix; the override flags then ratchet it up. The logic is the one defined in tier-engine.yaml.

Criticality is the dominant axis. Complexity alone never reaches T3 — it caps at T1 when criticality is low and T2 when it is medium. T3 is reached only by C-High × K-High, by C-Critical at any complexity, or by an override flag. Complexity raises the floor; it does not set the ceiling.

Criticality (C) · score 0–15 · bands C-Low, C-Med, C-High, C-Critical
Complexity (K) · score 0–8 · bands K-Low, K-Med, K-High
Override flags (raise-only) · PII / financial → security ≥ T2 · payment / PCI → security ≥ T3 · regulated → global T3 · life-safety → global T3
Constraint · fast_mvp

Base-tier matrix (rows: criticality band, columns: complexity band):

                K-Low   K-Med   K-High
    C-Critical  T3      T3      T3
    C-High      T2      T2      T3
    C-Med       T1      T2      T2
    C-Low       T0      T1      T1

T0

Throwaway / PoC. Heavy rigor does not leak in — only the "must decide" basics plus the secrets ratchet.

T1

Internal, low-criticality. Critical path tested, structured logs, daily backup, single decision authority.

T2

Product. Modular monolith, RED metrics, integration tests, multi-AZ, IaC and ADR governance.

T3

Critical / regulated. Circuit breakers, tracing + SLOs, multi-region, zero-trust, continuous PITR, canary.
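The banding and matrix lookup can be sketched in Python. The matrix is copied from the page; the band cut-offs are hypothetical, chosen only to agree with the worked example later on the page (C = 8 is C-High, K = 4 is K-Med). `to_band` and `base_tier` are illustrative names, not the generator's API.

```python
from typing import Sequence

# Base-tier matrix from the page: MATRIX[criticality band][complexity band].
MATRIX = {
    "C-Low":      {"K-Low": "T0", "K-Med": "T1", "K-High": "T1"},
    "C-Med":      {"K-Low": "T1", "K-Med": "T2", "K-High": "T2"},
    "C-High":     {"K-Low": "T2", "K-Med": "T2", "K-High": "T3"},
    "C-Critical": {"K-Low": "T3", "K-Med": "T3", "K-High": "T3"},
}

# HYPOTHETICAL band cut-offs (inclusive upper bounds). The page gives only the
# score ranges and one data point; the real thresholds live in tier-engine.yaml.
C_BANDS = ((3, "C-Low"), (7, "C-Med"), (11, "C-High"), (15, "C-Critical"))
K_BANDS = ((2, "K-Low"), (5, "K-Med"), (8, "K-High"))


def to_band(score: int, bands: Sequence[tuple[int, str]]) -> str:
    """Return the first band whose inclusive upper bound covers the score."""
    for upper, name in bands:
        if score <= upper:
            return name
    raise ValueError(f"score {score} is out of range")


def base_tier(criticality: Sequence[int], complexity: Sequence[int]) -> str:
    """Sum both questionnaires, band the totals, and look up the base Tier."""
    if len(criticality) != 5 or not all(0 <= s <= 3 for s in criticality):
        raise ValueError("criticality expects five answers scored 0-3")
    if len(complexity) != 4 or not all(0 <= s <= 2 for s in complexity):
        raise ValueError("complexity expects four answers scored 0-2")
    return MATRIX[to_band(sum(criticality), C_BANDS)][to_band(sum(complexity), K_BANDS)]


print(base_tier([2, 2, 1, 2, 1], [1, 1, 1, 1]))  # C = 8, K = 4 -> T2
```

Whatever the real cut-offs are, the matrix alone guarantees the property stated above: the C-Low and C-Med rows contain no T3, so complexity by itself cannot reach the top tier.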

04 / Overrides — a one-way ratchet

Some answers floor the Tier regardless of budget or deadline.

Overrides only ever raise the Tier, never lower it. The result is a global Tier plus a per-dimension Tier. A cost-conscious product can run at T2 globally while its security dimension is ratcheted to T3 by a payment flag.

Condition · Effect · Why it can only raise
  • Data is regulated · global floor T3 · Regulatory exposure is non-negotiable; the deadline does not change the law.
  • Data is PII / financial · security ≥ T2 · Sensitivity is a property of the data, not of the budget.
  • Payment / PCI flag · security ≥ T3 · Card data carries a fixed minimum bar of controls.
  • Life-safety impact · global floor T3 · Failure can hurt people; rigor cannot be traded away.
  • Secret in code · forbidden at any tier · A hard ratchet: APP-09 fires even at the T0 floor.

Conflicts — detected, not blocked

When required rigor exceeds declared capacity — a 99.9% SLA against a "tight deadline / small team" constraint — it is flagged as an accepted risk in TRADE-OFFS.md. Generation proceeds. The point is a conscious decision, not a hard stop.
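A sketch of the "detected, not blocked" behaviour. The capacity model is a hypothetical stand-in: the page does not say how declared capacity is scored, so this maps each constraint to a maximum sustainable tier (`CAPACITY_BY_CONSTRAINT`, `AcceptedRisk`, and `detect_conflicts` are all illustrative names). The design point taken from the text is that conflicts come back as records rather than exceptions.

```python
from dataclasses import dataclass

TIER_ORDER = ("T0", "T1", "T2", "T3")

# HYPOTHETICAL: the highest tier of execution rigor each declared constraint can sustain.
CAPACITY_BY_CONSTRAINT = {"tight_deadline_small_team": "T1", "none": "T3"}


@dataclass(frozen=True)
class AcceptedRisk:
    dimension: str
    required_tier: str
    capacity_tier: str


def detect_conflicts(dimension_tiers: dict[str, str], constraint: str) -> list[AcceptedRisk]:
    """List dimensions whose required rigor exceeds declared capacity.

    Nothing is raised: each conflict becomes an accepted-risk record
    for TRADE-OFFS.md, and generation proceeds.
    """
    capacity = CAPACITY_BY_CONSTRAINT[constraint]
    return [
        AcceptedRisk(dim, tier, capacity)
        for dim, tier in sorted(dimension_tiers.items())
        if TIER_ORDER.index(tier) > TIER_ORDER.index(capacity)
    ]


risks = detect_conflicts({"reliability": "T2", "security": "T3", "cost": "T1"},
                         "tight_deadline_small_team")
for r in risks:
    print(f"accepted risk: {r.dimension} needs {r.required_tier}, capacity {r.capacity_tier}")
```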

dimension_tier = max(global, override)

Each dimension (reliability, security, performance, cost, operational, sustainability) resolves to the maximum of the global Tier and any applicable override. The ratchet is monotonic: no path lowers a dimension below its floor.
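The formula translates directly into code. A minimal sketch, assuming overrides are expressed as a table of floors keyed by flag: `payment_pci` is the flag name the worked example uses, while the other flag keys, `OVERRIDES`, and `resolve_tiers` are hypothetical. Every step goes through `max`, so no combination of flags can lower a tier.

```python
TIER_ORDER = ("T0", "T1", "T2", "T3")
DIMENSIONS = ("reliability", "security", "performance", "cost", "operational", "sustainability")

# Floors from the override table; "*" means the global Tier.
# payment_pci is named on this page; the other keys are hypothetical.
OVERRIDES = {
    "regulated": {"*": "T3"},
    "life_safety": {"*": "T3"},
    "pii_financial": {"security": "T2"},
    "payment_pci": {"security": "T3"},
}


def raise_to(tier: str, floor: str) -> str:
    """Ratchet: return the higher of the two tiers, never the lower."""
    return max(tier, floor, key=TIER_ORDER.index)


def resolve_tiers(base: str, flags: set[str]) -> tuple[str, dict[str, str]]:
    """Apply raise-only overrides: global floors first, then per-dimension floors."""
    global_tier = base
    for flag in sorted(flags):
        global_tier = raise_to(global_tier, OVERRIDES[flag].get("*", "T0"))
    dimension_tier = {}
    for dim in DIMENSIONS:
        tier = global_tier
        for flag in sorted(flags):
            tier = raise_to(tier, OVERRIDES[flag].get(dim, "T0"))
        dimension_tier[dim] = tier
    return global_tier, dimension_tier


global_tier, dims = resolve_tiers("T2", {"payment_pci"})
print(global_tier, dims["security"], dims["reliability"])  # T2 T3 T2
```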

05 / The gating function

The answer decides whether a practice applies. The Tier decides how much.

This is the core of stage 3. For every one of the 40 practices, the generator reads the answer, takes the effective tier of the practice's dimension, and resolves a status. Two ideas matter: required_if makes a practice mandatory because of what the project is; the Tier then scales the rigor of everything else.

1. required_if matches the answer → Required. Answer-driven mandate: high data volume makes partitioning Required, no matter the Tier.
2. override matches the answer → Required. The ratchet: PII/regulated forces auth, encryption, and residency to Required.
3. teff ≥ tier_required → Required. Tier gating: the dimension's effective tier reaches the required threshold.
4. teff ≥ tier_recommended → Recommended. Below required but at or above the recommended floor: follow unless justified in an ADR.
5. practice has a trade_off → Deferred; otherwise Discarded. Deferred items get a reactivation trigger in TRADE-OFFS.md.
6. fast_mvp & Recommended & deferrable & not security → Deferred. Speed modulation: it only relaxes deferrable, non-security rigor, never anything already Required.
    # gating function: v1.1 (stage 3)
    teff = tier_by_dimension[practice.dimension]

    # (1) does the practice apply at all? answer-driven
    if gate.required_if matches answer      -> Required
    elif gate.override matches answer       -> Required     # ratchet
    # (2) how much rigor? tier-driven
    elif teff >= gate.tier_required         -> Required
    elif teff >= gate.tier_recommended      -> Recommended
    elif practice has a trade_off           -> Deferred
    else                                    -> Discarded

    # (3) speed modulation: deferrable, non-security only
    if time_to_market == fast_mvp and status == Recommended
       and practice.deferrable and dimension != security:
        status = Deferred    # recorded in TRADE-OFFS

Calibration locked the two refinements above with golden fixtures: deferrable stops fast_mvp from downgrading cheap hygiene (logs, modular monolith, basic CI/CD), and required_if catches answer-driven mandates the Tier alone would miss.
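The gating pseudocode translates almost line for line into runnable Python. This is a sketch under stated assumptions: `required_if` and `override` are modelled as predicates over the answers (the catalog presumably expresses them declaratively), and `Gate`, `Practice`, and `gate_status` are illustrative names rather than the generator's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

TIER_ORDER = ("T0", "T1", "T2", "T3")
Answers = dict[str, str]


@dataclass
class Gate:
    tier_required: str
    tier_recommended: str
    # Predicates stand in for the catalog's "matches answer" rules (assumption).
    required_if: Optional[Callable[[Answers], bool]] = None
    override: Optional[Callable[[Answers], bool]] = None


@dataclass
class Practice:
    id: str
    dimension: str
    gate: Gate
    has_trade_off: bool = False
    deferrable: bool = False


def at_least(tier: str, threshold: str) -> bool:
    return TIER_ORDER.index(tier) >= TIER_ORDER.index(threshold)


def gate_status(p: Practice, answers: Answers, tier_by_dimension: dict[str, str]) -> str:
    teff = tier_by_dimension[p.dimension]
    g = p.gate
    # (1) does the practice apply at all? answer-driven, including the ratchet
    if (g.required_if and g.required_if(answers)) or (g.override and g.override(answers)):
        status = "Required"
    # (2) how much rigor? tier-driven
    elif at_least(teff, g.tier_required):
        status = "Required"
    elif at_least(teff, g.tier_recommended):
        status = "Recommended"
    else:
        status = "Deferred" if p.has_trade_off else "Discarded"
    # (3) speed modulation: only deferrable, non-security Recommended practices
    if (answers.get("time_to_market") == "fast_mvp" and status == "Recommended"
            and p.deferrable and p.dimension != "security"):
        status = "Deferred"  # would be recorded in TRADE-OFFS.md
    return status


# APP-05 with the gate shown on this page; deferrable=False as in the worked example.
app05 = Practice("APP-05", "reliability", Gate("T3", "T2"), has_trade_off=True, deferrable=False)
print(gate_status(app05, {"time_to_market": "fast_mvp"}, {"reliability": "T2"}))  # Recommended
```

Note that step (3) runs after the status is resolved and checks `status == "Recommended"`, which is why it can never touch anything already Required.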

06 / The decision block & the four layers

Every practice is one catalog record, rendered into many destinations.

A single record carries its directive (Do / Don't / Example / Verification). The Do/Don't compile into AI-AGENT-RULES.md; the Verification compiles into DEFINITION-OF-DONE.md; a trade_off becomes an entry in TRADE-OFFS.md. One record, several destinations.

### [APP-05] Failure resilience — Required
- Gate: tier_required=T3 · tier_recommended=T2 · dimension=reliability
- Origin: APP-Q5 = "circuit_breaker_bulkhead"
- Reference: AWS WAF · Reliability · resilience
- Do: timeout + retry(exp,jitter); breaker per dependency; bulkhead + fallback
- Don't: distribute a transaction without a pattern
- Verification: a dependency failure does not bring the service down
- Trade-off: → TRADE-OFFS.md when below the gate
  • Status: Required · Recommended · Deferred · Discarded, resolved by the gating function.
  • Gate: the thresholds and the effective dimension tier that decide the status.
  • Origin: the exact battery question and answer that produced this block, for full traceability.
  • Reference: the source framework (WAF / SRE / DORA / 12-factor / Fowler) for the practice.
  • Directive: structured Do / Don't / Example / Verification; scales by tier via inherits.
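A minimal sketch of the one-record, several-destinations idea, using the APP-05 fields shown here. The routing by status (Do/Don't to AI-AGENT-RULES.md, Verification to DEFINITION-OF-DONE.md, trade-off to TRADE-OFFS.md) follows the text; the `DecisionBlock` type, the three render functions, the MUST/SHOULD wording, and the line formats are hypothetical. Letting Recommended verifications into the done gate is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DecisionBlock:
    id: str
    title: str
    status: str  # Required | Recommended | Deferred | Discarded
    do: list[str] = field(default_factory=list)
    dont: list[str] = field(default_factory=list)
    verification: str = ""
    trade_off: Optional[str] = None     # reason recorded when below the gate
    reactivation: Optional[str] = None  # trigger that brings the practice back


def to_agent_rules(b: DecisionBlock) -> list[str]:
    """Do/Don't -> imperative rules; only Required and Recommended are emitted."""
    if b.status not in ("Required", "Recommended"):
        return []
    verb = "MUST" if b.status == "Required" else "SHOULD"
    return [f"[{b.id}] {verb}: {d}" for d in b.do] + [f"[{b.id}] {verb} NOT: {d}" for d in b.dont]


def to_dod(b: DecisionBlock) -> Optional[str]:
    """Verification -> a checklist line in the done gate (assumed format)."""
    if b.status not in ("Required", "Recommended"):
        return None
    return f"- [ ] [{b.id}] {b.verification}"


def to_trade_off(b: DecisionBlock) -> Optional[str]:
    """Deferred/Discarded -> a TRADE-OFFS.md entry with reason and trigger."""
    if b.status not in ("Deferred", "Discarded"):
        return None
    return f"- [{b.id}] {b.title}: {b.status}. Reason: {b.trade_off}. Reactivate when: {b.reactivation}"


app05 = DecisionBlock(
    "APP-05", "Failure resilience", "Required",
    do=["timeout + retry(exp,jitter)", "breaker per dependency", "bulkhead + fallback"],
    dont=["distribute a transaction without a pattern"],
    verification="a dependency failure does not bring the service down",
)
print("\n".join(to_agent_rules(app05)))
print(to_dod(app05))
```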
Layer 01 · Business · BUS-01 … BUS-10
Mostly does not become code; it becomes constraints the other layers inherit, carried in a propagates line.

Layer 02 · Application · APP-01 … APP-10
Presentation, domain modeling, communication, consistency, resilience, observability, auth, testing, config, extensibility.

Layer 03 · Data · DATA-01 … DATA-10
Classification, model, volume, integrity, access, retention, privacy, lineage, backup/RPO, movement.

Layer 04 · Infrastructure · INFRA-01 … INFRA-10
Hosting, compute, availability, RTO/DR, scalability, security posture, exposure, IaC, CI/CD, cost.

07 / What the generator emits

A library of plain Markdown an agent reads as guardrails.

Point Claude Code, Cursor, Copilot, or any generic LLM agent at AI-AGENT-RULES.md. The file flattens the Required and Recommended directives into imperative rules the agent follows while building. Filenames stay version-free; every file declares its version internally.

  • 00-PROJECT-PROFILE.md: Layer-0 output: the Tier, scores, overrides, and detected conflicts. Source of truth.
  • 01·02·03·04-*.md: the four EA layers, each a battery of decision blocks rendered from the catalog.
  • AI-AGENT-RULES.md: flattened Required/Recommended directives, with thin variants for CLAUDE.md, .cursorrules, and copilot-instructions.
  • DEFINITION-OF-DONE.md: the done gate, assembled from verification fields and tier-gated. No Required item ships unmet.
  • TRADE-OFFS.md: every Deferred/Discarded practice with its reason and reactivation trigger. The file that justifies absences.
  • NFR.md and SECURITY-BASELINE.md: non-functional targets (SLO, RTO/RPO, budgets) and minimum controls derived from classification and tier.

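To illustrate two properties from this list, version-free filenames with the version declared inside and byte-stable output, here is a hypothetical renderer for the flattened rules file. The `(id, status, directive)` tuple shape, the header layout, and the `catalog_version` line are assumptions; sorting before rendering is one simple way to make the output independent of input order.

```python
def render_agent_rules(rules: list[tuple[str, str, str]], catalog_version: str) -> tuple[str, str]:
    """Return (filename, content) for the flattened rules file.

    rules: (practice_id, status, directive) tuples. The filename carries no
    version; the version is declared inside the content instead.
    """
    # Keep only what the agent must or should follow, in a stable order.
    kept = sorted(r for r in rules if r[1] in ("Required", "Recommended"))
    lines = ["# AI-AGENT-RULES", f"catalog_version: {catalog_version}", ""]
    lines += [f"- [{pid}] ({status}) {directive}" for pid, status, directive in kept]
    return "AI-AGENT-RULES.md", "\n".join(lines) + "\n"


# Hypothetical input; the APP-10 directive text is invented for illustration.
name, text = render_agent_rules(
    [("APP-10", "Deferred", "design extension points"),
     ("APP-05", "Required", "breaker per dependency")],
    "0.0.0-example",
)
print(name)
print(text)
```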
08 / Worked example

Expense-approval platform — where the engine earns its keep.

An internal expense-approval platform: ~2,000 employees, integrates an ERP and a payment gateway, built by a small team under a tight deadline.

Scores: C = 8 (C-High), K = 4 (K-Med) → base T2. The payment_pci flag ratchets security to T3. The required execution rigor exceeds the declared team/deadline capacity — recorded as an accepted risk, not a blocker.

The example is produced by the generator itself from t2-expense.yaml, so it stays in lock-step with the catalog.

25 Required · 7 Recommended · 8 Deferred

APP-07 auth → Required (financial override) · APP-05 resilience → Recommended (deferrable=false, survives fast_mvp) · APP-10 extensibility → Deferred (deferrable=true). Each of the 8 Deferred items appears in TRADE-OFFS.md with a reactivation trigger.
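The arithmetic behind this example can be checked in a few lines. The sketch uses only figures from the page: the base-tier matrix, the stated bands, the payment / PCI security floor, and the status counts. It also shows that 25 + 7 + 8 statuses account for all 40 practices, so nothing in this profile ends up Discarded.

```python
TIER_ORDER = ("T0", "T1", "T2", "T3")

# Base-tier matrix as published in section 03.
MATRIX = {
    "C-Low":      {"K-Low": "T0", "K-Med": "T1", "K-High": "T1"},
    "C-Med":      {"K-Low": "T1", "K-Med": "T2", "K-High": "T2"},
    "C-High":     {"K-Low": "T2", "K-Med": "T2", "K-High": "T3"},
    "C-Critical": {"K-Low": "T3", "K-Med": "T3", "K-High": "T3"},
}

# Expense-approval platform: C = 8 (C-High), K = 4 (K-Med), payment_pci set.
global_tier = MATRIX["C-High"]["K-Med"]
security_tier = max(global_tier, "T3", key=TIER_ORDER.index)  # payment / PCI: security >= T3

counts = {"Required": 25, "Recommended": 7, "Deferred": 8}
discarded = 40 - sum(counts.values())  # 40 practices in the catalog

print(global_tier, security_tier, discarded)  # T2 T3 0
```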

09 / The author

From research on cloud capacity planning to calibrated engineering rigor.


Carlos Diego C. P., DSc · computer scientist · entrepreneur · professor

Computer scientist, entrepreneur and professor with an MSc and DSc in Computer Science. Founder and CEO & CTO of Valcann (cloud computing), professor at CESAR School, Visiting Fellow at MIT, and an AWS Ambassador in Latin America — AWS Ambassador of the Year 2021, winner in Latin America and 2nd worldwide. Research spans software engineering, distributed systems, cloud computing, data and AI.

X-PRO.ai grows directly out of that work. His 2023 doctoral thesis, “Capacity Planning of Cloud Computing Workloads” — the first thesis of a professional doctorate in Software Engineering in Brazil — is the same instinct applied to architecture: match the investment to the real demand, and make the trade-off explicit. The framework turns that judgment into auditable, versioned data instead of tribal knowledge.

Contributions to the catalog, the tier engine, and the calibration fixtures are exactly what the project needs next. The form below goes straight to the inbox.

cdiego.com · GitHub · LinkedIn · Join the project

10 / Join the project

Tell me how you'd like to contribute.

Catalog practices, tier-engine tuning, calibration fixtures, the generator, docs, or a real-world case study to harden the framework against. Send a note and I'll reply directly.

Submitting sends an email to the maintainer. Your address is used only to reply.