I've written about why we have a shared Claude wrapper. This post is the deep dive: what's actually in the code, how each piece works, and what I'd do differently now.
The package is dangercorn-claude-helper, imported as dcch. It's about 1,100 lines of Python. Every Dangercorn vertical that touches Claude imports it:
import os

from dcch import Client, JsonFormat

client = Client(api_key=os.environ["ANTHROPIC_API_KEY"])
response = client.complete(
    model="claude-opus-4-7",
    system="You extract cheese tasting notes.",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=800,
    json_format=JsonFormat.TASTING_NOTES,
)
Under the hood, that one call is doing retry, rate-limit handling, JSON validation against a Pydantic schema, cost tracking, and model downshift if the primary is unavailable. Let's walk through each.
Retry Policy
Claude's API returns a 529 status code when it's overloaded. It returns 429 when you're rate-limited. It returns 503 in rare cases of service degradation. All three are retryable.
dcch's retry policy:
- Retries on 429, 503, 529, and on network errors (ConnectionError, Timeout).
- Does not retry on 400 (bad request), 401 (auth), or 404 (unknown model).
- Uses exponential backoff with jitter: 1s, then 2s, then 4s, then 8s, capping at 60s.
- Maximum 5 retries total. After that, it raises.
import time
from random import uniform

MAX_RETRIES = 5

def _retry_sleep(attempt):
    base = min(60, 2 ** attempt)     # 1s, 2s, 4s, 8s, ... capped at 60s
    jitter = uniform(0, base * 0.3)  # spread retries so clients don't stampede
    return base + jitter

for attempt in range(MAX_RETRIES + 1):
    try:
        return self._make_request(payload)
    except (RateLimited, Overloaded, ServiceUnavailable, ConnectionError):
        if attempt == MAX_RETRIES:
            raise  # out of retries: surface the error to the caller
        time.sleep(_retry_sleep(attempt))
The jitter matters. Without it, when Claude goes through a sustained overload, every client that saw the first 529 retries at the same second. The retry storm makes the overload worse. Jitter spreads the retries over a window and the service recovers cleanly.
Model Downshift
Every call specifies a primary model and (optionally) a fallback chain. If the primary is unavailable after retry exhaustion, dcch tries the fallback. If that fails, the next fallback. Only after the whole chain fails does the call actually error out.
models = ["claude-opus-4-7", "claude-sonnet-4-5", "claude-haiku-4-5"]

for model in models:
    try:
        return self._call_with_retry(model, payload)
    except ClaudeAPIError as e:
        if not e.is_retryable:
            raise  # bad request, auth failure, etc.: falling back won't help
        last_error = e
        continue  # retryable failure: try the next model in the chain
raise last_error
The downshift means an app can specify "prefer Opus, but Sonnet is fine, Haiku is acceptable" and the system degrades gracefully. An app that critically needs Opus (say a vision task) can pass a single-model list and fail fast.
JSON Mode
Claude can return JSON. It's surprisingly good at it. It can also return valid JSON that doesn't match your expected schema — extra fields, wrong types, missing required keys.
dcch's JSON mode validates against a Pydantic schema you define. If validation fails, it retries with a clarification: "your previous response didn't match the expected schema; please re-emit as valid JSON matching this shape: ...". Up to 2 retries.
from pydantic import BaseModel

class TastingNotes(BaseModel):
    flavor: list[str]
    texture: str
    rind_color: str
    stage_estimate_days: int
    next_action: str

response = client.complete(
    ...,
    json_format=JsonFormat(schema=TastingNotes),
)
# response.parsed is a validated TastingNotes instance
# response.raw has the original response JSON string
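The validate-and-retry loop behind this can be sketched roughly as follows. The function and parameter names here are my invention, not dcch's internals: `call` stands in for the Claude request and `validate` for the schema check (pydantic's `ValidationError` subclasses `ValueError`, so one except clause covers both parse and schema failures):

```python
import json

MAX_SCHEMA_RETRIES = 2  # matches the "up to 2 retries" policy above

def complete_with_schema(call, validate):
    """call(suffix) returns raw model text; validate(raw) parses it or raises.

    On a validation failure, re-prompt with a clarification message,
    up to MAX_SCHEMA_RETRIES extra attempts.
    """
    prompt_suffix = ""
    last_error = None
    for _ in range(MAX_SCHEMA_RETRIES + 1):
        raw = call(prompt_suffix)
        try:
            # e.g. TastingNotes.model_validate_json(raw) in the pydantic case;
            # json.JSONDecodeError and pydantic's ValidationError both
            # subclass ValueError, so this catches parse and schema errors
            return validate(raw)
        except ValueError as e:
            last_error = e
            prompt_suffix = (
                "Your previous response didn't match the expected schema; "
                "please re-emit as valid JSON matching the required shape."
            )
    raise last_error
```

A real implementation would append the clarification as a new user turn in the conversation rather than a prompt suffix; this sketch only shows the control flow.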
This is maybe the most valuable piece of dcch. Before the JSON-mode wrapper, every vertical that wanted structured Claude output had its own ad-hoc try/except around json.loads and its own schema validation. Half of them had bugs where Claude returned valid JSON with a subtle type mismatch (a string where a number was expected, for example) and the app silently corrupted its data.
Centralizing the validation fixed a category of bugs across the entire portfolio.
Cost Tracking
Every successful call logs: model, input tokens, output tokens, cost in USD, app name, timestamp. Costs are computed from Anthropic's published pricing, stored in a dict in dcch.
PRICING = {
    "claude-opus-4-7": {"input_per_mtok": 15.0, "output_per_mtok": 75.0},
    "claude-sonnet-4-5": {"input_per_mtok": 3.0, "output_per_mtok": 15.0},
    "claude-haiku-4-5": {"input_per_mtok": 0.80, "output_per_mtok": 4.0},
}
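Given that table, the per-call cost is simple arithmetic. The function name here is my own; dcch computes the same quantity internally:

```python
PRICING = {
    "claude-opus-4-7": {"input_per_mtok": 15.0, "output_per_mtok": 75.0},
    "claude-sonnet-4-5": {"input_per_mtok": 3.0, "output_per_mtok": 15.0},
    "claude-haiku-4-5": {"input_per_mtok": 0.80, "output_per_mtok": 4.0},
}

def call_cost_usd(model, input_tokens, output_tokens):
    """Cost in USD for one call, from per-million-token pricing."""
    p = PRICING[model]
    return (input_tokens * p["input_per_mtok"]
            + output_tokens * p["output_per_mtok"]) / 1_000_000
```

For example, 1,000 input tokens and 500 output tokens on Sonnet is (1,000 × 3.0 + 500 × 15.0) / 1,000,000 = $0.0105.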
The cost log is written to SQLite (one claude_call_log table per app) and also streamed to a central aggregator on Johnny-5. That aggregator is how I know that across all 220 apps we spent $47.30 on Claude API last month. If one app starts burning unexpectedly, it shows up in the aggregator within 5 minutes.
Streaming
dcch supports streaming for responses that need it (long-form generation, user-facing chat). The stream API is async and yields chunks as they arrive. The cost tracking and retry logic still work — on a stream failure mid-response, we can retry from the beginning and replay only the final complete response to the caller.
Most apps don't need streaming. The classification and extraction workloads complete in under 3 seconds and are simpler to call synchronously.
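The shape of the async interface looks roughly like this. The names are illustrative, not dcch's exact API, and the fake stream stands in for the network:

```python
import asyncio

async def fake_stream():
    """Stand-in for a Claude streaming response: yields text chunks."""
    for chunk in ["Cheddar ", "ages ", "well."]:
        await asyncio.sleep(0)  # yield control, as a real network stream would
        yield chunk

async def consume(stream):
    """Accumulate chunks into the final response.

    This is also where a wrapper would count output tokens for cost tracking.
    """
    parts = []
    async for chunk in stream:
        parts.append(chunk)
    return "".join(parts)
```

The "retry from the beginning, replay only the complete response" behavior falls out of this shape: a mid-stream failure discards `parts` and re-invokes the stream, and the caller only ever sees the fully accumulated result.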
The Rate Limiter
dcch includes a token-bucket rate limiter to keep apps from hitting Anthropic's per-account rate limit. The limiter is configured per-app (defaults to 20 requests/second, 100k tokens/second). If the limit is hit, the call sleeps until a token is available, up to a configurable timeout.
This matters for batch jobs — an app doing 10,000 classifications in a loop should rate-limit itself, not discover at hour 3 that Anthropic throttled it.
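A minimal token bucket in the spirit of dcch's limiter looks like this. This is a sketch, not dcch's actual code; the injectable `clock` is there to make the refill logic testable:

```python
import time

class TokenBucket:
    """Allow `rate` acquisitions per second, with bursts up to `capacity`."""

    def __init__(self, rate=20.0, capacity=20.0, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def acquire(self, n=1.0, timeout=None):
        """Block until n tokens are available; return False on timeout."""
        deadline = None if timeout is None else self.clock() + timeout
        while True:
            now = self.clock()
            # refill proportionally to elapsed time, never above capacity
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= n:
                self.tokens -= n
                return True
            if deadline is not None and now >= deadline:
                return False
            time.sleep((n - self.tokens) / self.rate)  # wait for refill
```

dcch runs one of these for requests/second and another for tokens/second; a call must acquire from both before going out.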
Template Fallback
For certain operations (landing-page generation, email copywriting), we maintain a template that runs if Claude is completely unavailable. The template produces a generic-but-acceptable output. This lets the app keep functioning — with degraded quality — during Claude outages.
Not every call has a template fallback. Vision analysis doesn't. Summarization doesn't. But the operations that are on the critical path for landing pages and email dispatching do, and it's saved us twice during Claude outages.
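The pattern itself is plain control flow. These names are hypothetical and the real templates are app-specific, but the shape is just this:

```python
def with_template_fallback(generate, render_template):
    """Try the Claude-backed generator; on total API failure,
    return the static template output (degraded but functional)."""
    try:
        return generate()
    except Exception:  # dcch raises only after the whole fallback chain fails
        return render_template()
```

The important property is where it sits: the template only runs after retries and the model downshift chain are exhausted, so it fires during genuine outages, not transient blips.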
What I'd Do Differently
Three things.
First, I'd build the cost-tracking aggregator on day one. We added it retroactively, and the early months of cost data are spottier than they should be.
Second, I'd make the retry policy per-call configurable from the start. We hardcoded 5 retries early and it took a refactor to make it per-call. Some calls (user-facing, latency-sensitive) should retry fewer times; some (background batch) can retry more aggressively.
Third, I'd add request deduplication earlier. We have a pattern where apps occasionally fire the same classification twice due to a retry at a different layer. dcch now dedups within a 10-second window based on a hash of (model + system + messages). Should have been there from the start.
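That dedup check hashes the request and remembers recent results. A sketch with an injectable clock; dcch's actual key derivation and storage may differ:

```python
import hashlib
import json
import time

class RequestDeduper:
    """Reuse the result of an identical request seen within `window` seconds."""

    def __init__(self, window=10.0, clock=time.monotonic):
        self.window = window
        self.clock = clock
        self._seen = {}  # key -> (timestamp, result)

    def key(self, model, system, messages):
        # stable hash of (model + system + messages), as described above
        blob = json.dumps([model, system, messages], sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def call(self, fn, model, system, messages):
        k = self.key(model, system, messages)
        now = self.clock()
        hit = self._seen.get(k)
        if hit and now - hit[0] < self.window:
            return hit[1]  # duplicate within the window: reuse the result
        result = fn()
        self._seen[k] = (now, result)
        return result
```

A production version would also evict stale entries and decide what to do about in-flight duplicates (the second caller arriving before the first finishes), which this sketch ignores.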
The right level of abstraction for Claude API code is 'one wrapper for the whole company.' Fix a bug once, fix it everywhere. Audit cost in one place.
Related
- Why the wrapper exists at all.
- How we use Claude for landing-page copy.
- cheesemaking, honeybees, and meetingmind all call through dcch.
Repo: github.com/Dangercorn-Enterprises/dangercorn-claude-helper.