AI Engineering Portfolio

Ben Rivkin

Infrastructure engineering for autonomous AI agents — multi-provider redundancy, context window optimization, distributed systems debugging, and self-healing architecture.

Active May 21 – Jun 1, 2026 ~100–120 hrs 10 projects

~$99/mo

Infrastructure cost

5.4B

Tokens/month

96%

Cost reduction via cache

Autonomous cron jobs

AI providers

Projects

Multi-Provider Watchdog Hierarchy with SMS Alerting

Three-tier system health monitoring with provider-level redundancy and silent escalation to SMS.

NVIDIA Health Watchdog: runs every 10 min on free-tier NIM (llama-3.1-nemotron-8b), independent of primary DeepSeek provider. Silent unless broken.
DeepSeek Backup Check: verifies NVIDIA watchdog is alive twice daily. Cross-provider verification.
Daily System Overseer: full health audit at 9am — cron jobs, gateway, MCP count, disk, DB locks, backup status. Delivers via SMS (Verizon email-to-SMS gateway).
Silent escalation pattern: [SILENT] when healthy, SMS when broken. Production monitoring design.

Why it matters: Dual-provider redundancy means no single API outage can blind the monitoring system. The silent-escalation pattern is what production systems use — alert on anomaly, not noise.

Cron Job Architecture Overhaul

Full audit and optimization of 14 autonomous cron jobs — converted expensive LLM calls to zero-token scripts.

Converted Hourly Build Tracker from LLM-driven to pure Python script (track-build.py, 350 lines). Eliminated 720 LLM runs/month.
Tuned cadences: git tracker 2min → 1min, watchdog 30min → 5min. Balanced detection speed vs resource overhead.
Removed 3 redundant jobs: save-obsidian (duplicate of vault-sync), skills DB scraper (87K+ noise post-OpenClaw merge), Top Step refund tracker.
Result: 13 active jobs. 3 LLM-driven, 11 pure scripts. Near-zero token burn from cron.

Why it matters: Auditing autonomous agent fleets, identifying redundancy, converting costly LLM calls to zero-token scripts, and tuning detection cadences is systems engineering — not prompt engineering.

10-Layer System Prompt Architecture Discovery

Source-level debugging of Hermes Agent's prompt assembly pipeline. Found a bug that silently overrode user configuration for months.

Discovered 10-layer prompt assembly with hardcoded authority chain — Layer 2 enforcement directives overriding user MEMORY.md guardrails.
Root cause: TOOL_USE_ENFORCEMENT_MODELS in prompt_builder.py:276 included 'deepseek' — substring match on 'deepseek-v4-pro' triggered enforcement injection.
Fix: changed tool_use_enforcement from 'auto' to explicit list excluding deepseek.
Cross-model verification: used Claude Opus 4.7 as independent auditor — IV&V for AI internals.

Why it matters: This is source-level debugging of an AI framework's prompt assembly pipeline with cross-model verification. Aerospace calls this Independent Verification & Validation (IV&V).

Delegation Architecture — Provider Contamination Root Cause

Diagnosed a 24+ hour silent failure in subagent spawning. Four-level root cause chain across distributed system layers.

delegation.provider was set to 'holographic' (memory provider, not an LLM) — subagents never spawned, zero error messages.
Gateway's delegation code path bypassed normal provider resolution. Runtime caching blocked config propagation.
Marker-file debugging proved gateway never reached delegation handler.
Fix: changed provider to deepseek, verified with 5-agent and 10-agent stress tests, later scaled to 20 concurrent subagents.

Why it matters: Diagnosing distributed system failures where config says one thing but runtime does another — and the error is silent. This is the debugging that separates engineers from users.

Multi-Target Atomic Backup Architecture

Five-target backup system with atomic staging, integrity verification, and encrypted cloud push. Survived 8/8 attack tests.

Atomic mirror: staging → integrity check → mirror swap → cloud push. Old mirror deleted only after new verified.
Five targets: NAS (SMB, rclone sync), Backblaze B2 (encrypted, 5MB/s throttle), GitHub, local mirror, USB (air-gapped).
B2 version accumulation bug diagnosed — 48 syncs/day creating hidden versions past 10GB cap. Fixed with --max-age filtering.
Google Drive OAuth silent sync discovered running for 8 days. Disabled.
Survived 8/8 attack tests: corruption, concurrency, lock races, stale locks, idempotency.

Why it matters: Backup systems that silently fail are worse than no backup. This one was attacked and survived. Atomic-swap pattern prevents the most common failure mode — partial write during crash.

PII Pre-Transfer Security Scanner

Automated security gate preventing personal data leaks during AI agent replication. Fail-closed design.

Single-pass grep against 16 fingerprint patterns: SSN, DL, phone, API keys, seed phrases, wallet addresses.
Blocks transfer on ANY match — fail-closed, not fail-open.
Built after discovering live Cloudflare API tokens and GoDaddy API keys in skill documentation files treated as inert reference material.
Part of 3-build replication architecture: Full → PII-scrubbed (verified by scanner) → Target deployment.

Why it matters: The category error of treating documentation as inert is a real security problem. Most developers ship dotfiles without auditing. This scanner catches credentials nobody remembered were there.

Context Window Engineering — Attention Dilution Discovery

Discovered that 14 loaded skills (441KB, 113K tokens) were mathematically inaccessible due to Lost in the Middle attention dilution.

Behavioral verification skills were passive text, not enforcement mechanisms — they couldn't actually prevent anything.
Redesigned skill stack around tool-backed skills only. Stripped 8 behavioral skills — 52% context bloat reduction.
Configured LCM (DAG-based context management): 60% compaction threshold, 2-depth summarization, 8K reserve floor.

Why it matters: Understanding that more context ≠ better performance — that attention dilution makes content invisible — is counterintuitive. Most users add MORE instructions. I removed them and got better results.

Token Economics — 96% Cost Reduction Through Cache Engineering

Achieved ~$0.018/M token effective rate vs $0.435/M list price. 120x differential exploited through cache architecture.

DeepSeek's disk-level caching: $0.003625/M for hits vs $0.435/M for misses.
Session architecture designed to keep system prompt + early conversation identical across turns — cache prefix never breaks.
LCM prevents conversation growth from breaking cache boundary.
$5 prepaid loads, no auto-billing. ~5.4B tokens/month, ~36K API calls, ~$99/month.

Why it matters: Understanding provider pricing at the cache-architecture level and engineering usage patterns to exploit it. This is the difference between "I use an AI API" and "I understand how the cache works."

MCP Orphan Process Watchdog

Automated detection and cleanup of orphaned MCP server processes silently accumulating and freezing CLI sessions.

Discovery: every CLI session spawned its own MCP server pair. After 12+ hours: 33 processes, new sessions freezing.
Three detection modes: orphans (parent dead), duplicates (>2 per CLI), excess (total >6).
Architecture finding: CLI sessions operate in DIRECT mode independent of gateway — gateway health checks were blind to CLI-side leaks.
Strategy: kill orphans → kill oldest duplicates → flag but don't kill legitimate excess.

Why it matters: Process lifecycle management for a multi-process AI system. The architecture discovery that CLI and gateway MCP stacks are independent was itself significant.

Self-Healing Infrastructure

The system diagnoses and fixes its own problems — agent as operator, not as tool.

Gateway crash at 7:45 AM → auto-diagnosed DeepSeek API outage, confirmed automatic recovery.
Memory system at 98% capacity → trimmed and reorganized autonomously.
Backup resource exhaustion → identified parallel cloud upload bottleneck, paused and patched.
14 orphaned MCP processes respawning every 43 minutes → traced to stale config, permanent kill.
USR1 signal propagation bug → diagnosed pgrep cascade killing all Hermes processes.

Why it matters: An AI agent that debugs its own infrastructure failures is a qualitatively different thing from one that answers questions. Agent as operator.

Technical Stack

Primary Model

DeepSeek V4 Pro (1M ctx, xhigh)

Framework

Hermes Agent + LCM plugin

Concurrency

20 parallel subagents, 2-level

Providers

DeepSeek, NVIDIA NIM (free)

Throughput

5.4B tok/mo, 36K calls, $99/mo

Effective Rate

$0.018/M tokens (96% off)

Hardware

11yr ASUS laptop, 16GB RAM

Interface

Tabby, 28 SSH tabs, voice dictation

Cron Jobs

13 autonomous (3 LLM, 11 scripts)

Memory

Holographic store, 5K char limit