AI Engineering Portfolio

Ben Rivkin

Infrastructure engineering for autonomous AI agents — multi-provider redundancy, context window optimization, distributed systems debugging, and self-healing architecture.

Active May 21 – Jun 1, 2026 · ~100–120 hrs · 10 projects

Infrastructure cost: ~$99/mo
Tokens/month: 5.4B
Cost reduction via cache: 96%
Autonomous cron jobs: 13
AI providers: 3

Projects

01

Multi-Provider Watchdog Hierarchy with SMS Alerting

Three-tier system health monitoring with provider-level redundancy and silent escalation to SMS.

Why it matters: Dual-provider redundancy means no single API outage can blind the monitoring system. Silent escalation follows standard production practice: stay quiet while checks pass, and alert only on an anomaly.
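A minimal sketch of the fallback-plus-silent-escalation idea, not the actual watchdog. The names `run_health_check`, `providers`, and `send_sms` are hypothetical, and the three tiers described above are collapsed into a single check for brevity. Each provider is modeled as a callable that returns True when the system looks healthy and raises when the provider itself is unreachable.

```python
from typing import Callable, Sequence


def run_health_check(
    providers: Sequence[Callable[[], bool]],
    send_sms: Callable[[str], None],
) -> str:
    """Return 'healthy', 'anomaly', or 'blind'. An SMS goes out only for the last two."""
    for check in providers:
        try:
            healthy = check()
        except Exception:
            # This provider is down: fall through to the next one.
            continue
        if healthy:
            return "healthy"  # silent path, no message sent
        send_sms("watchdog: anomaly detected")
        return "anomaly"
    # Every provider failed, so the watchdog cannot see the system at all.
    send_sms("watchdog: all providers unreachable")
    return "blind"
```

With two providers, an outage of the first is absorbed without any message; only an unhealthy verdict or a total loss of providers reaches the phone.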
02

Cron Job Architecture Overhaul

Full audit and optimization of 14 autonomous cron jobs — converted expensive LLM calls to zero-token scripts.

Why it matters: Auditing autonomous agent fleets, identifying redundancy, converting costly LLM calls to zero-token scripts, and tuning detection cadences is systems engineering — not prompt engineering.
03

10-Layer System Prompt Architecture Discovery

Source-level debugging of Hermes Agent's prompt assembly pipeline. Found a bug that silently overrode user configuration for months.

Why it matters: This is source-level debugging of an AI framework's prompt assembly pipeline with cross-model verification. Aerospace calls this Independent Verification & Validation (IV&V).
04

Delegation Architecture — Provider Contamination Root Cause

Diagnosed a 24+ hour silent failure in subagent spawning. Four-level root cause chain across distributed system layers.

Why it matters: Diagnosing distributed system failures where config says one thing but runtime does another — and the error is silent. This is the debugging that separates engineers from users.
05

Multi-Target Atomic Backup Architecture

Five-target backup system with atomic staging, integrity verification, and encrypted cloud push. Survived 8/8 attack tests.

Why it matters: Backup systems that silently fail are worse than no backup. This one was attacked and survived. Atomic-swap pattern prevents the most common failure mode — partial write during crash.
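The atomic-swap pattern can be sketched as follows, assuming a single file and a single local target. The function names are hypothetical, and the real system's five targets and encrypted cloud push are out of scope here. The copy is staged in the target directory, verified against the source checksum, and only then renamed into place with `os.replace`, which is atomic when both paths are on the same filesystem.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def atomic_backup(source: Path, target: Path) -> str:
    """Stage a copy next to target, verify it, then swap it in. Returns the digest."""
    expected = sha256_of(source)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stage in the same directory so the final rename stays on one filesystem.
    fd, staging_name = tempfile.mkstemp(dir=target.parent, prefix=".staging-")
    staging = Path(staging_name)
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(source.read_bytes())
            out.flush()
            os.fsync(out.fileno())  # force the bytes to disk before verifying
        if sha256_of(staging) != expected:
            raise IOError("staged copy failed integrity check")
        os.replace(staging, target)  # readers see the old file or the new one
    finally:
        if staging.exists():
            staging.unlink()  # clean up a failed or rejected staging file
    return expected
```

A crash before the rename leaves the previous backup untouched; a crash after it leaves the fully verified new one.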
06

PII Pre-Transfer Security Scanner

Automated security gate preventing personal data leaks during AI agent replication. Fail-closed design.

Why it matters: The category error of treating documentation as inert is a real security problem. Most developers ship dotfiles without auditing. This scanner catches credentials nobody remembered were there.
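A fail-closed gate might look like the sketch below. The pattern set and the function names `scan_text` and `transfer_allowed` are illustrative assumptions, not the scanner's actual rules. The key property is that any finding, and any error while scanning, blocks the transfer.

```python
import re
from pathlib import Path
from typing import Iterable

# Illustrative patterns only; a real gate would carry a much larger set.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_text(text: str) -> list[str]:
    """Return the names of all patterns found in text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


def transfer_allowed(paths: Iterable[Path]) -> bool:
    """Allow the transfer only if every file was read and came back clean."""
    try:
        for path in paths:
            if scan_text(path.read_text(encoding="utf-8")):
                return False
    except Exception:
        # Fail closed: a file that cannot be read or decoded blocks the transfer.
        return False
    return True
```

Documentation files go through the same scan as code and dotfiles, since the gate makes no distinction by file type.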
07

Context Window Engineering — Attention Dilution Discovery

Discovered that 14 loaded skills (441KB, 113K tokens) were effectively invisible to the model because of "lost in the middle" attention dilution.

Why it matters: Understanding that more context ≠ better performance — that attention dilution makes content invisible — is counterintuitive. Most users add MORE instructions. I removed them and got better results.
08

Token Economics — 96% Cost Reduction Through Cache Engineering

Achieved ~$0.018/M token effective rate vs $0.435/M list price. 120x differential exploited through cache architecture.

Why it matters: Understanding provider pricing at the cache-architecture level and engineering usage patterns to exploit it. This is the difference between "I use an AI API" and "I understand how the cache works."
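The headline figures can be cross-checked with a few lines of arithmetic. This is a sketch using only the rates and volume quoted on this page, and it assumes the monthly bill is dominated by token charges.

```python
LIST_PRICE_PER_M = 0.435   # USD per million tokens, list price quoted above
EFFECTIVE_PER_M = 0.018    # USD per million tokens, effective rate quoted above
TOKENS_PER_MONTH = 5.4e9   # monthly volume quoted above


def cost_reduction(list_price: float, effective: float) -> float:
    """Fraction saved relative to list price."""
    return 1 - effective / list_price


def monthly_cost(tokens: float, price_per_m: float) -> float:
    """Monthly spend in USD for a given per-million-token price."""
    return tokens / 1e6 * price_per_m


reduction = cost_reduction(LIST_PRICE_PER_M, EFFECTIVE_PER_M)   # about 0.959
ratio = LIST_PRICE_PER_M / EFFECTIVE_PER_M                       # about 24.2
actual = monthly_cost(TOKENS_PER_MONTH, EFFECTIVE_PER_M)         # about 97.2
at_list = monthly_cost(TOKENS_PER_MONTH, LIST_PRICE_PER_M)       # about 2349
```

The effective rate gives roughly $97 per month, in line with the ~$99/mo figure, and a reduction of about 96%. The ratio of the two quoted rates is about 24x, so the 120x figure presumably refers to a different comparison that this page does not spell out, such as cache-hit versus cache-miss pricing.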
09

MCP Orphan Process Watchdog

Automated detection and cleanup of orphaned MCP server processes silently accumulating and freezing CLI sessions.

Why it matters: Process lifecycle management for a multi-process AI system. The architecture discovery that CLI and gateway MCP stacks are independent was itself significant.
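Orphan detection can be sketched as a pure function over a process table. The `ProcessInfo` type, the `find_orphans` name, and matching on the substring "mcp" are all assumptions for illustration; how the real watchdog enumerates and identifies MCP servers is not described here. A process counts as orphaned when its parent has exited, which shows up as a parent PID of 1 or a parent PID that is no longer in the table.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ProcessInfo:
    pid: int
    ppid: int
    command: str


def find_orphans(processes: Sequence[ProcessInfo], marker: str = "mcp") -> list[ProcessInfo]:
    """Return marker-matching processes whose parent is init or no longer alive."""
    live_pids = {p.pid for p in processes}
    return [
        p
        for p in processes
        if marker in p.command and (p.ppid == 1 or p.ppid not in live_pids)
    ]


table = [
    ProcessInfo(100, 1, "bash"),                # a shell, not an MCP server
    ProcessInfo(200, 100, "node mcp-server"),   # parent 100 is alive
    ProcessInfo(300, 1, "node mcp-server"),     # reparented to init
    ProcessInfo(400, 999, "python mcp_tool"),   # parent 999 is gone
]
orphans = find_orphans(table)
```

In this table, PIDs 300 and 400 are flagged. Keeping detection separate from termination makes the rule testable without touching live processes.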
10

Self-Healing Infrastructure

The system diagnoses and fixes its own problems — agent as operator, not as tool.

Why it matters: An AI agent that debugs its own infrastructure failures is a qualitatively different thing from one that answers questions. Agent as operator.

Technical Stack

Primary Model: DeepSeek V4 Pro (1M ctx, xhigh)
Framework: Hermes Agent + LCM plugin
Concurrency: 20 parallel subagents, 2-level
Providers: DeepSeek, NVIDIA NIM (free)
Throughput: 5.4B tok/mo, 36K calls, $99/mo
Effective Rate: $0.018/M tokens (96% off)
Hardware: 11yr ASUS laptop, 16GB RAM
Interface: Tabby, 28 SSH tabs, voice dictation
Cron Jobs: 13 autonomous (3 LLM, 11 scripts)
Memory: Holographic store, 5K char limit