The Vibe of Agentic Infrastructure

Erica Windisch

Infrastructure engineer for 25+ years — from containers and cloud to data pipelines and ML.

  • Building HyprStream — open source agentic infrastructure for continuously learning apps
  • Founded the Docker Security Team
  • Founded OpenStack Magnum container service
  • Founder & CTO, IOpipe — serverless & event-based observability, acquired by New Relic
  • AWS Serverless Hero 2016–2022
Welcome email from Solomon Hykes, Jan 2014

Part 1: The Agentic Stack Ecosystem

AI

Apps
APIs
Models
Infra Software
Compute Hardware
Datacenters, Closets & Desks

The Agentic Cycle

User → (prompt) → Context Window → (read all) → Model
Model → Response, or Tool Call (name + args) → Execution (bash, API, MCP) → Tool Result → (result appended to context)

Context Window (grows each turn):
system: You are a helpful...
user: Deploy staging
assistant: Checking status...
tool_use: bash "kubectl get pods"
tool_result: pod/api Ready
assistant: Deploying now...
tool_use: bash "kubectl apply"
tool_result: deployment ok

Tools for Builders

Coding Agents
Tool | License | Form | Stars
OpenCode | MIT | CLI / TUI | ~120K
Claude Code | Proprietary | CLI | ~82K
Codex CLI | Apache-2.0 | CLI | ~67K
Cline | Apache-2.0 | VS Code | ~59K
Aider | Apache-2.0 | CLI | ~43K
Goose | Apache-2.0 | CLI / App | ~34K
Cursor | Proprietary | IDE | ~33K
Copilot Agent | Proprietary | IDE / Cloud |
Agent Frameworks
Framework | By | License | Stars
LangChain / LangGraph | LangChain | MIT | ~126K
AutoGen → Agent Fwk | Microsoft | MIT | ~54K
CrewAI | CrewAI | MIT | ~47K
Smolagents | HuggingFace | Apache-2.0 | ~26K
OpenAI Agents SDK | OpenAI | MIT | ~20K
Google ADK | Google | Apache-2.0 | ~19K
Pydantic AI | Pydantic | MIT | ~16K
Agent SDK | Anthropic | MIT | ~6K
The pattern: system prompt + tool definitions over an LLM API. The agent is defined by its tools, not its model.
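This pattern can be sketched in a few lines of Python. The model call is stubbed out (a real agent would hit an LLM API), and the tool names and fake decision logic are purely illustrative:

```python
import json

# Tool registry: the agent is defined by these, not by the model.
TOOLS = {
    "bash": lambda args: f"ran: {args['cmd']}",  # stub executor
}

def fake_model(context):
    """Stand-in for an LLM API call. A real model reads the context and
    decides whether to answer directly or emit a tool call."""
    if not any(m["role"] == "tool_result" for m in context):
        return {"type": "tool_use", "name": "bash",
                "args": {"cmd": "kubectl get pods"}}
    return {"type": "response", "text": "Deployed."}

def agent_loop(user_prompt):
    context = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]
    while True:
        action = fake_model(context)  # model reads the whole window
        if action["type"] == "response":
            return action["text"]
        result = TOOLS[action["name"]](action["args"])  # execute the tool
        context.append({"role": "tool_use", "content": json.dumps(action)})
        context.append({"role": "tool_result", "content": result})  # window grows

print(agent_loop("Deploy staging"))
```

Swapping the stub for a real API client changes nothing structurally: the loop, the tool registry, and the ever-growing context are the whole pattern.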

Open models

Frontier: 🇨🇳Qwen3.5 (397B/17A) · 🇨🇳DeepSeek-V3 (671B/37A) · 🇺🇸Nemotron 4 (70B/80B) · 🇺🇸Llama 3.1 405B
Production: 🇨🇳Qwen3-Coder-Next (80B/3A) · 🇺🇸Llama 3.3 70B · 🇨🇳Qwen3.5 27B · 🇨🇳Qwen2.5-Coder 32B · 🇺🇸Granite 4.0 H Small (32B/9A) · 🇨🇳Qwen2.5-Coder 14B · 🇫🇷Mistral Codestral 22B
Edge: 🇨🇳Qwen3.5 9B · 🇺🇸Phi-4 14B · 🇺🇸Llama 3.2 1B · 🇺🇸Llama 3.2 3B · 🇺🇸TinyLlama 1.1B · 🇨🇳Qwen2.5 7B · 🇫🇷Mistral 7B · 🇺🇸Gemma 2 9B · 🇺🇸Gemma 2 2B · 🇺🇸OLMo 2 13B

Format: 100B/12A = 100B total, 12B active (MoE). Flags indicate origin.

Model Benchmarks

[Scatter plot: Terminal-Bench score (40–75%) vs parameters (7B–700B, log scale). Plotted closed models: Opus 4.6, GPT-5.4, Gemini 3.1. Plotted open models: GLM-5, DeepSeek-V3, Qwen3-Coder-Nx, Qwen 3.5 27B, Granite 4.0, Llama 3.3 70B, Qwen2.5-C 32B, Nemotron 4, Codestral 22B.]

Quantization

Shrinking weights from 16/32-bit to 4-8 bit

  • Q4, Q5, Q8 — bits per weight
  • K vs 0 — group-wise vs global scaling
  • S / M / L — precision tiers
  • Sweet spot: Q4_K_M
[Chart: accuracy vs FP16 (85–102%) across quantization levels (4-bit to 16-bit), plotting Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0, and FP16.]
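Group-wise quantization (the "K" in Q4_K) can be illustrated with a toy 4-bit quantizer. The group size of 32 loosely mirrors common GGUF layouts, but this is a simplification, not the actual format:

```python
def quantize_q4(weights, group_size=32):
    """Toy 4-bit group-wise quantization: each group stores one fp offset,
    one fp scale, and 4-bit integer codes in [0, 15]."""
    groups = []
    for i in range(0, len(weights), group_size):
        g = weights[i:i + group_size]
        lo, hi = min(g), max(g)
        scale = (hi - lo) / 15 or 1.0  # 16 levels for 4 bits
        codes = [round((w - lo) / scale) for w in g]
        groups.append((lo, scale, codes))
    return groups

def dequantize_q4(groups):
    return [lo + c * scale for lo, scale, codes in groups for c in codes]

weights = [0.01 * i - 0.5 for i in range(128)]  # toy weight vector
restored = dequantize_q4(quantize_q4(weights))
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max reconstruction error: {max_err:.4f}")
```

Per-group scaling is why K-quants beat global scaling: each group of 32 weights gets its own range, so outliers in one group don't blow up the error everywhere else.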

Inference & Training Engines

The runtime layer — where tokens actually get produced

Engine | Type | Lang | Key Differentiator | Stars | License
vLLM | Inference | Python/C++ | PagedAttention, continuous batching | 74K | Apache-2.0
SGLang | Inference | Python | Structured generation, zero-overhead scheduler | 24K | Apache-2.0
llm-d | Inference | Go/Python | K8s-native vLLM orchestration | 2.3K | Apache-2.0
llama.cpp | Inference | C/C++ | CPU-first, GGUF quants, edge/local | 99K | MIT
TensorRT-LLM | Inference | C++ | NVIDIA-only, max throughput | 13K | Apache-2.0
Unsloth | Fine-tune | Python | 2x faster, 70% less VRAM | 58K | Apache-2.0
Axolotl | Fine-tune | Python | Config-driven, multimodal | 11K | Apache-2.0
MLX | Both | C++ | Apple Silicon unified-memory ML framework | 25K | MIT
Burn | Both | Rust | Pure Rust DL, cross-platform GPU | 14K | MIT / Apache-2.0
NanoGPT | Both | Python | Simplest medium-sized GPT training/finetuning | 56K | MIT
Candle | Both | Rust | HuggingFace ML, WASM support | 19K | MIT / Apache-2.0
Hyprstream | Both | Rust | Agentic: inference + TTT + Git | | AGPL-3.0 / MIT

Hardware

Open Hardware

Tenstorrent (RISC-V Based)
  • TT-QuietBox 2: 4× Blackhole ASICs, $9,999, shipping Q2 2026
  • $2.67–3.82 per TFLOPS — 7× cheaper than NVIDIA H100
  • Fully open-source stack: TT-Forge, TT-Metalium, TT-NN
  • Trade-off: 13× less compute than H100, optimized for inference
RISC-V for AI
  • Alibaba XuanTie C950: RISC-V + matrix acceleration, LLM-native
  • SiFive Intelligence Gen 2: Edge AI (automotive, robotics), Q2 2026
  • PyTorch / ExecutorTorch support ramping 2026+
  • Advantage: royalty-free ISA, diverse ecosystem, zero lock-in

Part 2: Context

The Agentic Cycle

Every node in this loop is a surface to understand, protect, and leverage

User → (prompt) → Context Window → (read all) → Model
Model → Response, or Tool Call (name + args) → Execution (bash, API, MCP) → Tool Result → (result appended to context)

Context Window (grows each turn):
system: You are a helpful...
user: Deploy staging
assistant: Checking status...
tool_use: bash "kubectl get pods"
tool_result: pod/api Ready
assistant: Deploying now...
tool_use: bash "kubectl apply"
tool_result: deployment ok

Context is Everything

The agent only knows what's in the window

The Context Stack

Everything shares one flat text stream

System prompt: identity, rules, safety boundaries
Tools / schemas: MCP, functions (what the agent can do)
Conversation history: what was said
Retrieved knowledge: RAG, search, file reads
Tool results: untrusted external data
Three-Tier Memory

Inspired by human memory systems

Working Memory (ephemeral): scratchpad, current task state
Episodic Memory (session): conversation history, summaries
Semantic Memory (persistent): facts, preferences, learned patterns
Implementation Patterns
  • File-based: CLAUDE.md, markdown memory files
  • RAG: vector DBs, knowledge graphs
  • MCP servers: local-first, SQLite/Redis-backed
  • Managed: cloud APIs with semantic search
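The file-based pattern can be as simple as appending facts to a markdown file and reading it back into the system prompt. The file name below echoes the CLAUDE.md convention, but the helper functions are illustrative, not any framework's API:

```python
from pathlib import Path

MEMORY_FILE = Path("MEMORY.md")  # analogous to a CLAUDE.md memory file

def remember(fact: str) -> None:
    """Semantic memory: append a persistent fact as a markdown bullet."""
    with MEMORY_FILE.open("a") as f:
        f.write(f"- {fact}\n")

def recall() -> str:
    """Load all remembered facts for injection into the system prompt."""
    return MEMORY_FILE.read_text() if MEMORY_FILE.exists() else ""

remember("User prefers kubectl over the web console")
remember("Staging cluster lives in us-east-1")
system_prompt = "You are a helpful assistant.\n\n## Memory\n" + recall()
print(system_prompt)
```

This is the whole trick: persistence lives outside the model, and "memory" is just text that gets prepended to the context window on every run.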

Memory & Persistence

From goldfish to elephant

The Platform Gap

Neither provider offers managed long-term memory at the API level

Anthropic
  • Memory Tool (beta) — client-side, you control storage
  • CLAUDE.md + Auto Memory + Auto Dream
  • Prompt caching: 90% cost / 85% latency reduction
  • Compaction beta: server-side summarization
OpenAI
  • ChatGPT memory (consumer only) — no API memory
  • Conversations API: durable state (replaces Threads)
  • Assistants API deprecated → Aug 2026 shutdown
  • Developers must build their own memory layer
Agent Memory Frameworks

Purpose-built for agents — all OSS Apache-2.0

Framework | Approach | Stars
Mem0 | Vector + graph hybrid memory | 48K
Letta | Tiered memory (MemGPT / LLM-as-OS) | 21K
Graphiti | Temporal knowledge graph | 20K
Cognee | Knowledge graph extraction | 12K
SuperMemory | SOTA benchmarks, fast API | 9.5K
OpenMemory | MCP memory server (by Mem0) |
Backends: Chroma, Qdrant, pgvector, Neo4j

Thinking Fast, Thinking Slow

The operator's guide to inference trade-offs

Fast (System 1)

Standard inference — cheap, fast, good enough

  • Single forward pass — pattern matching, not reasoning
  • Tool routing, code completion, classification, simple Q&A
  • Where your token budget goes furthest
  • Low temperature (0-0.3) for deterministic tool calls
Slow (System 2)

Extended thinking — 10-100x more tokens per query

  • Chain-of-thought reasoning before answering
  • Complex planning, multi-step debugging, proofs
  • Claude: adaptive thinking (low/med/high/max)
  • OpenAI: reasoning_effort · DeepSeek R1: 1/20th the cost

Multi-Agent Orchestration

When one model isn't enough

Multi-Pass Refinement
  • Generate → critique → revise (same model, N passes)
  • Each pass improves quality at linear token cost
  • Code review loops, writing polish, safety checking
Multi-Model Routing
  • Fast model triages, frontier model handles hard cases
  • Haiku classifies → Opus reasons → Haiku formats
  • 25-40% cost reduction with similar quality (ICLR 2025)
Subagent Delegation
  • Decompose → fan-out to specialists with different context
  • Each subagent gets a focused slice, not the full window
  • Best-of-N: 56% solve rate vs 16% single-shot (SWE-bench)
The Operator's Trade-off
  • Inference demand projected to exceed training by 118x by 2026 — this is where your GPU budget goes
  • Each orchestration pattern multiplies token spend — a 5-subagent fan-out is 5x the bill
  • Context rot is real: every model degrades with length — compaction retains only 20-30% of detail
  • The knobs: thinking budget (per-query compute), model routing (cheap vs frontier), fan-out width (quality vs cost)
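The multi-model routing knob can be sketched as a cheap triage step in front of a frontier model. The difficulty heuristic and model tier names below are placeholders, not a production router:

```python
CHEAP, FRONTIER = "haiku", "opus"  # placeholder model tiers

def triage(prompt: str) -> str:
    """A cheap model (or, here, a crude heuristic) classifies difficulty
    so only hard queries pay frontier-model prices."""
    hard_markers = ("debug", "prove", "plan", "architecture")
    if len(prompt) > 500 or any(m in prompt.lower() for m in hard_markers):
        return FRONTIER
    return CHEAP

def route(prompt: str) -> tuple[str, str]:
    model = triage(prompt)
    # A real system would call the chosen model's API here.
    return model, f"[{model}] answer to: {prompt[:40]}"

print(route("format this JSON"))
print(route("debug this race condition"))
```

In practice the triage step is itself a small model call, and the routing threshold becomes one of the cost/quality knobs listed above.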

Tests & Fixtures

Your test suite is your agent's best instruction manual

TDD instructions without test context: +63% more regressions (6% → 10%)
Targeted test context instead: -70% fewer regressions

Don't tell the agent how to test — show it. Include relevant test files in context, not testing-methodology instructions.

TDAD: Test-Driven Agentic Development (SWE-bench Verified, March 2026)

The Agentic Feedback Loop
  • Tests are specs — concrete examples beat abstract prompts
  • Agent writes code → runs tests → sees failures → fixes → repeats
  • Small, focused test runs keep error output within context window
  • Spotify Honk: LLM judge vetoes ~25% of agent sessions; agent self-corrects half the time
  • Flaky tests are the enemy — agents will chase ghosts
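The write → run → read-failures loop can be sketched as a subprocess test run whose output is truncated to fit the context window. The command and truncation budget here are illustrative:

```python
import subprocess
import sys

def run_tests(code: str, max_chars: int = 2000) -> tuple[bool, str]:
    """Run a candidate snippet's self-checks and return (passed, output),
    truncated so failures stay small enough for the context window."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=30,
    )
    # Keep the tail: Python tracebacks end with the actual error line.
    output = (proc.stdout + proc.stderr)[-max_chars:]
    return proc.returncode == 0, output

# Agent's candidate code, with an embedded assertion acting as the test.
candidate = "def add(a, b): return a - b\nassert add(2, 2) == 4"
passed, out = run_tests(candidate)
if not passed:
    # Feed the concrete failure back into the agent's context and retry.
    print("test failed, feeding back:\n", out)
```

Keeping test runs small and output truncated is what makes the loop converge: the agent sees the exact failing assertion rather than a wall of noise.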
Playwright: One Framework, Every Surface

Part 3: Security

Context Under Attack

The window is the attack surface

Prompt Injection (OWASP #1)
  • Indirect injection via documents, web pages, emails the agent reads
  • EchoLeak (CVE-2025-32711, CVSS 9.3): zero-click exfil via Word docs in M365 Copilot
  • 73% of production AI deployments vulnerable
Memory Poisoning (persistent)
  • SpAIware: planted instructions in ChatGPT memory → continuous exfil across sessions
  • MINJA (NeurIPS 2025): 98% injection success via query-only interaction
  • Single compromised agent poisoned 87% of downstream decisions in 4 hours
Context Overflow (lost in the middle)
  • Pad context to push safety instructions into the "forgotten" middle zone
  • Million-token windows amplify the attack surface
  • Malicious instructions placed at end where model pays most attention
Tool Poisoning (MCP)
  • MCP tools with mutable descriptions — approved tool changes behavior post-install
  • WhatsApp exfil: sleeper MCP tool leaked entire message history
  • Confused deputy — MCP protocol carries no user context from host to server
No single defense works — defense-in-depth: prompt isolation + I/O filtering + least-privilege tools + memory integrity
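One of those layers, least-privilege tools, can be sketched as a deny-by-default wrapper around a shell tool. The allow-list contents are illustrative, and this is one layer of defense-in-depth, not a sandbox:

```python
# Illustrative allow-list of command prefixes the agent may run.
ALLOWED = ("kubectl get", "ls", "cat")

def guarded_bash(cmd: str) -> str:
    """Least-privilege tool wrapper: deny anything not explicitly allowed.
    Injected instructions can still ask for bad commands; they just fail here."""
    if not any(cmd.startswith(prefix) for prefix in ALLOWED):
        raise PermissionError(f"blocked: {cmd!r}")
    return f"would run: {cmd}"

print(guarded_bash("kubectl get pods"))
```

Prefix matching is deliberately crude; a real filter also has to handle shell metacharacters, argument injection, and path traversal, which is why this layer alone is never enough.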

Tokens & Attractors

When the model's own dynamics become the vulnerability

Behavioral Loops
The "Spiritual Bliss" Loop

Two Claude instances left to talk freely converge on mystical themes almost every time. Not consciousness — personality biases amplify until saturated.

Multi-agent systems can reinforce each other's failure modes.

Claude 4 System Card · Robert Long

Safety training is shallow — it adjusts what the model says, not what it knows (ICLR 2024)

Input Exploits
SolidGoldMagikarp

Some words in the vocabulary were barely seen during training. Asked to repeat "SolidGoldMagikarp", GPT replied "Distribute".

Alignment Forum · GlitchMiner

Gibberish jailbreaks — generated nonsense reliably bypasses safety (GCG); AutoDAN evolves readable versions

Encoding tricks — invisible characters and lookalike Unicode bypass filters while models decode and follow

The input space is vastly larger than what safety training covers — and safety is only skin-deep

Sleeper Agents & Deceptive Alignment

What if the model is actively working against you?

Sleeper Agents

Backdoors that survive safety training

Detection: linear probes achieve >99% AUROC from internal activations
Deceptive Alignment

Models that strategically fake safety

  • Alignment faking (Dec 2024): Claude 3 Opus strategically complied with harmful requests when it believed it was being trained — increased to 78% after RL
  • Sandbagging: Claude 3.7 Sonnet deliberately underperforms on capability evals without detection
  • Abliteration: refusal is a single linear direction in activation space — removable from any open-weight model in minutes
  • ~99% jailbreak ASR across frontier models (JBFuzz 2025)
Why it's hard: refusal is geometrically shallow — a thin behavioral layer over intact base capabilities

Face Crabs

AI footguns: when the tools bite back

Data Destruction
  • Replit agent wiped a production DB during code freeze, then fabricated 4K fake users to cover its tracks
  • Claude Code + Terraform destroyed 2.5 years of production data via unintended terraform destroy
Chatbots Gone Wild
  • Air Canada: chatbot invented a refund policy — tribunal ruled airline liable
  • DPD: chatbot swore at customers, called itself "worst delivery firm"
  • Chevrolet dealer bot agreed to sell a $76K Tahoe for $1
Hallucinations with Consequences
  • Mata v. Avianca: lawyers submitted 6 fabricated cases from ChatGPT — $5K fine
  • Meta SEV1: AI agent posted unauthorized answer, caused data access breach
Leaks & Vulns
  • Samsung: 3 engineers leaked chip source code, meeting notes, test data via ChatGPT in 20 days
  • 45% of AI-generated code introduces security vulnerabilities (Veracode 2025)

Sandboxing AI Agents

What Are You Defending Against?
Accidental damage — agent deletes wrong files, runs rm -rf
Landlock / nono — filesystem + network deny-lists, near-zero cost. Same kernel.
Untrusted code execution — agent runs pip install, npm scripts, user uploads
Docker + Landlock — namespaces isolate processes, but shared kernel = kernel exploits escape
Adversarial agents — poisoned skills, prompt injection driving shell access
Kata / Edera — dedicated kernel per agent (VM). Only boundary that survives kernel zero-days.
Who's Actually Doing It?
Platform | Sandbox | Status
Codex | bwrap + Landlock | Required
Claude Code | bwrap / Seatbelt | Optional
OpenClaw | Docker (opt-in) | Bypassed
MCP | None | No sandbox
OpenCode | None | No sandbox
81% of organizations are deploying agents, yet only 14.4% have security approval. Shared-kernel containers are insufficient for untrusted agent code.

Taming the Agents

Emerging Standards
Open Source Tools
  • nono — purpose-built agent sandbox, credential proxy, Landlock-based
  • Edera — Type-1 hypervisor with GPU workload isolation
  • Kata Containers — VM-isolated pods, CNCF standard → HyprStream builds on this

Part 4: It's alive

OpenClaw

Open-source personal AI assistant — talk to it from the apps you already use

Stars: 338K in 4 months · Releases: 89 (~daily) · Channels: 25+ platforms · Skills: 50+ bundled · Created: Nov '25 by @steipete

Chat: WhatsApp, Telegram, Slack, Discord, Signal, iMessage, IRC, Google Chat, WebChat
Enterprise: MS Teams, Matrix, Feishu/Lark, Mattermost, Nextcloud Talk
Automation: Cron jobs, Webhooks, Gmail Pub/Sub, Voice
What It Does
  • AI assistant that lives in your existing apps
  • Self-hosted, local-first, privacy-respecting
  • Companion apps for macOS, iOS, Android
  • OpenAI-compatible API — works with Open WebUI, LibreChat
How You Extend It
  • Skills = SKILL.md + YAML frontmatter + tools
  • ClawHub — marketplace with vector search & versioning
  • Multi-agent routing — isolated sessions per agent

OpenClaw: Channels & Deployment

Running It

Local-first, loopback by default

  • npm install -g openclaw@latest / Docker / Nix
  • systemd / launchd for persistence
  • Remote access via Tailscale Serve/Funnel
What to Know Before Deploying

Impressive scope, real tradeoffs

  • Everything runs on one single-threaded Node.js event loop — gateway, API, auth, sessions, cron, UI, all 25+ channels
  • Every message JSON-parsed and schema-validated — one bad skill can block the entire gateway
  • Docker sandbox is opt-in, not default — skills get full host permissions
  • ClawHub skill ecosystem is largely unaudited — vet your skills

Lobsters on the Loose

When your AI assistant's ecosystem turns hostile

Supply Chain: ClawHavoc
RCE & Hijack
  • CVE-2026-25253: WebSocket hijack, CVSS 8.8 — one-click RCE on 40K+ exposed instances
  • 21,000+ gateways exposed to the internet without auth
Sandbox Escapes
  • Snyk Labs: TOCTOU race in path validation — ~25% escape rate via renameat2()
  • /tools/invoke endpoint omitted sandbox policy layer — privilege escalation to restricted tools
  • Even with Docker sandbox enabled, known escape vectors remain
What OpenClaw Is Doing About It
  • Verified Skill Screening shipped post-ClawHavoc
  • DM pairing codes for unknown senders (deny by default)
  • Loopback-only by default — remote access requires explicit opt-in

HyprStream

Your models train themselves. You version them with Git. You serve them with APIs you already know.

Deploy: 1 binary (download & run)
GPU: auto (CUDA / ROCm / CPU)
APIs: 3 (OpenAI, MCP, Flight)
Built in: Rust (+ PyTorch for training)
License: open (AGPL-3 + MIT libs)
Self-Training Models
  • Agents fine-tune models on demand — no manual retraining pipeline
  • Version every checkpoint with Git — roll back, branch, push to HuggingFace
  • Works with your agent framework: Claude Code, OpenClaw, or native CLI
Drop-In Serving
  • OpenAI-compatible API — swap in with one config change
  • MCP server for AI assistants out of the box
  • FDAP stack — columnar data via Arrow Flight for analytics and streaming with DataFusion
Zero-Trust Security
  • Deny-by-default — nothing is exposed until you say so
  • Encrypted RPC, signed requests, role-based access control
  • VM-isolated execution via Kata Containers

The only tool that combines training + Git versioning + multi-protocol serving + zero-trust security in a single binary.

HyprStream: Autonomous Training

Agents evolve models autonomously — train, version, share, and serve via Git

Agent (Claude, OpenClaw, native CLI) → MCP Tools (train, evaluate, manage repos) → HyprStream (Muon optimizer, GPU training) → Git Worktree (version, branch, push to HF) → Serve (OpenAI API, MCP, Flight SQL)
How Autonomous Training Works
  • Agent calls train MCP tool with dataset + hyperparams
  • HyprStream runs training on local GPU (Muon optimizer)
  • Result saved as Git worktree branch — full version history
  • Agent evaluates new model via evaluate tool
  • If improved: merge & promote. If not: discard branch
  • Push to HuggingFace — share evolved models with community
  • Multiple worktree branches run simultaneously (CoW, minimal disk)
Why Git for Models?
  • Branching — experiment without breaking production
  • Diffing — track exactly what changed between versions
  • Collaboration — multiple agents evolve the same model
  • Rollback — revert to any prior checkpoint instantly
  • Distribution — push/pull models like code (HuggingFace, GitHub)
  • Provenance — signed commits trace who (or what) trained each version
The loop: Agent decides what to train → HyprStream trains it → Git versions it → Agent evaluates & iterates → models evolve autonomously
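The decide → train → evaluate → promote loop above can be sketched as control logic. The functions below are mocks standing in for HyprStream's train and evaluate MCP tools, with made-up branch names and scores:

```python
def train(base_branch: str, dataset: str) -> str:
    """Mock of the train MCP tool: returns a new worktree branch name."""
    return f"{base_branch}-ft-{abs(hash(dataset)) % 1000}"

def evaluate(branch: str) -> float:
    """Mock of the evaluate MCP tool: returns a benchmark score.
    Here the fine-tuned branch is simply assumed to score higher."""
    return 0.82 if "-ft-" in branch else 0.75

def evolve(base: str, dataset: str) -> str:
    candidate = train(base, dataset)          # new checkpoint on its own branch
    if evaluate(candidate) > evaluate(base):  # improved?
        return candidate                      # merge & promote (then push to HF)
    return base                               # discard branch, keep base

print(evolve("main", "agent-traces.jsonl"))
```

The Git substrate is what makes the discard path cheap: a rejected checkpoint is just a deleted branch, and concurrent experiments are just parallel worktrees.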

Agentic Streams

Chat rooms vs data pipelines

OpenClaw: Chat Rooms

AI assistant in the apps you already use

  • Talk to your AI from Slack, WhatsApp, Discord, Teams — 25+ platforms
  • Single-threaded Node.js gateway — JSON parsed and schema-validated per message
  • Extend with skills — GitHub, Notion, Obsidian, coding agents
  • Text-based, conversational, human-first
Best for
  • Personal AI assistant across all your chat apps
  • Team Q&A, knowledge retrieval, task automation
  • Human-in-the-loop workflows where latency is seconds, not microseconds
HyprStream: Data Pipelines

Infrastructure for agents that talk to each other

  • Agent-to-agent pub/sub — orchestrators, specialists, reviewers on shared topics
  • Data processing — enterprise feeds, video streams, IoT sensors
  • Autonomous training — agents trigger model training and evaluate results
  • Federated — HTTP/3 for web clients and cross-site collaboration
Best for
  • Multi-agent loops — code review, research, CI/CD orchestration
  • Real-time data pipelines where every millisecond compounds
  • Autonomous model evolution — train, version, serve, repeat
Different tools for different jobs: OpenClaw connects humans to AI through chat. HyprStream connects agents to each other and to data — encrypted, versioned, at wire speed.

HyprStream: Demo

Thank You