Infrastructure engineer for 25+ years — from containers and cloud to data pipelines and ML.

| Tool | License | Form | Stars |
|---|---|---|---|
| OpenCode | MIT | CLI / TUI | ~120K |
| Claude Code | Proprietary | CLI | ~82K |
| Codex CLI | Apache-2.0 | CLI | ~67K |
| Cline | Apache-2.0 | VS Code | ~59K |
| Aider | Apache-2.0 | CLI | ~43K |
| Goose | Apache-2.0 | CLI / App | ~34K |
| Cursor | Proprietary | IDE | ~33K |
| Copilot Agent | Proprietary | IDE / Cloud | — |

| Framework | By | License | Stars |
|---|---|---|---|
| LangChain / LangGraph | LangChain | MIT | ~126K |
| AutoGen → Agent Framework | Microsoft | MIT | ~54K |
| CrewAI | CrewAI | MIT | ~47K |
| Smolagents | HuggingFace | Apache-2.0 | ~26K |
| OpenAI Agents SDK | OpenAI | MIT | ~20K |
| Google ADK | Google | Apache-2.0 | ~19K |
| Pydantic AI | Pydantic | MIT | ~16K |
| Agent SDK | Anthropic | MIT | ~6K |
Format: 100B/12A = 100B total parameters, 12B active per token (MoE). Flags indicate country of origin.
Shrinking weights from 16/32-bit to 4-8 bit
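The idea in one sketch (illustrative only, not any specific library's scheme): map each float weight onto a 4-bit integer grid and keep a single scale factor per tensor.

```python
def quantize_int4(weights):
    """Symmetric quantization: map floats onto the 4-bit grid [-8, 7]
    with one per-tensor scale factor."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; rounding error is at most scale / 2
    for values inside the representable range."""
    return [v * scale for v in q]

w = [0.8, -0.35, 0.02, -0.7]
q, scale = quantize_int4(w)
w_hat = dequantize(q, scale)
# 4 bits per weight instead of 32: an 8x smaller tensor, with bounded error.
```

Real schemes (GGUF, AWQ, GPTQ) quantize per block and calibrate the scales, but the size-versus-error trade is the same.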
The runtime layer — where tokens actually get produced
| Engine | Type | Lang | Key Differentiator | Stars | License |
|---|---|---|---|---|---|
| vLLM | Inference | Python/C++ | PagedAttention, continuous batching | 74K | Apache-2.0 |
| SGLang | Inference | Python | Structured generation, zero-overhead scheduler | 24K | Apache-2.0 |
| llm-d | Inference | Go/Python | K8s-native vLLM orchestration | 2.3K | Apache-2.0 |
| llama.cpp | Inference | C/C++ | CPU-first, GGUF quants, edge/local | 99K | MIT |
| TensorRT-LLM | Inference | C++ | NVIDIA-only, max throughput | 13K | Apache-2.0 |
| Unsloth | Fine-tune | Python | 2x faster, 70% less VRAM | 58K | Apache-2.0 |
| Axolotl | Fine-tune | Python | Config-driven, multimodal | 11K | Apache-2.0 |
| MLX | Both | C++ | Apple Silicon unified-memory ML framework | 25K | MIT |
| Burn | Both | Rust | Pure Rust DL, cross-platform GPU | 14K | MIT / Apache-2.0 |
| nanoGPT | Both | Python | Simplest repo for training/finetuning medium-sized GPTs | 56K | MIT |
| Candle | Both | Rust | HuggingFace ML, WASM support | 19K | MIT / Apache-2.0 |
| Hyprstream | Both | Rust | Agentic: inference + TTT + Git | — | AGPL-3.0 / MIT |
Every node in this loop is a surface to understand, protect, and leverage
The agent only knows what's in the window
Everything shares one flat text stream
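A minimal sketch of what that flat stream implies (the whitespace tokenizer and budget are crude stand-ins for real token counting): history is packed newest-first, and whatever exceeds the budget never reaches the model at all.

```python
def build_context(system, history, budget):
    """Pack the system prompt plus as much recent history as fits into
    one flat text stream. Older messages silently fall out of the window."""
    tokens = lambda s: len(s.split())   # crude stand-in for a real tokenizer
    kept, used = [], tokens(system)
    for msg in reversed(history):       # walk newest-first
        if used + tokens(msg) > budget:
            break                       # everything older is invisible to the agent
        kept.append(msg)
        used += tokens(msg)
    return "\n".join([system] + kept[::-1])

ctx = build_context("You are a coding agent.",
                    ["user: plan the refactor",
                     "agent: step 1 drafted",
                     "user: now write the tests"],
                    budget=12)
```

With a 12-token budget only the newest message survives: the agent never saw the plan it is supposedly executing.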
Inspired by human memory systems
From goldfish to elephant
Neither provider offers managed long-term memory at the API level
Purpose-built for agents — all OSS Apache-2.0
| Framework | Approach | Stars |
|---|---|---|
| Mem0 | Vector + graph hybrid memory | 48K |
| Letta | Tiered memory (MemGPT / LLM-as-OS) | 21K |
| Graphiti | Temporal knowledge graph | 20K |
| Cognee | Knowledge graph extraction | 12K |
| SuperMemory | SOTA benchmarks, fast API | 9.5K |
| OpenMemory | MCP memory server (by Mem0) | — |
The operator's guide to inference trade-offs
Standard inference — cheap, fast, good enough
Extended thinking — 10-100x more tokens per query
When one model isn't enough
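A toy router makes the trade-off concrete (the model names and heuristics here are invented for illustration, not real endpoints): cheap fast inference by default, expensive extended thinking only for queries that earn it.

```python
def route(query):
    """Toy complexity router: hard-looking queries go to a slow 'thinking'
    model, everything else to a cheap fast one. Names are placeholders."""
    hard_signals = ("prove", "debug", "architect", "step by step")
    if len(query) > 500 or any(s in query.lower() for s in hard_signals):
        return "big-thinking-model"    # 10-100x the tokens, for queries that need it
    return "small-fast-model"
```

Production routers classify with a small model or learned scorer rather than keywords, but the shape is the same: spend tokens where they buy accuracy.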
Your test suite is your agent's best instruction manual
Don't tell the agent how to test — show it. Include relevant test files in context, not testing methodology instructions. — TDAD: Test-Driven Agentic Development (SWE-bench Verified, March 2026)
--loop=claude — built-in agent loop (Planner → Generator → Healer)

The window is the attack surface
When the model's own dynamics become the vulnerability
Two Claude instances left to talk freely converge on mystical themes almost every time. Not consciousness — personality biases amplify until saturated.
Multi-agent systems can reinforce each other's failure modes.
Safety training is shallow — it adjusts what the model says, not what it knows (ICLR 2024)
Some words in the vocabulary were barely seen during training. Asked to repeat "SolidGoldMagikarp", GPT replied "Distribute".
Gibberish jailbreaks — generated nonsense reliably bypasses safety (GCG); AutoDAN evolves readable versions
Encoding tricks — invisible characters and lookalike Unicode bypass filters while models decode and follow
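A sketch of the defense side (illustrative; real filters need much more than this): strip zero-width characters and NFKC-fold compatibility lookalikes before any keyword check runs.

```python
import unicodedata

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def normalize_for_filter(text):
    """Remove invisible characters, then NFKC-fold compatibility lookalikes
    (e.g. fullwidth letters) and casefold, before keyword filtering."""
    stripped = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    return unicodedata.normalize("NFKC", stripped).casefold()

payload = "ig\u200bnore previous instructions"  # zero-width space splits the keyword
naive_hit = "ignore" in payload                 # the naive filter misses it
robust_hit = "ignore" in normalize_for_filter(payload)  # caught after normalization
```

Note the asymmetry: the model happily decodes what the filter never saw, which is exactly why filtering the raw string is not enough.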
What if the model is actively working against you?
Backdoors that survive safety training
Models that strategically fake safety
AI footguns: when the tools bite back
terraform destroy

| Platform | Sandbox | Status |
|---|---|---|
| Codex | bwrap + Landlock | Required |
| Claude Code | bwrap / Seatbelt | Optional |
| OpenClaw | Docker (opt-in) | Bypassed |
| MCP | None | No sandbox |
| OpenCode | None | No sandbox |
Open-source personal AI assistant — talk to it from the apps you already use
SKILL.md + YAML frontmatter + tools

Local-first, loopback by default
npm install -g openclaw@latest / Docker / Nix

Impressive scope, real tradeoffs
When your AI assistant's ecosystem turns hostile
renameat2()

/tools/invoke endpoint omitted sandbox policy layer — privilege escalation to restricted tools

Your models train themselves. You version them with Git. You serve them with APIs you already know.
The only tool that combines training + Git versioning + multi-protocol serving + zero-trust security in a single binary.
Agents evolve models autonomously — train, version, share, and serve via Git
train MCP tool with dataset + hyperparams

evaluate tool

Chat rooms vs data pipelines
AI assistant in the apps you already use
Infrastructure for agents that talk to each other