Learn

Understanding the full stack.

Eleven modules mapped to the AI value chain. Each module covers one layer: what it does, why it matters, and how to work with it.

Everything in AI runs on matrix math, and compute is the hardware that executes it. Understanding GPU architecture, cloud vs on-prem tradeoffs, and cost structures is the foundation. You do not need to buy GPUs to understand this layer — but you need to know what drives the cost of every layer above it.

Chip → Server → Cluster → Cloud

01.1 Silicon

GPU vs TPU vs custom silicon. For inference and other memory-bound workloads, memory bandwidth is the bottleneck, not FLOPS. CUDA still dominates the software ecosystem.

Key concepts

  • Hardware tradeoffs
  • Memory bandwidth
  • CUDA lock-in
  • Chip generations

Practical skills

  • Read GPU specs
  • Compare silicon options
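
A back-of-envelope illustration of why decode is memory-bound: every generated token reads all weights from HBM, so bandwidth caps tokens per second. The numbers below are approximate and illustrative:

    # Rough single-stream decode ceiling. Ignores KV cache traffic, and a
    # 140 GB model spans more than one 80 GB GPU; the arithmetic still holds.
    bandwidth_gb_s = 3350        # approx. H100 SXM HBM3 bandwidth
    params_b = 70                # 70B-parameter model
    bytes_per_param = 2          # FP16/BF16 weights
    weight_gb = params_b * bytes_per_param          # ~140 GB of weights
    max_tokens_per_s = bandwidth_gb_s / weight_gb   # ~24 tokens/sec ceiling
    print(f"~{max_tokens_per_s:.0f} tokens/sec upper bound at batch size 1")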

01.2 GPU Clouds

CoreWeave, Lambda, RunPod, Modal. Reserved vs on-demand vs spot pricing.

Key concepts

  • GPU cloud vs hyperscaler
  • Serverless GPU
  • Marketplace models

Practical skills

  • Compare cloud GPU pricing
  • Estimate compute costs
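
A sketch of the estimation, with placeholder hourly rates; check current provider pricing before trusting any figure here:

    # Back-of-envelope GPU cost model; hourly rates are placeholders.
    gpus = 8
    hours = 24 * 30                     # one month of continuous use
    rate = {"on_demand": 3.50, "reserved": 2.20, "spot": 1.40}  # $/GPU-hour, illustrative
    for tier, price in rate.items():
        print(f"{tier:>10}: ${gpus * hours * price:,.0f}/month")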

01.3 Training Infra

Distributed training software. Data/model/pipeline parallelism.

Key concepts

  • Parallelism strategies
  • NCCL
  • Distributed frameworks

Practical skills

  • Set up multi-GPU training
  • Choose parallelism strategy
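
A minimal data-parallel sketch with PyTorch DistributedDataParallel; the model and data are toy placeholders:

    # launch: torchrun --nproc_per_node=4 train.py
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group(backend="nccl")          # NCCL handles the GPU collectives
    local_rank = dist.get_rank() % torch.cuda.device_count()  # single-node assumption
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(512, 512).cuda())    # toy model stands in for a real one
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):
        x = torch.randn(32, 512, device="cuda")      # each rank sees its own data shard
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()                              # gradients all-reduced across ranks
        opt.step()

    dist.destroy_process_group()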

See this layer in the value chain → L01 Compute

Pre-training builds foundation models from massive datasets — trillions of tokens of text, code, and multimodal data. This is the most capital-intensive layer ($50M-$500M+ per frontier model). Most teams never touch pre-training directly, but understanding how base models are made explains their capabilities and limitations.

Data → Tokenize → Train → Checkpoint

02.1 Foundation Models

Labs building base models. Closed-weight vs open-weight.

Key concepts

  • Model families and architectures
  • MoE vs dense
  • Scaling laws

Practical skills

  • Compare models by capability and openness

02.2 Training Data

Quality over quantity. Web crawls, annotation, synthetic data.

Key concepts

  • Data quality shift
  • Deduplication
  • Programmatic labeling
  • Synthetic generation

Practical skills

  • Evaluate training datasets
  • Understand data-behavior link
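
A minimal exact-deduplication sketch by content hash; near-duplicate detection (MinHash, SimHash) catches the paraphrases this misses:

    import hashlib

    def dedup(docs):
        """Drop exact duplicates by normalized content hash."""
        seen, unique = set(), []
        for doc in docs:
            key = hashlib.sha256(doc.strip().lower().encode()).hexdigest()
            if key not in seen:
                seen.add(key)
                unique.append(doc)
        return unique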

02.3 Training Frameworks

HF Transformers, nanoGPT. Training loop implementations.

Key concepts

  • Tokenizer design
  • Checkpoint management
  • Mixed precision

Practical skills

  • Read model implementations
  • Interpret training curves
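
A minimal mixed-precision training loop in PyTorch; the model and data are toy placeholders:

    import torch

    model = torch.nn.Linear(512, 512).cuda()        # placeholder model
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    scaler = torch.cuda.amp.GradScaler()            # rescales grads to avoid FP16 underflow

    for step in range(100):
        x = torch.randn(32, 512, device="cuda")
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            loss = model(x).pow(2).mean()           # forward pass runs in FP16
        opt.zero_grad()
        scaler.scale(loss).backward()               # scaled backward pass
        scaler.step(opt)                            # unscales, then optimizer step
        scaler.update()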

See this layer in the value chain → L02 Pre-Training

Post-training transforms base models into useful tools. SFT teaches instruction following, RLHF and DPO align outputs with human preferences, LoRA enables efficient adaptation, and distillation creates smaller specialized models. This is where most teams first interact with the training pipeline — fine-tuning a 70B model with LoRA can cost under $100.

Base Model → SFT → RLHF / DPO → LoRA → Distillation

03.1 Alignment

SFT, RLHF, DPO. Making models useful and safe.

Key concepts

  • SFT as foundation
  • RLHF vs DPO
  • Constitutional AI
  • Preference data quality

Practical skills

  • Prepare SFT datasets
  • Choose alignment technique

03.2 Efficient Adaptation

LoRA, QLoRA, distillation, merging.

Key concepts

  • Parameter-efficient fine-tuning
  • Distillation
  • Model merging
  • Adapter composition

Practical skills

  • Run LoRA jobs
  • Evaluate fine-tuned quality
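
A minimal LoRA setup sketch using Hugging Face peft; the model name and hyperparameters are illustrative:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # example model
    config = LoraConfig(
        r=16,                                    # adapter rank
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],     # attention projections only
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()           # typically well under 1% of base params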

03.3 Managed Fine-Tuning

Predibase, Together AI, Replicate. Upload and tune.

Key concepts

  • Managed vs self-hosted tradeoffs
  • Cost comparison
  • Data privacy

Practical skills

  • Fine-tune via API
  • Compare platforms

03.4 Benchmarking

Evaluating fine-tuned model quality. Standard benchmarks, A/B comparison, regression testing.

Key concepts

  • Standard benchmarks (MMLU, HumanEval, MATH)
  • A/B comparison and arena-style ranking
  • Regression testing post-training
  • Task-specific eval vs general capability

Practical skills

  • Run standard benchmarks on fine-tuned models
  • Set up regression tests for model quality
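
A regression-gate sketch in pytest style; run_model() and the golden set are hypothetical stand-ins:

    # The fine-tuned model must not score worse than the baseline on a golden set.
    GOLDEN = [("What is 2+2?", "4"), ("Capital of France?", "Paris")]

    def accuracy(model_name):
        hits = sum(run_model(model_name, q).strip() == a for q, a in GOLDEN)
        return hits / len(GOLDEN)

    def test_no_regression():
        assert accuracy("ft-v2") >= accuracy("ft-v1") - 0.02   # 2-point tolerance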

See this layer in the value chain → L03 Post-Training

Inference is running trained models to produce outputs — every AI application consumes it. The engineering challenges are latency, throughput, cost, and hardware utilization. The choice between cloud APIs, self-hosted engines, and local runners determines your cost structure, data privacy posture, and scaling ceiling.

Prompt → Tokenize → Forward Pass → Decode → Response

04.1 Serving Engines

vLLM, llama.cpp, TGI, SGLang. Self-hosted runtimes.

Key concepts

  • PagedAttention
  • Continuous batching
  • Quantization formats
  • Speculative decoding

Practical skills

  • Deploy vLLM
  • Quantize models
  • Benchmark performance
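
A minimal vLLM offline-inference sketch; the checkpoint name is an example:

    from vllm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")   # example checkpoint
    params = SamplingParams(temperature=0.7, max_tokens=128)
    outputs = llm.generate(["Explain continuous batching in one sentence."], params)
    print(outputs[0].outputs[0].text)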

04.2 Provider APIs

OpenAI, Anthropic, Vertex, Bedrock. Hosted endpoints.

Key concepts

  • Token pricing
  • Rate limits
  • Data privacy
  • Multi-provider strategy

Practical skills

  • Estimate API costs
  • Implement retry logic
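
A provider-agnostic retry sketch with exponential backoff and jitter; call is any flaky API function:

    import random, time

    def with_retries(call, max_attempts=5, base_delay=1.0):
        """Retry a flaky API call with exponential backoff plus jitter."""
        for attempt in range(max_attempts):
            try:
                return call()
            except Exception:          # narrow this to rate-limit errors in practice
                if attempt == max_attempts - 1:
                    raise
                time.sleep(base_delay * 2 ** attempt + random.random())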

04.3 Local Runners

Ollama, LM Studio, MLX. Desktop/edge inference.

Key concepts

  • Hardware requirements
  • Quantization tradeoffs
  • Apple Silicon vs NVIDIA

Practical skills

  • Set up Ollama
  • Choose quantization levels
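
Querying a local Ollama server over its REST API; assumes the model has been pulled and Ollama is listening on its default port 11434:

    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.1", "prompt": "Why is the sky blue?", "stream": False},
    )
    print(resp.json()["response"])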

04.4 Multimodal

Voice, image, video. ElevenLabs, Whisper, Sora.

Key concepts

  • STT/TTS pipelines
  • Diffusion architectures
  • Real-time latency requirements

Practical skills

  • Integrate TTS APIs
  • Choose multimodal providers
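
A local speech-to-text sketch with the open-source whisper package; the audio path is a placeholder:

    import whisper

    model = whisper.load_model("base")        # small model; trades accuracy for speed
    result = model.transcribe("meeting.mp3")  # placeholder audio file
    print(result["text"])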

04.5 Optimization

Reducing token usage and inference cost. Prompt caching, compression, structured outputs for efficiency.

Key concepts

  • Prompt caching (prefix and semantic)
  • Prompt compression techniques
  • KV cache optimization
  • Structured outputs for token efficiency

Practical skills

  • Enable prompt caching with a provider API
  • Measure and reduce token usage
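
A prefix-caching sketch against one provider's API (Anthropic's cache_control, as of this writing); the model id is a placeholder, so verify against current docs:

    import anthropic

    client = anthropic.Anthropic()
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",      # placeholder model id
        max_tokens=256,
        system=[{
            "type": "text",
            "text": long_reference_doc,        # large static prefix worth caching
            "cache_control": {"type": "ephemeral"},
        }],
        messages=[{"role": "user", "content": "Summarize section 3."}],
    )
    print(resp.usage)   # cache_read_input_tokens shows the discount on later calls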

See this layer in the value chain → L04 Inference

Routing decides which model handles which request. Without it, you are locked to a single provider. With it, you can optimize for cost, latency, and quality per task. The simplest routing is a fallback chain; the most sophisticated classifies requests and dispatches to the best-fit model automatically.

Request → Router → Model A / Model B / Model C

05.1 Gateways & Proxies

OpenRouter, LiteLLM, Portkey. Unified API access.

Key concepts

  • OpenAI-compatible format
  • Caching
  • Load balancing
  • Format translation

Practical skills

  • Set up LiteLLM proxy
  • Configure multi-provider access
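
A LiteLLM sketch showing one call shape across providers; the model strings are examples:

    from litellm import completion

    # Same call shape across providers; only the model string changes.
    for model in ["openai/gpt-4o-mini", "anthropic/claude-3-5-haiku-20241022"]:
        resp = completion(model=model, messages=[{"role": "user", "content": "ping"}])
        print(model, resp.choices[0].message.content)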

05.2 Intelligent Routing

Martian, Not Diamond, Unify. Quality/cost optimization.

Key concepts

  • Task classification
  • Quality prediction
  • Benchmark-driven routing

Practical skills

  • Implement task-type routing
  • Set cost-based rules
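
A toy task-type router; the keyword classifier and model names are illustrative, and production routers use trained classifiers instead:

    ROUTES = {
        "code": "big-coder-model",        # illustrative model names
        "summarize": "cheap-fast-model",
        "default": "mid-tier-model",
    }

    def classify(prompt: str) -> str:
        """Crude keyword classification standing in for a learned classifier."""
        if "def " in prompt or "class " in prompt:
            return "code"
        if prompt.lower().startswith(("summarize", "tl;dr")):
            return "summarize"
        return "default"

    def route(prompt: str) -> str:
        return ROUTES[classify(prompt)]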

05.3 Observability Proxies

Helicone, Keywords AI. Routing with monitoring.

Key concepts

  • Request logging
  • Cost tracking
  • Latency monitoring

Practical skills

  • Set up cost tracking
  • Build usage dashboards
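
A minimal cost-ledger sketch; the per-token prices are placeholders:

    PRICE = {"gpt-4o-mini": (0.15, 0.60)}   # $ per 1M input/output tokens, illustrative

    def log_cost(model, usage, ledger):
        """Append one request's cost to a ledger dashboards can aggregate."""
        pin, pout = PRICE[model]
        cost = usage["input"] / 1e6 * pin + usage["output"] / 1e6 * pout
        ledger.append({"model": model, **usage, "cost_usd": cost})
        return cost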

See this layer in the value chain → L05 Routing

Agent = Model + Harness. The harness is everything that is NOT the model — the infrastructure that makes a single agent effective. Sandboxes, state, tools, verification, and constraints. The model is a commodity; the harness is the durable asset.

06.1 Execution Environments

E2B, Castari, Kernel AI. Sandboxes and containers for agent code.

Key concepts

  • Sandboxing for code-executing agents
  • Cloud sandbox vs local vs unikernels
  • Resource limits and network policies
  • Cold start latency

06.2 State & Continuity

File-driven state machines, crash recovery, session bridging.

Key concepts

  • File-driven state vs in-memory state
  • Crash recovery and checkpointing
  • Git-tracked task files
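
A file-driven checkpoint sketch so a crashed run can resume; the state-file layout is hypothetical:

    import json
    from pathlib import Path

    STATE = Path("task_state.json")        # hypothetical git-trackable state file

    def save(step: int, results: list):
        STATE.write_text(json.dumps({"step": step, "results": results}))

    def resume():
        """Pick up where the last run died; start fresh if no checkpoint exists."""
        if STATE.exists():
            s = json.loads(STATE.read_text())
            return s["step"], s["results"]
        return 0, []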

06.3 Tool Infrastructure

Tool registries, code execution, MCP runtime, file system access.

Key concepts

  • Tool registries and discovery
  • Code execution infrastructure
  • Browser automation for verification

06.4 Verification Loops

Write-test-fix cycles, self-verification, quality gates.

Key concepts

  • Write-test-fix as core agent pattern
  • Automated test execution as oracle
  • Must-have verification (see Superpowers for hard-gate enforcement — agents must show evidence before claiming completion)
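
The write-test-fix loop in miniature; agent_fix() is a hypothetical model call:

    import subprocess

    def verify_loop(max_rounds: int = 3) -> bool:
        """Run tests as the oracle; hand failures back to the agent to fix."""
        for _ in range(max_rounds):
            run = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
            if run.returncode == 0:
                return True                     # evidence of completion: green tests
            agent_fix(run.stdout + run.stderr)  # hypothetical model call with the failure log
        return False                            # escalate to a human after budget is spent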

06.5 Context Management

Compaction, progressive disclosure, smart zone optimization.

Key concepts

  • ~40% context utilization sweet spot
  • Pre-inlining vs discovery
  • AGENTS.md / CLAUDE.md as static context

06.6 Constraints & Linting

Architectural linters, structural tests, dependency enforcement.

Key concepts

  • Constraints multiply agent effectiveness
  • Dependency layering
  • Error messages as remediation instructions
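
A tiny dependency-layering check; the layer names and the message are illustrative:

    import ast, pathlib

    LAYERS = {"ui": 2, "service": 1, "core": 0}   # higher may import lower, never the reverse

    def check(file: pathlib.Path, layer: str):
        tree = ast.parse(file.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.ImportFrom) and node.module:
                top = node.module.split(".")[0]
                if top in LAYERS and LAYERS[top] > LAYERS[layer]:
                    # the error message doubles as the remediation instruction
                    raise SystemExit(
                        f"{file}: '{layer}' may not import '{top}'; "
                        f"move shared code down into 'core' instead."
                    )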

See this layer in the value chain → L06 Harness

Orchestration coordinates multiple agents and multi-step workflows. While the harness (L06) makes a single agent effective, orchestration is how agents work together — delegation, handoffs, human oversight gates, and structured pipelines.

07.1 Agent Patterns

LangGraph, CrewAI, Anthropic SDK. The agent loop.

Key concepts

  • Plan-act-observe-iterate
  • Single vs multi-agent
  • First-party vs third-party
  • Framework vs raw SDK
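
The plan-act-observe loop stripped to a skeleton; llm() and run_tool() are hypothetical stand-ins:

    def agent_loop(goal: str, max_steps: int = 10):
        """Plan-act-observe-iterate until the model declares the goal met."""
        history = [f"Goal: {goal}"]
        for _ in range(max_steps):
            action = llm(history)                 # model proposes the next tool call
            if action["type"] == "finish":
                return action["answer"]
            observation = run_tool(action)        # act, then feed the result back
            history.append(f"{action} -> {observation}")
        raise TimeoutError("step budget exhausted")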

07.2 Workflow Design

Dify, n8n, Trigger.dev. Pipelines and DAGs.

Key concepts

  • Task decomposition
  • Chain/branch/loop patterns
  • Cost budgets
  • Handoff contracts

07.3 Human Oversight

The control spectrum. Approval gates and escalation.

Key concepts

  • Human-in-the-loop vs human-on-the-loop vs bounded autonomy
  • Reversibility/stakes/confidence
  • Trust boundaries
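
A toy approval gate applying the reversibility/stakes/confidence rule; the thresholds are illustrative:

    def needs_approval(reversible: bool, stakes: str, confidence: float) -> bool:
        """Escalate to a human when an action is risky."""
        if not reversible:
            return True                 # irreversible actions always need sign-off
        return stakes == "high" or confidence < 0.8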

07.4 Multi-Agent

Agent teams, delegation, handoffs, specialization.

Key concepts

  • Agent specialization — planner, researcher, coder, reviewer
  • Delegation patterns and handoff contracts (see Superpowers' subagent-driven development — fresh agent per task with spec + quality reviewers)
  • Shared state vs message-passing (see ClawTeam for a filesystem-as-message-bus implementation)
  • Conflict resolution

See this layer in the value chain → L07 Orchestration

Context is everything the model sees beyond training data. RAG pipelines, vector databases, embeddings, memory systems, and knowledge management all live here. The quality of your context determines the quality of your outputs — a capable model with bad context produces bad results. Context engineering is the discipline of designing what enters the window.

Documents / History / Instructions → Context Window → Model

08.1 Vector Storage

Pinecone, Weaviate, Chroma, Qdrant, pgvector.

Key concepts

  • ANN search algorithms
  • Managed vs self-hosted
  • pgvector
  • Index tuning

Practical skills

  • Set up vector DB
  • Tune index parameters
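
A Chroma quickstart sketch; the collection and documents are placeholders:

    import chromadb

    client = chromadb.Client()                      # in-memory; PersistentClient for disk
    docs = client.create_collection("docs")         # placeholder collection
    docs.add(
        ids=["1", "2"],
        documents=["GPUs are memory-bound at decode.", "LoRA trains small adapters."],
    )
    hits = docs.query(query_texts=["why is inference slow?"], n_results=1)
    print(hits["documents"])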

08.2 Retrieval & Search

RAG pipelines, parsing, reranking.

Key concepts

  • Document parsing quality
  • Chunking strategy
  • Hybrid search
  • Reranking
  • Documentation-as-context

Practical skills

  • Build RAG pipeline
  • Design chunking
  • Add reranking
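
A sliding-window chunker sketch; the sizes are illustrative and character-based, while token-based chunking is more common in practice:

    def chunk(text: str, size: int = 800, overlap: int = 100):
        """Fixed-size chunks with overlap so facts at boundaries survive."""
        step = size - overlap
        return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]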

08.3 Memory Systems

Conversation and long-term memory.

Key concepts

  • Volatile vs durable memory
  • Sliding window/summarization
  • Cross-session persistence

Practical skills

  • Implement conversation memory
  • Design long-term memory
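
A sliding-window memory sketch with a summarization hook; summarize() is a hypothetical model call:

    def trim_history(messages: list, keep_last: int = 10) -> list:
        """Summarize old turns, keep recent ones verbatim."""
        if len(messages) <= keep_last:
            return messages
        summary = summarize(messages[:-keep_last])   # hypothetical model call
        return [{"role": "system", "content": f"Earlier: {summary}"}] + messages[-keep_last:]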

08.4 Embeddings

Voyage AI, Jina, HF TEI, Sentence Transformers.

Key concepts

  • Model quality variation
  • Domain-specific embeddings
  • Dimensions
  • Batch embedding

Practical skills

  • Choose embedding model
  • Set up TEI serving
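
A Sentence Transformers sketch; the model name is a common general-purpose default, not a recommendation for any specific domain:

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")   # small general-purpose model
    vecs = model.encode(["chunk one", "chunk two"], normalize_embeddings=True)
    print(vecs.shape)                                 # (2, 384) for this model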

See this layer in the value chain → L08 Context

Integrations connect AI systems to the tools, APIs, and data sources they need to be useful. Function calling is the core interface — models emit structured tool invocations, and the integration layer routes them to the right target. This spans REST APIs, database connectors, file systems, CI/CD pipelines, IDE plugins, and emerging standards like MCP. This layer determines what actions the model can take beyond generating text.

Model → Files / APIs / IDEs / Databases / Git / Tools

09.1 Protocols & Standards

MCP, function calling, structured outputs, A2A.

Key concepts

  • MCP standardization
  • Function calling as universal interface
  • JSON mode
  • Agent interoperability

Practical skills

  • Build MCP server
  • Implement function calling
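
A function-calling sketch with the OpenAI SDK; the tool and model id are examples:

    from openai import OpenAI

    client = OpenAI()
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",                    # example tool
            "description": "Current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Weather in Lisbon?"}],
        tools=tools,
    )
    print(resp.choices[0].message.tool_calls)   # structured invocation, not prose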

09.2 Code & Dev Tools

Cursor, Windsurf, Aider, Copilot.

Key concepts

  • Completion vs chat vs agent depth
  • Editor vs terminal tools
  • Codebase context

Practical skills

  • Set up coding assistant
  • Configure context

09.3 Connectors

Composio, Zapier AI, Vercel AI SDK.

Key concepts

  • Pre-built vs custom integrations
  • OAuth management
  • SDK vs platform

Practical skills

  • Connect via connector platform
  • Implement streaming
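
A token-streaming sketch with the OpenAI SDK; the model id is an example:

    from openai import OpenAI

    client = OpenAI()
    stream = client.chat.completions.create(
        model="gpt-4o-mini",                          # example model
        messages=[{"role": "user", "content": "Tell me a short story."}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)          # render tokens as they arrive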

See this layer in the value chain → L09 Integrations

Eval determines whether the system works. Safety determines whether it fails safely. Without evals, every prompt change is a blind experiment. Without guardrails, harmful or incorrect outputs reach users unchecked. This layer should be the first thing built, not the last.

Output → Format → Safety → Quality → Pass / Fail

10.1 Evaluation

Promptfoo, Braintrust, DeepEval. Test suites and CI eval.

Key concepts

  • Golden test sets
  • Benchmark suites
  • CI-integrated evals
  • Model-as-judge

Practical skills

  • Build golden test set
  • Set up Promptfoo
  • CI integration
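
A golden-set evaluation sketch with a model-as-judge fallback for open-ended answers; llm_judge() is a hypothetical grading call:

    GOLDEN = [
        {"prompt": "2+2?", "expected": "4", "mode": "exact"},
        {"prompt": "Summarize RAG.", "expected": "mentions retrieval + generation", "mode": "judge"},
    ]

    def score(output: str, case: dict) -> bool:
        if case["mode"] == "exact":
            return output.strip() == case["expected"]
        return llm_judge(output, case["expected"])   # hypothetical; returns True/False

    def run_suite(generate) -> float:
        """Pass rate over the golden set; wire this into CI as a gate."""
        return sum(score(generate(c["prompt"]), c) for c in GOLDEN) / len(GOLDEN)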

10.2 Observability

Langfuse, LangSmith, Phoenix. Production monitoring.

Key concepts

  • Request tracing
  • Cost tracking
  • Latency monitoring
  • Quality trending

Practical skills

  • Set up Langfuse
  • Build dashboards
  • Alerting

10.3 Guardrails & Security

Guardrails AI, Llama Guard, Lakera.

Key concepts

  • Content safety classification
  • Prompt injection detection
  • Format validation
  • Layered defense

Practical skills

  • Implement output validation
  • Prompt injection detection
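
A layered output-validation sketch; the injection patterns are illustrative, and real deployments add trained classifiers such as Llama Guard:

    import json, re

    INJECTION_PATTERNS = [r"ignore (all )?previous instructions", r"system prompt"]  # illustrative

    def validate(output: str, schema_keys: set) -> dict:
        """Layered checks: format first, then a crude injection heuristic."""
        data = json.loads(output)                      # raises on malformed JSON
        if not schema_keys <= data.keys():
            raise ValueError(f"missing keys: {schema_keys - data.keys()}")
        flat = json.dumps(data).lower()
        if any(re.search(p, flat) for p in INJECTION_PATTERNS):
            raise ValueError("possible prompt injection echoed in output")
        return data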

10.4 Formal Verification

Lean 4, Dafny, DeepSeek-Prover-V2. Proof, not testing.

Key concepts

  • Theorem provers vs verification-aware languages
  • Proof-carrying code
  • LLM-assisted proof generation
  • Vericoding

Practical skills

  • Read a Lean 4 proof
  • Write a Dafny specification
  • Evaluate proof vs test tradeoffs
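
What "proof, not testing" means, in two lines of Lean 4 wrapping a standard-library lemma; the kernel checks every case, and no inputs are sampled:

    -- Lean 4: a machine-checked proof that addition commutes for all naturals.
    theorem add_comm' (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b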

See this layer in the value chain → L10 Eval & Safety

The product layer is where infrastructure becomes useful to people. Products combine inference, routing, orchestration, context, and integrations into something humans interact with. The key question at this layer is defensibility — if your product is a thin API wrapper, the model provider will build your feature natively.

Chatbot · Copilot · Agent · Search · Creative · Vertical

11.1 Assistants & Copilots

ChatGPT, Claude, Gemini, GitHub Copilot.

Key concepts

  • Assistant vs copilot
  • General vs domain-specific
  • Context and tools as differentiators

Practical skills

  • Evaluate products for workflow
  • Design copilot integration

11.2 Autonomous Agents

Devin, Claude Code, Codex.

Key concepts

  • Agent-as-product vs agent-as-feature
  • Trust verification
  • Escalation
  • Pricing

Practical skills

  • Evaluate agent reliability
  • Design human oversight

11.3 Creative Tools

Midjourney, Runway, Suno, Lovable.

Key concepts

  • Image/video/music/code generation categories
  • Quality vs control
  • Iterative refinement

Practical skills

  • Use creative tools effectively
  • Design AI+human workflows

11.4 Vertical AI

Harvey, Abridge, Writer, Glean.

Key concepts

  • Domain context as moat
  • Regulatory barriers
  • Workflow integration depth
  • Thin wrapper risk

Practical skills

  • Identify vertical moats
  • Evaluate build-vs-buy

See this layer in the value chain → L11 Products