Academy

Open curriculum for teams building and scaling with AI. Practical, portable, and built to be reused.

AI in production is not a research project — it is a systems problem. This module builds the baseline: what models exist, what they are good at, where they fail, and how to think about AI as infrastructure rather than magic. Teams leave with a shared vocabulary and a clear picture of the landscape they are operating in.

  • Model categories: completion models, chat models, embedding models, image models, audio models, and multimodal models. What each category does, what it costs, and where it fits in a production stack.
  • Context window mechanics: how tokens work, what fits in a context window, how truncation happens, and why token budgeting is a core engineering skill — not an afterthought.
  • Local-first vs cloud-first: when to keep data and computation local, when to use cloud endpoints, and how to make that decision based on sensitivity, latency, cost, and reliability requirements.
  • Tooling survey: IDEs (VS Code, Cursor, Windsurf), terminals (iTerm2, Warp), orchestration frameworks (LangChain, LlamaIndex, custom), and agent runners (Claude Code, Codex, OpenHands). What each tool does and when to use it.
  • Production vs prototype: the gap between a demo that works and a system that works reliably at scale. Error rates, latency budgets, cost projections, and failure mode analysis.
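Token budgeting from the list above can be sketched in a few lines. This is a rough illustration, not a production tokenizer: the four-characters-per-token heuristic is an approximation, and real systems should count tokens with the provider's own tokenizer.

```python
# Token budgeting sketch. The 4-chars-per-token rule is a rough
# heuristic for English prose, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    """Crude estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def fit_to_budget(chunks: list[str], budget: int) -> list[str]:
    """Keep chunks in priority order until the budget is exhausted.
    Later chunks are dropped whole rather than truncated mid-sentence."""
    kept, used = [], 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept
```

Dropping whole chunks rather than truncating is deliberate: a half-cut document in context is a common source of confident nonsense.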

Deliverables: AI foundations decision document, glossary of terms the team will use consistently, and baseline prompt templates for common tasks.

Most teams treat prompts as informal text. This module moves prompts from ad-hoc instructions to versioned contracts with predictable behavior. The goal is to reduce variance in outputs, standardize formats, and build prompt libraries that teams can share, test, and improve over time.

  • Prompt design patterns: system prompts, user prompts, few-shot examples, chain-of-thought reasoning, and structured output instructions. When to use each pattern and how to combine them for complex tasks.
  • Template versioning: prompts are code. They get version-controlled, tested against golden sets, and reviewed before deployment. A prompt change is a behavior change — treat it with the same rigor as a code change.
  • Structured output formats: JSON schemas, XML structures, typed responses, and validation rules that ensure model outputs are machine-parseable and contract-compliant.
  • Context hygiene: what to include in context, what to omit, when to reset. Stale context produces stale outputs. Redundant context wastes tokens.
  • 4D fluency framework: Delegation (what to ask), Description (how to ask), Discernment (how to evaluate), and Diligence (how to improve).
  • Acceptance checks: every prompt has pass/fail criteria. Measure, iterate, and set minimum quality bars.
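A versioned prompt template with a pass/fail acceptance check might look like the sketch below. The template text, key names, and priority values are illustrative assumptions, not a standard format.

```python
import json

# Minimal sketch of a prompt as a versioned contract. The template
# fields and the contracted JSON keys are illustrative examples.

TEMPLATE = {
    "id": "summarize-ticket",
    "version": "1.2.0",
    "system": "You are a support analyst. Reply with JSON only.",
    "user": "Summarize this ticket in JSON with keys 'summary' and 'priority':\n{ticket}",
}

def render(template: dict, **fields) -> list[dict]:
    """Render the template into chat messages."""
    return [
        {"role": "system", "content": template["system"]},
        {"role": "user", "content": template["user"].format(**fields)},
    ]

def acceptance_check(output: str) -> bool:
    """Pass/fail: output must be valid JSON with exactly the contracted keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return set(data) == {"summary", "priority"} and data["priority"] in ("low", "med", "high")
```

Because the template carries a version, a change to its text bumps the version and re-runs the golden set before deployment, the same as any code change.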

Deliverables: context contracts document, test prompt suite with acceptance checks, and prompt template library.

Breaking work into reliable agent tasks is the core skill of AI-native teams. This module teaches task decomposition that works in production — not toy examples from blog posts. The focus is on boundaries: where planning ends and execution begins, where machine autonomy stops and human judgment takes over.

  • Task decomposition: take a complex workflow and break it into atomic agent tasks. Each task has one owner, clear inputs, defined outputs, and acceptance criteria.
  • Planning, execution, and review boundaries: separate roles with separate context — never combined.
  • Delegation rules: what can be delegated to an agent, what stays with a human. Risk level, reversibility, and cost of error are the deciding factors.
  • Multi-agent coordination: patterns for running agents in parallel, sequencing agent chains, and handling conflicts.
  • Validation gates: automated checks between workflow stages. Format validation, content verification, safety filters, and business logic checks.
  • Escalation contracts: when an agent cannot complete a task, it escalates with full context — what it tried, why it failed, and what it recommends.
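The task and escalation shapes above can be made concrete as data structures. The field names below are an illustrative sketch, not a standard schema.

```python
from dataclasses import dataclass

# Sketch of an atomic agent task and its escalation record.
# Field names are illustrative, not a fixed standard.

@dataclass
class AgentTask:
    owner: str               # exactly one owner, human or agent
    inputs: dict             # everything the task needs, nothing more
    expected_output: str     # what "done" looks like
    acceptance: list[str]    # pass/fail criteria checked after execution

@dataclass
class Escalation:
    task: AgentTask
    attempts: list[str]      # what the agent tried
    failure_reason: str      # why it failed
    recommendation: str      # what the agent suggests next

def escalate(task: AgentTask, attempts: list[str], reason: str, rec: str) -> Escalation:
    """Package full context for the human reviewer: tried, failed, recommended."""
    return Escalation(task=task, attempts=attempts, failure_reason=reason, recommendation=rec)
```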

Deliverables: workflow map for one high-value use case, handoff checklist, and validation gate definitions.

When control and sensitivity matter, architecture choices get real. This module covers the full spectrum from on-device models to hybrid cloud-local deployments. The goal is to make deployment decisions based on data classification, latency needs, and compliance requirements rather than defaults.

  • Security boundaries: classify data into tiers (public, internal, confidential, regulated) and map each tier to infrastructure that matches its policy.
  • Deployment options: on-device, local server, private cloud, and hybrid. Each option has trade-offs in cost, latency, capability, and operational complexity.
  • Private model integration: running local models via llama.cpp, vLLM, or Ollama for tasks that cannot leave controlled infrastructure.
  • Fallback patterns: when a local model cannot handle a task, how to route it to a cloud model with appropriate redaction.
  • Redaction pipelines: automated removal of sensitive information before context crosses trust boundaries.
  • Compliance boundaries: data residency, retention policies, audit trails, and access controls.
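Tier-based routing with redaction at the trust boundary can be sketched as below. The tier-to-target map, the single email pattern, and the replacement token are illustrative assumptions; a real redaction pipeline covers many more identifier types.

```python
import re

# Sketch: route by data tier, redact anything that crosses the trust
# boundary to the cloud. Tiers and patterns here are illustrative.

TIER_ROUTES = {
    "public": "cloud",
    "internal": "cloud",
    "confidential": "local",
    "regulated": "local",
}

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def redact(text: str) -> str:
    """Remove obvious identifiers before text leaves controlled infra."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)

def route(tier: str, text: str) -> tuple[str, str]:
    """Pick a deployment target by data tier; redact anything cloud-bound."""
    target = TIER_ROUTES[tier]
    return (target, redact(text) if target == "cloud" else text)
```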

Deliverables: local architecture decision document, deployment option matrix, and security posture decision table.

Context is what the system knows. When context is wrong, every output is wrong — confidently. This module builds the discipline of treating context as a product with its own lifecycle, versioning, and quality standards.

  • Context capsules: scoped packages of state — per domain, per role, per session. Each capsule defines what it contains, what it excludes, how long it lives.
  • Volatile vs durable state: conversation history is volatile. Decisions and architecture docs are durable. Mixing these is the most common context bug.
  • Versioning and rollback: every context mutation is tracked. When outputs degrade, trace back and recover.
  • Cross-project continuity: lessons and frameworks transfer between projects. Project-specific noise does not.
  • State lifecycle: creation, mutation, archival, and deletion. Every piece of context has a defined lifecycle.
  • Context auditing: regular review of what is in the context window. What contributes to output quality? What is noise?

Deliverables: context index, decision history page structure, and context capsule templates.

Retrieval-augmented generation turns internal documents, code, emails, and tickets into queryable knowledge. The challenge is keeping retrieval quality high, auditable, and deterministic in production.

  • Embedding pipelines: converting documents and code into vector representations. Chunking strategies and embedding model selection.
  • Vector store selection: Pinecone, Weaviate, Chroma, pgvector, and local options. Performance, hosting, and cost trade-offs.
  • Retrieval strategies: keyword search, semantic search, hybrid search, and re-ranking. When to use each approach.
  • Quality gates: every retrieved chunk gets a relevance score. Low-relevance chunks are filtered out before entering the model's context.
  • Knowledge base maintenance: update cycles that keep the index fresh without full rebuilds.
  • Audit trails: every retrieval query is logged with results, scores, and usage. The foundation for debugging and improvement.
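The quality gate and audit trail combine naturally into one function. The 0.75 threshold and the audit-record fields below are illustrative assumptions; scores are whatever the retriever produces.

```python
# Sketch of a relevance quality gate over retrieved chunks, with an
# audit log entry per query. The 0.75 default threshold is illustrative.

AUDIT_LOG: list[dict] = []

def quality_gate(query: str, results: list[tuple[str, float]],
                 threshold: float = 0.75) -> list[str]:
    """Filter low-relevance results before they enter model context,
    and log the query, scores, and what survived."""
    kept = [chunk for chunk, score in results if score >= threshold]
    AUDIT_LOG.append({
        "query": query,
        "scores": [score for _, score in results],
        "kept": len(kept),
    })
    return kept
```

Logging the scores of everything, including what was filtered, is what makes retrieval debuggable later.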

Deliverables: retrieval pipeline specification, query quality checks, and embedding strategy document.

Every AI system fails. The question is whether failures are designed for or discovered in production. This module builds safety infrastructure: failure taxonomies, validation chains, escalation paths, and human-approval gates.

  • Failure taxonomy: silent failures, hallucinations, drift, and cascading failures. Each type requires different detection and response.
  • Validation chains: multi-step verification — format, accuracy, safety, business logic. Validation is layered, not single-point.
  • Human-in-the-loop escalation: defined triggers for when a task must be reviewed by a human before proceeding.
  • Incident response: detect, contain, communicate, fix, and prevent recurrence. Same discipline as software incident response.
  • Rollback policies: every automated action is reversible until explicitly committed.
  • Safety boundaries: content that should never be generated, actions that should never be automated. Hard limits, not suggestions.

Deliverables: safety matrix, rollback policy document, and incident response playbook.

If you cannot measure it, you cannot improve it. This module builds evaluation and observability infrastructure that turns AI operations from guesswork into engineering.

  • Golden test sets: curated input-output pairs that represent expected behavior. Every change is evaluated against golden sets before deployment.
  • Scoring rubrics: multi-dimensional evaluation across accuracy, relevance, safety, format compliance, and cost efficiency.
  • Dashboard design: real-time visibility into request volume, error rates, latency, cost per task, and quality scores.
  • Regression detection: automated comparison of current performance against historical baselines. Alert before users notice.
  • A/B evaluation: side-by-side comparison of models, prompts, and routing rules against production configuration.
  • Cost tracking: per-task, per-team, per-model cost attribution. Teams see exactly what they spend and where.
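Golden-set scoring and regression alerting reduce to two small functions. The exact-match scorer and the two-point alert margin below are illustrative choices; production rubrics are usually multi-dimensional, as the list above notes.

```python
# Sketch of golden-set evaluation plus regression detection against a
# historical baseline. Exact-match scoring and the 2-point margin are
# illustrative simplifications.

def evaluate(model_fn, golden_set: list[tuple[str, str]]) -> float:
    """Score a model against curated input/output pairs (0-100)."""
    hits = sum(1 for prompt, expected in golden_set if model_fn(prompt) == expected)
    return 100.0 * hits / len(golden_set)

def regressed(current: float, baseline: float, margin: float = 2.0) -> bool:
    """Alert when the current score falls more than `margin` points below baseline."""
    return current < baseline - margin
```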

Deliverables: evaluation set, quality dashboard specification, and regression alerting configuration.

AI systems connect to codebases, CRMs, ticketing systems, communication platforms, databases, and file systems. This module builds the connector architecture with standardized patterns and audit trails.

  • Connector design: standardized patterns for API, database, and file system connectors. Defined schema, error handling, rate limiting, retry logic.
  • Sync contracts: explicit rules for frequency, conflict resolution, and idempotency. No silent data loss, no duplicate processing.
  • Audit trails: every data movement is logged with source, destination, timestamp, and outcome.
  • Adapter patterns: isolate source system changes so the rest of the system does not need to know when APIs update.
  • Legacy system integration: wrapping older systems behind modern interfaces without requiring the legacy system to change.
  • Data quality enforcement: validation rules at ingestion boundaries. Bad data in means bad outputs out — catch it at the boundary.
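The retry logic a connector needs can be sketched as a bounded exponential backoff wrapper. The `TransientError` class and the delay values are illustrative assumptions; the key property is that exhausted retries re-raise rather than fail silently.

```python
import time

# Sketch of a connector call wrapper with bounded retries and
# exponential backoff. The error class and delays are illustrative.

class TransientError(Exception):
    """Raised by connectors for retryable failures (timeouts, rate limits)."""

def call_with_retry(fn, retries: int = 3, base_delay: float = 0.5):
    """Retry on transient errors with exponential backoff; re-raise once
    retries are exhausted so failures are never silent."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt))
```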

Deliverables: connector map, sync contract document, and adapter specification for priority integrations.

Technology without a team operating model is shelf-ware. This module defines how teams adopt, operate, and improve AI systems over time.

  • Role-based adoption paths: engineering, operations, leadership, sales, and support each have different entry points and learning priorities.
  • Capability ladders: defined progression from basic prompt use to advanced workflow design. Skill-based checkpoints, not time-based milestones.
  • Handoff standards: when work passes between human and AI, the handoff includes context, status, decision history, and next steps.
  • Ownership cadence: weekly cycles — who reviews outputs, who ships, who escalates, who updates context.
  • Shared context documents: living documents that keep the team aligned on decisions and priorities.
  • Measurement and feedback: regular reviews of AI system performance against team goals.

Deliverables: role matrix, runbook v1 with ownership cadence, and capability ladder.

Every engagement produces artifacts that can help other teams. This module builds the pipeline from internal work product to public, reusable learning material while keeping proprietary information isolated.

  • Skill authoring: turning internal processes into reusable skill definitions with clear triggers, inputs, steps, and outputs.
  • Template design: creating prompt, workflow, and evaluation templates that work across projects.
  • Redaction pipelines: automated separation of public and private content. The boundary is explicit and enforced.
  • Open academy contribution: published artifacts follow the open academy format — module structure, skill companion, and validation checklist.
  • Quality review: published artifacts go through the same validation gates as production outputs.
  • Maintenance commitments: published artifacts have owners who keep them current.

Deliverables: one public skill, one public module contribution, and a de-risked template pack.

Building the system is the first quarter. Keeping it running and improving it over time is every quarter after that. This module creates the patterns for sustainable operation.

  • Monitoring loops: automated checks on output quality, cost, latency, and error rates. Alert on anomalies, not user complaints.
  • Upgrade cadence: scheduled evaluation of new model versions, tool updates, and prompt improvements. Test in staging, evaluate, deploy with rollback.
  • Context drift detection: flag context that has not been reviewed recently. Trigger maintenance cycles.
  • Operational playbooks: documented procedures for model regression, cost spikes, quality drops, and new use case adoption.
  • Quarterly reviews: structured assessment of performance, capability growth, and alignment with goals.
  • Scale planning: patterns for expanding to new teams and use cases. Assess, pilot, measure, expand.
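Context drift detection in its simplest form is an age check against a review policy. The 30-day window and the record fields below are illustrative defaults.

```python
from datetime import datetime, timedelta, timezone

# Sketch of a context drift check: flag any context document whose last
# review is older than the policy window. 30 days is an illustrative default.

def drift_report(docs: list[dict], max_age_days: int = 30) -> list[str]:
    """Return IDs of context docs overdue for review."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    return [d["id"] for d in docs if d["last_reviewed"] < cutoff]
```

A report like this run on a schedule is what triggers the maintenance cycle, rather than waiting for output quality to degrade.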

Deliverables: scale plan, context drift policy, and operational playbook for top three scenarios.

The final module maps the entire provider ecosystem end-to-end. Not just which models to use, but how to connect providers, routing layers, terminals, coding agents, and observability tooling into a coherent stack.

  • Provider comparison: Claude, GPT, Gemini, Grok, Llama, Mistral, Qwen evaluated on capabilities, pricing, rate limits, and API stability.
  • Routing policy implementation: OpenRouter, LiteLLM, or custom routing layers that enforce task-type routing, cost limits, and fallback chains.
  • Agent integration: Claude Code, Codex CLI, Cursor, Windsurf wired into the workflow system with defined permissions and tool capabilities.
  • Multimodal coverage: image generation, vision analysis, audio transcription, and document processing across providers.
  • Observability implementation: logs, traces, metrics, and cost tracking wired into dashboards.
  • Migration planning: when providers change terms or deprecate models, the stack has a migration path. Provider lock-in is minimized through abstraction.

Deliverables: full provider catalog, terminal and agent command map by role, and multimodal fallback playbook.