AI in production is not a research project — it is a systems problem. This module builds the baseline: what models exist, what they are good at, where they fail, and how to think about AI as infrastructure rather than magic. Teams leave with a shared vocabulary and a clear picture of the landscape they are operating in.
- Model categories: completion models, chat models, embedding models, image models, audio models, and multimodal models. What each category does, what it costs, and where it fits in a production stack.
- Context window mechanics: how tokens work, what fits in a context window, how truncation happens, and why token budgeting is a core engineering skill — not an afterthought.
- Local-first vs cloud-first: when to keep data and computation local, when to use cloud endpoints, and how to make that decision based on sensitivity, latency, cost, and reliability requirements.
- Tooling survey: IDEs (VS Code, Cursor, Windsurf), terminals (iTerm2, Warp), orchestration frameworks (LangChain, LlamaIndex, custom), and agent runners (Claude Code, Codex, OpenHands). What each tool does and when to use it.
- Production vs prototype: the gap between a demo that works and a system that works reliably at scale. Error rates, latency budgets, cost projections, and failure mode analysis.
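To make the embedding-model category above concrete: an embedding model turns text into a vector, and vectors are compared with cosine similarity. The vectors below are made up for illustration; a real model returns hundreds to thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical
    direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

doc_vec   = [0.9, 0.1, 0.3]   # e.g. an embedding of "invoice processing guide"
query_vec = [0.8, 0.2, 0.4]   # e.g. an embedding of "how do I process an invoice?"
print(round(cosine_similarity(doc_vec, query_vec), 3))
```

This is the primitive behind semantic search and retrieval: embed documents once, embed the query at request time, rank by similarity.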
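The token-budgeting point above can be sketched in a few lines. The 4-characters-per-token ratio is a common rule of thumb for English text, not an exact count; a production system should use the provider's actual tokenizer. The truncation policy shown (drop oldest messages first) is one simple choice among several.

```python
def estimate_tokens(text: str) -> int:
    """Approximate token count using the ~4 chars/token heuristic."""
    return max(1, len(text) // 4)

def fit_to_budget(messages: list[str], budget_tokens: int) -> list[str]:
    """Keep the most recent messages that fit the budget, dropping the
    oldest first -- one simple truncation policy."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):        # walk newest-first
        cost = estimate_tokens(msg)
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))           # restore chronological order

history = ["a" * 400, "b" * 400, "c" * 400]   # ~100 estimated tokens each
print(fit_to_budget(history, 250))            # keeps only the two newest
```

The engineering point: truncation always happens somewhere, and if you do not choose the policy, the API or framework chooses it for you.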
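The local-vs-cloud decision above can be encoded as an explicit policy rather than tribal knowledge. The field names and thresholds here are hypothetical placeholders; each team calibrates its own from its sensitivity, latency, and volume requirements.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    contains_pii: bool            # data-sensitivity flag (placeholder criterion)
    p95_latency_budget_ms: int    # end-to-end latency requirement
    monthly_requests: int         # expected volume

def choose_deployment(w: Workload) -> str:
    """One possible decision order: sensitivity first, then latency,
    then volume-driven cost. Thresholds are illustrative only."""
    if w.contains_pii:
        return "local"            # keep sensitive data off third-party endpoints
    if w.p95_latency_budget_ms < 200:
        return "local"            # network round trips may blow the budget
    if w.monthly_requests > 10_000_000:
        return "local"            # per-token cloud pricing dominates at volume
    return "cloud"

print(choose_deployment(Workload(False, 800, 50_000)))   # -> cloud
```

Writing the policy down, even this crudely, turns a recurring debate into a reviewable artifact.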
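The cost-projection exercise in the production-vs-prototype bullet is back-of-envelope arithmetic. The rates below are placeholders, not any provider's actual pricing; substitute current published rates per million tokens.

```python
def monthly_cost(requests_per_day: int,
                 input_tokens: int, output_tokens: int,
                 usd_per_1m_input: float, usd_per_1m_output: float) -> float:
    """Project monthly spend from per-request token counts and
    per-million-token prices (assumes a 30-day month)."""
    per_request = (input_tokens * usd_per_1m_input
                   + output_tokens * usd_per_1m_output) / 1_000_000
    return per_request * requests_per_day * 30

# Example: 10k requests/day, 2k input / 500 output tokens per request,
# placeholder rates of $3 and $15 per million tokens.
print(round(monthly_cost(10_000, 2_000, 500, 3.0, 15.0), 2))  # -> 4050.0
```

Running this projection before the demo ships is exactly the gap between prototype and production that the bullet describes.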
Deliverables: AI foundations decision document, glossary of terms the team will use consistently, and baseline prompt templates for common tasks.