What Is a Model?
Foundation vocabulary for machine learning: parameters, weights, logits, training vs inference, and why neural networks work
Atomic knowledge units covering machine learning and distributed systems fundamentals. Each concept is interview-ready with real-world examples, production insights, and visual explanations.
Machine learning and AI engineering fundamentals
Foundation for softmax, cross-entropy, temperature scaling, and sampling in AI systems
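A minimal sketch of softmax with temperature scaling, assuming NumPy; the logits and temperature values are illustrative:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution.

    Dividing by temperature before the exponential flattens (T > 1)
    or sharpens (T < 1) the distribution used for sampling.
    """
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()            # subtract max for numerical stability
    exps = np.exp(scaled)
    return exps / exps.sum()

logits = [2.0, 1.0, 0.1]
print(softmax(logits))                 # sharper: most mass on the first logit
print(softmax(logits, temperature=2))  # flatter: probabilities move closer together
```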
Geometric intuitions for vectors, cosine similarity, dot products, and matrix multiplication in AI
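A small illustration of dot product versus cosine similarity, assuming plain NumPy vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: the dot product of the
    normalized vectors, ignoring their magnitudes."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction, twice the magnitude

print(a @ b)                    # dot product grows with magnitude: 28.0
print(cosine_similarity(a, b))  # direction only: 1.0
```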
How neural networks learn: gradients, chain rule, vanishing gradients, and residual connections
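A back-of-the-envelope illustration of the chain rule and why gradients can vanish through a deep stack of sigmoid layers; the depths and input value are purely illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Chain rule: the gradient through a stack of layers is the product of the
# local derivatives. Sigmoid's derivative is at most 0.25, so the product
# shrinks geometrically with depth.
x = 0.5
for depth in (1, 10, 50):
    local = sigmoid(x) * (1.0 - sigmoid(x))   # derivative of sigmoid at x
    print(depth, local ** depth)              # ~0.235, ~5e-7, ~3e-32
```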
Reference for cross-entropy, MSE, perplexity, and contrastive loss in training and evaluation
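A small worked example of cross-entropy and perplexity for a single next-token prediction, assuming NumPy; the probabilities are illustrative:

```python
import numpy as np

def cross_entropy(probs, target_index):
    """Negative log-probability the model assigned to the correct token."""
    return -np.log(probs[target_index])

probs = np.array([0.7, 0.2, 0.1])   # model's distribution over a 3-token vocabulary
loss = cross_entropy(probs, target_index=0)
print(loss)                          # ~0.357 nats
print(np.exp(loss))                  # perplexity ~1.43: "as confused as" picking among ~1.4 options
```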
ReLU, GELU, SwiGLU, softmax, and sigmoid: what they do and when to use them
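A compact sketch of a few activation functions, assuming the common tanh approximation of GELU:

```python
import numpy as np

def relu(x):
    # Zero out negatives, pass positives through unchanged.
    return np.maximum(0.0, x)

def gelu(x):
    # Widely used tanh approximation of GELU.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def sigmoid(x):
    # Squash any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x), gelu(x), sigmoid(x), sep="\n")
```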
Classification and retrieval metrics: precision, recall, F1, perplexity, MRR, and NDCG
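A minimal worked example of precision, recall, and F1 on hypothetical binary labels:

```python
def precision_recall_f1(y_true, y_pred):
    """Compute binary-classification metrics from 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # (0.667, 0.667, 0.667)
```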
LayerNorm, BatchNorm, RMSNorm: what they do, when to use them, and Pre-Norm vs Post-Norm
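A stripped-down sketch of LayerNorm versus RMSNorm over the last dimension, assuming NumPy and omitting the learnable scale and shift parameters:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-5):
    # Rescale by root-mean-square only; no mean subtraction, cheaper than LayerNorm.
    rms = np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
    return x / rms

x = np.array([[1.0, 2.0, 3.0, 4.0]])
print(layer_norm(x))
print(rms_norm(x))
```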
Dropout, weight decay, early stopping, and label smoothing to prevent overfitting
SGD, Adam, AdamW, learning rate schedules, warmup, and gradient clipping for training
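A minimal sketch of a linear-warmup-then-cosine-decay learning-rate schedule; all hyperparameter values below are illustrative:

```python
import math

def lr_at_step(step, max_lr=3e-4, warmup_steps=100, total_steps=1000, min_lr=3e-5):
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

for step in (0, 50, 100, 500, 999):
    print(step, f"{lr_at_step(step):.2e}")
```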
GPU memory, precision formats, quantization (INT4/INT8), and practical GPU selection for LLMs
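Back-of-the-envelope weight-memory arithmetic for a hypothetical 7B-parameter model at different precisions; this covers weights only, and the KV cache and activations add more on top:

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights alone, in gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

params = 7e9  # a hypothetical 7B-parameter model
for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: ~{weight_memory_gb(params, bits):.1f} GB")
# FP16: ~14.0 GB, INT8: ~7.0 GB, INT4: ~3.5 GB
```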
Session, short-term, long-term, and episodic memory for AI agents and chatbots
Design patterns and architectures
Design principle where data structures cannot be modified after creation, simplifying distributed systems by eliminating update conflicts and race conditions
Automatic switching to a backup system or replica when the primary fails, ensuring service continuity with minimal downtime
Periodically saving processing state to enable recovery from failures without reprocessing all data from the beginning
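A toy checkpointing sketch: the loop periodically persists how far it has gotten, so a restart resumes from the last checkpoint instead of reprocessing everything. The file name and checkpoint interval are illustrative:

```python
import json, os

CHECKPOINT = "progress.json"

def load_checkpoint():
    # Resume from the last saved position, or start from the beginning.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["next_index"]
    return 0

def save_checkpoint(next_index):
    with open(CHECKPOINT, "w") as f:
        json.dump({"next_index": next_index}, f)

records = [f"record-{i}" for i in range(100)]
start = load_checkpoint()
for i in range(start, len(records)):
    _ = records[i].upper()           # stand-in for the real processing work
    if (i + 1) % 10 == 0:            # checkpoint every 10 records
        save_checkpoint(i + 1)
save_checkpoint(len(records))
```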
The minimum number of nodes in a distributed system that must agree on an operation for it to be considered successful, ensuring consistency despite failures
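A tiny illustration of the usual quorum rule: with N replicas, a write quorum W and a read quorum R are guaranteed to overlap whenever W + R > N. The replica counts below are illustrative:

```python
def quorums_overlap(n, w, r):
    """True if any write quorum and any read quorum share at least one node,
    which guarantees a read sees the latest acknowledged write."""
    return w + r > n

print(quorums_overlap(n=3, w=2, r=2))  # True: classic majority quorums
print(quorums_overlap(n=3, w=1, r=1))  # False: a read may miss the latest write
```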
Operations that produce the same result when applied multiple times, critical for reliable distributed systems with retries and duplicate message handling
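A minimal sketch of making a retried operation idempotent with a client-supplied idempotency key; the in-memory dict stands in for a durable store, and the names are illustrative:

```python
processed = {}  # idempotency key -> result (a durable store in practice)

def apply_payment(key, account, amount, balances):
    """Apply a payment at most once per idempotency key; retries return the cached result."""
    if key in processed:
        return processed[key]          # duplicate delivery: no double charge
    balances[account] = balances.get(account, 0) - amount
    processed[key] = balances[account]
    return balances[account]

balances = {"alice": 100}
print(apply_payment("req-42", "alice", 30, balances))  # 70
print(apply_payment("req-42", "alice", 30, balances))  # still 70 after a retry
```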
Distributing incoming requests across multiple servers to optimize resource utilization, minimize latency, and prevent any single server from becoming a bottleneck
A consistency model where updates eventually propagate to all replicas, prioritizing availability over immediate consistency in distributed systems
How distributed systems achieve fault tolerance and high availability by replicating data from a leader node to multiple follower nodes
How distributed systems copy data across multiple nodes to achieve high availability, fault tolerance, and geographic distribution—and the fundamental trade-offs involved
An architectural pattern that stores all changes to application state as a sequence of events, enabling complete audit trails and time-travel capabilities
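A minimal event-sourcing sketch: the append-only list of events is the source of truth, and current state is derived by replaying them. The event names are illustrative:

```python
events = []  # append-only event log: the source of truth

def deposit(amount):
    events.append({"type": "Deposited", "amount": amount})

def withdraw(amount):
    events.append({"type": "Withdrew", "amount": amount})

def current_balance():
    """Derive current state by replaying every event from the beginning."""
    balance = 0
    for e in events:
        balance += e["amount"] if e["type"] == "Deposited" else -e["amount"]
    return balance

deposit(100)
withdraw(30)
deposit(5)
print(current_balance())  # 75, and the full history is still in `events`
```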
An architectural pattern that separates read and write operations into distinct models, optimizing each for its specific use case
How distributed systems agree on a single value or state across multiple nodes, enabling coordination despite failures and network partitions
Event streaming and communication
Mechanisms by which message producers receive confirmation that their messages were successfully persisted, enabling reliability tradeoffs between latency and durability
How message producers batch records to achieve high throughput by amortizing network overhead and maximizing sequential I/O
How distributed systems divide data into partitions for parallel processing, ordering guarantees, and horizontal scalability
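A sketch of key-based partition assignment: hashing the record key picks the partition, so all records with the same key land on the same partition and keep their relative order. A stable SHA-256 hash is assumed here purely for illustration; production systems typically use their own hash functions:

```python
import hashlib

def partition_for(key, num_partitions):
    """Map a record key to a partition with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

for key in ("user-1", "user-2", "user-1"):
    print(key, "->", partition_for(key, num_partitions=6))
# "user-1" always maps to the same partition, preserving per-key ordering
```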
How distributed messaging systems track consumer progress through partitions using offsets, enabling fault tolerance, exactly-once processing, and replay capabilities
How multiple consumers coordinate to process partitions in parallel with fault tolerance, automatic rebalancing, and exactly-once guarantees
How distributed messaging systems guarantee each message is processed exactly once, eliminating duplicates while ensuring atomicity across multiple operations
Data persistence and retrieval
A technique where changes are written to a durable log before being applied to the database, enabling crash recovery and replication in database systems
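A toy write-ahead-log sketch: each change is appended and flushed to the log before the in-memory state is touched, so a crash can be recovered by replaying the log. The file format and names are illustrative:

```python
import json, os

class TinyKV:
    """Toy key-value store: log first, then apply; replay on startup."""

    def __init__(self, wal_path="wal.jsonl"):
        self.wal_path = wal_path
        self.data = {}
        if os.path.exists(wal_path):             # crash recovery: replay the log
            with open(wal_path) as f:
                for line in f:
                    entry = json.loads(line)
                    self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        with open(self.wal_path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())                 # durable before we apply the change
        self.data[key] = value

store = TinyKV()
store.put("answer", 42)
print(store.data)
```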
How distributed systems use append-only logs for durable, ordered, and high-throughput data storage with time-travel and replay capabilities
How databases horizontally partition data across multiple servers for scalability, using partition keys to distribute and route data efficiently
The four guarantees that database transactions provide: Atomicity, Consistency, Isolation, and Durability—and how they enable reliable data operations
Explore our in-depth technical series that connect these concepts into production-grade knowledge.
Explore Deep Dives