Quantization vs Pruning vs Distillation: Optimizing NNs for Inference (28:10)
Fine-tuning Whisper to learn my Chinese dialect (Teochew) (44:06)
LLM inference optimization: Architecture, KV cache and Flash attention (31:51)
MAMBA from Scratch: Neural Nets Better and Faster than Transformers (43:31)
Building Brain-Like Memory for AI | LLM Agent Memory Systems (40:08)
The Most Important Algorithm in Machine Learning (20:18)
Why Does Diffusion Work Better than Auto-Regression? (50:55)
Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training (29:55)