Quantization vs Pruning vs Distillation: Optimizing NNs for Inference (28:10)
Fine-tuning Whisper to learn my Chinese dialect (Teochew) (50:55)
Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training (9:28)
How ChatGPT Cheaps Out Over Time (20:18)
Why Does Diffusion Work Better than Auto-Regression? (25:21)
Model Distillation: Same LLM Power but 3240x Smaller (44:06)
LLM inference optimization: Architecture, KV cache and Flash attention (1:01:20)
tinyML Talks: A Practical Guide to Neural Network Quantization (31:51)