Quantization vs Pruning vs Distillation: Optimizing NNs for Inference (28:10)
Fine-tuning Whisper to learn my Chinese dialect (Teochew) (50:55)
Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training (9:28)
How ChatGPT Cheaps Out Over Time (20:18)
Why Does Diffusion Work Better than Auto-Regression? (25:21)
Model Distillation: Same LLM Power but 3240x Smaller (44:06)
LLM inference optimization: Architecture, KV cache and Flash attention (1:01:20)
tinyML Talks: A Practical Guide to Neural Network Quantization (31:51)