Scaling up Masked Diffusion Models on Text
1:02:30
Stable Diffusion 3: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
40:14
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
52:39
WARP: On the Benefits of Weight Averaged Rewarded Policies
35:52
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
32:31
Round and Round We Go! What makes Rotary Positional Encodings useful?
45:05
Byte Latent Transformer: Patches Scale Better Than Tokens
45:48
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
38:55