How Transformers Learn Causal Structure with Gradient Descent
