Video
Efficient Distributed Orthonormal Optimizers for Large-Scale Training
Kwangjun delivered a 50-minute technical talk on recent advances in orthonormal update methods for large-scale AI model training. These methods have been rapidly gaining attention in the community and are emerging as a strong successor to AdamW…