Yash Jain

ML Scientist II, Microsoft

Welcome! I am a researcher at Microsoft working at the intersection of diffusion models (generative models used to create synthetic data) and multimodal large language models that integrate data types such as text and images. I collaborate closely with Vibhav Vineet on projects aimed at improving the capabilities of these models.

Previously, I graduated from Georgia Tech, where I completed my thesis under the mentorship of Zsolt Kira. Before that, I earned my bachelor's in Computer Science from IIT Bombay, where I received the department's Excellence in Research Award for work done under the guidance of Soumen Chakrabarti.

Feel free to connect with me through the social links below.

news

Jun 05, 2023 Joined Microsoft as an ML Scientist II at Redmond!
Aug 05, 2022 Joined the Amazon Alexa team as an Applied Scientist Intern! Excited to train large-scale audio-visual models from scratch!
May 03, 2021 Finished my B.Tech. and received the Excellence in Research Award from the department!

selected publications

  1. PEEKABOO: Interactive Video Generation via Masked-Diffusion
    Yash Jain, Anshul Nasery, Vibhav Vineet, and Harkirat Behl
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
  2. DAMEX: Dataset-aware Mixture-of-Experts for Visual Understanding of Mixture-of-Datasets
    Yash Jain, Harkirat Behl, Zsolt Kira, and Vibhav Vineet
    In Advances in Neural Information Processing Systems (NeurIPS), 2023
  3. Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
    Yash Jain, D. Chan, P. Dheram, A. Khare, O. Shonibare, and 2 more authors
    In Joint International Conference on Computational Linguistics and Language Resources and Evaluation (LREC-COLING), 2024
  4. ColloSSL: Collaborative Self-Supervised Learning for Human Activity Recognition
    Yash Jain, Chi Ian Tang, Chulhong Min, Fahim Kawsar, and Akhil Mathur
    Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (UbiComp), 2022
  5. RFID Tattoo: A Wireless Platform for Speech Recognition
    Jingxian Wang, Chengfeng Pan, Haojian Jin, Vaibhav Singh, Yash Jain, and 3 more authors
    Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (UbiComp), 2020