Yash Jain

Welcome! I am a researcher at Microsoft working at the intersection of diffusion models (generative models used to create synthetic data) and multimodal large language models, which integrate data types such as text and images. I collaborate closely with Vibhav Vineet on projects that aim to enhance the capabilities of these models.

Previously, I graduated from Georgia Tech, where I completed my thesis under the mentorship of Zsolt Kira. Before that, I earned my bachelor’s in Computer Science from IIT Bombay, where I received the Excellence in Research Award under the guidance of Soumen Chakrabarti.

Reach out by email if you wish to collaborate!

news

Mar 13, 2025	Local Prompt Optimization paper accepted at NAACL 2025 (Main Conference) for oral presentation!
Jun 05, 2023	Joined Microsoft as an ML Scientist II at Redmond!
Aug 05, 2022	Applied Scientist Intern on the Amazon Alexa team! Excited to train large-scale audio-visual models from scratch!
May 03, 2021	Finished my B.Tech. and received the Excellence in Research Award from the department!

selected publications

PEEKABOO: Interactive Video Generation via Masked-Diffusion

Yash Jain, Anshul Nasery, Vibhav Vineet, and Harkirat Behl

In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Invited Talk at the 5th Large Scale Holistic Video Understanding Workshop

DAMEX: Dataset-aware Mixture-of-Experts for Visual Understanding of Mixture-of-Datasets

Yash Jain, Harkirat Behl, Zsolt Kira, and Vibhav Vineet

In Advances in Neural Information Processing Systems (NeurIPS), 2023

Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition

Yash Jain, D. Chan, P. Dheram, A. Khare, O. Shonibare, and 2 more authors

In Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING), 2024

ColloSSL: Collaborative Self-Supervised Learning for Human Activity Recognition

Yash Jain, Chi Ian Tang, Chulhong Min, Fahim Kawsar, and Akhil Mathur

In Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (UbiComp), 2022

RFID Tattoo: A Wireless Platform for Speech Recognition

Jingxian Wang, Chengfeng Pan, Haojian Jin, Vaibhav Singh, Yash Jain, and 3 more authors

In Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (UbiComp), 2020

Best Long Paper Award, IJCAI 2021 SC Best Papers