Education
Relevant Coursework: High Performance Machine Learning, Theoretical Foundations for LLMs, High Dimensional Stats for Biomedical Data, Advanced Programming, Data Structures, Calculus III, Linear Algebra, Machine Learning - Stanford (Coursera), Deep Learning Specialization - DeepLearning.AI (Coursera)
Publications
Adams E, Bai L, Lee M, Yu Y, AlQuraishi M. From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models. bioRxiv, Feb 2025.
Tang Z, Somia N, Yu Y, Koo PK. Evaluating the representational power of pre-trained DNA language models for regulatory genomics. bioRxiv, Sep 2024.
Yu Y, Muthukumar S, Koo PK. EvoAug-TF: extending evolution-inspired data augmentations for genomic deep learning to TensorFlow. Bioinformatics, Volume 40, Issue 3, March 2024.
Selected Honors
- 2025 Goldwater Scholar
- 2024 Kaggle Competitions Master (3x Gold, 4x Silver); current global ranking: 133 / 203,363 (top 0.065%)
- 2024 HackMIT InterSystems Challenge, 1st Place ($2,000 prize)
- 2024 MayPro Special Prize at Columbia Healthcare Hackathon
- 2024 Gilbert Family Scholarship from Columbia Engineering
- 2021 British Physics Olympiad Senior Physics Challenge, Gold Prize (top 5% of global rankings)
Work Experience
- Developing a multimodal transformer model to predict binding affinities of protein-small molecule interactions.
- Accelerated training and inference pipelines for protein language models by up to 30% by reimplementing the models with FlashAttention, and provided a validation method for evaluating data from wet-lab experiments.
- Designing algorithms to select a maximally diverse set of proteins for molecular dynamics simulations, generating data to support subsequent model development for predicting protein conformational trajectories.
- Developing methods to extract protein conformational ensembles from AlphaFold2 through latent space exploration, and building a benchmark library for systematic comparison with existing methods.
- Established fine-tuning pipelines with Low-Rank Adaptation (LoRA) and Supervised Fine-Tuning (SFT) for four pre-trained DNA language models to evaluate their representational power for regulatory genomics.
- Developed and implemented evolution-inspired data augmentations (EvoAug-TF) in TensorFlow for genomic deep neural networks and demonstrated improvements in generalization and interpretability.
- Designed and evaluated more than 100 deep learning models for predicting DNA promoter expression rates using Python, TensorFlow, and WandB in the 2022 DREAM Challenge; placed 7th on the final leaderboard.
- Developed a continuous individual crisis-aid alert system (CICaidA), fabricated prototypes, and tested the hardware backend interface.
- Guided ESE123 students in conducting lab experiments and writing lab reports; hosted weekly office hours.
Projects
- Developed deep learning models to predict small molecule-protein interactions using the Big Encoded Library for Chemical Assessment (BELKA) dataset. Implemented over 40 types of models, including CNNs, GNNs, Transformers, RNNs, and GBDT models, and iteratively refined a robust final solution across all 480 allotted submissions.
Leadership Experience
- Co-leading the executive board to organize the annual Biotech Summit and produce podcasts with professionals from industry and academia; established computational biology data science competition teams and oversaw multiple project committees promoting interdisciplinary education at the intersection of biology, medicine, and computer science.
- Organizing and facilitating a series of biotech speaker events and investor dinners that connect industry professionals with Columbia student entrepreneurs, leading to mentorship opportunities and potential funding partnerships for early-stage biotech ventures.
- Hosted biweekly hands-on machine learning and Python workshops to cultivate members' interest in AI.
- Collaborated with various professors and graduate students to provide research opportunities to undergraduates.