Education

Columbia University in the City of New York
New York, NY
B.S. Biomedical Engineering, Minor in Computer Science. GPA: 3.95/4.0
Expected 2026

Relevant Coursework: High Performance Machine Learning, Theoretical Foundations for LLMs, High Dimensional Stats for Biomedical Data, Advanced Programming, Data Structures, Calculus III, Linear Algebra, Machine Learning - Stanford (Coursera), Deep Learning Specialization - DeepLearning.AI (Coursera)

Stony Brook University (SBU)
Stony Brook, NY
B.E. Electrical Engineering. GPA: 4.0/4.0
2022-2023

Publications

Adams E, Bai L, Lee M, Yiyang Yu, AlQuraishi M. From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models, bioRxiv, Feb 2025.

Ziqi Tang, Nirali Somia, Yiyang Yu, Peter K Koo. Evaluating the representational power of pre-trained DNA language models for regulatory genomics, bioRxiv, Sep 2024.

Yiyang Yu, Shivani Muthukumar, Peter K Koo, EvoAug-TF: extending evolution-inspired data augmentations for genomic deep learning to TensorFlow, Bioinformatics, Volume 40, Issue 3, March 2024.

Rafi, A.M., Nogina, D., Penzar, D. et al. A community effort to optimize sequence-based deep learning models of gene regulation. Nat Biotechnol (2024). (Name in Consortium)

Selected Honors

  • 2025 Goldwater Scholar
  • 2024 Kaggle Competitions Master (3x Gold, 4x Silver); current global ranking: 133 / 203,363 (top 0.065%)
  • 2024 HackMIT InterSystems Challenge, 1st Place ($2,000 prize)
  • 2024 MayPro Special Prize at Columbia Healthcare Hackathon
  • 2024 Gilbert Family Scholarship from Columbia Engineering
  • 2021 British Physics Olympiad Senior Physics Challenge, Gold Prize (top 5% in global ranking)

Work Experience

Leash Bio
Salt Lake City, UT (Remote)
Machine Learning Intern
August 2024 - Present
  • Developing a multimodal transformer model to predict protein-molecule binding affinities.
  • Optimized the training and inference pipelines of protein language models by up to 30% by reimplementing the models with FlashAttention, and provided a validation method for evaluating data from wet-lab experiments.
Columbia University Irving Medical Center (Mohammed AlQuraishi Lab)
New York, NY
SURF Fellow / Undergraduate Researcher
December 2023 - Present
  • Designing algorithms to select a maximally diversified set of proteins for molecular dynamics simulations, generating data to support subsequent model development for predicting protein conformation trajectories.
  • Developing methods to extract protein conformational ensembles from AlphaFold2 through latent space exploration, and systematically creating a benchmark library for comparison with existing methods.
Cold Spring Harbor Laboratory (Peter Koo Lab)
Cold Spring Harbor, NY
Research Intern
March 2022 - December 2023
  • Established fine-tuning pipelines with Low-Rank Adaptation (LoRA) and Supervised Fine-Tuning (SFT) for four pre-trained DNA language models to evaluate their representational power for regulatory genomics.
  • Developed and implemented evolution-inspired data augmentations (EvoAug-TF) in TensorFlow for genomic deep neural networks and demonstrated that they improve generalization and interpretability.
  • Designed and evaluated more than 100 deep-learning models for predicting DNA promoters' expression rates using Python, TensorFlow, and WandB in the 2022 DREAM Challenge. Placed 7th on the final leaderboard.
Bioengineering Education, Application and Research (BEAR)
Stony Brook, NY
Research Assistant
December 2022 - August 2023
  • Developed a continuous individual crisis aid alert system (CICaidA), manufactured the prototypes, and tested the hardware backend interface system.
SBU Electrical Engineering Department
Stony Brook, NY
Teaching Assistant
January 2023 - May 2023
  • Guided ESE123 students in conducting lab experiments and writing lab reports; hosted weekly office hours.

Projects

Gold, NeurIPS 2024 - Predict New Medicines with BELKA, Kaggle (13/1946)
April 2024 - July 2024
  • Developed deep learning models to predict small molecule-protein interactions using the Big Encoded Library for Chemical Assessment (BELKA). Implemented over 40 types of models, including CNNs, GNNs, Transformers, RNNs, and GBDTs, and arrived at a robust final solution after using all 480 submissions.

Leadership Experience

Co-President, Systems Biology Initiative (SBI)
December 2024 - Present
  • Co-leading the executive board to organize the annual Biotech Summit, produce podcasts with professionals from industry and academia, establish computational biology data science competition teams, and oversee multiple project committees promoting interdisciplinary education at the intersection of biology, medicine, and computer science.
HardCORE Initiative Co-Lead, Columbia Organizing of Rising Entrepreneurs (CORE)
January 2024 - Present
  • Organizing and facilitating a series of biotech speaker events and investor dinners, connecting industry professionals with Columbia student entrepreneurs, resulting in mentorship opportunities and potential funding partnerships for early-stage biotech ventures.
Founder and President, Artificial Intelligence Community at SBU
December 2022 - July 2023
  • Hosted biweekly hands-on machine learning and Python workshops to cultivate members' interest in AI.
  • Collaborated with various professors and graduate students to provide research opportunities to undergraduates.

Technical Skills

Languages: Python, JavaScript, TypeScript, Java, C, C++, HTML/CSS, LaTeX, Bash, MATLAB
Libraries: TensorFlow, PyTorch, NumPy, Pandas, WandB, OpenCV, Hugging Face, React, Node.js, Matplotlib
Tools: Git, GCP, HPC, Slurm, Jupyter, Fusion 360, Photoshop, Premiere Pro, Unity 3D, Blender