Education

Columbia University in the City of New York
New York, NY
B.S. Biomedical Engineering, Minor in Computer Science. GPA: 3.95/4.0
Expected 2026

Relevant Coursework: High Performance Machine Learning, Theoretical Foundations for LLMs, High Dimensional Stats for Biomedical Data, Advanced Programming, Data Structures, Calculus III, Linear Algebra, Machine Learning - Stanford (Coursera), Deep Learning Specialization - DeepLearning.AI (Coursera)

Stony Brook University (SBU)
Stony Brook, NY
B.E. Electrical Engineering. GPA: 4.0/4.0
2022-2023

Publications

Adams E, Bai L, Lee M, Yiyang Yu, AlQuraishi M. From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models, bioRxiv, Feb 2025.

Ziqi Tang, Nirali Somia, Yiyang Yu, Peter K Koo. Evaluating the representational power of pre-trained DNA language models for regulatory genomics, bioRxiv, Sep 2024.

Yiyang Yu, Shivani Muthukumar, Peter K Koo, EvoAug-TF: extending evolution-inspired data augmentations for genomic deep learning to TensorFlow, Bioinformatics, Volume 40, Issue 3, March 2024.

Rafi, A.M., Nogina, D., Penzar, D. et al. A community effort to optimize sequence-based deep learning models of gene regulation. Nat Biotechnol (2024). (Name in Consortium)

Work Experience

Leash Bio
Salt Lake City, UT (Remote)
Machine Learning Intern
Aug 2024 - Present
  • Developing a multimodal transformer model to predict protein-molecule binding affinities.
  • Optimized the training and inference pipeline of protein language models by up to 30% by reimplementing the models with FlashAttention; provided a validation method for evaluating data from wet-lab experiments.
Columbia University Irving Medical Center (Mohammed AlQuraishi Lab)
New York, NY
SURF Fellow / Undergraduate Researcher
December 2023 - Present
  • Designing algorithms to select a maximally diversified set of proteins for molecular dynamics simulations, generating data to support subsequent model development for predicting protein conformation trajectories.
  • Developing methods to extract protein conformational ensembles from AlphaFold2 through latent space exploration and systematically building a benchmark library to compare them with existing methods.
Cold Spring Harbor Laboratory (Peter Koo Lab)
Cold Spring Harbor, NY
Research Intern
March 2022 - December 2023
  • Established fine-tuning pipelines with Low-Rank Adaptation (LoRA) and Supervised Fine-Tuning (SFT) for four pre-trained DNA language models to evaluate their representational power for regulatory genomics.
  • Developed and implemented evolution-inspired data augmentations (EvoAug-TF) in TensorFlow for genomic deep neural networks and demonstrated that they improve generalization and interpretability.
  • Designed and evaluated more than 100 deep-learning models for predicting DNA promoters' expression rates using Python, TensorFlow, and WandB in the 2022 DREAM Challenge. Placed 7th on the final leaderboard.
Bioengineering Education, Application and Research (BEAR)
Stony Brook, NY
Research Assistant
December 2022 - August 2023
  • Developed a continuous individual crisis aid alert system (CICaidA), manufactured the prototypes, and tested the hardware backend interface system.
SBU Electrical Engineering Department
Stony Brook, NY
Teaching Assistant
January 2023 - May 2023
  • Guided ESE123 students in conducting lab experiments and writing lab reports; hosted weekly office hours.

Projects

Gold, NeurIPS 2024 - Predict New Medicines with BELKA, Kaggle (13/1946)
April 2024 - July 2024
  • Developed deep learning models to predict small molecule-protein interactions using the Big Encoded Library for Chemical Assessment. Implemented over 40 types of models, including CNNs, GNNs, Transformers, RNNs, and GBDTs, and arrived at a robust final solution after using all 480 submissions.

Leadership Experience

Organizing Committee Leader, Columbia Organization of Rising Entrepreneurs (CORE)
January 2024 - Present
  • Facilitating mentor connections and providing programmatic support for cohorts of 6-8 early-stage startups during Columbia's intensive 8-week Almaworks accelerator, helping teams prepare for the culminating investor pitch Demo Day to drive fundraising success.
Founder and President, Artificial Intelligence Community at SBU
December 2022 - July 2023
  • Hosted biweekly hands-on machine learning and Python workshops to cultivate members' interest in AI.
  • Collaborated with various professors and graduate students to provide research opportunities to undergraduates.

Selected Honors

  • 2024 Kaggle Competitions Master (3x Gold, 4x Silver), Current Global Ranking: 133 / 203,363 (Top 0.065%)
  • 2024 HackMIT InterSystems Challenge 1st Place ($2,000 prize)
  • 2024 MayPro Special Prize at Columbia Healthcare Hackathon
  • 2024 Gilbert Family Scholarship from Columbia Engineering
  • 2021 British Physics Olympiad Senior Physics Challenge Gold Prize (Global Ranking) Top 5%

Technical Skills

Languages: Python, JavaScript, TypeScript, Java, C, C++, HTML/CSS, LaTeX, Bash, MATLAB
Libraries: TensorFlow, PyTorch, NumPy, Pandas, WandB, OpenCV, HuggingFace, React, Node, Matplotlib
Tools: Git, GCP, HPC, Slurm, Jupyter, Fusion 360, Photoshop, Premiere Pro, Unity 3D, Blender