Education
Relevant Coursework: High Performance Machine Learning, Theoretical Foundations for LLMs, High Dimensional Statistics for Biomedical Data, Advanced Programming, Data Structures, Calculus III, Linear Algebra, Machine Learning - Stanford (Coursera), Deep Learning Specialization - DeepLearning.AI (Coursera)
Publications
Adams E, Bai L, Lee M, Yu Y, AlQuraishi M. From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models. bioRxiv, Feb 2025.
Tang Z, Somia N, Yu Y, Koo PK. Evaluating the representational power of pre-trained DNA language models for regulatory genomics. bioRxiv, Sep 2024.
Yu Y, Muthukumar S, Koo PK. EvoAug-TF: extending evolution-inspired data augmentations for genomic deep learning to TensorFlow. Bioinformatics, Volume 40, Issue 3, March 2024.
Work Experience
- Developing a multimodal transformer model to predict binding affinities between proteins and small molecules.
- Optimized training and inference pipelines for protein language models by up to 30% by reimplementing the models with FlashAttention (attention sketch below), and provided a validation method for evaluating data from wet-lab experiments.
- Designing algorithms to select a maximally diverse set of proteins for molecular dynamics simulations (selection sketch below), generating data to support subsequent model development for predicting protein conformation trajectories.
- Developing methods to extract protein conformational ensembles from AlphaFold2 through latent-space exploration, and building a systematic benchmark library to compare against existing methods.
- Established fine-tuning pipelines with Low-Rank Adaptation (LoRA) and Supervised Fine-Tuning (SFT) for four pre-trained DNA language models to evaluate their representational power for regulatory genomics (LoRA sketch below).
- Developed and implemented evolution-inspired data augmentations (EvoAug-TF) in TensorFlow for genomic deep neural networks and demonstrated improvements in generalization and interpretability (augmentation sketch below).
- Designed and evaluated more than 100 deep-learning models for predicting DNA promoter expression levels using Python, TensorFlow, and WandB in the 2022 DREAM Challenge; placed 7th on the final leaderboard.
- Developed a continuous individual crisis aid alert system (CICaidA), manufactured prototypes, and tested the hardware backend interface.
- Guided ESE123 students through lab experiments and lab-report writing; hosted weekly office hours.
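
The FlashAttention change referenced above amounts to swapping a naive attention implementation for a fused kernel. A minimal sketch in PyTorch, assuming (batch, heads, length, head_dim) tensors; the actual protein models and shapes are not shown here:

    import torch
    import torch.nn.functional as F

    def naive_attention(q, k, v):
        # Materializes the full (L x L) attention matrix: O(L^2) memory.
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
        return torch.softmax(scores, dim=-1) @ v

    def fused_attention(q, k, v):
        # On supported GPUs, PyTorch dispatches this call to a FlashAttention
        # kernel that computes the same result blockwise, never building the
        # L x L matrix.
        return F.scaled_dot_product_attention(q, k, v)

    q = k = v = torch.randn(2, 8, 512, 64)  # (batch, heads, length, head_dim)
    assert torch.allclose(naive_attention(q, k, v), fused_attention(q, k, v), atol=1e-5)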
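The diversity-selection bullet does not specify the criterion; one standard approach is greedy max-min (farthest-point) sampling over feature vectors. A sketch over hypothetical per-protein embeddings:

    import numpy as np

    def farthest_point_selection(embeddings, k, seed=0):
        # embeddings: (n, d) array, e.g. one feature vector per protein.
        # Greedily adds the candidate whose minimum distance to the
        # already-selected set is largest (max-min diversity).
        n = embeddings.shape[0]
        rng = np.random.default_rng(seed)
        selected = [int(rng.integers(n))]
        # Minimum distance from each point to the selected set so far.
        dists = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
        for _ in range(k - 1):
            nxt = int(np.argmax(dists))
            selected.append(nxt)
            dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
        return selected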
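For the LoRA fine-tuning pipelines, the Hugging Face peft library is one common way to wire this up. A sketch; the checkpoint name is hypothetical, and target_modules depends on the layer names of the specific DNA language model being adapted:

    # pip install transformers peft
    from transformers import AutoModelForSequenceClassification
    from peft import LoraConfig, get_peft_model

    # Hypothetical checkpoint name standing in for a pre-trained DNA LM.
    model = AutoModelForSequenceClassification.from_pretrained(
        "some-org/dna-lm-base", num_labels=2
    )
    lora_config = LoraConfig(
        r=8,                # rank of the low-rank update matrices
        lora_alpha=16,      # scaling factor applied to the update
        lora_dropout=0.05,
        target_modules=["query", "value"],  # attention projections to adapt
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the small LoRA adapters train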
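One of the EvoAug-TF augmentations, random reverse-complement, is compact enough to sketch. This is a simplified rendering, assuming one-hot DNA with channel order A, C, G, T; the published package implements this and several other augmentations:

    import tensorflow as tf

    def random_reverse_complement(x, p=0.5):
        # x: (batch, length, 4) one-hot DNA, channel order A, C, G, T (assumed).
        # Reversing the length axis reverses the sequence; reversing the
        # channel axis maps A<->T and C<->G, so reversing both gives the
        # reverse complement.
        rc = tf.reverse(x, axis=[1, 2])
        flip = tf.random.uniform([tf.shape(x)[0], 1, 1]) < p
        return tf.where(flip, rc, x)  # apply to each sequence with prob. p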
Projects
- Developed deep learning models to predict small molecule-protein interactions on the Big Encoded Library for Chemical Assessment (BELKA) dataset. Implemented over 40 types of DL models, including CNNs, GNNs, Transformers, RNNs, and GBDT models, converging on a robust final solution after using all 480 allowed submissions (baseline sketch below).
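
As an illustration of the simpler end of those 40+ model types, a 1D-CNN baseline over tokenized SMILES in Keras; the vocabulary size, sequence length, and three-output head are assumptions standing in for the actual competition setup:

    import tensorflow as tf
    from tensorflow.keras import layers

    VOCAB, MAX_LEN = 64, 160  # hypothetical SMILES token vocabulary and length

    def build_cnn_baseline():
        # Embeds SMILES tokens, stacks Conv1D blocks, and predicts a
        # binding probability per protein target.
        inp = layers.Input(shape=(MAX_LEN,), dtype="int32")
        x = layers.Embedding(VOCAB, 64)(inp)
        for filters in (64, 128, 256):
            x = layers.Conv1D(filters, 5, padding="same", activation="relu")(x)
            x = layers.MaxPooling1D(2)(x)
        x = layers.GlobalMaxPooling1D()(x)
        x = layers.Dense(256, activation="relu")(x)
        out = layers.Dense(3, activation="sigmoid")(x)  # one output per target
        model = tf.keras.Model(inp, out)
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=[tf.keras.metrics.AUC()])
        return model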
Leadership Experience
- Facilitating mentor connections and providing programmatic support for cohorts of 6-8 early-stage startups in Columbia's intensive 8-week Almaworks accelerator, helping teams prepare for the culminating Demo Day investor pitch.
- Hosted biweekly hands-on machine learning and Python workshops to cultivate members' interest in AI.
- Collaborated with professors and graduate students to provide research opportunities for undergraduates.
Selected Honors
- 2024 Kaggle Competitions Master (3x Gold, 4x Silver); current global ranking: 133 / 203,363 (top 0.065%)
- 2024 HackMIT InterSystems Challenge, 1st Place ($2,000 prize)
- 2024 MayPro Special Prize at Columbia Healthcare Hackathon
- 2024 Gilbert Family Scholarship from Columbia Engineering
- 2021 British Physics Olympiad Senior Physics Challenge, Gold Prize (global top 5%)