Taehoon Ha

My focus is on building scalable data pipelines and applying machine learning to extract actionable insights from complex datasets. At Stony Brook Medicine, I design modeling strategies that integrate survey and clinical data, applying clustering, regression, and mixed-effects approaches in SAS and R to deliver reproducible, accurate analytics.

Previously, I directed the Biostatistics Core at Cold Spring Harbor Laboratory, overseeing all analytics for the Cancer Center. I managed end-to-end statistical strategy and data science support for projects funded by multiple NIH mechanisms (R01, R21, P01), working with datasets spanning preclinical models, patient records, and high-dimensional molecular data. In this role, I designed scalable pipelines in R/Python, guided experimental design, and applied machine learning approaches, from regression frameworks to classification models, to accelerate discovery, improve data quality, and ensure reproducibility across dozens of cross-disciplinary teams.

Before that, I was a research assistant at Cornell, where I applied data science and statistical modeling to large-scale oncology and metabolic health studies. My work focused on integrating clinical, genomic, and survey data, using methods such as Bayesian modeling and predictive analytics to identify key risk factors and generate actionable insights for breast cancer, colorectal cancer, obesity, and inflammation research.

I hold an MS in Business Analytics (MQM) from Duke University’s Fuqua School of Business, where I worked on data-driven strategy and optimization projects, and an MS in Biostatistics & Data Science from Cornell University. This combined background enables me to bridge business analytics and machine learning with applied data science.

My resume is available HERE.