AI-generated image representing some of Danny Collinson's interests in machine learning and biology.

Danny Collinson

Data Scientist and recent Caltech graduate with five years of experience building data and machine learning solutions in academic and industry settings. Skilled at developing pipelines for processing and analyzing large datasets across biological, geospatial, and astrophysics applications. Experienced in applying advanced machine learning techniques, including state-of-the-art vision and language models. As co-founder of a pre-seed AI startup, I designed and engineered our machine learning models, data pipelines, databases, and underlying infrastructure. I also have significant experience developing statistical models to extract insights from complex datasets and teaching these methods to graduate students at Caltech. With a diverse set of experiences, a deep skill set, and unmatched determination, I am equipped to tackle any challenge that I face and make a major impact on any team.

Experience

During my time at Caltech, I had the chance to take on roles outside the classroom in both academia and industry. These diverse experiences have given me a strong skill set and background to draw from, enabling me to integrate rapidly into any team and contribute value to it.

A screenshot from the class that Danny Collinson was a TA for

Teaching Assistant for Professor Justin Bois

September 2023 - December 2023

•  Instructed 85 students in the graduate-level course Introduction to Data Analysis in the Biological Sciences, developing their skills in statistical modeling, numerical optimization, data visualization, and exploratory data analysis in Python

A fluorescence microscopy image like those Danny Collinson worked on at Recursion

Data Science Intern at Recursion Pharmaceuticals

June 2023 - September 2023

•  Developed 2 new statistical metrics for analysis of large experimental datasets that were used to improve model performance
•  Implemented advanced machine learning techniques in PyTorch to improve data processing efficiency and reduce costs
•  Deployed monitoring tools to the data science and QA teams in collaboration with a 20-person cross-functional team, ensuring data integrity for downstream processing and models

A group of cells under fluorescence microscopy similar to those worked on by Danny Collinson

Computational Biology Researcher with the Parker Lab at Caltech

May 2022 - September 2022

•  Built data pipeline leveraging deep learning and statistical modeling to accelerate image processing speed by a factor of 100
•  Automated microscopy analysis and increased measurement accuracy by an estimated 10% for an upcoming publication

An image of a galaxy containing ULXs like the ones Danny Collinson studied

Computational Astrophysics Researcher with the Harrison Lab at Caltech

May 2020 - January 2021

•  Implemented statistical analysis methods using common data science tools including Python, NumPy, pandas, SciPy, and Jupyter notebooks on HPC clusters to perform source classification and deliver insights from large datasets
•  Initiated development of an end-to-end data pipeline for automated data processing and classification, improving the speed and efficiency of large-dataset handling and decision-making

Skills

With a wide variety of experiences to draw from, I have worked extensively with the core data science stack while also being exposed to many of the technologies used across industry.

  • PyTorch, scikit-learn: Advanced
  • NumPy, pandas, SciPy: Advanced
  • Matplotlib, Bokeh, Jupyter: Advanced
  • SQL, git, GitHub, Bash, Linux: Proficient
  • AWS, GCP, Docker, Slurm: Intermediate
  • Shell scripts, MCMC, APIs: Intermediate
  • LLMs, Computer Vision, Deep Learning
  • Statistical modeling, large datasets, Bayesian methods

Projects

The Mavira M logo, which represents the startup that Danny Collinson is working on

Mavira AI | Co-founder and Machine Learning Engineer

September 2023 - December 2023

•  Co-founder and Machine Learning Engineer for a pre-seed AI startup offering personalized second-hand fashion recommendations
•  Transformed a business idea into reality by designing data pipelines, ML training and inference frameworks, and PostgreSQL databases, all custom-built to develop models using PyTorch on Google Cloud and serve them to the production website

A picture of the ISS, which houses the instruments that collected the data that Danny Collinson used for this project

Temperature Prediction from GIS Spectra | Project Lead

September 2023 - December 2023

•  Led team of 3 in ML project to predict surface temperatures from spectral data, achieving error of less than 1 °C
•  Created a new ML dataset from raw NASA data in collaboration with JPL scientists to study novel research questions
•  Built model training and testing frameworks and designed CNN-based model architectures alongside Professor Katie Bouman for use with autoencoder-generated embeddings to deliver accurate predictions

A chest x-ray like the ones Danny Collinson worked on in this project

CheXpert Machine Learning Competition

March 2023 - June 2023

•  Orchestrated training of multi-GPU PyTorch models on HPC clusters using Slurm Bash scripts to enable powerful classifiers
•  Developed a pipeline to process over 224k chest x-rays, adding augmentations to further improve model diagnosis accuracy
•  Placed 3rd as a solo contestant competing against teams of 5 in Caltech's annual machine learning competition