Hello, I'm Spandan Das

I am a research engineer at DatologyAI and I'm interested in frontier model research and entrepreneurship. I was previously a computer science student at Carnegie Mellon.

Hey there, I'm Spandan Das. I'm a Member of Technical Staff at DatologyAI, where I'm working on training large-scale models to enable the world's best data curation. My primary interest is in artificial intelligence and using it to create a positive impact on the world.

Outside of work, I enjoy playing tennis and basketball, reading, working out, and listening to and playing music.

Education

Carnegie Mellon University

B.S. Computer Science

Relevant Coursework: (PhD) Intro to Deep Learning [Python], Deep Reinforcement Learning [Python], (PhD) Advanced NLP [Python], Algorithm Design and Analysis, Machine Learning with Large Datasets [Python], (PhD) Convex Optimization, Intro to ML [Python], Intro to Computer Systems [C], Probability and Computing, Statistics and Computing

Thomas Jefferson (TJHSST)

Ranked #1 U.S. High School

Relevant Coursework: Artificial Intelligence [Python], Computer Vision [C++], Machine Learning [Python], Parallel Computing [C], Probability Theory, Concrete Math, Multivariable Calculus, Linear Algebra

Clubs: Senior Computer Team (Captain), Intermediate Computer Team (Captain), Varsity Math Team

Experience

DatologyAI

July 2025 - Present

Training large-scale models that power DatologyAI’s data curation platform

NVIDIA

2024

Built an anomaly detection system for the NVIDIA Tegra chip production environment

CMU Language Technologies Institute

February 2024 - May 2024

Developed an active-learning-based approach to data-efficient LLM pretraining using data influence models
[Paper] [Poster]

Apple

2023

Created an LLM integration library to automatically filter and annotate semantically similar Siri queries

CMU AirLab

May 2022 - September 2022

Developed an online camera calibration algorithm for a drone-mounted multi-view stereo setup used to compute real-time depth maps

NASA Goddard Space Flight Center

June 2020 - August 2021

Predicted precipitation through an ice microphysics-based machine learning approach using remote sensing data from NASA's Global Precipitation Measurement Mission [Paper] [GitHub]

Projects

Yu, Z.; Das, S.; Xiong, C. MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models. 2024. https://arxiv.org/abs/2406.06046

Das, S.; Wang, Y.; Gong, J.; Ding, L.; Munchak, S.J.; Wang, C.; Wu, D.L.; Liao, L.; Olson, W.S.; Barahona, D.O. A Comprehensive Machine Learning Study to Classify Precipitation Type over Land from Global Precipitation Measurement Microwave Imager (GPM-GMI) Measurements. Remote Sens. 2022, 14, 3631. https://doi.org/10.3390/rs14153631

Pandey, R.; Das, S.; Thrush, T.; Liang, P.P.; Salakhutdinov, R.; Morency, L.-P. WinogroundVQA: Zero-shot Reasoning with Large Language Models for Compositional Visual Question Answering. 2023. [Link to Paper]

Das, S.; Samuel, V.; Noroozizadeh, S. TLDR at SemEval-2024 Task 2: T5-generated Clinical-Language Summaries for DeBERTa Report Analysis. NAACL SemEval Conference 2024. https://arxiv.org/abs/2404.09136

Awards

USA Computing Olympiad (USACO)

Gold Division

2021 USA Math Olympiad

Top 2% (Top 550 out of 30,000+ contestants; 232.5 USAMO Index)

Carnegie Mellon University

Dean's List

2022 Goldman Sachs Quantathon

Honorable Mention

2021 CMU Math and Informatics Competition

8th Place Team (out of 220+)

2019 VCU High School Programming Competition

1st Place Team (out of 50+ teams)

2021 PurpleComet Math Meet

Honorable Mention Team (out of 3,000+ teams); 1st in state
