Hello, I'm Spandan Das

I am a researcher at DatologyAI and I'm interested in frontier model research and entrepreneurship. I was previously a computer science student at Carnegie Mellon.

Spandan Das

Hey there, I'm Spandan Das. I'm a Member of Technical Staff at DatologyAI, where I'm working on training large-scale models to enable the world's best data curation. My primary interest is in artificial intelligence and using it to create a positive impact on the world.

Outside of work, I enjoy playing tennis and basketball, reading, working out, and listening to and playing music.

Education

Carnegie Mellon University

B.S. Computer Science

Relevant Coursework: (PhD) Intro to Deep Learning [Python], Deep Reinforcement Learning [Python], (PhD) Advanced NLP [Python], Algorithm Design and Analysis, Machine Learning with Large Datasets [Python], (PhD) Convex Optimization, Intro to ML [Python], Intro to Computer Systems [C], Probability and Computing, Statistics and Computing

Thomas Jefferson High School for Science and Technology (TJHSST)

Ranked #1 U.S. High School

Relevant Coursework: Artificial Intelligence [Python], Computer Vision [C++], Machine Learning [Python], Parallel Computing [C], Probability Theory, Concrete Math, Multivariable Calculus, Linear Algebra

Clubs: Senior Computer Team (Captain), Intermediate Computer Team (Captain), Varsity Math Team

Experience

DatologyAI

July 2025 - Present

Training large-scale models that power DatologyAI’s data curation platform

NVIDIA

2024

Built an anomaly detection system for the NVIDIA Tegra chip production environment

CMU Language Technologies Institute

February 2024 - May 2024

Developed an active-learning-based approach to data-efficient LLM pretraining using data influence models
[Paper] [Poster]

Apple

2023

Created an LLM integration library to automatically filter and annotate semantically similar Siri queries

CMU AirLab

May 2022 - September 2022

Developed an online camera calibration algorithm for a drone-mounted multi-view stereo setup used to compute real-time depth maps

NASA Goddard Space Flight Center

June 2020 - August 2021

Predicted precipitation through an ice microphysics-based machine learning approach using remote sensing data from NASA's Global Precipitation Measurement Mission [Paper] [GitHub]

Projects

Yu, Z.; Das, S.; Xiong, C. MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models. 2024. https://arxiv.org/abs/2406.06046
Das, S.; Wang, Y.; Gong, J.; Ding, L.; Munchak, S.J.; Wang, C.; Wu, D.L.; Liao, L.; Olson, W.S.; Barahona, D.O. A Comprehensive Machine Learning Study to Classify Precipitation Type over Land from Global Precipitation Measurement Microwave Imager (GPM-GMI) Measurements. Remote Sens. 2022, 14, 3631. https://doi.org/10.3390/rs14153631
Pandey, R.; Das, S.; Thrush, T.; Liang, P.P.; Salakhutdinov, R.; Morency, L.-P. WinogroundVQA: Zero-shot Reasoning with Large Language Models for Compositional Visual Question Answering. 2023. [Link to Paper]
Das, S.; Samuel, V.; Noroozizadeh, S. TLDR at SemEval-2024 Task 2: T5-generated Clinical-Language Summaries for DeBERTa Report Analysis. NAACL SemEval Conference 2024. https://arxiv.org/abs/2404.09136

Awards