Hey there, I'm Spandan Das. I'm a Member of Technical Staff at DatologyAI, where I'm working on training large-scale models to enable the world's best data curation. My primary interest is in artificial intelligence and using it to create a positive impact on the world.
Outside of work, I enjoy playing tennis and basketball, reading, working out, and listening to and playing music.
Relevant Coursework: (PhD) Intro to Deep Learning [Python], Deep Reinforcement Learning [Python], (PhD) Advanced NLP [Python], Algorithm Design and Analysis, Machine Learning with Large Datasets [Python], (PhD) Convex Optimization, Intro to ML [Python], Intro to Computer Systems [C], Probability and Computing, Statistics and Computing
Relevant Coursework: Artificial Intelligence [Python], Computer Vision [C++], Machine Learning [Python], Parallel Computing [C], Probability Theory, Concrete Math, Multivariable Calculus, Linear Algebra
Clubs: Senior Computer Team (Captain), Intermediate Computer Team (Captain), Varsity Math Team
Training large-scale models that power DatologyAI’s data curation platform
Developed an active-learning-based approach to data-efficient LLM pretraining using data influence models
[Paper] [Poster]
Created an LLM integration library to automatically filter and annotate semantically similar Siri queries
Developed an online camera calibration algorithm for a drone-mounted multi-view stereo setup used to compute real-time depth maps
Predicted precipitation through an ice microphysics-based machine learning approach using remote sensing data from NASA's Global Precipitation Measurement Mission [Paper] [GitHub]
Yu, Z.; Das, S.; Xiong, C. MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models. 2024. https://arxiv.org/abs/2406.06046
Das, S.; Wang, Y.; Gong, J.; Ding, L.; Munchak, S.J.; Wang, C.; Wu, D.L.; Liao, L.; Olson, W.S.; Barahona, D.O. A Comprehensive Machine Learning Study to Classify Precipitation Type over Land from Global Precipitation Measurement Microwave Imager (GPM-GMI) Measurements. Remote Sens. 2022, 14, 3631. https://doi.org/10.3390/rs14153631
Pandey, R.; Das, S.; Thrush, T.; Liang, P.P.; Salakhutdinov, R.; Morency, L.-P. WinogroundVQA: Zero-shot Reasoning with Large Language Models for Compositional Visual Question Answering. 2023. [Link to Paper]
Das, S.; Samuel, V.; Noroozizadeh, S. TLDR at SemEval-2024 Task 2: T5-generated Clinical-Language Summaries for DeBERTa Report Analysis. NAACL SemEval Conference 2024. https://arxiv.org/abs/2404.09136