About Me
I am a Data Scientist & Infrastructure/Services Unit Manager in the Data Science Research Services (DSRS) unit at the Gies College of Business, University of Illinois at Urbana-Champaign. I specialize in Data Science, Machine Learning, Deep Learning, and AI/LLM Engineering.
At DSRS, I partner with faculty and research stakeholders to deliver datasets, models, and reproducible outputs. I lead the deployment of self-hosted AI services (LLM/VLM, embeddings, text-to-speech, image generation) on A100/H100 GPUs, manage infrastructure via Kubernetes, and mentor interns across Infrastructure and Services. I also lead the development of internal tools such as our Knowledge Base (RAG), a semantic caching layer, and an AI-powered paper highlighter.
I hold two Master’s degrees from UIUC - an M.S. in Statistics (Data Science concentration, GPA 4.0) and an M.S. in Data Science + Civil Engineering (GPA 4.0) - and a B.S. in Civil Engineering from Universidad San Francisco de Quito, Ecuador.
Beyond work, I follow advancements in AI, VR/AR, fast cars, drones, and video games. I love traveling and exploring diverse cuisines.
Skills
Programming languages and tools I use for data science, machine learning, AI engineering, and infrastructure.
- Python
- R
- SQL
- Bash
- JavaScript
Data Science, ML/AI, and infrastructure tools I work with:
Data Science and Machine Learning in Python
- NumPy
- Pandas
- Matplotlib
- Scikit-learn
- PyTorch
- TensorFlow
- HuggingFace Transformers
- LangChain / LlamaIndex
- vLLM / llama.cpp
- spaCy / NLTK
- Streamlit
- Plotly
Data Science and Analysis in R
- Tidyverse
- glmnet / xgboost
- Shiny
Cloud Computing
- AWS (EC2, S3, SageMaker)
- Azure
- HPC/GPU Clusters (A100/H100)
Containerization and Deployment Automation
- Docker
- Kubernetes (Helm/kubectl/k9s)
- GitHub Actions
Data Engineering
- PostgreSQL / pgvector
- Redis
- MongoDB
- Spark (PySpark)
Data Visualization and Business Intelligence
- Tableau
- Power BI
Projects
A webcam-powered, gesture-controlled resume viewer using MediaPipe hand tracking. Navigate resume sections by tilting your hand as a dial, expand cards with an open palm, and collapse with a fist. Built with vanilla JavaScript and HTML5 Canvas - no frameworks, no build step. Try it live at fsaudm.github.io/hand.
LangExtract - LLM-Powered Structured Extraction
A tool for extracting structured information from unstructured text documents (PDFs, papers) using open-source LLMs deployed on institutional GPU clusters. Supports user-defined extraction instructions and ensures all extracted data has verifiable existence in the source document. Built with LangChain and locally hosted models.
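As a rough illustration of the "verifiable existence" idea described above, the sketch below checks that each extracted value literally appears in the source text after whitespace and case normalization. The function name `ungrounded_fields` and the exact-substring matching rule are assumptions made for this example; the actual tool may verify grounding differently.

```python
import re


def _normalize(text):
    """Collapse runs of whitespace and lowercase, so line breaks do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def ungrounded_fields(extracted, source_text):
    """Return the field names whose values cannot be found verbatim in the source.

    `extracted` is a hypothetical dict of field name -> extracted value.
    Empty values are treated as ungrounded.
    """
    source = _normalize(source_text)
    missing = []
    for field, value in extracted.items():
        needle = _normalize(str(value))
        if not needle or needle not in source:
            missing.append(field)
    return missing
```

For example, a value such as `"2019"` that never occurs in the document would be flagged, which gives a reviewer a short list of fields to check by hand.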
DSRS Knowledge Base (RAG)
Centralized institutional knowledge system spanning 70+ DSRS repositories. Prototypes state-of-the-art RAG workflows using pgvector, FAISS, and LangChain/LlamaIndex for semantic search and retrieval across codebases, documentation, and research artifacts.
Semantic Cache for LLM APIs
Prototype caching layer to reduce redundant LLM API calls and cost. Uses embedding-based similarity matching to serve cached responses for semantically equivalent queries, significantly reducing external API dependence.
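A minimal sketch of embedding-based cache lookup, assuming cosine similarity and a fixed threshold. The class name `SemanticCache`, the `threshold` value, and the `toy_embed` stand-in are all hypothetical; a real deployment would call an embedding model and would likely use a vector index instead of a linear scan.

```python
import math


class SemanticCache:
    """Toy cache keyed by embedding similarity rather than exact query text."""

    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # assumed callable: str -> list[float]
        self.threshold = threshold  # assumed similarity cutoff for a cache hit
        self.entries = []           # list of (embedding, response) pairs

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def get(self, query):
        """Return the cached response most similar to the query, or None on a miss."""
        q = self.embed(query)
        best_score, best_response = 0.0, None
        for emb, response in self.entries:
            score = self._cosine(q, emb)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, query, response):
        self.entries.append((self.embed(query), response))


def toy_embed(text):
    """Stand-in embedding (letter counts), used only to keep the example runnable."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts
```

On a miss, the caller would send the query to the LLM API and store the answer with `put`. The threshold trades hit rate against the risk of serving an answer to a question that is only superficially similar.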
LLM Deployment & Inference Optimization
Deployment and optimization of open-source AI models (LLM/VLM, embeddings, TTS, image generation) on A100/H100-class GPUs using vLLM, llama.cpp, and HuggingFace. Includes single-GPU and multi-GPU/distributed inference configurations via OpenAI-compatible endpoints for scalable batch processing.
OpenAI Batch Processing at Scale
Production workflows for 100K+ structured API calls to OpenAI models for research data processing. Includes robust error handling, rate limiting, structured output parsing, and cost optimization for large-scale research workflows.
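One common pattern for the error handling and rate limiting mentioned above is retrying with exponential backoff and jitter. The sketch below is an illustrative assumption, not the production workflow itself: `call_with_retries` is a hypothetical helper, and `fn` stands in for whatever function issues a single API request.

```python
import random
import time


def call_with_retries(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt plus jitter, then retry.

    Re-raises the last exception once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            # Assumption: every exception is retryable. Real code would catch
            # only transient errors such as rate limits or timeouts.
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

The `sleep` parameter is injectable so the helper can be exercised without actually waiting; the jitter spreads retries out so that many workers do not hit the API again at the same instant.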
Ashby Prize in Computational Science - Finalist
Delivered a real-time demonstration of an agent-based AI image generation system combining proprietary and locally hosted models. Showcased practical LLM deployment and application for research workflows at the UIUC Ashby Prize hackathon.
Implemented object detection on construction sites using YOLOv10 and RT-DETR architectures with PyTorch and Ultralytics. Compared speed and accuracy on GPU instances in AWS SageMaker, using the SODA construction dataset.
Exploration of NVIDIA Inference Microservices (NIM) with Llama 3.1 (8B and 405B) for reasoning and multilingual queries, plus StabilityAI’s Stable Diffusion XL for text-to-image generation.
Implementation of a Gaussian Mixture Model using EM algorithm, and a Hidden Markov Model through Baum-Welch and Viterbi algorithms. Detailed R markdown walkthrough available here.
Implementation of a Linear SVM using the Pegasos algorithm (Shalev-Shwartz et al., 2011). R markdown with details here.
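For readers unfamiliar with Pegasos, here is a short pure-Python sketch of its basic stochastic sub-gradient update (single-example variant, no bias term, optional projection step omitted). It is not the R implementation referenced above; the function names and default hyperparameters are assumptions chosen for illustration.

```python
import random


def pegasos_train(X, y, lam=0.01, n_iters=2000, seed=0):
    """Train a linear SVM with Pegasos-style updates.

    X: list of feature vectors, y: labels in {-1, +1}, lam: L2 regularization strength.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, n_iters + 1):
        i = rng.randrange(len(X))   # sample one training example uniformly
        eta = 1.0 / (lam * t)       # step size decays as 1 / (lambda * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        # Shrink w: the gradient step for the L2 regularizer.
        w = [(1.0 - eta * lam) * wj for wj in w]
        # If the sampled example violates the margin, also step toward y_i * x_i.
        if margin < 1.0:
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Classify by the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

A bias can be approximated by appending a constant feature to every input vector.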
Data analysis of historical sales from 45 Walmart stores with Robust Linear Regression on SVD-smoothed data, forecasting future sales and identifying top/bottom performing stores and departments.
Replication and extension of Professor Hadi Meidani’s Kaczmarz algorithm for real-time, short-term traffic prediction. Explored spatial vs. time-based ordering of multivariate vectors, finding spatial ordering achieved superior prediction accuracy for extreme weather conditions.
Fully-connected neural network from scratch using only NumPy, trained on MNIST (94% accuracy). Four-layer architecture with 164K parameters. All activation functions, forward/backward propagation implemented from scratch. Integrated with Weights & Biases for experiment tracking.
Awards & Certifications
Ashby Prize in Computational Science Hackathon: 3rd Place & Best Presentation
Center for Artificial Intelligence Innovation at the National Center for Supercomputing Applications
May 2024
Awarded 3rd place (out of 50 participants) in the Ashby Prize in Computational Science Hackathon, a competition focused on using LLMs as a front-end to computational workflows. We developed an end-to-end agent-based system capable of Retrieval Augmented Generation (RAG), integrating an API-based model (GPT-4) with a locally hosted Llama 3 model. Received Best Presentation recognition for our delivery and real-time demonstration!
Illinois Statistics Datathon 2024: 4th Place
Department of Statistics, UIUC, with Synchrony Financial
April 2024
Awarded 4th place (out of 345 participants) in the Illinois Statistics Datathon 2024. My team and I performed extensive data pre-processing, exploratory data analysis (EDA), and feature engineering to identify and build key features from Synchrony’s Interactive Voice Response (IVR) System data, a “real-world” dataset with millions of observations. We used Logistic Regression (for its interpretability) to evaluate the effects of measured and engineered features on reducing the number of “floored” calls, leading to a data-driven recommendation with savings potential of $300,000 per 1% reduction in calls. You can read about our experience in this post.
Certifies competence in the completion of Workshop/Data Parallelism: How to Train Deep Learning Models on Multiple GPUs. Through this course, I successfully trained and deployed a set of Convolutional Neural Networks on the Fashion MNIST dataset using Python, CUDA, and the DDP library for Distributed Data Parallelism with 4 GPUs. I also applied advanced techniques such as gradual warmup, batch normalization, and the NovoGrad optimizer to enhance model performance and training efficiency. Great experience!
Experience
Data Science Research Services (DSRS) - Gies College of Business
January 2024 - Present
Data Scientist & Infrastructure/Services Unit Manager (Jan 2025 - Present)
- Partner with faculty/research stakeholders to scope research needs, deliver datasets, models, and clear written/visual outputs with reproducible code.
- Lead internal tools and prototypes: DSRS Knowledge Base (RAG across 70+ repos), semantic caching layer, AI-powered paper highlighter (LLM/VLM + OCR).
- Manage and mentor 4+ interns across Infrastructure and Services: screening, onboarding, code reviews, and milestone reviews.
- Deploy and operate self-hosted AI services (LLM/VLM, embeddings, TTS, image generation) on A100/H100 GPUs via OpenAI-compatible endpoints; optimize single/multi-GPU inference.
- Deploy and support applications on Kubernetes+Helm; manage databases (Postgres+pgvector), caching (Redis), CI/CD, and monitoring.
Data Science Intern - Infrastructure (Jan 2024 - Dec 2024)
- Supported DevOps for DSRS applications and infrastructure across Azure and on-prem: containerization, health checks, CI/CD (GitHub Actions).
- Executed scoped data science tasks end-to-end (data pulls, cleaning, analysis) with reproducible scripts/notebooks.
- Evaluated LLM deployments (transformers/vLLM): latency/throughput benchmarks to inform DSRS AI strategy.
Capital Programs - Facilities and Services
August 2023 - December 2023
Project Manager
- Led 2 end-to-end projects valued between $500K and $5M, ensuring on-time and on-budget delivery.
- Coordinated cross-functional teams and ran stakeholder check-ins to keep scope, schedule, and handoffs aligned.
University of Illinois at Urbana-Champaign
August 2021 - May 2024
Teaching Assistant
- Instructed over 200 students across 5 semesters and 13 sections.
- Consistently ranked as Excellent Teacher: Fall 2021, Spring 2022, Fall 2022, and Spring 2023 (4-time winner).
- Courses Taught: Intermediate Spanish, Spanish Composition
Universidad San Francisco de Quito
January 2018 - May 2019
Teaching Assistant
- Recipient of the undergraduate Teaching Assistantship (2018). Ranked as Excellent Teacher.
- Courses Taught: Topography, Geometrical Design of Roads
Education
University of Illinois at Urbana-Champaign
Master of Science in Statistics
2024
Concentration: Data Science
GPA: 4.0/4.0
Selected courses:
- Applied Machine Learning
- Statistical Learning
- Deep Learning
- Mathematical Statistics
- Statistical Modeling
- Advanced Data Analysis
- Time Series Analysis
- Big Data Analytics
University of Illinois at Urbana-Champaign
Master of Science in Data Science + Civil and Environmental Engineering
2021 - 2023
Concentration: Construction Engineering and Management
GPA: 4.0/4.0
Selected courses:
- Data Science for CEE
- Machine Learning for CEE
- Construction Optimization
- Construction Data Modeling
Universidad San Francisco de Quito
Bachelor of Science in Civil Engineering
2017 - 2021
GPA: 3.9/4.0
Thesis: “Productivity in Construction: Measurement Methodologies in Ecuador”. Score: 99/100
Languages
I highly value languages for the rich cultural connections they offer, as I believe they enable deeper engagement and insights across different regions and cultures. You can talk to me (limited in some cases!) in:
- Spanish - Native Language
- English - Full Professional Proficiency
- Italian - Professional Working Proficiency
- French - Beginner
- Arabic - Beginner
A Little More About Me
Here are some additional details about me:
- I recently achieved a 1-year streak on Duolingo for practicing Italian, Arabic, and Korean!
- So far, I have traveled to 18 countries and 24 states in the US.
- I am committed to continuous learning and growth, and I am always looking for opportunities to expand my knowledge and skills. I cannot recommend Kurzgesagt, StatQuest, and Brilliant enough. Andrej Karpathy and his series are amazing, and I am very excited about Eureka Labs!