Farid Saud

Data Scientist

Email: gfs3@illinois.edu

Phone: 224-266-4646

Web: fsaudm.github.io

About Me

I am a huge Machine Learning enthusiast, currently pursuing a Master of Science in Statistics at the University of Illinois at Urbana-Champaign. I specialize in Data Science, Machine Learning, and Deep Learning. My work explores the impact of these technologies across various industries, including Finance, Technology, Civil Engineering, and Real Estate, aiming to generate significant value.

Currently, I am a team lead at the Data Science Research Services unit, where we drive research within the GIES College of Business by assisting students, faculty, and staff with their data science, machine learning, computational infrastructure, and data acquisition needs. As I advance in my career as a Data Scientist, I am eager to dive deeper into Data Science itself and to explore its applications and impact in the fields mentioned above, as well as the unique challenges that other fields present.

Beyond academics, I like to follow advancements in Artificial Intelligence, Virtual & Augmented Reality (VR/AR), fast cars, drones, and video games. I also love traveling and exploring diverse cuisines. In addition, I hold a Master of Science in Civil and Environmental Engineering from the University of Illinois at Urbana-Champaign.

Skills

The following are the programming languages that I use for data science and analysis (and have used for engineering). I am always looking to learn more about these (and new) tools to expand my skills. I have experience with:

  • Python
  • R
  • SQL
  • Julia
  • Matlab
  • C++

I am also proficient with essential and popular tools for modern problem-solving and innovation in Data Manipulation, Analytics, Statistics, Machine Learning and Deep Learning. These technologies are:

Data Science and Machine Learning in Python

  • NumPy
  • Pandas
  • Matplotlib
  • Seaborn
  • statsmodels
  • Scikit-learn
  • PyTorch
  • TensorFlow
  • HuggingFace's Transformers, Accelerate, BitsandBytes
  • Langchain
  • Streamlit
  • Plotly

Data Science and Analysis in R

  • Tidyverse
  • Shiny
  • caret
  • glmnet
  • xgboost

Cloud Computing

  • AWS, SageMaker, S3
  • Azure, Machine Learning Studio

Containerization and Deployment Automation

  • Docker
  • Kubernetes, at a basic level
  • GitHub Actions

Data Visualization and Business Intelligence

  • Tableau
  • Power BI

Projects

Object Detection using YOLOv10 and RT-DeTr in AWS

Implemented object detection for construction sites using the YOLOv10 and RT-DeTr architectures with the PyTorch and Ultralytics libraries. The project compares these two state-of-the-art architectures in terms of speed and accuracy, providing insight into their suitability for real-time object detection tasks and the inherent tradeoff between accuracy and speed. The models were trained on an AWS SageMaker instance equipped with GPUs and evaluated on a subset of SODA (A large-scale open site object detection dataset for deep learning in construction). The original dataset can be found here. The results and analysis, as well as visualizations of the label distribution and training process, are available in the GitHub repository.

LLMs and other big Deep Learning Models with Nvidia NIMs

This project taps into NVIDIA’s NIMs (NVIDIA Inference Microservices) and presents two examples of running large deep learning models. The repository includes examples using the newly released Llama 3.1, run in both its 8B and 405B versions, and shows how to set up the models to perform tasks like reasoning and multi-lingual queries. Additionally, for text-to-image generation, the project features StabilityAI’s Stable Diffusion XL, wrapped in a simple Python function to generate high-quality images. The provided Jupyter notebooks (llama3_1.ipynb and stable_diffusion.ipynb) are easy to use, making it straightforward to explore LLMs and diffusion models. Note that you will need an API key from NVIDIA, which you can obtain from NVIDIA’s API catalog. The full implementation, including code, results, and setup instructions, is available on GitHub.

GMM and HMM - An implementation from scratch in R

This is an implementation of a Gaussian Mixture Model using the Expectation-Maximization algorithm, and a Hidden Markov Model through the Baum-Welch and Viterbi algorithms. A detailed, step-by-step R markdown file with the code can be found here.

Linear SVM using SGD - Implementation from scratch in R

This project is an implementation of an efficient Linear Support Vector Machine trained with Stochastic Gradient Descent via the Pegasos algorithm, as proposed by Shalev-Shwartz et al. (2011) in their paper “Pegasos: Primal Estimated sub-GrAdient SOlver for SVM”. You can find the associated R Markdown file, with details on the algorithm and the implementation, here.
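For readers curious about the core idea, here is a minimal Python sketch of a Pegasos-style update. It is an illustration under stated assumptions, not the code from the R Markdown file: the function names, the toy data, and the hyperparameters are hypothetical, and the bias term and the optional projection step are omitted.

```python
import random


def pegasos_train(X, y, lam=0.01, n_iters=2000, seed=0):
    """Train a linear SVM (no bias term) with Pegasos-style SGD updates.

    X is a list of feature lists, y a list of labels in {-1, +1}.
    lam, n_iters and seed are illustrative defaults, not tuned values.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, n_iters + 1):
        i = rng.randrange(len(X))            # sample one training example
        eta = 1.0 / (lam * t)                # step size 1 / (lambda * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        shrink = 1.0 - 1.0 / t               # equals 1 - eta * lam
        w = [shrink * wj for wj in w]        # regularization shrinkage
        if margin < 1.0:                     # hinge loss active: move toward y_i * x_i
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Hypothetical, linearly separable toy data.
X = [[2.0, 1.0], [1.5, 2.0], [3.0, 2.5], [-2.0, -1.0], [-1.5, -2.5], [-3.0, -1.5]]
y = [1, 1, 1, -1, -1, -1]

w = pegasos_train(X, y)
predictions = [predict(w, x) for x in X]
```

Each iteration touches a single example, which is what keeps the per-step cost low regardless of the dataset size.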

Walmart Store Sales Forecasting

Analysis of historical sales data from 45 Walmart stores spread across different regions, and a forecast of future sales using a predictive model: Robust Linear Regression on data smoothed through SVD. The results are presented as quantitative insights and visual displays highlighting the stores and departments projected to have the highest and lowest sales in the upcoming months. You can find an R Markdown file with the code and the analysis here.

A computationally fast online model for accurate prediction of post-disaster traffic conditions

The original research by Professor Hadi Meidani, from the Uncertainty Quantification (UQ) research group, introduced the application of the Kaczmarz algorithm for real-time, short-term traffic prediction. The proposed method stands out due to its computational efficiency, adaptability to changing traffic conditions, and its capability to swiftly incorporate newly streamed data. By leveraging polynomial approximations and exploring various estimation techniques, the paper achieved robust predictions, specifically for atypical traffic scenarios. Expanding upon this foundational work, I sought to critically assess and replicate the methodology, identifying potential areas of refinement. This exploration resulted in the implementation of different ways of ordering the “multivariate vector.” The findings show that spatial ordering yields higher prediction accuracy than time-based ordering. This deeper analysis further improved real-time traffic prediction in extreme weather conditions. You can find this project here.
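To illustrate why the Kaczmarz algorithm suits streamed data, here is a minimal Python sketch of its row-action update. It is a generic textbook version under stated assumptions, not the method from the paper or from my project: the function name, the plain linear features (instead of polynomial approximations), and the toy stream are all hypothetical.

```python
def kaczmarz_update(coef, features, target):
    """Project the coefficient vector onto the hyperplane defined by one
    new observation, i.e. onto {c : <features, c> = target}."""
    residual = target - sum(c * f for c, f in zip(coef, features))
    norm_sq = sum(f * f for f in features)
    if norm_sq == 0.0:
        return list(coef)                    # nothing to learn from an all-zero row
    step = residual / norm_sq
    return [c + step * f for c, f in zip(coef, features)]


# Hypothetical stream generated by target = 2 * x1 - 1 * x2.
stream = [
    ([1.0, 1.0], 1.0),
    ([2.0, 1.0], 3.0),
    ([1.0, -1.0], 3.0),
]

coef = [0.0, 0.0]
for _ in range(20):                          # replay the stream a few times
    for features, target in stream:
        coef = kaczmarz_update(coef, features, target)
```

Each update uses only the newest observation and costs time linear in the number of features, so newly streamed data can be absorbed without refitting from scratch.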

Neural Network using NumPy for MNIST

This project showcases the implementation of a fully-connected neural network from scratch using only NumPy. The network is trained on the MNIST dataset, and categorizes digits into one of ten classes (0-9) with 94% accuracy. The network features a four-layer architecture, with a total of 164,013 parameters. All activation and helper functions, such as ReLU, Softmax, and Cross-entropy Loss are implemented from scratch. The forward and backward propagation processes are vectorized, and the weights and biases are updated using the gradient descent algorithm. This project is integrated with Weights & Biases for comprehensive experiment tracking. You can dive into the full implementation and results on GitHub, and explore the training logs and hyperparameter optimization on Weights & Biases embedded in the repo. I recently added a notebook with a similar implementation for Fashion MNIST.
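As a taste of what "from scratch" means here, the following is a minimal NumPy sketch of the activation and loss helpers named above. It is an illustration under stated assumptions, not the code from the repository: the function names, the small numerical-stability constant, and the toy inputs are hypothetical.

```python
import numpy as np


def relu(z):
    """Element-wise ReLU activation."""
    return np.maximum(0.0, z)


def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the integer class labels."""
    n = labels.shape[0]
    return -np.log(probs[np.arange(n), labels] + 1e-12).mean()


def softmax_cross_entropy_grad(probs, labels):
    """Gradient of the mean cross-entropy loss with respect to the logits."""
    n = labels.shape[0]
    grad = probs.copy()
    grad[np.arange(n), labels] -= 1.0        # probs minus one-hot labels
    return grad / n


# Hypothetical toy batch: 2 samples, 3 classes.
hidden = relu(np.array([[-1.0, 2.0]]))
logits = np.array([[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]])
labels = np.array([0, 2])

probs = softmax(logits)
loss = cross_entropy(probs, labels)
grad = softmax_cross_entropy_grad(probs, labels)
```

Because every function operates on whole arrays, the same helpers work unchanged for a batch of any size, which is the point of vectorizing the forward and backward passes.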

Awards & Certifications

ASHBY Prize in Computational Science Hackathon: 3rd Place & Best Presentation

Center for Artificial Intelligence Innovation at the National Center for Supercomputing Applications

May 2024

Awarded 3rd place (out of 50 participants) in the ASHBY Prize in Computational Science Hackathon, a competition focused on using LLMs as a front-end to computational workflows. We developed an end-to-end agent-based system capable of Retrieval Augmented Generation (RAG), integrated with an API-based model (GPT-4) and a locally hosted Llama 3 model. Received the best presentation recognition for our great delivery and real-time demonstration!

Illinois Statistics Datathon 2024: 4th Place

Department of Statistics, UIUC, with Synchrony Financial

April 2024

Awarded 4th place (out of 345 participants) in the Illinois Statistics Datathon 2024. My team and I performed extensive data pre-processing, exploratory data analysis (EDA), and feature engineering to identify and build key features for Synchrony’s Interactive Voice Response (IVR) System, a “real-world” dataset with millions of observations. We employed Logistic Regression (for its interpretability) to evaluate the effects of measured and engineered features on reducing the number of “floored” calls, resulting in a data-driven decision with a savings potential of $300,000 per 1% reduction in calls. You can read about our experience in this post.

Certifies competence in the completion of Workshop/Data Parallelism: How to Train Deep Learning Models on Multiple GPUs. Through this course, I successfully trained and deployed a set of Convolutional Neural Networks on the Fashion MNIST dataset using Python, CUDA, and the DDP library for Distributed Data Parallelism with 4 GPUs. I also applied advanced techniques such as gradual warmup, batch normalization, and the NovoGrad optimizer to enhance model performance and training efficiency. Great experience!

Experience

University of Illinois at Urbana-Champaign

August 2021 - Present

DSRS Team Lead (2024-Present)

  • Manage and contribute to Data Science projects with Python, R, and SQL code for faculty and students of the GIES College of Business.
  • Build and maintain cloud services and databases, set up machine learning environments, and manage on-premises inference for deep learning models.
  • Manage DSRS infrastructure: Azure, AWS, Docker, and Kubernetes.

Teaching Assistant (2021-2024)

  • Instructed over 200 students across 5 semesters and 13 sections.
  • Led lecture sessions and office hours, prepared and delivered engaging material, and graded papers and exams.
  • Consistently ranked as Excellent Teacher: Fall 2021, Spring 2022, Fall 2022, and Spring 2023 (4-time winner).
  • Courses Taught: Intermediate Spanish, Spanish Composition

Project Manager (2023)

  • Managed design and construction execution, including project scope, contracts, budget, scheduling, and site visits. Participated across multiple project phases, from initial planning and design to construction and eventual closeout.
  • Regularly engaged in meetings with contractors and clients, fostering effective communication and ensuring all parties were aligned on project objectives and deliverables.
  • Worked on projects with budgets ranging from $500,000 to $5,000,000, with scopes identified as medium complexity.

Construction Management Intern (2022)

  • Supported supervision and coordination of project management services. This included assistance with project budgeting, cost estimates, scheduling, and site visits.
  • Assisted with the procurement of professional services, ensuring compliance with statutory requirements, as well as evaluating certificates.
  • Monitored and reviewed federal, state, and University rules and regulations affecting contract administration and procurement of professional services, materials and labor.

Universidad San Francisco de Quito

January 2018 - May 2019

Teaching Assistant

  • Recipient of the undergraduate Teaching Assistantship, Universidad San Francisco de Quito (2018).
  • Ranked as Excellent Teacher.
  • Courses Taught: Topography, Geometrical Design of Roads

Education

University of Illinois at Urbana-Champaign

Master of Science in Statistics

2023 - 2024 (expected)

Concentration: Data Science
GPA: 4.0/4.0
Selected courses:

  • Applied Machine Learning
  • Statistical Learning
  • Deep Learning
  • Mathematical Statistics
  • Statistical Modeling
  • Advanced Data Analysis
  • Time Series Analysis
  • Big Data Analytics
  • Machine Learning on the Cloud

University of Illinois at Urbana-Champaign

Master of Science in Data Science + Civil and Environmental Engineering

2021 - 2023

Concentration: Construction Engineering and Management
GPA: 3.95/4.0
Selected courses:

  • Data Science for CEE
  • Machine Learning for CEE
  • Construction Optimization
  • Construction Data Modeling
  • Construction Engineering
  • Construction Planning
  • Cost Analysis

Universidad San Francisco de Quito

Bachelor of Science in Civil Engineering

2017 - 2021

Concentration: Civil Engineering
GPA: 3.5/4.0
Thesis: “Productivity in Construction: Measurement Methodologies in Ecuador”. Score: 99/100
Selected courses:

  • Numerical Analysis
  • Probability & Statistics
  • Earthquake Engineering
  • Structural Analysis
  • Geotechnical Engineering
  • Transportation Engineering

The complete curriculum can be found here.

Languages

I highly value languages for the rich cultural connections they offer, as I believe they enable deeper engagement and insights across different regions and cultures. You can talk to me (limited in some cases!) in:

  • Spanish - Native Language
  • English - Full Professional Proficiency
  • Italian - Professional Working Proficiency
  • French - Beginner
  • Arabic - Beginner

A Little More About Me

Here are some additional details about me:

  • I recently achieved a 1-year streak on Duolingo for practicing Italian, Arabic, and Korean!
  • So far, I have traveled to 18 countries and 24 states in the US.

  • I am committed to continuous learning and growth, and I am always looking for opportunities to expand my knowledge and skills. I cannot recommend Kurzgesagt, StatQuest, and Brilliant enough. Andrej Karpathy and his series are amazing, and I am very excited about Eureka Labs!