Resume
Basics
| Name | Yuan (Gabriel) Zhang |
| Position | Research Assistant - Computer Vision & Agricultural Robotics at USDA ARS |
Education
- Sep 2023 - Dec 2024 | Evanston, IL
Master of Science
Northwestern University
Machine Learning and Data Science
- Deep Learning
- Large Foundation Models
- Natural Language Processing
- Predictive Analytics
- Cloud Engineering
- Aug 2019 - Jun 2023 | Irvine, CA
Bachelor of Science
University of California, Irvine
Data Science
- Machine Learning
- Probability and Statistics
- Algorithms
- Big Data Analytics
- Information Retrieval
- Database Management
Work
- May 2025 - Present | Fargo, ND
Research Assistant - Computer Vision & Agricultural Robotics
US Department of Agriculture (USDA) Agricultural Research Service (ARS)
- Developing a real-time computer vision system for autonomous weed detection and precision spray control on the Farm-ng Amiga robotic platform
- Architecting a multi-process system with IPC and threading for low-latency integration of depth sensing, GPS, and camera feeds
- Deploying PyTorch models on an NVIDIA Jetson Xavier with TensorRT FP16, achieving an 8.6x inference speedup (718 ms → 84 ms)
- Engineering a dual-camera GUI that integrates weed detection and obstacle avoidance with an automated safety stop
- Sep 2024 - Dec 2024 | Evanston, IL
Data Science Consultant
Kavi Global
- Developed an LLM-powered lead-qualification chatbot that routes website visitors through 3 persona flows to drive consultation bookings; in production since February 2025
- Achieved 81% response accuracy across 180+ test cases by engineering GPT-4o prompts and dynamically leveraging website content as a RAG knowledge base
- Designed the intent-classification and prospect-qualification workflow, from greeting through service matching and case-study recommendations, with customized human-handoff escalation for complex queries
- Recommended MS Copilot Studio over Dialogflow, Rasa, and Botpress for its cost efficiency and Azure OpenAI integration
- Jul 2024 - Sep 2024 | Waltham, MA
Data Science Engineer Intern
SS&C Intralinks
- Engineered a microservice-ready language-identification pipeline for scanned PDFs using CLIP and PyTorch, achieving 98% accuracy across 14 languages at ~0.3 s/doc; projected to save $300K/year in OCR costs
- Designed an image-preprocessing step that selects text-heavy pages and filters out blank and image-heavy ones before inference, raising top-1 accuracy from 89% to 98%
- Benchmarked 30 vision-language models on AWS GPU instances with Hugging Face Transformers, comparing accuracy, latency, and memory trade-offs
- Implemented adaptive multilingual prediction with configurable confidence thresholds for OCR and RAG pipelines
- Jun 2022 - Mar 2024 | Irvine, CA
Full Stack Software Developer (Volunteer)
Irvine Canaan Christian Community Church
- Built a full-stack enrollment and attendance management system for the children's ministry using Python, Flask, and MySQL, in use for 3+ years (since December 2022) and processing 80+ check-ins/check-outs weekly
- Developed 8 data management dashboards and 15+ role-based pages serving 4 user roles with real-time validation
- Refactored REST APIs, shrinking the codebase by 3,600+ lines through SQLAlchemy optimization and a DRY architecture
- Integrated a barcode SDK for automated badge printing, streamlining check-in for staff and guardians
- Jul 2021 - Aug 2021 | Shanghai, China
Data Analyst Intern
Shanghai Daiqian Information Technology Co., Ltd.
- Conducted consumer research for pre-launch product testing, performing survey analysis and building internal tooling
- Analyzed 190+ consumer surveys in R, finding 9/10 satisfaction, an 86% recommendation rate, and 87% purchase intent
- Created 20+ visualizations using ggplot2 (satisfaction radar charts, demographic profiles, efficacy heatmaps, usage treemaps), adopted in stakeholder presentations and consumer testing reports
- Built an internal trial-management system on Alibaba Cloud with automated tester registration, feedback reminders, and sample tracking for product development workflows
Projects
- Oct 2024 - Dec 2024
paper2summary
- Developed a scientific paper summarization system by LoRA fine-tuning Llama-3.2-1B-Instruct on 20K arXiv papers, training only 0.07% of parameters (~850K) with 10K-token context support (~28 hours on a single RTX A6000)
- Achieved +51% ROUGE-2 and +37% ROUGE-3 improvements over the base model on a 6,440-sample test set
- Oct 2023 - Dec 2023
Dillard's Black Friday Return Prediction
- Built an ML pipeline to predict whether Black Friday purchases would be kept or returned, to help reduce return-related costs for Dillard's
- Queried 160M+ point-of-sale (POS) records, applied SMOTE to address class imbalance, and trained a K-means + logistic regression ensemble
- Achieved 78% purchase precision and 58% return recall, with a projected ROI of 227% (~$590K)
Awards
- 2023
Dean's Honor List
University of California, Irvine
Skills
| Languages | Python, SQL, R, C++ |
| ML/DL | PyTorch, TensorFlow, Transformers, scikit-learn, XGBoost, PySpark |
| Computer Vision | OpenCV, Torchvision, Ultralytics, TensorRT, NVIDIA Jetson |
| NLP | Sentence Transformers, spaCy, NLTK, BERTopic |
| Data Platforms | PostgreSQL, Spark, Databricks, BigQuery, MongoDB, Neo4j, Pinecone |
| Tools | AWS, Docker, Linux, Git, CI/CD, W&B, Streamlit, Tableau |