ML Systems Engineer · Data Infrastructure · UC San Diego

Building data systems that actually scale.

ML Systems Engineer with 2 years of production experience building the infrastructure AI runs on - real-time pipelines, feature stores, model serving, and the observability layer that keeps it all reliable. From streaming data at scale to deploying models in the cloud, I own the full production loop.

About me
Atharva Hirulkar

I build the systems that make ML work in production.

I'm an ML Systems Engineer with hands-on experience building production data infrastructure at serious scale. At State Street Corporation, I owned Snowflake pipelines processing ~5M records/day, monitored ~$1B in daily transaction flows in real time, and engineered ISO 20022-compliant payment workflows.

I've architected multi-cloud infrastructure across AWS, Azure, and OCI using Terraform, and automated operations with Ansible, cutting provisioning time by ~40%. I care about reliability, not just velocity.

Currently pursuing an MS in Data Science at UC San Diego (GPA 3.80), deepening expertise in scalable data systems, statistical NLP, and optimization. I hold a granted copyright in biomedical time-series ML.

Data Systems
ETL · Snowflake · Kafka · Spark · Airflow · PostgreSQL · TimescaleDB · Neo4j · Qdrant · Data Warehousing
Machine Learning
PyTorch · TensorFlow · Scikit-learn · MLflow · MLOps · NLP · Transformers · Hugging Face · Feature Engineering · Anomaly Detection · Time-Series Forecasting
Cloud & DevOps
AWS · Azure · Oracle Cloud · Terraform · Ansible · Docker · Kubernetes · CI/CD · Grafana
Programming & Tools
Python · SQL · Shell / Bash · Pandas · NumPy · NetworkX · Plotly · Matplotlib
Career

Where I've worked & what I shipped.

Jul 2023 - Aug 2025
State Street Corp.
Bangalore, India
Data Engineer
  • Built and owned SQL + Python ETL pipelines on Snowflake processing 5M+ records/day, ensuring 99.9%+ uptime compliance with enterprise SLAs across production data workflows.
  • Architected and provisioned multi-cloud data infrastructure (OCI, AWS, Azure) using Terraform IaC, enabling scalable, zero-downtime deployments across production data pipelines.
  • Automated data environment provisioning and configuration using Ansible playbooks and Shell scripting, reducing manual operational overhead by 40% across production data environments.
  • Implemented Azure DevOps CI/CD for data services, cutting release cycles by 40% (5 days → 3 days) and achieving 99.8% deployment success across cloud data workflows.
Python · SQL · Snowflake · Terraform · Ansible · Azure DevOps · AWS
Jan 2023 - Jul 2023
State Street Corp.
Hyderabad, India
Payment Systems Analyst - Intern
  • Built real-time monitoring dashboards for $1B daily transaction flows (LYNX, CHIPS, TARGET2), detecting 98% of pipeline failures within 30 seconds and sustaining 99.9% settlement uptime.
  • Engineered ISO 20022 payment data workflows in Shell and PL/SQL, raising data-validation accuracy from 96% to 99.4% on high-volume financial messaging pipelines.
PL/SQL · Shell · ISO 20022 · Financial Messaging
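The bullet above doesn't say which validation rules the workflows applied. As one illustration of a field-level check that ISO 20022 payment messages commonly need, here is the IBAN mod-97 checksum from ISO 13616, sketched in Python rather than the Shell/PL/SQL the role actually used. It is an assumed example, not the production logic.

```python
def _iban_digits(s):
    """Map letters A-Z to 10-35 and keep digits as-is (int(ch, 36) does exactly this)."""
    return "".join(str(int(ch, 36)) for ch in s)


def is_valid_iban(iban):
    """ISO 13616 check: move the first four characters to the end, then require mod 97 == 1."""
    s = iban.replace(" ", "").upper()
    # Reject empty, too-short, or non-ASCII-alphanumeric input before converting.
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    return int(_iban_digits(s[4:] + s[:4])) % 97 == 1


print(is_valid_iban("GB82 WEST 1234 5698 7654 32"))  # widely used valid sample IBAN
print(is_valid_iban("GB82 WEST 1234 5698 7654 33"))  # same IBAN with one digit changed
```

A production check would also verify the country-specific length and BBAN format; this sketch covers only the checksum, which catches every single-character substitution.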
"All models are wrong, but some are useful!" - George Box
Work

Things I've built.

01
SignalStack
End-to-end ML systems pipeline. Polygon.io WebSocket → Kafka → PySpark Structured Streaming → TimescaleDB feature store → three concurrent models (LSTM, LightGBM, Isolation Forest) → FastAPI at <10ms p99. Point-in-time correct features, PSI drift monitoring, live Grafana dashboards.
Kafka · PySpark · TimescaleDB · Grafana · Polygon.io · Python
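PSI drift monitoring compares a feature's live distribution against a reference window, bin by bin. Below is a minimal, stdlib-only sketch of the standard Population Stability Index, the sum over bins of (actual - expected) · ln(actual / expected). The quantile binning, the 10-bin default, and the `eps` floor for empty bins are assumptions for illustration, not SignalStack's actual settings.

```python
import math
import random
from bisect import bisect_right


def quantile_edges(reference, n_bins=10):
    """Interior bin edges at evenly spaced quantiles of the reference sample."""
    ref = sorted(reference)
    return [ref[len(ref) * k // n_bins] for k in range(1, n_bins)]


def bin_shares(values, edges, eps=1e-6):
    """Share of values falling in each bin; eps floors empty bins to avoid log(0)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [max(c / len(values), eps) for c in counts]


def psi(reference, current, n_bins=10):
    """Population Stability Index: sum of (actual - expected) * ln(actual / expected)."""
    edges = quantile_edges(reference, n_bins)
    expected = bin_shares(reference, edges)
    actual = bin_shares(current, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


# Synthetic example: a reference window, a fresh sample from the same distribution,
# and a sample whose mean has drifted by half a standard deviation.
random.seed(0)
ref = [random.gauss(0, 1) for _ in range(5000)]
same = [random.gauss(0, 1) for _ in range(5000)]
shifted = [random.gauss(0.5, 1) for _ in range(5000)]
print(f"no drift: {psi(ref, same):.3f}  drifted: {psi(ref, shifted):.3f}")
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift; in practice, alert thresholds are usually tuned per feature.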
02
FraudLens
Production ML system for real-time fraud detection with explainable AI, built on the IEEE-CIS fraud dataset. Stack: XGBoost · LightGBM · MLflow · FastAPI · AWS ECS Fargate + ALB · Qdrant · Airflow · GitHub Actions CI/CD.
XGBoost · LightGBM · FastAPI · AWS ECS Fargate · MLflow · Airflow
03
CosmeTik
Multi-database skincare analytics platform. Unified product + review pipeline across PostgreSQL, Neo4j (entity graphs), and Qdrant (vector search). 1M+ reviews, hybrid ML recommender.
PostgreSQL · Neo4j · Qdrant · Python · NLP · Recommendation Engine
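The description doesn't specify how the hybrid recommender combines its signals. One common pattern, sketched here purely as an assumption, is a linear blend of a content-similarity score (the kind a vector search over review embeddings returns) with a normalized rating signal. The weight `alpha`, the embeddings, and the item ids are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_rank(query_vec, items, alpha=0.7):
    """Rank item ids by alpha * content similarity + (1 - alpha) * normalized rating."""
    scored = [
        (alpha * cosine(query_vec, vec) + (1 - alpha) * rating, item_id)
        for item_id, (vec, rating) in items.items()
    ]
    return [item_id for _, item_id in sorted(scored, reverse=True)]


# Hypothetical catalog: item id -> (embedding, rating scaled to [0, 1]).
items = {
    "serum_a": ([0.9, 0.1, 0.0], 0.60),
    "cream_b": ([0.1, 0.9, 0.2], 0.95),
    "toner_c": ([0.8, 0.2, 0.1], 0.40),
}
query = [1.0, 0.0, 0.0]
print(hybrid_rank(query, items))           # content-heavy blend
print(hybrid_rank(query, items, alpha=0))  # rating only
```

Setting `alpha` to 1 or 0 recovers pure content-based or pure rating-based ranking, which makes the blend weight easy to tune offline.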
04
PulseML · Copyright L-114951/2022 (Gov. of India)
Wearable ECG/PPG physiological monitoring. LSTM-based arrhythmia detection with custom IoT data pipeline. Core ML framework for real-time anomaly alerts on biosignals.
LSTM · PyTorch · Signal Processing · IoT · Python · Feature Engineering
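LSTM classifiers such as PulseML's arrhythmia detector typically consume fixed-length segments of the raw biosignal rather than an unbounded stream. Below is a minimal sketch of that pre-processing step, assuming overlapping windows with per-window z-score normalization. The 250 Hz sampling rate, 2 s window, and 1 s stride are hypothetical, not PulseML's actual parameters.

```python
from statistics import mean, pstdev


def sliding_windows(signal, window, stride):
    """Split a 1-D signal into overlapping fixed-length windows (trailing partial window dropped)."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [signal[start:start + window]
            for start in range(0, len(signal) - window + 1, stride)]


def zscore(window):
    """Normalize one window to zero mean / unit variance; flat windows map to zeros."""
    mu = mean(window)
    sigma = pstdev(window)
    if sigma == 0:
        return [0.0] * len(window)
    return [(x - mu) / sigma for x in window]


# Hypothetical setup: 10 s of samples at 250 Hz, 2 s windows with a 1 s stride.
fs = 250
signal = [0.0] * (fs * 10)  # placeholder for real ECG/PPG samples
windows = [zscore(w) for w in sliding_windows(signal, window=2 * fs, stride=fs)]
print(len(windows))
```

Each normalized window would then become one input sequence for the LSTM. The overlap trades extra compute for lower alert latency, because a new window becomes available every stride.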
05
Multilingual Speech Transcription
Video-to-text pipeline supporting 100+ languages. BART-Large-CNN abstractive summarization with multilingual encoder. Flask API serving transcription + translation.
BART · Hugging Face · NLP · Flask · Python · Transformers
06
Seismic Risk Atlas · DataHacks 2026
🏆 Best Use of Marimo & Sphinx
Block-level earthquake loss estimator for LA County's 2,498 census tracts. Physics-based ground-motion simulations (500 M6.7 scenarios, Scripps Institution) fused with Zillow ZHVI housing values and ACS census demographics via KD-tree spatial joins on Databricks Spark. FEMA HAZUS fragility curves applied per building code era; Monte Carlo aggregation over all scenarios preserves damage-function nonlinearity. XGBoost damage model GPU-trained on NVIDIA L40s (R² 0.99996, trained in 0.70 s). Interactive Leaflet choropleth map served via FastAPI, with OpenAI-generated plain-English summaries per tract.
Python · Marimo · Databricks · Spark · FastAPI · Leaflet.js · XGBoost · Sphinx · Feature Engineering · OpenAI
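The Monte Carlo point above is easy to miss. Because fragility curves are nonlinear, averaging ground motion across scenarios and then applying the curve gives a different answer from applying the curve per scenario and averaging the losses. Below is a minimal sketch, assuming a lognormal fragility curve (the functional form HAZUS uses) driven by peak ground acceleration alone. The PGA values, median, beta, damage ratio, and building value are made up for illustration and are not HAZUS parameters or project outputs.

```python
import math


def fragility(pga, median, beta):
    """Lognormal fragility: P(damage state reached | PGA) = Phi(ln(pga / median) / beta)."""
    return 0.5 * (1.0 + math.erf(math.log(pga / median) / (beta * math.sqrt(2.0))))


def expected_loss_mc(pga_by_scenario, value, median, beta, damage_ratio):
    """Monte Carlo estimate: apply the damage function per scenario, then average."""
    losses = [value * damage_ratio * fragility(p, median, beta) for p in pga_by_scenario]
    return sum(losses) / len(losses)


def loss_at_mean_pga(pga_by_scenario, value, median, beta, damage_ratio):
    """Shortcut that averages ground motion first; biased because fragility() is nonlinear."""
    mean_pga = sum(pga_by_scenario) / len(pga_by_scenario)
    return value * damage_ratio * fragility(mean_pga, median, beta)


# Hypothetical tract: four scenario PGAs (in g) and made-up building parameters.
pga = [0.05, 0.10, 0.20, 0.80]
params = dict(value=800_000, median=0.4, beta=0.6, damage_ratio=0.5)
mc = expected_loss_mc(pga, **params)
naive = loss_at_mean_pga(pga, **params)
print(f"Monte Carlo: ${mc:,.0f}   loss at mean PGA: ${naive:,.0f}")
```

With these made-up inputs, the two estimates differ by several percentage points of damage probability. That gap is the bias that per-scenario aggregation avoids.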
Certifications

Credentials & in progress.

JP Morgan Quantitative Research
✓ Certified
Qdrant Vector DB Essentials
✓ Certified
Qdrant Multi-Vector Search
✓ Certified
Neo4j Graph Data Science
✓ Certified
OCI Architect Associate
✓ Certified
OCI Foundations Associate
✓ Certified
Introduction to Data Analytics
✓ Certified
More coming…
Education
University of California, San Diego
M.S. in Data Science · Sept 2025 - Dec 2026
3.80 / 4.00 GPA
ML Systems · Causal Inference · Optimization · Scalable Data Systems · Statistical NLP · Advanced Data Mining · Data Ethics
Savitribai Phule Pune University
B.Tech. in Computer Engineering · Aug 2019 - Jun 2023
3.91 / 4.00 GPA
Applied Mathematics · Linear Algebra · Machine Learning · BI & Data Analytics · Artificial Intelligence · DBMS · Algorithms
More About Me

Curious who I really am?

Beyond code and data systems, there's more to explore. Check out my personal insights, interests, and the journey behind the engineer.

Contact

Got an interesting problem?

Email
atharvahirulkar.010@gmail.com
Location
San Diego, CA