Data Engineer III at Olaris. I build HIPAA-compliant clinical data platforms at the intersection of biotech and cloud engineering, with eyes firmly on Fintech and Big Tech. From Mumbai to Boston. I like hard problems, fast cars, and building things from scratch.
Experience & Affiliations
I grew up in Mumbai pulling computers apart and putting them back together; that's where the engineering instinct came from. Computer Engineering at University of Mumbai, then Business Analytics at UConn, now building production data platforms at Olaris in Boston.
I care about systems that are fast, observable, and production-grade, the kind built for real scale. HIPAA-compliant cloud infrastructure, sub-500ms APIs, data models designed to last. Outside of work I'm playing piano, following F1, or planning the next long drive.
I was also part of the team behind a Cell Press (iScience) publication in 2025: a kidney transplant biomarker classifier that outperformed serum creatinine. That's what good data engineering can do.
When I'm not building pipelines
Every role added a new dimension: scope, scale, and complexity growing year over year.
HIPAA-compliant, cloud-hosted biotech data platform on Azure, governing clinical records with role-based access control, geo-fencing, field-level encryption, and full audit logging. Powers the diagnostic reporting pipeline for targeted mass spectrometry and NMR-based metabolomics.
Semantic search across 100K+ metabolites from two diagnostic platforms. Replaced keyword lookup with embedding-based retrieval, accelerating downstream analysis.
Deep learning classifier evaluating NMR signal peak quality across 10K+ instrument observations, dramatically cutting analyst manual review burden.
Real-time instrument monitoring database with an automated MS Teams alerting pipeline, tracking 200+ acquisition parameters and QC metrics.
TensorFlow NLP system automating ServiceNow IT incident routing at Cigna using topic modelling and lemmatization, improving accuracy from 30% to 87%.
Migrated 100+ Excel and PowerPoint artifacts across 13 Boehringer Ingelheim research initiatives into a structured Cloudera Data Lake with YAML-defined schemas.
Modelled 400K+ student records from SAS to build interactive R Shiny and Tableau dashboards supporting real-time academic decision-making at UConn.
Designed a unified ID architecture across 7 entity types spanning clinical and research workflows, eliminating identifier fragmentation and building a parent-child traceability model that handles pooled, derived, and collaborator samples.
Published a public technical deep-dive on Olaris's HIPAA-compliant order intake architecture, translating internal engineering decisions into a mixed-audience write-up covering Azure Functions, data boundaries, and clinical data flow.
The HIPAA-compliant order intake workflow powering Olaris clinical diagnostics, from lab partner to clinical report. Click any node to see the technology, rationale, and compliance detail. Read the Journal Club post
Six core domains of expertise. Hover any axis label to explore the underlying skills.
The best engineers are always in motion. Here's what I'm digging into outside of work.
Studying RL for optimization problems, particularly how reward shaping maps to real-world data pipeline scheduling and resource allocation at scale.
Deep-diving OpenCV and modern CV architectures. Exploring applications in biomedical imaging and how vision models can complement metabolomics data.
Exploring how trading infrastructure handles ultra-low-latency data ingestion, from LMAX Disruptor patterns to event sourcing at financial-grade throughput.
Experimenting with LangChain and fine-tuned models for clinical NLP, extracting structured insights from free-text clinical notes at production scale.
A urine-based metabolite classifier (myOLARIS-KTdx) trained on 102 patients and validated on 43 achieves AUC 0.867–0.878 for detecting kidney transplant graft injury — significantly outperforming serum creatinine (AUC ≤0.65). The model differentiates under- vs. over-immunosuppression, offering a non-invasive alternative to biopsy for personalized clinical management.
AUC Comparison: myOLARIS-KTdx vs Serum Creatinine (gold standard)
Higher AUC = better diagnostic discrimination. Serum creatinine is the current clinical gold standard.
Data systems should be built to handle all current scenarios, but also designed with the flexibility to understand how processes and data change over time. Redesign should be almost frictionless. - Jowin Jestine
In regulated systems, the ability to trace every state change isn't optional; it's the product. Every write is a commitment, every read is a contract.
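One way to make "every write is a commitment" concrete is an append-only log in which each entry carries a hash of the previous one, so any later edit to history is detectable. The sketch below is illustrative only: the AuditLog class, its field names, and the choice of SHA-256 hash chaining are assumptions made for this example, not a description of the Olaris platform. Timestamps are left out to keep the example deterministic.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Hypothetical append-only audit log with a simple hash chain."""

    def __init__(self):
        self._entries = []

    def record(self, actor: str, action: str, record_id: str, changes: dict) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        body = {
            "seq": len(self._entries),
            "actor": actor,
            "action": action,
            "record_id": record_id,
            "changes": changes,
            "prev_hash": prev_hash,
        }
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every hash and check that each entry links to its predecessor.
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

Calling verify() after any in-place change to an earlier entry returns False, which is the property an auditor would care about. A production system would persist entries in write-once storage rather than in a Python list.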
Define strict schemas at ingestion and output boundaries. Keep the transformation layer adaptable. When upstream sources change, only the boundary layer needs to move.
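A minimal sketch of that idea, using only the Python standard library. Everything named here (InboundOrder, ReportRow, the field names, the allowed sample types) is hypothetical and chosen for illustration. The point is that an upstream rename such as orderId vs order_id is absorbed in from_raw, while the transformation in the middle only ever sees validated, typed records.

```python
from dataclasses import dataclass

ALLOWED_SAMPLE_TYPES = {"urine", "serum"}  # illustrative values only


@dataclass(frozen=True)
class InboundOrder:
    """Strict schema at the ingestion boundary (hypothetical fields)."""
    order_id: str
    sample_type: str
    volume_ml: float

    @classmethod
    def from_raw(cls, raw: dict) -> "InboundOrder":
        # Upstream naming changes are handled here and nowhere else.
        order_id = raw.get("order_id") or raw.get("orderId")
        if not isinstance(order_id, str) or not order_id:
            raise ValueError("order_id is required")
        sample_type = str(raw.get("sample_type", "")).strip().lower()
        if sample_type not in ALLOWED_SAMPLE_TYPES:
            raise ValueError(f"unsupported sample_type: {sample_type!r}")
        try:
            volume_ml = float(raw.get("volume_ml"))
        except (TypeError, ValueError):
            raise ValueError("volume_ml must be numeric") from None
        if volume_ml <= 0:
            raise ValueError("volume_ml must be positive")
        return cls(order_id, sample_type, volume_ml)


@dataclass(frozen=True)
class ReportRow:
    """Strict schema at the output boundary (hypothetical fields)."""
    order_id: str
    sample_type: str
    volume_l: float


def transform(order: InboundOrder) -> ReportRow:
    # The adaptable middle layer: free to change without touching either schema.
    return ReportRow(order.order_id, order.sample_type, order.volume_ml / 1000)
```

For example, transform(InboundOrder.from_raw({"orderId": "A-1", "sample_type": " Urine ", "volume_ml": "5"})) yields a ReportRow with a normalized sample type, while a record with a missing ID or a non-numeric volume is rejected before it reaches the transformation layer.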
A fast wrong answer is worse than a slow right one, especially in clinical or financial data. Optimize once correctness is provable and observable.
Monitoring tells you when something is broken. Observability tells you why, and what changed. I build systems I can reason about at 2am without the runbook.
Whether it's a data pipeline, a Fintech platform, or just a conversation about engineering at scale, I'm all ears. I usually reply within 24 hours (unless there's a race weekend).