
Hi, I'm Jowin Jestine

🇮🇳 Mumbai → 🎓 UConn → 🇺🇸 Boston, MA

Data Engineer III at Olaris. I build HIPAA-compliant clinical data platforms at the intersection of biotech and cloud engineering, with eyes firmly on Fintech and Big Tech. From Mumbai to Boston. I like hard problems, fast cars, and building things from scratch.

Now →
Building clinical data platforms at Olaris · Reading reinforcement learning & OpenCV research 📖 · Watching every F1 race weekend 🏎️ · Fragging in CS2 🎮 · Spec-ing out next PC build 🖥️ · Practising piano 🎹 · Planning the next long drive 🚗

Experience & Affiliations

Olaris, Inc. · Boehringer Ingelheim · Cigna · University of Connecticut · Cell Press · iScience · University of Mumbai

Engineer at Heart,
Builder by Nature

⚡ Data Engineer III
📍 Boston, MA

I grew up in Mumbai pulling computers apart and putting them back together; that's where the engineering instinct came from. Computer Engineering at University of Mumbai, then Business Analytics at UConn, now building production data platforms at Olaris in Boston.

I care about systems that are fast, observable, and production-grade: the kind built for real scale. HIPAA-compliant cloud infrastructure, sub-500ms APIs, data models designed to last. Outside of work I'm playing piano, following F1, or planning the next long drive.

I was also part of the team behind a Cell Press (iScience) publication in 2025: a kidney transplant biomarker classifier that outperformed serum creatinine. That's what good data engineering can do. 🎓

🔬 Biotech-Focused: CLIA-compliant pipelines for MS & NMR data
☁️ Cloud-Native: Azure infrastructure with ADLS Gen2 & Functions
🤖 ML-Enabled: XGBoost, deep learning & semantic search at scale

When I'm not building pipelines:

🏎️ F1 fanatic · 🎮 CS2 · 🖥️ Building PCs · 🎹 Grade 5 pianist · 🚗 Long drives · 🌍 Avid traveller · 🏏 Cricket · 🍗 Anything chicken · 📖 RL & CV research

Scale of Impact
Over Time

Every role added a new dimension: scope, scale, and complexity growing year over year.

Where I've Made an Impact

Olaris, Inc.

Boston, MA
Data Engineer III · 2025 – Present
  • Engineered end-to-end analytics system using Azure Functions, FastAPI & PostgreSQL: <0.5s response time, 75% latency reduction
  • Architected HIPAA-compliant cloud database with RBAC, geo-fencing & audit logs governing 1,200+ clinical records, saving $15K/yr in compliance costs
  • Integrated AI-driven semantic search across 100K+ metabolites from two diagnostic platforms, cutting retrieval time by 70%
Azure Functions · FastAPI · PostgreSQL · HIPAA · Semantic Search · SQLMesh

Olaris, Inc.

Boston, MA
Data Engineer I · 2023 – 2024
  • Developed deep learning model evaluating NMR signal peak quality across 10K+ observations, improving accuracy by 35% and cutting manual review by 72%
  • Built reusable Python ML libraries standardising NMR ingestion and feature extraction for research pipelines
  • Deployed FastAPI monitoring service with real-time MS Teams alerts across 200+ instrument parameters
Deep Learning · NMR · Python · FastAPI · Mage AI · Docker

Boehringer Ingelheim

Hartford, CT · via Yoh
Data Engineer, Biotherapeutics · 2022 – 2023
  • Migrated 100+ Excel/PowerPoint artifacts across 13 research initiatives into relational databases, enabling Cloudera Data Lake integration
  • Defined data schemas via YAML, reducing onboarding time by 30% and improving cross-team data consistency
  • Supported HIPAA compliance strategies for multi-site data warehouses, reducing regulatory risk across clinical analytics
Cloudera · Python · SQL · YAML · HIPAA

University of Connecticut

Storrs, CT
Data Analyst, Budget Planning & Institutional Research · 2021 – 2022
  • Queried and modelled 400K+ records from SAS using SQL to support student success and retention analysis
  • Built interactive dashboards in R Shiny and Tableau enabling real-time decision-making across academic departments
SQL · SAS · R Shiny · Tableau

Cigna

Hartford, CT
Graduate Data Science Consultant · 2021 – 2022
  • Trained TensorFlow neural network automating ServiceNow incident routing, improving accuracy from 30% to 87%
  • Applied NLP (lemmatization, topic modelling) reducing resolution time by 50%
  • Delivered real-time classifier with 96% recall and 80% precision for incident prioritisation
TensorFlow · NLP · Python · ServiceNow
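
For context, the recall and precision quoted for the incident prioritisation classifier can be combined into a single F1 score. This is only a sketch of the arithmetic; the helper name `f1_score` is hypothetical and not taken from the original project.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted above: 80% precision, 96% recall.
print(round(f1_score(precision=0.80, recall=0.96), 3))  # 0.873
```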

Things I've Built

🤖 Machine Learning

AI Metabolite Search Engine

Semantic search across 100K+ metabolites from two diagnostic platforms. Replaced keyword lookup with embedding-based retrieval, accelerating downstream analysis.

70% faster retrieval
Semantic Search · Python · Azure · FastAPI
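
A minimal sketch of what embedding-based retrieval looks like in general, assuming each item already has an embedding vector. The metabolite names, the 3-dimensional vectors, and the `search` helper are hypothetical; this is not the production system, which would obtain vectors from an embedding model and likely use a vector index rather than a linear scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list[float], index: dict[str, list[float]], top_k: int = 2):
    """Rank indexed items by similarity to the query embedding."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy embeddings, for illustration only.
index = {
    "citrate": [0.9, 0.1, 0.0],
    "isocitrate": [0.8, 0.2, 0.1],
    "creatinine": [0.0, 0.1, 0.9],
}
results = search([1.0, 0.0, 0.0], index)
print([name for name, _ in results])  # ['citrate', 'isocitrate']
```

Unlike keyword lookup, ranking by vector similarity returns near matches even when the query string never appears in the stored name.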
🧠 Deep Learning

NMR Signal Quality Model

Deep learning classifier evaluating NMR signal peak quality across 10K+ instrument observations, dramatically cutting analyst manual review burden.

35% accuracy gain · 72% less review
Deep Learning · Python · NMR · scikit-learn
⚙️ Data Engineering

Live NMR Monitoring System

Real-time instrument monitoring database with an automated MS Teams alerting pipeline, tracking acquisition and QC metrics across 200+ parameters.

200+ params · real-time alerts
FastAPI · Mage AI · PostgreSQL · Docker
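
The core of a monitoring check like this can be sketched as a range test per parameter. The parameter names, limits, and the `out_of_range` helper below are hypothetical, and delivering the resulting messages to a chat channel is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    low: float
    high: float


def out_of_range(readings: dict[str, float], limits: dict[str, Limit]) -> list[str]:
    """Return one alert message per reading that falls outside its limits."""
    alerts = []
    for name, value in readings.items():
        limit = limits.get(name)
        if limit is not None and not (limit.low <= value <= limit.high):
            alerts.append(f"{name}={value} outside [{limit.low}, {limit.high}]")
    return alerts


# Illustrative parameters only; real instruments expose many more.
limits = {"probe_temp_k": Limit(297.0, 299.0), "lock_level": Limit(60.0, 100.0)}
readings = {"probe_temp_k": 298.1, "lock_level": 42.0}
print(out_of_range(readings, limits))
```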
🔤 NLP

IT Incident Routing Neural Net

TensorFlow NLP system automating ServiceNow IT incident routing at Cigna using topic modelling and lemmatization; improved accuracy from 30% to 87%.

30% → 87% accuracy · 96% recall
TensorFlow · NLP · Python · ServiceNow
☁️ Data Engineering

Biotech Research Data Migration

Migrated 100+ Excel and PowerPoint artifacts across 13 Boehringer Ingelheim research initiatives into a structured Cloudera Data Lake with YAML-defined schemas.

13 initiatives · 30% faster onboarding
Cloudera · Python · SQL · YAML
📊 Analytics

Institutional Research Dashboard

Modelled 400K+ student records from SAS to build interactive R Shiny and Tableau dashboards supporting real-time academic decision-making at UConn.

400K+ records · live dashboards
R Shiny · Tableau · SQL · SAS
🗂️ System Design

Global Sample Traceability System

Designed a unified ID architecture across 7 entity types spanning clinical and research workflows, eliminating identifier fragmentation and building a parent-child traceability model that handles pooled, derived, and collaborator samples.

7 entity types · full lifecycle traceability
System Design · PostgreSQL · ADLS Gen2 · Python
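
A minimal sketch of a parent-child traceability model, assuming each sample simply keeps references to the samples it came from. The `Sample` class, the `lineage` helper, and the ID formats are hypothetical illustrations, not the actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    sample_id: str
    parents: list["Sample"] = field(default_factory=list)


def lineage(sample: Sample) -> set[str]:
    """Collect the IDs of every ancestor of a sample."""
    seen: set[str] = set()
    stack = list(sample.parents)
    while stack:
        node = stack.pop()
        if node.sample_id not in seen:
            seen.add(node.sample_id)
            stack.extend(node.parents)
    return seen


# Two primary samples are pooled, then an aliquot is derived from the pool.
a, b = Sample("SMP-001"), Sample("SMP-002")
pool = Sample("POOL-001", parents=[a, b])
aliquot = Sample("DRV-001", parents=[pool])
print(sorted(lineage(aliquot)))  # ['POOL-001', 'SMP-001', 'SMP-002']
```

Allowing several parents per sample is what lets one model cover both pooled samples (many parents) and derived samples (one parent).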
📝 Technical Writing

Olaris Journal Club Write-up

Published a public technical deep-dive on Olaris's HIPAA-compliant order intake architecture, translating internal engineering decisions into a mixed-audience write-up covering Azure Functions, data boundaries, and clinical data flow.

HIPAA architecture · public deep-dive
Technical Writing · Azure Functions · HIPAA · Architecture

Clinical Order Intake Architecture

The HIPAA-compliant order intake workflow powering Olaris clinical diagnostics, from lab partner to clinical report.

Skills Radar

Six core domains of expertise.

What's Next

The best engineers are always in motion. Here's what I'm digging into outside of work.

🤖

Reinforcement Learning

Studying RL for optimization problems, particularly how reward shaping maps to real-world data pipeline scheduling and resource allocation at scale.

Actively reading
👁️

Computer Vision & OpenCV

Deep-diving OpenCV and modern CV architectures. Exploring applications in biomedical imaging and how vision models can complement metabolomics data.

Experimenting
⚡

Real-Time Data at Microsecond Latency

Exploring how trading infrastructure handles ultra-low-latency data ingestion, from LMAX Disruptor patterns to event sourcing at financial-grade throughput.

Deep dive
🧠

LLMs for Structured Data

Experimenting with LangChain and fine-tuned models for clinical NLP: extracting structured insights from free-text clinical notes at production scale.

Building

Tools of the Trade

🔧

Data Engineering

PostgreSQL · Mage AI · SQLMesh · Azure Data Lake · Docker · Bitbucket CI/CD · Azure Functions · OpenMetadata · Great Expectations
💻

Programming

Python · SQL · R · FastAPI · TensorFlow · PyTorch · scikit-learn · Pandas · Spark
☁️

Cloud & Infra

Microsoft Azure · Docker · Linux · Cloudera · ADLS Gen2 · Azure Blob Storage · pgAudit
🤖

Machine Learning

XGBoost · Deep Learning · NLP · Semantic Search · Symbolic Regression · Nested CV · Predictive Modelling
📊

Visualisation

Tableau · R Shiny · Power BI · Apache Superset · Plotly
🧬

Scientific Domain

NMR Metabolomics · Mass Spectrometry · CLIA Compliance · HIPAA · GxP Automation · ELN Integration

Published Work

iScience Cell Press · 2025

Urinary metabolite signatures to detect and differentiate graft injury in kidney transplant patients

Chen Dong, Alessia Trimigno, Jowin Jestine, Jifang Zhao, Elizabeth M. O'Day, Dirk R. Kuypers

A urine-based metabolite classifier (myOLARIS-KTdx) trained on 102 patients and validated on 43 achieves AUC 0.867–0.878 for detecting kidney transplant graft injury — significantly outperforming serum creatinine (AUC ≤0.65). The model differentiates under- vs. over-immunosuppression, offering a non-invasive alternative to biopsy for personalized clinical management.

0.878 Validation AUC
145 Patients
86.4% Accuracy

AUC Comparison: myOLARIS-KTdx vs Serum Creatinine (gold standard)

Higher AUC = better diagnostic discrimination. Serum creatinine is the current clinical gold standard.
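
To make the AUC figure concrete: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A small sketch of that definition follows, using toy numbers that are not from the study; the `auc` helper is hypothetical.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy data: three of the four positive/negative pairs are ordered correctly.
print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation.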

How I Think About
Data Systems

🔒

Audit logs are a feature, not overhead

In regulated systems, the ability to trace every state change isn't optional; it's the product. Every write is a commitment, every read is a contract.
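
One way to illustrate traceable state changes is an append-only log in which each entry carries the hash of the previous one, so later edits are detectable. This is a toy in-memory sketch under that assumption; the `AuditLog` class and its field names are hypothetical and do not describe any specific production setup.

```python
import hashlib
import json


class AuditLog:
    """Append-only log where each entry includes the hash of the previous one."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    @staticmethod
    def _digest(body: dict) -> str:
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record(self, actor: str, action: str, record_id: str) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else ""
        body = {"actor": actor, "action": action, "record_id": record_id, "prev": prev_hash}
        entry = {**body, "hash": self._digest(body)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; an edited or reordered entry breaks the chain."""
        prev_hash = ""
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev_hash or entry["hash"] != self._digest(body):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.record("analyst_1", "UPDATE", "order-42")
log.record("analyst_2", "READ", "order-42")
print(log.verify())  # True
```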

📐

Data contracts at the boundary, flexibility inside

Define strict schemas at ingestion and output boundaries. Keep the transformation layer adaptable. When upstream sources change, only the boundary layer needs to move.
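
A minimal sketch of a strict contract at the ingestion boundary, using only the standard library. The `OrderRecord` fields and the `parse_order` helper are hypothetical; the point is that off-contract payloads are rejected before they reach the transformation layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderRecord:
    """Hypothetical contract for one inbound record."""
    order_id: str
    sample_count: int


def parse_order(raw: dict) -> OrderRecord:
    """Validate a raw payload at the boundary and reject anything off-contract."""
    unknown = set(raw) - {"order_id", "sample_count"}
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    order_id = raw.get("order_id")
    sample_count = raw.get("sample_count")
    if not isinstance(order_id, str) or not order_id:
        raise ValueError("order_id must be a non-empty string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(sample_count, bool) or not isinstance(sample_count, int) or sample_count < 1:
        raise ValueError("sample_count must be a positive integer")
    return OrderRecord(order_id, sample_count)


print(parse_order({"order_id": "ORD-1", "sample_count": 3}))
```

If an upstream source renames or adds a field, only `parse_order` has to change; code downstream keeps working against `OrderRecord`.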

⚡

Correctness first, then performance

A fast wrong answer is worse than a slow right one, especially in clinical or financial data. Optimize once correctness is provable and observable.

🔭

Observability is not monitoring

Monitoring tells you when something is broken. Observability tells you why, and what changed. I build systems I can reason about at 2am without the runbook.

Academic Foundation

🎓

Master of Business Analytics & Project Management

University of Connecticut · Hartford, CT
⚙️

Bachelor of Engineering in Computer Engineering

University of Mumbai · Mumbai, India

Let's Build Something Together

Whether it's a data pipeline, a Fintech platform, or just a conversation about engineering at scale, I'm all ears. I usually reply within 24 hours (unless there's a race weekend 🏎️).