UX research · health intelligence · 2022–2026

Becoming
fluent.

Not to optimize. To understand. Four years of wearable data, two blood panels, and a personal annotation layer — connected into a question: does the life I'm living support the life I want?

624K

heart rate readings

8,300+

miles logged

101

biomarkers tracked

13 days

of early HRV + temp signals
before fever peak

Case study · February 2026

My watch knew
before I did.

I was sore from a hot stone deep tissue massage and acupuncture, convinced the worst part was dragging my ski bag through an airport. Meanwhile, HRV and wrist temperature had been signaling something else for 13 days — quietly, without a single alert.

38.2°C

fever peak · Feb 24

Resting HR deviation from personal baseline · Feb 11 → Mar 2


Resting HR peak

+33 bpm above baseline

HRV floor (ms)

29 ms below baseline

22.1

Resp rate peak

+5.2 above baseline

5.3h

Sleep · fever night

1.7h below average


01 · The problem

The data exists.
The language doesn't.

One in three Americans wears a health tracking device. By 2025, that number is approaching 100 million — a dataset of extraordinary longitudinal depth sitting largely unread on people's wrists. The hardware problem is solved. The interpretation problem isn't.

I've worn an Apple Watch for four years. I knew my HRV was 38ms. I had no idea what that meant for my life — until I built something that showed it alongside everything else that was happening: a hard run in Taiwan heat, a flight to Japan, a fever I didn't feel coming.

The number didn't change. My understanding of it completely did.

Apple Health gives you measurements. It has never given you self-knowledge. There's a meaningful gap between "your resting heart rate was 62 bpm" and "you were chronically under-recovered for three weeks before you got sick." The first is a measurement. The second is a story about your body that you could actually act on.

The conditions that limit life in your 60s and 70s are developing silently in your 30s and 40s. The window for meaningful intervention is exactly the period when nothing feels wrong. Longitudinal data makes the slow drift visible before it becomes irreversible — not to create anxiety about the future, but to make today's choices feel connected to the life you actually want.

Existing tools are solving the aggregation problem well — connecting wearables, labs, and health apps into one place with daily recommendations and coaching. But optimization advice is only useful if you understand the body you're optimizing. The missing piece isn't more data or better recommendations. It's the interpretive layer that helps you become fluent in a language your body is already speaking.

The question isn't "how do I optimize my health metrics?" It's "does the way I'm living support the life I want — in ten years, in thirty?" Skiing suppresses HRV for three days. Worth it. A slow upward creep in resting HR over 18 months is barely perceptible day to day — and potentially significant at 58. The design problem is making those tradeoffs visible early enough to matter, without turning health into a performance metric to be maximized.

~100M

US wearable users by 2025

The data exists at scale · eMarketer

0.8%

Adoption rate for adults 75+

Those who need it most have it least · Market.us

10M

Healthcare worker shortage by 2030

WHO projection · NCBI 2025

50%

Chronic disease treatment adherence

Long-term therapy patients · WHO

Sources: eMarketer 2025 · Wearable Technology Statistics 2025 · WHO — Chronic Disease Adherence · WHO — Healthcare Worker Shortage

The goal isn't perfect metrics. It's a life worth living, sustained for as long as your body can support it. The design question is how to help people understand the relationship between the behaviors they choose today and the capabilities they want to have tomorrow.

Why three layers are necessary

Passive sensing

Continuous · objective · contextless

Knows: HRV, HR, sleep, temp, steps — every second, for years

Cannot know: why the signal changed

Periodic clinical

Precise · validated · temporally sparse

Knows: LDL, cortisol, ApoB — precisely, at a moment in time

Cannot know: the days around the draw

Lived context

Irreplaceable · narrative · bias-prone

Knows: the 9-mile run, the flight, the night out, the onsen

Does not necessarily know: what the sensors mean

Interpretive layer

What none can say alone

02 · The data

Four years of signal.
Two blood draws.
One annotation layer.

This project uses three data sources with fundamentally different temporal resolutions and epistemological properties. No single source is sufficient. The wearable without blood panels misses upstream causes. Blood panels without the wearable miss day-to-day dynamics. Both without lived context miss the story that explains the pattern.

All processing happens locally. No health data is stored externally, transmitted to third parties, or included in this repository. Charts and visualizations use real findings; the underlying data files remain private.

One methodological decision shapes every interpretation in this project: personal baseline computation. Every metric is compared against a rolling 90-day personal norm — not population averages. A 35ms HRV means something different for a 55-year-old than a 30-year-old, and something different for this specific body than for either of them. Personal baseline sidesteps population-level confounders entirely.
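As a rough illustration of the rolling personal baseline, here is a minimal pandas sketch. The function name, the `min_days` warm-up, and the choice to exclude the current day from its own baseline are assumptions for the example, not a description of the project's actual code.

```python
import pandas as pd


def deviation_from_baseline(daily: pd.Series, window_days: int = 90,
                            min_days: int = 30) -> pd.DataFrame:
    """Compare each day's value with the mean of the preceding window."""
    # shift(1) keeps today's reading out of its own baseline (an assumption).
    prior = daily.shift(1)
    # A time-based window needs a sorted DatetimeIndex with one row per day.
    baseline = prior.rolling(f"{window_days}D", min_periods=min_days).mean()
    return pd.DataFrame({
        "value": daily,
        "baseline": baseline,
        "deviation": daily - baseline,
    })


# Synthetic data: 120 days of HRV at 38 ms, then one low day at 28 ms.
days = pd.date_range("2025-10-01", periods=121, freq="D")
hrv = pd.Series(38.0, index=days)
hrv.iloc[-1] = 28.0

result = deviation_from_baseline(hrv)
print(result.tail(1))
```

The last row reports the low day as a deviation from this body's own recent norm, which is the quantity every chart in the project is built on. Days without enough history return no baseline rather than a misleading one.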

The annotation layer is what transforms data into narrative. Without knowing about the 9-mile run, the flight to Taiwan, the night out, the hot stone massage — the February signals are uninterpretable. Context isn't supplementary. It's load-bearing.

Passive sensing

Continuous · every few seconds

624K

Heart rate readings

6,753

HRV readings

1,563

Resting HR readings

586

VO2 max estimates

12,906

Sleep stage records

142

Wrist temp readings

Apple Watch Series 6 (Jan 2022–Dec 2025) and Series 10 (Dec 2025–present). Parsed from raw HealthKit XML export using a custom Python pipeline. Series 10 adds wrist temperature and improved sleep staging.
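A HealthKit export can be several gigabytes, so a streaming parse is one reasonable approach. The sketch below uses the standard library's `iterparse` on a tiny synthetic sample; the sample records and the `iter_records` helper are invented for illustration, and the project's real pipeline may differ.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Synthetic sample in the general shape of a HealthKit export (values invented).
SAMPLE_EXPORT = b"""<?xml version="1.0" encoding="UTF-8"?>
<HealthData locale="en_US">
 <Record type="HKQuantityTypeIdentifierHeartRate" unit="count/min"
         startDate="2026-02-23 07:00:00 -0700" value="95"/>
 <Record type="HKQuantityTypeIdentifierHeartRateVariabilitySDNN" unit="ms"
         startDate="2026-02-23 03:10:00 -0700" value="21.5"/>
 <Record type="HKCategoryTypeIdentifierSleepAnalysis"
         startDate="2026-02-23 00:30:00 -0700"
         value="HKCategoryValueSleepAnalysisAsleepCore"/>
</HealthData>
"""

WANTED = {
    "HKQuantityTypeIdentifierHeartRate",
    "HKQuantityTypeIdentifierHeartRateVariabilitySDNN",
}


def iter_records(source, wanted_types):
    """Stream numeric Record elements without loading the whole file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Record" and elem.get("type") in wanted_types:
            yield {
                "type": elem.get("type"),
                "start": elem.get("startDate"),
                "value": float(elem.get("value")),  # quantity types only
                "unit": elem.get("unit"),
            }
        elem.clear()  # release memory held by elements already processed


records = list(iter_records(io.BytesIO(SAMPLE_EXPORT), WANTED))
counts = Counter(r["type"] for r in records)
print(counts)
```

Category records such as sleep stages carry string values, so they are filtered out here and would need their own handling.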

Activity data

Per workout · 1,621 sessions

5,051

Cycling miles

1,248

Running miles

940

Skiing miles

9

Sport types tracked

460

Hours of activity (2023 alone)

8,300+

Total miles logged

Parsed from HealthKit workout records. Deduplicated across simultaneous sources (Apple Watch, Strava, Runna, Ikon Pass) using a 70% temporal overlap threshold. Distance extracted from both WorkoutStatistics and MetadataEntry fields depending on activity type.
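The overlap-based deduplication might look something like the sketch below. The source only states the 70% threshold, so the definition of overlap (shared time divided by the shorter workout) and the rule of keeping the earliest-starting record are both assumptions; the `Workout` class is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Workout:
    source: str
    start: datetime
    end: datetime


def overlap_fraction(a: Workout, b: Workout) -> float:
    """Shared time divided by the shorter workout's duration (assumed definition)."""
    shared = (min(a.end, b.end) - max(a.start, b.start)).total_seconds()
    shorter = min(a.end - a.start, b.end - b.start).total_seconds()
    if shared <= 0 or shorter <= 0:
        return 0.0
    return shared / shorter


def deduplicate(workouts, threshold=0.70):
    """Keep the earliest-starting record of each overlapping group."""
    kept = []
    for w in sorted(workouts, key=lambda w: w.start):
        if any(overlap_fraction(w, k) >= threshold for k in kept):
            continue  # same session already recorded by another source
        kept.append(w)
    return kept


# Invented example: one morning run logged twice, plus a separate evening ride.
logged = [
    Workout("Apple Watch", datetime(2025, 6, 1, 8, 0), datetime(2025, 6, 1, 9, 0)),
    Workout("Strava", datetime(2025, 6, 1, 8, 2), datetime(2025, 6, 1, 9, 1)),
    Workout("Strava", datetime(2025, 6, 1, 18, 0), datetime(2025, 6, 1, 18, 30)),
]
unique = deduplicate(logged)
print([(w.source, w.start.hour) for w in unique])
```

A real pipeline would probably also want a source priority, since different apps record distance and heart rate with different fidelity.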

Periodic clinical

Twice · Nov 2025

101

Biomarkers parsed

2

Draws · same month

Biomarker categories

Flagged out of range

Function Health comprehensive panels. Two draws in November 2025 — cycle day 3 (menstrual phase) and 4 days post-period (early follicular) — capture hormonal variation across the cycle. Parsed from PDF using pdfplumber with custom regex extraction. Cycle phase tagged as clinical context for interpretation.
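To make the PDF step concrete, here is a hedged sketch of regex extraction over text pulled with pdfplumber. The line layout (`name value [H|L] unit`) is a guess for illustration, since real report text will be messier, and `parse_lines` and `extract_text` are hypothetical helper names. The sample values come from the November baseline quoted later on this page.

```python
import re

# Assumed layout: "<name> <value> [H|L] <unit>". Real reports will differ.
LINE_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z][A-Za-z0-9 ()\-/]*?)\s+"
    r"(?P<value>\d+(?:\.\d+)?)"
    r"(?:\s+(?P<flag>[HL]))?"
    r"\s+(?P<unit>\S+)$"
)


def parse_lines(text, cycle_phase):
    """Turn matching lines into rows tagged with cycle phase context."""
    rows = []
    for line in text.splitlines():
        m = LINE_PATTERN.match(line.strip())
        if m:
            rows.append({
                "name": m["name"],
                "value": float(m["value"]),
                "flag": m["flag"],  # None when the value is in range
                "unit": m["unit"],
                "cycle_phase": cycle_phase,
            })
    return rows


def extract_text(pdf_path):
    """Read all page text from a PDF; not exercised in the example below."""
    import pdfplumber  # imported lazily so the parsing half runs without it
    with pdfplumber.open(pdf_path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


SAMPLE_TEXT = """LDL Particle Number 1213 H nmol/L
Apolipoprotein B 72 mg/dL
hs-CRP 0.4 mg/L
Results reviewed by clinician"""

rows = parse_lines(SAMPLE_TEXT, cycle_phase="menstrual")
for row in rows:
    print(row)
```

Lines that do not fit the pattern are skipped silently here; a production parser would likely log them so missed biomarkers can be spotted.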

Lived context

Event-driven · manually annotated

30+

Life events annotated

Life phases defined

Ski trips documented

2

Illness events with full arc

Manually curated annotation layer connecting life events to physiological signals. Categories: training milestones, illness events, travel, stress, recovery modalities, medical procedures, and device changes. Without this layer, the February signals are noise. With it, they're a story.
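One plausible shape for the annotation layer is a small typed record plus a lookup that pulls recent context for any given day. The class names, the three-day lookback, and the example dates below are all invented for illustration; only the category list comes from the description above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Category(Enum):
    TRAINING = "training milestone"
    ILLNESS = "illness event"
    TRAVEL = "travel"
    STRESS = "stress"
    RECOVERY = "recovery modality"
    MEDICAL = "medical procedure"
    DEVICE = "device change"


@dataclass(frozen=True)
class Annotation:
    day: date
    category: Category
    note: str


def context_for(day, annotations, lookback_days=3):
    """Return annotations from the lookback window ending on `day`, oldest first."""
    earliest = day - timedelta(days=lookback_days)
    hits = [a for a in annotations if earliest <= a.day <= day]
    return sorted(hits, key=lambda a: a.day)


# Example entries; the dates are placeholders, not the project's real log.
log = [
    Annotation(date(2026, 2, 20), Category.RECOVERY, "hot stone deep tissue massage"),
    Annotation(date(2026, 2, 21), Category.TRAVEL, "flight with ski bag"),
    Annotation(date(2026, 1, 5), Category.TRAINING, "9-mile run"),
]
recent = context_for(date(2026, 2, 22), log)
print([a.note for a in recent])
```

Keeping each entry to a date, a category, and a short note fits the idea that annotation should take seconds, not minutes.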

A note on n=1

This is a single-subject longitudinal study. The findings are not generalizable to populations — they are specific to one body, one life, four years of continuous measurement. That limitation is also the point. Personal health intelligence only works because it's deeply individual. Population averages erase the variation that makes any individual reading meaningful. The design question isn't how to scale these specific findings. It's how to make this kind of individual longitudinal analysis accessible to anyone.

03 · Findings

Three stories
the data told.

Finding A

The fever arc

On February 11th — 13 days before a fever peaked at 38.2°C — HRV dropped 10ms below personal baseline and wrist temperature climbed 1.0° above baseline. Resting HR was normal. No symptoms. Neither signal alone would have been alarming — HRV fluctuates daily, and a 1° wrist temp deviation sits within normal range. Together, and in retrospect, they were the first readable signal of what was coming.
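The idea that two unremarkable signals become meaningful together can be sketched as a simple co-deviation flag. The thresholds and column names here are hypothetical, and only the February 11 row reflects numbers from this case study; the other rows are invented to show the contrast.

```python
import pandas as pd


def early_warning_days(devs, hrv_drop_ms=8.0, temp_rise=0.8):
    """Days where HRV is low AND wrist temperature is high versus baseline."""
    hrv_low = devs["hrv_dev_ms"] <= -hrv_drop_ms
    temp_high = devs["wrist_temp_dev"] >= temp_rise
    return devs.index[hrv_low & temp_high]


devs = pd.DataFrame(
    {
        "hrv_dev_ms": [-2.0, -9.0, -10.0, -3.0],
        "wrist_temp_dev": [0.1, 0.2, 1.0, 0.9],
    },
    index=pd.to_datetime(["2026-02-09", "2026-02-10", "2026-02-11", "2026-02-12"]),
)

flagged = early_warning_days(devs)
print(list(flagged.strftime("%b %d")))
```

February 10 (HRV only) and February 12 (temperature only) stay quiet, while February 11 is flagged. A flag like this is a prompt to ask a question, not a diagnosis.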

By February 23rd, all four signals had converged: resting HR +33 bpm above baseline, HRV −29ms, respiratory rate elevated, wrist temperature +2.7°. The full picture took 13 days to develop. The first signal appears to have shown up within 24 hours of whatever triggered the immune response.

A second illness event in March 2025 produced a completely different signature — sudden onset within 24 hours of exposure to a sick travel companion. Two illness events, two distinguishable causal signatures, both readable from wrist sensors alone.

Finding B

The tennis discovery

Across 1,621 logged workouts spanning 9 sport types, tennis produced the strongest positive HRV signal of any activity — +4.8ms above personal baseline the day after, sustained for three days. This is stronger than cycling, running, HIIT, or strength training.

I had 12 tennis sessions in the dataset and no idea they stood out. With only 12 sessions this is a hypothesis rather than a conclusion, but it is not a pattern a standard health app dashboard surfaces.

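HRV deviation from personal baseline (ms), by days after activity

Sport      Day of   Day +1   Day +2   Day +3
Tennis      +2.3     +4.8     +4.3     +3.1
HIIT        −3.8     +0.6     +2.4     +3.0
Cycling     −1.1     +0.4     +1.1     +1.0
Running     −0.3     +0.2     +1.6     +0.0
Skiing     −10.4     −6.2     −3.8     −0.8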

Finding C

Four years of adaptation

Resting HR dropped 4.5 bpm over four years. HRV improved 23.8%. VO2 max reached 40.2 from cross-country skiing with a dog, collapsed to 29.6 as training began, then climbed back to 38.4 through half marathon training. The body gets worse before it gets better.

04 · The experiment

Predictions made
before the results.

Written May 25, 2026 — before the draw

Seven predictions made before seeing midyear results. Based on November 2025 baseline panel, six months of wearable data, and physiological mechanisms of endurance training adaptation. Results update here when they arrive ~June 10.

↑ Expected to improve

LDL Particle Number

Nov baseline: 1213 H (optimal <1138)

Endurance training upregulates LDL receptor activity. Effect visible by 3–4 months.

↑ Expected to improve

LDL Small Particles

Nov baseline: 201 H (optimal <142)

Exercise shifts LDL distribution toward larger, less atherogenic particles.

↑ Expected to improve

LDL Peak Size

Nov baseline: 220.5 Å · optimal >222.9

Borderline low in November. Same training adaptation pathway as LDL particle number — expect shift toward larger, less atherogenic particles.

→ Expected stable

Apolipoprotein B (ApoB)

Nov baseline: 72 mg/dL · optimal <90

Already optimal in November. Hard to improve meaningfully from a good baseline. Expect stable.

→ Expected stable

Lipoprotein (a)

Nov baseline: 18 nmol/L · optimal <75

~90% heritable and largely not modifiable by exercise or diet. 18 nmol/L is very low risk. Expect a near-identical reading.

→ Expected stable or lower

hs-CRP

Nov baseline: 0.4 mg/L · optimal <1.0

Outstanding baseline — essentially no chronic inflammation despite heavy training load. Draw is before race day so won't reflect acute inflammation. Expect similarly low reading.

↓ Watch closely

MCHC + Ferritin

MCHC borderline low Nov 2025

Endurance running increases iron demand. Female athletes particularly vulnerable.

05 · The system

Six design decisions.
Each one a deliberate choice.

The technical pipeline — Python, pandas, pdfplumber, Streamlit, the Claude API — is the scaffolding. The design decisions are what make it meaningful. Each component answers a specific question that existing health tools leave open.

The system is intentionally local. All data processing happens on device. No health data is transmitted externally. Privacy isn't a feature; it's a precondition for the honest self-reflection this kind of tool requires.

Personal baseline

Every metric is computed against a rolling 90-day personal norm — not population averages. A 35ms HRV means something different for a 55-year-old than a 30-year-old, and something different for this specific body than for either of them.

Why: population averages erase the variation that makes any individual reading meaningful. Deviation from your own norm is the signal. Deviation from average is noise.

The weekly letter

AI generates a weekly narrative in plain language — warm, curious, specific. Not a dashboard. Not a score. A letter that reads like it came from someone who actually understands what happened to your body this week.

Why: dashboards optimize for monitoring. Letters optimize for reflection. The goal is self-understanding, not surveillance. Weekly cadence respects attention as a finite resource.

Epistemic humility

"I can't tell from sensors alone" appears when multiple explanations are equally plausible. The system names ambiguity explicitly rather than picking the most likely interpretation and stating it as fact.

Why: a system that is confidently wrong in ways the user can't detect is more dangerous than one that acknowledges uncertainty. Most health AI overclaims — treating correlation as causation, pattern as prediction, and sensor data as clinical truth. The design decision here was to build a system that knows what it doesn't know — and says so.
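One way to express this rule in code is to compare the plausibility of the top explanations and decline to pick one when they are close. How the scores would be produced is not described here, so the scores, the 0.15 margin, and the `interpret` function are purely illustrative assumptions.

```python
def interpret(candidates, margin=0.15):
    """candidates maps each explanation to a plausibility score in [0, 1]."""
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return "I can't tell from sensors alone."
    best, best_score = ranked[0]
    # Every explanation scoring within `margin` of the best one counts as a rival.
    close = [name for name, score in ranked if best_score - score <= margin]
    if len(close) > 1:
        options = " or ".join(close)
        return f"I can't tell from sensors alone. This could be {options}."
    return f"The most likely explanation is {best}, though this is a hypothesis."


ambiguous = interpret({
    "a hard training day": 0.45,
    "early illness": 0.40,
    "poor sleep": 0.10,
})
clear = interpret({"alcohol": 0.8, "poor sleep": 0.2})
print(ambiguous)
print(clear)
```

Even the confident branch is worded as a hypothesis, which keeps the output consistent with treating patterns as something to investigate rather than a conclusion.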

Signals by sport

Toggle any sport on or off to see how it shaped cardiovascular metrics over time. The underlying analysis computes HRV response from day-of through seven days after each activity — surfacing patterns no training log would show.

Why: tennis produced the strongest positive HRV signal of any sport — +4.8ms the day after, sustained for three days. This finding was invisible until the cross-reference was built. The toggle makes the analysis explorable rather than just reportable.
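The cross-reference behind this view can be approximated by averaging HRV deviation at each day offset after a workout, grouped by sport. This is a sketch on synthetic numbers with hypothetical column names, and it ignores a real complication: when workouts fall close together, their recovery windows overlap and contaminate each other.

```python
import pandas as pd


def hrv_response(workouts, hrv_dev, max_offset=7):
    """Mean HRV deviation per sport at each day offset after a workout."""
    rows = []
    for sport, day in zip(workouts["sport"], workouts["date"]):
        for offset in range(max_offset + 1):
            target = day + pd.Timedelta(days=offset)
            if target in hrv_dev.index:  # skip days with no HRV reading
                rows.append({"sport": sport, "offset": offset,
                             "dev": hrv_dev.loc[target]})
    table = pd.DataFrame(rows)
    return table.pivot_table(index="sport", columns="offset",
                             values="dev", aggfunc="mean")


# Synthetic daily HRV deviations (ms) and two invented workouts.
days = pd.date_range("2026-01-01", periods=8, freq="D")
hrv_dev = pd.Series([2.0, 5.0, 4.0, 3.0, -10.0, -6.0, -4.0, -1.0], index=days)
workouts = pd.DataFrame({
    "sport": ["tennis", "skiing"],
    "date": pd.to_datetime(["2026-01-01", "2026-01-05"]),
})

response = hrv_response(workouts, hrv_dev, max_offset=3)
print(response)
```

Each row of the result is one sport and each column is a day offset, which is the same shape as the table in Finding B.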

Blood panel integration

PDF extraction pipeline parses 101 biomarkers from Function Health reports, tagged with cycle phase context. Two draws in the same month (cycle day 3 and early follicular) capture hormonal variation that a single snapshot misses.

Why: wearables see downstream effects. Blood panels see upstream causes. LDL particle number was elevated in November despite normal calculated LDL — a finding a standard panel would miss. The interpretive value is in the cross-reference, not either source alone.

Health Wrapped

Annual year-in-review that transforms four years of data into a navigable personal narrative. VO2 max peaked at 40.2 from cross-country skiing with a dog. Collapsed to 29.6 as training began. Climbed back to 38.4 through half marathon training.

Why: the Wrapped format removes the introspection requirement entirely. The data speaks — the user's job is to recognize their own life in it. Longitudinal patterns are invisible in daily views. They're unmistakable across years.

Built with

Python pandas pdfplumber Streamlit Plotly Claude API Apple HealthKit Function Health Chart.js Vanilla JS

06 · Reflection

What this means
for health UX.

What worked

Personal baseline computation sidesteps population-level confounders entirely. The annotation system — connecting life events to physiological signals — is what transformed data into narrative. The weekly letter format is naturally resistant to gamification.

The n=1 limitation is real but also the point. This kind of longitudinal personal health intelligence only works because it's deeply individual.

The most important design decision in this project isn't a feature — it's an epistemological stance. The system treats sensor data as probabilistic, not deterministic. Patterns as hypotheses, not conclusions. Correlation as worth investigating, not proof of causation. That stance isn't a limitation to apologize for. It's the only intellectually honest position available — and it turns out to be what makes the system trustworthy rather than just confident.

What's hard

Medications, chronic conditions, and hormonal variation create interpretation confounders the system can't see. Memory degrades — contextual recall two weeks later is less accurate than in-the-moment prompting. Motivated reasoning means people sometimes hear what they want to hear.

The design response to all three is the same: ask better questions rather than claiming certainty the system doesn't have.

Design philosophy

Optimization vs. understanding

Most health tools are built around a coaching model — daily plans, recommendations, nudges toward a better version of yourself. That model is actionable and useful. But a plan is only as good as its understanding of your specific context. Generic advice applied to the wrong week — a fever recovery, a grief period, a ski trip — isn't just unhelpful. It's actively misleading.

Fluent starts from a different premise: understanding isn't the same as optimizing. Before you can make good decisions about your health, you need to understand what your current patterns actually cost — and whether those costs align with the life you want. The weekly letter doesn't tell you what to fix. It asks what you noticed, and what you want your body to be able to do in ten years.

The next design horizon

The life layer

Most of what shapes your physiology happens when you're not exercising.

Every finding in this project came from the intersection of sensor data and annotated life context. The fitness layer is tractable — workouts are logged, distances measured, heart rate captured. But relationships, emotional state, food, and stress are invisible to sensors.

The highest HRV reading in four years of data followed a day of emotional clarity — not a training adaptation. The body keeps score of things no health app thinks to ask about.

The recovery layer

The sensor captures the effect. It has no idea what caused it.

Recovery modalities — gua sha, fascial release, neck massage, professional bodywork — all produce measurable autonomic responses that show up in overnight HRV. The minimum viable annotation for a complete picture includes not just what you did for exercise, but what you did to recover.

People develop sophisticated intuitive knowledge about their own bodies through years of paying attention. This embodied knowledge is invisible to sensors and rarely captured in health apps.

The minimum viable context question

How much does a user need to share?

Food, mood, relationships, work stress, creative fulfillment — all of it shows up in HRV and resting HR, but none of it is captured by sensors alone.

From this dataset, the answer seems to be: big life events, substance use, training load, and cycle phase. Maybe 30 seconds of annotation on a significant day. Not a food diary. Not a mood tracker. Just the things that actually move the needle — and one good question asked at the right moment.

The next research question

This project documented two illness events with completely different causal signatures — contact transmission (sudden signal onset within hours of exposure) and accumulated immune collapse (gradual 13-day deterioration from compounding stressors). Both were readable from wrist sensors alone, with distinguishable lead-up patterns and different intervention windows. Whether these signatures are reliably distinguishable at scale — across bodies, conditions, and contexts — is the question worth asking next.

The data to understand your own body already exists. But understanding isn't the same as optimizing. The goal isn't perfect metrics — it's a life worth living, sustained for as long as your body can support it.

Skiing suppresses HRV for three days. I ski anyway. The system didn't change that decision. It changed my relationship to it — I know what it costs, I know how to recover, and I know it's worth it. That's not optimization. That's agency.

The populations who would benefit most from this kind of longitudinal self-knowledge aren't elite athletes chasing performance. They're people trying to understand whether the life they're living today supports the life they want in ten, twenty, fifty years. That's the question worth designing for.

View on GitHub ↗ Portfolio ↗
