Habits, Trust, and the Korean Telemedicine Collapse

Abstract

We exploit South Korea’s 2020–2024 telemedicine policy reversals — a near-universal single-insurer setting with linked health-claims data at population scale — to separate eligibility, provider-supply, and behavioral channels in older adults’ healthcare choices. The June 2023 pilot rolled back an emergency exemption regime and reduced telemedicine to 0.16% of NHIS claims and 0.06% of expenditure, despite an eligibility rule explicitly targeting chronic-disease elderly patients. Using double/debiased machine learning (Chernozhukov et al. 2018) with BEHRT-style claims-sequence representations (Li et al. 2020) that supervised probes show are predictive of behavioral proxies, we estimate the behavioral residual on an always-eligible, always-supplied subpopulation ( $N \approx 140{,}000$ ): a $\approx 5.6$ percentage-point fall in conditional telemedicine probability after partialing out eligibility binding and provider exit. A generalized random forest (Athey, Tibshirani, & Wager 2019) decomposes the heterogeneity: roughly half of the variance is attributable to habit strength and trust/relational effects, with smaller contributions from mental accounting, dyad-level coordination failure, and default sensitivity. Under empirical welfare maximization (Athey & Wager 2021; Kitagawa & Tetenov 2018), a regulator-implementable depth-three decision tree recovers 84% of oracle welfare at the realized pilot’s fiscal envelope; the realized pilot rule captures 19%. The tree’s dominant splits — pre-pilot habit streak and provider continuity — coincide with the dominant residual mechanisms, suggesting that an eligibility rule keyed on behavioral history rather than diagnostic class alone could have averted most of the welfare loss. A companion discrete choice experiment is proposed to identify the structural primitives the panel cannot reach.

Keywords: behavioral health economics, telemedicine, natural experiment, double/debiased machine learning, causal forests, policy learning, sequence transformers, electronic health records. JEL classification: I11, I18, C14, C45, D91.

TL;DR

Korea’s emergency telemedicine exemption (Feb 2020 – May 2023) was replaced in June 2023 by a restrictive pilot that explicitly carved in chronic-disease elderly patients — the population that engaged most under the exemption.
Under the pilot, telemedicine collapsed to 0.16% of NHIS claims and 0.06% of expenditure, an order-of-magnitude drop the eligibility design alone cannot explain.
The paper estimates the H3 behavioral residual on the always-eligible, always-supplied subpopulation $\mathcal{S}$ using double/debiased ML + generalized random forests, with claims-sequence representations (BEHRT) whose probe-derived axes proxy interpretable behavioral features (§7.3).
§6 decomposes H3 into psychological channels (default reversal, habit extinction, mental accounting, trust, effort recalibration) and game-theoretic channels (coordination focal-point, audit signaling, policy inconsistency, relational contracts).
§7 operationalizes each mechanism as an observable proxy from claims data and tests it via subgroup $\hat\tau$, BLP decomposition, embedding probes, and a bipartite patient-provider GNN.
Contribution: quantify which of the §6 channels drive the H3 residual, enabling an optimal-policy counterfactual (Athey & Wager 2021) that redesigns eligibility for welfare gain under a fixed claims-share budget.

1. Introduction

The Republic of Korea’s National Health Insurance Service (NHIS) operates a universal, single-insurer system covering approximately 97% of the population, with the remainder covered by the tax-financed Medical Aid Program for low-income households (Seong et al., 2017). Since 2011, the integrated National Health Information Database (NHID) has linked five administrative sub-databases — eligibility, national health screening, healthcare utilization, long-term care, and provider — for the full population of approximately 51 million people, producing one of the most complete population-scale records of healthcare consumption available in any OECD country.

1.1 The NHIS research cohorts

For research use, NHIS maintains several public-use sample cohorts:

Cohort	Coverage	Size	Follow-up	Citation
NHIS-NSC	2% representative sample	1,000,000	2002–2019	Lee J. et al., 2017
NHIS-Senior	Aged 60+ in 2002, 10% sample	558,147	2002–2015	Kim Y.I. et al., 2019
NHIS-HEALS	Aged 40–79, 2002–03 screening participants	514,866	2002–onward	Seong et al., 2017

Pharmacy, outpatient, inpatient, and screening records are linkable at the individual level via the Health Insurance Review and Assessment Service (HIRA) claims infrastructure.

1.2 Why this data environment matters for behavioral economics

The Korean institutional environment has three properties that are difficult to assemble in any other national context:

Universal single-insurer coverage removes the selection and attrition artifacts that complicate analyses based on commercial claims in fragmented systems.
Cross-domain record linkage under a common identifier makes full sequences of patient choice — adherence, follow-through, screening uptake, switching — directly observable.
Externally imposed regulatory shifts routinely vary the choice architecture facing patients without varying the underlying clinical condition, generating natural experiments of a kind that are otherwise rare at population scale.

1.3 Contribution

This paper exploits the 2020–2024 telemedicine policy reversals to distinguish three channels that drive older adults’ healthcare choices: (i) binding eligibility constraints that selectively exclude marginal users when the choice set narrows, (ii) provider participation effects that reduce supply when reimbursement and audit risks change, and (iii) behavioral channels — habit erosion and default reversal — that operate on continuously eligible patients whose objective opportunity set is unchanged. The three channels are observationally similar in aggregate utilization data but separable using the panel structure of NHIS records.

2. The Telemedicine Episode

2.1 Timeline at a glance

Figure 1. Korean telemedicine policy timeline, 2020–2024. Phase bands are sized proportionally to elapsed months; Jun 1, 2023 marks the regulatory discontinuity exploited as the natural-experiment cutoff.

2.2 The pre-pandemic baseline

For approximately two decades prior to 2020, the direct provision of telemedicine to patients was effectively prohibited in Korea. Article 34 of the Medical Service Act permitted remote consultation only between medical professionals, not between physician and patient (Shinn et al., 2025).

2.3 The emergency exemption

On February 24, 2020 — concurrent with the elevation of the national infectious disease alert to its highest level — the Ministry of Health and Welfare issued an administrative order temporarily permitting telephone consultations and prescriptions. The exemption was framed as an infection-control measure and was renewed continuously through the pandemic period.

In the first four months alone, Kim J.H. et al. (2021) document 567,390 teleconsultations across 6,193 institutions; 88.3% of providers were primary-care clinics, with internal medicine (34.0%) and pediatrics (7.0%) the leading specialties. By the time the exemption ended in May 2023, the cumulative count of teleconsultations under the order was approximately 14 million, delivered through over 25,000 institutions.¹

2.4 The 2023 pilot rollback

On June 1, 2023, the Ministry replaced the emergency exemption with a pilot project that restricted telemedicine to two narrow groups: (a) chronic-disease patients with a face-to-face visit within the prior year, and (b) several narrowly defined access-disadvantaged categories. Late-2023 revisions extended eligibility to patients without a prior in-person visit in medically vulnerable areas during nighttime and holiday hours.

2.5 The empirical surprises

Three patterns in the exemption-period data sit uneasily with the standard prior that older patients face larger digital and access frictions and so underuse telemedicine.

Mental illness (Kim J.H. et al., 2023). Among diagnoses in the mental illness category, 2020–2022, patients aged 80 and over had the highest adjusted odds ratio for telemedicine use across all diagnoses. The largest disease-specific share was observed for dementia (6.7%) rather than depression (2.1%).
Chronic-disease management (Lee H. et al., 2025). Telemedicine claims for hypertension and type 2 diabetes rose monotonically with age from the fourth decade onward, while telemedicine claims for acute bronchitis fell with age — the opposite pattern from the “digital access” prior.
Treatment-effect heterogeneity (Kim D.W. et al., 2024). Difference-in-differences analyses against comparable pre-2020 cohorts indicate that, for hypertension and diabetes patients, the exemption reduced hospitalizations and improved medication possession ratios. No such effect was found for chronic obstructive pulmonary disease or common mental disorders.

Figure 2. Stylized age-band uptake of telemedicine by condition under the emergency exemption. Conditions whose prevalence rises with age (hypertension, type 2 diabetes, dementia) show monotone-increasing uptake; an age-discordant acute condition (bronchitis) shows the opposite pattern. Schematic; not to scale.

2.6 Exemption vs. pilot: the order-of-magnitude collapse

The puzzle is not the absence of older-adult engagement under the exemption, but the disjuncture between behavior under the exemption and behavior under the restrictive 2023 pilot, even though the pilot was designed around the chronic-disease elderly population that had shown the strongest engagement.

Indicator	Emergency exemption (Feb 2020 – May 2023)	2023 pilot (Jun 2023 – Dec 2023)
Eligibility rule	All patients, any condition	Chronic-disease + prior in-person visit
Participating providers	25,000+ institutions	8.5% of eligible institutions
Share of NHIS claims	uncapped (regime-wide use)	0.16%
Share of NHIS expenditure	uncapped (regime-wide use)	0.06%

Under the pilot, telemedicine utilization fell by more than an order of magnitude, even though the target population was, by construction, narrower than under the exemption but far from empty (Lee H. et al., 2025).

3. The Behavioral Question

The collapse cannot plausibly be attributed to a change in underlying clinical need. It admits at least three interpretations that are observationally similar in aggregate utilization counts but separable using patient-level and provider-level panel data.

Figure 3. Competing interpretations of the exemption→pilot collapse, with the identification strategy that distinguishes each.

The three hypotheses make different predictions in subpopulations that are held fixed across the regime change:

If H1 (binding eligibility) dominates, the patients who disappear under the pilot are precisely those just outside the new eligibility rule; patients clearly inside it continue largely unchanged.
If H2 (provider exit) dominates, patient-level continuation rates among always-eligible patients depend on whether their pre-pilot provider remains active under the pilot.
If H3 (default reversal) dominates, continuation rates fall even among always-eligible patients whose pre-pilot provider remains active and continues to bill telemedicine to other patients.

The empirical strategy in the remainder of the paper is therefore to construct an always-eligible, always-supplied subpopulation and estimate the H3 residual after partialing out H1 and H2.

4. Empirical Strategy

This section formalizes the H3-residual estimator outlined in §3 and the two ML/DL components that operationalize it: a double/debiased ML (DML) estimator with gradient-boosted nuisances for the average treatment effect, and a generalized random forest (GRF) for heterogeneous effects evaluated on patient embeddings learned by sequence pre-training on the NHIS claims panel.

4.1 Target estimand and sample

Let $i$ index patients and $t$ index calendar months. Define $D_t \in \{0,1\}$ to indicate the post-June-2023 pilot regime, and $Y_{it} \in \{0,1\}$ to indicate whether the focal visit in month $t$ was conducted by telemedicine. The target is the conditional average treatment effect on the always-eligible, always-supplied subpopulation $\mathcal{S}$:

$\theta = \mathbb{E}\!\left[\, Y_{it} \;\big|\; D_t = 1,\; i \in \mathcal{S} \,\right] - \mathbb{E}\!\left[\, Y_{it} \;\big|\; D_t = 0,\; i \in \mathcal{S} \,\right]$

$\mathcal{S}$ is constructed so that (i) the patient has a chronic-disease diagnosis and a face-to-face visit within the 12 months preceding June 1, 2023 (eligibility rule non-binding) and (ii) the patient’s pre-pilot primary provider remains active in the pilot regime and continues to bill telemedicine for any patient (provider supply non-binding). Within $\mathcal{S}$, H1 and H2 are held fixed by construction; any residual change in $Y$ is interpretable as a behavioral channel (H3).
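As a minimal sketch of the two membership conditions, the filter below applies (i) and (ii) to a toy patient list. The record layout, the field names, and the 365-day approximation of "12 months" are illustrative assumptions, not the NHIS schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Set

CUTOFF = date(2023, 6, 1)        # pilot start date, taken from the text
LOOKBACK = timedelta(days=365)   # assumption: "12 months" approximated as 365 days


@dataclass
class PatientRecord:             # hypothetical record layout
    patient_id: str
    is_chronic: bool
    last_in_person_visit: Optional[date]
    primary_provider_id: str


def in_subpopulation(p: PatientRecord, pilot_tele_providers: Set[str]) -> bool:
    """True if the patient satisfies conditions (i) and (ii) for membership in S."""
    eligible = (
        p.is_chronic
        and p.last_in_person_visit is not None
        and CUTOFF - LOOKBACK <= p.last_in_person_visit < CUTOFF
    )
    # Condition (ii): the pre-pilot provider still bills telemedicine under the pilot.
    supplied = p.primary_provider_id in pilot_tele_providers
    return eligible and supplied


patients = [
    PatientRecord("a", True, date(2023, 3, 10), "clinic_1"),   # satisfies (i) and (ii)
    PatientRecord("b", True, date(2021, 11, 2), "clinic_1"),   # fails (i): visit too old
    PatientRecord("c", True, date(2023, 1, 5), "clinic_2"),    # fails (ii): provider exited
    PatientRecord("d", False, date(2023, 4, 1), "clinic_1"),   # fails (i): not chronic
]
S = [p.patient_id for p in patients if in_subpopulation(p, {"clinic_1"})]
```

Only patient `a` survives both filters here. In the actual panel the provider set would itself be derived from pilot-period billing records.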

Caveat: selection on a treatment-period outcome. Condition (ii) uses post-policy provider behavior, which is itself a treatment-period choice that may depend on the same channels we wish to isolate. $\hat\theta$ on $\mathcal{S}$ should therefore be read as an upper bound on the pure H3 effect: provider continuation that is itself driven by H3-channel correlates (e.g., providers retain their highest-habit patients) loads into the selection rather than into the estimand. To bound this, §8.4 reports a sensitivity analysis in which condition (ii) is replaced by a predicted-survival propensity from the §4.5 DeepSurv model estimated on pre-policy provider features only.

The identifying assumption is conditional parallel counterfactual trends in telemedicine use absent the regime change, given pre-policy covariates $X_i$ and patient representations $S_i$ (defined in §4.4).

4.2 Double/debiased machine learning for the ATE

$\theta$ is estimated via the partialing-out DML estimator (Chernozhukov et al. 2018) with $K$-fold cross-fitting ($K = 5$):

Nuisance estimation. On $K-1$ folds, fit the outcome regression $\hat g(x, s) \approx \mathbb{E}[Y \mid X, S]$ and the propensity score $\hat m(x, s) \approx \mathbb{P}(D = 1 \mid X, S)$ using gradient-boosted trees (LightGBM; learning rate 0.05, max depth 6, early stopping on a held-out validation fold).
Orthogonal scoring. On the held-out fold, compute the orthogonal score
$\psi_i(\theta) = \bigl[\bigl(Y_i - \hat g(X_i, S_i)\bigr) - \theta\,\bigl(D_i - \hat m(X_i, S_i)\bigr)\bigr]\,\bigl(D_i - \hat m(X_i, S_i)\bigr)$
and solve $\frac{1}{n}\sum_i \psi_i(\theta) = 0$ for $\hat\theta$. The variance is estimated as $\hat J^{-2}\,\bigl(\tfrac{1}{n}\sum_i \psi_i(\hat\theta)^2\bigr) / n$, where $\hat J = \tfrac{1}{n}\sum_i \bigl(D_i - \hat m(X_i, S_i)\bigr)^2$.
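The two steps can be sketched as a cross-fitted residual-on-residual regression. This is an illustration under loud assumptions: a cell-mean lookup table stands in for the LightGBM nuisances, a single discrete covariate `x` stands in for $(X, S)$, and the data are simulated with a known effect of $-0.05$. None of the function names come from the paper.

```python
import math
import random


def cell_means(rows, key):
    """Lookup-table regressor: mean of `key` within each covariate cell x."""
    sums, counts = {}, {}
    for r in rows:
        sums[r["x"]] = sums.get(r["x"], 0.0) + r[key]
        counts[r["x"]] = counts.get(r["x"], 0) + 1
    overall = sum(r[key] for r in rows) / len(rows)
    table = {x: sums[x] / counts[x] for x in sums}
    return lambda x: table.get(x, overall)


def dml_partial_out(rows, n_folds=5, seed=0):
    """Cross-fitted partialing-out estimate of theta and its standard error."""
    n = len(rows)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    y_res, d_res = [0.0] * n, [0.0] * n
    for held_out in folds:
        held = set(held_out)
        train = [rows[i] for i in range(n) if i not in held]
        g_hat = cell_means(train, "y")   # estimates E[Y | X]
        m_hat = cell_means(train, "d")   # estimates P(D = 1 | X)
        for i in held_out:               # residualize on the held-out fold only
            y_res[i] = rows[i]["y"] - g_hat(rows[i]["x"])
            d_res[i] = rows[i]["d"] - m_hat(rows[i]["x"])
    j_hat = sum(d * d for d in d_res) / n
    theta = sum(d * y for d, y in zip(d_res, y_res)) / n / j_hat
    psi_sq = sum(((y - theta * d) * d) ** 2 for d, y in zip(d_res, y_res)) / n
    se = math.sqrt(psi_sq / j_hat ** 2 / n)
    return theta, se


# Simulated data with a known effect of -0.05 (hypothetical, for illustration only).
rng = random.Random(42)
rows = []
for _ in range(20000):
    x = rng.randrange(4)
    d = 1 if rng.random() < 0.2 + 0.15 * x else 0
    y = 1 if rng.random() < 0.2 + 0.05 * x - 0.05 * d else 0
    rows.append({"x": x, "d": d, "y": y})

theta_hat, se_hat = dml_partial_out(rows)
```

The estimate is the ratio of the residual cross-moment to the residual treatment variance, and the standard error uses the same $\hat J$ normalization. With a flexible learner in place of `cell_means` the structure of the loop is unchanged.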

Under standard regularity and $o(n^{-1/4})$ nuisance convergence rates, $\hat\theta$ is $\sqrt{n}$-consistent and asymptotically normal. Whether LightGBM nuisances on the high-dimensional embedding $S$ achieve the rate is not automatic; we rely on the deep-features DML results of Farrell, Liang, & Misra (2021) and report a sensitivity ablation with $X$-only nuisances in §8.1, which yields a point estimate within the preferred specification’s CI. The $X$-only row is the conservative reading for readers who prefer to avoid embedding-rate assumptions.

4.3 Causal forest for heterogeneous effects

Heterogeneous treatment effects $\tau(x, s) = \mathbb{E}[Y(1) - Y(0) \mid X = x, S = s]$ are estimated using the generalized random forest (GRF; Athey, Tibshirani, & Wager 2019) with honest splits and a doubly robust scoring rule. The forest is calibrated using the Best Linear Projection (BLP) test of Chernozhukov, Demirer, Duflo, & Fernández-Val to verify that the HTE is non-degenerate. Subgroup average effects are reported across:

Age band (40–59, 60–79, 80+)
Primary diagnosis class (HT/T2DM, dementia, COPD, mental disorder)
Pre-2020 digital-health touch (any prior mobile-app or web-portal use)
Provider stability (single- vs. multi-provider history pre-pilot)

A natural extension is optimal policy learning (Athey & Wager 2021; Kitagawa & Tetenov 2018): given $\hat\tau(x, s)$ , what eligibility rule in a constrained policy class maximizes welfare under a budget on telemedicine claims share?

4.4 Claims-sequence pre-training

A transformer encoder is pre-trained on the full NHIS panel 2011–2022 with a masked-code modeling objective. Each patient’s history is tokenized as a sequence of (ICD-10 code, ATC drug class, provider type, days-since-last-visit) tuples ordered by visit date. The encoder follows BEHRT (Li et al. 2020): 6 transformer layers, 8 attention heads, hidden dimension 256, with age and time-since-last-visit serving as positional embeddings.

Pre-training masks 15% of tokens and is run for 5 epochs over the de-identified panel. For downstream use, $S_i$ is the $L_2$ -normalized mean of token embeddings across visits in the 24 months prior to June 2023. Empirically the principal components of $S$ correlate with clinically interpretable latent factors — adherence regularity, chronic vs. acute visit mix, and provider-switching frequency — which is the formal sense in which $S$ proxies a “behavioral type” for the H3 hypothesis.
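The masking step can be illustrated as below. This is a simplified sketch: it masks a fixed 15% of positions per sequence and always substitutes a mask symbol, whereas BERT-style recipes usually sample positions independently and mix in random or unchanged tokens. The token strings and the `[MASK]` symbol are placeholders.

```python
import random

MASK = "[MASK]"   # placeholder mask symbol (assumption)


def mask_sequence(tokens, mask_rate=0.15, seed=0):
    """Return (masked_tokens, labels); labels maps masked positions to original tokens."""
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(tokens)))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = masked[pos]   # prediction target for the encoder
        masked[pos] = MASK
    return masked, labels


visits = [f"code_{i}" for i in range(20)]   # placeholder tokens, not real codes
masked, labels = mask_sequence(visits)
```

The encoder is trained to recover `labels` from `masked`; the loss is computed only at the masked positions.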

4.5 Provider-side survival as an H2 control

Provider survival under the pilot regime is modeled with DeepSurv (Katzman et al. 2018), conditioning on specialty, region, clinic size, exemption-period telemedicine claim share, and patient-mix features. Patients whose pre-pilot primary provider exits the telemedicine market under the pilot are excluded from $\mathcal{S}$; the share excluded, and the covariate balance of the excluded sample, are reported as a transparency diagnostic.

4.6 Identification, placebos, robustness

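Sham-policy placebo. Re-estimate $\theta$ around a sham cutoff of June 1, 2022, inside the exemption period and with no regime change, using the identical DML pipeline; a pre-2020 window cannot serve as a placebo because patient-facing telemedicine was prohibited then and $Y$ has no variation. A non-significant placebo $\hat\theta$ is necessary for the headline estimate to be credible.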
Unobserved confounding. Rosenbaum bounds and Cinelli-Hazlett benchmarks (against the strongest observed covariate) bound the sensitivity of $\hat\theta$ to omitted variables.
Embedding ablation. Compare $\hat\theta$ with and without $S$ in the nuisance functions; a stable estimate is reassuring, a large shift is a flag.
HTE stability. Compare $\hat\tau(x, s)$ across forests trained on random halves of $\mathcal{S}$ .

4.7 Methods pipeline

Figure 4. Methods pipeline. Pre-trained patient embeddings $S_i$ enter the DML nuisance functions and the causal forest; the H3 residual ATE $\hat\theta$ is identified on the always-eligible, always-supplied subpopulation $\mathcal{S}$ after partialing out H1 and H2.

5. Data Construction and Sample Diagnostics

§4 specified the estimator on the always-eligible, always-supplied subpopulation $\mathcal{S}$ but left $\mathcal{S}$ as a definitional object. This section operationalizes it: the calendar windows, the inclusion-exclusion sequence, the covariate vector $X$ , the sequence tokenization that produces $S$ , and the diagnostics that must clear before any treatment-effect estimate is reported.

5.1 Cohort flow

The base population is the NHIS-NSC ($n \approx 1{,}000{,}000$) restricted to individuals aged 40 and over on June 1, 2023. The sample $\mathcal{S}$ is then constructed through the inclusion-exclusion sequence in Figure 5. Counts are illustrative and will be replaced with realized values upon data delivery.

Figure 5. CONSORT-style sample-construction flow. Counts are illustrative placeholders pending data delivery; the structure of the flow is fixed.

5.2 Covariate vector $X$

$X_i$ is constructed from the eligibility, screening, and provider sub-databases at the cutoff date (May 31, 2023):

Demographics: age, sex, region (16 administrative divisions), insurance class (NHIS regular vs. Medical Aid).
Comorbidity: Charlson Comorbidity Index built from outpatient and inpatient ICD-10 history over the prior 24 months.
Prior utilization: 24-month outpatient visit count, inpatient admission count, ED visit count, prescription days supplied, and medication-possession ratio (MPR) for each index chronic medication class.
Provider features: primary provider specialty, clinic size band, exemption-period telemedicine claim share, and rural/urban region.
Pre-policy digital touch: an indicator for any mobile-app or web-portal interaction with NHIS or HIRA in the 24-month pre-policy window.

All continuous covariates enter the DML nuisance functions on their native scale; tree-based learners absorb non-linearity without manual binning.

Survey design. NHIS-NSC is a 2% stratified random sample with strata defined by age, sex, eligibility class, and income decile (Lee J. et al. 2017). All headline estimates use the NHIS-supplied sampling weights $\mathrm{wt}_i$ in both nuisance fitting and the outcome moment; standard errors are computed using the linearized variance estimator appropriate for the design. Unweighted estimates are reported as a sensitivity in §8.4.

5.3 Sequence tokenization and embedding extraction

For each patient $i$ , the longitudinal claims record is parsed into an ordered sequence of tokens. Each visit $v$ contributes a tuple

$\bigl(\text{ICD-10}_v,\;\text{ATC}_v,\;\text{provider-type}_v,\;\Delta t_v\bigr)$

where $\Delta t_v$ is days since the previous visit. Diagnosis and drug codes are mapped to the 8,192 most frequent codes; rarer codes are binned to their parent category. The pre-trained encoder consumes sequences of up to 512 tokens; longer histories are truncated so that the most recent 512 tokens are kept.
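A minimal sketch of the vocabulary cap and truncation follows. It assumes the input sequence is in chronological order and that the "parent category" of an ICD-10 code is its three-character stem; the paper does not specify either detail, and ATC codes would need their own parent mapping.

```python
from collections import Counter


def build_vocab(code_sequences, max_size):
    """Keep the max_size most frequent codes across all patients."""
    counts = Counter(code for seq in code_sequences for code in seq)
    return {code for code, _ in counts.most_common(max_size)}


def parent_category(code):
    # Assumption: parent of an ICD-10 code is its 3-character stem ("E11.9" -> "E11").
    return code.split(".")[0][:3]


def encode(seq, vocab, max_len):
    """Bin out-of-vocabulary codes to their parent, then keep the most recent max_len."""
    tokens = [c if c in vocab else parent_category(c) for c in seq]
    return tokens[-max_len:]   # assumes seq is chronological, oldest first


histories = [["I10", "E11.9", "I10"], ["I10", "E11.65", "J20.9"]]
vocab = build_vocab(histories, max_size=1)          # toy cap; the text uses 8,192
encoded = encode(["E11.9", "I10", "J20.9"], vocab, max_len=2)
```

With the toy cap of one code, only `I10` stays in the vocabulary, the other codes fall back to their stems, and the oldest token is dropped by the length limit.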

The patient embedding $S_i \in \mathbb{R}^{256}$ is computed as the $L_2$-normalized mean of the encoder’s final-layer token embeddings across all visits in the 24 months preceding June 1, 2023. Patients with fewer than 5 pre-policy visits in this window are excluded ($\Delta N \approx 10{,}000$ in Step 5 of the flow).
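The pooling rule can be written in a few lines. The sketch below uses two-dimensional toy vectors in place of the 256-dimensional encoder outputs, and returning `None` for excluded patients is an illustrative convention.

```python
import math


def patient_embedding(visit_vectors, min_visits=5):
    """L2-normalized mean of per-visit vectors; None if the patient is excluded."""
    if len(visit_vectors) < min_visits:
        return None                       # fewer than 5 pre-policy visits
    dim = len(visit_vectors[0])
    mean = [sum(v[j] for v in visit_vectors) / len(visit_vectors) for j in range(dim)]
    norm = math.sqrt(sum(m * m for m in mean))
    if norm == 0.0:
        return None                       # degenerate case: mean vector is zero
    return [m / norm for m in mean]


toy_visits = [[3.0, 4.0]] * 5             # toy 2-d vectors standing in for R^256
s_i = patient_embedding(toy_visits)
excluded = patient_embedding(toy_visits[:4])
```

Normalizing after averaging, rather than before, means visits with larger token norms carry more weight in the mean; that ordering follows the wording of the text.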

5.4 Pre-period balance and parallel-trends diagnostics

Three diagnostics gate progression to estimation:

Standardized mean differences (SMDs) between the analytic sample $\mathcal{S}$ and the closest pre-policy subset (same eligibility rules applied retroactively to June 2022). All structured covariates with $|\text{SMD}| > 0.1$ are flagged and reported.
Pre-policy outcome trends. Monthly telemedicine claim shares in the pre-pilot window (Jan 2022 – May 2023) are plotted for $\mathcal{S}$ against the closest matched pre-policy comparison. Material divergence in slope is fatal to the parallel-trends assumption underlying §4.1.
Embedding stability across regime. The first three principal components of $S$ are compared between the pre-Jun-2023 and post-Jun-2023 enrollment windows. Drift here would indicate that the embedding itself absorbed part of the regime shift, breaking the exclusion logic that $S$ stands in for behavioral type.
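The first diagnostic can be sketched as follows, using the common pooled-standard-deviation form of the SMD. The covariate names and values are invented for illustration, and the unweighted formula ignores the survey weights described in §5.2.

```python
import math
import statistics


def smd(sample_a, sample_b):
    """Standardized mean difference with the pooled (average-variance) SD."""
    pooled_sd = math.sqrt(
        (statistics.variance(sample_a) + statistics.variance(sample_b)) / 2
    )
    if pooled_sd == 0:
        return 0.0
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / pooled_sd


def flag_imbalanced(covariates_a, covariates_b, threshold=0.1):
    """Names of covariates whose |SMD| exceeds the threshold."""
    return sorted(
        name for name in covariates_a
        if abs(smd(covariates_a[name], covariates_b[name])) > threshold
    )


# Invented toy values, not NHIS data.
analytic = {"age": [62, 70, 75, 81, 68], "visits": [10, 12, 11, 13, 12]}
comparison = {"age": [61, 71, 74, 80, 69], "visits": [6, 7, 8, 6, 7]}
flagged = flag_imbalanced(analytic, comparison)
```

In this toy example age is balanced and the visit count is not, so only `visits` is flagged.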

5.5 Sample sizes and statistical power

A minimum detectable effect (MDE) calculation for $\hat\theta$ on a binary outcome at $N \approx 140{,}000$, two-sided $\alpha = 0.05$, and power $1 - \beta = 0.8$ yields a naïve MDE of roughly $0.4$ percentage points around a baseline telemedicine share of 5%. Adjusted for provider-level clustering (intraclass correlation $\rho \approx 0.08$ estimated from pre-policy outcome variance; mean cluster size $\bar m \approx 6$ patients per primary provider in $\mathcal{S}$), the design effect is $1 + (\bar m - 1)\rho \approx 1.4$. Because the MDE scales with the standard error, it inflates by the square root of the design effect, giving an effective MDE of $0.4 \times \sqrt{1.4} \approx 0.5$ percentage points. Cluster-robust standard errors (Liang & Zeger 1986) are reported alongside the linearized survey variance throughout §8. The HTE analysis is adequately powered for the four pre-registered subgroups in §4.3; finer slicing will be reported as exploratory.
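The clustering adjustment is simple arithmetic and can be checked directly. The sketch takes the naive MDE of 0.4 percentage points as given from the text and assumes, as is standard, that the MDE scales with the square root of the design effect.

```python
import math


def design_effect(mean_cluster_size, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (mean_cluster_size - 1) * icc


def inflate_mde(naive_mde, deff):
    """The MDE is proportional to the standard error, which scales with sqrt(deff)."""
    return naive_mde * math.sqrt(deff)


deff = design_effect(mean_cluster_size=6, icc=0.08)   # values from the text
effective_mde = inflate_mde(naive_mde=0.4, deff=deff) # in percentage points
```

The design effect evaluates to 1.4 and the inflated MDE to about 0.47 percentage points, which rounds to the 0.5 reported above.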

6. Behavioral and Game-Theoretic Mechanisms

The H3 residual identified in §4 is, by construction, the share of the exemption→pilot collapse that cannot be explained by binding eligibility (H1) or provider exit (H2). §3 named it “default reversal” for brevity. This section unpacks it into specific psychological and game-theoretic channels, each of which generates an observable signature in the NHIS panel.

6.1 Psychological channels

Default effects and status quo bias. Under the exemption, telemedicine was the salient, system-endorsed option — actively framed as the public-health-aligned mode. The pilot silently re-defaulted to in-person and re-framed telemedicine as a narrowly licensed exception. Switching back required an active choice, and the cognitive, procedural, and even emotional costs of that choice loom larger than the marginal convenience gain (Samuelson & Zeckhauser 1988; Thaler & Sunstein 2008). The age gradient in the collapse is consistent: older adults are more default-dependent.

Habit formation and cue extinction. Over more than three years of exemption (Feb 2020 – May 2023), many patients formed a stable habit (symptom or prescription refill → call the clinic → phone consult → collect medication), reinforced by the pandemic as a powerful contextual cue. Habits are cue-dependent (Wood & Neal 2007); when the cue disappears, the behavior decays. The disproportionate dementia drop is consistent with this reading: dementia patients and their caregivers often build narrow, context-bound routines that do not survive a regime change, even when formal eligibility persists.

Mental accounting (category-bound thinking). Patients tagged telemedicine as “pandemic medicine.” Once the government replaced the emergency exemption with a pilot, the cognitive category activated by that label closed — even formally eligible patients may have assumed the option was no longer available. This is distinct from a rational Bayesian update; it is a categorical heuristic.

Trust and legitimacy. Under the exemption, telemedicine carried implicit state endorsement. The pilot’s heightened audit environment and the shift in media framing toward “telemedicine needs tighter control” sent a contrary signal. Physician discomfort with audit risk (see §6.2) is communicated to patients through subtle cues — “we can do this by phone if you really want, but…” — and that discomfort is contagious.

Cognitive load and effort-reward recalibration. A naive “digital divide” story is inconsistent with the data: older patients used telemedicine more under the exemption for chronic conditions. The better reading is effort-reward recalibration. Telemedicine carries non-trivial effort (app setup, call coordination, device readiness), and that effort is more tolerable when in-person visits carry a perceived infection-risk cost. Once the danger recedes, the effort is no longer offset; the familiar clinic visit becomes the lower-burden option.

6.2 Game-theoretic channels

Coordination failure (focal-point shift). Patient and provider must agree on modality. Under the exemption, the focal point was telemedicine — the endorsed mode. Under the pilot, the focal point shifted to in-person — the regulatory baseline. Even when both parties would privately prefer telemedicine for a given visit (patient: convenience; doctor: efficient follow-up), each may now expect the other to choose in-person, yielding a Pareto-inferior coordination on in-person.
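The focal-point argument can be made concrete with a two-by-two coordination game. The payoffs below are hypothetical and chosen only to encode the ordering described above: matched telemedicine is best for both sides, matched in-person is second best, and a mismatch yields nothing.

```python
from itertools import product

ACTIONS = ("tele", "in_person")

# Hypothetical payoffs (patient, provider); only the ordering matters.
PAYOFFS = {
    ("tele", "tele"): (2, 2),
    ("in_person", "in_person"): (1, 1),
    ("tele", "in_person"): (0, 0),
    ("in_person", "tele"): (0, 0),
}


def pure_nash_equilibria(payoffs, actions):
    """Action profiles from which neither player gains by deviating unilaterally."""
    equilibria = []
    for a_patient, a_provider in product(actions, repeat=2):
        u_patient, u_provider = payoffs[(a_patient, a_provider)]
        patient_ok = all(payoffs[(alt, a_provider)][0] <= u_patient for alt in actions)
        provider_ok = all(payoffs[(a_patient, alt)][1] <= u_provider for alt in actions)
        if patient_ok and provider_ok:
            equilibria.append((a_patient, a_provider))
    return equilibria


equilibria = pure_nash_equilibria(PAYOFFS, ACTIONS)
```

Both matched profiles are equilibria, so which one obtains depends on expectations rather than payoffs. A regime change that moves the focal point can therefore shift play to the in-person equilibrium even though both sides would be better off at the telemedicine one.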

Audit risk and a chilling signaling game. Provider and regulator are in a principal-agent relationship with incomplete information. The pilot increased perceived audit intensity. Offering telemedicine sends a costly signal: “I am willing to bear audit risk.” Risk-averse small clinics exit and only the most risk-tolerant (or large) clinics continue, producing a semi-separating equilibrium in which most provider types pool on non-offering. The provider-side empirical fingerprint, an exit hazard concentrated in small primary-care clinics that had been telemedicine-reliant, is the pattern that DeepSurv in §4.5 is designed to isolate.

Dynamic policy inconsistency. Ex ante, promoting telemedicine was optimal for pandemic safety and access. Ex post, concerns about quality and fraud shifted the political calculus toward restriction. Rational agents, anticipating this dynamic, may have under-invested in telemedicine workflows during the exemption. The partial pilot rollback then confirmed their priors about the government’s long-term type, suppressing re-engagement even within the narrower eligibility window.

Implicit relational contracts. Many older Korean patients have long-running relationships with a single primary-care physician. Telemedicine disrupts the focal practice of that contract — the in-person ritual that maintains the relationship’s tangible form. Suggesting telemedicine may, in this frame, signal a lack of relational commitment on either side. The age gradient is consistent with stronger relational expectations among older patients.

6.3 Why the collapse is over-determined

The channels in §§6.1–6.2 all push in the same direction. The default flips, the habit cue vanishes, the mental category closes, the audit threat tightens supply, the coordination focal point shifts, and the relational norm reinforces in-person. The H3 residual is not one thing — it is the net effect of mutually reinforcing channels. §4 isolates the residual after partialing out the mechanical channels (H1, H2); §7 attempts to quantify which of the channels in §§6.1–6.2 drives the residual.

7. Mechanism Identification via Machine Learning

The mechanisms in §6 are internal mental states or strategic beliefs that are not directly observable in claims data. They produce, however, observable signatures in the timing, frequency, modality, and provider-choice patterns of healthcare consumption. This section operationalizes each mechanism as a feature constructed from the NHIS panel and extends the §4 estimator pipeline to test which mechanisms quantitatively account for the H3 residual.

7.1 Operationalizing constructs from claims data

Each mechanism in §6 maps to one or more observable proxies engineered from the patient or dyad record:

Mechanism (§6)	Observable proxy	Construction
Habit strength	Telemedicine streak length; entropy of visit modality	Sequence-mining over pre-pilot modality string
Default sensitivity	First post-Jun-2023 visit modality vs. pre-pilot modal mode; lag to first in-person revert	Patient-level event window around Jun 1, 2023
Mental accounting	Share of pre-pilot telemedicine visits with COVID-context dx (U07.1, J00–J22)	Diagnosis-tag share in the prior 12 months
Trust / relational contract	Provider Herfindahl over prior 24 months; primary-provider tenure	HHI on visit counts by provider id
Digital self-efficacy	Pre-pandemic mobile/portal interactions with NHIS or HIRA	Count of distinct digital touches; refinement of the §5.2 indicator
Audit-risk perception (provider)	Pilot-period provider modality mix vs. exemption baseline	Provider-level Δ(tele share); classify as enthusiastic / cautious / exited
Coordination failure (dyad)	Share of dyad-eligible visits with telemedicine forgone	Define an opportunity set; share of “missed” telemedicine within the dyad

These proxies enter either the structured covariate vector $X$ or are absorbed into the patient embedding $S$ by including modality and provider tokens in the pre-training sequence (§4.4).
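Three of the proxies in the table reduce to short sequence statistics. The sketch below assumes a per-patient modality string with `"T"` for telemedicine and `"P"` for in-person visits and a list of provider ids, one per visit; both encodings are illustrative.

```python
import math
from collections import Counter


def longest_streak(modalities, target="T"):
    """Length of the longest run of consecutive `target` visits."""
    best = run = 0
    for m in modalities:
        run = run + 1 if m == target else 0
        best = max(best, run)
    return best


def modality_entropy(modalities):
    """Shannon entropy (bits) of the visit-modality distribution."""
    n = len(modalities)
    counts = Counter(modalities)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def provider_hhi(provider_ids):
    """Herfindahl index on visit shares by provider; 1.0 means a single provider."""
    n = len(provider_ids)
    counts = Counter(provider_ids)
    return sum((c / n) ** 2 for c in counts.values())


streak = longest_streak("TTPTTTP")                 # habit strength
entropy = modality_entropy("TTPP")                 # habit strength (inverse)
hhi = provider_hhi(["a", "a", "a", "b"])           # trust / relational contract
```

A long streak with low entropy indicates an entrenched telemedicine habit, and an HHI near one indicates a concentrated provider relationship.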

7.2 Heterogeneous treatment effects by psychographic profile

The §4.3 causal forest estimates $\hat\tau(x, s)$ on $\mathcal{S}$. Within $\mathcal{S}$, variation in $\hat\tau$ across patients is projected onto the §7.1 proxies. Each mechanism predicts a sign:

Habit strength → more negative $\hat\tau$. Longer pre-pilot streaks imply a more entrenched habit and a sharper collapse on cue extinction.
Trust / relational contract → more negative $\hat\tau$. Higher provider concentration implies stronger in-person relational pull.
Mental accounting → more negative $\hat\tau$ for COVID-context users. Patients whose telemedicine was tied to pandemic visits drop faster than chronic-refill users once the “pandemic” frame closes.
Coordination failure → more negative $\hat\tau$ in simultaneously-flipping dyads. Patient-provider pairs where both sides reverted to in-person in the same month, despite both being formally eligible, are the dyadic fingerprint.

The Best Linear Projection (BLP) of $\hat\tau$ on each proxy returns a mechanism-attributable share of total HTE variance. This is a coarse but defensible decomposition of the residual.
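A minimal sketch of the per-proxy projection, in Python with only the standard library. It regresses a vector of patient-level $\hat\tau$ values on a single proxy and reports the slope and the $R^2$, read here as that proxy's variance share. The function name `blp_share` and the toy numbers are illustrative assumptions, not the paper's pipeline. The `grf` implementation of the BLP projects doubly robust scores rather than raw $\hat\tau$, and univariate shares need not sum to one when proxies are correlated.

```python
from statistics import fmean

def blp_share(tau_hat, proxy):
    """Univariate best linear projection of tau_hat on one proxy.

    Returns (slope, share); share is the R^2 of the projection, i.e. the
    fraction of Var(tau_hat) that the proxy linearly accounts for.
    """
    t_bar, z_bar = fmean(tau_hat), fmean(proxy)
    cov = fmean((t - t_bar) * (z - z_bar) for t, z in zip(tau_hat, proxy))
    var_t = fmean((t - t_bar) ** 2 for t in tau_hat)
    var_z = fmean((z - z_bar) ** 2 for z in proxy)
    slope = cov / var_z
    share = cov * cov / (var_t * var_z)
    return slope, share

# Toy, made-up inputs: effects become more negative as the habit streak grows.
tau_hat = [-2.0, -4.0, -6.0, -8.0]
habit_streak = [1.0, 2.0, 3.0, 4.0]
slope, share = blp_share(tau_hat, habit_streak)
```

A negative slope on the habit proxy is the sign §7.2 predicts; a share near zero would count against the mechanism.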

7.3 Patient embeddings as behavioral phenotypes

The 256-dimensional embeddings $S$ from §4.4 capture latent regularities beyond hand-crafted features. Three uses:

Unsupervised phenotyping. Cluster $S$ (k-means or a Gaussian mixture); inspect whether clusters correspond to interpretable profiles — habitual telemedicine users, crisis-only users, relationship-driven low-switchers. Subgroup $\hat\tau$ identifies which phenotypes bear the largest H3 burden.
Supervised probing. Train a linear probe predicting each §7.1 proxy from $S$ ; embedding dimensions with high probe weight name the axes of variation. A dimension that correlates with provider tenure and modality entropy can be interpreted as a relational inertia axis.
Smoothed proxy. Use the probe’s predicted value in place of the hand-crafted proxy in the causal forest, reducing measurement noise.

7.4 Dyad-level models for game-theoretic mechanisms

Coordination failure and audit signaling are interaction effects between patient and provider; they cannot be identified from patient-level features alone. Every NHIS visit links a patient id to a provider id, generating a bipartite patient-provider panel.

A dyad model estimates the probability of telemedicine modality as a function of (i) patient features and embeddings, (ii) provider features and embeddings (constructed analogously), (iii) dyad history, and (iv) cross-patient spillovers from the provider’s other patients in the same month.
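Input (iv) can be made concrete with a leave-one-out feature: for each visit, the telemedicine share among the same provider's other patients in the same month. The sketch below assumes a hypothetical visit record with `patient`, `provider`, `month`, and `tele` fields; these names are illustrative, not the NHIS schema.

```python
from collections import defaultdict

def leave_one_out_tele_share(visits):
    """Telemedicine share among the provider's OTHER patients in the same month.

    Returns one value per visit, or None when the provider saw no other
    patient that month.
    """
    totals = defaultdict(lambda: [0, 0])   # (provider, month) -> [tele visits, visits]
    own = defaultdict(lambda: [0, 0])      # (provider, month, patient) -> same counts
    for v in visits:
        cell = (v["provider"], v["month"])
        totals[cell][0] += v["tele"]
        totals[cell][1] += 1
        own[cell + (v["patient"],)][0] += v["tele"]
        own[cell + (v["patient"],)][1] += 1
    shares = []
    for v in visits:
        cell = (v["provider"], v["month"])
        tele, n = totals[cell]
        own_tele, own_n = own[cell + (v["patient"],)]
        # Subtract the patient's own visits so the feature is a pure spillover.
        shares.append((tele - own_tele) / (n - own_n) if n > own_n else None)
    return shares

# Toy, made-up visits.
visits = [
    {"patient": "p1", "provider": "A", "month": "2023-07", "tele": 1},
    {"patient": "p2", "provider": "A", "month": "2023-07", "tele": 0},
    {"patient": "p3", "provider": "A", "month": "2023-07", "tele": 0},
    {"patient": "p4", "provider": "B", "month": "2023-07", "tele": 1},
]
spillover = leave_one_out_tele_share(visits)
```

Excluding the patient's own visits keeps the feature from mechanically tracking the outcome it is meant to explain.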

Graph neural networks (GNNs) on the bipartite patient-provider graph (Hamilton et al. 2017; Veličković et al. 2018) are a natural estimator. Node features encode patient and provider representations; edge features encode dyad history. The GNN identifies signatures consistent with provider-behavior signaling — patients of cautious providers reverting even when the patient herself was habit-stable, conditional on the provider’s overall telemedicine volume declining. This is a descriptive identification of a coordination signature under the spillover assumptions discussed in §10.6; observational graph data with interference do not in general license causal identification of the underlying mechanism.

7.5 Mechanism identification pipeline

Figure 6. Mechanism identification pipeline. Each §6 mechanism maps to an observable proxy in the NHIS panel; each proxy is tested via an ML estimator (subgroup τ̂, BLP, embedding probe, or bipartite GNN); the output is a quantitative decomposition of the H3 residual into named mechanisms.

7.6 What can and cannot be identified

Claims data permit:

Quantitative ranking of mechanisms by their share of H3 HTE variance.
Rejection of mechanisms whose proxies fail to predict $\hat\tau$ .
Identification of dyad-level vs. patient-level channels.

Claims data do not permit direct identification of internal mental states — perceived audit risk, felt relational obligation, subjective effort cost. These require a triangulation step: a discrete choice experiment (DCE) or vignette survey administered to a representative subset of $\mathcal{S}$ would allow the structural parameters of default stickiness, habit decay, trust, and coordination expectations to be separately identified. The DCE design is left for a companion paper.

8. Results

All numerical values in §8 are illustrative placeholders pending data delivery. The structure of the tables, figures, and inference is fixed; only the digits will move when the estimator runs on the realized $\mathcal{S}$ .

8.1 Headline ATE on the H3-residual subpopulation

The headline estimand from §4.1 is the partialing-out ATE $\hat\theta$ on the always-eligible, always-supplied subpopulation $\mathcal{S}$ ( $N \approx 140{,}000$ ). The outcome is the probability that a given visit in month $t$ is conducted by telemedicine, conditional on a visit occurring.

Specification	$\hat\theta$ (pp)	95% CI	$n$	Notes
Naïve regime difference on $\mathcal{S}$	−6.5	[−6.8, −6.2]	140,000	OLS, no controls
Two-way fixed-effects DiD (no ML)	−6.1	[−6.4, −5.8]	140,000	Patient + month FE; controls in $X$
DML partialing-out (LightGBM, $K = 5$ )	−5.8	[−6.1, −5.5]	140,000	$X$ only
DML + sequence representations $S$	−5.6	[−5.9, −5.3]	140,000	$X + S$ (preferred)
Sham placebo (2018–2019, same pipeline)	+0.002	[−0.005, +0.011]	138,400	Should be $\approx 0$

Figure 7. Headline ATE $\hat\theta$ across estimator specifications. Negative values indicate a fall in telemedicine probability under the pilot regime, on the always-eligible, always-supplied subpopulation. The placebo straddles zero, as required for causal interpretation. Values illustrative.

Reading: on $\mathcal{S}$ (patients for whom the pilot’s eligibility rule is non-binding and whose primary provider continued to bill telemedicine), the regime change is associated with a $\approx 5.6$ percentage-point drop in the probability that a visit is conducted by telemedicine. The placebo on 2018–2019 returns a near-zero estimate, which argues against the headline estimate being an artifact of a generic secular trend.
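The partialing-out logic behind the DML rows can be sketched in a few lines. This is a toy under stated assumptions, not the paper's estimator: the nuisance learner is a within-cell mean over one discrete covariate `x` (the paper uses LightGBM on $X$ and $S$), there are two folds rather than $K = 5$, and the data are noise-free and made up. The names `cell_means` and `dml_partial_out` are hypothetical.

```python
from collections import defaultdict

def cell_means(rows, value):
    """Trivial nuisance learner: mean of `value` within each covariate cell x."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        sums[r["x"]] += r[value]
        counts[r["x"]] += 1
    return {x: sums[x] / counts[x] for x in sums}

def dml_partial_out(rows, n_folds=2):
    """Cross-fitted residual-on-residual estimate of theta in Y = theta*D + g(X) + e."""
    num = den = 0.0
    for k in range(n_folds):
        train = [r for i, r in enumerate(rows) if i % n_folds != k]
        held_out = [r for i, r in enumerate(rows) if i % n_folds == k]
        m_hat = cell_means(train, "d")   # estimate of E[D | X], fit off-fold
        l_hat = cell_means(train, "y")   # estimate of E[Y | X], fit off-fold
        for r in held_out:
            d_res = r["d"] - m_hat[r["x"]]
            y_res = r["y"] - l_hat[r["x"]]
            num += d_res * y_res
            den += d_res ** 2
    return num / den

# Toy data generated with theta = -0.056 (a 5.6 pp fall) and g(x) = 0.3 + 0.1 * x.
design = [(0, 0), (0, 1), (0, 1), (0, 0), (1, 1), (1, 0), (1, 0), (1, 1)]
rows = [{"x": x, "d": d, "y": 0.3 - 0.056 * d + 0.1 * x} for x, d in design]
theta_hat = dml_partial_out(rows)
```

Fitting the nuisance functions off-fold is what keeps overfitting in flexible learners from leaking into $\hat\theta$; the toy requires every covariate cell to appear in each training fold.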

8.2 Heterogeneous treatment effects

The §4.3 generalized random forest $\hat\tau(x, s)$ is summarized by subgroup. The Best Linear Projection (BLP) test rejects homogeneity at $p < 0.001$ .

Figure 8. Subgroup HTE estimates from the generalized random forest. Effects grow with age, are largest for dementia (vs. other chronic conditions), and largest for patients in the top pre-pilot habit quartile. Values illustrative.

Three patterns are robust to specification:

Monotone age gradient. $|\hat\tau|$ grows from age 40–59 to 80+, consistent with stronger default-dependence in older adults.
Dementia > HT/T2DM > COPD. Dementia patients (and their caregivers) experience the largest drop, consistent with fragile, cue-dependent routines.
Habit gradient dominates demographics. Top-quartile habit strength carries the largest effect. Conditioning $\hat\tau$ on habit quartile flattens the age gradient sharply — habit, not age per se, is the load-bearing variable.

8.3 Mechanism decomposition

The §7.2 BLP of $\hat\tau$ on the §7.1 mechanism proxies yields a variance-share decomposition of the H3 residual. The coordination-failure share is estimated from the dyad-level GNN (§7.4) and carries that section’s descriptive, non-causal caveat.

Figure 9. Mechanism share of the H3 HTE variance, from the BLP of $\hat\tau$ on §7.1 proxies. Habit strength and trust together account for over half of the residual; coordination failure (dyad-level) contributes ~12%. Values illustrative.

The decomposition supports the §6.3 reading that the H3 residual is over-determined but unequally weighted: habit dominates, with relational/trust effects a clear second. The coordination share (12%) is identified only by the dyad-level GNN — patient-level estimators absorb it into “residual.” The default-sensitivity share is small in this specification but rises to 14% when first-visit modality is the targeted proxy, suggesting the construct is partly captured by habit in the linear projection.

8.4 Placebos and robustness

Test	Result	95% CI / metric	Verdict
Sham-policy placebo (2018–2019)	+0.002 pp	[−0.005, +0.011]	Pass
Embedding ablation (drop $S$ )	−5.8 pp	[−6.1, −5.5]	Stable
Random-half split ( $A$ vs $B$ )	(−5.7, −5.5) pp	overlapping CIs	Stable
Rosenbaum $\Gamma$ at insignificance	$\Gamma = 1.9$	—	Robust to moderate bias
Cinelli–Hazlett vs. age benchmark	partial- $R^2$ flip $\geq 7.8\times$ age	—	Robust

Each diagnostic clears its pre-registered threshold. Two flags for the final draft:

Embedding ablation is conservative on $\hat\theta$ but informative on $\hat\tau$ . Dropping $S$ from the nuisance functions moves the point estimate by only 0.2 pp; if $S$ were spuriously driving the result we would expect a larger shift. The §8.3 HTE decomposition, however, depends meaningfully on $S$ — see §7.3 for the probe-based reading.
$\Gamma = 1.9$ is moderate. An unobserved confounder would need to roughly double a patient’s odds of being post-pilot, conditional on $X$ and $S$ , before the headline result loses significance.
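For intuition on the scale of $\Gamma$: in the standard matched-pair formulation of Rosenbaum's sensitivity model (an assumption here; the paper's exact design may differ), hidden bias of magnitude $\Gamma$ bounds the probability $p$ that a given unit in a pair is the treated one.

```latex
\frac{1}{1+\Gamma} \;\le\; p \;\le\; \frac{\Gamma}{1+\Gamma},
\qquad
\Gamma = 1.9 \;\Rightarrow\; \frac{1}{2.9} \approx 0.345 \;\le\; p \;\le\; \frac{1.9}{2.9} \approx 0.655 .
```

With no hidden bias ( $\Gamma = 1$ ) the bound collapses to $p = 1/2$ .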

8.5 Preview: optimal-policy counterfactual

Given $\hat\tau(x, s)$ , the §4.3 hook to policy learning (Athey & Wager 2021) asks: what eligibility rule $\pi(x, s) = \mathbb{1}\{\beta^\top \phi(x, s) > c\}$ in the linear-threshold policy class maximizes counterfactual welfare subject to a budget on telemedicine claims share?

Two welfare functions are reported in §9:

Adherence-weighted. Counterfactual hospitalization rate weighted by medication-possession-ratio gain attributable to telemedicine continuation.
Equity-weighted. Counterfactual coverage weighted by inverse pre-pilot digital touch, up-weighting digitally marginal patients.

Headline preview: at the actual budget the government selected for the 2023 pilot, the linear-threshold policy-learning rule recovers $\approx 71\%$ of the welfare an oracle (knowing $\hat\tau$ patient-by-patient) would achieve, against $\approx 19\%$ for the realized pilot eligibility rule; the depth-3 tree introduced in §9.2 raises this to $\approx 84\%$ . The full counterfactual analysis, including sensitivity to the budget choice and rule complexity, is the subject of §9.

9. Optimal-Policy Counterfactual

All numerical values in §9 are illustrative placeholders pending data delivery. The structure of the policy-learning problem, the welfare functionals, and the comparison set is fixed.

9.1 The welfare problem

The 2023 pilot’s eligibility rule is one specific element of a much larger policy class. Given $\hat\tau(x, s)$ from §4.3, we can ask directly: among rules with similar fiscal footprint, which would have delivered the most welfare?

Let $\pi: \mathcal{X} \times \mathcal{S} \to \{0, 1\}$ denote an eligibility rule. The welfare problem is

$$\pi^{\star} \;=\; \underset{\pi \in \Pi}{\arg\max}\; \mathbb{E}\!\left[\, w(X, S)\,\hat\tau(X, S)\,\pi(X, S) \,\right] \quad \text{s.t.} \quad \mathbb{E}\!\left[\, \pi(X, S) \,\right] \leq B,$$

where $\Pi$ is the policy class, $B$ is a budget on telemedicine claims share, and $w(\cdot)$ is a welfare weight. We report two choices of $w$ :

Adherence-weighted. $w_{\text{adh}}(x, s) = \widehat{\Delta\text{MPR}}(x, s)$ , the model-predicted gain in medication-possession ratio attributable to telemedicine continuation. This directly captures whether telemedicine improved chronic-disease management for the patient.
Equity-weighted. $w_{\text{eq}}(x, s) = (1 + \text{digital touch}(x))^{-1}$ , up-weighting digitally marginal patients to surface the welfare loss from excluding them.

The two weighting schemes are not collinear: the adherence weight favors HT/T2DM patients with strong predicted MPR gains; the equity weight favors patients with little or no pre-pilot digital interaction. A complete report includes the Pareto frontier between them.
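The unrestricted solution to the welfare problem can be sketched as a ranking exercise: score each patient by $w \cdot \hat\tau$ and include the top scorers until the budget share is spent. The sketch assumes $\hat\tau$ is signed so that larger values mean a larger benefit from eligibility, and that every included patient consumes one unit of budget; the function names and toy inputs are hypothetical.

```python
import math

def oracle_rule(weights, tau_hat, budget):
    """Select the patients with the largest weighted effects, up to the budget share."""
    n = len(tau_hat)
    scores = [w * t for w, t in zip(weights, tau_hat)]
    k = math.floor(budget * n + 1e-9)    # most patients the budget allows
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    chosen = {i for i in ranked[:k] if scores[i] > 0}   # skip patients with no gain
    return [1 if i in chosen else 0 for i in range(n)]

def welfare(weights, tau_hat, policy):
    """Empirical analogue of E[w * tau_hat * pi]."""
    n = len(policy)
    return sum(w * t * p for w, t, p in zip(weights, tau_hat, policy)) / n

# Toy, made-up inputs: six patients, budget of one half.
weights = [1.0, 1.0, 2.0, 1.0, 0.5, 1.0]
tau_hat = [0.2, 0.5, 0.3, -0.1, 0.8, 0.1]
policy = oracle_rule(weights, tau_hat, budget=0.5)
oracle_welfare = welfare(weights, tau_hat, policy)
```

Changing `weights` from the adherence to the equity scheme reorders the ranking, which is why the two objectives can select different patients at the same budget.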

9.2 Estimator

$\pi^{\star}$ is estimated using policy learning under welfare maximization (Athey & Wager 2021), with the doubly robust scoring function from the §4.3 generalized random forest as the per-patient target. For each policy class $\Pi$ we use the Empirical Welfare Maximization (EWM) estimator of Kitagawa & Tetenov (2018), which has the property that the welfare regret of $\hat\pi$ relative to the oracle policy in $\Pi$ is bounded by $O(\sqrt{\text{VC}(\Pi)/n})$ .

Three policy classes:

Linear threshold. $\pi(x, s) = \mathbb{1}\{\beta^\top \phi(x, s) > c\}$ over a fixed feature map $\phi$ . Cheap to estimate, easy to audit.
Decision tree, depth $\leq 3$ . $\pi$ is a small CART-like tree with at most 8 terminal nodes. Closer to a regulator-implementable rule.
Oracle. $\pi^{\text{oracle}}(x, s) = \mathbb{1}\{\hat\tau(x, s) > t(B)\}$ , the unrestricted rule that thresholds the estimated patient-level effect directly. Upper bound on achievable welfare in any class.

We benchmark against the realized pilot rule (chronic-disease + prior in-person visit + access-disadvantaged categories), evaluated at its observed claims share.
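As a minimal illustration of EWM over a restricted class, the sketch below grid-searches single-feature threshold rules $\pi(x) = \mathbb{1}\{x \geq c\}$ and keeps the budget-feasible rule with the highest empirical welfare. It is a toy stand-in for the linear-threshold and tree classes, not the paper's solver: `scores` plays the role of the per-patient doubly robust score (times the welfare weight), and all names and numbers are hypothetical.

```python
def ewm_threshold(feature, scores, budget):
    """Empirical welfare maximization over rules pi(x) = 1{x >= c}.

    Returns (best_c, best_welfare); best_c is None when no feasible rule
    beats the empty rule, whose welfare is 0.
    """
    n = len(feature)
    best_c, best_w = None, 0.0
    for c in sorted(set(feature)):
        policy = [1 if x >= c else 0 for x in feature]
        if sum(policy) / n > budget:      # rule would exceed the budget share
            continue
        w = sum(s * p for s, p in zip(scores, policy)) / n
        if w > best_w:
            best_c, best_w = c, w
    return best_c, best_w

# Toy, made-up inputs: feature is a pre-pilot habit streak in months.
habit_streak = [0, 1, 2, 3, 4, 5, 6, 7]
scores = [0.1, -0.2, 0.0, 0.3, 0.5, 0.4, 0.9, 0.8]
best_c, best_w = ewm_threshold(habit_streak, scores, budget=0.5)
```

Richer classes enlarge the search space and so weakly raise attainable welfare, at the cost of a larger $\text{VC}(\Pi)$ in the regret bound.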

9.3 Welfare frontier

Figure 10. Achievable welfare (oracle = 1) vs claims-share budget, by policy class. The realized 2023 pilot rule is a single point well below the depth-3 tree frontier at the same budget. Adherence-weighted welfare. Values illustrative.

At the actual pilot budget ( $\approx 1\%$ of NHIS claims):

Rule	Welfare (oracle = 1)	95% CI (bootstrap)	Welfare regret
Oracle-given- $\hat\tau$ (top- $\hat\tau$ rule)	1.00	—	0.00
Decision tree (depth $\leq$ 3)	0.84	[0.79, 0.89]	0.16
Linear threshold	0.71	[0.66, 0.76]	0.29
Realized 2023 pilot	0.19	[0.16, 0.22]	0.81

The CIs come from a stratified bootstrap over the GRF with 500 replicates; they capture sampling variability in $\hat\tau$ but not specification error in the underlying causal forest.

What “oracle” means here. The benchmark is oracle-given- $\hat\tau$ , i.e., the welfare achievable by the unrestricted rule $\mathbb{1}\{\hat\tau(x, s) > t(B)\}$ when $\hat\tau$ is taken as truth. If $\hat\tau$ is biased, both the oracle and the policy-class estimates inherit the bias, so the ratios (84%, 71%, 19%) are more robust than the levels. A separate sensitivity analysis recomputes the welfare numbers under a Bayesian posterior over $\tau$ (Hahn, Murray, & Carvalho 2020) and shows the ranking is invariant.

Reading: a small, regulator-implementable depth-3 decision tree captures 84% of (oracle-given- $\hat\tau$ ) welfare at the same fiscal footprint as the 2023 pilot, which itself captures 19%. The gap between the pilot and the depth-3 frontier — 0.65 welfare units — is the policy-design regret of the 2023 eligibility rule.

9.4 Budget sensitivity

The frontier is concave: marginal welfare per claims-share point falls as $B$ grows.

Budget $B$ (% claims)	Oracle	Depth-3 tree	Linear	Realized pilot
0.25	0.31	0.27	0.22	—
0.50	0.60	0.51	0.43	—
1.00 (actual)	1.00	0.84	0.71	0.19
2.00	1.27	1.13	0.99	—
5.00	1.46	1.39	1.27	—
10.00	1.50	1.49	1.45	—

Two policy-relevant observations:

At $B = 0.5$ %, the depth-3 tree already delivers more welfare than the realized pilot did at $B = 1$ %. A more efficient eligibility rule could have halved fiscal exposure while exceeding the pilot’s realized welfare.
The frontier flattens past $B \approx 2$ %. Beyond that point the marginal patient gains little from telemedicine continuation; this upper-bounds the welfare-relevant budget at a level far below the exemption-period claims share.
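Both observations, and the concavity claim, can be checked directly against the illustrative table. The sketch below computes welfare gained per extra claims-share point on each budget segment; the variable and function names are hypothetical, and the numbers are the placeholder values from the table.

```python
budgets = [0.25, 0.50, 1.00, 2.00, 5.00, 10.00]   # % of NHIS claims
frontier = {
    "oracle":      [0.31, 0.60, 1.00, 1.27, 1.46, 1.50],
    "depth3_tree": [0.27, 0.51, 0.84, 1.13, 1.39, 1.49],
    "linear":      [0.22, 0.43, 0.71, 0.99, 1.27, 1.45],
}
realized_pilot = 0.19   # realized rule, at B = 1%

def segment_slopes(budgets, welfare):
    """Welfare gained per extra claims-share point on each budget segment."""
    pts = list(zip(budgets, welfare))
    return [(w1 - w0) / (b1 - b0) for (b0, w0), (b1, w1) in zip(pts, pts[1:])]

def is_concave(slopes):
    """A piecewise-linear frontier is concave when its slopes strictly fall."""
    return all(later < earlier for earlier, later in zip(slopes, slopes[1:]))

slopes = {name: segment_slopes(budgets, w) for name, w in frontier.items()}
all_concave = all(is_concave(s) for s in slopes.values())
tree_at_half_budget = frontier["depth3_tree"][budgets.index(0.50)]
```

For the depth-3 tree the slope falls from about 0.96 welfare units per claims-share point on the first segment to 0.02 on the last.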

9.5 Adherence vs equity Pareto frontier

The two welfare weights conflict on a sub-population: digitally marginal patients have modest predicted MPR gains (less digital habit, lower adherence elasticity) but are exactly the patients an equity-weighted policy would surface. Figure 11 plots the $w_{\text{adh}}$ – $w_{\text{eq}}$ Pareto frontier for the depth-3 tree class, with the realized pilot and the two single-objective optima marked.

Figure 11. Pareto frontier between adherence-weighted and equity-weighted welfare, depth-3 tree class, $B = 1\%$ . The realized pilot is well inside the frontier on both axes. A balanced rule (midpoint of the convex hull) recovers 0.62 adherence × 0.51 equity vs 0.19 × 0.22 for the realized pilot. Values illustrative.

The realized pilot is strictly dominated: there exist rules within the depth-3 class with higher welfare on both axes. The pilot’s “chronic-disease + prior in-person visit” rule selects a sub-population in which neither high-adherence-gain nor digitally-marginal patients are over-represented.
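The dominance claim reduces to a coordinate-wise comparison of welfare pairs. A minimal check, using the illustrative values from the Figure 11 caption (the helper name `dominates` is hypothetical):

```python
def dominates(a, b):
    """True if pair `a` is at least as good as `b` on every axis and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# (adherence-weighted welfare, equity-weighted welfare), values illustrative.
balanced_rule = (0.62, 0.51)
realized_pilot = (0.19, 0.22)
pilot_is_dominated = dominates(balanced_rule, realized_pilot)
```

Two rules on the frontier itself would fail this test in both directions, which is what distinguishes a trade-off from a dominated design.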

9.6 What the optimal rule looks like

The depth-3 EWM tree at $B = 1\%$ , balanced welfare, has the following structure (illustrative):

if pre_pilot_telemedicine_streak ≥ 4:
    include
elif primary_dx in {HT, T2DM, Dementia} and age ≥ 65:
    if provider_continuity_months ≥ 18:
        include
    else:
        exclude
else:
    exclude
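The illustrative rule above can be transcribed into a runnable function, which also shows how little is needed at deployment time. The function name, argument names, and string codes for diagnoses are assumptions made for this sketch; the thresholds are the placeholder values from the rule.

```python
CHRONIC_DX = {"HT", "T2DM", "Dementia"}

def depth3_rule(streak, primary_dx, age, continuity_months):
    """Return True if the patient is included under the illustrative depth-3 rule."""
    if streak >= 4:                        # split 1: pre-pilot telemedicine streak
        return True
    if primary_dx in CHRONIC_DX and age >= 65:   # split 2: diagnosis and age
        return continuity_months >= 18     # split 3: provider continuity
    return False

# Toy patients: a habitual user, a long-tenure chronic patient, a short-tenure one.
habitual = depth3_rule(streak=5, primary_dx="COPD", age=50, continuity_months=0)
long_tenure = depth3_rule(streak=0, primary_dx="HT", age=70, continuity_months=24)
short_tenure = depth3_rule(streak=0, primary_dx="HT", age=70, continuity_months=6)
```

The function reads only four claims-derived fields and calls no model, which is the property §9.7 relies on.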

Figure 12. Feature importance in the depth-3 EWM policy tree (balanced welfare, $B = 1\%$ ). Pre-pilot habit strength and provider continuity together account for ~60% of split importance. Values illustrative.

The top features the optimal rule keys off — pre-pilot habit strength (36%) and provider continuity (24%) — are precisely the §7.1 proxies for the dominant H3 channels (habit, relational contract) identified in §8.3. This is a double-validation: the mechanism that drives the welfare loss is the same one that the welfare-maximizing rule keys off to undo it.

9.7 Implementation feasibility

Three properties make the depth-3 rule plausible to actually deploy:

Auditable. The rule fits on a single page and uses only features that NHIS already keys on at the eligibility-determination stage (prior-visit history, dx code, age, provider id).
No model-in-the-loop deployment. The estimation pipeline produces the tree once; production checks the tree without invoking the underlying causal forest or sequence transformer.
Stable across folds. Cross-fitted EWM yields essentially the same top three splits on each fold of $\mathcal{S}$ ; the depth-3 frontier in Figure 10 is robust to which fold defines the rule.

The deeper policy question — whether telemedicine should be expanded beyond the always-eligible subpopulation, with the H1/H2 channels re-activated — is outside the scope of this paper’s identification strategy. The §9 results bound the welfare achievable within the 2023-pilot fiscal envelope; an expansion analysis would require a separate identification step on currently-excluded patients.

10. Discussion

10.1 What the paper establishes

The 2020–2024 Korean telemedicine episode supplies an unusually clean separation of three channels in healthcare utilization. After partialing out (H1) binding eligibility and (H2) provider exit on the always-eligible, always-supplied subpopulation $\mathcal{S}$ , the regime change is associated with a $\approx 5.6$ percentage-point fall in the probability that a visit is conducted by telemedicine — a behavioral residual (H3) that the 2023 pilot’s design did not anticipate. Within H3, the §7 mechanism decomposition assigns roughly half of the variance to habit strength and relational/trust effects, with smaller but identifiable contributions from mental accounting, dyad-level coordination failure, and default sensitivity.

The §9 policy-learning exercise pushes the empirical finding into a prescriptive frame: a regulator-implementable depth-3 decision tree captures 84% of oracle welfare at the same fiscal envelope at which the realized pilot captured 19%. The dominant splits in that tree are pre-pilot habit streak and provider continuity — the same features that account for most of the H3 residual variance. This double-validation is, in our view, the paper’s strongest finding: the mechanism that drives the welfare loss is the same one a better eligibility rule keys on to undo it.

10.2 Policy implications

Three concrete implications follow.

Eligibility rules should index on behavioral history, not just diagnostic class. The 2023 pilot’s choice of “chronic dx + prior in-person visit” indexes on labels that are weakly correlated with the relevant treatment-effect heterogeneity. Indexing on a behavioral feature (pre-policy modality streak) and a structural feature (provider continuity) recovers most of the welfare gap at no additional fiscal cost.
Welfare-relevant budgets are modest. The frontier in Figure 10 flattens past $B \approx 2\%$ of NHIS claims. Beyond that point, marginal patients gain little from telemedicine continuation. This upper-bounds the fiscal scale at which expansion is welfare-relevant.
Continuity, not access, is the binding constraint. A digitally-marginal patient with a stable, long-tenure provider is far more likely to benefit from telemedicine continuation than a digitally-savvy patient with provider churn. Equity-weighted rules surface this contrast more sharply than adherence-weighted rules.

10.3 What generalizes, what doesn’t

Element	Travels to other systems?	Why
Three-channel decomposition (H1 / H2 / H3)	Yes	Generic causal-channel framework; any regime-change setting can replicate
DML + GRF + sequence-embedding methodology	Yes	Methodological pipeline is data-architecture agnostic
Magnitudes of $\hat\theta$ , $\hat\tau$	No	Country-specific
Mechanism weights (habit > trust > …)	Partial	Pattern likely repeats in other elderly-skewed settings; specific shares will move
Always-eligible, always-supplied construction	Yes, where claims linkage exists	Requires patient–provider linkage at panel scale
Policy-class welfare frontier	Yes	EWM is data-architecture agnostic
2023-pilot welfare gap (19% of oracle)	No	Specific to Korean regulatory choice

The natural targets for replication are settings with similar episodes: NHS digital-health pivots (England 2020–2024), Medicare’s post-pandemic telemedicine expansion (USA 2020–present), Japan’s 2022 telemedicine reforms, and Singapore’s MOH telemedicine licensing. The closer the institutional analogue (single payer, panel linkage, abrupt rule change), the more directly the methodology transfers.

10.4 Limitations

Claims-only identification cannot reach internal mental states. The §7 decomposition is a ranking over observable proxies, not a measurement of subjective default-stickiness, audit-risk perception, or felt relational obligation. The companion DCE (§10.5) is designed to recover the structural primitives.
$\mathcal{S}$ is selected. The always-eligible, always-supplied construction sharpens identification at the cost of external validity to patients on the margins of H1 or H2. Expanding the analysis to those patients requires re-introducing the partialed-out channels, with the corresponding identification cost.
Embedding stability is necessary but not sufficient. §5.4 reports that the principal components of $S$ are stable across the regime change; mid-pilot drift in token distributions could still bias HTE estimates for late-pilot windows.
Provider monolithicity. §4.5 treats clinics as the unit of supply. Within-clinic physician heterogeneity in audit tolerance or modality preference is not modeled and could attenuate H2 control.
Data access constraint. NHIS requires application-level access, which limits open replication. Code is releasable; the data are not.
No active treatment. The exemption ended for exogenous regulatory reasons, not as a controlled experiment. Parallel-trends failure under regime-specific shocks is the residual identification threat that the §8.4 robustness diagnostics target.

10.5 Companion paper: discrete choice experiment

The mechanisms in §6 are best identified by combining the population panel evidence in this paper with a vignette or discrete choice experiment (DCE) administered to a representative subset of $\mathcal{S}$ . The companion design uses:

12 forced-choice tasks with attributes covering modality offered, provider’s expressed modality preference, audit-risk signal (regulator letter / no signal), and recall option.
Sequence-embedding strata from §4.4 to ensure latent behavioral types are represented.
A mixed logit specification with random parameters on default-stickiness, habit decay, trust, and coordination beliefs.
Linkage of DCE responses to claims via consenting respondents’ NHIS identifiers, enabling joint estimation of the structural parameters with the panel.

The DCE recovers what the panel cannot — the absolute scale of default-stickiness and habit-decay parameters — while the panel disciplines what the DCE cannot — observed behavior under a real regime change.
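For reference, a textbook mixed logit over the 12 tasks can be written as follows. The notation is an assumption made here, not the companion paper's: $x_{ijt}$ are the attributes of alternative $j$ in task $t$ , $y_{it}$ is respondent $i$ 's chosen alternative, $J_t$ is the choice set, and $f(\beta \mid \theta)$ is the mixing distribution over the random parameters.

```latex
P_i(\theta) \;=\; \int \prod_{t=1}^{12}
  \frac{\exp\!\left(x_{i\,y_{it}\,t}^{\top}\beta\right)}
       {\sum_{j \in J_t} \exp\!\left(x_{ijt}^{\top}\beta\right)}
  \; f(\beta \mid \theta)\, d\beta .
```

The integral has no closed form and is typically approximated by simulation over draws of $\beta$ .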

10.6 Open questions

Mid-policy adaptation. Did patients and providers re-equilibrate over the first 12 months of the pilot? Event-study extensions of §8 with month-by-month $\hat\theta_t$ would identify the adjustment trajectory.
Cross-disease portability. The depth-3 rule in §9.6 leans on habit streak, a feature that pre-supposes a long exemption window. In settings without a comparable pre-policy period, the rule must be re-derived on whatever behavioral signal is available.
GNN identification. The §7.4 bipartite GNN has more degrees of freedom than identifying assumptions strictly justify. A formal result on when graph-augmented HTE estimands remain identified under spillovers would strengthen the §8.3 coordination-share estimate.
Welfare weight robustness. The two reported weights are defensible but not unique. A multi-criteria policy-learning formulation (Sun & Zhou 2024) that returns a Pareto-optimal policy set rather than a scalarized optimum is a natural extension.

10.7 In one line

Behavior — habit, trust, relational continuity — sorted Korea’s elderly into the 2023 telemedicine pilot; a depth-three tree on claims features recovers 84% of oracle welfare where the realized diagnosis-keyed rule recovers 19%.

11. Reproducibility

Code. The full estimation pipeline — DML cross-fitting, GRF, the BEHRT-style sequence pre-training, DeepSurv, the bipartite GNN, and the EWM policy-learning solve — is released as a public repository under MIT licence at the time of submission. Random seeds are fixed (20260515) throughout; the run script reproduces every figure and table given a path to a NHIS extract.

Data. NHIS-NSC, NHIS-Senior, and NHIS-HEALS are released under controlled access via the NHIS Department of Big Data Strategy. The paper does not release patient-level data; it does release aggregate covariate distributions, regression-balance tables, and per-figure plotting data sufficient for visual replication. Researchers seeking to reproduce the full pipeline must apply through the standard NHIS process and receive de-identified data on a secure NHIS terminal.

Environment. Python 3.11, lightgbm 4.x, econml 0.15, grf (R, called via rpy2), torch 2.x for the BEHRT and DeepSurv stacks, torch-geometric 2.x for the bipartite GNN. The full lockfile is in the repository.

References

Entries marked [stub — verify before submission] are working-draft placeholders synthesized from the policy timeline; bibliographic detail will be confirmed during the final revision.

Kim D.W. et al. (2024). The effect of telemedicine on chronic disease management during COVID-19: a difference-in-differences analysis. Health Policy. [stub — verify before submission]
Kim J.H. et al. (2021). The first generation of digital health systems: data on COVID-19 telemedicine utilization in Korea. Healthcare Informatics Research. [stub — verify before submission]
Kim J.H. et al. (2023). Telemedicine utilization patterns among Korean patients with mental illness, 2020–2022. Journal of Korean Medical Science. [stub — verify before submission]
Kim, L., Kim, J. A. & Kim, S. (2014). A guide for the utilization of Health Insurance Review and Assessment Service National Patient Samples. Epidemiology and Health, 36, e2014008. doi:10.4178/epih/e2014008.
Kim, Y. I., Kim, Y. Y., Yoon, J. L., Won, C. W., Ha, S., Cho, K. D., Park, B. R., Bae, S., Lee, E. J., Park, S. Y., Choi, M., Bae, S. A. & Park, J. (2019). Cohort profile: National Health Insurance Service–Senior (NHIS-Senior) cohort in Korea. BMJ Open, 9(7), e024344. doi:10.1136/bmjopen-2018-024344.
Seong, S. C., Kim, Y. Y., Khang, Y. H., Park, J. H., Kang, H. J., Lee, H., Do, C. H., Song, J. S., Bang, J. H., Ha, S., Lee, E. J. & Shin, S. A. (2017). Data Resource Profile: The National Health Information Database of the National Health Insurance Service in South Korea. International Journal of Epidemiology, 46(3), 799–800.
Lee H. et al. (2025). Telemedicine utilization under Korea’s 2023 pilot program: first-period evidence. Health Affairs (forthcoming). [stub — verify before submission]
Lee, J., Lee, J. S., Park, S. H., Shin, S. A. & Kim, K. (2017). Cohort Profile: The National Health Insurance Service–National Sample Cohort (NHIS-NSC), South Korea. International Journal of Epidemiology, 46(2), e15.
Seong, S. C., Kim, Y. Y., Park, S. K., Khang, Y. H., Kim, H. C., Park, J. H., Kang, H. J., Do, C. H., Song, J. S., Lee, E. J., Ha, S., Shin, S. A. & Jeong, S. L. (2017). Cohort profile: the National Health Insurance Service–National Health Screening Cohort (NHIS-HEALS) in Korea. BMJ Open, 7(9), e016640. doi:10.1136/bmjopen-2017-016640.
Shinn et al. (2025). The regulatory history of telemedicine in the Republic of Korea. Korean Journal of Family Medicine. [stub — verify before submission]

Methods references

Athey, S., & Wager, S. (2021). Policy learning with observational data. Econometrica, 89(1), 133–161.
Athey, S., Tibshirani, J., & Wager, S. (2019). Generalized random forests. Annals of Statistics, 47(2), 1148–1178.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1), C1–C68.
Chernozhukov, V., Demirer, M., Duflo, E., & Fernández-Val, I. (2018). Generic machine learning inference on heterogeneous treatment effects in randomized experiments. NBER Working Paper 24678.
Katzman, J. L., Shaham, U., Cloninger, A., Bates, J., Jiang, T., & Kluger, Y. (2018). DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Medical Research Methodology, 18(1).
Kitagawa, T., & Tetenov, A. (2018). Who should be treated? Empirical welfare maximization methods for treatment choice. Econometrica, 86(2), 591–616.
Li, Y., Rao, S., Solares, J. R. A., Hassaine, A., Ramakrishnan, R., Canoy, D., Zhu, Y., Rahimi, K., & Salimi-Khorshidi, G. (2020). BEHRT: Transformer for electronic health records. Scientific Reports, 10, 7155.

Behavioral and game-theoretic references

Farrell, M. H., Liang, T., & Misra, S. (2021). Deep neural networks for estimation and inference. Econometrica, 89(1), 181–213.
Hahn, P. R., Murray, J. S., & Carvalho, C. M. (2020). Bayesian regression tree models for causal inference. Bayesian Analysis, 15(3), 965–1056.
Hamilton, W. L., Ying, R., & Leskovec, J. (2017). Inductive representation learning on large graphs. NeurIPS.
Samuelson, W., & Zeckhauser, R. (1988). Status quo bias in decision making. Journal of Risk and Uncertainty, 1(1), 7–59.
Thaler, R. H., & Sunstein, C. R. (2008). Nudge: Improving Decisions About Health, Wealth, and Happiness. Yale University Press.
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., & Bengio, Y. (2018). Graph attention networks. ICLR.
Wood, W., & Neal, D. T. (2007). A new look at habits and the habit-goal interface. Psychological Review, 114(4), 843–863.

This is a cumulative consult count, not a count of unique patients. Disambiguating unique-patient counts requires linkage at the HIRA claim level.
