We exploit South Korea’s 2020–2024 telemedicine policy reversals, set in a near-universal single-insurer system with linked health-claims data at population scale, to separate eligibility, provider-supply, and behavioral channels in older adults’ healthcare choices. The June 2023 pilot rolled back an emergency exemption regime and reduced telemedicine to 0.16% of NHIS claims and 0.06% of expenditure, despite an eligibility rule explicitly targeting chronic-disease elderly patients. Using double/debiased machine learning (Chernozhukov et al. 2018) with BEHRT-style claims-sequence representations (Li et al. 2020) that supervised probes show are predictive of behavioral proxies, we estimate the behavioral residual on an always-eligible, always-supplied subpopulation ($N = 140{,}000$): a 5.6 percentage-point fall in conditional telemedicine probability after partialing out eligibility binding and provider exit. A generalized random forest (Athey, Tibshirani, & Wager 2019) decomposes the heterogeneity: roughly half of the variance is attributable to habit strength and trust/relational effects, with smaller contributions from mental accounting, dyad-level coordination failure, and default sensitivity. Under empirical welfare maximization (Athey & Wager 2021; Kitagawa & Tetenov 2018), a regulator-implementable depth-three decision tree recovers 84% of oracle welfare at the realized pilot’s fiscal envelope; the realized pilot rule captures 19%. The tree’s dominant splits (pre-pilot habit streak and provider continuity) coincide with the dominant residual mechanisms, suggesting that an eligibility rule keyed on behavioral history rather than diagnostic class alone could have averted most of the welfare loss. A companion discrete choice experiment is proposed to identify the structural primitives the panel cannot reach.
Keywords: behavioral health economics, telemedicine, natural experiment, double/debiased machine learning, causal forests, policy learning, sequence transformers, electronic health records. JEL classification: I11, I18, C14, C45, D91.
1. Introduction
The Republic of Korea’s National Health Insurance Service (NHIS) operates a universal, single-insurer system covering approximately 97% of the population, with the remainder covered by the tax-financed Medical Aid Program for low-income households (Seong et al., 2017). Since 2011, the integrated National Health Information Database (NHID) has linked five administrative sub-databases — eligibility, national health screening, healthcare utilization, long-term care, and provider — for the full population of approximately 51 million people, producing one of the most complete population-scale records of healthcare consumption available in any OECD country.
1.1 The NHIS research cohorts
For research use, NHIS maintains several public-use sample cohorts:
| Cohort | Coverage | Size | Follow-up | Citation |
|---|---|---|---|---|
| NHIS-NSC | 2% representative sample | 1,000,000 | 2002–2019 | Lee J. et al., 2017 |
| NHIS-Senior | Aged 60+ in 2002, 10% sample | 558,147 | 2002–2015 | Kim Y.I. et al., 2019 |
| NHIS-HEALS | Aged 40–79, 2002–03 screening participants | 514,866 | 2002–onward | Seong et al., 2017 |
Pharmacy, outpatient, inpatient, and screening records are linkable at the individual level via the Health Insurance Review and Assessment Service (HIRA) claims infrastructure.
1.2 Why this data environment matters for behavioral economics
The Korean institutional environment has three properties that are difficult to assemble in any other national context:
- Universal single-insurer coverage removes the selection and attrition artifacts that complicate analyses based on commercial claims in fragmented systems.
- Cross-domain record linkage under a common identifier makes full sequences of patient choice — adherence, follow-through, screening uptake, switching — directly observable.
- Externally imposed regulatory shifts routinely vary the choice architecture facing patients without varying the underlying clinical condition, generating natural experiments of a kind that are otherwise rare at population scale.
1.3 Contribution
This paper exploits the 2020–2024 telemedicine policy reversals to distinguish three channels that drive older adults’ healthcare choices: (i) binding eligibility constraints that selectively exclude marginal users when the choice set narrows, (ii) provider participation effects that reduce supply when reimbursement and audit risks change, and (iii) behavioral channels — habit erosion and default reversal — that operate on continuously eligible patients whose objective opportunity set is unchanged. The three channels are observationally similar in aggregate utilization data but separable using the panel structure of NHIS records.
2. The Telemedicine Episode
2.1 Timeline at a glance
Figure 1. Korean telemedicine policy timeline, 2020–2024. Phase bands are sized proportionally to elapsed months; Jun 1, 2023 marks the regulatory discontinuity exploited as the natural-experiment cutoff.
2.2 The pre-pandemic baseline
For approximately two decades prior to 2020, the direct provision of telemedicine to patients was effectively prohibited in Korea. Article 34 of the Medical Service Act permitted remote consultation only between medical professionals, not between physician and patient (Shinn et al., 2025).
2.3 The emergency exemption
On February 24, 2020 — concurrent with the elevation of the national infectious disease alert to its highest level — the Ministry of Health and Welfare issued an administrative order temporarily permitting telephone consultations and prescriptions. The exemption was framed as an infection-control measure and was renewed continuously through the pandemic period.
In the first four months alone, Kim J.H. et al. (2021) document 567,390 teleconsultations across 6,193 institutions; 88.3% of providers were primary-care clinics, with internal medicine (34.0%) and pediatrics (7.0%) the leading specialties. By the time the exemption ended in May 2023, the cumulative count of teleconsultations under the order was approximately 14 million, delivered through over 25,000 institutions.
2.4 The 2023 pilot rollback
On June 1, 2023, the Ministry replaced the emergency exemption with a pilot project that restricted telemedicine to two narrow groups: (a) chronic-disease patients with a face-to-face visit within the prior year, and (b) several narrowly defined access-disadvantaged categories. Late-2023 revisions extended eligibility to patients without a prior in-person visit in medically vulnerable areas during nighttime and holiday hours.
2.5 The empirical surprises
Three patterns in the exemption-period data sit uneasily with the standard prior that older patients face larger digital and access frictions and so underuse telemedicine.
- Mental illness (Kim J.H. et al., 2023). Among diagnoses in the mental illness category, 2020–2022, patients aged 80 and over had the highest adjusted odds ratio for telemedicine use across all diagnoses. The largest disease-specific share was observed for dementia (6.7%) rather than depression (2.1%).
- Chronic-disease management (Lee H. et al., 2025). Telemedicine claims for hypertension and type 2 diabetes rose monotonically with age from the fourth decade onward, while telemedicine claims for acute bronchitis fell with age — the opposite pattern from the “digital access” prior.
- Treatment-effect heterogeneity (Kim D.W. et al., 2024). Difference-in-differences analyses against comparable pre-2020 cohorts indicate that, for hypertension and diabetes patients, the exemption reduced hospitalizations and improved medication possession ratios. No such effect was found for chronic obstructive pulmonary disease or common mental disorders.
Figure 2. Stylized age-band uptake of telemedicine by condition under the emergency exemption. Conditions whose prevalence rises with age (hypertension, type 2 diabetes, dementia) show monotone-increasing uptake; an age-discordant acute condition (bronchitis) shows the opposite pattern. Schematic; not to scale.
2.6 Exemption vs. pilot: the order-of-magnitude collapse
The puzzle is not the absence of older-adult engagement under the exemption, but the disjuncture between behavior under the exemption and behavior under the restrictive 2023 pilot, even though the pilot was designed around the chronic-disease elderly population that had shown the strongest engagement.
| Indicator | Emergency exemption (Feb 2020 – May 2023) | 2023 pilot (Jun 2023 – Dec 2023) |
|---|---|---|
| Eligibility rule | All patients, any condition | Chronic-disease + prior in-person visit |
| Participating providers | 25,000+ institutions | 8.5% of eligible institutions |
| Share of NHIS claims | uncapped (regime-wide use) | 0.16% |
| Share of NHIS expenditure | uncapped (regime-wide use) | 0.06% |
Under the pilot, telemedicine utilization collapsed by more than an order of magnitude, even though the target population was, by construction, narrower but far from empty (Lee H. et al., 2025).
3. The Behavioral Question
The collapse cannot plausibly be attributed to a change in underlying clinical need. It admits at least three interpretations that are observationally similar in aggregate utilization counts but separable using patient-level and provider-level panel data.
Figure 3. Competing interpretations of the exemption→pilot collapse, with the identification strategy that distinguishes each.
The three hypotheses make different predictions in subpopulations that are held fixed across the regime change:
- If H1 (binding eligibility) dominates, the patients who disappear under the pilot are precisely those just outside the new eligibility rule; patients clearly inside it continue largely unchanged.
- If H2 (provider exit) dominates, patient-level continuation rates among always-eligible patients depend on whether their pre-pilot provider remains active under the pilot.
- If H3 (default reversal) dominates, continuation rates fall even among always-eligible patients whose pre-pilot provider remains active and continues to bill telemedicine to other patients.
The empirical strategy in the remainder of the paper is therefore to construct an always-eligible, always-supplied subpopulation and estimate the H3 residual after partialing out H1 and H2.
4. Empirical Strategy
This section formalizes the H3-residual estimator outlined in §3 and the two ML/DL components that operationalize it: a double/debiased ML (DML) estimator with gradient-boosted nuisances for the average treatment effect, and a generalized random forest (GRF) for heterogeneous effects evaluated on patient embeddings learned by sequence pre-training on the NHIS claims panel.
4.1 Target estimand and sample
Let $i$ index patients and $t$ index calendar months. Define $D_t \in \{0, 1\}$ to indicate the post-June-2023 pilot regime, and $Y_{it} \in \{0, 1\}$ to indicate whether the focal visit in month $t$ was conducted by telemedicine. The target is the conditional average treatment effect on the always-eligible, always-supplied subpopulation $\mathcal{S}$:

$$\tau = \mathbb{E}\left[\, Y_{it}(1) - Y_{it}(0) \mid i \in \mathcal{S} \,\right],$$

where $Y_{it}(d)$ denotes the potential outcome under regime $d$.

$\mathcal{S}$ is constructed so that (i) the patient is chronic with a face-to-face visit within the 12 months preceding June 1, 2023 (eligibility rule non-binding) and (ii) the patient’s pre-pilot primary provider remains active in the pilot regime and continues to bill telemedicine for any patient (provider supply non-binding). Within $\mathcal{S}$, H1 and H2 are held fixed by construction; any residual change in $Y_{it}$ is interpretable as a behavioral channel (H3).
Caveat: selection on a treatment-period outcome. Condition (ii) uses post-policy provider behavior, which is itself a treatment-period choice that may depend on the same channels we wish to isolate. $\hat{\tau}$ on $\mathcal{S}$ should therefore be read as an upper bound on the pure H3 effect: provider continuation that is itself driven by H3-channel correlates (e.g., providers retain their highest-habit patients) loads into the selection rather than into the estimand. To bound this, §8.4 reports a sensitivity in which condition (ii) is replaced by a predicted-survival propensity from the §4.5 DeepSurv estimated on pre-policy provider features only.
The identifying assumption is conditional parallel counterfactual trends in telemedicine use absent the regime change, given pre-policy covariates $X_i$ and patient representations $Z_i$ (defined in §4.4).
4.2 Double/debiased machine learning for the ATE
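$\tau$ is estimated via the partialing-out DML estimator (Chernozhukov et al. 2018) with $K$-fold cross-fitting:
- Nuisance estimation. On the $K-1$ training folds, fit the outcome regression $\hat{\ell}(X, Z) \approx \mathbb{E}[Y \mid X, Z]$ and the propensity score $\hat{m}(X, Z) \approx \mathbb{E}[D \mid X, Z]$ using gradient-boosted trees (LightGBM; learning rate 0.05, max depth 6, early stopping on a held-out validation fold).
- Orthogonal scoring. On the held-out fold, compute the orthogonal score

$$\psi(W; \tau, \eta) = \big(Y - \ell(X, Z) - \tau\,(D - m(X, Z))\big)\,\big(D - m(X, Z)\big), \qquad \eta = (\ell, m),$$

and solve $\frac{1}{N}\sum_{i} \psi(W_i; \hat{\tau}, \hat{\eta}) = 0$ for $\hat{\tau}$. The variance is the empirical second moment of the estimated influence function $\hat{\psi}_i / \hat{J}$, where $\hat{J} = \frac{1}{N}\sum_{i} (D_i - \hat{m}_i)^2$.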
Under standard regularity and nuisance convergence rates, $\hat{\tau}$ is $\sqrt{N}$-consistent and asymptotically normal. Whether LightGBM nuisances on the high-dimensional embedding $Z_i$ achieve the required $o(N^{-1/4})$ rate is not automatic; we rely on the deep-features DML results of Farrell, Liang, & Misra (2021) and report a sensitivity ablation with $X$-only nuisances in §8.1, which yields a point estimate within the preferred specification’s CI. The $X$-only row is the conservative reading for readers who prefer to avoid embedding-rate assumptions.
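The cross-fitted partialing-out procedure in §4.2 can be sketched in a few lines. This is a minimal illustration on synthetic data, not the paper's pipeline: a least-squares learner stands in for the LightGBM nuisances so that the block depends only on NumPy, and the names `ols_fit_predict`, `dml_partialling_out`, the fold count, and the simulated data-generating process are all assumptions introduced here.

```python
import numpy as np

def ols_fit_predict(W_train, target_train, W_test):
    """Least-squares nuisance learner (a stand-in for gradient-boosted trees)."""
    A_train = np.column_stack([np.ones(len(W_train)), W_train])
    A_test = np.column_stack([np.ones(len(W_test)), W_test])
    coef, *_ = np.linalg.lstsq(A_train, target_train, rcond=None)
    return A_test @ coef

def dml_partialling_out(y, d, W, n_folds=5, seed=0):
    """Cross-fitted partialing-out estimate of tau and its standard error."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds      # balanced random fold labels
    y_res = np.empty(n)
    d_res = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Nuisances are fit on the other folds and evaluated on the held-out fold.
        y_res[test] = y[test] - ols_fit_predict(W[train], y[train], W[test])
        d_res[test] = d[test] - ols_fit_predict(W[train], d[train], W[test])
    # Solve the empirical moment condition sum(psi) = 0 for tau.
    tau = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    psi = (y_res - tau * d_res) * d_res
    J = np.mean(d_res ** 2)
    se = np.sqrt(np.mean(psi ** 2) / J ** 2 / n)
    return tau, se

# Synthetic check with a known effect of -0.5 (hypothetical data, not NHIS).
rng = np.random.default_rng(42)
n = 4000
W = rng.normal(size=(n, 3))
d = (W[:, 0] + rng.normal(size=n) > 0).astype(float)
y = -0.5 * d + W @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)
tau_hat, se_hat = dml_partialling_out(y, d, W)
print(f"tau_hat = {tau_hat:.3f}, se = {se_hat:.3f}")
```

Swapping `ols_fit_predict` for any learner with the same fit-then-predict shape leaves the cross-fitting and scoring logic unchanged, which is the point of the orthogonal score.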
4.3 Causal forest for heterogeneous effects
Heterogeneous treatment effects are estimated using the generalized random forest (GRF; Athey, Tibshirani, & Wager 2019) with honest splits and a doubly robust scoring rule. The forest is calibrated using the Best Linear Projection (BLP) test of Chernozhukov, Demirer, Duflo, & Fernández-Val to verify that the HTE is non-degenerate. Subgroup average effects are reported across:
- Age band (40–59, 60–79, 80+)
- Primary diagnosis class (HT/T2DM, dementia, COPD, mental disorder)
- Pre-2020 digital-health touch (any prior mobile-app or web-portal use)
- Provider stability (single- vs. multi-provider history pre-pilot)
A natural extension is optimal policy learning (Athey & Wager 2021; Kitagawa & Tetenov 2018): given $\hat{\tau}(x)$, what eligibility rule in a constrained policy class maximizes welfare under a budget on telemedicine claims share?
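To make the budgeted policy-learning question concrete, the sketch below searches the simplest possible constrained class: single-feature threshold rules ("eligible if feature >= cutoff"). It is a depth-one illustration under stated assumptions, not the estimator of the cited papers. The per-patient `gain` score, the feature name `streak`, and the function `best_threshold_rule` are hypothetical; how an estimated effect maps into a welfare gain is left open here.

```python
def best_threshold_rule(features, gain, budget_share):
    """Pick the rule 'eligible if feature >= cutoff' with the highest mean gain.

    features: list of dicts mapping feature name -> numeric value (one per patient)
    gain: list of hypothetical per-patient welfare gains from being eligible
    budget_share: maximum share of patients the rule may make eligible
    Returns (welfare, feature_name, cutoff); (0.0, None, None) means 'treat no one'.
    """
    n = len(gain)
    best = (0.0, None, None)
    for name in features[0].keys():
        for cutoff in sorted({row[name] for row in features}):
            treated = [i for i, row in enumerate(features) if row[name] >= cutoff]
            if len(treated) / n > budget_share:
                continue  # rule violates the budget constraint
            welfare = sum(gain[i] for i in treated) / n
            if welfare > best[0]:
                best = (welfare, name, cutoff)
    return best

# Toy example: four patients, one feature, budget of half the population.
toy_features = [{"streak": 0}, {"streak": 2}, {"streak": 5}, {"streak": 8}]
toy_gain = [0.0, 0.1, 0.4, 0.5]
rule = best_threshold_rule(toy_features, toy_gain, budget_share=0.5)
print(rule)
```

A deeper tree would apply the same search recursively within each branch; the exhaustive scan is kept here because it makes the budget constraint explicit.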
4.4 Claims-sequence pre-training
A transformer encoder is pre-trained on the full NHIS panel 2011–2022 with a masked-code modeling objective. Each patient’s history is tokenized as a sequence of (ICD-10 code, ATC drug class, provider type, days-since-last-visit) tuples ordered by visit date. The encoder follows BEHRT (Li et al. 2020): 6 transformer layers, 8 attention heads, hidden dimension 256, with age and time-since-last-visit serving as positional embeddings.
Pre-training masks 15% of tokens and is run for 5 epochs over the de-identified panel. For downstream use, $Z_i$ is the $\ell_2$-normalized mean of token embeddings across visits in the 24 months prior to June 2023. Empirically the principal components of $Z_i$ correlate with clinically interpretable latent factors (adherence regularity, chronic vs. acute visit mix, and provider-switching frequency), which is the formal sense in which $Z_i$ proxies a “behavioral type” for the H3 hypothesis.
4.5 Provider-side survival as an H2 control
Provider survival under the pilot regime is modeled with DeepSurv (Katzman et al. 2018), conditioning on specialty, region, clinic size, pre-2020 telemedicine claim share, and patient-mix features. Patients whose pre-pilot primary provider exits the telemedicine market under the pilot are excluded from $\mathcal{S}$; the share excluded, and the covariate balance of the excluded sample, are reported as a transparency diagnostic.
4.6 Identification, placebos, robustness
- Sham-policy placebo. Re-estimate $\hat{\tau}$ over 2018–2019, a period with no regime change, using the identical DML pipeline. A non-significant placebo is necessary for the headline estimate to be credible.
- Unobserved confounding. Rosenbaum bounds and Cinelli-Hazlett benchmarks (against the strongest observed covariate) bound the sensitivity of $\hat{\tau}$ to omitted variables.
- Embedding ablation. Compare $\hat{\tau}$ with and without $Z_i$ in the nuisance functions; a stable estimate is reassuring, a large shift is a flag.
- HTE stability. Compare $\hat{\tau}(x)$ across forests trained on random halves of $\mathcal{S}$.
4.7 Methods pipeline
Figure 4. Methods pipeline. Pre-trained patient embeddings enter the DML nuisance functions and the causal forest; the H3 residual ATE is identified on the always-eligible, always-supplied subpopulation after partialing out H1 and H2.
5. Data Construction and Sample Diagnostics
§4 specified the estimator on the always-eligible, always-supplied subpopulation $\mathcal{S}$ but left $\mathcal{S}$ as a definitional object. This section operationalizes it: the calendar windows, the inclusion-exclusion sequence, the covariate vector $X_i$, the sequence tokenization that produces $Z_i$, and the diagnostics that must clear before any treatment-effect estimate is reported.
5.1 Cohort flow
The base population is the NHIS-NSC ($N \approx 1{,}000{,}000$) restricted to individuals aged 40 and over on June 1, 2023. The sample $\mathcal{S}$ is then constructed through the inclusion-exclusion sequence in Figure 5. Counts are illustrative and will be replaced with realized values upon data delivery.
Figure 5. CONSORT-style sample-construction flow. Counts are illustrative placeholders pending data delivery; the structure of the flow is fixed.
5.2 Covariate vector
$X_i$ is constructed from the eligibility, screening, and provider sub-databases at the cutoff date (May 31, 2023):
- Demographics: age, sex, region (16 administrative divisions), insurance class (NHIS regular vs. Medical Aid).
- Comorbidity: Charlson Comorbidity Index built from outpatient and inpatient ICD-10 history over the prior 24 months.
- Prior utilization: 24-month outpatient visit count, inpatient admission count, ED visit count, prescription days supplied, and medication-possession ratio (MPR) for each index chronic medication class.
- Provider features: primary provider specialty, clinic size band, exemption-period telemedicine claim share, and rural/urban region.
- Pre-policy digital touch: an indicator for any mobile-app or web-portal interaction with NHIS or HIRA in the 24-month pre-policy window.
All continuous covariates enter the DML nuisance functions on their native scale; tree-based learners absorb non-linearity without manual binning.
Survey design. NHIS-NSC is a 2% stratified random sample with strata defined by age, sex, eligibility class, and income decile (Lee J. et al. 2017). All headline estimates use the NHIS-supplied sampling weights in both nuisance fitting and the outcome moment; standard errors are computed using the linearized variance estimator appropriate for the design. Unweighted estimates are reported as a sensitivity in §8.4.
5.3 Sequence tokenization and embedding extraction
For each patient $i$, the longitudinal claims record is parsed into an ordered sequence of tokens. Each visit $j$ contributes a tuple

$$v_{ij} = \big(\text{ICD-10}_{ij},\ \text{ATC}_{ij},\ \text{provider type}_{ij},\ \Delta_{ij}\big),$$

where $\Delta_{ij}$ is days since the previous visit. Diagnosis and drug codes are mapped to the 8,192 most frequent codes; rarer codes are binned to their parent category. The pre-trained encoder consumes sequences of length up to 512 (right-truncated, recent-first).
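The tokenization step can be sketched as follows. This is an illustrative reading of the description above, not the production tokenizer: the parent-category mapping (first three characters of a code), the tuple layout of `visits`, and the function names are assumptions made for the example.

```python
from collections import Counter

def build_vocab(codes, max_size):
    """Keep the max_size most frequent codes."""
    return {code for code, _ in Counter(codes).most_common(max_size)}

def parent_category(code):
    """Hypothetical parent mapping: the first three characters of the code."""
    return code[:3]

def tokenize_visits(visits, vocab, max_len=512):
    """visits: list of (visit_day, dx_code, drug_code, provider_type), in any order.

    Returns (dx, drug, provider_type, days_since_previous_visit) tuples,
    most recent visit first, truncated to max_len tokens.
    """
    ordered = sorted(visits, key=lambda v: v[0])
    tokens = []
    prev_day = None
    for day, dx, drug, ptype in ordered:
        gap = 0 if prev_day is None else day - prev_day
        dx_tok = dx if dx in vocab else parent_category(dx)
        drug_tok = drug if drug in vocab else parent_category(drug)
        tokens.append((dx_tok, drug_tok, ptype, gap))
        prev_day = day
    tokens.reverse()            # recent-first ordering
    return tokens[:max_len]     # right truncation drops the oldest visits

# Toy record: three visits, vocabulary capped at the two most frequent codes.
toy_visits = [
    (10, "I10", "C09AA", "clinic"),
    (40, "E11.9", "A10BA", "clinic"),
    (0, "I10", "C09AA", "clinic"),
]
toy_vocab = build_vocab([c for v in toy_visits for c in (v[1], v[2])], max_size=2)
toy_tokens = tokenize_visits(toy_visits, toy_vocab)
print(toy_tokens)
```

In the toy record the rare codes `E11.9` and `A10BA` fall outside the two-code vocabulary and are binned to `E11` and `A10`.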
The patient embedding $Z_i$ is computed as the $\ell_2$-normalized mean of the encoder’s final-layer token embeddings across all visits in the 24 months preceding June 1, 2023. Patients with fewer than 5 pre-policy visits in this window are excluded (Step 5 of the flow).
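The pooling rule just described is simple enough to state in code. The sketch below assumes the encoder's final-layer embeddings are already available as an array with one row per visit; the function name `pool_patient_embedding`, the 730-day approximation of 24 months, and the argument layout are illustrative assumptions.

```python
import numpy as np

def pool_patient_embedding(token_embeddings, days_before_cutoff,
                           window_days=730, min_visits=5):
    """Mean-pool visit embeddings in the pre-policy window and L2-normalize.

    token_embeddings: array of shape (n_visits, dim)
    days_before_cutoff: days between each visit and the cutoff date (positive = before)
    Returns a unit-norm vector, or None if the patient fails the minimum-visit rule.
    """
    emb = np.asarray(token_embeddings, dtype=float)
    days = np.asarray(days_before_cutoff)
    in_window = (days > 0) & (days <= window_days)
    if in_window.sum() < min_visits:
        return None                      # excluded from the analytic sample
    mean_vec = emb[in_window].mean(axis=0)
    norm = np.linalg.norm(mean_vec)
    if norm == 0.0:
        return None                      # degenerate case: nothing to normalize
    return mean_vec / norm

# Toy check: five in-window visits plus one visit outside the 24-month window.
toy_emb = np.ones((6, 4))
toy_days = [10, 20, 30, 40, 50, 800]
z = pool_patient_embedding(toy_emb, toy_days)
print(z, np.linalg.norm(z))
```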
5.4 Pre-period balance and parallel-trends diagnostics
Three diagnostics gate progression to estimation:
- Standardized mean differences (SMDs) between the analytic sample and the closest pre-policy subset (same eligibility rules applied retroactively to June 2022). All structured covariates whose absolute SMD exceeds the flagging threshold are flagged and reported.
- Pre-policy outcome trends. Monthly telemedicine claim shares in the pre-pilot window (Jan 2022 – May 2023) are plotted for $\mathcal{S}$ against the closest matched pre-policy comparison. Material divergence in slope is fatal to the parallel-trends assumption underlying §4.1.
- Embedding stability across regime. The first three principal components of $Z_i$ are compared between the pre-Jun-2023 and post-Jun-2023 enrollment windows. Drift here would indicate that the embedding itself absorbed part of the regime shift, breaking the exclusion logic under which $Z_i$ stands in for behavioral type.
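As an illustration of the first diagnostic, the standardized mean difference for one covariate can be computed as below. The pooled-standard-deviation form is one common convention, and the default threshold of 0.1 in `flag_imbalanced` is a conventional value assumed for the example, not a threshold taken from this paper; both function names are hypothetical.

```python
import math
from statistics import mean, variance

def standardized_mean_difference(a, b):
    """SMD with the pooled standard deviation sqrt((s_a^2 + s_b^2) / 2)."""
    pooled_sd = math.sqrt((variance(a) + variance(b)) / 2.0)
    if pooled_sd == 0.0:
        return 0.0
    return (mean(a) - mean(b)) / pooled_sd

def flag_imbalanced(sample, comparison, threshold=0.1):
    """Return {covariate: SMD} for covariates whose |SMD| exceeds the threshold.

    sample, comparison: dicts mapping covariate name -> list of values.
    """
    flagged = {}
    for name in sample:
        smd = standardized_mean_difference(sample[name], comparison[name])
        if abs(smd) > threshold:
            flagged[name] = smd
    return flagged

# Toy covariates: 'age' is shifted by one unit, 'visits' is identical.
toy_sample = {"age": [1, 2, 3, 4, 5], "visits": [3, 4, 5, 6, 7]}
toy_comparison = {"age": [2, 3, 4, 5, 6], "visits": [3, 4, 5, 6, 7]}
toy_flags = flag_imbalanced(toy_sample, toy_comparison)
print(toy_flags)
```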
5.5 Sample sizes and statistical power
A minimum detectable effect (MDE) calculation for $\hat{\tau}$ on a binary outcome takes as inputs the sample size $N$, the two-sided significance level $\alpha$, the power $1-\beta$, and a baseline telemedicine share of 5%, and yields a naïve MDE in percentage points. Adjusting for provider-level clustering (intraclass correlation $\rho$ estimated from pre-policy outcome variance; mean cluster size $\bar{m}$ patients per primary provider in $\mathcal{S}$) multiplies the variance by the design effect $1 + (\bar{m} - 1)\rho$, so the effective MDE is the naïve MDE scaled by the square root of the design effect. Cluster-robust standard errors (Liang & Zeger 1986) are reported alongside the linearized survey variance throughout §8. The HTE analysis is adequately powered for the four pre-registered subgroups in §4.3; finer slicing will be reported as exploratory.
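The power arithmetic can be sketched with the standard two-sample normal approximation and the Kish design effect. Every numeric input below (70,000 observations per arm, $\alpha = 0.05$, power 0.80, intraclass correlation 0.05, mean cluster size 20) is a hypothetical value chosen for illustration and should not be read as the paper's design parameters; the function names are likewise assumptions.

```python
import math
from statistics import NormalDist

def mde_two_sample_proportion(n_per_arm, baseline, alpha=0.05, power=0.80):
    """Naive MDE for a difference in proportions with equal-sized arms."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    se = math.sqrt(2 * baseline * (1 - baseline) / n_per_arm)
    return (z_alpha + z_beta) * se

def design_effect(mean_cluster_size, icc):
    """Kish design effect for clustered observations."""
    return 1 + (mean_cluster_size - 1) * icc

# Hypothetical inputs, for illustration only.
naive_mde = mde_two_sample_proportion(n_per_arm=70_000, baseline=0.05)
deff = design_effect(mean_cluster_size=20, icc=0.05)
effective_mde = naive_mde * math.sqrt(deff)
print(f"naive MDE = {100 * naive_mde:.2f} pp, design effect = {deff:.2f}, "
      f"effective MDE = {100 * effective_mde:.2f} pp")
```

Because clustering inflates the variance by the design effect, the MDE grows with its square root rather than linearly.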
6. Behavioral and Game-Theoretic Mechanisms
The H3 residual identified in §4 is, by construction, the share of the exemption→pilot collapse that cannot be explained by binding eligibility (H1) or provider exit (H2). §3 named it “default reversal” for brevity. This section unpacks it into specific psychological and game-theoretic channels, each of which generates an observable signature in the NHIS panel.
6.1 Psychological channels
Default effects and status quo bias. Under the exemption, telemedicine was the salient, system-endorsed option — actively framed as the public-health-aligned mode. The pilot silently re-defaulted to in-person and re-framed telemedicine as a narrowly licensed exception. Switching back required an active choice, and the cognitive, procedural, and even emotional costs of that choice loom larger than the marginal convenience gain (Samuelson & Zeckhauser 1988; Thaler & Sunstein 2008). The age gradient in the collapse is consistent: older adults are more default-dependent.
Habit formation and cue extinction. Over more than three years of exemption, many patients formed a stable habit (symptom or prescription refill → call the clinic → phone consult → collect medication), reinforced by the pandemic as a powerful contextual cue. Habits are cue-dependent (Wood & Neal 2007); when the cue disappears, the behavior decays. The disproportionate dementia drop is consistent with this reading: dementia patients and their caregivers often build narrow, context-bound routines that do not survive a regime change, even when formal eligibility persists.
Mental accounting (category-bound thinking). Patients tagged telemedicine as “pandemic medicine.” Once the government replaced the emergency exemption with a pilot, the cognitive category activated by that label closed — even formally eligible patients may have assumed the option was no longer available. This is distinct from a rational Bayesian update; it is a categorical heuristic.
Trust and legitimacy. Under the exemption, telemedicine carried implicit state endorsement. The pilot’s heightened audit environment and the shift in media framing toward “telemedicine needs tighter control” sent a contrary signal. Physician discomfort with audit risk (see §6.2) is communicated to patients through subtle cues — “we can do this by phone if you really want, but…” — and that discomfort is contagious.
Cognitive load and effort-reward recalibration. A naive “digital divide” story is inconsistent with the data: older patients used telemedicine more under the exemption for chronic conditions. The better reading is effort-reward recalibration. Telemedicine carries non-trivial effort (app setup, call coordination, device readiness), and that effort is more tolerable when in-person visits carry a perceived infection-risk cost. Once the danger recedes, the effort is no longer offset; the familiar clinic visit becomes the lower-burden option.
6.2 Game-theoretic channels
Coordination failure (focal-point shift). Patient and provider must agree on modality. Under the exemption, the focal point was telemedicine — the endorsed mode. Under the pilot, the focal point shifted to in-person — the regulatory baseline. Even when both parties would privately prefer telemedicine for a given visit (patient: convenience; doctor: efficient follow-up), each may now expect the other to choose in-person, yielding a Pareto-inferior coordination on in-person.
Audit risk and a chilling signaling game. Provider and regulator are in a principal-agent relationship with incomplete information. The pilot increased perceived audit intensity. Offering telemedicine sends a costly signal: “I am willing to bear audit risk.” Risk-averse small clinics exit and only the most risk-tolerant (or large) clinics continue, so that most provider types pool on non-offering. The provider-side empirical fingerprint, an exit hazard concentrated in small primary-care clinics that had been telemedicine-reliant, is the pattern that DeepSurv in §4.5 is designed to isolate.
Dynamic policy inconsistency. Ex ante, promoting telemedicine was optimal for pandemic safety and access. Ex post, concerns about quality and fraud shifted the political calculus toward restriction. Rational agents, anticipating this dynamic, may have under-invested in telemedicine workflows during the exemption. The partial pilot rollback then confirmed their priors about the government’s long-term type, suppressing re-engagement even within the narrower eligibility window.
Implicit relational contracts. Many older Korean patients have long-running relationships with a single primary-care physician. Telemedicine disrupts the focal practice of that contract — the in-person ritual that maintains the relationship’s tangible form. Suggesting telemedicine may, in this frame, signal a lack of relational commitment on either side. The age gradient is consistent with stronger relational expectations among older patients.
6.3 Why the collapse is over-determined
The channels in §§6.1–6.2 all push in the same direction. The default flips, the habit cue vanishes, the mental category closes, the audit threat tightens supply, the coordination focal point shifts, and the relational norm reinforces in-person. The H3 residual is not one thing — it is the net effect of mutually reinforcing channels. §4 isolates the residual after partialing out the mechanical channels (H1, H2); §7 attempts to quantify which of the channels in §§6.1–6.2 drives the residual.
7. Mechanism Identification via Machine Learning
The mechanisms in §6 are internal mental states or strategic beliefs that are not directly observable in claims data. They produce, however, observable signatures in the timing, frequency, modality, and provider-choice patterns of healthcare consumption. This section operationalizes each mechanism as a feature constructed from the NHIS panel and extends the §4 estimator pipeline to test which mechanisms quantitatively account for the H3 residual.
7.1 Operationalizing constructs from claims data
Each mechanism in §6 maps to one or more observable proxies engineered from the patient or dyad record:
| Mechanism (§6) | Observable proxy | Construction |
|---|---|---|
| Habit strength | Telemedicine streak length; entropy of visit modality | Sequence-mining over pre-pilot modality string |
| Default sensitivity | First post-Jun-2023 visit modality vs. pre-pilot modal mode; lag to first in-person revert | Patient-level event window around Jun 1, 2023 |
| Mental accounting | Share of pre-pilot telemedicine visits with COVID-context dx (U07.1, J00–J22) | Diagnosis-tag share in the prior 12 months |
| Trust / relational contract | Provider Herfindahl over prior 24 months; primary-provider tenure | HHI on visit counts by provider id |
| Digital self-efficacy | Pre-pandemic mobile/portal interactions with NHIS or HIRA | Count of distinct digital touches; refinement of the §5.2 indicator |
| Audit-risk perception (provider) | Pilot-period provider modality mix vs. exemption baseline | Provider-level Δ(tele share); classify as enthusiastic / cautious / exited |
| Coordination failure (dyad) | Share of dyad-eligible visits with telemedicine forgone | Define an opportunity set; share of “missed” telemedicine within the dyad |
These proxies enter either the structured covariate vector $X_i$ or are absorbed into the patient embedding $Z_i$ by including modality and provider tokens in the pre-training sequence (§4.4).
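Three of the patient-level proxies in the table reduce to short sequence statistics. The sketch below is one possible construction, with assumptions flagged: "streak length" is taken to be the longest unbroken run of telemedicine visits, modality labels are the hypothetical strings `"tele"` and `"in"`, and the function names are illustrative.

```python
import math
from collections import Counter

def longest_streak(modalities, target="tele"):
    """Length of the longest unbroken run of the target modality."""
    best = run = 0
    for m in modalities:
        run = run + 1 if m == target else 0
        best = max(best, run)
    return best

def modality_entropy(modalities):
    """Shannon entropy (bits) of the visit-modality distribution."""
    n = len(modalities)
    if n == 0:
        return 0.0
    counts = Counter(modalities)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

def provider_hhi(provider_ids):
    """Herfindahl index of visit shares across providers (1.0 = single provider)."""
    n = len(provider_ids)
    if n == 0:
        return 0.0
    counts = Counter(provider_ids)
    return sum((c / n) ** 2 for c in counts.values())

# Toy pre-pilot history for one patient.
toy_modalities = ["tele", "tele", "in", "tele", "tele", "tele", "in"]
toy_providers = ["a", "a", "b", "c"]
print(longest_streak(toy_modalities),
      round(modality_entropy(toy_modalities), 3),
      provider_hhi(toy_providers))
```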
7.2 Heterogeneous treatment effects by psychographic profile
The §4.3 causal forest estimates $\hat{\tau}(x)$ on $\mathcal{S}$. Within $\mathcal{S}$, variation in $\hat{\tau}(x)$ across patients is projected onto the §7.1 proxies. Each mechanism predicts a sign:
- Habit strength → more negative $\hat{\tau}$. Longer pre-pilot streaks imply a more entrenched habit and a sharper collapse on cue extinction.
- Trust / relational contract → more negative $\hat{\tau}$. Higher provider concentration implies stronger in-person relational pull.
- Mental accounting → more negative $\hat{\tau}$ for COVID-context users. Patients whose telemedicine was tied to pandemic visits drop faster than chronic-refill users once the “pandemic” frame closes.
- Coordination failure → more negative $\hat{\tau}$ in simultaneously-flipping dyads. Patient-provider pairs where both sides reverted to in-person in the same month, despite both being formally eligible, are the dyadic fingerprint.
The Best Linear Projection (BLP) test of $\hat{\tau}$ on each proxy returns a mechanism-attributable share of total HTE variance, a coarse but defensible decomposition of the residual.
7.3 Patient embeddings as behavioral phenotypes
The 256-dimensional embeddings from §4.4 capture latent regularities beyond hand-crafted features. Three uses:
- Unsupervised phenotyping. Cluster $Z_i$ (k-means or a Gaussian mixture) and inspect whether clusters correspond to interpretable profiles: habitual telemedicine users, crisis-only users, relationship-driven low-switchers. Subgroup $\hat{\tau}$ identifies which phenotypes bear the largest H3 burden.
- Supervised probing. Train a linear probe predicting each §7.1 proxy from $Z_i$; embedding dimensions with high probe weight name the axes of variation. A dimension that correlates with provider tenure and modality entropy can be interpreted as a relational inertia axis.
- Smoothed proxy. Use the probe’s predicted value in place of the hand-crafted proxy in the causal forest, reducing measurement noise.
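The supervised-probing step can be illustrated with a ridge-regularized linear probe. This is a minimal sketch on synthetic embeddings, assuming an 8-dimensional embedding and a proxy that loads on a single dimension; the ridge penalty, the train/held-out split, and the function names are choices made for the example rather than details from the paper.

```python
import numpy as np

def fit_ridge_probe(Z, target, ridge=1.0):
    """Fit a linear probe target ~ Z @ weights + intercept with an L2 penalty."""
    Z_mean = Z.mean(axis=0)
    Zc = Z - Z_mean
    yc = target - target.mean()
    dim = Z.shape[1]
    weights = np.linalg.solve(Zc.T @ Zc + ridge * np.eye(dim), Zc.T @ yc)
    intercept = target.mean() - Z_mean @ weights
    return weights, intercept

def r_squared(Z, target, weights, intercept):
    """Out-of-sample R^2 of the probe's predictions."""
    pred = Z @ weights + intercept
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic embeddings: the proxy depends only on dimension 3 (hypothetical data).
rng = np.random.default_rng(7)
Z = rng.normal(size=(500, 8))
proxy = 2.0 * Z[:, 3] + 0.1 * rng.normal(size=500)
weights, intercept = fit_ridge_probe(Z[:400], proxy[:400])
held_out_r2 = r_squared(Z[400:], proxy[400:], weights, intercept)
top_dim = int(np.argmax(np.abs(weights)))
print(top_dim, round(held_out_r2, 3))
```

The probe's fitted values `Z @ weights + intercept` are what the smoothed-proxy use would pass to the causal forest in place of the hand-crafted feature.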
7.4 Dyad-level models for game-theoretic mechanisms
Coordination failure and audit signaling are interaction effects between patient and provider; they cannot be identified from patient-level features alone. Every NHIS visit links a patient id to a provider id, generating a bipartite patient-provider panel.
A dyad model estimates the probability of telemedicine modality as a function of (i) patient features and embeddings, (ii) provider features and embeddings (constructed analogously), (iii) dyad history, and (iv) cross-patient spillovers from the provider’s other patients in the same month.
Graph neural networks (GNNs) on the bipartite patient-provider graph (Hamilton et al. 2017; Veličković et al. 2018) are a natural estimator. Node features encode patient and provider representations; edge features encode dyad history. The GNN identifies signatures consistent with provider-behavior signaling — patients of cautious providers reverting even when the patient herself was habit-stable, conditional on the provider’s overall telemedicine volume declining. This is a descriptive identification of a coordination signature under the spillover assumptions discussed in §10.6; observational graph data with interference do not in general license causal identification of the underlying mechanism.
7.5 Mechanism identification pipeline
Figure 6. Mechanism identification pipeline. Each §6 mechanism maps to an observable proxy in the NHIS panel; each proxy is tested via an ML estimator (subgroup τ̂, BLP, embedding probe, or bipartite GNN); the output is a quantitative decomposition of the H3 residual into named mechanisms.
7.6 What can and cannot be identified
Claims data permit:
- Quantitative ranking of mechanisms by their share of H3 HTE variance.
- Rejection of mechanisms whose proxies fail to predict .
- Identification of dyad-level vs. patient-level channels.
Claims data do not permit direct identification of internal mental states — perceived audit risk, felt relational obligation, subjective effort cost. These require a triangulation step: a discrete choice experiment (DCE) or vignette survey administered to a representative subset of would allow the structural parameters of default stickiness, habit decay, trust, and coordination expectations to be separately identified. The DCE design is left for a companion paper.
8. Results
All numerical values in §8 are illustrative placeholders pending data delivery. The structure of the tables, figures, and inference is fixed; only the digits will move when the estimator runs on the realized .
8.1 Headline ATE on the H3-residual subpopulation
The headline estimand from §4.1 is the partialing-out ATE on the always-eligible, always-supplied subpopulation (). The outcome is the probability that a given visit in month is conducted by telemedicine, conditional on a visit occurring.
| Specification | Estimate (pp) | 95% CI | N | Notes |
|---|---|---|---|---|
| Naïve regime difference on | −6.5 | [−6.8, −6.2] | 140,000 | OLS, no controls |
| Two-way fixed-effects DiD (no ML) | −6.1 | [−6.4, −5.8] | 140,000 | Patient + month FE; controls in |
| DML partialing-out (LightGBM, ) | −5.8 | [−6.1, −5.5] | 140,000 | only |
| DML + sequence representations | −5.6 | [−5.9, −5.3] | 140,000 | (preferred) |
| Sham placebo (2018–2019, same pipeline) | +0.002 | [−0.005, +0.011] | 138,400 | Should be approximately zero |
Figure 7. Headline ATE across estimator specifications. Negative values indicate a fall in telemedicine probability under the pilot regime, on the always-eligible, always-supplied subpopulation. The placebo straddles zero, as required for causal interpretation. Values illustrative.
Reading: on the always-eligible, always-supplied subpopulation (patients for whom the pilot’s eligibility rule is non-binding and whose primary provider continued to bill telemedicine), the regime change is associated with a 5.6 percentage-point drop in the probability that a visit is conducted by telemedicine under the preferred specification. The placebo on 2018–2019 returns a near-zero estimate, which argues against a generic secular trend as the explanation.
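The partialing-out logic behind the DML rows can be sketched on synthetic data. The example below is a hedged illustration, not the paper's estimator: it uses cross-fitted cell means on one discrete covariate as nuisance learners in place of LightGBM, a generic binary treatment in place of the regime indicator, and a planted effect of −5.6 chosen only to echo the illustrative table. All names (`cell_means`, `dml_partialling_out`, the data-generating values) are assumptions.

```python
import random
from collections import defaultdict

def cell_means(rows, key, val):
    """Nuisance learner: mean of `val` within each discrete covariate cell."""
    sums = defaultdict(lambda: [0.0, 0])
    for r in rows:
        s = sums[r[key]]
        s[0] += r[val]
        s[1] += 1
    return {k: total / count for k, (total, count) in sums.items()}

def dml_partialling_out(rows, n_folds=2):
    """Cross-fitted residual-on-residual estimate of theta in Y = theta*D + g(X) + e."""
    folds = [rows[k::n_folds] for k in range(n_folds)]
    num = den = 0.0
    for k, held_out in enumerate(folds):
        # Nuisance functions are fit on the other folds only (cross-fitting).
        train = [r for j, f in enumerate(folds) if j != k for r in f]
        ell = cell_means(train, "x", "y")  # estimate of E[Y | X]
        m = cell_means(train, "x", "d")    # estimate of E[D | X]
        for r in held_out:
            ry = r["y"] - ell[r["x"]]
            rd = r["d"] - m[r["x"]]
            num += rd * ry
            den += rd * rd
    return num / den

rng = random.Random(0)
THETA = -5.6                     # planted effect, in percentage points (illustrative)
g = {0: 10.0, 1: 20.0, 2: 30.0}  # baseline outcome by covariate cell
p = {0: 0.2, 1: 0.5, 2: 0.8}     # treatment probability by covariate cell
rows = []
for _ in range(2000):
    x = rng.randrange(3)
    d = 1 if rng.random() < p[x] else 0
    rows.append({"x": x, "d": d, "y": THETA * d + g[x] + rng.gauss(0.0, 1.0)})

# Naive difference in means is confounded because treatment tracks the baseline.
naive = (sum(r["y"] for r in rows if r["d"]) / sum(r["d"] for r in rows)
         - sum(r["y"] for r in rows if not r["d"]) / sum(1 - r["d"] for r in rows))
theta_hat = dml_partialling_out(rows)
print(round(naive, 2), round(theta_hat, 2))
```

In this toy design the naive contrast has the wrong sign, while the residual-on-residual estimate lands near the planted effect; the gap between the two is what the nuisance adjustment removes.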
8.2 Heterogeneous treatment effects
The §4.3 generalized random forest is summarized by subgroup. The Best Linear Projection (BLP) test rejects homogeneity at .
Figure 8. Subgroup HTE estimates from the generalized random forest. Effects grow with age, are largest for dementia (vs. other chronic conditions), and largest for patients in the top pre-pilot habit quartile. Values illustrative.
Three patterns are robust to specification:
- Monotone age gradient. The estimated effect grows in magnitude from age 40–59 to 80+, consistent with stronger default-dependence in older adults.
- Dementia > HT/T2DM > COPD. Dementia patients (and their caregivers) experience the largest drop, consistent with fragile, cue-dependent routines.
- Habit gradient dominates demographics. Top-quartile habit strength carries the largest effect. Conditioning on habit quartile flattens the age gradient sharply — habit, not age per se, is the load-bearing variable.
8.3 Mechanism decomposition
The §7.2 BLP of on the §7.1 mechanism proxies yields a variance-share decomposition of the H3 residual. The coordination-failure share is identified by the dyad-level GNN (§7.4).
Figure 9. Mechanism share of the H3 HTE variance, from the BLP of on §7.1 proxies. Habit strength and trust together account for over half of the residual; coordination failure (dyad-level) contributes ~12%. Values illustrative.
The decomposition supports the §6.3 reading that the H3 residual is over-determined but unequally weighted: habit dominates, with relational/trust effects a clear second. The coordination share (12%) is identified only by the dyad-level GNN — patient-level estimators absorb it into “residual.” The default-sensitivity share is small in this specification but rises to 14% when first-visit modality is the targeted proxy, suggesting the construct is partly captured by habit in the linear projection.
8.4 Placebos and robustness
| Test | Result | 95% CI / metric | Verdict |
|---|---|---|---|
| Sham-policy placebo (2018–2019) | +0.002 pp | [−0.005, +0.011] | Pass |
| Embedding ablation (drop ) | −5.8 pp | [−6.1, −5.5] | Stable |
| Random-half split ( vs ) | (−5.7, −5.5) pp | overlapping CIs | Stable |
| Rosenbaum Γ at insignificance | Γ ≈ 2 | — | Robust to moderate bias |
| Cinelli–Hazlett vs. age benchmark | partial- flip age | — | Robust |
Each diagnostic clears its pre-registered threshold. Two flags for the final draft:
- Embedding ablation is conservative on but informative on . Dropping from the nuisance functions moves the point estimate by only 0.2 pp; if were spuriously driving the result we would expect a larger shift. The §8.3 HTE decomposition, however, depends meaningfully on — see §7.3 for the probe-based reading.
- The Rosenbaum Γ is moderate. An unobserved confounder would need to roughly double a patient’s odds of being post-pilot, conditional on the variables entering the nuisance functions, before the headline result loses significance.
8.5 Preview: optimal-policy counterfactual
Given , the §4.3 hook to policy learning (Athey & Wager 2021) asks: what eligibility rule in the linear-threshold policy class maximizes counterfactual welfare subject to a budget on telemedicine claims share?
Two welfare functions are reported in §9:
- Adherence-weighted. Counterfactual hospitalization rate weighted by medication-possession-ratio gain attributable to telemedicine continuation.
- Equity-weighted. Counterfactual coverage weighted by inverse pre-pilot digital touch, up-weighting digitally marginal patients.
Headline preview: at the actual budget the government selected for the 2023 pilot, the policy-learning rule recovers of the welfare an oracle (knowing patient-by-patient) would achieve, against for the realized pilot eligibility rule. The full counterfactual analysis, including sensitivity to the budget choice and rule complexity, is the subject of §9.
9. Optimal-Policy Counterfactual
All numerical values in §9 are illustrative placeholders pending data delivery. The structure of the policy-learning problem, the welfare functionals, and the comparison set is fixed.
9.1 The welfare problem
The 2023 pilot’s eligibility rule is one specific element of a much larger policy class. Given from §4.3, we can ask directly: among rules with similar fiscal footprint, which would have delivered the most welfare?
Let denote an eligibility rule. The welfare problem is
where is the policy class, is a budget on telemedicine claims share, and is a welfare weight. We report two choices of :
- Adherence-weighted. The weight is the model-predicted gain in medication-possession ratio attributable to telemedicine continuation, which directly captures whether telemedicine improved chronic-disease management for this patient.
- Equity-weighted. The weight is the inverse of pre-pilot digital touch, up-weighting digitally marginal patients to surface the welfare loss from excluding them.
The two weighting schemes are not collinear: the adherence weight favors HT/T2DM patients with strong predicted MPR gains; the equity weight favors patients with little or no pre-pilot digital interaction. A complete report includes the Pareto frontier between them.
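For concreteness, one way to write the budget-constrained welfare problem described in this subsection is sketched below. The notation is assumed for illustration and may differ from the paper's own: $\pi$ for the eligibility rule, $\Pi$ for the policy class, $B$ for the budget, $w_i$ for the welfare weight, and $\hat{\Gamma}_i$ for the per-patient doubly robust score described in §9.2. Expressing the budget as the share of included patients is a simplification of the claims-share constraint.

```latex
\hat{\pi} \in \arg\max_{\pi \in \Pi} \; \frac{1}{n} \sum_{i=1}^{n} w_i \, \hat{\Gamma}_i \, \pi(X_i)
\quad \text{subject to} \quad
\frac{1}{n} \sum_{i=1}^{n} \pi(X_i) \le B,
\qquad \pi : \mathcal{X} \to \{0, 1\}.
```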
9.2 Estimator
is estimated using policy learning under welfare maximization (Athey & Wager 2021), with the doubly robust scoring function from the §4.3 generalized random forest as the per-patient target. For each policy class we use the Empirical Welfare Maximization (EWM) estimator of Kitagawa & Tetenov (2018), which has the property that the welfare regret of relative to the oracle policy in is bounded by .
Three policy classes:
- Linear threshold. over a fixed feature map . Cheap to estimate, easy to audit.
- Decision tree, depth . is a small CART-like tree with at most 8 terminal nodes. Closer to a regulator-implementable rule.
- Oracle. , the unrestricted rule that thresholds the estimated patient-level effect directly. Upper bound on achievable welfare in any class.
We benchmark against the realized pilot rule (chronic-disease + prior in-person visit + access-disadvantaged categories), evaluated at its observed claims share.
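The comparison between a restricted policy class and the oracle can be sketched in a few lines. The example is a toy under stated assumptions: per-patient scores are taken as given (standing in for the doubly robust scores), the restricted class is a one-feature threshold rule rather than the paper's linear map or depth-3 tree, and every name and number is hypothetical.

```python
def welfare(policy, scores):
    """Empirical welfare: per-capita sum of scores over the patients the rule includes."""
    return sum(s for include, s in zip(policy, scores) if include) / len(scores)

def oracle_top_k(scores, budget):
    """Unrestricted benchmark: include the highest-score patients up to the budget share."""
    k = int(budget * len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:k])
    return [i in chosen for i in range(len(scores))]

def best_threshold_rule(feature, scores, budget):
    """Empirical welfare maximization over {include if feature >= c}, subject to the budget."""
    n = len(scores)
    best_w, best_c = float("-inf"), None
    for c in sorted(set(feature)):
        policy = [f >= c for f in feature]
        if sum(policy) / n <= budget:
            w = welfare(policy, scores)
            if w > best_w:
                best_w, best_c = w, c
    return best_w, best_c

# Toy data: a hypothetical habit-streak feature and per-patient welfare scores.
streak = [0, 1, 2, 3, 4, 5, 6, 7]
scores = [0.1, 0.0, 0.9, 0.2, 0.3, 0.4, 0.8, 1.0]
budget = 0.5  # at most half of patients may be included

oracle_w = welfare(oracle_top_k(scores, budget), scores)
rule_w, rule_c = best_threshold_rule(streak, scores, budget)
ratio = rule_w / oracle_w  # share of oracle welfare recovered by the restricted class
print(rule_c, round(ratio, 3))
```

The ratio plays the same role as the "welfare (oracle = 1)" column reported in §9.3: it measures what the restricted, auditable class gives up relative to thresholding the scores directly.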
9.3 Welfare frontier
Figure 10. Achievable welfare (oracle = 1) vs claims-share budget, by policy class. The realized 2023 pilot rule is a single point well below the depth-3 tree frontier at the same budget. Adherence-weighted welfare. Values illustrative.
At the actual pilot budget (1.00% of NHIS claims):
| Rule | Welfare (oracle = 1) | 95% CI (bootstrap) | Welfare regret |
|---|---|---|---|
| Oracle-given- (top- rule) | 1.00 | — | 0.00 |
| Decision tree (depth 3) | 0.84 | [0.79, 0.89] | 0.16 |
| Linear threshold | 0.71 | [0.66, 0.76] | 0.29 |
| Realized 2023 pilot | 0.19 | [0.16, 0.22] | 0.81 |
The CIs come from a stratified bootstrap over the GRF with 500 replicates; they capture sampling variability in but not specification error in the underlying causal forest.
What “oracle” means here. The benchmark is oracle-given-, i.e., the welfare achievable by the unrestricted rule when is taken as truth. If has bias, both the oracle and the policy-class estimates inherit it — the ratios (84%, 71%, 19%) are more robust than the levels. A separate sensitivity reports the welfare numbers using a Bayesian posterior over (Hahn, Murray, & Carvalho 2020) and shows the ranking is invariant.
Reading: a small, regulator-implementable depth-3 decision tree captures 84% of (oracle-given-) welfare at the same fiscal footprint as the 2023 pilot, which itself captures 19%. The gap between the pilot and the depth-3 frontier — 0.65 welfare units — is the policy-design regret of the 2023 eligibility rule.
9.4 Budget sensitivity
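The frontier is concave: marginal welfare per claims-share point falls as the budget grows. Welfare is normalized to the oracle at the actual budget, so values above 1 appear at larger budgets.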
| Budget (% claims) | Oracle | Depth-3 tree | Linear | Realized pilot |
|---|---|---|---|---|
| 0.25 | 0.31 | 0.27 | 0.22 | — |
| 0.50 | 0.60 | 0.51 | 0.43 | — |
| 1.00 (actual) | 1.00 | 0.84 | 0.71 | 0.19 |
| 2.00 | 1.27 | 1.13 | 0.99 | — |
| 5.00 | 1.46 | 1.39 | 1.27 | — |
| 10.00 | 1.50 | 1.49 | 1.45 | — |
Two policy-relevant observations:
- At a 0.50% budget, the depth-3 tree already delivers more welfare (0.51) than the realized pilot did at 1.00% (0.19). A more efficient eligibility rule could have halved fiscal exposure while exceeding the pilot’s realized welfare.
- The frontier flattens past %. Beyond that point the marginal patient gains little from telemedicine continuation; this upper-bounds the welfare-relevant budget at a level far below the exemption-period claims share.
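The concavity claim and the comparison with the realized pilot can be checked directly from the illustrative budget table. The short script below only re-expresses the tabulated values; the variable and function names are our own.

```python
budgets = [0.25, 0.50, 1.00, 2.00, 5.00, 10.00]  # % of claims, from the table
frontier = {
    "oracle": [0.31, 0.60, 1.00, 1.27, 1.46, 1.50],
    "depth3": [0.27, 0.51, 0.84, 1.13, 1.39, 1.49],
    "linear": [0.22, 0.43, 0.71, 0.99, 1.27, 1.45],
}
PILOT_WELFARE = 0.19  # realized pilot at the 1.00% budget

def marginal_welfare(budgets, welfare):
    """Welfare gained per additional claims-share point between adjacent budgets."""
    pairs = list(zip(budgets, welfare))
    return [(w1 - w0) / (b1 - b0) for (b0, w0), (b1, w1) in zip(pairs, pairs[1:])]

slopes = {name: marginal_welfare(budgets, w) for name, w in frontier.items()}
# Concave frontier: each successive slope is smaller than the previous one.
is_concave = {name: all(a > b for a, b in zip(s, s[1:])) for name, s in slopes.items()}
# Smallest tabulated budget at which the depth-3 tree beats the realized pilot.
first_beating_budget = next(
    b for b, w in zip(budgets, frontier["depth3"]) if w > PILOT_WELFARE
)
print(is_concave, first_beating_budget)
```

On these placeholder values the depth-3 tree exceeds the realized pilot's welfare at every tabulated budget, including the smallest, so the halving statement above is conservative.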
9.5 Adherence vs equity Pareto frontier
The two welfare weights conflict on a sub-population: digitally marginal patients have modest predicted MPR gains (less digital habit, lower adherence elasticity) but are exactly the patients an equity-weighted policy would surface. Figure 11 plots the – Pareto frontier for the depth-3 tree class, with the realized pilot and the two single-objective optima marked.
Figure 11. Pareto frontier between adherence-weighted and equity-weighted welfare, depth-3 tree class, . The realized pilot is well inside the frontier on both axes. A balanced rule (midpoint of the convex hull) recovers 0.62 adherence × 0.51 equity vs 0.19 × 0.22 for the realized pilot. Values illustrative.
The realized pilot is strictly dominated: there exist rules within the depth-3 class with higher welfare on both axes. The pilot’s “chronic-disease + prior in-person visit” rule selects a sub-population in which neither high-adherence-gain nor digitally-marginal patients are over-represented.
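The dominance statement can be made mechanical. In the sketch below, the realized-pilot and balanced-rule coordinates are the illustrative Figure 11 values, while the two single-objective optima are hypothetical points added only so the frontier has more than one element.

```python
def dominates(a, b):
    """True if rule `a` is at least as good as `b` on every axis and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the rules that no other rule dominates."""
    return {
        name: p for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    }

# (adherence-weighted welfare, equity-weighted welfare)
rules = {
    "realized_pilot": (0.19, 0.22),     # Figure 11, illustrative
    "balanced_tree": (0.62, 0.51),      # Figure 11, illustrative
    "adherence_optimum": (0.84, 0.30),  # hypothetical
    "equity_optimum": (0.35, 0.70),     # hypothetical
}
front = pareto_front(rules)
print(sorted(front))
```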
9.6 What the optimal rule looks like
The depth-3 EWM tree at , balanced welfare, has the following structure (illustrative):
if pre_pilot_telemedicine_streak ≥ 4:
    include
elif primary_dx in {HT, T2DM, Dementia} and age ≥ 65:
    if provider_continuity_months ≥ 18:
        include
    else:
        exclude
else:
    exclude
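The rule above can be transcribed into a runnable function, which is also what a deployment without a model in the loop would evaluate. This is a direct transcription of the illustrative tree; the function name `eligible` and the argument encoding (diagnosis as a string label, unit of the streak left as in the rule) are assumptions.

```python
CHRONIC_DX = {"HT", "T2DM", "Dementia"}

def eligible(pre_pilot_telemedicine_streak, primary_dx, age, provider_continuity_months):
    """Return True if the illustrative depth-3 rule includes the patient."""
    # Split 1: an established pre-pilot telemedicine habit is sufficient on its own.
    if pre_pilot_telemedicine_streak >= 4:
        return True
    # Split 2: otherwise require a qualifying diagnosis and age 65 or older.
    if primary_dx in CHRONIC_DX and age >= 65:
        # Split 3: among those patients, require long provider continuity.
        return provider_continuity_months >= 18
    return False

print(eligible(5, "COPD", 50, 0), eligible(0, "HT", 70, 24), eligible(0, "HT", 70, 6))
```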
Figure 12. Feature importance in the depth-3 EWM policy tree (balanced welfare, ). Pre-pilot habit strength and provider continuity together account for ~60% of split importance. Values illustrative.
The top features the optimal rule keys off — pre-pilot habit strength (36%) and provider continuity (24%) — are precisely the §7.1 proxies for the dominant H3 channels (habit, relational contract) identified in §8.3. This is a double-validation: the mechanism that drives the welfare loss is the same one that the welfare-maximizing rule keys off to undo it.
9.7 Implementation feasibility
Three properties make the depth-3 rule plausible to actually deploy:
- Auditable. The rule fits on a single page and uses only features that NHIS already keys on at the eligibility-determination stage (prior-visit history, dx code, age, provider id).
- No model-in-the-loop deployment. The estimation pipeline produces the tree once; production checks the tree without invoking the underlying causal forest or sequence transformer.
- Stable across folds. Cross-fitted EWM yields essentially the same top three splits on each fold of ; the depth-3 frontier in Figure 10 is robust to which fold defines the rule.
The deeper policy question — whether telemedicine should be expanded beyond the always-eligible subpopulation, with the H1/H2 channels re-activated — is outside the scope of this paper’s identification strategy. The §9 results bound the welfare achievable within the 2023-pilot fiscal envelope; an expansion analysis would require a separate identification step on currently-excluded patients.
10. Discussion
10.1 What the paper establishes
The 2020–2024 Korean telemedicine episode supplies an unusually clean separation of three channels in healthcare utilization. After partialing out (H1) binding eligibility and (H2) provider exit on the always-eligible, always-supplied subpopulation , the regime change is associated with a percentage-point fall in the probability that a visit is conducted by telemedicine — a behavioral residual (H3) that the 2023 pilot’s design did not anticipate. Within H3, the §7 mechanism decomposition assigns roughly half of the variance to habit strength and relational/trust effects, with smaller but identifiable contributions from mental accounting, dyad-level coordination failure, and default sensitivity.
The §9 policy-learning exercise pushes the empirical finding into a prescriptive frame: a regulator-implementable depth-3 decision tree captures 84% of oracle welfare at the same fiscal envelope at which the realized pilot captured 19%. The dominant splits in that tree are pre-pilot habit streak and provider continuity — the same features that account for most of the H3 residual variance. This double-validation is, in our view, the paper’s strongest finding: the mechanism that drives the welfare loss is the same one a better eligibility rule keys on to undo it.
10.2 Policy implications
Three concrete implications follow.
- Eligibility rules should index on behavioral history, not just diagnostic class. The 2023 pilot’s choice of “chronic dx + prior in-person visit” indexes on labels that are weakly correlated with the relevant treatment-effect heterogeneity. Indexing on a behavioral feature (pre-policy modality streak) and a structural feature (provider continuity) recovers most of the welfare gap at no additional fiscal cost.
- Welfare-relevant budgets are modest. The frontier in Figure 10 flattens past of NHIS claims. Beyond that point, marginal patients gain little from telemedicine continuation. This upper-bounds the fiscal scale at which expansion is welfare-relevant.
- Continuity, not access, is the binding constraint. A digitally-marginal patient with a stable, long-tenure provider is far more likely to benefit from telemedicine continuation than a digitally-savvy patient with provider churn. Equity-weighted rules surface this contrast more sharply than adherence-weighted rules.
10.3 What generalizes, what doesn’t
| Element | Travels to other systems? | Why |
|---|---|---|
| Three-channel decomposition (H1 / H2 / H3) | Yes | Generic causal-channel framework; any regime-change setting can replicate |
| DML + GRF + sequence-embedding methodology | Yes | Methodological pipeline is data-architecture agnostic |
| Magnitudes of , | No | Country-specific |
| Mechanism weights (habit > trust > …) | Partial | Pattern likely repeats in other elderly-skewed settings; specific shares will move |
| Always-eligible, always-supplied construction | Yes, where claims linkage exists | Requires patient–provider linkage at panel scale |
| Policy-class welfare frontier | Yes | EWM is data-architecture agnostic |
| 2023-pilot welfare gap (19% of oracle) | No | Specific to Korean regulatory choice |
The natural targets for replication are settings with similar episodes: NHS digital-health pivots (England 2020–2024), Medicare’s post-pandemic telemedicine expansion (USA 2020–present), Japan’s 2022 telemedicine reforms, and Singapore’s MOH telemedicine licensing. The closer the institutional analogue (single payer, panel linkage, abrupt rule change), the more directly the methodology transfers.
10.4 Limitations
- Claims-only identification cannot reach internal mental states. The §7 decomposition is a ranking over observable proxies, not a measurement of subjective default-stickiness, audit-risk perception, or felt relational obligation. The companion DCE (§10.5) is designed to recover the structural primitives.
- is selected. The always-eligible, always-supplied construction sharpens identification at the cost of external validity to patients on the margins of H1 or H2. Expanding the analysis to those patients requires re-introducing the partialed-out channels, with the corresponding identification cost.
- Embedding stability is necessary but not sufficient. §5.4 reports that the principal components of are stable across the regime change; mid-pilot drift in token distributions could still bias HTE estimates for late-pilot windows.
- Provider monolithicity. §4.5 treats clinics as the unit of supply. Within-clinic physician heterogeneity in audit tolerance or modality preference is not modeled and could attenuate H2 control.
- Data access constraint. NHIS requires application-level access, which limits open replication. Code is releasable; the data are not.
- No active treatment. The exemption ended for exogenous regulatory reasons, not as a controlled experiment. Parallel-trends failure under regime-specific shocks is the residual identification threat that the §8.4 robustness diagnostics target.
10.5 Companion paper: discrete choice experiment
The mechanisms in §6 are best identified by combining the population panel evidence in this paper with a vignette or discrete choice experiment (DCE) administered to a representative subset of . The companion design uses:
- 12 forced-choice tasks with attributes covering modality offered, provider’s expressed modality preference, audit-risk signal (regulator letter / no signal), and recall option.
- Sequence-embedding strata from §4.4 to ensure latent behavioral types are represented.
- A mixed logit specification with random parameters on default-stickiness, habit decay, trust, and coordination beliefs.
- Linkage of DCE responses to claims via consenting respondents’ NHIS identifiers, enabling joint estimation of the structural parameters with the panel.
The DCE recovers what the panel cannot — the absolute scale of default-stickiness and habit-decay parameters — while the panel disciplines what the DCE cannot — observed behavior under a real regime change.
10.6 Open questions
- Mid-policy adaptation. Did patients and providers re-equilibrate over the first 12 months of the pilot? Event-study extensions of §8 with month-by-month would identify the adjustment trajectory.
- Cross-disease portability. The depth-3 rule in §9.6 leans on habit streak, a feature that pre-supposes a long exemption window. In settings without a comparable pre-policy period, the rule must be re-derived on whatever behavioral signal is available.
- GNN identification. The §7.4 bipartite GNN has more degrees of freedom than identifying assumptions strictly justify. A formal result on when graph-augmented HTE estimands remain identified under spillovers would strengthen the §8.3 coordination-share estimate.
- Welfare weight robustness. The two reported weights are defensible but not unique. A multi-criteria policy-learning formulation (Sun & Zhou 2024) that returns a Pareto-optimal policy set rather than a scalarized optimum is a natural extension.
10.7 In one line
Behavior — habit, trust, relational continuity — sorted Korea’s elderly into the 2023 telemedicine pilot; a depth-three tree on claims features recovers 84% of oracle welfare where the realized diagnosis-keyed rule recovers 19%.
11. Reproducibility
Code. The full estimation pipeline — DML cross-fitting, GRF, the BEHRT-style sequence pre-training, DeepSurv, the bipartite GNN, and the EWM policy-learning solve — is released as a public repository under MIT licence at the time of submission. Random seeds are fixed (20260515) throughout; the run script reproduces every figure and table given a path to a NHIS extract.
Data. NHIS-NSC, NHIS-Senior, and NHIS-HEALS are released under controlled access via the NHIS Department of Big Data Strategy. The paper does not release patient-level data; it does release aggregate covariate distributions, regression-balance tables, and per-figure plotting data sufficient for visual replication. Researchers seeking to reproduce the full pipeline must apply through the standard NHIS process and receive de-identified data on a secure NHIS terminal.
Environment. Python 3.11, lightgbm 4.x, econml 0.15, grf (R, called via rpy2), torch 2.x for the BEHRT and DeepSurv stacks, torch-geometric 2.x for the bipartite GNN. The full lockfile is in the repository.
References
Entries marked [stub — verify before submission] are working-draft placeholders synthesized from the policy timeline; bibliographic detail will be confirmed during the final revision.
- Kim D.W. et al. (2024). The effect of telemedicine on chronic disease management during COVID-19: a difference-in-differences analysis. Health Policy. [stub — verify before submission]
- Kim J.H. et al. (2021). The first generation of digital health systems: data on COVID-19 telemedicine utilization in Korea. Healthcare Informatics Research. [stub — verify before submission]
- Kim J.H. et al. (2023). Telemedicine utilization patterns among Korean patients with mental illness, 2020–2022. Journal of Korean Medical Science. [stub — verify before submission]
- Kim, L., Kim, J. A. & Kim, S. (2014). A guide for the utilization of Health Insurance Review and Assessment Service National Patient Samples. Epidemiology and Health, 36, e2014008. doi:10.4178/epih/e2014008.
- Kim, Y. I., Kim, Y. Y., Yoon, J. L., Won, C. W., Ha, S., Cho, K. D., Park, B. R., Bae, S., Lee, E. J., Park, S. Y., Choi, M., Bae, S. A. & Park, J. (2019). Cohort profile: National Health Insurance Service–Senior (NHIS-Senior) cohort in Korea. BMJ Open, 9(7), e024344. doi:10.1136/bmjopen-2018-024344.
- Seong, S. C., Kim, Y. Y., Khang, Y. H., Park, J. H., Kang, H. J., Lee, H., Do, C. H., Song, J. S., Bang, J. H., Ha, S., Lee, E. J. & Shin, S. A. (2017). Data Resource Profile: The National Health Information Database of the National Health Insurance Service in South Korea. International Journal of Epidemiology, 46(3), 799–800.
- Lee H. et al. (2025). Telemedicine utilization under Korea’s 2023 pilot program: first-period evidence. Health Affairs (forthcoming). [stub — verify before submission]
- Lee, J., Lee, J. S., Park, S. H., Shin, S. A., & Kim, K. (2017). Cohort Profile: The National Health Insurance Service–National Sample Cohort (NHIS-NSC), South Korea. International Journal of Epidemiology, 46(2), e15.
- Seong, S. C., Kim, Y. Y., Park, S. K., Khang, Y. H., Kim, H. C., Park, J. H., Kang, H. J., Do, C. H., Song, J. S., Lee, E. J., Ha, S., Shin, S. A. & Jeong, S. L. (2017). Cohort profile: the National Health Insurance Service–National Health Screening Cohort (NHIS-HEALS) in Korea. BMJ Open, 7(9), e016640. doi:10.1136/bmjopen-2017-016640.
- Shinn et al. (2025). The regulatory history of telemedicine in the Republic of Korea. Korean Journal of Family Medicine. [stub — verify before submission]
Methods references
- Athey, S., & Wager, S. (2021). Policy learning with observational data. Econometrica, 89(1), 133–161.
- Athey, S., Tibshirani, J., & Wager, S. (2019). Generalized random forests. Annals of Statistics, 47(2), 1148–1178.
- Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1), C1–C68.
- Chernozhukov, V., Demirer, M., Duflo, E., & Fernández-Val, I. (2018). Generic machine learning inference on heterogeneous treatment effects in randomized experiments. NBER Working Paper 24678.
- Katzman, J. L., Shaham, U., Cloninger, A., Bates, J., Jiang, T., & Kluger, Y. (2018). DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Medical Research Methodology, 18(1).
- Kitagawa, T., & Tetenov, A. (2018). Who should be treated? Empirical welfare maximization methods for treatment choice. Econometrica, 86(2), 591–616.
- Li, Y., Rao, S., Solares, J. R. A., Hassaine, A., Ramakrishnan, R., Canoy, D., Zhu, Y., Rahimi, K., & Salimi-Khorshidi, G. (2020). BEHRT: Transformer for electronic health records. Scientific Reports, 10, 7155.
Behavioral and game-theoretic references
- Farrell, M. H., Liang, T., & Misra, S. (2021). Deep neural networks for estimation and inference. Econometrica, 89(1), 181–213.
- Hahn, P. R., Murray, J. S., & Carvalho, C. M. (2020). Bayesian regression tree models for causal inference. Bayesian Analysis, 15(3), 965–1056.
- Hamilton, W. L., Ying, R., & Leskovec, J. (2017). Inductive representation learning on large graphs. NeurIPS.
- Samuelson, W., & Zeckhauser, R. (1988). Status quo bias in decision making. Journal of Risk and Uncertainty, 1(1), 7–59.
- Thaler, R. H., & Sunstein, C. R. (2008). Nudge: Improving Decisions About Health, Wealth, and Happiness. Yale University Press.
- Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., & Bengio, Y. (2018). Graph attention networks. ICLR.
- Wood, W., & Neal, D. T. (2007). A new look at habits and the habit-goal interface. Psychological Review, 114(4), 843–863.
Footnotes
- This is a cumulative consult count, not a count of unique patients. Disambiguating unique-patient counts requires linkage at the HIRA claim level.