| Title: | Adaptive Influence-Based Borrowing for Hybrid Control Trials |
|---|---|
| Description: | Implements the adaptive influence-based borrowing framework proposed by Qinwei Yang, Jingyi Li, Peng Wu, and Shu Yang (2026+) in the paper ``Improving Treatment Effect Estimation in Trials through Adaptive Borrowing of External Controls" <doi:10.48550/arXiv.2604.13973> for augmenting Randomized Controlled Trials (RCTs) with External Control (EC) data. This package provides a comprehensive workflow to: (1) quantify the comparability of external control samples using influence scores approximated via the influence function of the M-estimator; (2) construct candidate borrowing subsets and select the optimal subset that minimizes the Mean Squared Error (MSE); and (3) calibrate systematic differences in external outcomes using R-learner methods implemented via Ordinary Least Squares or Kernel Ridge Regression. |
| Authors: | Jile Chaoge [aut, cre], Peng Wu [aut], Shu Yang [aut] |
| Maintainer: | Jile Chaoge <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.0 |
| Built: | 2026-05-24 08:29:43 UTC |
| Source: | https://github.com/cran/InfluenceBorrowing |
This function quantifies the comparability of external control samples by assessing how much each individual external sample perturbs the outcome model fitted on the RCT control data. A smaller influence score indicates that the external sample is more compatible with the RCT controls.
compute_influences(model, testdata = NULL, type = "observed")compute_influences(model, testdata = NULL, type = "observed")
model |
A fitted |
testdata |
A |
type |
Character string specifying the type of Hessian matrix:
|
The influence score is approximated using the influence function of the M-estimator. It measures the standardized change in model parameters if a specific external sample were added to the training set, without the computational cost of refitting.
Vector of influence scores corresponding to each row in testdata.
This function estimates the ATE using only the provided RCT data. It calculates two estimators: the direct estimator and the AIPW estimator.
estimate_rct( X, A, Y, trim = 0.01, outcome_family = gaussian(), ps_hat = NULL, mu0_hat = NULL, mu1_hat = NULL )estimate_rct( X, A, Y, trim = 0.01, outcome_family = gaussian(), ps_hat = NULL, mu0_hat = NULL, mu1_hat = NULL )
X |
Covariate matrix. |
A |
Treatment assignment vector (binary: 0 or 1). |
Y |
Outcome vector. |
trim |
Numeric value for trimming propensity scores (default 0.01). |
outcome_family |
GLM family for outcome models (default gaussian()). |
ps_hat |
Optional pre-calculated propensity scores. |
mu0_hat |
Optional pre-calculated outcome predictions E[Y|X,A=0]. |
mu1_hat |
Optional pre-calculated outcome predictions E[Y|X,A=1]. |
A list containing:
estimate: A named vector containing point estimates for Direct and AIPW methods.
se: A named vector containing standard errors for both estimators.
psi: Vector of influence function values for the AIPW estimator.
mu0_hat: Fitted values for the control outcome model.
mu1_hat: Fitted values for the treated outcome model.
ps_hat: Fitted propensity scores.
n <- 200 X <- runif(n, 0, 2) A <- rbinom(n, size = 1, prob = 1/2) Y1 <- 3 - 2*X + rnorm(n, sd = 0.2) Y0 <- 2*X + rnorm(n, sd = 0.2) Y <- (1 - A)*Y0 + A*Y1 result <- estimate_rct(X, A, Y) print(result$estimate) print(result$se)n <- 200 X <- runif(n, 0, 2) A <- rbinom(n, size = 1, prob = 1/2) Y1 <- 3 - 2*X + rnorm(n, sd = 0.2) Y0 <- 2*X + rnorm(n, sd = 0.2) Y <- (1 - A)*Y0 + A*Y1 result <- estimate_rct(X, A, Y) print(result$estimate) print(result$se)
Estimate ATE for a Selected Data Subset (with GLM support)
estimate_selected( X, A, Y, reference_value = NULL, trim = 0.01, outcome_family = gaussian(), ps_hat = NULL, mu0_hat = NULL, mu1_hat = NULL )estimate_selected( X, A, Y, reference_value = NULL, trim = 0.01, outcome_family = gaussian(), ps_hat = NULL, mu0_hat = NULL, mu1_hat = NULL )
X |
Covariate matrix. |
A |
Treatment assignment vector (binary: 0 or 1). |
Y |
Outcome vector. |
reference_value |
A value representing the "true" treatment effect or a reference estimate used to calculate bias and MSE. If NULL, MSE is returned as NULL. |
trim |
Value for trimming propensity scores (default 0.01). |
outcome_family |
GLM family for outcome models (default gaussian()). |
ps_hat |
Optional vector of estimated propensity scores P(A=1|X). |
mu0_hat |
Optional vector of estimated conditional means E[Y|X,A=0]. |
mu1_hat |
Optional vector of estimated conditional means E[Y|X,A=1]. |
A list containing:
estimate: The AIPW point estimate.
se: The standard error of the estimated treatment effect.
psi: Vector of influence function values.
mse: The estimated Mean Squared Error (Variance + Bias^2), if reference_value is provided.
mu0_hat, mu1_hat, ps_hat: Fitted nuisance parameters.
# Generate RCT data n <- 100 X_rct <- runif(n, 0, 2) A_rct <- rbinom(n, size = 1, prob = 1/2) Y1_rct <- 3 - 2*X_rct + rnorm(n, sd = 0.2) Y0_rct <- 2*X_rct + rnorm(n, sd = 0.2) Y_rct <- (1 - A_rct)*Y0_rct + A_rct*Y1_rct # Generate EC data n <- 500 X_ec <- runif(n, 0, 2) A_ec <- rep(0, n) Y_ec <- rep(NA, n) Y_ec[1:200] <- 2*X_ec[1:200] + rnorm(200, sd = 0.2) Y_ec[201:n] <- 3*X_ec[201:n] + rnorm(n-200, sd = 0.2) # Selected EC data X_selected <- X_ec[1:200] A_selected <- A_ec[1:200] Y_selected <- Y_ec[1:200] result <- estimate_selected(X = c(X_rct, X_ec), A = c(A_rct, A_ec), Y = c(Y_rct, Y_ec)) print(result$estimate) print(result$se) result <- estimate_selected(X = c(X_rct, X_selected), A = c(A_rct, A_selected), Y = c(Y_rct, Y_selected)) print(result$estimate) print(result$se)# Generate RCT data n <- 100 X_rct <- runif(n, 0, 2) A_rct <- rbinom(n, size = 1, prob = 1/2) Y1_rct <- 3 - 2*X_rct + rnorm(n, sd = 0.2) Y0_rct <- 2*X_rct + rnorm(n, sd = 0.2) Y_rct <- (1 - A_rct)*Y0_rct + A_rct*Y1_rct # Generate EC data n <- 500 X_ec <- runif(n, 0, 2) A_ec <- rep(0, n) Y_ec <- rep(NA, n) Y_ec[1:200] <- 2*X_ec[1:200] + rnorm(200, sd = 0.2) Y_ec[201:n] <- 3*X_ec[201:n] + rnorm(n-200, sd = 0.2) # Selected EC data X_selected <- X_ec[1:200] A_selected <- A_ec[1:200] Y_selected <- Y_ec[1:200] result <- estimate_selected(X = c(X_rct, X_ec), A = c(A_rct, A_ec), Y = c(Y_rct, Y_ec)) print(result$estimate) print(result$se) result <- estimate_selected(X = c(X_rct, X_selected), A = c(A_rct, A_selected), Y = c(Y_rct, Y_selected)) print(result$estimate) print(result$se)
This function iterates through a sequence of candidate subset sizes (k), selecting the top-k external controls with the smallest influence scores. For each k, it estimates the treatment effect and calculates the MSE relative to a provided reference value. It returns the results for all k and identifies the optimal k that minimizes MSE.
find_optimal_k( dat_rct, dat_ec, influences, reference_value, trim = 0.01, k_vector = NULL, outcome_family = gaussian() )find_optimal_k( dat_rct, dat_ec, influences, reference_value, trim = 0.01, k_vector = NULL, outcome_family = gaussian() )
dat_rct |
A |
dat_ec |
A |
influences |
Vector of influence scores for the external controls. The length must match the number of rows in |
reference_value |
A value representing the "true" treatment effect or a high-quality reference estimate used to calculate bias and MSE. |
trim |
Value for trimming propensity scores (default 0.01). |
k_vector |
Optional integer vector specifying the candidate numbers of external controls to borrow. If NULL, a default sequence is generated (from 0 to N_ec, step 50). |
outcome_family |
GLM family for outcome models (default gaussian()). |
Data Structure Requirements:
The input data frames (dat_rct and dat_ec) must follow this column order:
Covariates (X): The first ncol-2 columns are covariates.
Treatment (A): Must contain a column named A (binary 0/1).
Outcome (Y): Must contain a column named Y.
The code automatically identifies covariates as the first ncol-2 columns.
Therefore, please ensure A and Y are placed at the very end of the data frame (e.g., columns order: X1, X2, ..., Xp, A, Y).
A list containing:
mse_k: A data.frame summarizing the estimate, bias, variance, and MSE for each candidate k.
mse_optimal: The single row from mse_k corresponding to the minimum MSE.
This function generates synthetic data for a RCT and an EC arm. It is designed to demonstrate the adaptive borrowing framework, creating a scenario where the external controls have a different outcome mechanism (bias) compared to the RCT controls, along with some outliers.
gen_demo_data(n_rct = 100, n_ec = 200, seed = 123)gen_demo_data(n_rct = 100, n_ec = 200, seed = 123)
n_rct |
Integer. Sample size of the randomized controlled trial (default 100). |
n_ec |
Integer. Sample size of the external controls (default 200). |
seed |
Integer. Random seed for reproducibility (default 123). |
Output Format:
The returned data frames are formatted to be directly compatible with find_optimal_k:
Column Order: Covariates (X) are in the first columns, followed by Treatment (A) and Outcome (Y).
Bias Mechanism: External controls are generated with a shifted intercept and slope to simulate systematic bias.
True ATE: The true Average Treatment Effect is fixed at -1.
A list containing:
data_rct: A data.frame with columns X (covariate), A (treatment 0/1), and Y (outcome).
data_ec: A data.frame with columns X, A (always 0), and Y (outcome).
ATE_true: The true Average Treatment Effect (numeric, fixed at -1).
sim_data <- gen_demo_data(n_rct = 100, n_ec = 200) head(sim_data$data_rct) head(sim_data$data_ec)sim_data <- gen_demo_data(n_rct = 100, n_ec = 200) head(sim_data$data_rct) head(sim_data$data_ec)
Predict estimated treatment effects (tau) for new data using a trained rlearner_krls model.
## S3 method for class 'rlearner_krls' predict(object, newx = NULL, ...)## S3 method for class 'rlearner_krls' predict(object, newx = NULL, ...)
object |
An object of class |
newx |
Covariate matrix to make predictions on. If NULL, returns predictions on the training data. |
... |
Additional arguments (currently ignored). |
A vector of predicted treatment effects.
Get estimated tau(x) using the trained rlearner_lm model.
## S3 method for class 'rlearner_lm' predict(object, newx = NULL, ...)## S3 method for class 'rlearner_lm' predict(object, newx = NULL, ...)
object |
An object of class |
newx |
Covariate matrix to make predictions on. If NULL, returns predictions on the training data. |
... |
Additional arguments (currently ignored). |
Vector of predicted treatment effects.
n = 200; p = 5 set.seed(123) x = matrix(rnorm(n*p), n, p) r = rbinom(n, 1, 0.5) y = 0.5*x[,1] + 0.8*x[,2] + 1.2*r*x[,1] + rnorm(n, sd=0.5) rl_fit = rlearner_lm(x, r, y) new_data = matrix(rnorm(10*5), 10, 5) predictions = predict(rl_fit, new_data)n = 200; p = 5 set.seed(123) x = matrix(rnorm(n*p), n, p) r = rbinom(n, 1, 0.5) y = 0.5*x[,1] + 0.8*x[,2] + 1.2*r*x[,1] + rnorm(n, sd=0.5) rl_fit = rlearner_lm(x, r, y) new_data = matrix(rnorm(10*5), 10, 5) predictions = predict(rl_fit, new_data)
Implements the R-learner (Nie and Wager, 2017) using kernel ridge regression (via the KRLS package) for nuisance parameter estimation and the final treatment effect model.
rlearner_krls(x, r, y, whichkernel = "gaussian", pi_hat = NULL, m_hat = NULL)rlearner_krls(x, r, y, whichkernel = "gaussian", pi_hat = NULL, m_hat = NULL)
x |
Covariate matrix. |
r |
Treatment assignment vector (binary: 0 or 1). |
y |
Outcome vector. |
whichkernel |
Character string specifying the kernel type (default "gaussian"). Passed to |
pi_hat |
Optional vector of estimated propensity scores E[R|X]. If NULL, estimated using KRLS. |
m_hat |
Optional vector of estimated conditional means E[Y|X]. If NULL, estimated using KRLS. |
An object of class rlearner_krls containing the fitted models and estimates.
R-learner, as proposed by Nie and Wager (2017), implemented via standard linear regression (lm). It uses linear models (or logistic regression for propensity scores) to estimate nuisance parameters and the final treatment effect model.
rlearner_lm(x, r, y, pi_hat = NULL, m_hat = NULL)rlearner_lm(x, r, y, pi_hat = NULL, m_hat = NULL)
x |
Covariate matrix. |
r |
Treatment assignment vector (binary: 0 or 1). |
y |
Outcome vector. |
pi_hat |
Optional vector of estimated propensity scores E[R|X]. If NULL, estimated using logistic regression. |
m_hat |
Optional vector of estimated conditional means E[Y|X]. If NULL, estimated using linear regression. |
An object of class rlearner_lm containing the fitted models and estimates.
n = 200; p = 5 set.seed(123) x = matrix(rnorm(n*p), n, p) r = rbinom(n, 1, 0.5) y = 0.5*x[,1] + 0.8*x[,2] + 1.2*r*x[,1] + rnorm(n, sd=0.5) rl_fit = rlearner_lm(x, r, y) rl_est = predict(rl_fit, x)n = 200; p = 5 set.seed(123) x = matrix(rnorm(n*p), n, p) r = rbinom(n, 1, 0.5) y = 0.5*x[,1] + 0.8*x[,2] + 1.2*r*x[,1] + rnorm(n, sd=0.5) rl_fit = rlearner_lm(x, r, y) rl_est = predict(rl_fit, x)