Research Article

A Note on the Logistic Regression Model with a Random Coefficient to Predict Propensity Scores

ABSTRACT

In observational studies, marginal structural models are often used to adjust for confounding. When predicting propensity scores, some investigators may want to apply a logistic regression model with a random coefficient to take account of residual confounding. Here, we show that the random coefficient can be interpreted as the logarithm of the confounding risk ratio, *i.e.*, the ratio of the crude risk ratio to the causal risk ratio. Three target populations (the exposed, unexposed, and total groups) are discussed.

KEYWORDS

Confounding risk ratio; Inverse-probability-weighting; Marginal structural model; Potential outcome

INTRODUCTION

Confounding is widely recognized as one of the principal problems faced by investigators conducting observational studies. In an analysis, some investigators may want to take account of residual confounding. In such situations, a random coefficient regression model (a mixed-effects model with a random intercept) may be applied. The use of such models has been discussed in the context of ordinary regression analysis [1-3]. However, it has not been discussed in the context of marginal structural models (MSMs) [4,5], in which a logistic regression model is often used to predict propensity scores [6,7].

Here, we give an interpretation of a random coefficient when a logistic regression model with a random coefficient is used to predict propensity scores. We discuss MSMs for three target populations: the total, exposed, and unexposed groups.

MATERIALS AND METHODS

We use $X$ as an exposure indicator and assume the now-standard deterministic potential outcome model [8], in which $Y_{X=1}$ and $Y_{X=0}$ are the potential outcome indicators under $X=1$ and $X=0$, respectively. The potential risks $\mathrm{Pr}(Y_{X=1}=1)$ and $\mathrm{Pr}(Y_{X=0}=1)$ are then the expected values of $Y$ if everyone in the study population had been exposed and if everyone had been unexposed, respectively. Causal effects with the total group as the target population are contrasts between these two risks; those with the group $X=x$ as the target population are contrasts between $\mathrm{Pr}(Y_{X=1}=1|X=x)$ and $\mathrm{Pr}(Y_{X=0}=1|X=x)$.

Let $i=1,\dots,n$ index subjects and $z_i$ denote the vector of measured confounders for subject $i$. The propensity score $\mathrm{Pr}(X=1|Z=z_i)$ is then predicted using a logistic regression model:
$\mathrm{Pr}(X=1|Z={z}_{i})=\frac{\mathrm{exp}({\theta}^{\prime}{z}_{i})}{1+\mathrm{exp}({\theta}^{\prime}{z}_{i})},\text{(1)}$

where $\theta$ is a vector of regression coefficients. When residual confounding exists, however, Equation (1) yields biased propensity scores, and the MSM consequently yields biased estimates of causal effects. In the next section, we give an interpretation of a random coefficient added to Equation (1).

RESULTS AND DISCUSSION

Unexposed group as the target population

To take account of residual confounding, we assume that the propensity score $p_{0i}$ is explained by the following logistic regression model with a random coefficient:
${p}_{0i}=\frac{\mathrm{exp}\{\mathrm{log}({\alpha}_{i})+{\theta}^{\prime}{z}_{i}\}}{1+\mathrm{exp}\{\mathrm{log}({\alpha}_{i})+{\theta}^{\prime}{z}_{i}\}},$

where $\log(\alpha_i)$ is a random coefficient. Using Equation (1), $p_{0i}$ can be expressed as:
${p}_{0i}=\frac{\mathrm{exp}({\theta}^{\prime}{z}_{i})}{1/{\alpha}_{i}+\mathrm{exp}({\theta}^{\prime}{z}_{i})}=\frac{\mathrm{Pr}(X=1|Z={z}_{i})}{\mathrm{Pr}(X=0|Z={z}_{i})/{\alpha}_{i}+\mathrm{Pr}(X=1|Z={z}_{i})}\text{(2)}$
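A quick numerical check of Equation (2) may help. The minimal sketch below uses arbitrary hypothetical values for $\alpha_i$ and $\theta'z_i$ (the helper names `logistic`, `p0_with_offset`, and `p0_equation2` are illustrative, not from the original). It confirms that adding the offset $\log(\alpha_i)$ to the linear predictor gives the same propensity score as the rearranged form, and that the resulting odds $(1-p_{0i})/p_{0i}$ equal $\mathrm{Pr}(X=0|Z=z_i)/\{\alpha_i\mathrm{Pr}(X=1|Z=z_i)\}$, which is the weight that appears when Equation (2) is substituted into Equation (3).

```python
import math


def logistic(t: float) -> float:
    """Inverse logit, exp(t) / {1 + exp(t)}, as in Equation (1)."""
    return 1.0 / (1.0 + math.exp(-t))


def p0_with_offset(alpha: float, lin_pred: float) -> float:
    """Propensity score p_0i: logistic model with random coefficient log(alpha_i)."""
    return logistic(math.log(alpha) + lin_pred)


def p0_equation2(alpha: float, ps: float) -> float:
    """Right-hand side of Equation (2), written in terms of ps = Pr(X=1 | Z=z_i)."""
    return ps / ((1.0 - ps) / alpha + ps)


# Hypothetical values of alpha_i and theta'z_i, chosen only for illustration.
# The loop runs silently if every check passes.
for alpha in (0.5, 1.0, 1.8):
    for lin_pred in (-1.2, 0.0, 0.7):
        ps = logistic(lin_pred)                  # Equation (1)
        p0 = p0_with_offset(alpha, lin_pred)
        assert math.isclose(p0, p0_equation2(alpha, ps))
        # Odds form: (1 - p_0i)/p_0i = Pr(X=0|z) / {alpha_i Pr(X=1|z)}
        assert math.isclose((1.0 - p0) / p0, (1.0 - ps) / (alpha * ps))
```

Setting $\alpha_i=1$ recovers Equation (1), which is why $\log(\alpha_i)$ acts purely as a shift of the linear predictor.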

Using the inverse-probability-weighting (IPW) method, $\mathrm{Pr}(Y_{X=x}=1|X=0)$ is estimated as follows:
$\mathrm{Pr}({Y}_{X=1}=1|X=0)=\frac{1}{{n}_{0}}{\displaystyle \sum _{i=1}^{n}\frac{1-{p}_{0i}}{{p}_{0i}}{y}_{i}{x}_{i}},\text{(3)}$

$\mathrm{Pr}({Y}_{X=0}=1|X=0)=\frac{1}{{n}_{0}}{\displaystyle \sum _{i=1}^{n}{y}_{i}(1-{x}_{i})},$

where $n_0=n\mathrm{Pr}(X=0)$ [9,10]. In the framework of MSMs, the causal risk difference (RD) is estimated using a weighted linear regression of $Y$ on $X$ with weights $(1-p_{0i})/p_{0i}$ for exposed subjects and 1 for unexposed subjects. The causal risk ratio (RR) is estimated using a weighted Poisson regression of $Y$ on $X$ with the same weights. By substituting Equation (2) into Equation (3) and replacing $y_i x_i/n$ with $\mathrm{Pr}(Y=1,X=1,Z=z_i)$ in the summation, we obtain:
$\mathrm{Pr}({Y}_{X=1}=1|X=0)=\frac{n}{{n}_{0}}{\displaystyle \sum _{i=1}^{n}\frac{\mathrm{Pr}(X=0|Z={z}_{i})/{\alpha}_{i}}{\mathrm{Pr}(X=1|Z={z}_{i})}\frac{{y}_{i}{x}_{i}}{n}}$

$=\frac{1}{\mathrm{Pr}(X=0)}{\displaystyle \sum _{i=1}^{n}\frac{\mathrm{Pr}(X=0|Z={z}_{i})}{{\alpha}_{i}\mathrm{Pr}(X=1|Z={z}_{i})}\mathrm{Pr}(Y=1,X=1,Z={z}_{i})}$

$={\displaystyle \sum _{i=1}^{n}\frac{\mathrm{Pr}(Y=1|X=1,Z={z}_{i})}{{\alpha}_{i}}\mathrm{Pr}(Z={z}_{i}|X=0)}$

The left- and right-hand sides of this equation are equal when:

$\mathrm{Pr}(Y=1|X=1,Z={z}_{i})/{\alpha}_{i}=\mathrm{Pr}({Y}_{X=1}=1|X=0,Z={z}_{i}),$

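because $\mathrm{Pr}({Y}_{X=1}=1|X=0)={\displaystyle \sum _{i=1}^{n}\mathrm{Pr}({Y}_{X=1}=1|X=0,Z={z}_{i})\mathrm{Pr}(Z={z}_{i}|X=0)}$. Therefore, noting that $\mathrm{Pr}(Y=1|X=x,Z={z}_{i})=\mathrm{Pr}({Y}_{X=x}=1|X=x,Z={z}_{i})$ under the potential outcome model (consistency):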

${\alpha}_{i}=\frac{\mathrm{Pr}({Y}_{X=1}=1|X=1,Z={z}_{i})}{\mathrm{Pr}({Y}_{X=1}=1|X=0,Z={z}_{i})}=\frac{\mathrm{Pr}(Y=1|X=1,Z={z}_{i})}{\mathrm{Pr}(Y=1|X=0,Z={z}_{i})}/\frac{\mathrm{Pr}({Y}_{X=1}=1|X=0,Z={z}_{i})}{\mathrm{Pr}({Y}_{X=0}=1|X=0,Z={z}_{i})}.$

This $\alpha_i$ is the confounding risk ratio (CRR) with the unexposed group as the target population [11], that is, the ratio of the crude RR to the causal RR, for individuals with $Z=z_i$.

Exposed group as the target population

When the exposed group is the target population, an argument similar to that in the previous subsection applies. We assume that the propensity score $p_{1i}$ is explained by the following logistic regression model with a random coefficient:
${p}_{1i}=\frac{\mathrm{exp}\{\mathrm{log}({\beta}_{i})+{\theta}^{\prime}{z}_{i}\}}{1+\mathrm{exp}\{\mathrm{log}({\beta}_{i})+{\theta}^{\prime}{z}_{i}\}}$

where $\log(\beta_i)$ is a random coefficient. Then, by the IPW method, $\mathrm{Pr}(Y_{X=x}=1|X=1)$ is estimated as:
$\mathrm{Pr}({Y}_{X=1}=1|X=1)=\frac{1}{{n}_{1}}{\displaystyle \sum _{i=1}^{n}{y}_{i}{x}_{i}},\mathrm{Pr}({Y}_{X=0}=1|X=1)=\frac{1}{{n}_{1}}{\displaystyle \sum _{i=1}^{n}\frac{{p}_{1i}}{1-{p}_{1i}}{y}_{i}(1-{x}_{i})},$

where $n_1=n\mathrm{Pr}(X=1)$ [9,10]. In the framework of MSMs, the causal effects are estimated using weighted regression analyses of $Y$ on $X$ with weights 1 for exposed subjects and $p_{1i}/(1-p_{1i})$ for unexposed subjects. Algebra similar to that in the previous subsection yields:
$\mathrm{Pr}({Y}_{X=0}=1|X=1)={\displaystyle \sum _{i=1}^{n}{\beta}_{i}\mathrm{Pr}(Y=1|X=0,Z={z}_{i})\mathrm{Pr}(Z={z}_{i}|X=1)}.$

Because $\mathrm{Pr}({Y}_{X=0}=1|X=1)={\displaystyle \sum _{i=1}^{n}\mathrm{Pr}({Y}_{X=0}=1|X=1,Z={z}_{i})\mathrm{Pr}(Z={z}_{i}|X=1)}$, $\beta_i$ can be expressed as:
${\beta}_{i}=\frac{\mathrm{Pr}({Y}_{X=0}=1|X=1,Z={z}_{i})}{\mathrm{Pr}({Y}_{X=0}=1|X=0,Z={z}_{i})}=\frac{\mathrm{Pr}(Y=1|X=1,Z={z}_{i})}{\mathrm{Pr}(Y=1|X=0,Z={z}_{i})}/\frac{\mathrm{Pr}({Y}_{X=1}=1|X=1,Z={z}_{i})}{\mathrm{Pr}({Y}_{X=0}=1|X=1,Z={z}_{i})}.$

This $\beta_i$ is the CRR with the exposed group as the target population [11], for individuals with $Z=z_i$.

Total group as the target population

We assume that the propensity score $p_i$ is explained by the following logistic regression model with a random coefficient:
${p}_{i}=\frac{\mathrm{exp}\{\mathrm{log}({\gamma}_{i})+{\theta}^{\prime}{z}_{i}\}}{1+\mathrm{exp}\{\mathrm{log}({\gamma}_{i})+{\theta}^{\prime}{z}_{i}\}}$

where $\log(\gamma_i)$ is a random coefficient. Then, by the IPW method, $\mathrm{Pr}(Y_{X=x}=1)$ is estimated as:
$\mathrm{Pr}({Y}_{X=1}=1)=\frac{1}{n}{\displaystyle \sum _{i=1}^{n}\frac{{y}_{i}{x}_{i}}{{p}_{i}}},\text{(4)}$

$\mathrm{Pr}({Y}_{X=0}=1)=\frac{1}{n}{\displaystyle \sum _{i=1}^{n}\frac{{y}_{i}(1-{x}_{i})}{1-{p}_{i}}}.\text{(5)}$
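Equations (4) and (5), together with Equation (3) and the exposed-group estimators given earlier, differ only in how each subject is weighted. The following is a minimal sketch of all three, assuming fitted propensity scores are already available; the function name `ipw_risks` is hypothetical, and $n_0$ and $n_1$ are replaced by the observed numbers of unexposed and exposed subjects. It computes the estimators exactly as written. The weighted linear or Poisson regressions described in the text instead divide by the sum of the weights within each exposure group, so the two versions can differ somewhat in finite samples.

```python
from typing import Sequence, Tuple


def ipw_risks(y: Sequence[int], x: Sequence[int], p: Sequence[float],
              target: str = "total") -> Tuple[float, float]:
    """IPW estimates of (Pr(Y_{X=1}=1 | target), Pr(Y_{X=0}=1 | target)).

    target = "total":     Equations (4) and (5); weights 1/p_i and 1/(1 - p_i)
    target = "exposed":   weights 1 (exposed) and p_i/(1 - p_i) (unexposed)
    target = "unexposed": Equation (3); weights (1 - p_i)/p_i (exposed) and 1 (unexposed)
    """
    n = len(y)
    n1 = sum(x)   # observed count used in place of n Pr(X = 1)
    n0 = n - n1   # observed count used in place of n Pr(X = 0)
    data = list(zip(y, x, p))
    if target == "total":
        r1 = sum(yi * xi / pi for yi, xi, pi in data) / n
        r0 = sum(yi * (1 - xi) / (1 - pi) for yi, xi, pi in data) / n
    elif target == "exposed":
        r1 = sum(yi * xi for yi, xi, _ in data) / n1
        r0 = sum(pi / (1 - pi) * yi * (1 - xi) for yi, xi, pi in data) / n1
    elif target == "unexposed":
        r1 = sum((1 - pi) / pi * yi * xi for yi, xi, pi in data) / n0
        r0 = sum(yi * (1 - xi) for yi, xi, _ in data) / n0
    else:
        raise ValueError("target must be 'total', 'exposed', or 'unexposed'")
    return r1, r0


# Toy data (hypothetical): with a constant propensity score equal to the observed
# proportion exposed, every target returns (up to rounding) the crude risks 2/3 and 1/3.
y = [1, 0, 1, 1, 0, 0]
x = [1, 1, 0, 1, 0, 0]
p = [sum(x) / len(x)] * len(x)
for target in ("total", "exposed", "unexposed"):
    print(target, ipw_risks(y, x, p, target))
```

The constant-score case is a useful sanity check: without confounding by $Z$, all three target populations share the same causal risks, and the weights reduce to rescalings of the crude group sizes.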

In the framework of MSMs, the causal effects are estimated using weighted regression models of $Y$ on $X$ with weights $1/p_i$ for exposed subjects and $1/(1-p_i)$ for unexposed subjects. By a calculation similar to those in the previous subsections, Equation (4) can be expressed as:

$\begin{array}{l}\mathrm{Pr}({Y}_{X=1}=1)=\\ {\displaystyle \sum _{i=1}^{n}\left\{\frac{\mathrm{Pr}(X=0|Z={z}_{i})}{{\gamma}_{i}}+\mathrm{Pr}(X=1|Z={z}_{i})\right\}\mathrm{Pr}(Y=1|X=1,Z={z}_{i})\mathrm{Pr}(Z={z}_{i})}\end{array}$

Because
$\mathrm{Pr}({Y}_{X=1}=1)={\displaystyle {\sum}_{i=1}^{n}\mathrm{Pr}({Y}_{X=1}=1|Z={z}_{i})\mathrm{Pr}(Z={z}_{i})},$
the left- and right-hand sides of this equation are equal when:

$\mathrm{Pr}({Y}_{X=1}=1|Z={z}_{i})=\left\{\frac{\mathrm{Pr}(X=0|Z={z}_{i})}{{\gamma}_{i}}+\mathrm{Pr}(X=1|Z={z}_{i})\right\}\mathrm{Pr}(Y=1|X=1,Z={z}_{i}),$

which yields:
${\gamma}_{i}={\alpha}_{i}=\frac{\mathrm{Pr}({Y}_{X=1}=1|X=1,Z={z}_{i})}{\mathrm{Pr}({Y}_{X=1}=1|X=0,Z={z}_{i})}.$

Likewise, Equation (5) yields:
${\gamma}_{i}={\beta}_{i}=\frac{\mathrm{Pr}({Y}_{X=0}=1|X=1,Z={z}_{i})}{\mathrm{Pr}({Y}_{X=0}=1|X=0,Z={z}_{i})},$

because Equation (5) can be expressed as:

$\mathrm{Pr}({Y}_{X=0}=1)={\displaystyle \sum _{i=1}^{n}\left\{\mathrm{Pr}(X=0|Z={z}_{i})+{\gamma}_{i}\mathrm{Pr}(X=1|Z={z}_{i})\right\}\mathrm{Pr}(Y=1|X=0,Z={z}_{i})\mathrm{Pr}(Z={z}_{i})}.$
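To see concretely why a single $\gamma_i$ cannot serve both Equation (4) and Equation (5), the following minimal sketch works within one stratum $Z=z_i$, replacing the sums by probabilities as in the derivations above. It assumes that Equation (1) holds within the stratum, so that $\theta'z_i=\mathrm{logit}\,\mathrm{Pr}(X=1|Z=z_i)$, and it uses hypothetical values for all four stratum-specific risks, including the counterfactual risks $\mathrm{Pr}(Y_{X=1}=1|X=0,Z=z_i)$ and $\mathrm{Pr}(Y_{X=0}=1|X=1,Z=z_i)$, which are not observable in practice. The function name `stratum_ipw_risks` is illustrative only.

```python
import math


def logistic(t: float) -> float:
    """Inverse logit, exp(t) / {1 + exp(t)}."""
    return 1.0 / (1.0 + math.exp(-t))


def stratum_ipw_risks(gamma: float, a: float, d: float, q1: float):
    """Stratum-level versions of Equations (4) and (5), divided by Pr(Z=z_i).

    a  = Pr(Y=1 | X=1, z) = Pr(Y_{X=1}=1 | X=1, z) by consistency
    d  = Pr(Y=1 | X=0, z) = Pr(Y_{X=0}=1 | X=0, z) by consistency
    q1 = Pr(X=1 | z), taken as the value given by Equation (1)
    """
    q0 = 1.0 - q1
    # Propensity score with random coefficient log(gamma_i) added to logit(q1).
    p = logistic(math.log(gamma) + math.log(q1 / q0))
    r1 = a * q1 / p           # Pr(Y=1, X=1 | z) / p_i
    r0 = d * q0 / (1.0 - p)   # Pr(Y=1, X=0 | z) / (1 - p_i)
    return r1, r0


# Hypothetical stratum: b = Pr(Y_{X=1}=1 | X=0, z), c = Pr(Y_{X=0}=1 | X=1, z).
a, b, c, d, q1 = 0.30, 0.20, 0.12, 0.10, 0.4
q0 = 1.0 - q1
true_r1 = a * q1 + b * q0   # Pr(Y_{X=1}=1 | Z=z_i)
true_r0 = c * q1 + d * q0   # Pr(Y_{X=0}=1 | Z=z_i)
alpha, beta = a / b, c / d  # the two CRRs derived above

r1, r0 = stratum_ipw_risks(alpha, a, d, q1)   # gamma_i = alpha_i
assert math.isclose(r1, true_r1) and not math.isclose(r0, true_r0)
r1, r0 = stratum_ipw_risks(beta, a, d, q1)    # gamma_i = beta_i
assert math.isclose(r0, true_r0) and not math.isclose(r1, true_r1)
```

With these values, $\alpha_i=1.5$ and $\beta_i=1.2$. Taking $\gamma_i=\alpha_i$ reproduces $\mathrm{Pr}(Y_{X=1}=1|Z=z_i)=0.24$ but gives 0.12 instead of 0.108 for $\mathrm{Pr}(Y_{X=0}=1|Z=z_i)$, and taking $\gamma_i=\beta_i$ does the reverse.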

Taken together, these results show that $\gamma_i$ cannot be interpreted unless $\alpha_i=\beta_i$ holds (*i.e.*, the two CRRs with the exposed and unexposed groups as the target populations are equal). When $\alpha_i=\beta_i$ holds, $\gamma_i$ can be interpreted as the CRR with the total group as the target population, because this CRR can be expressed as:
$\frac{\frac{\mathrm{Pr}(Y=1|X=1,Z={z}_{i})}{\mathrm{Pr}(Y=1|X=0,Z={z}_{i})}}{\frac{\mathrm{Pr}({Y}_{X=1}=1|Z={z}_{i})}{\mathrm{Pr}({Y}_{X=0}=1|Z={z}_{i})}}=\frac{\mathrm{Pr}(X=0|Z={z}_{i})+{\beta}_{i}\mathrm{Pr}(X=1|Z={z}_{i})}{\frac{\mathrm{Pr}(X=0|Z={z}_{i})}{{\alpha}_{i}}+\mathrm{Pr}(X=1|Z={z}_{i})}\text{(6)}$

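Setting $\alpha_i=\beta_i=\gamma_i$ in Equation (6), its numerator becomes $\gamma_i$ times its denominator, so $\gamma_i$ equals the CRR with the total group as the target population. The derivation of Equation (6) is given in the Appendix.

Conclusion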

Based on the formulas of the random coefficient model and the IPW approach, we have given an interpretation of a random coefficient when logistic regression with a random coefficient is used to predict propensity scores. When the exposed or unexposed group is the target population, the random coefficient can be interpreted as the logarithm of the CRR with that group as the target population. When the total group is the target population, however, the random coefficient cannot be interpreted in a straightforward manner: it can be interpreted as the logarithm of the CRR with the total group as the target population only when the two CRRs with the exposed and unexposed groups as the target populations are equal.

Although we have given an interpretation of a random coefficient in a logistic regression model, we have not discussed the predicted values of propensity scores or the resulting estimates of causal effects. Their properties remain to be investigated, for example, through simulation studies.

Appendix: Derivation of Equation (6)

In the stratum with $Z=z_i$, let ${\text{RR}}_{\text{C}i}=\mathrm{Pr}(Y=1|X=1,Z={z}_{i})/\mathrm{Pr}(Y=1|X=0,Z={z}_{i})$ denote the crude RR, ${\text{RR}}_{\text{E}i}=\mathrm{Pr}({Y}_{X=1}=1|X=1,Z={z}_{i})/\mathrm{Pr}({Y}_{X=0}=1|X=1,Z={z}_{i})$ denote the causal RR with the exposed group as the target population, and ${\text{RR}}_{\text{U}i}=\mathrm{Pr}({Y}_{X=1}=1|X=0,Z={z}_{i})/\mathrm{Pr}({Y}_{X=0}=1|X=0,Z={z}_{i})$ denote the causal RR with the unexposed group as the target population, so that $\alpha_i={\text{RR}}_{\text{C}i}/{\text{RR}}_{\text{U}i}$ and $\beta_i={\text{RR}}_{\text{C}i}/{\text{RR}}_{\text{E}i}$. Then, the CRR with the total group as the target population can be expressed as:
$\frac{\mathrm{Pr}(Y=1|X=1,Z={z}_{i})}{\mathrm{Pr}(Y=1|X=0,Z={z}_{i})}/\frac{\mathrm{Pr}({Y}_{X=1}=1|Z={z}_{i})}{\mathrm{Pr}({Y}_{X=0}=1|Z={z}_{i})}=\frac{{\displaystyle \sum _{x=0}^{1}\mathrm{Pr}({Y}_{X=0}=1|X=x,Z={z}_{i})\mathrm{Pr}(X=x|Z={z}_{i})}}{{\displaystyle \sum _{x=0}^{1}\mathrm{Pr}({Y}_{X=1}=1|X=x,Z={z}_{i})\mathrm{Pr}(X=x|Z={z}_{i})}}{\text{RR}}_{\text{C}i}$

$=\frac{\mathrm{Pr}({Y}_{X=1}=1|X=1,Z={z}_{i}){\displaystyle \sum _{x=0}^{1}\frac{\mathrm{Pr}({Y}_{X=0}=1|X=x,Z={z}_{i})}{\mathrm{Pr}({Y}_{X=1}=1|X=1,Z={z}_{i})}\mathrm{Pr}(X=x|Z={z}_{i})}}{\mathrm{Pr}({Y}_{X=0}=1|X=0,Z={z}_{i}){\displaystyle \sum _{x=0}^{1}\frac{\mathrm{Pr}({Y}_{X=1}=1|X=x,Z={z}_{i})}{\mathrm{Pr}({Y}_{X=0}=1|X=0,Z={z}_{i})}\mathrm{Pr}(X=x|Z={z}_{i})}}{\text{RR}}_{\text{C}i}$

$=\frac{\left\{\frac{\mathrm{Pr}(X=0|Z={z}_{i})}{{\text{RR}}_{\text{C}i}}+\frac{\mathrm{Pr}(X=1|Z={z}_{i})}{{\text{RR}}_{\text{E}i}}\right\}{\text{RR}}_{\text{C}i}}{\frac{\mathrm{Pr}(X=0|Z={z}_{i}){\text{RR}}_{\text{U}i}+\mathrm{Pr}(X=1|Z={z}_{i}){\text{RR}}_{\text{C}i}}{{\text{RR}}_{\text{C}i}}}=\frac{\mathrm{Pr}(X=0|Z={z}_{i})+{\beta}_{i}\mathrm{Pr}(X=1|Z={z}_{i})}{\frac{\mathrm{Pr}(X=0|Z={z}_{i})}{{\alpha}_{i}}+\mathrm{Pr}(X=1|Z={z}_{i})}.$
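As a numerical check of this derivation, the following minimal sketch evaluates the total-group CRR within a single stratum both directly from its definition and from the right-hand side of Equation (6). The function names and the stratum-specific risks are hypothetical, and consistency is assumed so that the crude RR equals $\mathrm{Pr}({Y}_{X=1}=1|X=1,Z=z_i)/\mathrm{Pr}({Y}_{X=0}=1|X=0,Z=z_i)$.

```python
import math


def crr_total_direct(a: float, b: float, c: float, d: float, q1: float) -> float:
    """CRR with the total group as target: crude RR / causal RR, within Z = z_i.

    a = Pr(Y_{X=1}=1 | X=1, z), b = Pr(Y_{X=1}=1 | X=0, z),
    c = Pr(Y_{X=0}=1 | X=1, z), d = Pr(Y_{X=0}=1 | X=0, z), q1 = Pr(X=1 | z).
    By consistency, the crude RR is a / d.
    """
    q0 = 1.0 - q1
    crude_rr = a / d
    causal_rr = (a * q1 + b * q0) / (c * q1 + d * q0)
    return crude_rr / causal_rr


def crr_total_eq6(alpha: float, beta: float, q1: float) -> float:
    """Right-hand side of Equation (6)."""
    q0 = 1.0 - q1
    return (q0 + beta * q1) / (q0 / alpha + q1)


# Hypothetical stratum-specific risks (illustration only).
a, b, c, d, q1 = 0.30, 0.20, 0.12, 0.10, 0.4
alpha = a / b   # CRR, unexposed group as target (= RR_C / RR_U)
beta = c / d    # CRR, exposed group as target (= RR_C / RR_E)
assert math.isclose(crr_total_direct(a, b, c, d, q1), crr_total_eq6(alpha, beta, q1))

# When alpha_i = beta_i (= gamma_i), Equation (6) collapses to gamma_i for any Pr(X=1|z).
for g in (0.7, 1.0, 2.5):
    for q in (0.1, 0.5, 0.9):
        assert math.isclose(crr_total_eq6(g, g, q), g)
```

With these hypothetical values, $\alpha_i=1.5$, $\beta_i=1.2$, and both expressions give a total-group CRR of 1.35, which here lies between $\beta_i$ and $\alpha_i$; the final loop checks the reduction used in the main text.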

Acknowledgements

This work was supported in part by a Grant-in-Aid for Scientific Research (No. 23700344) from the Ministry of Education, Culture, Sports, Science, and Technology of Japan.

References

- Larsen K, Petersen JH, Budtz-Jørgensen E, Endahl L. Interpreting parameters in the logistic regression model with random effects. Biometrics. 2000; 56: 909-914.
- Greenland S. When should epidemiologic regressions use random coefficients? Biometrics. 2000; 56: 915-921.
- Gustafson P, Greenland S. The performance of random coefficient regression in accounting for residual confounding. Biometrics. 2006; 62: 760-768.
- Robins JM. Association, causation, and marginal structural models. Synthese. 1999; 121: 151-179.
- Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology. 2000; 11: 561-570.
- Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983; 70: 41-55.
- Joffe MM, Rosenbaum PR. Propensity scores. Am J Epidemiol. 1999; 150: 327-333.
- Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974; 66: 688-701.
- Sato T, Matsuyama Y. Marginal structural models as a tool for standardization. Epidemiology. 2003; 14: 680-686.
- Chiba Y. A simple method for sensitivity analysis of unmeasured confounding. J Biomet Biostat. 2012; 3: e113.
- Arah OA, Chiba Y, Greenland S. Bias formulas for external adjustment and sensitivity analysis of unmeasured confounders. Ann Epidemiol. 2008; 18: 637-646.

**Cite this article:** Chiba Y (2013) A Note on the Logistic Regression Model with a Random Coefficient to Predict Propensity Scores. Ann Biom Biostat 1: 1001.

Copyright © 2013 JSciMed Central. All rights reserved.