Research Article
A Note on the Logistic Regression Model with a Random Coefficient to Predict Propensity Scores
Yasutaka Chiba*
Division of Biostatistics, Clinical Research Center, Kinki University School of Medicine, Osaka, Japan

ABSTRACT
In observational studies, marginal structural models are often used to adjust for confounding. When predicting propensity scores, some investigators may want to apply a logistic regression model with a random coefficient to take account of residual confounding. Here, we show that the random coefficient can be interpreted as the logarithm of the confounding risk ratio; i.e., the ratio of the crude risk ratio to the causal risk ratio. Three target populations (the exposed, unexposed, and total groups) are discussed.
KEYWORDS
Confounding risk ratio; Inverse-probability-weighting; Marginal structural model; Potential outcome

INTRODUCTION
Confounding is widely recognized as one of the principal problems faced by investigators conducting observational studies. In an analysis, some investigators may want to take account of residual confounding. In such situations, a random coefficient regression model (mixed effect model with random intercept) may be applied. The use of such a model has been discussed in the context of ordinal regression analysis [1-3]. However, it has not been discussed in the context of marginal structural models (MSMs) [4,5], in which a logistic regression model is often used to predict propensity scores [6,7].
Here, we give an interpretation of a random coefficient when a logistic regression model with a random coefficient is used for predicting propensity scores. We discuss MSMs under the three target populations: the total, exposed, and unexposed groups.
MATERIALS AND METHODS
We use X as an exposure indicator and assume the now-standard deterministic potential outcome model [8], in which YX=1 and YX=0 are the potential outcome indicators under X = 1 and X = 0, respectively. The potential risks Pr(YX=1 = 1) and Pr(YX=0 = 1) are then the expectations of Y if everyone in the study population had been exposed and if everyone had been unexposed, respectively. Causal effects with the total group as the target population are contrasts between these two risks. Those with X = x as the target population are contrasts between Pr(YX=1 = 1 | X = x) and Pr(YX=0 = 1 | X = x).
Let i = 1, …, n denote a subject and zi denote a vector of measured confounders. The propensity score Pr(X = 1 | Z = zi) is then predicted using a logistic regression model:
$\mathrm{Pr}\left(X=1|Z={z}_{i}\right)=\frac{\mathrm{exp}\left({\theta }^{\prime }{z}_{i}\right)}{1+\mathrm{exp}\left({\theta }^{\prime }{z}_{i}\right)},\qquad (1)$
where θ is a vector of regression coefficients. When residual confounding exists, however, Equation (1) yields biased propensity scores. As a result, the MSM will yield biased estimates of causal effects. In the next section, we give an interpretation of a random coefficient when it is included in Equation (1).
RESULTS AND DISCUSSION
Unexposed group as the target population
To take account of residual confounding, we assume that the propensity score p0i is explained by the following logistic regression model with a random coefficient:
${p}_{0i}=\frac{\mathrm{exp}\left\{\mathrm{log}\left({\alpha }_{i}\right)+{\theta }^{\prime }{z}_{i}\right\}}{1+\mathrm{exp}\left\{\mathrm{log}\left({\alpha }_{i}\right)+{\theta }^{\prime }{z}_{i}\right\}},$
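where log(αi) is a random coefficient. Using Equation (1), p0i can be expressed as:
$${p}_{0i}=\frac{{\alpha }_{i}\mathrm{Pr}\left(X=1|Z={z}_{i}\right)}{\mathrm{Pr}\left(X=0|Z={z}_{i}\right)+{\alpha }_{i}\mathrm{Pr}\left(X=1|Z={z}_{i}\right)}.\qquad (2)$$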
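The relationship between the ordinary and the random-coefficient propensity scores can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function name `propensity` and all numeric values are hypothetical, chosen only for illustration. It confirms that adding log(αi) to the linear predictor multiplies the odds of exposure by αi, so that (1 – p0i) / p0i equals Pr(X = 0 | Z = zi) / {αi Pr(X = 1 | Z = zi)}.

```python
import math

def propensity(theta, z, log_coef=0.0):
    """Logistic propensity score with an optional additive term.

    log_coef = 0 gives the ordinary logistic model;
    log_coef = log(alpha_i) gives the random-coefficient model for p_0i.
    """
    linear = log_coef + sum(t * v for t, v in zip(theta, z))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical values, for illustration only.
theta = [0.4, -0.7]
z_i = [1.0, 2.0]
alpha_i = 1.5

e_i = propensity(theta, z_i)                      # Pr(X = 1 | Z = z_i), ordinary model
p0_i = propensity(theta, z_i, math.log(alpha_i))  # random-coefficient score p_0i

# Closed form implied by the two models: alpha * e / {(1 - e) + alpha * e}
closed_form = alpha_i * e_i / ((1.0 - e_i) + alpha_i * e_i)

# Weight for exposed subjects: (1 - p0) / p0 = (1 - e) / (alpha * e)
weight = (1.0 - p0_i) / p0_i
```

In this sketch, `p0_i` agrees with `closed_form` up to floating-point error, which is the identity used when the propensity score is substituted into the IPW sum.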
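Using the inverse-probability-weighting (IPW) method, Pr(YX=x = 1|X = 0) is estimated for x = 1 and x = 0 as follows:
$\mathrm{Pr}\left({Y}_{X=1}=1|X=0\right)=\frac{1}{{n}_{0}}\sum _{i=1}^{n}\frac{1-{p}_{0i}}{{p}_{0i}}{y}_{i}{x}_{i},\qquad (3)$
$\mathrm{Pr}\left({Y}_{X=0}=1|X=0\right)=\frac{1}{{n}_{0}}\sum _{i=1}^{n}{y}_{i}\left(1-{x}_{i}\right),$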
where n0 = nPr(X = 0) [9,10]. In the framework of MSMs, the causal risk difference (RD) is estimated using a weighted linear regression analysis of Y on X with the weights (1 – p0i) / p0i for exposed subjects and 1 for unexposed subjects. The causal risk ratio (RR) is estimated using a weighted Poisson regression analysis of Y on X with the same weights.
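The IPW sums with the weights (1 – p0i) / p0i for exposed subjects and 1 for unexposed subjects can be sketched in a few lines. This is an illustrative sketch only: the function name `ipw_unexposed_target` and the toy data are hypothetical, and n0 is taken to be the observed number of unexposed subjects. The RD and RR are computed directly from the two estimated risks rather than through a weighted regression fit.

```python
def ipw_unexposed_target(y, x, p0):
    """IPW estimates of Pr(Y_{X=1}=1 | X=0) and Pr(Y_{X=0}=1 | X=0).

    y, x: lists of 0/1 outcome and exposure indicators.
    p0:   list of predicted propensity scores p_0i, each in (0, 1).
    """
    n0 = sum(1 - xi for xi in x)  # assumption: n0 is the count of unexposed subjects
    if n0 == 0:
        raise ValueError("no unexposed subjects in the data")
    # Exposed subjects are weighted by (1 - p0i) / p0i.
    risk1 = sum((1 - pi) / pi * yi * xi for yi, xi, pi in zip(y, x, p0)) / n0
    # Unexposed subjects receive weight 1.
    risk0 = sum(yi * (1 - xi) for yi, xi in zip(y, x)) / n0
    return risk1, risk0

# Toy data, invented for illustration only.
y = [1, 0, 1, 1, 0, 0, 1, 0]
x = [1, 1, 1, 0, 0, 0, 0, 1]
p0 = [0.5, 0.5, 0.4, 0.5, 0.2, 0.2, 0.5, 0.8]

risk1, risk0 = ipw_unexposed_target(y, x, p0)
rd = risk1 - risk0  # causal risk difference in the unexposed
rr = risk1 / risk0  # causal risk ratio in the unexposed
```

Any bias in the supplied scores `p0` carries over directly to `risk1`, which is why the interpretation of the random coefficient matters.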
By substituting Equation (2) into Equation (3) and replacing yixi / n with Pr(Y = 1, X = 1, Z = zi) in the summation, the following equation is derived:
$\mathrm{Pr}\left({Y}_{X=1}=1|X=0\right)=\frac{n}{{n}_{0}}\sum _{i=1}^{n}\frac{\mathrm{Pr}\left(X=0|Z={z}_{i}\right)/{\alpha }_{i}}{\mathrm{Pr}\left(X=1|Z={z}_{i}\right)}\frac{{y}_{i}{x}_{i}}{n}$
$=\frac{1}{\mathrm{Pr}\left(X=0\right)}\sum _{i=1}^{n}\frac{\mathrm{Pr}\left(X=0|Z={z}_{i}\right)}{{\alpha }_{i}\mathrm{Pr}\left(X=1|Z={z}_{i}\right)}\mathrm{Pr}\left(Y=1,X=1,Z={z}_{i}\right)$
$=\sum _{i=1}^{n}\frac{\mathrm{Pr}\left(Y=1|X=1,Z={z}_{i}\right)}{{\alpha }_{i}}\mathrm{Pr}\left(Z={z}_{i}|X=0\right)$
Because $\mathrm{Pr}\left({Y}_{X=1}=1|X=0\right)={\sum }_{i=1}^{n}\mathrm{Pr}\left({Y}_{X=1}=1|X=0,Z={z}_{i}\right)\mathrm{Pr}\left(Z={z}_{i}|X=0\right),$ the left- and right-hand sides of this equation are equal when:
$\mathrm{Pr}\left(Y=1|X=1,Z={z}_{i}\right)/{\alpha }_{i}=\mathrm{Pr}\left({Y}_{X=1}=1|X=0,Z={z}_{i}\right),$
which yields:
${\alpha }_{i}=\frac{\mathrm{Pr}\left({Y}_{X=1}=1|X=1,Z={z}_{i}\right)}{\mathrm{Pr}\left({Y}_{X=1}=1|X=0,Z={z}_{i}\right)}=\frac{\mathrm{Pr}\left(Y=1|X=1,Z={z}_{i}\right)}{\mathrm{Pr}\left(Y=1|X=0,Z={z}_{i}\right)}/\frac{\mathrm{Pr}\left({Y}_{X=1}=1|X=0,Z={z}_{i}\right)}{\mathrm{Pr}\left({Y}_{X=0}=1|X=0,Z={z}_{i}\right)}.$
This αi is the confounding risk ratio (CRR) with the unexposed group as the target population [11], which is the ratio of crude RR to causal RR, for an individual with Z = zi.
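A small population-level calculation may help make this interpretation concrete. The sketch below uses two hypothetical strata with invented stratum-specific risks (the counterfactual risk `r10` is of course not observable in practice). It checks that αi equals the ratio of crude RR to causal RR in the unexposed, and that dividing the observed risk among the exposed by αi recovers Pr(YX=1 = 1 | X = 0), whereas the unadjusted weights do not.

```python
# Hypothetical strata, for illustration only. Within each stratum:
#   pz  = Pr(Z = z)
#   e   = Pr(X = 1 | Z = z)
#   r11 = Pr(Y_{X=1} = 1 | X = 1, Z = z)  (= observed risk in the exposed)
#   r10 = Pr(Y_{X=1} = 1 | X = 0, Z = z)  (counterfactual, not observable)
#   r00 = Pr(Y_{X=0} = 1 | X = 0, Z = z)  (= observed risk in the unexposed)
strata = [
    {"pz": 0.6, "e": 0.3, "r11": 0.40, "r10": 0.20, "r00": 0.10},
    {"pz": 0.4, "e": 0.7, "r11": 0.60, "r10": 0.50, "r00": 0.25},
]

def alpha(s):
    """Ratio of the potential risks under exposure, exposed vs. unexposed."""
    return s["r11"] / s["r10"]

def crr_unexposed(s):
    """Crude RR divided by the causal RR with the unexposed as target."""
    crude_rr = s["r11"] / s["r00"]
    causal_rr = s["r10"] / s["r00"]
    return crude_rr / causal_rr

pr_x0 = sum(s["pz"] * (1 - s["e"]) for s in strata)  # Pr(X = 0)

# True Pr(Y_{X=1} = 1 | X = 0): average r10 over Pr(Z = z | X = 0).
truth = sum(s["r10"] * s["pz"] * (1 - s["e"]) for s in strata) / pr_x0

# Population analogue of the IPW sum; Pr(Y=1, X=1, Z=z) = r11 * e * pz.
adjusted = sum(
    (1 - s["e"]) / (alpha(s) * s["e"]) * s["r11"] * s["e"] * s["pz"]
    for s in strata
) / pr_x0

# Same sum with the ordinary propensity score (alpha fixed at 1).
naive = sum(
    (1 - s["e"]) / s["e"] * s["r11"] * s["e"] * s["pz"] for s in strata
) / pr_x0
```

With these invented numbers, `adjusted` matches `truth` while `naive` overstates it, reflecting residual confounding within strata.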
Exposed group as the target population
In the case of the exposed group as the target population, we can make an argument similar to that in the above subsection. We assume that the propensity score p1i is explained by the following logistic regression model with a random coefficient:
${p}_{1i}=\frac{\mathrm{exp}\left\{\mathrm{log}\left({\beta }_{i}\right)+{\theta }^{\prime }{z}_{i}\right\}}{1+\mathrm{exp}\left\{\mathrm{log}\left({\beta }_{i}\right)+{\theta }^{\prime }{z}_{i}\right\}}$
where log(βi) is a random coefficient. Then, by the IPW method, Pr(YX=x = 1|X = 1) is estimated as:
$\mathrm{Pr}\left({Y}_{X=1}=1|X=1\right)=\frac{1}{{n}_{1}}\sum _{i=1}^{n}{y}_{i}{x}_{i},\mathrm{Pr}\left({Y}_{X=0}=1|X=1\right)=\frac{1}{{n}_{1}}\sum _{i=1}^{n}\frac{{p}_{1i}}{1-{p}_{1i}}{y}_{i}\left(1-{x}_{i}\right),$
where n1 = nPr(X = 1) [9,10]. In the framework of MSMs, the causal effects are estimated using weighted regression analyses of Y on X with the weights 1 for exposed subjects and p1i / (1 – p1i) for unexposed subjects.

Algebra similar to that in the above subsection yields:
$\mathrm{Pr}\left({Y}_{X=0}=1|X=1\right)=\sum _{i=1}^{n}{\beta }_{i}\mathrm{Pr}\left(Y=1|X=0,Z={z}_{i}\right)\mathrm{Pr}\left(Z={z}_{i}|X=1\right).$
Because $\mathrm{Pr}\left({Y}_{X=0}=1|X=1\right)={\sum }_{i=1}^{n}\mathrm{Pr}\left({Y}_{X=0}=1|X=1,Z={z}_{i}\right)\mathrm{Pr}\left(Z={z}_{i}|X=1\right),$ βi can be expressed as:
${\beta }_{i}=\frac{\mathrm{Pr}\left({Y}_{X=0}=1|X=1,Z={z}_{i}\right)}{\mathrm{Pr}\left({Y}_{X=0}=1|X=0,Z={z}_{i}\right)}=\frac{\mathrm{Pr}\left(Y=1|X=1,Z={z}_{i}\right)}{\mathrm{Pr}\left(Y=1|X=0,Z={z}_{i}\right)}/\frac{\mathrm{Pr}\left({Y}_{X=1}=1|X=1,Z={z}_{i}\right)}{\mathrm{Pr}\left({Y}_{X=0}=1|X=1,Z={z}_{i}\right)}.$
This βi is the CRR with the exposed group as the target population [11], for an individual with Z = zi.
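For readers who want the intermediate steps, the algebra behind the expression for Pr(YX=0 = 1 | X = 1) can be reconstructed as follows. This is a sketch under two assumptions taken from the unexposed-group argument: the odds identity p1i / (1 – p1i) = βi Pr(X = 1 | Z = zi) / Pr(X = 0 | Z = zi), which follows from comparing the random-coefficient model with the ordinary logistic model, and the replacement of yi(1 – xi) / n with Pr(Y = 1, X = 0, Z = zi) in the summation:
$\mathrm{Pr}\left({Y}_{X=0}=1|X=1\right)=\frac{n}{{n}_{1}}\sum _{i=1}^{n}\frac{{\beta }_{i}\mathrm{Pr}\left(X=1|Z={z}_{i}\right)}{\mathrm{Pr}\left(X=0|Z={z}_{i}\right)}\frac{{y}_{i}\left(1-{x}_{i}\right)}{n}$
$=\frac{1}{\mathrm{Pr}\left(X=1\right)}\sum _{i=1}^{n}\frac{{\beta }_{i}\mathrm{Pr}\left(X=1|Z={z}_{i}\right)}{\mathrm{Pr}\left(X=0|Z={z}_{i}\right)}\mathrm{Pr}\left(Y=1,X=0,Z={z}_{i}\right)$
$=\sum _{i=1}^{n}{\beta }_{i}\mathrm{Pr}\left(Y=1|X=0,Z={z}_{i}\right)\mathrm{Pr}\left(Z={z}_{i}|X=1\right).$
The last step uses Pr(Y = 1, X = 0, Z = zi) = Pr(Y = 1 | X = 0, Z = zi) Pr(X = 0 | Z = zi) Pr(Z = zi) and Pr(X = 1 | Z = zi) Pr(Z = zi) / Pr(X = 1) = Pr(Z = zi | X = 1).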
Total group as the target population
We assume that the propensity score pi is explained by the following logistic regression model with a random coefficient:
${p}_{i}=\frac{\mathrm{exp}\left\{\mathrm{log}\left({\gamma }_{i}\right)+{\theta }^{\prime }{z}_{i}\right\}}{1+\mathrm{exp}\left\{\mathrm{log}\left({\gamma }_{i}\right)+{\theta }^{\prime }{z}_{i}\right\}}$
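where log(γi) is a random coefficient. Then, by the IPW method, Pr(YX=x = 1) is estimated as:
$\mathrm{Pr}\left({Y}_{X=1}=1\right)=\frac{1}{n}\sum _{i=1}^{n}\frac{{y}_{i}{x}_{i}}{{p}_{i}},\qquad (4)$
$\mathrm{Pr}\left({Y}_{X=0}=1\right)=\frac{1}{n}\sum _{i=1}^{n}\frac{{y}_{i}\left(1-{x}_{i}\right)}{1-{p}_{i}}.\qquad (5)$
In the framework of MSMs, the causal effects are estimated using weighted regression models of Y on X with the weights 1 / pi for the exposed subjects and 1 / (1 – pi) for the unexposed subjects.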
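The weights for the three target populations can be collected in one helper. The sketch below is illustrative only: `msm_weight` and `ipw_total_target` are hypothetical names, the toy data are invented, and the propensity score passed in is whichever of p0i, p1i, or pi corresponds to the chosen target. The total-group risks are computed as weighted sums divided by n.

```python
def msm_weight(xi, pi, target):
    """MSM weight for one subject with exposure xi and propensity score pi."""
    if target == "total":
        return 1.0 / pi if xi == 1 else 1.0 / (1.0 - pi)
    if target == "exposed":
        return 1.0 if xi == 1 else pi / (1.0 - pi)
    if target == "unexposed":
        return (1.0 - pi) / pi if xi == 1 else 1.0
    raise ValueError("target must be 'total', 'exposed', or 'unexposed'")

def ipw_total_target(y, x, p):
    """IPW estimates of Pr(Y_{X=1}=1) and Pr(Y_{X=0}=1) for the total group."""
    n = len(y)
    risk1 = sum(msm_weight(1, pi, "total") * yi * xi
                for yi, xi, pi in zip(y, x, p)) / n
    risk0 = sum(msm_weight(0, pi, "total") * yi * (1 - xi)
                for yi, xi, pi in zip(y, x, p)) / n
    return risk1, risk0

# Toy data, invented for illustration only.
y = [1, 0, 1, 0]
x = [1, 1, 0, 0]
p = [0.5, 0.5, 0.5, 0.5]

risk1_total, risk0_total = ipw_total_target(y, x, p)
```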
By a calculation similar to those in the above subsections, Equation (4) can be expressed as:
$\begin{array}{l}\mathrm{Pr}\left({Y}_{X=1}=1\right)=\\ \sum _{i=1}^{n}\left\{\frac{\mathrm{Pr}\left(X=0|Z={z}_{i}\right)}{{\gamma }_{i}}+\mathrm{Pr}\left(X=1|Z={z}_{i}\right)\right\}\mathrm{Pr}\left(Y=1|X=1,Z={z}_{i}\right)\mathrm{Pr}\left(Z={z}_{i}\right)\end{array}$
Because   $\mathrm{Pr}\left({Y}_{X=1}=1\right)={\sum }_{i=1}^{n}\mathrm{Pr}\left({Y}_{X=1}=1|Z={z}_{i}\right)\mathrm{Pr}\left(Z={z}_{i}\right),$ the left- and right-hand sides of this equation are equal when:
$\mathrm{Pr}\left({Y}_{X=1}=1|Z={z}_{i}\right)=\left\{\frac{\mathrm{Pr}\left(X=0|Z={z}_{i}\right)}{{\gamma }_{i}}+\mathrm{Pr}\left(X=1|Z={z}_{i}\right)\right\}\mathrm{Pr}\left(Y=1|X=1,Z={z}_{i}\right),$
which yields: ${\gamma }_{i}={\alpha }_{i}=\frac{\mathrm{Pr}\left({Y}_{X=1}=1|X=1,Z={z}_{i}\right)}{\mathrm{Pr}\left({Y}_{X=1}=1|X=0,Z={z}_{i}\right)}.$
Likewise, Equation (5) yields: ${\gamma }_{i}={\beta }_{i}=\frac{\mathrm{Pr}\left({Y}_{X=0}=1|X=1,Z={z}_{i}\right)}{\mathrm{Pr}\left({Y}_{X=0}=1|X=0,Z={z}_{i}\right)},$
because Equation (5) can be expressed as:
$\mathrm{Pr}\left({Y}_{X=0}=1\right)=\sum _{i=1}^{n}\left\{\mathrm{Pr}\left(X=0|Z={z}_{i}\right)+{\gamma }_{i}\mathrm{Pr}\left(X=1|Z={z}_{i}\right)\right\}\mathrm{Pr}\left(Y=1|X=0,Z={z}_{i}\right)\mathrm{Pr}\left(Z={z}_{i}\right).$
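This observation shows that γi cannot be interpreted unless αi = βi holds (i.e., unless the two CRRs with the exposed and unexposed groups as the target population are equal). When αi = βi holds, γi can be interpreted as the CRR with the total group as the target population, because this CRR is expressed as:
$\frac{\mathrm{Pr}\left(Y=1|X=1,Z={z}_{i}\right)}{\mathrm{Pr}\left(Y=1|X=0,Z={z}_{i}\right)}/\frac{\mathrm{Pr}\left({Y}_{X=1}=1|Z={z}_{i}\right)}{\mathrm{Pr}\left({Y}_{X=0}=1|Z={z}_{i}\right)}=\frac{\mathrm{Pr}\left(X=0|Z={z}_{i}\right)+{\beta }_{i}\mathrm{Pr}\left(X=1|Z={z}_{i}\right)}{\frac{\mathrm{Pr}\left(X=0|Z={z}_{i}\right)}{{\alpha }_{i}}+\mathrm{Pr}\left(X=1|Z={z}_{i}\right)},\qquad (6)$
and then γi is equal to the CRR when αi = βi (= γi). The derivation of Equation (6) is given in the Appendix.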
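The closed form for the total-group CRR obtained in the Appendix, {Pr(X = 0 | Z = zi) + βi Pr(X = 1 | Z = zi)} / {Pr(X = 0 | Z = zi) / αi + Pr(X = 1 | Z = zi)}, can be checked against a direct calculation from stratum-specific potential risks. The sketch below uses hypothetical function names and invented risks for a single stratum. It also confirms that the expression collapses to the common value when αi = βi.

```python
def crr_total(e, alpha, beta):
    """Closed-form total-group CRR in a stratum, with e = Pr(X = 1 | Z = z_i)."""
    return ((1 - e) + beta * e) / ((1 - e) / alpha + e)

def crr_total_direct(e, r11, r10, r01, r00):
    """Crude RR divided by the causal RR for the total group.

    r_ab = Pr(Y_{X=a} = 1 | X = b, Z = z_i); r11 and r00 are also the
    observed risks in the exposed and unexposed, respectively.
    """
    crude_rr = r11 / r00
    risk1 = r11 * e + r10 * (1 - e)  # Pr(Y_{X=1} = 1 | Z = z_i)
    risk0 = r01 * e + r00 * (1 - e)  # Pr(Y_{X=0} = 1 | Z = z_i)
    return crude_rr / (risk1 / risk0)

# Hypothetical stratum-specific values, for illustration only.
e, r11, r10, r01, r00 = 0.3, 0.40, 0.20, 0.30, 0.10
alpha_i = r11 / r10  # CRR with the unexposed group as target
beta_i = r01 / r00   # CRR with the exposed group as target

closed = crr_total(e, alpha_i, beta_i)
direct = crr_total_direct(e, r11, r10, r01, r00)

# When the two CRRs coincide, the total-group CRR equals their common value.
common = crr_total(e, 2.0, 2.0)
```

When `alpha_i` and `beta_i` differ, as in these invented numbers, the total-group CRR lies between them and depends on Pr(X = 1 | Z = zi), so no single γi satisfies both conditions.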
Conclusion
Based on the formulas of the random coefficient model and the IPW approach, we have given an interpretation of a random coefficient when logistic regression with a random coefficient is used for predicting propensity scores. In conclusion, when the exposed or unexposed group is the target population, the random coefficient can be interpreted as the logarithm of the CRR with that group as the target population. When the total group is the target population, however, the random coefficient cannot be interpreted in a straightforward manner: it can be interpreted as the logarithm of the CRR only when the two CRRs with the exposed and unexposed groups as the target population are equal.
Although we have given an interpretation of a random coefficient in a logistic regression model, we have not discussed the predicted values of propensity scores or the estimates of causal effects themselves. We will need to research their characteristics through, for example, simulation studies.
Appendix: Derivation of Equation (6)
In the stratum with Z = zi, we let RRCi = Pr(Y = 1 | X = 1, Z = zi) / Pr(Y = 1 | X = 0, Z = zi) denote the crude RR, RREi = Pr(YX=1 = 1 | X = 1, Z = zi) / Pr(YX=0 = 1 | X = 1, Z = zi) denote the causal RR with the exposed group as the target population, and RRUi = Pr(YX=1 = 1 | X = 0, Z = zi) / Pr(YX=0 = 1 | X = 0, Z = zi) denote the causal RR with the unexposed group as the target population. Then, the CRR with the total group as the target population can be expressed as:
$\frac{\mathrm{Pr}\left(Y=1|X=1,Z={z}_{i}\right)}{\mathrm{Pr}\left(Y=1|X=0,Z={z}_{i}\right)}/\frac{\mathrm{Pr}\left({Y}_{X=1}=1|Z={z}_{i}\right)}{\mathrm{Pr}\left({Y}_{X=0}=1|Z={z}_{i}\right)}=\frac{\sum _{x=0}^{1}\mathrm{Pr}\left({Y}_{X=0}=1|X=x,Z={z}_{i}\right)\mathrm{Pr}\left(X=x|Z={z}_{i}\right)}{\sum _{x=0}^{1}\mathrm{Pr}\left({Y}_{X=1}=1|X=x,Z={z}_{i}\right)\mathrm{Pr}\left(X=x|Z={z}_{i}\right)}{\text{RR}}_{\text{C}i}$
$=\frac{\mathrm{Pr}\left({Y}_{X=1}=1|X=1,Z={z}_{i}\right)\sum _{x=0}^{1}\frac{\mathrm{Pr}\left({Y}_{X=0}=1|X=x,Z={z}_{i}\right)}{\mathrm{Pr}\left({Y}_{X=1}=1|X=1,Z={z}_{i}\right)}\mathrm{Pr}\left(X=x|Z={z}_{i}\right)}{\mathrm{Pr}\left({Y}_{X=0}=1|X=0,Z={z}_{i}\right)\sum _{x=0}^{1}\frac{\mathrm{Pr}\left({Y}_{X=1}=1|X=x,Z={z}_{i}\right)}{\mathrm{Pr}\left({Y}_{X=0}=1|X=0,Z={z}_{i}\right)}\mathrm{Pr}\left(X=x|Z={z}_{i}\right)}{\text{RR}}_{\text{C}i}$
$=\frac{\left\{\frac{\mathrm{Pr}\left(X=0|Z={z}_{i}\right)}{{\text{RR}}_{\text{C}i}}+\frac{\mathrm{Pr}\left(X=1|Z={z}_{i}\right)}{{\text{RR}}_{\text{E}i}}\right\}{\text{RR}}_{\text{C}i}}{\frac{\mathrm{Pr}\left(X=0|Z={z}_{i}\right){\text{RR}}_{\text{U}i}+\mathrm{Pr}\left(X=1|Z={z}_{i}\right){\text{RR}}_{\text{C}i}}{{\text{RR}}_{\text{C}i}}}=\frac{\mathrm{Pr}\left(X=0|Z={z}_{i}\right)+{\beta }_{i}\mathrm{Pr}\left(X=1|Z={z}_{i}\right)}{\frac{\mathrm{Pr}\left(X=0|Z={z}_{i}\right)}{{\alpha }_{i}}+\mathrm{Pr}\left(X=1|Z={z}_{i}\right)}.$
Acknowledgements
This work was supported partially by Grant-in-Aid for Scientific Research (No. 23700344) from the Ministry of Education, Culture, Sports, Science, and Technology of Japan.

Cite this article: Chiba Y (2013) A Note on the Logistic Regression Model with a Random Coefficient to Predict Propensity Scores. Ann Biom Biostat 1: 1001.