Introduction
Latent class analysis (LCA) is a widely used statistical method in many fields. It assumes that subjects belong to latent subgroups, referred to as latent classes. Although the classes are not directly observed, they can be inferred from a set of observed categorical variables using LCA. Further, the relationships between latent class membership and covariates may be assessed using latent class regression (LCR). Simple LCA assumes that subjects are independent conditional on their latent class membership. However, if the subjects are clustered in groups, the conditional independence assumption may not be met. Multilevel techniques are therefore needed to account for the intracluster dependence in such data.
In recent years, multilevel latent class analysis (MLCA) has been developed by a few groups [1-3] to apply LCA to nested data. These groups also extended MLCA to include Level 1 and Level 2 covariates in the model [1,3]. In multilevel LCR (MLCR), a two-level multinomial logistic regression is adopted by introducing random intercepts across Level 2 (cluster) units. When the number of latent classes C is greater than 2, the C-1 random intercepts are allowed to be correlated with one another. However, the computational burden of this model grows exponentially with C. Thus, Vermunt [1] suggested modeling all the random intercepts with a common factor, which assumes the random intercepts are perfectly correlated.
To the best of our knowledge, the impact of using simple latent class regression (SLCR) instead of MLCR for nested data has not been investigated empirically. In this article, we present a Monte Carlo simulation study examining the influence of the intraclass correlation (ICC) on the estimation bias and coverage of regression coefficients under MLCR vs. SLCR. We also evaluate the consequences of assuming perfect correlation when the random intercepts are in fact not perfectly correlated. More specifically, we compare the performance of SLCR, MLCR with perfect correlation (MLCRP), and MLCR with ordinary (unrestricted) correlation (MLCRO) in an MLCR model with 3 latent classes.
Materials and Methods
SLCR
In an SLCR model, let $Y_i = (Y_{i1}, \dots, Y_{iM})'$ be an observed response vector for the $i$th individual, where variable $Y_{im}$ takes possible values $1, 2, \dots, r_m$, and let $c_i = 1, 2, \dots, C$ denote the latent class membership of the $i$th individual. Let $x_i = (x_{i1}, x_{i2}, \dots, x_{ip})'$ be a vector of explanatory variables for the $i$th individual. An LCR model can be expressed as:
$\Pr\left(Y_1 = y_1, \dots, Y_n = y_n\right) = \prod_{i=1}^{n} \sum_{c=1}^{C} \gamma_c(x_i) \prod_{m=1}^{M} \prod_{k=1}^{r_m} \rho_{mkc}^{I(y_{im} = k)},$
where $\rho_{mkc} = \Pr(Y_{im} = k \mid c_i = c)$ is the conditional probability of response $k$ to the $m$th item given class membership $c$, and $\gamma_c(x_i)$ is the class membership probability for the $c$th class given $x_i$, which is related to the covariates through a multinomial logistic regression:
$\gamma_c(x_i) = \Pr(c_i = c \mid x_i) = \frac{\exp(x_i' \beta_c)}{1 + \sum_{j=1}^{C-1} \exp(x_i' \beta_j)},$
where $\beta_c$ is a vector of logistic regression coefficients for class $c = 1, \dots, C-1$, with class $C$ as the reference class.
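The multinomial logistic expression above can be sketched as a small function; this is a minimal illustration, not the paper's estimation code, and the coefficient values below are hypothetical:

```python
import math

def class_probs(x, betas):
    """Class-membership probabilities gamma_c(x) under the multinomial
    logistic model, with the last class C as the reference (beta_C = 0).
    `betas` holds the C-1 coefficient vectors for classes 1..C-1."""
    scores = [sum(b * xk for b, xk in zip(beta, x)) for beta in betas]
    denom = 1.0 + sum(math.exp(s) for s in scores)
    return [math.exp(s) / denom for s in scores] + [1.0 / denom]

# Hypothetical 3-class model: each beta = (intercept, slope of one covariate)
betas = [(1.0, 0.5), (1.0, -0.5)]
probs = class_probs((1.0, 0.0), betas)  # x' = (1, covariate value 0)
```

With the covariate at 0 the two non-reference classes receive equal probability, and the three probabilities sum to one by construction.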
MLCR
If we denote $c_{ij}$ as the class membership of the $i$th individual in the $j$th cluster and $\gamma_{jc}(x_{ij}, w_j)$ as the probability that $c_{ij} = c$ given Level 1 covariate $x_{ij}$ and Level 2 covariate $w_j$, and assume random intercepts $u_{jc} \sim N(0, 1)$, an MLCR model can be written as:
Level 1 (individual):
$\gamma_{jc}(x_{ij}, w_j) = \Pr(c_{ij} = c \mid x_{ij}, w_j) = \frac{\exp(\beta_{0jc} + \beta_{1c} x_{ij})}{1 + \sum_{k=1}^{C-1} \exp(\beta_{0jk} + \beta_{1k} x_{ij})},$
Level 2 (cluster):
$\beta_{0jc} = \alpha_{0c} + \alpha_{1c} w_j + \sigma_c u_{jc}.$
The intraclass correlation (ICC) for class $c$ in MLCR is defined as the proportion of the total variance accounted for by the random effects, i.e.,
$r_c = \frac{\sigma_c^2}{\sigma_c^2 + \pi^2/3}. \qquad (1)$
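Equation (1) can also be inverted to choose $\sigma_c$ for a target ICC; a minimal sketch (the function names are ours):

```python
import math

LOGISTIC_VAR = math.pi ** 2 / 3  # level-1 variance of the standard logistic

def icc(sigma):
    """Equation (1): share of the total variance due to the random effect."""
    return sigma ** 2 / (sigma ** 2 + LOGISTIC_VAR)

def sigma_for(target):
    """Invert equation (1): the sigma that yields a target ICC."""
    return math.sqrt(target * LOGISTIC_VAR / (1.0 - target))
```

For example, `sigma_for(0.05)` is approximately 0.416 and `sigma_for(0.10)` is approximately 0.605, consistent with the values used in the simulation below.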
When $C$ is greater than 2, MLCRO allows the $C-1$ random intercepts $u_{jc}$ to be correlated with one another. In contrast, MLCRP uses a common factor to model all the random intercepts (i.e., $u_{jc} = u_j$ for all $c$), which assumes the random intercepts are perfectly correlated.
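The difference between the two specifications can be illustrated by how the cluster-level intercepts would be drawn in a simulation (a sketch for $C = 3$; the function names are ours):

```python
import random

def draw_mlcro(n_clusters, corr, rng):
    """MLCRO: the two intercepts are bivariate normal with unit variances
    and correlation `corr` (Cholesky construction)."""
    pairs = []
    for _ in range(n_clusters):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        pairs.append((z1, corr * z1 + (1 - corr ** 2) ** 0.5 * z2))
    return pairs

def draw_mlcrp(n_clusters, rng):
    """MLCRP: one common factor u_j shared by every class, so the
    intercepts are perfectly correlated."""
    return [(u, u) for u in (rng.gauss(0, 1) for _ in range(n_clusters))]

rng = random.Random(1)
ordinary = draw_mlcro(30, 0.5, rng)
perfect = draw_mlcrp(30, rng)
```

Under MLCRP the two components of each pair are identical by construction; under MLCRO they differ unless the correlation is 1.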
Monte Carlo simulation
In the Monte Carlo simulation study, we generated data from a 3-class MLCR model using R 2.15.2. The observed vector $Y_i = (Y_{i1}, \dots, Y_{i3})'$ comprises 3 categorical variables, each with 5 categories. The model includes 2 covariates: a Level 1 continuous covariate with a standard normal distribution and a Level 2 binary covariate. We assigned Class 3 as the reference class, i.e., $\alpha_{03} = \alpha_{13} = \beta_{13} = \sigma_3 = 0$. We set the regression parameters as $\alpha_{01} = \alpha_{02} = 1$, $\alpha_{11} = 0.5$, $\alpha_{12} = 0.5$, $\beta_{11} = 0.5$, $\beta_{12} = 0.5$. The random intercept $u_{j1}$ is correlated with $u_{j2}$ through a bivariate normal distribution:
$\begin{pmatrix} u_{j1} \\ u_{j2} \end{pmatrix} \sim \mathrm{BVN}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} \right).$
In addition, we set different values of $\sigma_1 = \sigma_2 = \sigma$ to obtain ICCs at various levels: $\sigma = 0.416$ generates data with an ICC of 0.05, $\sigma = 0.6$ generates data with an ICC of 0.1, and $\sigma = 1$ generates data with an ICC of 0.25. Finally, the conditional item-response probabilities for $m = 1, 2, 3$ were chosen as:
$(\rho_{m11}, \rho_{m21}, \rho_{m31}, \rho_{m41}, \rho_{m51}) = (0.05,\ 0.8,\ 0.05,\ 0.05,\ 0.05),$

$(\rho_{m12}, \rho_{m22}, \rho_{m32}, \rho_{m42}, \rho_{m52}) = (0.05,\ 0.05,\ 0.1,\ 0.4,\ 0.4),$

$(\rho_{m13}, \rho_{m23}, \rho_{m33}, \rho_{m43}, \rho_{m53}) = (0.3,\ 0.3,\ 0.3,\ 0.05,\ 0.05).$
We applied the SLCR, MLCRP, and MLCRO approaches to estimate the regression coefficients of the two covariates using the Mplus 7 software package [4]. Since MLCRO corresponds to the true model, we expected MLCRO to perform best. We generated 500 replications of 3,000 subjects with different ICCs. The 3,000 subjects were grouped into 30 or 300 equally sized clusters; half of the clusters were assigned 0 for the Level 2 covariate and the other half were assigned 1.
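The generating process described above can be sketched end-to-end with the standard library. This is a minimal illustration, not the paper's R code; it uses the parameter values as quoted above, and the helper names are ours:

```python
import math
import random

RHO = {  # conditional item-response probabilities, identical for all 3 items
    1: (0.05, 0.8, 0.05, 0.05, 0.05),
    2: (0.05, 0.05, 0.1, 0.4, 0.4),
    3: (0.3, 0.3, 0.3, 0.05, 0.05),
}
ALPHA0, ALPHA1, BETA1 = {1: 1.0, 2: 1.0}, {1: 0.5, 2: 0.5}, {1: 0.5, 2: 0.5}

def simulate(n_clusters, cluster_size, sigma, corr=0.5, seed=0):
    """Draw one dataset from the 3-class MLCR generating model."""
    rng = random.Random(seed)
    data = []
    for j in range(n_clusters):
        w = 0 if j < n_clusters // 2 else 1          # Level 2 binary covariate
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        u = {1: z1, 2: corr * z1 + math.sqrt(1 - corr ** 2) * z2}
        for _ in range(cluster_size):
            x = rng.gauss(0, 1)                      # Level 1 covariate
            logit = {c: ALPHA0[c] + ALPHA1[c] * w + sigma * u[c] + BETA1[c] * x
                     for c in (1, 2)}
            denom = 1.0 + math.exp(logit[1]) + math.exp(logit[2])
            gamma = [math.exp(logit[1]) / denom, math.exp(logit[2]) / denom,
                     1.0 / denom]
            c = rng.choices((1, 2, 3), weights=gamma)[0]
            items = tuple(rng.choices(range(1, 6), weights=RHO[c])[0]
                          for _ in range(3))
            data.append((j, w, x, c, items))
    return data

dataset = simulate(n_clusters=30, cluster_size=100, sigma=0.416)
```

Each record carries the cluster index, the two covariates, the (normally unobserved) class label, and the three item responses, so the same generator covers all six simulation scenarios by varying `n_clusters`, `cluster_size`, and `sigma`.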
Results and Discussion
Table 1 presents the biases and 95% confidence interval coverage rates for the estimates of the regression coefficients under the different models. SLCR clearly had the largest bias and the worst confidence interval coverage of the three methods, especially for the Level 2 covariate. As the ICC increased, the biases in the regression coefficients increased while the coverage probabilities decreased. SLCR also performed worse when only a limited number of clusters was available than when a large number of clusters was collected. In general, these results are consistent with simulation results for multilevel logistic regression models [5], underscoring the importance of multilevel analysis techniques for clustered/correlated data that do not satisfy the conditional independence assumption, especially when the regression coefficients of Level 2 covariates are of interest.
Table 1. Summary of the relative bias $\left(\frac{\hat{\theta} - \theta}{\theta}\right)$ and 95% confidence interval coverage for the regression coefficients.

| # of Groups | Group Size | ICC | Class | Coefficient | Bias (SLCR) | Bias (MLCRP) | Bias (MLCRO) | Coverage (SLCR) | Coverage (MLCRP) | Coverage (MLCRO) |
|---|---|---|---|---|---|---|---|---|---|---|
| 30 | 100 | 0.05 | Class 1 | Level 1 Covariate | .02 | .00 | .00 | .93 | .94 | .95 |
| 30 | 100 | 0.05 | Class 1 | Level 2 Covariate | .03 | .02 | .03 | .69 | .94 | .94 |
| 30 | 100 | 0.05 | Class 1 | Intercept | .03 | .00 | .01 | .84 | .93 | .94 |
| 30 | 100 | 0.05 | Class 2 | Level 1 Covariate | .01 | .00 | .01 | .96 | .95 | .94 |
| 30 | 100 | 0.05 | Class 2 | Level 2 Covariate | .01 | .01 | .01 | .87 | .94 | .93 |
| 30 | 100 | 0.05 | Class 2 | Intercept | .02 | .03 | .01 | .93 | .94 | .93 |
| 30 | 100 | 0.1 | Class 1 | Level 1 Covariate | .05 | .00 | .00 | .92 | .93 | .93 |
| 30 | 100 | 0.1 | Class 1 | Level 2 Covariate | .06 | .01 | .01 | .59 | .94 | .95 |
| 30 | 100 | 0.1 | Class 1 | Intercept | .06 | .00 | .00 | .70 | .94 | .92 |
| 30 | 100 | 0.1 | Class 2 | Level 1 Covariate | .01 | .02 | .01 | .94 | .93 | .95 |
| 30 | 100 | 0.1 | Class 2 | Level 2 Covariate | .02 | .02 | .01 | .76 | .94 | .94 |
| 30 | 100 | 0.1 | Class 2 | Intercept | .05 | .04 | .00 | .85 | .91 | .92 |
| 30 | 100 | 0.25 | Class 1 | Level 1 Covariate | .14 | .01 | .00 | .68 | .92 | .93 |
| 30 | 100 | 0.25 | Class 1 | Level 2 Covariate | .13 | .05 | .00 | .44 | .93 | .94 |
| 30 | 100 | 0.25 | Class 1 | Intercept | .13 | .00 | .02 | .51 | .93 | .91 |
| 30 | 100 | 0.25 | Class 2 | Level 1 Covariate | .05 | .09 | .00 | .93 | .90 | .95 |
| 30 | 100 | 0.25 | Class 2 | Level 2 Covariate | .09 | .12 | .03 | .66 | .92 | .94 |
| 30 | 100 | 0.25 | Class 2 | Intercept | .11 | .13 | .01 | .75 | .88 | .93 |
| 300 | 10 | 0.05 | Class 1 | Level 1 Covariate | .03 | .00 | .00 | .94 | .95 | .95 |
| 300 | 10 | 0.05 | Class 1 | Level 2 Covariate | .05 | .01 | .01 | .92 | .96 | .94 |
| 300 | 10 | 0.05 | Class 1 | Intercept | .04 | .01 | .01 | .94 | .96 | .96 |
| 300 | 10 | 0.05 | Class 2 | Level 1 Covariate | .00 | .00 | .01 | .96 | .95 | .95 |
| 300 | 10 | 0.05 | Class 2 | Level 2 Covariate | .01 | .01 | .01 | .93 | .94 | .96 |
| 300 | 10 | 0.05 | Class 2 | Intercept | .03 | .01 | .00 | .95 | .96 | .97 |
| 300 | 10 | 0.1 | Class 1 | Level 1 Covariate | .06 | .01 | .01 | .92 | .95 | .94 |
| 300 | 10 | 0.1 | Class 1 | Level 2 Covariate | .07 | .00 | .01 | .87 | .95 | .95 |
| 300 | 10 | 0.1 | Class 1 | Intercept | .06 | .01 | .00 | .90 | .95 | .97 |
| 300 | 10 | 0.1 | Class 2 | Level 1 Covariate | .02 | .04 | .02 | .93 | .93 | .94 |
| 300 | 10 | 0.1 | Class 2 | Level 2 Covariate | .02 | .02 | .01 | .95 | .95 | .95 |
| 300 | 10 | 0.1 | Class 2 | Intercept | .04 | .04 | .01 | .94 | .95 | .93 |
| 300 | 10 | 0.25 | Class 1 | Level 1 Covariate | .16 | .01 | .01 | .63 | .95 | .95 |
| 300 | 10 | 0.25 | Class 1 | Level 2 Covariate | .16 | .01 | .01 | .75 | .96 | .96 |
| 300 | 10 | 0.25 | Class 1 | Intercept | .16 | .02 | .00 | .63 | .96 | .94 |
| 300 | 10 | 0.25 | Class 2 | Level 1 Covariate | .04 | .10 | .01 | .95 | .92 | .95 |
| 300 | 10 | 0.25 | Class 2 | Level 2 Covariate | .04 | .08 | .00 | .93 | .95 | .96 |
| 300 | 10 | 0.25 | Class 2 | Intercept | .15 | .18 | .01 | .87 | .87 | .94 |
When the MLCRP procedure was compared with the MLCRO procedure, the loss of efficiency was not substantial, especially when the ICC was low. Even with an ICC of 0.25, MLCRP was only slightly worse than MLCRO in terms of biases and coverage rates. Meanwhile, the MLCRP procedure took far less computation time than MLCRO (2.5 vs. 30 minutes per simulated dataset on a PC with an Intel i5 2.40 GHz CPU).
Conclusions
Consistent with previous studies of multilevel logistic regression models [5] and of multilevel data analysis techniques more broadly [6], the results of this empirical study demonstrate the importance of using multilevel regression, and more specifically MLCR, when performing LCR on clustered/correlated data.
When the number of latent classes C is greater than 2, the computational complexity of the estimation procedure for MLCRO increases rapidly with C. To alleviate this computational burden and reduce computation time, the perfect-correlation assumption for the random intercepts may be adopted, i.e., MLCRP may be used in those situations. However, attention should be paid to the possible bias introduced by such misspecification. Based on our Monte Carlo simulation results, we conclude that the bias caused by misspecifying the random intercepts as perfectly correlated in MLCRP estimation is small, especially when the ICC is low. MLCRP may therefore serve as a computationally efficient method without a substantial loss of accuracy in the parameter estimates, and hence a reasonable substitute for the MLCRO procedure when the computational burden is a concern.
Acknowledgements
Manuscript preparation was supported in part by American Diabetes Association (ADA #712CT36, L. Jiang).