Extending the Mann-Whitney-Wilcoxon Rank Sum Test for Multiple Treatment Groups and Longitudinal Study Data
- 1. Department of Biostatistics and Computational Biology, University of Rochester, USA
Abstract
Popular models for longitudinal data analysis with continuous outcomes, such as linear mixed-effects models (GLMM) and weighted generalized estimating equations (WGEE), lack robustness in the presence of outliers. For example, in a study to evaluate the efficacy of a sexual risk-reduction intervention for sexually active teenage girls in low-income urban settings, some adolescent girls reported very large numbers such as 450 and even 1,000,000 for their episodes of unprotected vaginal sex over a three-month period. Although answers like these are clearly not legitimate values of the outcome, they do indicate the extremely high level of sexual activity among these girls and thus should not be completely ignored. However, the mean-based GLMM and WGEE are not capable of dealing with this type of “outliers”, due to the sensitivity of the sample mean to values of extremely large magnitude. Rank-based methods such as the popular Mann-Whitney-Wilcoxon (MWW) rank sum test are more effective alternatives for addressing such outliers. Unfortunately, available methods for inference are limited to cross-sectional data and cannot be applied to longitudinal studies, especially in the presence of missing data.
In this paper, we propose to extend the MWW test for comparing multiple groups within a longitudinal data setting by utilizing the functional response models. Inference is based on a class of U-statistics based weighted generalized estimating equations, which provide consistent estimates with asymptotically normal distributions, not only for complete data but also for missing data under the missing at random (MAR) assumption, the most common missing data mechanism in real studies. The approach is illustrated with data from both real and simulated studies.
Citation
Chen R, Wu P, Ma F, Han Y, Chen T, et al. (2014) Extending the Mann-Whitney-Wilcoxon Rank Sum Test for Multiple Treatment Groups and Longitudinal Study Data. Clin Res HIV/AIDS 1(1): 1005.
Keywords
• Functional response models
• Missing data
• Outliers
• Sexual health
• U-statistics based weighted generalized estimating equations
INTRODUCTION
Popular models for longitudinal data analysis with continuous outcomes, such as linear mixed-effects models (GLMM) and weighted generalized estimating equations (WGEE), lack robustness in the presence of outliers. For example, in a study to evaluate the efficacy of a sexual risk-reduction intervention for sexually active teenage girls in low-income urban settings, a group at elevated risk for HIV, some adolescent girls reported very large numbers such as 450 and even 1,000,000 for their episodes of unprotected vaginal sex over a three-month period [1]. Although answers like these are clearly not legitimate values of the outcome, they do indicate the extremely high level of sexual activity among these girls, as compared to the rest of the study sample, and should not be removed from the analysis. However, the mean-based GLMM and WGEE are not capable of dealing with this type of “outliers”, due to the sensitivity of the sample mean to large values. On the other hand, rank-based methods such as the popular Mann-Whitney-Wilcoxon (MWW) rank sum test are more effective in addressing such outliers. However, available methods for inference are limited to cross-sectional data and cannot be applied to longitudinal data, especially in the presence of missing data. In this paper, we address this issue by extending the MWW test to a longitudinal data and multi-group setting within the framework of the functional response models (FRM). Inference for the FRM-based model is achieved by a class of U-statistics based weighted generalized estimating equations (UWGEE). The approach is illustrated with a real data application in sexual health research, as well as with simulated data to study the behavior of the estimates for small to moderate sample sizes.
MULTI-SAMPLE MANN-WHITNEY-WILCOXON TESTS
We first briefly review the classic Mann-Whitney-Wilcoxon rank sum test for between-group differences. We then discuss the limitations of existing modeling paradigms in extending it to multi-group comparison within a longitudinal data setting, and how the functional response model overcomes such difficulties to achieve the needed generalization.
The Mann-Whitney-Wilcoxon rank sum test
Consider two independent samples of size n_k, and let y_ki be some continuous outcome from the i-th subject within the k-th group (1 ≤ i ≤ n_k, k = 1, 2). Let R_ki denote the rank of y_ki in the pooled sample. The Wilcoxon rank sum statistic has the following form [2,3]:
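In its standard textbook form, the rank sum statistic is the sum of the pooled-sample ranks of the first group, that is, the sum of R_1i over i = 1, ..., n_1. The minimal Python sketch below illustrates this definition under stated assumptions: the function names and the toy data are hypothetical, ties receive average ranks (midranks), and the code is not the authors' Matlab implementation. It also checks the well-known identity linking the rank sum W to the Mann-Whitney pairwise count U, namely W = U + n_1(n_1 + 1)/2.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block [pos, end] over all values tied with values[order[pos]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based positions pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks


def rank_sum(y1, y2):
    """Wilcoxon rank sum: sum of the pooled-sample ranks of the first group."""
    ranks = midranks(list(y1) + list(y2))
    return sum(ranks[:len(y1)])


def mann_whitney_u(y1, y2):
    """Mann-Whitney count of pairs with y1 > y2; tied pairs count as 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in y1 for b in y2)


# Hypothetical toy counts; the last value in group1 plays the role of an outlier.
group1 = [0, 2, 3, 5, 450]
group2 = [0, 1, 1, 4, 6]
n1 = len(group1)

w = rank_sum(group1, group2)
u = mann_whitney_u(group1, group2)
print("W =", w, " U =", u, " U + n1(n1+1)/2 =", u + n1 * (n1 + 1) / 2)

# Replacing 450 by 1,000,000 changes the group mean drastically but not the ranks.
w_extreme = rank_sum([0, 2, 3, 5, 1_000_000], group2)
print("W with 1,000,000 in place of 450 =", w_extreme)
```

Because only the ordering of the observations enters the statistic, the reported value of 1,000,000 contributes exactly as much as 450 would, which is the robustness property that motivates the rank-based approach.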
APPLICATIONS
We illustrate the proposed approach with both simulated and real data. We first investigate its performance by simulation and then present an application to a real study on sexual health for a group of teenage girls in low-income urban settings who were at elevated risk for HIV, sexually transmitted infections (STIs), and unintended pregnancies. In all the examples, we applied the second approach for inference as discussed in Section 3.2 and set the statistical significance level at α = 0.05. All analyses were carried out using code developed by the authors in MATLAB [17] to implement the models considered.
Simulation study
We conducted a simulation study to examine the performance of the proposed FRM-based multi-sample Mann-Whitney-Wilcoxon model for longitudinal data analysis. The data were simulated from a longitudinal study with two groups and three assessments, under both complete and missing data. For space considerations, we only report results for three sample sizes, n_1 = n_2 = 50, 100, and 300, representing small, moderate and large sample sizes, respectively. All simulations were performed with a Monte Carlo (MC) sample size of 1,000.
Shown in Table 1 are the UGEE and UWGEE estimates of θ, along with standard errors and type I error rates for the complete and missing data cases, based on 1,000 MC replications. For missing data under MAR, we used (a) the FRM in (15) with inference based on the UWGEE in (25) and Theorem 2, and (b) the FRM in (29) for jointly modeling the outcomes y and the missing data indicators r, with inference based on the UWGEE in (25) but with G_i, ?_i, f_i and h_i redefined as in (30). Since the results were quite similar, only those from the latter approach are reported. Likewise, only estimates of θ are shown in the table, as they are of primary interest. The results from the logistic regression in (31) for the missing data were quite close to the true values set for the simulation.
As seen, both the UGEE and UWGEE estimates of θ were quite accurate, even for the small sample size n_k = 50. The standard errors showed a steady decrease as n_k increased. Also, the standard errors were slightly larger for the UWGEE estimates because of the loss of information due to missing data. The type I error rates based on the Wald statistic showed a small upward bias for the small sample size n_k = 50, which is typical of the anti-conservative behavior of this statistic [9,18-22], but the bias disappeared at the larger sample sizes n_k = 100 and 300.
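The logic of such a Monte Carlo check of the type I error rate can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the authors' procedure: it uses the cross-sectional two-group MWW test at a single time point with complete data, the usual large-sample null variance for continuous outcomes rather than the UWGEE-based Wald statistic, and an arbitrarily chosen lognormal outcome. All function names, the seed, and the data-generating distribution are assumptions made for illustration.

```python
import math
import random


def theta_hat(y1, y2):
    """Estimate P(y1 > y2) + 0.5 * P(y1 = y2) by averaging over all pairs."""
    total = 0.0
    for a in y1:
        for b in y2:
            if a > b:
                total += 1.0
            elif a == b:
                total += 0.5
    return total / (len(y1) * len(y2))


def mww_rejects(y1, y2, z_crit=1.959964):
    """Two-sided test of theta = 0.5 using the null variance (no ties assumed)."""
    n1, n2 = len(y1), len(y2)
    null_se = math.sqrt((n1 + n2 + 1) / (12.0 * n1 * n2))
    z = (theta_hat(y1, y2) - 0.5) / null_se
    return abs(z) > z_crit


def type1_error(n_per_group, n_reps=1000, seed=2014):
    """Share of replications rejecting when both groups share one distribution."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        y1 = [rng.lognormvariate(0.0, 1.0) for _ in range(n_per_group)]
        y2 = [rng.lognormvariate(0.0, 1.0) for _ in range(n_per_group)]
        rejections += mww_rejects(y1, y2)
    return rejections / n_reps


rate_50 = type1_error(50)
print("Estimated type I error rate at n = 50 per group:", rate_50)
```

With 1,000 replications, the Monte Carlo standard error of an estimated rate near 0.05 is about 0.007, so deviations of one or two percentage points from the nominal level are within simulation noise.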
Real study
Teenage girls in low-income urban settings are at elevated risk for HIV, sexually transmitted infections (STIs), and unintended pregnancies. A randomized controlled trial was recently conducted to evaluate the efficacy of a sexual risk-reduction intervention, supplemented with post-intervention booster sessions, targeting low-income, urban, sexually active teenage girls [1]. The study recruited sexually active urban adolescent girls aged 15-19 from Rochester, New York, a mid-size, northeastern U.S. city, and randomized them to a theory-based sexual risk-reduction intervention or to a structurally equivalent health promotion control group. Assessments and behavioral data were collected at baseline, and again at 3 and 6 months post-intervention. The primary interest of the study is to compare the frequency of unprotected vaginal sex between the intervention and control conditions. More details about the demographic characteristics of the study population, the treatment conditions and the assessment battery can be found in [1].
As mentioned in Section 1, a difficult problem with the data is the extremely large values that some subjects reported for their sexual activities. For example, seven subjects reported over 100 episodes of unprotected vaginal sex over the past 3 months at the 3-month follow-up, with the largest value being 1,000,000. A common approach to this issue in psychosocial research is to trim such outliers using ad hoc rules, such as setting them at 3 times the standard deviation of the outcome [19,1]. However, these methods induce artifacts, because the results depend on the specific rule and the subjective criteria used. Rank-based approaches such as the proposed FRM model address this issue in a much more objective fashion.
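The dependence of such trimming rules on the outlier itself can be seen in a small Python sketch. The data below are hypothetical, and the rule is read here as capping values at the sample mean plus 3 standard deviations, which is only one possible interpretation of the ad hoc rule described above.

```python
import statistics


def cap_at_3sd(values):
    """Cap values at mean + 3 SD (an assumed reading of the ad hoc rule)."""
    cutoff = statistics.mean(values) + 3 * statistics.stdev(values)
    return [min(v, cutoff) for v in values], cutoff


def rank_of_max(values):
    """1-based position of the largest value in the sorted sample."""
    return sorted(values).index(max(values)) + 1


# Hypothetical counts for 19 subjects, plus one extreme report in two versions.
base = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 8, 10, 12, 15, 20, 30, 40]
with_450 = base + [450]
with_million = base + [1_000_000]

capped_450, cutoff_450 = cap_at_3sd(with_450)
capped_million, cutoff_million = cap_at_3sd(with_million)

print("cutoff with 450:      ", round(cutoff_450, 1))
print("cutoff with 1,000,000:", round(cutoff_million, 1))
print("mean after capping:   ",
      round(statistics.mean(capped_450), 1),
      round(statistics.mean(capped_million), 1))
print("rank of largest value:", rank_of_max(with_450), rank_of_max(with_million))
```

In this toy sample the cutoff is computed from a mean and standard deviation that the extreme value has already inflated, so the capped value is a few hundred in one case and several hundred thousand in the other, and the trimmed means differ accordingly. The rank of the extreme observation is 20 in both cases.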
DISCUSSION
In this paper, we extended the classic Mann-Whitney-Wilcoxon (MWW) test for multi-group comparison within a longitudinal data setting. We achieved this generalization by utilizing the functional response models (FRM), which are uniquely positioned to model rank-based outcomes as in the MWW rank sum test within our context. Inference is based on the U-statistics based weighted generalized estimating equations, which provide consistent and asymptotically normal estimates not only for complete data but also for missing data under MAR, the most common missing data mechanism in real studies [3,25,26].
We examined the performance of the proposed approach through both simulated and real study data. Results from the simulation study show that the proposed approach performed well, with good parameter estimates and type I error rates even for a sample as small as 50 per group. The proposed approach applies to both continuous and discrete outcomes. As demonstrated by the real study on sexual health, it handled ties well, as the number of episodes of unprotected vaginal sex is an intrinsically discrete outcome.
In addition to the MWW test, median regression may also be used to address the outlier issue arising from the sexual health study [27,28]. However, such methods may not work well, since they either do not address missing data in longitudinal outcomes or require a unique median. Given that discrete outcomes typically do not have a unique median and that MAR is common in real studies, applications of such methods in practice are very limited.
We performed all the simulation and real data analyses using a program we developed in MATLAB. Readers interested in applying the methods can download this program from “CTSpedia.org”, a popular reference and resource website as well as a repository of statistical and utility macros to facilitate and promote multidisciplinary interactions and collaborations involving biostatisticians.
The proposed approach also has limitations. For example, it cannot control for covariates, which is particularly important for observational studies. Work is underway to further extend the Mann-Whitney-Wilcoxon test to a regression setting.
ACKNOWLEDGEMENT
This research was supported in part by grant R33 DA027521 from the National Institutes of Health, and by the University of Rochester CTSA award UL1TR000042 from the National Center for Advancing Translational Sciences of the National Institutes of Health.
REFERENCES
3. Kowalski J, Tu XM. Modern Applied U Statistics. Wiley: New York. 2007; 1-378.
5. Serfling RJ. Approximation Theorems of Mathematical Statistics. Wiley: New York. 1980.
16. Tang W, He H, Tu XM. Applied Categorical Data Analysis. Chapman & Hall/CRC. 2012.
17. MathWorks Inc. MATLAB version 7.12.
19. Randles RH, Wolfe DA. Introduction to the Theory of Nonparametric Statistics. Wiley: New York. 1979.
25. Little RJA, Rubin DB. Statistical Analysis with Missing Data. 2nd Edn. Wiley: New York. 1987.