
Extending the Mann-Whitney-Wilcoxon Rank Sum Test for Multiple Treatment Groups and Longitudinal Study Data

Research Article | Open Access

  • 1. Department of Biostatistics and Computational Biology, University of Rochester, USA
Corresponding Authors
Xin Tu, Department of Biostatistics and Computational Biology, University of Rochester, 601 Elmwood Ave., Box 630, CTSB 4.239, Rochester, NY 14642, USA, Tel: 585 275-0413; Fax: 585 273-1031
Abstract

Popular models for longitudinal data analysis with continuous outcomes, such as linear mixed-effects models (GLMM) and weighted generalized estimating equations (WGEE), lack robustness in the presence of outliers. For example, in a study to evaluate the efficacy of a sexual risk-reduction intervention for sexually active teenage girls in low-income urban settings, some adolescent girls reported very large numbers, such as 450 and even 1,000,000, for their episodes of unprotected vaginal sex over a three-month period. Although answers like these are clearly not legitimate values of the outcome, they do indicate an extremely high level of sexual activity among these girls and thus should not be completely ignored. However, the mean-based GLMM and WGEE cannot deal with this type of “outliers”, because the sample mean is sensitive to values of extremely large magnitude. Rank-based methods such as the popular Mann-Whitney-Wilcoxon (MWW) rank sum test are more effective alternatives for addressing such outliers. Unfortunately, available methods for inference are limited to cross-sectional data and cannot be applied to longitudinal studies, especially in the presence of missing data.

In this paper, we propose to extend the MWW test to compare multiple groups within a longitudinal data setting by utilizing the functional response models. Inference is based on a class of U-statistics based weighted generalized estimating equations, which provide consistent and asymptotically normal estimates not only for complete data but also for missing data under the missing at random (MAR) assumption, the most popular missing data mechanism in real studies. The approach is illustrated with data from both real and simulated studies.

Citation

Chen R, Wu P, Ma F, Han Y, Chen T, et al. (2014) Extending the Mann-Whitney-Wilcoxon Rank Sum Test for Multiple Treatment Groups and Longitudinal Study Data. Clin Res HIV/AIDS 1(1): 1005.

Keywords

•    Functional response models
•    Missing data
•    Outliers
•    Sexual health
•    U-statistics based weighted generalized estimating equations

INTRODUCTION

Popular models for longitudinal data analysis with continuous outcomes, such as linear mixed-effects models (GLMM) and weighted generalized estimating equations (WGEE), lack robustness in the presence of outliers. For example, in a study to evaluate the efficacy of a sexual risk-reduction intervention for sexually active teenage girls in low-income urban settings, a group at elevated risk for HIV, some adolescent girls reported very large numbers, such as 450 and even 1,000,000, for their episodes of unprotected vaginal sex over a three-month period [1]. Although answers like these are clearly not legitimate values of the outcome, they do indicate an extremely high level of sexual activity among these girls, as compared to the rest of the study sample, and should not be removed from the analysis. However, the mean-based GLMM and WGEE cannot deal with this type of “outliers”, because the sample mean is sensitive to large values. Rank-based methods such as the popular Mann-Whitney-Wilcoxon (MWW) rank sum test, on the other hand, are more effective in addressing such outliers. However, available methods for inference are limited to cross-sectional data and cannot be applied to longitudinal data, especially in the presence of missing data. In this paper, we address this issue by extending the MWW test to a longitudinal data and multi-group setting within the framework of the functional response models (FRM). Inference for the FRM-based model is achieved by a class of U-statistics based weighted generalized estimating equations (UWGEE). The approach is illustrated with a real data application in sexual health research, as well as with simulated data to study the behavior of the estimate for small to moderate sample sizes.

MULTI-SAMPLE MANN-WHITNEY-WILCOXON TESTS

We first briefly review the classic Mann-Whitney-Wilcoxon rank sum test for between-group difference. We then discuss limitations of existing modeling paradigms to extend it for multi-group comparison within a longitudinal data setting and how the functional response model overcomes such difficulties to achieve the needed generalization.

The Mann-Whitney-Wilcoxon rank sum test

Consider two independent samples of size n_k and let y_ki be some continuous outcome from the i-th subject within the k-th group (1 ≤ i ≤ n_k, k = 1, 2). Let R_ki denote the rank of y_ki in the pooled sample. The Wilcoxon rank sum statistic has the following form [2,3]:
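In its standard textbook form (stated here as background, in notation that may differ from the authors'), the rank sum statistic is the sum of the pooled-sample ranks of the first group, W = R_11 + ... + R_1n_1. When there are no ties, it is linked to the Mann-Whitney count U, the number of pairs (i, j) with y_2j < y_1i, through W = U + n_1(n_1 + 1)/2. The minimal sketch below checks this identity on toy data; the function names `rank_sum` and `mann_whitney_count` and the numbers are illustrative assumptions, not part of the paper.

```python
def rank_sum(y1, y2):
    """Wilcoxon rank sum: sum of the pooled-sample ranks of group 1 (no ties assumed)."""
    pooled = sorted(y1 + y2)
    rank = {value: r for r, value in enumerate(pooled, start=1)}
    return sum(rank[value] for value in y1)


def mann_whitney_count(y1, y2):
    """Mann-Whitney count: number of pairs (i, j) with y2[j] < y1[i]."""
    return sum(1 for a in y1 for b in y2 if b < a)


# Toy continuous outcomes (hypothetical values, no ties).
y1 = [3.1, 7.4, 9.0]
y2 = [1.2, 4.5, 5.8, 8.3]

n1 = len(y1)
w = rank_sum(y1, y2)
u = mann_whitney_count(y1, y2)

# The two statistics differ only by the constant n1 * (n1 + 1) / 2.
print(w, u, w == u + n1 * (n1 + 1) // 2)
```

Because the two statistics differ only by a constant, a test based on W is equivalent to one based on U, which is what allows the rank sum test to be expressed through pairwise comparisons of subjects from different groups.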

APPLICATIONS

We demonstrate our considerations with both simulated and real data. We first investigate the performance of the proposed approach by simulation and then present an application to a real study on sexual health for a group of teenage girls in low-income urban settings who were at elevated risk for HIV, sexually transmitted infections (STIs), and unintended pregnancies. In all the examples, we applied the second approach for inference as discussed in Section 3.2 and set the statistical significance level at α = 0.05. All analyses were carried out using code developed by the authors to implement the models considered in the Matlab software [17].

Simulation study

We conducted a simulation study to examine the performance of the proposed FRM-based multi-sample Mann-Whitney-Wilcoxon model for longitudinal data analysis. The data were simulated from a longitudinal study with two groups and three assessments under both complete and missing data. For space considerations, we only report results for three sample sizes, n_1 = n_2 = 50, 100, and 300, representing small, moderate and large sample sizes, respectively. All simulations were performed with a Monte Carlo (MC) sample of 1,000.


Shown in Table 1 are the UGEE and UWGEE estimates of θ, along with standard errors and type I error rates for the complete and missing data cases based on 1,000 MC replications. For missing data under MAR, we used (a) the FRM in (15) with inference based on the UWGEE in (25) and Theorem 2, and (b) the FRM in (29) for jointly modeling y_ki(t−1) and r_kit, with inference based on the UWGEE in (25) but with Gi, ?i, fi and hi redefined as in (30). Since the results were quite similar, only those from the latter approach are reported. Likewise, only estimates of θ are shown in the table, as they are of primary interest. The results from the logistic regression in (31) for the missing data were quite close to the true values set for the simulation.

As seen, both the UGEE and UWGEE estimates of θ were quite accurate, even for the small sample size n_k = 50. The standard errors showed a steady decrease as n_k increased. Also, the corresponding standard errors were slightly larger for the UWGEE estimates because of the loss of information due to missing data. The type I error rates based on the Wald statistic showed a small upward bias for the small sample size n_k = 50, which is typical of the anti-conservative behavior of this statistic [9,18-22], but the bias disappeared at the larger sample sizes n_k = 100 and 300.
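To make the notion of a Monte Carlo type I error rate for a Wald statistic concrete, the following is a minimal sketch under strong simplifying assumptions: a single assessment, no missing data, and standard normal outcomes in both groups. It is not the authors' UGEE/UWGEE procedure and does not reproduce their simulation settings. It estimates θ = P(y_1 < y_2) by a two-sample U-statistic, uses a standard structural-component variance estimate (an assumption chosen for brevity), and counts how often the Wald test rejects θ = 0.5 when the null hypothesis is true. All function names are hypothetical.

```python
import random
import statistics


def theta_hat_and_se(y1, y2):
    """U-statistic estimate of P(y1 < y2), ties counted as 1/2, with a standard error."""
    n1, n2 = len(y1), len(y2)
    # Kernel h(a, b) evaluated on every between-group pair.
    h = [[1.0 if a < b else 0.5 if a == b else 0.0 for b in y2] for a in y1]
    v1 = [sum(row) / n2 for row in h]                               # per-subject means, group 1
    v2 = [sum(h[i][j] for i in range(n1)) / n1 for j in range(n2)]  # per-subject means, group 2
    theta = sum(v1) / n1
    var = statistics.variance(v1) / n1 + statistics.variance(v2) / n2
    return theta, var ** 0.5


def type1_error(n, reps, seed=0):
    """Share of null replications in which the Wald test of theta = 0.5 rejects at the 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        y1 = [rng.gauss(0, 1) for _ in range(n)]
        y2 = [rng.gauss(0, 1) for _ in range(n)]
        theta, se = theta_hat_and_se(y1, y2)
        if se > 0 and abs(theta - 0.5) / se > 1.96:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print(type1_error(n=50, reps=1000))
```

A rejection rate close to 0.05 indicates that the test holds its nominal level; values noticeably above 0.05 at small n would correspond to the anti-conservative behavior described above.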

Real study

Teenage girls in low-income urban settings are at elevated risk for HIV, sexually transmitted infections (STIs), and unintended pregnancies. A randomized controlled trial was recently conducted to evaluate the efficacy of a sexual risk-reduction intervention, supplemented with post-intervention booster sessions, targeting low-income, urban, sexually active teenage girls [1]. The study recruited sexually active urban adolescent girls aged 15-19 from Rochester, New York, a mid-size, northeastern U.S. city, and randomized them to a theory-based sexual risk-reduction intervention or to a structurally equivalent health promotion control group. Assessments and behavioral data were collected at baseline, and again at 3 and 6 months post-intervention. The primary interest of the study is to compare the frequency of unprotected vaginal sex between the intervention and control conditions. More details about the demographic characteristics of the study population, the treatment conditions and the assessment battery can be found in [1].

As mentioned in Section 1, a difficult problem with the data is the extremely large values some subjects reported for their sexual activities. For example, seven subjects reported over 100 episodes of unprotected vaginal sex over the past 3 months at the 3-month follow-up, with the largest value being 1,000,000. A common approach to this issue in psychosocial research is to trim such outliers using ad hoc rules, such as setting them at 3 times the standard deviation of the outcome [1,19]. However, these methods induce artifacts, because the results depend on the specific rule and the subjective criteria used in each method. Rank-based approaches such as the proposed FRM model address this issue in a much more objective fashion.
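The contrast between mean-based and rank-based summaries can be seen in a small numerical sketch. The data below are hypothetical and are not taken from the study; `mww_theta` is an illustrative helper that estimates P(y_1 < y_2) from all between-group pairs, counting ties as 1/2. Replacing a large report of 450 by 1,000,000 changes the group mean by several orders of magnitude, while the rank-based estimate is unchanged because the ordering of the observations is the same.

```python
import statistics


def mww_theta(y1, y2):
    """Proportion of between-group pairs with y1 < y2, ties counted as 1/2."""
    scores = [1.0 if a < b else 0.5 if a == b else 0.0 for a in y1 for b in y2]
    return sum(scores) / len(scores)


# Hypothetical three-month counts; only the largest control value differs.
intervention = [0, 1, 1, 2, 4, 6, 9]
control = [0, 2, 3, 5, 8, 12, 450]
control_extreme = [0, 2, 3, 5, 8, 12, 1_000_000]

# The sample mean is driven almost entirely by the single extreme report.
print(statistics.mean(control), statistics.mean(control_extreme))

# The rank-based estimate depends only on the ordering, so it does not change.
print(mww_theta(intervention, control), mww_theta(intervention, control_extreme))
```

The extreme report is kept in the analysis and still counts as the largest observation, which is the sense in which a rank-based approach avoids both discarding the value and choosing a trimming rule.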

DISCUSSION

In this paper, we extended the classic Mann-Whitney-Wilcoxon (MWW) test to multi-group comparison within a longitudinal data setting. We achieved this generalization by utilizing the functional response models (FRM), which are uniquely positioned to model rank-based outcomes as in the MWW rank sum test within our context. Inference is based on the U-statistics based weighted generalized estimating equations, which provide consistent and asymptotically normal estimates not only for complete data but also for missing data under MAR, the most popular missing data mechanism in real studies [3,25,26].

We examined the performance of the proposed approach through both simulated and real study data. Results from the simulation study show that the proposed approach performed well, with good parameter estimates and type I error rates even for a sample as small as 50 per group. The proposed approach applies to both continuous and discrete outcomes. As demonstrated by the real study on sexual health, it handled ties well, as the number of episodes of unprotected vaginal sex is an intrinsically discrete outcome.
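For discrete outcomes such as counts, ties are commonly handled by assigning mid-ranks and by counting a tied between-group pair as 1/2. The sketch below illustrates this standard convention on hypothetical count data; it is not the authors' implementation, and the helper names are assumptions. It checks that the mid-rank sum of group 1 still equals the tie-adjusted Mann-Whitney count plus n_1(n_1 + 1)/2.

```python
def midranks(values):
    """Map each distinct value to its mid-rank (average rank) within `values`."""
    ranks = {}
    for v in set(values):
        below = sum(1 for x in values if x < v)
        equal = sum(1 for x in values if x == v)
        ranks[v] = below + (equal + 1) / 2
    return ranks


def midrank_sum(y1, y2):
    """Rank sum of group 1 using mid-ranks in the pooled sample."""
    ranks = midranks(y1 + y2)
    return sum(ranks[v] for v in y1)


def tie_adjusted_count(y1, y2):
    """Between-group pairs with y2 < y1 count 1, tied pairs count 1/2."""
    return sum(1.0 if b < a else 0.5 if b == a else 0.0 for a in y1 for b in y2)


# Hypothetical count outcomes with ties within and between groups.
y1 = [0, 0, 2, 5]
y2 = [0, 1, 2, 2, 7]

n1 = len(y1)
w = midrank_sum(y1, y2)
u = tie_adjusted_count(y1, y2)
print(w, u, w == u + n1 * (n1 + 1) / 2)
```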

In addition to the MWW test, median regression may also be used to address the outlier issue arising from the sexual health study [27,28]. However, these methods may not work well, since they either do not address missing data in longitudinal outcomes or require a unique median. Given that discrete outcomes typically do not have a unique median and MAR is popular in most real studies, applications of such methods in practice are very limited.

We performed all the simulation and real data analyses using a program we developed in Matlab. Readers interested in applying the methods can download this program from “CTSpedia.org”, a popular reference and resource website as well as a repository of statistical and utility macros to facilitate and promote multidisciplinary interactions and collaborations involving biostatisticians.

The proposed approach also has limitations. For example, it cannot control for covariates, which is particularly important for observational studies. Work is currently underway to further extend the Mann-Whitney-Wilcoxon test to a regression setting.

ACKNOWLEDGEMENT

This research was supported in part by grant R33 DA027521 from the National Institutes of Health, and by the University of Rochester CTSA award UL1TR000042 from the National Center for Advancing Translational Sciences of the National Institutes of Health.

REFERENCES

1. Morrison-Beedy D, Jones SH, Xia Y, Tu X, Crean HF, Carey MP. Reducing sexual risk behavior in adolescent girls: results from a randomized controlled trial. J Adolesc Health. 2013; 52: 314-321.

2. Wilcoxon F. Probability tables for individual comparisons by ranking methods. Biometrics. 1947; 3: 119-122.

3. Kowalski J, Tu XM. Modern Applied U Statistics. Wiley: New York. 2007; 1-378.

4. Mann HB, Whitney DR. On a test of whether one of two random variables is stochastically larger than the other. Ann Math Statist. 1947; 18: 50-60. 

5. Serfling RJ. Approximation Theorems of Mathematical Statistics. Wiley: New York. 1980. 

6. Tu XM, Feng C, Kowalski J, Tang W, Wang H, Wan C, et al. Correlation analysis for longitudinal data: applications to HIV and psychosocial research. Stat Med. 2007; 26: 4116-4138.

7. Ma Y, Tang W, Feng C, Tu XM. Inference for kappas for longitudinal study data: applications to sexual health research. Biometrics. 2008; 64: 781-789.

8. Ma Y, Tang W, Tu XM. Modeling Concordance Correlation Coefficient for longitudinal study data. Psychometrika. 2010; 75: 99-119. 

9. Ma Y, Tang W, Tu XM. Modeling Cronbach Coefficient Alpha for longitudinal study data. Statistics in Medicine. 2011; 29: 659-670. 

10. Yu Q, Tang W, Kowalski J, Tu XM. Multivariate U-Statistics: A tutorial with applications. WIREs Computational Statistics. 2011; 3: 457-471. 

11. Gunzler D, Tang W, Lu N, Wu P, Tu XM. A Class of Distribution-Free Models for Longitudinal Mediation Analysis. Psychometrika. 2013.

12. Yu Q, Chen R, Tang W, He H, Gallop R, Crits-Christoph P, et al. Distribution-free models for longitudinal count responses with overdispersion and structural zeros. Stat Med. 2013; 32: 2390-2405.

13. Lu N, White AM, Wu P, He H, Hu J, Feng C, Tu XM. Social network endogeneity and its implications for statistical and causal inferences. In Social Networking: Recent Trends, Emerging Issues and Future Outlook, Lu N, White AM, Tu XM, editors. Nova Science. New York. 2013. 

14. Kowalski J, Powell J. Nonparametric inference for stochastic linear hypotheses: Application to high-dimensional data. Biometrika. 2004; 91: 393-408. 

15. Wu P, Han Y, Chen T, Tu XM. Causal inference for Mann-Whitney-Wilcoxon rank sum and other nonparametric statistics. Stat Med. 2013.

16. Tang W, He H, Tu XM. Applied Categorical Data Analysis. Chapman & Hall/CRC. 2012. 

17. MathWorks Inc. MATLAB version 7.12.

18. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986; 73: 13-22. 

19. Randles RH, Wolfe DA. Introduction to the Theory of Nonparametric Statistics. Wiley: New York. 1979. 

20. Rotnitzky A, Jewell NP. Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data. Biometrika. 1990; 77: 485-497. 

21. Boos DD, Brownie C. A rank-based mixed model approach to multisite clinical trials. Biometrics. 1992; 48: 61-72.

22. Guo X, Pan W, Connett JE, Hannan PJ, French SA. Small-sample performance of the robust score test and its modifications in generalized estimating equations. Stat Med. 2005; 24: 3479-3495.

23. Schroder KE, Carey MP, Vanable PA. Methodological challenges in research on sexual risk behavior: I. Item content, scaling, and data analytical options. Ann Behav Med. 2003; 26: 76-103.

24. Greenberg J, Hennessy M, MacGowan R, Celentano D, Gonzales V, Van Devanter N, et al. Modeling intervention efficacy for high-risk women. The WINGS Project. Eval Health Prof. 2000; 23: 123-148.

25. Little RJA, Rubin DB. Statistical Analysis with Missing Data. 2nd Edn. Wiley: New York. 1987. 

26. Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. JASA. 1995; 90: 106-121. 

27. Yi GY, He W. Median regression models for longitudinal data with dropouts. Biometrics. 2009; 65: 618-625.

28. Yan M, Alejandro GD, Hui Z, Tu XM. A U-statistics-based approach for modeling Cronbach coefficient alpha within a longitudinal data setting. Stat Med. 2010; 29: 659-670.

Received : 19 Dec 2013
Accepted : 29 Jan 2013
Published : 31 Jan 2013