JSM Environmental Science and Ecology

Some Statistical and Research Design Considerations towards Sound Scientific Inquires

Editorial | Open Access

  • 1. ENST Department, University of Maryland, USA
Corresponding Authors
Bahram Momen, ENST Department, University of Maryland, College Park, 20742, USA
STATISTICAL INFERENCE

Scientific endeavors generally involve examination of samples’ characteristics to draw conclusions at broader scales, and this requires statistical analysis. If the parameters (characteristics) of the populations to be compared are known, there is no need for statistical analysis to make an inference. The population parameters of interest may simply be compared to gauge their differences with certainty (ignoring Heisenberg’s uncertainty principle). Although this seems to be a simple concept, searching the web reveals ample questions regarding the appropriate statistical procedures to compare population parameters.

FUNDAMENTAL ASSUMPTION IN MAKING ANY INFERENCE

Statistical analysis requires specific assumptions without which valid inference cannot be made. Some of these assumptions should be strictly examined, followed, and/or enforced, but some may be treated as ‘ideal conditions’. An assumption that should be met strictly before making any inference to a larger scale is ‘independence’ of observational or experimental units within study samples for a desired scope of inference. In observational (mensurative) or experimental (manipulative) studies, sample size or replication, respectively, should indicate the number of independent units. Although there should be no excuse for missing such an important concept, especially after Hurlbert’s article (Ecological Monographs, 54:2, 187-211, 1984) and many follow-up articles, violations of this assumption abound in presentations at scientific meetings and, to a lesser degree, in published articles.

REPLICATION, REPLICATION, REPLICATION

Units within a sample are just numbers for a mathematical statistician. However, they may represent watersheds, communities, hospitals, or units of imaginary populations (those in mind when applying certain treatments), etc., that are hard, expensive, or sometimes impossible to replicate. In these cases, such limitations should be acknowledged, and qualitative, rather than statistical, comparison of the data should be performed. Detecting a lack of independence is almost impossible when analyzing the data without proper information regarding the setup of the research and the way the data were collected. Fortunately, the researcher has full control over ‘independence’ by replicating the units that represent groups (treatments) independently for a given scope of inference. Replication enables calculation of the magnitude of ‘noise’ against which the magnitude of a ‘signal’ is compared. Without replication, the error variance (the denominator of the test statistic) cannot be estimated, prohibiting further calculations. However, minimal replication that merely allows statistical calculation to proceed does not suffice. Depending on the availability of resources, replication should be maximized to increase the precision (reproducibility) of estimates and the power of the test to detect a real effect. Although it may not seem intuitive, increased replication does not, on average, affect the Type I error rate, which is set by the chosen significance level.
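The last two points can be sketched with a small simulation in Python, using only the standard library. This is a minimal sketch under stated assumptions: two groups with normally distributed responses and a known common standard deviation, so that a two-sided z-test applies. The function name `rejection_rate`, the effect size of one standard deviation, and the replicate counts are illustrative choices, not values from any particular study.

```python
import random
from statistics import NormalDist


def rejection_rate(n, true_diff, alpha=0.05, sigma=1.0, n_sim=4000, seed=1):
    """Share of simulated two-group experiments in which a two-sided
    z-test (known sigma, n independent units per group) rejects the null."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n) ** 0.5  # standard error of the difference in means
    rejections = 0
    for _ in range(n_sim):
        mean_a = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        mean_b = sum(rng.gauss(true_diff, sigma) for _ in range(n)) / n
        if abs(mean_b - mean_a) / se > z_crit:
            rejections += 1
    return rejections / n_sim


results = {}
for n in (3, 10, 30):
    type_1 = rejection_rate(n, true_diff=0.0)  # null is true
    power = rejection_rate(n, true_diff=1.0)   # real effect of 1 SD
    results[n] = (type_1, power)
    print(f"n = {n:2d}  Type I rate = {type_1:.3f}  power = {power:.3f}")
```

In runs of this sketch the Type I rate should stay near the nominal 0.05 for every n, while power should rise with replication.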

IDEAL CONDITIONS

While independence should be strictly enforced for a desired scope of inference, other ideal conditions are less crucial, either due to some mathematical concepts or due to the availability of analytical procedures. The role of the Central Limit Theorem (CLT: the distribution of means of samples drawn from non-normal distributions approaches normality with increasing sample size) in alleviating the lack of normality (making ANOVA robust to moderate departures from normality) seems to be underappreciated by researchers, perhaps because normality of ‘what distribution’ is rarely explained. In comparative studies, in which sample means are compared, the focus should be on the characteristics (e.g., normality) of the sampling distribution of the mean (or of mean differences) rather than on the distributions of the parent populations. This may not be mentioned clearly in applied statistics textbooks, and hence, researchers spend much effort examining the normality of the collected data, or more correctly of the ‘residuals’ (treatment-mean adjusted data), using appropriate routines in statistical packages.
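The CLT argument can be illustrated with a short simulation. The sketch below assumes an exponential parent population (strongly right-skewed, skewness 2) and tracks the skewness of the distribution of sample means as the sample size grows; theory gives 2/sqrt(n) for means of n such observations. The function names and sample sizes are illustrative assumptions.

```python
import random


def skewness(values):
    """Sample skewness (third standardized moment); 0 for a symmetric distribution."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def skew_of_sample_means(n, n_samples=5000, seed=2):
    """Skewness of the distribution of means of n exponential observations."""
    rng = random.Random(seed)
    means = [sum(rng.expovariate(1.0) for _ in range(n)) / n
             for _ in range(n_samples)]
    return skewness(means)


skews = {n: skew_of_sample_means(n) for n in (2, 10, 50)}
for n, s in skews.items():
    print(f"n = {n:2d}  skewness of sample means = {s:.2f}")
```

The parent data remain just as skewed at every n; only the distribution of the means moves toward symmetry, which is the distribution that matters when means are compared.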

TEST OF IDEAL CONDITIONS AND DATA TRANSFORMATION

Available tests of normality cannot directly test the normality of the sampling distribution of the mean, as there is usually only one mean for each group (treatment) unless simulated data are used. In addition, all statistical tests within the Frequentist framework are intended to reject the null hypothesis (i.e., they cannot be used to prove the null). In a test of normality, it is hoped that normality will not be rejected; but not rejecting normality does not mean that normality was proved. Finally, rejection of normality is affected greatly by the sample size. Simulations show that normality may be rejected because of a large sample size no matter how trivial (of no practical importance) the departure from normality is, while a severe departure from normality may go undetected because of a small sample size. Undue emphasis on normality and ignoring the CLT may result in ‘data transformation’, which changes the nature of the responses and the magnitudes of variances and P-values, and hence may lead to incorrect conclusions. Moreover, after data transformation, results should be reported and interpreted consistently in the transformed units. However, this is cumbersome, and in many cases the transformed units may not make biological sense. A search of the literature reveals abundant use of unnecessary data transformations and inappropriate reporting of the results.
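The sample-size dependence of normality tests can be sketched as follows. As an assumption made for self-containment, the sketch uses the Jarque-Bera test with its asymptotic chi-square approximation as a stand-in for whichever normality test a statistical package provides; the two simulated populations (`mild` and `severe`) and the sample sizes are hypothetical choices.

```python
import math
import random


def jarque_bera_p(values):
    """Approximate P-value of the Jarque-Bera normality test
    (chi-square with 2 df, whose upper tail is exp(-x / 2))."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    return math.exp(-jb / 2.0)


def rejection_rate(draw, n, n_sim, alpha=0.05, seed=3):
    """Share of simulated samples of size n for which normality is rejected."""
    rng = random.Random(seed)
    rejected = sum(
        jarque_bera_p([draw(rng) for _ in range(n)]) < alpha
        for _ in range(n_sim)
    )
    return rejected / n_sim


def mild(rng):
    # Mostly normal with a slight right skew (skewness about 0.18).
    return rng.gauss(0.0, 1.0) + 0.5 * rng.expovariate(1.0)


def severe(rng):
    # Exponential: strongly right-skewed (skewness 2).
    return rng.expovariate(1.0)


mild_large = rejection_rate(mild, n=5000, n_sim=100)
severe_small = rejection_rate(severe, n=8, n_sim=2000)
print(f"mild departure,   n = 5000: rejected in {mild_large:.0%} of samples")
print(f"severe departure, n = 8:    rejected in {severe_small:.0%} of samples")
```

Under these assumptions the mild departure should be rejected far more often than the severe one, purely because of the difference in sample size.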

Homogeneity of variances (HOV) is another ANOVA ideal condition with which researchers may be too obsessed, although ANOVA is robust to departures from HOV. Moreover, major statistical packages have routines to perform appropriate calculations based on homogeneous or heterogeneous variances. Perhaps a more detrimental issue should be the correlation of variances with the group (treatment) means (as revealed by residual plots), regardless of whether the variances are homogeneous or heterogeneous.
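A numerical counterpart to inspecting a residual plot is to compare the spread of each group with its mean. The sketch below uses simulated data in which the standard deviation is, by construction, 20% of the group mean; the treatment means, the helper `pearson_r`, and the use of standard deviations rather than variances are illustrative assumptions.

```python
import random
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(4)
true_means = [5.0, 10.0, 20.0, 40.0, 80.0]  # hypothetical treatment means
# Simulated responses whose SD is 20% of the group mean, 10 replicates each.
groups = [[rng.gauss(mu, 0.2 * mu) for _ in range(10)] for mu in true_means]

group_means = [mean(g) for g in groups]
group_sds = [stdev(g) for g in groups]
r = pearson_r(group_means, group_sds)

for m, s in zip(group_means, group_sds):
    print(f"mean = {m:6.2f}  SD = {s:5.2f}")
print(f"correlation between group means and SDs: r = {r:.2f}")
```

A strong positive correlation of this kind is the pattern that would appear as a funnel shape in a plot of residuals against fitted values.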

CAUSATION OR CORRELATION

Causation versus correlation is another important consideration that has been discussed and addressed abundantly, but seems to be neglected frequently. It should be noted that causality can only be established through controlled experiments in which the researcher controls all variables, whether held constant or varied. Of course, the results of observational studies revealing correlations are of great value and can be used by an informed researcher in the field to suggest causation. Researchers may be aware of the issue but still use vague or inappropriate terminology (e.g., effect, in response to, a function of, related to, associated with, correlated with, etc., used interchangeably) to convey the results.

RESEARCH SETTING AND STATISTICAL ANALYSIS

Statistical analysis and experimental design concepts are linked closely, but differ sufficiently to warrant due attention. Since many educational programs offer only statistical analysis courses, or offer them prior to experimental design courses, many students with limited time and course requirements may take only one statistical analysis course, hoping to do justice to their research and publications. However, it is the design of the experiment that governs the validity of the research and its results. The experimental design or observational approach not only involves preliminary steps such as setting objectives and scientific hypotheses (those that are falsifiable), realistic treatment levels and structure, use of covariates, etc., but also guides the appropriate statistical analysis. The simplest case is the choice between a two-independent-sample and a paired t-test, which depends on the experimental approach and the way the data were collected. Therefore, knowledge of research methods and experimental design is vital for conducting sound research and for successful publication.
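The consequence of choosing between the two t-tests can be sketched with simulated data. The sketch assumes hypothetical units that differ widely at baseline and are each measured before and after a treatment; all numbers are invented for illustration. Only the test statistics are computed, since the point is how the design determines the error term.

```python
import random
from statistics import mean, stdev, variance

rng = random.Random(5)
n = 12
# Hypothetical units that differ widely at baseline (SD 10) and all respond
# to treatment by about +1 unit, with small measurement noise (SD 0.5).
baseline = [rng.gauss(50.0, 10.0) for _ in range(n)]
before = [b + rng.gauss(0.0, 0.5) for b in baseline]
after = [b + 1.0 + rng.gauss(0.0, 0.5) for b in baseline]

# Independent two-sample t: unit-to-unit variation stays in the error term.
pooled_var = (variance(before) + variance(after)) / 2  # equal group sizes
t_independent = (mean(after) - mean(before)) / (pooled_var * 2 / n) ** 0.5

# Paired t: differencing within units removes the unit-to-unit variation.
diffs = [a - b for a, b in zip(after, before)]
t_paired = mean(diffs) / (stdev(diffs) / n ** 0.5)

print(f"independent t = {t_independent:.2f} (df = {2 * n - 2})")
print(f"paired t      = {t_paired:.2f} (df = {n - 1})")
```

Both statistics share the same numerator; they differ only in the noise against which it is compared. The paired analysis is valid only if the data were in fact collected as pairs, which is a matter of design rather than of software.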

P-VALUE

I remember having a hard time linking the concept of the probability of an event, such as a predicted 5% chance of rain during an important soccer game, with the fact that the event actually occurred (100%). Subsequently, it was the concept of conditional probability that became challenging. But perhaps it was not just me; Capt. Yossarian (a character in Catch-22 by J. Heller) might also have been surprised to find 99% of his comrades sick where he was recovering! Yet more challenging became the concept, use, and level of the P-value usually reported in scientific literature to declare statistical significance (probability of the Type I error). But again, it seems that it is not just me, as an Editorial in Nature Medicine (11: 1, 2005) acknowledged evidence that the authors of 31% of articles published by Nature Medicine in 2000 misunderstood the meaning of the P-value.

There are more issues related to the use of the P-value beyond misunderstanding its meaning. The P-value routinely calculated by statistical packages and reported in scientific articles is usually intended to show the probability of the Type I error. This P-value does not indicate the probability of the existence of an effect or the lack of it; rather, it is a conditional probability (i.e., the probability of the observed results or more extreme ones, given that the null hypothesis of no effect is true). The null is usually rejected if the P-value is less than, or equal to, a stated significance level (e.g., 0.01, 0.05, or 0.10), depending on the researchers’ liking or the publication venue. While it has been argued that there is nothing magic about these levels, their importance is emphasized to eliminate subjectivity. Of course, these levels were themselves selected subjectively, when no computer program was available, to obviate printing thousands of pages of tables of critical values for a given range of P. Faced with choosing such levels, and assuming that positive results have a greater chance of being published (albeit incorrectly), researchers may misuse statistical analysis, even unknowingly, to achieve a desired P-value and declare a ‘significant’ effect. However, current statistical packages report the exact P-value, obviating the use of such tables. Therefore, it may just be prudent to allow the scientist, who is conducting the research and is familiar with both the field of study and the research limitations such as sample size, to decide on the ‘existence’ of the effect s/he observes and just report the P-value (whatever it may be) for the reader to make her/his own decision. This would perhaps satisfy the suggestion by Higgs (American Scientist, 101:1-9, 2013) to abandon the term ‘significance’ in scientific literature.
Considering the above and allowing the publication of negative findings could substantially decrease the misuse and abuse of statistics and lead to clearer reporting of research protocols and findings.
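The conditional nature of the P-value described in this section can be made concrete with an exact permutation test, used here as an illustrative sketch rather than as the procedure any particular package applies. If the null is true, the group labels are arbitrary, so the P-value is simply the share of all possible relabelings that give a result at least as extreme as the one observed. The function name and the measurements are hypothetical.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation P-value for a difference in means:
    the share of all relabelings of the pooled data whose absolute mean
    difference is at least as large as the one observed."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    total = sum(pooled)

    def abs_diff(sum_a):
        mean_a = sum_a / n_a
        mean_b = (total - sum_a) / (len(pooled) - n_a)
        return abs(mean_a - mean_b)

    observed = abs_diff(sum(group_a))
    count = extreme = 0
    for idx in combinations(range(len(pooled)), n_a):
        count += 1
        # Small tolerance guards against floating-point ties.
        if abs_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            extreme += 1
    return extreme / count


# Hypothetical measurements, for illustration only.
control = [4.1, 5.0, 4.7, 5.3, 4.4]
treated = [5.6, 6.1, 5.2, 6.4, 5.9]
p = permutation_p_value(control, treated)
print(f"P(result at least this extreme | null true) = {p:.4f}")
```

Nothing in this calculation refers to the probability that an effect exists; every relabeling is generated under the assumption that it does not.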

 

TYPE I AND II ERRORS

Traditionally, researchers have focused on protecting against the Type I error and on reporting results based on it. However, in many fields where there is a risk involved if a real effect is not detected, the Type II error should be emphasized. For a given sample size and effect size, the Type I and II errors trade off against each other: protecting against and decreasing the Type I error (by decreasing the significance level) increases the Type II error. Consequently, all the strategies to protect against the Type I error (including the use of conservative multiple mean comparison tests) result in increased Type II errors (and thus decreased power to detect a real effect). A simple example of which type of error to emphasize, through the use of conservative or sensitive tests, is the choice of an ‘alarm system’ for a cheap car in a wealthy neighborhood versus one for an airport. A powerful (sensitive, liberal) alarm system that triggers frequent ‘false positives’ (Type I errors) in a safe neighborhood may not be needed for a cheap car that can be replaced without much harm. However, we all favor a very sensitive alarm system at an airport, one that screams at any penny in our pocket (increased false positives), in the hope of increasing the probability of detecting a forbidden item when there is one, and hence decreasing false negatives or Type II errors (i.e., increasing power).
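The trade-off can be computed directly in a simplified setting. The sketch below assumes a two-sided, two-sample z-test with a known common standard deviation, a fixed true difference of one standard deviation, and 10 independent units per group; the function name `type_2_error` and these values are illustrative assumptions.

```python
from statistics import NormalDist


def type_2_error(alpha, effect, n, sigma=1.0):
    """Type II error rate (beta) of a two-sided, two-sample z-test with
    n independent units per group and a known common SD."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect / (sigma * (2 / n) ** 0.5)  # true difference in SE units
    power = (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)
    return 1 - power


betas = {alpha: type_2_error(alpha, effect=1.0, n=10)
         for alpha in (0.10, 0.05, 0.01)}
for alpha, beta in betas.items():
    print(f"alpha = {alpha:.2f}  beta = {beta:.3f}  power = {1 - beta:.3f}")
```

With the sample size and effect held fixed, each tightening of the significance level raises beta; only more replication or a larger effect can lower both error rates at once.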

FURTHER CONSIDERATIONS

Discussion of linear, linearized, fixed, random, mixed, and nonlinear models, as well as the choice between Frequentist and Bayesian statistics, warrants further and much more detailed attention. If this note leaves the reader with many questions still remaining, it has served a purpose. I frequently hear fellow faculty members and research scientists ask why their students, who have taken a course in statistical analysis, are unable to be statistically independent. I hope one of these days I can convince them that the field of statistics and experimental design is so broad that no one course, or even several courses, can make anyone statistically independent. To this, I might add the complexities involved in learning advanced statistical packages and their appropriate routines.

Citation

Momen B (2013) Some Statistical and Research Design Considerations towards Sound Scientific Inquires. JSM Environ Sci Ecol 1(1): 1003.

Received : 09 Aug 2013
Accepted : 30 Aug 2013
Published : 02 Sep 2013