JSM Clinical Pharmaceutics

“Everything Out” Validation Approach for QSAR Models of Chemical Mixtures

Research Article | Open Access | Volume 1 | Issue 1

  • 1. Department of Molecular Structure and Cheminformatics, AV Bogatsky Physical Chemical Institute, Ukraine
  • 2. Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, University of North Carolina, USA
  • 3. Department of Chemical-Technological, Odessa National Polytechnic University, Ukraine
  • 4. Institute of Computer Systems, Odessa National Polytechnic University, Ukraine
Corresponding Authors
Eugene N Muratov and Alexander Tropsha, Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Beard Hall 301, CB#7568, Chapel Hill, NC, 27599, USA, Tel: +19199663459; Fax: +19199660204
Abstract

Established strategies for validating QSAR models of binary mixtures of chemicals are not applicable to the most challenging case, which is the prediction of binary mixtures created by two compounds not present in the initial training set. In this study, we have addressed this challenge by introducing the “Everything Out” validation strategy where the external sets are deliberately formed by all binary combinations of two compounds excluded from the training set. The model accuracy is evaluated by the error of prediction for the external sets. We show that the “Everything Out” approach affords lower error of prediction for binary mixtures formed by two new compounds and similar error of prediction for mixtures with one new compound as compared to the alternative “Compound Out” validation approach. We posit that “Everything Out” should be employed as the preferred approach to validating QSAR models of binary mixtures.

Keywords

•    External validation
•    Molecular modeling
•    Structure-activity relationship
•    QSAR of mixtures

Citation

Muratov EN, Varlamova EV, Kuzmin VE, Artemenko AG, Muratov NN, et al. (2014) “Everything Out” Validation Approach for QSAR Models of Chemical Mixtures. J Clin Pharm 1(1): 1005.

INTRODUCTION

Chemical mixtures are widely used in pharmaceutical, agricultural, and cosmetic products. The experimental safety testing of individual environmental organic compounds presents significant challenges, as illustrated by extensive and costly projects such as REACH [1] or TOX21 [2]. These experimental challenges are dramatically exacerbated in the case of chemical mixtures due to the complexity of their compositions, including molecular diversity and the relative ratios of their components. Nevertheless, since manufactured compounds rarely enter the environment independently, the evaluation of potential health and safety impacts of compound mixtures is perhaps even more critical than that of individual chemicals.

Computational approaches such as cheminformatics, especially QSAR modeling, may provide an effective alternative to experimental methods to reduce time and cost of developing new mixture formulations with the desired properties and safety profiles [3]. However, although modern QSAR methodology is fairly successful in dealing with individual compounds, there are no mature, well-established approaches that could be directly used to model properties of mixtures. This is mostly due to the absence (or lack) of reliable experimental data on mixture properties, adequate descriptors of mixtures, and robust strategies for the external validation of developed models. To the best of our knowledge, the issue of rigorous QSAR modeling of mixtures has been addressed only in a few publications [3–5].

Rigorous external validation is an integral part of any QSAR exercise, irrespective of the nature of the chemical objects under investigation [5]. However, proper external validation of QSAR models for mixtures is much less straightforward than in traditional QSAR analysis [3]. Here, the conventional external cross-validation procedure [6], in which individual compounds are randomly placed in the external set (or fold) so that no information about an excluded compound remains in the training set, is not rigorous enough. The reason is that in a traditional QSAR application each entry in a dataset is a single compound, whereas a mixture consists of at least two compounds that can be blended in different ratios, i.e., each mixture can be represented by several entries. In traditional QSAR, placing several randomly selected compounds into the external set results in the complete absence of structural information about these compounds in the training set. In QSAR modeling of mixtures, by contrast, randomly placing individual entries into the external set leaves information about the corresponding mixture available, because other entries for the same mixture with different ratios of the same components remain in the training set. As a result, the model’s predictive performance will be overestimated. These considerations prompted us to devise more rigorous protocols for external validation of QSAR models of mixtures [4,5].

Previously, we introduced three strategies for external validation, depending on the initial data and the intended application of the developed models [5]: (i) “Points Out” – prediction of the investigated property for any composition of the mixtures from the modeling set; (ii) “Mixtures Out” – filling in missing data in the initial mixtures’ data matrix (i.e., prediction of the investigated property for mixtures with unknown activity created by pure compounds from the modeling set); and (iii) “Compounds Out” – prediction of the investigated property for mixtures formed by a novel pure compound that was absent in the modeling set. These strategies address the situations of predicting new mixtures created by (i) two compounds from the modeling set and (ii) a new compound and a compound from the existing matrix of mixtures. However, the most interesting and most difficult case, evaluating the model accuracy for predicting a mixture created by two new compounds, remains unaddressed.
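The cases listed above can be told apart by checking how much of an external mixture is already known to the modeling set. The following Python sketch is only an illustration under stated assumptions: the function name validation_case, the representation of a mixture as a pair of compound identifiers, and the label "two new compounds" are hypothetical and are not taken from the original workflow.

```python
def validation_case(mixture, training_mixtures, training_compounds):
    """Name the validation case that an external mixture corresponds to.

    mixture: pair of compound identifiers (order does not matter).
    training_mixtures: set of frozensets, one per mixture in the modeling set.
    training_compounds: set of compound identifiers in the modeling set.
    All names here are hypothetical and used for illustration only.
    """
    c1, c2 = mixture
    if frozenset(mixture) in training_mixtures:
        # Same mixture is known; only the composition (ratio) is new.
        return "Points Out"
    n_new = (c1 not in training_compounds) + (c2 not in training_compounds)
    if n_new == 0:
        # Both compounds are known, but this combination is not.
        return "Mixtures Out"
    if n_new == 1:
        return "Compounds Out"
    # Neither compound is known: the case not covered by the three strategies.
    return "two new compounds"


training_mixtures = {frozenset(("A", "B")), frozenset(("B", "C"))}
training_compounds = {"A", "B", "C"}
cases = [validation_case(m, training_mixtures, training_compounds)
         for m in [("B", "A"), ("A", "C"), ("A", "X"), ("X", "Y")]]
```

Storing each mixture as a frozenset makes the lookup independent of the order in which the two components are listed.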

The goal of this study is to introduce the “Everything Out” validation strategy for QSAR modeling of mixtures. This procedure simulates the addition of novel compounds to the existing matrix of mixtures and gives a reasonable estimate of the expected error of prediction for mixtures created by two new compounds that were absent in the modeling set. Although the error of prediction for this strategy is expected to be the largest, QSAR models passing “Everything Out” validation should be able to predict the investigated property for mixtures created by compounds outside of the modeling set, provided that the models’ applicability domain is taken into account. Thus, we posit that “Everything Out” is the most rigorous method for external validation of QSAR models of mixtures.

MATERIALS AND METHODS

“Everything Out” validation strategy

Following this new strategy, the data matrix of mixtures is divided into three parts (see Figure 1A).

Figure 1: "Everything Out" strategy of external validation in QSAR modeling of mixtures.

For example, consider a completely filled matrix of mixtures created by 10 compounds. The first part (compounds C1-C5 and all of their binary mixtures) is used as a training set; the second part (compounds C6-C10 and all of their mutual mixtures) is used as the “Everything Out” external set; and the remaining part is employed as the “Compounds Out” external set [5]. It is important to stress that mixtures created by compounds from the same group belong either to the “training” part or to the “Everything Out” part, never to both simultaneously; mixtures created by compounds from different groups form the “Compounds Out” part (see Figure 1). Then the “training” and “Everything Out” parts are switched, i.e., the “training” part becomes the “Everything Out” part and vice versa (Figure 1B). The “Compounds Out” part remains the same for both folds. Thus, every mixture in the “Everything Out” set is always created by two compounds that are absent from the training set. If the mixture matrix is completely filled, the compounds that form this matrix can be sorted randomly or alphabetically. In the case of a sparse data matrix, supervised selection of training and test sets is needed to keep the sizes of the sets roughly equal. However, even with supervised fold creation, some “Everything Out” and “Compounds Out” folds may be predicted poorly because they consist mostly of compounds and mixtures that are very different from those in the training set. One can shuffle (re-order) the matrix of mixtures several times and repeat the modeling to obtain more consistent prediction performance.
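The 10-compound example above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the compounds are simply split into two halves in the order given, only binary mixtures are assigned (pure compounds and individual composition points would follow the part of their mixture or group), and no supervised balancing for sparse matrices is attempted.

```python
from itertools import combinations


def everything_out_partition(compounds, mixtures):
    """Assign each binary mixture to one of the three parts of the matrix.

    compounds: ordered list of identifiers; the first half forms group A
               (training), the second half group B (everything out).
    mixtures:  iterable of (compound_1, compound_2) pairs.
    """
    half = len(compounds) // 2
    group_a = set(compounds[:half])
    group_b = set(compounds[half:])
    parts = {"training": [], "everything_out": [], "compounds_out": []}
    for c1, c2 in mixtures:
        if c1 in group_a and c2 in group_a:
            parts["training"].append((c1, c2))
        elif c1 in group_b and c2 in group_b:
            parts["everything_out"].append((c1, c2))
        else:
            # One component from each group.
            parts["compounds_out"].append((c1, c2))
    return parts


def everything_out_folds(compounds, mixtures):
    """Return both folds: the second swaps training and everything-out parts."""
    first = everything_out_partition(compounds, mixtures)
    second = dict(first,
                  training=first["everything_out"],
                  everything_out=first["training"])
    return [first, second]


# Completely filled matrix for compounds C1..C10 (45 binary mixtures).
compounds = [f"C{i}" for i in range(1, 11)]
mixtures = list(combinations(compounds, 2))
folds = everything_out_folds(compounds, mixtures)
```

For this filled matrix, each fold holds 10 training mixtures, 10 "everything out" mixtures, and the same 25 "compounds out" mixtures.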

Data set

There is still a significant lack of experimental data for mixtures. Therefore, for the purposes of this study we used the vapor/liquid equilibrium diagrams for bubble point temperatures of binary liquid mixtures that we modeled earlier [4]. The dataset consisted of 67 pure liquids and 167 mixtures of these liquids. Each mixture was represented by several (7-57) points; thus, the 167 mixtures in the modeling set were described by 3,185 data points. More details about this dataset can be found elsewhere [4].

RESULTS AND DISCUSSION

The models were built using random forest and SiRMS descriptors [7] and validated using the three strategies described above, i.e., “Points Out”, “Mixtures Out”, and “Compounds Out”. A detailed description of model building and validation can be found elsewhere [4]. Then, an independent external set consisting of 94 new mixtures made of 66 compounds was used for model validation as well. Among these 94 mixtures (632 data points), 27 combinations contain no new pure compounds, 63 mixtures (1,386 data points) contain one new compound, and the four remaining mixtures were created by compounds that were absent in the modeling set. The results of 5-fold external cross-validation and the performance of the models obtained using the “Compounds Out” and “Everything Out” strategies are shown in Table 1. External set mixtures containing one and two new compounds were treated separately in order to estimate the error for the corresponding validation strategy (“Compounds Out” and “Everything Out”, respectively).
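Treating external mixtures with one and two new compounds separately amounts to computing the RMSE within each group of data points. The sketch below shows one way this could be done in Python. The function names, the record layout (two compound identifiers, observed value, predicted value), and the temperatures in the example are all hypothetical and are not data from the study.

```python
import math
from collections import defaultdict


def rmse(observed, predicted):
    """Root-mean-square error between two equally long, non-empty sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def rmse_by_novelty(records, training_compounds):
    """Group data points by the number of new compounds (0, 1 or 2) and
    return the RMSE of each group.

    records: iterable of (compound_1, compound_2, observed, predicted).
    """
    groups = defaultdict(lambda: ([], []))
    for c1, c2, obs, pred in records:
        n_new = (c1 not in training_compounds) + (c2 not in training_compounds)
        groups[n_new][0].append(obs)
        groups[n_new][1].append(pred)
    return {n: rmse(obs, pred) for n, (obs, pred) in groups.items()}


# Made-up bubble point temperatures in K, for illustration only.
records = [
    ("A", "B", 350.0, 353.0),
    ("A", "B", 360.0, 356.0),
    ("A", "X", 340.0, 346.0),
    ("X", "Y", 330.0, 340.0),
]
errors = rmse_by_novelty(records, training_compounds={"A", "B"})
```

Here the keys of the returned dictionary (0, 1, 2) are the number of components of a mixture that are absent from the training compounds.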

We preserved the initial splitting into modeling and external sets, the model building and validation workflow, and the applicability domain estimation procedure from the previous study [4]. The “Everything Out” set was formed from the modeling set compounds as shown in Figure 1. Twenty-eight splits were generated in order to achieve more consistent results and to ensure that every mixture was present in the “Everything Out” set at least once. Then, the developed models were applied to the external set consisting of 95 mixtures (2065 data points). As is evident from the results (see Table 1), the RMSE for the “Compounds Out” set is comparable to that obtained in the previous study [4], i.e., 12.1 K vs 10.3 K. This means that, using the “Everything Out” strategy, we can adequately estimate the error for mixtures containing one new component. However, the RMSE for “Compounds Out” estimated on the external set is significantly higher (~19 K).
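One possible way to generate repeated splits until every mixture has appeared in an “Everything Out” set is sketched below in Python. How the splits in this study were actually produced is not described here, so the random half-and-half shuffling, the function name splits_until_covered, and the seed and max_splits parameters are assumptions made for illustration; the number of splits this sketch needs will generally differ from the number reported above.

```python
import random
from itertools import combinations


def splits_until_covered(compounds, mixtures, seed=0, max_splits=1000):
    """Shuffle compounds into two halves repeatedly until each mixture has
    had both of its components in the same half at least once.

    Because training and everything-out parts are switched between the two
    folds of a split, a mixture lands in an everything-out set of that split
    whenever both of its components fall into the same half.
    Returns the list of (group_a, group_b) splits and any mixtures left
    uncovered when max_splits is reached.
    """
    rng = random.Random(seed)
    uncovered = {frozenset(m) for m in mixtures}
    splits = []
    while uncovered and len(splits) < max_splits:
        order = list(compounds)
        rng.shuffle(order)
        half = len(order) // 2
        group_a, group_b = set(order[:half]), set(order[half:])
        splits.append((group_a, group_b))
        # Keep only mixtures whose components are still split across halves.
        uncovered = {m for m in uncovered
                     if not (m <= group_a or m <= group_b)}
    return splits, uncovered


compounds = [f"C{i}" for i in range(1, 11)]
mixtures = list(combinations(compounds, 2))
splits, uncovered = splits_until_covered(compounds, mixtures)
```

For a sparse matrix, the shuffle would need to be replaced by a supervised assignment that keeps the parts of comparable size, which this sketch does not attempt.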

As expected, the error of prediction for the “Everything Out” strategy estimated on the modeling set using 5-fold external cross-validation is higher than that for the “Compounds Out” strategy (17.1 K vs 12.1 K). This agrees with our expectation that it is harder to predict a mixture containing two new components than a mixture with one new component. Meanwhile, the results obtained on the external set are not as encouraging because the error of prediction is somewhat higher (~23 K). However, we have to emphasize that the “Everything Out” set was very small (only eight compounds creating four sets of mixtures), and after taking into account the applicability domain to filter out chemicals too dissimilar from the modeling set compounds, it was reduced to only four compounds and two sets of mixtures (44 data points). Thus, one must be extremely cautious with RMSE values computed for this very limited number of mixtures. New data obtained with a larger set of mixtures created by two new compounds are needed to make this comparison more reliable. Nevertheless, our results show that (i) the “Everything Out” strategy performs similarly to the “Compounds Out” strategy in estimating the prediction error for mixtures including one new compound absent from the modeling set; and (ii) “Everything Out” is more rigorous, and thus more suitable than the “Compounds Out” strategy, for estimating the prediction error for mixtures in which both compounds are absent from the modeling set.

Table 1: RMSE (K) for different strategies of external QSAR model validation.

                     Modeling set, 5-FECV (a)             External set
Strategy             1 new compound   2 new compounds     1 new compound (b)   2 new compounds (c)
Compounds Out [4]    10.3             NA                  18.8                 23.1
Everything Out       12.1             17.1

Abbreviations: (a) 5-fold external cross-validation; (b) 62 combinations containing one new compound absent in the modeling set; (c) 2 combinations containing two new compounds absent in the modeling set.

CONCLUSION

In conclusion, we have developed a robust and useful modeling and validation protocol to predict the properties of binary mixtures created by new compounds not found in the modeling set. This approach is universal and can be used to assess the prediction error both for binary mixtures containing just one new component (expanding upon the “Compounds Out” strategy developed by us earlier) and for mixtures created by two new compounds. We suggest that the “Everything Out” strategy should be used as the method of choice in developing and validating QSAR models of mixtures.

ACKNOWLEDGEMENT

E.M., D.F., and A.T. gratefully acknowledge the financial support from NIH (grant GM66940) and EPA (RD 83382501 and R832720). E.M., A.A., and V.K. are thankful to STCU (Project 407) for the financial support. A.T. acknowledges partial support from Russian Scientific Foundation (project 14-43-00024).

Received : 17 Oct 2014
Accepted : 18 Nov 2014
Published : 20 Nov 2014