Abstract
Aqueous solubility is a property of central interest for predicting the behavior of chemical compounds inside the body, since water is the most ubiquitous component of any living cell. Predictive quantitative structure–property relationship (QSPR) models of aqueous solubility seek to identify the essential chemical information of molecules that controls their ability to dissolve. Because solubility governs the absorption, distribution, metabolism, excretion, and toxicity properties of drugs and related chemicals, predictive models of aqueous solubility were developed following OECD guidelines for a large set (N = 565) of diverse drugs, drug-like compounds, and agrochemicals, using extended topochemical atom (ETA) indices and suitable chemometric tools. Because hydrophobicity plays a primary role in the solubilization of structurally complex, crystalline organic compounds, the computed lipophilicity parameter ClogP was used. Models were also developed using various non-ETA descriptors, and a further attempt was made to build models combining ETA, non-ETA, and ClogP parameters. All models were subjected to rigorous statistical validation using multiple strategies, and encouraging results were obtained for internal, external, and overall validation. A comparative analysis on the prediction (test) set between the general solubility equation and the best model developed here with ETA and ClogP parameters showed better predictive potential for the latter.
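The comparison with the general solubility equation (GSE) mentioned above can be illustrated with a minimal sketch. The abstract does not state the form of the GSE that was used; the code below assumes the commonly cited form, logS = 0.5 − 0.01(MP − 25) − logP, with the melting point MP in degrees Celsius and logS in log molar units. The function names, the use of RMSE as the comparison metric, and all numeric inputs are hypothetical choices for illustration and are not taken from the study.

```python
import math


def gse_log_s(log_p: float, melting_point_c: float) -> float:
    """Estimate log molar aqueous solubility with the assumed GSE form.

    logS = 0.5 - 0.01 * (MP - 25) - logP
    The melting-point term is set to zero for compounds that are liquid
    at 25 degrees C (MP <= 25), as is conventional for this equation.
    """
    mp_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - log_p


def rmse(observed, predicted) -> float:
    """Root mean squared error between observed and predicted logS values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical test-set compounds: (ClogP, melting point in C, observed logS).
    test_set = [(2.0, 125.0, -2.8), (3.5, 80.0, -3.9), (0.5, 20.0, 0.1)]
    observed = [obs for _, _, obs in test_set]
    gse_pred = [gse_log_s(lp, mp) for lp, mp, _ in test_set]
    # Predictions from any fitted QSPR model could be scored the same way.
    print("GSE predictions:", gse_pred)
    print("GSE RMSE:", rmse(observed, gse_pred))
```

Scoring a fitted model's test-set predictions with the same `rmse` function would give a like-for-like comparison against the GSE baseline, which is the kind of comparison the abstract describes.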




Acknowledgments
Financial assistance from the Council of Scientific and Industrial Research, Government of India, New Delhi in the form of a fellowship to R.N.D. is gratefully acknowledged.
Cite this article
Das, R.N., Roy, K. QSPR with extended topochemical atom (ETA) indices. 4. Modeling aqueous solubility of drug like molecules and agrochemicals following OECD guidelines. Struct Chem 24, 303–331 (2013). https://doi.org/10.1007/s11224-012-0080-5
DOI: https://doi.org/10.1007/s11224-012-0080-5