Developing a Support Vector Machine Based QSPR Model to Predict Gas-to-Benzene Solvation Enthalpy of Organic Compounds
GOLMOHAMMADI Hassan1,* DASHTBOZORGI Zahra2 KHOOSHECHIN Sajad2
(1 Young Researchers and Elite Club, Yadegar-e-Imam Khomeini (RAH) Shahr-e-Rey Branch, Islamic Azad University, Tehran, Iran; 2 Young Researchers and Elite Club, Central Tehran Branch, Islamic Azad University, Tehran, Iran)
Abstract: The purpose of this paper is to present a novel way of building quantitative structure-property relationship (QSPR) models for predicting the gas-to-benzene solvation enthalpy (ΔH_Solv) of 158 organic compounds from molecular descriptors calculated from the structure alone. Different kinds of descriptors were calculated for each compound using the Dragon package. The enhanced replacement method (ERM) variable selection technique was employed to select an optimal subset of descriptors. Our investigation reveals that the dependence of the solvation enthalpy on physico-chemical properties is nonlinear and that the linear ERM model is unable to describe the solvation enthalpy accurately. The standard error of the prediction set is 1.681 kJ·mol⁻¹ for the support vector machine (SVM) model, whereas it is 4.624 kJ·mol⁻¹ for the ERM model. The ΔH_Solv values calculated by SVM were in good agreement with the experimental ones, and the SVM model performed better than the ERM model. This indicates that SVM can be used as an alternative modeling tool for QSPR studies.
Key Words: Quantitative structure-property relationship; Gas-to-benzene solvation enthalpy; Descriptor; Enhanced replacement method; Support vector machine
The study of the solvation of organic molecules is very important for understanding chemical reactivity in the liquid phase, which plays a fundamental role in chemistry and biochemistry [1,2]. One basic quantity that characterizes the energetics of the solvation process is the standard enthalpy of solvation of a species from the gas phase. The enthalpy of solvation (ΔH_Solv) is very sensitive to structural changes in solution and, because intermolecular forces in the initial (gas) state are essentially non-existent, this thermodynamic property is closely related to the effect produced in the solvent by the introduction of a solute molecule.
It is worth noting that the thermodynamic properties of a solvation process do not depend exclusively on solute/solvent interactions, even for very dilute solutions, because the addition of a solute involves the formation of a cavity of sufficient size to hold the solute molecule. This disruption of the solvent structure requires an amount of energy that depends on the size of the solute molecule and on the solvent structure. For structured solvents, cavity formation is always an important term, and the thermodynamic properties of solvation may lead to incorrect conclusions if the cavity term is not taken into account. Therefore any experimental effort to determine the effect produced in a solvent by adding a solute strengthens the role of thermodynamics in the interpretation of solvation.
In order to reach a molecular-level understanding of solvation thermodynamics, dissolution can be considered as a two-step process [3-6]: formation of an appropriate cavity to host the solute molecule, and turning on the solute-solvent attractive interactions. The first step is an inevitable consequence of the fact that each molecule occupies a finite volume impenetrable to the others and that liquids are a condensed state of matter. The second step accounts for the existence of at least dispersive interactions between the solute molecule and the surrounding solvent molecules.
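Enthalpies of solvation are significant in that the numerical values provide valuable information concerning solute-solvent interactions. From a thermodynamic standpoint, the solvation enthalpy can be used to predict the gas-to-condensed phase partition coefficient, K, at other temperatures from partition coefficient data measured at 298.15 K and the solute's enthalpy of solvation, ΔH_Solv [7]:
$$\log_{10} K(\text{at } T) - \log_{10} K(\text{at } 298.15\ \text{K}) = -\frac{\Delta H_{\mathrm{Solv}}}{2.303R}\left(\frac{1}{T} - \frac{1}{298.15}\right)$$
where R is the gas constant and T is the temperature in K. The gas-to-organic solvent enthalpy of solvation is obtained by subtracting the solute's standard molar enthalpy of vaporization [8], ΔH_Vap,298K, or standard molar enthalpy of sublimation [9], ΔH_Sub,298K, at 298.15 K from the enthalpy of solution, ΔH_Soln:
$$\Delta H_{\mathrm{Solv}} = \Delta H_{\mathrm{Soln}} - \Delta H_{\mathrm{Vap,298K}}$$
$$\Delta H_{\mathrm{Solv}} = \Delta H_{\mathrm{Soln}} - \Delta H_{\mathrm{Sub,298K}}$$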
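As a rough numerical illustration of the two relations in this passage, the sketch below assumes the usual integrated van't Hoff form with a temperature-independent ΔH_Solv and the convention ΔH_Solv = ΔH_Soln − ΔH_Vap. The function names, the value used for the gas constant and the sample numbers are illustrative assumptions, not values taken from the paper.

```python
import math

R_GAS = 8.314462618e-3  # gas constant in kJ·mol^-1·K^-1 (assumed unit convention)


def log10_k_at_t(log10_k_ref, dh_solv_kj, t_kelvin, t_ref=298.15):
    """Extrapolate log10 K from t_ref to t_kelvin.

    Assumes the solvation enthalpy (kJ·mol^-1) does not change with temperature.
    """
    slope = dh_solv_kj / (math.log(10) * R_GAS)
    return log10_k_ref - slope * (1.0 / t_kelvin - 1.0 / t_ref)


def solvation_enthalpy(dh_soln_kj, dh_vap_or_sub_kj):
    """Gas-to-solvent enthalpy: enthalpy of solution minus vaporization (or sublimation)."""
    return dh_soln_kj - dh_vap_or_sub_kj


# Hypothetical solute: exothermic solvation, so K should drop on heating.
dh_solv = solvation_enthalpy(1.0, 34.0)          # -33.0 kJ·mol^-1
log_k_310 = log10_k_at_t(2.0, dh_solv, 310.0)
```

With a negative ΔH_Solv the extrapolated log K at 310 K comes out smaller than the 298.15 K value, as expected for an exothermic transfer from the gas phase.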
Calorimetric methods permit the experimental determination of solvation enthalpies [10,11]. However, the experimental methodologies for ΔH_Solv determination are laborious, expensive and time-consuming, and often require a sufficient amount of the pure compound. These methods are not suitable for high-throughput screening of large numbers of different compounds. To address this problem, researchers have developed theoretical and computational methodologies to estimate solvation enthalpies. QSPR provides a promising method for the estimation of physico-chemical properties based on descriptors that are derived exclusively from the molecular structure and fitted to experimental data [12]. The advantage of this approach over other methods lies in the fact that it requires only knowledge of the chemical structure and does not depend on any experimental properties. The support vector machine (SVM) is a relatively new algorithm developed in the machine learning community. Owing to its remarkable generalization performance, the SVM has attracted interest and found extensive application [13-19]. In recent years, SVM has also shown strong performance in QSPR studies owing to its ability to capture the nonlinear relationships between molecular structure and properties [20-25].
In the present study, SVM was applied for the first time to the prediction of gas-to-benzene solvation enthalpies of 158 organic compounds at 298 K. The main goal was to produce a QSPR model that could be used to predict the ΔH_Solv of a diverse set of compounds from their molecular structures alone and to demonstrate the flexible modeling capability of SVM. The enhanced replacement method (ERM) was also used to generate a quantitative linear relationship for comparison with the results obtained by SVM.
2.1 Experimental data set
The experimental data set of gas-to-benzene solvation enthalpies of 158 organic compounds was taken from the values reported by Mintz et al. [26]. The molecules in the data set, which include alkanes, alkenes, alkyl halides, alcohols, phenols, ethers, esters, ketones, aldehydes, amines, anilines, nitriles, nitro compounds, polycyclic hydrocarbons, heterocyclic compounds, benzene derivatives, etc., are summarized in Table S1 (Supporting Information). The solvation enthalpies of all molecules in the data set were obtained under the same conditions and refer to a temperature of 298 K. The solvation enthalpies range from -1.26 kJ·mol⁻¹ for methane to -100.80 kJ·mol⁻¹ for 18-crown-6. The whole data set was randomly divided into two groups: a training set of 120 compounds and a prediction set of 38 compounds. The training set was used to build and optimize the QSPR model, and the external prediction set was used to evaluate the predictive power of the resulting model.
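The random division into 120 training and 38 prediction compounds can be sketched as below. The paper does not state how the random split was generated, so the seed, the function name and the use of integer compound indices 1 to 158 are assumptions made only for illustration.

```python
import random


def split_dataset(compound_ids, n_prediction, seed=0):
    """Shuffle the compound identifiers and hold out n_prediction of them."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(compound_ids)
    rng.shuffle(shuffled)
    prediction = shuffled[:n_prediction]
    training = shuffled[n_prediction:]
    return training, prediction


# 158 compounds, indexed 1..158 (hypothetical labels for the Table S1 entries)
training_set, prediction_set = split_dataset(range(1, 159), n_prediction=38)
```

Fixing the seed keeps the split reproducible, which matters because the reported training and prediction statistics depend on which compounds fall into each set.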
2.2 Molecular descriptor calculation and screening
Molecular descriptors are the final result of a logical and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment [27]. To calculate the molecular descriptors of a chemical compound, the structure of the molecule must first be drawn with suitable software. The chemical structures of the molecules in this study were drawn using the Hyperchem package (HyperChem Version 7.0) [28], and the final geometries were obtained with the semi-empirical PM3 method. The optimization was carried out with the Polak-Ribière algorithm until a root mean square gradient of 0.01 was reached. In the next step, the Hyperchem output files were used by the Dragon package (Version 3) to calculate the molecular descriptors [29].
2.3 Variable selection method
The replacement method (RM) is an efficient optimization tool which generates multivariable linear regression models by searching a set of D descriptors for an optimal subset of d ≪ D descriptors with the smallest standard error. The quality of the results achieved with this method is close to that of an exhaustive (full) search of the molecular descriptors, although it requires much less computational work [30].
ERM variable selection was executed in MATLAB (Version 7.0) [31].
2.4 Support vector machine (SVM)
Support vector machines (SVMs) were proposed by Vapnik in the late 1960s. Because they are powerful estimators in data-driven fields, they have received considerable attention from researchers in numerous classes of practical applications [32]. SVM is a supervised technique for the classification of multidimensional feature vectors [33,34].
In regression, SVMs estimate the function through three distinctive features: (1) approximation of the regression by a set of linear functions, (2) definition of the regression approximation as a risk minimization problem with respect to the ε-insensitive loss function and (3) minimization of the risk based on the structural risk minimization (SRM) principle [35,36].
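The ε-insensitive loss in item (2) and the regularized risk it leads to can be written out in a few lines. This is a minimal sketch of the standard textbook formulation (0.5·||w||² plus C times the summed losses); the function names and the toy numbers are assumptions, and the paper does not give its own implementation.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors smaller than epsilon cost nothing; larger ones grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def regularized_risk(weights, y_true, y_pred, epsilon, c):
    """Standard SVR objective: flatness term plus C-weighted epsilon-insensitive losses."""
    flatness = 0.5 * sum(w * w for w in weights)
    empirical = sum(
        epsilon_insensitive_loss(t, p, epsilon) for t, p in zip(y_true, y_pred)
    )
    return flatness + c * empirical


# A prediction inside the epsilon tube is not penalized at all.
inside_tube = epsilon_insensitive_loss(-30.00, -30.03, epsilon=0.05)
outside_tube = epsilon_insensitive_loss(-30.0, -30.5, epsilon=0.05)
```

Points inside the tube contribute nothing to the objective, which is why only a subset of the training compounds end up as support vectors.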
2.5 Applicability domain
The applicability domain (AD) of the QSPR models was defined in order to obtain reliable predictions for external samples. The AD is a theoretical region in chemical space, defined by the model descriptors and the modeled response, and consequently by the nature of the chemicals in the training set, as represented in each model by specific molecular descriptors [37]. The AD allows one to assess the uncertainty in the prediction for a specific molecule based on how similar it is to the compounds used to construct the model [38-40].
In the present study, we used the Williams plot to evaluate the AD of our QSPR model. The Williams plot is a plot of standardized residuals versus leverage values. The leverage value (h) measures the distance from the centroid of the training set, and h_i for the i-th compound is calculated from the descriptor matrix (X) as follows [41]:
$$h_i = x_i (X^{T} X)^{-1} x_i^{T}$$
where x_i is the row vector of descriptors for the i-th compound and X is the matrix of descriptors for the training set. The leverage is suitable for assessing the degree of extrapolation; its limit of normal values (the warning leverage, h*) is set as:
$$h^{*} = \frac{3(p+1)}{n}$$
where p is the number of descriptors involved in the model, and n is the number of compounds in the training set.
A leverage greater than h* for a training set compound means that the chemical is highly influential in determining the model, while for a prediction set compound it means that the prediction is the result of substantial extrapolation of the model and may not be reliable.
2.6 External validation
Model validation is one of the most important steps of QSPR development [42]. In order to verify the capability of the model to predict the solvation enthalpy of new compounds, external validation was performed on the prediction set. The determination coefficient of the prediction set (R²) and the root mean square error of prediction (RMSEP) were used as measures of the predictive power of the models. In addition, because a good QSPR model should be highly predictive, the conditions suggested by Golbraikh and Tropsha [43] were also applied to further confirm the predictive ability for the prediction set.
The parameters appearing in these measures and conditions are defined as follows: y_exp and y_cal are the experimental and the calculated values, respectively; ȳ_tra is the average value of the training set; R² is the correlation coefficient between the experimental and calculated values in the prediction set; R_0² and R′_0² are the coefficients of determination for the regressions through the origin; and k and k′ are the slopes of the regression lines through the origin of calculated versus experimental and experimental versus calculated values, respectively. ȳ_cal and ȳ_exp are the mean values of the calculated and experimental values of the prediction set, respectively.
3.1 Molecular diversity analysis
A robust QSPR model capable of predicting new compounds should be based on structurally diverse chemicals in the training data; therefore, for general model development, it is important to determine the structural diversity of the data set [43]. To this end, the Euclidean distance between each pair of molecules i and j in the selected descriptor space, d_ij, was computed according to the following equation:
$$d_{ij} = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^{2}}$$
In the above equation, m is the number of descriptors in the descriptor matrix, and x_ik and x_jk are the k-th descriptors of molecules i and j, respectively. In the next step, the mean distance of each molecule to the other molecules (d̄_i) was computed according to:
$$\bar{d}_i = \frac{\sum_{j=1}^{n} d_{ij}}{n-1}$$
where n is the number of molecules.
The mean distances were then normalized to the interval from zero to one and plotted against the p of chemicals, as can be seen in Fig.1. The scatter of the chemicals in this plot indicates that the training and prediction sets are representative of the entire data set. The training set, with its broad coverage of the chemical space, was adequate to ensure the stability of the models, and the diversity of the prediction set supports the assessment of the predictive ability of the model.
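The diversity analysis described here can be sketched in a few lines. The sketch assumes the mean distance is taken over the n − 1 other molecules and that normalization is a plain min-max rescaling; the function names and the toy descriptor rows are illustrative only.

```python
import math


def mean_distances(descriptor_rows):
    """Mean Euclidean distance from each molecule to all other molecules."""
    n = len(descriptor_rows)
    means = []
    for i in range(n):
        total = sum(
            math.dist(descriptor_rows[i], descriptor_rows[j])
            for j in range(n)
            if j != i
        )
        means.append(total / (n - 1))
    return means


def normalize_unit_interval(values):
    """Min-max rescaling to [0, 1]; a constant list maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Three hypothetical molecules described by two descriptors each.
toy_descriptors = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
toy_means = mean_distances(toy_descriptors)
toy_normalized = normalize_unit_interval(toy_means)
```

In practice the descriptors would usually be scaled to comparable ranges first, since a single descriptor with large numerical values would otherwise dominate the distances.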
3.2 Linear modeling
After the pre-reduction step of the descriptors calculated by Dragon, a total of 425 descriptors were retained for each compound. To select the best descriptor subset, these 425 descriptors were used as inputs for the ERM variable selection technique. When adding a further variable did not improve the performance of the model considerably, the ideal subset size was taken to have been reached. Finally, a four-variable model was selected as the optimal ERM model.
Fig.1 Results of diversity analysis
Table S1 presents the data set and the corresponding observed and ERM predicted values of the gas-to-benzene solvation enthalpies for all molecules studied in this investigation. Table 1 shows the specifications of the selected ERM model. The multi-collinearity between the four descriptors of the model was examined by calculating their variance inflation factors (VIF), which can be calculated as follows:
$$\mathrm{VIF} = \frac{1}{1 - R^{2}}$$
where R is the correlation coefficient of the multiple regression of one variable on the other variables in the model. If VIF equals 1, no inter-correlation exists for that variable; if VIF falls in the range of 1 to 5, the related model is acceptable; and if VIF is larger than 10, the related model is unstable and a thorough recheck is necessary [44]. The VIF values of the four descriptors are shown in the last column of Table 1. As can be seen from this table, all the variables have VIF values of less than 5, indicating that the obtained model is statistically sound.
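The VIF rule of thumb quoted above can be expressed as a small helper. This sketch assumes the usual definition VIF = 1/(1 − R²); the function names are hypothetical, and the label for values between 5 and 10 is an assumption because the text does not classify that range.

```python
def variance_inflation_factor(r_squared):
    """VIF for one descriptor, given R^2 of its regression on the other descriptors."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def interpret_vif(vif):
    """Apply the thresholds quoted in the text (1, 5 and 10)."""
    if vif > 10.0:
        return "unstable"
    if vif <= 5.0:
        return "acceptable"
    return "not classified in the text"  # 5 < VIF <= 10 (assumed label)


# A descriptor that is half explained by the others has VIF = 2.
example_vif = variance_inflation_factor(0.5)
example_label = interpret_vif(example_vif)
```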
To examine the relative significance and the influence of each descriptor in the model, the value of the mean effect (MF) was calculated for each descriptor using the equation below (Eq.(20)):
$$MF_j = \frac{\beta_j \sum_{i=1}^{n} d_{ij}}{\sum_{j}^{m} \beta_j \sum_{i}^{n} d_{ij}} \qquad (20)$$
In this equation, MF_j is the mean effect of the considered descriptor j, β_j is the coefficient of descriptor j, d_ij is the value of the descriptor for each molecule, m is the number of descriptors in the model and n is the number of molecules [45]. The MF value represents the relative significance of each descriptor compared with the other descriptors in the model. Its sign shows the direction in which the property changes when the value of the descriptor increases (or decreases). The calculated MF values are given in Table 1 and are also plotted in Fig.2. Table 2 shows the correlation matrix of the selected descriptors, which are nearly orthogonal and independent (R<0.5) and were therefore used in the development of the QSPR models.
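The mean effect can be computed directly from the regression coefficients and the descriptor matrix. The sketch below assumes the common form MF_j = β_j Σ_i d_ij / Σ_j (β_j Σ_i d_ij); the function name and the toy numbers are assumptions for illustration and are not the values of Table 1.

```python
def mean_effects(coefficients, descriptor_rows):
    """Mean effect of each descriptor; rows are molecules, columns are descriptors."""
    column_sums = [
        sum(row[j] for row in descriptor_rows) for j in range(len(coefficients))
    ]
    weighted = [beta * total for beta, total in zip(coefficients, column_sums)]
    denominator = sum(weighted)
    if denominator == 0.0:
        raise ValueError("mean effects are undefined when the weighted sums cancel")
    return [value / denominator for value in weighted]


# Two hypothetical descriptors over two molecules.
toy_mf = mean_effects([2.0, -1.0], [[1.0, 1.0], [2.0, 1.0]])
```

By construction the mean effects sum to one, so a single value is only meaningful relative to the others in the same model.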
Fig.2 Plot of descriptor′s mean effects
Table 1 Specification of descriptors selected by the enhanced replacement method(ERM)
3.3 Nonlinear modelling
In order to develop a more accurate model, SVM was employed to construct a model from the training set compounds using the same subset of descriptors. In this model, the radial basis function (RBF) was used as the kernel function, and the values of the capacity parameter (C), epsilon (ε) and the kernel parameter (γ) were optimized. To optimize these parameters, their values were varied in the ranges of 10 to 100 for C, 0.01 to 0.1 for ε and 0.1 to 1 for γ, and the performance of the SVMs was checked.
First, the kernel function must be chosen, since it determines the sample distribution in the mapping space. Next, the corresponding kernel parameter, γ, significantly influences the number of support vectors, which is closely related to the performance of the SVM and to the training time. Too many support vectors can lead to over-fitting and greatly increase the training time. In addition, γ controls the amplitude of the RBF function and, consequently, the generalization ability of the SVM. The plot of γ versus RMSE in leave-one-out (LOO) cross-validation is shown in Fig.3. As can be seen from this figure, the optimal γ was 0.3.
The ε-insensitive parameter prevents the whole training set from meeting the boundary conditions and consequently allows for sparsity in the solution of the dual formulation. In an SVM computation, the optimal value of ε depends strongly on the type of noise present in the data, which is usually unknown. The RMSE of LOO cross-validation for different values of ε is shown in Fig.4, and the optimal value was found to be 0.05.
Finally, the effect of the capacity parameter C was examined. This parameter controls the trade-off between maximizing the margin and minimizing the training error. If C is too small, insufficient weight will be placed on fitting the training data. Conversely, if C is too large, the algorithm will overfit the training data. However, the prediction error is barely influenced by C. The plot of RMSE versus C is shown in Fig.5 for γ=0.3 and ε=0.05, and the optimal value of 70 was finally selected for C.
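The one-parameter-at-a-time tuning described in this section can be sketched as follows. The scoring function `toy_loo_rmse` is a stand-in with its minimum deliberately planted at the reported optimum; in a real run it would train an RBF-kernel SVM and return the LOO cross-validation RMSE. The grid steps, the starting values and all names are assumptions, since the paper only gives the parameter ranges.

```python
def tune_sequentially(loo_rmse, grids, start, order=("gamma", "epsilon", "c")):
    """Optimize one parameter at a time while the others are held fixed."""
    params = dict(start)
    for name in order:
        def score(value, name=name):
            trial = dict(params)
            trial[name] = value
            return loo_rmse(**trial)
        params[name] = min(grids[name], key=score)
    return params


# Grids covering the ranges given in the text (step sizes are assumed).
grids = {
    "c": [10 * i for i in range(1, 11)],          # 10 ... 100
    "epsilon": [i / 100 for i in range(1, 11)],   # 0.01 ... 0.1
    "gamma": [i / 10 for i in range(1, 11)],      # 0.1 ... 1
}


def toy_loo_rmse(c, epsilon, gamma):
    """Placeholder error surface, NOT a trained model."""
    return 1.0 + abs(c - 70) / 100 + abs(epsilon - 0.05) + abs(gamma - 0.3)


best = tune_sequentially(
    toy_loo_rmse, grids, start={"c": 100, "epsilon": 0.1, "gamma": 0.5}
)
```

A sequential scan is cheaper than a full grid over all three parameters but can miss interactions between them, so a final joint check around the selected point is a reasonable precaution.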
Table 2 Correlation matrix for descriptors employed in this work
Fig.3 Gamma versus RMS error on LOO cross-validation [C=100,ε=0.1]
Table S1 (Supporting Information) presents the SVM predicted values of the solvation enthalpy for all molecules studied in this work. The statistical results of the optimal SVM model are shown in Table 3. The model gave RMS errors of 1.067 kJ·mol⁻¹ for the training set and 1.681 kJ·mol⁻¹ for the prediction set, and the corresponding correlation coefficients (R) were 0.997 and 0.995, respectively. Fig.6 shows the plot of the SVM predicted versus experimental values of the solvation enthalpy for all of the molecules in the data set. In order to build reliable and predictive models, care must be taken in selecting the training and prediction sets. When the prediction set is representative of the training set, one can obtain an accurate estimate of the model's performance. The residuals of the SVM calculated values of the solvation enthalpy are plotted against the experimental values in Fig.7. The distribution of the residuals on both sides of the zero line indicates that no systematic error exists in the constructed QSPR model.
In addition, the model could be extended to predict the gas-to-benzene solvation enthalpy of compounds at temperatures other than 298 K by using ΔH_Solv values of organic compounds obtained at different temperatures.
Fig.4 Epsilon versus RMS error on LOO cross-validation [C=100,γ=0.5]
Fig.5 Capacity factor versus RMS error on LOO cross-validation[γ=0.5,ε=0.06]
Table 3 Statistical parameters obtained using the ERM and SVM models
Fig.6 Plot of SVM calculated versus experimental gas-to-benzene solvation enthalpy
3.4 Descriptors interpretation
From a practical point of view, interpreting the descriptors used in the models can provide some insight into the factors that are likely to govern the solvation enthalpy and help us understand which interactions may play an essential role in it.
The descriptors in this QSPR model are: complementary information content (neighborhood symmetry of 1-order) (CIC1), solvation connectivity index chi-1 (X1Sol), R maximal autocorrelation of lag 1/weighted by atomic Sanderson electronegativities (R1e+) and Geary autocorrelation-lag 1/weighted by atomic Sanderson electronegativities (GATS1e).
To evaluate the relative significance and influence of each descriptor in the model, the value of the mean effect (MF) was calculated and is shown in Table 1. The mean effects in this table allow greater importance to be assigned to those molecular descriptors with larger absolute mean effect values.
Fig.7 Plot of SVM residual versus experimental values of gas-to-benzene solvation enthalpy
The first descriptor according to its mean effect is the solvation connectivity index chi-1 (X1Sol). This topological descriptor was proposed by Zefirov and Palyulin [46] in 1991 to treat the enthalpies of non-specific solvation and is calculated as follows:
$${}^{m}\chi^{sol} = \frac{1}{2^{m+1}} \sum \frac{Z_i Z_j \cdots Z_k}{(\delta_i \delta_j \cdots \delta_k)^{1/2}}$$
where m is the order of the index; the summation is over all sub-graphs of order m; δ_i, δ_j, ..., δ_k are the connectivities of the vertices of the sub-graph; and Z_i, Z_j, ..., Z_k are coefficients characterizing the atom size, which coincide with the number of the period in the Periodic Table. The term 1/2^(m+1) simply normalizes the values of ᵐχ^sol so that they coincide with the connectivity index ᵐχ for the elements of the second row. The negative value of the mean effect in the ERM model indicates that this descriptor contributes negatively to the value of ΔH_Solv.
The second descriptor according to its mean effect is the complementary information content (neighborhood symmetry of 1-order) (CIC1). This topological information index is based on neighbor degrees and edge multiplicity and is calculated as follows [47]:
$$CIC_r = \log_2 A - IC_r$$
where A is the number of atoms and the r-th order CIC_r measures the deviation of IC_r from its maximum value, which corresponds to the vertex partition into equivalence classes containing one element each. The numerical value of IC_r is calculated as:
$$IC_r = -\sum_{g=1}^{G} \frac{A_g}{A} \log_2 \frac{A_g}{A} = -\sum_{g=1}^{G} P_g \log_2 P_g$$
where the summation over g runs over the G equivalence classes, A_g is the cardinality of the g-th equivalence class, A is the total number of atoms and P_g is the probability of randomly selecting a vertex of the g-th class. It represents a measure of structural complexity per vertex. The positive value of the mean effect for this descriptor in the ERM model indicates that it contributes positively to the values of the solvation enthalpy.
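Once the atoms have been partitioned into equivalence classes, the information content and its complement follow from the class sizes alone. The sketch below assumes the Shannon form with base-2 logarithms and CIC = log2(A) − IC; how the first-order neighborhood classes are obtained from the molecular graph is not shown, and the class sizes used here are hypothetical.

```python
import math


def information_content(class_sizes):
    """Shannon entropy (bits per vertex) of a partition given by its class sizes."""
    total_atoms = sum(class_sizes)
    entropy = 0.0
    for size in class_sizes:
        p = size / total_atoms
        entropy -= p * math.log2(p)
    return entropy


def complementary_information_content(class_sizes):
    """Deviation of the information content from its maximum, log2(A)."""
    return math.log2(sum(class_sizes)) - information_content(class_sizes)


# Four atoms: all distinct, all equivalent, or split into two pairs.
cic_all_distinct = complementary_information_content([1, 1, 1, 1])
cic_all_equivalent = complementary_information_content([4])
cic_two_pairs = complementary_information_content([2, 2])
```

The complementary index is zero when every atom is in its own class and reaches log2(A) when all atoms are equivalent.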
The next descriptor is the Geary autocorrelation-lag 1/weighted by atomic Sanderson electronegativities (GATS1e). This two-dimensional (2D) autocorrelation descriptor describes how the considered property is distributed along the topological structure and is defined as:
$$c(d) = \frac{\dfrac{1}{2\Delta}\sum_{i=1}^{A}\sum_{j=1}^{A}\delta_{ij}\,(w_i - w_j)^{2}}{\dfrac{1}{A-1}\sum_{i=1}^{A}(w_i - \bar{w})^{2}}$$
where w_i is any atomic property, w̄ is its average value over the molecule, A is the number of atoms, d is the considered topological distance (i.e. the lag in the autocorrelation terms) and δ_ij is a Kronecker delta (δ_ij=1 if d_ij=d, zero otherwise). Δ is the sum of the Kronecker deltas, i.e. the number of vertex pairs at a distance equal to d [48]. The positive coefficient of this descriptor reveals that ΔH_Solv increases as the value of GATS1e increases.
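A Geary coefficient of this kind can be computed from an atomic property list and a topological distance matrix. The sketch below counts each unordered atom pair once, which is equivalent to the double sum over ordered pairs with the factor 1/2; the function name and the three-atom example are hypothetical and the property values are not real Sanderson electronegativities.

```python
def geary_autocorrelation(weights, topo_dist, lag):
    """Geary coefficient of an atomic property at a given topological distance."""
    n_atoms = len(weights)
    mean_w = sum(weights) / n_atoms
    pair_count = 0
    pair_sum = 0.0
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):      # each unordered pair once
            if topo_dist[i][j] == lag:
                pair_count += 1
                pair_sum += (weights[i] - weights[j]) ** 2
    variance_term = sum((w - mean_w) ** 2 for w in weights) / (n_atoms - 1)
    if pair_count == 0 or variance_term == 0.0:
        raise ValueError("coefficient undefined for this lag or property")
    return (pair_sum / pair_count) / variance_term


# Linear three-atom chain with made-up property values.
chain_weights = [1.0, 2.0, 3.0]
chain_distances = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
gats_lag1 = geary_autocorrelation(chain_weights, chain_distances, lag=1)
gats_lag2 = geary_autocorrelation(chain_weights, chain_distances, lag=2)
```

Values below one indicate that atoms at the chosen lag carry similar property values, while values above one indicate dissimilar neighbors.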
The last descriptor described here is the R maximal autocorrelation of lag 1/weighted by atomic Sanderson electronegativities (R1e+). It is one of the recently developed three-dimensional (3D) GEometry, Topology and Atom-Weights AssemblY (GETAWAY) descriptors presented by Consonni et al. [49,50]. GETAWAY descriptors encode geometrical information from the influence matrix, topological information from the molecular graph and chemical information from selected atomic properties. One type of these molecular descriptors is R-GETAWAY, represented by R_k(w) and calculated as follows:
$$R_k(w) = \sum_{i=1}^{A-1}\sum_{j>i} \frac{\sqrt{h_{ii}h_{jj}}}{r_{ij}}\, w_i\, w_j\, \delta(k; d_{ij}), \qquad k = 1, 2, \ldots, D$$
where r_ij is the 3D geometric distance between atoms i and j, d_ij is the topological distance between atoms i and j, D is the topological diameter, w_i and w_j are the atomic weightings, h_ii and h_jj are diagonal terms of the H matrix and δ(k; d_ij) is a Dirac delta function (equal to 1 if d_ij=k and zero otherwise). The H matrix was calculated from the molecular matrix M (M has A rows corresponding to the atoms in a molecule and three columns corresponding to the Cartesian coordinates x, y, z of each atom in the optimized molecular structure) as follows:
$$H = M (M^{T} M)^{-1} M^{T}$$
where the superscript T refers to the transposed matrix. The diagonal elements h of the H matrix, called leverages, encode atomic information and represent the influence of each atom in determining the whole shape of the molecule; for example, mantle atoms always have higher h values than atoms near the molecule center. Moreover, the magnitude of the maximum leverage in a molecule depends on the size and shape of the molecule itself. Lower leverages are found for atoms in molecules of spherical shape, and higher leverages for atoms in more linear molecules.
The negative sign of the mean effect for this descriptor in the ERM model reveals that the values of ΔH_Solv decrease as the value of this descriptor increases.
From the above discussion, it can be seen that all descriptors involved in the QSPR model have physical meaning and can account for the structural features influencing the solvation enthalpies of the molecules of interest.
3.5 Comparison of the results obtained by different QSPR approaches
The results of the different QSPR models are tabulated in Table 3. The correlation coefficients (R) between the experimental and predicted solvation enthalpies for the ERM and SVM models are 0.967 and 0.997, respectively, for the training set and 0.965 and 0.995, respectively, for the prediction set. The standard errors of the training and prediction sets for the ERM model are 3.503 and 4.624 kJ·mol⁻¹, respectively, compared with 1.067 and 1.681 kJ·mol⁻¹, respectively, for the SVM model.
As can be seen from Table 3, the results of the SVM model are superior to those obtained by the ERM method. This confirms the better generalization ability of SVM and shows that SVM can be used as a powerful chemometric tool for QSPR research. It is worth noting that, as a general machine learning method, SVM is rooted in the structural risk minimization principle, which minimizes an upper bound of the generalization error rather than the training error.
3.6 Applicability domain analysis
The applicability domain describes the reliability boundaries of a model. It is visualized with a plot of standardized cross-validated residuals versus leverage, known as the Williams plot. The Williams plot was used to give an immediate visual image of the outliers: the response outliers (Y outliers) and the structurally influential compounds (X outliers). The Y outliers are compounds whose standardized residuals are greater than 3.0 standard deviation units (>3s) while the leverage value h_i is relatively low. The X outliers are compounds with leverage values greater than the threshold value (h_i>h*) but with relatively low standardized residuals [51].
To visualize the applicability domain of the SVM model, the Williams plot for the training and prediction sets is shown in Fig.8. In this plot, the vertical line marks the limit for X outliers, given by the warning leverage value (h*). As can be seen in this figure, no compound has a leverage value greater than the cutoff value of h*=0.125. The horizontal lines mark the limits beyond which a compound, having a standardized residual greater than 3 standard deviation units, is considered an outlier. As can be seen in Fig.8, all chemicals in the prediction set lie within the applicability domain of the model and there is no response outlier in the prediction set, which demonstrates the reliability of the predictions.
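The reading of the Williams plot can be expressed as a simple classification rule. The sketch assumes the customary warning leverage h* = 3(p + 1)/n, which reproduces the cutoff of 0.125 quoted above for p = 4 descriptors and n = 120 training compounds; the function names and the sample points are hypothetical.

```python
def warning_leverage(n_descriptors, n_training):
    """Customary warning leverage h* = 3(p + 1)/n (assumed definition)."""
    return 3.0 * (n_descriptors + 1) / n_training


def classify_compound(leverage, std_residual, h_star, residual_limit=3.0):
    """Place one compound on the Williams plot."""
    x_outlier = leverage > h_star
    y_outlier = abs(std_residual) > residual_limit
    if x_outlier and y_outlier:
        return "X and Y outlier"
    if x_outlier:
        return "X outlier"
    if y_outlier:
        return "Y outlier"
    return "inside AD"


h_star = warning_leverage(n_descriptors=4, n_training=120)
labels = [
    classify_compound(0.05, 1.2, h_star),    # typical compound
    classify_compound(0.20, 0.5, h_star),    # structurally influential
    classify_compound(0.05, -3.4, h_star),   # poorly predicted response
]
```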
3.7 External validation result
In order to assess the predictive ability of the proposed models, external validation was carried out as explained in the "Methodology" section. The results of the validation recommended by Golbraikh and Tropsha for the external prediction set are listed in Table 4.
All the statistical parameters in Table 4 meet the conditions proposed by Golbraikh and Tropsha, so the models have good and acceptable predictive ability. As mentioned above, the best results were obtained with the SVM model. The external validation results for the prediction set satisfy the Golbraikh and Tropsha conditions, which further verifies the predictive ability of the SVM model for the prediction set.
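A check of this kind can be sketched as below. The formulas are one common formulation of the Golbraikh and Tropsha criteria (R² > 0.6, a relative difference between R² and a through-origin R² below 0.1, and a through-origin slope between 0.85 and 1.15) and are an assumption here, because the exact expressions used for Table 4 are not reproduced in the text. The function name, the dictionary keys and the sample values are illustrative only.

```python
def golbraikh_tropsha(y_exp, y_cal):
    """External validation statistics for a prediction set (assumed formulation)."""
    n = len(y_exp)
    mean_exp = sum(y_exp) / n
    mean_cal = sum(y_cal) / n
    s_ee = sum((e - mean_exp) ** 2 for e in y_exp)
    s_cc = sum((c - mean_cal) ** 2 for c in y_cal)
    s_ec = sum((e - mean_exp) * (c - mean_cal) for e, c in zip(y_exp, y_cal))
    r2 = s_ec ** 2 / (s_ee * s_cc)                  # squared Pearson correlation

    dot = sum(e * c for e, c in zip(y_exp, y_cal))
    k = dot / sum(e * e for e in y_exp)             # calculated vs experimental
    k_prime = dot / sum(c * c for c in y_cal)       # experimental vs calculated

    # Coefficients of determination for the regressions through the origin
    r0_sq = 1.0 - sum((c - k * e) ** 2 for e, c in zip(y_exp, y_cal)) / s_cc
    r0_prime_sq = 1.0 - sum(
        (e - k_prime * c) ** 2 for e, c in zip(y_exp, y_cal)
    ) / s_ee

    passes = (
        r2 > 0.6
        and ((r2 - r0_sq) / r2 < 0.1 or (r2 - r0_prime_sq) / r2 < 0.1)
        and (0.85 <= k <= 1.15 or 0.85 <= k_prime <= 1.15)
    )
    return {"r2": r2, "k": k, "k_prime": k_prime,
            "r0_sq": r0_sq, "r0_prime_sq": r0_prime_sq, "passes": passes}


# Made-up enthalpies (kJ·mol^-1): a close fit and a scrambled one.
good = golbraikh_tropsha([-10.0, -20.0, -30.0, -40.0], [-11.0, -19.0, -31.0, -39.0])
poor = golbraikh_tropsha([-10.0, -20.0, -30.0, -40.0], [-40.0, -10.0, -30.0, -20.0])
```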
Table 4 Statistical criteria of external validation (prediction set) of the proposed QSPR models
In this paper, QSPR models based on the ERM and SVM techniques have been developed for the first time to predict the gas-to-benzene solvation enthalpies of a diverse set of organic compounds from molecular structure. The results show that the nonlinear SVM model, built on the same set of descriptors, is superior to the linear model and has good predictive ability. Model validation shows that the presented model is appropriate and can be used effectively to predict the ΔH_Solv of organic compounds with an accuracy similar to that of experimental ΔH_Solv determination. It can reasonably be concluded that the proposed model can be expected to predict ΔH_Solv for new organic compounds or for other organic compounds whose experimental values are unknown.
Supporting Information: available free of charge via the internet at http://www.whxb.pku.edu.cn.
References
(1) Duffy, E. M.; Jorgensen, W. L. J. Am. Chem. Soc. 2000, 122, 2878. doi:10.1021/ja993663t
(2) Cornell, W. E.; Cieplak, P.; Bayly, C. I.; Merz, K. M.; Ferguson, D. M.; Spellmayer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. J. Am. Chem. Soc. 1995, 117, 5179. doi:10.1021/ja00124a002
(3) Graziano, G. Can. J. Chem. 2000, 78, 1233. doi:10.1139/v00-125
(4) Graziano, G. Biophys. Chem. 1999, 82, 69. doi:10.1016/S0301-4622(99)00063-0
(5) Graziano, G. J. Phys. Chem. B 2000, 104, 9249. doi:10.1021/jp001461
(6) Garde, S.; Garcia, A. E.; Pratt, L. R.; Hummer, G. Biophys. Chem. 1999, 78, 21. doi:10.1016/S0301-4622(99)00018-6
(7) Mintz, C.; Burton, K.; Acree, W. E., Jr.; Abraham, M. H. Fluid Phase Equilibr. 2007, 258, 191. doi:10.1016/j.fluid.2007.06.016
(8) Chickos, J. S.; Acree, W. E., Jr. J. Phys. Chem. Ref. Data 2003, 32, 519. doi:10.1063/1.1529214
(9) Chickos, J. S.; Acree, W. E., Jr. J. Phys. Chem. Ref. Data 2002, 31, 537. doi:10.1063/1.1475333
(10) Borges dos Santos, R. M.; Muralha, V. S. F.; Correia, C. F.; Simões, J. A. M. J. Am. Chem. Soc. 2001, 123, 12670. doi:10.1021/ja010703w
(11) Laarhoven, L. J. J.; Mulder, P.; Wayner, D. D. M. Acc. Chem. Res. 1999, 32, 342. doi:10.1021/ar9703443
(12) Hansch, C.; Leo, A. Exploring QSAR: Fundamentals and Applications in Chemistry and Biology; American Chemical Society: Washington DC, 1995. doi:10.1021/jm950902o
(13) Bao, L.; Sun, Z. R. FEBS Lett. 2002, 521, 109. doi:10.1016/S0014-5793(02)02835-1
(14) Belousov, A. I.; Verzakov, S. A.; Von Frese, J. Chemom. Intell. Lab. Syst. 2002, 64, 15. doi:10.1016/S0169-7439(02)00046-1
(15) Cai, Y. D.; Liu, X. J.; Xu, X. B.; Chou, K. C. Comput. Chem. 2002, 26, 293. doi:10.1016/S0097-8485(01)00113-9
(16) Morris, C. W.; Autret, A.; Boddy, L. Ecol. Model. 2001, 146, 57. doi:10.1016/S0304-3800(01)00296-4
(17) Song, M. H.; Breneman, C. M.; Bi, J. B.; Sukumar, N.; Bennett, K. P.; Cramer, S.; Tugcu, N. J. Chem. Inf. Comput. Sci. 2002, 42, 1347. doi:10.1021/ci025580t
(18) Liu, H. X.; Zhang, R. S.; Luan, F.; Yao, X. J.; Liu, M. C.; Hu, Z. D.; Fan, B. T. J. Chem. Inf. Comput. Sci. 2003, 43, 900. doi:10.1021/ci0256438
(19) Liu, H. X.; Zhang, R. S.; Yao, X. J.; Liu, M. C.; Hu, Z. D.; Fan, B. T. J. Chem. Inf. Comput. Sci. 2003, 43, 1288. doi:10.1021/ci0340355
(20) Golmohammadi, H.; Dashtbozorgi, Z.; Acree, W. E., Jr. Struct. Chem. 2013, 24, 1799. doi:10.1007/s11224-013-0222-4
(21) Golmohammadi, H.; Dashtbozorgi, Z.; Acree, W. E., Jr. Phys. Chem. Liq. 2013, 51, 182. doi:10.1080/00319104.2012.708932
(22) Dashtbozorgi, Z.; Golmohammadi, H.; Acree, W. E., Jr. Thermochim. Acta 2012, 539, 7. doi:10.1016/j.tca.2012.03.017
(23) Golmohammadi, H.; Dashtbozorgi, Z.; Acree, W. E., Jr. Mol. Inf. 2012, 31, 867. doi:10.1002/minf.201200091
(24) Dashtbozorgi, Z.; Golmohammadi, H.; Acree, W. E., Jr. Eur. J. Pharm. Sci. 2012, 47, 421. doi:10.1016/j.ejps.2012.06.021
(25) Toubaei, A.; Golmohammadi, H.; Dashtbozorgi, Z.; Acree, W. E., Jr. J. Mol. Liq. 2012, 175, 24. doi:10.1016/j.molliq.2012.08.006
(26) Mintz, C.; Clark, M.; Burton, K.; Acree, W. E., Jr.; Abraham, M. H. QSAR Comb. Sci. 2007, 26, 881. doi:10.1002/qsar.200630152
(27) Todeschini, R.; Consonni, V. Molecular Descriptors for Chemoinformatics; Wiley-VCH: Weinheim, 2009. doi:10.1002/9783527628766.ch22
(28) Hyperchem, re. 4. for Windows; Autodesk: Sausalito, CA, 1995.
(29) Todeschini, R.; Consonni, V.; Pavan, M. Dragon Software; Milano, 2002.
(30) Mercader, A. G.; Duchowicz, P. R.; Fernández, F. M.; Castro, E. A. J. Chem. Inf. Model. 2011, 51, 1575. doi:10.1021/ci200079b
(31) MATLAB 7.0; The Mathworks Inc.: Natick, MA, USA, 2005. http://www.mathworks.com.
(32) Baghban, A.; Ahmadi, M. A.; Pouladi, B.; Amanna, B. J. Supercrit. Fluids 2015, 101, 184. doi:10.1016/j.supflu.2015.03.004
(33) Vapnik, V. N.; Lerner, A. Autom. Remote Control 1963, 24, 774.
(34) Vapnik, V. N.; Chervonenkis, A. Y. Autom. Remote Control 1964, 25, 821.
(35) Rojas, C.; Duchowicz, P. R.; Tripaldi, P.; Pis Diez, R. Chemometr. Intell. Lab. Syst. 2015, 140, 126. doi:10.1016/j.chemolab.2014.09.020
(36) Mercader, G.; Duchowicz, P. R.; Fernández, F. M.; Castro, E. A. Chemometr. Intell. Lab. Syst. 2008, 92, 138. doi:10.1016/j.chemolab.2008.02.005
(37) Gramatica, P. QSAR Comb. Sci. 2007, 26, 694. doi:10.1002/qsar.200610151
(38) Cao, D. S.; Liang, Y. Z.; Xu, Q. S.; Li, H. D.; Chen, X. J. Comput. Chem. 2010, 31, 592. doi:10.1002/jcc.21351
(39) Yan, J.; Huang, J. H.; He, M.; Lu, H. B.; Yang, R.; Kong, B.; Xu, Q. S.; Liang, Y. Z. J. Sep. Sci. 2013, 36, 2464. doi:10.1002/jssc.201300254
(40) Cao, D. S.; Liang, Y. Z.; Xu, Q. S.; Yun, Y. H.; Li, H. D. J. Comput. Aided Mol. Des. 2011, 25, 67. doi:10.1007/s10822-010-9401-1
(41) Eriksson, L.; Jaworska, J.; Worth, A. P.; Cronin, M. T.; McDowell, R. M.; Gramatica, P. Environ. Health Perspect. 2003, 111, 1361. doi:10.1289/ehp.5758
(42) Golbraikh, A.; Shen, M.; Xiao, Z.; Xiao, Y.; Lee, K. H.; Tropsha, A. J. Comput. Aided Mol. Des. 2003, 17, 241. doi:10.1023/A:1025386326946
(43) Golbraikh, A.; Tropsha, A. J. Mol. Graph. Model. 2002, 20, 269. doi:10.1016/S1093-3263(01)00123-1
(44) Agrawal, V. K.; Khadikar, P. V. Bioorg. Med. Chem. 2001, 9 (11), 3035. doi:10.1016/S0968-0896(01)00211-5
(45) Pourbasheer, E.; Riahi, S.; Ganjali, M. R.; Norouzi, P. J. Enzyme Inhib. Med. Chem. 2010, 25 (6), 844. doi:10.3109/14756361003757893
(46) Antipin, I. S.; Arslanov, N. A.; Palyulin, V. A.; Konovalov, A. I.; Zefirov, N. S. Dokl. Akad. Nauk SSSR 1991, 316, 925.
(47) Sarkar, R.; Roy, A. B.; Sarkar, P. K. Math. Biosci. 1978, 39, 299. doi:10.1016/0025-5564(78)90060-3
(48) Geary, R. C. Incorp. Statist. 1954, 5, 15. doi:10.2307/2986645
(49) Moreau, G.; Broto, P. Nouv. J. Chim. 1980, 4, 757.
(50) Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors. In Methods and Principles in Medicinal Chemistry; Mannhold, R., Kubinyi, H., Timmerman, H., Eds.; Wiley-VCH: Weinheim, 2000. doi:10.1002/9783527613106
(51) Ma, S.; Lv, M.; Deng, F.; Zhang, X.; Zhai, H.; Lv, W. J. Hazard. Mater. 2015, 283, 591. doi:10.1016/j.jhazmat.2014.10.011
doi:10.3866/PKU.WHXB201701163
Received: December 13, 2016; Revised: January 16, 2017; Published online: January 16, 2017.
*Corresponding author. Email: hassan.gol@gmail.com; Tel/Fax: +98-21-66518561.
© Editorial office of Acta Physico-Chimica Sinica