Jesus Morales García, Chibuike C. Udenigwe, Jorge Duitama, Andrés Fernando González Barrios*
a Grupo de Diseño de Productos y Procesos (GDPP), Department of Chemical and Food Engineering, Universidad de los Andes, Bogotá 111711, Colombia
b School of Nutrition Sciences, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
c Systems and Computing Engineering Department, Universidad de Los Andes, Bogotá 111711, Colombia
Keywords:
Whey hydrolysate
Antioxidant peptide
Peptidomics
Prediction
ABSTRACT
Enzymatic hydrolysis cleaves the peptide bonds of proteins and releases peptides with potential biological functions. Previous studies on the enzymatic hydrolysis of whey proteins have not identified the complete peptide profiles after hydrolysis. In this study, we reconstructed the peptide profiles of whey hydrolysates produced with two enzymes under different processing conditions. We also developed an ensemble machine learning predictor to classify the peptides obtained from whey hydrolysis. A total of 2 572 peptides were identified across three process conditions with two enzymes, each in duplicate, and 499 of them were classified as potential antioxidant peptides from whey proteins. Peptides classified as antioxidants accounted for 13.1%–24.5% of all peptides identified in the hydrolysates. These results facilitate the selection of promising peptides that contribute to the antioxidant properties developed during the enzymatic hydrolysis of whey proteins, aiding the discovery of novel antioxidant peptides.
The enzymatic hydrolysis of food proteins is becoming a method of choice in the food and pharmaceutical industries. Peptides derived from proteolysis can be used to develop nutraceuticals or functional foods with improved biological functions such as antioxidant, antihypertensive, and antimicrobial activities, among others [1,2]. Whey is a byproduct of cheese production. Whey proteins are widely valued for their high nutritional quality and for the bioactive peptides encrypted within their primary structure [3]. Bovine whey contains 20% of the milk protein content, the entire milk lactose content (about 5%), and traces of fat (0.1%) and mineral salts (0.46%–10%) [3]. The main bovine whey proteins are β-lactoglobulin (β-LG, 1.3 g/L) and α-lactalbumin (α-LA, 1.2 g/L); present at lower concentrations are bovine serum albumin (BSA, 0.4 g/L), bovine lactoferrin (BLF, 0.1 g/L), immunoglobulins (0.5–1 g/L), lactoperoxidase (0.03 g/L), and proteinaceous glycomacropeptide (1.2 g/L) [4]. Hence, a significant volume of cheese whey, which is currently discarded as a byproduct in the manufacture of dairy products, could be processed to obtain novel food products.
Food-derived bioactive peptides have mainly been characterized using enzymatic and microbial in vitro methods [5]. Moreover, liquid chromatography-tandem mass spectrometry (LC-MS/MS) techniques for the experimental identification of biomolecules are widely used for peptide sequencing in different applications, including the enzymatic hydrolysis of proteins [6]. Current methods can identify with high accuracy the sequences of the peptides present in a particular solution. Nonetheless, experimental characterization and validation of the functions of the hundreds of peptides present in protein hydrolysates are time consuming and require expensive experimental setups [7]. Thus, computational prediction methods can be developed to perform large-scale function prediction based on previous knowledge deposited in databases, in order to prioritize peptide sequences derived from peptidomic analyses.
Bioactive peptides reported in publicly available databases, if present in the sample, can be used directly to infer the potential bioactivities of the peptides that can be obtained from the protein sequences of the food of interest. However, direct queries of the sequenced peptides rarely provide function identification, because most peptides identified in a new protein source will not have been characterized and deposited in knowledge databases. Hence, peptide function prediction needs to be performed using more indirect techniques based on machine learning models able to identify common functional roles from different combinations of peptide features [8]. Besides predicting new candidate bioactive peptides, the models could also provide insights into the relationship between the peptide characteristics and their biological function. Among the main biological functions, the antioxidant properties of food protein hydrolysates are attractive for multiple food applications because of the different simultaneous mechanisms through which antioxidant peptides exert their effects [7].
In this study, the peptides of whey hydrolysates produced with two enzymes at different hydrolysis conditions were identified by LC-MS/MS to relate the antioxidant properties of the hydrolysates to their peptide profiles. An ensemble machine learning model was used to infer the antioxidant capacity of the sequenced peptides and to prioritize promising peptides for further experimental validation of antioxidant capacity.
The whey protein hydrolysates were obtained from sweet whey powder with a protein concentration of 12% purchased from Saputo Inc. (Lincolnshire, IL, USA). Whey proteins were resuspended in water at a concentration of 1 g/100 mL of protein and hydrolyzed with Alcalase 2.4 L and Flavourzyme from Novozymes Corp. (purchased from Sigma Aldrich, St. Louis, MO, USA). The variables enzyme/substrate ratio (1, 2, and 3 g/100 g), temperature (42, 50, and 58 °C), pH (5.5, 7, and 8.5), and hydrolysis time (60, 180, and 300 min) were set according to a Taguchi 3^4 design (Table 1). The resuspended whey was stirred at 250 r/min with a magnetic stirrer in a 250 mL jacketed beaker, with the temperature adjusted and the pH held constant using a TitroLine 6000/7000 titrator (SI Analytics/Xylem, Mainz, Germany) with 0.5 mol/L NaOH. After hydrolysis, the whey protein hydrolysates were heated to 90 °C for 10 min to deactivate the enzymes. Thereafter, the whey protein hydrolysates were freeze dried and stored at –20 °C for further analysis.
Table 1 Process conditions for whey protein hydrolysis in the Taguchi design.
2.2.1 Trolox equivalent antioxidant capacity (TEAC)
The 2,2’-azino-bis(3-ethylbenzothiazoline-6-sulfonic acid) (ABTS+) radical scavenging capacity of the whey hydrolysates was measured spectrophotometrically from the change in absorbance of the ABTS+ radical, quantified against a Trolox calibration curve, and expressed as a TEAC value, as previously described by Re et al. [9] with slight modifications. Briefly, 100 μL of the sample or Trolox standard was mixed with 1 mL of the ABTS+ solution and incubated at 30 °C for 30 min, and the absorbance was then measured at 730 nm [9].
2.2.2 Metal chelating assay
The metal chelating assay was based on a previous method [10] with minor modification. An aliquot (1 mL) of aqueous whey protein hydrolysate at a final concentration of 1 mg/mL was combined with 0.05 mL of FeCl2 solution (2 mmol) and 1.85 mL of double-distilled water. Afterwards, 0.1 mL of FerroZine (3-(2-pyridyl)-5,6-diphenyl-1,2,4-triazine-4’,4’’-disulfonic acid sodium salt) solution (5 mmol) was added, followed by vigorous mixing. The mixture was kept at room temperature for 10 min, and 200 μL was transferred into a clear-bottom 96-well microplate. Absorbance was determined at 562 nm. In the control, the hydrolysate sample was replaced with double-distilled water. The metal ion chelating capacity of the whey protein hydrolysates was calculated by the following equation:
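A common form of this calculation for the ferrozine assay, assumed here rather than taken from the source, expresses chelation as the relative drop in absorbance at 562 nm against the water control. The following is a minimal Python sketch of that assumed form; the function and argument names are illustrative.

```python
def chelating_capacity(a_control: float, a_sample: float) -> float:
    """Fe2+ chelating capacity (%) from absorbances read at 562 nm.

    Assumed form: (A_control - A_sample) / A_control * 100, where the
    control contains double-distilled water instead of hydrolysate.
    """
    if a_control <= 0:
        raise ValueError("control absorbance must be positive")
    return (a_control - a_sample) / a_control * 100.0


# Illustrative absorbance values only, not measurements from the study.
print(chelating_capacity(0.80, 0.60))
```

A lower sample absorbance means less Fe2+ was left free to form the coloured ferrozine complex, so the computed capacity rises as the sample reading falls.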
The antioxidant assays were conducted in triplicate, and results were expressed as mean ± SD. Statistical significance of differences (P < 0.05) between treatments was analyzed by one-way analysis of variance, followed by the Tukey multiple comparison test, using Minitab 18 Statistical Software (Minitab, LLC, State College, PA, USA).
LC-MS/MS analysis was performed at the John L. Holmes Mass Spectrometry Facility (Ottawa, ON, Canada). The system was an EASY-nLC integrated nano-HPLC system (Thermo Fisher, San Jose, CA, USA) directly interfaced with a quadrupole Orbitrap (Q-Exactive) mass spectrometer, and the analysis followed a previously described procedure [11]. The MaxQuant software (Version 1.6.6.0, Max Planck Institute of Biochemistry, Martinsried, Munich) was used to perform the peptidomic analysis. The bovine (Bos taurus) protein database was retrieved from UniProt (http://www.uniprot.org/, accessed on July 22nd, 2019). The MaxQuant software was set up assuming non-specific enzyme digestion. Fragment and parent ion tolerances were 0.50 Da and 10.0 × 10^-6, respectively. Oxidation of methionine and acetylation of the protein N-terminus were specified as variable modifications.
Two replicates were analyzed by one LC-MS/MS run. Then, the obtained MS/MS raw files were searched by the MaxQuant software to obtain the peptide sequences.
2.4.1 Data collection
A positive set of antioxidant peptides was obtained from the BIOPEP-UWM database (http://www.uwm.edu.pl/biochemia/biopep/start_biopep.php, accessed on October 30th, 2019) [12], retaining only peptides of between 5 and 25 amino acid residues to be consistent with the lengths of the peptides identified by LC-MS/MS. The positive dataset contained 282 peptides. Then, 100 iterations were performed, each drawing a random set of 282 peptides from those identified by LC-MS/MS and labeling them as presumed negatives, so that every benchmark dataset was balanced against the positive dataset. In each iteration, the benchmark dataset was randomly divided 1:1 into training and test datasets. The benchmark positive dataset is available in Supplementary Table 1.
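The resampling scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name, the fixed seed, and the unstratified 1:1 split are assumptions, since the text does not say how the random division was performed.

```python
import random


def build_benchmarks(positives, identified, n_iterations=100, seed=0):
    """Yield (train, test) splits of balanced benchmark datasets.

    Each item in a split is a (sequence, label) pair: label 1 marks a known
    antioxidant peptide and label 0 a presumed negative drawn at random
    from the peptides identified by LC-MS/MS.
    """
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    for _ in range(n_iterations):
        # Draw as many presumed negatives as there are positives.
        negatives = rng.sample(identified, len(positives))
        benchmark = [(s, 1) for s in positives] + [(s, 0) for s in negatives]
        rng.shuffle(benchmark)
        half = len(benchmark) // 2  # 1:1 training/test division
        yield benchmark[:half], benchmark[half:]
```

With 282 positives, each iteration yields a training set and a test set of 282 labelled peptides each. Because the split here is not stratified, the class ratio inside each half can deviate slightly from 1:1.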
2.4.2 Feature extraction in peptide sequence
After analyzing the sequence properties of antioxidant peptides and comparing them with those of the peptides identified by LC-MS/MS, features extracted from the amino acid sequence were selected to identify antioxidant peptides, as they correlate with the intrinsic properties of the peptides.
Three sequence descriptors, composition (C), transition (T), and distribution (D), were used to describe the global composition of a property, the frequency with which the property changes along the entire peptide chain, and the distribution pattern of the property along the peptide sequence, respectively. The composition, transition and distribution (CTD) concept was proposed by Dubchak et al. [13] to extract information from the primary sequence for predicting the protein folding class, and it was recently used to classify the bioactivity of proteins and peptides [8,14]. The 20 natural amino acids are divided into 4 groups based on their hydrophobicity and charge: the hydrophobic group C1 = {A, F, G, I, L, M, P, V, W}, the polar group C2 = {C, N, Q, S, T, Y}, the positively charged group C3 = {H, K, R}, and the negatively charged group C4 = {D, E} [15]. Based on these properties, composition (C) calculates the frequency of each group in a given peptide, which is defined by the following equation:
Here, N_i (i ∈ {1, 2, 3, 4}) is the number of residues belonging to group i, and L is the chain length of the peptide.
In a given peptide, transition (T) describes the frequencies of an amino acid with a particular property followed by an amino acid with another property, which is defined by the following equation:
Here, i, j ∈ {1, 2, 3, 4} represent the corresponding groups, and N_{i,j} is the number of dipeptides containing two residues from two different groups.
Finally, distribution (D) describes the distribution pattern of each group, measured by the positions of the first, 25%, 50%, 75%, and 100% of the residues of each of the 4 groups along the sequence, which was calculated by the following equation:
Here, N_{i,1} is the chain length within which the first amino acid of the ith group is located, and N_{i,2}, N_{i,3}, N_{i,4}, and N_{i,5} are the chain lengths within which 25%, 50%, 75%, and 100% of the amino acids of the ith group are located, respectively.
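To make the three descriptors concrete, the sketch below computes them for one peptide using the four groups defined above. It assumes the usual CTD forms implied by the symbol definitions: C_i = N_i / L, T_{i,j} = (N_{i,j} + N_{j,i}) / (L − 1), and D_{i,k} = N_{i,k} / L. The absence of a ×100 scaling, the rounding rule for the 25%, 50%, and 75% positions, and the zeros returned for an absent group are assumptions; the function name is illustrative.

```python
from itertools import combinations
from math import floor

GROUPS = {
    1: set("AFGILMPVW"),  # C1, hydrophobic
    2: set("CNQSTY"),     # C2, polar
    3: set("HKR"),        # C3, positively charged
    4: set("DE"),         # C4, negatively charged
}
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def ctd_features(peptide: str) -> list:
    """Return 30 CTD descriptors: 4 composition, 6 transition, 20 distribution."""
    # Non-standard residue letters raise KeyError here.
    groups = [RESIDUE_TO_GROUP[aa] for aa in peptide.upper()]
    length = len(groups)
    if length < 2:
        raise ValueError("need at least two residues")

    # Composition: fraction of residues that fall in each group.
    composition = [groups.count(g) / length for g in GROUPS]

    # Transition: fraction of adjacent residue pairs that switch between
    # groups i and j, counted in both directions.
    pairs = list(zip(groups, groups[1:]))
    transition = [
        sum(1 for a, b in pairs if {a, b} == {i, j}) / (length - 1)
        for i, j in combinations(GROUPS, 2)
    ]

    # Distribution: relative chain position of the first residue of a group
    # and of the residues marking 25%, 50%, 75% and 100% of that group.
    distribution = []
    for g in GROUPS:
        positions = [k + 1 for k, x in enumerate(groups) if x == g]
        if not positions:
            distribution.extend([0.0] * 5)  # group absent (assumed convention)
            continue
        n = len(positions)
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            idx = max(1, floor(frac * n))  # assumed rounding rule
            distribution.append(positions[idx - 1] / length)

    return composition + transition + distribution


print(len(ctd_features("AKDE")))
```

Every descriptor is divided by L or L − 1, which is why the features are independent of the absolute peptide length.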
2.4.3 Machine learning algorithms
First, single classifiers were used for binary classification, including random forest (RF), K-nearest neighbor (KNN), naïve Bayes (NB), logistic regression (LR), and support vector machine (SVM). To improve the prediction of antioxidant peptides, we then applied an ensemble of individual classifiers, which uses the different decision boundaries generated by the individual classifiers to strategically combine their classification results.
2.4.4 Performance evaluation
Based on the prediction result generated by the independent dataset test, the following evaluation indexes were calculated to compare the proposed method with the existing algorithms.
Sensitivity (Sn) represents the proportion of antioxidant peptides that were correctly predicted as antioxidant by each method and is calculated by the following equation:
Specificity (Sp) represents the proportion of negative peptides that were correctly predicted as non-antioxidant by each method and is calculated by the following equation:
Accuracy (Acc) is one of the ways to combine sensitivity and specificity to obtain a single quality measurement and it is calculated by the following equation:
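In terms of the confusion-matrix counts, the three indexes take their standard forms, written here from the usual definitions, which the descriptions above match:

```latex
\begin{align}
S_n &= \frac{TP}{TP + FN} \\
S_p &= \frac{TN}{TN + FP} \\
Acc &= \frac{TP + TN}{TP + TN + FP + FN}
\end{align}
```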
Where TP, FP, TN, and FN represent true positive, false positive, true negative, and false negative, respectively. To further evaluate the performance of the predictors, the receiver operating characteristic (ROC) curve was obtained, and the area under the ROC curve (AUC) was calculated. The Python software (version 3.7.4, Python Software Foundation) library Scikit-learn [16] was used to implement all the algorithms and the performance evaluation.
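For readers who want to check these indexes by hand, the following dependency-free Python sketch computes Sn, Sp, and Acc from predicted labels, and the AUC from predicted scores using its rank interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative). The function names are illustrative; in Scikit-learn the AUC would normally come from `sklearn.metrics.roc_auc_score`.

```python
def sn_sp_acc(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = antioxidant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / (tp + tn + fp + fn)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores for illustration only.
y_true = [1, 1, 0, 0]
print(sn_sp_acc(y_true, [1, 0, 0, 0]))
print(auc(y_true, [0.9, 0.4, 0.5, 0.1]))
```

The pairwise AUC is quadratic in the number of peptides, which is acceptable for test sets of a few hundred sequences such as the ones used here.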
The antioxidant capacity of the whey protein hydrolysates was measured as ABTS+ radical cation scavenging capacity and Fe2+ chelating capacity. All the whey hydrolysates exhibited activity in both assays.
The ABTS+ assay measures the ability of antioxidants to scavenge the ABTS+ radical cation, which is generated by oxidation with potassium persulfate and is decolorized when reduced by hydrogen-donating molecules. Fig. 1A shows that the alcalase hydrolysates were more effective in scavenging ABTS+ than the hydrolysates generated using flavourzyme. Previous studies using different proteases (at their optimal pH and temperature) for whey protein hydrolysis have reported the highest antioxidant capacity with alcalase [17,18].
Fig. 1 Antioxidant properties of whey hydrolysates using alcalase (purple bars) and flavourzyme (green bars) by ABTS+ radical scavenging (A) and Fe2+ chelating (B). Bars and error bars represent the mean ± SD of triplicate experiments. * on top of the bars denotes P < 0.05 between both enzymes; *** on top of the bars denotes P < 0.01 between both enzymes.
According to the results presented in Fig. 1A, the ABTS+ radical scavenging capacity of the whey hydrolysates produced at pH 5.5 by both enzymes was significantly lower (P < 0.01) than that of the hydrolysates produced at the highest pH, 8.5. In addition, hydrolysates produced with flavourzyme had significantly lower ABTS+ scavenging capacity (P < 0.05) at the lowest enzyme/substrate (E/S) ratio than at the other ratios (Supplementary Table 2).
Table 2 Proteins identified in whey hydrolysates and the number of peptides derived from each protein.
As shown in Fig. 1B, the whey hydrolysates possessed Fe2+ chelating capacity, with some significant differences (P < 0.01) between samples produced with alcalase and flavourzyme. The Fe2+ chelating capacity of the whey hydrolysates also differed significantly (P < 0.05) between pH levels for both enzymes. The antioxidant mechanisms of protein hydrolysates may be due to the combined effect of multiple reactions, such as the ability to stabilize radicals, donate hydrogen, and chelate pro-oxidative metal ions [18].
Three treatments were chosen from each enzyme for peptidomic analysis, including samples with the highest and lowest antioxidant capacity.
We performed an analysis combining HPLC and MS to obtain a comprehensive list of sequenced peptides from 12 whey hydrolysates, corresponding to the three different conditions in duplicate for both alcalase and flavourzyme (Supplementary Table 3). A total of 2 572 peptide sequences were identified across the different experiments. High-resolution MS techniques allow an in-depth peptidomic analysis of samples without prior purification through fractionation or multiple LC-MS runs for each sample [19].
Table 3 Prediction performance of machine learning algorithms and ensemble classifier.
The majority of peptides in the whey hydrolysates were derived from β-lactoglobulin, followed by β-casein and α-lactalbumin (Table 2); caseins commonly occur as a minor component of crude whey products. Fig. 2 shows both the total number of peptides identified in the hydrolysates and the number of peptides detected in the two replicates of each hydrolysate. A principal component analysis (PCA) of the peptides identified in each hydrolysate showed repeatability between the duplicates and differentiation between the enzymes (Fig. 3). Furthermore, the hydrolysis conditions used for each enzyme are clearly separated on the PCA score plot based on the first two components (Fig. 3). These findings are important for validating the specificity of proteases in releasing peptides of interest in different production batches.
Fig. 2 Total number of peptides identified by LC-MS/MS in the whey hydrolysates for each enzyme and process condition (light blue). Dark blue represents the number of peptides detected in both replicates.
Fig. 3 PCA score plot of the whey hydrolysates using the peptide data.
This work focuses on the peptides related to the enhancement of antioxidant properties in the whey hydrolysates, and some of the sequenced peptides could potentially have such properties. Therefore, training sets were constructed with antioxidant peptides from BIOPEP-UWM as the positive dataset and an equal number of peptides randomly selected from those identified by LC-MS/MS as the negative dataset, to obtain balanced datasets. Selecting appropriate features from amino acid sequences that reflect their intrinsic correlation with the biological attribute to be predicted is essential to establish a powerful predictor [20]. Peptide features related to CTD were extracted for both the antioxidant peptides and the sequenced peptides, and different machine learning algorithms were trained on this dataset to predict the antioxidant capacity of the sequenced peptides. Table 3 summarizes the results of this analysis. The prediction accuracy of the single classifiers ranged from 0.730 to 0.772, indicating a good prediction for antioxidant peptides [21]. The ensemble classifier performed best among the different classifiers, with an accuracy of 0.787 and an AUC of 0.871. Fig. 4 depicts, for each classifier, the ROC curve with the highest AUC among the 100 evaluated classifications. As indicated by previous works, the ensemble method uses the different decision boundaries generated by the individual classifiers to strategically achieve satisfactory prediction results [22].
Fig. 4 The ROC curves with best AUC of the individual classifiers and ensemble classifier among the 100 evaluated classifications.
Given that it had the highest prediction performance among the different classifiers, the ensemble model (LR+KNN+SVM+RF), which combines the predictions of the individual classifiers, was used to classify the peptides identified in the whey hydrolysates.
Using the ensemble classifier trained in each iteration, only peptides from the whey hydrolysates that were predicted as antioxidant in at least 51 of the 100 iterations were assigned as antioxidant peptides; the rest were assigned as non-antioxidant peptides. The classification assigned to each peptide from the whey hydrolysates is reported in Supplementary Table 3. To differentiate the predicted antioxidant peptides identified by peptidomic analysis of the whey hydrolysates, the peptides detected in the two replicates under the three different conditions were selected for comparative analysis (Fig. 5). The whey proteins hydrolyzed with alcalase showed a higher percentage of peptides classified as antioxidants than those hydrolyzed with flavourzyme, consistent with the in vitro antioxidant analysis, in which the alcalase hydrolysates had higher capacities than the flavourzyme hydrolysates in the TEAC and metal chelating assays. Moreover, the alcalase hydrolysates produced at pH 8.5, which had the highest antioxidant capacities, are those containing the highest proportion of peptides classified as antioxidants.
Fig. 5 Proportion of peptides predicted as antioxidants in the whey hydrolysates for each treatment.
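The 51-of-100 rule amounts to a majority vote across iterations. The sketch below is one way to express it, assuming that every identified peptide receives a 0/1 prediction from the ensemble in every iteration; the function name and the dictionary layout are illustrative, not taken from the source.

```python
from collections import Counter


def consensus_labels(iteration_predictions, min_votes=51):
    """Assign a final label to each peptide from per-iteration predictions.

    iteration_predictions: iterable of dicts mapping peptide sequence to the
    ensemble prediction (1 = antioxidant, 0 = non-antioxidant), one dict per
    iteration. A peptide is labelled antioxidant only if it was predicted as
    such in at least `min_votes` iterations.
    """
    votes = Counter()
    peptides = set()
    for predictions in iteration_predictions:
        for peptide, label in predictions.items():
            peptides.add(peptide)
            votes[peptide] += int(label == 1)
    return {peptide: int(votes[peptide] >= min_votes) for peptide in peptides}
```

Requiring a strict majority keeps a peptide from being labelled antioxidant on the strength of a few favourable random draws of the negative set.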
The distributions of the predicted antioxidant peptides across conditions are shown as Venn diagrams in Fig. 6. The whey hydrolysates generated with the same enzyme share 11% and 18% of the predicted antioxidant peptides for alcalase and flavourzyme, respectively. The highest numbers of unique peptides predicted as antioxidants were found at pH 5.5, where the lowest antioxidant capacities were reported for both alcalase and flavourzyme, with 61 and 46 peptides, respectively. Peptides shorter than 5 amino acids, which were not identified by LC-MS/MS, may limit the number of peptides counted in the conditions with higher antioxidant capacities, given that short peptides have been reported to have antioxidant capacity [23,24]. Nonetheless, the percentage of peptides classified as antioxidants in the alcalase hydrolysates was lower at pH 5.5, because more peptides were identified in this condition than in the other conditions.
Fig. 6 Venn diagram detailing the number of peptides predicted as antioxidants for each condition of the whey hydrolysates.
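The regions of such a Venn diagram reduce to set operations on the peptides predicted as antioxidants in each condition. The following sketch shows the idea; the function name is illustrative, and expressing the shared fraction relative to the union of all conditions is an assumption, since the text does not state the denominator behind the 11% and 18% figures.

```python
def venn_regions(by_condition):
    """Summarize overlap between sets of predicted antioxidant peptides.

    by_condition: dict mapping a condition name to a set of peptide sequences.
    Returns the peptides shared by all conditions, the peptides unique to
    each condition, and the shared fraction relative to the union.
    """
    sets = list(by_condition.values())
    shared = set.intersection(*sets)
    union = set.union(*sets)
    unique = {}
    for name, peptides in by_condition.items():
        others = set().union(*(s for n, s in by_condition.items() if n != name))
        unique[name] = peptides - others
    return shared, unique, len(shared) / len(union)


# Placeholder identifiers, not peptides from the study.
example = {
    "pH 5.5": {"pep1", "pep2", "pep3"},
    "pH 7": {"pep1", "pep2", "pep4"},
    "pH 8.5": {"pep1", "pep5"},
}
shared, unique, fraction = venn_regions(example)
print(shared, unique, fraction)
```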
The classified peptide sequences and the positive set from BIOPEP-UWM were subjected to an in silico analysis using PeptideRanker [25] to assign a bioactivity probability score. This bioactive peptide predictor assigns each peptide a score between 0 and 1, reflecting its probability of being bioactive, using an N-to-1 neural network (N1-NN) that operates on the peptide primary sequence and was trained on different bioactive peptide databases [25]. Any peptide with a score above the 0.5 threshold is labeled as bioactive. In this way, the proportion of bioactive peptides was calculated for each set. The percentage of bioactive peptides was higher in the peptide group classified as antioxidants than in the non-antioxidant peptide group identified from the whey protein hydrolysates (34.1% and 12.9%, respectively), and even higher than in the benchmark antioxidant set from BIOPEP-UWM (31.9%) (Supplementary Fig. 1). Despite being more general, this prediction is a useful tool to relate bioactive peptides to our antioxidant peptide results.
To compare the length of the peptides detected in the whey hydrolysates with that of known antioxidant peptides, length distributions were obtained for the antioxidant peptides from BIOPEP-UWM, the peptides identified in the whey hydrolysates, and the identified peptides classified as antioxidants (Fig. 7). The length of the antioxidant peptides from BIOPEP-UWM showed a negative exponential distribution, with more peptides having shorter chain lengths. The peptides identified in the whey hydrolysates tend to have a normal distribution of chain length, with the highest frequency observed for peptides of 11 amino acid residues. In contrast, the peptides classified as antioxidants tend to have an asymmetric distribution skewed towards shorter chain lengths. Although peptide length was not included as a feature in the classification and all features were normalized by peptide length, it is clear that the classification shifted the chain length distribution of the whey peptides classified as antioxidants.
Fig. 7 Distribution of peptide lengths among the antioxidant peptides from BIOPEP-UWM, the peptides identified in the whey hydrolysates, and those classified as antioxidants using the ensemble classifier.
In this study, the complete peptide profiles of whey hydrolysates at different processing conditions were reported for the first time. Following peptidomic analysis of whey hydrolysates produced using two commercial enzymes and three processing conditions, PCA distinguished between the different processing conditions and enzymes. The whey hydrolysates are potentially a source of antioxidant peptides. To identify antioxidant peptides among the peptides of the whey hydrolysates, an ensemble learning classifier built from four learning algorithms was presented in this study. The ensemble learning classifier enabled the effective prediction of the antioxidant peptides in the whey hydrolysates. The hydrolysates with the highest antioxidant capacities, in terms of both enzyme and processing conditions, were also those with a higher proportion of peptides classified as antioxidants. Accurate identification of antioxidant peptides may not only yield new antioxidant peptides but also clarify the relationship between peptide characteristics and their biological function.
Acknowledgements
This work was supported and funded by the Gobernación del Cesar-Ministry of Science, Technology, and Innovation through resources for higher education (grant 736/2015), and by the Natural Sciences and Engineering Research Council of Canada (NSERC).
Declaration of Competing Interest
The authors declare no conflict of interest.
Appendix A. Supplementary data
Supplementary data associated with this article can be found in the online version, at http://doi.org/10.1016/j.fshw.2021.11.011.