Ke Luo · Yufeng Wei · Jie Du · Liang Liu · Xinrui Luo · Yuehong Shi · Xiangjun Pei · Ningfei Lei · Ci Song · Jingji Li · Xiaolu Tang
Abstract Accurate estimates of forest aboveground biomass (AGB) are critical for supporting strategies of ecosystem conservation and climate change mitigation. The Jiuzhaigou National Nature Reserve, located on the Eastern Tibet Plateau, has rich forest resources on steep slopes, is very sensitive to climate change, and plays an important role in the regulation of regional carbon cycles. However, the AGB of subalpine forests in the Reserve has not been estimated, and whether global biomass models are applicable there has not been determined. To provide this information, Landsat 8 OLI and Sentinel-2B data were combined to estimate subalpine forest AGB using linear regression and two machine learning approaches, random forest and extreme gradient boosting, with 54 inventory plots. Regardless of forest type, observed AGB of the Reserve varied from 61.7 to 475.1 Mg ha⁻¹ with an average of 180.6 Mg ha⁻¹. Results indicate that integrating the Landsat 8 OLI and Sentinel-2B imagery significantly improved model efficiency regardless of modelling approach. The results highlight a potential way to improve the prediction of forest AGB in mountainous regions. Modelled AGB showed strong spatial variability. However, the modelled biomass differed greatly from global biomass products, indicating that global biomass products should be evaluated before being used in regional AGB estimates and that more field observations are required, particularly for areas with complex terrain, to improve model accuracy.
Keywords Aboveground biomass · Linear regression · Random forest · Extreme gradient boosting · Landsat 8 OLI · Sentinel-2B
Forests cover about one third of the global land surface (FAO 2015) and have important biophysical, biogeochemical, hydrological, economic and cultural roles in the earth system (Qureshi et al. 2012; Reichstein and Carvalhais 2019). For example, forests contribute up to 75% of terrestrial gross primary production (GPP) and store more carbon in biomass and soils than the atmosphere contains (Beer et al. 2010; Pan et al. 2013). However, forest biomass is difficult to measure in the field at regional scales, particularly in remote areas, so accurate quantitative estimates are important for reducing the uncertainties in assessing the role of forests in global carbon cycling and in mitigating global climate change.
Previous studies have shown that remote sensing is an effective tool to estimate aboveground biomass (AGB) with high accuracy on a regional scale; therefore, various types of remote sensors and algorithms have been used (de Almeida et al. 2019; Li et al. 2020; López-Serrano et al. 2020). Among the various types of sensors, medium-resolution sensors such as Landsat 8 OLI (Operational Land Imager) are widely used (Dube and Mutanga 2015; Zhu and Liu 2015; López-Serrano et al. 2020). However, using Landsat 8 OLI to achieve a satisfactory estimation accuracy remains a challenge, especially in mountainous areas and areas with high biomass (Cutler et al. 2012). Optical remote sensing data become saturated in high biomass areas, which greatly reduces the precision of estimation (Li et al. 2020). Compared with Landsat 8 OLI, Sentinel-2B has a higher spatial resolution and more spectral bands (including red-edge bands), which produces better results for measuring forest canopy cover and leaf area index (LAI) (Korhonen et al. 2017). Pandit et al. (2018) used Sentinel-2 data to estimate subtropical forest AGB with high accuracy, and demonstrated that the red-edge bands can help address the saturation problem. Forkuor et al. (2018) compared Landsat 8 and Sentinel-2 for mapping land use and land cover, and found that the combination of both produced better results than Landsat 8 data alone, which suggests great potential for combining different remote sensing data to improve modelling accuracy.
In addition to the selection of satellite data, the use of suitable algorithms to build the AGB regression models is also important. Currently, no single optimum biomass regression algorithm has been identified (Vafaei et al. 2018). Regression algorithms can be grouped into two broad categories: parametric and non-parametric (Mohd Zaki and Abd Latif 2016). In previous studies, parametric algorithms, such as linear regression (LR), were the most commonly used (Tonolli et al. 2011). However, this method does not accurately capture the complex nonlinear relationship between AGB and remote sensing data (Li et al. 2020). Therefore, in order to estimate biomass more precisely, non-parametric algorithms have been widely used (Powell et al. 2010; Rodríguez-Veiga et al. 2016). Unlike linear regression models, non-parametric algorithms can handle a large number of variables and can more accurately describe the nonlinear relationships between AGB and predictor variables (Hanes 2013). Currently, the most widely used non-parametric algorithms include support vector regression (SVR), random forest (RF), K-nearest neighbor (KNN), and gradient boosting (GB), which have been shown to perform well in modelling forest AGB (Blackard et al. 2008; Nelson et al. 2009; Monnet et al. 2011; Carreiras et al. 2012). However, performance still varies among non-parametric algorithms. For example, Vafaei et al. (2018) compared four machine learning approaches for AGB estimation, and found that the SVR model had the highest prediction accuracy, followed by the Gaussian processes (GP), the RF and the multi-layer perceptron neural network (MLP Neural Nets) models. Li et al. (2020) used the LR, RF and XGBoost models to estimate the AGB of subtropical forests, and the XGBoost model had the best performance. Therefore, the performances of different algorithms in AGB estimation still need further comparison.
The Jiuzhaigou National Nature Reserve, located on the Eastern Tibet Plateau, is a World Heritage Site and UNESCO World Biosphere Reserve, and is one of the most popular tourist attractions in China (Bossard et al. 2015). Although the Reserve is best known for its water scenery, it also has rich primary and secondary forests (Li et al. 2005). However, due to steep slopes and varying soil depths, with elevations ranging from 1996 to 4764 m a.s.l. (Bossard et al. 2015), an accurate estimation of regional AGB in subalpine forests in the Reserve is challenging. Although the Reserve is highly sensitive to climate change, a regional estimation of AGB has not been undertaken. In this study, the AGB of subalpine forests across the Jiuzhaigou National Nature Reserve was estimated by combining Landsat 8 OLI and Sentinel-2B images using LR and two machine learning algorithms, RF and extreme gradient boosting (XGBoost).
The specific objectives were to: (1) compare AGB among different forest types; (2) predict regional AGB with Landsat 8 OLI and Sentinel-2B images separately and in combination using different modelling approaches; and, (3) evaluate the applicability of two global AGB products against the AGB predicted in this study. The results could provide a scientific basis for estimating regional AGB in subalpine forests, which could be used to monitor areas with complex terrain. The estimated AGB across the study area could further provide a benchmark for globally modelled AGB for regional carbon budgets in the study of carbon cycling under ongoing climate change.
The Jiuzhaigou National Nature Reserve (103°46′ E-104°05′ E, 32°53′ N-33°20′ N, Fig. 1) has a forest cover of more than 80% over a total area of 64,297 ha, with high terrain in the south, low terrain in the north, and deep valleys (Fig. 1). The annual average temperature is 7.5 °C and the climate is typically humid (Li et al. 2005).
A total of 54 inventory plots (Fig. 1) with a radius of 10 m were established across the study area, including 25 coniferous plots, 8 broad-leaved plots, and 21 coniferous mixed plots. Within each plot, species were identified, and the height and diameter at breast height (DBH, 1.3 m) of stems with DBH greater than 5 cm were measured using a Vertex 5 (Haglöf Ltd., Sweden) and a diameter tape.
Fig. 1 Location of study area and field data
Biomass of leaves, branches and stems was calculated according to allometric equations (Table S1). AGB was calculated as their sum. Since there are few understory grasses, herbs or trees with DBH < 5 cm in the subalpine forests, their AGB was ignored (Fig. S1).
Two types of satellite images were used: Sentinel-2B and Landsat 8 OLI. Sentinel-2B (Table S2) includes 13 spectral bands ranging from 0.4 μm to 2.4 μm at three spatial resolutions (10, 20 and 60 m), with a temporal resolution of 10 days and a swath width of 290 km (Korhonen et al. 2017). The Sentinel-2B MSI L1C data were acquired on Dec 31, 2019 (https://scihub.copernicus.eu). Data preprocessing was conducted according to the guidelines of Sentinel-2B using Sen2cor (https://step.esa.int/main/snap-supported-plugins/sen2cor/) and ENVI 5.3 (https://www.l3harrisgeospatial.com/Software-Technology/ENVI), including radiometric calibration, atmospheric correction, geometric correction, mosaicking and clipping. After the preprocessing, 12 spectral bands (all except band 10) were extracted.
Landsat 8 (Table S3), the eighth satellite of the Landsat program, was launched on an Atlas V rocket from Vandenberg Air Force Base, California, on February 11, 2013 (López-Serrano et al. 2020). The Landsat 8 satellite carries two sensors: the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS) (Roy et al. 2019). It is basically consistent with Landsat 1-7 in spatial resolution and spectral characteristics, and there are 11 bands in total. Landsat 8 OLI includes 9 spectral bands ranging from 0.43 μm to 2.3 μm with a spatial resolution of 30 m, except for band 8 (15 m resolution), and it has a temporal resolution of 16 days and a swath width of 185 km (Roy et al. 2019). The Landsat 8 OLI L1T data were acquired on Nov 22, 2019 (https://earthexplorer.usgs.gov). Data preprocessing was performed in ENVI 5.3, including radiometric calibration, atmospheric correction, geometric correction, mosaicking and clipping. After the preprocessing, spectral bands 1-7 and 9 were extracted. The workflow is shown in Fig. 2.
Fig. 2 Workflow of modelling AGB
Numerous studies have shown that original bands and some characteristic factors derived from remote sensing data, such as vegetation indices and texture indices, play an important role in improving the accuracy of forest parameter estimations (Haboudane et al. 2004; Barbosa et al. 2014; Lu et al. 2014; Shen et al. 2016). In this study, in addition to spectral bands, vegetation indices and texture indices were extracted from L2A Sentinel-2B and L1T Landsat 8 data.
According to the typical spectral characteristics of vegetation, such as absorption bands, reflectance peaks and the red edge, the following 11 vegetation indices were extracted: enhanced vegetation index (EVI) and red-edge-based EVIs, normalized difference vegetation index (NDVI), ratio vegetation index (RVI), normalized difference water index (NDWI), specific leaf area vegetation index (SLAVI), visible atmospherically resistant index (VARI), green-red ratio index (GRRI), modified simple ratio (MSR) and normalized green-blue difference index (NGBDI). More details are shown in Table S4.
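To illustrate how such indices are derived from band reflectances, the following minimal sketch computes three of the listed indices from their commonly published definitions. The exact formulations used in the study are those of Table S4, which is not reproduced here, so the function names, the formulas as written and the reflectance values are assumptions for illustration only. Python is used for the sketch, whereas the analysis itself was carried out in R.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def rvi(nir, red):
    """Ratio vegetation index: NIR / Red."""
    return nir / red


def ngbdi(green, blue):
    """Normalized green-blue difference index: (Green - Blue) / (Green + Blue)."""
    return (green - blue) / (green + blue)


# Hypothetical surface reflectances for one forest pixel (not study data).
pixel = {"blue": 0.03, "green": 0.06, "red": 0.04, "nir": 0.36}

indices = {
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "RVI": rvi(pixel["nir"], pixel["red"]),
    "NGBDI": ngbdi(pixel["green"], pixel["blue"]),
}
```

Applied per pixel to the preprocessed band rasters, functions of this kind yield one index layer per formula, which can then be sampled at the plot locations.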
Texture is a feature used to represent the correlation between pixels in a region, and is used for image classification and scene recognition (Haralick et al. 1973). The gray-level co-occurrence matrix (GLCM) texture analysis method, as proposed by Haralick (1979), was used to extract eight common texture features derived from 12 Sentinel-2B bands and 8 Landsat 8 OLI bands (Table S5), including mean, variance, homogeneity, contrast, dissimilarity, entropy, second moment and correlation.
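The GLCM idea can be sketched in a few lines. The example below builds a symmetric, normalised co-occurrence matrix for a single horizontal pixel offset and derives five of the eight listed features from it. The window size, offset, number of gray levels and the particular feature definitions are assumptions chosen for illustration; the settings actually used in the study are not stated in the text, and image-processing packages differ slightly in how they define some features.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Return a symmetric, normalised gray-level co-occurrence matrix.

    image  : 2-D list of integer gray levels in [0, levels)
    dx, dy : pixel offset defining which neighbour is paired with each pixel
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count each pair in both directions
    total = sum(sum(row) for row in counts)
    return [[value / total for value in row] for row in counts]


def texture_features(p):
    """Compute five texture features from a normalised GLCM p."""
    feats = {"contrast": 0.0, "dissimilarity": 0.0, "homogeneity": 0.0,
             "second_moment": 0.0, "entropy": 0.0}
    for i, row in enumerate(p):
        for j, pij in enumerate(row):
            d = i - j
            feats["contrast"] += pij * d * d
            feats["dissimilarity"] += pij * abs(d)
            feats["homogeneity"] += pij / (1.0 + d * d)
            feats["second_moment"] += pij * pij
            if pij > 0.0:
                feats["entropy"] -= pij * math.log(pij)
    return feats


# Small hypothetical 4 x 4 window quantised to four gray levels.
window = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 2, 2, 2],
          [2, 2, 3, 3]]
p = glcm(window, levels=4)
features = texture_features(p)
```

In practice the same calculation is repeated in a moving window over each band, so that every pixel receives its own value for each texture feature.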
In order to establish a high-precision model of biomass estimation, Pearson correlation was used to analyze the relationships between field AGB and the variables listed above. A total of 52 variables were significantly correlated with AGB, and these were kept for feature selection of the most important variables to predict AGB (Fig. S2).
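A minimal sketch of this kind of correlation screening is given below. It computes Pearson's r between each candidate variable and plot AGB and keeps the variables whose t statistic exceeds a critical value supplied by the caller. The variable names, the six-plot toy data and the significance level are hypothetical; the study's own screening was run on 54 plots in R, and its exact test settings are not given in the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def screen_predictors(predictors, agb, t_critical):
    """Keep the predictors whose correlation with AGB is significant."""
    kept = []
    for name, values in predictors.items():
        r = pearson_r(values, agb)
        if abs(t_statistic(r, len(agb))) > t_critical:
            kept.append(name)
    return kept


# Toy data for six hypothetical plots (not study data).
agb = [60.0, 120.0, 180.0, 240.0, 300.0, 360.0]
predictors = {
    "index_a": [0.10, 0.22, 0.29, 0.41, 0.52, 0.58],
    "index_b": [0.50, 0.10, 0.40, 0.20, 0.60, 0.30],
}
# 2.776 is the two-sided 5% critical value of Student's t with 4 degrees of freedom.
kept = screen_predictors(predictors, agb, t_critical=2.776)
```

With more plots the degrees of freedom increase and the critical value shrinks, so weaker correlations can still pass the screen.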
Three datasets were tested for AGB modeling: (1) 32 variables derived from Sentinel-2B; (2) 20 variables from Landsat 8 OLI; and, (3) their combination (52 variables). To reduce dimensionality and model complexity while retaining most of the useful information, two feature selection methods were used, one for the linear model and one for the machine learning models.
For the linear model, a multicollinearity test and stepwise regression were adopted to select the most relevant variables (Mansfield and Helms 1982). Multicollinearity among predictor variables is likely to cause large model errors and non-significant predictors, and to reduce the accuracy and stability of the model (Farrar and Glauber 1967). In this study, the multicollinearity test of variables in the Sentinel-2B, Landsat 8 OLI and combined datasets was carried out by calculating the variance inflation factor (VIF). The vifcor function of the usdm package in R (4.0.2) was called with the correlation threshold between variables set to 0.6, which excluded variables with multicollinearity problems. The remaining variables were then input to a stepwise regression model and the most relevant variables were selected for modeling (Table S6). For the machine learning models, the recursive feature elimination (RFE) algorithm was adopted to select the most important variables for AGB estimation (Guyon et al. 2002). This method assesses the effect of the number of input variables on model performance. The feature selection process started with all variables for each dataset. Predictors were ranked according to the importance criteria of each machine learning regression method, then the least important variable was removed from the modeling, and this was repeated until one predictor was left. Finally, the optimal feature subset size was obtained, defined as the number of predictors for which RMSE was lowest (Fig. S3).
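The elimination loop described above can be sketched as follows. The sketch is generic: it takes a caller-supplied `evaluate` function that returns an RMSE and per-variable importances for a given subset, which stands in for fitting and cross-validating an RF or XGBoost model. The function names, the variable names and the toy scoring rule are all hypothetical, and the sketch is not a reproduction of the R implementation used in the study.

```python
def recursive_feature_elimination(features, evaluate):
    """Drop the least important feature one at a time and track RMSE.

    features : list of feature names to start from
    evaluate : callable(subset) -> (rmse, {feature: importance})
    Returns the subset with the lowest RMSE, that RMSE, and the full history.
    """
    remaining = list(features)
    history = []
    while remaining:
        rmse, importance = evaluate(remaining)
        history.append((rmse, list(remaining)))
        if len(remaining) == 1:
            break
        weakest = min(remaining, key=lambda name: importance[name])
        remaining.remove(weakest)
    best_rmse, best_subset = min(history, key=lambda item: item[0])
    return best_subset, best_rmse, history


def toy_evaluate(subset):
    """Stand-in for model fitting: noise variables raise RMSE, useful ones lower it."""
    importance = {"useful_1": 0.9, "useful_2": 0.6, "noise_1": 0.1, "noise_2": 0.05}
    n_noise = sum(name.startswith("noise") for name in subset)
    n_missing = sum(name not in subset for name in ("useful_1", "useful_2"))
    rmse = 50.0 + 5.0 * n_noise + 10.0 * n_missing
    return rmse, {name: importance[name] for name in subset}


best_subset, best_rmse, history = recursive_feature_elimination(
    ["useful_1", "useful_2", "noise_1", "noise_2"], toy_evaluate
)
```

Plotting the RMSE values in `history` against subset size gives the kind of curve from which the optimal number of predictors is read.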
Three modelling approaches were used: linear regression (LR), random forest (RF) and XGBoost, specifically:
Linear regression assumes that there is a linear relationship between a response and predictor variables (Li et al. 2020). It predicts AGB by establishing a linear relationship between plot AGB and the predictor variables. Unlike machine learning algorithms, LR is a parametric method, meaning that an explicit regression equation can be obtained between the predictor variables and AGB (López-Serrano et al. 2020). In this study, stepwise linear regression was performed, and VIFs were used to evaluate multicollinearity problems.
RF is a machine learning algorithm proposed by Breiman (2001), which uses bootstrap resampling to draw samples for decision tree modeling and obtains the result by aggregating the predictions of the individual trees (voting for classification, averaging for regression). The RF algorithm can be used for classification, regression and feature extraction (%IncMSE and IncNodePurity). In the modelling process, two parameters were optimized: ntree (the number of trees) and mtry (the number of variables considered for splitting at each node of a tree). The RF model was trained with the caret package linked to the randomForest package in R.
XGBoost was proposed by Chen and Guestrin (2016) on the basis of gradient boosting decision tree (GBDT) and RF, and a C++ version was developed. XGBoost is a boosting algorithm whose core idea is to integrate many weak learners (CART) to form a strong learner (He et al. 2018). It is an improvement over GBDT that is more powerful and suitable for a wider range of applications (Li et al. 2020). XGBoost is widely used in various fields due to its high accuracy, parallelizable processing and portability. In addition, the algorithm is robust and resistant to over-fitting (James et al. 2013). However, the parameter tuning process of XGBoost is very complex due to the large number of parameters (Li et al. 2020). The most important parameters include: (1) Gamma, the minimum loss reduction required to further partition a leaf node of the tree; (2) Min_child_weight, the minimum sum of instance weights required in a child node; (3) Max_depth, the maximum depth of an individual tree; (4) Subsample, the proportion of samples randomly drawn for each tree; (5) Nrounds, the maximum number of boosting iterations; and, (6) Eta, the learning rate, used to prevent over-fitting. In this study, the XGBoost model was trained with the caret package linked to the xgboost package in R.
To evaluate model performance, a tenfold cross-validation approach was applied, which divided the dataset into 10 parts with each part containing a similar number of samples. In the modelling process, one part was selected in each turn as the test dataset and the remaining nine parts were used as the training dataset to fit the model and predict the target value (Fushiki 2011). This process was conducted ten times so that each part of the observations served once as the test dataset. The determination coefficient (R²) and root mean square error (RMSE) were used as the accuracy evaluation criteria, calculated by Eqs. 1 and 2, respectively.
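The cross-validation procedure and the two accuracy measures can be sketched as below. The sketch assumes the common definitions RMSE = sqrt(mean((observed - predicted)²)) and R² = 1 - SS_res / SS_tot, and pools the held-out predictions from all folds before computing them; the study's Eqs. 1 and 2 are not reproduced in the text, and the R tooling it used may instead average per-fold metrics or define R² as a squared correlation. The single-predictor least-squares model and the synthetic data are placeholders.

```python
import math
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k folds of similar size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Least squares with one predictor; returns a prediction function."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return lambda value: intercept + slope * value


def cross_validate(x, y, fit, k=10, seed=0):
    """Predict every sample from a model trained on the other k - 1 folds."""
    predicted = [0.0] * len(y)
    for test_idx in kfold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            predicted[i] = model(x[i])
    return r_squared(y, predicted), rmse(y, predicted)


# Synthetic, noise-free example with 54 "plots" (not study data).
x = [float(i) for i in range(54)]
y = [20.0 + 3.0 * value for value in x]
r2, error = cross_validate(x, y, fit_line)
```

With 54 samples and ten folds, four folds hold six samples and six folds hold five, which matches the "similar number of samples" requirement.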
Two globally modelled AGB products were evaluated against the predicted AGB, specifically:
The first global AGB product was from the Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC), Oak Ridge, Tennessee, USA, which has a spatial resolution of 300 m (Spawn et al. 2020), and can be freely accessed from https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1763 (named ORNL2010-AGB).
The second global AGB product was from the European Space Agency's (ESA's) Climate Change Initiative (CCI) Biomass team, which provided global forest AGB with a spatial resolution of 100 m (Santoro and Cartus 2019), and can be freely accessed from http://dap.ceda.ac.uk/neodc/esacci/biomass/data/agb/maps (named CCI2017-AGB).
Before evaluating the two globally modelled AGB products, the predicted AGB of this study was resampled to 300 m for ORNL2010-AGB and to 100 m for CCI2017-AGB using the bilinear approach. Since ORNL2010-AGB was expressed in Mg C ha⁻¹, whereas CCI2017-AGB and modelled AGB were expressed in Mg ha⁻¹, a carbon content of 0.5 g g⁻¹ was used to convert Mg C ha⁻¹ to Mg ha⁻¹ (Fonseca and Marques 2011). Both total AGB and its spatial variability were compared across the study area.
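The unit conversion amounts to dividing the carbon density by the carbon fraction. A minimal sketch follows, in which the function and constant names are hypothetical and the input value is chosen only to make the arithmetic easy to follow.

```python
CARBON_FRACTION = 0.5  # g C per g dry biomass, the value stated in the text


def carbon_to_biomass(carbon_mg_c_per_ha, carbon_fraction=CARBON_FRACTION):
    """Convert a carbon density (Mg C per ha) to a biomass density (Mg per ha)."""
    return carbon_mg_c_per_ha / carbon_fraction


# Example: a pixel holding 90.3 Mg C per ha corresponds to 180.6 Mg biomass per ha.
example_biomass = carbon_to_biomass(90.3)
```

With a carbon fraction of 0.5, every value in the carbon-based product is simply doubled before it is compared with the biomass-based maps.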
Aboveground forest biomass ranged from 61.7 to 475.1 Mg ha⁻¹, with an average of 180.6 Mg ha⁻¹ (Table 1). Forest type had a significant influence on AGB across the study area, as revealed by one-way analysis of variance (ANOVA, p < 0.05). The highest AGB was observed in coniferous forests (222.1 ± 104.6 Mg ha⁻¹), significantly higher (p < 0.05) than that of mixed forests (149.9 ± 57.4 Mg ha⁻¹) and broad-leaved forests (131.4 ± 48.7 Mg ha⁻¹). The AGB difference between the broad-leaved forests and mixed forests was not significant.
Table 1 Characteristics of the forest plots AGB (Mg ha⁻¹)
After feature selection using RFE, the ranking of variable importance (Fig. S4) and the relationships between RMSE and the number of variables for the different modelling approaches (Fig. S3) were obtained. According to variable importance, S2B_B4, S2B_NGBDI and S2B_MSR derived from Sentinel-2B and LT8_Near.Infrared.Contrast from Landsat 8 OLI were the most important factors for AGB modelling. The variables derived from Sentinel-2B were more important than those from Landsat 8 OLI. The number of the most important variables included in the final models was then determined from the relationships between RMSE and the number of variables for the different modelling approaches. Finally, nine models of LR, RF and XGBoost combining different satellite images were obtained (Fig. 3).
With regard to satellite images, models using Sentinel-2B images generally performed better than those using Landsat 8 OLI (Fig. 3 a, b, d, e, g, h), while model performance was greatly improved when Landsat 8 OLI and Sentinel-2B variables were combined.
Among the different modelling approaches, the two machine learning models (RF and XGBoost) performed better than the LR model, with higher R² and lower RMSE values. The XGBoost model performed best among the three models. Therefore, the XGBoost model combining Landsat 8 and Sentinel-2 images was selected to predict the spatial variability of AGB (R² = 0.71 and RMSE = 46 Mg ha⁻¹, Fig. 3 i).
It was also found that most models, with the exception of the XGBoost model, overestimated AGB in low biomass areas (< 100 Mg ha⁻¹) and underestimated AGB in high biomass areas (> 300 Mg ha⁻¹).
The aboveground biomass from the XGBoost model (Fig. 3 i) showed strong spatial variability. Predicted forest AGB varied from 25.0 to 492.0 Mg ha⁻¹, with a total of 6.6 × 10⁶ Mg across the study area (Fig. 4). High forest AGB was mainly found in the north and central areas, while it was relatively low in the southern region. This pattern was the inverse of the elevation gradient in the study area, where the terrain was high in the south and low in the north (Fig. 1).
Fig. 4 Predicted aboveground biomass (AGB, Mg ha⁻¹) in the Jiuzhaigou National Nature Reserve generated from the best-fit XGBoost model combining Landsat 8 OLI and Sentinel 2 images
There were low correlations between observed AGB and ORNL2010/CCI2017-AGB (R² < 0.1, Fig. S5). Similarly, there were large spatial differences in ORNL2010-AGB (Fig. 5 a) and CCI2017-AGB (Fig. 5 b) compared with the predicted AGB (XGB-AGB). ORNL2010-AGB had the highest total AGB (26.4 × 10⁶ Mg) across the study area, followed by XGB-AGB (6.6 × 10⁶ Mg) and CCI2017-AGB (3.8 × 10⁶ Mg) (Fig. 6). Spatially, the difference between ORNL2010-AGB and XGB-AGB varied from -369.0 to 2120.0 Mg ha⁻¹, with a total difference of 19.8 × 10⁶ Mg across the study area, while the difference between CCI2017-AGB and XGB-AGB varied from -470.0 to 159.0 Mg ha⁻¹, with a total difference of -2.8 × 10⁶ Mg (Fig. 7). The greatest differences between XGB-AGB and ORNL2010-AGB were found in the northern area, where they exceeded 1000 Mg ha⁻¹. Large differences were also observed between XGB-AGB and CCI2017-AGB.
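The reported total differences follow directly from the three totals, as the short check below shows. It also expresses each global product as a multiple of the XGB-AGB total, a ratio that is derived here for illustration and is not reported in the study; the variable names are hypothetical.

```python
# Total AGB across the study area (Mg), as reported in the text.
totals_mg = {"ORNL2010-AGB": 26.4e6, "XGB-AGB": 6.6e6, "CCI2017-AGB": 3.8e6}

reference = totals_mg["XGB-AGB"]

# Difference of each global product from the regional prediction (product minus XGB-AGB).
differences = {name: value - reference
               for name, value in totals_mg.items() if name != "XGB-AGB"}

# Each global product as a multiple of the regional prediction.
ratios = {name: value / reference
          for name, value in totals_mg.items() if name != "XGB-AGB"}
```

On these totals, ORNL2010-AGB is about four times the regional prediction, whereas CCI2017-AGB is a little under three fifths of it.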
Fig. 5 AGB spatial distribution for a ORNL2010-AGB (the aboveground biomass product from the Oak Ridge National Laboratory in 2010) and b CCI2017-AGB (the aboveground biomass product from the European Space Agency's Climate Change Initiative Biomass team in 2017) for the Jiuzhaigou National Nature Reserve
Fig. 6 Total aboveground biomass (AGB) of XGB-AGB (predicted by the XGB model), ORNL2010-AGB (from the Oak Ridge National Laboratory in 2010) and CCI2017-AGB (from the European Space Agency's Climate Change Initiative Biomass team in 2017) for the Jiuzhaigou National Nature Reserve
In this study, the aboveground biomass in the Jiuzhaigou National Nature Reserve was predicted using Sentinel-2B and Landsat 8 OLI images, separately and combined, with different modelling approaches. Generally, the models using Sentinel-2B performed better than those using Landsat 8 OLI (Fig. 3). For the model using the combined dataset, the variables derived from Sentinel-2B were more important than those from Landsat 8 OLI (Fig. S4). Our results are similar to those of Sibanda et al. (2015), who estimated grassland aboveground biomass using Sentinel-2 and Landsat images and found that the Sentinel-2 data were more accurate than the Landsat 8 data, with an R² and an RMSE of prediction (RMSEP) of 0.81 and 1.07 kg m⁻¹ for Sentinel-2 and 0.76 and 1.15 kg m⁻¹ for Landsat 8, respectively. The results may be attributed to the advanced sensor design of Sentinel-2B and its 13 spectral bands ranging from visible to short-wave infrared, including four red-edge bands essential for vegetation monitoring, of which the visible and near-infrared bands have a high spatial resolution of 10 m (Sibanda et al. 2015). Currently, medium-resolution remote sensing data such as Landsat 8 OLI are widely used for biomass estimation, but estimating aboveground biomass in complex and dense forest areas with such data remains a challenge (Adam et al. 2010). Due to the high AGB in the study area, there was a significant discrepancy between biomass and vegetation indices, because spectral indices reach saturation as biomass increases (Steininger 2000). Saturation is generally reached at approximately 100-150 Mg ha⁻¹ (Lu et al. 2014). However, previous studies suggest that higher-resolution optical remote sensing data, such as Sentinel-2 with its strategically positioned red-edge bands, may be effective at overcoming this problem (Sibanda et al. 2015). Although optical data have saturation problems in aboveground biomass estimation, the combination of multiple optical images has potential to be exploited in AGB estimation. Our results also showed that model performance was greatly improved after the combination of Landsat 8 OLI and Sentinel-2B data. These results are similar to those of Forkuor et al. (2018), in which Landsat 8 plus Sentinel-2 red-edge bands outperformed Landsat 8 alone in mapping land use and land cover, demonstrating the potential of the synergy of the data. Different optical images have different zenith angles, azimuths and imaging times, which are complementary for acquiring forest information, especially in mountainous areas where single-source images contain a large amount of shadow.
Fig. 3 Correlation between predicted and observed aboveground biomass (AGB) using different modelling approaches and satellite images; (LM is linear model; RF, random forest; XGBoost, extreme gradient boosting; R² and RMSE are the determination coefficient and the root mean square error)
The selection of the prediction method is crucial and has a significant influence on aboveground biomass prediction. Among the three modelling approaches in this study, XGBoost performed best, with the highest R² and the lowest RMSE, followed by the RF and LR models. It might be expected that the LR model would have the poorest performance, as it cannot accurately describe the complex nonlinear relationship between aboveground biomass and remote sensing data (Li et al. 2020). Compared with LR, machine learning algorithms such as XGBoost and RF can handle a large number of variables and can more accurately describe the nonlinear relationship between AGB and predictor variables (Ali et al. 2015). XGBoost is an advanced GB system which is flexible and builds each new tree to correct the residuals of the existing trees, whereas the trees in the RF model are independent (Friedman 2002; Chen and Guestrin 2016). This result is similar to that of Li et al. (2020), in which XGBoost had the best performance, followed by the RF and LR models.
The predicted forest aboveground biomass in this study diverged greatly from the global biomass products in terms of total AGB and its spatial variability (Fig. 7). This difference may be attributed to: (1) the large scale of global biomass products and the few observed values in specific areas (Su et al. 2016); and, (2) complex terrain conditions in the study area. These results indicate that more observations in areas with complex terrain are required to improve the accuracy of AGB prediction. Therefore, a global AGB product should be evaluated before it is used to estimate regional biomass. Considering the spatial transferability of biomass models, the accuracy of global biomass products in local areas needs to be further improved (Cutler et al. 2012). Our study could therefore contribute to enriching information on the spatial distribution of biomass in specific regions and to re-evaluating the application of global biomass products at regional scales.
Fig. 7 Differences between XGB-AGB (predicted by the XGB model) and a ORNL2010-AGB (product derived from the Oak Ridge National Laboratory in 2010), and b CCI2017-AGB (product derived from the European Space Agency's Climate Change Initiative Biomass team in 2017) for the Jiuzhaigou National Nature Reserve
Estimating biomass using remote sensing data remains a challenge, especially in high biomass areas where there are a number of influencing factors including topography, soil conditions, and forest structure (Vafaei et al. 2018). Data saturation is also a problem that has not been completely resolved (Lu et al. 2014). In this study, Sentinel-2B and Landsat 8 OLI data were combined to address this problem, and good results were obtained. However, a large amount of data is needed to verify whether this method can be applied to other regions, due to the regional limitations of aboveground biomass estimation (Liang 2007). Even for the best-performing XGBoost algorithm, the problem of overestimation or underestimation remains. This stems from the decision-tree basis of the two machine learning algorithms, XGBoost and RF, which cannot extrapolate beyond the range of the training set (Stelmaszczuk-Górska et al. 2016). Another reason may be the insufficient number of sample plots and the lack of separate models for different biomass levels and forest types. When observations of different forest types were used to verify the models, the model successfully estimated only the AGB of coniferous forests (Fig. S6) and mixed forests (Fig. S7), while the prediction of broad-leaved forest AGB was poor (Fig. S8). This may be attributed to the different spectral characteristics and the few broad-leaved forest training samples used in model development.
On the other hand, although different modelling approaches were compared for modelling AGB and achieved good results, further studies are required. For example, 54 plots were included in the current study, but their uneven spatial distribution could be a limitation. Therefore, including more observations with wider spatial coverage would be an important step to further improve the accuracy of regional AGB prediction in the Jiuzhaigou National Nature Reserve. In addition, more advanced modelling approaches, e.g., deep learning, should also be considered. The limited number of field observations constrained our ability to use deep learning to predict aboveground biomass in the current study.
Despite the limitations, this study demonstrated that the combination of Sentinel-2B and Landsat 8 OLI data improves aboveground biomass prediction.It also showed that machine learning algorithms such as RF and XGBoost outperformed the classical LR algorithm for aboveground biomass prediction.
LR, RF and XGBoost approaches were used to model forest aboveground biomass across the Jiuzhaigou National Nature Reserve, combining Sentinel-2B and Landsat 8 OLI imagery with forest inventory data, which is significant for estimating regional carbon cycling and evaluating global biomass products. Specifically, (1) aboveground biomass in forest ecosystems varied from 61.7 to 475.1 Mg ha⁻¹ with an average of 180.6 Mg ha⁻¹, indicating considerable spatial variability; (2) machine learning algorithms performed better than the LR model, and the XGBoost model performed best; (3) regardless of modelling approach, the combination of Sentinel-2B and Landsat 8 OLI data improved prediction accuracy. This highlights the value of integrating different remote sensing data to improve aboveground biomass prediction accuracy using machine learning algorithms, particularly in mountainous areas; and, (4) significant differences were observed between modelled aboveground biomass and ORNL2010-AGB/CCI2017-AGB. This indicates that more observations in areas with complex terrain should be included to improve the accuracy of global aboveground biomass products, and that global biomass products should be evaluated before being used to estimate regional biomass.
Journal of Forestry Research, 2022, Issue 4