BP neural networks and random forest models to detect damage by Dendrolimus punctatus Walker

2020-01-18 15:29:14

Journal of Forestry Research, 2020, Issue 1

Zhanghua Xu·Xuying Huang·Lu Lin·Qianfeng Wang·Jian Liu·Kunyong Yu·Chongcheng Chen

Abstract The construction of a pest detection algorithm is an important step in coupling "ground-space" characteristics and is the basis for rapid and accurate monitoring and detection of pest damage. In four experimental areas in Fujian Province (Sanming City, Jiangle County, Sha County and Yanping District), 182 sets of sample data on damage by Dendrolimus punctatus were collected. The data were randomly divided into a training set and a testing set, and five replicate tests and one indicator-elimination test were done. Based on a characterization analysis of the host under D. punctatus damage, seven ground and remote sensing characteristic indicators were obtained to construct BP neural network and random forest (RF) models of pest levels: leaf area index (LAI), standard error of leaf area index (SEL) of the pine forest, normalized difference vegetation index (NDVI), wetness from the tasseled cap transformation (WET), and the green (B2), red (B3) and near-infrared (B4) bands of the remote sensing image. The detection results of the two algorithms were compared in terms of detection precision, kappa coefficient, receiver operating characteristic (ROC) curve and a paired t test. The results showed that all seven indicators were responsive to pest damage, although NDVI was relatively weak. The average detection precision over the six tests was 77.29% for the BP neural networks, with a kappa coefficient of 0.6869; for the RF algorithm the respective values were 79.30% and 0.7151, so the latter performed better, but the difference was not significant (p > 0.05). The detection precision, kappa coefficient and area under the ROC curve (AUC) of the RF algorithm were higher than those of the BP neural networks for three pest levels (no damage, moderate damage and severe damage). The detection precision and AUC of the BP neural networks were slightly higher for mild damage. None of these differences was significant (p > 0.05) except for the kappa coefficient at the no-damage level (p < 0.05). BP neural networks tend to over-fit, whereas the RF method is more robust and gives a better detection effect. Thus, applying the random forest algorithm to pest damage and other multilevel discrete variables is feasible, and the results suggest that attention should be paid to the proportionality of sample data among categories when collecting data.

Keywords BP neural networks·Detection precision·Kappa coefficient·Pine moth·Random forest·ROC curve

Introduction

Dendrolimus punctatus Walker (pine moth) is a forest defoliator that causes the most extensive and harmful damage in southern China during periodic outbreaks. It represents an important model for ground work to rapidly and accurately monitor pest damage, elucidate the tree response mechanism and establish algorithms to detect the damage. Remote sensing techniques have been studied for this purpose for many years (Cui et al. 1997; Xu et al. 2012) because the pest response mechanism can be studied at two levels, the ground and remote sensing. Remote sensing reflects characteristics at the pixel scale and can be influenced by factors such as atmosphere, terrain and mixed surface feature information. The ground level includes several scales, such as host tissue, individual tree and forest stand, and has a distinct advantage for characterizing the external appearance of pine forests. Therefore, coupling "ground-space" characteristics is currently the main way to monitor pest damage by remote sensing. How can effective coupling of ground and space data be achieved? The answer depends on constructing a pest damage detection algorithm that couples the two kinds of data and characterizes the pest damage response. A variety of mathematical algorithms have been used for detecting plant diseases and pest damage. Park and Chung (2006) used supervised and unsupervised artificial neural networks to predict pest damage levels in pine forests. Li et al. (2010) introduced four modeling approaches, namely classification and regression trees (CART), genetic algorithm for rule-set prediction (GARP), maximum entropy method (Maxent) and logistic regression (LR), to predict the incidence area of Bursaphelenchus xylophilus based on Web and geographic information system (GIS) data. Luo and Wang (2011) used ArcObjects and Visual Basic as secondary development platforms to forecast the incidence period, amount, scope and severity of pests and diseases based on the Markov chain. Xu et al. (2014) integrated multidimensional information on the canopy spectrum of pine forests, climate, terrain, forest stand, pest population source and human environment and used Fisher discriminant analysis to predict different severities of damage by D. punctatus. Kantola et al. (2016) developed a monitoring method for tree mortality caused by hemlock woolly adelgid (Adelges tsugae Annand, HWA) by means of a decision tree and a support vector machine. In addition, cellular automata have been used to construct propagation models of plant pests and diseases for simulating epidemics (He et al. 2013; Peixoto et al. 2014). Compared with forestland, the climate, terrain and vegetation of farmland are relatively simple and human controllability is stronger; therefore, forecasting models for agricultural pests and diseases have been more successful (Patil and Mytri 2013; Agatz et al. 2017) and can provide an effective technical reference for the detection and prevention of forest pests and diseases.

Among the numerous approaches, artificial neural network models have been widely used in fields such as geography, meteorology and forestry because of characteristics such as distributed storage, error tolerance, massively parallel processing, self-learning, self-organization, adaptability, the capacity to model complex nonlinear dynamic systems and the ability to deal with complicated and ambiguous problems (Haddad et al. 2013; Chen et al. 2013; Lee et al. 2013; Tomassetti et al. 2013). Among the neural network models, the BP neural network, proposed by a group of scientists headed by Rumelhart and McClelland in 1986, is considered the essence of artificial neural networks. It has been used to predict and forecast damage by D. punctatus. For example, by trapping D. punctatus adult males from the overwintering generation, Zhang et al. (2001) took environmental factors such as pine tree height, slope direction and plant type as forecast factors and the number of adult males trapped as the occurrence in the field, then built BP neural networks for classification forecasts of D. punctatus and compared the model with the LOGIT model. Chen et al. (2003) constructed BP neural networks relating the incidence area, population density and damage rate of D. punctatus to meteorological factors to effectively predict pest incidence. On the whole, most existing studies take the area of pest damage as the dependent variable and use BP neural networks to obtain a regression model among variables, whereas relatively few address classification forecasts that use pest levels or other discrete values as the target. In 2001, Breiman and Cutler developed a new data mining method based on classification and regression trees, random forest (RF) (Breiman 2001), which was rapidly applied to medicine, economics, management, ecology and other fields. Others have demonstrated its capability to predict and simulate geological disasters such as landslides (Chen et al. 2014; Youssef et al. 2015; Provost et al. 2017). However, there are few reports on using this method to predict pest and disease outbreaks. Existing RF applications mostly categorize the object into "yes" or "no" states, and there is little research on multi-level state identification.

In pest monitoring and forecasting, we need to know not only whether pests occur in the forest but also the level and degree of pest incidence, so that disaster levels can be classified and targeted forest protection and quarantine measures developed to prevent an increase in the areas with moderate and severe damage and further deterioration in areas with mild damage. Therefore, in the present study, we used D. punctatus damage as an example to construct BP neural network and RF models with the pest levels as dependent variables, and tested the accuracy of the two algorithms in detecting pest and disease levels and other multilevel discrete variables. Our comprehensive comparison of the two detection methods provides a reference for monitoring forest pests and diseases.

Materials and methods

Description of experimental areas

The selected experimental areas were four counties (districts or cities) in Fujian Province: Sanming City, Jiangle County, Sha County and Yanping District (Fig. 1), with a total area of about 7900 km² (26°01′-27°04′N, 117°05′-118°40′E). The climate is subtropical monsoon with annual rainfall of 1500-2100 mm, mean annual temperature of 14-20 °C, mean summer temperature of 26-29 °C, mean winter temperature of 8-13 °C and mean annual daylight hours of more than 1600 h. The highest altitude is over 1500 m asl. The experimental areas are located between the Wuyi Mountains and the Daiyun Mountains. Rolling hills and mountain ranges extend across the length and breadth of the area, which is favorable for forestry. As an important forestry area in Fujian Province, it is thus a good experimental area for developing comprehensive forest reforms in southern China. Forest coverage is over 75% in all four places and over 85% in Jiangle County, the highest in Fujian Province. Pinus massoniana is one of the main coniferous species in this area of numerous mountains and complex topography, and the temperature, precipitation, humidity, daylight and other climatic elements are suitable for the growth, development and periodic outbreaks of D. punctatus, the major pest in the area. The outbreaks cause significant economic losses and are a severe threat to forest health and ecological stability.

Fig.1 a Location and b,c remote sensing images of experimental areas and distribution of measuring points

Field investigation

The forest stand, topography and other factors in the P. massoniana forest plots were investigated in the experimental areas from February to March 2012. For sample collection, 12 subcompartments were chosen in each county (district or city) and four fixed monitoring points were set in each subcompartment, where stand and topographic variables were measured and recorded. A leaf area index (LAI) was obtained for each monitoring point with an LAI-2000 Plant Canopy Analyzer (LI-COR, Lincoln, NE, USA) using the measurement sequence "↑↓↓↓↓", i.e., one reading above the canopy followed by four readings under the canopy. At the same time, the corresponding standard error of leaf area index (SEL) was recorded, and the position was fixed using two handheld GPS units (Magellan, San Dimas, CA, USA); positions were recorded when the readings of the two units were basically consistent. In all, 182 measuring-point samples were used; their distribution is illustrated in Fig. 1.

During the field investigation, D. punctatus was in its overwintering period, so it was difficult to identify the pest level by assessing leaf loss. Instead, the population density of pests on tree trunks and the pest level of the previous year were used to estimate the pest level, with average population density = total number of pests / total number of surveyed plants. According to the State Forestry Administration "Standard of Forest Pests Occurrence and Disaster" (LY/T 1681-2006), population density corresponds to pest level as follows: 0-4 pests per tree is no damage (or basically without damage), 5-13 pests per tree is mild damage, 14-30 pests per tree is moderate damage, and 31 or more pests per tree is severe damage. Because pine needles grow slowly during that season, the data collected in the investigation reflect the damage level in the pine forests from the previous year. If the pest level of the previous year was consistent with that of the survey year, the pest level could be identified, and the pest level for those 2 years was used as the standard for selecting the experimental data. For leaf-eating pests, the first generation generally causes the most severe damage and reflects the degree of occurrence and damage, so the survey of the overwintering generation is more important than those of the first and second generations. Under the forestry department's reporting rules, field survey data for the overwintering generation are also used to predict the pest situation for that year; they provide very important data for long-term pest monitoring and a decision reference for forest protection and quarantine measures.
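The density thresholds quoted above can be written as a small lookup. The following Python sketch is illustrative only: the function names are hypothetical, and the handling of fractional densities (for example 4.5 pests per tree, which falls between the integer classes given in the text) is an assumption, not something the standard quoted here specifies.

```python
def average_population_density(total_pests: int, surveyed_trees: int) -> float:
    """Average population density = total pests / total surveyed plants."""
    if surveyed_trees <= 0:
        raise ValueError("surveyed_trees must be positive")
    return total_pests / surveyed_trees


def pest_level(density: float) -> str:
    """Map pests per tree to the four damage levels quoted in the text.

    Assumption: a fractional density is assigned to the lower class until it
    reaches the next integer threshold (5, 14 or 31 pests per tree).
    """
    if density < 0:
        raise ValueError("density cannot be negative")
    if density < 5:
        return "no damage"
    if density < 14:
        return "mild damage"
    if density < 31:
        return "moderate damage"
    return "severe damage"


# Example: 120 pests counted on 10 trees gives 12 pests per tree.
example_level = pest_level(average_population_density(120, 10))
```

With these thresholds, the example above evaluates to mild damage.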

Acquisition of remote sensing images and preprocessing

Considering the life cycle of D. punctatus, we selected the Landsat 7 ETM+ remote sensing image with the fullest coverage of the experimental areas and the fewest clouds among those acquired from the end of October 2011 to early March 2012. The satellite overpass was on December 20, 2011, and the track number was 120/042. Radiometric calibration, FLAASH atmospheric correction, destriping, geometric correction and cropping of the multi-spectral data were done to obtain a preprocessed image. The image projection was Transverse Mercator, the ellipsoid was Krasovsky and the spatial resolution was 30 m (Fig. 1). Because both the satellite overpass and the field investigation fell within the overwintering generation of D. punctatus, when changes to the pine forest were slight, the image reflects the pest damage situation in 2011 and the "ground-space" data show good synchronicity.

Characterization of Dendrolimus punctatus damage to the host and acquisition of characteristic indicators

Extensive analyses of forests attacked by D. punctatus have led forestry researchers to conclude that the appearance of forests with D. punctatus damage differs significantly from that of healthy forests. D. punctatus damages its host pines mainly by feeding on the needles. When the damage is serious, the needles are eaten completely, leaving the trees bare and, from a distance, yellow and black as if scorched (Fig. 2). Both the leaf volume and the greenness of the damaged trees are reduced. D. punctatus damage is called "forest fire without smoke" because the trees die from losing a lot of water as a result of the extensive loss of leaves, and the forest form changes at the same time (Xu et al. 2008). This change in forest form represents degradation of the pine forest ecosystem. The changes in leaf volume, greenness, humidity, forest form and other characteristics are also reflected in differences in the spectral characteristics of leaf tissue, individual trees, the forest canopy, etc. On the basis of the characteristics of the damaged pine forests and previous studies (Xu et al. 2014, 2018), we selected indicators of leaf area, uniformity, greenness and humidity, together with characteristic bands, to describe the damage. Ground measurements and remote sensing feature extraction were used to obtain the relevant characteristic indicators.

1. Leaf area component: the leaf area index (LAI) is a main indicator of the leaf area of the plant. It not only directly reflects the energy in the plant canopy and the condition of CO2 and the physical environment, but also reflects the dynamic characteristics of plant growth, development and health status (Martinez et al. 2013; Wong et al. 2013). Therefore, LAI is considered an important indicator for surveying pest damage, especially for predicting damage by defoliators (Wang et al. 2010).

Fig.2 Photographs of different levels of Dendrolimus punctatus and damage in a pine forest

2. Uniformity component: the standard error of leaf area index (SEL) is also measured by the LAI-2000 instrument and describes the degree of dispersion of the measurements in different directions under the plant canopy. The higher the SEL, the greater the difference between the measured values and the less uniform the canopy density in different directions. The lower the SEL, the smaller the difference between the measured values and the more uniform the forest form at the measuring points (Xu et al. 2013a, b). Therefore, this indicator can reflect the uniformity of pine forests.

3. Greenness component: the characterization of greenness estimates the green content of the vegetation. Many indicators measure greenness, such as the normalized difference vegetation index (NDVI), transformed normalized difference vegetation index (TNDVI), modified chlorophyll absorption reflectance index (MCARI) and the greenness component of the tasseled cap transformation. Of these greenness indicators, NDVI is used most extensively and is quite sensitive to changes in the growth of vegetation. NDVI is calculated as

$$\mathrm{NDVI} = \frac{NIR - R}{NIR + R} \quad (1)$$

where NIR is the reflectivity of the near-infrared band and R is the reflectivity of the red band.

4. Humidity component: because extremely damaged pine trees have lost their needles and will wither and die, leaf humidity should also be considered in constructing the remote sensing characteristic index for D. punctatus damage. The main methods for remote sensing monitoring of vegetation humidity are (1) extracting evapotranspiration information from thermal infrared data to reflect changes in a canopy under humidity stress; (2) establishing a model relating a vegetation index to humidity content from a multi-temporal series of the vegetation index (e.g., NDVI); (3) estimating the humidity content of vegetation from the temperature difference between the canopy and the air; (4) establishing a remote sensing index (e.g., GVMI) to reflect information on water content and humidity; (5) using wetness from a tasseled cap transformation to represent the humidity component of the pine needles. The wetness computing formula of Landsat 7 ETM+ multi-spectral data is:

where WET is wetness from the tasseled cap transformation; B1 is the reflectivity of the first band of the remote sensing image, and so on.

5. Characteristic bands component: in the field of spectral detection of pests, the red edge region is widely used. The so-called "red edge" refers to the optimal wavelength region for reflectance from pigments in green plants (680-780 nm) and is an important characteristic for vegetation spectral analysis (Pu et al. 2003; Cho et al. 2012). The green band B2, red band B3 and near-infrared band B4 were selected as the characteristic bands by correlation analysis (Xu et al. 2013a, b) and by the correspondence of the Landsat Enhanced Thematic Mapper Plus (ETM+) multispectral remote sensing image with the red edge data.

The LAI and SEL were obtained from ground measurements. Based on data from the remote sensing images, NDVI was calculated from Eq. 1, WET was obtained using the tasseled cap transformation, and characteristic bands B2, B3 and B4 were extracted from the multi-spectral image using ERDAS software (Intergraph, USA). The formula x′ = (x − xmin)/(xmax − xmin), where x′ is the normalized value, x is the original value, xmin is the minimum value of x in the area and xmax is the maximum value of x in the area, was used as a normalization method to remove the influence of the scale and dimension of the different component indicators and obtain a value in the range 0-1 (Fig. 3). Based on the coordinates of the measuring points, the corresponding values of the remote sensing characteristic indicators were obtained and summarized in a table with the ground characteristic indicators and pest levels.
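As a minimal illustration of the two calculations described above, the Python sketch below computes NDVI as (NIR − R)/(NIR + R) and applies the min-max normalization x′ = (x − xmin)/(xmax − xmin). The function names and sample values are hypothetical, and the handling of a zero denominator is an assumption added only to keep the sketch safe to run.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index from band reflectivities."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat a zero-signal pixel as NDVI = 0
    return (nir - red) / total


def min_max_normalize(values):
    """Rescale values to the range 0-1 using x' = (x - xmin) / (xmax - xmin)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: a constant indicator maps to 0
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical reflectivities for three pixels (not data from the study).
nir_values = [0.50, 0.40, 0.30]
red_values = [0.10, 0.15, 0.20]
ndvi_values = [ndvi(n, r) for n, r in zip(nir_values, red_values)]
ndvi_normalized = min_max_normalize(ndvi_values)
```

After normalization the largest value in the area maps to 1 and the smallest to 0, so indicators with different units become directly comparable as model inputs.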

Algorithm for assessing Dendrolimus punctatus damage

Variable setting

Seven characteristic indicators were used as independent variables: leaf area index (LAI), standard error of leaf area index (SEL), normalized difference vegetation index (NDVI), wetness from the tasseled cap transformation (WET), and characteristic bands B2, B3 and B4. The four levels of damage from D. punctatus (no damage, mild damage, moderate damage and severe damage) were the dependent variable.

BP neural networks

Back propagation (BP) neural networks are multilayer feedforward networks trained with an error back-propagation algorithm, in which the signal is transmitted forward and the error is propagated backward. In forward transmission, the input signal is processed by the input layer and then the hidden layer until it reaches the output layer. The state of each layer of neurons affects only the next layer. If the output layer fails to produce the expected output, i.e., the deviation is still greater than the preset value, the network switches to error back propagation, adjusting the weights and threshold values according to the deviation so that the predicted output steadily approaches the expected output. Similar to the Widrow-Hoff learning algorithm, the standard BP neural network adopts a gradient descent algorithm in which the network weights are adjusted along the negative gradient of the performance function (Wang and Xiong 2013). The common BP neural network is a 3-layer network with "1 input layer + 1 hidden layer + 1 output layer"; see the structure diagram in Fig. 4.
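The forward pass and error back propagation described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: it uses ordinary gradient descent on a single sample with a sigmoid hidden layer and a linear output layer, and the class name `TinyBP`, the learning rate and the input values are all hypothetical.

```python
import math
import random


def logsig(z: float) -> float:
    """Logistic sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-z))


class TinyBP:
    """Three-layer feedforward network trained by error back propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # iw: input-to-hidden weights, lw: hidden-to-output weights
        self.iw = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.bj = [0.0] * n_hidden
        self.lw = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.bk = [0.0] * n_out

    def forward(self, x):
        """Signal travels input -> hidden (sigmoid) -> output (linear)."""
        h = [logsig(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.iw, self.bj)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.lw, self.bk)]
        return h, y

    def train_step(self, x, target, lr=0.1):
        """One gradient-descent update; returns the MSE before the update."""
        h, y = self.forward(x)
        err = [yk - tk for yk, tk in zip(y, target)]
        # Propagate the output error back through lw, scaled by the sigmoid slope.
        delta_h = [hj * (1.0 - hj) * sum(err[k] * self.lw[k][j] for k in range(len(err)))
                   for j, hj in enumerate(h)]
        for k, ek in enumerate(err):
            for j, hj in enumerate(h):
                self.lw[k][j] -= lr * ek * hj
            self.bk[k] -= lr * ek
        for j, dj in enumerate(delta_h):
            for i, xi in enumerate(x):
                self.iw[j][i] -= lr * dj * xi
            self.bj[j] -= lr * dj
        return sum(e * e for e in err) / len(err)


net = TinyBP(n_in=7, n_hidden=10, n_out=4, seed=1)
x = [0.1, 0.4, 0.3, 0.8, 0.5, 0.2, 0.9]   # hypothetical normalized indicators
t = [0, 1, 0, 0]                          # hypothetical target vector
first_mse = net.train_step(x, t)
for _ in range(300):
    last_mse = net.train_step(x, t)
```

Repeating the update drives the mean squared error on this sample down, which is the "predicted output approaches expected output" behavior described in the text.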

Random forest

Random forest, also known as the random forest classifier, works by drawing k samples from the original training set by the bootstrap method, each the same size as the original training set, and then constructing a decision tree model for each sample, so that the random forest is composed of the k resulting models. The final classification result is obtained by voting over the results of all the decision trees (Fig. 5).

RF constructs different training sets to increase the difference between classification models and thereby improve the extrapolated prediction ability of the compound classification model. After k rounds of training, a classification model sequence {h1(X), h2(X), …, hk(X)} is obtained and used to construct a multi-classification model system. The final classification result of the system is obtained by simple majority voting. The final classification decision is given by Eq. 3:

$$H(x) = \arg\max_{Y} \sum_{i=1}^{k} I\left(h_i(x) = Y\right) \quad (3)$$

where H(x) is the compound classification model; hi is a single decision tree classification model; Y is the output variable (or target variable); I(·) is the indicator function. This formula illustrates the use of majority voting over the decision trees to determine the final class (Svetnik et al. 2003; Fang et al. 2011).
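The majority voting step can be illustrated with a short Python sketch. It takes the class predicted by each tree for one sample and returns the class with the most votes; the function name and the tie-breaking rule (the tied class that appears first in the list wins) are assumptions made for the illustration.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    counts = Counter(tree_predictions)
    best = max(counts.values())
    # Assumption: ties are broken in favor of the class seen first.
    for label in tree_predictions:
        if counts[label] == best:
            return label


# Hypothetical votes from five trees for one sample.
votes = ["mild damage", "severe damage", "mild damage", "no damage", "mild damage"]
final_class = majority_vote(votes)
```

Here three of the five trees vote for mild damage, so that class is returned.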

Fig.3 Characteristic indicators to assess host damage from Dendrolimus punctatus

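Given an ensemble of classifiers {h1(X), h2(X), …, hk(X)}, with the training set drawn at random from the distribution of the random vector (Y, X), the margin function is defined as Eq. 4:

$$mg(X, Y) = \mathrm{av}_k\, I\left(h_k(X) = Y\right) - \max_{j \ne Y} \mathrm{av}_k\, I\left(h_k(X) = j\right) \quad (4)$$

The margin, in which av_k denotes the average over the k classifiers, measures the extent to which the average number of votes at X, Y for the right class exceeds the average vote for any other class. The larger the margin, the more confidence in the classification. The generalization error is given by Eq. 5:

$$PE^{*} = P_{X,Y}\left(mg(X, Y) < 0\right) \quad (5)$$

where PE* indicates the generalization error and the subscripts X, Y indicate that the probability is over the X, Y space.

In a random forest, hk(X) = h(X, Θk). For a large number of trees, it follows from the Strong Law of Large Numbers and the tree structure that PE* converges almost surely to (Eq. 6):

$$PE^{*} \rightarrow P_{X,Y}\left(P_{\Theta}\left(h(X, \Theta) = Y\right) - \max_{j \ne Y} P_{\Theta}\left(h(X, \Theta) = j\right) < 0\right) \quad (6)$$

This result explains why the random forest algorithm does not over-fit as more trees are added, but produces a limiting value of the generalization error (Breiman 2001).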
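To make the margin concrete, the following Python sketch computes, for one sample, the share of tree votes for the true class minus the largest share for any other class, and then estimates the error as the fraction of samples with a negative margin. It is an empirical illustration of the definitions above on a finite set of votes, not the probabilistic quantity PE* itself; the function names and the vote data are hypothetical.

```python
from collections import Counter


def margin(tree_predictions, true_label, classes):
    """Vote share of the true class minus the best vote share of any other class."""
    k = len(tree_predictions)
    counts = Counter(tree_predictions)
    true_share = counts.get(true_label, 0) / k
    other_share = max(counts.get(c, 0) / k for c in classes if c != true_label)
    return true_share - other_share


def empirical_error(samples, classes):
    """Fraction of (votes, true_label) samples whose margin is negative."""
    negative = sum(1 for votes, label in samples if margin(votes, label, classes) < 0)
    return negative / len(samples)


levels = ["no", "mild", "moderate", "severe"]
# Hypothetical votes of four trees for two samples.
samples = [
    (["no", "no", "mild", "moderate"], "no"),       # margin = 0.50 - 0.25
    (["mild", "severe", "severe", "severe"], "mild"),  # margin = 0.25 - 0.75
]
margins = [margin(v, y, levels) for v, y in samples]
error_rate = empirical_error(samples, levels)
```

A positive margin means the ensemble classifies the sample correctly with room to spare; the second sample has a negative margin and is therefore counted as an error.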

Group design and algorithm evaluation

The 182 samples were randomly divided into a training set of 70% (127 samples) and a testing set of 30% (55 samples). This process was repeated five times to obtain five different sample groups, and the results calculated for each group by the BP neural networks and the RF model were marked as tests 1-5. The factors were then ranked by their importance in the five models, one independent variable was eliminated, and the two pest damage detection algorithms were run again; this result was marked as test 6. The detection effects of the two algorithms were then analyzed from the following aspects:

1. Detection precision: the proportion of correctly classified samples in the total number of samples, calculated for the training set and the testing set of each algorithm.

2. The kappa coefficient: an indicator of classification consistency computed from the confusion matrix. In count form, the numerator is the total number of samples multiplied by the sum of the diagonal entries of the confusion matrix, minus the sum over classes of the product of the number of true samples of each class and the number of samples classified into that class; the denominator is the square of the total number of samples minus the same sum of products. Equivalently (Eqs. 7, 8):

$$K = \frac{P_o - P_e}{1 - P_e} \quad (7)$$

$$P_o = \frac{1}{n} \sum_{i=1}^{N} a_{ii} \quad (8)$$

where K is the kappa coefficient; Po is the proportion of observed agreement; Pe is the proportion of chance agreement; aii is the number of correctly classified samples of class i; n is the total number of samples; N is the number of classes.

Fig. 4 Three-layer BP neural network structure. Note: xi is the neuron node in the input layer; hj is the neuron node in the hidden layer; yk is the neuron node in the output layer; iw is the weight matrix between the input layer and the hidden layer; lw is the weight matrix between the hidden layer and the output layer; bj and bk are threshold values of the neural networks

3. Receiver operating characteristic (ROC) curve: for a series of dichotomy thresholds, the curve plots the true positive rate (sensitivity) as the ordinate against the false positive rate (1 − specificity) as the abscissa. The detection effect is analyzed by calculating the area under the curve (AUC). When the AUC value is in the range 0.5-1, the closer the value is to 1, the more accurate the detection: 0.5-0.7, lower accuracy; 0.7-0.9, moderate accuracy; > 0.9, higher accuracy.

Fig.5 Schematic diagram of the random forest model

4. Paired t test: the samples of the training set and the testing set are aggregated, and the detection precision, kappa coefficient and AUC are calculated for each of the four pest levels. The results of the two algorithms are then compared by a paired t test, giving the t and p values for the detection precision, kappa coefficient and AUC at each level, from which the significance of any difference is determined.
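The group design described above (a random 70/30 split of the 182 samples, repeated five times) can be sketched in Python as follows. The function name and the seeds are hypothetical, and truncating 182 × 0.7 = 127.4 down to 127 training samples is an assumption chosen because it reproduces the 127/55 split reported in the text.

```python
import random


def split_indices(n_samples, train_fraction=0.7, seed=None):
    """Randomly partition sample indices into a training and a testing set."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(n_samples * train_fraction)  # 182 * 0.7 = 127.4 -> 127
    return indices[:n_train], indices[n_train:]


# Five independent random splits, corresponding to tests 1-5.
splits = [split_indices(182, seed=s) for s in range(1, 6)]
train_1, test_1 = splits[0]
```

Each split keeps the training and testing indices disjoint, so a sample never contributes to both fitting and evaluation within the same test.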

Results and discussion

Dendrolimus punctatus damage detection based on BP neural networks

There are four main steps to build BP neural networks. (1) Determination of the number of neuron nodes in the input layer and output layer. On the basis of the host characterization of D. punctatus damage and the characteristic analysis, seven indicators (LAI, SEL, NDVI, WET, B2, B3 and B4) were obtained, so there were seven neuron nodes in the input layer. The dependent variable took four discrete values, i.e., four categories (no damage, mild damage, moderate damage and severe damage), so the number of neuron nodes in the output layer was 4. (2) Determination of the number of neuron nodes in the hidden layer. If there are not enough neuron nodes in the hidden layer, the neural networks cannot be fully trained because the training time is too brief; thus, the precision will be lowered. If there are too many neuron nodes in the hidden layer, error tolerance will be lowered and additional computation will be required. Therefore, it is important to choose an appropriate number of neuron nodes in the hidden layer, which affects the prediction. The number of neuron nodes in the hidden layer can be determined using the empirical formula $j = \sqrt{i + k} + c$, where j is the number of neuron nodes in the hidden layer, i and k are the numbers of neuron nodes in the input layer and the output layer, respectively, and c is an integer in [1, 10]. With i = 7 and k = 4, the result is j ∈ [5, 13]. Figure 6 shows that when j increases from 5 to 10, the error shows an overall decline; when j > 10, the error increases. After testing, the number of neuron nodes in the hidden layer was set to 10. (3) Creation, training and simulation of networks. A 3-layer BP neural network with the structure "7-10-4" was built using the newff function in Matlab. The training function was set as trainlm, i.e., the Levenberg-Marquardt algorithm. The adaptation learning function and performance function were learngdm and mse, respectively. The transfer functions of the hidden layer and output layer were logsig and purelin, respectively, and the training goal was trainParam.goal = 0.01. Normalizing the seven indicators of the input layer so that all input values lay in the range [0, 1] avoided unnecessary numerical problems and made calculation more convenient; [1,0,0,0], [0,1,0,0], [0,0,1,0] and [0,0,0,1] were used to represent no damage, mild damage, moderate damage and severe damage, in turn.
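The hidden-layer sizing and the output coding described above can be checked with a short Python sketch. It assumes that the empirical formula is the commonly used j = sqrt(i + k) + c, which reproduces the reported range j ∈ [5, 13] for i = 7 and k = 4 when j is restricted to the integers lying between sqrt(i + k) + 1 and sqrt(i + k) + 10; the function names are hypothetical.

```python
import math

LEVELS = ["no damage", "mild damage", "moderate damage", "severe damage"]


def hidden_node_candidates(n_in, n_out, c_min=1, c_max=10):
    """Integer hidden-layer sizes j with sqrt(n_in + n_out) + c_min <= j <= sqrt(n_in + n_out) + c_max."""
    base = math.sqrt(n_in + n_out)
    lowest = math.ceil(base + c_min)
    highest = math.floor(base + c_max)
    return list(range(lowest, highest + 1))


def one_hot(level):
    """Encode a pest level as the four-element target vector used for the output layer."""
    return [1 if name == level else 0 for name in LEVELS]


candidates = hidden_node_candidates(7, 4)   # sqrt(11) is about 3.32
target = one_hot("moderate damage")
```

For seven inputs and four outputs the candidate sizes run from 5 to 13, and each candidate would then be compared by its training error, as in Fig. 6.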

Fig. 6 Relationship between error and number of neuron nodes in hidden layer

The seven neuron nodes of the input layer map to the 10 neuron nodes of the hidden layer through connection weights. The weight matrix reflects the correspondence between the input layer and the hidden layer, which is similar to the theory underlying methods such as principal component analysis. The weights reflect the contributions of the factors; therefore, the weights of each factor were summed over the hidden nodes, and the resulting value was used to assess the importance of the characteristic indicators. Because the weights can be positive or negative, their absolute values were summed. Figure 7 shows that in the five tests, the connection weights of the three original bands (B2, B3 and B4) of the remote sensing image ranked highest, SEL ranked last twice and NDVI ranked last three times, so test 6 was performed after eliminating NDVI. The MSE values of the six tests were 0.00877, 0.00899, 0.00988, 0.00938, 0.00995 and 0.00944, all of which reached the convergence goal of 0.01. The testing set data were then input to the trained networks, which output the simulated pest level.
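The importance measure described above, the sum of absolute input-to-hidden connection weights for each indicator, can be sketched in Python as follows. The weight values are made up for the illustration (a two-input, two-hidden-node matrix rather than the 7 × 10 matrix of the study), and the function name is hypothetical.

```python
def input_importance(iw, names):
    """Rank inputs by the sum of absolute weights over all hidden nodes.

    iw[j][i] is the weight from input i to hidden node j.
    """
    totals = [sum(abs(row[i]) for row in iw) for i in range(len(names))]
    return sorted(zip(names, totals), key=lambda pair: pair[1], reverse=True)


# Hypothetical weights: rows are hidden nodes, columns are inputs (B2, NDVI).
example_iw = [
    [0.5, -0.2],
    [-1.0, 0.1],
]
ranking = input_importance(example_iw, ["B2", "NDVI"])
```

Taking absolute values keeps large negative weights from cancelling large positive ones, so the indicator at the bottom of the ranking is the natural candidate for elimination.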

Detection of Dendrolimus punctatus based on the RF algorithm

In the random forest program package (varSelRF) of R software, the original values of the seven characteristic indicators were set as independent variables, the four pest levels as the dependent variable, the number of decision trees as 5000 (ntree = 5000) and the number of variables tried at each node split as 5 (mtry = 5). Using varImpPlot, the indicators were ranked according to importance. Figure 8 shows that in tests 1-5, the three original bands from the remote sensing image took the first three places, which shows that image information has the potential to detect the level of pest damage. LAI and WET came next in importance, followed by SEL and NDVI. In the five tests, SEL was ranked last once and NDVI was ranked last four times. Test 6 was performed after NDVI was eliminated. The random forest model was constructed from the training set data and used to predict which category the testing set samples belonged to. The RF method is based on randomization; each constructed model is an individual "tree", and together the trees constitute a "forest". The detection model is abstract, but its detection precision is concrete.

Fig. 7 Sum of absolute values of connection weights from input layer to hidden layer in BP neural networks. a Test 1, b test 2, c test 3, d test 4, e test 5 and f test 6

Analysis and comparison of pest damage detection effects

Detection precision and kappa coefficient

Fig. 8 Importance ranking of characteristic indicators by the RF algorithm. a Test 1, b test 2, c test 3, d test 4, e test 5 and f test 6

Table 1 Detection precision of pest damage and kappa coefficient for the two algorithms

The detection precision and kappa coefficient (Table 1) of the pest levels in the training set and testing set in tests 1-6 were calculated for each of the two algorithms. In tests 1-5, the detection precision of the training set by the BP neural networks was always above 74% (mean: 80.79%), whereas that of the testing set was always above 61% (mean: 68.73%), substantially lower than for the training set. For the RF algorithm, the detection precision of the training set was always above 74% (mean: 77.95%) and that of the testing set always above 78% (mean: 81.82%), so the precision of the testing set was slightly higher than that of the training set. After NDVI was eliminated, the precision of the training set by the BP neural networks was 82.68% and that of the testing set was 67.62%, in the middle of the six tests. The precision of the training set by the RF algorithm rose to 82.68%, the highest of the six tests, while the precision of the testing set fell to 74.55%, the lowest of the six tests. There were no obvious differences between the two algorithms in the kappa coefficient of the training set in the first five tests, and the kappa coefficients were almost equal in test 6, but for the testing set the RF result was higher than that of the BP neural networks. Therefore, (1) both the detection precision and the kappa coefficient of the BP neural networks were higher for the training set than for the testing set, while there was little difference in the kappa coefficient between the training set and testing set using the RF algorithm. (2) For the training set, the detection precision and kappa coefficient of the BP neural networks were higher than those of the RF algorithm, but for the testing set the RF algorithm was higher. (3) Judging from these two indicators, the RF algorithm had a better capability than the BP neural networks to detect D. punctatus damage, especially in terms of generalization.
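Both indicators compared above can be computed from a confusion matrix, as in the following Python sketch. It uses the standard definitions, detection precision as the share of diagonal entries and kappa as (Po − Pe)/(1 − Pe) with chance agreement Pe taken from the row and column totals. The two-class matrix is a made-up example, not data from Table 1, and the function names are hypothetical.

```python
def detection_precision(cm):
    """Share of samples on the diagonal of a square confusion matrix."""
    n = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / n


def kappa(cm):
    """Cohen's kappa: (Po - Pe) / (1 - Pe). Undefined when Pe equals 1."""
    size = len(cm)
    n = sum(sum(row) for row in cm)
    po = sum(cm[i][i] for i in range(size)) / n
    row_totals = [sum(cm[i]) for i in range(size)]                          # true class counts
    col_totals = [sum(cm[i][j] for i in range(size)) for j in range(size)]  # predicted class counts
    pe = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (po - pe) / (1 - pe)


# Hypothetical confusion matrix: rows are true classes, columns are predictions.
example_cm = [
    [20, 5],
    [10, 15],
]
example_precision = detection_precision(example_cm)   # 35 / 50
example_kappa = kappa(example_cm)                      # (0.7 - 0.5) / (1 - 0.5)
```

In this example the precision is 0.7 but kappa is only 0.4, because half of the observed agreement would be expected by chance; this is why kappa is reported alongside precision.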

ROC curve

The training set and testing set were combined to draw ROC curves for the four pest levels, two algorithms and six tests and to calculate the AUC. As shown in Fig. 9, (1) for no damage, the AUC values for the six tests using the RF algorithm were greater than with the BP neural networks. Except for RF3 (i.e., test 3 of the RF algorithm), the other five tests showed that the RF algorithm had higher accuracy in predicting no damage. BP2, BP3 and BP5 showed that the BP neural networks had a moderate level of accuracy in predicting no damage. (2) For mild damage, the average AUC for the six tests was nearly the same using the two algorithms, which both achieved moderate accuracy in detecting mild damage. (3) The two algorithms also had moderate accuracy for detecting moderate damage. The AUC of BP2 was the highest, but the RF algorithm was generally higher than the BP neural networks. (4) For severe damage, the AUC values of the two algorithms did not differ markedly in accuracy rank: apart from BP1, whose AUC was 0.910, all values were in the range 0.7-0.9, showing that both algorithms are capable of detecting severe damage. Both the BP neural networks and the RF algorithm had average AUCs above 0.7 for each pest level, indicating that the two algorithms can be used to detect the four damage levels. The detection precision differed among the damage levels; it was best for no damage and worst for mild damage. Detection of mild damage by the two algorithms was nearly identical, whereas for the other levels the AUC using the RF algorithm was generally greater than with the BP neural networks; thus, overall, the RF algorithm was better than the BP neural networks at detecting the levels of pest damage.
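The per-level AUC values discussed above can be illustrated with a short Python sketch. It assumes a one-vs-rest setup in which each sample has a score for the level of interest, and it computes the AUC as the share of (positive, negative) pairs ranked correctly, which equals the area under the ROC curve. The function name, scores and labels are hypothetical; the study's own software is not specified here.

```python
def auc_one_vs_rest(scores, labels, positive):
    """AUC for one pest level against all others.

    Counts the share of (positive, negative) pairs in which the positive
    sample has the higher score; ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for the level "severe" and the true levels of five samples.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = ["severe", "severe", "mild", "severe", "none"]
auc = auc_one_vs_rest(scores, labels, positive="severe")
print(f"AUC (severe vs. rest) = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the values of 0.7-0.9 reported above lie between these two extremes.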

Paired t test

The analysis of the detection precision, kappa coefficient and ROC curves showed that the RF algorithm was better than the BP neural networks for detecting levels of D. punctatus damage, although the BP neural networks performed better on some indicators in several cases. A paired t test was therefore used to further analyze the differences in detection precision, kappa coefficient and AUC values between the two algorithms; the data were divided into the four pest levels and all levels (all samples), and the average of each indicator in the six tests was determined. Table 2 shows that (1) for no damage, the detection effect of the RF algorithm was better than that of the BP neural networks, and there was a significant difference in the kappa coefficient (p < 0.05), but not in the detection precision or AUC (p > 0.05). (2) For mild damage, the detection precision and AUC of the BP neural networks were slightly higher than those of the RF algorithm, and the detection effect was also better, but the difference between the two algorithms was not significant (p > 0.05). (3) The RF algorithm had better detection capability than the BP neural networks for moderate and severe damage, but the difference was not significant (p > 0.05). (4) Overall, the RF algorithm had a better detection effect than the BP neural networks, but the difference was not significant (p > 0.05).
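The paired comparison can be sketched in Python as follows. The sketch assumes that the values of one indicator from the six tests are paired test by test between the two algorithms; the function name and the numbers are hypothetical and are not the values in Table 2. With six pairs there are 5 degrees of freedom, for which the two-tailed critical value at the 0.05 level is 2.571.

```python
import math


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two samples of equal length with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean / math.sqrt(var / n), n - 1


# Hypothetical kappa coefficients from six tests; not data from the study.
rf_kappa = [0.74, 0.71, 0.69, 0.73, 0.72, 0.66]
bp_kappa = [0.70, 0.72, 0.65, 0.68, 0.69, 0.67]
t_stat, df = paired_t(rf_kappa, bp_kappa)

T_CRIT_DF5 = 2.571  # two-tailed critical value for alpha = 0.05, df = 5
print(f"t = {t_stat:.3f}, df = {df}, significant: {abs(t_stat) > T_CRIT_DF5}")
```

A difference is called significant at p < 0.05 only when the absolute t statistic exceeds the critical value, so a consistently higher mean for one algorithm does not by itself imply a significant difference.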

Fig. 9 ROC curves and AUC values for the tests using the BP and RF algorithms. (a) No damage detection, (b) mild damage detection, (c) moderate damage detection and (d) severe damage detection. Note: the mean AUC values for BP and RF, respectively, were (a) 0.899 and 0.913, (b) 0.800 and 0.799, (c) 0.836 and 0.869, (d) 0.836 and 0.850

Discussion

Pine forests with D. punctatus damage differ in appearance from healthy pine forests. A critical issue for effective detection of pest damage is selecting characteristics that respond sensitively enough to act as indicators of the level of pest damage. Here, we selected seven relevant characteristic indicators, covering leaf area, uniformity, greenness, humidity and characteristic wavelengths. The experimental results confirmed that the indicators respond to the levels of pest damage. This multidimensional information provides a reference for detecting forest diseases and pest damage, especially from defoliators. Using the seven indicators, the two algorithms showed that NDVI was the weakest indicator, and after it was removed, the two algorithms still maintained stable detection of pest damage. The NDVI is an important indicator of greenness. For example, when Coops et al. (2006) used QuickBird high spatial resolution images to detect red-attack damage from mountain pine beetle infestation, differences in the NDVI could delineate red attack, faders, non-attack and shadowed crowns in the images, but NDVI was not as good as the red-green index for detecting pest damage. Jepsen et al. (2009) used MODIS-NDVI data to monitor the spatiotemporal dynamics of geometrid moth outbreaks in birch forests. NDVI is therefore often considered for monitoring pest damage by remote sensing. However, the NDVI is an arithmetical combination of the red band B3 and near-infrared band B4 from the remote sensing image and is thus strongly correlated with these two bands. It is also closely related to LAI (Stenberg et al. 2008; Capodici et al. 2013; He et al. 2016), which is the intrinsic reason why NDVI was eliminated in test 6. Compared with the two ground indicators LAI and SEL, the green band, red band and near-infrared band from the remote sensing image played a more important role in constructing the model, which shows that remote sensing images have the potential to effectively monitor D. punctatus damage and can also lay the foundation to further couple the "ground-space" characteristics for rapid, accurate remote sensing monitoring.
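The dependence of NDVI on the red and near-infrared bands can be made concrete with the standard definition NDVI = (NIR - Red) / (NIR + Red), here with B3 as the red band and B4 as the near-infrared band. The Python sketch below is only an illustration; the function name and the pixel values are hypothetical.

```python
def ndvi(red, nir):
    """Normalized difference vegetation index from red (B3) and NIR (B4) values.

    Returns None when both bands are zero, where the ratio is undefined.
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


# Hypothetical pixel values: dense canopy reflects strongly in the NIR band.
healthy = ndvi(red=20, nir=60)    # 40 / 80
defoliated = ndvi(red=40, nir=60)  # 20 / 100
print(healthy, defoliated)
```

Because the index is computed entirely from B3 and B4, it carries little information that these two bands do not already supply to a model that uses them directly.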

Table 2 Paired t test of damage detection indices for the BP neural networks and the RF algorithms

Another critical issue is constructing the detection algorithm after the analysis and selection of characteristic indicators. Random forest was applied here to D. punctatus damage detection, which expands its field of application and verifies its predictive capability for multilevel discrete variables. The RF algorithm is an efficient processing algorithm because of its fast learning procedure. It is robust to noise in the data and is not sensitive to multicollinearity (Zhang et al. 2014). BP neural networks are prone to falling into a local optimum, which reduces their generalization ability and leads to "over-fitting" (Li et al. 2002; Zhang et al. 2016). Table 1 shows that the detection precision and kappa coefficient of the testing set by the BP neural networks were lower than those of the training set within the same test, which suggests that the BP neural networks have a strong learning ability but that the trained model goes beyond the general rules of the original samples, which weakens its generalization ability. No such imbalance between the training set and testing set was found in the detection precision and kappa coefficient of the six tests by the RF algorithm, which thus appears to be more robust. The mechanism of random forest can effectively avoid the "over-fitting" phenomenon, which is consistent with our results. The detection precision of the RF algorithm for severe damage was the lowest among the four pest levels; the main disadvantage of the algorithm is its bias toward categories with many observations (Wang et al. 2015), and there were relatively few samples of severe damage in this study. Therefore, attention should be paid to the balance of sample sizes among categories when using the RF algorithm to detect pest damage.

Both algorithms are "gray box" models, with an ambiguous internal structure and a strong learning ability, and both are widely used in many fields. When using the BP neural networks, the variables should be normalized, and the training effect is easily affected by the number of network layers, number of nodes, transfer function, training function, adaptation learning function and other parameters. The RF model needs only two parameters, ntree and mtry, which makes it simpler and more efficient than the BP neural networks.

Conclusion

The absolute values of the connection weights from the input layer to the hidden layer in the BP neural networks and the importance ranking from random forest show that the seven characteristic indicators (LAI, SEL, NDVI, WET, B2, B3 and B4) were responsive to D. punctatus damage, although the response of the NDVI was relatively weak. They can thus be used for rapid remote sensing monitoring and accurate detection of pest damage.

Although both the BP neural networks and random forest have detection capability for D.punctatus damage,the detection precision,kappa coefficient and AUC of the RF algorithm were higher than for the BP neural networks,on the whole.Therefore,the pest detection effect of the RF algorithm is superior.

In terms of pest levels, the detection precision, kappa coefficient and AUC for no, moderate and severe damage were higher using the RF algorithm; the difference was significant only for the kappa coefficient for no damage (p < 0.05), and none of the other differences were significant (p > 0.05). Although the detection precision and AUC for mild damage using the BP neural networks were slightly higher than with the RF algorithm and the detection effect was also a little better, the difference was not significant (p > 0.05).

Both algorithms are "gray box" models, and the RF algorithm is the more robust of the two. Not only can it be used for pest detection, but its application can also be expanded to detecting multilevel discrete variables, which provides a reference for detecting forest diseases and pest damage.

Acknowledgement The authors are grateful to the National Natural Science Foundation of China (Grant Nos. 41501361, 41401385, 30871965), the China Postdoctoral Science Foundation (No. 2018M630728), the Open Fund of Fujian Provincial Key Laboratory of Resources and Environment Monitoring and Sustainable Management and Utilization (No. ZD1403), the Open Fund of Fujian Mine Ecological Restoration Engineering Technology Research Center (No. KS2018005) and the Scientific Research Foundation of Fuzhou University (No. XRC1345).
