Effects of Different Spatial Resolutions on Prediction Accuracy of Fishing Ground in Waters Near the Cook Islands Based on Long Short-Term Memory (LSTM) Neural Network Model
XU Hui1), SONG Liming1), 2), *, ZHANG Tianjiao3), LI Yuwei1), 2), SHEN Jieran4), ZHANG Min1), 2), and LI Kangdi5)
Albacore tuna (Thunnus alalunga) is one of the target species of tuna longline fishing, and waters near the Cook Islands are a vital albacore tuna fishing ground. Marine environmental data are usually presented at different spatial resolutions, which leads to different results in tuna fishery prediction. Studying the impact of different spatial resolutions on the prediction accuracy of the albacore tuna fishery, in order to select the best spatial resolution, can contribute to better management of albacore tuna resources. The nominal catch per unit effort (CPUE) of albacore tuna was calculated from vessel monitoring system (VMS) data collected from Chinese distant-water fishery enterprises from January 1, 2017 to May 31, 2021. A total of 26 spatiotemporal and environmental factors, including temperature, salinity and dissolved oxygen of the 0 – 300 m water layer, chlorophyll-a concentration in the sea surface, sea surface height, month, longitude, and latitude, were selected as variables. The temporal resolution of the variables was daily and the spatial resolutions were set to 0.5° × 0.5°, 1° × 1°, 2° × 2°, and 5° × 5°. The relationship between the nominal CPUE and each individual factor was analyzed to remove the factors irrelevant to the nominal CPUE, and a multicollinearity diagnosis was performed to remove factors highly related to the other factors at each of the four spatial resolutions. Models relating CPUE to the spatiotemporal and environmental factors at the four spatial resolutions were established based on the long short-term memory (LSTM) neural network model. The mean absolute error (MAE) and root mean square error (RMSE) were used to analyze the fitness and accuracy of the models, and to determine the effects of different spatial resolutions on the prediction accuracy of the albacore tuna fishing ground. The results show that the resolution of 1° × 1° leads to the best prediction accuracy, with the MAE and RMSE being 0.0268 and 0.0452, respectively, followed by 0.5° × 0.5°, 2° × 2° and 5° × 5° with declining prediction accuracy. The results suggest that 1) the albacore tuna fishing ground can be predicted by LSTM; 2) the VMS records the data in detail and can be used scientifically to calculate the CPUE; 3) correlation analysis and multicollinearity diagnosis are necessary to improve the prediction accuracy of the model; 4) the spatial resolution should be 1° × 1° in the forecast of the albacore tuna fishing ground in waters near the Cook Islands.
albacore tuna; fishing ground prediction accuracy; VMS; spatial resolution; LSTM; the Cook Islands
Albacore tuna (Thunnus alalunga) is an oceanic, highly migratory fish species distributed in the tropical, subtropical, and temperate waters of the Pacific, Indian and Atlantic oceans (Miao and Huang, 2003; Fan et al., 2015). It has high nutritional and commercial value and is rich in fishery resources (Fan et al., 2007). The western and central Pacific Ocean, where the Cook Islands are located, is one of the most important tuna production areas, and its output increases yearly (Chen et al., 2005). Waters near the Cook Islands are a crucial albacore tuna fishing ground. Accurate prediction of the fishing ground of the target tuna species is important for exploring the relationship between the spatial and temporal distribution of tuna resources and the marine environment. The accuracy of tuna fishing ground prediction is affected by the spatial resolution of the marine environment data. Many scholars have paid attention to the effects of spatial resolution (Turner et al., 1989; Wiens, 1989). An improper spatial resolution may fail to describe the spatial distribution of the study object accurately, since it may hide important spatial and ecological information (Feng et al., 2019). Therefore, adopting an appropriate spatial resolution helps improve the accuracy of fishing ground prediction.
There have been few comparative studies on the prediction accuracy of tuna fishing grounds at different spatial resolutions, and few marine environmental factors have been applied in studies of albacore tuna fishing ground prediction and stock assessment. For example, Zainuddin et al. (2008) used only the sea surface temperature, chlorophyll-a concentration in the sea surface, and sea surface height anomaly (SSHA) to predict the relationship between albacore tuna fishing grounds and ocean conditions. Commonly used models for tuna fishing ground forecasting include the habitat suitability index (HSI) model, neural network model, generalized additive model (GAM), and stacking model. The HSI model is applied more frequently because it has advantages that other habitat models lack (Jin et al., 2008). However, it has certain limitations in the objectivity and comprehensiveness of habitat data acquisition, the reliability of the model, the representativeness of the sample for the overall data, and the structure of the model (Gong et al., 2011), and there are drawbacks in its assumptions (Eastwood et al., 2003). Its prediction accuracy can be gradually improved by continuously modifying the model according to the measured data. GAM can better evaluate the effects of environmental factors on stocks, but it cannot accurately reflect the relationship between measurement factors and biological response (Thomson et al., 1996). Neural networks are relatively flexible in describing nonlinear relationships; however, they cannot intuitively explain the relationship between CPUE and independent variables (Hinton and Maunder, 2003). In recent years, the stacking model has been applied in tuna fishing ground prediction and achieved good results (Song et al., 2021a). However, in practical applications, the generalization ability of the stacking model cannot be guaranteed to be higher than that of a single model.
Long short-term memory (LSTM) is an important improvement on the recurrent neural network (RNN). It can selectively remember patterns over long durations and make full use of historical information. It has been successfully applied in many fields, such as text analysis, speech recognition, and image processing, and has promoted the development of deep learning. In recent years, it has also been applied in fishing ground forecasting and has shown good fitness. For example, Yuan et al. (2021) predicted the Pacific bigeye tuna (Thunnus obesus) fishing ground based on LSTM, combined with empirical mode decomposition and a bidirectional long short-term memory neural network model, and concluded that LSTM showed a higher prediction accuracy than other models. However, their study lacked a reasonable explanation of the CPUE calculation.
The satellite-based vessel monitoring system (VMS) is designed for fishery monitoring and surveillance, and provides potentially valuable information on the spatial and temporal patterns of multi-scale fishing activities (Mills et al., 2007). The state and trajectory of fishing vessels can be inferred from VMS data, so that fishing effort can be correctly defined (Walker and Bez, 2010). At present, VMS data have been applied in fishery information processing for fisheries such as trawling (Murawski et al., 2005) and purse seining (Bez et al., 2011). Watson et al. (2018) studied how the combination of VMS and associated metrics can be expanded for use in management strategy evaluation. In their study, VMS data were used to accurately calculate vessel track, speed and other information. However, the numbers of hooks and catches were not gridded, which led to inaccurate estimation results.
CPUE often represents resource abundance in fishery research, under the assumption that it is proportional to the fishery stock (Hilborn and Walters, 1992; Mangel et al., 1999). It has been applied in tuna fishery resource research (Maunder and Punt, 2004), and can also be used as an important basis for evaluating fishery and resource abundance (Chen et al., 2013). Scientific and reasonable calculation of CPUE is a prerequisite for the accuracy of tuna fishery-related research results. Longline fishing has a long duration and a large geographical span, which requires higher accuracy of CPUE. However, there has been no relevant study on grid processing of VMS data to estimate CPUE accurately.
Based on VMS data in waters near the Cook Islands from January 1, 2017 to May 31, 2021, the number of deployed hooks and the catch numbers of albacore tuna in longline fishing were gridded, and the CPUE was estimated scientifically to provide a method reference for the subsequent estimation of CPUE. To analyze the fitness and accuracy of LSTM, and to determine the prediction accuracy of the albacore tuna fishing ground at different spatial resolutions (0.5° × 0.5°, 1° × 1°, 2° × 2°, and 5° × 5°), twenty-three factors, including dissolved oxygen concentration, temperature and salinity of the 0 – 300 m water layer (taken every 50 m from the surface layer, seven layers in total), chlorophyll-a concentration in the sea surface, and sea surface height, were selected as environmental variables, while longitude, latitude and month were selected as three spatiotemporal factors. The results provide a basis for the selection of spatial resolution when predicting the albacore tuna fishing ground in waters near the Cook Islands in the future.
2.1.1 Longline fishing vessels
The fishery data in this paper were collected from 29 longline fishing vessels of Shenzhen Liancheng Overseas Fishery Co., Ltd. that operate in waters near the Cook Islands. The 29 vessels were roughly of the same size. Vessel parameters were: overall length of 42.28 m, moulded width of 5.70 m, moulded depth of 2.60 m, gross tonnage of 97.00 t, net tonnage of 34.00 t, and main engine power of 400 kW.
2.1.2 Operation parameters and fishing gear parameters
During the operation of fishing vessels, the average duration of line deployment was about 6.5 h, and the average vessel speed during line deployment was about 9 kn. The average line retrieving duration was about 11 h, and the average vessel speed during retrieving was about 5.5 kn. The number of hooks between two adjacent floats was usually 28, and the average time interval between two hooks was about 6 s. The total number of hooks deployed by each fishing vessel during each operation was around 2500 – 4200.
2.1.3 Data type, characteristics and source
The fishery data in this study were collected from the fishing company's VMS. For each commercial fish captured, a set of data was recorded in VMS, including fish species, capture location and body mass. During the study period, the average catch number recorded in each operation was 48 and the gear deployment distance on the surface was about 60 nautical miles. Therefore, a position was recorded at an interval of about 1.26 nautical miles. Environmental data were downloaded from the website of the Copernicus Marine Environment Monitoring Service (http://marine.copernicus.eu). The temporal resolution was daily, and the data were gridded at spatial resolutions of 0.5° × 0.5°, 1° × 1°, 2° × 2°, and 5° × 5°. The CPUE of albacore tuna in a grid on a certain day at each spatial resolution was matched with the marine environment data of that grid on that day using MATLAB.
2.1.4 Study area
In order to ensure that the data were uniform and up to date, the fishery data were downloaded for the period from January 1, 2017 to May 31, 2021, and the study area was defined as 7°24′S – 17°36′S and 156°W – 168°W (waters near the Cook Islands) (Fig.1).
Fig.1 Operation sites of tuna longliners in waters near the Cook Islands.
2.1.5 Reasons for data selection
The hook depth determines the fishing performance of tuna longline fishing (Song and Xu, 2021). According to previous studies, albacore tuna spend most of their time between 150 m and 250 m during the daytime and between 0 and 200 m during the nighttime (Domoko et al., 2007), and high catch rates have been found in the water layer below 150 m (Beverly et al., 2003; Kelleher, 2005). The temperature and dissolved oxygen of the water layer have a great influence on the tuna catch rate. The relationship between environmental factors and catch rate varies with the depth of the water layer (Song et al., 2021b). With a limited study area and discrete time, it is difficult to obtain reliable conclusions (Guo et al., 2016). Therefore, it is necessary to analyze in depth the effect of environmental factors in different water layers on the spatial distribution of albacore tuna. In this paper, dissolved oxygen concentration, temperature and salinity at 0, 50, 100, 150, 200, 250 and 300 m in the 0 – 300 m water layer, chlorophyll-a concentration in the sea surface, and sea surface height were selected as environmental factors, and longitude, latitude and month were selected as spatiotemporal factors.
The number of fish and the number of hooks were assigned to 0.5° × 0.5° grids by day. The gridding calculation method of the nominal CPUE is shown in Fig.2, where A – K represent the catch positions of albacore tuna of a fishing vessel on a specific day; L and M respectively represent the estimated positions of the longline fishing vessel at the beginning and the end of line retrieving; O, P and Q are the intersections of the lines between two catch positions with the grid boundary; N is the intersection of the line between the estimated starting position of the fishing vessel and the first fish catch position with the grid boundary; and R is the intersection of the line between the end position of the fishing vessel and the last fish catch position with the grid boundary.
Fig.2 Schematic diagram of CPUE grid computing. Two marker types are used: one for the fish catch positions, and ● for the intersections with the grid boundary of the line connecting two albacore catch positions or connecting the estimated position of the fishing vessel and an albacore catch position.
The specific calculation steps were as follows:
1) The number of floats was estimated from the total number of hooks deployed by a vessel on the day and the number of hooks between two floats (Eq. (1)), the total time of deployment was calculated from the time interval between two adjacent hooks (Eq. (2)), and the entire distance of deployment was then obtained from the average vessel speed (Eq. (3)):
$$N_f = \frac{N_h}{n_h}, \tag{1}$$
where $N_f$ denotes the number of floats; $N_h$ denotes the total number of hooks deployed by a fishing vessel in a certain operation (range: 2500 – 4200); and $n_h$ denotes the number of hooks between two floats, which is 28.
$$T = N_h \times t, \tag{2}$$
where $T$ denotes the total time of deploying hooks, and $t$ denotes the time interval between deploying two hooks, taken as the average value of 6 s.
$$S = T \times v, \tag{3}$$
where $S$ denotes the total distance of deployment, and $v$ denotes the average speed of the fishing vessel when deploying hooks, taken as 9 kn.
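As a rough illustration of step 1), the sketch below computes the number of floats, the deployment time and the deployment distance from the quantities named in the text (28 hooks between floats, 6 s per hook, 9 kn). The function name, the argument names and the 3000-hook example are hypothetical, and the simple ratio used for the number of floats is an assumed reading of Eq. (1).

```python
def deployment_geometry(total_hooks, hooks_between_floats=28,
                        hook_interval_s=6.0, speed_kn=9.0):
    """Return (number of floats, deployment time in h, deployment distance in nmi)."""
    # Floats: total hooks divided by hooks between two floats (assumed form of Eq. (1)).
    n_floats = total_hooks / hooks_between_floats
    # Deployment time: one hook every hook_interval_s seconds, converted to hours.
    total_time_h = total_hooks * hook_interval_s / 3600.0
    # Distance: knots are nautical miles per hour, so speed x hours gives nautical miles.
    distance_nmi = speed_kn * total_time_h
    return n_floats, total_time_h, distance_nmi


# Hypothetical set of 3000 hooks (inside the 2500 - 4200 range given in the text).
floats, hours, distance = deployment_geometry(3000)
print(round(floats, 1), hours, distance)
```

With these inputs the set takes 5 h to deploy and spans 45 nautical miles, which is of the same order as the roughly 60 nautical miles quoted for an average operation.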
2) In actual operations, longliners drift with ocean currents, so the deployment distance may not coincide with the retrieving distance. This study assumed that the deployment distance was equal to the retrieving distance.
3) Assuming that the line between every two adjacent fish catch positions was straight, and that the segment LA between the estimated fishing starting position and the first fish catch position was also straight, the known distance (Eq. (4)) was obtained by accumulating the distances between every two adjacent fish catch positions recorded by VMS, and the remaining distance was obtained as the difference between the total distance of deployment and the known distance:
$$S_k = S_1 + S_2 + \cdots + S_{m-1}, \tag{4}$$
where $S_k$ denotes the known total distance of deploying hooks; $S_1$ denotes the distance between the positions of the fishing vessel when the first two fish were caught; $S_2$ denotes the distance between the positions of the second and the third fish caught; and $S_{m-1}$ denotes the distance between the positions of the last two fish caught, with $m$ being the number of recorded catch positions.
4) The remaining distance (the sum of the distance between the estimated starting position of the fishing vessel and the first fish catch position and the distance between the estimated end position of the fishing vessel and the last fish catch position) was divided equally into two sections (Eq. (5)):
$$S_{LA} = S_{KM} = \frac{S - S_k}{2}, \tag{5}$$
where $S_{LA}$ denotes the distance from the beginning of deployment to the position of the first fish caught, and $S_{KM}$ denotes the distance between the position of the last fish caught and the position of the end of retrieving. In the theoretical calculation, the remaining distance was thus divided into the two sections $S_{LA}$ and $S_{KM}$.
In Eqs. (6) and (7), the two left-hand terms denote the total numbers of hooks in the two respective grids, and the three S terms denote the distances of the corresponding line segments.
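The distance bookkeeping of steps 3) and 4) and the sharing of hooks among grids can be sketched as follows. The proportional rule used here (hooks in a grid = total hooks × line length inside the grid / total line length) is an assumption inferred from the variable definitions above, not a confirmed reading of the original equations; all function names, grid labels and numbers are hypothetical.

```python
def split_remaining(total_distance, known_distance):
    """Halve the part of the line not covered by the recorded catch positions."""
    remaining = max(total_distance - known_distance, 0.0)
    # One half is placed before the first catch, the other after the last catch.
    return remaining / 2.0, remaining / 2.0


def allocate_hooks(total_hooks, length_by_grid):
    """Share hooks among grids in proportion to the line length inside each grid.

    This proportional rule is an assumption made for illustration.
    """
    total_length = sum(length_by_grid.values())
    return {grid: total_hooks * length / total_length
            for grid, length in length_by_grid.items()}


# Hypothetical set: 45 nmi deployed, 35 nmi covered by recorded catch positions.
start_part, end_part = split_remaining(45.0, 35.0)
# Hypothetical line crossing two grids, with 15 nmi in one and 30 nmi in the other.
hooks = allocate_hooks(3000, {"grid_1": 15.0, "grid_2": 30.0})
print(start_part, end_part, hooks)
```

Because the line is assumed to carry hooks at a constant spacing, a grid containing one third of the line receives one third of the hooks in this sketch.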
The nominal CPUE in the grid (x, y) was calculated by Eq. (8):
$$CPUE_{x,y} = \frac{C_{x,y}}{H_{x,y}} \times 1000, \tag{8}$$
where $CPUE_{x,y}$ denotes the nominal CPUE (fish per thousand hooks) of the grid whose center has the latitude and longitude (x, y); $C_{x,y}$ denotes the total number of fish caught by all fishing vessels in the grid (x, y), which extends 0.25° up, down, left and right from the center (i.e., the 0.5° × 0.5° grid centered on this point), during the operations on a certain day; and $H_{x,y}$ denotes the total number of hooks of all fishing vessels in the grid (x, y) during the operations on that day. $C_{x,y}$ and $H_{x,y}$ were calculated as follows:
$$C_{x,y} = C_a + C_b + C_c + \cdots, \tag{9}$$
$$H_{x,y} = H_a + H_b + H_c + \cdots, \tag{10}$$
where $C_a$, $C_b$, $C_c$, … denote the numbers of fish caught by fishing vessels a, b, c, … in the grid (x, y) on a certain day, respectively, and $H_a$, $H_b$, $H_c$, … denote the numbers of hooks of fishing vessels a, b, c, … in the grid (x, y) on that day. The total catch and the total number of deployed hooks of the different fishing vessels in each grid were obtained by matching, and resolution conversion was then carried out to obtain the nominal CPUE of albacore tuna at the different spatial resolutions in the unit grid on each day.
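A minimal sketch of the grid aggregation is given below. It sums fish and hooks over vessels for each day and grid cell and expresses CPUE as fish per thousand hooks, the unit used in the Results; the record layout, the function names and the sample values are hypothetical, and the same function is simply called with a different cell size to mimic resolution conversion.

```python
import math
from collections import defaultdict


def grid_center(lon, lat, res):
    """Centre of the res x res degree cell that contains (lon, lat)."""
    return (math.floor(lon / res) * res + res / 2.0,
            math.floor(lat / res) * res + res / 2.0)


def nominal_cpue(records, res):
    """records: iterable of (day, lon, lat, fish, hooks), one per vessel and line piece.

    Returns {(day, cell_lon, cell_lat): fish per thousand hooks}.
    """
    fish = defaultdict(float)
    hooks = defaultdict(float)
    for day, lon, lat, n_fish, n_hooks in records:
        key = (day,) + grid_center(lon, lat, res)
        fish[key] += n_fish      # summed over all vessels in the cell
        hooks[key] += n_hooks
    return {key: 1000.0 * fish[key] / hooks[key]
            for key in hooks if hooks[key] > 0}


# Two hypothetical vessels fishing in the same 1 degree cell on the same day.
records = [("2017-01-01", -160.2, -10.3, 10, 1000),
           ("2017-01-01", -160.4, -10.1, 20, 1000)]
print(nominal_cpue(records, 1.0))
```

Summing catch and effort before dividing, rather than averaging per-vessel ratios, keeps the cell value consistent with a total-catch over total-hooks definition.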
2.3.1 Data normal distribution test
The Pearson correlation test requires the data to follow a normal distribution. The Kolmogorov-Smirnov test (K-S test) is applied for large sample sizes (generally more than 100), while the Shapiro-Wilk test (S-W test) is applied for small sample sizes (Shapiro and Wilk, 1965). Given the large sample size in this study, the K-S test was selected.
2.3.2 Correlation test between CPUE and environmental factors
The statistical software SPSS 23.0 was used to conduct bivariate correlation analysis between the CPUE of albacore tuna and the 26 spatiotemporal and environmental factors. The Pearson correlation coefficient was calculated, followed by a significance test. The range of the correlation coefficient r was [−1, 1], and its absolute value represented the strength of the correlation between the two variables. Environmental factors with high correlation were selected for the subsequent multicollinearity analysis.
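For readers without SPSS, the Pearson coefficient itself is easy to reproduce. The sketch below is a plain-Python version of the textbook formula; it is not the authors' workflow, it omits the significance test, and the function name and sample values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical values: CPUE falling as a temperature-like factor rises.
factor = [20.0, 21.0, 22.0, 23.0]
cpue = [18.0, 16.0, 14.0, 12.0]
print(pearson_r(factor, cpue))
```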
2.3.3 Multicollinearity analysis
After the correlation test, multicollinearity diagnosis was performed on the environmental variables to exclude possible high correlation between the marine environmental factors (dissolved oxygen concentration, temperature and salinity in the 0 – 300 m water layer, chlorophyll-a concentration in the sea surface, and sea surface height) and the spatiotemporal factors (longitude, latitude and month). The criterion of the multicollinearity diagnosis was whether the variance inflation factor (VIF) was less than 10 (He and Liu, 2007; Zhang et al., 2019). When the VIF of all variables was less than 10, it was considered that there was no multicollinearity and the diagnosis was completed.
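The paper does not write out the VIF formula; the standard definition, given here as background rather than as the authors' stated equation, is

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}}
```

where $R_j^2$ is the coefficient of determination obtained when factor $j$ is regressed on all the other factors. Under this definition the threshold of 10 corresponds to $R_j^2 = 0.9$, i.e., a factor is dropped when 90% or more of its variance is explained by the remaining factors.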
The different dimensions and orders of magnitude of CPUE and of the various spatiotemporal and environmental factors might have adverse effects on modeling. In order to improve the prediction accuracy of the model, it was necessary to normalize the feature sequence and the values of the marine environmental factors at the different resolutions to the range of 0 to 1 (Quan et al., 2018). The calculation formula is:
$$X_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}},$$
where $X_i$ denotes the normalized value of an environmental factor at a certain resolution; $x_i$ denotes the initial value of the i-th datum of the environmental factor at that resolution; $x_{\max}$ denotes the maximum value of the environmental factor at that resolution; and $x_{\min}$ denotes the minimum value of the environmental factor at that resolution.
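The min-max scaling described above can be sketched in a few lines. The function name and the sample temperatures are hypothetical, and the handling of a constant column (returning zeros) is an added safeguard rather than something stated in the paper.

```python
def min_max_normalize(values):
    """Scale a list of numbers linearly onto the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant factor carries no information; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical sea temperatures (degrees Celsius) at one resolution.
print(min_max_normalize([10.0, 15.0, 20.0]))
```

In practice the minimum and maximum would be computed separately for each factor and each spatial resolution, as the text specifies.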
After normalization, data sets with different spatial resolutions were obtained, i.e., 0.5° × 0.5° (data set A), 1° × 1° (data set B), 2° × 2° (data set C), and 5° × 5° (data set D).
2.5.1 Introduction to LSTM
LSTM improves the efficiency of transmitting information from the previous cell to the next cell in the same layer by adding a 'gate' data structure. Its main body is composed of a forget gate, an input gate and an output gate. The internal structure of the model is shown in Fig.3.
Fig.3 Network structure of LSTM.
In the forget gate, except that the initial value of the cell at the first time step is set manually, the cell at each later time step obtains the cell state $C_{t-1}$ of the previous cell and combines the previous output with the current input vector $x_t$; the sigmoid activation function then yields a value in the range [0, 1]. If the value at a given position of the vector is 0, the information at that position from the previous time step is forgotten; if it is 1, the information at that position is retained. The forget gate filters the input data and passes it to the input gate. The expression is as follows:
$$f_t = \sigma\left(w_f \cdot [h_{t-1}, x_t] + b_f\right),$$
where $f_t$ denotes the forget gate; $h_{t-1}$ denotes the hidden state (output) at the previous time step; $x_t$ denotes the input variable at time t; $w_f$ is the weight matrix of the forget gate; $b_f$ is the offset term of the forget gate; and $\sigma$ is the sigmoid activation function.
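The input gate selects and updates the value of the input variable and passes it on to the output gate; it calculates the degree to which the input information $x_t$ at the current time is saved in the cell state $C_t$ at the current time. In the standard LSTM formulation, the expressions are as follows:
$$i_t = \sigma\left(w_i \cdot [h_{t-1}, x_t] + b_i\right),$$
$$\tilde{C}_t = \tanh\left(w_c \cdot [h_{t-1}, x_t] + b_c\right),$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t,$$
where $i_t$ is the input gate; $\tilde{C}_t$ is the candidate cell state; $w_i$ and $w_c$ are the corresponding weight matrices; and $b_i$ and $b_c$ are the corresponding offset terms.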
The output gate integrates the selected new variables with the updated cell state, so as to reduce interference, accelerate training, and effectively mitigate the long-distance gradient dependence problem of the traditional RNN.
$$O_t = \sigma\left(w_o \cdot [h_{t-1}, x_t] + b_o\right),$$
$$h_t = O_t \odot \tanh\left(C_t\right),$$
where $O_t$ is the output gate; $w_o$ is the weight matrix of the output gate; $b_o$ is the offset term of the output gate; and $h_t$ is the output at time t.
At time t, the inputs of the neuron include the state variable $h_{t-1}$ of the hidden layer at time t − 1, the state variable $C_{t-1}$ of the memory cell at time t − 1, and the input variable $x_t$ at time t. The outputs of the three-gate model unit are the output variable $h_t$ at time t and the state variable $C_t$ of the memory cell at time t.
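To make the data flow through the three gates concrete, the following sketch runs one time step of a standard LSTM cell with scalar input, hidden state and cell state. It illustrates the textbook cell, not the trained network of this study; the gate labels 'f', 'i', 'c' and 'o', the parameter layout and the numbers are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    params maps a gate label to (weight on h_prev, weight on x_t, offset).
    """
    def pre_activation(label):
        w_h, w_x, b = params[label]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(pre_activation("f"))        # forget gate: how much of c_prev to keep
    i_t = sigmoid(pre_activation("i"))        # input gate: how much new content to store
    c_hat = math.tanh(pre_activation("c"))    # candidate cell content
    c_t = f_t * c_prev + i_t * c_hat          # updated cell state
    o_t = sigmoid(pre_activation("o"))        # output gate
    h_t = o_t * math.tanh(c_t)                # output / hidden state at time t
    return h_t, c_t


# With all-zero parameters every gate equals 0.5 and the candidate equals 0.
zero_params = {label: (0.0, 0.0, 0.0) for label in ("f", "i", "c", "o")}
h_t, c_t = lstm_step(x_t=0.3, h_prev=0.1, c_prev=2.0, params=zero_params)
print(h_t, c_t)
```

In the all-zero case the cell state is simply halved, which shows how the forget gate alone controls how much history survives each step.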
2.5.2 Establishment of LSTM for albacore tuna CPUE prediction
The normalized spatiotemporal and environmental factors were used as the independent variables, and the CPUE value as the dependent variable. From each of the data sets A, B, C and D, 80% of the data were selected as the training data set, 10% as the verification data set and 10% as the test data set for the prediction model (Shook, 2021; Alghazzawi et al., 2022). The prediction errors of the prediction model at the different spatial resolutions were obtained and analyzed.
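The 80/10/10 partition can be sketched as below. The paper does not say whether the split was random or chronological; this sketch keeps the samples in their given order, which is an assumption, and the function name is hypothetical.

```python
def split_80_10_10(samples):
    """Split an ordered list into training, verification and test parts."""
    n = len(samples)
    n_train = n * 8 // 10   # integer arithmetic avoids floating-point rounding
    n_val = n // 10
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]   # the remainder, about 10%
    return train, val, test


train, val, test = split_80_10_10(list(range(100)))
print(len(train), len(val), len(test))
```

For time-ordered data an ordered split keeps future samples out of the training set; a shuffled split would be a different design choice.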
The establishment of LSTM for albacore tuna CPUE prediction was essentially a regression problem for a recurrent neural network. The historical feature data and the current feature data characterizing the albacore tuna CPUE were taken as the network input, and the CPUE characteristic data of a certain day in the future were taken as the network output. The network was trained by comparison with the measured values, so as to establish the mapping between the characteristic data (historical CPUE and future CPUE) and realize the calculation and prediction of the albacore tuna CPUE. The logic diagram of the LSTM prediction model is shown in Fig.4. For the CPUE value of albacore tuna on a specific day, its characteristic value X(t) at time t consisted of the month, longitude, latitude, chlorophyll-a concentration in the sea surface, sea surface height, and the dissolved oxygen concentration, salinity and temperature in each water layer at time t, which represented the characteristics affecting the CPUE of albacore tuna.
In order to realize the prediction, the albacore tuna CPUE characteristic data X(t − n + 1), …, X(t − 1) and X(t) at n consecutive times were taken as the network input, and the albacore tuna CPUE characterization data for time t + 1 were output, where n corresponded to the step size of the input layer.
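The sliding-window construction described above, in which the features at several consecutive times predict the CPUE at the following time, can be sketched as follows. The function name, the symbol n_steps for the step size and the toy data are hypothetical.

```python
def make_windows(features, targets, n_steps):
    """Pair features[t - n_steps + 1 .. t] with targets[t + 1] for every valid t."""
    inputs, outputs = [], []
    for t in range(n_steps - 1, len(features) - 1):
        inputs.append(features[t - n_steps + 1:t + 1])   # n_steps consecutive feature vectors
        outputs.append(targets[t + 1])                   # CPUE at the following time
    return inputs, outputs


# Toy series: five days with one feature each, and a CPUE value per day.
features = [[0], [1], [2], [3], [4]]
cpue = [10, 11, 12, 13, 14]
windows, next_cpue = make_windows(features, cpue, n_steps=3)
print(windows, next_cpue)
```

Each returned window has shape (n_steps, number of features), which is the input layout recurrent layers generally expect.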
2.5.3 LSTM performance evaluation index
Mean absolute error (MAE) and root mean square error (RMSE) were used to evaluate the accuracy and stability of the LSTM model. MAE refers to the average of the absolute errors between the predicted values and the observed values; the smaller the value, the higher the prediction accuracy. RMSE refers to the sample standard deviation of the differences (residuals) between the predicted values and the observed values, which reflects the degree of dispersion of the sample; in nonlinear fitting, a smaller RMSE means a more stable model. The calculation formulas of MAE and RMSE are as follows:
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|C_i - T_i\right|,$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(C_i - T_i\right)^2},$$
where $n$ denotes the number of data in the test data set; $C_i$ denotes the predicted value of the CPUE; and $T_i$ denotes the value of the nominal CPUE.
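Both error measures are straightforward to compute. The sketch below uses the usual definitions, dividing by the number of test samples in both cases, which is an assumption about the exact form used in the paper; the function names and sample values are hypothetical.

```python
import math


def mae(predicted, observed):
    """Mean absolute error between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


# Hypothetical normalized CPUE values: two exact predictions and one miss of 2.
predicted = [1.0, 2.0, 3.0]
observed = [1.0, 2.0, 5.0]
print(mae(predicted, observed), rmse(predicted, observed))
```

Because RMSE squares the residuals, the single large miss weighs more heavily in RMSE than in MAE, which is why the two are reported together.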
The training data sets and verification data sets of the four spatial resolutions were iterated in the LSTM prediction model. When the loss value no longer changed as the number of iterations increased, the iteration ended, and the curves of the loss value against the number of iterations were obtained. The best spatial resolution was determined by comparing the corresponding MAE and RMSE on the test data sets of the four spatial resolutions.
3.1.1 CPUE calculation and normal distribution test
From January 1, 2017 to May 31, 2021, the 29 longline fishing vessels caught a total of 460191 albacore tuna with a total of 28188600 hooks, and the average CPUE of albacore tuna was 16.325 fish per thousand hooks. The K-S test was conducted for the CPUE of albacore tuna in 2017 – 2021, and the P-value was 0.06 (> 0.05), which indicated that the CPUE in this period followed a normal distribution and supported the use of the correlation test.
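As a quick check of the reported average, taking 460191 as the catch in number and 28188600 as the number of hooks deployed (the reading under which the stated value is reproduced):

```latex
\overline{CPUE} = \frac{460191}{28188600} \times 1000 \approx 16.325 \ \text{fish per thousand hooks}
```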
3.1.2 Correlation test results of CPUE and spatiotemporal environmental factors
The correlation test results of CPUE and the spatiotemporal environmental factors are shown in Table 1. Among the spatiotemporal environmental factors at the 0.5° × 0.5° resolution, month, longitude, latitude, chlorophyll-a concentration in the sea surface, sea surface height, dissolved oxygen concentration of the 50 – 300 m water layer, salinity of the 0 – 50 m water layer, salinity of the 200 – 300 m water layer and temperature of the 0 – 300 m water layer were relatively highly correlated with CPUE. At the 1° × 1° resolution, month, longitude, latitude, chlorophyll-a concentration in the sea surface, sea surface height, dissolved oxygen concentration of the 0 – 300 m water layer, salinity of the 0 – 50 m water layer, salinity at 200 m, salinity at 300 m, temperature of the 0 – 100 m water layer and temperature of the 200 – 300 m water layer were relatively highly correlated with CPUE. At the 2° × 2° resolution, the factors highly correlated with CPUE were month, longitude, latitude, chlorophyll-a concentration in the sea surface, sea surface height, dissolved oxygen concentration of the 0 – 100 m water layer, dissolved oxygen concentration of the 200 – 300 m water layer, sea surface salinity, salinity of the 100 – 300 m water layer, and temperature of the 0 – 300 m water layer. At the 5° × 5° resolution, the factors highly correlated with CPUE were month, longitude, sea surface height, dissolved oxygen concentration of the 0 – 300 m water layer, sea surface salinity, salinity of the 100 – 300 m water layer and temperature of the 100 – 200 m water layer.
Table 1 Pearson correlation test results (P value) for the relation between CPUE and spatiotemporal environmental factors in four spatial resolutions
3.1.3 Results of multicollinearity analysis
A multicollinearity diagnosis was made for the environmental factors that were highly related to the CPUE, as described in 3.1.2. The variables were eliminated step by step until the VIF values were all below 10. The results of the multicollinearity analysis are shown in Table 2. When the spatial resolution was 0.5° × 0.5°, the factors included in the model were month, longitude, chlorophyll-a concentration in the sea surface, sea surface height, dissolved oxygen concentrations at 50, 100, 150 and 200 m, salinity at 200 m, and temperatures at 50, 100 and 150 m. When the spatial resolution was 1° × 1°, the factors included in the model were month, longitude, chlorophyll-a concentration in the sea surface, sea surface height, dissolved oxygen concentrations at 0, 50, 100 and 150 m, salinity at 200 m, and temperatures at 0, 50, 100, 200 and 250 m. When the spatial resolution was 2° × 2°, the factors included in the model were month, longitude, chlorophyll-a concentration in the sea surface, sea surface height, dissolved oxygen concentrations at 50 m and 100 m, salinity at 50, 150, 250 and 300 m, and temperatures at 100, 150, 200, 250, 300 and 350 m. When the spatial resolution was 5° × 5°, the factors included in the model were month, longitude, latitude, sea surface height, dissolved oxygen concentrations at 0, 50, 100, 150, 200 and 300 m, salinities at 100, 150, 200, 250 and 300 m, and temperatures at 100, 150 and 200 m.
Table 2 Multicollinearity diagnosis results (VIF value) for the environmental factors in four spatial resolutions
3.2.1 Model prediction error statistics
The statistical results of the LSTM prediction errors for the CPUE of albacore tuna at the four spatial resolutions are shown in Table 3. The resolution with the best prediction accuracy was 1° × 1°, with an MAE and RMSE of 0.0268 and 0.0452, respectively, followed in order of decreasing accuracy by the resolutions of 0.5° × 0.5°, 2° × 2°, and 5° × 5° (Table 3). The distribution maps of the CPUE of albacore tuna at 1° × 1° resolution in the waters near the Cook Islands are shown in Fig.5.
Table 3 Statistical results of prediction errors of the model for different resolutions
Fig.5 Distribution maps of CPUE of albacore tuna at 1° × 1° resolution in the waters near the Cook Islands.
3.2.2 Results of model loss value
The numbers of iterations and the loss values of the training data sets and verification data sets at the four spatial resolutions are shown in Fig.6. None of the data sets showed evident underfitting or overfitting during model training (Fig.6). The 1° × 1° model had the best fitting effect and the lowest model complexity, with faster convergence and less jitter. The fitting effect of the 0.5° × 0.5° model ranked second, followed by that of the 2° × 2° model, which was moderate and stable. The 5° × 5° model fitted comparatively poorly when the number of iterations was low, with a marked difference in the loss function between the training data set and the verification data set; its fit improved as the number of iterations increased.
Our results demonstrate that LSTM suits the actual situation in which marine environmental factors affect the fishing ground, and therefore the CPUE of albacore tuna can be predicted based on LSTM. LSTM can largely overcome the vanishing gradient problem commonly seen in RNN, and its performance in tasks with long-distance dependence is significantly more robust than that of RNN (Gers and Schraudolph, 2002). It is much less affected by vanishing gradients during gradient backpropagation, and can therefore model data with short-term or long-term dependence. LSTM provides a more detailed internal processing unit to store and update historical information effectively. Because the CPUE of albacore tuna is affected by many factors, such as spatiotemporal factors and the marine environment, the historical chlorophyll-a concentration, sea surface temperature and other variables are likely to affect the CPUE of albacore tuna in the future.
Given the actual practice of longline fishing, it is more practicable to estimate CPUE from the data recorded by VMS. When fishery data are used, spatial components should be considered carefully, since the spatial distribution of fishing vessels may change with time. CPUE is therefore not randomly distributed, and a rough calculation may bias its interpretation (Rose and Kulka, 1999). Compared with manually recorded observer data and logbook data, VMS data have higher temporal and spatial resolution. The state and trajectory of fishing vessels can be estimated from VMS data, so that fishing effort is defined correctly (Walker and Bez, 2010; Joo et al., 2013). The detailed data recorded by VMS allow the number of hooks and the catch to be matched to each grid cell, and can therefore be used to calculate CPUE on a sound basis. VMS records specific information such as fish species, exact time of capture, capture location (longitude and latitude), the name of the fishing vessel that caught the fish, and the total number of hooks that the vessel deployed in one set. With VMS data gridded at different spatial resolutions, an accurate CPUE could improve tuna resource assessment and management.
Data for traditional CPUE calculation are usually derived from logbooks, which record only the starting and ending positions of each deployment and the total number of fish caught each day. In the CPUE calculation, the starting or ending position of each deployment is taken as the operation position. In reality, the hooks and the catch are not concentrated at the starting or ending position of the deployment but are distributed along a trajectory of about 60 nautical miles. Logbook positions therefore cannot truly reflect the actual distribution of hooks and catches. Scientific estimation of CPUE for longline fishing is conducive to improving the accuracy of fishing ground prediction (Dettloff, 2021). It is therefore more reasonable to estimate CPUE from VMS data in combination with the actual operational information of tuna longline fishing.
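The matching of hooks and catches to grid cells can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in this study: each record is treated as a single position carrying its own hook and catch counts, CPUE is assumed to be expressed as fish per 1000 hooks, and the three records are hypothetical.

```python
import math
from collections import defaultdict

def grid_cell(lon, lat, res):
    """South-west corner of the res x res degree cell containing (lon, lat)."""
    return (math.floor(lon / res) * res, math.floor(lat / res) * res)

def nominal_cpue(records, res):
    """Aggregate (lon, lat, hooks, fish) records per cell.

    Returns {cell: fish per 1000 hooks}; the per-1000-hooks unit is an assumption.
    """
    hooks = defaultdict(float)
    catch = defaultdict(float)
    for lon, lat, n_hooks, n_fish in records:
        cell = grid_cell(lon, lat, res)
        hooks[cell] += n_hooks
        catch[cell] += n_fish
    return {cell: 1000.0 * catch[cell] / hooks[cell]
            for cell in hooks if hooks[cell] > 0}

# Hypothetical records: (longitude, latitude, hooks, fish).
# West longitudes and south latitudes are negative.
records = [
    (-160.2, -10.3, 500, 10),
    (-160.7, -10.8, 500, 20),
    (-161.4, -10.1, 1000, 15),
]

cpue_1deg = nominal_cpue(records, 1.0)   # first two records share one cell
cpue_half = nominal_cpue(records, 0.5)   # every record falls in its own cell
```

The same records give different cell-level CPUE values at 1° and 0.5°, which is why the choice of grid size propagates into the model input.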
Fig.6 Number of iterations and loss values of the model at different resolutions. (A), 0.5° × 0.5°; (B), 1° × 1°; (C), 2° × 2°; (D), 5° × 5°.
To improve the prediction accuracy of the model, correlation analysis and multicollinearity diagnosis should be carried out before the model is established. Correlation analysis describes the correlation between CPUE and the various spatiotemporal and environmental factors. Marine environmental factors play an essential role in the growth, reproduction, and distribution of tuna; they influence and restrict each other and are correlated (He et al., 2010). In essence, the prediction of CPUE by LSTM is a neural network regression problem. When the individual influence of explanatory variables on a continuous response is tested, multicollinearity among confounded explanatory variables hinders the analysis and thus its statistical and reasonable interpretation (Graham, 2003). Judging multicollinearity by the variance inflation factor (VIF) is a standard method (Trustrum and Fox, 1993; Abeysiriwardana and Gomes, 2022). Owing to the complexity of the marine environment, interactions between ecological factors may cause multicollinearity among variables, resulting in errors in prediction results (Song et al., 2022a, 2022b). In this paper, the Pearson correlation coefficient was used to analyze the correlation between the spatiotemporal and environmental factors and CPUE. Environmental factors with weak influence were eliminated, and the VIF test was used to eliminate confounding factors in the marine environmental data and multicollinearity among factors, so as to improve the prediction accuracy of the model.
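The two screening steps can be illustrated with a short, dependency-free sketch. It computes the Pearson correlation of each factor with CPUE and the VIF of each factor, taking the VIF as the corresponding diagonal element of the inverse correlation matrix. All values are hypothetical, and the cut-off of 10 is an assumed rule of thumb rather than a threshold taken from this study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def invert(m):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[i], columns[j]) for j in names] for i in names]
    inv = invert(corr)
    return {name: inv[k][k] for k, name in enumerate(names)}

# Hypothetical values in six grid cells (not data from this study).
predictors = {
    "sst": [28.1, 28.4, 27.9, 29.0, 28.6, 27.5],
    "t100": [26.0, 26.35, 25.8, 26.9, 26.55, 25.4],  # nearly sst minus 2.1
    "sal": [35.1, 35.4, 35.0, 35.2, 35.5, 35.3],
}
cpue = [12.0, 15.5, 11.0, 18.0, 16.0, 9.5]

r_with_cpue = {name: pearson(col, cpue) for name, col in predictors.items()}
vifs = vif(predictors)
VIF_LIMIT = 10.0  # assumed cut-off
collinear = [name for name, v in vifs.items() if v > VIF_LIMIT]
```

In this toy example the two temperature series are almost linearly dependent, so both are flagged; in practice one of a flagged pair would be dropped and the VIF recomputed.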
Environmental data and fishery data with a spatial resolution of 1° × 1° better reflect the spatiotemporal dynamics of the marine environment of the fishing ground, and therefore contribute to the accuracy of fishing ground prediction. Okamoto (2014) compared the effects of data resolutions of 1° × 1° and 5° × 5° on the CPUE trend of bigeye tuna in longline fisheries. The results showed that the normalized residual distribution and normal distribution of the two spatial resolutions were not obviously different, but the result at 1° × 1° was closer to that of non-aggregated operational data. In related research on tuna longline fishing, some scholars have selected a spatial resolution of 1° × 1° (Oshima et al., 2012; Zhao et al., 2016). Zhang et al. (2021) studied how improved extraction of marine environment features affects the fishery forecast model for bigeye tuna in the Indian Ocean and concluded that a spatial resolution of 1° × 1° better reflects the spatiotemporal dynamic characteristics. Based on LSTM, the best spatial resolution for CPUE prediction of albacore tuna is 1° × 1°, followed by 0.5° × 0.5°, 2° × 2°, and 5° × 5°.
The higher prediction accuracy of CPUE at 1° × 1° might have three reasons. 1) The operation characteristics of tuna longline fishing for albacore tuna. Fishing vessels operating in waters near the Cook Islands generally deployed their gear in the east-west direction over a range of about 60 nautical miles, a longitude span of about 1°, so the hooks were generally deployed within a 1° × 1° grid cell. 2) The characteristics of the environmental data. If the spatial resolution is too high or too low, the slight change between adjacent environmental values makes the information of environmental factors tend to overlap, resulting in a large amount of redundant data. When the resolution was 0.5° × 0.5°, the numerical values changed only slightly; when the resolution was 2° × 2°, the range of numerical change increased significantly. At 5° × 5°, the marine environment data could not reflect the catch rate well because of the relatively low resolution. At 1° × 1°, the environmental data were of high quality and better reflected changes in the catch rate. 3) The influence of data sample size. The study area covered 7°24′ – 17°36′S, 156° – 168°W. It contained 20 × 24 = 480 grid cells at 0.5° × 0.5°, 10 × 12 = 120 at 1° × 1°, 5 × 6 = 30 at 2° × 2°, and about 5 (2 × 2.4) at 5° × 5°. The number of grid cells corresponds to the data sample size, which was largest at 0.5° × 0.5°, followed by 1° × 1°, 2° × 2°, and 5° × 5°. Because of the low resolution and markedly reduced sample size at 5° × 5°, its prediction accuracy was relatively poor.
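The grid counts quoted above follow from dividing the extent of the study area by the cell size. The short check below assumes the nominal 10° × 12° extent implied by those counts; the exact latitude span of the stated coordinates is 10.2°, which would give slightly larger numbers. The function name `grid_count` is introduced here for illustration only.

```python
def grid_count(lat_span_deg, lon_span_deg, res_deg):
    """Number of res_deg x res_deg cells covering the extent (may be fractional)."""
    return (lat_span_deg / res_deg) * (lon_span_deg / res_deg)

# Nominal extent assumed from the counts in the text (10 deg latitude, 12 deg longitude).
LAT_SPAN, LON_SPAN = 10.0, 12.0

counts = {res: grid_count(LAT_SPAN, LON_SPAN, res) for res in (0.5, 1.0, 2.0, 5.0)}
for res, n in counts.items():
    print(f"{res} deg cells: {n:g}")
```

Each doubling of the cell size divides the number of cells, and hence the number of spatial samples per day, by four.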
In the study of pelagic, highly migratory species, population behavior is an important feature to consider, and it occurs in both space and time. It is therefore important to use an appropriate resolution in the model (Rybicki et al., 2021). Different spatial resolutions have different effects on the research results for longline fishing grounds (Nishida and Kitakado, 2011). Matear et al. (2015) studied the effects of climate change in the western tropical Pacific Ocean on tuna fishing grounds using a high-resolution ocean model, and concluded that enhanced resolution affects the predicted phytoplankton concentration and primary productivity, which indicates that the resolution can reflect changes in environmental factors. Meanwhile, for most physical environmental variables, enhanced resolution and bias correction had little impact on prediction accuracy, which indicates that spatial resolution should be combined with fishery characteristics to achieve good prediction results. Guan et al. (2015) compared habitat models and prediction results under various data sources, and concluded that there are usually systematic deviations between data of different spatial resolutions, resulting in significant differences in the estimated model parameters. Tunas are highly migratory species, and longline fishing for tunas operates over a large range and a long duration (Brown et al., 2021), so it is necessary to select an appropriate spatial resolution in combination with the operation characteristics. Therefore, based on the results of this paper, it is suggested that environmental data and fishery data with a spatial resolution of 1° × 1° be used to improve the accuracy of CPUE studies or stock assessments of albacore tuna in waters near the Cook Islands.
This study considered various factors that affect the CPUE of albacore tuna in different water layers. Some factors, such as planktonic organisms and bait, also affect CPUE; however, relevant data were not available for this study. They will be used as variables in subsequent studies to improve the model accuracy.
This study used the detailed fishery information recorded by VMS to match the number of hooks and the catch to each grid cell, which resulted in a more accurate CPUE calculation. The results also show that the albacore tuna fishing ground can be predicted by LSTM and that, to improve the prediction accuracy of the model, correlation analysis and multicollinearity diagnosis are necessary before model establishment. Since tuna longline fishing is unique compared with other fishing gear, the results based on LSTM suggest that a spatial resolution of 1° × 1° should be used in relevant research, such as the prediction of fishing grounds of target species, e.g., albacore tuna, to make the results more accurate.
Acknowledgements
We thank Liancheng Overseas Fishery (Shenzhen) Co., Ltd., for providing VMS data. This research was supported by the National Natural Science Foundation of China (No. 32273185), the National Key R&D Program of China (No. 2020YFD0901205), and the Marine Fishery Resources Investigation and Exploration Program of the Ministry of Agriculture and Rural Affairs of China in 2021 (No. D-8006-21-0215). Gratitude also goes to Dr. Huihui Shen, School of Foreign Languages, Shanghai Ocean University, for improving the manuscript.
References
Abeysiriwardana, H. D., and Gomes, P., 2022. Integrating vegetation indices and geo-environmental factors in GIS-based landslide-susceptibility mapping: Using logistic regression., 19 (2): 16, DOI: 10.1007/s11629-021-6988-8.
Alghazzawi, D., Bamasag, O., Albeshri, A., Sana, I., Ullah, H., and Asghar, M. Z., 2022. Efficient prediction of court judgements using an LSTM+CNN Neural Network Model with an optimal feature set., 10: 683, DOI: 10.3390/math10050683.
Beverly, S., Chapman, L., and Sokimi, W., 2003.. Secretariat of the Pacific Community, Nouméa, 130pp.
Bez, N., Walker, E., Gaertner, D., Rivoirard, J., and Gaspar, P., 2011. Fishing activity of tuna purse seiners estimated from vessel monitoring system (VMS) data., 68 (11): 1998-2010, DOI: 10.1139/f2011-114.
Brown, C. J., Desbiens, A., Campbell, M. D., Game, E. T., Gilman, E., Hamilton, R. J., et al., 2021. Electronic monitoring for improved accountability in western Pacific tuna longline fisheries., 132 (1-3): 104664, DOI: 10.1016/j.marpol.2021.104664.
Chen, J. T., Dai, X. J., and Gu, B., 2005. Analysis of the development of South Pacific albacore in China., 2: 49-50, 55, DOI: 10.3969/j.issn.0253-4193.2013.01.018 (in Chinese with English abstract).
Chen, X. Z., Fan, W., Cui, X. S., Zhou, W. F., and Tang, F. H., 2013. Fishing ground forecasting in Indian Ocean based on random forest., 35 (1): 158-164, DOI: 10.3969/j.issn.0253-4193.2013.01.018 (in Chinese with English abstract).
Dettloff, K., 2021. Improvements to the Stephens-MacCall approach for calculating CPUE from multispecies fisheries logbook data., 242: 106038, DOI: 10.1016/j.fishres.2021.106038.
Domokos, R., Seki, M. P., Polovina, J. J., and Hawn, D. R., 2007. Oceanographic investigation of the American Samoa albacore () habitat and longline fishing grounds., 16 (6): 555-572, DOI: 10.1111/j.1365-2419.2007.00451.
Eastwood, P. D., Meaden, G. J., Carpentier, A., and Rogers, S. I., 2003. Estimating limits to the spatial extent and suitability of sole () nursery grounds in the Dover Strait., 50: 151-165, DOI: 10.1016/S1385-1101(03)00079-0.
Fan, W., Zhang, J., and Zhou, W. F., 2007. The relationship between longline albacore and sea surface temperature in the South Pacific., 5: 366-371, DOI: 10.3969/j.issn.1000-9957.2007.05.010 (in Chinese with English abstract).
Fan, Y. C., Chen, X. J., and Wang, J. T., 2015. Forecasting central fishing of based on multi-factors habitat suitability index in the South Pacific., 2: 36-44, DOI: 10.13984/j.cnki.cn37-1141.2015.02.006 (in Chinese with English abstract).
Feng, Y. J., Chen, L. J., and Chen, X. J., 2019. The impact of spatial scale on local Moran’s I clustering of annual fishing effort for offshore Peru., 37 (1): 330-343, DOI: 10.1007/s00343-019-7316-9.
Gers, F., and Schraudolph, N. N., 2002. Learning precise timing with LSTM recurrent networks., 3 (1): 115-143, DOI: 10.1162/153244303768966139.
Gong, C. X., Chen, X. J., Gao, F., Guan, W. J., and Lei, L., 2011. Review on habitat suitability index in fishery science., 20 (2): 260-269, DOI: CNKI: SUN:SSDB.0.2011-02-017 (in Chinese with English abstract).
Graham, M. H., 2003. Confronting multicollinearity in ecological multiple regression., 84 (11): 2809-2815.
Guan, W. J., Gao, F., Lei, L., and Chen, X. J., 2015. Comparison of habitat models and prediction results under various data sources., 22 (1): 149-157 (in Chinese with English abstract).
Guo, G. G., Zhang, S. M., Fan, W., Chen, X. J., and Yang, S. L., 2016. Spatial analysis of vertical active layer of albacore tuna () in the South Pacific., 12 (5): 123-130, DOI: 10.3969/j.issn.2095-0780.2016.05.016.
He, R. Y., Chen, K., Moore, T., and Li, M. K., 2010. Mesoscale variations of sea surface temperature and ocean color patterns at the Mid-Atlantic Bight shelfbreak., 37 (9): 493-533, DOI: 10.1029/2010GL042658.
He, X. Q., and Liu, W. Q., 2007.. Renmin University of China Press, Beijing, 171-184 (in Chinese).
Hilborn, R., and Walters, C., 1992. Quantitative fisheries stock assessment: Choice., 7: 241-296, DOI: 10.1086/417864.
Hinton, M. G., and Maunder, M. N., 2003. Methods for standardizing CPUE and how to select among them., 56: 169-177.
Jin, R. L., Sun, K. P., He, H. S., and Zhou, Y. F., 2008. Research advances in habitat suitability index model., 5: 841-846 (in Chinese with English abstract).
Joo, R., Bertrand, S., Tam, J., and Fablet, R., 2013. Hidden Markov models: The best models for forager movements?., 8 (8): e71246, DOI: 10.1371/journal.pone.0071246.
Kelleher, K., 2005. Discards in the world’s marine fisheries: An update.. No.470. Rome, FAO, 131pp.
Mangel, M., Quinn, T. J., and Deriso, R. B., 1999. Quantitative fish dynamics., 2 (1): 286-287, DOI: 10.2307/177155.
Matear, R. J., Chamberlain, M. A., Sun, C., and Feng, M., 2015. Climate change projection for the western tropical Pacific Ocean using a high-resolution ocean model: Implications for tuna fisheries., 113: 22-46, DOI: 10.1016/j.dsr2.2014.07.003.
Maunder, M. N., and Punt, A. E., 2004. Standardizing catch and effort data: A review of recent approaches., 70: 141-193, DOI: 10.1016/j.fishres.2004.08.002.
Miao, Z. Q., and Huang, X. C., 2003.. Shanghai Science and Technology Literature Press, Shanghai, 1-13 (in Chinese).
Mills, C. M., Townsend, S. E., Jennings, S., Eastwood, P. D., and Houghton, C. A., 2007. Estimating high resolution trawl fishing effort from satellite-based vessel monitoring system data., 64 (2): 248-255, DOI: 10.1093/icesjms/fsl026.
Murawski, S. A., Wigley, S. E., Fogarty, M. J., Rago, P. J., and Mountain, D. G., 2005. Effort distribution and catch patterns adjacent to temperate MPAs., 6: 1150-1167, DOI: 10.1016/j.icesjms.2005.04.005.
Nishida, T., and Kitakado, T., 2011. Investigation of the sharp drop of swordfish CPUE of Japanese tuna longline fisheries in 1990’s in the SW Indian Ocean., Victoria, Seychelles, 1-14.
Okamoto, H., 2014. CPUE of bigeye and yellowfin tuna caught by Japanese longliner in the Indian Ocean standardized by GLM considering several aspects of area, catchability and data resolution.. Bali, Indonesia, 5-18.
Oshima, K., Mizuno, A., Ichinokawa, M., Takeuchi, Y., Nakano, H., and Uozumi, Y., 2012. Shift of fishing efforts for Pacific bluefin tuna and target shift occurred in Japanese coastal longliners in recent years. Working document submitted to the ISC Pacific Bluefin Tuna Working Group. Honolulu, Hawaii.
Quan, B., Yang, B. C., Hu, K. Q., Guo, C. X., and Li, X. C., 2018. Prediction model of ship trajectory based on LSTM., 45(S2): 126-131 (in Chinese with English abstract).
Rose, G. A., and Kulka, D. W., 1999. Hyperaggregation of fish and fisheries: How catch-per-unit-effort increased as the northern cod () declined., 56 (S1): 118-127, DOI: 10.1139/f99-207.
Rybicki, S., Hamon, K. G., Simons, S., and Temming, A., 2021. The more the merrier? Testing spatial resolution to simulate area closure effects on the pelagic North Sea autumn spawning herring stock and fishery., 48: 102023.
Shapiro, S. S., and Wilk, M. B., 1965. An analysis of variance test for normality (complete samples)., 52 (3-4): 591-611, DOI: 10.1093/biomet/52.3-4.591.
Shook, J., Gangopadhyay, T., Wu, L., Ganapathysubramanian, B., Sarkar, S., and Singh, A. K., 2021. Crop yield prediction integrating genotype and weather variables using deep learning., 16 (6): e0252402, DOI: 10.1371/journal.pone.0252402.
Song, L. M., and Xu, H., 2021. A review of tuna longline catch performance., 28 (7): 925-937, DOI: 10.12264/JFSC2020-6002 (in Chinese with English abstract).
Song, L. M., Ren, S. Y., Hong, Y. R., Zhang, T. J., Sui, H. S., Li, B., et al., 2022a. Comparison on fishing ground forecast models of in the tropical waters of Atlantic Ocean., 53 (2): 496-504, DOI: 10.11693/hyhz20211000253 (in Chinese with English abstract).
Song, L. M., Ren, S. Y., Zhang, M., and Sui, H. S., 2021a. Fishing ground forecasting models for yellowfin tuna () in the tropical waters of the Atlantic Ocean based on ensemble learning., 28 (8): 1069-1078 (in Chinese with English abstract).
Song, L. M., Ren, S. Y., Zhang, M., and Sui, H. S., 2022b. Fishing ground forecasting of bigeye tuna () in the tropical waters of Atlantic Ocean based on ensemble learning., 47 (4): 64-76, DOI: 10.11964/jfc.20210312692 (in Chinese with English abstract).
Song, L. M., Xu, H., Sui, H. S., and Zhang, M., 2021b. Research progress on key technologies of marine fishery acoustic equipment., 48 (03): 18-27, 35, DOI: 10.3969/j.issn.1007-9580.2021.03.003 (in Chinese with English abstract).
Thomson, J. D., Weiblen, G., Thomson, B. A., and Alfaro, S., 1996. Untangling multiple factors in spatial distributions: Lilies, gophers and rocks., 77: 1698-1715, DOI: 10.2307/2265776.
Trustrum, K., and Fox, J., 1993. Regression diagnostics: An introduction., 42 (2): 201, DOI: 10.2307/2348998.
Turner, M. G., O’Neill, R. V., Gardner, R. H., and Milne, B. T., 1989. Effects of changing spatial scale on the analysis of landscape pattern., 3 (3-4): 153-162, DOI: 10.1007/BF00131534.
Walker, E., and Bez, N., 2010. A pioneer validation of a state-space model of vessel trajectories (VMS) with observers’ data., 221: 2008-2017, DOI: 10.1016/j.ecolmodel.2010.05.007.
Watson, J. T., Haynie, A. C., Sullivan, P. J., Perruso, L., O’Farrell, S., Sanchirico, J. M., et al., 2018. Vessel monitoring systems (VMS) reveal an increase in fishing efficiency following regulatory changes in a demersal longline fishery., 207: 85-94, DOI: 10.1016/j.fishres.2018.06.006.
Wiens, J. A., 1989. Spatial scaling in ecology., 3 (4): 385-397, DOI: 10.2307/2389612.
Yuan, H. C., Zhang, Y., and Zhang, T. J., 2021. Research on forecast model of Pacific fishing ground based on EMD-BiLSTM., 48 (1): 87-96, DOI: 10.3969/j.issn.1007-9580.2021.01.012 (in Chinese with English abstract).
Zainuddin, M., Saiton, K., and Saiton, S., 2008. Albacore () fishing ground in relation to oceanographic conditions in the western North Pacific Ocean using remotely sensed satellite data., 17 (2): 61-73, DOI: 10.1016/j.dsr2.2006.01.007.
Zhang, T. J., Song, L. M., Yuan, H. C., and Narcisse, E. B., 2019. A comparative study on CPUE standardization of bigeye tuna in the Indian Ocean using multi-scale fisheries data and environment data.. Donostia-San Sebastian, Spain, 15: 1-31.
Zhang, T. J., Liao, Z. Z., Song, B., Yuan, H. C., Song, L. M., and Zhang, S. S., 2021. Improvement of marine environment feature extraction based on deep convolution embedded clustering (DCEC) for fishery forecast model – A case study of bigeye tuna (s) in the Southwest Indian Ocean., 43 (8): 105-117, DOI: 10.12284/hyxb2021072 (in Chinese with English abstract).
Zhao, H. L., Chen, X. J., and Fang, X. Y., 2016. Forecasting fishing ground of yellowfin tuna in the eastern Pacific Ocean based on the habitat suitability index., 36 (3): 778-785, DOI: 10.5846/stxb201405130975 (in Chinese with English abstract).
(Received August 25, 2022; revised October 12, 2022; accepted February 27, 2023)
© Ocean University of China, Science Press and Springer-Verlag GmbH Germany 2023
* Corresponding author. E-mail: lmsong@shou.edu.cn
(Edited by Qiu Yantao)
Journal of Ocean University of China, 2023, Issue 5