Carbon Price Forecasting Approach Based on Multi-Scale Decomposition and Transfer Learning

2023-05-13 09:24:54

Journal of Beijing Institute of Technology, 2023, Issue 2

Xiaolong Zhang, Yadong Dou, Jianbo Mao, Wensheng Liu, Hao Han

Abstract: Accurate carbon price forecasting is essential for guiding production and investment. Current research depends mainly on large numbers of historical carbon price samples, which is impractical for newly launched carbon markets with short histories. Based on the idea of transfer learning, this paper proposes a novel price forecasting model that exploits the correlation between new and mature markets. The model is first pretrained on large datasets from mature markets with the gated recurrent unit algorithm and then fine-tuned with samples from the target market. An integrated framework is provided, including a complexity decomposition method for data preprocessing, sample entropy for feature selection, and support vector regression for result post-processing. In an empirical analysis of the new Chinese national market, the root mean square error, mean absolute error, mean absolute percentage error, and coefficient of determination of the model are 0.529, 0.476, 0.717%, and 0.501, respectively, demonstrating its validity.

Keywords: carbon emission trading; price forecasting; transfer learning; gated recurrent unit

1 Introduction

Since 1880, the earth's temperature has risen by 0.08℃ per decade [1]. The warming climate, mainly caused by industrial production and excessive carbon dioxide emissions, brings new threats and challenges to human society. To address this problem, the European Union Emission Trading Scheme (EU ETS), the earliest and largest carbon trading market in the world, was launched in 2005. It provides an efficient mechanism for addressing climate change and offers new guidelines for industrial production.

In a carbon trading market, carbon prices reflect the real-time demand of different industries and enterprises for carbon emission rights. Accurate carbon price forecasting therefore helps enterprises capture market changes and make more reasonable decisions. Research on carbon price forecasting falls into three categories: parametric methods, statistical methods, and artificial intelligence methods. With the rapid development of computer technology, artificial intelligence methods, including machine learning and deep learning algorithms, have become a research hotspot and have demonstrated their superiority [2, 3].

Most existing artificial intelligence methods for carbon price forecasting rely on large amounts of historical data. In the literature, carbon price data covering at least 2 to 3 years are commonly used for model training and testing [2–4]. The EU ETS is the most studied trading market because it has operated for 17 years and the carbon price data needed for modeling are easy to collect. For newly launched carbon markets, however, carbon price samples are scarce because of their short operating history. Insufficient sample data lead to overfitting and instability, large prediction errors, and even difficulty in building complex multilayer models. The application of intelligent algorithms to carbon price forecasting with small samples is therefore seriously restricted.

Transfer learning can relieve bottlenecks such as data scarcity and knowledge scarcity by using knowledge learned from adequate samples in a related domain [5, 6]. It has significant advantages when samples in the target domain are scarce while samples in a related domain are plentiful [7, 8]. It can therefore be applied to carbon price forecasting in emerging markets with the help of historical data from mature markets. However, related research is still lacking, and several key technical details remain to be explored.

In this paper, a new carbon price forecasting algorithm based on the transfer learning framework is proposed. The algorithm adopts a model-based transfer strategy and uses fine-tuning to transfer information between forecasting models. First, a prototype model is established on the price dataset of the source domain (i.e., mature markets); then the model framework and part of the model parameters are transferred to the target domain (i.e., emerging markets). Finally, the price forecasting model for the target domain is obtained by adjusting the model with the training dataset of the target carbon market.

2 Literature Review

Scholars have conducted substantial research on carbon trading price forecasting, which can be divided into three categories: parametric models based on influencing factors, statistical models based on distribution characteristics, and intelligent models based on historical series. The development of these models reflects a continuing concern with forecasting accuracy and stability.

The parametric method aims to establish a structural model that includes various influencing factors in order to explore the internal driving forces of carbon price changes. Generally, factors such as coal prices, power prices, market activity, and administrative policy affect carbon prices [9]. Such methods try to anticipate the trend of carbon prices from the driving force of each factor. However, the parametric method requires forecasting multiple factors, and ignoring any decisive or important factor introduces forecasting uncertainty. Thus, this approach should be used with caution.

In the statistical method, the model is established from distribution characteristics; examples include Holt's exponential smoothing (Holt), the autoregressive integrated moving average model (ARIMA), the generalized autoregressive conditional heteroscedasticity model, the gray model, and the support vector regression (SVR) model. However, statistical models rest on several distributional assumptions and can hardly capture the actual fluctuation regularity of the time series. Therefore, intelligent prediction models have been widely used as computing technology has advanced.

Intelligent methods include models such as the multi-layer perceptron, artificial neural network, least squares support vector regression, back propagation neural network (BPNN) [10, 11], Bayesian network, extreme learning machine, long short-term memory (LSTM) [12, 13], and gated recurrent unit (GRU) [14, 15]. To obtain optimal hyperparameters that improve convergence and forecasting accuracy, these methods often use optimization algorithms such as the whale, gray wolf, bee colony, and particle swarm optimizers. However, traditional intelligent methods require a sufficient number of samples from the target domain to train the forecasting model and ensure its accuracy and reliability.

Since 2016, transfer learning has attracted significant academic attention [16] and has been applied in computer vision, speech recognition, natural language processing, and text classification [17–20], fields that typically suffer from insufficient samples. By exploiting the correlation among multiple domains, transfer learning transfers knowledge learned in a data-rich source domain to a data-poor target domain, which lowers the forecasting model's demand for a large target-domain dataset. However, transfer learning is not yet widespread in time series forecasting, because massive data (e.g., economic, weather, and stock data) are usually easy to obtain there.

Besides the forecasting model itself, scholars have also studied data preprocessing to increase prediction stability. Preprocessing decomposes the complicated price sequence into multiple patterns with apparent regularity and different trends, which better represent the characteristics and signals of carbon price changes. Commonly used decomposition methods are singular spectrum analysis, variational mode decomposition, wavelet transforms, the Hilbert-Huang transform, empirical mode decomposition (EMD), and ensemble empirical mode decomposition (EEMD). In this paper, complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is applied to preprocess the input data because of its robust and complete decomposition.

The studies discussed above are typical forecasting approaches. Some improved methods have greatly increased forecasting accuracy, either by using novel optimization algorithms or by adopting data preprocessing methods. However, the above literature on carbon price forecasting still has several shortcomings:

1) Current studies lack forecasting approaches for carbon prices with small samples;

2) Transfer learning is mainly applied in fields with sparse datasets, such as computer vision, speech recognition, and text classification, rather than to time series, which usually have abundant historical data;

3) Most current studies integrate the predicted component results by simple summation and lack appropriate data post-processing.

Therefore, to overcome these deficiencies, this paper proposes a transfer learning framework for carbon price forecasting with small samples. To ensure the validity of the forecasting input, the CEEMDAN decomposition method is introduced for data preprocessing, and sample entropy is used for feature selection. On this basis, transGRU, a novel gated recurrent unit algorithm based on transfer learning, is proposed for price forecasting. Finally, the SVR algorithm is used to integrate the individual predicted price features, which further improves the accuracy and stability of the forecast.

3 Methods

3.1 Data Decomposition

The effectiveness of data preprocessing techniques (i.e., complexity decomposition and feature extraction) has been demonstrated in price forecasting of financial assets [21, 22]. In this paper, CEEMDAN, an improved variant of EMD and CEEMD [23], is introduced to decompose and analyze the input data. CEEMDAN adaptively decomposes complex raw data into several intrinsic mode functions (IMFs) with distinct frequencies and amplitudes. Its detailed steps are as follows.

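1) Gaussian white noise $v_i(t)\sim N(0,1)$, $i=1,2,\ldots,N$, is added to the original signal $y(t)$, and the first IMF of CEEMDAN is calculated by averaging:
$$\mathrm{IMF}_1(t)=\frac{1}{N}\sum_{i=1}^{N}E_1\big(y(t)+\varepsilon v_i(t)\big)$$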

where $\varepsilon$ is the noise standard deviation, usually set as 0.02; $E_i(\cdot)$ is the operator that returns the $i$th mode component produced by EMD; $t$ is the time index; and $N$ is the number of noise realizations (ensemble trials).

2) The first residual component is obtained as
$$r_1(t)=y(t)-\mathrm{IMF}_1(t)$$

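3) Construct the new signal
$$r_1(t)+\varepsilon E_1\big(v_i(t)\big)$$
and decompose it by EMD. The second mode component is obtained as
$$\mathrm{IMF}_2(t)=\frac{1}{N}\sum_{i=1}^{N}E_1\Big(r_1(t)+\varepsilon E_1\big(v_i(t)\big)\Big)$$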

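4) The $k$th residual signal and the $(k+1)$th IMF are obtained following the process of Step 3:
$$r_k(t)=r_{k-1}(t)-\mathrm{IMF}_k(t)$$
$$\mathrm{IMF}_{k+1}(t)=\frac{1}{N}\sum_{i=1}^{N}E_1\Big(r_k(t)+\varepsilon E_k\big(v_i(t)\big)\Big)$$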

5) Repeat the above steps until the residual can no longer be decomposed (it no longer has at least two extrema). The original signal $y(t)$ is finally expressed as
$$y(t)=\sum_{k=1}^{K}\mathrm{IMF}_k(t)+r(t)$$

where $K$ is the number of IMFs obtained by CEEMDAN, and $r(t)$ is the final residual.
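The five steps above can be sketched in Python as follows. This is a simplified, self-contained illustration of the CEEMDAN recursion, not the EMD-signal implementation used in Section 4.2. Its assumptions are stated here and in the comments: the EMD inner loop uses a fixed number of sifting passes, both cubic-spline envelopes are pinned to the end points, the noise is scaled as eps × std(y), and the function names and defaults (trials, max_imf) are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def extrema(x):
    """Indices of strict interior local maxima and minima of x."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima


def decomposable(x):
    """Spline envelopes need at least two maxima and two minima."""
    maxima, minima = extrema(x)
    return len(maxima) >= 2 and len(minima) >= 2


def sift(x, n_sift=10):
    """Extract one EMD mode with a fixed number of sifting passes (simplified)."""
    t = np.arange(len(x))
    h = np.asarray(x, dtype=float).copy()
    for _ in range(n_sift):
        maxima, minima = extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        # Crude boundary handling (assumption): envelopes are pinned to the end points.
        up = np.r_[0, maxima, len(h) - 1]
        lo = np.r_[0, minima, len(h) - 1]
        mean_env = (CubicSpline(up, h[up])(t) + CubicSpline(lo, h[lo])(t)) / 2.0
        h = h - mean_env
    return h


def emd(x, max_imf):
    """Plain EMD: sift repeatedly until the residual is no longer decomposable."""
    imfs, r = [], np.asarray(x, dtype=float)
    while len(imfs) < max_imf and decomposable(r):
        imf = sift(r)
        imfs.append(imf)
        r = r - imf
    return imfs


def E(modes, k, n):
    """E_k(.): the k-th (1-based) EMD mode, or zeros if fewer than k modes exist."""
    return modes[k - 1] if len(modes) >= k else np.zeros(n)


def ceemdan(y, eps=0.02, trials=50, max_imf=6, seed=0):
    """Return (IMFs, residual) following Steps 1-5."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((trials, n))        # v_i(t) ~ N(0, 1)
    noise_modes = [emd(v, max_imf) for v in noise]  # E_k(v_i), precomputed once
    scale = eps * y.std()                           # assumption: eps relative to std(y)

    def first_mode(x):
        return E(emd(x, 1), 1, n)

    # Steps 1-2: IMF_1 = mean_i E_1(y + eps*v_i), r_1 = y - IMF_1
    imfs = [np.mean([first_mode(y + scale * v) for v in noise], axis=0)]
    r = y - imfs[0]
    # Steps 3-5: IMF_{k+1} = mean_i E_1(r_k + eps*E_k(v_i)), r_{k+1} = r_k - IMF_{k+1}
    while len(imfs) < max_imf and decomposable(r):
        k = len(imfs)
        imf = np.mean([first_mode(r + scale * E(m, k, n)) for m in noise_modes], axis=0)
        imfs.append(imf)
        r = r - imf
    return np.array(imfs), r


# Usage on a synthetic two-tone series with a weak trend (hypothetical data).
t = np.arange(300)
fast = np.sin(2 * np.pi * t / 11.3)
y = fast + 0.5 * np.sin(2 * np.pi * t / 67.0) + 0.005 * t
imfs, residual = ceemdan(y, trials=20)
```

Because each residual is defined as the previous residual minus the new IMF, the IMFs plus the final residual reproduce y(t) exactly, as in the reconstruction formula of Step 5.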

3.2 Feature Selection

Sample entropy is commonly used to measure the complexity of a time series [24]; generally, a higher sample entropy indicates a more complex sequence. In this paper, sample entropy is used to identify IMFs with similar complexity, i.e., close sample entropy values, so that they can be merged into sequence features.

Assume a time series $\{x(n)\}=x(1),x(2),\ldots,x(N)$ of length $N$ ($n=1,2,\ldots,N$) and template vectors $X_m(1),X_m(2),\ldots,X_m(N-m+1)$ of length $m$, where $X_m(i)=\{x(i),x(i+1),\ldots,x(i+m-1)\}$. The distance between vectors $X_m(i)$ and $X_m(j)$ is defined as
$$d[X_m(i),X_m(j)]=\max_{k=0,\ldots,m-1}\big|x(i+k)-x(j+k)\big|$$

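Let $b_m$ be the number of points $j$ ($1\le j\le N-m$, $j\ne i$) whose template vectors are near $X_m(i)$, i.e., satisfy $d[X_m(i),X_m(j)]\le r$, where $r$ is the distance threshold, and define $B_m(r)$ as
$$B_m^{i}(r)=\frac{b_m}{N-m-1},\qquad B_m(r)=\frac{1}{N-m}\sum_{i=1}^{N-m}B_m^{i}(r)$$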

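Increase the dimension to $m+1$, count the number $a_m$ of points $j$ near $X_{m+1}(i)$ satisfying $d[X_{m+1}(i),X_{m+1}(j)]\le r$, and define $A_m(r)$ as
$$A_m^{i}(r)=\frac{a_m}{N-m-1},\qquad A_m(r)=\frac{1}{N-m}\sum_{i=1}^{N-m}A_m^{i}(r)$$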

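$A_m(r)$ is the probability that two sequences match for $m+1$ points, and $B_m(r)$ is the probability that they match for $m$ points. Finally, sample entropy is defined as
$$\mathrm{SampEn}(m,r)=\lim_{N\to\infty}\left\{-\ln\frac{A_m(r)}{B_m(r)}\right\}$$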

If $N$ is finite, sample entropy can be estimated by
$$\mathrm{SampEn}(m,r,N)=-\ln\frac{A_m(r)}{B_m(r)}$$
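The finite-sample estimate above can be sketched with NumPy as follows. This is a minimal illustration, not the SampEn module used in Section 4.3. It counts each template pair once (i < j), which scales $A_m(r)$ and $B_m(r)$ by the same factor and so leaves their ratio unchanged. The default threshold r = 0.2 × std(x) and the default m = 2 are common choices assumed here, not values stated in the paper.

```python
import numpy as np


def sample_entropy(x, m=2, r=None):
    """SampEn(m, r, N) = -ln(A_m(r) / B_m(r)) with the Chebyshev distance.

    Self-matches are excluded. Returns inf when no (m+1)-point matches exist.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    if r is None:
        r = 0.2 * x.std()  # assumption: a common default threshold

    def matches(dim):
        # The first N - m templates are used for both dim = m and dim = m + 1,
        # so B_m(r) and A_m(r) are computed over the same index range.
        templates = np.array([x[i:i + dim] for i in range(N - m)])
        count = 0
        for i in range(len(templates) - 1):
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(d <= r))
        return count

    B = matches(m)
    A = matches(m + 1)
    return float(-np.log(A / B)) if A > 0 else float("inf")


# Usage: a regular sine should have lower sample entropy than white noise.
rng = np.random.default_rng(0)
t = np.arange(500)
print(sample_entropy(np.sin(2 * np.pi * t / 25)), sample_entropy(rng.standard_normal(500)))
```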

3.3 Carbon Price Prediction

1) First step: Train the prototype model on the source domain; it consists of a GRU hidden layer and a sigmoid output layer. The structure of the GRU module is shown in Fig.1, and the structure of the recurrent network is shown in Fig.2.

Fig.1 Diagram of GRU module

Fig.2 Structure of recurrent network

The input of the GRU module at time $t$ from the source domain is $A_t$; the update gate is $z_t$, the reset gate is $r_t$, the candidate state is $\tilde{h}_t$, and the output state is $h_t$. The weights $W_z$, $W_r$, and $W_c$ are the hidden-layer parameters to be trained, and $h_{t-1}$ is the output state of the hidden layer at the previous time step. The symbols $\sigma$, $\tanh$, $\cdot$, and $\odot$ denote the sigmoid function, the hyperbolic tangent function, matrix multiplication, and the point-wise product, respectively.

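The gate and state variables are calculated according to Eqs.(12)–(14). The output state of the current hidden layer at time $t$ and the predicted output are then calculated as
$$h_t=(1-z_t)\odot h_{t-1}+z_t\odot\tilde{h}_t$$
$$\hat{y}_t=\sigma\big(W_o\cdot h_{T_o}\big)$$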

where the weight $W_o$ is the output-layer parameter to be trained and $h_{T_o}$ is the output state at time $T$. The loss function is calculated from $\hat{y}_t$ and $y_t$ and minimized starting from randomly initialized model parameters. After sufficient iterations, the minimal loss and the optimal parameters are obtained.
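A single GRU forward pass can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' code. It uses the concatenated-input convention $z_t=\sigma(W_z\cdot[h_{t-1},A_t])$, $r_t=\sigma(W_r\cdot[h_{t-1},A_t])$, $\tilde{h}_t=\tanh(W_c\cdot[r_t\odot h_{t-1},A_t])$, which is one common form of the gate equations. It also adds bias vectors b_z, b_r, b_c, b_o (hypothetical names) and feeds one scaled price per step. Because the output layer is a sigmoid, prices are presumably normalized to (0, 1) before training; that scaling is also an assumption.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(a_t, h_prev, p):
    """One GRU step in the concatenated-input convention (assumed)."""
    hx = np.concatenate([h_prev, a_t])                        # [h_{t-1}, A_t]
    z = sigmoid(p["W_z"] @ hx + p["b_z"])                     # update gate z_t
    r = sigmoid(p["W_r"] @ hx + p["b_r"])                     # reset gate r_t
    h_cand = np.tanh(p["W_c"] @ np.concatenate([r * h_prev, a_t]) + p["b_c"])
    return (1.0 - z) * h_prev + z * h_cand                    # output state h_t


def gru_predict(sequence, p):
    """Run the GRU over a window and map the last state to a sigmoid output."""
    h = np.zeros(p["W_z"].shape[0])
    for a_t in sequence:
        h = gru_step(np.atleast_1d(a_t), h, p)
    return sigmoid(p["W_o"] @ h + p["b_o"]), h


# Usage with random parameters: 1 input per step, 32 hidden units, 20-step window.
rng = np.random.default_rng(0)
n_in, n_hidden = 1, 32
params = {}
for g in ("z", "r", "c"):
    params[f"W_{g}"] = rng.normal(0.0, 0.1, (n_hidden, n_hidden + n_in))
    params[f"b_{g}"] = np.zeros(n_hidden)
params["W_o"] = rng.normal(0.0, 0.1, (1, n_hidden))
params["b_o"] = np.zeros(1)
y_hat, h_T = gru_predict(rng.random(20), params)
```

Since $h_t$ is a convex combination of $h_{t-1}$ and a tanh output, the hidden state stays in (-1, 1) when it starts from zero.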

2) Second step: Train the target model on the target domain. The input of the GRU module at time $t$ from the target domain is $B_t$, the hidden-layer state is $h'_t$, and the predicted output is $\hat{y}'_t$; $W'_z$, $W'_r$, $W'_c$, and $W'_o$ are the weight parameters of the model. The GRU parameters are initialized with those learned by the prototype model above, and the parameters of the target model are then fine-tuned by minimizing the loss function between $\hat{y}'_t$ and the actual target-market price $z_t$.

The target-domain quantities are calculated by Eqs.(17)–(21), where $z'_t$ and $r'_t$ are the update and reset gates, respectively, and $h'_{T_o}$ is the output state at time $T$ in the target domain. The complete training process of the transGRU algorithm is described in Fig.3, where $W_h$ comprises the hidden-layer weights $W_z$, $W_r$, and $W_c$. The model framework and the optimal parameters $W^{*}$ are transferred from the source domain to the target domain.

Fig.3 Procedure of transGRU algorithm

The algorithm in Fig.3 is described in detail as follows.

a) Set the GRU structure of the prototype model according to Fig.1.

b) Initialize the model parameters. The weights are initialized with He (Kaiming) initialization, and the biases are set to zero.

c) Train the prototype model with carbon prices from the mature market; the training loop runs while (i ≤ epochs).

d) Re-initialize the model parameters and repeat step c) to train the model; take the final parameters as the initial values of the GRU hidden layer. Initialize the weights of the sigmoid output layer with He (Kaiming) initialization and set its initial bias to zero.

e) Fine-tune the model with carbon prices from the emerging market; the fine-tuning loop runs while (i ≤ epochs).
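The initialization and transfer in steps b) and d) can be sketched with NumPy as follows. This is a minimal illustration only: the training loops of steps c) and e) are omitted, the dictionary keys are hypothetical, and He initialization is taken as the normal variant N(0, 2/fan_in). Whether the paper uses the normal or the uniform variant is not stated, so that choice is an assumption. The GRU biases are copied together with $W_h$, which is also an assumption.

```python
import numpy as np

# Hypothetical key names for the GRU hidden-layer parameters W_h and their biases.
HIDDEN_KEYS = ("W_z", "W_r", "W_c", "b_z", "b_r", "b_c")


def he_normal(rng, fan_out, fan_in):
    """He (Kaiming) normal initialization: N(0, 2 / fan_in)."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))


def init_params(rng, n_in, n_hidden):
    """Steps a)-b): GRU layer plus sigmoid output layer, He weights, zero biases."""
    p = {}
    for g in ("z", "r", "c"):
        p[f"W_{g}"] = he_normal(rng, n_hidden, n_hidden + n_in)
        p[f"b_{g}"] = np.zeros(n_hidden)
    p["W_o"] = he_normal(rng, 1, n_hidden)
    p["b_o"] = np.zeros(1)
    return p


def transfer_init(source, rng):
    """Step d): copy the trained GRU parameters, re-initialize the output layer."""
    target = {k: source[k].copy() for k in HIDDEN_KEYS}
    n_hidden = source["W_z"].shape[0]
    target["W_o"] = he_normal(rng, 1, n_hidden)
    target["b_o"] = np.zeros(1)
    return target


rng = np.random.default_rng(0)
source = init_params(rng, n_in=1, n_hidden=32)  # step c) would train these on source data
target = transfer_init(source, rng)             # step e) would fine-tune these on target data
```

Copying arrays, rather than sharing them, keeps later fine-tuning on the target market from silently changing the prototype model.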

3.4 Data Integration

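The SVR model is well suited to capturing complex relationships among variables. Suppose $X_i$ is the input vector, $y_i$ is the output variable, $S(\cdot)$ is the SVR regression function, $\varepsilon$ is the maximum tolerated deviation between $S(X_i)$ and $y_i$, $l_\varepsilon$ is the loss function, $C$ is the penalty coefficient, and $w$, $b$ are the model parameters. SVR regression can then be formulated as the minimization problem
$$\min_{w,b}\ \frac{1}{2}\lVert w\rVert^{2}+C\sum_{i}l_\varepsilon\big(S(X_i)-y_i\big),\qquad S(X_i)=w^{\mathrm{T}}\phi(X_i)+b$$
$$l_\varepsilon(u)=\begin{cases}0, & |u|\le\varepsilon\\ |u|-\varepsilon, & |u|>\varepsilon\end{cases}$$
where $\phi(\cdot)$ is the feature map induced by the kernel.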

In this paper, the SVR model is applied to integrate the predicted prices $\hat{y}_i(t)$ of the different feature sequences. Let the input vector be $X=(\hat{y}_1(t),\hat{y}_2(t),\ldots,\hat{y}_i(t),\ldots)$ and the output be the actual price $y(t)$. Solving the minimization problem on the training dataset gives a mapping $S(\cdot)$ between the predicted results and the actual values. The predicted prices for the testing dataset are then obtained from the SVR function as
$$\hat{y}(t)=S\big(\hat{y}_1(t),\hat{y}_2(t),\ldots,\hat{y}_i(t),\ldots\big)$$
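The SVR integration step can be sketched with scikit-learn's SVR class as follows. The paper does not report its SVR settings, so the RBF kernel and the values C = 10 and epsilon = 0.01 are assumptions. The three "predicted components" are synthetic stand-ins for the transGRU outputs of Feature 0 to Feature 2, not data from the paper.

```python
import numpy as np
from sklearn.svm import SVR


def svr_ensemble(train_preds, y_train, test_preds, C=10.0, epsilon=0.01):
    """Fit S(.) from per-feature predictions to actual prices, then apply it."""
    model = SVR(kernel="rbf", C=C, epsilon=epsilon)  # hyperparameters are assumptions
    model.fit(train_preds, y_train)
    return model.predict(test_preds)


# Hypothetical data: three components, noisy "predictions" of each, actual = sum.
rng = np.random.default_rng(0)
t = np.arange(250)
components = np.column_stack([0.3 * np.sin(t / 2.0), np.sin(t / 15.0), 0.5 * np.cos(t / 60.0)])
actual = components.sum(axis=1)
preds = components + rng.normal(0.0, 0.05, components.shape)

n_train = int(0.8 * len(t))  # 80% training / 20% testing, as in Section 4.3
y_hat = svr_ensemble(preds[:n_train], actual[:n_train], preds[n_train:])
```

Unlike the simple summation criticized in Section 2, the learned mapping $S(\cdot)$ can reweight components whose individual forecasts are poor, such as the high-frequency Feature 0.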

3.5 Performance Evaluation

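To measure the performance of the carbon price forecasting models, this paper uses four frequently used evaluation criteria: root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and the coefficient of determination R2. These criteria can be calculated with the Python scikit-learn module, and the corresponding formulas are
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{t=1}^{n}\big(y_t-\hat{y}_t\big)^2}$$
$$\mathrm{MAE}=\frac{1}{n}\sum_{t=1}^{n}\big|y_t-\hat{y}_t\big|$$
$$\mathrm{MAPE}=\frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{y_t-\hat{y}_t}{y_t}\right|$$
$$R^2=1-\frac{\sum_{t=1}^{n}\big(y_t-\hat{y}_t\big)^2}{\sum_{t=1}^{n}\big(y_t-\bar{y}\big)^2}$$
where $n$ is the number of test samples, $y_t$ and $\hat{y}_t$ are the actual and predicted prices, and $\bar{y}$ is the mean of the actual prices.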
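The four criteria can be written directly in NumPy as follows. This is a small sketch equivalent to the formulas above. One caveat applies if scikit-learn is used instead: its mean_absolute_percentage_error returns a fraction rather than a percentage, so it must be multiplied by 100 to match values reported in percent, such as 0.717%.

```python
import numpy as np


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))


def mape(y, y_hat):
    """Mean absolute percentage error in percent (y must not contain zeros)."""
    return float(100.0 * np.mean(np.abs((y - y_hat) / y)))


def r2(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 5.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred), r2(y_true, y_pred))
```

For this toy example, only the last prediction is off by 1, giving RMSE 0.5, MAE 0.25, MAPE 6.25%, and R2 0.8.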

4 Empirical Study

4.1 Data Sources

The Chinese carbon trading markets are chosen for the empirical study, and the market information (from the Wind Database) is listed in Tab.1, where GDEA, HBEA, SZA, SHEA, CQEA, BEA, TJEA, and FJEA denote the regional pilot carbon trading markets of Guangdong, Hubei, Shenzhen, Shanghai, Chongqing, Beijing, Tianjin, and Fujian, respectively, and CEA denotes the Chinese national market. Because CEA, GDEA, and HBEA are the exchanges with relatively large transaction volume and transaction amount, together accounting for about 77% of the total Chinese market, these three markets are selected for study.

The national CEA market was launched on Jul. 16, 2021, and only 273 trading days of historical data are on record. In this paper, the CEA market is therefore taken as the target domain and the other two markets as source domains, to study the model's effectiveness in forecasting time series with few samples. The closing prices of the three carbon markets are shown in Fig.4. The daily prices are treated as a consecutive series, and the carbon price on a given day is assumed to be associated with the prices of the previous 20 trading days. The programming configurations for algorithm implementation and data visualization are listed in Appendix A.
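The 20-day lookback can be turned into supervised samples with a sliding window, sketched below with NumPy. The 80%/20% split follows Section 4.3. Splitting chronologically, with no shuffling, is our assumption, as are the function names. The price series here is synthetic and only matches the CEA record length of 273 trading days.

```python
import numpy as np


def make_windows(prices, lookback=20):
    """Each sample uses the previous `lookback` prices to predict the next one."""
    prices = np.asarray(prices, dtype=float)
    X = np.array([prices[i:i + lookback] for i in range(len(prices) - lookback)])
    y = prices[lookback:]
    return X, y


def chrono_split(X, y, train_frac=0.8):
    """Chronological train/test split (assumption: no shuffling)."""
    n_train = int(len(X) * train_frac)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]


# Hypothetical price path with the same length as the CEA record (273 days).
rng = np.random.default_rng(0)
prices = 50.0 + np.cumsum(rng.normal(0.0, 0.5, 273))
X, y = make_windows(prices, lookback=20)
X_tr, y_tr, X_te, y_te = chrono_split(X, y)
```

With 273 prices and a 20-day window, 253 samples remain, of which 202 are used for training and 51 for testing.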

Tab.1 Market information (until Aug. 29, 2022)

Fig.4 Carbon prices of three trading markets (until Aug. 29, 2022)

4.2 CEEMDAN Decomposition

Following the theory and steps of Section 3.1, this paper applies the “CEEMDAN” function from the EMD-signal 1.2.3 package with Python 3.7.6 to decompose the carbon price series of the selected markets. Fig.5 plots the decomposition of the raw CEA price data, with the six resulting IMFs shown from top to bottom. The horizontal axis is the index of the time series, and the vertical axis is the price value (unit: CNY) of each component. The daily closing prices are treated as consecutive, leaving out the intervals between trading days. The complexity and frequency decrease progressively from IMF 0 to IMF 5; the changing modes are more apparent than in the raw series, and the overall price tendency is also clearer.

Fig.5 Decomposition results of the CEA market: (a) raw data; (b) IMF 0; (c) IMF 1; (d) IMF 2; (e) IMF 3; (f) IMF 4; (g) IMF 5

4.3 Feature Selection and Price Prediction

Following Section 3.2, the sample entropy values of the above IMFs are calculated with the SampEn module in Python. As Fig.6 shows, the sample entropy values of IMF 0 and IMF 1 are far higher than those of the other IMFs, which means that the first two IMFs follow a volatile and complex pattern. In contrast, the last two IMFs (IMF 4 and IMF 5) have much smaller values, and their changes show little volatility and complexity. Literature [25] concluded that integrating IMFs with close sample entropy efficiently prevents overfitting and improves convergence. Accordingly, the six IMFs are integrated into three sequence features: IMF 0–IMF 1 as the high-frequency Feature 0, IMF 2–IMF 3 as the low-frequency Feature 1, and IMF 4–IMF 5 as the long-trend Feature 2, as shown in Fig.7. Compared with the original IMFs, the three feature sequences show relatively clear and regular changing patterns, which helps capture their fluctuation characteristics and train the forecasting model.
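The grouping of six IMFs into three features can be sketched as a pairwise sum of adjacent IMFs. Interpreting "integrating" as element-wise summation is our assumption. Because the pairs partition the IMFs, the features add up to the same signal as the IMFs, so no information is lost before forecasting. The group_imfs function name and the toy array are hypothetical.

```python
import numpy as np


def group_imfs(imfs, groups=((0, 1), (2, 3), (4, 5))):
    """Sum the IMFs in each group: Feature 0 = IMF 0 + IMF 1, and so on."""
    imfs = np.asarray(imfs, dtype=float)
    return np.array([imfs[list(g)].sum(axis=0) for g in groups])


# Toy example: 6 IMFs of length 4.
imfs_demo = np.arange(24, dtype=float).reshape(6, 4)
features = group_imfs(imfs_demo)
```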

Fig.6 Sample entropy of IMFs

Fig.7 Feature sequences: (a) raw data; (b) Feature 0; (c) Feature 1; (d) Feature 2

Several observations can be made from the feature sequences in Fig.7: 1) During the first 50 trading days after the national market was founded, the feature sequences (Feature 0 and Feature 1) fluctuated strongly, reflecting that the newly established market was unstable and risky. The same phenomenon appeared during the first 30 trading days of 2022 (points 100–130), suggesting that after 5 months of adaptation and transition, more and more companies participated in the national market, which aggravated market disturbance. 2) The carbon price reached 61.2 CNY at point 133. Thereafter, the feature sequences (Feature 0 and Feature 1) became flat and slow, indicating that the national carbon trading market turned stable with small volatility. 3) In the Feature 2 sequence, the national carbon price first dropped and then rose in a U-shape. The apparent decline may have been caused by the small number of participants and transactions at the start, but the subsequent uptrend shows that the national market gradually settled onto a steady track.

In this paper, each sequence feature is predicted separately by transGRU. The individual sequence forecasting results and the final SVR ensemble result, together with the training data, are shown in Fig.8. For the carbon prices of each market (CEA, GDEA, and HBEA), 80% of the data are used for training and 20% for testing. Because of the dropout mechanism, the forecasting results differ slightly between program runs; the results of one run are analyzed in Tab.2.

Fig.8 Forecasting results: (a) Feature 0 forecasting result; (b) Feature 1 forecasting result; (c) Feature 2 forecasting result; (d) ensemble forecasting results

Tab.2 Evaluation of forecasting results

As Tab.2 shows, the forecasting model gives an unsatisfactory result for Feature 0, with an RMSE of 0.079, MAE of 0.061, MAPE of 13.61%, and R2 of −0.101, probably because the Feature 0 sequence is high-frequency and changes sharply. In contrast, both the low-frequency Feature 1 sequence and the long-trend Feature 2 sequence are predicted very well. Although the Feature 2 prediction shows a slight lag, it is tolerable. The final ensemble result forecasts carbon prices well, with an RMSE of 0.529, MAE of 0.476, MAPE of 0.717%, and R2 of 0.501; the predictor takes a total of 126.11 s to complete the forecasting.

4.4 Comparative Analysis

4.4.1 Comparisons of Performance

The proposed model is mainly compared with four typical time series forecasting models: Holt, ARIMA, BPNN, and LSTM. The parameters of Holt's exponential smoothing are set as $n=0.3$, $t=0.4$, and $r=0.2$, where $n$ is the seasonal parameter, $t$ is the trend parameter, and $r$ is the fluctuation parameter. For ARIMA($p$, $d$, $q$), the autoregressive order is $p=2$, the differencing degree is $d=1$, and the moving-average order is $q=2$. The activation function, maximum iterations, hidden layers, and learning rate of BPNN are set to ReLU, 2000, 32, and 5e-5, respectively. The activation function, number of training epochs, and batch size of LSTM are set to tanh, 2000, and 256, respectively, and training stops when the loss function no longer decreases.

To examine the effect of data post-processing, the forecasting models are compared in two modes (Mode 1 and Mode 2). Mode 1 comprises data decomposition, feature selection, and price prediction, while Mode 2 adds component integration by SVR post-processing. The forecasting results and performance analysis are displayed in Fig.9 and Tab.3.

Fig.9 Forecasting results of different models

As Tab.3 shows, the transGRU model performs best among the compared forecasting methods under both Mode 1 and Mode 2. Compared with Holt, ARIMA, BPNN, and LSTM, transGRU improves RMSE by 27.13%–45.63%, MAE by 11.09%–36.77%, MAPE by 20.51%–44.37%, and R2 by 38.58%–54.29%. These results show that traditional techniques have limited forecasting capacity for carbon price data with few samples, and that the forecasting model based on transfer learning is more effective. In the comparison of component integration, the error evaluation indicators of all five methods under Mode 2 are always smaller than those of their Mode 1 counterparts, which shows that post-processing efficiently improves forecasting accuracy. Compared with Mode 1, Mode 2 also greatly improves the coefficient of determination between the raw and predicted data, by 12.25%–46.97%. SVR regression is thus an efficient data post-processing technique that effectively fits the subcomponents to the final result.

Tab.3 Performance analysis of different models

The iterations and running times in Tab.3 are obtained from the training processes of the three individual feature sequences. Although the error evaluation indicators of the statistical models (i.e., Holt-1, Holt-2, ARIMA-1, and ARIMA-2) are relatively large, their numbers of iterations (mean of 20 experiments) and running times are small, primarily because the statistical models have simple structures and few parameters to solve. In addition, under both Mode 1 and Mode 2, transGRU needs fewer iterations and less time than the other intelligent models, about 26% and 43% of those of BPNN and LSTM, respectively. There is no significant difference in iterations or running time between Mode 1 and Mode 2 for a given algorithm, mainly because post-processing adds little to the overall computational cost.

4.4.2 Comparisons of Sample Size

To illustrate the impact of sample size on accuracy, five training sample sizes are drawn from the training dataset of the target market: 40, 80, 120, 160, and 200. As Fig.10 shows, the accuracy of all methods gradually improves as the number of target samples increases, and the RMSE of transGRU is always lower than that of the other methods. The transfer-trained model with 40 target samples is more accurate than the normally trained models with 160 target samples, and the proposed model with 80 training samples reaches the accuracy of the traditional models with 200 samples.

Fig.10 RMSE of algorithms with various trained samples

4.4.3 Comparisons of Parameter Change

To further illustrate the effectiveness of the algorithms, the change ranges of the model parameters during training are observed. Because parameters must be comparable, only neural network models with the same number of hidden layers are compared. Let $R_d=|R_i-R_f|$ be the absolute difference between the initial ($R_i$) and final ($R_f$) parameters of a model during the iterations, and let $\Delta R=R_{do}-R_{dt}$ be the difference between the parameter change range of another model ($R_{do}$) and that of the proposed model ($R_{dt}$). The results are shown in Fig.11, where B1～T2, B2～T2, L1～T2, L2～T2, and T1～T2 denote the comparisons of BPNN-1, BPNN-2, LSTM-1, LSTM-2, and transGRU-1 with transGRU-2, respectively. For sample sizes of 40–200, the mean values of $\Delta R$ are all positive, which means $R_{dt}<R_{do}$, i.e., the parameters of the proposed model change less during training.
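The change-range measure can be sketched as follows. The paper does not state how $R_d=|R_i-R_f|$ is aggregated over the many entries of a weight set, so averaging the absolute change over all entries is our assumption, as are the function names. The toy parameter dictionaries are hypothetical.

```python
import numpy as np


def param_change(initial, final):
    """R_d = |R_i - R_f|, averaged over all parameter entries (assumed aggregation)."""
    diffs = np.concatenate([np.abs(final[k] - initial[k]).ravel() for k in initial])
    return float(diffs.mean())


def delta_r(other_init, other_final, prop_init, prop_final):
    """Delta R = R_do - R_dt; positive means the proposed model moved its parameters less."""
    return param_change(other_init, other_final) - param_change(prop_init, prop_final)


init = {"W": np.zeros((2, 2))}
other_final = {"W": np.full((2, 2), 0.5)}     # hypothetical comparison model
proposed_final = {"W": np.full((2, 2), 0.1)}  # hypothetical transGRU
print(delta_r(init, other_final, init, proposed_final))
```

Here the comparison model moves each weight by 0.5 and the proposed model by 0.1, so $\Delta R=0.4>0$, which is the pattern reported in Fig.11.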

Fig.11 Delta changes of parameters during training

5 Discussion

From Tab.3, the RMSE, MAE, MAPE, and R2 of the proposed algorithm are better than those of the other models: the RMSE is smaller by 0.19–0.47, the MAE by 0.05–0.31, and the MAPE by 0.18%–0.64%, while the R2 is larger by 0.21–0.34. This indicates that the proposed model generalizes well, has performance advantages in price forecasting, and can provide an effective tool for carbon emission and trading decisions. Besides, the proposed algorithm needs only 26%–43% of the iterations of the other intelligent models, indicating faster training. In the application to the Chinese carbon market, the forecasting accuracy of the proposed model with only 80 training samples is equivalent to that of the other models with 200 training samples (see Fig.10). To reach similar forecasting performance, transGRU thus requires far fewer historical price samples than the other models, which indicates that it reduces the sample size requirement. In conclusion, the transGRU algorithm based on the transfer learning framework offers higher prediction accuracy, faster training, and lower requirements on the number of target samples.

The advantages of the proposed model can be understood from two aspects: 1) Although the source and target markets have different sample distributions, they follow the same carbon trading mechanism, so the relationship between price changes and the trading environment is consistent. The source and target domains are therefore related, which makes transfer learning effective. 2) The proposed algorithm provides a more reasonable starting point for training the target model, obtained from source-domain data rather than random initialization, so it reaches the optimum with fewer iterations (see Tab.3) and smaller parameter adjustments (see Fig.11).

The Vapnik-Chervonenkis dimension (VCD) is a common criterion for evaluating model complexity, and in practice it is approximately equal to the number of free parameters of the model [26]. Suppose the sample size is $N$ and the VCD is $M$; then $N/M<20$ indicates a few-sample situation. Let the VCD of the proposed model be $V$, the number of model parameters be $P$, and the number of GRU-layer parameters be $L$; then $V\approx P>L=((20+32)\times32+32)\times3=5\,088$. If only $S=273$ samples from the emerging market were used to build the forecasting model, then $S/V<273/5\,088\approx0.054$, far less than 20, which means the sample size is seriously insufficient. In the proposed transfer learning model, however, only 33 free parameters need initialization, so $S/V=273/33\approx8.3$, a large improvement over 0.054. The transGRU algorithm can therefore mitigate the overfitting and excessive errors caused by few samples.
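The parameter counts and ratios above can be checked with a few lines of Python. The formula ((n_in + n_hidden) × n_hidden + n_hidden) × 3 is the one used in the text. Reading the 33 free parameters as a 32-to-1 output layer (32 weights plus one bias) is our interpretation and is not stated in the paper.

```python
def gru_layer_params(n_in, n_hidden, n_gates=3):
    """((n_in + n_hidden) * n_hidden + n_hidden) * n_gates, as in the text."""
    return ((n_in + n_hidden) * n_hidden + n_hidden) * n_gates


S = 273                              # CEA trading days on record
L = gru_layer_params(20, 32)         # 20-day input, 32 hidden units
ratio_full = S / L                   # upper bound on S/V when V ≈ P > L
free_after_transfer = 32 + 1         # assumption: output-layer weights + bias
ratio_transfer = S / free_after_transfer
print(L, round(ratio_full, 3), round(ratio_transfer, 1))
```

Both ratios remain below the threshold of 20, so even with transfer the CEA task is still a few-sample problem by this criterion; transfer narrows the gap rather than closing it.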

6 Conclusions

In a newly launched carbon market, the number of price samples is so small that forecasting models easily overfit and produce excessive errors. This paper proposes transGRU, a novel gated recurrent unit algorithm based on the transfer learning framework, which includes a complexity decomposition method for data preprocessing, sample entropy for feature selection, and support vector regression for result post-processing. The proposed transGRU model provides an effective way to forecast carbon prices with few samples and can be extended to other time series fields. The major contributions and innovations are summarized as follows.

1) Theoretical contribution: a novel model based on transfer learning and fine-tuning is proposed to solve the forecasting problem with few samples. The forecasting framework, algorithm principles, and implementation steps are comprehensively designed, so that the model parameters can be trained on mature carbon markets with sufficient data and then slightly adjusted on the new carbon market to obtain a target model with limited data.

2) Methodological contribution: data decomposition and feature extraction methods are introduced into carbon price forecasting to exploit the characteristics of the input data more fully. The change pattern and trend regularity of the CEA market are analyzed in detail, which helps in understanding the fluctuation laws of China's new carbon market. Compared with traditional forecasting methods, the proposed method has higher forecasting accuracy and faster training, and requires fewer samples.

3) Industrial contribution: industries and investors can gain deeper insight into the market and follow carbon price movements more precisely, which will further guide their production or transactions when participating in market trading. Furthermore, the proposed method is expected to promote the development and application of forecasting approaches in other industrial settings with short time series.

A limitation of the proposed approach is that only historical price information is used to construct the model. Other information, including turnover rate, transaction volume, percentage change, and transaction amount, is not taken into account, although these factors are also informative about carbon price trends. A comprehensive forecasting model based on multi-factor fusion should therefore be studied in future work.
