

        Novel Channel Attention Residual Network for Single Image Super-Resolution

        2020-11-06

        Wenling Shi, Huiqian Du and Wenbo Mei

        (School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China)

        Abstract: A novel channel attention residual network (CAN) for SISR is proposed to rescale pixel-wise features by explicitly modeling interdependencies between channels and encoding where the visual attention is located. The backbone of CAN is the channel attention block (CAB). The proposed CAB combines a cosine similarity block (CSB) and a back-projection gating block (BG). CSB fully considers the global spatial information of each channel and computes the cosine similarity between channels to obtain finer channel statistics than first-order statistics. To explore channel attention further, we introduce effective back-projection into the gating mechanism and propose BG. Meanwhile, we adopt local and global residual connections, which directly convey most low-frequency information to the final SR outputs, so that more computational resources can be allocated to valuable high-frequency components through the channel attention mechanism. Extensive experiments show the superiority of the proposed CAN over state-of-the-art methods on benchmark datasets in both accuracy and visual quality.

        Key words: back-projection; cosine similarity; residual network; super-resolution

        Single image super-resolution (SISR) aims to generate a high-resolution (HR) image from a low-resolution (LR) image. It is a typical and challenging ill-posed inverse problem, with applications in many areas such as medical imaging and remote sensing.

        Recently, with the rapid development of deep learning, a large number of deep learning-based SISR methods have emerged. C. Dong[1] first introduced a three-layer convolutional neural network to SISR and achieved better results than state-of-the-art traditional methods. Since then, numerous deep learning-based methods[2-12] have been proposed and have verified the great potential of deep learning in the field of SISR. These methods learn the high-frequency information missing in the LR image by training on a set of HR and LR pairs, where the architecture of the network plays an important role. The architecture is determined by the network depth, width and topology.

        The network depth controls the representation power of the network. Compared with later deep SR networks such as VDSR[2], SRResNet[3], and DBPN[4], shallow networks such as FSRCNN[5] and ESPCN[6] are inferior in accuracy.

        The network width provides powerful capability by increasing the number of parameters. EDSR[7] and WDSR[8] increase the number of output feature channels to widen the network and bring great performance improvements.

        However, simply stacking more CNN layers to deepen the network or expanding more channels to widen it may cause gradient vanishing/exploding problems. SRDenseNet[9], RDN[10], MemNet[11], etc. effectively alleviate these problems by introducing various skip connections between shallow and deep layers.

        Recently, as a tool for reallocating limited computational resources according to the importance of informative components, attention mechanisms have attracted widespread attention and proved their great potential in the field of deep learning[12-13]. RCAN[12] first introduced channel attention[13] into the SISR domain and demonstrated significant performance improvements. However, RCAN only explores first-order statistics (e.g. global average pooling), which filter out most of the spatial information; the discarded information may be important, and its loss hinders further learning of channel interdependencies.

        Back-projection[14] is well known as an efficient iterative procedure for minimizing reconstruction error. Previous studies have proven the effectiveness of back-projection[15]. For SISR, DBPN[4] has applied it to reconstruct HR images and demonstrated its efficiency. However, it has not yet been integrated with other mechanisms, such as the gating mechanism, for further performance improvement.

        In this paper, we propose a channel attention residual network (CAN) combining cosine similarity, back-projection, and an attention mechanism. Specifically, we propose to use cosine similarity instead of average pooling to enhance discriminative learning ability. Meanwhile, in order to further explore channel dependencies, we introduce back-projection into the gating mechanism. Overall, our contribution is three-fold:

        ① We propose a cosine similarity block (CSB) which considers second-order channel statistics to adaptively refine features. Unlike the average pooling used in RCAN, which simply computes the mean value of the spatial pixels in each channel, our CSB fully considers the global spatial information of each channel and computes finer channel statistics.

        ② We propose a back-projection gating block (BG) which introduces back-projection into the gating mechanism to further explore channel interdependencies. It works in a bottom-up manner, using low-level encoded information to refine high-level channel statistics.

        ③ We propose a channel attention residual network (CAN) based on CSB and BG for SISR. Extensive experiments on public datasets demonstrate the effectiveness and efficiency of our method compared with state-of-the-art SISR methods.

        1 Channel Attention Residual Network

        In this section, we first introduce CSB and BG. Then we present the framework of CAN. We finally describe the implementation details of our CAN.

        1.1 Cosine similarity block

        Most previous CNN-based SISR models do not consider feature interdependencies. To utilize this information, RCAN[12] first introduced channel attention[13] into the SISR domain to rescale channel-wise features. However, RCAN only exploits the first-order statistics of features by using global average pooling, filtering out most of the spatial information. This may hide important information, hindering further learning of channel interdependencies. To take full advantage of the spatial information before applying the squeeze operation, we introduce cosine similarity to compute the interdependencies between any two feature maps along the channel dimension.

        Cosine similarity measures the difference between two vectors by the cosine of the angle between them in vector space. The closer the cosine is to 1, the closer the angle is to 0, i.e., the more similar the two vectors are. For two vectors a = (a1, a2, ··· , an) and b = (b1, b2, ··· , bn), the cosine similarity is

$$\cos(\boldsymbol{a},\boldsymbol{b}) = \frac{\boldsymbol{a}\cdot\boldsymbol{b}}{\|\boldsymbol{a}\|\,\|\boldsymbol{b}\|} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$
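As a concrete illustration (not part of the paper's code), the definition above can be written in a few lines of Python:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: (a.b) / (||a|| ||b||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Parallel vectors -> similarity 1; orthogonal vectors -> similarity 0.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # ~1.0 (up to float rounding)
print(cosine_similarity([1, 0], [0, 1]))        # 0.0
```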

        As shown in Fig.1, given a feature F = [f1, ··· , fC] consisting of C feature maps of size H×W, we reshape F into a feature matrix X ∈ R^(C×HW), whose C rows are HW-dimensional features. Let x_i denote the i-th row of X and ||x_i|| its L2 norm, and let x′ = (||x1||, ||x2||, ··· , ||xC||)^T. Then the cosine similarity matrix Z can be computed as

$$Z = \frac{X X^{\mathrm{T}}}{x' (x')^{\mathrm{T}}}$$

        where Z(i,j) denotes the cosine similarity between the feature maps of channels i and j, and the division above is the element-wise division of the matrices.

        Here, since the range of the cosine function is [–1, 1], which is not suitable for attention weights (see Section 2.2.1), we use a conversion function to normalize it to the interval [0, 1].

        As illustrated in Fig.1, let Z = [z1, ··· , zC]; the channel-wise statistics s ∈ R^(C×1) can be obtained by shrinking Z. The c-th element of s is computed as

$$s_c = H_{AP}(z_c) = \frac{1}{C}\sum_{j=1}^{C} z_c(j)$$

        where HAP(·) denotes the squeeze operation.

        Compared with the commonly used first-order pooling (e.g. global average pooling), our CSB explores the feature distribution by computing the interdependencies between feature maps along the channel dimension, capturing finer feature statistics for more discriminative representations.
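Putting the pieces of CSB together (reshape, cosine similarity matrix, normalization, squeeze), a minimal sketch follows; the (Z + 1)/2 conversion function and the row-mean squeeze are assumptions where the text does not spell out the exact operators:

```python
import numpy as np

def csb_statistics(F):
    """Sketch of the CSB squeeze for a feature tensor F of shape (C, H, W)."""
    C = F.shape[0]
    X = F.reshape(C, -1)                      # X in R^{C x HW}
    norms = np.linalg.norm(X, axis=1)         # x' = (||x_1||, ..., ||x_C||)
    Z = (X @ X.T) / np.outer(norms, norms)    # element-wise division
    Z = (Z + 1.0) / 2.0                       # assumed map from [-1, 1] to [0, 1]
    return Z.mean(axis=1)                     # shrink rows -> channel statistics s

rng = np.random.default_rng(0)
F = rng.standard_normal((4, 8, 8))            # toy feature: C=4, H=W=8
s = csb_statistics(F)
print(s.shape)  # (4,)
```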

        Fig.1 Architecture of proposed cosine similarity block (CSB)

        1.2 Back-projection based gating mechanism

        To fully explore channel-wise dependencies, we introduce back-projection into the gating mechanism. The channel descriptor undergoes a back-projection operation as it passes through the gate. Back-projection can be seen as a self-correcting procedure in which the projection error is fed back to refine the descriptor.

        Specifically, the BG first maps the input s to a low-level descriptor d1; then d1 is mapped to an (intermediate) high-level descriptor u1. After u1 is mapped back to a low-level descriptor d2, the projection error e_d between d1 and d2 is mapped to a new intermediate high-level descriptor u2. The final output s̃ of BG is obtained by summing u1 and u2.

        As shown in Fig.2, the BG is defined as follows.

        Scale down:

$$d_1 = \delta\big((W_{D_1} * s)\downarrow_d\big)$$

        Scale up:

$$u_1 = \delta\big((W_{U_1} * d_1)\uparrow_d\big)$$

        Back project:

$$d_2 = \delta\big((W_{D_2} * u_1)\downarrow_d\big)$$

        Projection error:

$$e_d = d_1 - d_2$$

        Scale error up:

$$u_2 = \delta\big((W_{U_2} * e_d)\uparrow_d\big)$$

        Final output:

$$\tilde{s} = f(u_1 + u_2)$$

        where * is the spatial convolution operator, ↑d and ↓d are the up- and down-sampling operators with scaling factor d, respectively, W_{U_i} and W_{D_i} are the (de)convolutional filters at the i-th stage, and f(·) and δ(·) denote the sigmoid and ReLU functions, respectively.
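The steps above can be sketched with dense matrices standing in for the strided (de)convolutions; the weights here are random placeholders, so this only illustrates the data flow of BG, not trained behavior:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

def bg(s, Wd1, Wu1, Wd2, Wu2):
    """Back-projection gating sketch: dense maps stand in for (de)convolutions."""
    d1 = relu(Wd1 @ s)        # scale down: C -> C/d
    u1 = relu(Wu1 @ d1)       # scale up:   C/d -> C
    d2 = relu(Wd2 @ u1)       # back-project down again
    e = d1 - d2               # projection error
    u2 = relu(Wu2 @ e)        # scale the error up
    return sigmoid(u1 + u2)   # sigmoid keeps the gate values in (0, 1)

C, d = 8, 4
rng = np.random.default_rng(1)
s = rng.random(C)             # toy channel statistics
Wd1, Wd2 = rng.standard_normal((C // d, C)), rng.standard_normal((C // d, C))
Wu1, Wu2 = rng.standard_normal((C, C // d)), rng.standard_normal((C, C // d))
out = bg(s, Wd1, Wu1, Wd2, Wu2)
print(out.shape)  # (8,)
```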

        As shown in Fig.2, we cascade CSB and BG to construct our basic block, the channel attention block (CAB). After BG outputs the final channel statistics s̃, s̃ is used to rescale the block input X_{g,m}.

        However, directly stacking attention modules leads to significant performance degradation, because the value of features in deep layers degrades after repeated dot products with a mask whose range is [0, 1]. We introduce attention residual learning (ARL) to address this problem. We modify the output of CAB as

$$X_{g,m+1} = X_{g,m}\cdot \tilde{s} + X_{g,m}$$
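A small numerical illustration (not from the paper) of why ARL helps: repeatedly multiplying by a [0, 1] mask drives features toward zero, while the residual form x·s + x preserves the identity path:

```python
import numpy as np

x = np.ones(4)
s = np.full(4, 0.5)   # hypothetical attention statistics in [0, 1]
plain, arl = x.copy(), x.copy()
for _ in range(10):   # stack 10 attention blocks
    plain = plain * s          # plain masking: features shrink geometrically
    arl = arl * s + arl        # attention residual learning keeps identity path
print(plain.max())    # 0.5**10 ~ 0.00098: features have collapsed
print(arl.max())      # 1.5**10: no vanishing through depth
```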

        Fig.2 Structure of residual channel attention block (RCAB) which includes the implementation of back-projection gating block (BG)

        Fig.3 Network architecture of our channel attention residual network (CAN)

        1.3 Network architecture

        It should be noted that our CAN has a structure similar to the state-of-the-art RCAN[12] at the most general level. As shown in Fig.3, both CAN and RCAN are composed of residual groups (RG) and residual channel attention blocks (RCAB). However, the RCAB in our CAN is completely different from the RCAB in RCAN: it combines CSB and BG. Meanwhile, we adopt far fewer RCABs and RGs than RCAN for a better balance between performance and parameters.

        1.4 Implementation details

        Now we specify the implementation details of our proposed CAN. In CAN, we set the RG number to 6, the RCAB number to 12, and the channel number C to 64. In each RCAB, we set the scaling factor d = 4 and use 3×3 as the kernel size of all convolutional layers except those in BG. In each BG, we use 8×8 as the kernel size of all (de)convolutional layers, and we adopt ReLU as the activation function after all convolutional layers and the sigmoid function after the deconvolutional layer. Our model can process both color and grayscale images, depending on whether c_out is 3 or 1.
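The nesting described above (6 RGs of 12 RCABs each, with local and global residual connections) can be sketched at the shape level; the real convolution, CSB, and BG layers are replaced by identity placeholders here, so this is only a structural illustration, not the paper's implementation:

```python
import numpy as np

def rcab(x):            # placeholder for conv -> CSB -> BG -> ARL rescaling
    return x

def rg(x, n_rcab=12):
    res = x
    for _ in range(n_rcab):
        x = rcab(x)
    return x + res      # local (group-level) residual connection

def can(x, n_rg=6):
    shallow = x         # placeholder for the shallow feature-extraction conv
    for _ in range(n_rg):
        x = rg(x)
    return x + shallow  # global residual connection before upsampling

x = np.zeros((64, 48, 48))   # C = 64 feature maps on a toy 48x48 patch
print(can(x).shape)          # (64, 48, 48): shapes are preserved throughout
```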

        2 Experiments

        2.1 Settings

        Datasets: In order to evaluate the performance of our proposed CAN, we train our model on the DIV2K dataset[16]which contains 800 training images and 100 validation images. For testing, we use four standard benchmark datasets: Set5[17], Set14[18], BSD100[19], Urban100[20].

        Metrics and degradation model: The SR results are evaluated with PSNR and SSIM[21]on the luminance (Y) channel. We conduct experiments with the Bicubic (BI) degradation model.
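PSNR on the luminance channel is computed from the mean squared error against the ground truth; a minimal sketch (the peak value of 255 assumes 8-bit images):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """PSNR in dB between two images (e.g. their luminance channels)."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((8, 8), dtype=np.uint8)
sr = np.ones((8, 8), dtype=np.uint8)   # off by 1 everywhere -> MSE = 1
print(round(psnr(ref, sr), 2))         # 48.13, i.e. 20*log10(255)
```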

        2.2 Ablation studies

        2.2.1 Effects of BG, CSB and ARL

        BG: As can be seen from Tab. 1, comparing the PSNR results in the second and fifth columns, we find that the PSNR value increases from 38.04 dB to 38.07 dB after replacing the original gating mechanism in RCAN with BG, which demonstrates the effectiveness of BG. When we compare the PSNR in the third and last columns, we find that our BG always achieves better results with or without CSB and ARL.

        CSB + ARL: Comparing the PSNR results in the second column (RCAN) and the third column, we can see that after we replace the original average pooling with (CSB+ARL), the PSNR value increases from 38.04 dB to 38.08 dB. This demonstrates that (CSB+ARL) plays an important role in fine-tuning the feature maps. When we compare the fifth and last columns, we find that (CSB+ARL) achieves better results than average pooling whether or not BG is used. Lastly, comparing the PSNR values in the last two columns, we find that ARL has an impact on the effectiveness of CSB, as explained in Section 1.2. Comparing the PSNR in the third and fourth columns (U-CSB denotes unnormalized CSB), we find that the reconstruction performance is improved after normalizing the cosine similarity.

        From the PSNR in the first, second and last columns, we observe that our novel fine-scale block (CAB), combining CSB, ARL, and BG, extracts finer channel attention maps than the CA in RCAN and achieves significant performance improvements over the model with no attention block appended.

        Tab. 1 Study of BG and CSB

        2.2.2 Model efficiency trade-off

        Fig.4 depicts the trade-off study of PSNR vs. parameters. Compared with other methods, our CAN and CAN+ achieve higher performance with a better trade-off between model size and performance. It is worth noting that although the PSNR result of RCAN is slightly higher than that of our CAN, CAN has only about 37% of the parameters of RCAN. CAN thus has a better efficiency trade-off than RCAN, but the computation and training cost of the network is still not negligible. Therefore, we created a lightweight version of CAN by reducing the number of RGs to 3 and RCABs to 6, denoted CAN-s. CAN-s has only 30% of the back-projection operations of CAN, but the PSNR value decreases from 32.51 dB to 32.27 dB as shown in Fig.4, which indicates a trade-off between performance and computation.

        2.3 Results with BI degradation model on DIV2K dataset

        To confirm the ability of the proposed network, we compare our method with 10 state-of-the-art methods: SRCNN[1], FSRCNN[5], VDSR[2], LapSRN[24], MemNet[11], EDSR[7], CARN[25], D-DBPN[4], SRFBN[26], and RDN[10]. We also introduce a self-ensemble strategy to further improve our CAN and denote that model as CAN+.

        2.3.1 Quantitative results by PSNR/SSIM

        Tab. 2 shows quantitative comparisons for ×4 and ×8 SR. We can see that our CAN+ and CAN perform the best and second best on all the datasets. Note that our model has only about 27%, 59% and 14% of the parameters of RDN, D-DBPN, and EDSR, respectively. However, our CAN achieves better results than them.

        Fig.4 Trade-off study of PSNR vs. parameters

        Tab. 2 Quantitative results with BI degradation model

        2.3.2 Visual results

        Fig.5 shows SR visual results with scale factor ×4. Our proposed CAN infers the clearest and most realistic images. For image “img_092”, we observe that most of the compared methods produce blurring artifacts along the diagonal direction. In contrast, our CAN better alleviates the blurring artifacts and restores lines in the correct directions. For image “img_046”, we observe that EDSR and D-DBPN produce images with severe structural dislocation, and the other compared methods cannot even recover clear images. For image “img_004”, all the compared methods suffer from severe blurring artifacts, failing to recover a clear grid. However, our CAN alleviates the blurring artifacts and recovers sharp edges close to the ground truth.

        Fig.5 Visual results with Bicubic (BI) degradation (×4) on Urban100 dataset

        3 Conclusion

        In this paper, we propose a novel channel attention residual network (CAN) for SISR. In CAN, we propose a channel attention block based on cosine similarity, back-projection and an attention mechanism to bias the allocation of available computational resources towards the most informative features. Extensive experiments demonstrate the superiority of CAN in comparison with state-of-the-art methods. Our network can be used in many fields, such as medical imaging and surveillance systems. In the future, we plan to extend the method to other tasks, such as object detection and image classification.
