Cascaded Residual Densely Connected Network for Image Super-Resolution

  • Received : 2022.04.19
  • Accepted : 2022.09.07
  • Published : 2022.09.30

Abstract

Image super-resolution (SR) processing is of great value in the fields of digital image processing, intelligent security, film and television production, and so on. This paper proposes a densely connected deep learning network based on a cascade architecture, which can be used to solve the super-resolution problem in the field of image quality enhancement. We propose a more efficient residual scaling dense block (RSDB) and a multi-channel cascade architecture to realize more efficient feature reuse, as well as a hybrid loss function based on the L1 error and the L∞ error to achieve better L∞ error performance. The experimental results show that the overall performance of the network is effectively improved by the cascade architecture and residual scaling. Compared with the residual dense network (RDN), the PSNR / SSIM of the new method are improved by 2.24% / 1.44% respectively, and the L∞ performance is improved by 3.64%. This shows that the cascade connection and residual scaling method can effectively realize feature reuse, improving the residual convergence speed and learning efficiency of our network. After adopting the new loss function, the L∞ performance is improved by 11.09% with only a minimal loss of 1.14% / 0.60% in PSNR / SSIM performance. That is to say, the L∞ performance can be improved greatly by the new loss function at a minor cost in PSNR / SSIM performance, which is of great value in L∞ error sensitive tasks.

1. Introduction

The image super-resolution (SR) problem is in essence a complex nonlinear problem, and there is no accurate analytical solution to such problems. Generally, image SR research can be divided into two categories. In the first case, both the low-resolution image and the corresponding high-resolution image are known. In this situation, the low-resolution image is usually obtained by downsampling the high-resolution image, but information is lost in this process [1]-[5]. Moreover, the quality of image reconstruction and restoration can be measured since the ground truth is known. In the other situation, only the low-resolution image is known; for example, we need to reconstruct an early low-resolution image to obtain higher quality. This kind of problem has no accurate solution and is open-ended. Theoretically, it is impractical to completely restore the accurate image, so we can only recover as much detail as possible through modeling.

In recent years, research on image SR based on machine learning has gradually become a new hotspot with the introduction of convolutional neural networks (CNN) and deep learning methods. Deep learning methods are very good at dealing with this kind of nonlinear problem. For example, SRCNN (Super-Resolution Convolutional Neural Network) [6][7] is a deep-learning-based image SR model built on a CNN. On this basis, it became possible to build deeper networks, especially with the help of the residual network [8]. For example, VDSR (Very Deep Super-Resolution) [9] builds a deeper SR network using residual modules [8][10][11], and the performance of the network is greatly improved. Residual connections and deeper networks have been important directions for SR networks [12] ever since. The features of different network layers can be reused thanks to the multiple skip-connections added by dense residual connections [13]-[16] and cascade [17][18] connections; residual information and gradient information can be transmitted more quickly in this situation, hence performance can be improved significantly.

The problem with the current method: in the RDN (Residual Dense Network) method, a skip-connection is added only between each previous RDB (Residual Dense Block) and the last RDB. As a result, global feature reuse is insufficient, and the feature maps between non-adjacent RDBs are not effectively reused.

Therefore, we propose a cascade connection architecture for our RSDB (Residual Scaling Dense Block), combined with the continuous memory mechanism (CM) [19] and residual scaling, in order to further improve the reuse of feature maps between RSDBs. Each RSDB output can be connected to each layer of the next RSDB, so that previous features and current features are reused to learn more effective features adaptively, further increasing the efficiency of feature reuse.

In addition, traditional methods usually focus on the L1 or L2 loss function. This kind of loss function can achieve good PSNR / SSIM performance, but its L∞ error performance is not satisfactory. Therefore, we propose a new hybrid loss function, which can greatly improve the L∞ performance. Our main contributions are as follows:

1. An improved dense residual connection network based on a cascade connection architecture is proposed. In comparison to the previous RDN approach, we enhance the RDB connections and increase the number of skip-connections between RDBs. This makes better use of the feature map at each layer, which contributes to an increase in network efficiency and accuracy. With the adoption of the RSDB in place of the RDB and of our new hybrid loss function, the performance of the new network improves significantly.

2. A new L1∙∞ hybrid loss function is proposed and verified, which can effectively reduce the L∞ error. Compared with the preceding L1 or L2 loss functions, the advantage of this loss function is that it can greatly improve the L∞ error performance with only a minor degradation of the PSNR and SSIM performance. Its parameter can be adjusted easily according to user requirements, which is of great value on L∞ performance sensitive occasions.

3. Based on residual scaling, we propose a residual scaling dense block (RSDB). To make better use of hardware resources, this block, compared to the conventional RDB, adds a residual scaling layer and removes Batch Normalization (BN), dropout, and other layers. According to our tests, this new block has better convergence performance.

Next, Sec. 2 introduces the related research; Sec. 3 introduces the new method we propose; Sec. 4 presents the test results, including ablation tests and a visual performance comparison; Sec. 5 summarizes our research.

2. Related research

Early convolutional neural networks were mainly applied to the field of image classification. After continuous development and innovation, CNNs are now widely used throughout image processing.

Among them, SRCNN [6] proposed an SR network model based on a CNN. The network is mainly composed of three convolution layers, which achieve image extraction, feature representation, nonlinear feature mapping, and final reconstruction. However, the network depth is shallow, as it is composed only of convolution layers. Compared with SRCNN, the main improvement of FSRCNN (Fast SRCNN) [7] is the addition of a deconvolution layer at the end of the network to realize image upsampling and reconstruction. Therefore, the input of the network is the low-resolution image, which reduces the computation of the whole network and improves its speed. VDSR [9] is a deep SR network constructed from residual modules [8]. The model is composed of 20 convolution layers equipped with residual modules.

Besides, there are networks or structures for special purposes. For example, edge-preserving filters and the single-image super-resolution (SISR) approach are combined to provide a novel edge-aware multi-focus image fusion [20]. The Meta method integrates meta-learning [21] into its fusion network to produce images with different levels of stochasticity from a single fusion model. Other works deal with special domains, such as extreme motion blur and the recovery of a sharp high-resolution (HR) image [22], thermal images [23][24], X-ray images [25], 3D image SR [26], stereo images [27], remote sensing images [18], and so on. There are also multi-level image SR structures, such as pyramidal structures [28], multi-scale SR [29]-[31], GANs (Generative Adversarial Networks) [32], RNNs (Recurrent Neural Networks) [33], cascades [17][18], and so on.

Among the current methods, residual learning and dense connection are the most important structures. Residual learning [26][35]-[37] is constructed between the input low-resolution image and the final output high-resolution image; this connection can also be called a global connection. However, there is no upsampling inside such a network, since the input low-resolution image is interpolated to the same size as the output high-resolution image, which leads to a large amount of computation. In addition, a large learning rate can be set in training without vanishing/exploding gradients thanks to residual learning and adaptive gradient clipping. The LapSR (Laplacian Pyramid Super-Resolution Networks) [12] model is mainly composed of two parts: feature extraction and image reconstruction. The network is also composed of convolution layers connected in series; the difference is that this method adds upsampling operations among the convolution layers. The advantage of this processing is that the input of the network is a low-resolution image, which ensures high computational efficiency, and most subsequent SR methods use the same low-resolution input mode. SRDenseNet [13] is mainly constructed from DenseNet [15][38]-[46] modules. DenseNet not only greatly reduces the number of network parameters, but also alleviates the problem of gradient vanishing to a certain extent by reusing features from the shallow layers and the bypass channel. The input of each DenseNet block is set as the output of the previous block. Meanwhile, the output features of all layers are concatenated element-wise instead of added, which differs from the direct addition in ResNet. This improvement brings many advantages; for example, it has fewer parameters than ResNet and an additional bypass channel to enhance feature propagation and reuse.

At the same time, it alleviates model degradation at small gradients, making the network easier to train. This architecture makes full use of the features extracted from the shallow layers and reuses them in the deep layers. The improvement in RDN [16] is that it introduces the continuous memory mechanism (CM), a global connection, and local residual connections to increase the propagation of residuals. Compared with SRResNet [47], EDSR [19] mainly removes redundant modules from SRResNet, such as batch normalization (BN) and dropout layers, which makes it possible to expand the depth of the model and improve the quality of the reconstructed image. This is because the original ResNet structure was designed to solve high-level computer vision problems, and it is not the best choice when applied to low-level computer vision problems such as SR. More convolution layers can be added with the same computing resources after removing the BN layer.

3. Our method

3.1 Network structure

Our network is mainly composed of the Cascaded Residual Scaling Dense Block (CRSDB) module, the upscale module (UPM), and the Cascaded Global Feature Learning Module (CGFLM). The detailed network structure is shown in Fig. 1. The CRSDB is composed of Residual Scaling Dense Blocks (RSDB) through a cascade architecture. The RSDB mainly performs local feature learning. Compared with the original Residual Dense Block (RDB), it adds a residual scaling layer and removes the batch normalization (BN) and dropout layers in order to improve the utilization of the limited hardware resources; this is one of the differences from the original RDN method. At the same time, it is also the basic module of our Cascaded Residual Dense Network (CRDN).


Fig. 1. Architecture of Cascaded Residual Scaling Dense Network

The CGFLM is composed of CRSDBs connected in series. The difference from the original RDN method is that the CRSDB module is based on a multi-channel connection. Besides, an improved hybrid loss function is proposed, which realizes rapid propagation of residuals and gradients with faster convergence of the L∞ error; this is the most important improvement over the original RDN method.

The UPM is realized by the subpixel [19] method, which is more flexible than upsampling methods; its amount of computation is independent of the scaling factor, which matters especially when the scaling factor is large. It has better performance compared with traditional upsampling methods.

Generally speaking, a deeper network can obtain higher accuracy, but it is also more difficult to train: the training time is prolonged and the problem of divergence becomes more and more obvious as the network deepens. With the help of residual learning, we can design and train deeper networks, and training performance can also be greatly improved. Therefore, there are more and more variant networks based on the residual network. Its key idea lies in obtaining better residual convergence by adding skip connections between the residual blocks. In practice, it is found that adding denser connections and residual scaling in the local feature learning module can also achieve better convergence. Therefore, we propose to further equip the global feature learning module with a cascade architecture and a residual scaling layer. In this way, the number of connections and the dimension of the residual concatenation feature maps in the global feature learning module can be increased, hence better learning performance can be obtained.

3.2 Residual scaling dense block

The most important idea of the CRDN model is to establish "shortcuts" between the earlier layers and the successive layers, which helps the back propagation of gradients during training, hence we can train a deeper network. The basic residual learning channel is the same as in ResNet, but CRDN establishes a cascaded dense architecture from all previous layers to the successive layers. Another key highlight of CRDN is maximizing feature reuse through connections between different layers via channel-wise concatenation.

For better explanation and comparison, Fig. 2 shows the connection mechanism of the ResNet network and Fig. 3 shows the dense connection mechanism of our RSDB. It can be seen that ResNet uses a shortcut connection between each layer and its previous layers (generally 2~3 layers back), and the connection method is element-wise addition. In the RSDB, each layer is concatenated along the channel dimension with all previous layers. For a network of L convolutional layers, the RSDB contains L(L+1)/2 connections (e.g., 45 connections for the 9-layer RSDB used in this paper), many more than ResNet.


Fig. 2. Shortcut connection in the ResNet network


Fig. 3. Dense connection mechanism for RSDB, where C represents the channel-wise concatenation operation and S is residual scaling

Moreover, the RSDB inserts a residual scaling layer and directly combines feature maps from different layers, which realizes feature reuse and improves learning efficiency. This is the main difference between the RSDB and ResNet. Compared with ResNet, CRDN adopts a more aggressive dense connection mechanism: all layers are connected to each other. Specifically, each layer accepts all preceding layers as additional input.

Expressed as a formula, the output of a plain network at layer l is:

\(x_{l}=H_{l}\left(x_{l-1}\right)\)       (1)

For ResNet (here with residual scaling), an identity shortcut from the previous layer's input is added:

\(x_{l}=\text{scale} \cdot H_{l}\left(x_{l-1}\right)+x_{l-1}\)       (2)

where scale is the residual scaling factor, usually set to 0.1. In the RSDB, all previous layers are concatenated to form the input of the successive layer:

\(x_{l}=H_{l}\left(\left[x_{0}, x_{1}, \cdots, x_{l-1}\right]\right)\)       (3)

Here \(H_{l}\) denotes a non-linear transformation, a combined operation that may include an activation layer, a convolution layer, and a residual scaling layer. Note that there may actually be multiple convolution layers between layer l and layer l−1.

The forward process of the RSDB is shown in Fig. 3 and Fig. 4, where the dense connections can be seen more intuitively. For example, the input of \(h_{3}\) includes not only the output of \(h_{2}\), but also \(x_{1}\) and \(x_{2}\) from the first two layers, concatenated together along the channel dimension.


Fig. 4. Forward propagation process of RSDB
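
For concreteness, the dense connection of Eq. (3) and the scaled residual addition of Eq. (2) can be sketched in a few lines of Keras (the training framework used in this paper, cf. Sec. 4.1). This is a minimal sketch, not the paper's exact implementation: the growth width, the 1x1 fusion convolution that restores the channel count before the residual addition, and all names are our own assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def rsdb(x, growth=64, num_layers=9, scale=0.1):
    """Minimal sketch of a Residual Scaling Dense Block (RSDB)."""
    features = [x]
    for _ in range(num_layers):
        # Eq. (3): each layer sees the concatenation of all earlier outputs
        inp = layers.Concatenate()(features) if len(features) > 1 else features[0]
        features.append(
            layers.Conv2D(growth, 3, padding="same", activation="relu")(inp))
    # assumed 1x1 fusion conv so the residual addition has matching channels
    fused = layers.Conv2D(x.shape[-1], 1, padding="same")(
        layers.Concatenate()(features))
    # Eq. (2): scale the learned residual (usually by 0.1), then add the input
    scaled = layers.Lambda(lambda t: t * scale)(fused)
    return layers.Add()([x, scaled])
```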

3.3 Cascade architecture

Unlike the connections in ResNet and RDN, the cascading architecture is shown in Fig. 5. The main highlights of the cascade architecture are that the input layer, after passing through a 1x1 convolution layer, has a shortcut connection to each subsequent convolution layer, and there are also connections between each RSDB module and the subsequent RSDB modules. In the cascade scheme, all connections are fused through a 1x1 convolution layer. The advantage of this connection is that the number of feature maps can be easily changed by the convolution layer, which makes it convenient to connect feature maps of different sizes. This scheme realizes a more sufficient propagation of information flow, such as residuals and gradients, and makes it easy to set a given number of feature maps according to the requirements of the subsequent upscale module.


Fig. 5. Cascade network connection of CRSDB

In the cascade architecture, the output of the jth residual module in the ith cascade module is represented by \(B^{i, j}\), and \(W_{c}^{i}\) represents the parameters of the ith cascade module. The ith local cascade module is defined as:

\(B_{\text{local}}^{i}=f\left(H^{i-1} ; W_{l}^{i}\right)=B^{i, U}\)       (4)

Here, \(B^{i, U}\) is defined recursively from \(B^{i, u}\):

\(\begin{aligned}\left\{\begin{array}{l}B^{i, 0}=H^{i-1} \\ B^{i, u}=f\left(\left[I, B^{i, 0}, \cdots, B^{i, u-1}, R^{u}\left(B^{i, u-1} ; W_{R}^{u}\right)\right] ; W_{R}^{i, u}\right) \text { for } u=1, \cdots U\end{array}\right.\end{aligned}\)       (5)

Finally, the output of the cascaded block can be defined by combining both the local and the global cascading. Here \(H^{0}\) is the output of the first convolution layer. Note that because our model has a single convolution layer before each residual block, the first residual block receives \(f(X ; W_{c})\) as input, where \(W_{c}\) is the parameter of that convolution layer. \(W_{R}^{i}\) is the parameter set of the ith residual block, and \(W_{R}^{i, j}\) is the parameter of the jth convolution layer in the ith block. Hence it yields:

\(\begin{aligned}\left\{\begin{array}{l}H^{0}=f\left(X ; W_{c}\right) \\ H^{b}=f\left(\left[H^{0}, \cdots, H^{b-1}, B_{\text {local }}^{b}\left(H^{b-1} ; W_{R}^{b}\right)\right]\right) \text { for } b=1, \cdots, B\end{array}\right.\end{aligned}\)       (6)

The main difference between our method and ResNet lies in the cascading mechanism. The cascade connection architecture integrates multi-layer feature information, which lets the network reuse multi-level features. The cascade scheme contains multi-channel shortcuts, which allow the residual to propagate not only along a shorter path but also along multiple different paths, so it has better learning efficiency. The cascade connection adopts a multi-layer connection scheme, which reconstructs high-resolution images from multi-layer features, making it easier for the network to obtain more detailed information, so it is more efficient.
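
As an illustration of the recursion in Eqs. (5)-(6), the cascading scheme could be sketched as below, reusing the rsdb sketch from Sec. 3.2; the number of units and the channel width are illustrative assumptions rather than the exact configuration.

```python
from tensorflow.keras import layers

def cascade_block(x, num_units=4, channels=64):
    """Sketch of the cascade connection: after every RSDB, the block input
    and the outputs of all previous units are concatenated and fused by a
    1x1 conv (cf. Eq. (5)), giving residuals many short paths."""
    outputs = [x]
    h = x
    for _ in range(num_units):
        outputs.append(rsdb(h))  # local unit from the Sec. 3.2 sketch
        # the 1x1 fusion conv also fixes the number of feature maps
        h = layers.Conv2D(channels, 1, padding="same")(
            layers.Concatenate()(outputs))
    return h
```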

3.4 Hybrid loss function

The CRDN network adopts a hybrid loss function composed of the L1 error and the L∞ error, which remedies the instability of the L∞ error of the original RDN network. Meanwhile, the PSNR / SSIM performance of our CRDN is not significantly affected, while the L∞ error is greatly reduced, reflecting the obvious advantage of the improved network. Refer to Sec. 4.3 for more details.

(1) L1-Norm

L1-Norm is one of the most common norms, which is defined as follows:

\(\begin{aligned}\|x\|_{1}=\sum_{i}\left|x_{i}\right|\end{aligned}\)       (7)

The L1-norm can be used to measure the difference between two vectors, such as the Mean Absolute Error (MAE),

\(\begin{aligned}\operatorname{MAE}\left(x_{1}, x_{2}\right)=\frac{1}{n} \sum_{i=1}^{n}\left|x_{1 i}-x_{2 i}\right|\end{aligned}\)       (8)

(2) L∞-Norm

The L∞-norm is mainly used to measure the maximum magnitude of the components of a vector. It is defined as:

\(\|x\|_{\infty}=\lim _{p \rightarrow \infty}\left(\sum_{i}\left|x_{i}\right|^{p}\right)^{1 / p}, \quad x=\left(x_{1}, x_{2}, \cdots, x_{n}\right)\)       (9)

Equivalently, it can be expressed by the following formula:

\(\|x\|_{\infty}=\max _{i}\left(\left|x_{i}\right|\right)\)       (10)

A very good feature of the L∞-norm is that it is independent of the dimension of the vector, which is an advantage when comparing error vectors of different dimensions.
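
As a small numerical illustration of this property (an example of our own), the maximum error stays comparable across vectors of different lengths, while the summed L1 error grows with the dimension:

```python
import numpy as np

short_err = np.array([0.1, 0.5])
long_err = np.array([0.1, 0.5, 0.2, 0.2, 0.2, 0.2])
print(np.max(np.abs(short_err)), np.max(np.abs(long_err)))  # 0.5 0.5
print(np.sum(np.abs(short_err)), np.sum(np.abs(long_err)))  # 0.6 1.4
```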

(3) L1∙∞ hybrid loss function

The hybrid loss function L1∙∞ is defined as:

\(\begin{aligned}\left\{\begin{array}{l}L_{1 \bullet \infty}=\alpha L_{1}+\beta L_{\infty} \\ \alpha+\beta=1\end{array}\right.\end{aligned}\)       (11)

In SR applications, it is generally believed that L1-norm performance is better than L2-norm performance [16][19][48]. In practice, we usually use the mean absolute error instead of the sum of absolute errors in order to remove the dependence of the L1-norm on the vector dimension. However, directly using the L1-norm may reduce the average error while the absolute error of some pixels remains huge. This situation does occur: because the L1-norm only reduces the average error, there is no limit on the error of a single pixel. Therefore, we need a new loss function that not only reflects the overall error but also effectively reduces the maximum error of a single pixel, so as to further improve the quality of image reconstruction.

The L∞-norm satisfies exactly this requirement: it allows us to easily constrain the maximum error of single pixels, independent of the vector dimension. Therefore, we propose a hybrid loss function combining the L1-norm and the L∞-norm, which can not only control the overall error of the image, but also effectively reduce the maximum error of individual pixels. Through testing and analysis, we found that the recommended range of β is [0.002, 0.01]; values that are too large or too small will deteriorate the performance of the network.
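
A minimal Keras-style sketch of Eq. (11) is given below. The paper does not state whether the maximum is taken per image or over the whole batch; the per-image maximum averaged over the batch used here, and all names, are our assumptions.

```python
import tensorflow as tf

def l1_inf_loss(beta=0.002):
    """Sketch of the L1-Linf hybrid loss of Eq. (11), with alpha + beta = 1."""
    alpha = 1.0 - beta
    def loss(y_true, y_pred):
        err = tf.abs(y_true - y_pred)
        # assumed reduction: max pixel error per image, averaged over the batch
        l_inf = tf.reduce_mean(tf.reduce_max(err, axis=[1, 2, 3]))
        return alpha * tf.reduce_mean(err) + beta * l_inf
    return loss

# usage, e.g.: model.compile(optimizer="adam", loss=l1_inf_loss(beta=0.002))
```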

3.5 Implementation details

The baseline is designed as an RDN network. At the same time, five comparative networks, CRDN0, CRDN1, CRDN2, CRDN3, and CRDN4, are designed. The effects of residual scaling and of different β values are examined; see Sec. 4 for more details. The RSDB is the residual scaling dense connection module; each RSDB module contains 9 convolution layers. Except for the residual link between the first layer and the last layer, which is realized by an addition operation, all other connections are realized by concatenation. The convolution kernel size in the RSDB is 3 and the activation function is ReLU. Layers such as BN, dropout, and pooling are removed from the RSDB module in order to improve the overall computational efficiency of the network, which includes only necessary operations such as convolution layers, activation layers, the residual scaling layer, and concatenation / addition layers.

CRSDB: the basic unit in the cascade module is composed of RSDBs through cascade connections, and the core part of the whole network is built from multiple connected CRSDBs. Since each RSDB contains 9 convolution layers and each CRSDB contains 4 RSDBs, each CRSDB contains 36 convolution layers, and the connections between these convolution layers are dense. A deeper network can be built easily if needed; for example, two CRSDBs are enough to build a 72-layer cascaded dense network.

Upscale module: the number of feature maps input into the subpixel [19] module must be a multiple of the square of the scaling factor, otherwise the rescaling of the image cannot be realized. Because this method differs from upsampling or downsampling methods, pixel information is neither lost nor artificially introduced. The number of feature maps can be set through a 1x1 convolution layer.
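
A possible sketch of this module is shown below: a 1x1 convolution sets the channel count to a multiple of the square of the scaling factor, and TensorFlow's depth_to_space then rearranges those channels into a larger image; the output channel count is our assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers

def upscale_module(x, scale=2, out_channels=3):
    """Sketch of the subpixel upscale module (UPM): no interpolation,
    pixels are rearranged out of the channel dimension."""
    # 1x1 conv makes the channel count out_channels * scale**2
    x = layers.Conv2D(out_channels * scale ** 2, 1, padding="same")(x)
    return layers.Lambda(lambda t: tf.nn.depth_to_space(t, scale))(x)
```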

Input and output: for networks with different scaling factors, 64x64 images are used as input, while the output high-resolution images vary according to the scaling factor, being 128x128, 192x192, and 256x256 respectively. The input images are cropped to different sizes according to the different scaling requirements, and the stride used in the cropping process varies with the scaling factor.
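
The patch preparation just described might look like the following NumPy sketch; the default stride equal to the patch size is our assumption, since the text only states that the stride varies with the scaling factor.

```python
import numpy as np

def crop_pairs(lr, hr, scale, patch=64, stride=64):
    """Sketch: cut matching 64x64 LR patches and (64*scale)^2 HR patches."""
    pairs = []
    for y in range(0, lr.shape[0] - patch + 1, stride):
        for x in range(0, lr.shape[1] - patch + 1, stride):
            lr_patch = lr[y:y + patch, x:x + patch]
            hr_patch = hr[y * scale:(y + patch) * scale,
                          x * scale:(x + patch) * scale]
            pairs.append((lr_patch, hr_patch))
    return pairs
```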

4. Experimental results

4.1 Data, evaluation and training platform

For SR research, there are public data sets for testing and comparison, mainly including Set5 [49], Set14 [50], BSD100 [51], Urban100 [52], DIV2K [53], and others. The evaluation indexes are mainly PSNR [54] and SSIM [55]. PSNR (peak signal-to-noise ratio) is an objective standard for evaluating images and is the most common and widely used objective measurement of image quality; the greater the PSNR value, the less the distortion. SSIM (structural similarity) is a full-reference image quality evaluation index that measures image similarity in three aspects: brightness, contrast, and structure. The SSIM value range is [0, 1]; the larger the value, the smaller the image distortion. The training platform and relevant parameters used in this method are shown in Table 1.
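
Both indexes are available as built-in TensorFlow image operations; before turning to the platform details, the following minimal sketch (on synthetic data of our own) shows how they can be computed:

```python
import tensorflow as tf

# synthetic HR/SR batch with pixel values in [0, 1]
hr = tf.random.uniform((1, 128, 128, 3))
sr = tf.clip_by_value(hr + tf.random.normal(hr.shape, stddev=0.02), 0.0, 1.0)

psnr = tf.image.psnr(hr, sr, max_val=1.0)  # larger value => less distortion
ssim = tf.image.ssim(hr, sr, max_val=1.0)  # in [0, 1], larger is better
print(float(psnr[0]), float(ssim[0]))
```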

Table 1. Training platform and related parameters


Training settings and results: the training optimizer is Adam, with Learning Rate (lr) = 0.001, β1 = 0.9, β2 = 0.999. The convolution kernel size is 3x3, the data set is set according to reference [57], the input image size is 64x64, and the training platform is Keras 2.7. The training results are shown in Table 2. The CRDN method we propose achieves ideal results on different data sets and different scaling factors, and has better stability compared with the original RDN method.

Table 2. Benchmark test results, average PSNR / SSIM; red bold is the best result and green is the second best result


4.2 Comparison of training results

Table 2 shows the comparison of the PSNR / SSIM results of the CRDN method and several recent important methods. The CRDN network adopts the CRDN1 configuration with residual scaling and cascade connection. From the results, it can be seen that the PSNR / SSIM of the CRDN method on the standard data sets is better than that of CARN, EDSR, RDN, and other methods. For the training settings, please refer to the literature [19].

4.3 Ablation tests

The main difference between the CRDN model in this paper and the RDN method is that cascading and residual scaling are added; the RDN method is recovered after removing these modules. Four groups of comparative tests are set up in the ablation study; the specific settings are shown in Table 3. The RDN method is set as the baseline of this test. CRDN0 adds cascading on the basis of RDN; CRDN1 adds cascading and the residual scaling layer on the basis of RDN, where rs is the residual scaling factor; CRDN2 further adds the L1∙∞ hybrid loss function, with β = 0.002.

Table 3. Setup for ablation tests


As can be seen from Table 4, the PSNR/SSIM/L∞ errors of the CRDN0 network are 25.9555/0.8158/0.7372 respectively. Compared with the RDN method, PSNR/SSIM are increased by 1.10% / 0.96% respectively, while the L∞ error is reduced by 0.11%, indicating that PSNR/SSIM performance is improved and the L∞ error is reduced after adding the cascade connection. At the same time, it can be seen from Fig. 6 that the CRDN0 curve is almost always above the RDN curve, indicating that the performance of the CRDN0 network with cascading is better than that of the original RDN network.

Table 4. Comparison of ablation test results (averaged over the last 20 results)



Fig. 6. The ablation test results, with dataset = Urban100, epochs = 50, iterations = 1000 (for more convenient comparison, the output PSNR and SSIM curves are subject to the same smoothing processing)

After adding residual scaling, the network performance improves more obviously. As shown by the CRDN1 curve, compared with the original RDN network, the performance of the network with cascading and residual scaling (CRDN1) is greatly improved: PSNR / SSIM increase by 2.24% / 1.44% respectively, while the L∞ error decreases by 3.64%. It can be seen from Fig. 6 that CRDN1 has the best performance, indicating that performance is enhanced by cascading and residual scaling. Therefore, the results show that the performance of the network can be improved by using cascade connection and residual scaling.

As shown in Table 5 and Fig. 7-Fig. 8, after using the L1∙∞ hybrid loss function, the network's PSNR / SSIM performance is slightly reduced, but the reduction is very small. Taking CRDN1 and CRDN2 as examples, the average PSNR/SSIM/L∞ errors of CRDN1 and CRDN2 are 26.2466/0.8197/0.7111 and 25.9471/0.8147/0.6323 respectively. Comparing the two, PSNR / SSIM are reduced by 1.14% / 0.60% respectively when using the L1∙∞ hybrid loss function (β = 0.002), but L∞ is reduced by 11.09% (i.e., its performance is improved). In other words, although PSNR / SSIM suffer a slight loss, the improvement of the L∞ error is very obvious after using the new loss function.

Table 5. Impact of β on the PSNR / SSIM / L∞ indexes



Fig. 7. Impact of β on PSNR / SSIM; PSNR / SSIM performance is maintained when β = 0.002


Fig. 8. The influence of β on L∞; the hybrid loss function effectively reduces the L∞ error

As β continues to grow, e.g., β = 0.01 (CRDN4 in the figure), PSNR / SSIM performance is reduced by 3.64% / 2.60% and the L∞ error is reduced by 22.24%. The result worsens if β is increased further: for β = 0.03 (CRDN3 in the figure), PSNR/SSIM are reduced by 11.35% / 11.32% respectively, but the L∞ error is only reduced by 14.25%, which is less than the 22.24% obtained at β = 0.01. Therefore, we suggest the range of β be [0.002, 0.01]: the improvement of the L∞ error is not obvious if β is too small, while PSNR / SSIM and L∞ will all worsen if it is too large.

4.4 Visual performance comparison

In order to further compare the visual performance of the different methods, the results of the EDSR, RDN, and CRDN methods are compared. Fig. 9-Fig. 14 show the results of the different methods on the data sets Set14, Urban100, and DIV2K, and the PSNR / SSIM / L∞ error indexes are given. In these results, the PSNR / SSIM / L∞ indexes of the CRDN results are the best. The CRDN network is configured with residual scaling factor rs = 0.1 and the L1∙∞ hybrid loss function with β = 0.002.


Fig. 9. Baboon from Data set Set14


Fig. 10. Lenna from Data set Set14


Fig. 11. Img_009 from data set Urban100


Fig. 12. Img_068 from data set Urban100


Fig. 13. Image 0001 from data set DIV2K


Fig. 14. Image 0010 from data set DIV2K

Visually, the local details produced by the EDSR and RDN methods in Fig. 9 are blurred, while the results of the CRDN method are obviously more refined. The L∞ errors of EDSR and RDN are 0.502/0.535 respectively, much greater than the 0.343 of the CRDN method.

In Fig. 10, all three methods achieve good PSNR / SSIM indexes. The PSNR / SSIM / L∞ indexes of the EDSR and RDN methods are 30.340/0.856/0.270 and 31.757/0.884/0.521 respectively. Although the PSNR / SSIM indexes of the RDN method are better than those of the EDSR method, the L∞ error of the RDN method is as high as 0.521, much higher than that of the EDSR / CRDN methods. This shows that an improvement in the PSNR / SSIM performance indexes cannot ensure a better L∞ performance index. By contrast, not only are the PSNR / SSIM performance indexes of CRDN better than those of the EDSR / RDN methods, its L∞ performance is also better than both, which shows the effectiveness of the hybrid loss function proposed in this paper.

In addition, the local details of the EDSR method in Fig. 13 are obviously deformed. The PSNR/SSIM/L∞ errors of EDSR, RDN, and CRDN are 26.867/0.778/0.429, 28.462/0.809/0.338, and 29.610/0.844/0.321 respectively. From the results, it can be seen that the L∞ error of the CRDN method is the smallest, which shows that CRDN can not only improve PSNR / SSIM performance, but also effectively reduce the L∞ error. The results show that the maximum error between the reconstructed image and the original image pixels is smaller, which also verifies the reliability of the hybrid loss function proposed in this paper.

4.5 SR for Real World Images

In the previous experiments, corresponding high-resolution images exist, which allows quantitative comparison. For certain images, however, the corresponding high-resolution image does not exist, so methods can only be compared by their visual results. Here we again compare the proposed methods to further assess the SR performance of the different methods. The data sets are from DIV2K [53] and Historical [56]. The DIV2K data set is used differently from before: in the previous section it was downsampled for training, while here the images are used directly as low-resolution inputs for testing, i.e., SR is applied to the DIV2K images to obtain higher-resolution images. The Historical data set is a set of black-and-white images with low resolution.

Since such images cannot be compared against high-resolution references, indexes such as PSNR and SSIM cannot be computed for quantitative comparison. Therefore, only the visual performance is compared here.

As shown in Fig. 15-Fig. 18, the SR performance of the BICUBIC, EDSR, RDN, and CRDN methods is compared. It is obvious that the image reconstructed by the BICUBIC method has blurred details and serious aliasing, and its performance is the worst among all compared methods. The EDSR, RDN, and CRDN methods can all restore local details, but the specific restored details differ. As shown in Fig. 17, for the outline of the whole body of the tower, the EDSR method produces ghosting and the contour reconstructed by the RDN method is fuzzy, while the contour from the CRDN method is clear and sharp. Similarly, in Fig. 18, the EDSR reconstruction of the cracks in the building's exterior wall is very fuzzy and the RDN method almost misses this detail, while the reconstruction of the CRDN method is still very clear and sharp. Similar results can also be seen in Fig. 15-Fig. 16. Therefore, in terms of visual comparison, the CRDN method is also better than the other three methods.


Fig. 15. Image 0015 from dataset DIV2K.


Fig. 16. Image 0019 from dataset DIV2K.


Fig. 17. Image img007 from dataset Historical


Fig. 18. Image img008 from dataset Historical.

5. Conclusions

We propose a higher-performance image SR network, CRDN. The network adopts our improved RSDB module: by adding a residual scaling layer to the original RDB module, experiments show that this method further improves the performance of the original RDB module. From RSDBs we build the CRSDB module based on the cascade architecture, and from CRSDB modules we build the CRDN network based on cascade connections.

Meanwhile, we found that although the traditional L1 or L2 loss function in the RDN network can obtain good PSNR / SSIM performance, it cannot effectively reduce the L∞ error: the maximum error of individual pixels remains too large for occasions with higher requirements on the L∞ error. Therefore, we proposed a hybrid loss function based on the L1 error and the L∞ error. This loss function can not only maintain the PSNR / SSIM accuracy, but also greatly reduce the L∞ error. The parameters of the loss function can be adjusted according to the user's requirements, and a recommended parameter range is given based on our experiments. Compared with the RDN method, the PSNR / SSIM of the new method are improved by 2.24% / 1.44% respectively, and the L∞ performance is improved by 3.64%. After adopting the new loss function, the L∞ performance is improved by 11.09% with only a minimal loss of 1.14% / 0.60% in PSNR / SSIM performance.

The experimental results show that CRDN can obtain not only better PSNR / SSIM performance, but also better L∞ error performance, which verifies the improved performance of the new network.

Acknowledgement

The authors would like to acknowledge the support of the National Natural Science Foundation of China (No. 62162027), the Science and Technology Project of the Jiangxi Provincial Department of Education (No. GJJ210646), and the Key R&D Projects of Jiujiang City (No. 2020069).

References

  1. Chen H, He X, Qing L, et al., "Real-world single image super-resolution: A brief review," Information Fusion, 79(3), 124-145, 2022. https://doi.org/10.1016/j.inffus.2021.09.005
  2. Hou M, He X, Dou F, et al., "Semi-supervised image super-resolution with attention CycleGAN," IET Image Processing, 16(4), 1181-1193, 2022. https://doi.org/10.1049/ipr2.12401
  3. Shah Z H, M Muller, Wang T C, et al., "Deep-learning based denoising and reconstruction of super-resolution structured illumination microscopy images," Photonics Research, 9(5), B168-B181, 2021. https://doi.org/10.1364/PRJ.416437
  4. Jiang M, Zhi M, Yang L I, et al., "Super-resolution reconstruction of MR image with self-attention based generate adversarial network algorithm," Scientia Sinica Informationis, 51(6), 959-970, 2021. https://doi.org/10.1360/SSI-2020-0100
  5. Zhang LX, Dong RM, Yuan S et al., "Making Low-Resolution Satellite Images Reborn: A Deep Learning Approach for Super-Resolution Building Extraction," Remote Sensing, 13(15), 2872, 2021.
  6. Dong C, Loy C C, He K, et al., "Image super-resolution using deep convolutional networks," IEEE transactions on pattern analysis and machine intelligence, 38(2), 295-307, 2015. https://doi.org/10.1109/TPAMI.2015.2439281
  7. Chao Dong, Chen Change Loy, Xiaoou Tang, "Accelerating the Super-Resolution Convolutional Neural Network," in Proc. of European Conference on Computer Vision (ECCV), 391-407, 2016.
  8. He K, Zhang X, Ren S, et al., "Deep Residual Learning for Image Recognition," in Proc. of the Conference on Computer Vision and Pattern Recognition (CVPR), 1-12, 2015.
  9. Kim J, Lee J K, Lee K M, "Accurate Image Super-Resolution Using Very Deep Convolutional Networks," in Proc. of the Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  10. Hai H, Li P, Zou N, et al., "End-to-End Super-Resolution for Remote-Sensing Images Using an Improved Multi-Scale Residual Network," Remote Sensing, 13(4), 666, 2021.
  11. D Qiu, L Zheng, J Zhu, et al., "Multiple improved residual networks for medical image superresolution," Future Generation Computer Systems, 116, 200-208, 2021. https://doi.org/10.1016/j.future.2020.11.001
  12. Lai W S, Huang J B, Ahuja N, et al., "Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution," in Proc. of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5835-5843, 2017.
  13. Tong T, Li G, Liu X, et al., "Image Super-Resolution Using Dense Skip Connections," in Proc. of the Conference on Computer Vision (ICCV 2017), 2017.
  14. Zha L, Yang Y, Lai Z, et al., "A Lightweight Dense Connected Approach with Attention on Single Image Super-Resolution," Electronics, 10(11), 1234, 2021.
  15. Huang G, Liu Z, Laurens V, et al., "Densely Connected Convolutional Networks," in Proc. of the Conference on Computer Vision and Pattern Recognition(CVPR), 2261-2269, 2017.
  16. Zhang Y, Tian Y, Kong Y, et al., "Residual Dense Network for Image Super-Resolution," in Proc. of the Conference on Computer Vision and Pattern Recognition(CVPR), 2018.
  17. Ahn N, Kang B, Sohn K A., "Fast, Accurate, and Lightweight Super-Resolution with Cascading Residual Network," in Proc. of 15th European Conference, Conference on European Conference on Computer Vision, Springer, Munich, Germany, 2018.
  18. Guo D, Xia Y, Xu L, et al., "Remote Sensing Image Super-resolution Using Cascade Generative Adversarial Nets," Neurocomputing, 443, 117-130, 2021. https://doi.org/10.1016/j.neucom.2021.02.026
  19. Lim B, Son S, Kim H, et al., "Enhanced Deep Residual Networks for Single Image SuperResolution," in Proc. of the Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 1134-1140, 2017.
  20. Gopalakrishnan S, Ovireddy S., "Hybridisation of single-image super-resolution with edge-aware multi-focus image fusion for edge enrichment," IET Image Processing, 14(16), 2020.
  21. Ma HU, Gong BC, Yu YZ, "Structure-aware Meta-fusion for Image Super-resolution," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 18(2), 1-25, 2022.
  22. Fang N, Zhan Z., "High-resolution optical flow and frame-recurrent network for video superresolution and deblurring," Neurocomputing, 489, 128-138, 2022. https://doi.org/10.1016/j.neucom.2022.02.067
  23. Gupta H, Mitra K., "Toward Unaligned Guided Thermal Super-Resolution," IEEE Transactions on Image Processing, 31, 2022.
  24. Yang X, Zhang M, Li W, et al., "Visible-Assisted Infrared Image Super-Resolution Based on Spatial Attention Residual Network," IEEE Geoscience and Remote Sensing Letters, 19, 2021.
  25. Vyas N, Kunne S, Fish T M, et al., "Protocol for image registration of correlative soft X-ray tomography and super-resolution structured illumination microscopy images," STAR Protocols, 2(2), 100529, 2021.
  26. Wang L, Du J, Gholipour A, et al., "3D dense convolutional neural network for fast and accurate single MR image super-resolution," Computerized Medical Imaging and Graphics, 93, 101973, 2021.
  27. Chu X, Chen L, Yu W., "NAFSSR: Stereo Image Super-Resolution Using NAFNet," Computer Science - Computer Vision and Pattern Recognition, 2022.
  28. Wu H, Gui J, Zhang J, et al., "Pyramidal Dense Attention Networks for Lightweight Image SuperResolution," in Proc. of the Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
  29. Li J, Fang F, Li J, et al., "MDCN: Multi-Scale Dense Cross Network for Image Super-Resolution," IEEE Transactions on Circuits and Systems for Video Technology, 31(7), 2547-2561, 2021. https://doi.org/10.1109/TCSVT.2020.3027732
  30. Lv X, Wang C, Fan X, et al., "A novel image super-resolution algorithm based on multi-scale dense recursive fusion network," Neurocomputing, 489, 98-111, 2022. https://doi.org/10.1016/j.neucom.2022.02.042
  31. Wang M J, Yang X, Anisetti M, et al., "Image super-resolution via enhanced multi-scale residual network," Journal of Parallel and Distributed Computing, 152, 57-66, 2021. https://doi.org/10.1016/j.jpdc.2021.02.016
  32. Yang G, Wang Y, Yi C, et al., "A New super-resolution restoration method with Generated Adversarial Network for underground video images in coal mines," Journal of Physics: Conference Series, 2031(1), 012011, 2021.
  33. Park J H, Lee J, Lee K, et al., "FBRNN: Feedback Recurrent Neural Network for Extreme Image Super-Resolution," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition Workshops, IEEE, 2020.
  34. Zhao D, Zhang F, Wang W, et al., "Medical Images Super Resolution Reconstruction based on Residual Network," in Proc. of ICCAI 2021: 7th International Conference on Computing and Artificial Intelligence, 119-126, 2021.
  35. Qiu D, Zheng L, Zhu J, et al., "Multiple improved residual networks for medical image superresolution," Future Generation Computer Systems, 116, 200-208, 2021. https://doi.org/10.1016/j.future.2020.11.001
  36. Zhu D, Qiu D, "Residual Dense Network for Medical Magnetic Resonance Images SuperResolution," Computer Methods and Programs in Biomedicine, 209, 2021.
  37. Wang M J, Yang X, Anisetti M, et al., "Image super-resolution via enhanced multi-scale residual network," Journal of Parallel and Distributed Computing, 152, 57-66, 2021. https://doi.org/10.1016/j.jpdc.2021.02.016
  38. Li Y, Cao J, Li Z, et al., "Lightweight Single Image Super-resolution with Dense Connection Distillation Network," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 17(1s), 1-17, 2021.
  39. Yu H A, Ft A, Jin J, et al., "Dense channel splitting network for MR image super-resolution," Magnetic Resonance Imaging, 88, 53-61, 2022. https://doi.org/10.1016/j.mri.2022.01.016
  40. Qiu C, Yao Y, Du Y, "Nested Dense Attention Network for Single Image Super-Resolution," in Proc. of the 2021 International Conference on Multimedia Retrieval, 250-258, 2021.
  41. Liu D, Li J, Yuan Q, "A Spectral Grouping and Attention-Driven Residual Dense Network for Hyperspectral Image Super-Resolution," IEEE Transactions on Geoscience and Remote Sensing, 59(9), 7711-7725, 2021. https://doi.org/10.1109/TGRS.2021.3049875
  42. Li G, Zhu Y, "A Novel Dual Dense Connection Network for Video Super-resolution," Computer Science - Computer Vision and Pattern Recognition, 2022.
  43. Dun Y, Da Z, Yang S, et al., "Image Super-Resolution based on Residually Dense Distilled Attention Network," Neurocomputing, 443, 47-57, 2021. https://doi.org/10.1016/j.neucom.2021.02.008
  44. Zhou Y, Zhang Y, Xie X, et al., "Image super-resolution based on dense convolutional autoencoder blocks," Neurocomputing, 423, 98-109, 2021. https://doi.org/10.1016/j.neucom.2020.09.049
  45. Lv Y, Ma H, Li J, et al., "Fusing dense and ReZero residual networks for super-resolution of retinal images," Pattern Recognition Letters, 149, 120-129, 2021. https://doi.org/10.1016/j.patrec.2021.05.019
  46. Sun L, Liu Z, Sun X, et al., "Lightweight Image Super-Resolution via Weighted Multi-Scale Residual Network," IEEE/CAA Journal of Automatica Sinica, 8(7), 1271-1280, 2021. https://doi.org/10.1109/JAS.2021.1004009
  47. Ledig C, Theis L, F Huszar, et al., "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network," in Proc. of the Conference on Computer Vision and Pattern Recognition(CVPR), 7, 105-114, 2017.
  48. Hu X, Mu H, Zhang X, et al., "Meta-SR: A Magnification-Arbitrary Network for SuperResolution," in Proc. of the Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  49. M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel, "Low-complexity single-image super-resolution based on nonnegative neighbor embedding," in Proc. of the British Machine Vision Conference (BMVC), 135.1-135.10, 2012.
  50. R. Zeyde, M. Elad, and M. Protter, "On single image scale-up using sparse-representations," in Proc. of 7th International Conference Curves Surface, 711-730, 2010.
  51. D. Martin, C. Fowlkes, D. Tal, and J. Malik, "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in Proc. of the Conference on Computer Vision (ICCV), 2001.
  52. J.-B. Huang, A. Singh, and N. Ahuja, "Single image super-resolution from transformed selfexemplars," in Proc. of the Conference on Computer Vision and Pattern Recognition (CVPR), 5197-5206, 2015.
  53. Cai J, Gu S, Timofte R, et al., "NTIRE 2019 Challenge on Real Image Super-Resolution: Methods and Results," in Proc. of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019.
  54. Huynh-Thu Q, Ghanbari M, "Scope of validity of PSNR in image/video quality assessment," Electronics Letters, 44(13), 800-801, 2008. https://doi.org/10.1049/el:20080522
  55. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing (TIP), 13(4), 600-612, 2004. https://doi.org/10.1109/TIP.2003.819861
  56. SAHEBI, "Super Resolution Dataset," [Online]. Available: https://www.kaggle.com/datasets/msahebi/super-resolution (accessed 2022-4-19).
  57. B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, "Enhanced deep residual networks for single image super-resolution," in Proc. of the Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 1- 7, 2017.