
Infrared and visible image fusion based on Laplacian pyramid and generative adversarial network

  • Wang, Juan (Hubei Key Laboratory for High-Efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology) ;
  • Ke, Cong (Hubei Key Laboratory for High-Efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology) ;
  • Wu, Minghu (Hubei Key Laboratory for High-Efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology) ;
  • Liu, Min (Hubei Key Laboratory for High-Efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology) ;
  • Zeng, Chunyan (Hubei Key Laboratory for High-Efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology)
  • Received : 2020.05.27
  • Accepted : 2021.05.11
  • Published : 2021.05.31

Abstract

An image with infrared features and visible details is obtained by fusing infrared and visible images. In this paper, a fusion method based on the Laplacian pyramid and a generative adversarial network, termed Laplacian-GAN, is proposed to obtain high-quality fused images. Firstly, the base and detail layers are obtained by decomposing the source images. Secondly, we utilize a Laplacian pyramid-based method to fuse the base layers so that more base-layer information is retained. Thirdly, the detail layers are fused by a generative adversarial network, which avoids manually designing complicated fusion rules. Finally, the fused base layer and fused detail layer are combined to reconstruct the fused image. Experimental results demonstrate that the proposed method achieves state-of-the-art fusion performance in both visual quality and objective assessment. In terms of visual observation, the fused images obtained by the Laplacian-GAN algorithm are clearer in detail. At the same time, on the six metrics MI, AG, EI, MS_SSIM, Qabf and SCD, the proposed algorithm improves by 0.62%, 7.10%, 14.53%, 12.18%, 34.33% and 12.23%, respectively, compared with the best of the other three algorithms.

1. Introduction

With advances in sensor technology, fusing images from different types of sensors has become a paramount issue in the image processing field. Image fusion aims to generate a single image containing the complete details of the same scene [1]. In recent years, the fusion of infrared and visible images has become a crucial branch of this field. It has been widely used in object detection, military actions, video surveillance [2] [3] and remote sensing [4] [5]. For example, the United Kingdom's Waterfall Solution Company developed an image fusion system applied to police helicopters [6]. It fuses infrared images with color visible images, and the resulting image has both infrared features and a natural color appearance. In addition, William from the United States proposed an image fusion algorithm based on the discrete Haar wavelet transform [7] [8], which was used in a vehicle-mounted system for night driving. Therefore, it is indispensable to further explore fusion methods for these two types of images.

 Up to now, a host of fusion methods have been presented. Multi-scale transform methods [9] were the most popular in the early years. Classic multi-scale fusion methods include the Laplacian pyramid (LP)-based method [10], the morphological pyramid (MP)-based method [11], the discrete wavelet transform (DWT)-based method [12], the dual-tree complex wavelet transform (DTCWT)-based method [13] and the non-subsampled contourlet transform (NSCT)-based method [14]. These methods obtain the fused image by decomposition, fusion, and reconstruction [15]. However, the fused images obtained by these methods lose some detail information, which degrades their visual quality. Moreover, these traditional methods require manually designed, complex fusion rules and perform poorly at feature extraction.

 Due to the advantages of deep learning in image feature extraction, fusion methods based on deep learning have also attracted great attention. Yu Liu [16] proposed a deep convolutional neural network (CNN)-based multi-focus image fusion method, the first application of CNNs to image fusion. However, this method only used the results of the last layer of the network model. Hui Li [17] presented a deep learning framework for infrared and visible image fusion. However, this method only used an averaging strategy to fuse the base layer, which leads to the loss of some details. Jiayi Ma [18] introduced a generative adversarial network (GAN) for infrared and visible image fusion. However, this method only obtains detailed information by distinguishing the generated image from the visible image, resulting in fewer infrared features in the generated image. Therefore, the network models of these deep learning fusion methods need to be improved.

 Although the above methods have achieved success in image fusion, their shortcoming is that, when retaining source information, they focus on one type of source image while ignoring the details of the other. To overcome these deficiencies, we present a fusion method based on the Laplacian pyramid and a generative adversarial network (Laplacian-GAN). Each image is decomposed into two parts through a guided filter [19]. We use a Laplacian pyramid-based fusion method to obtain the fused base layer. For the detail layer, a novel generative adversarial network is employed to obtain the fused detail layer. The fused image is then reconstructed from the fused base layer and the fused detail layer.

 The rest of this paper is organized as follows. Section 2 describes the work related to the proposed Laplacian-GAN. In Section 3, the proposed Laplacian-GAN is fully introduced. Section 4 presents the fusion results and further analysis. Finally, Section 5 summarizes the paper.

2. Related Work

The image pyramid and the generative adversarial network are closely related to the research in this paper. This section reviews these two methods.

2.1 Laplacian Pyramid

The Laplacian pyramid processes images at multiple scales and resolutions. It decomposes the edge and texture features of an image into pyramid layers of different resolutions according to scale [10]. The Laplacian pyramid is built upon the Gaussian pyramid [20].

 The Gaussian pyramid is composed of a series of images. The bottom image is the original, and each upper layer is obtained from the layer below it by low-pass filtering and subsampling; this reduction operation is represented by the REDUCE function. The bottom layer of the pyramid is G0 and the l-th layer is Gl, which is defined as follows.

\(G_{l}=\operatorname{REDUCE}\left(G_{l-1}\right)=\sum_{m=-2}^{2} \sum_{n=-2}^{2} \omega(m, n) G_{l-1}(2 i+m, 2 j+n)\)       (1)

\(\omega=\frac{1}{256} *\left[\begin{array}{ccccc} 1 & 4 & 6 & 4 & 1 \\ 4 & 16 & 24 & 16 & 4 \\ 6 & 24 & 36 & 24 & 6 \\ 4 & 16 & 24 & 16 & 4 \\ 1 & 4 & 6 & 4 & 1 \end{array}\right]\)       (2)

 where ω is a 5 × 5 low-pass filter used when subsampling, and i and j index the rows and columns of the image. Convolving ω with the layer l−1 image and subsampling the result yields the layer l image.

 The Laplacian pyramid consists of a series of error images L0 , L1 , ⋅⋅⋅, LN , each of which is obtained by subtracting two adjacent layers of the Gaussian pyramid. Since each layer of the Gaussian pyramid is obtained from the layer below by subsampling, adjacent layers do not have the same resolution. The EXPAND function is therefore introduced: the smaller layer is up-sampled and interpolated so that it has the same resolution as the layer below it (layer l−1). The formula is defined as follows.

\(G_{l}^{*}=\operatorname{EXPAND}\left(G_{l}\right)=4 \sum_{m=-2}^{2} \sum_{n=-2}^{2} \omega(m, n) G_{l}\left(\frac{i+m}{2}, \frac{j+n}{2}\right)\)       (3)

\(G_{l}\left(\frac{i+m}{2}, \frac{j+n}{2}\right)\) has a corresponding value when (i+m)/2 and (j+n)/2 are integers, otherwise it is 0. The image of each layer of the Laplacian pyramid can be expressed by the following formula:

\(L_{l}=G_{l}-E X P A N D\left(G_{l+1}\right)\)       (4)

 where l represents the level of the pyramid, ranging from 0 to N, and we set \(L_{N}=G_{N}\). Fig. 1 shows the decomposition of the Laplacian pyramid.


Fig. 1. Decomposition of the Laplacian pyramid
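As an illustration of (1)–(4), the following minimal numpy/scipy sketch builds a four-level Laplacian pyramid and reconstructs the image. It is not the authors' code; the symmetric boundary handling is an assumption.

```python
import numpy as np
from scipy.signal import convolve2d

# 5x5 Gaussian kernel of Eq. (2)
W = np.outer([1, 4, 6, 4, 1], [1, 4, 6, 4, 1]) / 256.0

def reduce_(img):
    """REDUCE of Eq. (1): low-pass filter, then drop every other row/column."""
    return convolve2d(img, W, mode="same", boundary="symm")[::2, ::2]

def expand_(img, shape):
    """EXPAND of Eq. (3): zero-insert to the target shape, then filter with 4*W."""
    up = np.zeros(shape)
    up[::2, ::2] = img
    return convolve2d(up, 4.0 * W, mode="same", boundary="symm")

def laplacian_pyramid(img, levels=4):
    """Decompose an image into `levels` layers; the top layer is L_N = G_N (Eq. (4))."""
    gauss = [img.astype(np.float64)]
    for _ in range(levels - 1):
        gauss.append(reduce_(gauss[-1]))
    lap = [gauss[l] - expand_(gauss[l + 1], gauss[l].shape) for l in range(levels - 1)]
    lap.append(gauss[-1])
    return lap

def reconstruct(lap):
    """Invert the decomposition: G_l = L_l + EXPAND(G_{l+1})."""
    img = lap[-1]
    for l in range(len(lap) - 2, -1, -1):
        img = lap[l] + expand_(img, lap[l].shape)
    return img
```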

2.2 Generative Adversarial Network

A generative adversarial network [21] uses an adversarial model to estimate a sample distribution and generate new data. The generator and the discriminator form an adversarial pair, represented by G and D, respectively. The generator G generates data similar to the sample data by learning the distribution of the sample data. The task of the discriminator D is to estimate the probability that a sample comes from the real data rather than from the generated data: a large output of D indicates real data, while a small output indicates generated data. The schematic diagram of the generative adversarial network is shown in Fig. 2.


Fig. 2. Schematic diagram of generative adversarial network

 The loss function of a generative adversarial network is an important part used to judge how well the model is trained. Its purpose is to drive the data generated by the generator closer to the real data, so that the difference between the two keeps shrinking and the discriminator can no longer distinguish them. The loss function of the generative adversarial network is expressed as follows.

\(\min _{G} \max _{D} V_{G A N}(G, D)=E_{x \sim P_{data}(x)}[\log D(x)]+E_{z \sim P_{z}(z)}[\log (1-D(G(z)))]\)       (5)

 where x represents the input sample and z denotes the noise input to G. Pdata(x) represents the real data distribution and Pz(z) denotes the noise distribution. D(x) represents the probability assigned by the discriminator that the real sample x is real, and D(G(z)) denotes the probability that the generated data is judged to be real.
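 In practice, (5) is optimized by alternating a discriminator step and a generator step. The PyTorch sketch below is illustrative only; it shows the general GAN objective, not the least-squares losses used later in Section 3.3.2.

```python
import torch
import torch.nn.functional as F

def discriminator_step(d_real, d_fake):
    """Maximize V(G, D): push D(x) toward 1 and D(G(z)) toward 0.
    d_real and d_fake are the discriminator's sigmoid outputs."""
    return (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
            + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))

def generator_step(d_fake):
    """Minimize log(1 - D(G(z))); the common non-saturating variant
    instead maximizes log D(G(z)), which is what this loss computes."""
    return F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
```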

3. Proposed Fusion Method

3.1 Image Decomposition

Each image can be decomposed into two parts: the base layer and the detail layer. The base layer describes the changes in the intensity values of an image, while the detail layer reflects the changes in its fine details. In this paper, we choose two pre-registered images as input images, represented by \(I_{k}, k \in\{1,2\}\).

 Decomposition methods such as wavelet decomposition have achieved good results on images, but compared with those methods, the guided filter [19] is more efficient and time-saving. It is therefore used to decompose each input image into two components. For each input image Ik, the decomposed base layer and detail layer are represented by \(I_{k}^{b}\) and \(I_{k}^{d}\), respectively. The base layer is obtained by solving the optimization problem in (6).

\(I_{k}^{b}=\arg \min _{I_{k}^{b}}\left\|I_{k}-I_{k}^{b}\right\|_{F}^{2}+\lambda\left(\left\|g_{x} * I_{k}^{b}\right\|_{F}^{2}+\left\|g_{y} * I_{k}^{b}\right\|_{F}^{2}\right)\)       (6)

where gx = [−1 1] represents the horizontal gradient operator and gy = [−1 1]T denotes the vertical gradient operator. In the above formula, the parameter λ is set to 5. The detail layer is obtained by subtracting the base layer from the source image. The equation is as follows.

\(I_{k}^{d}=I_{k}-I_{k}^{b}\)       (7)
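The paper does not spell out how (6) is solved; since the objective is a quadratic smoothing problem, one common route (a sketch only, assuming periodic boundary conditions) is a closed-form solution in the Fourier domain, followed by (7) for the detail layer.

```python
import numpy as np

def base_detail_decompose(img, lam=5.0):
    """Approximate solution of Eq. (6) in the Fourier domain, then Eq. (7).
    Periodic boundaries are an assumed simplification."""
    img = img.astype(np.float64)
    h, w = img.shape
    # Frequency responses of the gradient filters g_x = [-1, 1] and g_y = [-1, 1]^T
    gx = np.zeros((h, w)); gx[0, 0], gx[0, 1] = -1.0, 1.0
    gy = np.zeros((h, w)); gy[0, 0], gy[1, 0] = -1.0, 1.0
    denom = 1.0 + lam * (np.abs(np.fft.fft2(gx)) ** 2 + np.abs(np.fft.fft2(gy)) ** 2)
    base = np.real(np.fft.ifft2(np.fft.fft2(img) / denom))   # Eq. (6)
    detail = img - base                                       # Eq. (7)
    return base, detail
```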

 The framework of the proposed method is shown in Fig. 3. Firstly, the base layers and detail layers of the input images are obtained by (6) and (7). Then the fused base layer is obtained by the Laplacian pyramid transform, and the detail layers are fused by a novel generative adversarial network. Finally, the two parts are combined to reconstruct the fused image.


Fig. 3. Framework of the method

3.2 Fusion of Base Layer

Conventionally, most multi-scale fusion methods adopt an averaging strategy for the low-frequency information in images. However, the averaging strategy has defects and cannot preserve the low-frequency information of the source images well [22]. The image pyramid fusion method decomposes the input image into different frequency bands and designs a corresponding fusion rule for each layer to obtain the fused pyramid layers. This approach can effectively retain the intensity information of the base part. Therefore, in this paper we utilize the Laplacian pyramid-based method to fuse the base layers. The procedure is shown in Fig. 4.


Fig. 4. The procedure of base layer fusion

 \(I_{1}^{b}\) and \(I_{2}^{b}\) are each decomposed into four levels [23] through the Laplacian pyramid. \(L\left\{I_{1}^{b}\right\}^{l}\) and \(L\left\{I_{2}^{b}\right\}^{l}\) represent the l-th pyramid layer of \(I_{1}^{b}\) and \(I_{2}^{b}\), respectively. First of all, for the fusion of the top-level images \(L\left\{I_{1}^{b}\right\}^{4}\) and \(L\left\{I_{2}^{b}\right\}^{4}\), it is necessary to calculate the average gradient over an m × n region (m ≥ 3, n ≥ 3, with m and n both odd) centered on each pixel of the image. The calculation formula is as follows.

\(G=\frac{1}{(m-1)(n-1)} \sum_{i=1}^{m-1} \sum_{j=1}^{n-1} \sqrt{\left(\Delta I_{x}^{2}+\Delta I_{y}^{2}\right) / 2}\)       (8)

 where ∆Ix and ∆Iy represent the first-order difference of pixel point f (x,y) on the x-axis and y-axis, and the formula is defined as follows.

\(\Delta I_{x}=f(x, y)-f(x-1, y)\)       (9)

\(\Delta I_{y}=f(x, y)-f(x, y-1)\)       (10)

 The regional average gradients of each pixel in the top-level images \(L\left\{I_{1}^{b}\right\}^{4}\) and \(L\left\{I_{2}^{b}\right\}^{4}\) are denoted by G1 and G2, respectively. Since the average gradient reflects the texture changes and the richness of detail information of the image, the top-level image fusion can be expressed as follows.

\(L\left\{F_{b}\right\}^{4}(i, j)= \begin{cases}L\left\{I_{1}^{b}\right\}^{4}(i, j) & G_{1}(i, j) \geq G_{2}(i, j) \\ L\left\{I_{2}^{b}\right\}^{4}(i, j) & G_{1}(i, j)<G_{2}(i, j)\end{cases}\)       (11)
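A numpy sketch of this top-level rule is given below. The window size m = n = 3, the use of np.roll for the first-order differences and the symmetric boundary handling are implementation assumptions, so the windowed average only approximates Eq. (8).

```python
import numpy as np
from scipy.signal import convolve2d

def average_gradient_map(img, m=3, n=3):
    """Regional average gradient of Eq. (8) over an m x n window (approximation)."""
    dx = img - np.roll(img, 1, axis=1)            # Eq. (9): first-order difference in x
    dy = img - np.roll(img, 1, axis=0)            # Eq. (10): first-order difference in y
    g = np.sqrt((dx ** 2 + dy ** 2) / 2.0)
    win = np.ones((m, n)) / ((m - 1) * (n - 1))   # normalization of Eq. (8)
    return convolve2d(g, win, mode="same", boundary="symm")

def fuse_top_level(top1, top2):
    """Eq. (11): keep the pixel whose regional average gradient is larger."""
    g1, g2 = average_gradient_map(top1), average_gradient_map(top2)
    return np.where(g1 >= g2, top1, top2)
```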

 Next is the fusion of other layers of images. When 0<l<4, the regional energy of these layers needs to be calculated, and the calculation formula is as follows.

\(A R E(i, j)=\sum_{-p}^{p} \sum_{-q}^{q} \varpi(p, q)\left|L\left\{I_{1}^{b}\right\}^{l}(i+p, j+q)\right|\)       (12)

\(\operatorname{BRE}(i, j)=\sum_{-p}^{p} \sum_{-q}^{q} \varpi(p, q)\left|L\left\{I_{2}^{b}\right\}^{l}(i+p, j+q)\right|\)       (13)

 where ARE and BRE represent the regional energy of the two source images in the corresponding pyramid layer, respectively. p and q are set to 1, and ϖ is a 3 × 3 matrix, represented as follows.

\(\varpi=\frac{1}{16}\left[\begin{array}{lll} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{array}\right]\)       (14)

When 0<l<4, the fusion result of the Laplacian image pyramid of the L layer is shown as follows.

\(L\left\{F_{b}\right\}^{l}(i, j)= \begin{cases}L\left\{I_{1}^{b}\right\}^{l}(i, j) & \operatorname{ARE}(i, j) \geq B R E(i, j) \\ L\left\{I_{2}^{b}\right\}^{l}(i, j) & \operatorname{ARE}(i, j)<B R E(i, j)\end{cases}\)       (15)

 The above formulas yield the fused image of each layer of the Laplacian pyramid, namely, \(L\left\{F_{b}\right\}^{1}, L\left\{F_{b}\right\}^{2}, L\left\{F_{b}\right\}^{3}\) and \(L\left\{F_{b}\right\}^{4}\). Then, the fused Laplacian pyramid layers are reconstructed to obtain the corresponding Gaussian pyramid, and the final fused base layer G1 is obtained. The reconstruction formula is shown below.

\(\left\{\begin{array}{c} G_{4}=L\left\{F_{b}\right\}^{4} \\ G_{l}=L\left\{F_{b}\right\}^{l}+\operatorname{EXPAND}\left(G_{l+1}\right) \end{array}\right.\)       (16)
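Combining the rules in (11)–(16), the complete base-layer fusion can be sketched as follows. This sketch reuses the expand_ and fuse_top_level helpers from the earlier sketches and assumes the 3 × 3 window with p = q = 1 stated above; it is not the authors' implementation.

```python
import numpy as np
from scipy.signal import convolve2d

# 3x3 weighting matrix of Eq. (14)
V = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0

def regional_energy(layer):
    """Eqs. (12)/(13): weighted sum of absolute coefficients in a 3x3 window."""
    return convolve2d(np.abs(layer), V, mode="same", boundary="symm")

def fuse_base_layers(lap1, lap2):
    """Fuse two 4-level Laplacian pyramids (lists, index 0 = bottom, index 3 = top)."""
    fused = [None] * 4
    fused[3] = fuse_top_level(lap1[3], lap2[3])        # Eq. (11) for the top level
    for l in range(3):                                  # Eq. (15) for the lower levels
        are, bre = regional_energy(lap1[l]), regional_energy(lap2[l])
        fused[l] = np.where(are >= bre, lap1[l], lap2[l])
    g = fused[3]                                        # Eq. (16): top-down reconstruction
    for l in range(2, -1, -1):
        g = fused[l] + expand_(g, fused[l].shape)
    return g
```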

3.3 Fusion of Detail Layer

For the detail layers \(I_{1}^{d}\) and \(I_{2}^{d}\), a fusion strategy based on a generative adversarial network is presented in this paper. Because the detail layer contains the changes in the detail information of the image, the generative adversarial network can better retain the detail information of the source images. The framework is shown in Fig. 5.


Fig. 5. Framework of detail layer fusion

 In Fig. 5, we employ the generator to generate a fused image. The discriminator is then employed to distinguish the generated image from the two detail-layer images, so that a well-fused detail image is obtained.

3.3.1 Network Structure of the GAN

The network structure of GAN in this paper includes two parts: generator model G and discriminator model D. To ensure that the image information is not subject to large loss in network propagation, both the generator and the discriminator use a convolutional network.

 The generator network structure is shown in Fig. 6. Its purpose is to extract more detailed information from the source images and generate a fused image with rich details. As shown, G is a five-layer convolutional network, where the first two layers use 5 × 5 filters, the middle two layers use 3 × 3 filters, the last layer uses a 1 × 1 filter, and the stride of each layer is set to 1. Besides, to alleviate problems such as vanishing gradients, we use BatchNorm [24] to normalize the data at each layer and then employ the LeakyReLU [25] activation function to increase the non-linearity of the network.


Fig. 6. Generator network structure
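The following PyTorch sketch shows one way to realize this five-layer generator. The kernel sizes (5, 5, 3, 3, 1) and stride 1 follow the text; the channel widths, the 2-channel concatenated input and the final tanh are assumptions not fixed by the paper. With unpadded convolutions, a 132 × 132 input shrinks to 120 × 120, which is consistent with the patch sizes in Section 3.3.3.

```python
import torch.nn as nn

class Generator(nn.Module):
    """Sketch of the five-layer generator; widths and output activation are assumptions."""
    def __init__(self, in_channels=2, widths=(256, 128, 64, 32)):
        super().__init__()
        def block(cin, cout, k):
            # unpadded convolution, stride 1, then BatchNorm and LeakyReLU
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=k, stride=1, padding=0),
                nn.BatchNorm2d(cout),
                nn.LeakyReLU(0.2, inplace=True),
            )
        self.net = nn.Sequential(
            block(in_channels, widths[0], 5),                  # layer 1: 5x5
            block(widths[0], widths[1], 5),                    # layer 2: 5x5
            block(widths[1], widths[2], 3),                    # layer 3: 3x3
            block(widths[2], widths[3], 3),                    # layer 4: 3x3
            nn.Conv2d(widths[3], 1, kernel_size=1, stride=1),  # layer 5: 1x1 -> fused detail map
            nn.Tanh(),                                         # assumed output activation
        )

    def forward(self, x):   # x: (B, 2, 132, 132) padded detail patches -> (B, 1, 120, 120)
        return self.net(x)
```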

 The LeakyReLU activation function is a variant of the ReLU activation function. ReLU is the rectified linear unit used in neural networks: when its input x is greater than 0, its gradient is nonzero and can be used to update the weights; when x is less than 0, its gradient is 0 and the weights cannot be updated. The advantage of LeakyReLU is that when x is less than zero it keeps a small constant slope, so the information is not completely lost and a small gradient remains for updating the weights. Schematic diagrams of the two functions are shown in Fig. 7.

Fig. 7. Schematic diagram of the two activation functions

The discriminator network structure is shown in Fig. 8. Its purpose is to correctly distinguish between the generated image and the source images. As shown in Fig. 8, the discriminator D is an eight-layer convolutional network in which each layer uses 3 × 3 filters. The strides of the second to fourth layers are set to 2, and those of the remaining layers are set to 1.


Fig. 8. Discriminator network structure

To avoid introducing noise, the pooling layers are replaced by convolutional layers with stride set to 2, which improves the classification performance of the discriminator. Similar to the generator, the first to seventh layers employ BatchNorm to normalize the data and use LeakyReLU as the activation function, and the last layer is a linear layer for classification.
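A possible PyTorch sketch of this discriminator is given below. The stride pattern (1, 2, 2, 2, 1, 1, 1) is one plausible reading of the description above; the channel widths, the 120 × 120 input size and the single-value output (no sigmoid) are assumptions.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Sketch: seven 3x3 conv layers followed by a linear classification layer."""
    def __init__(self, in_channels=1, widths=(32, 64, 128, 256, 256, 256, 256)):
        super().__init__()
        strides = (1, 2, 2, 2, 1, 1, 1)
        layers, cin = [], in_channels
        for cout, s in zip(widths, strides):
            layers += [
                nn.Conv2d(cin, cout, kernel_size=3, stride=s, padding=1),  # stride-2 convs replace pooling
                nn.BatchNorm2d(cout),
                nn.LeakyReLU(0.2, inplace=True),
            ]
            cin = cout
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Linear(widths[-1] * 15 * 15, 1)  # 120 / 2 / 2 / 2 = 15

    def forward(self, x):                      # x: (B, 1, 120, 120) detail image
        f = self.features(x)
        return self.classifier(torch.flatten(f, 1))
```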

3.3.2 Loss Function

In this paper, the loss function of the GAN is composed of two parts, namely LG and LD. The goal is to minimize the loss functions so as to obtain the best training model. Next, we introduce the total loss function LGAN, the generator loss function LG and the discriminator loss function LD, respectively. First, the total loss function of the GAN is shown as follows.

\(L_{\mathrm{GAN}}=\left\{\min \left(L_{\mathrm{G}}\right), \min \left(L_{\mathrm{D}}\right)\right\}\)       (17)

 where LGAN denotes the total loss, LG denotes the generator loss and LD denotes the discriminator loss. The generator loss function LG is defined as follows.

\(L_{\mathrm{G}}=V+\alpha L_{\text {content }}\)       (18)

where the parameter α is used to balance V and Lcontent. V represents the adversarial loss between the generator and the discriminator, as shown in (19), and the second term Lcontent denotes the content loss of image details during the generation process, as shown in (20).

\(V=\frac{1}{N} \sum_{n=1}^{N}\left(D\left(F_{d}^{n}\right)-a\right)^{2}\)       (19)

\(L_{\text {content }}=\frac{1}{H W}\left[\beta\left(\left\|\nabla F_{d}-\nabla I_{1}^{d}\right\|_{F}^{2}+\gamma\left\|\nabla F_{d}-\nabla I_{2}^{d}\right\|_{F}^{2}\right)\right]\)       (20)

 where N denotes the number of fused images, Fd denotes the fused image, and \(D\left(F_{d}^{n}\right)\) represents the classification result. \(I_{1}^{d}\) and \(I_{2}^{d}\) represent the detail layers of the two input source images, respectively. a is the label value for the data generated by the generator. In (20), H denotes the height of \(I_{1}^{d}\) and \(I_{2}^{d}\), W denotes their width, ∇ denotes the gradient operator [26], and \(\|.\|\)F represents the matrix Frobenius norm. Lcontent aims to retain more gradient detail information from both \(I_{1}^{d}\) and \(I_{2}^{d}\). The parameters β and γ are designed to control the balance between them. To obtain a better fused image, the discriminator is introduced to distinguish the fused image. Specifically, the discriminator loss function LD is shown as follows.

\(L_{D}=\frac{1}{N} \sum_{n=1}^{N}\left(D\left(I_{1}^{d}\right)-b\right)^{2}+\frac{1}{N} \sum_{n=1}^{N}\left(D\left(I_{2}^{d}\right)-c\right)^{2}+\frac{1}{N} \sum_{n=1}^{N}\left(D\left(F_{d}^{n}\right)-d\right)^{2}\)  (21)

where b and c denote the labels of the detail layers \(I_{1}^{d}\) and \(I_{2}^{d}\), with values ranging from 0.4 to 0.7 and from 0.8 to 1.1, respectively. d denotes the label of the fused detail layer, whose value ranges from 0 to 0.3. \(D\left(I_{1}^{d}\right), D\left(I_{2}^{d}\right)\) and \(D\left(F_{d}^{n}\right)\) represent the discriminator's outputs for \(I_{1}^{d}\), \(I_{2}^{d}\) and Fd, respectively.
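The losses in (18)–(21) can be sketched in PyTorch as follows. The forward-difference gradient operator, the values of a, α, β, γ and the particular choices of b, c, d within the stated ranges are illustrative assumptions; the batch mean replaces the explicit 1/(HW) factor.

```python
import torch
import torch.nn.functional as F

def gradients(img):
    """Forward differences as an assumed realization of the gradient operator."""
    dx = img[..., :, 1:] - img[..., :, :-1]
    dy = img[..., 1:, :] - img[..., :-1, :]
    return dx, dy

def generator_loss(d_fused, fused, det1, det2, a=1.0, alpha=100.0, beta=1.0, gamma=1.0):
    """L_G of Eqs. (18)-(20); a, alpha, beta, gamma are illustrative values."""
    v = torch.mean((d_fused - a) ** 2)                                   # Eq. (19)
    fx, fy = gradients(fused)
    x1, y1 = gradients(det1)
    x2, y2 = gradients(det2)
    content = beta * ((F.mse_loss(fx, x1) + F.mse_loss(fy, y1))          # Eq. (20)
                      + gamma * (F.mse_loss(fx, x2) + F.mse_loss(fy, y2)))
    return v + alpha * content                                           # Eq. (18)

def discriminator_loss(d_det1, d_det2, d_fused, b=0.6, c=1.0, d=0.1):
    """L_D of Eq. (21); b, c, d are soft labels drawn from the stated ranges."""
    return (torch.mean((d_det1 - b) ** 2)
            + torch.mean((d_det2 - c) ** 2)
            + torch.mean((d_fused - d) ** 2))
```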

3.3.3 Training Details

41 image pairs were selected from the TNO database as training data for this experiment. Although 41 pairs of images are not adequate for GAN training, we crop the source images to increase the diversity of the training data: a sliding window with size 120 × 120 and stride 14 is used to crop the detail-layer images. Thus, 58,658 pairs of infrared and visible patches were obtained. Each patch was padded to 132 × 132 and then fed into the generator, which outputs a fused detail image of size 120 × 120. The generated fused image is used as the input of the discriminator, and Adam [27] is utilized to optimize the network until the maximum number of training iterations is reached.
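A numpy sketch of this cropping step is shown below. The reflection padding mode and the stacking of the two detail patches into a 2-channel generator input are assumptions; the paper only fixes the patch size, stride and padded size.

```python
import numpy as np

def crop_patches(ir_detail, vis_detail, size=120, stride=14, pad=6):
    """Sliding-window cropping: 120x120 patches with stride 14, padded to 132x132."""
    patches = []
    h, w = ir_detail.shape
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            ir = ir_detail[y:y + size, x:x + size]
            vi = vis_detail[y:y + size, x:x + size]
            ir = np.pad(ir, pad, mode="reflect")          # 120 + 2*6 = 132
            vi = np.pad(vi, pad, mode="reflect")
            patches.append(np.stack([ir, vi], axis=0))    # 2-channel generator input
    return np.asarray(patches)
```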

3.3.4 Reconstruction

After obtaining the fused detail layer Fd, the fused image F is obtained by using Fb and Fd. The formula is as below.

\(F=F_{b}+F_{d}\)       (22)

4. Experimental Results and Analysis

4.1 Experimental Settings

To verify the performance of the proposed Laplacian-GAN, we compare it with three fusion methods: FusionGAN [18], the deep learning framework (DLF) [17] and the convolutional neural network (CNN) method [16]. To express the effectiveness of the Laplacian-GAN method more intuitively, we use six fusion evaluation metrics. The server configuration for this experiment is as follows: Ubuntu Server 18.04, an Intel Xeon(R) Gold 5120 CPU and an Nvidia Tesla P100 GPU.

4.2 Fusion Metrics

Since qualitative evaluation can hardly assess a fused image accurately, we also need metrics to evaluate fused images quantitatively. In recent years, various evaluation metrics have been proposed, but different metrics reflect the performance of fused images in different respects, so multiple metrics need to be selected. Six metrics are therefore used to quantify the performance of the fused images, i.e., mutual information (MI) [28], average gradient (AG) [29], edge intensity (EI) [30], multi-scale structural similarity (MS_SSIM) [31], quality of edge (Qabf) [32], and the sum of the correlations of differences (SCD) [33]. These metrics are defined as follows, and a code sketch for several of them is given at the end of this subsection.

(1) MI measures how much information from the input images is preserved in the fused image. The MI formula is as follows.

\(M_{A, B}=\sum_{a, b} p_{A, B}(a, b) \log \frac{p_{A, B}(a, b)}{p_{A}(a) p_{B}(b)}\)       (23)

\(M I=M_{A, H}+M_{B, H}\)       (24)

 where pA(a) denotes the normalized histogram of image A and pB(b) denotes the normalized histogram of image B. pA,B represents the joint normalized histogram of the two images. MA,H and MB,H represent the information preserved by the fused image H from the two input images, respectively.

(2) AG is an evaluation metric that represents the fused image gradient information. Its value denotes the details and texture of the fused image. The calculation formula is as follows:

\(A G=\frac{1}{M N} \sum_{i=1}^{M} \sum_{j=1}^{N} \sqrt{\frac{[H(i, j)-H(i+1, j)]^{2}+[H(i, j)-H(i, j+1)]^{2}}{2}}\)       (25)

 where M and N represent the height and width of the fused image. H(i,j) denotes the value of the fused image at (i,j).

(3) EI measures the edge intensity of the fused image and is computed as follows:

\(E I=\sqrt{\nabla x f(i, j)^{2}+\nabla y f(i, j)^{2}}\)       (26)

 where ∇x and ∇y represent the first-order differences of the fused image in the x and y directions at (i,j), respectively, and f(i,j) represents the value of the fused image at (i,j). These are defined as follows.

\(\nabla x=f(i, j)-f(i-1, j)\)       (27)

\(\nabla y=f(i, j)-f(i, j-1)\)       (28)

(4) MS_SSIM represents the structural similarity of two images at multiple scales.

\(M S\_\operatorname{SSIM}(Z, K)=\left[l_{M}(Z, K)\right]^{\alpha_{M}} \prod_{i=1}^{M}\left[s_{i}(Z, K)\right]^{\beta_{i}}\left[z_{i}(Z, K)\right]^{\gamma_{i}}\)       (29)

 where M is the highest scale selected for the reference image, and l(Z,K), s(Z,K) and z(Z,K) denote the brightness similarity, contrast similarity and structure similarity between images Z and K, respectively. α, β and γ are the weights of the three terms and are generally set to 1.

\(l(Z, K)=\frac{2 \mu_{Z} \mu_{K}+C_{1}}{\mu_{Z}^{2}+\mu_{K}^{2}+C_{1}}\)       (30)

\(s(Z, K)=\frac{2 \sigma_{Z} \sigma_{K}+C_{2}}{\sigma_{Z}^{2}+\sigma_{K}^{2}+C_{2}}\)       (31)

\(z(Z, K)=\frac{2 \sigma_{Z K}+C_{3}}{\sigma_{Z} \sigma_{K}+C_{3}}\)       (32)

 where µZ and µK , σ Z and σ K are the mean and standard deviation of image Z and K, respectively. σZK is the gray covariance between image Z and K. C1, C2 and C3 are all constants.

(5) Qabf is to calculate the edge information transferred from the source images to the fused image to evaluate the quality of the fused image. The calculation is as follows:

\(Q_{a b f}=\frac{\sum_{i=1}^{M} \sum_{j=1}^{N} Q_{A F}(i, j) w_{A}(i, j)+Q_{B F}(i, j) w_{B}(i, j)}{\sum_{i=1}^{M} \sum_{j=1}^{N} w_{A}(i, j)+w_{B}(i, j)}\)       (33)

 where wA and wB are weighting coefficients for calculating the edge intensity value of the source image using Sobel operator. QAF and QBF represent the edge information preserved by the fused image F from image A and image B, respectively.

\(Q_{A F}(i, j)=Q_{g}(i, j) Q_{\alpha}(i, j)\)       (34)

 where Qg and Qα are edge strength and direction similarity calculated by Sobel operator.

(6) SCD is the sum of the differential relations between the source images and the fused image, and its computational formula is defined as follows.

\(L_{1}=F-I_{2}\)       (35)

\(L_{2}=F-I_{1}\)       (36)

\(S C D=r\left(L_{1}, I_{1}\right)+r\left(L_{2}, I_{2}\right)\)       (37)

 where I1 , I2 and F denote the two source images and the fused image, respectively. The formula of r(,) function is defined as follows.

\(r\left(L_{k}, I_{k}\right)=\frac{\sum_{i} \sum_{j}\left(L_{k}(i, j)-\bar{L}_{k}\right)\left(I_{k}(i, j)-\bar{I}_{k}\right)}{\sqrt{\left(\sum_{i} \sum_{j}\left(L_{k}(i, j)-\bar{L}_{k}\right)^{2}\right)\left(\sum_{i} \sum_{j}\left(I_{k}(i, j)-\bar{I}_{k}\right)^{2}\right)}}\)       (38)

 where \(\bar{L}_{k}\) and \(\bar{I}_{k}\) represent the average pixel values of Lk and Ik , k ∈{1,2} .
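 As a concrete illustration of several of these metrics, MI, AG, EI and SCD may be computed as in the numpy sketch below. This is not the evaluation code used in the experiments; the histogram settings, the natural logarithm, and the simplified boundary handling are assumptions.

```python
import numpy as np

def mutual_information(a, b, bins=256):
    """M_{A,B} of Eq. (23); MI of Eq. (24) is mutual_information(A, H) + mutual_information(B, H)."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)           # marginal of the first image
    p_b = p_ab.sum(axis=0, keepdims=True)           # marginal of the second image
    nz = p_ab > 0                                    # avoid log(0)
    return np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz]))

def average_gradient(fused):
    """AG of Eq. (25), using forward differences cropped to a common grid."""
    f = fused.astype(np.float64)
    dx = f[:, :-1] - f[:, 1:]                        # H(i, j) - H(i, j+1)
    dy = f[:-1, :] - f[1:, :]                        # H(i, j) - H(i+1, j)
    return np.mean(np.sqrt((dx[:-1, :] ** 2 + dy[:, :-1] ** 2) / 2.0))

def edge_intensity_map(fused):
    """Per-pixel EI of Eqs. (26)-(28); the wrap-around at the borders from np.roll is a simplification."""
    f = fused.astype(np.float64)
    gx = f - np.roll(f, 1, axis=0)                   # Eq. (27): f(i, j) - f(i-1, j)
    gy = f - np.roll(f, 1, axis=1)                   # Eq. (28): f(i, j) - f(i, j-1)
    return np.sqrt(gx ** 2 + gy ** 2)

def scd(fused, src1, src2):
    """SCD of Eqs. (35)-(38)."""
    def corr(x, y):                                  # Eq. (38)
        x, y = x - x.mean(), y - y.mean()
        return np.sum(x * y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
    return corr(fused - src2, src1) + corr(fused - src1, src2)   # Eqs. (35)-(37)
```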

4.3 Qualitative Evaluation

The eight different fused images obtained by the three existing neural network methods and Laplacian-GAN are shown in Fig. 9.


Fig. 9. Infrared and visible image fusion based on four typical methods. From top to bottom: infrared images, visible images, results of CNN, DLF, FusionGAN and Laplacian-GAN method.

 As shown in Fig. 9, the fused images obtained by CNN have more visual artifacts and artificial noise. This method causes serious loss of image background information, so it is not suitable for fusing two different types of images. In contrast, the fused images obtained by DLF and FusionGAN look more natural, but some infrared details are not extracted. Thus, compared with the above methods, the fused images of our proposed method clearly keep more infrared features, and their increased brightness is more suitable for the human visual system.

4.4 Quantitative Evaluation

Since qualitative evaluation largely depends on subjective visual judgment, it is not accurate. Therefore, we also use quantitative metrics to verify the comprehensive performance of the Laplacian-GAN method. These metrics include MI, AG, EI, MS_SSIM, Qabf and SCD. The average values of these metrics obtained by the four deep learning methods over the eight fused images are shown in Fig. 10.


Fig. 10. Quantitative comparisons of four methods on six metrics

 Among the six metrics, MI, AG, EI and Qabf are used to measure the information richness of the fused image, though they differ from one another. MI measures the dependence between two variables; in other words, the larger its value, the more information about the source images the fused image retains. AG, EI and Qabf represent the texture variation, edge information intensity and edge information transfer in the image, respectively. As shown in Fig. 10, Laplacian-GAN achieves superior performance on MI, AG, EI and Qabf. Compared with the CNN and DLF fusion methods, it improves by 0.62%, 7.10%, 14.46% and 34.33%, respectively. This indicates that the fused images obtained by our method contain richer information. By capturing blur across multiple scales, MS_SSIM reflects the perception of the human visual system, taking into account the influence of sampling rate, observation distance and other factors on the quality evaluation of the fused image. In terms of the MS_SSIM metric, our method improves by 12.18% over the better-performing DLF method. SCD computes the sum of all relevant information shared between the fused image and the source images, and the results show that our method is superior to the other methods on SCD. In summary, our method retains more detail information and extracts obvious infrared features, which makes the results easier to observe and to apply in the field of target recognition.

5. Conclusion

In this paper, we introduce a novel fusion method based on the Laplacian pyramid and a generative adversarial network. The proposed Laplacian-GAN avoids designing complicated fusion rules manually. Compared with the other three methods, the performance of the fused images obtained by the Laplacian-GAN method is greatly improved. The quantitative comparisons with the three fusion methods demonstrate that the Laplacian-GAN method preserves abundant detail information and also produces better visual effects. In the future, we believe that Laplacian-GAN can be applied not only to infrared and visible image fusion but also to other fusion problems.

References

  1. Goshtasby A A, Nikolov S, "Image fusion: Advances in the state of the art," Information Fusion, vol.8, no.2, pp.114-118, 2007. https://doi.org/10.1016/j.inffus.2006.04.001
  2. Wu M, Li X, Liu C, et al., "Robust global motion estimation for video security based on improved k-means clustering," Journal of Ambient Intelligence and Humanized Computing, vol.10, no.2, pp.439-448, 2019. https://doi.org/10.1007/s12652-017-0660-8
  3. Yeh C H, Lin C H, Lin M H, et al., "Deep learning-based compressed image artifacts reduction based on multi-scale image fusion," Information Fusion, vol.67, pp.195-207, 2021. https://doi.org/10.1016/j.inffus.2020.10.016
  4. Dian R, Li S, Sun B, et al., "Recent advances and new guidelines on hyperspectral and multispectral image fusion," Information Fusion, vol.69, pp.40-51, 2020. https://doi.org/10.1016/j.inffus.2020.11.001
  5. Li H, Zhang L, Jiang M, et al., "Multi-focus image fusion algorithm based on supervised learning for fully convolutional neural network," Pattern Recognition Letters, vol.141, pp.45-53, 2021. https://doi.org/10.1016/j.patrec.2020.11.014
  6. Duan C, Wang Z, Xing C, et al., "Infrared and visible image fusion using multi-scale edge-preserving decomposition and multiple saliency features," Optik, vol.228, no.1, pp.165775, 2021. https://doi.org/10.1016/j.ijleo.2020.165775
  7. Rena K, Zhang D, Wan M, et al., "An Infrared and Visible Image Fusion Method Based on Improved DenseNet and mRMR-ZCA," Infrared Physics & Technology, vol.115, no.4, pp.103707, 2021. https://doi.org/10.1016/j.infrared.2021.103707
  8. Liu Y, Chen X, Ward R, et al., "Image Fusion With Convolutional Sparse Representation," IEEE Signal Processing Letters, vol.23, no.12, pp.1882-1886, 2016. https://doi.org/10.1109/LSP.2016.2618776
  9. Ma J, Ma Y, Li C, "Infrared and visible image fusion methods and applications: A survey," Information Fusion, vol.45, pp.153-178, 2019. https://doi.org/10.1016/j.inffus.2018.02.004
  10. Burt P J, Adelson E H, "The Laplacian Pyramid as a Compact Image Code," Readings in Computer Vision, vol.31, no.4, pp.671-679, 1987.
  11. Toet A, "A morphological pyramidal image decomposition," Pattern Recognition Letters, vol.9, no.4, pp.255-261, 1989. https://doi.org/10.1016/0167-8655(89)90004-4
  12. Li H, Manjunath B S , Mitra S K , "Multisensor Image Fusion Using the Wavelet Transform," Graphical Models and Image Processing, vol.57, no.3, pp.235-245, 1995. https://doi.org/10.1006/gmip.1995.1022
  13. Lewis J J , Robert J. O'Callaghan, Nikolov S G, et al., "Pixel- and region-based image fusion with complex wavelets," Information Fusion, vol.8, no.2, pp.119-130, 2007. https://doi.org/10.1016/j.inffus.2005.09.006
  14. Zhang Q, Guo B L, "Multifocus image fusion using the nonsubsampled contourlet transform," Signal Processing, vol.89, no.7, pp.1334-1346, 2009. https://doi.org/10.1016/j.sigpro.2009.01.012
  15. Piella G, "A general framework for multiresolution image fusion: from pixels to regions," Information Fusion, vol.4, no.4, pp.259-280, 2003. https://doi.org/10.1016/S1566-2535(03)00046-0
  16. Liu Y, Chen X , Peng H , et al., "Multi-focus image fusion with a deep convolutional neural network," Information Fusion, vol.36, pp.191-207, 2017. https://doi.org/10.1016/j.inffus.2016.12.001
  17. H. Li, X.-J. Wu, and J. Kittler, "Infrared and Visible Image Fusion using a Deep Learning Framework," in Proc. of International Conference on Pattern Recognition, 2018.
  18. Ma J, Yu W, Liang P, et al., "FusionGAN: A generative adversarial network for infrared and visible image fusion," Information Fusion, vol.48, pp.11-26, 2019. https://doi.org/10.1016/j.inffus.2018.09.004
  19. Li S, Kang X, Hu J, "Image fusion with guided filtering," IEEE Transactions on Image Processing, vol.22, no.7, pp.2864-2875, 2013. https://doi.org/10.1109/TIP.2013.2244222
  20. Baker, K. D., and G. D. Sullivan, "Multiple bandpass filters in image processing," IEE Proceedings E-Computers and Digital Techniques, vol.127, no.5, pp.173-184, 1980. https://doi.org/10.1049/ip-e.1980.0040
  21. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S.& Bengio, Y, "Generative adversarial networks," Communications of the ACM, vol.63, no.11, 2020.
  22. Li S, Yang B, Hu J, "Performance comparison of different multi-resolution transforms for image fusion," Information Fusion, vol.12, no.2, pp.74-84, 2011. https://doi.org/10.1016/j.inffus.2010.03.002
  23. Dogra A, Goyal B, Agrawal S, "From multi-scale decomposition to non-multi-scale decomposition methods: a comprehensive survey of image fusion techniques and its applications," IEEE Access, vol.5, pp.16040-16067, 2017. https://doi.org/10.1109/ACCESS.2017.2735865
  24. Santurkar S, Tsipras D, Ilyas A, et al., "How Does Batch Normalization Help Optimization?," 2019.
  25. Wang S H, Phillips P, Sui Y, et al., "Classification of Alzheimer's disease based on eight-layer convolutional neural network with leaky rectified linear unit and max pooling," Journal of Medical Systems, vol.42, no.5, pp.85, 2018. https://doi.org/10.1007/s10916-018-0932-7
  26. Ma J, Chen C, Li C, et al., "Infrared and visible image fusion via gradient transfer and total variation minimization," Information Fusion, vol.31, pp.100-109, 2016. https://doi.org/10.1016/j.inffus.2016.02.001
  27. Kingma D, Ba J, "Adam: A method for stochastic optimization," in Proc. of conference paper at ICLR 2015, 2015.
  28. Qu G, Zhang D, Yan P, "Information measure for performance of image fusion," Electronics Letters, vol.38, no.7, pp.313, 2002.
  29. Yu S, Zhongdong W, Xiaopeng W, et al., "Tetrolet transform images fusion algorithm based on fuzzy operator," Journal of Frontiers of Computer Science and Technology, vol.9, no.9, pp.1132-1138, 2015.
  30. Petrovic V, Cootes T, "Information representation for image fusion evaluation," in Proc. of 2006 9th International Conference on Information Fusion, pp.1-7, 2006.
  31. Wang Z, Simoncelli E P, Bovik A C, "Multiscale structural similarity for image quality assessment," in Proc. of The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, vol.2, pp.1398-1402, 2003.
  32. Xydeas C S, Petrovic V, "Objective image fusion performance measure," Electronics letters, vol.36, no.4, pp.308-309, 2000. https://doi.org/10.1049/el:20000267
  33. Aslantas V, Bendes E, "A new image quality metric for image fusion: the sum of the correlations of differences," Aeu-international Journal of electronics and communications, vol.69, no.12, pp.1890-1896, 2015. https://doi.org/10.1016/j.aeue.2015.09.004