• Title/Summary/Keyword: Image pyramid network

Dual Attention Based Image Pyramid Network for Object Detection

  • Dong, Xiang;Li, Feng;Bai, Huihui;Zhao, Yao
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.12 / pp.4439-4455 / 2021
  • Compared with two-stage object detection algorithms, one-stage algorithms provide a better trade-off between real-time performance and accuracy. However, these methods treat the intermediate features equally and therefore lack the flexibility to emphasize information that is meaningful for classification and localization. In addition, they ignore the interaction of contextual information across scales, which is important for detecting medium and small objects. To tackle these problems, we propose an image pyramid network based on a dual attention mechanism (DAIPNet), which builds an image pyramid to enrich the spatial information while emphasizing multi-scale informative features through dual attention mechanisms for one-stage object detection. Our framework uses a pre-trained backbone as the standard detection network, and the designed image pyramid network (IPN) serves as an auxiliary network that provides complementary information. The dual attention mechanism is composed of the adaptive feature fusion module (AFFM) and the progressive attention fusion module (PAFM). AFFM is designed to automatically attend to feature maps of differing importance from the backbone and the auxiliary network, while PAFM adaptively learns channel-attentive information during context transfer. Furthermore, in the IPN, we build an image pyramid to extract scale-wise features from downsampled images of different scales, and the features are further fused at different states to enrich scale-wise information and learn more comprehensive feature representations. Experimental results are reported on the MS COCO dataset. With a 300 × 300 input, our proposed detector achieves 32.6% mAP on MS COCO test-dev, which is superior to state-of-the-art methods.
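
The image pyramid that the IPN consumes can be illustrated with a minimal sketch. It assumes 2×2 block averaging as the downsampling step and a grayscale image stored as nested lists; the abstract does not say which downsampling filter or how many levels DAIPNet uses, so `downsample_2x`, `build_image_pyramid`, and `levels` are hypothetical names chosen for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def downsample_2x(img: Image) -> Image:
    """Halve height and width by averaging each 2x2 block.

    Assumption: block averaging stands in for whatever downsampling
    the paper uses. A trailing odd row or column is dropped.
    """
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def build_image_pyramid(img: Image, levels: int) -> List[Image]:
    """Return [full resolution, 1/2, 1/4, ...] with at most `levels` entries."""
    pyramid = [img]
    for _ in range(levels - 1):
        last = pyramid[-1]
        if len(last) < 2 or len(last[0]) < 2:
            break  # cannot downsample further
        pyramid.append(downsample_2x(last))
    return pyramid


# Example: an 8x8 ramp image reduced to three scales.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = build_image_pyramid(image, levels=3)
sizes = [(len(p), len(p[0])) for p in pyramid]
```

In a detector of this kind, each pyramid level would be passed through a light feature extractor and fused with the backbone feature map of matching resolution; that fusion step is not sketched here.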

Pyramid Feature Compression with Inter-Level Feature Restoration-Prediction Network (계층 간 특징 복원-예측 네트워크를 통한 피라미드 특징 압축)

  • Kim, Minsub;Sim, Donggyu
    • Journal of Broadcast Engineering / v.27 no.3 / pp.283-294 / 2022
  • Feature maps used in deep learning networks are generally larger than the input image, so transmitting them requires a higher compression rate than image compression does. This paper proposes a method for transmitting, at a high compression rate, the pyramid feature maps used in networks with an FPN structure, which is robust to object size in deep learning-based image processing. To compress the pyramid feature map efficiently, the proposed structure transmits only some pyramid levels, predicts the levels that are not transmitted from the transmitted ones through the proposed prediction network, and restores compression damage through the proposed reconstruction network. In terms of mAP, the object detection performance on the COCO data set 2017 Train images, the proposed method showed a BD-rate improvement of 31.25% on the rate-precision graph compared with compressing the feature map with VTM12.0, and a BD-rate improvement of 57.79% compared with compression through PCA and DeepCABAC.

Infrared and visible image fusion based on Laplacian pyramid and generative adversarial network

  • Wang, Juan;Ke, Cong;Wu, Minghu;Liu, Min;Zeng, Chunyan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.5 / pp.1761-1777 / 2021
  • Processing infrared and visible images together yields an image with both infrared features and visible details. In this paper, a fusion method based on the Laplacian pyramid and a generative adversarial network, termed Laplacian-GAN, is proposed to obtain high-quality fused images. First, the source images are decomposed into base and detail layers. Second, the base layers are fused with a Laplacian pyramid-based method to retain more base-layer information. Third, the detail layers are fused by a generative adversarial network, which avoids the manual design of complicated fusion rules. Finally, the fused base layer and fused detail layer are reconstructed to obtain the fused image. Experimental results demonstrate that the proposed method achieves state-of-the-art fusion performance in both visual quality and objective assessment. Visually, the fused image obtained by the Laplacian-GAN algorithm is clearer in detail. On the six metrics MI, AG, EI, MS_SSIM, Qabf, and SCD, the proposed algorithm improves by 0.62%, 7.10%, 14.53%, 12.18%, 34.33%, and 12.23%, respectively, over the best of the other three algorithms.
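
As a rough illustration of Laplacian pyramid-based fusion in general, the sketch below decomposes two images, merges the pyramids, and reconstructs the result. Several things here are assumptions rather than the paper's method: block averaging and nearest-neighbour upsampling replace Gaussian filtering, the fusion rule (largest absolute detail coefficient, averaged coarsest level) is a common textbook choice, and the GAN that the paper applies to the detail layers is not modeled at all. Image sides are assumed divisible by 2^(levels-1).

```python
from typing import Callable, List

Image = List[List[float]]


def down(img: Image) -> Image:
    """2x2 block average (stand-in for Gaussian blur + subsampling)."""
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(len(img[0]) // 2)]
            for r in range(len(img) // 2)]


def up(img: Image) -> Image:
    """Nearest-neighbour 2x upsampling."""
    return [[img[r // 2][c // 2] for c in range(2 * len(img[0]))]
            for r in range(2 * len(img))]


def combine(a: Image, b: Image, fn: Callable[[float, float], float]) -> Image:
    """Apply fn element-wise to two images of equal size."""
    return [[fn(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def laplacian_pyramid(img: Image, levels: int) -> List[Image]:
    """Return [detail_0, ..., detail_{levels-2}, coarsest_image]."""
    pyr, cur = [], img
    for _ in range(levels - 1):
        small = down(cur)
        pyr.append(combine(cur, up(small), lambda x, y: x - y))
        cur = small
    pyr.append(cur)
    return pyr


def reconstruct(pyr: List[Image]) -> Image:
    """Invert laplacian_pyramid by upsampling and adding details back."""
    cur = pyr[-1]
    for detail in reversed(pyr[:-1]):
        cur = combine(up(cur), detail, lambda x, y: x + y)
    return cur


def fuse(pyr_a: List[Image], pyr_b: List[Image]) -> List[Image]:
    """Assumed rule: max-abs for details, mean for the coarsest level."""
    details = [combine(a, b, lambda x, y: x if abs(x) >= abs(y) else y)
               for a, b in zip(pyr_a[:-1], pyr_b[:-1])]
    coarse = combine(pyr_a[-1], pyr_b[-1], lambda x, y: (x + y) / 2.0)
    return details + [coarse]


img_a = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
img_b = [[float((r + c) % 2) * 10.0 for c in range(4)] for r in range(4)]
fused = reconstruct(fuse(laplacian_pyramid(img_a, 3), laplacian_pyramid(img_b, 3)))
```

Because each detail level stores exactly what upsampling loses, `reconstruct(laplacian_pyramid(x, n))` recovers `x` up to floating-point rounding, which is why fusion can be done per level without losing information.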

LFFCNN: Multi-focus Image Synthesis in Light Field Camera (LFFCNN: 라이트 필드 카메라의 다중 초점 이미지 합성)

  • Hyeong-Sik Kim;Ga-Bin Nam;Young-Seop Kim
    • Journal of the Semiconductor & Display Technology / v.22 no.3 / pp.149-154 / 2023
  • This paper presents a novel approach to multi-focus image fusion using light field cameras. The proposed neural network, LFFCNN (Light Field Focus Convolutional Neural Network), is composed of three main modules: feature extraction, feature fusion, and feature reconstruction. Specifically, the feature extraction module incorporates SPP (Spatial Pyramid Pooling) to effectively handle images of various scales. Experimental results demonstrate that the proposed model not only effectively fuses multi-focus images into a single all-in-focus image but also offers more efficient and robust focus fusion than existing methods.
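
Why SPP helps with inputs of varying scale can be seen in a small sketch: a single-channel feature map of any size is max-pooled into fixed grids and the results are concatenated, so the output length depends only on the grid sizes. The bin configuration `(1, 2, 4)` and the function name are assumptions for illustration; the abstract does not state how LFFCNN configures its SPP module.

```python
import math
from typing import List, Sequence


def spatial_pyramid_pool(feature_map: List[List[float]],
                         bins: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool one channel into n x n grids for each n in `bins`.

    The output has sum(n * n for n in bins) values regardless of the
    input height and width. The bin sizes are an assumed configuration.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in bins:
        for i in range(n):
            # Row range of grid cell i; floor/ceil keeps every cell non-empty.
            r0, r1 = math.floor(i * h / n), math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = math.floor(j * w / n), math.ceil((j + 1) * w / n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


# Two feature maps of different sizes yield vectors of the same length.
small_map = [[float(r * 7 + c) for c in range(7)] for r in range(5)]
large_map = [[float(r + c) for c in range(16)] for r in range(12)]
vec_small = spatial_pyramid_pool(small_map)
vec_large = spatial_pyramid_pool(large_map)
```

With bins `(1, 2, 4)` each channel contributes 1 + 4 + 16 = 21 values, and the first value is always the global maximum of the map.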

A Contrast Enhancement Method using the Contrast Measure in the Laplacian Pyramid for Digital Mammogram (디지털 맘모그램을 위한 라플라시안 피라미드에서 대비 척도를 이용한 대비 향상 방법)

  • Jeon, Geum-Sang;Lee, Won-Chang;Kim, Sang-Hee
    • Journal of the Institute of Convergence Signal Processing / v.15 no.2 / pp.24-29 / 2014
  • Digital mammography is the most common technique for the early detection of breast cancer. To diagnose breast cancer at an early stage and treat it efficiently, many image enhancement methods have been developed. This paper presents a multi-scale contrast enhancement method in the Laplacian pyramid for digital mammograms. The proposed method decomposes the image into contrast measures using the Gaussian and Laplacian pyramids, and the pyramid coefficients of the decomposed multi-resolution image are defined as frequency-limited local contrast measures given by the ratio of high-frequency components to low-frequency components. The decomposed pyramid coefficients are modified by the contrast measure to enhance the contrast, and the final enhanced image is obtained by recomposing the pyramid from the modified coefficients. The proposed method is compared with existing methods and shown to perform well quantitatively in terms of the contrast measure.

DP-LinkNet: A convolutional network for historical document image binarization

  • Xiong, Wei;Jia, Xiuhong;Yang, Dichun;Ai, Meihui;Li, Lirong;Wang, Song
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.5 / pp.1778-1797 / 2021
  • Document image binarization is an important pre-processing step in document analysis and archiving. The state-of-the-art models for document image binarization are variants of encoder-decoder architectures, such as FCN (fully convolutional network) and U-Net. Despite their success, they still suffer from three limitations: (1) reduced feature map resolution due to consecutive strided pooling or convolutions, (2) multiple scales of target objects, and (3) reduced localization accuracy due to the built-in invariance of deep convolutional neural networks (DCNNs). To overcome these three challenges, we propose an improved semantic segmentation model, referred to as DP-LinkNet, which adopts the D-LinkNet architecture as its backbone, with the proposed hybrid dilated convolution (HDC) and spatial pyramid pooling (SPP) modules between the encoder and the decoder. Extensive experiments are conducted on recent document image binarization competition (DIBCO) and handwritten document image binarization competition (H-DIBCO) benchmark datasets. Results show that our proposed DP-LinkNet outperforms other state-of-the-art techniques by a large margin. Our implementation and the pre-trained models are available at https://github.com/beargolden/DP-LinkNet.

Single Low-Light Ghost-Free Image Enhancement via Deep Retinex Model

  • Liu, Yan;Lv, Bingxue;Wang, Jingwen;Huang, Wei;Qiu, Tiantian;Chen, Yunzhong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.5 / pp.1814-1828 / 2021
  • Low-light image enhancement is a key technique for overcoming the quality degradation of photos taken under scotopic illumination conditions. The degradation includes low brightness, low contrast, and prominent noise, which seriously affect human visual recognition and subsequent image processing. In this paper, we propose an approach based on deep learning and Retinex theory to enhance low-light images, which includes image decomposition, illumination prediction, image reconstruction, and image optimization. The first three parts reconstruct an enhanced image, which, however, suffers from low resolution. To reduce the noise of the enhanced image and improve its quality, a super-resolution algorithm based on the Laplacian pyramid network is introduced to optimize the image. The Laplacian pyramid network improves the resolution of the enhanced image through multiple feature extraction and deconvolution operations. Furthermore, a combined loss function is explored in the network training stage to improve the efficiency of the algorithm. Extensive experiments and comprehensive evaluations demonstrate the strength of the proposed method: the result is closer to the real-world scene in lightness, color, and details. Experiments also demonstrate that the proposed method, using a single low-light image, achieves the same effect as multi-exposure image fusion algorithms without introducing ghosting.

Instance segmentation with pyramid integrated context for aerial objects

  • Juan Wang;Liquan Guo;Minghu Wu;Guanhai Chen;Zishan Liu;Yonggang Ye;Zetao Zhang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.3 / pp.701-720 / 2023
  • Aerial objects are more challenging to segment than normal objects because they are usually smaller and have less textural detail. In the process of segmentation, target objects are easily omitted and misdetected, which is problematic. To alleviate these issues, we propose local aggregation feature pyramid networks (LAFPNs) and pyramid integrated context modules (PICMs) for aerial object segmentation. First, using an LAFPN, while strengthening the deep features, the extent to which low-level features interfere with high-level features is reduced, and numerous dense and small aerial targets are prevented from being mistakenly detected as a whole. Second, the PICM uses global information to guide local features, which enhances the network's comprehensive understanding of an entire image and reduces the missed detection of small aerial objects due to insufficient texture information. We evaluate our network with the MS COCO dataset using three categories: airplanes, birds, and kites. Compared with Mask R-CNN, our network achieves performance improvements of 1.7%, 4.9%, and 7.7% in terms of the AP metrics for the three categories. Without pretraining or any postprocessing, the segmentation performance of our network for aerial objects is superior to that of several recent methods based on classic algorithms.

Two-Layer Video Coding Using Pyramid Structure for ATM Networks (ATM 망에서 피라미드 구조를 이용한 2계층 영상부호화)

  • 홍승훈;김인권;박래홍
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 1995.06a / pp.97-100 / 1995
  • When image sequences are transmitted over ATM networks, packet loss and channel-sharing efficiency are important issues. Two-layer video coding methods have been proposed as a possible solution. These methods transmit video information over the network with different levels of protection against packet loss. In this paper, a two-layer coding method using a pyramid structure is proposed, and several realizations of two-layer video coding methods are presented and their performances compared.

Research and Optimization of Face Detection Algorithm Based on MTCNN Model in Complex Environment (복잡한 환경에서 MTCNN 모델 기반 얼굴 검출 알고리즘 개선 연구)

  • Fu, Yumei;Kim, Minyoung;Jang, Jong-wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.1 / pp.50-56 / 2020
  • With the rapid development of deep neural network theory and applications, face detection performance has improved. However, because of the computational complexity of deep neural networks and the complexity of the detection environment, detecting faces quickly and accurately remains the main problem. This paper builds on the relatively simple MTCNN (Multi-Task Cascaded Convolutional Neural Network) model, using the FDDB (Face Detection Data Set and Benchmark), LFW (Labeled Faces in the Wild), and FaceScrub public datasets as training samples. While reviewing and introducing the MTCNN model, it explores how to improve training speed and performance at the same time. A dynamic image pyramid is used in place of the traditional image pyramid to segment samples, and the OHEM (online hard example mining) function of the MTCNN model is removed during training to improve the training speed.
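
For context, the traditional image pyramid in MTCNN-style detectors is usually a list of scale factors applied to the input image before the first-stage network. The sketch below shows one way to compute that schedule. The defaults (`min_face_size=20`, `cell_size=12`, `factor=0.709`) are values commonly seen in open implementations and are assumptions here, as is the function name. The paper's dynamic pyramid is not described in the abstract and is not reproduced.

```python
from typing import List


def pyramid_scales(height: int, width: int,
                   min_face_size: int = 20,
                   cell_size: int = 12,
                   factor: float = 0.709) -> List[float]:
    """Scale factors for a traditional MTCNN-style image pyramid.

    The first scale maps a face of `min_face_size` pixels onto the
    `cell_size`-pixel input of the first-stage network. Each later
    scale shrinks the image by `factor` until its shorter side would
    fall below `cell_size`. All defaults are assumed, not the paper's.
    """
    scales = []
    scale = cell_size / min_face_size
    min_side = min(height, width) * scale
    while min_side >= cell_size:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


# Example: scale schedule for a 100x100 image.
scales = pyramid_scales(100, 100)
```

Every scale in the list means one more resize and one more forward pass through the first-stage network, so the number of levels directly affects running time. That cost is the usual motivation for replacing a fixed schedule with a cheaper one.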