Resources

[Paper/SCI] Enhancing semantically masked Transformer with local attentio…

Page information

Author: Admin | Comments: 0 | Views: 1 | Date: 25-08-08 15:28

Body

* Institution: Korea Electronics Technology Institute (KETI)

* Journal: IEEE Access


* Abstract *

Transformer-based semantic segmentation has been applied to various visual recognition applications and has achieved outstanding performance in recent years. Since most of these approaches adopt a pretrained backbone and fine-tune it for semantic segmentation, they are not efficient at capturing semantic contextual information during the encoding stage, leading to sub-optimal segmentation performance. To address this problem, SeMask proposes a semantic attention operation that incorporates the semantic contextual information of an image during the encoding stage and improves segmentation performance. However, the architecture of SeMask is based entirely on the attention mechanisms of Transformers and has limitations in fully exploiting local details, which are important for more accurate segmentation. In this paper, we introduce a novel semantic layer into the encoder side of a Transformer-based segmentation model. The proposed semantic layer consists of depthwise convolutions with different kernel sizes to capture multi-scale local details. It is integrated at different stages of a hierarchical Transformer backbone to acquire multi-scale semantic contextual information on the encoder side and improve overall segmentation performance, especially for more accurate segmentation of small objects. Our proposed method can be integrated with common segmentation models such as Semantic-FPN and Mask Transformers. Experimental results show that our proposed method achieves state-of-the-art performance on the ADE20K dataset with 58.24% mIoU and on the Cityscapes dataset with 84.97% mIoU.
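As a rough illustration of the idea described in the abstract — parallel depthwise convolutions with different kernel sizes capturing multi-scale local details inside an encoder stage — here is a minimal PyTorch sketch. The kernel sizes, the summation-based fusion, the pointwise projection, and the residual connection are all assumptions for illustration; the paper's exact layer design may differ.

```python
import torch
import torch.nn as nn

class MultiScaleSemanticLayer(nn.Module):
    """Hypothetical sketch: parallel depthwise convolutions with different
    kernel sizes capture multi-scale local details; their sum is fused by a
    pointwise convolution and added back as a residual."""
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # groups=channels makes each Conv2d depthwise; padding=k//2 keeps
        # the spatial resolution unchanged for odd kernel sizes.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )
        self.proj = nn.Conv2d(channels, channels, 1)  # pointwise fusion

    def forward(self, x):
        # Sum the multi-scale local features, project, and add a residual.
        out = sum(branch(x) for branch in self.branches)
        return x + self.proj(out)

# A feature map from one stage of a hierarchical backbone (shape assumed).
layer = MultiScaleSemanticLayer(channels=64)
feat = torch.randn(1, 64, 32, 32)
out = layer(feat)
print(out.shape)  # torch.Size([1, 64, 32, 32])
```

Because the layer preserves the feature-map shape, it could in principle be inserted after any stage of a hierarchical Transformer encoder, matching the abstract's description of multi-stage integration.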


The full paper is available at the link below.

* Google Drive: https://drive.google.com/file/d/1PSn4i2hfvG_ST_jzSLuPzpGMe86zfIn_/view?usp=sharing
