UPM Institutional Repository

Spatial multi-scale feature transformer network for fine-grained few-shot image classification


Citation

Guo, Liyong and Marlisah, Erzam (2025) Spatial multi-scale feature transformer network for fine-grained few-shot image classification. Review of Computer Engineering Research, 12 (3). pp. 195-205. ISSN: 2410-9142; eISSN: 2412-4281

Abstract

This year has seen significant advancements in deep learning, and fine-grained few-shot image classification (FGFSIC) has also made substantial progress. FGFSIC faces two key challenges: high intra-class variance and low inter-class variance, which hinder accurate classification with limited data. Despite considerable efforts to extract more discriminative features using powerful networks, few studies have specifically addressed these challenges. This paper proposes a Spatial Multi-Scale Feature Transformer Network to overcome these issues. The approach first modifies the backbone network to extract multi-scale features, with classification results derived from comparing these multi-scale representations. Additionally, a Spatial Feature Transformer network is introduced to adjust the spatial positions of multi-scale features, which helps to reduce intra-class variance. Experiments were conducted on three widely used datasets: CUB-200-2011, Stanford Cars, and Stanford Dogs. The results demonstrate that both components of the proposed model significantly enhance FGFSIC performance, with final accuracies surpassing those of most existing methods. The findings emphasize the effectiveness of the proposed approach in tackling the critical issues of high intra-class variance and low inter-class variance, making it a promising solution for fine-grained image classification tasks, particularly in situations where only limited data is available. This work paves the way for improved performance in real-world applications requiring precise, few-shot learning in fine-grained domains.
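
The idea of classifying by comparing multi-scale representations can be illustrated with a small sketch. It is not the paper's network: it assumes single-channel square feature maps, uses average pooling at a few grid sizes as a stand-in for multi-scale features, and scores each class by summing cosine similarities to per-scale class prototypes. The spatial alignment step described in the abstract is omitted. All names (`avg_pool`, `prototype`, `classify`, the `grids` parameter) and the toy data are hypothetical.

```python
import math


def avg_pool(fmap, grid):
    """Average-pool a square 2D map (list of rows) into grid x grid cells, flattened."""
    size = len(fmap)
    step = size // grid  # assumes size is divisible by grid
    out = []
    for gi in range(grid):
        for gj in range(grid):
            cell = [fmap[i][j]
                    for i in range(gi * step, (gi + 1) * step)
                    for j in range(gj * step, (gj + 1) * step)]
            out.append(sum(cell) / len(cell))
    return out


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def prototype(vectors):
    """Element-wise mean of the support descriptors of one class."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def classify(query, support, grids=(1, 2, 4)):
    """Sum per-scale cosine similarities between the query and each class prototype.

    support maps a class label to a list of 2D feature maps (the few shots).
    Returns the best label and the per-class scores.
    """
    scores = {}
    for label, maps in support.items():
        total = 0.0
        for g in grids:
            proto = prototype([avg_pool(m, g) for m in maps])
            total += cosine(avg_pool(query, g), proto)
        scores[label] = total
    return max(scores, key=scores.get), scores


# Toy 1-shot, 2-way episode: class "a" activates top-left, class "b" bottom-right.
support = {
    "a": [[[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]],
    "b": [[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]],
}
query = [[1, 0.9, 0, 0], [0.8, 1, 0, 0.1], [0, 0, 0, 0], [0, 0, 0, 0]]

label, scores = classify(query, support)
print(label, scores)
```

In this toy episode the coarsest scale (a single global average) cannot separate the two classes, while the finer grids can, which is one way to see why comparing several scales may help when inter-class differences are small.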


Download File

124885.pdf - Published Version (742kB)
Restricted to Repository staff only

Additional Metadata

Item Type: Article
Subject: Hardware and Architecture
Subject: Computer Science Applications
Subject: Electrical and Electronic Engineering
Divisions: Faculty of Computer Science and Information Technology
DOI Number: https://doi.org/10.18488/76.v12i3.4439
Publisher: Conscientia Beam
Keywords: Classification; Few-shot learning; Fine-grained few-shot image; Fine-grained image classification; Multi-scale features; Spatial transformer network
Sustainable Development Goals (SDGs): SDG 9: Industry, Innovation and Infrastructure; SDG 4: Quality Education; SDG 17: Partnerships for the Goals
Depositing User: MS. HADIZAH NORDIN
Date Deposited: 30 Apr 2026 15:48
Last Modified: 30 Apr 2026 15:48
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.18488/76.v12i3.4439
URI: http://psasir.upm.edu.my/id/eprint/124885