UPM Institutional Repository

Dual-modality learning and transformer-based approach for high-quality vector font generation


Liu, Yu and Khalid, Fatimah binti and Mustaffa, Mas Rina binti and Azman, Azreen bin (2024) Dual-modality learning and transformer-based approach for high-quality vector font generation. Expert Systems with Applications, 240. art. no. 122405. pp. 1-19. ISSN 0957-4174


Vector fonts, serving as the fundamental format of fonts, play a significant role in modern media. Described by a set of mathematical equations, vector fonts enable style modifications by adjusting drawing parameters, making them favored by font artists and designers. Due to the non-structural nature of vector font data, the task of vector font generation resembles sequence generation. Existing methods, limited in handling long sequences, are only capable of synthesizing simple character vector fonts. In this paper, we propose a dual-modal learning strategy to convert raster glyph images to vector glyph images in an end-to-end manner. Specifically, by employing vector quantization, we comprehensively utilize the dual-modal information of vector fonts. We quantize image and sequence modal features using a shared codebook, mapping them to the same discrete space and aligning them. Through aligned features, we reconstruct raster glyph images and vector glyph images. During the transformation process of vector glyph data, we redesign the Transformer module for vector glyph data, leveraging multiple stacked sliding window attention mechanisms to model local and global information. By integrating reversible residuals with attention and feedforward layers within the Transformer module, we enhance the model's capability and stability in handling long sequence data without sacrificing accuracy. Finally, we perform cross-modal model distillation to obtain the model's backbone network. We further refine the backbone network using a differentiable rasterizer to minimize error accumulation during sequence generation. Qualitative and quantitative results demonstrate that our method achieves high-quality synthesis results in complex Chinese character glyphs. The synthesized vector fonts can be easily converted into TrueType fonts for practical use, holding valuable applications for anyone interested in personalized vector font styles.
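The shared-codebook step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the codebook values, the feature vectors, and the helper names `nearest_code` and `quantize` are all hypothetical, and the learned encoders are replaced by hand-written stand-in features. It only shows how features from two modalities can be mapped into the same discrete space by nearest-neighbour lookup in one codebook.

```python
import math


def nearest_code(feature, codebook):
    """Return the index of the codebook entry closest to `feature` (squared L2)."""
    best_index, best_dist = 0, math.inf
    for index, code in enumerate(codebook):
        dist = sum((f - c) ** 2 for f, c in zip(feature, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def quantize(features, codebook):
    """Map each feature vector to (indices, code vectors) from the shared codebook."""
    indices = [nearest_code(f, codebook) for f in features]
    return indices, [codebook[i] for i in indices]


# Hypothetical toy codebook: 4 codes of dimension 2, shared by both modalities.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Stand-ins for encoder outputs; a real system would use learned encoders.
image_features = [[0.9, 0.1], [0.1, 0.8]]
sequence_features = [[1.1, -0.1], [-0.2, 1.2]]

image_ids, _ = quantize(image_features, codebook)
sequence_ids, _ = quantize(sequence_features, codebook)
print(image_ids, sequence_ids)
```

In this toy case the image features and the sequence features land on the same code indices, which is the sense in which a shared codebook aligns the two modalities. Training the codebook and the encoders is outside the scope of the sketch.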

Full text not available from this repository.

Additional Metadata

Item Type: Article
DOI Number: https://doi.org/10.1016/j.eswa.2023.122405
Publisher: Elsevier
Keywords: Deep learning; Distillation; Learning systems; Metadata; Rasterization; Chinese font generation; High quality; Long sequences; Modal representation; Multi-modal; Multi-modal representation; Sequence generation; Sequence models; Vector font generation; Vectors
Depositing User: Mohamad Jefri Mohamed Fauzi
Date Deposited: 01 Feb 2024 04:34
Last Modified: 01 Feb 2024 04:34
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.1016/j.eswa.2023.122405
URI: http://psasir.upm.edu.my/id/eprint/105601