Citation
Zand, Mohsen
(2015)
Semantic-based image retrieval for multi-word text queries.
PhD thesis, Universiti Putra Malaysia.
Abstract
Catalyzed by the development of digital technologies, the amount of digital images being produced, archived, and transmitted is reaching enormous proportions. It is hence imperative to develop techniques that can index and retrieve images relevant to a user's information need. Image retrieval based on semantic learning of image content has recently become a promising strategy for addressing these needs. With semantic-based image retrieval (SBIR), the real semantic meanings of images are discovered and used to retrieve images relevant to the user query. Digital images are thus automatically labeled with a set of semantic keywords describing the image content. As in text document retrieval, these keywords are then collectively used to index, organize, and locate images of interest in a database. Nevertheless, understanding and discovering the semantics of a visual scene are high-level cognitive tasks that are hard to automate, which provides challenging research opportunities. Specifically, exploiting discriminative features, handling visual similarity between object classes and appearance diversity within each class, classifying low-level visual features into appropriate semantic classes, annotating images comprehensively, and reliably indexing and ranking images for difficult queries remain open issues. This study proposes new ideas to overcome these challenges. First, a discriminative image feature vector is generated using texture as a distinguishing visual feature. In the proposed method, the image texture, extracted by the Gabor wavelet and curvelet transforms in the spectral domain, is encoded into polynomial coefficients. This not only provides rotation-invariant features but also generates texture feature vectors with maximum discriminative power. Second, a context-aware and semantically consistent image descriptor is presented that exploits the image's visual attributes in a contextual space.
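As a rough illustration of the Gabor-based texture extraction described above (a minimal sketch only: the thesis's curvelet transform and polynomial-coefficient encoding are not reproduced here, and all function names and parameter values below are assumptions for demonstration):

```python
import numpy as np

def gabor_kernel(size=15, sigma=3.0, theta=0.0, lam=6.0, gamma=0.5):
    """Real part of a 2-D Gabor kernel at orientation theta (illustrative parameters)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def texture_features(image, n_orientations=4):
    """Mean and std of Gabor filter responses over several orientations -> feature vector."""
    feats = []
    for k in range(n_orientations):
        kern = gabor_kernel(theta=k * np.pi / n_orientations)
        # FFT-based (circular) convolution of the image with the kernel
        resp = np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kern, image.shape)))
        feats.extend([resp.mean(), resp.std()])
    return np.array(feats)

rng = np.random.default_rng(0)
img = rng.random((64, 64))          # stand-in for a grayscale image
fv = texture_features(img)
print(fv.shape)                      # (8,)
```

Here the filter-bank statistics merely stand in for the thesis's richer spectral-domain encoding; the point is that oriented Gabor responses summarize texture in a compact vector.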
The high-level visual space is constructed by a Dirichlet process independently of the semantic classes; the posteriors are then used to build the contextual space.
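One way to picture this step is with a truncated Dirichlet process Gaussian mixture, where the per-component posterior responsibilities of each low-level feature vector form its contextual descriptor. This is a hedged sketch using scikit-learn's `BayesianGaussianMixture`, not the thesis's exact construction; the data and dimensions are toy assumptions:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# toy low-level feature vectors standing in for image descriptors (two loose clusters)
X = np.vstack([rng.normal(0.0, 1.0, (50, 8)), rng.normal(4.0, 1.0, (50, 8))])

# truncated Dirichlet process mixture, fitted without any semantic class labels
dpm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)

# posterior responsibilities over components serve as the contextual descriptor
context = dpm.predict_proba(X)
print(context.shape)  # (100, 10) -- one posterior distribution per input vector
```

Each row of `context` is a probability distribution over discovered components, so images with similar component usage end up close in the contextual space regardless of their nominal class.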