UPM Institutional Repository

Performance evaluation of distributed indexing using Solr and Terrier information retrievals


Citation

Aldailamy, Ali Y. and Abdul Hamid, Nor Asila Wati and Al-Mekhlafi, Mohammed Abdulkarem (2018) Performance evaluation of distributed indexing using Solr and Terrier information retrievals. In: 2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP'18), 26-28 Mar. 2018, Le Méridien Kota Kinabalu, Sabah, Malaysia. (pp. 142-149).

Abstract

The continuously growing datasets and the emergence of terabyte-scale data pose great challenges to Information Retrieval (IR) systems. A tremendous amount of data from various sources is collected every day, making the volume of raw data extremely large. As a result, indexing a large volume of data is time-consuming, and efficient indexing of large collections is becoming increasingly challenging. MapReduce is a programming model for processing large document collections by distributing data and processing tasks over multiple computing machines. In this study, the distributed indexing of Solr and Terrier is evaluated, as they are the most popular information retrieval frameworks among researchers and enterprises. Specifically, this paper compares and analyzes the distributed indexing performance of the Solr and Terrier indexing strategies over MapReduce using 1GB, 3GB, 6GB, and 9GB datasets. In the experiments, the average indexing time, speedup, and throughput of both frameworks are measured as the number of machines increases. The experimental results show that Terrier is more efficient with large datasets when more processing resources are added. Solr, on the other hand, performs better with small datasets on limited computing resources.
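The abstract relies on MapReduce indexing and three metrics (average indexing time, speedup, throughput) without spelling them out. The Python sketch below illustrates them under stated assumptions. It runs a toy map/shuffle/reduce inverted-index build on a single machine and applies the conventional metric definitions: speedup as single-machine time divided by n-machine time, and throughput as data volume divided by indexing time. It is not Solr's or Terrier's actual indexing code, and the paper's exact metric definitions may differ. All function names are hypothetical.

```python
import statistics
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Map (hypothetical): emit one (term, doc_id) pair per distinct term."""
    for term in set(text.lower().split()):
        yield term, doc_id


def shuffle(pairs):
    """Shuffle: group emitted pairs by term, as a MapReduce framework would."""
    groups = defaultdict(list)
    for term, doc_id in pairs:
        groups[term].append(doc_id)
    return groups


def reduce_phase(term, doc_ids):
    """Reduce: turn a term's grouped doc ids into a sorted posting list."""
    return term, sorted(doc_ids)


def build_index(docs):
    """Run map -> shuffle -> reduce over a {doc_id: text} mapping."""
    pairs = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(t, ids) for t, ids in shuffle(pairs).items())


def average_time(run_times):
    """Average indexing time over repeated runs (assumed definition)."""
    return statistics.mean(run_times)


def speedup(time_one_machine, time_n_machines):
    """Conventional speedup: T(1) / T(n) (assumed definition)."""
    return time_one_machine / time_n_machines


def throughput(dataset_gb, time_seconds):
    """Data indexed per second, in GB/s (assumed definition)."""
    return dataset_gb / time_seconds
```

For example, `build_index({1: "solr indexes documents", 2: "terrier indexes documents quickly"})["indexes"]` gives `[1, 2]`, and with illustrative timings (not measurements from the paper) `speedup(120.0, 40.0)` gives `3.0`. In a real cluster the map and reduce calls run on separate machines, so the shuffle step's network transfer is part of what the speedup figures capture.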



Additional Metadata

Item Type: Conference or Workshop Item (Paper)
Divisions: Faculty of Computer Science and Information Technology; Faculty of Engineering
DOI Number: https://doi.org/10.1109/INFRKM.2018.8464814
Publisher: IEEE
Keywords: Information retrieval; Indexing; MapReduce; Hadoop; Solr; Terrier
Depositing User: Nabilah Mustapa
Date Deposited: 04 Jul 2019 04:44
Last Modified: 25 May 2020 01:46
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.1109/INFRKM.2018.8464814
URI: http://psasir.upm.edu.my/id/eprint/69482