UPM Institutional Repository

Unstructured big data processing in cloud computing environment by using Amazon Elastic Map Reduce


Citation

Busu, Norzaharawani (2017) Unstructured big data processing in cloud computing environment by using Amazon Elastic Map Reduce. Masters thesis, Universiti Putra Malaysia.

Abstract

Nowadays, the growing expansion of data content on the web delivers a huge amount of collective resources. Twitter, one of the biggest social media sites, collects millions of tweets every day, amounting to petabytes per year. People share their experiences and thoughts online, or simply talk about whatever concerns them. Unstructured big data in social media plays a vital role in sentiment analysis, also known as opinion mining. Structured and unstructured data are generated continuously on a large scale every day, and these data are meaningless unless they are captured and analyzed accordingly. Traditional RDBMS technology becomes less reliable when dealing with huge amounts of structured data, and data processing becomes sluggish if the infrastructure is not upgraded to match the volume of data. Furthermore, an RDBMS is not capable of dealing with unstructured data. Because petabytes of records are generated on the net every year, capturing and analyzing big data can be challenging, and cloud computing technologies are able to provide on-demand infrastructure and services based on user requirements. Therefore, this thesis uses a cloud-based infrastructure, Amazon Web Services, to capture unstructured big data and then to analyze, visualize, and extract useful information from large, diverse, distributed, and mixed data gathered from public data sets and Twitter's Application Programming Interface (API). The results and explanation of the experiments presented in Chapter Four show the test bed results for collecting Twitter data, processing the Twitter input data, and producing the output data. The analysis emphasizes the elapsed time when collecting Twitter data and the performance of Amazon Elastic MapReduce (EMR). The infrastructure provided by Amazon Web Services is proficient enough to capture and manipulate large volumes of unstructured big data from Twitter.
Afterward, this study tested the capability of Amazon Elastic MapReduce (EMR) to process the Twitter input data collected earlier and transform it into meaningful output that can be used for decision making.
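MapReduce, the programming model that Amazon EMR runs on Hadoop, turns raw records into aggregated output in two steps: a map step that emits key/value pairs and a reduce step that combines all values sharing a key, with a shuffle/sort phase in between. The sketch below is a hypothetical illustration of that pattern, not the job used in the thesis. It assumes tweets are stored one JSON object per line with a `text` field, and it counts hashtags; the function names `mapper`, `reducer`, and `run_local` are illustrative only. All three phases are simulated in a single process so the logic can be checked without a cluster.

```python
import json
import re
from itertools import groupby
from operator import itemgetter

# Assumption: a hashtag is "#" followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def mapper(lines):
    """Map step: emit (hashtag, 1) for every hashtag in each tweet's text."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records, common in raw captured data
        if not isinstance(tweet, dict):
            continue  # assumed format: one JSON object per line
        for tag in HASHTAG.findall(tweet.get("text", "")):
            yield tag.lower(), 1


def reducer(pairs):
    """Reduce step: sum the counts per key; input must be sorted by key."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in one process."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = [
        '{"text": "Trying #AWS #EMR today"}',
        '{"text": "#emr cluster is up"}',
        "not json",
        '{"text": "no tags here"}',
    ]
    print(run_local(sample))  # {'aws': 1, 'emr': 2}
```

On a real cluster, the sorting done here by `sorted` is performed by Hadoop's shuffle phase. With Hadoop Streaming, the mapper and reducer would typically be separate scripts that read lines from standard input and write tab-separated key/value lines to standard output.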


Download File

FSKTM 2017 24 IR.pdf (1MB)

Additional Metadata

Item Type: Thesis (Masters)
Subject: Cloud computing - Data processing
Subject: Big data
Call Number: FSKTM 2017 24
Chairman Supervisor: Mrs. Sazlinah binti Hasan
Divisions: Faculty of Computer Science and Information Technology
Depositing User: Haridan Mohd Jais
Date Deposited: 28 Mar 2019 07:07
Last Modified: 28 Mar 2019 07:07
URI: http://psasir.upm.edu.my/id/eprint/67852