

Title: SCALABLE INDEXING AND SEARCHING ON DISTRIBUTED FILE SYSTEMS
Creator: Ijagbone, Itua
Date: 2016-05
Description: Scientific applications and other high-performance applications generate large amounts of data. Unstructured data is estimated to comprise more than 90% of the world's information [IDC2011], and it is growing 60% annually [Grantz2008]. The large amounts of data generated by computation end up dispersed across the file system, and problems arise when these files must be located for later use. With a small number of files this may not be an issue, but as both the number and the size of files grow, it becomes difficult to locate them with ordinary methods such as GNU Grep [8], which is commonly used in High Performance Computing and Many-Task Computing environments. This thesis tackles the problem of finding files in a distributed system environment. Our work uses the FusionFS [1] distributed file system and the Apache Lucene [10] centralized indexing engine as fundamental building blocks. We designed and implemented a distributed search interface within the FusionFS file system that makes both indexing and searching the index across a distributed system simple. We evaluated our system on up to 64 nodes, compared it with Grep, Hadoop, and Cloudera, and showed that FusionFS's indexing capabilities have lower overheads and faster response times than these alternatives.
M.S. in Computer Science, May 2016
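The abstract contrasts grep-style scanning with index-based search across many nodes. As a hedged illustration only, and not the thesis's FusionFS/Lucene implementation, the sketch below assumes each node keeps its own inverted index over its local files and that a query is broadcast to every node with the results merged. The names `NodeIndex`, `grep_scan`, and `distributed_search` are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercased word tokens; a crude stand-in for a Lucene analyzer.
    return re.findall(r"\w+", text.lower())

class NodeIndex:
    """Hypothetical per-node inverted index mapping term -> local file paths."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_file(self, path, text):
        # Indexing cost is paid once, when the file is written or updated.
        for term in tokenize(text):
            self.postings[term].add(path)

    def search(self, term):
        # A lookup touches one posting list, not every file's contents.
        return set(self.postings.get(term.lower(), ()))

def grep_scan(files, term):
    # Grep-like baseline (token match, not regex): rescans all contents per query.
    needle = term.lower()
    return {path for path, text in files.items() if needle in tokenize(text)}

def distributed_search(nodes, term):
    # Scatter the query to every node's local index, then gather (union) the hits.
    hits = set()
    for node in nodes:
        hits |= node.search(term)
    return hits

# Example: two nodes, each indexing only the files stored locally.
files = {
    "/node0/run1.log": "energy converged at step 42",
    "/node1/run2.log": "simulation diverged at step 7",
}
nodes = [NodeIndex(), NodeIndex()]
nodes[0].add_file("/node0/run1.log", files["/node0/run1.log"])
nodes[1].add_file("/node1/run2.log", files["/node1/run2.log"])

print(distributed_search(nodes, "converged"))  # {'/node0/run1.log'}
```

The design choice being illustrated is that the expensive work moves to indexing time: a grep scan reads every file on every query, whereas an indexed query reads only one posting list per node. Production engines such as Lucene add analyzers, relevance scoring, and on-disk index segments that this sketch omits.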

repository.iit
