Efficient file accessing techniques on hadoop distributed file systems

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The Hadoop framework emerged just as traditional tools were proving unable to handle big data. The Hadoop Distributed File System (HDFS), which serves as Hadoop's highly fault-tolerant distributed file system, effectively improves data access throughput and is well suited to applications that process large datasets. However, Hadoop has a drawback: when it processes a large number of small files, memory usage on the NameNode becomes so high that it limits the whole system. In this paper, we propose an approach to optimize the performance of HDFS with small files. The basic idea is to merge small files into a large one whose size is suitable for a block. Furthermore, indexes are built to provide fast access to all files in HDFS. Preliminary experimental results show that our approach achieves better performance.
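
The merge-and-index idea described in the abstract can be illustrated with a minimal in-memory sketch. This is not the paper's implementation: the names `merge_small_files`, `read_file`, and `IndexEntry`, the greedy packing order, and the 128 MiB default block size are all assumptions made for illustration, and no HDFS API is called.

```python
from dataclasses import dataclass

# Assumed default; the HDFS block size is configurable per cluster.
BLOCK_SIZE = 128 * 1024 * 1024


@dataclass(frozen=True)
class IndexEntry:
    """Where one small file lives inside the merged containers (hypothetical layout)."""
    container: int  # which merged file holds the data
    offset: int     # byte offset inside that merged file
    length: int     # size of the original small file in bytes


def merge_small_files(files, block_size=BLOCK_SIZE):
    """Greedily pack small files into containers no larger than block_size.

    `files` maps a file name to its bytes. Returns (containers, index), where
    `index` maps each name to an IndexEntry for direct lookup.
    """
    containers = [bytearray()]
    index = {}
    for name, data in files.items():
        if len(data) > block_size:
            raise ValueError(f"{name} is larger than one block")
        # Start a new container when the current one cannot hold this file.
        if len(containers[-1]) + len(data) > block_size:
            containers.append(bytearray())
        current = containers[-1]
        index[name] = IndexEntry(len(containers) - 1, len(current), len(data))
        current.extend(data)
    return [bytes(c) for c in containers], index


def read_file(containers, index, name):
    """Fetch one original file using only its index entry, without scanning."""
    entry = index[name]
    return containers[entry.container][entry.offset:entry.offset + entry.length]


if __name__ == "__main__":
    small = {"a.txt": b"hello", "b.txt": b"world!", "c.txt": b"xyz"}
    merged, idx = merge_small_files(small, block_size=12)
    print(len(merged), read_file(merged, idx, "b.txt"))
```

In this sketch the file system would track a few block-sized containers rather than one entry per small file, and the index turns each read into a single offset lookup inside one container.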

Original language: English
Title of host publication: Social Computing - 2nd International Conference of Young Computer Scientists, Engineers and Educators, ICYCSEE 2016, Proceedings
Editors: Wanxiang Che, Hongzhi Wang, Shaoliang Peng, Weipeng Jing, Guanglu Sun, Xianhua Song, Zeguang Lu, Qilong Han, Junyu Lin, Hongtao Song
Publisher: Springer Verlag
Pages: 350-361
Number of pages: 12
ISBN (Print): 9789811020520
State: Published - 2016
Externally published: Yes
Event: 2nd International Conference on Young Computer Scientists, Engineers and Educators, ICYCSEE 2016 - Harbin, China
Duration: 20 Aug 2016 to 22 Aug 2016

Publication series

Name: Communications in Computer and Information Science
Volume: 623
ISSN (Print): 1865-0929

Conference

Conference: 2nd International Conference on Young Computer Scientists, Engineers and Educators, ICYCSEE 2016
Country/Territory: China
City: Harbin
Period: 20/08/16 to 22/08/16

Keywords

  • HDFS
  • Hadoop
  • Index
  • Small files
