MapReduce

From Big Data Analytics Lab

Introduction

MapReduce is a programming model, introduced by Google in 2004, for large-scale data processing on shared-nothing clusters. Because MapReduce automatically parallelizes execution across a large cluster of commodity machines, users can write their programs without the burden of implementing parallel and distributed processing themselves. It is widely used thanks to Hadoop, an open-source implementation of the MapReduce framework. We have been working on improving the performance of MapReduce and on analyzing large data sets with MapReduce.
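
To make the programming model concrete, the following is a minimal single-process sketch of the classic word-count example. It only imitates the map, shuffle, and reduce steps in memory; it is not Hadoop code, and the function names (map_phase, shuffle, reduce_phase, run_word_count) are hypothetical names chosen for illustration. In a real framework the user would supply only the map and reduce functions, and the system would handle grouping, distribution, and fault tolerance.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key (done by the framework in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_word_count(documents):
    """Run the three steps sequentially; a cluster would run them in parallel."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big cluster", "data on a cluster"]
    print(run_word_count(docs))
```

Because each call to map_phase depends only on its own input record, and each call to reduce_phase depends only on one key's values, both steps could in principle be spread across machines without coordination, which is the property the model relies on.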

Members

Ph.D. Alumni

  • 김기성 (Kisung Kim)
  • 이태휘 (Taewhi Lee)

M.S. Alumni

  • 배혜찬 (Hye-Chan Bae)
  • 김혜원 (Hyewon Kim)
  • 김태경 (Tai Kyoung Kim)
  • 송효진 (Hyojin Song)

Publications

  • Taewhi Lee, Hye-Chan Bae, and Hyoung-Joo Kim. Join Processing with Threshold-based Filtering in MapReduce. Journal of Supercomputing, vol. 69, no. 2, pp. 793-813, 2014.8. [LINK]
  • Taewhi Lee, Dong-Hyuk Im, Hangkyu Kim, and Hyoung-Joo Kim. Application of Filters to Multiway Joins in MapReduce. Mathematical Problems in Engineering, vol. 2014, Article ID 249418, 11 pages, 2014.3. [LINK]
  • Taewhi Lee, Kisung Kim, and Hyoung-Joo Kim. Exploiting Bloom Filters for Efficient Joins in MapReduce. Information ― An International Interdisciplinary Journal, vol. 16, no. 8(A), pp. 5869-5885, 2013.8. [PDF]
  • Hye-Chan Bae, Taewhi Lee, and Hyoung-Joo Kim. Adaptive Join Processing Using Bloom Filters in a MapReduce Environment. Journal of KIISE: Databases, vol. 40, no. 4, pp. 233-242, 2013.8. (in Korean)
  • Taewhi Lee, Kisung Kim, and Hyoung-Joo Kim. Join Processing Using Bloom Filter in MapReduce. In Proceedings of the 2012 ACM Research in Applied Computation Symposium (RACS '12), pp. 100-105, San Antonio, TX, USA, 2012.10. [LINK]
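  • Tai Kyoung Kim, Kisung Kim, and Hyoung-Joo Kim. Efficient SPARQL Query Processing Using Redundancy-based Joins and Non-conflicting Joins in MapReduce. Journal of KIISE: Databases, vol. 39, no. 4, pp. 246-254, 2012.8. (in Korean) [PDF]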


Reading Materials

MapReduce & Hadoop

Data Stores

Bookmarks