BigData
From Big Data Analytics Lab
Introduction
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions." (from Wikipedia.)
Members
Ph.D. Candidates
- 오혜성 (Hyesung Oh)
- 구해모 (Heymo Kou)
Ph.D. Alumni
- 김기성 (Kisung Kim)
- 이태휘 (Taewhi Lee)
M.S. Alumni
- 이인회 (Inhoe Lee)
- 이민섭 (Minsup Lee)
- 배혜찬 (Hye-Chan Bae)
- 김혜원 (Hyewon Kim)
- 김태경 (Tai Kyoung Kim)
- 송효진 (Hyojin Song)
Publications
- Woo-Hyun Lee, Hee-Gook Jun, and Hyoung-Joo Kim, "Hadoop MapReduce Performance Enhancement Using In-Node Combiners", International Journal of Computer Science & Information Technology, vol. 7, no. 5, pp. 1-18, Oct. 2015 (pdf)
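- Inhoe Lee, Hyesung Oh, and Hyoung-Joo Kim, "Improving a Locality Sensitive Hashing-based K-Nearest Neighbor Graph Construction Algorithm Using Regrouping in a MapReduce Environment" (in Korean), Journal of KIISE: Computing Practices, vol. 21, no. 11, pp. 681-688, 2015.11 (pdf)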
- Taewhi Lee, Hye-Chan Bae, and Hyoung-Joo Kim. Join Processing with Threshold-based Filtering in MapReduce. Journal of Supercomputing, vol. 69, no. 2, pp. 793-813, 2014.8. [LINK]
- Taewhi Lee, Dong-Hyuk Im, Hangkyu Kim, and Hyoung-Joo Kim. Application of Filters to Multiway Joins in MapReduce. Mathematical Problems in Engineering, vol. 2014, Article ID 249418, 11 pages, 2014.3. [LINK]
- Taewhi Lee, Kisung Kim, and Hyoung-Joo Kim. Exploiting Bloom Filters for Efficient Joins in MapReduce. Information ― An International Interdisciplinary Journal, vol. 16, no. 8(A), pp. 5869-5885, 2013.8. [PDF]
- Hye-Chan Bae, Taewhi Lee, and Hyoung-Joo Kim. Adaptive Join Processing Using Bloom Filters in a MapReduce Environment (in Korean). Journal of KIISE: Databases, vol. 40, no. 4, pp. 233-242, 2013.8.
- Taewhi Lee, Kisung Kim, and Hyoung-Joo Kim. Join Processing Using Bloom Filter in MapReduce. In Proceedings of the 2012 ACM Research in Applied Computation Symposium (RACS '12), pp. 100-105, San Antonio, TX, USA, 2012.10. [LINK]
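- Tai Kyoung Kim, Kisung Kim, and Hyoung-Joo Kim. Efficient SPARQL Query Processing Using Duplication-based Joins and Non-conflicting Joins in MapReduce (in Korean). Journal of KIISE: Databases, vol. 39, no. 4, pp. 246-254, 2012.8. [PDF]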
Reading Materials
MapReduce & Hadoop
- MapReduce
- Hadoop: The Definitive Guide, O'Reilly
- MapReduce Algorithms for Big Data Analysis (VLDB 2012 tutorial slides by Prof. Kyuseok Shim)
Data Stores
- Summary
- NoSQL Ecosystem - a good summary article by Jonathan Ellis, project chair for Apache Cassandra. Posted 2009-11-09.
- Visual Guide to NoSQL Systems by Nathan Hurst. Posted 2010-03-15.
- High Performance Scalable Data Stores by Rick Cattell. 2010-04-27.
- NoSQL Databases - NoSQL Introduction And Overview by Christof Strauch, from Stuttgart Media University.
- CAP theorem
- Distributed file system
- GFS: Google File System
- Key-value stores
- Dynamo: Amazon's Highly Available Key-value Store
- Document stores
- Extensible record stores
- HadoopDB: a hybrid of DBMS and MapReduce technologies
Bookmarks
- Bibliography
- UIUC CS 525: Advanced Distributed Systems, Spring 2011
- Mapreduce & Hadoop Algorithms in Academic Papers (3rd update) posted 2010-05-08
- MapReduce paper list by Ashutosh Dutta
- Cassandra reading list by Jonathan Ellis, posted 2009-12-15
- UCI ISG Lecture Series on Scalable Data Management