Machine Learning
From Big Data Analytics Lab
Introduction
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization. The trend toward larger data sets is driven by the additional information that can be derived from analyzing a single large set of related data, compared with separate smaller sets holding the same total amount of data. Such analysis allows correlations to be found to "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions." (Source: Wikipedia.)
Members
Ph.D. Candidates
- 구해모 (Heymo Kou)
- 이용현 (Yonghyun Lee)
- 임유빈 (Yubin Lim)
Ph.D. Student
- 김동효 (Donghyo Kim)
M.S. Students
- 최병민 (Byeongmin Choi)
- 강필구
- 양현식
- 송창헌 (Changheon Song)
Publications
- 구해모, 남창민, 이우현, 이용재, 김형주, "Design and Implementation of Parallel K-Means on a Distributed In-Memory DBMS as an In-Database Analytic Function" (in Korean), KIISE Transactions on Computing Practices, vol. 24, no. 3, pp. 405-112, Mar. 2018. (pdf)
Reading Materials
MapReduce & Hadoop
- MapReduce
- Hadoop: The Definitive Guide, O'Reilly
- MapReduce Algorithms for Big Data Analysis (VLDB 2012 tutorial slides by Prof. Kyuseok Shim)
Data Stores
- Summary
- NoSQL Ecosystem - a good summary article by Jonathan Ellis, project chair of Apache Cassandra, posted 2009-11-09.
- Visual Guide to NoSQL Systems by Nathan Hurst, posted 2010-03-15.
- High Performance Scalable Data Stores by Rick Cattell, 2010-04-27.
- NoSQL Databases - NoSQL Introduction And Overview by Christof Strauch, from Stuttgart Media University.
- CAP theorem
- Distributed file system
- GFS: Google File System
- Key-value stores
- Dynamo: Amazon's Highly Available Key-value Store
- Document stores
- Extensible record stores
- HadoopDB: a hybrid of DBMS and MapReduce technologies
Bookmarks
- Bibliography
- UIUC CS 525: Advanced Distributed Systems, Spring 2011
- Mapreduce & Hadoop Algorithms in Academic Papers (3rd update), posted 2010-05-08
- MapReduce paper list by Ashutosh Dutta
- Cassandra reading list by Jonathan Ellis, posted 2009-12-15
- UCI ISG Lecture Series on Scalable Data Management