Originally inspired by Google's GFS and MapReduce papers, Apache Hadoop is an open-source framework offering scalable, distributed, fault-tolerant data storage and processing on standard hardware. This session explains what Hadoop is and where it best fits into the modern data center. You'll learn the basics of how it provides scalable data storage and processing, some important "ecosystem" tools that complement Hadoop's capabilities, and several practical ways organizations are using these tools today. You'll also learn about the basic architecture of a Hadoop cluster and some recent developments that will further improve Hadoop's scalability and performance.
Basic knowledge: None
Preparation:
This tutorial will use the Cloudera QuickStart VM for demos (http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms.html). Attendees are welcome to download the VM to their laptops beforehand and follow along with the demo instructions; this is not required, however.