Budapest Data 2015 has ended
Thursday, June 4 • 10:55 - 11:35
SQL Engines on Hadoop - The case for Impala


This talk will go through the history and current state of processing engines for Hadoop, focusing in particular on SQL engines. We will then dive deep into one of the SQL processing engines for Hadoop - Cloudera Impala.

The Cloudera Impala project is pioneering the next generation of Hadoop capabilities: the convergence of fast SQL queries with the capacity, scalability, and flexibility of a Hadoop cluster. With Impala, the Hadoop community now has an open-source codebase that helps users query data stored in HDFS and Apache HBase in real time, using familiar SQL syntax. In contrast with other SQL-on-Hadoop initiatives, Impala's operations are fast enough to run interactively on native Hadoop data rather than as long-running batch jobs. This gives you the freedom to discover relationships and explore what-if scenarios on Big Data datasets. By taking advantage of Hadoop's infrastructure, Impala lets you avoid traditional data warehouse obstacles such as rigid schema design and the cost of expensive ETL jobs.

This talk starts out with an overview of Impala from the user's perspective, followed by a presentation of Impala's architecture and implementation. It concludes with a summary of Impala's benefits when compared with the available SQL-on-Hadoop alternatives.

Speakers
Mark Grover

Software Engineer, Cloudera
Mark is the co-author of O'Reilly's Hadoop Application Architectures book, a committer on Apache Bigtop, and a committer and PMC member on Apache Sentry (incubating). He has contributed code to the Apache Hadoop, Apache Hive, Apache Sqoop and Apache Flume projects. He is also a section...


Thursday June 4, 2015 10:55 - 11:35 CEST
Mátyás I-II.
