Originally inspired by Google's GFS and MapReduce papers, Apache Hadoop is an open-source framework offering scalable, distributed, fault-tolerant data storage and processing on standard hardware. This session explains what Hadoop is and where it best fits into the modern data center. You'll learn the basics of how it provides scalable data storage and processing, some important "ecosystem" tools that complement Hadoop's capabilities, and several practical ways organizations are using these tools today. You'll also learn about the basic architecture of a Hadoop cluster and some recent developments that will further improve Hadoop's scalability and performance.
Basic knowledge: None
Preparation:
This tutorial will use the Cloudera QuickStart VM for demos (http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms.html). Attendees are welcome to download the VM to their laptops beforehand and follow along with the demo instructions; this is not required, however.