ApacheCon North America 2014: April 7-9, Denver, CO


Big Data-Resource Management Track
Tuesday, April 8


Mesos: Elastically Scalable Operations, Simplified.
Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications. It can run and manage Apache Hadoop, Apache Spark, MPI, Hypertable, Storm, Chronos, Marathon, and other applications on a dynamically shared pool of nodes. Mesos provides distributed systems primitives that make it easy to build scalable, fault-tolerant frameworks.
The biggest user of Mesos is currently Twitter, where it runs on tens of thousands of cores. Airbnb runs all of its data infrastructure on Mesos, processing petabytes of data. At such a large scale, it becomes increasingly important to give developers direct access to cluster resources for scaling and introducing new services. In this way, Mesos speeds development and makes life easier for the data center operator.


Adam Bordelon

Mesosphere, Distributed System Engineer
Adam Bordelon is a distributed systems architect at Mesosphere and an Apache Mesos committer. Before joining Mesosphere, Adam led development on Hadoop core at MapR, built distributed systems for recommendations at Amazon, and re-architected the LabVIEW compiler at National Instruments...

Niklas Nielsen

Distributed Systems Lead Architect, Intel
Niklas has been involved in the design and development of Apache Mesos and the software around it since 2013, and is a member of the Apache Mesos Project Management Committee. He recently joined Intel's Software Defined Infrastructure team and is working on scheduler enabling technologies utilizing...

Tuesday April 8, 2014 10:30am - 11:20am
Confluence A


Apache Hadoop YARN: The Next-generation Distributed Operating System
For diverse organizations, Apache Hadoop has become the de facto place where data and computational resources are shared. This broad usage has stretched its design beyond its intended target. To address this, the Apache Hadoop community has developed the next generation of Hadoop's compute platform: YARN.

In a nutshell, YARN is the distributed operating system of the big-data world. In this talk, we will introduce YARN, covering how the new architecture decouples the programming model from resource management and scheduling, the platform's fault tolerance and high availability, and tools for application tracing and analysis. We will then discuss the exciting ecosystem of Apache Software Foundation projects forming around YARN. We will conclude by covering the applications and services being built on the YARN platform, which let users choose their programming model, all on the same data.


Jian He

Zhijie Shen

Member of Technical Staff, Hortonworks
Dr. Zhijie Shen received a Ph.D. in Computer Science from the National University of Singapore. He is now a Member of Technical Staff at Hortonworks, Inc. He is an Apache Hadoop committer and a member of the core Apache Hadoop YARN team. Moreover, he has been actively contributing...

Tuesday April 8, 2014 11:30am - 12:20pm
Confluence A


Building and Running Distributed Systems using Apache Mesos
Today's applications and data have outgrown single machines. Whether organizations like it or not, their engineers are building distributed systems. The computer is now the data center, the process is now a distributed system, and threads are now components of the distributed system. But provisioning and operating these distributed systems is still a mostly manual endeavor, even though concepts of modern operating systems apply naturally.

In this talk, I'll present Apache Mesos, a "kernel" for the data center that provides primitives and abstractions for building and running distributed systems. Mesos has been running at scale at Twitter for over two years, and since it graduated from the incubator last summer, numerous frameworks have been built on top of it. This talk will explore the fundamentals of Mesos, as well as what it takes to build applications and frameworks on top of it.


Benjamin Hindman

Benjamin Hindman is one of the creators of Apache Mesos. He began working on the project as a PhD student at UC Berkeley, and it followed him to Twitter, where he is currently employed. Mesos now runs on thousands of machines at Twitter, Airbnb, and elsewhere, even when Ben is away skiing...

Tuesday April 8, 2014 1:30pm - 2:20pm
Confluence A


Deploying and managing distributed applications in a YARN cluster
Hadoop YARN's cluster manager makes it possible to convert static, one-per-node, cluster-wide services into dynamic, user-specific applications.
Hoya was developed at Hortonworks to support deployment of HBase clusters in YARN. It was built both to showcase this possibility and to drive YARN development toward the needs of long-lived applications.
Hoya's goal is to take existing Hadoop applications and host them in a YARN cluster.
It has evolved to provide an extension model: providers. These allow Hoya to support different applications: it now supports Apache Accumulo and can easily support other suitable applications.
Hoya is now capable of running long-lived, dynamic applications in YARN clusters. It is already being used in internal proof-of-concept applications and in testing both the applications and YARN itself; other people and organisations are experimenting with it and providing feedback.


Zhihong Yu

Staff Engineer, VMware
I have been an Apache HBase PMC member for five and a half years. I am also a committer for Apache Slider and Apache Bahir, and I contribute to Apache Phoenix and Apache Spark. I have presented at the past three ApacheCon NA events.

Tuesday April 8, 2014 2:30pm - 3:20pm
Confluence A


Managing containers in YARN/Mesos using Helix
Apache Helix is a generic cluster management framework that simplifies building large-scale distributed systems. YARN is a generic resource manager that provides container-based resource allocation for scalable application deployment, management, and monitoring. Integrating the capabilities of Helix and YARN has great potential. This talk will provide insights into how one can leverage Helix and YARN to build, configure, and deploy distributed systems. The life cycle of a distributed system consists of building the system, auto-provisioning, deploying, configuring, handling failures, and automatically scaling up or down with the workload. This talk will show how Helix and YARN can be leveraged to tackle the challenges involved in each stage. Finally, we will show how integrating with Helix enables an application to run on a variety of systems, such as YARN, Mesos, and EC2.

Kanak Biscuitwala

Software Engineer, LinkedIn
Kanak Biscuitwala is an Apache Helix committer and a software engineer in the LinkedIn Distributed Data Systems group. In his time with the Helix project, he has introduced several new algorithms and APIs touching nearly every component of the framework. His interests are in distributed...

Zhen Zhang

Software Engineer, LinkedIn
Zhen Zhang is an Apache Helix committer and a software engineer at LinkedIn. He is one of the initial committers of Helix and has worked on most of the codebase. His interests are in distributed data systems. Before LinkedIn, he received his Ph.D. in Computer Engineering from UC Irvin...

Tuesday April 8, 2014 3:45pm - 4:35pm
Confluence A


Extend YARN to support complicated workloads other than map-reduce
Apache Hadoop YARN, a sub-project of Hadoop at the Apache Software Foundation introduced in Hadoop 2.0, aims to run different types of workloads that all share a common resource management platform. Over the past two years, while integrating other frameworks such as OpenMPI with YARN, we encountered limitations in YARN's support for workloads beyond map-reduce.

This talk will describe three areas of work to make YARN support complicated workloads other than map-reduce:
1) An improvement to container resource management that lets an application dynamically increase and decrease the resources of a running process.
2) An extended service container to run and manage daemon/service processes on each node.
3) Our thoughts on supporting new opportunities, such as virtualized platforms, building on the improvements we've made to YARN.


Michael Lv

Michael Lv works at Pivotal as senior staff and is currently focusing on big data platforms and frameworks. Michael built the Pivotal HD (then called Greenplum HD) Hadoop distribution from the beginning and led the team that shipped half a dozen product releases. Before joining Pivotal...

Tuesday April 8, 2014 4:45pm - 5:35pm
Confluence A