Apache Flume is a real-time, distributed data-ingest system designed for the Hadoop ecosystem. Flume is a highly scalable distributed system that guarantees delivery from a large number of data sources to an eventual destination such as HDFS or HBase. Flume has been deployed at very large scale at several companies around the world, transferring hundreds of terabytes of data every week.
In this presentation, we will go through the fundamental components that make up Flume and show how to configure and deploy Flume on your cluster so that it scales with the number of data sources and the volume of data. As a Flume committer and an engineer who supports Flume in production, I will present standard deployment topologies and explain how to design one for your use case.
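To give a flavor of the configuration the talk covers, the sketch below shows a minimal single-agent pipeline in Flume's properties-file format, wiring a source through a channel to an HDFS sink. The agent name (`agent1`), component names, port, and HDFS path are illustrative assumptions, not values from the talk.

```properties
# Hypothetical agent "agent1": one source -> one channel -> one sink.
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Netcat source: listens on a TCP port and turns each line into an event.
agent1.sources.src1.type = netcat
agent1.sources.src1.bind = localhost
agent1.sources.src1.port = 44444
agent1.sources.src1.channels = ch1

# Memory channel: fast, but events are lost if the agent dies;
# a file channel trades throughput for durability.
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# HDFS sink: writes events to the (illustrative) path below.
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode/flume/events
agent1.sinks.sink1.channel = ch1
```

An agent started with this file would buffer incoming events in the memory channel and drain them to HDFS; scaling out typically means running many such agents and fanning their output into a smaller aggregation tier, which is the kind of topology the presentation discusses.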
Hari Shreedharan is a PMC member on Apache Flume and a committer on Apache Sqoop. He is a Software Engineer at Cloudera and regularly presents at conferences and meetups related to Hadoop and Big Data.