ApacheCon North America 2014 - April 7-9 in Denver, CO
Monday, April 7 • 3:00pm - 3:50pm
Introduction to Apache DataFu

Apache DataFu is an open-source collection of user-defined functions for working with large-scale data in Hadoop and Pig.

During the course of development at LinkedIn and other companies, a need was recognized for a stable, well-tested library of routines in high-level languages suitable for execution on Hadoop. Over time, many routines had been collected, but they were poorly documented, poorly organized, and easily broken. Initially, DataFu was an initiative to clean up these routines by adding documentation and rigorous unit tests.

Since then, DataFu has evolved through many versions of Hadoop and Pig. During this time, DataFu has been used extensively at LinkedIn and other companies for many data-driven products, such as "People You May Know" and "Skills and Endorsements."

This presentation introduces DataFu and walks through example use cases in Pig.


William Vaughan

Software Engineer, LinkedIn
William Vaughan is a Staff Software Engineer at LinkedIn, where he has been involved in creating the Skills and Expertise and the Endorsements Big Data products.

Monday April 7, 2014 3:00pm - 3:50pm PDT
Confluence A