ApacheCon North America 2014 (April 7-9, Denver, CO) has ended
Wednesday, April 9 • 3:15pm - 4:05pm
Apache Pig as a platform for Datascience


Apache Pig is a platform for analyzing large data sets that consists of
a high-level language for expressing data analysis programs, coupled
with infrastructure for evaluating these programs. In addition, it
provides for extensibility by way of User Defined Functions. There are
some third-party libraries for Pig geared for use by Data Scientists.

In this talk, I will explore how to integrate popular libraries with Apache Pig to provide a robust environment for data science. I will also discuss gaps and potential improvements identified from our experience using Pig as a tool for data science. In particular, we will focus on the role of Pig as a data aggregation tool as well as a platform for evaluating machine learning models at scale.

Speakers

Casey Stella

Principal Architect, Hortonworks
I am a principal architect focusing on Data Science in the consulting organization at Hortonworks. In the past, I've worked as an architect and senior engineer at a healthcare informatics startup spun out of the Cleveland Clinic, as a developer at Oracle and as a Research Geophysicist...


Wednesday April 9, 2014 3:15pm - 4:05pm PDT
Confluence C
