ApacheCon North America 2014, April 7-9, Denver, CO
Wednesday, April 9 • 3:15pm - 4:05pm
Apache Pig as a platform for Datascience

Apache Pig is a platform for analyzing large data sets. It consists of
a high-level language for expressing data analysis programs, coupled
with infrastructure for evaluating these programs, and it can be
extended through User Defined Functions (UDFs). Several third-party
libraries for Pig are geared toward data scientists.

In this talk, I will explore how to integrate popular libraries with Apache Pig to provide a robust environment for data science. Drawing on our experience using Pig as a data science tool, I will also discuss its gaps and potential improvements. In particular, we will focus on the role of Pig as a data aggregation tool and as a platform for evaluating machine learning models at scale.
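To make the model-evaluation point concrete, here is a minimal sketch of a Python UDF that scores records against a logistic regression model trained elsewhere. The function name `score_logistic`, the two input features and the coefficient values are all hypothetical. It assumes Pig's Jython UDF support, where a script is registered with `REGISTER 'score.py' USING jython AS scorer;` and the `outputSchema` decorator is supplied by Pig at run time; the fallback below only exists so the function can be tried in plain Python.

```python
import math

# Pig's Jython runtime supplies the outputSchema decorator when this file is
# registered; outside Pig, fall back to a no-op so it can be tested locally.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap

# Hypothetical coefficients for a model trained outside Pig (e.g. on a sample).
INTERCEPT = -1.0
WEIGHTS = (0.5, 2.0)


@outputSchema("score:double")
def score_logistic(x1, x2):
    """Return the logistic-regression probability for one record."""
    if x1 is None or x2 is None:
        return None  # propagate Pig nulls instead of failing the task
    z = INTERCEPT + WEIGHTS[0] * x1 + WEIGHTS[1] * x2
    return 1.0 / (1.0 + math.exp(-z))
```

In a Pig script the function would then be applied per record, for example `scored = FOREACH features GENERATE id, scorer.score_logistic(x1, x2);` (relation and field names hypothetical). In this design, training stays outside Pig and Pig only parallelizes the cheap per-record scoring step across the cluster.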

Speakers
Casey Stella

Principal Architect, Hortonworks
I am a principal architect focusing on Data Science in the consulting organization at Hortonworks. In the past, I've worked as an architect and senior engineer at a healthcare informatics startup spun out of the Cleveland Clinic, as a developer at Oracle, and as a Research Geophysicist in the Oil & Gas industry. Before that, I was a poor graduate student in Math at Texas A&M.

I primarily work with the Apache Hadoop software stack. I...


Wednesday April 9, 2014 3:15pm - 4:05pm
Confluence C
