Wednesday, April 9 • 11:15am - 12:05pm
Building your big data search stack with Apache Nutch 2.x

Lewis John McGibbney - In this tutorial, Lewis invites you to join him in building your own customized search stack capable of handling enormous data volumes. Although the tutorial focuses on Apache Nutch 2.x, we will also use source code from Apache Gora, an open source framework that provides an in-memory data model and persistence for big data and acts as an object-to-datastore mapping framework for crawl data objects (WebPage or Host). Apache Nutch 2.x differs from the Nutch 1.x branch in one key area: storage is abstracted away from any specific underlying data store by using Apache Gora to handle object-to-datastore mappings. This means we can implement an extremely flexible model/stack for storing everything (fetch time, status, content, parsed text, outlinks, inlinks, etc.) in a number of NoSQL storage solutions.

Lewis J. McGibbney

Chair, ESIP Semantic Technologies Committee, NASA, JPL
My name is Lewis John McGibbney. I am currently a Data Scientist at the NASA Jet Propulsion Laboratory in Pasadena, California, where I work in Computer Science and Data Intensive Applications. I enjoy floating up and down the tide of technologies @ The Apache Software Foundation having...

Wednesday April 9, 2014 11:15am - 12:05pm PDT
Confluence B
