ApacheCon North America 2014, April 7-9, Denver, CO
Wednesday, April 9 • 11:15am - 12:05pm
Building your big data search stack with Apache Nutch 2.x

Lewis John McGibbney - In this tutorial, Lewis encourages you to join him in building your own customized search stack capable of handling enormous data volumes. Although the tutorial focuses on Apache Nutch 2.x, we will also use source code from Apache Gora, an open source framework that provides an in-memory data model and persistence for big data. For crawl data, Gora acts as an object-to-datastore mapping framework for objects such as WebPage and Host. Apache Nutch 2.x differs from the Nutch 1.x branch in one key area: storage is abstracted away from any specific underlying data store, with Apache Gora handling the mapping of objects to persistent storage. This makes for an extremely flexible model/stack for storing everything (fetch time, status, content, parsed text, outlinks, inlinks, etc.) in any of a number of NoSQL storage solutions.
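The storage abstraction described above can be illustrated with a small Java sketch of the general pattern. This is not Gora's or Nutch's API: `CrawlPage`, `PageStore`, `InMemoryPageStore` and `Crawler` are hypothetical names invented for illustration, and the in-memory map stands in for a real NoSQL backend. The point is only that crawl code depends on an interface, so the backing store can be swapped without touching that code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical crawl record holding the kinds of data the abstract lists
// (fetch time, status, content, parsed text, outlinks, inlinks).
// This is NOT Nutch's actual WebPage class.
record CrawlPage(String url, long fetchTime, int status, String content,
                 String parsedText, List<String> outlinks, List<String> inlinks) {}

// Hypothetical storage abstraction: crawl code depends only on this interface,
// in the same spirit as Nutch 2.x depending on Gora rather than on one database.
interface PageStore {
    void put(String key, CrawlPage page);
    Optional<CrawlPage> get(String key);
    void delete(String key);
}

// One hypothetical backend: a plain in-memory map. A backend for a NoSQL
// store would implement the same interface.
final class InMemoryPageStore implements PageStore {
    private final Map<String, CrawlPage> rows = new HashMap<>();

    public void put(String key, CrawlPage page) { rows.put(key, page); }

    public Optional<CrawlPage> get(String key) { return Optional.ofNullable(rows.get(key)); }

    public void delete(String key) { rows.remove(key); }
}

// Crawl logic written against the interface only; switching backends
// requires no change in this class.
final class Crawler {
    private final PageStore store;

    Crawler(PageStore store) { this.store = store; }

    void recordFetch(String url, long fetchTime, String html, List<String> outlinks) {
        // Toy tag stripping stands in for a real parser.
        String parsed = html.replaceAll("<[^>]*>", "");
        // Status 200 is an assumed "fetched OK" code for this sketch only.
        store.put(url, new CrawlPage(url, fetchTime, 200, html, parsed, outlinks, List.of()));
    }
}
```

In Gora itself, the corresponding role is played by its own datastore interface together with per-backend mapping configuration; the Gora documentation describes the actual API.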

Lewis McGibbney

Enterprise Search Technologist III, Jet Propulsion Laboratory

Wednesday April 9, 2014 11:15am - 12:05pm PDT
Confluence B