Register Now for ApacheCon North America 2014 - April 7-9 in Denver, CO. Registration fees increase on March 15th, so don’t delay!
Wednesday, April 9 • 11:15am - 12:05pm
Building your big data search stack with Apache Nutch 2.x

Lewis John McGibbney - In this tutorial Lewis invites you to join him in building your own customized search stack capable of handling enormous data volumes. Although the tutorial focuses on Apache Nutch 2.x, we will also use source code from Apache Gora, an open source framework that provides an in-memory data model and persistence for big data and that acts as an object-to-datastore mapping framework for crawl data objects (WebPage or Host). Apache Nutch 2.x differs from the Nutch 1.x branch in one key area: storage is abstracted away from any specific underlying data store by using Apache Gora to handle object-to-persistence mappings. This means we can implement an extremely flexible model/stack for storing everything (fetch time, status, content, parsed text, outlinks, inlinks, etc.) in a number of NoSQL storage solutions.
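
The storage abstraction described in the abstract can be pictured with a small sketch. This is only an illustration of the idea in Python, not Apache Gora's or Apache Nutch's actual API (both are Java projects). The names `WebPage`, `DataStore`, `InMemoryStore`, and `record_fetch` are hypothetical, and the fields simply mirror the items listed in the abstract.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WebPage:
    """Hypothetical crawl record; fields mirror the abstract, not Gora's schema."""
    url: str
    fetch_time: int = 0
    status: str = "unfetched"
    content: bytes = b""
    parsed_text: str = ""
    outlinks: List[str] = field(default_factory=list)
    inlinks: List[str] = field(default_factory=list)


class DataStore(ABC):
    """Hypothetical storage interface: crawl code depends only on this."""

    @abstractmethod
    def put(self, key: str, page: WebPage) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> Optional[WebPage]:
        ...


class InMemoryStore(DataStore):
    """Stand-in backend; a NoSQL-backed store would implement the same methods."""

    def __init__(self) -> None:
        self._rows: Dict[str, WebPage] = {}

    def put(self, key: str, page: WebPage) -> None:
        self._rows[key] = page

    def get(self, key: str) -> Optional[WebPage]:
        return self._rows.get(key)


def record_fetch(store: DataStore, url: str, content: bytes, fetch_time: int) -> WebPage:
    """Update (or create) the record for a URL without knowing the backend."""
    page = store.get(url) or WebPage(url=url)
    page.content = content
    page.fetch_time = fetch_time
    page.status = "fetched"
    store.put(url, page)
    return page
```

Because `record_fetch` only talks to the `DataStore` interface, swapping the in-memory stand-in for another backend would not change the crawl logic, which is the flexibility the abstract attributes to the Nutch 2.x design.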

Lewis John McGibbney

Data Scientist II, NASA Jet Propulsion Laboratory
Having a keen interest and ongoing involvement in the Apache Software Foundation, I enjoy floating up and down the tide of open-source technologies within the ecosystem there. In my free time I enjoy the freedom of cycling. Favourite drink... Bruichladdich

Wednesday April 9, 2014 11:15am - 12:05pm
Confluence B
