Auto-complete suggestions for Magnolia CMS with jQuery/Solr, Part 1/5 (Adding Magnolia content to Solr)


In this post we will build, from scratch, an auto-complete search bar for Magnolia CMS using Solr and jQuery (and maybe a few other tools).

Because the suggestions are generated from the Solr index, the first thing we need to do is index the Magnolia CMS website content in Solr. There are a few ways to do this:

  1. Walk the JCR tree, parse the content and send it to Solr using SolrJ, optionally adding a JCR EventListener to keep the index in sync with later changes.
  2. Crawl the rendered website and send the discovered pages and documents to Solr's "/update/extract" request handler. This triggers the Tika parser, which also extracts text from PDF and Word documents.
  3. Crawl the rendered website, but pre-parse the content and send only specific pieces of information to Solr.
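To make the third option a little more concrete, here is a minimal sketch in Python, using only the standard library. It pulls the page title and the visible text out of a rendered page and builds the JSON body you could POST to Solr's update handler. The field names `id`, `title` and `content` are assumptions and must match whatever your Solr schema defines; the helper names are made up for illustration, and the actual HTTP POST is left out.

```python
import json
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects the <title> text and the visible text of a page."""

    SKIPPED = {"script", "style"}  # tags whose content is never indexed

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._skip_depth == 0:
            self.text_parts.append(data)


def build_solr_doc(url, html):
    """Turn one rendered page into a dict shaped like a Solr document.

    The field names (id, title, content) are assumptions, not a fixed schema.
    """
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the indexed text stays compact.
    title = " ".join("".join(parser.title_parts).split())
    content = " ".join(" ".join(parser.text_parts).split())
    return {"id": url, "title": title, "content": content}


def build_update_body(docs):
    """Serialize a list of documents as a JSON array, ready to be POSTed."""
    return json.dumps(docs).encode("utf-8")
```

The resulting bytes would be sent with a `Content-Type: application/json` header to the update handler of your Solr core; the exact URL depends on your installation, so it is not hard-coded here.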

Each approach has its advantages and drawbacks, so the best solution is probably a combination of the three. That is not the topic of this post, though, so we won't discuss it here; maybe I will write later about how I solved this issue.

So for the moment, just to get some content indexed, let's use an off-the-shelf crawler that can push content to Solr out of the box. The one that probably needs the least configuration is Nutch.

  1. Download Solr 4.0
  2. Download Nutch.
  3. Follow this to install Nutch and point it at your website

Once everything is working, start the crawl against your website and let it cook in the background for some time.
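To check that the crawl actually put something into the index, you can ask Solr how many documents it holds. The sketch below is one way to do that with the Python standard library. It assumes the JSON response writer (`wt=json`) is available, and the base URL you pass in (for example `http://localhost:8983/solr` on a default local install) is an assumption about your setup; the function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_count_url(solr_base):
    """Build a select URL that matches every document but returns no rows."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return solr_base.rstrip("/") + "/select?" + params


def parse_num_found(response_text):
    """Read the total hit count out of a Solr JSON response."""
    return json.loads(response_text)["response"]["numFound"]


def count_indexed_docs(solr_base):
    """Query a running Solr instance and return the number of indexed documents."""
    with urlopen(build_count_url(solr_base)) as resp:
        return parse_num_found(resp.read().decode("utf-8"))
```

If `count_indexed_docs` returns zero after the crawl has finished, the crawler is probably not reaching your pages or not pushing them to the Solr instance you are querying.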

We are now ready to move to the next part of our tutorial and concentrate on creating our auto complete handler in Solr.
