Auto-complete suggestions for Magnolia CMS with jQuery/Solr, Part 1/5 (Adding Magnolia content to Solr)


In this post we will create, from scratch, an auto-complete search bar in Magnolia CMS using Solr and jQuery (and maybe a few other things).

As the suggestions are generated from the Solr index, the first thing we need to do is index the Magnolia CMS website content into Solr. To do so we have three options:

  1. Walk the JCR tree, parse the content and send it to Solr using SolrJ, possibly adding a JCR EventListener to keep the index synchronised.
  2. Crawl the rendered website and send the discovered links to Solr using its “/update/extract” request handler. This triggers the Tika parser, which also extracts text from PDF or Word documents.
  3. Crawl the rendered website, but pre-parse the content and send only specific pieces of information to Solr.
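To make options 2 and 3 a bit more concrete, here is a minimal Python sketch using only the standard library. It is not the code used later in this series. The Solr address, the page URL and the field names (`id`, `title`, `content`) are assumptions and must match your own installation and schema. `extract_request_url` builds the address a crawler could post a raw page or file to for option 2, and `page_to_solr_doc` shows the kind of pre-parsing meant in option 3.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumption: a local Solr instance


def extract_request_url(page_url, solr=SOLR):
    """Option 2: address of the /update/extract handler for one page.

    literal.id sets the document id; commit=true makes it searchable at once.
    """
    params = {"literal.id": page_url, "commit": "true"}
    return solr + "/update/extract?" + urlencode(params)


class TitleAndText(HTMLParser):
    """Option 3: keep only the <title> and the visible text of a page."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def page_to_solr_doc(page_url, html):
    """Turn one rendered page into a small document (hypothetical field names)."""
    parser = TitleAndText()
    parser.feed(html)
    parser.close()
    return {"id": page_url, "title": parser.title, "content": " ".join(parser.chunks)}


def update_payload(docs):
    """JSON body for a POST to <solr>/update with Content-Type: application/json."""
    return json.dumps(docs)


if __name__ == "__main__":
    page = "<html><head><title>Home</title></head><body><p>Magnolia demo</p></body></html>"
    doc = page_to_solr_doc("http://localhost:8080/home.html", page)
    print(extract_request_url(doc["id"]))
    print(update_payload([doc]))
```

The sketch only builds the URL and the payload; actually sending them (and fetching the pages in the first place) is left to whatever crawler you use.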

Every approach has its advantages and drawbacks, so the best solution is probably a combination of the three. That is not the topic of this post, however, so we won’t discuss it here; I may write later about how I solved this issue.

So, for the time being, to get some indexed content, let’s use an off-the-shelf crawler that can push content to Solr out of the box. The one that probably needs the least configuration is Nutch.

  1. Download Solr 4.0.
  2. Download Nutch.
  3. Follow this to install Nutch and point it at your website.

Once everything works, point the crawler at your website and let it cook in the background for a while.
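To see whether the crawl has actually pushed anything into the index, one simple check is to ask Solr how many documents it holds. The sketch below is only an illustration: it assumes Solr answers at http://localhost:8983/solr with a default core and the standard “/select” handler, so adjust the address (and add a core name if needed) for your setup.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR = "http://localhost:8983/solr"  # assumption: a local Solr instance


def count_query_url(solr=SOLR):
    """Match-all query that returns no rows, only the total hit count."""
    params = {"q": "*:*", "rows": 0, "wt": "json"}
    return solr + "/select?" + urlencode(params)


def num_found(response_body):
    """Read the total number of matching documents from a JSON response."""
    return json.loads(response_body)["response"]["numFound"]


def indexed_documents(solr=SOLR):
    """Ask a running Solr how many documents are currently indexed."""
    with urlopen(count_query_url(solr)) as response:
        return num_found(response.read().decode("utf-8"))


if __name__ == "__main__":
    print("Documents in the index:", indexed_documents())
```

If the number stays at zero after the crawler has been running for a while, the crawler is probably not reaching Solr (or not committing), and it is worth checking its logs before moving on.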

We are now ready to move on to the next part of this tutorial and concentrate on creating our auto-complete handler in Solr.


