Kenneth Solberg

Microsoft Search Server Express 2008 and Umbraco

Introduction

Earlier this year Microsoft released Search Server Express 2008. The product is based on SharePoint technology and is comparable to the Google Mini search appliance, but MSSE is free and has very few limitations (the only one I know of is the lack of clustering). Its user-friendly, SharePoint-like interface gives you full control over the sources to crawl and index, both local files and external websites.

1_msse_admin_640 2_msse_admin_sources_640

Microsoft Search Server Express (MSSE)

First, let's install:

  • Grab a host operating system, either Windows 2003 or 2008. I chose 2003.
  • From 'Configure your server' in Windows add the 'IIS' role and enable ASP.NET only.
  • Download and install the .NET Framework 3.0 runtime.
  • Download Microsoft Search Server Express 2008 and start the installation.
    • Do NOT install Windows SharePoint Services first.
    • Run the 'Search Server Preparation Tool'.
    • Run the 'Install Search Server' and follow the instructions.
  • (Optional) Download Acrobat Reader v8.x and install it to get the PDF IFilter for PDF indexing.
    • Download the 17x17 PDF icon (GIF) from here and save it as:
      C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Images\icpdf.gif
    • Edit the 'C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Xml\DocIcon.xml' file and insert the following line in the '<ByExtension>' section, in the appropriate alphabetical position for 'pdf':
      <Mapping Key="pdf" Value="icpdf.gif" />
    • Add the following registry key and set its value to 'pdf':
      HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\<GUID>\Gather\Search\Extensions\ExtensionList\38
    • Check the following GUID values are correct in the registry (default values should be {E8978DA6-047F-4E3D-9C78-CDBE46041603}):
      HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
      HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
    • Add "C:\Program Files\Adobe\Reader 8.0\Reader" to the system path.
    • Add PDF document type in the search server by opening up the administration console (http://MYSERVER:48560/ssp/admin/_layouts/managefiletypes.aspx), and add an entry for 'pdf' (no dot).
    • Restart the Search Server Service (from the command line):
      net stop osearch
      net start osearch
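
The DocIcon.xml edit in the PDF steps above is easy to get wrong by hand, so here is a minimal Python sketch of it. It assumes the line to insert is the usual `<Mapping Key="pdf" Value="icpdf.gif" />` entry inside the `<ByExtension>` section; the sample XML is a cut-down, made-up stand-in for the real file, and `add_pdf_mapping` is a hypothetical helper name, not part of MSSE.

```python
import xml.etree.ElementTree as ET


def add_pdf_mapping(docicon_xml: str) -> str:
    """Return DocIcon.xml text with a pdf -> icpdf.gif mapping added in key order."""
    root = ET.fromstring(docicon_xml)
    by_ext = root.find("ByExtension")
    if by_ext is None:
        raise ValueError("no <ByExtension> section found")

    # Keys of the existing children, lower-cased for comparison.
    keys = [child.get("Key", "").lower() for child in by_ext]
    if "pdf" in keys:
        return docicon_xml  # already mapped, leave the file untouched

    # Insert before the first existing key that sorts after "pdf".
    position = next((i for i, k in enumerate(keys) if k > "pdf"), len(keys))
    by_ext.insert(position, ET.Element("Mapping", {"Key": "pdf", "Value": "icpdf.gif"}))
    return ET.tostring(root, encoding="unicode")


# Made-up sample; the real DocIcon.xml contains many more entries.
sample = """<DocIcons>
  <ByExtension>
    <Mapping Key="odt" Value="icodt.gif"/>
    <Mapping Key="png" Value="icpng.gif"/>
  </ByExtension>
</DocIcons>"""

print(add_pdf_mapping(sample))
```

Take a backup of DocIcon.xml before writing anything back; the sketch works on strings only and does not preserve the original indentation or XML declaration.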

 

Now that we have installed MSSE, let's index something:

  • Go to 'Content Sources' and click 'New Content Source'.

    3_msse_admin_add_640
    Check the last checkbox to start a full crawl right away.
    When indexing is done, you can see the crawl results in the log:

    4_msse_admin_log_640
  • (Optional) Add crawler rules for authentication, URLs to include/exclude, etc.

    5_msse_admin_rule_640
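
To get a feel for how include/exclude rules interact, here is a small Python sketch. It is only an illustration, not MSSE's actual matcher: it assumes rules are checked top to bottom and the first match wins, that `*` is the only wildcard used, and that unmatched URLs are crawled. The rule paths are made up.

```python
from fnmatch import fnmatchcase

# Hypothetical rule list: (pattern, include?) pairs, evaluated top to bottom.
RULES = [
    ("http://forum.umbraco.org/search*", False),  # exclude search pages
    ("http://forum.umbraco.org/*", True),         # include the rest of the forum
]


def should_crawl(url, rules=RULES):
    """Return the decision of the first rule whose pattern matches the URL."""
    for pattern, include in rules:
        # Compare case-insensitively; '*' matches any run of characters.
        if fnmatchcase(url.lower(), pattern.lower()):
            return include
    return True  # assumption: no matching rule means the URL is crawled


print(should_crawl("http://forum.umbraco.org/search?q=macro"))
print(should_crawl("http://forum.umbraco.org/some-topic.aspx"))
```

Because the first match decides, the narrow exclude rule has to sit above the broad include rule.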
Of course, there are a lot of different configuration options, but I'll cover more of them in a later post. As you've probably noticed from the screenshots above, I indexed the Umbraco forum. The final index consists of about 14,000 records. MSSE also provides a search interface for querying your indexed sources, and it looks like this:

 

6_msse_search_640
PS! Be sure to add a user on this site via 'Site Actions -> Site Settings -> People and Groups -> Add User'. I created one called 'searchuser', as you'll see further down in the post.

A search for 'macro' returns 1972 records in ~0.5 sec on a Windows 2003 VMware instance with 1GB of RAM and no specific optimization of background services and such. Relevance sorting is also extremely good, and I believe it's even better than a 'site:forum.umbraco.org macro' search on Google!

Now, let's create some querying controls for Umbraco...

Search Community Toolkit

A really nice set of controls for querying MSSE can be found on CodePlex. The project is called Search Community Toolkit and consists of two controls:

  • SearchInput, which allows customisation of the input controls, including the input box, the search button and optionally a listbox with the available scopes.
  • SearchResults, which presents the results of the query. The format of the query is defined in an XML file, and the results are transformed via an XSLT file.
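
The toolkit's own query template format isn't shown in this post, but as far as I know the SharePoint search web service (search.asmx) takes its queries as a QueryPacket XML document. As a rough sketch of what such a query looks like, here is a Python helper that builds a minimal keyword QueryPacket. `build_query_packet` is a hypothetical name for illustration, and the toolkit's templates may well contain more elements than this.

```python
import xml.etree.ElementTree as ET

NS = "urn:Microsoft.Search.Query"


def build_query_packet(keywords, start_at=1, count=10, language="en-us"):
    """Build a minimal keyword QueryPacket as an XML string."""
    ET.register_namespace("", NS)  # serialize with a default namespace
    packet = ET.Element(f"{{{NS}}}QueryPacket")
    query = ET.SubElement(packet, f"{{{NS}}}Query")

    # The keyword query itself.
    context = ET.SubElement(query, f"{{{NS}}}Context")
    text = ET.SubElement(
        context, f"{{{NS}}}QueryText", {"type": "STRING", "language": language}
    )
    text.text = keywords

    # Paging: first result to return and page size.
    result_range = ET.SubElement(query, f"{{{NS}}}Range")
    ET.SubElement(result_range, f"{{{NS}}}StartAt").text = str(start_at)
    ET.SubElement(result_range, f"{{{NS}}}Count").text = str(count)

    return ET.tostring(packet, encoding="unicode")


print(build_query_packet("macro", count=5))
```

The resulting string would be passed as the query argument of the web service call; the SOAP plumbing and credentials are left out here.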

Out of the box these two controls aren't all that "Umbraco-friendly" (read: controllable via public properties), so I created a usercontrol wrapper for each with some extra candy and wrapped them in an Umbraco package.

MSSE UserControls for Umbraco

Both UserControls expose all members of the underlying controls from CodePlex and default to the web.config setting with the same name if a value is not specified. Furthermore, ResultUrl defaults to currentPage and the XSLT transformation is performed in the Umbraco context, yes, with umbraco.library, $currentPage and the whole shebang.

Download

Here are the download links for the Visual Studio 2008 project files and the binary build:

You should also define default values for all Macro parameters in web.config:

    <appSettings>
      <add key="SearchServiceUrl" value="http://msse/_vti_bin/search.asmx" />
      <add key="SearchServiceCredentialDomain" value="test01" />
      <add key="SearchServiceCredentialUser" value="SearchUser" />
      <add key="SearchServiceCredentialPassword" value="abc123" />
      <add key="SearchTemplates" value="/xml/LiveSearchTemplates.xml" />
      <add key="DefaultScope" value="All sites" />
      <add key="ExcludedScopes" value="Rank Demoted Sites,Global Query Exclusion" />
      <add key="XsltName" value="/xslt/Live.xslt" />
      :
    </appSettings>
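
The fallback rule (a macro parameter wins when it is set, otherwise the web.config setting with the same name is used) can be sketched in a few lines. The real controls are .NET usercontrols; this Python version only illustrates the lookup order. It assumes the settings are `<add key="..." value="..." />` entries under `<appSettings>`, and `load_app_settings` and `resolve` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Cut-down sample config, using two of the keys from the listing above.
SAMPLE_CONFIG = """<configuration>
  <appSettings>
    <add key="DefaultScope" value="All sites" />
    <add key="XsltName" value="/xslt/Live.xslt" />
  </appSettings>
</configuration>"""


def load_app_settings(web_config_xml):
    """Read the <appSettings> entries into a plain dict."""
    root = ET.fromstring(web_config_xml)
    return {a.get("key"): a.get("value") for a in root.findall("./appSettings/add")}


def resolve(name, macro_value, settings):
    """Use the macro parameter when set, else the web.config value of the same name."""
    if macro_value not in (None, ""):
        return macro_value
    return settings.get(name)


settings = load_app_settings(SAMPLE_CONFIG)
print(resolve("DefaultScope", "", settings))                  # falls back to web.config
print(resolve("XsltName", "/xslt/Custom.xslt", settings))     # macro parameter wins
```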

Now, copy the /bin files and the two usercontrols to your site and create the macro; its properties are fetched automatically from the referenced usercontrols. Insert it in a template and try it out! Here's a screenshot from my test site:

7_umbraco_search 
In part 2 I'll discuss more advanced topics: tighter integration with Umbraco through 'custom attribute mapping', searching other file types such as PDF files, customising the search result XSLT and more. Stay tuned!

25.9.2008
