Microsoft Search Server Express 2008 and Umbraco
Introduction
Earlier this year Microsoft released Search Server Express 2008. The product is based on SharePoint technology and is comparable to the Google Mini search appliance, but MSSE is free and has very few limitations (the only one I know of is clustering). Through its very user-friendly, SharePoint-like interface you get full control over the sources to crawl and index, both local files and external websites.
Microsoft Search Server Express (MSSE)
First, let's install:
- Grab a host operating system, either Windows 2003 or 2008. I chose 2003.
- From 'Configure your server' in Windows add the 'IIS' role and enable ASP.NET only.
- Download the .NET Framework 3.0 runtime and install it.
- Download Microsoft Search Server Express 2008 and start the installation.
- Do NOT install Windows SharePoint Services first.
- Run the 'Search Server Preparation Tool'.
- Run the 'Install Search Server' and follow the instructions.
- (Optional) Download Acrobat Reader v8.x for PDF IFilter and install for PDF indexing.
- Download the 17x17 PDF icon (GIF) from here and save it as:
C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Images\icpdf.gif
- Edit the 'C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Xml\DocIcon.xml' file and insert the following line in the '' section, in the appropriate alphabetical position for PDF:
- Add the following registry key and set its value to 'pdf':
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\\Gather\Search\Extensions\ExtensionList\38
- Check that the following GUID values are correct in the registry (the default value should be {E8978DA6-047F-4E3D-9C78-CDBE46041603}):
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
- Add "C:\Program Files\Adobe\Reader 8.0\Reader" to the system path.
- Add the PDF file type to the search server: open the administration console (http://MYSERVER:48560/ssp/admin/_layouts/managefiletypes.aspx) and add an entry for 'pdf' (no dot).
- Restart the Search Server Service (from the command line):
net stop osearch
net start osearch
Now that we have installed MSSE, let's index something:
- Go to 'Content Sources' and click 'New Content Source'.
Check the last checkbox to start a full crawl right away.
When indexing is done, you can get crawl results in the log:
- (Optional) Add crawler rules for authentication, URLs to include/exclude, etc.
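To make the include/exclude idea concrete, here is a minimal sketch in Python of how an ordered list of URL rules could be evaluated. It only illustrates the concept and is not how Search Server implements crawl rules. The function name `is_crawled`, the example URLs, and the assumption that the first matching rule decides are all mine.

```python
from fnmatch import fnmatchcase


def is_crawled(url, rules, default=True):
    """Decide whether a URL would be crawled.

    rules is an ordered list of (pattern, include) pairs. Patterns use
    shell-style wildcards, where '*' matches any run of characters.
    Note that '?' and '[' are also wildcards here, so a literal query
    string in a pattern would need escaping.
    Assumption: the first matching rule wins; otherwise use the default.
    """
    lowered = url.lower()
    for pattern, include in rules:
        if fnmatchcase(lowered, pattern.lower()):
            return include
    return default


# Hypothetical rules: skip the login pages, crawl the rest of the site.
RULES = [
    ("http://www.example.org/login*", False),
    ("http://www.example.org/*", True),
]

if __name__ == "__main__":
    for candidate in ("http://www.example.org/forum/topic-1",
                      "http://www.example.org/login.aspx"):
        print(candidate, is_crawled(candidate, RULES))
```

Putting the more specific exclude rule first matters in this sketch, because the broad include pattern would otherwise match the login pages as well.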
PS! Be sure to add a user on this site via 'Site Actions -> Site Settings -> People and Groups -> Add User'. I created one called 'searchuser', as you'll see further down in the post.
A search for 'macro' returns 1972 records in ~0.5 sec on a Windows 2003 VMware instance with 1GB of RAM and no specific optimization of background services and such. Relevance sorting is also extremely good, and I believe it's even better than a 'site:forum.umbraco.org macro' search on Google!
Now, let's create some querying controls for Umbraco...
Search Community Toolkit
A really nice set of controls for querying MSSE can be found at CodePlex. The project is called Search Community Toolkit and consists of two controls:
- SearchInput which allows customisation of input controls including input box, search button and optionally a listbox with available scopes.
- SearchResults to present the results of the query. The format of the query is defined in an XML file, and the results are transformed via an XSLT file.
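To give an idea of what such a query looks like on the wire, here is a minimal Python sketch that builds a keyword query packet of the kind the SharePoint search web service (search.asmx) accepts. The element names and the `urn:Microsoft.Search.Query` namespace are my recollection of that service's schema, not something taken from the toolkit's template file, so treat them as assumptions. The helper name `build_query_packet` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the search web service's query schema.
QUERY_NS = "urn:Microsoft.Search.Query"


def build_query_packet(text, start_at=1, count=10, language="en-US"):
    """Return a keyword query packet as an XML string."""
    # Serialize with a default namespace instead of an ns0: prefix.
    ET.register_namespace("", QUERY_NS)

    def tag(name):
        return "{%s}%s" % (QUERY_NS, name)

    packet = ET.Element(tag("QueryPacket"))
    query = ET.SubElement(packet, tag("Query"))

    # The search terms themselves.
    context = ET.SubElement(query, tag("Context"))
    query_text = ET.SubElement(
        context, tag("QueryText"), {"language": language, "type": "STRING"}
    )
    query_text.text = text

    # Paging: first result to return and how many.
    result_range = ET.SubElement(query, tag("Range"))
    ET.SubElement(result_range, tag("StartAt")).text = str(start_at)
    ET.SubElement(result_range, tag("Count")).text = str(count)

    return ET.tostring(packet, encoding="unicode")


if __name__ == "__main__":
    print(build_query_packet("macro"))
```

The sketch only builds the XML; sending it to the service and transforming the response with XSLT is what the toolkit's controls take care of.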
Out of the box these two controls aren't all that "Umbraco-friendly" (read: controllable through public properties), so I created a usercontrol wrapper for each with some extra candy and wrapped them in an Umbraco package.
MSSE UserControls for Umbraco
Both UserControls expose all members of the underlying controls from CodePlex and default to the web.config setting with the same name if a value is not specified. Furthermore, ResultUrl defaults to currentPage, and the XSLT transformation is performed in Umbraco context; yes, with umbraco.library, $currentPage and the whole shebang.
Download
Here are the download links for the Visual Studio 2008 project files and the binary build:
You should also define default values for all Macro parameters in web.config:
<add key="SearchServiceUrl" value="http://msse/_vti_bin/search.asmx" />
<add key="SearchServiceCredentialDomain" value="test01" />
<add key="SearchServiceCredentialUser" value="SearchUser" />
<add key="SearchServiceCredentialPassword" value="abc123" />
<add key="SearchTemplates" value="/xml/LiveSearchTemplates.xml" />
<add key="DefaultScope" value="All sites" />
<add key="ExcludedScopes" value="Rank Demoted Sites,Global Query Exclusion" />
<add key="XsltName" value="/xslt/Live.xslt" />
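The fallback rule described earlier (use the macro parameter if it is set, otherwise the web.config setting with the same name) can be sketched roughly like this. The real usercontrols are .NET code and are not reproduced here; this Python version only illustrates the lookup order. The function names are hypothetical, and treating an empty value as "not specified" is my assumption.

```python
import xml.etree.ElementTree as ET


def load_app_settings(web_config_xml):
    """Read the <appSettings> key/value pairs from a web.config string."""
    root = ET.fromstring(web_config_xml)
    return {
        entry.get("key"): entry.get("value")
        for entry in root.iterfind("./appSettings/add")
    }


def resolve_setting(name, macro_value, app_settings):
    """Prefer the macro parameter; fall back to web.config by name."""
    if macro_value:  # assumption: None or "" means "not specified"
        return macro_value
    return app_settings.get(name)


SAMPLE_CONFIG = """
<configuration>
  <appSettings>
    <add key="DefaultScope" value="All sites" />
    <add key="XsltName" value="/xslt/Live.xslt" />
  </appSettings>
</configuration>
"""

if __name__ == "__main__":
    settings = load_app_settings(SAMPLE_CONFIG)
    # No macro value given, so the web.config default is used.
    print(resolve_setting("XsltName", "", settings))
    # A macro value overrides the web.config default.
    print(resolve_setting("XsltName", "/xslt/Custom.xslt", settings))
```

With defaults kept in web.config, a macro only needs the parameters that differ from the site-wide values.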
Now copy the /bin files and the two usercontrols to your site and create the macro; its properties are fetched automatically from the referenced usercontrols. Insert it in a template and try it out! Here's a screenshot from my test site:
In part 2 I'll discuss more advanced topics, covering tighter integration with Umbraco through 'custom attribute mapping', searching other file types such as PDF files, customising the search result XSLT and more. Stay tuned!