Document management with Microsoft Search Server
Storage is cheap and it’s easy to create documents, so most companies have tens or hundreds of thousands of documents on their computers. So how do your customers find a document on a particular topic? The worrying answer is probably that they can’t...
If you think I’m being pessimistic, last month I asked a customer for a document we’d jointly created at their site in 2004. I knew the name of the author, and the exact date it had been created. I could supply the probable title and phrases from the document. The response was that it would take weeks to find it, and it would be faster to start again from scratch. This isn’t the first time I’ve seen this reaction, or the equally painful sight of someone copy typing a document from an old printout of it.
Microsoft research has found that the average information worker carries out 20 searches on an average day, and in an IDC report last year (The Hidden Costs of Information Work, idc.com/getdoc.jsp?containerId=217936) analysts estimated that in any given week, the average information worker spends 9.5 hours trying to find the information they need to do their job, at a cost of around $14,250 per employee per year. In the same paper, IDC says that in any given week, the average information worker will spend 6.5 hours not finding information that they know exists before recreating it.
So how can your customers overcome these problems? Larger companies put formal solutions into place: document management systems where all new documents must be incorporated and correctly logged. Such systems are overkill for small and medium-sized companies. They’re too bureaucratic and unwieldy, and require too much staff time to maintain and manage. Instead, companies rely on people using folders to organise documents into categories, and remembering where information is likely to be stored.
If you have customers who run their business like this, one solution may be Microsoft Search Server. There are two versions of this, full and free. Search Server 2008 (the full product) lets you create a collection of Search Servers with separate servers for indexing and querying, and multiple query servers so that there’s no effective limit on how many queries can be handled simultaneously. In contrast, Search Server Express 2008 can only be installed on a single server running both the index and query server elements. Depending on the configuration, it is also limited to searching around half a million documents. In addition to these two options, the next release – Search Server 2010 – is currently available as a beta version for testing.
Search Server is an example of an enterprise search engine. These essentially give business users an intranet Web site with a URL along the lines of search.theircompany.com. When they open the Web site they can enter searches for documents or files stored anywhere across their company network. If the company chooses, the search can extend across a WAN, and can include useful Internet sites. The search can also include data held in more complex stores such as Exchange or Lotus Notes mail servers. The search results respect the company’s security rules, and are integrated with Active Directory so the user sees only those documents they have permission to view.
Search Server has the potential to give your customers almost instantaneous access to their documents no matter where on their network they are located. However, getting the best from it requires a certain amount of configuration and setup, so they’ll need your expertise and help. You can even create your own custom applications that make use of Search Server Express and include it with the application, or on hardware you supply. If you’re planning to do this you’ll need to register for Search Server 2008 Express redistribution rights, but there’s no fee to pay.
Search Server Express can be used to index and search up to about half a million documents on a normal Windows server, and if your customers need more you can add extra capacity and performance by moving them to the licensed version of Search Server 2008.
In essence, both Search Server and Search Server Express give an enhanced version of the enterprise search element of Microsoft Office SharePoint Server. Search Server runs on top of Windows SharePoint Services, and will install this if your customer’s machine doesn’t already have it. It also makes use of SQL Server Express unless you have full SQL Server installed and tell it to use that instead. The limit of half a million documents comes from the SQL Server Express 4GB database limit, so changing to full SQL Server overcomes that limitation.
Search Server comes set up to search file servers, web sites, SharePoint sites, Exchange Server public folders and Lotus Notes. You can also make use of third party filters to add the ability to search content stored in other formats such as ZIP archives. It’s important that your customers realise that to get good results from Search Server they’ll need someone to do some configuration work. It’s true that simply providing the locations of the customer’s file shares and Web sites will enable them to enter search queries and find documents, but they can get a much better experience with some planning. To get the most out of the system, you need to help them plan what it is they’re indexing, and what they want to find. If the information in the search locations is organised, tagged and categorised, Search Server will give them a much better experience and you’ll have happier customers.
After a basic installation of Search Server Express, Internet Explorer loads, and once you’ve supplied a username and password you’re left with a blank team site to configure. The configuration dashboard is based on SharePoint Web Parts, and will be familiar to anyone who has used SharePoint before.
Search Server gives your customers two ways of processing queries to return search results: content crawling and federated search. When content crawling is used, the results are returned from the search server’s content index based on the user’s query. Federated search can be used to show results for other content that hasn’t been crawled by the search server. The query can use the local content index, or it can be forwarded to an external content repository where it is processed by that repository’s search engine. In other words, Search Server doesn’t index the remote server; instead, it contacts the search engine on that server, passes on the query and gets back a list of the matching documents. (See online resources for a useful article covering the advantages.)
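To make the difference concrete, here is a minimal Python sketch of the two query paths. It is purely illustrative and is not how Search Server is implemented: the LocalIndex class, the federated_search function and the 'archive' engine are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Result:
    title: str
    source: str  # which location answered the query


class LocalIndex:
    """Toy stand-in for a crawled content index (word -> document titles)."""

    def __init__(self) -> None:
        self._postings: Dict[str, List[str]] = {}

    def crawl(self, title: str, text: str) -> None:
        # Crawling reads the content ahead of time and records each word.
        for word in set(text.lower().split()):
            self._postings.setdefault(word, []).append(title)

    def search(self, query: str) -> List[Result]:
        # At query time only the index is consulted, not the documents.
        titles = self._postings.get(query.lower(), [])
        return [Result(title, "local index") for title in titles]


def federated_search(
    query: str, remote_engines: Dict[str, Callable[[str], List[str]]]
) -> List[Result]:
    """Pass the query to each remote engine and collect whatever it returns."""
    results: List[Result] = []
    for name, engine in remote_engines.items():
        for title in engine(query):  # the remote engine does the searching
            results.append(Result(title, name))
    return results


# Hypothetical usage: one crawled document and one remote engine.
index = LocalIndex()
index.crawl("Sales brochure", "head office sales brochure")
remote = {"archive": lambda q: [f"Archived report on {q}"]}

local_hits = index.search("brochure")
remote_hits = federated_search("brochure", remote)
```

The point to take away is that the crawled path pays its cost up front, when the content is read, while the federated path pays it at query time and depends on the remote engine being available.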
Search Server’s federated search is based on the OpenSearch standard, and connectors are available for a number of information services including EMC Documentum, Epicor, Flickr, Virtual Earth and IBM FileNet. You can build your own federated search connector using XML to search document stores that have a search engine but don’t support OpenSearch, or to return results from Web sites as well as local documents. Windows 7 supports federated search directly in Explorer. You can find some useful search connectors at sevenforums.com/tutorials/742-windows-7-federated-search-providers.html, there’s sample code for a SQL Server 2005 database at microsoft.com/downloads/details.aspx?FamilyID=b59ea8c3-76e8-45df-947f-32a69c56b5cd, and the explanation at http://msdn.microsoft.com/en-us/library/dd742958(VS.85).aspx is a good primer.
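As a rough illustration of what an OpenSearch exchange involves, the Python sketch below fills in an OpenSearch-style URL template and reads the titles and links out of an RSS response. The {searchTerms} placeholder and the trailing '?' for optional parameters come from the OpenSearch specification; the search.example.com address, the sample RSS and the function names are assumptions made up for this example, and no real connector is being reproduced here.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple
from urllib.parse import quote_plus


def fill_template(template: str, **params: str) -> str:
    """Replace {name} and {name?} placeholders in an OpenSearch URL template."""

    def substitute(match: "re.Match[str]") -> str:
        name, optional = match.group(1), match.group(2)
        if name in params:
            return quote_plus(str(params[name]))
        if optional:
            return ""  # optional parameters may simply be left empty
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([A-Za-z:]+)(\??)\}", substitute, template)


def parse_rss(xml_text: str) -> List[Tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS result feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", ""), item.findtext("link", ""))
        for item in root.iter("item")
    ]


# Hypothetical template and response, for illustration only.
TEMPLATE = "http://search.example.com/results?q={searchTerms}&start={startIndex?}&format=rss"
SAMPLE_RSS = """<rss version="2.0"><channel><title>Results</title>
<item><title>Training plan</title><link>http://search.example.com/doc/1</link></item>
</channel></rss>"""

query_url = fill_template(TEMPLATE, searchTerms="training materials")
hits = parse_rss(SAMPLE_RSS)
```

In a real deployment the response would be fetched from the remote engine at query_url rather than taken from a string, and the federated location itself would be defined through Search Server’s administration pages.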
Once you’ve set up Search Server Express, your customers should be able to simply get on and use the product. The indices are continually updated so that new content shows up automatically, even while the server is still crawling other document stores. Local administrators can also delete documents from the index without causing a need to re-crawl the source.
In terms of query creation, while all users should find it straightforward to create simple queries, they may need your assistance with more advanced queries. Options such as spelling corrections, removal of duplicate results and what are known as ‘best bets’ all make the results more useful to the end user.
One of the customisation options you can offer is to personalise the interface to suit the customer’s needs. You can do this using Microsoft SharePoint Designer rather than needing to do any coding. Much of the functionality of Search Server Express comes from the use of Web Parts, and you can both add new Web Parts and change those already in use with the SharePoint editing functions. For example, one Web Part is used to show ‘high confidence’ results that exactly match the search terms users have entered. You can configure it so the results shown also include those that match any of the keywords defined for the system by the Search Server administrators.
You can also configure how Best Bets are handled. These are built on keyword terms that the administrator defines to improve search results; each keyword can have a definition, a list of synonyms and the URLs of Web sites or documents with content related to the term, which help explain the keyword and give expanded meanings. When a query includes a keyword term or one of its synonyms, the definition for that term and links to its featured locations or documents, labelled ‘Best Bets’, are displayed first on the search results page, above the main search results. You can also set up predefined queries that you design with your customers to meet their most common needs.
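The keyword and synonym lookup behind Best Bets can be pictured with a short Python sketch. This is a simplified model under stated assumptions, not Search Server’s own code: the Keyword class, the intranet.example.com URL and the word-by-word matching are all invented for illustration, and multi-word keyword terms are not handled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Keyword:
    term: str
    definition: str
    synonyms: List[str] = field(default_factory=list)
    best_bets: List[str] = field(default_factory=list)  # featured URLs


def build_lookup(keywords: List[Keyword]) -> Dict[str, Keyword]:
    """Map the term and every synonym (lower-cased) to its keyword entry."""
    lookup: Dict[str, Keyword] = {}
    for keyword in keywords:
        for name in [keyword.term, *keyword.synonyms]:
            lookup[name.lower()] = keyword
    return lookup


def best_bets_for(query: str, lookup: Dict[str, Keyword]) -> List[Keyword]:
    """Return the keywords triggered by a query, in order of first match."""
    matched: List[Keyword] = []
    for word in query.lower().split():
        keyword = lookup.get(word)
        if keyword is not None and keyword not in matched:
            matched.append(keyword)
    return matched


# Hypothetical keyword an administrator might define.
keywords = [
    Keyword(
        term="expenses",
        definition="How to claim back business costs",
        synonyms=["claims", "reimbursement"],
        best_bets=["http://intranet.example.com/finance/expenses"],
    )
]
lookup = build_lookup(keywords)
triggered = best_bets_for("submit claims form", lookup)
```

Anything returned in triggered would be shown above the main results, which is why it pays to agree the keyword list with the customer rather than guess at it.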
Another area that you can work on for your customers is that of scopes. As the name suggests, you create scopes to limit searches to run on particular document locations, or on content that has been marked as matching a particular property. For example, you could create a scope that limits the search site to just those documents on the head office corporate brochures location, or a content scope such as those documents that have been marked as marketing training materials. The options for creating rules governing scopes are quite extensive. Once created, the scopes appear as a drop-down list from where the user can select which scopes they want to search.
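One way to think about scope rules is as a set of include and exclude tests applied to each document. The Python sketch below models just that; it is a deliberately simplified assumption rather than a description of Search Server’s rule engine, and the Document and Rule classes, the server paths and the 'category' property are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Document:
    url: str
    properties: Dict[str, str]


@dataclass
class Rule:
    kind: str             # "location" or "property"
    value: str            # URL prefix, or "name=value" for a property rule
    include: bool = True  # False means matching documents are excluded

    def matches(self, doc: Document) -> bool:
        if self.kind == "location":
            return doc.url.lower().startswith(self.value.lower())
        name, _, expected = self.value.partition("=")
        return doc.properties.get(name, "").lower() == expected.lower()


def in_scope(doc: Document, rules: List[Rule]) -> bool:
    """A document is in scope if an include rule matches and no exclude rule does."""
    included = any(rule.matches(doc) for rule in rules if rule.include)
    excluded = any(rule.matches(doc) for rule in rules if not rule.include)
    return included and not excluded


# Hypothetical 'training materials' scope that leaves out archived files.
training_scope = [
    Rule("property", "category=marketing training"),
    Rule("location", "file://headoffice/archive/", include=False),
]

current = Document("file://headoffice/training/intro.doc",
                   {"category": "Marketing Training"})
archived = Document("file://headoffice/archive/old-intro.doc",
                    {"category": "Marketing Training"})
brochure = Document("file://headoffice/brochures/2009.pdf",
                    {"category": "brochure"})

scope_members = [d.url for d in (current, archived, brochure)
                 if in_scope(d, training_scope)]
```

Writing the rules out like this with the customer before touching the administration pages is a quick way to spot documents that would fall into the wrong scope.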
The ranking of search results is another option that you can customise. Search Server lets you create authoritative pages. These are sites that are ranked by the relevance of the information they contain, from the most valuable to the least valuable. This list is then used to work out the order in which results are displayed. By default, this is based on how many clicks would be needed to get from the authoritative page to the result: something linked to directly from an authoritative page is presumably more likely to be useful.
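The click-distance idea is easy to sketch as a breadth-first walk over the links between pages. The Python below isolates that one idea under stated assumptions; the real ranking in Search Server takes other factors into account, and the page names and the click_distance and rank functions are hypothetical.

```python
from collections import deque
from typing import Dict, List


def click_distance(links: Dict[str, List[str]],
                   authoritative: List[str]) -> Dict[str, int]:
    """Breadth-first search: fewest clicks from any authoritative page."""
    distance = {page: 0 for page in authoritative}
    queue = deque(authoritative)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:  # first visit is the shortest path
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


def rank(results: List[str], distance: Dict[str, int]) -> List[str]:
    """Order results by click distance; unreachable pages go last."""
    return sorted(results, key=lambda page: distance.get(page, float("inf")))


# Hypothetical intranet: 'home' is the only authoritative page.
links = {
    "home": ["policies", "news"],
    "policies": ["expenses"],
    "news": [],
}
distances = click_distance(links, ["home"])
ordered = rank(["expenses", "orphan", "policies"], distances)
```

A page that nothing links to, such as 'orphan' here, ends up at the bottom, which is a useful prompt to check that important documents are reachable from the pages the customer nominates as authoritative.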
Going further with metadata
Taken at its simplest, Search Server gives its users a local version of the familiar Web search. It can be turned into something much more sophisticated, particularly if you help your customers make use of metadata. When a document is created, we’re all familiar with assigning details such as the title, filename and author. It’s possible to go much further, and to have document properties that give more information. For example, you could set up the document template in Office so the company project number is included, or the name of the manager who is responsible. These terms can then be included in search terms to make it easier to find relevant content.
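To show why properties such as a project number make searches sharper, here is a small Python sketch that splits a query into property filters and ordinary words, then tests a document against both. The name:value notation, the property names and the parse_query and matches functions are assumptions chosen for illustration; how properties are actually exposed to queries depends on how Search Server has been configured.

```python
from typing import Dict, List, Tuple


def parse_query(query: str) -> Tuple[Dict[str, str], List[str]]:
    """Split 'name:value' tokens (property filters) from plain search words."""
    filters: Dict[str, str] = {}
    words: List[str] = []
    for token in query.split():
        name, separator, value = token.partition(":")
        if separator and name and value:
            filters[name.lower()] = value
        else:
            words.append(token)
    return filters, words


def matches(properties: Dict[str, str], text: str, query: str) -> bool:
    """True if every property filter and every plain word is satisfied."""
    filters, words = parse_query(query)
    lowered = {name.lower(): value.lower() for name, value in properties.items()}
    for name, value in filters.items():
        if lowered.get(name) != value.lower():
            return False
    return all(word.lower() in text.lower() for word in words)


# Hypothetical document with properties filled in from an Office template.
doc_properties = {"ProjectNumber": "P1042", "Manager": "Patel"}
doc_text = "Quarterly budget review"

found = matches(doc_properties, doc_text, "projectnumber:P1042 budget")
not_found = matches(doc_properties, doc_text, "projectnumber:P9999 budget")
```

The free-text word 'budget' appears in both queries, so it is the project number alone that separates the right document from the wrong one.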
Search Server is an easy-to-use product that should cut down on the time wasted by your customers looking for lost information. It has lots of options for customisation, and produces fast and reliable results. Microsoft obviously hopes your customers will outgrow Search Server Express and need to move up to full Search Server or to Microsoft Office SharePoint Server, but this is by no means a limited version and you may find it’s all many customers need, especially if you take the time to customise it and extend it with connectors.
Search Server Express and Search Server technical resources
There’s a good collection of technical articles on Search Server on the Microsoft Enterprise Search site:
Federated Search Overview: Search Server 2008
Understanding the two ways Search Server can find documents:
Customising Search Server Express
At the moment, the site is based on the document library template supplied by Search Server Express. I’ll add some Web Parts for searching for documents to the main page.
I want to add some scopes so the business users can choose to see different categories of results. Here I’m adding a scope that will return training materials.
Now I’ll add a scope rule to say what should be in (or not in) the scope by specifying where the scope will look and what should happen.
Here is the site in use on a search for the phrase ITExpert Magazine in the documents that have been indexed by Search Server.
Alternatives to Search Server Express
Microsoft Office SharePoint Server 2007 (MOSS) is an alternative if your customers need more than just enterprise search. This commercial product used to be called SharePoint Portal Server, and in addition to similar search facilities to Search Server Express, you can use MOSS to set up personal Web sites, more complex Web portals, business document workflow and browser-based content authoring.
Yahoo and IBM have a well-established free, entry-level enterprise search product called IBM OmniFind Yahoo Edition (http://omnifind.ibm.yahoo.net/). This gives you a Yahoo user interface component as well as Web search. Like Search Server Express, it has a limit of 500,000 documents, which can be organised into five separate ‘collections’. You can configure OmniFind to manage results relevancy, synonyms and top queries, and set up fine-tuning. The indexing is based on Apache Lucene, and there’s an upgrade path to IBM’s more sophisticated OmniFind Enterprise products.
If your customers have moved to Windows 7 (or Windows Search 4.0 in XP or Vista), and documents are stored on Windows Server 2003 (with Windows Search 4.0 running), Server 2008, Server 2008 R2, Windows Home Server or SharePoint, they can search those indexes directly from Explorer and even File dialogs (or the Start menu in Windows 7). They will see results from file shares that are mapped (or included in Windows 7 libraries), based on title, content and metadata. You can extend this with search connectors, and in Windows 7 Enterprise and Ultimate you can use Group Policy to add search scopes for specific document repositories to the Start menu and the ‘Search again in’ area at the bottom of the search results in Explorer. However, you can’t fine-tune the search the way you can with Search Server.