
Sunday, June 6, 2010

Searching SharePoint 2010 with FAST

FAST is a high-end search engine that Microsoft provides (at additional cost) as an enterprise-level alternative to SharePoint’s built-in search engine. Whereas standard SharePoint 2010 can handle millions of documents, the FAST search engine can index and search over a hundred million, i.e. it can scale to handle not only document management for an entire organization but also more specialist requirements such as regulatory compliance and litigation document review. It also has extensive support for languages other than English, including Chinese, Japanese and Korean.

As well as being an enterprise-level search engine, FAST incorporates a number of features designed to make it easier for end users to find things. For example, many users remember documents by their visual appearance. FAST supports visual recognition by displaying a small thumbnail next to each document’s summary, so users looking for a specific document can identify it quickly. FAST also includes graphical previewers for PowerPoint documents, which can be used, for example, to find one particular slide in a presentation without opening the whole file and paging through it slide by slide. Results also include links to ‘Similar Results’ and to ‘Duplicates’.

Example of a FAST Results Display


To support its search capabilities, FAST includes extremely powerful content processing based on linguistics and text analysis. Examples of the linguistic processing applied to both indexed items and queries include character normalization, normalization of stemming variations and spelling-correction suggestions. FAST automatically extracts document metadata such as author and date last modified, and makes them available for fielded searching, faceted search refinement and relevancy tuning. In addition to document metadata, it is also possible to define what Microsoft refers to as “managed properties”. These are categories such as organization names, place names and dates that may occur in the content of a document and can help develop or refine a search. Defining a custom extractor enables such properties to be identified and indexed. (Note: this capability is similar to that offered by several ‘Early Case Assessment’ tools in the litigation space.)
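The post does not show how a custom extractor is configured in FAST, so what follows is only a conceptual sketch in Python. It assumes the simplest kind of extractor: a dictionary that maps known names to a managed property and a canonical value, plus a pattern for ISO-style dates. The dictionary entries, property names and the `extract_properties` function are all hypothetical and are not FAST’s configuration format or API.

```python
import re
from collections import defaultdict

# Hypothetical dictionary: lower-cased surface form -> (managed property, canonical value).
ENTITY_DICTIONARY = {
    "contoso": ("organization", "Contoso Ltd"),
    "contoso ltd": ("organization", "Contoso Ltd"),
    "fabrikam": ("organization", "Fabrikam Inc"),
    "seattle": ("place", "Seattle"),
    "london": ("place", "London"),
}

# Assumption for illustration: dates are written as YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_properties(text: str) -> dict[str, set[str]]:
    """Return the managed-property values found in `text` (illustrative only)."""
    found: dict[str, set[str]] = defaultdict(set)
    lowered = text.lower()
    for surface, (prop, value) in ENTITY_DICTIONARY.items():
        # Whole-word match, so "london" does not fire inside "londoner".
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found[prop].add(value)
    for date in DATE_PATTERN.findall(text):
        found["date"].add(date)
    return dict(found)


print(extract_properties("Contoso Ltd signed the lease in London on 2010-05-14."))
```

Both “Contoso” and “Contoso Ltd” map to one canonical value, which is what lets a refinement list show a single organization entry rather than one entry per spelling.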

Example of FAST Refinement Category List for a Results Set


SharePoint 2010 Standard provides the ability to refine search results based on key metadata/properties such as document type, author and date created. By default, these refinement values are based on the first 50 results returned. With FAST, refinement moves to a whole other level: so-called ‘Deep’ refinement, where the refinement categories are based on managed properties across the entire result set. Users are presented with a list of refinement categories together with the count of results in each category. (Note: this functionality is similar to the refinement capability that many major eCommerce sites provide, e.g. NewEgg.com and BestBuy.)
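The difference between shallow and deep refinement can be sketched in a few lines of Python. This is a conceptual illustration only, assuming each search result is a dictionary of property values; `refinement_counts` is a hypothetical helper, not how either SharePoint or FAST actually computes refiners. Counting over only the top 50 hits (the SharePoint 2010 Standard default described above) can hide values that appear only further down the list, whereas counting over the whole result set gives the complete picture.

```python
from collections import Counter
from typing import Optional


def refinement_counts(
    results: list[dict[str, str]], prop: str, depth: Optional[int] = None
) -> Counter:
    """Count values of `prop` over the top `depth` results, or over all results if depth is None."""
    window = results if depth is None else results[:depth]
    return Counter(r[prop] for r in window if prop in r)


# Illustrative data: 120 hits, where the 70 documents by "Bob" all rank below the top 50.
results = [{"author": "Alice"}] * 50 + [{"author": "Bob"}] * 70

shallow = refinement_counts(results, "author", depth=50)  # only Alice appears
deep = refinement_counts(results, "author")               # Alice and Bob both appear
```

With this data, the shallow refiner offers only “Alice”, while the deep refiner also surfaces the 70 results by “Bob”, with accurate counts for both.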

SharePoint 2010 with FAST: Architectural Overview


A detailed feature comparison between SharePoint 2010 Standard Search and FAST, along with further information about FAST, is provided in Microsoft’s document “FAST Search Server 2010 for SharePoint Evaluation Guide”, downloadable from http://www.microsoft.com/downloads/details.aspx?FamilyID=f1e3fb39-6959-4185-8b28-5315300b6e6b&displaylang=en

Tuesday, May 25, 2010

Beyond Keyword Searching

Sometimes we put documents into storage for safekeeping. We want them to be available if we should ever need them, but we are not expecting to review them on a regular basis. Tax filings, expired contracts and wills would fall into this category. In a business environment, though, there are many documents we need to look at regularly or be able to retrieve quickly. There is nothing more frustrating than spending several hours hunting for a document you know is out there somewhere but can’t remember where it was filed, and countless studies have found that we all spend significant amounts of our working lives looking for information.

When SharePoint (and similar document management software) was first introduced, it seemed to offer a solution: behind-the-scenes text indexing (so users didn’t have to do anything other than upload their documents) and a really fast search engine that allowed users to retrieve documents based on the words in the text and a few key metadata fields such as title, author and folder name. However, while keyword searching is very effective in extremely large, highly heterogeneous information environments like the internet as a whole (Google being a case in point, and even Google modifies this approach for other services such as Shopping), it has significant limitations in more focused environments such as business operations, where one of the primary needs is to group like documents together and separate them from unlike documents.

Without some form of tagging, it is not straightforward to carry out even quite simple-looking searches, because the language used to describe business concepts is not standardized. For example, the HR Department might be referred to as HR, Human Resources or Personnel. A project might be referred to by a project number, the client name, the project name, some abbreviation of the project name and so on. It is for this reason that most blogging software (including the software behind this blog) enables postings to be tagged/coded.
Beyond variation in terminology lies the problem that nowhere in Office documents is the purpose of the document automatically recorded. For example, there is no automatic way to distinguish a Word document that is a contract from one that is a proposal, or an internal PowerPoint presentation from an external one produced for a client meeting. Categorizing documents in this way requires human intervention and a document classification system that is agreed across the business entity.

SharePoint 2007 began to address some of the limitations of keyword searching by enabling documents to be tagged (or coded) on upload. Appropriate values for the tags/codes could be set up in lists (or, for the more sophisticated, as Business Data Catalog (BDC) connections to a database) that would appear to users as drop-down menus or, if there were few enough values, as checkboxes or radio buttons. User compliance could be enforced by making tagging mandatory, so that documents couldn’t be uploaded unless appropriate values had been selected. However, this tagging could only be managed at the site level, which made enforcing standard values and classification systems too labor-intensive across a business entity with many site collections, let alone sites.

SharePoint 2010 has extended its coding/tagging functionality in a variety of ways. It has introduced centralized coding management (aka Managed Metadata) that can be applied across an entire site collection. The Taxonomy Term Store (accessible to users with site administrator permissions) enables lists of terms to be created or imported (see figures 1 and 2 below), which can then be applied across all sites in a collection. Examples of taxonomies that can usefully be managed in this way include departments, geographic regions, project names, product names and sizes/units. Once a term list has been made available across the site collection, it can be included as a properties column in any document library in the site collection (see figure 3 below) and made available as a metadata filter for searching.
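To illustrate why a single, centrally managed term list helps with the terminology problem described earlier, the following Python sketch models a term set whose terms carry synonyms, so that documents tagged “HR” or “Personnel” in different sites all resolve to the same canonical term. The `TermSet` class, the sample terms and the site paths are hypothetical; this is a conceptual model, not the SharePoint Term Store API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TermSet:
    """Hypothetical, simplified stand-in for one centrally managed term set."""
    name: str
    # Canonical term -> lower-cased alternative labels (synonyms) for that term.
    terms: dict[str, set[str]] = field(default_factory=dict)

    def add_term(self, term: str, *synonyms: str) -> None:
        self.terms[term] = {s.lower() for s in synonyms}

    def resolve(self, label: str) -> Optional[str]:
        """Map a label or synonym to its canonical term, or None if it is not in the set."""
        wanted = label.strip().lower()
        for term, synonyms in self.terms.items():
            if wanted == term.lower() or wanted in synonyms:
                return term
        return None


# One term set shared by every site, instead of a separate choice list per site.
departments = TermSet("Departments")
departments.add_term("Human Resources", "HR", "Personnel")
departments.add_term("Finance", "Accounts")

# Documents uploaded in different sites, tagged with whatever label the user chose.
uploads = [
    ("site-a/offer-letter.docx", "HR"),
    ("site-b/leave-policy.docx", "Personnel"),
    ("site-c/budget.xlsx", "Accounts"),
]
tagged = {path: departments.resolve(label) for path, label in uploads}

# A single filter on the canonical term finds the HR documents from every site.
hr_docs = [path for path, term in tagged.items() if term == "Human Resources"]
```

Because every site draws on the same term set, one filter value is enough to find related documents wherever they were filed.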

In SharePoint 2010, content administrators can also define hierarchies of Content Types that are meaningful to their business operation (e.g. Project Contracts, Financial Reports, Job Offer Letters) and deploy them across entire Site Collections. Each Content Type can be assigned its own workflows, permissions and retention policies, which are inherited from general types (e.g. Contract) by more specific ones (e.g. Legal Contracts, Engineering Contracts).
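The inheritance of settings from general to more specific Content Types can be pictured as a parent lookup in which a child keeps everything it does not override. The following Python sketch is a conceptual model only; the `ContentType` class and the sample workflow and retention values are hypothetical and do not reflect SharePoint’s object model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentType:
    """Hypothetical model of a content type; not SharePoint's actual object model."""
    name: str
    parent: Optional["ContentType"] = None
    # Settings set directly on this type (e.g. workflow, retention policy).
    settings: dict[str, str] = field(default_factory=dict)

    def effective_settings(self) -> dict[str, str]:
        """Start from the parent's effective settings, then apply this type's overrides."""
        inherited = self.parent.effective_settings() if self.parent else {}
        return {**inherited, **self.settings}


# Illustrative values only.
contract = ContentType(
    "Contract",
    settings={"workflow": "Legal review", "retention": "7 years after expiry"},
)
legal = ContentType("Legal Contract", parent=contract)
engineering = ContentType(
    "Engineering Contract",
    parent=contract,
    settings={"workflow": "Engineering sign-off"},
)
```

Here “Legal Contract” inherits both settings unchanged, while “Engineering Contract” replaces the workflow but still inherits the retention policy defined once on “Contract”.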

The ability to centrally define and manage taxonomies and term/coding lists in SharePoint 2010 will make it much easier to manage the large multi-site, multi-library document collections that now exist in many business organizations and that are likely to grow further.

Tuesday, May 18, 2010

New Out-of-the-Box Search Features in SharePoint 2010

Search has long been one of SharePoint’s strong points. It’s easy to use (simply type in a few keywords), very fast, and seems to retrieve everything short of the kitchen sink. And therein also lay its weakness. Results came back as a long (often very long) list, mixing documents and folders together. If there was some form of relevance ranking going on, it wasn’t easy to spot. And unless you built your own search interface, the out-of-the-box search function didn’t allow for any use of the SharePoint metadata users had so carefully added, let alone scoping by document metadata.

SharePoint 2010 changes all that with a slew of new search functionality and a much improved results display. For example, it will now be possible to use metadata, both user-applied metadata and SharePoint content management metadata, to filter document sets and to navigate through document libraries.

SharePoint metadata (note: this is not document metadata but additional terms added by users on upload or by default through the assignment of a document to a particular folder/library) can be used to filter documents when searching or to navigate through a document library.

In addition to filtering by user-added metadata, it will also be possible to filter by a small subset of document metadata such as date created, date last modified and author. Key filtering is further supported by autocomplete functionality, so users will not have to remember all the possible options (or how to spell them!).
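At its simplest, autocomplete for filter values is prefix matching against the values already known. The Python sketch below assumes a plain in-memory list of author names and a hypothetical `autocomplete` function; SharePoint’s own implementation is not described here and is likely to differ.

```python
def autocomplete(values: list[str], prefix: str, limit: int = 5) -> list[str]:
    """Return up to `limit` distinct values that start with `prefix`, ignoring case."""
    wanted = prefix.lower()
    matches = {v for v in values if v.lower().startswith(wanted)}
    # Sort case-insensitively so suggestions appear in a natural alphabetical order.
    return sorted(matches, key=str.lower)[:limit]


# Illustrative author values, including a duplicate that should be suggested only once.
authors = ["Anna Lee", "Andrew Park", "Bob Smith", "Andrea Rossi", "Anna Lee"]
print(autocomplete(authors, "and"))  # prints ['Andrea Rossi', 'Andrew Park']
```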

The display of search results is much improved. No more cryptic laundry lists! Each result is now presented with longer snippets from the document concerned, i.e. it looks much more like Bing. Compare the screenshots below: the first is from SharePoint 2007 and the second from SharePoint 2010. The relevance ranking algorithms have also been enhanced, which should mean that the most useful results are displayed first.

Search Results in SharePoint 2007


Search Results in SharePoint 2010



Critically for winnowing down a large document set, SharePoint will now automatically display “Refinements” on the left-hand side of the search results which, as the name suggests, can be used to narrow the results further. (Note: these refinements are derived from SharePoint metadata and basic document metadata such as dates and authors, so the more effort users and businesses put into tagging documents, the more useful this feature will be.)

At the bottom of the results page, there will also be “Did You Mean” suggestions to help users with possible misspellings, acronyms etc.

And last but definitely not least, Microsoft has embraced the mobile world and made it easy to use SharePoint search features on any smartphone.

Friday, May 14, 2010

Microsoft SharePoint 2010: A Big Leap Forward!

SharePoint 2010 was officially released this week. It represents a significant upgrade from SharePoint 2007 and provides a much friendlier interface for designing and maintaining SP sites. One small enhancement speaks for the rest: adding an image to a page no longer requires a tortuous workflow and the cutting and pasting of URLs. In SP 2010, images can be added to an SP page as easily as to any Office document. You simply select INSERT from the ribbon, choose the picture option, browse to the image’s location and click OK. No URLs required!

Key enhancements from ChromaScope’s perspective include:

Search:
SharePoint search has been dramatically enhanced with features designed to improve searching over large document collections, including: improved relevance ranking; better result summaries, so that users can more easily identify whether a document is of interest; and ‘Refinements’, which are automatically determined from the document set, presented in the left-hand column (e.g. content type, document dates, document authors and other key metadata) and can be used to navigate and filter through a set of documents. SP 2010 also has “Did You Mean” suggestions, and People Search will search for nicknames and carry out phonetic name matching. In fact, there are so many useful new search features that they will be discussed in a later ChromaScope post. Also available (at additional cost) is the FAST search server for those requiring enterprise-wide search capabilities. FAST is scalable to billions of documents, can extract metadata for use in searching, and provides thumbnail previews of Office documents so that users can quickly assess relevance without actually opening the document.

Records Management, Document Retention, Preservation and Legal Hold:
In SP 2010, records management is no longer confined to sites specifically set up to manage records. Records management features – including setting policies for compliance, storage and retention – will be available across all content libraries and sites. For preservation and legal hold, documents can be declared as “records” and locked from future editing or deletion. For preservation and retention purposes, specific workflows can be designed to automatically transfer documents meeting specified criteria to a dedicated document archive.

Office Web Applications:
No longer will it be necessary to have the native applications installed to view Office documents stored in SharePoint. Once uploaded to SharePoint, documents can be viewed and edited in the browser. This immediately makes using SharePoint on a smartphone (or, dare I say, an iPad) a viable option, as well as facilitating the use of SharePoint for document review (no need to provide the contract attorneys with desktop copies of Office!).

Managed Metadata:
In SP 2010 it is possible to set up and manage centralized taxonomies (e.g. document types, organizational departments, geographic locations, project codes) and deploy these across the entire site collection. This will make it significantly easier to code and tag documents consistently and hence easier to search and retrieve.

Scalability:
It will be possible to scale SP to handle millions of documents, with figures of up to 200 million documents per library being quoted. This makes SP a viable repository for large-scale archiving, records management and document review.

For more information, see Microsoft’s own SP 2010 site: http://sharepoint.microsoft.com/
(Note: the most useful overviews of the new features and functionality are the two downloadable documents: SharePoint 2010 Evaluation Guide and SharePoint 2010 Walkthrough Guide.)