ChromaScope: Search

Thursday, January 6, 2011

The Art of Searching in an Expanding Information Universe

As the petabytes of data on the internet grow ever larger, it has become harder and harder to find what you are looking for even when you are sure the information must be out there somewhere. Google is a wonderful thing but there are inherent problems in basic keyword searching that are becoming more apparent as the volume of data grows and, inevitably, along with it the volume of junk.

One problem with online searching is that most search engines require you to describe in some way – typically through the use of keywords – the information you want to retrieve. Which is fine if you know enough about what you are looking for to describe it but not at all if you don’t.

For example, at the beginning of a legal case, today’s legal teams may be presented with terabytes of emails and documents collected from individuals of interest (aka custodians) but may have little or no idea about what’s in those emails and documents or how to identify items of interest (aka responsive documents). This is such an issue that whole suites of software have been developed to assist with what is known as Early Case Assessment (e.g. Clearwell), attempting to solve the problem by analyzing the document set by topic, key phrases or terms so that the legal team can begin to develop a search strategy.

A more common situation would be one we have all experienced when trying to solve a technical issue in an area with which we are not familiar. “Pop-up thingy” may be how you’d describe the dialog window that keeps popping up but how is it ‘officially’ named in the software you are using? Without knowing that, finding assistance is difficult. You may have to trawl through a few dozen only marginally relevant items to finally track down the keywords you need to do a proper search.

Researching technical issues is also made difficult by the fact that you may not know which element of a systems environment is the one causing the problem and therefore where to focus the search. For example, if a user of hosted SharePoint 2010 on a Win 7 32bit laptop using IE8 has issues downloading documents after an upgrade to Office 2010, is the primary problem with SharePoint, SharePoint 2010, Win7 UAC, IE 8, 32bit or MS Office 2010? Entering a search that includes all the software components and their versions is likely to be far too narrow and to remove potentially helpful documents (for example, the problem might not be Win7 related and there may be helpful information referring to a similar situation on desktops running Vista). Not scoping it at all is likely to result in hundreds of irrelevant documents dealing with obscure issues with, say, SharePoint 2003 and XP SP1. Once you have some clue as to what might be the cause of the problem – or even a best hypothesis – you can scope down to the versions of the software environment that are relevant and, hopefully, find articles and postings relating to similar situations. But you need that initial clue/hypothesis i.e. you need to understand something about the answer before you can pose the question that's going to bring up potentially relevant solutions.

The first difficulty with using keyword searching to find information, as the above examples illustrate, is that you have to know how the information you are searching for is expressed in words. The second is that the same word can have two different meanings or be used in two different contexts and it is not always easy to frame a search to exclude all meanings but the one you want, without losing potentially relevant articles.

The meanings do not need to be as diverse as say, the word “spring”. Take the example of “FedEx”. If you run a search for the keyword ‘FedEx’ on either Google or Bing, you will find that it brings up not only information published by FedEx on its own web site, but business articles about FedEx, articles mentioning FedEx Field (the sports venue), FedEx Air & Ground (NFL) Players of the Week, the FedEx PGA Cup and blogs/forum postings about a delivery or mentions of FedEx’s delivery service in articles which are actually about something else e.g. see highlighted ‘page1’ results from a Google search for Fedex below.

Partial screenshot of the results of a Google search for 'Fedex'

Google does have a News category filter but since the NFL is also news, the results include business news, company news and sports news.

Bing also has category filters. These appear to filter based on source type rather than the content (indeed the API refers to them as sourceType). Below the top level ‘News’ source type is a subcategory called ‘business’ which presumably scopes the results to business news sources. When we tried it, it did seem to remove many of the top ranked listings relating to NFL issues but there remained in the top 10 postings (sorted by most recent), one result for the FedEx PGA Cup which we presume survived because the article was published in TradingMarkets.com which is deemed a business news source.

Results of searching Fedex in the News source types on Bing. Note ability to filter by Business, Sports or Political source types listed in the left hand menu.

In practice, we’ve tended to find Yahoo! Finance to be the easiest and quickest way to find recent business oriented articles about a company sorted by date, but obviously this only works for companies that are public or large enough to be tracked by Yahoo! Finance and even then, some of the articles seem only loosely related to the company in question.

The difficulties we have been experiencing trying to find information through the “usual channels” – primarily Google, Bing – had us reading with interest a recent posting on TechCrunch: “Why We Desperately Need a New and Better Google” (https://techcrunch.com/2011/01/01/why-we-desperately-need-a-new-and-better-google-2/)

It was a posting that resonated deeply, as we had experienced many of the same issues – wading through the junk “compilation” sites that are nothing more than automatically gathered links to links and add zero value; increasing difficulty searching specifically for people; problems with trying to find only recently written (as opposed to recently indexed) articles.

Inspired by the posting, we decided to check out Blekko, a search engine the author of the article and his team of students at the School of Information at UC-Berkeley had used with some success, to see whether the functionality on offer would assist us with some of our search problems.

Blekko was founded in mid-2007 by a group who had previously worked at Topix and Netscape’s Open Directory. Blekko’s primary differentiator is the use of ‘slashtags’ to filter (or sort) search results. For example, using /people will filter search results to those that are specifically about a person; /date sorts results by published (not indexed) date; topic slashtags e.g. /health or /recipes will filter the search to a curated subset of web sites dealing with these categories (thereby avoiding the spammers, the listers and other junk sites as well as minimizing the problem of multiple meanings/contexts for terms). Blekko developed some initial topic slashtags but users are free to create their own and use them for their own purposes or share them with others.
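
To make the idea concrete, here is a minimal sketch of what filtering by a curated site list and sorting by published date might look like. It is not Blekko's implementation: the site lists, the `slashtag_search` function and the sample results are all invented for illustration.

```python
from datetime import date
from urllib.parse import urlparse

# Hypothetical curated site lists, invented for this sketch.
SLASHTAGS = {
    "news": {"reuters.com", "nytimes.com"},
    "tech": {"techcrunch.com", "arstechnica.com"},
}

def host_of(url):
    """Return the host name of a URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def slashtag_search(results, tag, sort_by_date=False):
    """Keep results hosted on the tag's curated sites; optionally newest first."""
    allowed = SLASHTAGS[tag]
    kept = [r for r in results if host_of(r["url"]) in allowed]
    if sort_by_date:
        kept.sort(key=lambda r: r["published"], reverse=True)
    return kept

# Made-up results standing in for a keyword search on 'Fedex'.
results = [
    {"url": "https://www.reuters.com/a", "published": date(2011, 1, 3)},
    {"url": "https://spam-links.example/b", "published": date(2011, 1, 5)},
    {"url": "https://nytimes.com/c", "published": date(2011, 1, 4)},
]
hits = slashtag_search(results, "news", sort_by_date=True)
print([r["url"] for r in hits])
```

Note that the filter looks only at where a page is hosted, not at what it is about, so an off-topic article on a curated site still passes.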

We searched Blekko for recent news stories about FedEx (Fedex /news /date). We would have liked to have scoped by business but unfortunately there is currently no ‘business’ slashtag. While the initial results were all company and recent news related (good!), the NFL had crept in by result 11.

Screenshot of results using Blekko and the Search: Fedex /news /date

We also noticed that there didn’t appear to be any results about share prices (compared with, for example, search results for Fedex filtered by NEWS and BUSINESS on Bing), and so we tried the slashtag /finance as an alternative. This brought up a very mixed bag of results, a consequence of filtering by web site rather than topic. There were many mentions of the PGA Cup because golf, it seems, is a well reported topic on financial web sites! Obviously, if we were doing this frequently, it would be worthwhile creating our own slashtag to scope the results to those business information sources we found most useful for this topic.

The results of a search for recent information about technology at Fedex (Fedex /tech /date) show some of the difficulties of achieving precision with keyword searching – even when scoped by source. Only the third article down is relevant.

Screenshot of Blekko search results for Fedex /tech

Without going into the realm of true semantic analysis and the semantic web, one mechanism that would help improve the relevancy of search results in cases where a topic can have multiple foci within the same information source context (e.g. FedEx as a company vs other companies’ incidental use of FedEx) would be to make more use of facets, in the manner of many Solr implementations or indeed SharePoint 2010 FAST. That in turn would require the use of taxonomies and indexing of content which, in a world-wide-web scenario, would need to be automated rather than carried out by human content providers as happens in SharePoint environments.

Snapshot of the results of a SharePoint 2010 Fast Search showing 'Refine by' options

Overall, we do think the ability to filter search results by a curated set of web sites has potential and we loved the ability to combine topic slashtags with the /date and /people tags to further refine and sort the results. We also liked the ability to declare a site as “spam” and have it forever banned from our search results. (Which we would have loved to have known about when trying to do a search on a Drupal related technical issue a few months ago). Another thing we did appreciate about Blekko is its transparency. For instance, it is very easy to find which web sources are included in a slashtag’s scope. Simply go to: find the slashtag and drill down on the link. In contrast, we were unable to find which news sources were included in Bing’s news sourcetype or which business news sources in the news > business category.

On a very minor note: (1) It would be helpful to new users of Blekko to put a link to the list of slashtags on the home page. (2) When we searched for iChromatiq (we couldn’t resist!), our home page was listed 18th after a series of postings for “aChromatic”. We can see why our web site ranked lower than the dictionary entry for ‘achromatic’ on dictionary.com – Blekko does make reasons for page rankings explicit – but it is because the Blekko engine treats ichromatiq and achromatic as the same term and, since the ichromatiq web site has fewer inbound/outbound links than the dictionary.com entry for ‘achromatic’, it is ranked far lower. We would have no argument with this ranking if we had searched for ‘achromatic’ or if our web site was achromatic.com. But logically, shouldn’t a search for a specific term rank pages containing that specific term above pages containing terms which may be similar but are not identical? They are, after all, the best fit. Or, at least – like Google or Bing – ask the user if they meant achromatic rather than ichromatiq and based on the response, search accordingly. Just a thought!
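
The ordering we have in mind can be sketched in a few lines. This is only an illustration of the suggestion, not a description of how Blekko (or any other engine) ranks pages; the `rank` function, the page records and the link counts are all made up.

```python
from difflib import SequenceMatcher

def rank(query, pages):
    """Order pages: exact term matches first, then closest spelling, then link count."""
    q = query.lower()

    def key(page):
        terms = page["text"].lower().split()
        exact = q in terms
        # Similarity of the closest term on the page to the query (1.0 = identical).
        best = max((SequenceMatcher(None, q, t).ratio() for t in terms), default=0.0)
        return (exact, best, page["links"])

    return sorted(pages, key=key, reverse=True)

# Hypothetical pages: a heavily linked dictionary entry and a lightly linked site.
pages = [
    {"url": "dictionary.example/achromatic", "text": "achromatic definition", "links": 5000},
    {"url": "ichromatiq.example", "text": "ichromatiq consulting", "links": 5},
]
print([p["url"] for p in rank("iChromatiq", pages)])
```

Because the exact-match flag is compared before the link count, the lightly linked page wins for its own name, while a search for ‘achromatic’ still puts the dictionary entry first.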

Monday, October 11, 2010

SharePoint vs Drupal: A “hands-on” comparison

Recently, we have found ourselves in the unusual position of building two content management oriented sites at the same time: one in SharePoint 2010 Foundation and one in Drupal. While there are various blogs and commentary out there on the web about the pros and cons of the two, they are mostly written from the point-of-view of either a system administrator or a developer. In these projects, we are using third-party hosting (so no systems administration) and trying not to code but to use the out-of-the-box functionality, so we hope this blog will provide a different and practical perspective to anyone considering these platforms as options.

In our situation, the choice of platform was dictated by client needs: low-cost with ecommerce on the one hand and a company internal, office team environment on the other. Both systems are being hosted by third parties so we did not have to worry about systems administration. We did install Drupal for our development environment and note that, as everyone has commented, it is very straightforward to set up whereas our previous experience of on-premise SharePoint required significant input and ongoing maintenance from systems engineering. For SharePoint, we are working with SharePoint 2010 Foundation – which has some significant functional limitations over the “Standard” version. For Drupal we are working with version 6.15 and using Panels, Views, PathAuto, ImageCache and Ubercart as our base platform.

In both cases, the intention was to see how far we could go using the system “out-of-the-box” and without coding – which, by-and-large, we have been able to do. (Although we have to confess to a quick code-tweak in Drupal to change the name on a search button from APPLY to SEARCH). With both systems, we found ourselves frustrated initially by the fact we had less control over the individual look-and-feel of the page than we were used to in a conventional build-it-yourself, non-templated environment. However, once we adapted, we came to love the fact that we can focus on content and functionality and know that the look-and-feel is going to be consistently applied, and that we don’t have to design every style and control ourselves.

The steepest learning curve by far was with Drupal – which is to be expected since it is very much intended to be a lego-like platform with a wealth of options. The quantity and range of available Drupal contributed modules is its great strength and a significant advantage over the more monolithic SharePoint. On the other hand, many times we found ourselves spending hours “shopping” for new modules. While not an unhappy experience (we like shopping!) we had to be quite strict with ourselves to avoid becoming module experts who hadn’t actually built anything!

Another advantage of Drupal is the availability of sophisticated and varied themes. There are 759 freely available on the Drupal site plus many more that can be purchased for less than $100. This is a huge plus, making it easy to get a reasonable looking site up and running without spending significant effort designing and coding stylesheets. And then if you want to make minor changes to your theme - which you inevitably will - you can make local modifications to the theme stylesheet and/or use a module like CSSInjector to set up rule-based overrides. With SharePoint the out-of-the-box choice is mostly limited to the color palette – which is OK for company internal sites but anyone developing for external use is going to need more and having a broader library of available templates would be useful. Yes, you can use SharePoint Designer but it is much more effort than css-tweaking in Drupal.

Drupal's Theme Index

SharePoint’s strengths are undoubtedly its tight integration with Office and the ease of use of its out-of-the-box content management functionality. Once you have mastered the concepts of libraries and lists, you can very quickly create a functional CMS with most effort going – as it should – into organizing the content. The Office Ribbon look-and-feel and the more consistent user interface in SharePoint 2010, as compared with earlier versions, mean that complete beginners can become effective users in a very short space of time. The multi-file upload feature is a joy: it’s fast, it’s easy to use and it makes large scale document upload a pain-free operation. The search site gives you effort free total site collection search capability and indeed, even at the Foundation level, we have found SharePoint’s searching to be fast, efficient (maybe even over-efficient as we are not sure of the usefulness of indexing every Excel cell) and users love what they describe as the “google-style” result displays. Users also like being able to synch their contacts and calendar with Outlook.

The SharePoint 2010 Ribbon

For internal content management systems, SharePoint 2010 is a no-brainer and a hosted option removes the pain of system setup and administration. However, it could have been, should have been, so much better. It is the small things that don’t quite work that bring SharePoint down. Like the missing spellchecker on the editor (see our previous blog), or the fact that you can’t automatically set the calendar display to show multiple user events. The “wiki” style content creation feature isn’t quite there yet either. In an office/work environment, you often need to create “ordered” content with some kind of an index page: “How To” documents for example. SharePoint wiki pages, while searchable and link-able, cannot be explicitly ordered and Foundation doesn’t even have tagging options. After using Drupal Views, we also found the limitations on SharePoint list settings frustrating and unnecessary. If Views allows you to set multiple filters and sort levels, why can’t SharePoint, since the underlying architecture – SQL – is fundamentally the same? However we do note that the UI on SharePoint’s list set up is far more intuitive and can be readily used by non-programmers whereas Views took some getting used to and is definitely not intuitive.

The downside of Drupal is the learning curve and the fact that you do have to set-up and configure much of the functionality you want. While the extensive range of available modules means that most of this can be done without coding, it still takes some time to research and install these. And although there are many helpful blogs and commentaries on various aspects of Drupal (for which we are profoundly grateful – what did we do before Google?), interfaces for the more complex modules are often not at all intuitive and documentation can be sparse, or written from a developer perspective that assumes you are going to want to code. Panels is an example of a module where more extensive documentation and some cook-book examples would have been very helpful.

In summary, there is a place for both Drupal and SharePoint. Each has their strengths and weaknesses and neither is perfect. Both are impressive in how much functionality is available and configurable without coding. For company-internal, content management, SharePoint would be our first choice and a hosted version makes it easy to get up and running in a matter of days if not hours (as well as being cost-effective compared with purchasing an on-premises license). For external sites needing a broad range of functionality such as ecommerce, Drupal is a great option. It’s hard to beat free and the extensive eco-system of freely available modules and themes makes it easy to put together a site that has a stylish look-and-feel and rich functionality while never (or almost never) having to cut a line of code.

Friday, June 25, 2010

Serious Data Analytics with the Palantir Platform

Every now and then we feel like children outside a candy store, faces pressed to the window, eying the good things within. Today was one of those moments when we came across a reference to Palantir Technologies’ data analytics platform on TechCrunch and went to investigate further.

Palantir is a data analysis platform which enables the integration of structured and unstructured data from a variety of sources – documents, databases, email communications – and provides the sophisticated tools required to search and analyze it. The company – Palantir Technologies (http://www.palantir.com/) – focuses on two verticals: Finance and Government, with the latter accounting for 70% of their business and divided into Intelligence and Defense, Financial Regulation (Palantir is currently being used to monitor ARRA stimulus funding fraud and alert the various Inspectors General to suspicious activity), Cybersecurity and Healthcare (e.g. tracing the origin of food poisoning outbreaks, correlating hospital quality indicators with Medicare cost reports). Palantir has also teamed up with Thomson Reuters to develop a next generation financial analysis platform.

In order to deliver its functionality, the Palantir platform incorporates a number of different technologies. Its text search engine is based on Lucene – a Java-based text retrieval engine that has been around for a long time. Lucene, like most text retrieval software, operates on an inverted index i.e. it creates a list of key words (ignoring any stop words – generally words in a language that are not meaningful or, because they are so common, not useful in a search – like ‘the’ or ‘a’ in English) and indexes against each term the entire set of documents (and positions within each document) where the term occurs. One of Palantir’s customizations adjusts the retrieved results so that users can only see information they are cleared to view (a necessary requirement for some of Palantir’s national security customers). If a user doesn’t have access to a piece of information, its existence is totally suppressed and it will never appear even in a keyword count.
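
For readers who have not met an inverted index before, the following is a toy sketch of the idea, including a crude version of the "suppress what the user is not cleared to see" behaviour. It is not Lucene's or Palantir's code: the stop word list, the function names and the `cleared_docs` parameter are assumptions made for this example.

```python
from collections import defaultdict

# A tiny illustrative stop word list; real engines use much longer ones.
STOP_WORDS = {"the", "a", "an", "of", "and"}

def build_index(docs):
    """Map each term to {doc_id: [positions of the term in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            if word in STOP_WORDS:
                continue
            index[word].setdefault(doc_id, []).append(pos)
    return index

def search(index, term, cleared_docs):
    """Return postings for a term, limited to documents the user may see."""
    postings = index.get(term.lower(), {})
    return {d: p for d, p in postings.items() if d in cleared_docs}

docs = {
    1: "the fraud report",
    2: "a fraud alert and the audit",
}
index = build_index(docs)

# A user cleared only for document 1 sees one hit, so the count is 1, not 2.
visible = search(index, "fraud", cleared_docs={1})
print(visible, len(visible))
```

Filtering the postings before anything is counted is what keeps a restricted document from leaking into a keyword count.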

To test drive Palantir, go to https://www.analyzethe.us/ and use their 'Analyze the US' application to explore public domain information about the US. The interface is easy to use, once you have adjusted to the UI metaphor, and most functions can be achieved by drag-and-drop. A set of test data is provided e.g. mortality statistics for various US hospitals. As with all data analysis systems, the challenge is knowing what questions to ask, within the context of the available data.

Palantir has one of the easiest to use geospatial analysis interfaces we’ve seen. Any group of geocodeable entities can be seen in map view by simply dragging and dropping the selection onto the Map icon. Geospatial related searches can be carried out over an area defined by radius, polygon or route. In addition, HeatMap and TreeMap geovisualizations are also supported. We did try importing some geocoded distribution data to see if we could produce a HeatMap of delivery density and were able to do so quickly and with minimum effort (see below based on Richmond VA).
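
As an illustration of what a radius search involves, here is a minimal sketch using the haversine great-circle formula. It says nothing about how Palantir does it; the function names, the delivery records and the approximate coordinates are all assumptions for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(points, center, radius_km):
    """Keep the points lying within radius_km of center (lat, lon)."""
    lat0, lon0 = center
    return [p for p in points if haversine_km(lat0, lon0, p["lat"], p["lon"]) <= radius_km]

# Approximate, made-up delivery locations.
deliveries = [
    {"id": "near-richmond", "lat": 37.55, "lon": -77.46},
    {"id": "washington-dc", "lat": 38.91, "lon": -77.04},
]
richmond = (37.54, -77.44)  # approximate city centre
nearby = within_radius(deliveries, richmond, radius_km=50)
print([p["id"] for p in nearby])
```

Polygon and route searches need more geometry than this, but the principle of testing each geocoded entity against a shape is the same.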

Palantir would seem to be an ideal tool for use in forensic accounting and fraud investigations where there are a large number of interconnected persons of interest and organizational entities. Similarly, its ability to integrate structured data and documents might also be helpful in complex finance, fraud and IP related litigations where the legal team needs a way of analyzing and understanding a large set of both data and documents. Recent sub-prime related litigations come to mind as do complex Mergers and Acquisitions.

Sunday, June 6, 2010

Searching SharePoint 2010 with FAST

FAST is a high-end search engine that is being provided by Microsoft (at additional cost) as an enterprise level alternative to SharePoint’s built-in search engine. Whereas standard SharePoint 2010 can handle millions of documents, the FAST search engine can index and search over a hundred million i.e. it can scale to handle not only document management for an entire organization but more specialist requirements such as regulatory compliance and litigation document review. It also has extensive support for languages other than English including Chinese, Japanese and Korean.

As well as being an enterprise level search engine, FAST incorporates a number of features designed to make it easier for end users to find things. For example, many users remember documents by their visual appearance. FAST supports visual recognition by displaying a small thumbnail next to the summary of the document so users looking for a specific document can rapidly identify it. In addition FAST also includes graphical previewers for PowerPoint documents which can be used, for example, to find that one particular slide in a presentation without having to open the whole file and go through it slide by slide. Results also include links to ‘Similar Results’ and to ‘Duplicates’.

Example of a FAST Results Display

To support its search capabilities, FAST includes extremely powerful content processing based on linguistics and text analysis. Examples of linguistic processing in the item and query processing include character normalization, normalization of stemming variations and suggested spelling corrections. FAST automatically extracts document metadata such as author, date last modified, and makes them available for fielded searching, faceted search refinement and relevancy tuning. In addition to document metadata, it is also possible to define what Microsoft refer to as “managed properties”. These are categories such as organization names, place names and dates that may exist in the content of the document and can help develop or refine a search. Defining a custom extractor will enable such properties to be identified and indexed. (Note: this is a similar capability to that offered by several ‘Early Case Assessment’ tools in the litigation space).

Example of FAST Refinement Category List for a Results Set

SharePoint 2010 Standard provides the ability to refine search results based on key metadata/properties such as document type, author, date created. These refinement metadata values are based by default on the first 50 results returned. With FAST, refinement moves to a whole other level, so-called ‘Deep’ refinement, where the refinement categories are based on managed properties within the entire result set. Users are presented with a list of refinement categories together with the counts within each category. (Note: this functionality is similar to the refinement capability that many major eCommerce sites provide e.g. NewEgg.com, BestBuy etc).
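
The practical difference between refining on the first 50 results and refining on the whole result set is easy to see with a small sketch. This is plain Python, not the SharePoint or FAST API; the `refinement_counts` function and the result records are invented for illustration.

```python
from collections import Counter

def refinement_counts(results, field, sample_size=None):
    """Count values of a metadata field over the first sample_size results, or all of them."""
    subset = results if sample_size is None else results[:sample_size]
    return Counter(r[field] for r in subset)

# 60 made-up results: the first 50 by one author, the last 10 by another.
results = [{"author": "Ann"}] * 50 + [{"author": "Bob"}] * 10

shallow = refinement_counts(results, "author", sample_size=50)  # first 50 only
deep = refinement_counts(results, "author")                     # entire result set

print(dict(shallow))
print(dict(deep))
```

With the shallow approach the second author never appears as a refinement option at all, even though ten of the matching documents are theirs.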

SharePoint 2010 with FAST : Architectural Overview

A detailed feature comparison between SharePoint 2010 Standard Search and FAST, and further information about FAST, is provided in Microsoft’s document “FAST Search Server 2010 for SharePoint Evaluation Guide” downloadable from http://www.microsoft.com/downloads/details.aspx?FamilyID=f1e3fb39-6959-4185-8b28-5315300b6e6b&displaylang=en

Tuesday, May 25, 2010

Beyond Keyword Searching

Sometimes we put documents into store for safe-keeping. We want them to be available if we should ever need them but we are not expecting to review them on a regular basis. Tax filings, expired contracts and wills would fall into this category. In a business environment though, there are many documents we need to look at on a regular basis or be able to retrieve quickly. There is nothing more frustrating than spending several hours hunting for a document you know is out there somewhere but can’t remember where it was filed and countless studies have revealed we all spend significant amounts of our working lives looking for information.

When SharePoint (and similar document management software) was first introduced, it seemed to offer a solution: behind the scenes text indexing (so users didn’t have to do anything other than upload their documents) and a really fast search engine that allowed users to retrieve documents based on the words in the text and a few key metadata fields such as title, author, folder name. However while keyword searching is very effective in extremely large, highly heterogeneous information environments like the internet as a whole (Google being a case in point – and even they modify this approach for other services such as Shopping) – it has significant limitations when looking for information in more focused environments – such as business operation – where one of the primary needs is to group together like documents and separate them from unlike documents.

Without some form of tagging, it is not straightforward to carry out even quite simple looking searches because the underlying language used to describe business concepts is not standardized. For example, the HR Department might be referred to as HR, Human Resources or Personnel. A Project might be referred to by a project number, the client name, the project name, some abbreviation of the project name and so on. It is for this reason that most blogging software (such as this one) enables postings to be tagged/coded.

And beyond variation in terminology is the problem that nowhere in office documents is the purpose of the document automatically recorded. For example, there is no automatic way to distinguish a Word document that is a contract document from one that is a proposal, or an internal PowerPoint presentation from an external one produced for a client meeting. To categorize documents in this way requires human intervention and a document classification system that is agreed across the business entity.

SharePoint 2007 began to address some of the limitations of keyword searching by enabling documents to be tagged (or coded) on upload. Appropriate values for the tags/codes could be set up in lists (or for the more sophisticated, as BDCs to a database) that would appear to users as drop down menus, or if few enough – checkboxes or radio buttons. And user compliance could be enforced by making tagging mandatory so that documents couldn’t be uploaded unless appropriate values had been selected. However, the management of this tagging could only be done at the site level, which made the enforcement of standard values and classification systems across a business entity with many site collections, let alone sites, too labor intensive.

SharePoint 2010 has extended its coding/tagging functionality in a variety of ways. It has introduced centralized coding management (aka Managed Metadata) that can be applied across an entire site collection. The Taxonomy Term Store (accessible to users with site administrator permissions) enables lists of terms to be created or imported (see figures 1 and 2 below) which can then be applied across all sites in a collection. Examples of the types of taxonomies that can be usefully managed in this way would be departments, geographic regions, project names, product names, sizes/units. Once a term list has been made available across the site collection, it can be included as a properties column in any document libraries across the entire site collection (see figure 3 below) and made available as a metadata filter for searching.
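
The value of a single, centrally managed term list is that every variant label resolves to one agreed term. The sketch below illustrates that idea only; it is not the SharePoint Term Store API, and the `TermStore` class and its methods are hypothetical.

```python
class TermStore:
    """A toy central term list mapping variant labels to one preferred term."""

    def __init__(self):
        self._preferred = {}

    def add_term(self, preferred, synonyms=()):
        """Register a preferred term together with any alternative labels."""
        for label in (preferred, *synonyms):
            self._preferred[label.casefold()] = preferred

    def resolve(self, label):
        """Return the preferred term for a label, or None if it is unknown."""
        return self._preferred.get(label.casefold())

store = TermStore()
store.add_term("Human Resources", synonyms=["HR", "Personnel"])

# Documents tagged with any variant can now be grouped under one term.
print(store.resolve("personnel"))
print(store.resolve("hr"))
```

Rejecting labels that resolve to `None` at upload time is one way to enforce the agreed vocabulary.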

In SharePoint 2010, content administrators can also define hierarchies of Content Types that are meaningful to their business operation (e.g. Project Contracts, Financial Reports, Job Offer Letters) that can be deployed across entire Site Collections. Each Content Type can have assigned its own workflows, permissions and retention policies which are inheritable from general (e.g. Contract) to more specific types (e.g. Legal Contracts, Engineering Contracts).
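
Inheritance from general to more specific Content Types can be pictured with a short sketch. This is an analogy in plain Python rather than SharePoint's object model; the `ContentType` class, the `retention_years` field and the values used are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ContentType:
    """A toy content type whose retention policy can be inherited from its parent."""
    name: str
    parent: Optional["ContentType"] = None
    retention_years: Optional[int] = None

    def effective_retention(self):
        """Walk up the hierarchy until a type with its own retention setting is found."""
        node = self
        while node is not None:
            if node.retention_years is not None:
                return node.retention_years
            node = node.parent
        return None

# Hypothetical hierarchy and retention periods.
contract = ContentType("Contract", retention_years=7)
legal = ContentType("Legal Contract", parent=contract)
engineering = ContentType("Engineering Contract", parent=contract, retention_years=10)

print(legal.effective_retention())        # inherited from Contract
print(engineering.effective_retention())  # overridden locally
```

The same lookup pattern would apply to workflows or permissions: a specific type uses its own setting if it has one and otherwise falls back to the general type.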

The ability to centrally define and manage taxonomies and term/coding lists in SharePoint 2010 will make it much easier to manage effectively the large multi-site, multi-library document collections that now exist in many business organizations and are likely to grow further.

Tuesday, May 18, 2010

New Out-of-the-Box Search Features in SharePoint 2010

Search has long been one of SharePoint’s strong points. It’s easy to use – simply type in a few keywords – very fast and seems to retrieve everything short of the kitchen sink. And therein also lay its weakness. Results came back as a long (often very long) list, mixing documents and folders together. If there was some form of relevance ranking going on, it wasn’t easy to spot it. And unless you built your own search interface, the out-of-the-box search function didn’t allow for any use of the SharePoint metadata users had so carefully added, let alone scoping by document metadata.

SharePoint 2010 changes all that with a slew of new search functionality and much improved results display. For example, it will now be possible to use metadata to filter document sets and to navigate through document libraries. SharePoint 2010 supports filtering and navigation by both user applied metadata and SharePoint content management metadata.

SharePoint metadata (note: this is not document metadata but additional terms added by users on upload or by default through the assignment of a document to a particular folder/library) can be used to filter documents when searching or to navigate through a document library.

In addition to filtering by user-added metadata, it will also be possible to filter by a small subset of document metadata such as date created, date last modified and author. Key Filtering is further supported by autocomplete functionality so users will not have to remember all the possible options (or how to spell them!).

The display of search results is much improved. No more cryptic laundry lists! Each result is now presented with longer snippets from the documents concerned, i.e. it looks much more like Bing. Compare the screenshots below. The first is from SharePoint 2007 and the second from SharePoint 2010. The relevance ranking algorithms have also been enhanced, which should mean that the most useful results display first.

Search Results in SharePoint 2007

Search Results in SharePoint 2010

Critical to winnowing down a large document set, SharePoint will now automatically display “Refinements” on the left-hand side of search results which, as the name suggests, can be used to narrow the results down further. (Note: these refinements are derived from SharePoint metadata and basic document metadata such as dates and authors, so the more effort users and businesses put into tagging documents, the more useful this feature will be.)
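For readers who like to see the mechanics, refinements of this kind boil down to counting the metadata values found in the result set and then filtering on the value a user clicks. The Python sketch below illustrates that general idea under stated assumptions: it is not how SharePoint itself computes refinements, and the functions refinements and refine, along with the sample results, are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical search results: each hit carries a few metadata fields.
results: List[Dict[str, str]] = [
    {"title": "Supply agreement", "author": "A. Lee", "content_type": "Contract"},
    {"title": "Q1 summary", "author": "B. Shah", "content_type": "Financial Report"},
    {"title": "Lease renewal", "author": "A. Lee", "content_type": "Contract"},
]


def refinements(hits: List[Dict[str, str]], fields: List[str]) -> Dict[str, Counter]:
    """Count how many hits carry each value of each metadata field."""
    return {f: Counter(h[f] for h in hits if f in h) for f in fields}


def refine(hits: List[Dict[str, str]], field: str, value: str) -> List[Dict[str, str]]:
    """Keep only the hits whose metadata field matches the chosen value."""
    return [h for h in hits if h.get(field) == value]


# Build the refinement panel, then narrow the results to one content type.
facets = refinements(results, ["author", "content_type"])
contracts_only = refine(results, "content_type", "Contract")

print(facets["content_type"])
print([h["title"] for h in contracts_only])
```

The sketch also shows why tagging effort matters: a document with no value in a field simply never appears under that refinement.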

At the bottom of the results page, there will also be “Did You Mean” suggestions to help users with possible misspellings, acronyms etc.
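How SharePoint generates these suggestions is not covered here, but the general technique of suggesting the closest known term for an unrecognized word is easy to sketch. The example below uses Python's standard difflib module as a stand-in; the did_you_mean function, the vocabulary list and the 0.7 cutoff are assumptions chosen for illustration.

```python
import difflib
from typing import Dict, List

# Hypothetical vocabulary, e.g. terms already present in the search index.
vocabulary = ["retention", "taxonomy", "metadata", "workflow", "refinement"]


def did_you_mean(query: str, known_terms: List[str], cutoff: float = 0.7) -> Dict[str, str]:
    """Suggest the closest known term for each query word not in the vocabulary."""
    suggestions = {}
    for word in query.lower().split():
        if word in known_terms:
            continue  # spelled as a known term, nothing to suggest
        matches = difflib.get_close_matches(word, known_terms, n=1, cutoff=cutoff)
        if matches:
            suggestions[word] = matches[0]
    return suggestions


print(did_you_mean("metdata retenton policy", vocabulary))
```

Words with no sufficiently close match (such as "policy" in this sample) are left alone rather than being given a misleading suggestion.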

And last but definitely not least, Microsoft has embraced the mobile world and made it easy to use SharePoint search features on any smart phone.

Friday, May 14, 2010

Microsoft SharePoint 2010: A Big Leap Forward!

SharePoint 2010 was officially released this week. It represents a significant upgrade from SharePoint 2007 and provides a much friendlier interface for designing and maintaining SP sites. One small enhancement speaks for the rest: no longer will adding an image to a page require a tortuous workflow and the cutting and pasting of URLs. In SP 2010, images can be added to a SP page as easily as they can be to any Office document. You simply select INSERT from the ribbon, choose the picture option, browse to the image’s location and click OK. No URLs required!

Key enhancements from ChromaScope’s perspective include:

Search:
SharePoint search has been dramatically enhanced with features designed to improve searching over large document collections, including: improved relevance ranking; better result summaries so that users can more easily identify whether a document is of interest; and ‘Refinements’, which are automatically determined based on the document set and presented in the left-hand column (e.g. content type, document dates, document authors and other key metadata) and which can be used to navigate and filter through a set of documents. SP 2010 also has “Did You Mean” suggestions, and the People Search will search for nicknames and carry out phonetic name matching. In fact there are so many useful new search features that these will be discussed in a later ChromaScope post. Also available (at additional cost) is the FAST search server for those requiring enterprise-wide search capabilities. FAST is scalable to billions of documents, has the ability to extract metadata for use in searching and provides thumbnail previews of Office documents so that users can quickly assess relevance without actually opening the document.

Records Management, Document Retention, Preservation and Legal Hold:
In SP 2010, records management is no longer confined to sites specifically set up to manage records. Records management features – including setting policies for compliance, storage and retention – will be available across all content libraries and sites. For preservation and legal hold, documents can be declared as “records” and locked from future editing or deletion. For preservation and retention purposes, specific workflows can be designed to automatically transfer documents meeting specified criteria to a dedicated document archive.

Office Web Applications:
No longer will it be necessary to have the native applications available to view Office documents stored in SharePoint. Once produced and uploaded to SharePoint, they can be viewed and edited in the browser. This immediately makes using SharePoint on a smart phone (or dare I say, iPad) a viable option as well as facilitating the use of SharePoint for document review (no need to provide the contract attorneys with desktop copies of Office!).

Managed Metadata:
In SP 2010 it is possible to set up and manage centralized taxonomies (e.g. document types, organizational departments, geographic locations, project codes) and deploy these across the entire site collection. This will make it significantly easier to code and tag documents consistently and hence easier to search and retrieve.

Scalability:
It will be possible to scale SP to handle millions of documents. Figures of up to 200 million documents per library are being quoted. This makes SP a viable repository for large-scale archiving, records management and document review.

For more information see Microsoft’s own SP 2010 site: http://sharepoint.microsoft.com/
(Note: the most useful overview of the new features and functionality can be found in the two downloadable documents: SharePoint 2010 Evaluation Guide and SharePoint 2010 Walkthrough Guide).
