Friday, December 10, 2010

Commetrix CMX Analyzer: Dynamic Social Network Visualization

Commetrix CMX Analyzer is a social network analysis platform from the German company Trilexis (www.trilexis.com), which originated in a research group at the Technical University of Berlin. (Note: the website, user interface and documentation are all in English.) What is interesting about this particular tool is its emphasis on the dynamics of social interactions over time. It achieves this through a data format that captures information about each individual link event, including not only originator, destination and time but also user-specified attributes such as communication mode (email, IM, Twitter), type of exchange (social, work, ecommerce) and topic (e.g. keywords extracted from the subject).
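
To make the idea of a link event concrete, here is a minimal sketch in Python. It is not the Commetrix file format, which we have not seen documented in detail; the class name, field names and sample addresses are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class LinkEvent:
    """One communication event between two people (field names are illustrative)."""
    originator: str
    destination: str
    timestamp: datetime
    # User-specified attributes, e.g. mode, type of exchange, topic keywords.
    attributes: dict = field(default_factory=dict)


def select(events, **wanted):
    """Return the events whose attributes match every key/value in `wanted`."""
    return [
        e for e in events
        if all(e.attributes.get(k) == v for k, v in wanted.items())
    ]


# Hypothetical sample data, not taken from the Enron set.
events = [
    LinkEvent("alice@example.com", "bob@example.com",
              datetime(2000, 1, 3, 9, 15),
              {"mode": "email", "type": "work", "topic": "contract"}),
    LinkEvent("bob@example.com", "carol@example.com",
              datetime(2000, 1, 4, 17, 40),
              {"mode": "IM", "type": "social", "topic": "lunch"}),
]

work_emails = select(events, mode="email", type="work")
print([(e.originator, e.destination) for e in work_emails])
```

Keeping the attributes as a free-form dictionary mirrors the "user-specified" nature of the extra fields: a new attribute can be added without changing the record structure.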

Commetrix CMX Analyzer User Interface

A small subset of the Enron Email dataset (from the size and the individuals referenced, we are guessing a single custodian) is provided for demonstration purposes. Part of our interest in this particular software is that we are familiar with the Enron dataset and had researched it using the social network analysis functionality of an eDiscovery system called MetaLINCS. We were curious to see what additional insights CMX Analyzer might provide.

CMX Analyzer is a desktop tool built in Java incorporating the 3D graphical capabilities of Java 3D and the Java Media Framework. Once we had obtained the license key, the application was straightforward to install and comes with a user guide. To date we have only been able to try it out on the sample data set provided, as the process of creating new data sets requires end-user coding (of link attributes) followed by a data transformation process that requires either a separate tool (Commetrix Producer) or the data being sent to Trilexis for processing by their systems.

Commetrix Data Preparation Process:

Commetrix is not as functionally or visually rich as some of the other tools we have investigated and reported on in previous blogs (e.g. Gephi, NodeXL). However, where it comes into its own is in the dynamic visualization of email communications over time. The MetaLINCS software we had used in the past had provided a “time-slider” but was essentially a “snapshot” approach. Commetrix has time-sliders too but also animates the traffic, creating a unique perspective on what is, after all, a time-based series of events. (We should also warn readers that the resulting animations make for highly addictive viewing. We were totally captivated!) The start and end of the time period can be set, as can the intervals and speed of animation. It is also possible to run the time line backwards as well as forwards. This makes it possible to identify “hot spots” of communication activity between group subsets at particular points in time. In other types of communication, e.g. Twitter or Facebook, we can see how this would provide valuable insight into the evolution of a topic of discussion or a social group.
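
The underlying idea of stepping through a start date, an end date and an interval can be sketched in a few lines of Python. This is only an illustration of time-windowing, not how Commetrix is implemented; the function name and sample events are assumptions, and `step` must be a positive interval.

```python
from collections import Counter
from datetime import datetime, timedelta


def animation_frames(events, start, end, step):
    """Split (sender, recipient, timestamp) events into consecutive time windows.

    Returns a list of (window_start, Counter) pairs, where each Counter maps a
    (sender, recipient) pair to the number of messages in that window.
    """
    frames = []
    window_start = start
    while window_start < end:
        window_end = min(window_start + step, end)
        counts = Counter(
            (sender, recipient)
            for sender, recipient, sent_at in events
            if window_start <= sent_at < window_end
        )
        frames.append((window_start, counts))
        window_start = window_end
    return frames


# Hypothetical sample data.
events = [
    ("alice", "bob", datetime(2000, 1, 3)),
    ("alice", "bob", datetime(2000, 1, 20)),
    ("bob", "carol", datetime(2000, 2, 10)),
    ("alice", "bob", datetime(2000, 2, 11)),
]
frames = animation_frames(
    events, datetime(2000, 1, 1), datetime(2000, 3, 1), timedelta(days=31)
)

# Play forwards; use reversed(frames) to run the time line backwards.
for window_start, counts in frames:
    busiest = counts.most_common(1)  # the "hot spot" in this window, if any
    print(window_start.date(), busiest)
```

Each frame is just a count of messages per sender-recipient pair, so a "hot spot" is simply the pair with the highest count in a given window.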

Snapshot of Communications: Jan 2000

Snapshot of Communications: Dec 2000

Visually, Commetrix is more limited than some of the other packages we have used, e.g. it is not possible to pan or zoom. There are options to change node size and color to represent parameters such as communications sent, communications received and number of direct contacts. Color schemes cannot be chosen directly but can be set to show selected attributes, e.g. the following screenshot shows nodes color coded by the ‘function’ attribute, where dark blue represents employees, pale blue represents directors, green represents traders, wholly purple circles represent managers and purple circles with yellow centers represent in-house lawyers. (Note: we found the use of full and semi-colored circles to be somewhat confusing.)

Colorcoding by Function

Included in Commetrix is an “egoview” option which allows you to select a particular node and investigate communication to and from that individual node. Links can be filtered to include only direct communications (a 1-step link) or communications involving two or more steps. The image below, for example, shows communications to and from Sara Shackleton. While this capability is helpful for focusing on traffic to and from a node, in the case of email communications where the data set is from only one custodian, the egoview has limited value when used outside that custodian, as it will show only those communications that happen to have been referenced in emails sent to and from the primary custodian, i.e. it is an imperfect sample.

Screenshot Showing Ego View - Tana Jones
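
The difference between a 1-step and a 2-step ego view can be sketched as a bounded breadth-first walk outwards from the chosen node. This is a minimal illustration, not Commetrix's algorithm; the function name and the sample names are assumptions, and link direction is ignored when deciding who is reachable.

```python
from collections import defaultdict


def ego_network(edges, ego, steps=1):
    """Return the nodes within `steps` links of `ego`, and the edges among them.

    `edges` is a list of (sender, recipient) pairs. Direction is ignored when
    deciding who is reachable, which is a simplifying assumption of this sketch.
    """
    neighbours = defaultdict(set)
    for sender, recipient in edges:
        neighbours[sender].add(recipient)
        neighbours[recipient].add(sender)

    reached = {ego}
    frontier = {ego}
    for _ in range(steps):
        # Everyone one link further out who has not been reached already.
        frontier = {n for person in frontier for n in neighbours[person]} - reached
        reached |= frontier

    kept = [(s, r) for s, r in edges if s in reached and r in reached]
    return reached, kept


# Hypothetical sample data.
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("eve", "dan")]

one_step_nodes, one_step_edges = ego_network(edges, "ann", steps=1)
two_step_nodes, two_step_edges = ego_network(edges, "ann", steps=2)
print(sorted(one_step_nodes), sorted(two_step_nodes))
```

With data from a single custodian, most edges in such a list will involve that custodian, which is why the ego view of anyone else is necessarily an incomplete picture.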

Commetrix also comes with a Keyword filter. The intent is to allow the user to focus on interactions “about” the selected keywords. The interface is less obvious than some of the other areas and we confess to wondering if there was a bug until, rereading the manual, we realized that “In” didn’t mean “inbound” but “include”, and “Out” meant “exclude”. Selecting the terms was also rather tedious as it meant scrolling through a long list of options. To validate the filtering, we took ‘california’ related terms and looked to see if Jeff Dasovich was included, which he is – see screenshot below. It would be interesting to see this concept better developed with better keyword lists, more complex keyword filtering options and possibly the employment of automated topic determination techniques such as keyword clustering.
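
The include/exclude logic behind the “In” and “Out” lists can be sketched as follows. This is our own minimal interpretation of the behaviour described in the manual, not the product's code; the function name, the message structure and the sample data are assumptions.

```python
def keyword_filter(messages, include=(), exclude=()):
    """Keep messages that match any `include` keyword and no `exclude` keyword.

    An empty `include` list means "no restriction". Matching is
    case-insensitive and works on whole keywords only.
    """
    include = {k.lower() for k in include}
    exclude = {k.lower() for k in exclude}
    kept = []
    for msg in messages:
        words = {w.lower() for w in msg["keywords"]}
        if include and not (words & include):
            continue  # none of the "In" keywords present
        if words & exclude:
            continue  # at least one "Out" keyword present
        kept.append(msg)
    return kept


# Hypothetical sample data.
messages = [
    {"from": "a", "to": "b", "keywords": ["california", "power"]},
    {"from": "b", "to": "c", "keywords": ["lunch"]},
    {"from": "c", "to": "a", "keywords": ["California", "draft"]},
]

about_california = keyword_filter(messages, include=["california"], exclude=["draft"])
print([(m["from"], m["to"]) for m in about_california])
```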

Screenshot Showing Use of Keyword Filter

Although the Enron data set was provided only for demo purposes, having worked with this data, we were curious about two things. Firstly, how were the keywords derived? We guessed email subject, but some of the keywords were email domains (indicating other metadata might have been used as well), some phrases had been concatenated (e.g. ‘californiaattached’) or included a leading article (e.g. ‘thenumber’), and some were word fragments (e.g. ‘t’, ‘e’). Secondly, and more importantly, how were the “identities” of the individuals represented by the nodes resolved? This is always a major issue in email communications if the only information about senders and recipients is an email address. Most individuals have multiple email addresses – even within companies – and the names on email addresses may be difficult to resolve to a single individual. We raise this question because MetaLINCS included functionality that attempted to link individuals with their email accounts based not only on email address but also on communications patterns. Even then, many individuals/email accounts that a human would identify as probably being connected could not be automatically linked. We are guessing that the identity of individuals was manually coded since the node table has a clean one-to-one mapping between individuals and a single email address.

In summary, while we think some of the other software we have used and researched offers better social network visualization options, we really liked the time-line animation Commetrix provides and believe it could be very helpful when studying the evolution of a network or communication patterns over time. While the keyword filtering option was disappointing in both the implementation and the demo dataset provided, we think it has obvious potential – particularly when analyzing large data sets of email, IM and Twitter – in enabling users to focus in on only those communications “about” a particular topic. Of course, with that come all the provisos of using keywords as a substitute for “aboutness”, but if it were combined with stemming, a better stop word list, and some form of thesaurus (to apply synonyms automatically) it would be very powerful.
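
As a rough illustration of what stemming, a stop word list and a thesaurus would add, here is a minimal Python sketch of keyword extraction from an email subject. Everything in it is an assumption for illustration: the stop word list and synonym table are tiny hypothetical samples, and the suffix stripping is a deliberately naive stand-in for a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny lists; a real system would need far more.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "re", "fw"}
SYNONYMS = {"calif": "california", "ca": "california"}


def naive_stem(word):
    """Strip a few common English suffixes (a crude stand-in for real stemming)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def subject_keywords(subject):
    """Turn an email subject into a list of normalized keywords."""
    tokens = re.findall(r"[a-z]+", subject.lower())
    keywords = []
    for token in tokens:
        if token in STOP_WORDS or len(token) < 2:
            continue  # drop stop words and single-letter fragments
        token = SYNONYMS.get(token, token)  # map known synonyms to one term
        keywords.append(naive_stem(token))
    return keywords


print(subject_keywords("RE: The California prices attached"))
print(subject_keywords("Calif markets"))
```

Splitting on non-letters before anything else avoids the concatenated terms and leading articles we saw in the demo keyword list.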

Sunday, November 21, 2010

Anatomy of a Professional Community Portal

Our mission on this project was to create a portal for a professional community. The Portal aimed to support the usual range of community functionality such as blogging, forums and aggregating news feeds; host curated, searchable libraries of documents (from Standard Operating Procedures to Equipment Manuals); and provide browsable and searchable directories of key information such as suppliers of equipment or professional programs. In addition, it needed to provide custom, secure workspaces where groups of users could collaborate on activities such as ISO accreditation and quality assurance. These workspaces needed to help the group manage the process, monitor events and store relevant documents and submissions in one easy-to-find place. The Portal also had to support an ecommerce area where merchants could sell equipment, training courses and quality assurance/proficiency testing programs.

Community Portal Functionality


Since there was a very limited amount of time in which to produce a demonstrable system and an even more limited budget for developing it, we opted to use Drupal as the underlying platform. Some of the pros and cons of this decision, and a comparison with SharePoint, were discussed in the previous blog. The combination of Drupal core, Views and Ubercart enabled us to roll out the ecommerce area and basic community features such as blogs, forums, job center, news aggregation rapidly and with relatively little direct coding.

An Example of a Browsable, Searchable Directory


Drupal’s taxonomy infrastructure, together with Views and Drupal’s core content management and search capabilities, made it very straightforward to roll out a number of different libraries and directories that were both easy to set up and easy for content managers to add to and edit. For situations in which content can be contributed by more than a small group of content managers/writers, Drupal supports workflow management although – like many Drupal functions – it does require a little more work to set up than SharePoint 2010’s more plug-and-play approach.

For the workspaces we created permission-controlled secure areas that featured a mix of calendars and events, lists (e.g. task lists, subscription lists, member lists) and content libraries (e.g. standard guides, test submissions and test results). Each workspace was set up to support multiple projects within the overall activity type.

An Example of a Secure, Custom Workspace


The area we found least satisfactory was Drupal’s out-of-the-box submission forms for anything other than standard content such as documents and blogs. They did not provide a satisfactory interface for more complex data submission and we are currently testing various form modules and functional extensions to rectify this.

All in all, we found Drupal a very powerful and effective platform for building a professional community portal. As in any IT project, planning and design are essential ingredients in long term growth and maintainability. In particular, we would recommend careful consideration of the information architecture in advance of any development. Drupal is underpinned by a relational database, and the same considerations of redundancy, normalization and entity-relationships that hold in conventional system design hold for Drupal development and design too. Consideration needs to be given to the relationships between the objects that Drupal nodes represent, and data dictionaries should be set up to define each field. Doing this, you can leverage the power of Views to create a functionally rich, maintainable portal.

Monday, October 11, 2010

SharePoint vs Drupal: A “hands-on” comparison

Recently, we have found ourselves in the unusual position of building two content management oriented sites at the same time: one in SharePoint 2010 Foundation and one in Drupal. While there are various blogs and commentary out there on the web about the pros and cons of the two, they are mostly written from the point-of-view of either a system administrator or a developer. In these projects, we are using third-party hosting (so no systems administration) and trying not to code but to use the out-of-the-box functionality, so we hope this blog will provide a different and practical perspective to anyone considering these platforms as options.

In our situation, the choice of platform was dictated by client needs: low-cost with ecommerce on the one hand and a company internal, office team environment on the other. Both systems are being hosted by third parties so we did not have to worry about systems administration. We did install Drupal for our development environment and note that, as everyone has commented, it is very straightforward to set up whereas our previous experience of on-premise SharePoint required significant input and ongoing maintenance from systems engineering. For SharePoint, we are working with SharePoint 2010 Foundation – which has some significant functional limitations over the “Standard” version. For Drupal we are working with version 6.15 and using Panels, Views, PathAuto, ImageCache and Ubercart as our base platform.

In both cases, the intention was to see how far we could go using the system “out-of-the-box” and without coding – which, by-and-large, we have been able to do. (Although we have to confess to a quick code-tweak in Drupal to change the name on a search button from APPLY to SEARCH.) With both systems, we found ourselves frustrated initially by the fact we had less control over the individual look-and-feel of the page than we were used to in a conventional build-it-yourself, non-templated environment. However, once we adapted, we came to love the fact that we can focus on content and functionality and know that the look-and-feel is going to be consistently applied, and that we don’t have to design every style and control ourselves.

The steepest learning curve by far was with Drupal – which is to be expected since it is very much intended to be a Lego-like platform with a wealth of options. The quantity and range of available Drupal contributed modules is its great strength and a significant advantage over the more monolithic SharePoint. On the other hand, many times we found ourselves spending hours “shopping” for new modules. While not an unhappy experience (we like shopping!) we had to be quite strict with ourselves to avoid becoming module experts who hadn’t actually built anything!

Another advantage of Drupal is the availability of sophisticated and varied themes. There are 759 freely available on the Drupal site plus many more that can be purchased for less than $100. This is a huge plus, making it easy to get a reasonable looking site up and running without spending significant effort designing and coding stylesheets. And then if you want to make minor changes to your theme - which you inevitably will - you can make local modifications to the theme stylesheet and/or use a module like CSSInjector to set up rule-based overrides. With SharePoint the out-of-the-box choice is mostly limited to the color palette. That is OK for company internal sites, but anyone developing for external use is going to need more, and a broader library of available templates would be useful. Yes, you can use SharePoint Designer but it is much more effort than CSS tweaking in Drupal.

Drupal's Theme Index


SharePoint’s strengths are undoubtedly its tight integration with Office and the ease of use of its out-of-the-box content management functionality. Once you have mastered the concepts of libraries and lists, you can very quickly create a functional CMS with most effort going – as it should – into organizing the content. The Office Ribbon look-and-feel and the more consistent user interface in SharePoint 2010, as compared with earlier versions, mean that complete beginners can become effective users in a very short space of time. The multi-file upload feature is a joy: it’s fast, it’s easy to use and it makes large scale document upload a pain-free operation. The search site gives you effort-free search across the whole site collection. Indeed, even at the Foundation level, we have found SharePoint’s searching to be fast and efficient (maybe even over-efficient, as we are not sure of the usefulness of indexing every Excel cell), and users love what they describe as the “Google-style” result displays. Users also like being able to sync their contacts and calendar with Outlook.

The SharePoint 2010 Ribbon


For internal content management systems, SharePoint 2010 is a no-brainer and a hosted option removes the pain of system setup and administration. However, it could have been, should have been, so much better. It is the small things that don’t quite work that bring SharePoint down. Like the missing spellchecker on the editor (see our previous blog), or the fact that you can’t automatically set the calendar display to show multiple user events. The “wiki” style content creation feature isn’t quite there yet either. In an office/work environment, you often need to create “ordered” content with some kind of an index page: “How To” documents for example. SharePoint wiki pages, while searchable and link-able, cannot be explicitly ordered and Foundation doesn’t even have tagging options. After using Drupal Views, we also found the limitations on SharePoint list settings frustrating and unnecessary. If Views allows you to set multiple filters and sort levels, why can’t SharePoint, since the underlying architecture – SQL – is fundamentally the same? However, we do note that the UI on SharePoint’s list set up is far more intuitive and can be readily used by non-programmers whereas Views took some getting used to and is definitely not intuitive.

The downside of Drupal is the learning curve and the fact that you do have to set-up and configure much of the functionality you want. While the extensive range of available modules means that most of this can be done without coding, it still takes some time to research and install these. And although there are many helpful blogs and commentaries on various aspects of Drupal (for which we are profoundly grateful – what did we do before Google?), interfaces for the more complex modules are often not at all intuitive and documentation can be sparse, or written from a developer perspective that assumes you are going to want to code. Panels is an example of a module where more extensive documentation and some cook-book examples would have been very helpful.

In summary, there is a place for both Drupal and SharePoint. Each has its strengths and weaknesses and neither is perfect. Both are impressive in how much functionality is available and configurable without coding. For company-internal content management, SharePoint would be our first choice and a hosted version makes it easy to get up and running in a matter of days if not hours (as well as being cost-effective compared with purchasing an on-premises license). For external sites needing a broad range of functionality such as ecommerce, Drupal is a great option. It’s hard to beat free, and the extensive ecosystem of freely available modules and themes makes it easy to put together a site that has a stylish look-and-feel and rich functionality while never (or almost never) having to cut a line of code.

Saturday, September 18, 2010

Analyzing Email Communications: An Ego-Centric Approach

As a quick scan through prior blogs will show, throughout this year we have been exploring the application of social network visualization software to email communications. Our interest has been two-fold: finding tools to support those working in legal and regulatory environments who need to examine large numbers of emails for answers to “Who, What, Where, When and Who Knew” kinds of questions and secondly, to see if this approach might provide behavioral psychologists with tools to identify and/or objectively measure, communications issues in workplace teams. In many workplace situations, email has become the primary communication mechanism whether through cultural factors (as with many IT teams) or because of distance (with geographically dispersed teams). At the same time communication issues are cited as one of the primary reasons why projects fail. It seemed to us that tools for analyzing the flow of email communications in a team might help identify team members who are outside the group, or who have significantly fewer interactions with key individuals in the team, thereby enabling remedial action to be taken.

Software we have looked at so far includes Gephi (useful for large data sets) and NodeXL (easy to use, useful for analyzing smaller groups of individuals, and with great options for customizing the appearance of the graphs, e.g. color coding particular attributes or clusters). Data feeds into both are organized basically as edge lists and node lists, with Gephi requiring XML formatting and NodeXL spreadsheet or CSV lists. (Note: in an email environment, a node is an individual, represented by either an email address or a name, and an edge is the communication between two individuals, with the volume of communications represented by a weight measure.) The visualizations produced look at communication and clustering from a bird’s-eye view across the entire data set.
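
For readers unfamiliar with edge lists and node lists, the following Python sketch shows how a set of emails might be reduced to a weighted edge list. The record layout (`from`, `to`) and the sample names are assumptions for illustration, not the import format of Gephi or NodeXL.

```python
from collections import Counter


def build_edge_list(emails):
    """Reduce email records to a node list and a weighted, directed edge list.

    Each email is a dict with a "from" address and a list of "to" addresses.
    The weight of an edge is the number of messages sent from one person to
    the other.
    """
    weights = Counter()
    for email in emails:
        for recipient in email["to"]:
            weights[(email["from"], recipient)] += 1
    nodes = sorted({person for pair in weights for person in pair})
    edges = [(src, dst, weight) for (src, dst), weight in sorted(weights.items())]
    return nodes, edges


# Hypothetical sample data.
emails = [
    {"from": "ann", "to": ["bob", "cat"]},
    {"from": "ann", "to": ["bob"]},
    {"from": "bob", "to": ["ann"]},
]

nodes, edges = build_edge_list(emails)
print(nodes)
print(edges)
```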

UCINET takes a somewhat different approach. UCINET is a social network analysis program developed by university researchers at the University of Kentucky and distributed by Analytic Technologies (see www.analytictech.com/ucinet/). There is a free trial version and relatively low cost options for students, researchers and single users.

Unlike NodeXL or Gephi, UCINET is not a complete visualization package but only the analytic engine. It is, however, integrated with a freeware program called NETDRAW. Since both are included in the download package, installation is straightforward. We did find in practice, though, that the package behaves like a set of separate tools operating on a common data set, compared with the more integrated environments of NodeXL or Gephi. Another difference is that UCINET works on matrices, not edge/node lists. Fortunately, it has an import function which accepts a standard edge list (e.g. person1, person2, weight) in Excel format. The import function then converts this into a matrix for analysis and visualization.
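
The conversion from an edge list to a matrix is simple to picture. The sketch below is our own illustration in Python of what such an import step does conceptually; it is not UCINET's code, and the function name and sample data are assumptions.

```python
def edge_list_to_matrix(edges):
    """Convert (person1, person2, weight) rows into a square adjacency matrix.

    Returns the sorted list of labels and a list-of-lists matrix in which
    matrix[i][j] is the total weight of links from labels[i] to labels[j].
    """
    labels = sorted({p for src, dst, _ in edges for p in (src, dst)})
    index = {name: i for i, name in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for src, dst, weight in edges:
        matrix[index[src]][index[dst]] += weight
    return labels, matrix


# Hypothetical sample data.
edges = [("ann", "bob", 2), ("ann", "cat", 1), ("bob", "ann", 1)]

labels, matrix = edge_list_to_matrix(edges)
print(labels)
for row in matrix:
    print(row)
```

Note that the matrix grows with the square of the number of people, whereas the edge list grows only with the number of pairs who actually communicated.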

Our test data set is the same as before: an anonymized set of email communications. For this investigation we started with a small subset of 368 nodes and 1223 edges.

NETDRAW visualization of entire email network


While NETDRAW is by no means as sophisticated as the graphical packages in Gephi or even NodeXL, where the UCINET/NETDRAW package came into its own is in its ability to home in easily on a selected set of individuals. A checklist menu of nodes appears on the right hand side of the graph and altering the selections immediately redraws the graph showing only those individuals and their connections. We think this is very helpful when drilling down to investigate the interactions between a particular group of people.

Another great feature of UCINET/NETDRAW is its ability to visualize interactions from an “ego” perspective. By selecting an initial “ego”, the software identifies all the individuals in communication with the selected individual and produces a subgraph of communications between them. For example, simply selecting “Carmela Soprano” produced the following subgraph.

"Carmela Soprano" Ego Network Graph


NETDRAW can be configured to represent the volume of communications as the size of the link:

Network Graph with Link Width Representing Communication Volume


Or with the volume shown in a link label:

Network Graph with Link Label Showing Communication Volume


UCINET offers a range of node centrality measures including Closeness, Betweenness, Degree and Eigenvector. (For information about what these measures represent, see previous blogs or go to: http://en.wikipedia.org/wiki/Betweenness_centrality#Eigenvector_centrality). Once the measures are calculated, nodes can be colorized to represent one of the selected measures. For example the nodes on the sub-graph below have been colorized to represent the value of the Indegree attribute.

It is also possible to filter based on a particular measure. The graph below shows the entire set filtered to show only nodes with high Eigenvector scores (a measure of the importance of the individual in the network).

Network filtered by Eigenvector Measure (to show 'Important' individuals only)
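
The simplest of these measures, Indegree and Outdegree, can be computed directly from an edge list, and filtering by a measure is then just a threshold test. The Python sketch below illustrates only that; it is not UCINET's implementation, it does not attempt Eigenvector centrality, and the function names and sample data are assumptions. It also assumes each sender-recipient pair appears once in the list.

```python
def degree_measures(edges):
    """Count distinct inbound and outbound contacts for every node.

    `edges` is a list of (sender, recipient, weight) rows, with each
    sender-recipient pair listed once.
    """
    nodes = {p for src, dst, _ in edges for p in (src, dst)}
    indegree = dict.fromkeys(nodes, 0)
    outdegree = dict.fromkeys(nodes, 0)
    for src, dst, _weight in edges:
        outdegree[src] += 1
        indegree[dst] += 1
    return indegree, outdegree


def filter_by_measure(measure, minimum):
    """Return the nodes whose value for `measure` is at least `minimum`."""
    return sorted(node for node, value in measure.items() if value >= minimum)


# Hypothetical sample data.
edges = [("ann", "bob", 2), ("cat", "bob", 1), ("bob", "ann", 1), ("dan", "bob", 4)]

indegree, outdegree = degree_measures(edges)
print(indegree)
print(filter_by_measure(indegree, 2))
```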



UCINET/NETDRAW also has a number of algorithms for analyzing subgroups. For example, in the subgraph below (an “ego” network for Tom Hagen), it has identified 3 factions – represented by the three different colors: red, blue, black.

Graph identifying Factions within a Subgroup


An analysis of cliques in the entire set identified 60 separate groups shown in the graph below.

Graph showing the 60 cliques identified in the data set


What we liked about UCINET/NETDRAW is the ease with which we could explore the involvement of particular individuals in the network using the ego feature combined with the filtering and attribute-based node coloring. We also liked the wide range of analysis options which included not only the standard centrality measures but also various clustering algorithms and analyses of cliques and subgroups. While more extensive documentation would have been helpful (although we do appreciate that this was initially developed as a research tool), we did appreciate that whatever we did to it, it never crashed and managed to catch any errors gracefully.

Saturday, September 11, 2010

The Case of the Missing Spell Checker

A recent project involved creating a proof-of-concept SharePoint 2010 Foundation site(s) for a client. The aim was to demonstrate some of SharePoint’s collaboration features and show how the platform could support various teams within the client’s organization. In setting up the demonstration, we decided to create a small Knowledge Base using the built in content creation tools.

The new page editing tools are certainly easier to use than in previous versions of SharePoint and adding in pictures is a cinch. The range of styles and fonts is also much improved. We did think the mechanism for linking pages – while very wiki-like – could have been made easier for less tech-savvy users. More importantly, since Foundation users do not get the content management and tagging features of the Standard and Enterprise versions, better tools for organizing the pages – other than simple links – would have been helpful. For example, it would have been nice to have been able to designate one of the pages as the “Home Page” of the Knowledge Base. Another great feature would have been to have an “Index Page” with an automatically created index of pages in the wiki.

SharePoint 2010 Foundation Content Editor: Insert Options


SharePoint 2010 Foundation : Text Editing Options


It wasn’t until someone pointed out a glaring spelling error in the copy we’d been writing for the Knowledge Base that we realized that, most strangely, there wasn’t any form of spell checker in the content editor. At first we thought we’d simply mislaid it somewhere in the ribbon but after looking high and low for it and checking several blogs, we realized that it in fact doesn’t exist in Foundation. Microsoft skirt round the issue by declaring that spell checking exists in Standard and Enterprise, thereby carefully not saying that it doesn’t exist in Foundation.

This seems to us very strange and a significant drawback to Foundation (which is almost certain to be the de facto hosted version). After all, blog platforms and software like Blogger - on which ChromaScope is hosted - have incorporated spell checkers for some time now.

Blogger's Editing Options (Spell Check is the last icon on the right)


Intrigued, we decided to do a quick comparison of functionality between the HTML editors in Blogger and SharePoint 2010.

| Feature | Blogger | SharePoint 2010 Foundation |
|---|---|---|
| Cut/Copy/Paste | Yes | Yes |
| Font Styles | Yes (7 available) | Yes (13 available) |
| Font Color | Yes (limited range) | Yes (extensive range) |
| Strike-through/SuperScript/Subscript | Strike-through only | Yes |
| Highlight Text | Yes | Yes |
| Paragraph Formatting (e.g. justification) | Yes | Yes |
| Style Gallery (e.g. Byline) | Quote only | Yes (7 available) |
| MarkUp Style Gallery (e.g. Heading1) | Title and Body only (from blog content editor). | Yes (14 available) |
| Text Layout (e.g. columns) | Yes but through Page Design rather than content editor. | Yes |
| Insert Picture/Image | Yes | Yes |
| Insert Video | Yes | Yes (but not as obvious how to do this) |
| Insert Link | Yes | Yes |
| Insert Jump Break | Yes | No |
| Insert Table | No | Yes |
| Select Elements based on HTML tag | No | Yes |
| CheckIn/CheckOut | No (but the publish function enables users to decide when pages become publically available.) | Yes |
| Tagging | Yes | No |
| Edit HTML Source | Yes | Yes |
| Page Templating | Yes | Yes but by using SharePoint Designer |
| Language Support | Yes including non-latin | Extensive including non-latin |
| Spell Checking | Yes | No |

While overall, SharePoint 2010 Foundation has a very rich content editor, some of the features and the rather technical HTML element orientation may make it difficult for the general user or, more likely, mean that those features simply languish unused. Blogger, on the other hand, with the exception of the option of easily adding a table, has all the features the general user/content creator would need to compose content AND a spell checker! Hopefully Microsoft take note of the feedback that we, and we are sure everyone else, will give them and make the text editor in SharePoint 2010 Foundation more like an easy-to-use content editor and less like an HTML editor for web designers.

Monday, August 16, 2010

PivotViewer: More Than Just Images

PivotViewer (also known simply as Pivot) is a framework that comes out of Microsoft Live Labs and is intended to support analysis of large datasets where the individual data entities have an image associated with them. We say this carefully because at first glance it looks like “yet-another-image-gallery-application” but it really is not (although we’d agree that you could use it for that purpose if you wanted, just as you can use a chisel to pull up carpet tacks and a 6-burner commercial class stove to cook a packet of soup).

Screenshot of AMG Movie Demo in PivotViewer showing the tiled view


The Silverlight enabled viewer works in a not too dissimilar way from an Excel Pivot table. Data can be filtered by any of the facets/categories available, supplemented (if required) by keyword searching. Images can be shown in tiled view or organized in bar chart view by chosen facet. Drilling down to item detail is as simple as zooming into an image. The corresponding data is displayed in a list to the side and adjacent items can be quickly stepped through using forward/backward buttons.
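
The combination of facet filtering, keyword search and a bar chart grouped by a chosen facet can be illustrated independently of Silverlight. The Python sketch below is purely conceptual and does not use or reproduce the PivotViewer API; the function names and the sample items are assumptions.

```python
from collections import Counter


def filter_items(items, facets=None, keyword=None):
    """Keep items matching every requested facet value and, optionally, a keyword.

    `facets` maps a facet name to the required value; `keyword` is matched
    case-insensitively against the item's name.
    """
    facets = facets or {}
    keyword = keyword.lower() if keyword else None
    matches = []
    for item in items:
        if any(item.get(name) != value for name, value in facets.items()):
            continue
        if keyword and keyword not in item["name"].lower():
            continue
        matches.append(item)
    return matches


def chart_by(items, facet):
    """Count items per value of `facet`, i.e. the data behind a bar chart view."""
    return Counter(item.get(facet) for item in items)


# Hypothetical sample data.
items = [
    {"name": "Beaker 250 ml", "material": "glass", "category": "beakers"},
    {"name": "Beaker 500 ml", "material": "plastic", "category": "beakers"},
    {"name": "Flask 250 ml", "material": "glass", "category": "flasks"},
]

glass_beakers = filter_items(items, {"material": "glass"}, keyword="beaker")
print([item["name"] for item in glass_beakers])
print(chart_by(items, "material"))
```

The more facets each item carries, the more ways there are to slice the collection, which is the point at which a tool like this starts to earn its keep.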

Screenshot of AMG Movie Demo in PivotViewer showing chart view



Underpinning the framework is the concept of a “collection”. A collection comprises a set of images and an XML file describing the images. The CollectionXML schema is a set of property-values that specify the collection as a whole, the facet categories into which the collection is organized and the individual items. The images in the collection are stored in Deep Zoom format and rendered using Seadragon technology.


CollectionXML Schema Overview


Creating a Pivot collection is not as intimidating or difficult as it might sound, however, because fortunately LiveLabs provide several tools to facilitate the process including one that is based on Excel.

Screenshot of Pivot Collection Tool for Excel


This summer, LiveLabs also released a Silverlight 4 control which can be embedded in web sites (including SharePoint) and used to view, manipulate and analyze collections. The tools are available (for free) from the Pivot site. The Silverlight PivotViewer control can be downloaded from: www.silverlight.net/learn/pivotviewer/

Our initial interest in PivotViewer was its visualization capability and its potential for presenting complex data in ways that make it easier for users to understand and analyze. To this end we decided to try it out for ourselves and build a mini application using the Silverlight control as the viewer and the Pivot Collection Tool for Microsoft Excel to create the underlying collection. We had available a small collection of data and images relating to laboratory equipment and thought this would provide an interesting proof-of-concept.

Unlike many “interesting concept” toolsets we have attempted to deploy in the past, this one turned out to be very straightforward to use – despite a paucity of documentation. While the Excel Collection tool is “plug-and-play”, some knowledge of .NET development and Silverlight is necessary to deploy the PivotViewer control. Thanks to Tim Heuer’s very helpful blog on how to deploy PivotViewer, we were able to get a basic Lab Equipment PivotViewer up and running very quickly.

Screenshot of Laboratory Equipment application - tile view


Screenshot of Laboratory Items by material type (chart view)



Although we knew going in that the small number of data items we were using was less than ideal (more is definitely better here), we thought that the set of uniform images we had available (complete with color coding) and the supporting data about the equipment (size, material type, category, descriptions, etc.) would make up for it. We were wrong! We had focused on the images and these, while necessary, are not sufficient. What is absolutely essential to really make the most of this application is rich data. We had only two main facets and a small number of parameters for each, which significantly limited what we could do.

Contrast this with the AMG Movie demo provided as a sample with the control, where each movie is accompanied by a wealth of information: a description as well as faceted data such as date of release, director, actors, genre, box office takings, countries and runtime. It is this information that fuels the application.

Close-up of Movie demo item and accompanying data


When thinking about how Pivot could be used, our first thoughts had been the obvious “image gallery” type applications: a web enabled version of an art gallery or museum for example. The “out-of-the-box” ability to support filtering and search by multiple facets – supplemented by keyword searching – would be ideal. Users could look, for example, for all Impressionist paintings depicting lakes painted in France between this date and that. Similarly, it could be used to develop a very useful, useable interface to any large catalog of items: from clothing (women’s jeans boot-cut dark-wash) to hardware (small plate door knocker solid brass satin nickel finish).

However, it was when playing with the movie application that we realized that thinking of it as simply a front end to a catalog was to underplay its potential. We had started to look at the box office takings facet and it was then that the penny dropped. We found ourselves looking for patterns. What correlations were there between directors, actors and takings? It was very easy to ask these questions and then focus in on the results, arranging the items as tiles or as bar graphs. We could see the visual potential of PivotViewer really coming into play when looking at, for example, trends in sales of clothing or even real estate – anything where visual appearance (from color to style) is a factor in sales, cost of manufacture, page views or some other key metric.

Screenshot from AMG Movie demo showing Movies by Box Office Gross


In the movie demo, the images are a nice-to-have as a visualization but are not an essential part of the analysis per se. In other cases, we could envisage the images themselves being an essential part of the analysis. For example, retailers often study the selling power of pages in their printed catalogs or web sites, to determine which layouts are the most effective. PivotViewer would make this a very easy analysis to conduct. Similarly a greetings card manufacturer could look for patterns and trends in consumer choice of design.

In summary, we believe this technology has great potential when deployed in environments that are data rich and where visual appearance either is correlated with one or more key metrics or can facilitate visualization of complex data simply by making the individual items (or groups of items) more recognizable.