Saturday, September 18, 2010

Analyzing Email Communications: An Ego-Centric Approach

As a quick scan through our earlier posts will show, this year we have been exploring the application of social network visualization software to email communications. Our interest has been twofold: first, to find tools to support those working in legal and regulatory environments who need to examine large numbers of emails for answers to “Who, What, Where, When and Who Knew” kinds of questions; and second, to see whether this approach might give behavioral psychologists tools to identify and/or objectively measure communication issues in workplace teams. In many workplaces, email has become the primary communication mechanism, whether through cultural factors (as with many IT teams) or because of distance (as with geographically dispersed teams). At the same time, communication issues are cited as one of the primary reasons projects fail. It seemed to us that tools for analyzing the flow of email communications in a team might help identify team members who are outside the group, or who have significantly fewer interactions with key individuals in the team, so that remedial action can be taken.

Software we have looked at so far includes Gephi, which is useful for large data sets, and NodeXL, which is easy to use, suits smaller groups of individuals and has great options for customizing the appearance of the graphs (e.g. color-coding particular attributes or clusters). Data is fed into both essentially as edge lists and node lists: Gephi requires XML formatting, while NodeXL takes spreadsheet or CSV lists. (Note: in an email environment, a node is an individual, represented by either an email address or a name, and an edge is the communication between two individuals, with the volume of communications represented by a weight.) The resulting visualizations look at communication and clustering from a bird's-eye view across the entire data set.
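To make the node list/edge list idea concrete, here is a minimal Python sketch of how raw email records might be reduced to the kind of weighted edge list these tools take as input. It assumes each message has already been parsed into a (sender, recipients) pair and that "weight" means the number of messages; the function names, sample addresses and `source,target,weight` column names are our own illustrative choices, not a format mandated by Gephi or NodeXL.

```python
import csv
import io
from collections import Counter


def build_edge_list(messages, directed=True):
    """Aggregate (sender, recipients) records into a node list and weighted edges.

    Each sender -> recipient pair is counted once per message, so an edge's
    weight is the number of messages sent along it.
    """
    weights = Counter()
    for sender, recipients in messages:
        for recipient in set(recipients):  # ignore an address repeated in one message
            if recipient == sender:
                continue  # skip copies people send to themselves
            key = (sender, recipient) if directed else tuple(sorted((sender, recipient)))
            weights[key] += 1
    nodes = sorted({person for pair in weights for person in pair})
    edges = [(a, b, w) for (a, b), w in sorted(weights.items())]
    return nodes, edges


def edges_to_csv(edges):
    """Render weighted edges as CSV text with a header row (column names are our choice)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["source", "target", "weight"])
    writer.writerows(edges)
    return buffer.getvalue()


# Made-up message records with hypothetical addresses.
messages = [
    ("tony", ["carmela", "paulie"]),
    ("carmela", ["tony"]),
    ("tony", ["carmela"]),
]
nodes, edges = build_edge_list(messages)
print(nodes)  # ['carmela', 'paulie', 'tony']
print(edges)  # [('carmela', 'tony', 1), ('tony', 'carmela', 2), ('tony', 'paulie', 1)]
```

Passing `directed=False` folds both directions of a conversation into a single edge, which is often what you want when the question is simply who talks to whom.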

UCINET takes a somewhat different approach. UCINET is a social network analysis program developed by researchers at the University of Kentucky and distributed by Analytic Technologies (see www.analytictech.com/ucinet/). There is a free trial version, and relatively low-cost options are available for students, researchers and single users.

Unlike NodeXL or Gephi, UCINET is not a complete visualization package but only the analytic engine. It is, however, integrated with a freeware program called NETDRAW, and since both are included in the download package, installation is straightforward. In practice, though, we found that the package behaves more like a set of separate tools operating on a common data set than like the integrated environments of NodeXL or Gephi. Another difference is that UCINET works on matrices rather than edge/node lists. Fortunately, it has an import function that accepts a standard edge list (e.g. person1, person2, weight) in Excel format and converts it into a matrix for analysis and visualization.
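We have not looked inside UCINET's import routine, but the conversion it performs is easy to picture. The hypothetical function below is a minimal sketch, assuming rows of the form (person1, person2, weight): it builds a square matrix with one row and one column per person, where cell [i][j] holds the total weight of ties from person i to person j.

```python
def edge_list_to_matrix(edges, directed=True):
    """Convert (person1, person2, weight) rows into a labelled adjacency matrix.

    Returns (labels, matrix) where matrix[i][j] is the total weight of ties
    from labels[i] to labels[j]. Repeated rows for the same pair are summed.
    """
    labels = sorted({person for a, b, _ in edges for person in (a, b)})
    index = {name: i for i, name in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, b, weight in edges:
        i, j = index[a], index[b]
        matrix[i][j] += weight
        if not directed and i != j:
            matrix[j][i] += weight  # mirror the tie for an undirected view
    return labels, matrix


labels, matrix = edge_list_to_matrix(
    [("tony", "carmela", 2), ("carmela", "tony", 1), ("tony", "paulie", 1)]
)
# labels == ['carmela', 'paulie', 'tony']
# matrix == [[0, 0, 1], [0, 0, 0], [2, 1, 0]]
```

The matrix has one cell per ordered pair of people, so even our 368-person subset needs 135,424 cells, most of them zero.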

Our test data set is the same as before: an anonymized set of email communications. For this investigation we started with a small subset of 368 nodes and 1223 edges.

NETDRAW visualization of entire email network


While NETDRAW is by no means as graphically sophisticated as Gephi or even NodeXL, the UCINET/NETDRAW package comes into its own in its ability to home in easily on a selected set of individuals. A checklist of nodes appears on the right-hand side of the graph, and changing the selections immediately redraws the graph to show only those individuals and their connections. We think this is very helpful when drilling down to investigate the interactions within a particular group of people.
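Redrawing the graph for a selection amounts to taking the subgraph induced by the selected people: keep them, and keep only the ties whose two ends are both selected. A minimal sketch, assuming (person1, person2, weight) edge tuples and hypothetical names:

```python
def induced_subgraph(edges, selected):
    """Keep only the selected people and the ties that run between them.

    Selected people with no ties to the rest of the selection are still
    returned in the node list, as isolates.
    """
    keep = set(selected)
    nodes = sorted(keep)
    sub_edges = [(a, b, w) for a, b, w in edges if a in keep and b in keep]
    return nodes, sub_edges


edges = [("tony", "carmela", 2), ("tony", "paulie", 1), ("paulie", "silvio", 4)]
nodes, sub_edges = induced_subgraph(edges, ["tony", "paulie", "silvio"])
# sub_edges == [('tony', 'paulie', 1), ('paulie', 'silvio', 4)]
```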

Another great feature of UCINET/NETDRAW is its ability to visualize interactions from an “ego” perspective. Once you select an individual as the “ego”, the software identifies everyone in communication with that individual and produces a subgraph of the communications among them. For example, simply selecting “Carmela Soprano” produced the following subgraph.
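We don't know exactly how NETDRAW assembles its ego networks, but a common definition of a one-step ego network is the ego, every alter in direct contact with the ego, and all the ties among those people. The sketch below follows that definition and treats a message in either direction as contact; the edge format and names are hypothetical.

```python
def ego_network(edges, ego):
    """Return the one-step ego network of `ego` from (sender, recipient, weight) edges.

    Alters are everyone who sent a message to, or received one from, the ego.
    The result keeps the ego, the alters and every tie among them.
    """
    alters = {b for a, b, _ in edges if a == ego} | {a for a, b, _ in edges if b == ego}
    alters.discard(ego)  # ignore self-addressed mail
    members = alters | {ego}
    sub_edges = [(a, b, w) for a, b, w in edges if a in members and b in members]
    return sorted(members), sub_edges


edges = [
    ("carmela", "tony", 5),
    ("tony", "paulie", 3),
    ("paulie", "silvio", 2),
    ("carmela", "meadow", 4),
    ("meadow", "tony", 1),
]
members, ties = ego_network(edges, "carmela")
# members == ['carmela', 'meadow', 'tony']; paulie and silvio fall outside
```

Note that the tie between two alters (meadow and tony here) is kept even though the ego is not part of it; that is what makes an ego network more informative than a plain list of contacts.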

"Carmela Soprano" Ego Network Graph


NETDRAW can be configured to represent the volume of communications as the width of the link:

Network Graph with Link Width Representing Communication Volume


Or with the volume shown in a link label:

Network Graph with Link Label Showing Communication Volume


UCINET offers a range of node centrality measures, including Closeness, Betweenness, Degree and Eigenvector. (For information about what these measures represent, see our previous posts or go to: http://en.wikipedia.org/wiki/Betweenness_centrality#Eigenvector_centrality). Once the measures are calculated, nodes can be colorized to represent any one of them. For example, the nodes on the sub-graph below have been colorized to represent the value of the Indegree attribute.
It is also possible to filter on a particular measure. The graph below shows the entire set filtered to show only nodes with high Eigenvector scores (a measure of an individual's importance in the network, based on being connected to other well-connected individuals).
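To show what the in-degree and eigenvector measures compute, here is a small pure-Python sketch working on a labelled adjacency matrix. It is not UCINET's implementation: it symmetrizes ties before computing eigenvector centrality, uses power iteration, scales scores so the most central person scores 1.0, and the 0.9 cut-off in the example is arbitrary. All names are hypothetical.

```python
def in_degree(labels, matrix, weighted=False):
    """In-degree per person: number of distinct senders (or total volume if weighted)."""
    n = len(labels)
    return {
        labels[j]: sum(matrix[i][j] if weighted else int(matrix[i][j] > 0) for i in range(n))
        for j in range(n)
    }


def eigenvector_centrality(labels, matrix, max_iter=1000, tol=1e-10):
    """Eigenvector centrality by power iteration on a symmetrized matrix.

    Scores are scaled so the most central person gets 1.0.
    """
    n = len(labels)
    if n == 0:
        return {}
    # Treat a tie in either direction as a connection (our assumption).
    sym = [[matrix[i][j] + matrix[j][i] for j in range(n)] for i in range(n)]
    x = [1.0] * n
    for _ in range(max_iter):
        # Multiplying by (A + I) keeps A's eigenvectors but stops plain power
        # iteration from oscillating on bipartite structures such as stars.
        y = [x[i] + sum(sym[i][j] * x[j] for j in range(n)) for i in range(n)]
        top = max(y)
        y = [v / top for v in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if converged:
            break
    return dict(zip(labels, x))


def filter_by_score(scores, threshold):
    """Names whose score is at least `threshold` (the cut-off is a user choice)."""
    return sorted(name for name, score in scores.items() if score >= threshold)


# Three people each email "tony"; nobody else is connected.
labels = ["ann", "bob", "cy", "tony"]
matrix = [
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(in_degree(labels, matrix))  # {'ann': 0, 'bob': 0, 'cy': 0, 'tony': 3}
scores = eigenvector_centrality(labels, matrix)
print(filter_by_score(scores, 0.9))  # ['tony']
```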

Network filtered by Eigenvector Measure (to show 'Important' individuals only)



UCINET/NETDRAW also has a number of algorithms for analyzing subgroups. For example, in the subgraph below (an “ego” network for Tom Hagen), it identified three factions, shown in red, blue and black.
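As we understand it, UCINET's factions routine looks for a split into a chosen number of groups that best fits an ideal in which everyone within a group is tied and nobody in different groups is. We don't know the exact optimizer it uses, so the sketch below substitutes a simple hill climb: start from a round-robin assignment, move one person at a time whenever that reduces the number of pairs that break the ideal, and stop when no single move helps. The graph in the example is made up.

```python
from itertools import combinations


def faction_errors(nodes, ties, groups):
    """Count pairs that break the ideal: no tie inside a group, or a tie across groups."""
    errors = 0
    for u, v in combinations(nodes, 2):
        tied = frozenset((u, v)) in ties
        same_group = groups[u] == groups[v]
        if tied != same_group:
            errors += 1
    return errors


def find_factions(nodes, ties, k, max_passes=100):
    """Hill-climb toward a k-group partition with as few errors as possible."""
    nodes = sorted(nodes)
    ties = {frozenset(t) for t in ties}  # undirected: direction of the email is ignored
    groups = {v: i % k for i, v in enumerate(nodes)}  # deterministic round-robin start
    best = faction_errors(nodes, ties, groups)
    for _ in range(max_passes):
        improved = False
        for v in nodes:
            start, best_group = groups[v], groups[v]
            for g in range(k):
                if g == start:
                    continue
                groups[v] = g  # try moving v to group g
                errors = faction_errors(nodes, ties, groups)
                if errors < best:
                    best, best_group = errors, g
            groups[v] = best_group  # keep the best move found (or undo the trial)
            improved = improved or best_group != start
        if not improved:
            break  # no single move helps: a local optimum
    return groups, best


# Two triangles joined by a single c-d tie.
ties = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e"), ("d", "f"), ("e", "f"), ("c", "d")]
groups, errors = find_factions(["a", "b", "c", "d", "e", "f"], ties, k=2)
# groups == {'a': 0, 'b': 0, 'c': 0, 'd': 1, 'e': 1, 'f': 1}; errors == 1 (the c-d tie)
```

Hill climbing can stall at a local optimum, so in practice it is worth running it from several random starting partitions and keeping the best result.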

Graph identifying Factions within a Subgroup


An analysis of cliques across the entire set identified 60 cliques, shown in the graph below.
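A clique here is a maximal complete subgraph: a set of people who are all directly connected to one another and that cannot be enlarged by adding anyone else. A standard way to enumerate such sets is the Bron-Kerbosch algorithm; the sketch below uses its pivoting variant on an undirected view of the network. The minimum size of 3 is our assumption, and we are not claiming this is how UCINET produced its count of 60.

```python
def to_adjacency(edges):
    """Undirected neighbour sets from (person1, person2[, weight]) edges."""
    adjacency = {}
    for a, b, *_ in edges:
        if a == b:
            continue  # self-ties cannot contribute to a clique
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def maximal_cliques(adjacency, min_size=3):
    """Enumerate maximal cliques with Bron-Kerbosch (with pivoting)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already-explored vertices
        if not p and not x:
            if len(r) >= min_size:
                cliques.append(sorted(r))
            return
        # Only branch on vertices not adjacent to the pivot; the rest are covered.
        pivot = max(p | x, key=lambda u: len(adjacency[u] & p))
        for v in list(p - adjacency[pivot]):
            expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adjacency), set())
    return sorted(cliques)


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("c", "e"), ("d", "e"), ("e", "f")]
print(maximal_cliques(to_adjacency(edges)))  # [['a', 'b', 'c'], ['c', 'd', 'e']]
```

Because one person can belong to several maximal cliques (c belongs to two in the example), clique memberships overlap, so a clique count is not the same as a count of disjoint groups.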

Graph showing the 60 cliques identified in the data set


What we liked about UCINET/NETDRAW was the ease with which we could explore the involvement of particular individuals in the network by combining the ego feature with filtering and attribute-based node coloring. We also liked the wide range of analysis options, which included not only the standard centrality measures but also various clustering algorithms and analyses of cliques and subgroups. More extensive documentation would have been helpful (although we appreciate that UCINET was initially developed as a research tool), but we were impressed that whatever we did to it, it never crashed and caught any errors gracefully.
