Visualization of an email communication network
Social network analysis software enables the interactions and relationships between individuals or organizations to be modeled and visualized graphically, with individuals or organizations represented as nodes and their interactions or relationships as edges. In recent months, such software has been used extensively to model and analyze behavior in social spaces such as Facebook and Twitter. It seemed to us that it might also be useful for analyzing and understanding communication patterns in email.
Emails are a key source of information in litigation (witness the publication of significant emails in the recent Goldman Sachs case) and are also monitored for compliance reasons in regulated industries. While most analysis of email data involves some form of keyword searching, there are occasions when it is important to understand who is communicating with whom, particularly when an investigation is at an early stage and may need to be broadened.
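Answering "who is communicating with whom" starts with pulling sender and recipient addresses out of each message. The Python sketch below assumes the messages are available as raw RFC 5322 text; `communication_pairs` is a hypothetical helper, not part of any tool mentioned here. It uses only the standard library `email` package and treats every To, Cc and Bcc address as a recipient of every From address (Bcc headers are usually only present in the sender's own copy).

```python
from email import message_from_string
from email.utils import getaddresses


def communication_pairs(raw_message):
    """Return (sender, recipient) pairs for one raw RFC 5322 message.

    Hypothetical helper: addresses are lower-cased so that
    'Alice@Example.com' and 'alice@example.com' become one node.
    """
    msg = message_from_string(raw_message)

    senders = [addr.lower() for _, addr in getaddresses(msg.get_all("From", [])) if addr]

    # Collect every recipient header; get_all returns [] when a header is absent.
    recipient_fields = msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
    recipients = [addr.lower() for _, addr in getaddresses(recipient_fields) if addr]
    # Drop duplicates (e.g. the same person in To and Cc) but keep header order.
    recipients = list(dict.fromkeys(recipients))

    return [(s, r) for s in senders for r in recipients]
```

Each pair can then be stored as one row in a database table, which is the kind of "basic email metadata" the rest of this procedure builds on.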
Understanding patterns of communication (as evidenced by email traffic) is also important when investigating why projects are failing or teams are not performing effectively. A substantial body of research shows that communication problems are among the primary causes of failing projects and dysfunctional teams. Analyzing the pattern of communication within a team or department can help business leaders and project managers identify where breakdowns are occurring and target remedial action.
Gephi is an open source tool for visualizing networks (http://www.gephi.org/). It runs on Windows, Linux and Mac OS X. Although it can import files in a variety of formats (including CSV), the recommended import format is .gexf, the Graph Exchange XML Format (see http://gexf.net/format/index.html). GEXF is an XML-based format that is straightforward to generate once basic email metadata has been extracted and stored in a SQL database.
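As a minimal sketch of that generation step, the Python below assumes the extracted metadata sits in a SQLite table with the hypothetical name `emails` and columns `sender` and `recipient` (one row per sender/recipient pair on each message); the original procedure does not specify a schema or database product, and `build_gexf` is likewise hypothetical. It counts messages per directed sender/recipient pair and writes each count as a GEXF 1.2 edge `weight`, which Gephi can map to line thickness.

```python
import sqlite3
import xml.etree.ElementTree as ET

# GEXF 1.2 namespace as given in the format documentation at gexf.net.
GEXF_NS = "http://www.gexf.net/1.2draft"


def build_gexf(conn):
    """Return a GEXF document (as a string) describing who emailed whom.

    Assumes a hypothetical table emails(sender TEXT, recipient TEXT).
    """
    # Aggregate message counts per directed sender -> recipient pair.
    rows = conn.execute(
        "SELECT sender, recipient, COUNT(*) FROM emails "
        "GROUP BY sender, recipient ORDER BY sender, recipient"
    ).fetchall()

    # Every distinct address becomes a node with a stable string id.
    addresses = sorted({a for s, r, _ in rows for a in (s, r)})
    node_id = {addr: str(i) for i, addr in enumerate(addresses)}

    gexf = ET.Element("gexf", xmlns=GEXF_NS, version="1.2")
    graph = ET.SubElement(gexf, "graph", mode="static", defaultedgetype="directed")

    nodes = ET.SubElement(graph, "nodes")
    for addr in addresses:
        ET.SubElement(nodes, "node", id=node_id[addr], label=addr)

    edges = ET.SubElement(graph, "edges")
    for i, (sender, recipient, count) in enumerate(rows):
        # Edge weight = number of messages sent along this direction.
        ET.SubElement(edges, "edge", id=str(i), source=node_id[sender],
                      target=node_id[recipient], weight=str(count))

    return ET.tostring(gexf, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emails (sender TEXT, recipient TEXT)")
    conn.executemany(
        "INSERT INTO emails VALUES (?, ?)",
        [("a@example.com", "b@example.com"),
         ("a@example.com", "b@example.com"),
         ("b@example.com", "c@example.com")],
    )
    print(build_gexf(conn))
```

Writing the returned string to a file with a .gexf extension should give something Gephi can open directly; aggregating in SQL keeps the file to one edge per pair rather than one per message.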
Some of the visualizations we were able to generate from an anonymized email set using this procedure are shown below. Gephi is very flexible, supporting a range of network representations and filters so that, for example, only highly connected individuals are shown. On the downside, it is still in alpha and, in our experience, not particularly robust: it crashed several times while we were attempting simple operations such as adding text labels. Although importing and exporting data is easy, some knowledge of the mathematics behind graph theory and network analysis helps in using it effectively.
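Filtering for "highly connected individuals" usually means filtering on node degree. Gephi applies such filters through its own interface; the plain-Python sketch below is not Gephi's implementation, only an illustration of the idea under stated assumptions: degree is taken as the number of distinct correspondents (ignoring message direction and self-addressed mail), and `degree`, `highly_connected` and the `min_degree` threshold are hypothetical names chosen for this example.

```python
from collections import defaultdict


def degree(edges):
    """Undirected degree: number of distinct correspondents per address."""
    neighbours = defaultdict(set)
    for sender, recipient in edges:
        if sender != recipient:  # ignore messages sent to oneself
            neighbours[sender].add(recipient)
            neighbours[recipient].add(sender)
    return {addr: len(peers) for addr, peers in neighbours.items()}


def highly_connected(edges, min_degree):
    """Addresses whose degree is at least min_degree, sorted alphabetically."""
    return sorted(addr for addr, d in degree(edges).items() if d >= min_degree)


if __name__ == "__main__":
    edges = [("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
             ("bob", "carol"), ("dave", "alice")]
    print(degree(edges))               # alice: 3, bob: 2, carol: 2, dave: 1
    print(highly_connected(edges, 2))  # → ['alice', 'bob', 'carol']
```

Counting distinct correspondents rather than messages means one very chatty pair does not make both people look central; message volume is better shown as edge weight.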
Visualization of Key Communicators in an Email Network
Use of Color to Show Subgroups within an Email Communication Network
Close-Up Showing Degree of Communication (as Line Thickness) between Participants in an Email Network