Monthly Archive for August, 2009


Now that the summer’s over, there are still a few things for me to finish up on my project. While the tool’s core functionality is implemented, it still lacks a few features needed to make it fully usable, such as a dialog for setting options like how many topics to generate. Additionally, I plan to implement a system overview visualization which will present all of the classes in a project, colored according to the topic that they belong to. Like the single topic view, this view will use a tree map, but it will visualize the distribution of topics over all of the system’s classes.

Finally, once my plugin is polished, I’ll also be working on a paper introducing the tool to submit to conferences early next year. Anyway, I hope you’ve enjoyed what you’ve read about my research; if you happen to have any questions, I’ll certainly do my best to illuminate things if you post a comment or send me an email.

The Design

As alluded to in the previous post, the dependency link view of X-Ray and the Tree Map were two visualizations that I thought might prove particularly useful in presenting our topic information. Incorporating these elements, the design I came up with uses two primary views. The first aims to introduce all of the topics that our tool has found, giving the user an idea of what concepts they encapsulate and how they might relate to each other. The second view presents a single topic in depth, showing the source documents that make it up so that the user can extend their understanding of the abstract concepts in a body of source code to the concrete source that implements the concepts.


Topics Overview

The overview presents all of the topics, each with the few words most strongly associated with it and the few packages or classes most associated with it. This information should be enough to give an idea of what concepts the topic encompasses. Additionally, this view presents the dependency links between documents in different topics; each time code in one topic refers to code in another topic, that reference is represented by a corresponding arrow between the two topics. This should give the user a general idea of how the concepts relate to each other.
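To make the arrows between topics a little more concrete, here is a minimal sketch of how class-level references could be rolled up into topic-level links. It is not the plugin’s actual code: the `TopicLinks` class, the `count` method, and the "from->to" key format are all hypothetical names chosen for illustration, and it assumes each class has already been assigned to a single topic. It also collapses repeated references into one counted link per topic pair, which is a simplification of drawing one arrow per reference.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not taken from the plugin.
class TopicLinks {
    /**
     * Counts references that cross from one topic into another.
     *
     * @param topicOf      maps a class name to the index of its assigned topic
     * @param dependencies pairs {from, to}: class "from" refers to class "to"
     * @return map keyed by "fromTopic->toTopic" giving the number of references
     */
    static Map<String, Integer> count(Map<String, Integer> topicOf,
                                      List<String[]> dependencies) {
        Map<String, Integer> arrows = new HashMap<>();
        for (String[] dep : dependencies) {
            Integer from = topicOf.get(dep[0]);
            Integer to = topicOf.get(dep[1]);
            // Skip classes without a topic and references inside a single topic.
            if (from == null || to == null || from.equals(to)) {
                continue;
            }
            arrows.merge(from + "->" + to, 1, Integer::sum);
        }
        return arrows;
    }
}
```

The count for each pair could then drive the thickness of the arrow drawn between the two topics, though that choice is also just an assumption here.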


Single Topic View

The single topic view, obtained by clicking on one of the topics in the topic overview, displays the classes that are associated with the topic the user has selected. These are displayed in a tree map according to their place in the package hierarchy, the typical means of organizing code in a system written in Java. The size of each box in the tree map represents a chosen variable: currently either that class’s size in lines of code or the degree to which it belongs to the topic it has been placed in. For now, the user navigates back to the overview by right-clicking anywhere on this view.
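For readers who haven’t seen how a tree map turns a package hierarchy into boxes, here is a rough sketch of the simple "slice-and-dice" layout, which alternates between splitting horizontally and vertically at each level of the hierarchy. The `TreeMapSketch` and `Node` names are made up for this example, and the plugin’s real layout may well differ. The `weight` of a leaf stands in for whichever variable is selected, such as lines of code or degree of topic membership.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; class and field names are hypothetical.
class TreeMapSketch {
    /** A package (node with children) or a class (leaf with its own weight). */
    static class Node {
        final String name;
        final double weight;                 // only meaningful for leaves
        final List<Node> children = new ArrayList<>();
        double x, y, w, h;                   // rectangle assigned by layout()

        Node(String name, double weight) {
            this.name = name;
            this.weight = weight;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        /** Weight of a leaf, or the summed weight of everything below a package. */
        double total() {
            if (children.isEmpty()) {
                return weight;
            }
            double sum = 0;
            for (Node c : children) {
                sum += c.total();
            }
            return sum;
        }
    }

    /** Assigns a rectangle to node and, recursively, to all of its descendants. */
    static void layout(Node node, double x, double y, double w, double h,
                       boolean horizontal) {
        node.x = x;
        node.y = y;
        node.w = w;
        node.h = h;
        double total = node.total();
        if (node.children.isEmpty() || total <= 0) {
            return;
        }
        double offset = 0;
        for (Node child : node.children) {
            double share = child.total() / total;
            if (horizontal) {
                // Split the width; the next level down splits the height.
                layout(child, x + offset, y, w * share, h, false);
                offset += w * share;
            } else {
                layout(child, x, y + offset, w, h * share, true);
                offset += h * share;
            }
        }
    }
}
```

Calling `TreeMapSketch.layout(root, 0, 0, width, height, true)` on the root package fills in the rectangle of every package and class beneath it, with each box’s area proportional to its weight.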

And that’s an overview of the design’s goals and how it tries to accomplish them.

Visualization Options

After getting the interface lined up between an LDA library, JGibbsLDA, and the Eclipse IDE that our tool will be based in, I started looking into the myriad options for visualizing the information we can generate. An incredible number of visualizations of software systems have been proposed over the years, from the 3D CodeCity, which represents source files as buildings and groups of files as city blocks, to SeeSoft’s line view, which simply maps a line of pixels to a line of source code in a file. The difficult part of our project is determining which visualization, or combination of visualizations, is actually best suited to the information we want to present.

[Figures: CodeCity visualization; SeeSoft line visualization]

One visualization previously proposed for topics, by Ducasse et al., is the Distribution Map. It displays source documents within their respective packages and colors each document according to the topic to which it most belongs. While this view provides an interesting overview of a system and the spread of topics within it, it doesn’t seem to be the best view for our main goal, which is allowing for the exploration of unfamiliar systems. Another related view that Kuhn et al. investigated is the rather literal Software Map, which draws a topographic-style map in which the distance between code documents reflects their similarity, and the size of each document’s “hill” reflects the document’s size.
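The Distribution Map’s coloring rule, where each document takes the color of the topic it most belongs to, can be sketched in a few lines. This is only an illustration under some assumptions: the `DominantTopic` class is a made-up name, the input is assumed to be one array of per-topic weights for a document, and spreading topics evenly around the hue wheel is just one possible way to pick colors.

```java
// Illustrative sketch; names and the hue-based palette are assumptions.
class DominantTopic {
    /** Returns the index of the topic with the largest weight (first one wins ties). */
    static int of(double[] topicWeights) {
        if (topicWeights.length == 0) {
            throw new IllegalArgumentException("a document needs at least one topic weight");
        }
        int best = 0;
        for (int t = 1; t < topicWeights.length; t++) {
            if (topicWeights[t] > topicWeights[best]) {
                best = t;
            }
        }
        return best;
    }

    /** Spreads topics evenly over the hue range [0, 1) so each gets a distinct color. */
    static float hueFor(int topic, int topicCount) {
        return (float) topic / topicCount;
    }
}
```

A document with weights `{0.1, 0.7, 0.2}` would be assigned topic 1, and with four topics in total its box would be drawn with hue 0.25.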

[Figures: Distribution Map; Software Map]

Since one of the additional pieces of data that I’m interested in integrating into our visualizations is the structural links between source documents, another relevant view is the one implemented by the X-Ray tool. One of X-Ray’s views draws arrows between code elements, each arrow representing one code document’s dependency upon another. I think this information could be very useful in addition to the topic information I discussed in a previous post, which makes X-Ray’s visualization a strong candidate for my project.

[Figures: X-Ray screenshots (xray1.png, xray2.png)]

Finally, another visualization, introduced in 1992, that I think could be interesting to use in our tool is the Tree Map. Put to good use in a number of tools that visualize how your hard drive’s space is used up, Tree Maps can be useful in any context in which you have hierarchical documents with a size property.

[Figures: two tree map examples]

Having considered all of these visualization options, along with a number of other possibilities, I had to select the combination that would best fit our goal of supporting the exploration of a topic map of a software system, and then implement the visualization in our Eclipse plugin.