Ugh... Real Data.
Too. Much. Data. I really don't know what to do about these large projects. There are just too many relations. Here is a sample Flare visualization I've managed to construct from a climate model repository (it is a graph of file relatedness):
The file names are laid out as text around the circumference of the circle, grouped by directory. When you mouse over a file, all of its related files are highlighted in red. This visualization clearly presents too much data (the file names are too tiny to read!). Grabbing the information from the Trac database and outputting it to an HTML file took about 40 seconds. It took Flare another 20 seconds to construct the visualization from this HTML file. And the red highlighting on mouse-over is very slow.

Current thoughts:
- we visualize the entire repository by collapsing directories; if there are too many directories, we'll just have to show an error message telling the user to browse to a smaller sub-directory
- perhaps we should only visualize recent revisions (or let this be user specified)
- we only analyze revisions with fewer than ~25 files committed at a time (under the assumption that larger commits are more likely to be some sort of initial check-in than an actual change to the code; I don't quite know how to justify this, but the analysis takes too long otherwise)
- allow users to see detailed file-relatedness graphs for individual files only (i.e. the user picks a file, and we show a graph of its related files, perhaps as a Flare force-directed graph rather than the circle one?)
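
To make these ideas a bit more concrete, here is a minimal sketch in Python of how they might fit together. It is not the code behind the visualization above. It assumes "related" means "committed in the same revision", and that each revision is available as a plain list of file paths (in practice these would come out of the Trac database). The names co_change_counts, collapse, and related_files are hypothetical, made up for this illustration.

```python
from collections import Counter
from itertools import combinations
import posixpath

# The ~25-file cutoff from the list above; the exact value is a guess.
MAX_FILES_PER_COMMIT = 25


def collapse(path, depth):
    """Reduce a file path to its directory, truncated to `depth` components.

    Files at the repository root collapse to ".".
    """
    parts = posixpath.dirname(path).split("/")
    return "/".join(parts[:depth]) or "."


def co_change_counts(commits, max_files=MAX_FILES_PER_COMMIT, depth=None):
    """Count how often each pair of nodes appears in the same revision.

    commits: iterable of lists of file paths, one list per revision.
    depth:   if given, files are collapsed to directories of that depth,
             so the graph has one node per directory instead of per file.
    """
    counts = Counter()
    for files in commits:
        if len(files) >= max_files:
            continue  # assumed to be a bulk check-in, not a real change
        if depth is None:
            nodes = set(files)
        else:
            nodes = {collapse(p, depth) for p in files}
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(nodes), 2):
            counts[(a, b)] += 1
    return counts


def related_files(counts, target):
    """Return (neighbour, count) pairs for one node, strongest first."""
    related = Counter()
    for (a, b), n in counts.items():
        if a == target:
            related[b] += n
        elif b == target:
            related[a] += n
    return related.most_common()
```

Calling co_change_counts(commits, depth=2) would give the collapsed, directory-level graph for the whole repository, while related_files(co_change_counts(commits), some_path) gives the small per-file neighbourhood that a force-directed view could draw. Restricting to recent revisions would just mean passing in a shorter list of commits.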

1 Comment:
Ainsley - great work on getting Flare working. This visualization does contain way too much information, but I have to say, it's incredibly pretty. If you turn it on its side one way, it looks happy, and if you turn it the other way, it looks sad :-)
Anyway, I think you've now stumbled across the essential research question - how do we pick out just enough information from this data to be useful for the programmers? Let's brainstorm on this tomorrow.