
Wednesday, November 18, 2009

First evaluation result

I have a (pretty terrible) result: with the way things stand right now, tracSnap is able to predict 16% of the people who comment on tickets related to Hadley model defects.

How I arrived at this result:
  1. Jon gave me a list of tickets (defects) that he was able to associate with a specific revision of the repository.
  2. For each such ticket, I looked at the files that were involved in its associated revision, and used tracSnap to get the "experts" for those files.
  3. I compared this list of experts with the people who actually helped to fix the defect, and checked to see if they are the same people.
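
As a rough illustration of the comparison in the steps above, here is a minimal Python sketch. It assumes the statistic is the fraction of actual commenters who also appear in the suggested expert list, pooled over all tickets; the post doesn't spell out the exact formula, and the function name and data layout are hypothetical, not tracSnap's actual code.

```python
def prediction_rate(tickets):
    """Fraction of actual ticket commenters that were also suggested as experts.

    `tickets` is a list of (suggested_experts, actual_commenters) pairs,
    each a set of usernames. This layout is an assumption for illustration.
    """
    predicted = 0
    total = 0
    for experts, commenters in tickets:
        predicted += len(experts & commenters)  # commenters we suggested
        total += len(commenters)                # everyone who commented
    return predicted / total if total else 0.0


# Made-up example data: one of the four commenters was predicted.
example = [
    ({"alice", "bob"}, {"bob", "carol"}),
    ({"dave"}, {"erin", "frank"}),
]
rate = prediction_rate(example)
```

Pooling over all tickets weights busy tickets more heavily; averaging a per-ticket rate would be an equally reasonable choice and could give a different number.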
I have a hunch that I can improve this statistic by finding a better way to relate changes in the branches to the files in the trunk. Right now, if you make changes in your branch, you are an 'expert' on only the files in your branch, and not the corresponding file in the trunk. So if you aren't the one to merge your changes to the trunk, you never get associated with the trunk files, and thus will not get suggested as an expert.
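
One possible way to relate branch changes to trunk files is sketched below. It assumes a conventional Subversion layout (`branches/<name>/...` alongside `trunk/...`) and that a branch file has the same relative path as its trunk counterpart; neither is confirmed for the Hadley repository, and both helper names are hypothetical.

```python
from collections import defaultdict


def to_trunk_path(path, branch_root="branches", trunk_root="trunk"):
    """Map 'branches/<name>/rest/of/path' to 'trunk/rest/of/path'.

    Paths that are not inside a branch are returned unchanged
    (apart from stripping leading/trailing slashes).
    """
    parts = path.strip("/").split("/")
    if len(parts) > 2 and parts[0] == branch_root:
        return "/".join([trunk_root] + parts[2:])
    return "/".join(parts)


def credit_authors(commits):
    """Build {trunk_path: set_of_authors} from (author, paths) commits."""
    experts = defaultdict(set)
    for author, paths in commits:
        for path in paths:
            experts[to_trunk_path(path)].add(author)
    return experts


# Made-up example: jane only ever commits to her branch.
commits = [
    ("jane", ["branches/dev-jane/src/ocean/mix.f90"]),
    ("raj", ["trunk/src/ocean/mix.f90"]),
]
experts = credit_authors(commits)
```

With this mapping, someone who never merges to the trunk still gets associated with the trunk file they effectively changed.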

As of yet, I've only looked at the prediction rate using the "experts", and not the ticket reporter's "suggested contacts". I suspect that both analyses will produce similar results, since the suggested contacts and experts are not independent (contact suggestions rely partially on the expertise calculations).

16%... lame.

Wednesday, November 11, 2009

tracSnap Evaluation Ideas

I'm working on tracSnap as a CSC495 credit this year. Taking a half course spread out over the whole year is a strange experience... I can never find a large block of time to work on this! I will probably get most of this work done over Christmas break, so that I can focus. So far, I've been working with Jon and Steve to come up with a plan on how to evaluate the tool. Here's what we've come up with:
  1. File relation stuff: Using the file relations graph, try to find some interesting implicit file dependencies that maybe Hadley isn't aware of. And then we tell them, and ask if it's useful information. This will be difficult, because there are a lot of confusing file names I will have to make sense of in order to determine which relations are the less obvious ones.
  2. Social/Expertise stuff: Jon has identified tickets that seem to be defects in the code. So, if we look at the data that tracSnap had access to at the time the ticket was reported, would tracSnap have predicted which experts fixed the defect? (i.e., are the experts/social contacts suggested by tracSnap the same people that commented on the ticket and helped to fix the defect?)
I've been working on (2.) so far. I've had to rewrite a fair amount of the code in order to incorporate the necessary temporal component. (Before, tracSnap just used the entire repository, from revision #1 to the most recent revision, to make its suggestions. If we want to see if tracSnap "predicts" the right people to talk to, we can only look at the data that happened before the ticket was created.)
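
The temporal restriction described above amounts to a cutoff filter on the revision history. Here is a minimal sketch, assuming each revision is available as a `(number, committed_at, author, paths)` tuple; this layout and the function name are made up for illustration and are not how tracSnap or Trac actually store revisions.

```python
from datetime import datetime


def revisions_before(revisions, ticket_created):
    """Keep only revisions committed strictly before the ticket was created.

    `revisions` is an iterable of (number, committed_at, author, paths)
    tuples, where committed_at is a datetime (assumed layout).
    """
    return [rev for rev in revisions if rev[1] < ticket_created]


# Made-up example data.
history = [
    (1, datetime(2009, 1, 5), "alice", ["trunk/a.f90"]),
    (2, datetime(2009, 3, 9), "bob", ["trunk/b.f90"]),
    (3, datetime(2009, 6, 1), "carol", ["trunk/a.f90"]),
]
visible = revisions_before(history, datetime(2009, 4, 1))
```

Experts and file relations would then be computed from `visible` only, so the suggestions for a ticket never depend on commits made after it was reported.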

So I think I've got the code re-written as required, and so at the moment I'm waiting as tracSnap calculates all the file relations and experts on the Hadley data (this takes a while! 15000+ revisions!)

Friday, November 6, 2009

Oh yeah...

.. I have a blog! I had forgotten. First important thing to note: Upon shamelessly googling myself today, I've discovered that apparently, I am for sale: click here! What the crap!?!? !!? !?!?!? That picture was from 2003 when I graduated Grade 12, and the Star did an article about me. Yup.. so that's a picture of me, in my backyard, which you can feature on your homepage for the mere price of $240/month. At least they know I'm valuable, I guess?

I don't remember signing anything that said they could do this. Hmm.

Anyways, I'm still working on tracSnap related activities -- mainly evaluating its effectiveness on the Hadley Centre's system -- as a CSC495 credit this year. I'll write something more thorough later. Too weirded out and tired at the moment.

Monday, August 24, 2009

Wrapping Up!

So tracSNAP is now available to the public from the TracHacks website, here. Everything is commented and documented and in a "stable" state (I hope).

PDF version of the poster here.

Sarah is going to do the screencast.

I think we're done? Yay!

Monday, August 17, 2009

Poster


Tuesday, August 11, 2009

Update

Steve has told us that we should be in "wrap-up" mode now -- trying to get our projects into a stable, well-documented state so that later on, other people can continue to work on them if they want to. I'm trying to get into the appropriate frame of mind for this, but there is still a lot I need to do with Flare, and I can't stop thinking of "Oh! We should add to our project!"

At the moment, I think I've gotten the visualizations down to a more reasonable form. The directories are collapsed. Clicking a directory will take you to the source code browser for that directory.
The social graphs are also up and running, although I think they might be better displayed as a force-directed layout, rather than this circle one.

List of things that still need to be done:

  • figure out how to tell Flare to display certain nodes/edges in a different colour (I want the person's immediate social network to be a different colour, etc.)

  • when viewing "my files" give the option to view only the most recent revision / the last month's revisions / all revisions (this is going to change the database schema though.. so maybe not)

  • after looking at "overall files", let users select directories that they want to zoom in on (so that they can see the file connections, not just directory connections)

  • provide fullscreen versions of the visualizations

  • visualize the social connection between two given people (force-directed?)

  • make poster for the August 18th poster session

Wednesday, August 5, 2009

Ugh... Real Data.

Too. Much. Data. I really don't know what to do about these large projects. There are just too many relations. Here is a sample Flare visualization I've managed to construct from a climate model repository (it is a graph of file relatedness):

The files are in text along the circumference of the circle, grouped by directory. When you mouse-over a file, it highlights all its related files in red. This visualization clearly presents too much data (the file names are too tiny to read!). Grabbing the information from the Trac database and outputting it to an HTML file took about 40 seconds. It took Flare another 20 seconds to construct the visualization from this HTML file. And the red highlighting stuff is very slow.

Current thoughts:
  • we visualize the entire repository by collapsing directories; if there are too many directories we'll just have to give an error message telling them to browse to a smaller sub-directory
  • perhaps we should only visualize recent revisions (or let this be user specified)
  • we only analyze revisions with fewer than ~25 files committed at a time (under the assumption that larger commits are more likely to be some sort of initial check-in, rather than an actual change to the code? I don't quite know how to justify this, but the analysis takes too long otherwise)
  • allow users to see detailed file-relatedness graphs for individual files only (i.e. user picks a file, and we'll show a graph of its related files.. perhaps as a flare force-directed graph, rather than the circle one?)
There's a lot to think about! Today I also worked on cleaning and commenting the code. It's fun to try to decipher my original logic!
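
Two of the ideas in the list above, skipping large commits and collapsing files into directories, can be sketched together in Python. This assumes file relatedness is counted from files committed in the same revision, which the post implies but doesn't state outright; the function name, the threshold constant, and the data layout are hypothetical.

```python
from collections import Counter
from itertools import combinations
import posixpath

MAX_FILES_PER_COMMIT = 25  # the "~25 files" cutoff; the exact value is a guess


def co_change_counts(commits, max_files=MAX_FILES_PER_COMMIT, by_directory=False):
    """Count how often each pair of files (or directories) changes together.

    `commits` is an iterable of lists of repository paths, one list per
    revision. Revisions touching `max_files` or more paths are skipped.
    """
    counts = Counter()
    for paths in commits:
        if len(paths) >= max_files:
            continue  # likely an initial check-in or bulk import
        if by_directory:
            nodes = {posixpath.dirname(p) for p in paths}
        else:
            nodes = set(paths)
        # Sort so each pair is always stored in the same order.
        for a, b in combinations(sorted(nodes), 2):
            counts[(a, b)] += 1
    return counts


# Made-up example: two small commits and one 30-file bulk commit.
commits = [
    ["a/x.f", "a/y.f", "b/z.f"],
    ["a/x.f", "b/z.f"],
    ["c/f%d.f" % i for i in range(30)],
]
file_pairs = co_change_counts(commits)
dir_pairs = co_change_counts(commits, by_directory=True)
```

Collapsing to directories shrinks the graph considerably (here, three file pairs become a single directory pair), which is the point of the first idea in the list.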