Here's a followup to this article by the author:
"...TL;DR is that developers just didn't find it useful. Sometimes they knew the code was a hot spot, sometimes they didn't. But knowing that the code was a hot spot didn't provide them with any means of effecting change for the better. Imagine a compiler that just said "Hey, I think this code you just wrote is probably buggy" but then didn't tell you where, and even if you knew and fixed it, would still say it due to the fact it was maybe buggy recently. That's what TWR essentially does. That became understandably frustrating, and we have many other signals that developers can act on (e.g. FindBugs), and we risked drowning out those useful signals with this one.
Some teams did find it useful for getting individual team reports so they could focus on places for refactoring efforts, but from a global perspective, it just seemed to frustrate, so it was turned down.
From an academic perspective, I consider the paper one of my most impactful contributions, because it highlights to the bug prediction community some harsh realities that need to be overcome for bug prediction to be useful to humans. So I think the whole project was quite successful... Note that the Rahman algorithm that TWR was based on did pretty well in developer reviews at finding bad code, so it's possible it could be used for automated tools effectively, e.g. test case prioritization so you can find failures earlier in the test suite. I think automated uses are probably the most fruitful area for bug prediction efforts to focus on in the near-to-mid future."
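For anyone wondering why the flag "would still say it" after a fix, here is a minimal sketch of a time-weighted risk score. It is not taken from the quoted follow-up. The logistic weighting and the constant 12 are my recollection of how the project was publicly described, so treat them as assumptions, and the function names and the `fix_history` layout are hypothetical. Each bug-fixing commit adds a weight that decays with age, so fixing a bug in a file raises its score rather than lowering it.

```python
import math


def twr_score(fix_times, first_commit, now):
    """Sum a recency weight over a file's bug-fixing commits.

    fix_times: timestamps of commits that fixed a bug in the file.
    first_commit, now: bounds used to normalize timestamps to [0, 1].
    The weight 1 / (1 + e^(-12t + 12)) is an assumed form, not confirmed
    by the quoted text.
    """
    span = now - first_commit
    if span <= 0:
        return 0.0
    score = 0.0
    for ts in fix_times:
        t = (ts - first_commit) / span  # 0 = oldest commit, 1 = now
        score += 1.0 / (1.0 + math.exp(-12.0 * t + 12.0))
    return score


def rank_files(fix_history, first_commit, now):
    """Return file paths ordered from highest to lowest score."""
    scores = {
        path: twr_score(times, first_commit, now)
        for path, times in fix_history.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical history: a.py was fixed just now, b.py only long ago.
    history = {"a.py": [100], "b.py": [0, 10]}
    print(rank_files(history, first_commit=0, now=100))
```

A ranking like this gives a developer nothing to act on, which matches the complaint above, but it could plausibly feed an automated consumer such as ordering tests so that those covering the highest-scoring files run first.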