What's the point of Project Grok?
At yesterday's Emacs Conf, +Steve Yegge
gave a talk about Project Grok. (Everything I say here is based on the content of that talk, rather than on the hour-long video on the same subject: I was less than inspired by what I heard, and when I'm less than inspired by something I am not terribly interested in spending an entire hour listening to a video which may well just cover most of the same ground again. Though, uh, looking at the clock now that I've written all of this, it appears I am
sufficiently inspired to spend an hour ranting on G+ about it. Oh well.)
I'm afraid I can't see any point in it at all. Grok was described as a server-side, sorry, 'in the cloud' system which does semantic analysis and compiler-driven data acquisition on uploaded source code, and delivers the result back to consumers (in particular, Emacs). But... while I am as much in favour of compiler-driven data acquisition and richer metadata on source code as the next compiler-geek developer, this seems absolutely pointless to me. Indeed, with the exception of divergent language coverage there was only one thing Steve did with this million-plus lines of Project Grok code during the talk that I couldn't do with Eric Ludlam's wonderful CEDET already (more on the one exception below). (I suspect, or at least hope, that the compilers explain the stated million-line code footprint of Grok, a horrifying figure when compared to the eighty thousand lines or so required for all of CEDET, including the project-finding code which Grok does not attempt to replace.)
The advantage of semantic analysis and the like to text editors is clearly that you get realtime feedback on syntactic elements while you're coding. Semantic shows us the advantages of this, both in things like definition completion in the minibuffer and in providing feedback both from and to other machinery like GNU GLOBAL and tagging engines, so that e.g. GLOBAL tags are used to carry out searches initiated by Semantic (which eliminates most need for Semantic to integrate with actual compilers: all it needs is a parser). Often this happens while you're still writing the code: you have no sooner introduced some variable than you need to use it, maybe a screen or two down in the same function; but even though you may have forgotten its precise type, Semantic has not, and can tell you. But... if you move the analysis phase onto remote servers, you lose that. You can only analyse code you've uploaded. Worse yet, if you work by integrating with a compiler, you can only analyse code that is syntactically correct! Most of us tend to go into writing frenzies, perhaps for weeks, and then stamp the compilation errors out afterwards: that's unavoidable, inasmuch as until the frenzy is over there may be missing functions and the like, so the code won't compile anyway. With a compiler-integrated thing like Grok, that cannot work, and you can only get semantic markup on things you've finished working on!
So, that's a major strike against the whole idea. Are there any others? Yes, and I mentioned these to Steve, though he elegantly dodged them. Firstly, services in the cloud can go away forever without warning. Steve interpreted this as a question about SLAs, which is a masterpiece of dodging given that I'd just mentioned Google Reader and Google Code Search for comparison. The problem is not that the thing may go down for maintenance at any time -- after all, even if it had a mythical 100% SLA your network connection can still go down at any time! No, the problem is that unless you're paying for it, it can go away completely and forever
whenever Google so desires -- just as Google Code Search did, among other projects: Google has a track record of doing just this. Not noticing this requires such fast dancing that I can only assume it was intentional on Steve's part, not least because, judging by his talk, Google Code Search and Project Grok were at one point related, so he can't have failed to note that it's gone away for good. Worse yet, from corporations' point of view, even if they're paying for it, it can still
go away completely and forever, unless you have contract terms saying that Google may not withdraw it (good luck with that) -- or, alternatively, unless the server source code is available and is not so tied to magical Google stuff that it'll work on normal systems (something else I asked about, got no response to, and very much expect the answer is 'no'). I don't want my development environment to be broken when Google loses interest.
Secondly, uploading things to the cloud means that Google can see them. This isn't so problematic for free software -- after all, Google can see that anyway -- but not all software is free, and personally I don't want to have to jump through mental hoops whenever I write something, thinking 'is this free? No? I'd better turn off Grok and use some other semantic analyzer', which will, of course, almost certainly have a different interface, meaning that the UI of my editor now changes depending on whether I want to let Google see my code. (The name of my present employer just makes the question all the more pointed to me: is Oracle likely to want Google to see all its source code, ever? Uh, no. So Steve's vision of all the world using a cloud-based build system seems dead on arrival.)
A more minor point is that a crucial attribute of anything integrated into a text editor is responsiveness
-- witness the enormous efforts that have gone into making the Emacs redisplay engine fast. Now unless the Emacs-side Grok code downloads everything related to a given source file whenever you load the source file, it's going to have a latency problem. Not a very big one -- Google's latencies are admirably low, for a network service -- but still, making Emacs not block makes the network code for querying Grok into a maze of sentinels and filters which can get fiendishly complicated -- and if you're going to have fiendishly complicated code in there, why not put in a parser and drop all this server-side complexity?
As for that cloud-based build system... words fail me. Build systems vary radically, so radically that there's not really any hope of a cloud-based thing ever understanding how to build all of them, even if cloud-based things weren't notoriously unconfigurable and didn't often have appalling user interfaces. Software is complex. The thing I'm working on now links with both 32- and 64-bit x86 code. I hope the build system has both available at once, oh and a cross-compiler too, and I hope it has every crazy language imaginable available, because people write code generators used during builds in all sorts of barmy things. Last month my build failed because of a compiler bug. How would I fix that? My testsuite needs a custom kernel to run, and frequently panics the machine when there's a bug... the cloud-based build system clearly can't actually test
everything it builds. I doubt it can even test GNU coreutils (requires root for full testing, often requires a loopback filesystem with somewhat unusual properties). And even if this cloud-based semantic-analysis and build system utopia were provided... what benefits would it bring? None, that I can see, other than that people could write code from tablets and mobile phones without the horsepower to run compilations. Shame they don't have the keyboards to write code on either. So this would be useful for the single case of a person with a docking station but without access to either local storage or a VPN who nonetheless wanted to do software development. That seems to be not so much a small as a nonexistent niche.
Casting about wildly for possible advantages of Project Grok (discounting its disjoint language coverage over Semantic, which just means that someone needs to write Bovine grammars for more languages), the only one I can see is that a cloud-based thing can see source code that you can't, and so can provide links to arbitrarily much code that you don't have locally. However, unless there is also a TRAMP module in Project Grok that connects to some Googly network filesystem, I don't see how that's going to do you any good, because you can't see that code in your editor!
(Oh, the one feature I saw that Grok had and CEDET did not? It could underline syntactic constituents that you could navigate to. To do that efficiently with gtags we'd need to get gtags to generate a Bloom filter or something to allow CEDET to rapidly mark such tags up. I'm not going to implement it because I thought the feature looked horrendously visually distracting. If you don't know what identifiers are and the syntax highlighting doesn't help enough, I don't see how underlining them would improve matters.)
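(For the curious, the Bloom filter idea above would look something like this. This is a pure illustration in Python rather than anything gtags or CEDET actually does; the tag names and sizing parameters are invented. The point is that a Bloom filter answers "is this identifier maybe a tag?" with no false negatives, so an editor can cheaply skip the vast majority of words without hitting the tag database.)

```python
import hashlib

class BloomFilter:
    """A minimal Bloom filter: a bit array plus k derived hash positions.
    Lookups can yield false positives but never false negatives, so a
    'not present' answer lets an editor skip a word without consulting
    the (much slower) on-disk tag database."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k positions from two halves of one digest (double hashing).
        digest = hashlib.sha256(item.encode('utf-8')).digest()
        h1 = int.from_bytes(digest[:8], 'big')
        h2 = int.from_bytes(digest[8:16], 'big')
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# gtags could emit a filter like this alongside its tag files; the
# tag names below are made up for the sake of the example.
tags = BloomFilter()
for name in ('semantic_analyze', 'gtags_lookup', 'grok_upload'):
    tags.add(name)

print('semantic_analyze' in tags)  # True: a known tag
print('some_random_word' in tags)  # False: skip it, no database hit needed
```

With 65536 bits and three entries the false-positive rate is vanishingly small; in a real tag database it would be tuned against the number of tags, but the editor-side check stays a handful of bit tests per word either way.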