TL;DR: Humans are stupid when it comes to thinking about performance. ALWAYS profile your code and benchmark your assumptions.

I had dental surgery last week and I'm still recovering. I'm operating at 70% power right now and my wife won't let me go to work until I'm at 95% (should be tomorrow). The least I can do is work on YouCompleteMe from home a bit under her watchful eye.

YCM runs several completion engines inside of Vim; this is a problem. First, it takes a while to start up and (cleanly) shut down the various background threads those engines use and this adds to Vim startup and shutdown latency. Second, libclang (one of the semantic engines) has a bad habit of crashing on us every now and then and since everything is running inside the Vim process, all of Vim gets taken down and your unsaved changes laugh at you and die. People don't like that. I get clubbed to death with angry email when that happens.

These issues have been plaguing both Google-internal users of YCM and external ones, although it's a much bigger problem for internal users because of various peculiarities of Google infrastructure that aren't relevant here.

IMO the best way to handle this is to split YCM into a client/server model: a small client inside Vim talks to a separate, local server process that does most of the work.

Pros:

 - no Vim startup or shutdown latency caused by YCM; all of the setup work is done in the other process
 - libclang crashes take down the server, Vim shrugs it off and transparently reboots it, no work is lost; at worst the user is slightly annoyed
 - should be impossible to block Vim's GUI thread; YCM tries really hard not to do it already, but Vim really really wants to block. Let's see it do that for code that's in a separate process, the fucker...
 - possible to reuse the YCM server for code completion in different editors like Emacs, Sublime Text etc

Cons:

 - possibly slower

It's that single drawback I listed that's been causing me pain these last few weeks. Here's a worst-case scenario that's entirely plausible: you're working on many different files at once. Let's say they're big C++ files, and you have 4 unsaved buffers. We need to ship the contents (not just filepaths) of all of those files to the server because it needs to compile the unsaved version of your files, not the state that's on disk.

Let's assume that the files are 25kb each. So 100kb (uncompressed) needs to be shipped to the server for every keystroke in the editor. At a typing speed of 60 words per minute (roughly 5 keystrokes per second), we have a 200ms budget for the round trip.
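For anyone who wants to check the arithmetic, here's a minimal sketch of where the 200ms comes from. The 5-characters-per-word figure is my assumption (it's the usual convention for measuring typing speed), and the function name is made up for illustration.

```python
# Back-of-the-envelope latency budget for one completion round trip.
WORDS_PER_MINUTE = 60
CHARS_PER_WORD = 5  # assumption: the usual convention for measuring WPM


def keystroke_budget_ms(wpm, chars_per_word=CHARS_PER_WORD):
    """Time between keystrokes, in milliseconds, at a given typing speed."""
    keystrokes_per_second = wpm * chars_per_word / 60.0
    return 1000.0 / keystrokes_per_second


budget_ms = keystroke_budget_ms(WORDS_PER_MINUTE)  # 200.0
payload_bytes = 4 * 25 * 1000  # 4 unsaved buffers of 25kb each
```

A faster typist shrinks the budget quickly: at 120 words per minute the same calculation leaves 100ms.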

A warmed-up libclang can already take 400ms on occasion (usually it's sub-100ms), although YCM does caching and prediction like crazy so this doesn't hit the user often, and when it does, it's not as painful. But we certainly don't want the client-server communication to be eating into our budget.

YCM is open source so only open source communication mechanisms are available. I considered Apache Thrift, BSON, MessagePack, custom schemes using Protocol Buffers, yadda yadda. I settled on using JSON over HTTP at least initially because it's so broadly understood and it would be trivial to implement for both myself and possible future writers of YCM clients.

So how slow would this be? The only correct answer: I have no clue, let's benchmark it!

YCM is written in Python already, so from the perspective of ease of implementation, a Python server would be ideal (I don't have to rewrite everything from scratch).

So, I used a pure Python microframework running on top of CherryPy (also pure Python) as the HTTP server. The client creates a representative blob of JSON with 4 random files totaling 100kb, serializes the whole thing and ships it to the server, where it's deserialized and a JSON response is sent back (again, serialized then deserialized). We do this several hundred times. All of the JSON serialization and deserialization is done with the pure Python 'json' module. So everything is about as slow as you can get.
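To give a feel for the shape of such a benchmark, here's a rough sketch using only the Python 3 standard library. It is not the actual benchmark code: it swaps the microframework and CherryPy for http.server, runs the server in a thread of the same process instead of a separate one, and the payload layout and all names (EchoHandler, make_payload, run_benchmark) are hypothetical. Treat the numbers it produces as ballpark only.

```python
import json
import random
import string
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Deserializes the JSON request and replies with a small JSON body."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        request = json.loads(self.rfile.read(length))
        body = json.dumps({"num_files": len(request["files"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep per-request logging out of the timings


def make_payload(num_files=4, file_size=25000):
    """Builds num_files random 'files' of file_size characters each."""
    rng = random.Random(0)
    alphabet = string.ascii_letters + string.digits + " \n"
    return {
        "files": {
            "file%d.cpp" % i: "".join(rng.choices(alphabet, k=file_size))
            for i in range(num_files)
        }
    }


def run_benchmark(iterations=50):
    """Returns a list of round-trip times in milliseconds."""
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    # Ignore any proxy settings in the environment; this is loopback only.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    payload = make_payload()
    timings = []
    try:
        for _ in range(iterations):
            start = time.perf_counter()
            data = json.dumps(payload).encode("utf-8")  # serialize request
            request = urllib.request.Request(
                url, data=data, headers={"Content-Type": "application/json"})
            with opener.open(request) as response:
                reply = json.loads(response.read())  # deserialize response
            timings.append((time.perf_counter() - start) * 1000.0)
            assert reply["num_files"] == len(payload["files"])
    finally:
        server.shutdown()
        server.server_close()
    return timings
```

Calling run_benchmark() and feeding the result to something like statistics.median gives a per-round-trip figure to compare against the budget.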

Here's the benchmark code:

There's a README there explaining how to set everything up if you want to run it yourself.

And here are the results:

All times in milliseconds
count   500.000000
mean      3.078773
std       0.855418
min       2.817416
25%       2.864671
50%       2.896655
75%       3.005898
max       5.882284

So the cost of this communication overhead I was so worried about? About 3ms for the round trip, or ~1.5% of my budget. And that's for the worst-case scenario with the slowest code I can come up with that's still reasonable. Don't forget, latency is all we care about here; throughput is irrelevant.
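As a quick sanity check on that percentage, here's a small sketch that divides the measured times from the table by the 200ms budget. The variable and function names are mine; the numbers are copied from the results above.

```python
BUDGET_MS = 200.0  # per-keystroke budget at 60 words per minute

# Selected rows from the results table, in milliseconds.
results_ms = {"mean": 3.078773, "50%": 2.896655, "max": 5.882284}


def share_of_budget(milliseconds, budget_ms=BUDGET_MS):
    """Percentage of the latency budget consumed by one round trip."""
    return 100.0 * milliseconds / budget_ms


shares = {name: share_of_budget(ms) for name, ms in results_ms.items()}
```

Even the slowest of the 500 round trips stays under 3% of the budget.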

This was all run on my iMac (3.4GHz Core i7) at home, running system Python 2.7.2 on Mountain Lion. I didn't bother to turn off HyperThreading or other background processes.

Things I expected I'd have to do, in the order I expected to have to do them, until performance became acceptable:

 1. Swap out the default JSON module for ultrajson
 2. Compress the data (Snappy comes to mind)
 3. Use something other than JSON+HTTP (Thrift?)
 4. Run the server under PyPy
 5. Rewrite pretty much all of YCM server code in something like Go or C++.

All of that would be a complete waste of my time since even the "slowest" solution is so damn fast that it just doesn't matter.

Conclusion: benchmarks are your friend. Don't waste time trying to figure out how you're going to make something faster until you're sure it's actually slow.

I keep having to relearn that...