A few more improvements to +Samuel Lampa's GC content benchmark. I concentrated on the four best results (all D and C) and tried to beat them. Since memory access is faster than disk access, this time I read the input in blocks of 1024 bytes (1 KB) instead of one line at a time, and scan each block in memory. In the end I found that D (with LDC) can be as fast as C (with GCC).
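
To illustrate the block-reading idea, here is a minimal sketch in C. It is not the code behind the linked results. It assumes a FASTA-like input where header lines start with `>` and only uppercase A, C, G and T are counted; the names `BLOCK_SIZE`, `gc_counts`, `count_gc` and `gc_percent` are made up for this example.

```c
#include <stdio.h>
#include <stddef.h>

#define BLOCK_SIZE 1024  /* bytes per read, matching the 1 KB mentioned above */

/* Hypothetical result type: base counts over the whole stream. */
typedef struct {
    unsigned long gc;  /* number of G and C bases */
    unsigned long at;  /* number of A and T bases */
} gc_counts;

/* Read the stream in fixed-size blocks and count bases in memory.
 * Assumption: lines starting with '>' are headers and are skipped.
 * The line state lives outside the read loop, so a header that
 * straddles a block boundary is still skipped correctly. */
gc_counts count_gc(FILE *fp)
{
    unsigned char buf[BLOCK_SIZE];
    gc_counts n = {0, 0};
    int at_line_start = 1;  /* next byte is the first of a line */
    int in_header = 0;      /* currently inside a '>' header line */
    size_t got;

    while ((got = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (size_t i = 0; i < got; i++) {
            unsigned char c = buf[i];
            if (c == '\n') {
                at_line_start = 1;
                in_header = 0;
                continue;
            }
            if (at_line_start && c == '>')
                in_header = 1;
            at_line_start = 0;
            if (in_header)
                continue;
            switch (c) {
            case 'G': case 'C': n.gc++; break;
            case 'A': case 'T': n.at++; break;
            default: break;  /* N, lowercase, '\r' etc. are ignored */
            }
        }
    }
    return n;
}

/* GC content as a percentage of the counted bases (0 if there are none). */
double gc_percent(gc_counts n)
{
    unsigned long total = n.gc + n.at;
    return total ? 100.0 * (double)n.gc / (double)total : 0.0;
}
```

A caller would open the file with `fopen(path, "rb")`, pass the handle to `count_gc`, and print `gc_percent` of the result. Note that `fread` on a `FILE *` is already buffered by the C library, so how much this gains over line-based reading depends on the implementation and has to be measured.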

Results and algorithms can be found here: http://goo.gl/57DW4
