Gustavo Barbieri
Lucky Man

Gustavo's posts

Post has attachment
I just competed in event 8, The Challenge! I finished 1st in my Porsche 911 Targa (1974)!

As mentioned in the previous post, OpenCV is not fun. On the other hand, it is fun to write simple and fast code when you have a very specific purpose, and that led me to try face detection on my own. After all, I just need a rough match so I can align the crop (0.0-1.0 on both the x and y axes), avoiding cutting off faces that sit on the 1/4 or 3/4 grid lines.
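
A rough face position is enough because the crop only needs a focus point. Below is a minimal sketch of that alignment, assuming the detector reports the face center as normalized (fx, fy) coordinates in 0.0-1.0. The helper name crop_origin and its parameters are hypothetical, not egg's actual code:

```c
/* Place a crop_w x crop_h window inside an img_w x img_h image so that the
 * normalized focus point (fx, fy), each in 0.0-1.0, ends up as close to the
 * center of the crop as the image borders allow. Writes the top-left corner
 * of the window to *x and *y. */
static void crop_origin(int img_w, int img_h, int crop_w, int crop_h,
                        double fx, double fy, int *x, int *y)
{
        /* Top-left corner that would center the focus point exactly. */
        int cx = (int)(fx * img_w) - crop_w / 2;
        int cy = (int)(fy * img_h) - crop_h / 2;

        /* Clamp so the window never leaves the image. */
        if (cx > img_w - crop_w)
                cx = img_w - crop_w;
        if (cy > img_h - crop_h)
                cy = img_h - crop_h;
        if (cx < 0)
                cx = 0;
        if (cy < 0)
                cy = 0;

        *x = cx;
        *y = cy;
}
```

For example, cropping a 6000x2000 panorama to a 2000x2000 square with a face on the 1/4 line gives an origin of (500, 0), so the face lands in the middle of the crop, whereas a plain centered crop starting at x = 2000 would cut it out.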

The start was already interesting: these algorithms work on grayscale (8-bit) input, often obtained by converting RGB to Y (luminance). After some research I found that the naive (R+G+B)/3 is suboptimal, as the components should be weighted differently... okay, I knew that, but I didn't remember the weights. Anyway, that was my "baseline" implementation.
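
For comparison, the baseline can be sketched as below. The packed 0xBBGGRR pixel layout (R in the low byte) is an assumption borrowed from the shift-based snippet in this post, and the function name is made up for illustration:

```c
#include <stddef.h>

/* Baseline grayscale conversion: plain average of the three channels.
 * Assumed layout: each pixel packed as 0xBBGGRR in an unsigned int. */
static void rgb_to_y_avg(const unsigned int *src, unsigned char *dst, size_t n)
{
        for (size_t i = 0; i < n; i++) {
                const unsigned int r = src[i] & 0xff;
                const unsigned int g = (src[i] >> 8) & 0xff;
                const unsigned int b = (src[i] >> 16) & 0xff;
                /* Integer division truncates; the maximum is 765 / 3 = 255. */
                dst[i] = (unsigned char)((r + g + b) / 3);
        }
}
```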

Then I found that the TIFF spec has some weights, which libjpeg-turbo simplifies to 0.29900 * R + 0.58700 * G + 0.11400 * B. This was my second implementation, exactly that, using the FPU.
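
The FPU variant can be sketched under the same assumed 0xBBGGRR layout, again with a hypothetical function name. The + 0.5f rounding term is an assumption; the post only says the weights were used as they are:

```c
#include <stddef.h>

/* Floating-point conversion with the 0.299 / 0.587 / 0.114 weights.
 * Assumed layout: each pixel packed as 0xBBGGRR in an unsigned int. */
static void rgb_to_y_float(const unsigned int *src, unsigned char *dst, size_t n)
{
        for (size_t i = 0; i < n; i++) {
                const float r = (float)(src[i] & 0xff);
                const float g = (float)((src[i] >> 8) & 0xff);
                const float b = (float)((src[i] >> 16) & 0xff);
                /* Adding 0.5f before the cast rounds to nearest instead of
                 * truncating; the weights sum to 1.0, so white stays 255. */
                dst[i] = (unsigned char)(0.29900f * r + 0.58700f * g +
                                         0.11400f * b + 0.5f);
        }
}
```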

Of course the FPU version would be slow (slower than I expected), and libjpeg already has a fixed-point implementation with 16 bits of precision: simple to implement and with very close results. Although libjpeg-turbo uses a lookup table (storing the products for all 256 values of R, G and B), I dislike allocating memory for these, and different platforms will have different costs: on x86 it will likely fit in the L2 cache and be fast, while on others it won't and the cache misses will be a PITA. So my 3rd version had no lookup table and did the calculation for every pixel: (19595 * R + 38469 * G + 7471 * B) >> 16.
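
The lookup-free fixed-point version can be sketched like this (same assumed 0xBBGGRR layout, hypothetical function name). Every intermediate value fits comfortably in 32 bits, since the largest possible sum is 255 * 65535:

```c
#include <stddef.h>

/* Fixed-point conversion with 16 fractional bits and no lookup table:
 * 19595/65536 ~ 0.299, 38469/65536 ~ 0.587, 7471/65536 ~ 0.114.
 * Assumed layout: each pixel packed as 0xBBGGRR in an unsigned int. */
static void rgb_to_y_fixed(const unsigned int *src, unsigned char *dst, size_t n)
{
        for (size_t i = 0; i < n; i++) {
                const unsigned int r = src[i] & 0xff;
                const unsigned int g = (src[i] >> 8) & 0xff;
                const unsigned int b = (src[i] >> 16) & 0xff;
                /* Three multiplies per pixel, then drop the 16-bit fraction. */
                dst[i] = (unsigned char)((19595 * r + 38469 * g + 7471 * b) >> 16);
        }
}
```

Note that the three constants add up to 65535, so pure white (255, 255, 255) comes out as 254; adding a rounding term of 1 << 15 before the shift maps it to 255.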

Then my experience with this kind of software told me that all those multiplications would be expensive. Mulling it over during my flight, I came up with a series that uses only shifts to approximate the result. It is not as close as the fixed-point implementation above, but good enough for my purposes:
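        #include <stddef.h>

        /* Shift-only RGB -> Y conversion. Each source pixel is packed as
         * 0xBBGGRR (R in the low byte); one 8-bit sample is written per pixel. */
        static void rgb_to_y_shifts(const unsigned int *src, unsigned char *dst, size_t n)
        {
                for (; n > 0; n--, src++, dst++) {
                        const unsigned int color = *src;
                        /* approximate each weight by the nearest power of two:
                         *     0.29900 -> 1/4 = 0.250
                         *     0.58700 -> 1/2 = 0.500
                         *     0.11400 -> 1/8 = 0.125
                         *     1.00000 ->     = 0.875 (sum)
                         * Series to be near 1.0: 0.875 * (1 + 1/8 + 1/64) = 0.998046875
                         * r = 0x0000ff, (((r >>  0) << 16) >> 2) = r << 14
                         * g = 0x00ff00, (((g >>  8) << 16) >> 1) = g <<  7
                         * b = 0xff0000, (((b >> 16) << 16) >> 3) = b >>  3
                         */
                        const unsigned int r = (color & 0x0000ff) << 14;
                        const unsigned int g = (color & 0x00ff00) << 7;
                        const unsigned int b = (color & 0xff0000) >> 3;
                        /* c = (R/4 + G/2 + B/8) in 16.16 fixed point */
                        const unsigned int c = (r + g + b);
                        /* scale by (1 + 1/8 + 1/64) and drop the fraction */
                        *dst = (c + (c >> 3) + (c >> 6)) >> 16;
                }
        }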

But was this last version faster? What is your guess for the fastest version? I can tell you that the FPU version was slower by a 6-7x margin, and that the one I expected to be the fastest wasn't!

Post has attachment
Doing "egg" led me to investigate OpenCV for face detection, as done by fgallery. What can I say? It's not so hard to use, but man, to find a face in a string of pixels I need GObject and XML... not to mention its lack of personality, with all the back and forth between C and C++.

So my software will not use OpenCV, as I dislike this project... Too bad nobody has written a sane alternative (or did I miss it?).

Post has attachment
While I like the web services that pop up on the internet now and then, and use them extensively, for some kinds of data I fear these services will vanish in the future and let me down. We all know this has happened several times in the past: services like Multiply and Orkut were growing and then disappeared.

So I host my own services for some of them, like my blog and photo gallery. In the blog I write some tech articles; the photo gallery is just a backup of photos I want to keep, as I usually post them to Flickr and Facebook as well. In both cases they are only updated by me or family members, so new content is rarely added.

The software I was using was http:/// for the blog and for photos. Both are based on PHP and supported by most web hosts. While they are good for more complex and dynamic use, for me they are a source of constant updates and of attempts to breach server security. The net result is that they cost me more work than they save.

Since I'm moving my servers from shared hosting to a private Amazon AWS instance (thanks to +Osvaldo Santana Neto for the hint about a local data center in São Paulo), I also decided to simplify these services and replace the dynamic stuff with pre-generated static files.

With some effort I managed to generate a simple HTML page for each post of my blog. Then I wrote a Python script that parses them and generates index.html and some JSON files with the archive, tags and categories. With some JavaScript I could add back the dynamic behavior, but this time on the client's machine, which spares me server load and security issues. The result is now much more server- and client-friendly, consumes less bandwidth and is easier to cache on both sides. To add new content I just write a new HTML file and run the script to parse it and update the relevant files, such as indexes, categories, tags, archives and the feed. No login, no PHP, no attempts to breach security, and no brute-force login attempts polluting my httpd logs.
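
The generator described here is a Python script, but to keep one language for the examples in this post, the following is a C sketch of only its JSON-writing step. The struct post record and both function names are hypothetical; the escaping covers what RFC 8259 requires inside strings (quotes, backslashes and control characters):

```c
#include <stdio.h>

/* Hypothetical post record, as it might come out of parsing one HTML file. */
struct post {
        const char *title;
        const char *date;   /* e.g. "2014-03-01" */
        const char *url;
};

/* Write s as a JSON string literal, escaping quotes, backslashes and
 * control characters (< 0x20); everything else is copied byte by byte. */
static void json_string(FILE *f, const char *s)
{
        fputc('"', f);
        for (; *s; s++) {
                unsigned char c = (unsigned char)*s;
                if (c == '"' || c == '\\')
                        fprintf(f, "\\%c", c);
                else if (c < 0x20)
                        fprintf(f, "\\u%04x", (unsigned)c);
                else
                        fputc(c, f);
        }
        fputc('"', f);
}

/* Emit the archive as a compact JSON array of {"title","date","url"}
 * objects, in the order given (a caller would sort newest first). */
static void write_archive_json(FILE *f, const struct post *posts, size_t n)
{
        fputc('[', f);
        for (size_t i = 0; i < n; i++) {
                if (i > 0)
                        fputc(',', f);
                fputs("{\"title\":", f);
                json_string(f, posts[i].title);
                fputs(",\"date\":", f);
                json_string(f, posts[i].date);
                fputs(",\"url\":", f);
                json_string(f, posts[i].url);
                fputc('}', f);
        }
        fputc(']', f);
}
```

The client-side JavaScript then only has to fetch and parse this file to rebuild the archive view.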

But the worst part was the gallery, as it was super slow. Due to its horrible upload system, I already used rsync to upload files to the server and then processed them using "add files from server". Thumbnail generation would often fail, PHP execution would be aborted, and so on. Of course, thumbnailing is usually implemented in the worst possible way, by calling a huge tool like ImageMagick or netpbm for each file and each desired size. These tools are great, but you don't need them for simple tasks such as generating a smaller version of an image.

Besides being slow to add new pictures, the gallery is slow to navigate. The UI feels outdated and you constantly need to load new HTML to go to the next page. Googling showed me that some projects provide JavaScript galleries driven by JSON data, which are completely static on the server and much more responsive for the user. See for one good example.

Then I was left with thumbnail generation and navigation between albums. While there are some tools such as and, they did not provide everything I needed in terms of speed (sigal, fgallery) or multiple albums (fgallery). However, they gave me hope and good ideas (like fgallery's idea of centering cropped photos based on face detection). They were also quite painful to get running on a "stable" server: my AWS instance runs CentOS 6.5, where the newest software may be from 1999 :-P.

One of the largest sources of slowness in these tools is that they are naive: they open the actual image at its full size, scale it down to the desired target size, and then repeat the whole process for every other size of the same image. Even an average programmer should see that repeating an expensive process over and over again (i.e., opening the file, reading headers, decoding) is not good. But if you know graphics and the JPEG standard, you know you don't even need to decode every pixel: JPEG stores the image as 8x8 blocks of DCT coefficients, and a decoder such as libjpeg can reconstruct each block directly at 1/2, 1/4 or 1/8 of its size. Say you want the image scaled to 1:4; then the decoder hands it to you already scaled, saving memory and the CPU cycles needed to decode and scale many more pixels than you need.
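
In libjpeg this is DCT scaling, selected through the scale_num and scale_denom fields of jpeg_decompress_struct before calling jpeg_start_decompress() (classic libjpeg supports 1/1, 1/2, 1/4 and 1/8; libjpeg-turbo also supports other M/8 ratios). Below is a small sketch of choosing the factor, assuming the thumbnail must not end up smaller than want_w x want_h. The helper is hypothetical, and it mirrors libjpeg's rounding of scaled dimensions up to the next integer:

```c
/* Pick the largest scale_denom in {8, 4, 2, 1} (with scale_num = 1) whose
 * decoded size is still at least want_w x want_h, so the decoder produces
 * as few pixels as possible without going below the target. */
static unsigned int pick_scale_denom(unsigned int img_w, unsigned int img_h,
                                     unsigned int want_w, unsigned int want_h)
{
        unsigned int denom = 8;
        for (; denom > 1; denom /= 2) {
                /* libjpeg rounds scaled dimensions up: ceil(dim / denom). */
                unsigned int w = (img_w + denom - 1) / denom;
                unsigned int h = (img_h + denom - 1) / denom;
                if (w >= want_w && h >= want_h)
                        break;
        }
        return denom;   /* 1 means full-size decoding is needed */
}
```

When several sizes of the same photo are needed, calling it once for the largest size and producing the smaller thumbnails from that already-reduced buffer avoids reopening and re-decoding the file for every size.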

Luckily enough, I know +Carsten Haitzler and remembered he wrote some years ago, doing very efficient JPEG thumbnailing. The software was deprecated because its features were incorporated directly into the Evas JPEG image loader, but it still works perfectly, and when paired with libjpeg-turbo it runs even faster. Bonus points because it is very small and depends only on libjpeg, which makes it great for a "stable" server environment like CentOS 6.5.

As I have been doing computer-graphics-related work for the past 11 years, I felt obligated to do something better, so I decided to write new software: egg (efficient gallery generator; it starts with "e" as a tribute to the Enlightenment project), which uses epeg to do its work. I should publish it soon; it will ship as a single binary that generates things the way I need (it should be usable by others too). Later on I plan to add support for PNG/RAW images, as they are also part of my library, and for video thumbnailing (likely using libavcodec).

My goal with egg is to be efficient in every possible way: avoiding useless memory allocations, walking directories efficiently with openat(2)/fstatat(2)/mkdirat(2), telling the kernel about the file and memory usage patterns with posix_fadvise(3)/posix_madvise(3), using CPU vector instructions, and so on.
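
Here is a minimal sketch of the fd-relative directory walking and the posix_fadvise() hint, assuming a POSIX.1-2008 system. walk_jpegs is a hypothetical name, and the sketch only counts *.jpg files instead of generating thumbnails:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count regular files ending in ".jpg" (lowercase only) under the directory
 * open as dirfd, recursing into subdirectories. Every lookup is relative to
 * a directory fd, so no full path strings are built. Returns -1 on error. */
static int walk_jpegs(int dirfd)
{
        int count = 0;
        /* fdopendir() takes ownership of its fd, so hand it a duplicate. */
        int fd = dup(dirfd);
        if (fd < 0)
                return -1;
        DIR *dir = fdopendir(fd);
        if (dir == NULL) {
                close(fd);
                return -1;
        }
        /* The duplicate shares its read offset with dirfd: start over. */
        rewinddir(dir);

        struct dirent *de;
        while ((de = readdir(dir)) != NULL) {
                const char *name = de->d_name;
                if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
                        continue;

                struct stat st;
                if (fstatat(dirfd, name, &st, AT_SYMLINK_NOFOLLOW) < 0)
                        continue;

                if (S_ISDIR(st.st_mode)) {
                        int sub = openat(dirfd, name, O_RDONLY | O_DIRECTORY);
                        if (sub >= 0) {
                                int n = walk_jpegs(sub);
                                if (n > 0)
                                        count += n;
                                close(sub);
                        }
                } else if (S_ISREG(st.st_mode)) {
                        size_t len = strlen(name);
                        if (len < 4 || strcmp(name + len - 4, ".jpg") != 0)
                                continue;
                        int img = openat(dirfd, name, O_RDONLY);
                        if (img < 0)
                                continue;
                        /* Hint that the file will be read once, front to back;
                         * a real generator would decode it here. */
                        (void)posix_fadvise(img, 0, 0, POSIX_FADV_SEQUENTIAL);
                        count++;
                        close(img);
                }
        }
        closedir(dir);
        return count;
}
```

Output directories can be created the same way with mkdirat(outfd, name, 0755), again without building full path strings.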

The output will be only images plus JSON with extra information, which can be converted to something else on the client side (i.e., to use with or on the server side).

Note: googling for epeg, I found there are some bindings for it, so you can use it from your own server infrastructure: Perl and ObjC. It should be easy to add Python or PHP ones.

Since it seems G+ is just for tech stuff, I'll try to post about some things I'm working on that I haven't yet had the motivation to write a blog post about. More posts will follow this one.
