Then my software will not use OpenCV, as I disliked this project... too bad nobody wrote a sane alternative (or did I miss it?)
I'll skip the face detection for now and proceed to generate the JSON representing each album and the hierarchy. Then I'll integrate with galleria.io (JS/CSS); I have a modified theme that works great on desktop and on my iPhone. After that I'll resume working on the fancy bits, such as face detection for better cropping.
The start was already interesting: the algorithms work on grayscale (8-bit) input, often acquired by converting RGB to Y (luminance). After some research I found that the naive (R+G+B)/3 is suboptimal, as the components should be weighted differently... okay, I knew that, but I didn't remember the weights. Anyway, that was my "baseline" implementation.
Then I found that the TIFF spec has some weights, which libjpeg-turbo simplifies to 0.29900 * R + 0.58700 * G + 0.11400 * B. This was my second implementation, just like that, using the FPU.
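To make the first two versions concrete, here is a minimal sketch in C. The function names gray_average and gray_float are mine, for illustration only, and rounding to nearest in the weighted version is my assumption, not necessarily what the original code did.

```c
#include <stdint.h>

/* Baseline: plain average of the three components. */
uint8_t gray_average(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((r + g + b) / 3);
}

/* Weighted luminance on the FPU, using the weights quoted in the text.
 * Adding 0.5f before truncation rounds to nearest (an assumption). */
uint8_t gray_float(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)(0.29900f * r + 0.58700f * g + 0.11400f * b + 0.5f);
}
```

For a pure green pixel the two disagree quite a bit: the average gives 85 while the weighted version gives 150, which is why the weights matter.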
Of course the FPU would be slow (more than I expected), and libjpeg already does a fixed-point implementation with 16 bits of precision, which is simple to implement and gives very close results. Although libjpeg-turbo uses a lookup table (storing all calculations for all 256 values of R, G and B), I dislike allocating memory for these, and the cost will differ between platforms: on x86 the table will likely fit into the L2 cache and be fast, while on others it won't and the cache misses will be a PITA. So my 3rd version was without a lookup table, doing the calculations for every pixel: (19595 * R + 38469 * G + 7471 * B) >> 16.
Then my experience with this kind of software told me that all those multiplications would be expensive. Mulling it over during my flight, I came up with a series that uses only shifts to approximate the result; not as close as the fixed-point implementation above, but good enough for my purposes:
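As a sketch, the fixed-point version could look like the C below. The names are mine; gray_fixed uses exactly the expression above, while gray_fixed_rounded adds half of 2^16 before the shift, a variation I am adding for illustration and not something the text says was used.

```c
#include <stdint.h>

/* Weights from the text, scaled by 2^16. They sum to 65535, not 65536. */
#define W_R 19595u
#define W_G 38469u
#define W_B 7471u

/* Truncating fixed-point luminance, as written in the text. */
uint8_t gray_fixed(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((W_R * r + W_G * g + W_B * b) >> 16);
}

/* Same weights, but rounding to nearest by adding 0.5 in 16.16
 * fixed point (32768) before the shift. Illustrative variation. */
uint8_t gray_fixed_rounded(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((W_R * r + W_G * g + W_B * b + 32768u) >> 16);
}
```

Because the three weights sum to 65535, the truncating version maps pure white to 254; the rounded variation maps it to 255.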
/* approximate each weight to the nearest power-of-two fraction:
 *   0.29900 -> 1/4 = 0.250
 *   0.58700 -> 1/2 = 0.500
 *   0.11400 -> 1/8 = 0.125
 *   sum: 0.99999 -> 0.875
 * Series to bring the sum near 1.0: 0.875 * (1 + 1/8 + 1/64) = 0.998046875
 *
 * Keep 16 fractional bits and fold the channel extraction into the shift:
 *   r = 0x0000ff, (((r >>  0) << 16) >> 2) = r << 14
 *   g = 0x00ff00, (((g >>  8) << 16) >> 1) = g << 7
 *   b = 0xff0000, (((b >> 16) << 16) >> 3) = b >> 3
 */
const unsigned int r = (color & 0x0000ff) << 14;
const unsigned int g = (color & 0x00ff00) << 7;
const unsigned int b = (color & 0xff0000) >> 3;
const unsigned int c = (r + g + b);
*dst = (c + (c >> 3) + (c >> 6)) >> 16;
But was this last version faster? What is your guess for the fastest version? I can tell you the FPU was slower by a 6-7x margin, and that the one I supposed would be the fastest wasn't!
I hoped the shift-only version (the last one) would be faster than the one that multiplies, but it isn't on x86. Maybe it is on ARM, but I have nothing at hand to test it.
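Speed aside, it is easy to check how far the shift-only series strays from the fixed-point version. The sketch below assumes the same pixel layout as the snippet above (red in the low byte); the function names are mine, for illustration only. It compares both conversions over every 24-bit color.

```c
#include <stdint.h>
#include <stdlib.h>

/* Fixed-point version from the text, applied to a packed 0x00BBGGRR pixel. */
uint8_t gray_fixed_packed(uint32_t color)
{
    const uint32_t r = color & 0xff;
    const uint32_t g = (color >> 8) & 0xff;
    const uint32_t b = (color >> 16) & 0xff;
    return (uint8_t)((19595u * r + 38469u * g + 7471u * b) >> 16);
}

/* Shift-only version from the text, same pixel layout. */
uint8_t gray_shift_packed(uint32_t color)
{
    const uint32_t r = (color & 0x0000ff) << 14;
    const uint32_t g = (color & 0x00ff00) << 7;
    const uint32_t b = (color & 0xff0000) >> 3;
    const uint32_t c = r + g + b;
    return (uint8_t)((c + (c >> 3) + (c >> 6)) >> 16);
}

/* Largest absolute difference between the two over all 2^24 colors. */
int max_gray_difference(void)
{
    int worst = 0;
    for (uint32_t color = 0; color <= 0xffffffu; color++) {
        const int d = abs((int)gray_fixed_packed(color) -
                          (int)gray_shift_packed(color));
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

The worst cases sit at saturated colors. Pure blue, for instance, comes out as 36 with shifts versus 29 with the multiplications, so the error is a handful of gray levels out of 256.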
Then I keep my own services for some of those, like my blog and photo gallery. In the blog I write some tech articles, and the photo gallery is just a backup of photos I want to keep, as I usually post them to Flickr and Facebook as well. In both cases they are only updated by me or family members, and thus new content is rarely added.
The software I was using was http://wordpress.org for the blog and http://galleryproject.org/ for photos. They are based on PHP and supported by most web hosting providers. While they are good for more complex and dynamic use, for me they are a source of constant updates and of attempts to breach server security. The net result is that they cause me more work than they save.
Since I'm moving my servers from shared dreamhost.com to a private amazon aws (thanks to for the hint about a local data center in São Paulo), I also decided I'd simplify these services and replace the dynamic stuff with pre-generated static files.
But the worst part was the gallery, as it was super-slow. Due to its horrible upload system I already used rsync to upload files to the server and then processed them using "add files from server". The thumbnail generation would often fail, the PHP execution would be aborted and so on. Of course the thumbnailing is always implemented in the worst possible way, usually by calling a huge tool like ImageMagick or netpbm for each file and each desired size. These tools are great, but for simple tasks such as generating a smaller version of an image you don't need them.
Then I was left with the thumbnail generation and creating navigation between albums. While there are some tools such as http://sigal.saimon.org/ and https://github.com/wavexx/fgallery, they did not provide all I needed in terms of speed (sigal, fgallery) or multiple albums (fgallery). However, they gave me hope and good ideas (like fgallery's idea to center cropped photos based on face detection). They were also quite painful to get running on a "stable" server: my AWS instance is running CentOS 6.5, where the newest software may be from 1999 :-P.
One of the largest sources of slowness in these tools is that they are naive: they open the actual image at its full size, scale it down to the desired target size, then repeat the whole process if there are multiple sizes of a given image. Even for the average programmer it should be clear that repeating an expensive process over and over again (i.e. open file, read headers) is not good. But if you know graphics and the JPEG standard, you should know you don't need to load all of the image's pixels when a reduced version of each macroblock would do. Say you want the image scaled to 1:4: then you can decode each macroblock only up to that size and skip the pixels that region would otherwise represent. It is already scaled for you, saving disk reads, memory and the CPU cycles needed to scale many more pixels than you need.
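One small piece of this can be sketched in C. libjpeg lets the caller request a reduced decode through the scale_num and scale_denom fields of its decompress struct, set after jpeg_read_header() and before jpeg_start_decompress(). Classic libjpeg supports 1/1, 1/2, 1/4 and 1/8. The helper below, whose name and signature are mine for illustration, only picks the denominator; it is not taken from epeg or any other tool mentioned here.

```c
/* Pick the largest denominator among 1, 2, 4 and 8 for which the image
 * decoded at 1/denom is still at least as large as the target in both
 * dimensions. The remaining (small) resize is done afterwards by the
 * caller. Hypothetical helper, not part of libjpeg. */
unsigned int pick_scale_denom(unsigned int src_w, unsigned int src_h,
                              unsigned int dst_w, unsigned int dst_h)
{
    unsigned int denom = 1;

    while (denom < 8 &&
           src_w / (denom * 2) >= dst_w &&
           src_h / (denom * 2) >= dst_h)
        denom *= 2;

    return denom;
}
```

For a 4000x3000 photo and a 160x120 thumbnail this picks 8, so the decoder hands back roughly 500x375 pixels instead of 12 million, and the final resize has very little left to do.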
Luckily enough I know and remembered he wrote http://svn.enlightenment.org/svn/e/OLD/epeg/ some years ago doing very efficient jpeg thumbnailing. The software was deprecated as the features were incorporated directly into Evas jpeg image loader, but it still works perfectly and if matched with libjpeg-turbo will run even faster. Bonus point because it is very small and only depends on libjpeg, being great to be used in a "stable" server environment like CentOS 6.5.
As I have been doing computer-graphics-related work for the past 11 years, I felt obligated to do something better, so I decided to create new software: egg (efficient gallery generator; it starts with "e" as a tribute to the Enlightenment project), which uses epeg to do its work. I should publish it soon; it will ship as a single binary that generates things the way I need (but it should be usable by others). Later on I plan to include support for PNG and raw images, as they are also part of my library, and video thumbnailing (likely using libavcodec).
My goal with egg is to be efficient in every possible way, like avoiding useless memory allocations, using efficient directory walking such as openat(2)/fstatat(2)/mkdirat(2), instructing the kernel on the usage pattern of file and memory with posix_fadvise(3)/posix_madvise(3), using CPU vector instructions and so on.
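To give an idea of what that looks like in practice, here is a minimal sketch in C of the *at() style of directory access plus a read-pattern hint. It is not egg's code: the function names are mine, for illustration only, and it assumes a Linux-like system where posix_fadvise() is available.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count the regular files directly inside the directory `name`, resolved
 * relative to `parent_fd` (use AT_FDCWD for the current directory).
 * Entries are stat'ed relative to the open directory fd, so no path
 * strings are built or re-resolved. Returns -1 on error. */
long count_regular_files_at(int parent_fd, const char *name)
{
    struct dirent *ent;
    long count = 0;
    DIR *dir;
    int fd;

    fd = openat(parent_fd, name, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    dir = fdopendir(fd);
    if (dir == NULL) {
        close(fd);
        return -1;
    }

    while ((ent = readdir(dir)) != NULL) {
        struct stat st;

        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        if (fstatat(fd, ent->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            continue;
        if (S_ISREG(st.st_mode))
            count++;
    }

    closedir(dir); /* also closes fd */
    return count;
}

/* Open a file relative to `parent_fd` and tell the kernel it will be read
 * sequentially. The hint is advisory, so its failure is ignored.
 * Returns the file descriptor, or -1 on error. */
int open_sequential_at(int parent_fd, const char *name)
{
    const int fd = openat(parent_fd, name, O_RDONLY);

    if (fd < 0)
        return -1;
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    return fd;
}
```

Calling count_regular_files_at(AT_FDCWD, ".") gives the number of regular files in the current directory. A real walker would recurse into subdirectories by passing the directory's own fd as parent_fd.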
The output will be only images and JSON with extra information, which can be converted to something else on the client side (e.g. for use with http://galleria.io) or on the server side.
Note: googling for epeg I found there are some bindings for it, so you can use it from your own server infrastructure: Perl (https://github.com/tokuhirom/image-epeg), ObjC (http://lists.apple.com/archives/cocoa-dev/2004/Jan/msg00955.html). It should be easy to add Python or PHP.