While I like the web services that pop up on the internet now and then, and use them extensively, for some kinds of data I fear these services vanishing in the future and letting me down. We all know this has happened several times in the past: things like Multiply and Orkut were growing and then disappeared.
So I keep my own services for some of those, like my blog and photo gallery. In the blog I write some tech articles; the photo gallery is just a backup of photos I want to keep, as I usually post them to Flickr and Facebook as well. In both cases they are only updated by me or family members, and thus new content is rarely added.
The software I was using was http://wordpress.org/ for the blog and http://galleryproject.org/ for the photos. They are based on PHP and supported by most web hosting. While they are good for more complex and dynamic use cases, for me they are a source of constant updates and attempts to breach server security. The net result is that they cause me more work than they save.
Since I'm moving my servers from shared dreamhost.com to a private Amazon AWS instance (thanks to +Osvaldo Santana Neto for the hint about a local data center in São Paulo), I also decided to simplify these services and replace the dynamic stuff with pre-generated static files.
The blog is now much more server- and client-friendly: it consumes less bandwidth and is easier to cache on both sides. To add new content I just write a new HTML file and run a script to parse it and update the relevant files such as indexes, categories, tags, archives and the feed. No login, no PHP, no security breach attempts or brute-force login attempts polluting my httpd logs.
But the worst part was the gallery, as it was super slow. Due to its horrible upload system I already used rsync to upload files to the server and then process them using "add files from server". The thumbnail generation would often fail, the PHP execution would be aborted, and so on. Of course thumbnailing is always implemented in the worst possible way, usually by calling a huge tool like ImageMagick or netpbm once for each file and desired size. These tools are great, but for a simple task such as generating a smaller version of an image you don't need them.
Then I was left with the thumbnail generation and creating navigation between albums. While there are some tools for this, such as sigal (http://sigal.saimon.org/) and fgallery, they did not provide all I needed in terms of speed (sigal, fgallery) or multiple albums (fgallery). However, they gave me hope and good ideas (like fgallery's idea of centering cropped photos based on face detection). They were also quite painful to get running on a "stable" server: my AWS instance runs CentOS 6.5, where the newest software may be from 1999 :-P.
One of the largest sources of slowness in these tools is that they are naive: they open the actual image at its full size, scale it down to the desired target size, and then repeat the whole process for each additional size of the same image. Even to the average programmer it should be clear that repeating an expensive process over and over (ie: open file, read headers, decode every pixel) is not good. But if you know graphics and the JPEG standard, you also know you don't need to decode the full-resolution pixels if a nearby macroblock-aligned scale would do. Say you want the image scaled to 1:4; you can then decode at that reduced macroblock granularity and skip the pixels that region would represent -- the image comes out already scaled for you, saving disk reads, memory and the CPU cycles needed to scale many more pixels than you need.
Luckily enough I know +Carsten Haitzler and remembered that he wrote epeg (http://svn.enlightenment.org/svn/e/OLD/epeg/) some years ago to do very efficient JPEG thumbnailing. The software was deprecated as its features were incorporated directly into the Evas JPEG image loader, but it still works perfectly, and if paired with libjpeg-turbo it runs even faster. Bonus points because it is very small and only depends on libjpeg, which makes it great for a "stable" server environment like CentOS 6.5.
As I have been doing computer graphics related work for the past 11 years, I felt obligated to do something better, so I decided to create a new piece of software: egg (efficient gallery generator, which also starts with "e" as a tribute to the Enlightenment project), using epeg to do its work. I should publish it soon, but it will ship as a single binary that generates things the way I need (though it should be usable by others). Later on I plan to add support for PNG/RAW images, as they are also part of my library, and video thumbnailing (likely using libavcodec).
My goal with egg is to be efficient in every possible way: avoiding useless memory allocations, walking directories efficiently with openat(2)/fstatat(2)/mkdirat(2), instructing the kernel about the usage pattern of files and memory with posix_fadvise(2)/posix_madvise(3), using CPU vector instructions and so on.
The output will be only images and JSON with extra information, which can be converted into something else at the client side (ie: to use with http://galleria.io) or the server side.
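Galleria, for instance, accepts a plain JSON array as its dataSource, with image/thumb/title/description keys per its documented format, so a per-album file could map onto it with something like this (the actual egg schema is not fixed yet; these field names are Galleria's, and the paths are made up):

```json
[
  {
    "image": "photos/img_0001.jpg",
    "thumb": "thumbs/img_0001.jpg",
    "title": "Sunset",
    "description": "First evening in São Paulo"
  },
  {
    "image": "photos/img_0002.jpg",
    "thumb": "thumbs/img_0002.jpg",
    "title": "Beach"
  }
]
```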
Note: googling for epeg I found there are some bindings for it, so you can use it from your own server infrastructure: Perl (https://github.com/tokuhirom/image-epeg), ObjC (http://lists.apple.com/archives/cocoa-dev/2004/Jan/msg00955.html). It should be easy to add Python or PHP bindings.