First the bounding polygon (10K points!) is drawn in red, then the BKD tree intersection takes over, recursively visiting all previously indexed cells (shown in gray), testing each point in the cell to see if it's inside the polygon. The cells were created during indexing by recursively partitioning space, alternating latitude then longitude, until the leaf cell has between 512 and 1024 points. Areas with a high point density result in very small cells.
It's interesting to me how many cells wind up "lopsided" as long slivers instead of being closer to squares; I didn't expect this, and it shows how important it is to visualize the things you work on. Or maybe it's just a bug!
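To make the cell construction concrete, here is a minimal Python sketch of the recursive partitioning described above. It is not Lucene's BKD code: the function name `build_cells`, the median split, and the random test points are all assumptions for illustration. Only the alternation between latitude and longitude and the 512 to 1024 leaf size come from the description.

```python
import random

MAX_LEAF = 1024  # upper bound on points per leaf cell, as described in the post


def build_cells(points, depth=0):
    """Recursively split (lat, lon) points into leaf cells.

    Hypothetical helper: splits at the median, alternating latitude
    (even depth) and longitude (odd depth), and returns a list of leaves.
    """
    if len(points) <= MAX_LEAF:
        return [points]
    axis = depth % 2  # 0 = latitude, 1 = longitude
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    # A cell is only split when it holds more than 1024 points,
    # so each half holds at least 512.
    return build_cells(ordered[:mid], depth + 1) + build_cells(ordered[mid:], depth + 1)


# Made-up points roughly around London, purely for demonstration.
random.seed(0)
points = [(random.uniform(51.3, 51.7), random.uniform(-0.5, 0.3)) for _ in range(10_000)]
cells = build_cells(points)
sizes = [len(cell) for cell in cells]
```

Because each split puts the same number of points on both sides, dense areas are cut many more times than sparse ones, which is why they end up with very small cells.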
The animation also makes one limitation clear: the search recursion currently visits every cell that overlaps the polygon's enclosing bounding box. This is clearly wasteful, since you can see it visiting cells that lie outside the polygon but inside its bounding box. To fix this, we need a fast way to check whether the shape overlaps an arbitrary axis-aligned rectangle.
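One possible shape for such a check, sketched in Python for a simple polygon given as a list of (x, y) vertices. This is an assumption about how it could be done, not what Lucene does, and all function names are hypothetical. It is also linear in the number of polygon vertices, so it illustrates correctness rather than the "fast" part. The rectangle and polygon overlap if a polygon vertex lies in the rectangle, a rectangle corner lies in the polygon, or any pair of edges cross.

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test: count polygon edges crossed by a ray going right from (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def _orient(a, b, c):
    """Signed area of triangle a, b, c (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1, p2, q1, q2):
    """True if the two segments properly cross (touching endpoints are ignored)."""
    return (_orient(q1, q2, p1) * _orient(q1, q2, p2) < 0
            and _orient(p1, p2, q1) * _orient(p1, p2, q2) < 0)


def rect_overlaps_polygon(min_x, min_y, max_x, max_y, poly):
    # 1. Any polygon vertex inside the rectangle?
    if any(min_x <= x <= max_x and min_y <= y <= max_y for x, y in poly):
        return True
    # 2. Any rectangle corner inside the polygon?
    corners = [(min_x, min_y), (max_x, min_y), (max_x, max_y), (min_x, max_y)]
    if any(point_in_polygon(cx, cy, poly) for cx, cy in corners):
        return True
    # 3. Any polygon edge crossing any rectangle edge?
    for i in range(len(poly)):
        a, b = poly[i], poly[(i + 1) % len(poly)]
        for j in range(4):
            if segments_cross(a, b, corners[j], corners[(j + 1) % 4]):
                return True
    return False


# A triangle whose bounding box is (0, 0) to (10, 10).
triangle = [(0, 0), (10, 0), (0, 10)]
# This cell is inside the bounding box but outside the triangle,
# so the recursion could skip it.
skippable = rect_overlaps_polygon(6, 6, 9, 9, triangle)
visited = rect_overlaps_polygon(1, 1, 2, 2, triangle)
```

With a check like this, the recursion can stop descending into any cell for which the result is false, instead of visiting everything inside the bounding box.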
The BKD approach differs from other space-partitioning structures like quad trees (http://en.wikipedia.org/wiki/Quadtree) and geohash (http://en.wikipedia.org/wiki/Geohash) because it's data-driven: it draws its dividing lines based on the data set, rather than drawing fixed lines in space regardless of what you are indexing. This makes indexing a bit more costly, but search is then very fast: ~5.7X faster than #Lucene's geohash implementation for various bounding-box searches around London.
It can only index points, which should be the common case for spatial search with #Lucene .
Many thanks to http://openstreetmap.org for providing the base map image, the bounding polygon for London, and the full database of points and relations (I indexed a subset of ~60 million points for this animation).
Over time we will fix lots of other queries/filters to break themselves into cheap and expensive parts too. E.g. a distance filter can be a cheap bounding box or polygon check, plus an expensive per-hit distance calculation.
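As a rough illustration of the distance-filter example, here is a Python sketch of the cheap-then-expensive split. It is not Lucene's API: the function names, the haversine formula as the "expensive" step, and the sample coordinates are all assumptions, and the bounding box math assumes the circle stays away from the poles and the antimeridian.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (the expensive per-hit check)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bounding_box(lat, lon, radius_m):
    """Box that fully encloses the circle (assumes no pole or antimeridian crossing)."""
    d = radius_m / EARTH_RADIUS_M  # angular radius in radians
    dlat = math.degrees(d)
    dlon = math.degrees(math.asin(math.sin(d) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def within_distance(points, center_lat, center_lon, radius_m):
    min_lat, max_lat, min_lon, max_lon = bounding_box(center_lat, center_lon, radius_m)
    hits = []
    for lat, lon in points:
        # Cheap phase: reject anything outside the bounding box.
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            continue
        # Expensive phase: exact distance, only for box survivors.
        if haversine_m(center_lat, center_lon, lat, lon) <= radius_m:
            hits.append((lat, lon))
    return hits


center = (51.5074, -0.1278)            # central London
corner = (51.5474, -0.0628)            # inside the box, outside the circle
candidates = [center, (51.52, -0.10), corner, (48.8566, 2.3522)]
hits = within_distance(candidates, center[0], center[1], 5_000)
```

The far-away point is rejected by the box alone; only the point near the corner of the box needs the exact distance calculation to be ruled out.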
Exciting times for #Lucene .