Thomas Egense
I have a math blog and I am generally interested in all kinds of scientific matters. I use mathematics to create art – you can check out my album. I try to find unique content for my posts, which are mostly of a scientific nature.

Thomas's posts

Post has attachment
SolrWayback - another Wayback Machine implementation

If you are involved with web archiving, my SolrWayback Machine may interest you. Fresh from my blog post at work:

#wayback #waybackmachine #resaw #warc #webarchive

Post has attachment
Automated improvement of search in low quality OCR using Word2Vec

The Word2Vec algorithm is used to detect OCR errors, with custom code on top of that to make search work in spite of poor-quality OCR text.

Abstract for my paper to be presented at Digital Humanities in the Nordic Countries 2nd Conference 2017.

Post has shared content
Need improved solution to "Moving Sofa" problem.

For some reason this makes me think of Dirk Gently's Holistic Detective Agency...

Romik's ambidextrous sofa

The ambidextrous moving sofa problem is to find the planar shape with the biggest area that can slide through right-angled turns both to the right and to the left in a hallway of width 1.  

Earlier this year Dan Romik, a mathematician at the University of California Davis, found the best known solution to this problem!   He created this animated gif of it, too.  His shape is bounded by 18 curves, each of which is either part of a circle, or part of a curve described by a polynomial equation of degree 6.   

Nobody has proved his solution is optimal.   We're not even sure that it's locally optimal, meaning that you can't make slight changes in his shape that increase the area and get a shape that still fits down the hallway.  This is an interesting challenge.

For more, including the precise area of this shape, try my blog article on Visual Insight:

I hope you're all having a great holiday!

Each year I try to think of things I can stop doing... so I can do more new stuff.   In 2017, I will try to take a year-long break from posting articles on Visual Insight.  I've been doing two a month for quite a while, I've done 81 of them, and I'm running out of enthusiasm.  Also, right now, a lot of my energy is going into the Azimuth Backup Project.  So, maybe I will save up ideas and restart Visual Insight in 2018.  But perhaps I'll end with a bang on January 1st, 2017.


Post has shared content
More bad news about Arctic sea ice melting
(and generally bad news about the environmental degradation of our planet)

Today I give thanks for my childhood.  I grew up on a planet where global warming had just begun — a place your children will never know.  

It was a beautiful planet.  It seems like a long time ago. This was before the drought killed 100 million trees in California, a third of all  trees in the state.   New Orleans had not yet drowned under flood waters.  The Great Barrier Reef off the coast of Australia was still healthy, not yet bleached by the raging heat.

But the biggest difference was near the North Pole.   Back when I started college in 1979, the volume of Arctic sea ice in summer was 4 times what it is now!

Last winter was especially shocking.  In February, the climate scientist Peter Gleick wrote:

What is happening in the Arctic now is unprecedented and possibly catastrophic.

The extent of Arctic sea ice had shrunk to record lows, while the temperature hit new record highs for winter.   In December 2015, parts of the North Pole were covered with a lake!

A unique event?  No: this year again scientists are shocked!   Here's what I read today on

Freakishly high temperatures in the Arctic driven by heat-packed oceans and northward winds have been reinforced by a "vicious circle" of climate change, scientists said Thursday.

Air above the Polar ice cap has been 9-12 degrees Celsius (16.2 to 21.6 degrees Fahrenheit) above average during the last four weeks, according to data from the Danish Meteorological Institute (DMI), which tracks hourly changes in Arctic weather.

And during several days last week, temperatures above the North Pole were a balmy zero degrees Celsius (32 degrees Fahrenheit), a full 20 C (36 F) above the levels typical for mid-November, said Martin Stendel, a DMI climate researcher based in Copenhagen.

"This is by far the highest recorded" in the era of satellite data, starting in 1979, he told AFP.  "What we are observing is very unusual."

At this time of year, open Arctic ocean exposed by sea ice melted away in summer should be freezing again, with thousands of square kilometres icing over every day.  But that has not been happening, at least not at the same pace, said Stendel.

"Not only was the ice not growing as it would normally, there was further melting due to warm air coming in," he explained by phone.

The US National Snow and Ice Data Center reported that sea ice extent in October was the lowest on record, some 6.4 million square kilometres (2.5 million square miles). Ice cover at the top of the globe shrank to its smallest area in 2016 — some 4.14 million sq km (1.6 million sq miles) — on September 16.

Several factors have caused the Arctic to overheat since late October, say scientists.  The most immediate are warm winds sweeping up from western Europe and off the west coast of Africa.

"The winds carrying this heat is a temporary — and fairly unprecedented — weather phenomenon," said Valerie Masson Delmotte, a scientist at the Climate and Environment Sciences Laboratory in Paris.  Only since Thursday have they abated.

A second contributor is the record-strong Pacific Ocean El Nino that tapered off earlier this year — after pumping a couple tenths of a degree of added warming into the atmosphere.

But reinforcing these periodic, if powerful, drivers is the biggest one of all: global warming, experts agreed.

Two days ago, I read this on LiveScience:

The Arctic Is a Seriously Weird Place Right Now

The sun set on the North Pole more than a month ago, not to rise again until spring. Usually that serves as a cue for sea ice to spread its frozen tentacles across the Arctic Ocean. But in the depths of the polar night, a strange thing started to happen in mid-October. Sea ice growth slowed to a crawl and even started shrinking for a bit.

Intense warmth in both the air and oceans is driving the mini-meltdown at a time when Arctic sea ice should be rapidly growing. This follows last winter, when temperatures saw a huge December spike.

Even in an age where climate change is making outliers — lowest maximum sea ice extent set two years in a row, the hottest year on record set three years in a row, global coral bleaching entering a third year — the norm, what's happening in the Arctic right now stands out for just how outlandish it is.

"I've never seen anything like it this last year and half," Mark Serreze, director of the National Snow and Ice Data Center, said.

The latest twist in the Arctic sea ice saga began in mid-October. Temperatures stayed stuck in their September range, pausing sea ice growth. By the end of the month, the Arctic was missing a chunk of ice the size of the eastern U.S.

The oddness continued into November. A large area of the Arctic saw temperatures as much as 36°F above normal, further slowing Arctic sea ice growth and even turning it around for a few days. In other words, it was so warm in the Arctic that despite the lack of sunlight, sea ice actually disappeared.

"The ridiculously warm temperatures in the Arctic during October and November this year are off the charts over our 68 years of measurements," Jennifer Francis, a climate scientist at Rutgers University who studies the Arctic, said.

Compounding the warm air is warm water. Sea surface temperatures on the edge of the ice are also running well above normal in many places, further inhibiting sea ice growth.

Things will keep getting stranger — freakishly violent storms in the east and southeast US, droughts and fires in the west, and so on.

I'm thankful I grew up on a different planet.  I remember it fondly, and it makes me want to save what we have now.

Here's the article:

Here's the LiveScience article:

Both of these were mentioned on +Azimuth by +rasha kamel  so make sure to add +Azimuth to your G+ feed — it'll help you keep informed.

This is Peter Gleick's tweet last February, with a graph:

Here's the video showing the Arctic sea ice minimum volume each year:

Post has shared content
Google AI experiments

Want to try to understand how deep learning algorithms work? This is the best explanation I have seen so far. And you get to play with some of the demos.

Visit A.I. Experiments to explore machine learning technology in hands-on ways. Play with pictures, drawings, music, and more, and get resources for creating your own experiments. #aiexperiments

Post has attachment
Prototype demo for OCR postfix in Danish Newspapers

In the Danish Newspaper Archive you can search 25 million newspaper pages and view them. The search engine uses OCR (optical character recognition) text from the scanned pages, but the software that reads the text from the scanned images often makes errors. As a result, the search engine misses matching words. Many of our newspapers are old, the quality of the scans/microfilms is poor, and the OCR software has problems with old font types, so bad OCR constitutes a substantial problem.

One way to find these OCR errors is to use the Word2Vec algorithm, which I have written about before. The algorithm detects words that appear in similar contexts. So for a corpus with perfect spelling, the algorithm will detect similar words, synonyms, conjugations, declensions etc. But in the case of a corpus with OCR errors, the Word2Vec algorithm will also find the misspellings of a given word, caused either by bad OCR or in some cases by journalists. All misspellings of a given word appear in exactly the same contexts as the word itself. For this to work, the Word2Vec algorithm requires a huge corpus, and for the newspapers we had 140 GB of raw text. So this is probably also the largest Word2Vec index ever built on a Danish corpus.

Given the list of words returned by Word2Vec, we then use a Danish dictionary to remove entries that are the same word in a different form, since these are not OCR errors. For the remaining words, you simply check whether each one is close enough to the original word, comparing characters, to be identified as a misspelling.
Example: Let's say you use Word2Vec to find words for 'banana' and it returns: hanana, bananas, apple, orange.
You remove 'bananas' using the (English) dictionary since it is not an OCR error. Of the three remaining words, only 'hanana' is close to 'banana', and it is thus the only misspelling of 'banana' found in this example. Remember that the Word2Vec algorithm does not care how the words are spelled or misspelled; it only uses the semantic context of the words.
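The banana example can be sketched in a few lines of Python. This is only an illustration, not the code running on the newspaper corpus: the neighbour list and the table of word forms are hard-coded stand-ins for the Word2Vec output and the dictionary, and the threshold `max_distance=1` is an assumed value for "close enough".

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def find_misspellings(word, neighbours, word_forms, max_distance=1):
    """Drop known forms of `word`, then keep neighbours spelled close to it."""
    known_forms = word_forms.get(word, set())
    remaining = [n for n in neighbours if n not in known_forms]
    return [n for n in remaining if edit_distance(word, n) <= max_distance]


# Hypothetical stand-ins for the Word2Vec result and the dictionary lookup.
neighbours = ["hanana", "bananas", "apple", "orange"]
word_forms = {"banana": {"banana", "bananas"}}

print(find_misspellings("banana", neighbours, word_forms))  # ['hanana']
```

A larger `max_distance` would catch more OCR errors but would also let through more false positives, which is the trade-off discussed below.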

You can play with the Word2Vec index on the Danish Newspapers here: (remember to select the newspaper corpus)

And this page shows how the dictionary is used to find misspellings:
(change the last words in the url - Danish only sorry...)

Running this algorithm on 1000 random words takes 8 hours (using 20 CPUs) and fixes 84 million OCR errors. A very few of these are false positives rather than OCR errors, but they are rare compared to the true OCR errors. The last step, finding as many OCR errors as possible while minimizing false positives, is still in progress... :)

The newspaper Archive:

Post has attachment
The State and University Library Aarhus goes live with a new Labs page


The page is a showcase for interesting or fun stuff we make at work and want to show to the public. The applications are all based on the huge amount of data we have available in our many collections.

The first 3 applications are

1) Smurf: Search in over 23M newspaper pages from 1770 to now and show word statistics. This can be used as a very fast visualization of trends over time. The newspapers are in Danish, but some words like "titanic" will of course still be searchable and show the tragic accident in 1912 and the 1998 movie by that name.
You can click on the graph to see the articles, but unless you have a login you can only see newspapers from before 1916.

2) Zoom: 1 million newspaper pages in a single zoomable image. The total size of the 1M pages is 20 terapixels.

3) word2vec: Word2Vec is an unsupervised machine learning algorithm that maps words to vectors in such a way that some of the semantics of the language is preserved in the vector representation. The corpus is 65,000 free Gutenberg e-books, so the network is well trained.

Half the books are in English, but many other languages are also in the same dictionary. Use it to find similar words or analogies. The algorithm only looks at where a word is placed in the context of the words around it and has no understanding of the words.
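To give a feel for how "similar words" and "analogies" fall out of the vector representation, here is a minimal Python sketch. The two-dimensional vectors are invented by hand for the illustration (a real Word2Vec model learns vectors with hundreds of dimensions from the corpus), and the function names are hypothetical, not part of the Labs application.

```python
import math


def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def most_similar(target, vectors, exclude=()):
    """Return the word whose vector is closest in direction to `target`."""
    candidates = [w for w in vectors if w not in exclude]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' using the vector b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    return most_similar(target, vectors, exclude={a, b, c})


# Made-up toy vectors, NOT taken from a trained model.
toy_vectors = {
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man":   [0.1, 1.0],
    "woman": [0.1, -1.0],
    "apple": [-1.0, 0.2],
}

print(analogy("man", "king", "woman", toy_vectors))  # queen
```

Nothing in the code knows what a king or a queen is; the answer comes purely from the geometry of the vectors, which in a real model comes purely from the word contexts.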

#bigdata #word2vec #machinelearning

+Toke Eskildsen +Michael Poltorak Nielsen +Jørn Thøgersen +Dorete Larsen

Post has attachment
Size of the Universe - beautiful animation

Post has attachment
My latest project uses natural language processing (NLP) to read Lord of the Rings and plot the characters by 'similarity' in 2D.

The algorithm (using word2vec) knows nothing about the English language; it just looks at how words are 'grouped' together and does statistics/neural network training. I then define a distance measure between the words and fit them into a 2D plot.

I also fed the algorithm 35,000 English books from Project Gutenberg and made a plot of different animals.

It is very hard to decide how much of the similarity between the words the 2D plot captures. But I believe the plots are much better than random ones.
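One possible way to do the fitting, and to put a number on how well a plot captures the distances, is sketched below. It is not necessarily the method used for the plots: it assumes a plain gradient descent on the 'stress' (the squared difference between plotted and wanted distances), and the distance table for the four animals is invented for the illustration.

```python
import math
import random


def stress(points, dist):
    """Sum of squared differences between plotted and wanted distances."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += (math.dist(points[i], points[j]) - dist[i][j]) ** 2
    return total


def fit_2d(dist, steps=2000, rate=0.01, seed=0):
    """Start from a random 2D layout and nudge points to reduce the stress."""
    rng = random.Random(seed)
    n = len(dist)
    points = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(steps):
        for i in range(n):
            grad_x = grad_y = 0.0
            for j in range(n):
                if i == j:
                    continue
                dx = points[i][0] - points[j][0]
                dy = points[i][1] - points[j][1]
                d = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                # Gradient of (d - wanted)^2 with respect to point i.
                g = 2 * (d - dist[i][j]) / d
                grad_x += g * dx
                grad_y += g * dy
            points[i][0] -= rate * grad_x
            points[i][1] -= rate * grad_y
    return points


# Invented distances between four words, only for the demonstration.
words = ["cat", "dog", "horse", "shark"]
dist = [
    [0.0, 0.3, 0.7, 1.0],
    [0.3, 0.0, 0.6, 1.0],
    [0.7, 0.6, 0.0, 1.1],
    [1.0, 1.0, 1.1, 0.0],
]

random_layout = fit_2d(dist, steps=0)  # steps=0 returns the random start
fitted_layout = fit_2d(dist)
print("stress of random plot:", stress(random_layout, dist))
print("stress of fitted plot:", stress(fitted_layout, dist))
```

Comparing the two stress values gives a rough measure of how much better the fitted plot is than a random one; a stress of zero would mean the 2D plot reproduces every distance exactly.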

Post has attachment
Nice interactive animation of the space-filling Hilbert Curve
A loose explanation of the Hilbert Curve is that it is a 1-dimensional curve that 'fills' the 2-dimensional plane.

It has several practical applications as well. One of them is to quickly come up with a good approximate solution to the traveling salesman problem, which has not been solved in general (and probably never will be). You overlay a Hilbert curve that is dense enough for your purpose on the plane containing the points, and then visit the points in the same order as the curve would visit them.
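The traveling salesman trick can be sketched as follows. The index calculation is the standard bit-by-bit conversion from grid coordinates to a position along the Hilbert curve; the function names, the assumption that all points lie in the unit square, and the default `order=8` (a 256 x 256 grid) are choices made for this illustration.

```python
def hilbert_index(order, x, y):
    """Position along the Hilbert curve of cell (x, y) in a 2^order grid."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_tour(points, order=8):
    """Sort points (assumed to lie in the unit square) along the curve."""
    n = 1 << order

    def key(p):
        # Map each coordinate to a grid cell, clamping 1.0 into the last cell.
        cell_x = min(int(p[0] * n), n - 1)
        cell_y = min(int(p[1] * n), n - 1)
        return hilbert_index(order, cell_x, cell_y)

    return sorted(points, key=key)


cities = [(0.9, 0.1), (0.1, 0.1), (0.9, 0.9), (0.1, 0.9)]
print(hilbert_tour(cities))
# [(0.1, 0.1), (0.1, 0.9), (0.9, 0.9), (0.9, 0.1)]
```

The whole 'solution' is a single sort, so it is very fast even for huge point sets. The price is that the tour is only approximate: two points that are close in the plane can still end up far apart along the curve.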