Thomas Egense
Works at Statsbiblioteket
Attended Århus Universitet
Lives in Aarhus
Thomas Egense

Improving search results in The Danish Internet Archive
A small update on one of the projects we are currently working on at The State and University Library.

The State and University Library has been harvesting the Danish internet since 2005, just like the Internet Archive (https://archive.org). We have also built a search engine on top of our 500 TB of harvested Arc/Warc files. We have 4 billion documents in the index now, but we have only indexed about 20% of the raw data so far. Using the search engine we can easily search for material, whereas in the Internet Archive you have to know the URL if you want to see the historic webpage.

The index contains only the text and metadata for the harvested pages; the binaries are stored in the Arc/Warc files. But we have also indexed the offsets into the Arc/Warc files for the documents, so after the search result has been returned we can enrich it with the binaries loaded from the Arc/Warc files. And this is what I did, making a simple mimetype servlet that returns the binaries (images, PDF, HTML, JS etc.).
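
As a rough illustration of the offset lookup described above (this is not the servlet's actual code): assuming the records sit in a gzip-compressed Arc/Warc file where every record is its own gzip member, which is a common layout for such files, a stored offset is enough to seek straight to one record and decompress only that record. The function name `read_record_at` and its parameters are made up for this sketch.

```python
import zlib


def read_record_at(path, offset, chunk_size=64 * 1024):
    """Return the decompressed bytes of the single gzip member starting at `offset`.

    Assumption: the file is a concatenation of gzip members, one per record,
    and `offset` is the byte position where one member begins.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    parts = []
    with open(path, "rb") as f:
        f.seek(offset)
        # Stop as soon as this member ends; later records are never decompressed.
        while not decompressor.eof:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            parts.append(decompressor.decompress(chunk))
    return b"".join(parts)
```

The returned bytes would still contain the record headers in front of the payload, so a real servlet has to parse those before it can send the image or PDF with the right mimetype.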

This makes things a lot faster for the internet researchers, who get the images included directly in the search result, or a download button to download a PDF etc. Before, they had to load the webpage in the 'Wayback' engine.

Our Isilon storage, which holds the 500 TB of Arc/Warc data, has no problem returning the binaries almost instantly, even though the webpage loads 20 images simultaneously through the servlet and every image is several MB. The servlet resizes them for the search result, but the download will of course be the original content.

Disclaimer: The screenshot is just my proof-of-concept frontend.
For the real front-end we are aiming for "shine" (https://github.com/netarchivesuite/shine), which is also used by the British Library.
A colleague from The State and University Library is working on it.

I have mentioned this Danish Internet Archive project before, here etc. :

And for more technical details how we are improving the response time of the searches, especially for the facets you can read some of the blogs by ​.

This small mimetype servlet, like all our internet archive projects, is open source on GitHub: https://github.com/netarchivesuite/webarchivemimetypeservlet

Thomas Egense

The Erdős discrepancy problem has been solved by Terence Tao

The proof has not been formally accepted as correct yet, but given the reputation of Terence Tao and his previous results, I am quite certain it is correct.

Paul Erdős himself discovered the mathematical talent of Terence Tao and took him under his wing when Tao was only 10 years old, so it must be very satisfying for Terence to settle another of his master's problems. Cute picture of Erdős and Tao (https://qph.is.quoracdn.net/main-qimg-8da548bb26a961f36528be452d5e1165?convert_to_webp=true)

The conjecture involves arithmetic progressions, and back in 2008 Terence proved another somewhat similar conjecture about arithmetic progressions in the prime numbers.

My explanation of the Erdős discrepancy problem: given any infinite sequence of -1 and +1, and any bound you choose, it is possible to find an arithmetic progression of positions whose values sum to more than that bound in absolute value. I.e. it is impossible to arrange the sequence of -1 and +1 in such a way as to avoid "long streaks" of mostly -1 or mostly +1 along an arithmetic progression.

Edit: the arithmetic progression must be of the form a*x and not a*x+b, or the conjecture would follow trivially from Van der Waerden's theorem.
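
To make the statement concrete, here is a small sketch that computes the discrepancy of a finite piece of a sequence: the largest absolute value of f(d) + f(2d) + ... + f(nd) over all step sizes d and lengths n that fit. The function name `discrepancy` is just my choice for this illustration; the theorem itself is about infinite sequences, where this number grows without bound.

```python
def discrepancy(signs):
    """Largest |f(d) + f(2d) + ... + f(nd)| over all d, n with n*d <= len(signs).

    signs[k - 1] holds f(k), which must be +1 or -1.
    """
    length = len(signs)
    best = 0
    for d in range(1, length + 1):
        partial = 0
        # Walk the positions d, 2d, 3d, ... and track the running sum.
        for position in range(d, length + 1, d):
            partial += signs[position - 1]
            best = max(best, abs(partial))
    return best


# A strictly alternating sequence looks balanced for d = 1,
# but every even position holds -1, so d = 2 gives one long streak.
alternating = [1 if k % 2 else -1 for k in range(1, 13)]
print(discrepancy(alternating))

# A carefully chosen sequence of length 11 keeps every such sum within -1..1.
careful = [1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1]
print(discrepancy(careful))
```

The alternating example shows why only looking at the whole sequence (d = 1) is not enough: the streak hides in the multiples of 2.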

An exciting preprint has just appeared on the arXiv: Terence Tao has solved the Erdos discrepancy problem, a famous open problem of Erdos that was also the subject of Polymath5, the fifth polymath project. At some point soon I plan to write a short blog post about the role of polymath in this instance. But I'll wait until Terry has blogged about the mathematical aspects, which I'm sure he'll do soon. (Until then, there is the very nice introduction to his paper.) The basic point I want to make is that although Polymath5 didn't solve the problem, it did lead to a rapid improvement in understanding of the problem (to which Terry contributed heavily). For that reason I have always considered it a success, but after Terry's breakthrough that argument is easier to make.
Abstract: We show that for any sequence $f: {\bf N} \to \{-1,+1\}$ taking values in $\{-1,+1\}$, the discrepancy $$\sup_{n,d \in {\bf N}} \left|\sum_{j=1}^n f(jd)\right|$$ of $f$ is infinite. This answers a question of Erd\H{o}s. In fact the argument also applies to sequences $f$ taking values ...
Thomas Egense

How to win (get a small advantage) in a game that seems impossible to exploit.

The Game:

1)
Your opponent picks two different numbers, both > 0 and not restricted to the integers. The two numbers are written on two pieces of paper and placed face down. That is, ANY numbers he decides on, which could be 42 or e^pi or a googol etc.

2)
You are allowed to pick one of the numbers at random, turn it face up and see the number.

3)
You can now, with probability strictly greater than 50%, say whether the unknown number is higher or lower than the revealed number.

How??? See the Alex Bellos video below.

A very clever opponent can reduce your advantage to as close to 50% as he likes but never reach 50%; you will always have a small advantage.

Computer simulations show you have over a 60% chance of success if the opponent is very bad at picking two 'random' numbers, as humans tend to be.
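
For those who want to try a simulation themselves, here is a minimal sketch. It assumes the well-known random-threshold strategy: draw your own random positive number T from a distribution that can land anywhere above 0, then guess "the hidden number is higher" if the revealed number is below T and "lower" otherwise. Whenever T happens to fall between the two numbers you are guaranteed to be right; otherwise you are right half the time. The exponential distribution, the `scale` parameter and the function names are my own choices for the illustration.

```python
import random


def play_round(a, b, rng, scale=50.0):
    """Play one round against the fixed pair (a, b); return True on a correct guess."""
    # Turn one of the two papers face up at random.
    revealed, hidden = (a, b) if rng.random() < 0.5 else (b, a)
    # Random threshold: any positive value has a chance of being drawn.
    threshold = rng.expovariate(1.0 / scale)
    guess_hidden_is_higher = revealed < threshold
    return guess_hidden_is_higher == (hidden > revealed)


def win_rate(a, b, rounds=100_000, seed=1):
    """Fraction of rounds won against the pair (a, b)."""
    rng = random.Random(seed)
    wins = sum(play_round(a, b, rng) for _ in range(rounds))
    return wins / rounds


print(win_rate(10, 90))      # numbers far apart: a large advantage
print(win_rate(10, 10.001))  # numbers very close: barely above a coin flip
```

The further apart the two numbers are relative to where the threshold tends to land, the bigger the edge, which is why an opponent who picks "human" numbers loses so often.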

Here's my latest Numberphile vid, about a guessing game I write about in Alex Through the Looking Glass/The Grapes of Math. Great to see that so many people have run simulations that prove it works!
Thomas Egense

And now we have a result for cubes similar to Lagrange's four-square theorem

I notice that seven cubes is a little more than I would have expected by extrapolating from the four squares, and there are even a few numbers where eight cubes are needed.

This new theorem should be named Siksek's eight-cubes theorem.

While this theorem may have no practical use, I am still satisfied to know the truth.
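
The small cases are easy to check by brute force. The sketch below (function names are mine, chosen for the illustration) computes the fewest positive cubes needed for every number up to a limit and lists the numbers that need more than seven, which can then be compared with the exceptions listed in the abstract. It says nothing about large numbers, of course; that is what the actual proof is for.

```python
def min_cubes_table(limit):
    """table[n] = fewest positive cubes that sum to n, for 0 <= n <= limit."""
    cubes = []
    k = 1
    while k ** 3 <= limit:
        cubes.append(k ** 3)
        k += 1
    table = [0] * (limit + 1)
    for n in range(1, limit + 1):
        # Try every cube as the last term and reuse the smaller answers.
        table[n] = 1 + min(table[n - c] for c in cubes if c <= n)
    return table


def needs_more_than_seven(limit):
    """All n in 1..limit that cannot be written with seven or fewer positive cubes."""
    table = min_cubes_table(limit)
    return [n for n in range(1, limit + 1) if table[n] > 7]


print(needs_more_than_seven(2000))
```

Looking at `min_cubes_table(2000)[23]` and `[239]` is also instructive, since those two are the hardest of the small numbers.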

Number theory is famous for having a lot of easily stated but hopelessly difficult open problems (e.g. Goldbach's conjecture, the twin prime conjecture, etc.). But the last few decades have seen a remarkable amount of progress on many of these, some of which have been open for centuries. Here is another example that just hit arXiv today: every integer greater than 454 is a sum of at most 7 positive cubes.
Abstract: A long-standing conjecture states that every positive integer other than 15, 22, 23, 50, 114, 167, 175, 186, 212, 231, 238, 239, 303, 364, 420, 428, 454 is a sum of at most seven positive cubes. This was first observed by Dase and Jacobi in 1851 on the basis of extensive ...
Thomas Egense

Game of Thrones season five premieres TODAY

And if you want a small recap of what happened in the previous four seasons, these gorillas have a very short version of the plot :)

Thomas Egense

The Danish Newspaper Online Archive is now online

After 200+ years of collecting all Danish newspapers and stacking them in warehouses or copying them to microfilm, they can now be accessed online for free! At The State and University Library in Aarhus we have been working on this project for years and are very happy to open it to the public today.

We have built a powerful search engine on top of all the data, so it is easy to find text matches down to a specific page with highlighting, and you can download the newspaper as a PDF. For historians this is a gold mine, and the general public will be able to search for family members or old articles they want to read again.

Limitations:
So far we have only 1 million newspaper pages in the index, but this will increase to over 32 million pages over the next year.

You can only search in newspapers that are older than 100 years, unless you search from the computers within the State and University Library, where there are no limitations.

The OCR (optical character recognition) is not perfect, and this is most evident in the oldest newspapers, but we have several ideas for how to improve this over time.

Since we just went live today there is a heavy load on the site - so be gentle :)

Thomas Egense

Robert Langlands  - the mathematician behind the 'Langlands Program'.

The 'Langlands program' can be compared to the hunt for a "Unified Theory" in physics. The program tries to connect branches of mathematics that were previously believed to be totally unrelated, and some of the connections found are still a mystery.

If you want a much deeper understanding of this program I can recommend reading "Love and Math" by Edward Frenkel. But the book requires some mathematical knowledge to understand the concepts in more detail. I did not understand everything, since it requires knowledge of so many different branches of mathematics. Still, the book is the best way to gain a deeper insight into the frontiers of mathematical research.
Canadian Robert Langlands is 'like a modern-day Einstein,' who has devoted his life to the limits of pure mathematics

Thomas Egense

More on The Erdős discrepancy problem

In my last post I wrote about the recent proof of this 80-year-old conjecture by Terence Tao. This new article explains the history behind the conjecture and gives a simple visualization of the problem with snakes and a precipice.

I can also recommend the video by ​

And most important of all: I now have a new math puzzle in my arsenal, namely the 11-step version from the video. This 11-step version should be very easy to explain to a child of ten or older, and it can be solved with a little skill and some patience.
Thomas Egense

This 'rocket' was far better than I had expected...
Amazing sounds as well.

Rocket science ...
Thomas Egense

For #caturday, here is my Siamese (Anubis) finding a new use for a tree I cut down: as a look-out point.
Thomas Egense

Help teach science to students, save the planet and get an octopus!

The awesome team behind TWDK (http://www.thingswedontknow.com/team.php) has started a fundraising campaign to cover some of the basic costs involved in running the site, such as research, artwork, writing expenses etc.

I was contacted by the TWDK team and asked if I wanted to help with this charity campaign, and since it involves both teaching science and environmental awareness, I instantly agreed.

Six of my best fractals, all resembling marine creatures, can be picked as rewards when donating more than £150. I am selling the fractals very close to my production and shipping cost, so donors will get some value for their money. The total price for supporting the project and getting a fractal is still less than my regular price for the same fractal.

The campaign is scheduled to run for 60 days.

Thomas Egense

*Five podcasts from the BBC about the most important programming languages*

The 15-minute duration of each podcast is perfectly timed with the driving time to my workplace...

Programming Languages

As part of the BBC's Make it Digital Season, Aleks Krotoski presents a brief history of some of the most famous high-level programming languages.  Each of these easily digested programmes is only fifteen minutes long and is available online as a stream and as a podcast or MP3 file.

Aleks Krotoski explores the history of programming languages. The history of computing is dominated by the hardware; the race for speed and power has overshadowed how we've devised ways to instruct these machines to do useful tasks.

Listen here (15 min streams):

These programmes should be available worldwide without restriction. They are easiest to play on a computer (Flash) although they will work on iOS with a few extra clicks and on Android after the BBC media player http://goo.gl/oHuhfM is installed.

Fortran: http://goo.gl/AYbDwV
Cobol: http://goo.gl/JSz51R
Basic: http://goo.gl/V4uWEP
Java: http://goo.gl/BJQwla
The Tower of Babel: http://goo.gl/eo0jL2

Podcast and MP3s: http://goo.gl/qCgm70

BBC Make it Digital: http://goo.gl/3am6qs

