Dan Piponi
2,657 followers -
Homo Sapiens, Hominini, Hominidae, Primates, Mammalia, Chordata, Animalia
Homo Sapiens, Hominini, Hominidae, Primates, Mammalia, Chordata, Animalia

2,657 followers
Dan's interests
View all
Dan's posts
Post has attachment
Throw a differentiable manifold in the bath and turn on the water. As the bath fills, from time to time the surface of the water will be tangent to the surface of the manifold. The points where this happens are called critical points. There's a critical point the moment the water level hits the bottom of the manifold and there's one just as it covers the top. There are also critical points at the beginning and end of each hole in the manifold. So (modulo some jiggling [5]) the numbers of critical points provide constraints on things like the number of connected components and the number of holes of the manifold. This is the beginning of what are known as the Morse inequalities [1].

Back in 1982 Ed Witten published a classic paper showing how Morse theory is related to supersymmetric quantum field theory, rederiving the inequalities using physics. This is an example what Witten has a history of doing: reinterpreting some mathematics as a physical theory and then "proving" results with a physical argument. Sometimes the things he proves are old results (like the Morse inequalities) and sometimes he shows things that mathematicians haven't proved yet.

Anyway, I found this paper interesting: The Graph Laplacian and Morse Inequalities [3]. It simplifies Witten's setup by considering graphs rather than differentiable manifolds. In this setting the path integrals of quantum field theory become sums over walks on the graph. The result is a version of the Morse inequalities derived using graph laplacians and it turns the complexity of supersymmetric quantum field theory into something you can study with a bit of linear algebra on a computer. Along the way I learnt about the existence of "discrete Morse theory" [4].

[1] https://en.wikipedia.org/wiki/Morse_theory#Morse_inequalities
[2] https://projecteuclid.org/euclid.jdg/1214437492
[3] https://arxiv.org/abs/1704.08354
[4] https://en.wikipedia.org/wiki/Discrete_Morse_theory
[5] https://en.wikipedia.org/wiki/Sard%27s_theoremo﻿
Post has attachment
The definition of a Markov chain uses a useful notion that I find hard to express in English. It's the idea that P(X(n)|X(n-1), X(n-2), ..., X(0)) = P(X(n)|X(n-1)). Ie., knowing X(n-2), X(n-3) and so on might tell you about X(n), but once you know X(n-1), none of that other stuff is of any use to you at all. It comes up in ordinary everyday reasoning but I don't know what to call it.

Some examples:

1. I want my computer program to run as fast as possible. There's a step I can do using either function A or function B. I measure the speed of A and the speed of B and I know A is faster. Now someone comes along and provides me with profiling tools that can tell me exactly what's going on inside functions A and B. It's all very fancy. But in actual fact it's no use to me at all. After all, I've already measured the speed of A and B. Knowing precisely which instructions contributed what to the total time doesn't get me any further.

2. A friend of a friend is an aircraft mechanic. He refuses to fly because he's seen all kinds of errors in his daily work that could have turned into disasters. But in actual fact, once you know how often aircraft disasters happen, knowing the intimate details of aircraft maintenance doesn't make you any more informed about how safe it is to fly.

3. Someone performs a study of how aggressive video games make people. They find certain effects on the behaviour of game players. They also perform fMRI studies and find that after playing video games, parts of the brain associated with aggression "lit up" and use that to support the case that video games cause violence. But once you know people behave with a certain level of aggression, the fMRI study tells you nothing more about this. The fMRI result is, at best, a proxy for actually violent behaviour - but that's only useful if you don't actually know what the behaviour was.

4. A pharmaceutical product Y has been studied in detail and the risks are now well documented. A group of anti-Y campaigners now advertise that compound X is used in the production of product Y. X is known to have certain risks. But once you know the risks associated with product Y, you've already measured the risk including the risk from X. Noting additionally that X itself carries risks is double counting.

In each case, knowing the extra detail may teach you something useful that can be used in other situations. But it's of no use in answering the main point I raise in each example. And in each case the detail would be useful if you didn't already have the number you really need - like the actual risk of flying.

I'd lump the information from the biopsied cells in this NY Times article in the same category: https://www.nytimes.com/2017/03/23/well/move/the-best-exercise-for-aging-muscles.html?_r=0 Knowing how many genes have changed their activity level is almost certainly a very weak proxy for measuring how best to exercise. (And anyway, disease states change gene activity too.) But it may be good science for other applications.

I wish there was a name for the thing I'm pointing out. And the notion, I think, makes sense, even if you disagree with the examples.
﻿
Post has attachment
There is a bit of folklore which says that in an imperative language you can always eliminate GOTOs from your code and replace them with structured WHILE or DO loops.

I was wondering about compiling BASIC programs and got interested in the problem of rearranging the GOTOs to use structured loops. There's an algebraic gadget for analysing this sort of thing called a "Kleene Algebra with Tests" [2]. So I thought I'd have a go at deriving an algorithm for the restructuring using it.

I quickly found you end up with lots of code duplication. But worse, I quickly found examples where I couldn't do the elimination.

I'd rediscovered that the "Bohm Jacopini" theorem is false :-) [4] (Actually I rerediscovered it as I've been in a similar place before for a different reason...)

The correct statement of the folklore is in a paper "On Folk Theorems" [1]. You can eliminate all GOTOs if you're allowed to introduce new flag variables. In fact, this is often how GPUs eliminate branching [5].

If you want to eliminate GOTOs without introducing new variables you not only need WHILE (or DO, or FOR) loops, but you also need some kind of BREAK instruction. But that's still not enough. You need a multi-level BREAK N instruction than can get you out of N levels of loop. Or, as suggested in [6], you use returns from procedures.

From time to time I find myself introducing a new flag variable in imperative code to ensure a portion of a loop isn't executed. It always looks clunky. But I guess now I know there's a theorem to justify it I can feel less guilty.

By the way, check out link [3] for a fantastic example of why it's hard to compile BASIC and why interpreted languages can be very different from compiled ones.

[1] http://www.marcusramos.com.br/univasf/pc-2008-2/p379-harel.pdf
[2] https://www.cs.cornell.edu/~kozen/papers/kat.pdf
[3] https://www.riscosopen.org/forum/forums/11/topics/3890#posts-49640
[4] https://www.cs.cornell.edu/~kozen/papers/BohmJacopini.pdf
[6] http://stackoverflow.com/questions/982595/how-to-break-out-of-2-loops-without-a-flag-variable-in-c

﻿
Post has attachment
I don't like the way the concept of entropy (in the sense of information theory) is taught. I'll explain by analogy.

Area is the unique property of geometric figures that satisfies a bunch of "axioms" like "is additive", "is invariant under Euclidean motions", "if you scale something by a its area scales by a^2" and so on. But nobody teaches area like this. The earliest lesson on area I can remember involved placing my hand on graph paper, drawing the shape of my hand, and then counting the number of squares in the shape. What's more, that lesson still corresponds well to how I use area today and it has direct descendants in the form of the Riemann and Lebesgue integrals.

Entropy often gets taught using some set of axioms including additivity (in some sense). They give some good intuitions, but I feel like at bottom there's something missing. In particular, you don't get the formula for entropy emerging of its own accord. Everything seems a little contrived.

To me, the thing that brings the most clarity is known as the "Type Theorem". It's in many (but not all) books on information theory and I provide a link below [1]. I think it's the appropriate analogue of counting squares for relative entropy. It starts with a natural question: if I draw n independent samples from some probability distribution on a finite set, and count how many times each outcome comes up, what is the probability distribution on the set of possible counts? Ie. what is the distribution of possible empirical distributions?

The Type Theorem answers this question. It says that to a good approximation the probability of any particular empirical distribution is a simple expression in the relative entropy between the empirical and real distributions (modulo some details you can read about). To me this makes everything clear. The form of the expression for entropy just pops out. And numerous applications of entropy outside of communications follow from this result. I think it should come before anything else in a course teaching entropy.

[1] https://blogs.princeton.edu/sas/2013/10/10/lecture-3-sanovs-theorem/﻿
Post has attachment
I eventually figured out how to exactly compute, and implement in Haskell, the "functional" logarithm and exponential of (some) formal power series. So, for example, we can get the formal power series for arcsin as exp(-log(sin)) where log and exp here are the functional versions.

In fact, exp here is just the exponential map from differential geometry, and log is its inverse.

I wrote it up here: http://blog.sigfpe.com/2017/02/logarithms-and-exponentials-of-functions.html﻿
Post has attachment
A little followup to my previous post on functional square roots.

If we have functional powers that suggests the idea of functional logarithms and exponentials too. For example, if f is a function, we'd expect something like log(f^m) = m log f.

It's tempting to use the formal power series for log. We could interpret log (1+f) as f-f^2/2+f^3/3-... where 1 here is the identity function and ^ is functional power. But this fails. It took me a little while to figure out why.

Formal power series form a ring under addition and multiplication. For example, if f, g and h are power series, we have f*(g+h) = f*g+f*h, where * is power series multiplication. It looks like we might also have a ring structure with with power series composition operator ○ as the multiplication. But we don't have distributivity. The formal power series of log(1+x) only satisfies the property log(f^m) = m log f because of distributivity. So if we want to compute a logarithm of a function we'll have to use a more direct definition of log that doesn't assume distributivity.

Here's an approach: define log (f) = lim (n->∞) n (f^(1/n) - 1). We can compute f^(1/n) by iterating the square root operator I previously defined. The catch is that it's problematic to compute this exactly because the coefficients of the resulting power series are all limits. So here's what I do: I compute a rational approximation to the limit with n = 2^256 (nice and big!) and then find rational numbers with small denominators that approximate the resulting coefficients. This makes the assumption that the exact answer has relatively small denominators. I put the code at [1].

When I run the code for f(x)=x+x^2, out comes a sequence of rationals. Sadly I can't find the sequence on OEIS. Nonetheless, a web search gets me to [2] and I see...oh...I'm just rediscovering some classical mathematics :-)

Anyway, you'd hope for a property like log f = -log(inverse f). Amazingly the code demonstrates that this seems to hold for the case log (x -> x+x^2) = - log (x -> (-1+√(1+4x))/2).

[1] https://gist.github.com/dpiponi/db72d898dea78e95579998b8c01befe0
[2] http://math.stackexchange.com/questions/208996/half-iterate-of-x2c﻿
Post has attachment
Let me try to distract you briefly from current affairs with a square root of the sin function, i.e. a function f such that f(f(x)) = sin(x).

I wrote some Haskell code to compute the formal power series of such a function [2]. (In fact, it computes the unique functional square root of any formal power series starting x+...)

The coefficients grow large. Nonetheless, you can truncate the series and try evaluating the resulting polynomial. In the graph I've plotted:

In blue: the degree 38 truncation of the series. Let's call it h.
In green: the functional square of the truncation, i.e. x->h(h(x)).
In red: the sin function.

In most of the range [0,π/2] the approximation isn't bad, but you can see it dive bombs soon after.

There's a trick to squeeze more out of this approximation though. Clearly sin^(1/2)(x) = sin^(-1)(sin^(1/2)(sin(x))). What's more, sin(x) is closer to zero than x. So if we replace sin^(1/2) with h in the middle of the sandwich we should get better results. In [1] Thomas Cartright speculates that we can repeat this trick ad infinitum, i.e. that sin^(1/2)(x) = limit as n -> ∞ of sin^(-n)(h(sin^n(x))) where h is given by a truncated power series.

Also see the answers at [3]. To be honest, my reading comprehension skills aren't very good. The question asks if the series converges but I can't figure out what the answer is despite there being a lot of writing there.

You can tweak the code to compute things like the square root of the logistic map x -> x(1-x) to find out how many rabbits there are after any dyadic rational number of years - assuming you can truncate the resulting series appropriately.

[1] http://www.physics.miami.edu/~curtright/TheRootsOfSin.pdf
[2] https://gist.github.com/dpiponi/aa7b713da7c822c04193095397b70bb4
[3] http://mathoverflow.net/questions/45608/does-the-formal-power-series-solution-to-ffx-sin-x-converge﻿
Post has attachment
http://www.sciencedirect.com/science/article/pii/S2095254616000211

This paper has an interesting name, but it seems completely bogus to me.

I'm no fan of p-values, but the criticism in this paper seems entirely invalid. It points out that if we take some sample data and then create a new set of sample data by replicating it n times we typically get a new sample that shows more statistical significance. By replicating enough times we can get p as low as we like.

I think the author is claiming that replicating n times is the same kind of thing as collecting a sample that is n times bigger, so we can make our data significant for any hypothesis simply by collecting enough data. But this is a fundamental error. The statistics of an n times replicated sample are entirely different from the statistics of n times bigger samples.

Am I misunderstanding this paper?﻿
Post has shared content
Some personal stuff about the good ol' days...

I've talked with people a couple of times recently about how we did anything with computers before Google. I've assembled a complete answer to what, today, might have been replaced by a single Google search.

As a kid I was fascinated by the internals of programming languages. But there was no way I could possibly afford a book on compiler writing. Until university (or later, I didn't do CS at uni), almost everything I knew on the subject came from three articles, two in magazines, one in a book. Amazingly, all three are now online.

1. "Adventure II" in Practical Computing magazine. http://www.nostalgia8.org/download/adv-creators/paw/kenreed.html This introduced to me the idea of a domain specific language. A few years later I wrote my own adventure game compiler for the BBC Micro. (Article also at http://vintagecomputers.site90.net/mags/praccomp/1980/80aug_Adventure_II.pdf)

2. An article on "Pascal in Forth" from the short-lived magazine Soft. http://tangentstorm.github.io/winfield-pascal-83.html This taught me the idea of embedding a language inside another. I think everyone should learn Forth! BTW I had no access to Forth apart from an implementation I wrote in BASIC.

3. A chapter on writing a compiler in BASIC(!) using the BBC Micro's inline assembler in a book called "Practical Programs for the BBC Computer..." http://www.bighole.nl/pub/mirror/homepage.ntlworld.com/kryten_droid/Acorn/Atom/pp/pp_compiler.htm It was a stroke of genius for Acorn to embed an assembler in their BASIC and the author of that compiler did a fantastic job of getting the most out of BBC Basic. Using the 6502 emulator I wrote for my Atari 2600 emulator I was able to get that compiler running again a couple of days ago (https://github.com/dpiponi/Bine).

I probably bought that book at one of the computer fairs at Olympia in London. These events were a major source of information for me.

I also spent a little time disassembling the BBC Basic ROM. Of course I had to write my own disassembler to do that. [1]

That was pretty much the sum total of knowledge I managed to accumulate on the subject from age 13 to 18. Five years of randomly acquired scraps to assemble a library that's a minuscule fraction of what I could get today in one second.

When I was writing the expression language for ILM's fire simulation system (embedded in Python, compiled to CUDA) that last article was there at the back of my mind.

[1] A book was published that was a complete commented disassembly of the ROM. Years later I knew the author, who was doing neural net research where I did my PhD, but I didn't know he wrote that book (or that the book existed) until last week.﻿
Post has attachment
Some personal stuff about the good ol' days...

I've talked with people a couple of times recently about how we did anything with computers before Google. I've assembled a complete answer to what, today, might have been replaced by a single Google search.

As a kid I was fascinated by the internals of programming languages. But there was no way I could possibly afford a book on compiler writing. Until university (or later, I didn't do CS at uni), almost everything I knew on the subject came from three articles, two in magazines, one in a book. Amazingly, all three are now online.

1. "Adventure II" in Practical Computing magazine. http://www.nostalgia8.org/download/adv-creators/paw/kenreed.html This introduced to me the idea of a domain specific language. A few years later I wrote my own adventure game compiler for the BBC Micro. (Article also at http://vintagecomputers.site90.net/mags/praccomp/1980/80aug_Adventure_II.pdf)

2. An article on "Pascal in Forth" from the short-lived magazine Soft. http://tangentstorm.github.io/winfield-pascal-83.html This taught me the idea of embedding a language inside another. I think everyone should learn Forth! BTW I had no access to Forth apart from an implementation I wrote in BASIC.

3. A chapter on writing a compiler in BASIC(!) using the BBC Micro's inline assembler in a book called "Practical Programs for the BBC Computer..." http://www.bighole.nl/pub/mirror/homepage.ntlworld.com/kryten_droid/Acorn/Atom/pp/pp_compiler.htm It was a stroke of genius for Acorn to embed an assembler in their BASIC and the author of that compiler did a fantastic job of getting the most out of BBC Basic. Using the 6502 emulator I wrote for my Atari 2600 emulator I was able to get that compiler running again a couple of days ago (https://github.com/dpiponi/Bine).

I probably bought that book at one of the computer fairs at Olympia in London. These events were a major source of information for me.

I also spent a little time disassembling the BBC Basic ROM. Of course I had to write my own disassembler to do that. [1]

That was pretty much the sum total of knowledge I managed to accumulate on the subject from age 13 to 18. Five years of randomly acquired scraps to assemble a library that's a minuscule fraction of what I could get today in one second.

When I was writing the expression language for ILM's fire simulation system (embedded in Python, compiled to CUDA) that last article was there at the back of my mind.

[1] A book was published that was a complete commented disassembly of the ROM. Years later I knew the author, who was doing neural net research where I did my PhD, but I didn't know he wrote that book (or that the book existed) until last week.﻿