Profile

Dan Piponi
Worked at Google
Attended King's College London

Stream

Dan Piponi

Shared publicly
How do you beam Internet connectivity hundreds of kilometers between balloons while they travel the stratospheric winds?

For many people, the closest Internet access point can be hundreds or even thousands of kilometers away. To extend connectivity to these areas, Project Loon needs to transmit a signal from Internet connection points on the ground, beam it across multiple balloons in the stratosphere, and then send that signal back down to users. This is particularly challenging given that all the while, each balloon in the chain is constantly in motion sailing the stratospheric winds. In this video, Baris Erkmen, Project Loon’s Technical Lead, shows us how the team has created a platform that allows for consistent high-speed data transmission across balloons traveling 20 kilometers above the Earth’s surface.

Dan Piponi

Shared publicly
I thought I'd play with the Tabby's star light curves. There was a bit that looked vaguely sinusoidal so I tried fitting a sine wave to it. My code recovers the 0.88 day period that is believed to be the rotation period of the star. You can see the peaks and troughs of the light curve (red dots) line up with the sine wave (blue dots).

It uses a really crappy incremental nonlinear least squares approach. I'm sure something in the Fourier domain would work better but I've never tried fitting a sinusoid this way so I thought I'd see what goes wrong. It's very sensitive to initial conditions because, as you can imagine, sinusoids give rise to a lot of local minima. So I creep up on the fit by slowly expanding the size of the data set. Someone publishing a paper might call this a homotopy based approach to give the illusion they were doing something principled.

I spent more time extracting the data than writing the fitter. So I've shared the data I used in a convenient .npz file. See (what I hope are public) files:

https://dl.dropboxusercontent.com/u/828035/Tabby/tabby.npz
https://dl.dropboxusercontent.com/u/828035/Tabby/plot.py
https://dl.dropboxusercontent.com/u/828035/Tabby/period.py

The .npz file contains time (in days), flux, and the error range in the flux measurement.

I got the data via
http://archive.stsci.edu/kepler/data_search/search.php?action=Search&ktc_kepler_id=8462852

I think I extracted the data correctly as plots of various subregions look like published graphs I've seen. Try running plot.py and zooming.

I've stored the data as lists of arrays reflecting the way the observations come in distinct contiguous regions. I also removed a load of NaNs which I guess correspond to missing data.

If you reuse tabby.npz for yourself, don't look for long term trends. There's instrument drift over time. I think I'm OK sharing the data. It's labelled as 'public' on one of the web pages I went through.
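In case anyone wants to reproduce the fit, here's a rough sketch of the "creep up on it" approach using scipy's curve_fit rather than my hand-rolled solver. The key names 'time' and 'flux' are assumptions about how the arrays are stored in tabby.npz; check npz.files for the real ones.

```python
# Rough sketch of the incremental ("homotopy") sine fit described above,
# using scipy rather than a hand-rolled least-squares solver. The key
# names 'time' and 'flux' are assumptions -- check npz.files for the
# real names used in tabby.npz.
import numpy as np
from scipy.optimize import curve_fit

def sine(t, amp, period, phase, offset):
    return amp * np.sin(2 * np.pi * t / period + phase) + offset

npz = np.load("tabby.npz", allow_pickle=True)  # lists of arrays, one per contiguous region
t = np.concatenate(npz["time"])
f = np.concatenate(npz["flux"])
f = f / np.median(f)                           # crude normalisation

# Fit a small window first, then keep enlarging the window, reusing the
# previous fit as the initial guess so we stay near the same local minimum.
params = [0.001, 0.9, 0.0, 1.0]                # amplitude, period (days), phase, offset
for n in np.linspace(200, len(t), 20, dtype=int):
    params, _ = curve_fit(sine, t[:n], f[:n], p0=params)

print("fitted period (days):", params[1])
```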
Hi - this is Geert from Kepler. Indeed the data is in the public domain, and PDCSAP_FLUX is the correct field to use!

In brief, the PDCSAP_FLUX values have been pipeline-processed to remove instrumental noise by fitting and subtracting a set of significant basis vectors that describe the lightcurves of stars that fell on the same CCD chip (using the assumption that the lightcurves of stars really ought to be independent).

This approach works well for removing short-timescale instrumental noise that would otherwise confuse the detection of planets, but it is not designed to take out long-term variations, and neither does it put data from different "quarters" on the same absolute flux scale. The latter is challenging because the spacecraft rotated every quarter to keep its solar panels pointed correctly towards the sun, causing a star to fall on 4 different CCD chips each year. (And every CCD chip has a unique sensitivity etc.)

A common approach to take out the long-term trends is to divide the flux values by their median, using the assumption that the median is a good estimator for the baseline flux of the star. (And indeed I see you did this in plot.py.) This assumption is somewhat violated in Quarter 8 however, where a significant but unknown fraction of the flux is affected by the events surrounding the day-792 dip. To create the lightcurve graphs seen in the original paper on Tabby's star, I suspect the authors removed the dips from the lightcurve before computing the median denominator used in the normalization. I suspect doing this would get rid of the funny "one side of the dip is at a different height to the other" effect you describe?

(The Best Thing™ to do here would be to treat the normalization constants as nuisance parameters in a physical model for the object.)

Let us know if you have further questions!
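For what it's worth, here's a minimal sketch of the per-segment median normalisation described in this comment, masking the deep dips before taking the median. The 'flux' key name and the 0.99 dip threshold are assumptions made for illustration, not anything from the Kepler pipeline.

```python
# Sketch of per-segment normalisation: divide each contiguous chunk of
# flux by the median of its non-dip points. The 'flux' key name and the
# 0.99 dip threshold are assumptions made for illustration.
import numpy as np

npz = np.load("tabby.npz", allow_pickle=True)
normalised = []
for f in npz["flux"]:
    rough = f / np.median(f)
    baseline = np.median(f[rough > 0.99])   # median excluding the deep dips
    normalised.append(f / baseline)
```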

Dan Piponi

Shared publicly
Following some earlier G+ posts on units and dimensions I had some ideas about doing linear algebra with "dimensioned" matrices. But there were too many equations for G+ so I wrote it on my blog instead.

Turns out I rediscovered an idea that one author has called "multidimensional analysis": http://www.georgehart.com/research/multanal.html
Introduction. Programming languages and libraries for numerical work tend not to place a lot of emphasis on the types of their data. For example Matlab, R, Octave, Fortran, and Numpy (but not the now defunct Fortress) all tend to treat their data as plain numbers meaning that any time you have a ...
There are various ways one could try to describe this kind of reasoning in terms of more familiar things. One is to think in terms of the linear algebra of vector spaces graded by the free group on whatever units are relevant. Of course it's natural to generalize to vector spaces graded by any group whatsoever. "Dimensioned matrices" come from the internal hom here.

Dan Piponi

Shared publicly
A plugboard (more specifically an n-plugboard) is a board with n holes in it, each of which can accept the end of a wire. Pick one representative plugboard for each n and call it P(n). We can make a vector space over k, say, (which I’ll call PB) whose basis elements are {P(0), P(1), P(2), P(3), …}.

We can define a product ⊗ on this vector space and make it a commutative algebra. Given two plugboards we can stick them together and then plug in some wires with one end in each of the original parts. Each wire used results in one less hole in each of the two parts. For example, after joining an m-plugboard to an n-plugboard and plugging in k wires we have, in effect, an (m+n-k)-plugboard left over. Define the product of P(m) and P(n) in PB as the sum a(i)P(i) where a(i) is the number of ways you can get an i-plugboard by adding wires from a P(m) to a P(n) joined to it. We don’t care about the order in which the wires are added, which way round the wires are or anything else. One wire is the same as any other. We’re only counting the choices of holes they connect. Extend this multiplication to all of PB by linearity. P(0)=1, the unit.

For example P(2)⊗P(3) = P(5)+6P(3)+6P(1). There is one way of adding no wires from a P(2) to a P(3). There are 3+3 ways to add one wire going from a P(2) to a P(3) leaving a P(3). And there are 3x2 ways to add two wires going from a P(2) to a P(3) leaving a P(1). More generally, a ⊗-product of n plugboards enumerates the ways you can hook up any number of wires with each wire going between two different plugboards.

The Enigma machine from World War II had a plugboard, pictured above [4], with 26 holes in it and 10 wires. The number of ways those wires could be plugged in is given by the coefficient of P(6) in P(1)⊗P(1)⊗…26 factors in total…⊗P(1).

So here’s a neat theorem I’m not going to prove: there is an isomorphism ϕ from (PB,+,⊗,0,1) to the algebra of polynomials over k (k[x],+,×,0,1) given by P(n) → H(n,x) where H(n,x) is the nth “probabilists’” Hermite polynomial [3]. The ⊗-product becomes ordinary multiplication.

For example H(2,x)H(3,x) = H(5,x)+6H(3,x)+6H(1,x). The number of ways to use the Enigma plugboard is the coefficient of H(6,x) when the polynomial x^26 is written as a sum of Hermite polynomials. (Which is also the coefficient of x^6 in H(26,x) but that’s another story.)
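If you'd rather not do that algebra by hand, numpy's probabilists' Hermite module can check both numbers; here's a quick sketch.

```python
# Check the two claims above numerically with numpy's probabilists'
# Hermite polynomials He_n (the hermite_e module).
import numpy as np
from numpy.polynomial import hermite_e as He

# He_2 * He_3 in the He basis: expect [0, 6, 0, 6, 0, 1],
# i.e. He_5 + 6*He_3 + 6*He_1.
print(He.hermemul([0, 0, 1], [0, 0, 0, 1]))

# Enigma plugboard: write x^26 in the He basis and read off the
# coefficient of He_6, the number of ways to plug in the 10 wires.
x26 = np.zeros(27)
x26[26] = 1.0                              # x^26 in the ordinary power basis
print(int(round(He.poly2herme(x26)[6])))   # 150738274937250
```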

The theorem is essentially Isserlis’ theorem [1] in disguise, aka Wick’s theorem. It forms the cornerstone of quantum field theory and it's what leads to Feynman diagrams. But I like looking at it in terms of plugboards. (Compare also the connection to wiring up modular synths I mentioned a while back [5].)

There’s another operation you can perform on an n-plugboard - you could just block one hole giving an (n-1)-plugboard. If a plugboard has n holes there are n ways you could do this. So if we want to count the ways to block up holes it makes sense to consider the operator ∂:PB → PB such that ∂(P(n)) = nP(n-1). It’s not too hard to convince yourself that ∂ satisfies the Leibniz rule ∂(ab)=a∂b+(∂a)b. When you block a hole in a product you either block a hole in the first component or block a hole in the second. So we have a differential algebra [2].

The map ϕ is actually an isomorphism of differential algebras and ∂ maps to the ordinary derivative under the isomorphism. Many theorems about derivatives of polynomials carry over to plugboards and vice versa. For example, arguing informally now so as to allow infinite sums (or moving to a suitable completion of PB), think about e = P(0)/0!+P(1)/1!+P(2)/2!+P(3)/3!+… . This satisfies the “differential equation” ∂e = e. So by the isomorphism we must have ∂ϕ(e) = ϕ(e). But ϕ(e) is essentially a formal power series in x. So solving the “differential equation” we get ϕ(e) = Aexp(x) for some constant A. In other words H(0,x)/0!+H(1,x)/1!+H(2,x)/2!+H(3,x)/3!+ … = Aexp(x) for some A. (A is exp(-1/2) BTW. That exp(-1/2) factor is a reminder that we’re not talking about complete trivialities here.) We can "transfer" other familiar functions like sin and cos from the reals to PB.

There’s lots more I could say. PB tells you all about functions of Gaussian random variables. (I was hoping to use this for some statistics computations but it isn't going to work out...) The isomorphism ϕ is, in some sense, a Gaussian blur(!). There is a shadowy conspiracy theory going on behind all this stuff [6]. But this is enough for now.

[1] https://en.wikipedia.org/wiki/Isserlis%27_theorem
[2] https://en.wikipedia.org/wiki/Differential_algebra
[3] https://en.wikipedia.org/wiki/Hermite_polynomials
[4] https://en.wikipedia.org/wiki/Enigma_machine#/media/File:Enigma-plugboard.jpg
[5] https://plus.sandbox.google.com/+DanPiponi/posts/DJuDKHeGxfy
[6] https://en.wikipedia.org/wiki/Umbral_calculus
Another interesting connection: an example of a plugboard in nature is the way the bases in RNA can bind to each other. So if you enumerate all the ways a sequence of n bases can bind to itself you're led inevitably to Hermite polynomials. To see what I'm talking about peek here http://ipht.cea.fr/Docspht/articles/t04/135/publi.pdf

Dan Piponi

Shared publicly
If you're trying to solve a real-world problem then at the early stages I think you have to tread very carefully in the way you formulate it mathematically. Once you've formulated it, it's easy to imagine that this is now the problem you need to solve and forget about the original problem. You can then spend a long time making progress on solving the formulation, but fail to realise that you're actually stuck in a rut and that there's another formulation of the original problem that works much better.

I had the problem of trying to find optimal (in some sense) paths for moving a large number of vehicles from various A's to various B's. It seemed natural to pick one vehicle and then optimise for that. The problem of finding the optimal path could then be written as something like a calculus of variations problem. At this point I was now trapped in a particular formulation. Trying to solve the calculus of variations problem led to an ODE, but instead of initial conditions it had both initial and terminal conditions. You can discretise time and use a standard ODE solving method, but you can't simply step forward through time from the start as you don't know, for any choice of starting path, whether it will end the right way. It turns into a messy search problem. And then I had the prospect of solving this problem reliably for a large number of paths.

Eventually I realised I had to break out of my original formulation. Instead of solving for one entire path at a time it works better to solve for many paths incrementally and simultaneously. If you can solve and tabulate the solutions to the problem of getting from each A to each B over the time period s to t, and you can solve over the short time period from s-ds to s, then you can combine these to solve for the entire table of solutions from s-ds to t. Now it's just recursion. This is classic dynamic programming, which was in fact originally invented to solve this very problem in the case where the vehicles were missiles. It can be incredibly easy compared to solving millions of ODEs with initial and terminal conditions.

I think I basically recapitulated the history of control theory, which is unsurprising as I was getting my problem solving clues from web pages on control theory and colleagues who had some experience with related problems. The former approach is the Pontryagin principle approach [1] and the latter is the Hamilton-Jacobi-Bellman approach [2] (which is similar to reinforcement learning). The former approach is probably a good choice if there's a chance you can solve the equations exactly and get a simple formula. The latter approach seems to work better when the quantity you're optimising is determined by big tables of numbers.

[1] https://en.wikipedia.org/wiki/Pontryagin%27s_maximum_principle
[2] https://en.wikipedia.org/wiki/Hamilton%E2%80%93Jacobi%E2%80%93Bellman_equation
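Here's a toy sketch of the second formulation: a backward Bellman recursion that tabulates the optimal cost-to-go for every state at every time step and then reads paths off the resulting table. The states, moves and costs are invented for illustration and have nothing to do with the original vehicle problem.

```python
# Toy dynamic-programming (Hamilton-Jacobi-Bellman style) sketch:
# tabulate the optimal cost-to-go V[state] by stepping backwards in time,
# instead of solving a two-point boundary value problem for each path.
# States, moves and costs are invented purely for illustration.
import numpy as np

n_states, n_steps, goal = 50, 100, 42

def step_cost(s, s_next):
    return 1.0 + 0.1 * abs(s - s_next)      # waiting is cheap, moving costs a bit more

# terminal condition: anything not at the goal at the final time is forbidden
V = np.full(n_states, np.inf)
V[goal] = 0.0

policy = np.zeros((n_steps, n_states), dtype=int)
for t in reversed(range(n_steps)):
    V_new = np.full(n_states, np.inf)
    for s in range(n_states):
        for s_next in (s - 1, s, s + 1):    # allowed moves: left, stay, right
            if 0 <= s_next < n_states:
                c = step_cost(s, s_next) + V[s_next]
                if c < V_new[s]:
                    V_new[s], policy[t, s] = c, s_next
    V = V_new

# An optimal path from any start state just follows the tabulated policy.
s, path = 3, [3]
for t in range(n_steps):
    s = policy[t, s]
    path.append(s)

print("optimal cost starting from state 3:", V[3])
print("end of the optimal path:", path[-6:])   # finishes at the goal state
```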
"An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem." -- John Tukey

Dan Piponi

Shared publicly
Many scientific papers take the form:

1. Stuff X happened
2. If hypothesis Y is false then the chance of something as extreme as X (for some definition of extreme) happening is less than p.
3. p is small, therefore it is likely that Y is true.

I'm going to ignore the fact that this isn't a sound argument. (An example of a sound argument might be "if X happened then Y is true with probability p. X happened and p is large. Therefore Y is likely true.")

One problem with the argument in 1, 2 and 3 is that if you try many different things that result in many different Xs, you expect to be able to eventually cherry-pick a suitable X that can be used to justify Y even if Y is unlikely.

Given how obvious this problem is, why has it only recently started getting a lot of attention? I remember talking about this with researchers 30 years ago and nobody seemed to care.
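Here's a quick simulation of the cherry-picking effect: run twenty tests of "effects" that are all pure noise and see how often at least one of them comes out "significant" at p < 0.05. All the numbers are made up for illustration.

```python
# Simulate the multiple-comparisons problem: 20 tests per "study", every
# effect pure noise, each test judged significant at p < 0.05. Around
# 1 - 0.95**20, i.e. roughly 64%, of studies find at least one "effect".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_tests, n_samples = 2000, 20, 100

studies_with_a_hit = 0
for _ in range(n_studies):
    # each test compares two groups drawn from the same distribution
    pvals = [stats.ttest_ind(rng.normal(size=n_samples),
                             rng.normal(size=n_samples)).pvalue
             for _ in range(n_tests)]
    if min(pvals) < 0.05:
        studies_with_a_hit += 1

print(studies_with_a_hit / n_studies)
```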
+Tomer Ben David Yes, you got it. Because people rarely publish uninteresting experiments, when you read about an apparent correlation in a sample of 100 people you have no idea how many 100s of people the researchers quietly trawled through previously.

Dan Piponi

Shared publicly
I think I’ve mentioned before that audio processing might give a nice set of examples for teaching and motivating complex analysis. With the help of a chill dude on YouTube, here’s one.

If you build a circuit to transform a complex AC signal using resistors, capacitors and op-amps then the only thing it can do is multiply the signal by some rational function of the frequency of the signal [1,3]. The amplitude of this complex multiplier is a change in signal amplitude and the argument is a change of phase.

Suppose you want to modify just the phase of the signal. You need to find rational functions that map the real line (of frequencies) to the circle (of unit amplitudes). A good choice is a Möbius transformation [4] like f[a,ω]=(iaω-1)/(iaω+1), with real positive a. This needs just a small number of electronic components and the implementation is called an allpass filter [2]. As you go from ω=0 to ω=∞ the multiplier goes from -1 to 1 via i. Varying a allows you to vary where it passes through i.

If you modify a signal with an allpass filter, and average it with the original signal, you expect the signal to approach zero at frequencies where the phase change approaches -1. This gives a nice way to drop out part of the frequency range from a signal while keeping the rest relatively unmodified.

More generally, you can chain a sequence of allpass filter stages so as to multiply the signal by p[ω]=f[a1,ω]f[a2,ω]…f[an,ω], and then average with the original signal. By varying a1…an with respect to time you can make these dropouts sweep up and down the frequency scale.

When such a circuit is placed between a guitar and an amplifier it’s called a phaser [5]. Here’s a demo:
https://www.youtube.com/watch?v=rpdFZ5VDGDs
You can try to guess how the various controls relate to n and a1…an. I’m not sure, but the ‘resonance’ may be moving the quantities a1…an off the real line so it’s no longer simply an allpass filter [6]. And the model I describe is probably only valid for small signals due to nonlinearities.

There is a lot of elementary complex analysis here. For example you’re essentially hearing the winding number of p in the complex plane because the number of times p takes the value -1 is the number of ‘dropouts’ in the frequency scale. That is, of course, related to the number of poles and zeros of p.

[1] By combinations of Ohm’s law and Kirchhoff’s laws, and as long as the op-amp is operating in a regime where its behaviour is linear.
[2] https://en.wikipedia.org/wiki/All-pass_filter
[3] Also see https://en.wikipedia.org/wiki/Electrical_impedance
[4] https://en.wikipedia.org/wiki/Möbius_transformation
[5] https://en.wikipedia.org/wiki/Phaser_(effect)
[6] Resonances happen when a pole in p approaches the real line so that the amplitude starts rising towards infinity.
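If you want to see the dropouts numerically, here's a short sketch that chains a few allpass stages f[a,ω]=(iaω-1)/(iaω+1) as described above, averages with the dry signal, and finds where the gain dips to zero. The particular n and a1…an are arbitrary choices, not taken from any real phaser.

```python
# Frequency response of a toy "phaser": a chain of allpass stages
# f(a, w) = (i*a*w - 1)/(i*a*w + 1), averaged with the dry signal.
# The stage constants a1..an below are arbitrary, for illustration only.
import numpy as np

a = [1.0, 0.3, 0.1, 0.03]             # one constant per allpass stage
w = np.logspace(-3, 4, 4000)          # frequencies to probe

def allpass(a_k, w):
    return (1j * a_k * w - 1) / (1j * a_k * w + 1)

p = np.prod([allpass(a_k, w) for a_k in a], axis=0)
gain = np.abs((1 + p) / 2)            # average the dry and phase-shifted signals

# The notches are where p passes through -1; with 4 stages the phase of p
# unwinds from 4*pi to 0, so it hits -1 (and the gain hits 0) twice.
is_min = (gain[1:-1] < gain[:-2]) & (gain[1:-1] < gain[2:])
print("notch frequencies:", w[1:-1][is_min])
print("gain at notches:  ", gain[1:-1][is_min])
```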
+Māris Ozols It's the special filters.

Dan Piponi

Shared publicly
I came across a curious connection between continued fractions and combinatorics on arxiv. It'll take some setup. Let's see how short I can make this...

A "snake graph" is either a single "tile" (consisting of 4 vertices and 4 edges) or is built from a smaller snake graph by adding a tile (consisting of 2 new vertices and 3 new edges) either above, or to the right of, its top-right tile. The picture shows an example.

Call adding a tile above a "zig" and a tile to the right a "zag". Think of the first tile as a zig. Then the graph in the picture is constructed from a sequence (zig, zig, zig, zag, zig, zig, zag, zag, zag, zig).

Given a sequence made from two letters (say A and B) you can "compress" it using run-length encoding, i.e. by listing the lengths of the successive runs of each letter, starting with the number of A's. E.g. ABBAAABAB becomes (1,2,3,1,1,1).

You can also anti-run length encode it. List the lengths of the anti-runs, i.e. the stretches of alternating letters. So ABBAAABAB, made of anti-runs AB-BA-A-ABAB, becomes (2,2,1,4).
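The two encodings are easy to mix up, so here's a small sketch of both, checked against the ABBAAABAB example above (it assumes the sequence starts with an A, as in the examples).

```python
# Run-length and anti-run-length encodings of a two-letter string,
# following the conventions above: runs of equal letters vs. maximal
# alternating stretches, scanned greedily left to right.
# (Assumes the string starts with an A, as in the examples.)
def run_lengths(s):
    lengths, count = [], 1
    for prev, cur in zip(s, s[1:]):
        if cur == prev:
            count += 1
        else:
            lengths.append(count)
            count = 1
    lengths.append(count)
    return lengths

def anti_run_lengths(s):
    lengths, count = [], 1
    for prev, cur in zip(s, s[1:]):
        if cur != prev:
            count += 1
        else:
            lengths.append(count)
            count = 1
    lengths.append(count)
    return lengths

print(run_lengths("ABBAAABAB"))       # [1, 2, 3, 1, 1, 1]
print(anti_run_lengths("ABBAAABAB"))  # [2, 2, 1, 4]
```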

We can anti-run encode the zigs and zags in a snake graph. The pictured graph becomes (1, 1, 3, 5).

Given a sequence a=(a1, ..., an) of integers greater than 0 and with the last greater than 1, we can read it as the anti-run length encoding of zigs and zags and build the corresponding snake graph, G(a).

These sequences also define continued fractions via [a] = [a1, a2, ..., an] = a1+1/(a2+1/(...+1/an)).

A perfect matching of a graph is a subset of its edges where every vertex in the graph touches precisely one edge in the subset. If an edge between two people expresses compatibility between them, a perfect matching is a way of pairing everyone off as compatible couples. Let m(G) be the number of perfect matchings of G.

Now comes the theorem:
[a1, a2, ..., an] = m(G(a1, a2, ..., an − 1))/m(G(a2, a3, ..., an − 1)), where an − 1 denotes 1 subtracted from the last entry.

In addition, the numerator and denominator are coprime so no cancellation is possible.

I wrote some code to test this theorem, though it wasn't intended to be particularly readable: https://gist.github.com/dpiponi/e78b31913d7a76b9c4a9a028c3874499

I got this result from a paper on cluster algebras, whatever they are: http://arxiv.org/abs/1608.06568 My notation might not follow the paper exactly.

BTW It's not hard to prove using elementary methods. You can almost read off a proof from my code.

And I probably made at least one mistake above...
+Mike Stay That's ASCII for a_n-1, not a_{n-1}. So a 1 gets subtracted off the last term. I can see that what I wrote was ambiguous.

Dan Piponi

Shared publicly
I went to a great talk on Tabby's star by Jason Wright yesterday. I'm going to assume you've heard about this star as it's been talked about all over the web. It's the star with the bizarre light curves that nobody can explain and which became associated with the idea of alien megastructures [1].

He went over some details. Here are some:

(1) The dips in intensity are as large as 20%. Hard to imagine a planet large enough to block 20% of a star's light. The other obvious hypothesis is that the blocker is dust. But dust is ruled out because it would re-emit energy in the form of infrared and no infrared peaks are visible in the star's spectrum. (This rules out Dyson spheres and other similar structures too.)

(2) Bradley Schaefer, an expert at working with old photographic plates, claimed the star had dimmed significantly over the period from 1890 to 1989 [3]. But critics called into question the quality of this work. As you can imagine, it's hard to trust magnitude readings made from aging plates.

(3) Very recently the paper "KIC 8462852 Faded Throughout the Kepler Mission" [2] was published. Kepler was designed to find short term changes in star brightness, not long term trends. However, there is a sequence of Kepler images, intended just for calibration, that was repurposed to look for changes over the length of the mission. This was used to study not just Tabby's star, but also a few hundred other stars in a similar region of the sky, and a few hundred stars of a similar class. Guess which was the biggest outlier. The graph attached shows how the intensity of KIC 8462852 decreased over 4 years. No other star had such an extreme change in brightness.

Jason Wright had no explanation for this behaviour. The lack of infrared rules out many kinds of light blocker. Stars might be expected to show output variations like this over periods of a million years but not over much shorter periods. I like the starspot hypothesis myself.

[1] https://en.wikipedia.org/wiki/KIC_8462852
[2] https://arxiv.org/abs/1608.01316
[3] http://arxiv.org/abs/1601.03256
Also from the paper: "Visual measures have the advantages of being fast, cheap, and simple, whereas scanning methods are always slow, expensive, and complex."

Now I'm going to have to actually read the paper. He did actually use many scans as well and I don't know why he mixed the two methods.

Dan Piponi

Shared publicly
Presented without comment, a handful of counts of deaths by various means in the US in 2015. In each case I give my source. I didn't make much effort to check the sources beyond looking at the URL and, in some cases, the web site's mission statement.

"Islamic" terrorism: 20
[http://www.snopes.com/toddlers-killed-americans-terrorists/]

Killings by toddler: 21
[http://www.snopes.com/toddlers-killed-americans-terrorists/]

By lightning: 27
[http://www.lightningsafety.noaa.gov/fatalities.shtml]

Fatal bite from pit bull: 28
[http://www.dogsbite.org/dog-bite-statistics-fatalities-2015.php]

Killed by cops: 1186
[http://thinkprogress.org/justice/2015/12/28/3735190/killed-by-police-2015/]

Killed by gun: 12942
[https://www.thetrace.org/2015/12/gun-violence-stats-2015/]

Traffic fatalities: 38300
[http://www.npr.org/sections/thetwo-way/2016/02/18/467230965/2015-traffic-fatalities-rose-by-largest-percent-in-50-years-safety-group-says]

Cancer: 312150
[http://www.cancer.org/acs/groups/content/@editorial/documents/document/acspc-044509.pdf]
Clearly, we should build a wall. A very short wall.


Dan Piponi

Shared publicly
I mentioned a while back that dimensions are a choice. Here's an example illustrating what I meant.

Here are some differential equations for modeling disease spread that I found on Wikipedia [1]. S, I, R and N=S+I+R correspond to the sizes of population groups and t is time.

dS/dt = -βIS/N
dI/dt = βIS/N-γI
dR/dt = γI

If you use standard physics dimensions then S, I, R and N are dimensionless numbers, t has the dimensions of T (time), and both β and γ have the dimensions T^-1.

Let K be a constant. Note that these differential equations are invariant under this substitution:

S → KS
I → KI
R → KR
N → KN

Whenever a set of equations has a scale invariance like this you can choose to introduce a new dimension with the power of K on the right hand side being the power of the dimension. So along with the usual M (mass), L (length) and T we can now add in P, for "person". So we have [S] = [I] = [R] = [N] = P^1 = P. With this new dimension, the left hand side of the first equation has dimension [dS/dt] = P/T and the right hand side has [-βIS/N] = T^-1 PP/P = P/T. So the first equation is consistent. Same goes for the others.

Among other things, you can use this to check your working. Further down that Wikipedia page is the equation

R0 = β/γ

So [R0] = T^-1/T^-1 = 1.

(That also makes intuitive sense as the Wikipedia article describes R0 as a ratio of numbers of people.)

A few lines later we have

S(t) = S(0)exp(-R0(R(t)-R(0)))

The argument of an exp function needs to be dimensionless. But

[R0(R(t)-R(0))] = 1P = P.

So using dimensions has paid off. We've found an error.

I think the Wikipedia article has mixed two conventions. One where S, I and R represent (dimensionless) proportions of the population in the three groups, and one where S, I and R represent absolute numbers of people.

This is also something you can implement in software. By giving quantities appropriate types you can use a C++ or Haskell compiler, say, to statically prove invariance properties of your quantities. In particular, with such a system in place it would be impossible to implement the putative expression for S(t) without the compiler complaining. Or for systems like Matlab and Excel [2] you can use a separate static analyser to check your code. And you can use it for any scale invariance you (or the machine) identify in your equations, not just the standard ones from physics.

[1] https://en.m.wikipedia.org/wiki/Compartmental_models_in_epidemiology#The_SIR_model_without_vital_dynamics
[2] https://web.engr.oregonstate.edu/~erwig/papers/DimErrors_JVLC09.pdf
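Here's a tiny sketch of the kind of machinery that last paragraph has in mind, using run-time checks in Python rather than a static type system: each quantity carries exponents for T and P, and the bad exp argument gets caught. The class, its names and the numeric values are invented for illustration.

```python
# Minimal dimension-tracking sketch: each quantity carries exponents
# for T (time) and P (person). Multiplication adds exponents, division
# subtracts them, addition requires equal exponents, and exp() requires
# a dimensionless argument. A toy illustration, not a full units library.
class Q:
    def __init__(self, value, T=0, P=0):
        self.value, self.dims = value, (T, P)

    def __mul__(self, other):
        return Q(self.value * other.value,
                 self.dims[0] + other.dims[0],
                 self.dims[1] + other.dims[1])

    def __truediv__(self, other):
        return Q(self.value / other.value,
                 self.dims[0] - other.dims[0],
                 self.dims[1] - other.dims[1])

    def __add__(self, other):
        assert self.dims == other.dims, "dimension mismatch"
        return Q(self.value + other.value, *self.dims)

def exp_check(q):
    assert q.dims == (0, 0), "exp needs a dimensionless argument"

# numeric values are arbitrary; only the dimensions matter here
beta  = Q(0.3, T=-1)      # [beta]  = 1/T
gamma = Q(0.1, T=-1)      # [gamma] = 1/T
R     = Q(500.0, P=1)     # [R] = P
R0    = beta / gamma      # dimensionless, as expected

exp_check(R0)                          # fine: R0 has dimensions (0, 0)
try:
    exp_check(R0 * R)                  # the bad argument from the post
except AssertionError as e:
    print("caught:", e)
```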
+Dan Piponi​ it might be silly, but I know the feeling and can relate :) 

Dan Piponi

Shared publicly
E(X+Y)=E(X)+E(Y), where E is expectation, is an elementary fact from probability theory. But sometimes my intuition says it can't possibly be true when X and Y aren't independent. Two examples:

(1) A common solution to Buffon's needle problem [1] involves a trig integral. But the seemingly harder and more general Buffon's noodle problem [2] turns out to have a much easier solution. I'm pretty sure I'd come up with the trig integral solution immediately if I hadn't seen this problem before. But I never came up with the Buffon noodle solution despite it being a near-trivial application of E(X+Y)=E(X)+E(Y). I think the reason I found it hard to see is that it's hard to believe that expectations are truly linear for variables that are so clearly non-independent.

(2) In Markov Chain Monte Carlo (MCMC) [3] you generate a sequence of samples from a probability distribution that are often highly correlated with their neighbours. People sometimes deal with this by "thinning", i.e. by picking every Nth sample, for some N, in the hope that samples N apart will be less correlated. But if you're trying to compute expectations using an MCMC simulation then the linearity of expectation means you don't need to do this. In fact, if you're estimating using the sample mean then thinning increases the variance of your estimate.

For both cases I've seen articles online explicitly remind readers that expectations are linear irrespective of the independence of the random variables. So I suspect I'm not the only one with this intuition failure.

[1] https://en.wikipedia.org/wiki/Buffon%27s_needle
[2] https://en.wikipedia.org/wiki/Buffon%27s_noodle
[3] https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo
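Here's a small simulation of point (2): an AR(1) chain stands in for correlated MCMC output, and we compare the spread of the sample mean computed from the whole chain with the mean computed from every 10th sample. The chain and its parameters are invented for illustration.

```python
# Compare the variance of the sample-mean estimator with and without
# thinning, using an AR(1) chain (x_i = rho*x_{i-1} + noise_i) as a
# stand-in for correlated MCMC output. Thinning throws samples away,
# so the thinned estimate is noisier (roughly 3x the variance here).
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
n_chains, n_samples, rho, thin = 2000, 5000, 0.5, 10

noise = rng.normal(size=(n_chains, n_samples))
x = lfilter([1.0], [1.0, -rho], noise, axis=1)   # one AR(1) chain per row

full_means = x.mean(axis=1)
thinned_means = x[:, ::thin].mean(axis=1)

print("variance of the mean, all samples:", full_means.var())
print("variance of the mean, every 10th: ", thinned_means.var())
```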
Right. And a great example shows up when you're doing integration on an infinite-dimensional Hilbert space H. Any polynomial function on H only depends on finitely many coordinates, so you can integrate it with respect to the Gaussian measure exp(-<x,x>) dx on a finite-dimensional space, normalized so the integral of 1 is 1. This procedure is consistent and defines an integration algebra in Segal's sense.

Using this we can define things like L^infinity(H), which is a commutative von Neumann algebra, and L^2(H), which is a Hilbert space isomorphic to the so-called "Fock space" of H.   They are completions of the polynomial algebra in various norms.  L^2(H) is the completion of the polynomials in the obvious L^2 norm.
Education
  • King's College London
    Mathematics
  • Trinity College, Cambridge
    Mathematics
Story
Tagline
Homo Sapiens, Hominini, Hominidae, Primates, Mammalia, Chordata, Animalia
Introduction
Blog: A Neighborhood of Infinity
Code: Github
Twitter: sigfpe
Home page: www.sigfpe.com
Bragging rights
I have two Academy Awards.
Work
Employment
  • Google
Basic Information
Gender
Male