
Natural selection is like a process of learning! A population of organisms can be seen as a 'hypothesis' about the best way to reproduce. This hypothesis gets refined as the bad guesses get killed off while the good ones thrive. On the other hand, a widely used measure of biodiversity is mathematically identical to entropy. This suggests an interesting tangle of ideas relating *information*, *entropy* and *biodiversity*, along with Bayesian inference and evolutionary game theory. I've been thinking about this for a while; now I'll start explaining and exploring it on my blog.
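The post gives no equations. One common way to make the 'population as hypothesis' picture precise (an assumption here, not taken from the post) is the discrete-time replicator equation. Its update p_i ← p_i f_i / Σ_j p_j f_j has exactly the form of Bayes' rule, with the fitness f_i in the role of the likelihood. The 'widely used measure of biodiversity' is presumably the Shannon index, which is the entropy of the species' relative abundances. Here is a minimal Python sketch with hypothetical names and toy numbers:

```python
import math

def replicator_step(p, fitness):
    """One step of discrete-time replicator dynamics.

    The update has the same form as Bayes' rule: p is the prior over types,
    fitness plays the role of the likelihood, and the result is the posterior.
    """
    mean_fitness = sum(pi * fi for pi, fi in zip(p, fitness))
    return [pi * fi / mean_fitness for pi, fi in zip(p, fitness)]

def shannon_diversity(p):
    """Shannon diversity index: entropy (in nats) of relative abundances."""
    return sum(pi * math.log(1 / pi) for pi in p if pi > 0)

# Hypothetical toy population: three types with fixed fitnesses.
p = [1 / 3, 1 / 3, 1 / 3]
fitness = [1.0, 2.0, 4.0]
for generation in range(5):
    print(generation, [round(x, 3) for x in p], round(shannon_diversity(p), 3))
    p = replicator_step(p, fitness)
```

In this toy run the fittest type takes over and the diversity index falls. That drop is one way to read 'learning' as a loss of uncertainty.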

David Kagan

That is a fascinating way to look at things! I'm looking forward to reading these posts.

John Baez

They're gonna keep on going for a while! What I thought I'd do in the first, I'll probably get around to in the third. And that was supposed to be just the beginning.

glad to hear this ;)

Boris Borcic

Isn't there some sort of dual viewpoint emphasizing disappearance over reproduction? I mean, one of the most prominent simplified portraits of natural selection, one that clearly displays the information flowing in from the environment, has individual deaths serving as the exclusive channel of that flow.

Deen Abiola

Are you aware of the work by machine learning theorists in this area? In particular, the body of work pioneered by Leslie Valiant? The learning power of evolution is characterized in terms of machine learning, and it's shown to be weaker than many current algorithms (weaker than Probably Approximately Correct learning, equivalent to learning from correlational statistical queries). Search for "Valiant evolution algorithms" for interesting reading.

http://people.seas.harvard.edu/~varunk/docs/recombination-K11.pdf

http://www.almaden.ibm.com/cs/people/vitaly/papers/FV_Evolvability_COLT_OP_08.pdf

http://www.mpi-inf.mpg.de/~mehlhorn/SeminarEvolvability/p619-feldman.pdf

Exciting post. :)

Piotr Migdal

I would suggest the opposite: survival of the fittest information.

And in a more general context: that science and technology are just a collection of ideas, theories and hypotheses which evolve and multiply, being fitter than other approaches to understanding the world (e.g. religion, superstitions, common wisdom, most personal intuitions...).

+Deen Abiola - Thanks for the references! I wasn't aware of these. I get the feeling that machine learning, evolutionary game theory and the study of biodiversity live in parallel universes with limited communication between them, but I'm just starting to learn all three, so I expect there's lots of interesting interplay that I haven't bumped into yet.

+Piotr Migdal - yes, all that is interesting too. I'd call it the converse, the flip side of the same coin, rather than the opposite in the sense of one being right and the other wrong.

+John Baez I meant "the converse", thx for pointing this out.

+John Baez, recently game theory has been getting a lot of attention in machine learning: things like regret minimization, online learning and reinforcement learning. Parts of evolutionary game theory are used in coevolutionary optimization, an area I am really interested in.

I think it's mainly the biodiversity and real biology people that are most segregated. Which is a shame because it is amazing to see these same things keep popping up everywhere. Our knowledge base has too little entropy heh. Maybe category theory can help as a bridge there...

Is information fundamental or does the digital age color our view the way clockwork mechanics did 200 years ago? Maybe I am biased but I vote for information as fundamental.

Regret minimization? Does that apply to the recurrence of cause for regret (like a lethal gene), or to something more "tachyonic"?

(Of course you are just naming a concept common to both machine learning and decision theory. My point is that the name "regret" leads common sense down the garden path of postulating a case of hindsight and a side-effect of immutable *post hoc* information, which makes the jargon confusing, since that's apparently not the intention.)

Deen Abiola

Hehe no. I was just giving an example of something that occurs in both machine learning and game or decision theory.

I've wanted to know for many years whether entropy **is** information or a measure of information or neither; and is there a valid law of conservation of information?

Akira Bergman

Information theory seems to play a unifying role in many fields: electrical engineering, physics, mathematics, computing. There are even purported proofs of the Riemann hypothesis in an information-theoretic formalism.

I am looking forward to these posts.

Boris Borcic

Maybe we should start betting on what John Baez will soon present to us! My own best bet: given the prof's superpowers, he will complete this work by uniting physics with evolutionary ecology *via* quantization of the string length of biographies! <0.382 wink>

John Baez

I think information is to data as mass is to matter: a simple universal way of saying 'how much', while neglecting all the details of 'what'. As such it's very limited, but also a great bridge between different worlds of thought.

The "log" in the entropy definition reminds me of the "1/log" in the prime density.

+John Baez then there should also be negative mass, if there is negative information.

John Baez

+Jim Stuttard wrote: "I've wanted to know for many years whether entropy is information or a measure of information or neither..."

Do you know the formula for entropy of a probability distribution? That's probably the most important thing to know if you want to know what entropy is. Then Shannon showed that the entropy of a probability distribution on strings of symbols can be seen as the average amount of information contained in a string randomly chosen according to this probability distribution.

Here's another way to think about it: if you know a probability distribution on a set, and then someone hands you a member of this set chosen according to this probability distribution, the amount of information you receive is, on average, the entropy of this distribution.

Another name for a probability distribution is 'random variable'.

So here's another way to say what I just said: the entropy of a random variable is the amount of information you're *missing* until someone tells you its value.
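The comment above mentions the formula without stating it. For reference, the entropy of a distribution p = (p_1, ..., p_n) is H(p) = Σ_i p_i log(1/p_i), measured in bits when the logarithm is base 2. Here is a minimal Python sketch (the function name is hypothetical) that evaluates it on a few familiar distributions:

```python
import math

def entropy_bits(probs):
    """Shannon entropy H(p) = sum_i p_i * log2(1 / p_i), in bits.

    Outcomes with probability 0 are skipped (the convention 0 * log 0 = 0).
    """
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))   # fair coin: exactly 1 bit
print(entropy_bits([1 / 6] * 6))  # fair die: log2(6), about 2.585 bits
print(entropy_bits([0.9, 0.1]))   # biased coin: about 0.469 bits
print(entropy_bits([1.0]))        # known outcome: 0 bits missing
```

The less predictable the outcome, the more information you are missing before someone tells you the value.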

John Baez

+Jim Stuttard wrote: "is there a valid law of conservation of information?"

If you take a random variable and apply a one-to-one function to it, you get a new random variable with the same entropy. This is an easy mathematical theorem.

So, we say that "deterministic, reversible processes conserve entropy" or "deterministic, reversible processes conserve information".
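The invariance is easy to check numerically. This is a minimal Python sketch that assumes a distribution is stored as a dict from outcomes to probabilities. `step` is a hypothetical stand-in for a deterministic, reversible process: x ↦ 3x + 1 (mod 8), which is one-to-one because 3 is invertible mod 8.

```python
import math
from collections import defaultdict

def entropy_bits(dist):
    """Shannon entropy, in bits, of a distribution {outcome: probability}."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

def pushforward(dist, f):
    """Distribution of the random variable f(X), where X ~ dist."""
    out = defaultdict(float)
    for x, p in dist.items():
        out[f(x)] += p
    return dict(out)

def step(x):
    """Hypothetical deterministic, reversible 'process' on {0, ..., 7}.

    x -> 3x + 1 (mod 8) is one-to-one because 3 is invertible mod 8.
    """
    return (3 * x + 1) % 8

X = {0: 0.4, 1: 0.2, 2: 0.1, 3: 0.1, 4: 0.05, 5: 0.05, 6: 0.05, 7: 0.05}
Y = pushforward(X, step)
print(entropy_bits(X), entropy_bits(Y))  # the two entropies agree
```

Because `step` never sends two outcomes to the same place, the new distribution is just a relabelling of the old one. The multiset of probabilities is unchanged, and so is the entropy.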

John Baez

By the way, +Jim Stuttard, if my remarks on information and entropy were too terse to make much sense, this is a good introduction:

http://en.wikipedia.org/wiki/Entropy_%28information_theory%29

In particular, it gives the all-important formula I mentioned but did not give.

+John Baez: If I may go a bit off topic, I've been thinking for a while about the emergence of replicators (the pre-evolutionary phase), and came to a very simple idea that I haven't seen anywhere. Suppose we have a soup of chemicals, some of which are catalysts (enzymes), i.e. they facilitate the creation of new organic molecules from those already present. If we consider their action as a map on the space of all possible molecules, this map should be volume-reducing, because one catalyst can likely produce only one product, but two different catalysts may well produce the same product. Then the map has a fixed point, and with enough iterations we'll arrive at it. And that is a replicator (if it's different from zero).

Have you seen this idea discussed? If not, do you see any serious holes in it?

It sounds like a good idea to me. You might take a look at Eigen and Schuster's work on "the dynamic hypercycle":

http://jaguar.biologie.hu-berlin.de/~wolfram/pages/seminar_theoretische_biologie_2007/literatur/schaber/Eigen1977Naturwissenschaften64.pdf

http://jaguar.biologie.hu-berlin.de/~wolfram/pages/seminar_theoretische_biologie_2007/literatur/schaber/Eigen1978Naturwissenschaften65a.pdf

http://jaguar.biologie.hu-berlin.de/~wolfram/pages/seminar_theoretische_biologie_2007/literatur/schaber/Eigen1978Naturwissenschaften65b.pdf

I think you'd really like it! In fact I should read it too. It talks about the origin of life using dynamical systems theory and topology, and it talks, in the second part, about finding fixed points of some flow.

+John Baez wrote:

"If you take a random variable and apply a one-to-one function to it, you get a new random variable with the same entropy. This is an easy mathematical theorem.

So, we say that "deterministic, reversible processes conserve entropy" or "deterministic, reversible processes conserve information". "

That was the formulation I was missing. Now, perhaps, I can start to think about invariants for a stochastic probability monad. So many thanks for that.

John Baez

Sure thing! When you apply a function that's not one-to-one to a random variable, you get a new random variable that can have less information. Some friends and I came up with a slick way to *uniquely characterize* Shannon's notion of information in terms of the resulting 'information loss':

John Baez, Tobias Fritz and Tom Leinster, A characterization of entropy in terms of information loss, *Entropy* **13** (2011), 1945-1957, http://www.mdpi.com/1099-4300/13/11/1945

We would have published our result in *Information*, but that journal said our result was meaningless random noise, so we submitted it to *Entropy*.
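In the paper cited above, the information loss of applying a function f to a random variable X is H(X) - H(f(X)). The result is that, up to a constant factor, this is the only notion of information loss that is functorial (losses add under composition), convex-linear and continuous. This small Python sketch, with hypothetical helper names and a toy die example, checks two easy facts numerically: the loss is nonnegative, and the loss of a composite map is the sum of the losses of its steps. It does not attempt the characterization itself.

```python
import math
from collections import defaultdict

def entropy_bits(dist):
    """Shannon entropy, in bits, of a distribution {outcome: probability}."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

def pushforward(dist, f):
    """Distribution of f(X), where X ~ dist."""
    out = defaultdict(float)
    for x, p in dist.items():
        out[f(x)] += p
    return dict(out)

def info_loss(dist, f):
    """Information loss H(X) - H(f(X)), in bits."""
    return entropy_bits(dist) - entropy_bits(pushforward(dist, f))

def mod3(k):
    return k % 3  # 6 die faces -> 3 residues (not one-to-one)

def parity(r):
    return r % 2  # 3 residues -> 2 parities (not one-to-one)

def parity_of_mod3(k):
    return parity(mod3(k))  # the composite map

die = {k: 1 / 6 for k in range(1, 7)}  # hypothetical fair die
loss_f = info_loss(die, mod3)
loss_g = info_loss(pushforward(die, mod3), parity)
loss_gf = info_loss(die, parity_of_mod3)
print(loss_f, loss_g, loss_gf)
print(math.isclose(loss_gf, loss_f + loss_g))  # losses add: prints True
```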

This idea, when I first learned it, helped me understand how it was that evolution could sometimes be so darn clever. It's not that there is no intelligence behind evolution; it's that the intelligence behind it is more like an artificial intelligence algorithm than like a person. It creates species as wildly as I create ideas, and the destruction of a species is as significant to evolution as my destroying an idea in my own mind.

John Baez

+Douglas Summers-Stay - that's a nice way of putting it. The papers Deen Abiola pointed us to in the comments here try to prove that certain learning algorithms are more capable than evolution, at least as described by certain mathematical models of evolution. That's pretty interesting... but it's also interesting to ponder whether actual biological evolution might be more powerful than evolution as formalized by these models. Of course, the author of that paper, and all the supposedly more powerful algorithms he invents, are ultimately products of biological evolution! But there's also a fascinating line of work on 'the evolution of evolvability', and also a lot of new evidence from biology that Lamarck was partially right: acquired characteristics can be inherited, thanks to various 'epigenetic' tricks. There's so much to think about here....

All the machine learning algorithms that I can think of, including most of the ones called "evolution," start with a fixed space of possibilities (a literal high-dimensional space), and search within that space for the maximum. But evolution of life seems to have a way of increasing the possibility space as it searches. Stuart Kauffman has written about this.

I wonder if those epigenetic options are also selected and strengthened with use, like the brain connections.

+Douglas Summers-Stay - I don't think evolution really increases the space of possibilities. I just think that space is huge beyond our understanding, with many 'mountain ranges' that must be crossed to reach promising new valleys.

+Akira Bergman - Good question! I bet we're just beginning to understand the amazing links between the brain, the immune system, the epigenetic system and our genes. For example, only recently did people discover that if your mother suffered from hunger, certain genes of *yours* will be switched on that make it easier for you to gain weight:

http://en.wikipedia.org/wiki/%C3%96verkalix_study

More cool epigenetic stuff:

A study has shown childhood abuse (defined in this study as "sexual contact, severe physical abuse and/or severe neglect") leads to epigenetic modifications of glucocorticoid receptor expression which play a role in HPA activity. Animal experiments have shown that epigenetic changes depend on mother-infant interactions after birth. In a recent study investigating correlations among maternal stress in pregnancy and methylation in teenagers and their mothers, it has been found that children of women who were abused during pregnancy were significantly more likely than others to have methylated glucocorticoid-receptor genes, which in turn change the response to stress, leading to a higher susceptibility to anxiety.

http://en.wikipedia.org/wiki/Transgenerational_epigenetics

It seems the genes grow into a tree-like structure, with the optional parts (at the outermost layer) switched on and off by epigenetics to find a way for the tree to grow. A bit like a climbing plant.

I found John Denker's account of probabilistic entropy on his excellent website www.av8n.com/physics/thermo/ and Ivan Bratko and Alex Jakulin's "Quantifying and Visualizing Attribute Interactions: An Approach Based on Entropy" (http://arxiv.org/pdf/cs/0308002v3.pdf) very useful.

I hope John will write something about configurational entropy, which has been a source of controversy in the past (if he integrates his log(1/p)) :).

John Baez

+Akira Bergman wrote: "...then there should also be negative mass, if there is negative information".

If there were particles with negative mass, the vacuum would be unstable and all hell would break loose: basically, particle-antiparticle pairs could spontaneously form while particles of other sorts *gained* energy, keeping the total energy conserved.

Interestingly, many physicists think there was an 'inflationary era' in the early history of the Universe, from 10^{-36} seconds after the Big Bang to around 10^{-33} or 10^{-32} seconds, when there *were* particles of negative mass. And they think all hell *did* break loose: the universe expanded by a factor of 10^{26}. This is far from certain, but there is some reasonably good evidence for it.

Having negative mass particles around is slightly like having ATMs on every corner that give out free money. Rampant inflation. :-)

Wouldn't the consequent "all hell breaking loose" part be, not the gigantic expansion, but the ending of it (the roll-off into our less-inflationary vacuum)? Just being a nitpicker again...

+John Baez: thanks, I'll take a look! The words "dynamic hypercycle" ring a bell, but I can't remember why.

Deen Abiola

+John Baez +Douglas Summers-Stay

One of the papers I linked to shows how Valiant's model of evolution's learning can be vastly sped up by introducing a recombination operator. It makes a passing reference to evolutionary game theory work when discussing what natural selection and genetic recombination would actually need to satisfy for the model to hold. http://people.seas.harvard.edu/~varunk/docs/recombination-K11.pdf The strength of this new model is not given. But it is not hard to imagine that machine learning algos can be "smarter".

Evolution's learning ability is clearly surpassed in speed by humans; indeed, this fact is what many proponents of superhuman AI argue will allow us to produce something smarter than us. It's happened before. In many areas (where the assumptions of an exponential distribution, convexity or a smooth differentiable objective function hold) machine learning algos already surpass everything on the planet.

What humans, and to some extent evolution, do best, and what even the best ML algos struggle with, is managing complexity by a kind of layering of past experience to quickly solve more complex problems (what Summers was hinting at). That is, most search algorithms are not very good at reducing search time by improving the search space as they learn. Instead they blindly search a space that grows exponentially with problem size. New methods, including deep learning, transfer learning and search-space-improving heuristics, seek to improve ML beyond the idiot savants that only excel at well-behaved problems (the kind not readily found in reality).

+John Baez: I took a look at Eigen and Schuster's work, specifically where they discuss fixed-point ideas. I think it is different from what I had in mind, because their space is the N-dimensional space of concentrations of N types of molecules, while my space is the abstract space where points are molecule species. I won't pollute your space here with the details, I probably should write it all down sometime...

Distinctly over my head in this conversation, but I just tripped across a chart that supports my sense that, if there's a gap between the study of biodiversity and game theory and machine learning, it's not going to be the biologists who are going to be able to build the bridge. They can't. They haven't got the skill set.

http://eideneurolearningblog.blogspot.com/2012/06/education-for-misfits-and.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+blogspot%2Fhyhm+%28Eide+Neurolearning+Blog%29&utm_content=Google+Reader

+John Baez: I recalled where I heard the words "dynamic hypercycle" -- it was associated with Haken's "synergetics".

+Marc Harper I am not looking inside the function. Only the results are needed to reach this conclusion. You can capture a very good model of how evolution learns and then ask: what kind of functions can it learn? You can then compare it to humans and computational learning algos. That is what the papers I mentioned above do; one of their authors has won the computing equivalent of a Nobel prize, for whatever that is worth.

I actually specialize in something called Genetic Programming so I have a lot of respect for what we can learn from nature. However, like its natural motivation, genetic programming tends to be very slow because its manner of following a gradient is implicit rather than explicit. I am seeking a way to be able to more explicitly guide this search.

Evolution learns on a scale measured in hundreds to tens of thousands of years. It learns by generating an incredibly large number of hypotheses, and unless the queries (organisms) are simple, it waits many thousands of years to stabilize at a sufficiently optimal distribution over the best set of phenotypes. Humans learn on a scale of decades to centuries. With new improvements in biotech we have gotten to a point where we can manipulate our own genetics and protein processing without the messy process and glacial pace of evolution. Even before biotech we were able to direct the development of specific phenotypes more quickly than the blind patience of evolution, through this thing called breeding.