Here Cosma Shalizi argues that in the Bayesian approach to probability theory, entropy would decrease as a function of time, contrary to what's observed. He concludes: "Avoiding this unphysical conclusion requires rejecting the ordinary equations of motion, or practicing an incoherent form of statistical inference, or rejecting the identification of uncertainty and thermodynamic entropy."

His argument uses more jargon than strictly necessary, which may intimidate potential critics, so let me summarize it.

SHORT STATEMENT: Entropy is a measure of ignorance. Suppose we start off with some ignorance about a situation and then occasionally make measurements as time passes. The past completely determines the future, so our ignorance never increases. When we make measurements, our ignorance decreases. So, entropy drops.

MATHEMATICAL STATEMENT: Suppose we describe a situation using a probability distribution on a measure space X. The entropy of this probability distribution says how ignorant we are: the more smeared-out the distribution, the higher its entropy. Suppose that as time passes, two kinds of processes occur. 1) As time passes without measurements being made, the probability distribution changes via a measure-preserving function f: X -> X. This does not change its entropy. 2) When a measurement is made, we use Bayes' rule to update our probability distribution. This decreases its entropy.
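(The two processes can be checked in a discrete toy model. Here is a sketch in Python, with made-up numbers: a permutation of eight microstates plays the role of the measure-preserving map, and a parity observation plays the role of the measurement.)

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution given as a dict."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

# Prior over 8 microstates, deliberately non-uniform (weights 1..8).
p = {x: (x + 1) / 36 for x in range(8)}

# 1) Deterministic evolution: a bijection f just relabels states,
#    so the entropy of the pushed-forward distribution is unchanged.
f = lambda x: (x + 3) % 8
p_evolved = {f(x): q for x, q in p.items()}
assert abs(entropy(p_evolved) - entropy(p)) < 1e-12

# 2) Measurement: observe the parity of x and update by Bayes' rule.
#    Averaged over outcomes, conditioning can only lower the entropy.
expected_posterior_entropy = 0.0
for parity in (0, 1):
    prob_outcome = sum(q for x, q in p.items() if x % 2 == parity)
    posterior = {x: q / prob_outcome for x, q in p.items() if x % 2 == parity}
    expected_posterior_entropy += prob_outcome * entropy(posterior)

assert expected_posterior_entropy <= entropy(p)
print(entropy(p), expected_posterior_entropy)
```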

You may take my word for this: the mathematical statement is correct. The question is thus whether he's drawing the correct conclusions from the math! I have my own opinions - but I prefer to let you mull on it a bit.

Thanks to +Alexander Kruel for pointing this out!

Abstract: Many physicists think that the maximum entropy formalism is a straightforward application of Bayesian statistical ideas to statistical mechanics. Some even say that statistical mechanics is ...
Interesting, and helpful with something I've been pondering and tinkering with.
For one "The past completely determines future" wouldn't work in real world or in quantum or statistical mechanics system or system where measurement affects the system. In the closed classical mechanics system with no measurement effects we can actually decrease entropy with a number of measurements in the Bayesian sense - we can exactly determine its state and that would make its entropy zero, correct?
Sounds like Jaynes's interpretation of entropy. And not to harp on relationalism, but is it possible to reconcile the two by saying that Shalizi is right about the uncertainty present in a particular observer-observed interaction, but this says nothing about the amount of entropy present in the "entire" physical system? After all, if measurement is physical interaction, decreasing my observed uncertainty about a quantity might be increasing somebody else's uncertainty about a quantity given that they aren't observing my measurement, just its effect on the system being measured.
I remember wanting to point you to that paper when I first saw it a while ago... I apologize for forgetting; I think my arrow of time was pointing the wrong way round. Or the right way.
I have only read the first couple of pages but it sounds very confused to me. He is arguing that updating your knowledge of the system, by making a measurement on the system, decreases the entropy of the system. "Under repeated measurements, the entropy is non-increasing, either on average or strictly, depending on whether the measurements are noisy or not." This is certainly true. It has nothing to do with the entropy of the universe. Thermodynamics says only that the entropy of an isolated system increases. A system that is being measured by an external observer is not isolated.

To make it clearer -- suppose you could, somehow and via repeated careful measurements, determine to a good accuracy the position and velocity of every molecule in a box of gas. That would sharply decrease the entropy of the system, at least momentarily. But, (a) the measurements required would increase entropy elsewhere in the universe, (b) the entropy of the gas would rapidly rise again: your information would be lost within nanoseconds (I'm guessing).
ps - +John Baez : I'm also unsure about two things you write. "The past completely determines the future, so our ignorance never increases." See the gas-in-a-box example above for a trivial counterexample. And, "As time passes without measurements being made, the probability distribution changes via a measure-preserving function f: X -> X. This does not change its entropy". This is akin to Liouville's theorem that phase volume doesn't change under Hamiltonian flow. But this is true only if your probability distribution (or phase-space distribution) is itself perfectly defined. Once you "coarse-grain" it is no longer true. And, of course, you can never know positions and momenta to infinite accuracy (even apart from quantum limitations). That, I thought, is the entire reason why entropy increases for an isolated system: our initial knowledge can never be enough to determine the system's state for all time to come.
Leaving aside questions of physics on which I'm not qualified to comment, it doesn't make sense that the way we update our beliefs based on new data should depend on issues of thermodynamics.
Do I understand correctly that this paper defines entropy in terms of the uncertainty we have about the state of the system given all our past observations? Usually entropy is defined in terms of the uncertainty we have about the state of the system given only the CURRENT value of some macroscopic variables. The paper is not disputing that the entropy defined in that way is increasing, or is it?
+Jane Shevtsov - I have trouble drawing sharp lines between information theory, probability theory, statistical mechanics, and thermodynamics: they seem like slightly different viewpoints on the same subject, namely reasoning about situations where we have incomplete information.

So, to me it isn't a priori bizarre that some particular approach to updating our beliefs might lead us to the conclusion that we always know more and more about the world, contradicting what we know from thermodynamics, namely that in many situations we effectively know less and less, because the information we had gets reshuffled around in ways that make it useless to us.

However, I don't think Shalizi is accomplishing as much as he seems to think...
+Matt McIrvin - apology accepted, but please don't make that mistake even further back in the past.
+Rahul Siddharthan - I don't have much time right now (I've hit a patch of spacetime where there aren't many seconds per second), so I'll just say a little.

My statement

"The past completely determines the future, so our ignorance never increases"

is my summary-in-words of what Shalizi is claiming, not something I personally believe. My mathematical statement along the same lines:

"if a probability distribution changes via a measure-preserving function f: X -> X, this does not change its entropy"

is true... but of course the question then becomes: when is this a good way to think about how our knowledge of the system changes with the passage of time? And as you note, it leaves out all issues of coarse-graining...
I think I agree with +Rahul Siddharthan .

Ignoring the 'observer-is-part-of-the-system' thing for a moment, there is still the issue of limitations to the observer's ability to represent the state of the system (is this what is meant by coarse-graining in this context?). If the system's dynamics rapidly 'mix' its phase space, then I could imagine that a set of states that are indistinguishable to the observer would rapidly 'smear out'.

Perhaps then an interesting question to ask would be: Given an observer capable of making measurements at some rate, and with the ability to represent the system to some resolution, how much would the entropy of the system be reduced for that observer, relative to the 'physical entropy' of the system?

This also reminds me of Maxwell's demon, and Landauer's principle: even if the observer is able to measure the state of the system rapidly and effectively enough to substantially reduce the 'subjective' entropy of the system, it still won't be able to extract infinite work from the system without heating itself up (or without having an infinite memory or something...).
+Duncan Mortimer - I was thinking of Maxwell's demon too. The identification of entropy as information resolved, rather than created, paradoxes, I believe :)

+John Baez - my comment on coarse-graining requires a bit more thought when applied to probability distributions.
Remember, Bayesian probabilities for dynamical systems really carry two time indices: one for the time a gamble concerns, and one for the time an agent makes that gambling commitment. The statement "Liouville evolution leaves the Shannon entropy unchanged" concerns probability assignments for which the former index changes but the latter stays the same. For the statement "Conditioning on new information lowers the Shannon entropy", it's the other way around.

It's also worth pondering that the way physics students are taught (or, at least, the way I was taught in three different classes) to relate Shannon entropy and thermodynamic entropy assumes equal a priori probabilities across surfaces of constant energy in phase space. If your information about the system is so good that you are willing to violate this assumption, marking some microstates of energy E as more likely than others because you've made some super-fine-grained measurement, then you have to rethink the connection between Shannon and Clausius.

[Sorry, I can't seem to edit my G+ comments with my phone's browser, so my comment is split.]

+Duncan Mortimer wrote: "Ignoring the 'observer-is-part-of-the-system' thing for a moment..."

Good, I think we should ignore that. It's complicated, and I don't think we need to introduce it here. I also don't think we need to introduce quantum mechanics. Shalizi is supposing that we, the observers, are making measurements of a classical system that does not contain us, without disturbing this system. I think we can solve his problem - "what happens to the 2nd law?" - while working within these assumptions.

"... there is still the issue of limitations to the observer's ability to represent the state of the system (is this what is meant by coarse-graining in this context?)."

Yes, 'coarse-graining' refers to the fact that while the system's state lies in some set X, we typically can't measure everything about its state. We often imagine that X is partitioned into subsets, and all we can tell is which subset the state lies in.

These subsets are often called 'macrostates', but you can visualize them as 'coarse grains', like seeds of wheat in a sack.

"If the system's dynamics rapidly 'mix' its phase space, then I could imagine that a set of states that are indistinguishable to the observer would rapidly 'smear out'."

Right! That's crucial. Shalizi is assuming that if we initially know the system's state is in some subset S of X, then after some time evolution we know it's in the set f(S), where f: X -> X describes time evolution. But f(S) will typically be a very complicated set... too complicated for us to think about. So, typically we're only able to remember that the system is in some larger set containing f(S). And that's one reason entropy increases.
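(This smearing-out is easy to see numerically. Here is a sketch using the doubling map on [0,1), a standard measure-preserving chaotic map; the number of coarse grains and the initial interval are arbitrary choices for illustration.)

```python
import math

# Doubling map on [0,1): Lebesgue-measure-preserving and mixing.
f = lambda x: (2 * x) % 1.0

# Start off knowing only that the state lies in a tiny interval S.
n_points = 10000
S = [i / n_points * (1 / 64) for i in range(n_points)]

n_bins = 16  # our coarse grains, i.e. macrostates
for step in range(8):
    occupied = {int(x * n_bins) for x in S}
    # All we can record is which grains the image of S touches, so our
    # effective uncertainty is at least log2(number of occupied grains),
    # which grows as the initial interval gets stretched across the space.
    print(step, len(occupied), math.log2(len(occupied)))
    S = [f(x) for x in S]
```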
+John Baez -- I'm not bringing quantum mechanics into it. Even classically, you cannot observe a closed system, by definition of a closed system. If you are shining light on it, you are interacting with it. If you are looking at the light it emits, well, if it's emitting light it's not a closed system (or the light is a part of the system and you are interacting with it). The second law of thermodynamics applies ONLY to closed systems. And any kind of measurement requires an interaction, which makes it not a closed system. There is really no contradiction here.
In addition, if the system we're studying is open, buffeted by the outside world, then the "synchronic" probability assignments---the gambles that I will make today about what will happen tomorrow and the day after that---will not be related by Liouville evolution. Instead, if I draw a blob in phase space to represent what the system is doing today, then the blob I draw for what it can be doing tomorrow will be fuzzed out.

(This applies in both directions: the mixing-up caused by the outside world means that my information about what's happening now becomes less and less relevant the farther into the future I try to project. It also means that my information about now becomes less and less relevant as I try to reconstruct what happened deeper and deeper into the past. Attempts to derive an ever-increasing entropy from statistical mechanics smuggle in a time asymmetry somewhere. For example, in proving Boltzmann's H-theorem in the kinetic theory of gases, one treats the joint probability for atoms before a collision differently than the joint probability for atoms after a collision. You can see the gory details of the BBGKY factorization and all that in Kardar's Statistical Physics of Particles. Exactly where the time asymmetry comes in depends on the derivation you're reading, and I haven't made a systematic survey of who does what! I work on theoretical systems which are quite decidedly open and synchronic probability relationships are Markovian rather than Liouvillian, so my day job is somewhat removed from the more philosophical questions.)
+Rahul Siddharthan wrote: "I'm not bringing quantum mechanics into it."

I know you're not. My mention of quantum mechanics referred to +Sergey Ten's comment:

"For one "The past completely determines future" wouldn't work in real world or in quantum or statistical mechanics system or system where measurement affects the system."

I just wanted to say that our discussion will be better focused if we ignore quantum mechanics.

"Even classically, you cannot observe a closed system, by definition of a closed system."

True, but in classical mechanics we usually make the idealization that we can measure the state of a system with an arbitrarily small disturbance of the state... and then proceed to focus our attention on an idealized limiting case where we act like we can measure the system's state exactly without disturbing it. There might be problems where this idealization is dangerous, but I don't believe this is one. I think we can crack this problem without bringing that extra subtlety into it.

In other words: I don't think that extra sublety - the effect of the observation on the system being observed - is the main reason entropy increases in classical statistical mechanics.

Unless, that is, you're calling the Bayesian updating of the probability distribution in my process 2) an "effect of observation on the system being observed". I don't consider it that, since it doesn't involve, for example, photons pushing on particles in the observed system. It's just us updating the probability distribution we use to describe the system. But to some extent this is just a semantic issue.
Argh. Leaving long comments on G+ with my phone is really exasperating, and I won't have time to address all this seriously for several days anyway... no one told me when to run, I missed the starting gun...

Blake wrote: "In addition, if the system we're studying is open, buffeted by the outside world, then the "synchronic" probability assignments---the gambles that I will make today about what will happen tomorrow and the day after that---will not be related by Liouville evolution. Instead, if I draw a blob in phase space to represent what the system is doing today, then the blob I draw for what it can be doing tomorrow will be fuzzed out."

Right. Right now I think this is the main reason why Shalizi's argument fails to have any teeth.

I'd say coarse-graining is a second, fall-back reason.
Without thinking too much about this, the critical issue that matters is Does nature (the material world) have irreducible randomness? (completely independent of any concept of knowledge)
+John Baez , you wrote: "So, typically we're only able to remember that the system is some larger set containing f(S). And that's one reason entropy increases." So you seem to disagree with my earlier statement that entropy is usually defined with respect to just the currently measured values of the macroscopic variables. You seem to say that the definition should also take into account some memory we have about past measurements. Do I understand you correctly?
+John Baez In classical mechanics we usually make the idealization that we can measure the state of a system with an arbitrarily small disturbance of the state... Unless, that is, you're calling the Bayesian updating of the probability distribution in my process 2) an "effect of observation on the system being observed".

I guess I am saying yes to the second statement, and am unconvinced by the idealization in the first statement. Shalizi's argument is purely about the information learned about the system by making measurements. But the entropy reduction due to this information gain, it seems to me, would typically be dwarfed by the entropy increase due to other reasons. Eg, you measure the magnetisation of a block of metal: the information you gain from that reduces the entropy, but it's a tiny reduction compared to the total entropy of the block of metal, and meanwhile the heat emitted by the experimental setup has contributed more entropy to the world.

It is possible that you can contrive a system where the Bayesian updating of the probability is the dominant contribution to the total entropy and the entropy of the system does measurably decrease because of your measurements. My point is that, even in this case, it is not a violation of the second law, because -- having ruled out the huge "background" entropy that would normally dwarf this effect -- you can no longer use the "idealization that we can measure the state of a system with an arbitrarily small disturbance of the state". If the major contribution to the entropy is the information about the state that you are measuring, then it can't really be a macroscopic/thermodynamic system, and you can't pretend that the measurement doesn't change the system. So the second law doesn't apply anyway.
I haven't read the paper. But it's possible to prove that the sum of the entropy increase associated to the measurement outcomes and the entropy change in the system being observed is non-negative. In other words, you pay for the entropy decrease in the system with the (Landauer) entropy increase associated to your measurement outcomes. A version of this argument is given in section 14.4.4 of my book with Ike Chuang. Our proof is quantum, and has the classical case as a corollary. It's framed as a discussion of quantum error-correction, but the analysis applies much more generally. It's possible to prove even stronger results than are in the book, but I don't recall references offhand.
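(A minimal classical illustration of this bookkeeping, for a noiseless measurement and a made-up prior: by the chain rule, the entropy that leaves the system exactly equals the Shannon entropy of the measurement record, so the total never goes negative.)

```python
import math

def H(p):
    """Shannon entropy (in bits) of a list of probabilities."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Non-uniform prior over 4 microstates (numbers invented for illustration).
prior = [0.4, 0.3, 0.2, 0.1]

# Noiseless measurement: the outcome is the microstate index mod 2.
outcomes = {}
for x, q in enumerate(prior):
    outcomes.setdefault(x % 2, []).append(q)

record_probs = [sum(qs) for qs in outcomes.values()]
avg_posterior_H = sum(sum(qs) * H([q / sum(qs) for q in qs])
                      for qs in outcomes.values())

# Average entropy removed from the system by the measurement...
drop = H(prior) - avg_posterior_H
# ...equals the entropy of the measurement record: the books balance.
assert abs(drop - H(record_probs)) < 1e-12
print(drop, H(record_probs))
```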
I think he's misinterpreting the mathematics. Jaynes is a Bayesian - and one of the most nuanced and careful thinkers I have ever read - and it's quite clear entropy doesn't go down as we update. I think he may be confusing some of the points with regard to conditional entropy (in addition, as +Michael Nielsen hinted at, unless it's a closed system, you pay for it somewhere - entropy can go down, even in classical thermodynamic systems, as long as the system isn't closed - and no system is perfectly closed except maybe the universe).

I also have long had a gripe with the interpretation of entropy as a measure of "ignorance" or "disorder." I prefer to view it as a measure of "possibility." When you view it that way, it's obvious that when you update your information the entropy will go down since you've now cut down on the number of possibilities you have (incidentally this interpretation is motivated by work by Tom Moore and Dan Schroeder along with a re-interpretation of Jaynes).
It's my bedtime, so I can't respond now to all the interesting points made here. I just wish Shalizi had talked to all of us before putting his paper on the arXiv.
On reading the post, the top answer over there by Piotr Migdal was how I was inclined to answer, saying that for your knowledge to increase, entropy somewhere else must increase. But anyone who is thinking about entropy and information must be aware of that-- surely he couldn't have made such a mistake? I haven't read the paper.
+Douglas Summers-Stay I think it was the 80s before the argument that there's a cost to erasing bits of your memory was fully articulated. Maybe Shalizi was just a few years out of date.
+John Baez +Blake Stacey Are you guys saying that the problem is that the system is changing over time, while Bayesian probability updating requires that you observe the same system (same distributions, parameters, etc.) each time?
+Jane Shevtsov The point which both John and I voiced is that the assumption "As time passes without measurements being made, the probability distribution changes via a measure-preserving function f: X -> X" is simply not applicable in many, if not most, of the situations we actually care about.

Really, I think the idea that as we learn more about a system, we're better equipped to extract energy from it, is rather less than shocking, particularly in post-Szilárd and post-Landauer physics.

Other points can also be made, but I think I'll wait until John can wake up and have his say before I go on a ramble.
+Rahul Siddharthan wrote: "Shalizi's argument is purely about the information learned about the system by making measurements. But the entropy reduction due to this information gain, it seems to me, would typically be dwarfed by the entropy increase due to other reasons."

That's true. He completely ignores the usual reasons for entropy increase - which are well-studied, but still confusing to many people. And these typically dwarf the effects he's discussing. As far as I can tell, he doesn't even mention these usual reasons.

To repeat what I think they are:

1) In reality, many 'almost closed' classical systems are not completely closed but weakly coupled to their environment, which means the best description of these systems is not deterministic but stochastic. For example, if you have a box of gas, the atoms in the walls of the box are wiggling slightly, so if we only describe the atoms in the gas, they're subject to random influences.

2) In reality, we describe many classical systems at a coarse-grained level, which again means the best description of these systems is not deterministic but stochastic, even if the underlying fine-grained description is deterministic.

3) In reality, classical systems are fundamentally quantum-mechanical.
+John Baez Sorry to be so argumentative, but while 2 and 3 are correct, I think 1 is understating the situation.
Systems that are integrable or almost-integrable will be stable to small perturbations from the environment. Systems that are chaotic will be sensitive to those perturbations, but equally sensitive to the necessary imprecision with which we know the initial conditions. (Eg, the atoms in a gas in a box. Even a single atom in a 1D box, if we know its momentum p with an uncertainty dp, will over time t develop an uncertainty in position of order t dp/m, which eventually will be more than the size of the box.)

And all systems to which stat mech applies will be in this category: otherwise we will see signs that the system is not ergodic -- eg, the problem of motion of stars that motivated Henon and Heiles.

In short, I think the coupling to the rest of the environment is not the point here. The coupling to the measurement device, however, is relevant.
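(The single-atom-in-a-box example above can be sketched numerically: a particle whose position is known exactly but whose velocity is known only to within some dv spreads out, and the coarse-grained entropy of its position saturates at the maximum once t dv exceeds the box size. All parameters here are invented for illustration.)

```python
import math, random

random.seed(0)

def reflect(x):
    """Fold a free-flight coordinate back into [0, 1] via elastic walls."""
    x = x % 2.0
    return 2.0 - x if x > 1.0 else x

# Ensemble: position known exactly, velocity known only to within dv.
n = 20000
x0, v0, dv = 0.2, 1.0, 0.01
vs = [v0 + dv * (2 * random.random() - 1) for _ in range(n)]

def coarse_entropy(t, bins=20):
    """Entropy (bits) of the coarse-grained position distribution at time t."""
    counts = [0] * bins
    for v in vs:
        counts[min(int(reflect(x0 + v * t) * bins), bins - 1)] += 1
    return -sum(c / n * math.log2(c / n) for c in counts if c)

# A tiny velocity uncertainty eventually fills the whole box: the position
# spread grows like t*dv, so by t ~ 1/dv the coarse entropy saturates.
print(coarse_entropy(0.0), coarse_entropy(10.0), coarse_entropy(500.0))
```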
+Gustav Delius wrote: "John Baez, you wrote: "So, typically we're only able to remember that the system is some larger set containing f(S). And that's one reason entropy increases." So you seem to disagree with my earlier statement that entropy is usually defined with respect to just the currently measured values of the macroscopic variables."

No, I actually agree with you. Shalizi is imagining scenarios that include the following. Initially we make measurements and determine the state of our classical system lies in a subset S of the phase space X; later we make measurements and determine it lies in a subset T. But, he says, we also know it's in the subset f(S), where f: X -> X describes time evolution. So, we know it's in f(S) intersect T.

There are a number of reasons why classical mechanics of complex systems doesn't really work this way. One reason is that we typically have access only to a 'coarse-grained' description of the system. Let me sketch a scenario along these lines. Instead of being able to determine whether the system is in an arbitrary subset of X, we have a partition of X into subsets S_i, and can determine whether the system is in one of these subsets. Due to this fact, and our ignorance of the precise dynamics of the system, in part because of its interaction with the outside world, all we know is the probability p(i,j,t) that if the system starts in a subset S_i, it ends in some subset S_j at some chosen later time t. So, the 'coarse-grained dynamics' is described by a Markov process on the set of indices i. In this sort of dynamics, entropy can and often does increase with time.

Now Shalizi might say: if you start out knowing the system's state is in the subset S_i, at some later time you know for sure that it's in the subset f(S_i). But I was saying that even if we knew f - even if the underlying dynamics were deterministic (which in practice it's not) and we knew this dynamics precisely (which in practice we don't) - the set f(S_i) would often be a messy complicated set, so in a coarse-grained description all we could say for sure is that the state lies in the union of the sets S_j that cover f(S_i).
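(For this kind of coarse-grained Markov dynamics there is a clean statement: if the transition matrix is doubly stochastic, the Shannon entropy of the coarse-grained distribution never decreases. A sketch with an invented three-grain matrix:)

```python
import math

def H(p):
    """Shannon entropy (in bits) of a list of probabilities."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# A doubly stochastic transition matrix on three coarse grains
# (rows and columns both sum to 1; the entries are made up).
T = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]

def step(p):
    """One tick of the coarse-grained Markov dynamics."""
    return [sum(p[i] * T[i][j] for i in range(3)) for j in range(3)]

# Start fairly certain of the coarse grain, and watch the certainty decay:
# for doubly stochastic dynamics the entropy is non-decreasing at each step.
p = [0.9, 0.05, 0.05]
for t in range(5):
    q = step(p)
    assert H(q) >= H(p) - 1e-12
    p = q
print(H(p))
```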

+Philip Thrift wrote: "Without thinking too much about this, the critical issue that matters is Does nature (the material world) have irreducible randomness? (completely independent of any concept of knowledge)"

I don't think that's the critical issue that matters in analyzing Shalizi's paper and the mistakes he made. It's certainly a critical issue for understanding the universe! But Shalizi is working purely within a particular formal model of the universe - classical statistical mechanics - and drawing what he seems to think is an important and surprising conclusion about it: a 'coherently Bayesian' approach to statistical reasoning would force us to conclude entropy decreases with the passage of time in classical statistical mechanics. I think he's making the mistake of overlooking (and not even mentioning) the well-known reasons for entropy increase in this framework.

What nature (the material world) actually does is barely relevant to this dispute. For one thing, nature has already told us classical statistical mechanics is wrong - or more charitably, just an approximation. So we're discussing a question about an approximate theory of nature, not about nature itself.
+Rahul Siddharthan - I don't mind your arguments because they're interesting and intelligent. Many people are simultaneously very argumentative about the foundations of thermodynamics and spectacularly uninformed about it. Like most fascinating subjects, it's a lot easier to get interested in it than to understand it. This is one reason it's better to work on subjects that seem boring and technical: the people who have opinions about them usually know something. :-)

I agree that in a chaotic classical system that's perfectly closed except for the occasional intervention of a measuring apparatus, chaos can amplify disturbances due to the act of measurement in such a way that previously measured information gets lost. It would be nice to see a worked-out model of this idea.

However, I believe that in situations closer to the 'real world', where systems are never perfectly closed, chaotic amplification of interactions with the environment is enough to make it wise to describe these systems as having a dynamics given by a deterministic part plus small random perturbations. And this by itself is enough to account for entropy increase - even without bringing any measurement apparatus into play, or any coarse-graining.

Here I'm imagining something like a gas of perfectly elastic billiard balls in a container whose walls are undergoing very small thermal fluctuations.
Shalizi's point, if I understood correctly, is not that entropy decreases, but that Bayesian definition of entropy is incompatible with the Second Law. And that sounds true to me.

Informally, if we want entropy to be a property of the system, and not of the observer, it should be defined as the uncertainty about the microscopic state of the system for the observer who measures its macroscopic state without prior knowledge of the system. Then the whole Bayesian argument is beside the point.

If, however, we insist on the Bayesian definition, then we have to consider the fact that the observer must maintain his knowledge between measurements. Either it will deteriorate as his own entropy increases, and the argument breaks -- or he has to expend energy and increase entropy elsewhere to keep his memories.
+John Baez - thanks for the compliment :) I guess my point is, for an ergodic system, if its entropy is less than maximum, it will grow, regardless of any external perturbation. (Think of a gas that is initially known to occupy only half the box.) And even if you reduce it using measurements (at the expense of increased entropy elsewhere), it will grow, unless your measurement has revealed a constraint of some kind. (In that case the estimate of the maximum possible entropy has fallen. And that is fine.)

+Dmitri Manin - "if we want entropy to be a property of the system, and not of the observer" ah, but Jaynes, for example, would argue that we don't want that.(*)

I agree with your last paragraph. While the cost of maintaining memory can be close to zero, the cost of retrieving stored memories is not negligible. But I think even if we assume it to be negligible, the second law won't be violated.

(*) "It is therefore a platitude... that the work we can extract from any system depends, necessarily, on how much information we have about its microstate. If entropy lacked this property of measuring human information, it could not serve its thermodynamic function."
ps - also worth reading, from Jaynes, is

He argues that Gibbs already understood these things in the 19th century but didn't express himself clearly. He imagines a situation where there are, unknown to us, two forms of argon, A1 and A2, that are completely identical in every respect except that A2 is soluble in Whifnium, and A1 is not; and Whifnium has not yet been invented. So, every result we have today would be correctly described by assigning a zero entropy of mixing to these two forms (we aren't even aware that there are two forms). But once we know that there are two forms that can be distinguished, the situation changes and there is an entropy of mixing previously prepared pure samples of A1 and A2. "But if this entropy increase is more than just a figment of our imagination, it ought to have observable consequences, such as a change in the useful work that we can extract from the process" -- which is obviously possible only if we are capable of distinguishing A1 and A2 in some way.
I would insist that the cost of maintaining memory can not be made arbitrarily low. Heck, I have to spend a lot of food for that, and I still forget things. Disk drives fail, CDs rot, clay tablets crumble.

Maybe I'm wrong, though, if we consider an extremely idealized system, where I can just put a ball in a potential, and it will stay there indefinitely. But I'm not sure if we are allowed to get rid of humans under Bayesian framework.
+Dmitri Manin - I said close to zero, not zero. CDs rot but they last for at least a couple of decades, if not longer. One needs to compare the energy to maintain them with other energies. The energy to create them, and read them, is much higher. It's ok to get rid of humans -- Jaynes in his book talks throughout of a "robot" who reasons in an ideal manner.
It just seems odd to talk about a physical measure (entropy - even if it's a statistical measure) of a system X in the context of knowledge. Because knowledge is a state of a human brain B, one would then have to measure the entropy of the entire system X + B.
+Philip Thrift - now you're entering well-trodden territory, in fact a well-trodden patch of quicksand in which many intellectual explorers have sunk never to be seen again. I'm loath to follow you, but will merely post some warning signs:

In physics, entropy is something we typically assign to a 'mixed state' rather than a 'pure state'. These are mathematically well-defined concepts, which you should know to keep from sinking too deep into the mud - so, I'll try to explain them roughly below. But some people like to stir up trouble by calling mixed states 'epistemic states' and pure states 'ontic states'. The idea is that a mixed state describes our limited knowledge of a situation rather than what's 'really going on'.

To me, after spending years thinking about this, the dispute over whether probabilities are 'merely subjective' or 'objective features of the world' seems far less interesting than going ahead and using probability theory to do stuff. So, I will let the rest of you march into that quagmire, but not follow myself.

Here's the idea of pure versus mixed states:

In classical or quantum physics, associated to any physical system there are states and observables. An observable is a real-valued quantity we might conceivably measure about the system. A state represents what we might conceivably know about the system. The previous sentence is quite vague; all it really means is this: given a state and an observable, there is a mathematical recipe that lets us calculate a probability distribution on the real number line, which gives the probability of measuring the observable to have a value lying in any given subset of the real line. We call this the probability distribution of the observable in the state. Using this we can, if we want, calculate the mean of this probability distribution (let us assume it exists!), which we call the expectation value of the observable in the state.

Given two states Ψ and Φ, and a number p between 0 and 1, there is a recipe for getting a new state, called pΨ + (1-p)Φ. This can be described roughly in words as follows: "with probability p, the system is in state Ψ; with probability 1-p, it is in state Φ." This is called a mixture of the states Ψ and Φ. If a state is a mixture of two different states, with p not equal to 0 or 1, we call that state a mixed state. If a state is not mixed, it is pure. Roughly speaking, a pure state is a state with as little randomness as possible.

The entropy of a pure state is zero; the entropy of a mixed state is not.
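This pure/mixed distinction can be made concrete with density matrices. Here's a minimal numerical sketch, assuming a single qubit and an arbitrary mixing weight p -- all values here are chosen purely for illustration:

```python
import numpy as np

def von_neumann_entropy(rho):
    """Entropy -Tr(rho log rho), computed from the eigenvalues of rho."""
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]  # drop zeros: 0 log 0 = 0 by convention
    return -np.sum(eigvals * np.log(eigvals))

# Two pure states of a qubit, written as density matrices |v><v|.
psi = np.array([1.0, 0.0])
phi = np.array([0.0, 1.0])
rho_psi = np.outer(psi, psi.conj())
rho_phi = np.outer(phi, phi.conj())

p = 0.3  # an arbitrary mixing weight, for illustration only
rho_mixed = p * rho_psi + (1 - p) * rho_phi

print(von_neumann_entropy(rho_psi))    # pure state: entropy zero
print(von_neumann_entropy(rho_mixed))  # mixed state: entropy strictly positive
```

The mixed-state value here is just -(p log p + (1-p) log(1-p)), since the two pure states were chosen orthogonal.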
I have no problem imagining all Feynman paths as physically real as in Huw Price's natural alternative <> "Backward Causation, Hidden Variables and the Meaning of Completeness" (2001), sec. 3.2.

I think this keeps me out of this quicksand, but I could be wrong.
To add to +John Baez 's comment (I assume most people who have read this far know this, but perhaps some don't): the entropy of the mixed state he describes is -(p log p + (1-p) log (1-p)). This becomes zero, as he observes, if p is 0 or 1. If there are many states rather than two, and the probability of being in the i-th state is P(i), then the entropy is -∑ P(i) log P(i). If there are W states which are all equally probable, then the entropy is log W, which (up to a factor) is Boltzmann's famous formula.
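These formulas are easy to check numerically. A small sketch (illustrative values only) covering the two-state case, the degenerate cases, and the log W special case:

```python
import numpy as np

def entropy(probs):
    """Shannon/Gibbs entropy -sum p_i log p_i (natural log)."""
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 0]  # 0 log 0 = 0 by convention
    return -np.sum(probs * np.log(probs))

# Two-state case: -(p log p + (1-p) log(1-p))
print(entropy([0.3, 0.7]))

# Degenerate case: entropy is zero when p is 0 or 1
print(entropy([1.0, 0.0]))  # zero

# W equally probable states give entropy log W (Boltzmann's formula)
W = 8
print(np.isclose(entropy([1.0 / W] * W), np.log(W)))  # True
```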

Now, I guess the question is whether the P(i) represent our state of knowledge or an objective reality. And John's position seems to be, does it matter? (a.k.a. "shut up and calculate".) I expect most physicists would react that way, but most people who react that way would end up being Bayesians (many early Bayesians were people with an interest in physics, such as Jeffreys, Keynes, even Laplace). Meanwhile, "frequentist" statisticians, who insist on the "objective" validity of probability, will simply refuse to attempt certain problems...
+Rahul Siddharthan wrote: "the entropy of the mixed state he describes is - (p log p + (1-p) log (1-p))."

That's true if the states Ψ and Φ that we're forming a mixture of are themselves pure. It was important for the definition I was making at the time to let them be mixed. I was defining a mixed state to be a mixture of two other states, say pΨ +(1-p)Φ, where p is neither 0 nor 1. But Ψ and Φ might need to be mixed themselves, since not every mixed state is a mixture of just two pure states.
Ah ok - I thought you were just defining a mixture of two (pure) states. My comments apply if the states in the mix are pure. If they are not, the entropy is what I wrote, plus the weighted entropies of the individual states.

For example, if your states Ψ and Φ are each composed of two pure states, Ψ = p1 S + p2 T and Φ = p3 U + p4 V, with p1 + p2 = 1 and p3 + p4 = 1, then we can calculate the total entropy as the weighted sum of the two entropies for Ψ and Φ, plus an entropy for mixing Ψ and Φ -- as follows:
S = -p [p1 log p1 + p2 log p2] - (1-p) [p3 log p3 + p4 log p4] - p log p - (1-p) log (1-p)
You can check that this is the same answer you would get if you took the four probabilities p p1, p p2, (1-p) p3, (1-p) p4 (which sum to 1) and directly used the earlier expression -∑ P(i) log P(i).

Shannon took the opposite tack in his 1948 paper: he required this sort of decomposition (or rather, a simpler decomposition involving three possibilities), and proved that only the formula - ∑ P(i) log P(i) satisfies it.
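The decomposition described above can be verified directly; the probabilities in this sketch are arbitrary, chosen only to satisfy the stated constraints:

```python
import numpy as np

def H(probs):
    """Shannon entropy -sum p_i log p_i, skipping zero probabilities."""
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 0]
    return -np.sum(probs * np.log(probs))

p = 0.4            # weight of Psi in the mixture (arbitrary)
p1, p2 = 0.2, 0.8  # Psi = p1 S + p2 T, with p1 + p2 = 1
p3, p4 = 0.5, 0.5  # Phi = p3 U + p4 V, with p3 + p4 = 1

# Weighted entropies of Psi and Phi, plus the entropy of mixing them:
S_grouped = p * H([p1, p2]) + (1 - p) * H([p3, p4]) + H([p, 1 - p])

# Direct entropy of the four overall probabilities:
S_direct = H([p * p1, p * p2, (1 - p) * p3, (1 - p) * p4])

print(np.isclose(S_grouped, S_direct))  # True
```

This grouping identity holds for any choice of the weights, which is exactly the consistency property Shannon used to single out the formula.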
ps - is there any way to typeset math on Google+? Ideally a LaTeX plugin?
I realized my comment appeared to contradict itself. What I meant to say was this: entropy can decrease for an open system and, arguably, the only truly closed system is the universe (and even that's debatable). It should be no surprise, if we interpret entropy as a measure of possibility, to then see it decrease after updating if the updating resulted in fewer possibilities.
+Rahul Siddharthan re Jaynes: once I read "completely identical in every respect except that A2 is soluble in Whifnium, and A1 is not", I have to admit, I lost a good deal of interest in him. What he means is that molecules A1 can be distinguished from A2, but their dynamics are exactly the same. I don't think this is even theoretically possible. Either they differ or they don't, there's no way to distinguish between two things that behave exactly identically.
+Rahul Siddharthan Yes, I know. But I reread my original post and it looked like I contradicted myself, and I wanted to make sure - for the record - that I'm not nuts. ;)
+Ian Durham - actually I found your earlier comment logical enough :) Maybe we're all nuts to be spending so much time on this.
+Rahul Siddharthan Ah, well, that's good to hear (that you found my original comment logical)! It does strike me as odd that we're spending time on a paper that was posted to the arXiv eight years ago, but then again it gives us an excuse to talk about entropy which is always interesting (and creates more entropy!).
Along the same lines:

"Does not the theory of a general tendency of entropy to diminish [sic] take too much for granted? To a certain extent it is supported by experimental evidence. We must accept such evidence as far as it goes and no further. We have no right to supplement it by a large draft of the scientific imagination. " (Samuel Burbury, 1904)

Burbury's last case: the mystery of the entropic arrow. In Craig Callender, ed., Time, Reality and Experience, Cambridge University Press, 2002, 19—56. [<>]
+Dmitri Manin wrote: "Either they differ or they don't, there's no way to distinguish between two things that behave exactly identically."

There are actually some interesting borderline cases, like different stable isotopes of the same element. Say you have two tanks of ethanol connected by a pipe. You turn a faucet that lets them mix. Does the entropy go up? Normally chemists would say no. But now suppose you discover that one of the tanks consisted of ethanol made with oxygen-18, while the other was made with oxygen-16. Now the answer should be yes, the entropy went up. But unless we're clever enough to do experiments to detect the difference between these two kinds of ethanol, the difference between them is a 'difference that doesn't make a difference'.

I picked ethanol, by the way, because the difference between heavy water (D2O) and ordinary water (H2O) is big enough to make a noticeable difference in some contexts: give someone only heavy water to drink, and they'll eventually die! But for the example I gave the percentage difference in molecular weights is much smaller, and probably irrelevant to biology.
Jaynes' was obviously a thought experiment. And, as +John Baez points out, not an unreasonable one.
+John Baez You said in your previous reply regarding the difference between the two isotopes of oxygen, "the difference between them is a 'difference that doesn't make a difference'." I'm a little surprised you would say that. Certainly, it may not be a difference noticeable to us, but whether or not the entropy goes up when mixing those two gases should be independent of whether or not we are clever enough to make the measurement. Taken at face value, that would indicate that were this experiment performed in, say, the 1770s (when oxygen was discovered), the entropy would not go up, but if the same experiment were performed now, say (I'm just guessing the difference could be measurable by current techniques), the entropy would go up. Now, while I find aspects of Wheeler's participatory universe attractive, I think this is taking that concept a bit to the extreme.
+Ian Durham - Entropy depends on a mixed state, which represents our state of knowledge of a physical system. If we learn that the system has more degrees of freedom than we thought, we need to revise our opinions: we realize we have more ignorance than we'd thought, so we need to recalculate the entropy, and we get a higher answer.

If it makes you feel better, you can say the original calculation was wrong all along. But I find it more interesting to think about why the original 'wrong' entropy was perfectly fine and useful - until we accessed a method of distinguishing between situations that previously had been indistinguishable to us.

There's a lot more to say about this... but I'll just add that entropy is not an observable in the same sense as the position or momentum of a particle: it's zero on all pure states, it's only nonzero for mixed states, which are states where we have less than maximal knowledge of a situation. So, it behaves in ways that may seem counterintuitive if you're thinking of it as similar to position or momentum. As +Rahul Siddharthan just pointed out, there are interesting challenges when it comes to building a reliable 'entropy meter'.
+John Baez: yes, the entropy will increase in the mixing, even if we don't notice -- this mixing is irreversible, isn't it? I don't think there is a problem here, because there is a fundamental difference between distinguishable and indistinguishable objects, as we well know. Elementary particles of the same kind, molecules of the same kind (and in the same configuration and excitation state, of course) are truly indistinguishable; we know that because their statistics depend on it. Isotopes are distinguishable, and that's enough.
+Dmitri Manin - I wasn't going to bring this into the discussion yet, because it spoils some of the fun to do so prematurely, but you're right: when we take quantum mechanics into account, we get an interesting new twist. Unlike in classical mechanics, we can use interference effects to tell whether two particles are distinguishable or not, regardless of whether they have any other observable distinct properties.

For example, if our universe contained electrons and also schmelectrons, with the exact same mass, charge, spin, etcetera, we could tell they were different by noticing that a hydrogen atom could contain only 2 electrons in its lowest orbital, and only 2 schmelectrons... but if we use both electrons and schmelectrons, we could fit up to 4 in the lowest orbital.

This is quite a remarkable change from the classical paradigm, where particles that seem indistinguishable can always turn out later to be distinguishable!

Nonetheless there are still other ways in which we may need to revise our entropy estimates upwards when we realize that situations that seemed identical are in fact distinct. I'll leave it as a puzzle for people to think of some.
By the way, all these puzzles go back to Gibbs and the 'Gibbs paradox':

A lot of people like to say this paradox is solved by quantum mechanics. While quantum mechanics certainly changes the story, it's still worthwhile thinking about these puzzles - even at the classical level, since classical physics is not just obsolete junk.
For Jaynes' take on the Gibbs paradox (which he attributes to Gibbs himself) do read the link I posted a little earlier (which seems to have sparked this sub-discussion). It's not good to dismiss an argument without reading it :)
+John Baez: of course, classical physics is not obsolete junk, but it is equally true that there are macroscopic phenomena that it does not describe correctly. In particular, indistinguishability of particles has measurable consequences, causing Fermi and Bose statistics to differ from Boltzmann statistics. The Gibbs paradox is closely related. By the way, although indistinguishability has much more drastic consequences for quantum mechanics, it is not, I think, incompatible with classical physics either.

Generally speaking, postulates saying that there is no physical experiment capable of distinguishing A from B usually have very important consequences, beginning with Galileo's relativity principle (A = rest, B = motion), and particle indistinguishability is in that class too.

+Rahul Siddharthan: sorry, will read it.
+John Baez Distinguishability does, indeed, play an important role in all of this, but I still have a problem assuming that merely by "drawing back the veil" on the measurement, as it were, somehow fundamentally has a major effect on whether some phenomenon exists or not. This is why I also am uncomfortable with interpreting entropy as a measure of knowledge or ignorance. I am much more comfortable with interpreting it as a measure of the number of possible configurations or outcomes of measurement (perhaps weighted). This makes it independent of our experimental capabilities.

Think of it this way. If you take the view that it is based on knowledge or ignorance, the same experiment performed at the exact same time in one country could differ in its outcome from the same experiment performed simultaneously in another country. While I do believe relativity needs an overhaul and has fundamental flaws, I also believe that in order for different parts of the universe to coherently (not in the quantum sense, rather in the colloquial sense) interact with one another, we need to expect some consistency (this does not mean things aren't random, just that the randomness should be probabilistic and independent of human interaction).

Incidentally, this is a very interesting discussion that relates to some work I've been doing that grew out of last year's FQXi conference and some subsequent discussions. I'm basing it on Wheeler's "law without law" concept, to some extent.

Oh, and by the way, despite the relative elementary aspect of it, I highly, highly recommend reading Dan Schroeder's book An Introduction to Thermal Physics. It, along with some things he and Tom Moore have written, have done a lot to influence my view on this stuff.
+Ian Durham - measure experimentally. That's the whole point. Or extract work from the entropy difference. Given that you aren't, a priori, aware that there are two species of molecules (but let's admit that you consider it a possibility).
+Rahul Siddharthan Well, I can certainly propose an experiment and I can show mathematically how it could be done. Whether or not such a device can be built is up to the experimentalists. Take a look at my response to John. Again, I don't think that the structure of the universe is influenced by our technology (I have a caveat to that, but I'll leave that for later since lunch is calling my name ;)).
+Ian Durham - I did read your reply to John and didn't see any proposed experiment there. What experiment, in your opinion, will give different answers in different countries depending on the state of knowledge of the experimenters? A thought-experiment is fine, as long as it can be carried out in principle and the answer can be predicted according to current theories. If you can show that "entropy=knowledge" and "entropy is an objective physical quality" predict different answers for that experiment, excellent. To be concrete, can the experiment involve two collections of molecules, with slightly differing molecular weights (cf +John Baez 's example of ethanol with different O isotopes), which cannot be distinguished by the technology of the day? Or anything else along the same lines (i.e. there is a difference but the experimenters don't know it and cannot measure it) is fine.
+Rahul Siddharthan OK, my stomach is full. ;) So back to thermodynamics. Anyway, I have two answers to your question. The first is that I am arguing that the interpretation that is being put forward here implies that the answer does depend on the ability of the experimenters (which I disagree with - it shouldn't). The second gets more deeply at the specific problem mentioned here: how can I distinguish between oxygen-16 and oxygen-18 experimentally and how could I envision the experiment being different in different places.

One way to measure the entropy of mixing requires that we know the entropies of the individual species before they mix. Using the full Sackur-Tetrode equation, I can write these in a manner that includes the mass (I'm assuming for simplicity that the oxygen is monatomic which is a reasonable assumption since monatomic oxygen gas can be produced). Since the equation for the entropies depends on the mass and the masses are different (albeit by a small amount), there will be a non-zero entropy of mixing that can be calculated, but only if there is actually a known difference in the masses.
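A rough sketch of the Sackur-Tetrode calculation described above, treating the oxygen as a monatomic ideal gas as the comment assumes. The conditions (one mole at 273 K in 22.4 L) are illustrative, not part of the original argument:

```python
import numpy as np

# Physical constants (SI)
k = 1.380649e-23    # Boltzmann constant, J/K
h = 6.62607015e-34  # Planck constant, J s
u = 1.66053907e-27  # atomic mass unit, kg

def sackur_tetrode(m, T, V, N):
    """Entropy per particle (in units of k) of a monatomic ideal gas."""
    lam = h / np.sqrt(2 * np.pi * m * k * T)  # thermal de Broglie wavelength
    return np.log(V / (N * lam**3)) + 2.5

# Illustrative conditions: 1 mole of gas in 22.4 L at 273 K
T, V, N = 273.0, 0.0224, 6.022e23

s16 = sackur_tetrode(15.995 * u, T, V, N)  # oxygen-16
s18 = sackur_tetrode(17.999 * u, T, V, N)  # oxygen-18

# The mass difference shifts the absolute entropy per atom only slightly:
# the difference is exactly (3/2) ln(m18/m16), about 0.18 k per atom.
print(s18 - s16)
```

Note that the entropy of mixing itself, -N k ∑ x_i ln x_i, does not depend on the masses; the masses enter only through the absolute entropies of the separate gases, which is why the two civilizations in the story can calculate different answers.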

Let us suppose on one side of the planet that there is a sufficiently advanced civilization that has the ability to create a machine that can measure this difference in masses between O-16 and O-18. Further suppose they have the ability, then, to measure the entropy of mixing (I have no idea how, but I'll assume, somewhat artificially, that if they can calculate it, they can measure it - yes, I know there are problems with that, but we'll save it for another time). The caveat here that I alluded to earlier is that the difference be large enough such that any machine that makes the measurement is not constrained by the uncertainty principle.

Now suppose on the opposite side of the planet there is a civilization that is at the technological level of the 18th century. To them, they are unaware that there are even isotopes of oxygen (they don't even know what an isotope is). If someone were to present them with such a setup (two chambers, one with O-18 and one with O-16), their measurement devices would not be sophisticated enough to tell the difference between the two. They, in fact, could not even calculate the difference, even armed with the Sackur-Tetrode equation since, as far as they are concerned, there's only one mass for oxygen.

So if both experiments are carried out simultaneously, one civilization will find a non-zero entropy of mixing while the other will find it is zero. Measurements then will back up the results. Which result is correct and real? I say the more sophisticated one is real because I say reality, at least at this mesoscopic level, is independent of the state of knowledge of the experimenters. Yes, at some point you will hit a quantum level at which point I could conceivably see it as possible that the state of knowledge of the experimenters could make a difference in reality. Here's my response to that:

The experimenters in all cases are a part of the system. In the mesoscopic situation, we ignore their contributions as small. As long as the system remains large enough that quantum effects are minimized, this is perfectly valid. But if the experimenters' state of knowledge suddenly does become relevant (say, when the number of molecules is very small), we're in the quantum realm and we likely would not use the Sackur-Tetrode equation, since that is really a coarse-grained, semi-classical result anyway.

All of this, then, boils down to the definition of entropy. If we take the von Neumann entropy as the most general definition (actually, I believe the Kolmogorov definition is more general, though I may be wrong on that) and assume that, as the system of interest becomes more classical, the coherences in the density matrix go away, we end up with the usual definition of entropy. But I can still interpret that entropy in the way I have proposed.
+Ian Durham I find that somehow unsatisfactory. The question to me was, could the entropy difference be detected without knowing how to distinguish O16 and O18? If you handed the less advanced civilisation two boxes, one of pure O16 and the other pure O18, could they tell the difference purely by measuring a thermodynamic quantity? If not, I'd say entropy depends on the knowledge of the observer.
+Rahul Siddharthan I have two responses to that. The first is: it depends entirely on the accuracy of their measuring devices (not to mention the definition of a `thermodynamic quantity'). My second response is: since, as I pointed out above, entropy depends on mass and, via the thermodynamic identities, one can relate changes in entropy to changes in thermodynamic quantities, the less advanced civilization actually ought to find a difference if their measuring devices are accurate enough, even if they only measure thermodynamic quantities (whatever that means - remember, there's this unbroken line of connections going from macroscopic device down to microscopic molecules, so even the concept of `measurement' is a bit fuzzy).

The thermodynamic identities, in fact, are one major reason I have trouble interpreting the entropy as being an observer-dependent quantity. Since the thermodynamic identities relate entropy to thermodynamic quantities, this would imply that I could make temperature, chemical potential, and even the number of particles observer dependent quantities as well.
+Ian Durham - show how to use the thermodynamic identities to devise an experiment that measures the difference in entropy between a jar of ethanol made with O16 and a jar containing a mixture of ethanol made with O16 and ethanol made with O18. I had fun thinking about this - you will too!
+John Baez: "I am much more comfortable with interpreting it as a
measure of the number of possible configurations" -- I agree
completely, but distinguishability has a direct impact on this
interpretation, because it materially changes the number of possible
configurations (divides it by the number of permutations). To me, it
is a statement about how things are, and not about the state of our
knowledge. Particles are indistinguishable not if we currently can't
tell them apart, but if no physical experiment can detect that they
have been swapped. And that is a pretty strong statement about
physical reality. What's important (and fascinating) is that we can and do know about this fundamental indistinguishability, because it has measurable consequences.

So to me, if we can't distinguish between two isotopes, we calculate
their mixing entropy incorrectly, that's all there is to it.
+Dmitri Manin - I can't see who said "I am much more comfortable with interpreting it as a measure of the number of possible configurations". It wasn't me.
+Dmitri Manin wrote: "So to me, if we can't distinguish between two isotopes, we calculate their mixing entropy incorrectly, that's all there is to it."

That's true in a sense, but as a theoretical physicist I assume all our theories will ultimately be shown incorrect - so instead of concerning myself only with the 'ultimate truth', I try to do calculations relative to a clearly stated theory, and also understand how calculations in one theory relate to calculations in another. So, it's not enough for me to know that we get the 'incorrect' entropy if we are unable to distinguish situations that we later learn to distinguish. Any entropy we compute now may turn out to be 'incorrect' later. So, I am interested in understanding when working with an 'incorrect' entropy is able to deliver accurate predictions, and when it breaks down.

For example, chemists often ignore the isotopic composition when computing or 'measuring' entropy. This often does not cause any trouble.

For example, in the example I gave, the second law would hold with either the 'incorrect' entropy or the 'correct' one, and I could compute the heat produced by burning this ethanol and get almost identical answers in either the 'incorrect' or 'correct' formalisms.

On the other hand, when we move to nuclear physics, of course, counting two different isotopes as identical can give dramatically wrong answers. Now the difference really makes a difference.

We can imagine further deeper layers of physics that call for even further refinements in how we compute entropy, but this does not mean that all our previous calculations gave completely inaccurate answers.

There is a good reason for these phenomena, and it's fun to think about.

In classical physics, I'd think about these issues by considering two phase spaces, X and Y, the second being a quotient space of the first. In other words, we have an onto but not one-to-one map P: X -> Y.
Y is the 'coarser' description of the system and X is the 'finer' one. Sometimes the flow on X describing the 'true dynamics' will project down to give a flow on Y. In this situation we can do statistical mechanics with either phase space and never run into trouble. More commonly, the flow on X will just 'approximately' project down to a flow on Y, so the coarser description gives just approximately correct results. We could flesh this out into a nice mathematical picture and prove theorems about it.
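A toy version of this picture, with a finite set standing in for the phase space X. All names and the dynamics here are invented for the sketch; the point is only that the 'true' flow on X induces a well-defined flow on the quotient Y when it ignores the forgotten degree of freedom:

```python
# X: each state is (isotope label, position); Y forgets the isotope label.
X = [(iso, pos) for iso in ('O16', 'O18') for pos in range(4)]

def flow_X(state):
    """'True' dynamics on X: shift the position, leave the label alone."""
    iso, pos = state
    return (iso, (pos + 1) % 4)

def P(state):
    """Quotient map P: X -> Y (onto, not one-to-one): keep only the position."""
    iso, pos = state
    return pos

# Check the flow projects down exactly: P(flow_X(x)) depends only on P(x).
projected = {}
for x in X:
    y, image = P(x), P(flow_X(x))
    assert projected.setdefault(y, image) == image  # well-defined on Y
print("flow on X projects to a well-defined flow on Y")
```

In the more common, interesting case the projection only approximately commutes with the dynamics, and the coarse description gives approximately correct statistics.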

In quantum mechanics it works a bit differently, but there's still a story like this to be told.
Sorry, that was a quote from +Ian Durham indeed. Re: "it's not enough for me to know that we get the 'incorrect' entropy if we are unable to distinguish situations that we later learn to distinguish". I think, though I may well be wrong, that you didn't quite get my point. Which is: there are things which we will never "learn to distinguish", and it is possible to know that for certain.

Do you think it is possible that we will learn to distinguish individual electrons?
No, I don't think we'll learn to distinguish individual electrons. I don't claim to know this for certain, but I'd be glad to take bets against people who think otherwise.

What interests me is something else: there are many cases where we have a 'hierarchy of theories': several useful descriptions of the same system, some more detailed than others. Instead of just saying one is right and the others are wrong - or that all of them are wrong and some theory we don't know yet is right - we take advantage of this hierarchy. We use the more detailed theories only when we really need them, using the simpler ones whenever we can get away with it, because they make computations easier. In these situations we may have several different concepts of entropy, which are all useful.
+Ian Durham wrote: "This is why I also am uncomfortable with interpreting entropy as a measure of knowledge or ignorance. I am much more comfortable with interpreting it as a measure of the number of possible configurations or outcomes of measurement (perhaps weighted). This makes it independent of our experimental capabilities."

I don't see how the outcomes of measurements are going to be independent of our experimental capabilities. The greater our capabilities, the more possible outcomes, right?

Here's what I'm thinking:

Classically, entropy is the sum over configurations i in some set X of

-p_i log(p_i)

(We could do an integral instead, or do things quantum-mechanically, but I think those are irrelevant complications for the point I'm trying to make.)

This will clearly depend on what we think the set X is, and also on what we think the probabilities p_i are.

The dependence on the probabilities p_i leads us into the usual argument about how subjective these probabilities are: we've got subjective Bayesians, objective Bayesians, frequentists and other kinds of people with other views. I'm a Bayesian, so I believe the probabilities p_i depend on my knowledge or ignorance of what's going on.

But I think right now we're mainly talking about something else: the dependence on the choice of the set X. (Though maybe it's just me who is talking about that.)

The example where one chemist thinks he has molecules of ethanol in a jar, and another thinks he has molecules of two kinds of ethanol, made from two isotopes of oxygen, is a case where two people use two different sets X and compute two different values of entropy. What's interesting is that for predicting the results of a large class of experiments, either entropy works just fine!
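The dependence on the choice of X can be seen in a few lines. The 50/50 isotopic composition below is an assumption made purely for illustration:

```python
import numpy as np

def H(probs):
    """Shannon entropy -sum p_i log p_i, skipping zero probabilities."""
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 0]
    return -np.sum(probs * np.log(probs))

# Chemist 1's set X: one species of ethanol. Per-molecule species entropy:
S_coarse = H([1.0])  # zero: no uncertainty about which species

# Chemist 2's set X: two isotopic species, assumed mixed 50/50:
S_fine = H([0.5, 0.5])  # log 2 per molecule

print(S_coarse, S_fine)
```

Both values are 'correct' relative to their own choice of X, which is exactly why measuring heats of combustion and the like won't easily settle which one to use.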

You might hope to settle this dispute by simply 'measuring' the entropy. This leads to some fun concrete puzzles. I've challenged everyone to take advantage of the thermodynamic relations to come up with a way to 'measure' the entropy of a jar of ethanol, thus deciding which entropy is correct, but nobody has tackled that puzzle yet.
+John Baez: just to register that I have no disagreement with your comments on the hierarchy of theories. On your puzzle: if we put the jar in a centrifuge, the temperature of the mix will increase slightly, while the temperature of the pure ethanol will remain constant, I think. This will correspond to the entropy decrease of the mix because of partial separation. I'm not 100% sure though.
All - to get past confusions about measurable mass difference, centrifuges and whatnot, suppose we focus on mixing of two kinds of methanol: CH3OD and CH2DOH. (D is deuterium.) Both have the same mass and I think will be extraordinarily difficult to separate - but they are different.
That's a nice idea, +Rahul Siddharthan! There will probably be some other, even sneakier, way to distinguish these molecules and even to extract useful work from the reduced entropy of two jars of methanol in which these two molecules have been separated out, as compared to two jars in which they're mixed. But it's getting harder.

It would be even harder if we had two big organic molecules and the only difference was which location deep inside held the deuterium atom!
And by the way, regarding +Dmitri Manin's question: I wasn't just saying I bet we'll never be able to distinguish electrons because it seems unlikely. I bet it for the reason I think he'd bet it: quantum mechanics seems to give a sure-fire way to prove things are indistinguishable, namely by effects involving quantum phases. I have no idea how to cook up a theory that mimics known physics but where electrons turn out to be distinguishable. (I also have no real motivation to try.)
+John Baez A challenge! I love a challenge. ;) Seriously, I will try to come up with a demonstration using the two types of methanol that +Rahul Siddharthan has suggested.

Incidentally, I think that, at the quantum level, the entropy does depend on the experiment, but it simply makes no sense to me beyond that (unlike Bohr, I don't think there is a hard and fast cutoff between quantum and classical; I think it just starts to "wash out" as the system gets larger and larger). Either way, I think attempting to meet your challenge will be a good thing, if for no other reason than that we might learn something interesting from it.
+John Baez I also think we have to be careful about what we mean when we use the word "distinguishable." Clearly, you and I could both simultaneously carry out an experiment that identifies an electron in our labs (assuming, hypothetically, that either of us actually had a lab). They are distinguishable by the mere fact that they are in different locations. Distinguishability in quantum mechanics becomes an issue for either a single system or two correlated systems. It doesn't apply to two uncorrelated systems.
+Ian Durham: I think there is a completely unambiguous definition: "no physical experiment can determine whether the two particles have been swapped". No quantum effects have been harmed, spatial separation is not an issue.
+Rahul Siddharthan: Nice idea with molecules differing only by the location of D! I also thought about isomers, but they can sometimes be separated with special membranes.
+John Baez: So exactly what is the challenge -- demonstrate a way to measure entropy of a mixture, without separating the mixture, even partially?
+Dmitri Manin - I don't understand what you mean by "this is still separation of the mix". NMR will probably give a better signature of the difference between the species, but we can look for isotopes of other atoms, and, as John says, bury them as deep as we like in a long hydrocarbon chain. However, the point (to me) is, how do you extract useful work from this entropy difference -- indeed, how do you measure it in a thermodynamic experiment (as opposed to an indirect calculation based on first verifying that the samples are pure or mixed)? Actually, it is even more interesting to me if you can experimentally show that they are different, but cannot extract useful work from the entropy difference...
+Rahul Siddharthan: By separation of the mix I mean that the two species will have different distributions over spatial orientations. As for useful work, I think entropy increase does not guarantee that there is work to be extracted (work being extracted does guarantee entropy increase, but not v.v.)

What I propose is exactly showing that they are different without extracting work (but also without proving that it can't be extracted). Centrifuge, of course, is just to enhance the effect -- alternatively, you can just turn over a test tube. Then the change of gravity orientation will cause readjustment of energy distribution over degrees of freedom, in slightly different ways for the mixture and the pure species. This should cause a change in thermodynamic variables, though I can't say precisely what change.

So far I'm relying on the differences in mechanical properties of the species. It would be harder if you can come up with two species with both mass and moment of inertia being exactly equal. Then I'll probably go for the differences in deformational degrees of freedom.
+Rahul Siddharthan From what I have been able to determine, the two compounds you proposed have different freezing points (one is listed as 11ºC and the other is listed as -99ºC). Thus it wouldn't be hard to distinguish them (they are also listed as having slightly different densities).
+Dmitri Manin I'm not sure I understand your comment. I mean, I agree with your definition of distinguishability to an extent, but by virtue of the fact that John and I would be performing separate experiments in separate locations, it still clearly distinguishes them. I guess what I'm saying is that there is more to distinguishability than what you propose. To quote Schumacher and Westmoreland, "[o]nly orthogonal quantum states of a system are completely distinguishable by measurement." Unless our electrons are correlated in some way, my electron and John's electron are not in the same system.
+Ian Durham: Well, I'm afraid I don't quite understand your objection. Your electrons are distinct entities, but they lack individuality. Indistinguishability refers to the latter, not the former. Formally, indistinguishability is a symmetry (of dynamics) with respect to the permutation of the objects.

Suppose you two trapped your respective electrons and went to bed. Next morning you are asked to do an experiment that would determine whether someone sneaked in overnight and swapped your two electrons.
+Dmitri Manin Right, I understand your objection. My reply is that in any realistic situation, I can take steps (all classical) to ensure that no one swapped my electron. In other words, the symmetry is part of a bigger picture here that includes classical measuring devices and human beings. I object to the interpretations that we tend to glibly attach to our quantum results.

Think of it this way. Suppose we have an electron in state | k' > where, following Sakurai, k' is a collective index for a complete set of observables. A second electron is in the state | k'' > where k'' is likewise an index for the second electron. The combined state can be most generally represented by c1| k' >| k'' > + c2| k'' >| k' >. Exchange degeneracy implies that all wave vectors of this form lead to an identical set of eigenvalues when a measurement is performed.

But here's the thing. In the above we have made an assumption that each of the electrons is perfectly isolated from its environment. In practice that is virtually impossible to do. Thus, suppose my electron just came out of a Stern-Gerlach device and is in the state |z+>. John's, on the other hand, just came out of a similar device, but is in the state |x+> (note: I object to Sakurai's derivation since he a priori assumes they are correlated and so I say they do not have to be in orthogonal states, i.e. there is nothing preventing this particular arrangement that I have suggested). The combined state is quite clearly |z+>|x+> and thus the electrons are very much distinguishable (and, in this case, essentially as a result of where they are!).
+Ian Durham: I suspect that you forgot to include the coordinates in the state. What you say is that "the electron at coordinates X is in the internal state |x> and the electron at coordinates Y is in the internal state |y>". Then if I carry them around and arrive at "|x> at Y and |y> at X", that's a physically different system. But what I mean by swapping is "the electron that used to be at X in state |x> is now at Y in state |y> and v.v.", and that is not a physically distinct state of affairs. While if they were billiard balls, you could mark them with a marker and easily detect the swap.
Back to Gibbs' paradox. Here's how I see it now: if I'm able to detect the difference between two molecular species, my entropy calculation will show me that it increases when I connect the containers and they mix, which is consistent with the observation that this process is spontaneous and irreversible. If I'm not able to detect the difference, my entropy calculation will show that entropy does not increase when I connect the containers, which is consistent with the observation that nothing changes macroscopically. No paradoxes ensue, and this is consistent (I think) with +John Baez's comments on hierarchy of theories.
+Dmitri Manin You said that indistinguishability implied that "no physical experiment can determine whether the two particles have been swapped." Consider, on the other hand, the two electrons discussed above. Mine is in the |z+> state and John's is in the |x+> state. Let us first suppose that they are not swapped. Suppose I then feed my electron back into my Stern-Gerlach device without switching the measurement axis. I am guaranteed that the electron state upon exiting my SG device is still |z+>. Likewise, John is guaranteed that his will be |x+>.

Now suppose that they have been swapped but John and I do not know it. If we do the same thing as before, I am now unwittingly feeding an electron in the |x+> state into an SG device that measures along the z axis. As such, there is a 50% likelihood that I will find the electron, upon exiting my SG device, to be in the |z-> state. That would give me unambiguous proof that the electrons had been swapped. It only works 50% of the time, but the mere fact that it could work means they are at least partially distinguishable.
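A toy simulation of this argument (note that it deliberately encodes the premise under dispute: that the swapped electron carries its spin state with it into my trap):

```python
import random

def measure_z(state):
    # Born rule for a single spin measured along z:
    # |z+> always yields z+; |x+> yields z+ or z- with probability 1/2 each.
    if state == 'z+':
        return 'z+'
    if state == 'x+':
        return random.choice(['z+', 'z-'])
    raise ValueError(state)

random.seed(0)

# No swap: my electron is still |z+>, so I never see z-.
no_swap = [measure_z('z+') for _ in range(1000)]

# Swap, on the premise that John's |x+> electron is now in my trap:
# roughly half the runs give z-, which would reveal the swap.
swap = [measure_z('x+') for _ in range(10000)]
frac_minus = swap.count('z-') / len(swap)
print(no_swap.count('z-'), frac_minus)
```

Whether the premise in the second branch is physically meaningful for identical particles is exactly what the rest of this exchange is about.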
+Ian Durham: no, if they have been swapped, the one you now have is still in the |z+> state! Imagine if they were coins, and you put yours heads up, and John put his tails up. I then swap them and put John's on your table heads up, and put yours on John's table tails up. You can tell the swap happened, by remembering that your coin had a slight ding, and it's now gone. This won't work with electrons.
I know what you are saying and I disagree with it because there is absolutely no way to prove it. In fact, as stated, it is unprovable by its very nature. Call me an operationalist, but I prefer interpretations that can be experimentally tested. That one can't.

Actually, the more I think about it, the more I think you are misinterpreting distinguishability, exchange degeneracy, etc. In fact, you should read the section in Schumacher and Westmoreland on it (section 4.2).
It is very well provable, because the derivation of Bose and Fermi statistics depends on this indistinguishability, and they have measurable consequences.

But maybe it's better to consider a much more realistic scenario: you and John shoot your electrons towards each other. They scatter and either return to their owners or miss each other and pass through (somewhat deflected in either case). You will not be able to determine whether you got back yours or John's. Unfortunately, I don't know whether this scattering can affect the electrons' spins, but if it can, then you won't be able to tell which electron you got even if you and John had prepared them in a particular spin state.

Of course, it's entirely possible that I'm wrong.
The example you just gave is the one out of Sakurai. I have no problem with that in principle (though, again, I do have a problem with his derivation - there are hidden assumptions in it). But it just simply isn't true that, in the Stern-Gerlach example, one can swap the particles and not have a change in the state as you implied. In fact, this is experimentally provable. I can even do it with interferometers and photons.
I think I'll better understand what you want to prove if you show how you can do it with interferometers. (Section 4.2 in S&W is about distinguishability of states, not of particles.)
Ok, then what is a particle? As far as I am concerned, a particle is just a set of quantum states.
A particle is a subspace of the system's phase space?

For two particles in a 1-d space, phase space is 4-d: (x1, p1, x2, p2). If the wave function of the original system is Psi(x1, p1, x2, p2), the wave function of the system with swapped particles is Psi(x2, p2, x1, p1). (Or maybe I should rather use the Hamiltonian.)
If a particle is a subspace of the system's phase space, as you suggest, then particles are "identified" by coordinates in phase space which means they are identified by spatial location.
So I have my interferometer example (mathematically it's exactly the same as the SG example). I'm trying to figure out how to post it here.
+Ian Durham links/refs please for freezing points and densities. Wiki lists the freezing points of regular methanol (CH3OH) and deuterated methanol (CD3OD) as extremely close. Ditto for water and heavy water. These have different molecular weights. The compounds I suggested have the same m.w. and identical chemistry so I would be astonished at the differences you suggest.
+John Baez OK, regarding your challenge. With the O-16 and O-18, I can do it for small N because O-16 and O-18 have different masses. For large N I will admit it would be very, very difficult to measure. It is nevertheless calculable and thus at least theoretically possible to measure. Which means we're back at the beginning. Here's the problem. If the definition of entropy depends on the experiment, then that implies that the definition of everything else in the thermodynamic identities must depend on it as well. But that's absurd, because it implies the definitions of these things are mutable, which then implies that organizations like NIST have no business existing.

+Rahul Siddharthan I'll take a look at the Wiki pages. I just did Google searches on the two compounds and used what I found. Incidentally, I would expect there would at least be a spectroscopic difference between the two and thus one could use spectroscopic techniques to distinguish them.
Yes, I already mentioned NMR. It wouldn't be hard to cook up compounds where even that wouldn't work (with current, or even with foreseeable, technologies). But to me the spectroscopic difference makes it even more interesting: if you can tell that the compounds are different and can therefore calculate an entropy of mixing, but cannot extract useful work from that, what does the entropy mean?
+Dmitri Manin Here's my attempt at presenting the interferometer example here in this somewhat limited format. Given that it appears you have a copy of S&W lying around, take a look at the MZI shown in Figure 2.1 on p. 21. Suppose I have one with the phase angle set to zero and John has one with the phase angle set to 90º. Both devices have non-destructive detectors as described on p. 86. My initial output will have probabilities equivalent to the |z-> outcome of an SG measurement, while John's output will have probabilities equivalent to those of |x+>. If, using mirrors, fiber optic cables, whatever, I pass my output back through my MZI again, my state has probabilities equivalent to |z+>. If, on the other hand, I use mirrors to divert John's output into my MZI, the probabilities are equivalent to |x+>. Thus, while it isn't always guaranteed to work, I can distinguish the photons 50% of the time. This is the point of quantum cryptography.
+Rahul Siddharthan Define "useful work." Are you talking about mechanical work? Or are you talking about a more general notion of work that could include chemical work? Because I can create a system in which you can get chemical work out of a change in entropy while the mechanical work is zero.

This is why I don't like the old thermodynamic definitions of entropy. If you simply interpret it as a measure of the possible configurations of a system, there is no paradox or problem anywhere.
+Ian Durham Chemical work is fine. Work is a well-defined concept in thermodynamics (and different from the definition in mechanics).
+Rahul Siddharthan Actually, work is not as well-defined as everyone thinks it is. Someone wrote an article in AJP not that long ago that listed all the myriad definitions of work and noted that they don't all mean the same thing. I can't seem to remember who wrote it, but I'll see if I can find it.
I still disagree. For example, is PdV a form of thermodynamic work? If so (and it is in a bunch of my texts), then, since it is also mechanical work, we have a bit of a problem since mechanical work is not terribly well defined.
A long time ago, +Dmitri Manin wrote: "So exactly what is the challenge -- demonstrate a way to measure entropy of a mixture, without separating the mixture, even partially?"

That's one version of the challenge. I can imagine others. But I think +Rahul Siddharthan has taken up the job of posing the challenge quite energetically while I was sleeping here on the other side of the planet, and I'm happy to let him be the official prosecutor. :-)

I'll give away the point of my puzzle. We often 'measure' entropy by using the formula dS = dQ / T, which holds for reversible changes only. But this formula only measures changes in entropy. If we warm up a substance starting from absolute zero and assume it has entropy zero at absolute zero, we can use this to work out its entropy in equilibrium at some higher temperature. However, this method has limitations.

For example, a piece of glass at absolute zero still has nonzero entropy, since it's not really in thermal equilibrium. In equilibrium it would be a perfectly ordered crystal at absolute zero, but it takes a long long long time to reach equilibrium - we say it's 'frustrated'.

It's also challenging to see how to use dS = dQ / T to determine the entropy of a jar of CH3OD and CH2DOH separated by a thin membrane, or the same system where the two substances are mixed.
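The dS = dQ/T bookkeeping can be sketched in code for a hypothetical solid with a Debye-like low-temperature heat capacity C = aT^3 (the coefficient is made up; the point is only that integrating C(T)/T up from absolute zero, with S(0) = 0 assumed, recovers the entropy):

```python
# 'Measure' entropy by integrating dS = dQ/T = C(T) dT / T from
# near absolute zero, taking S(0) = 0 (the third law). For the
# Debye-like form C = a T^3 the exact answer is S(T) = a T^3 / 3.

def entropy_from_heating(C, T_final, steps=100_000):
    dT = T_final / steps
    S = 0.0
    for i in range(1, steps + 1):
        T = i * dT
        S += C(T) / T * dT  # Riemann sum of C(T)/T
    return S

a = 1.0e-3  # illustrative coefficient, arbitrary units
S_numeric = entropy_from_heating(lambda T: a * T**3, 10.0)
S_exact = a * 10.0**3 / 3
print(S_numeric, S_exact)
```

The glass example shows where this recipe fails: the method silently assumes the substance stays in (or near) equilibrium all the way up, and that the entropy really is zero at absolute zero.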
+Ian Durham: let me at this point resort to a reference. By "swapping particles" I mean the action of the exchange operator, cf. . I think your experiment is not relevant to this, but if you insist that it is, I'll try to explicate.

+John Baez: yes, I was thinking about dQ/T, but it's tricky in many respects, especially if we go from absolute zero. I wonder if it's possible to construct a reversible isothermal process leading from the separated to the mixed state and show that it requires an appropriate amount of heat. But that's probably a utopian dream. Also I expect there to be an indeterminacy of the 0/0 kind in the entropy difference when the difference between species vanishes, to account for the Gibbs-paradox-like discontinuity.
We have drifted rather far from the original paper, which claimed (correctly) that measurements reduce entropy and (incorrectly, according to many commenters including myself) that this violates the second law of thermodynamics...

As to what is entropy: I think +John Baez 's definition should be pretty uncontroversial: can we take that as generally accepted? Namely, for a pure state, entropy is zero; for a mixed state, entropy is - ∑ P(i) log P(i) where the sum is over the pure states that form the mixed state, and P(i) is the probability of the i th pure state.

With that definition, it seems clear to me that, if we claim that entropy is a physical quantity independent of our knowledge, so are the probabilities P(i) (and, indeed, the pure states -- a.k.a. microstates -- i that constitute the mixed state -- or macrostate.) I find the claim that these exist objectively hard to swallow. Any classical system is in a specific microstate (pure state). It is not in a mixture of many states. It is our ignorance that leads to a mixed state.

In quantum mechanics it's more complicated since coupling with the environment can lead to a mixed state independent of our ignorance -- but let's keep QM out of the discussion.

Now for the entropy of mixing: there are three situations.

1. We are mixing two substances which are different, and can be separated (eg, water and ethanol). There is an entropy of mixing that can in principle be converted to work. No controversy here, I think.

2. We are mixing two substances that, as far as we know, are identical and we have no way to tell them apart. Since we don't even know whether they are different or not, let alone whether the unmixed boxes contained "pure" samples of each kind, I don't see how we can invoke an entropy of mixing. Example: say, 4-deuterodecane and 5-deuterodecane (i.e., linear-chain decane where a lone deuterium replaces H on the 4th or 5th carbon atom). I believe these will have essentially identical NMR and other spectra, identical chemistry, and indistinguishable physics by current experimental capabilities. We cannot distinguish these two molecules experimentally in any way, or verify that a sample contains only one kind and not the other.

3. We are mixing two substances that we know are different (eg they have different spectra -- eg CH2DOH and CH3OD, as I suggested earlier), but their chemistry and physics are still so nearly identical that there is no way we can separate them or use the entropy of mixing to extract useful work. In this case, we can calculate the entropy of mixing and can claim to be aware that it exists, but is it a useful concept at all?
ps - I'm not sure the Clausius definition of entropy is very useful, though it is certainly an objective physical quantity. The additive constant (as Jaynes points out, really an additive function) is only one aspect of it.
Here's a wild idea: assuming for a moment that we are dealing with ideal gas, compress both parts adiabatically and quasistatically to infinitesimal volume, then remove the wall (which will have no effect), then expand back in the same manner. If it turns out that temperature didn't return to where it was in the beginning, then Delta Q / T will give us the difference in entropy, where Delta Q is the heat required to bring the temperature back where it was.
+Dmitri Manin - I believe your idea will lead to zero entropy of mixing even for two different ideal gases (eg helium and neon) -- i.e. the temperature will return to where it was and no heat will be required to be added. (edit - specifically, gamma = 5/3 for both ideal gases. So the adiabatic compression and expansion will be identical for both gases, in both cases PV^(5/3) constant. At the end of the expansion, if you reach the same volume, you will have the same pressure, and therefore temperature, as before.)

(edit 2) However, removing the wall -- even at infinitesimal volume -- is an irreversible step. (I.o.w. you cannot restore the original state by reversing the adiabatic process that you describe.) So the Clausius formula gives only a lower bound and the actual entropy increase is more (I have no idea how to calculate it with the Clausius method -- I suspect you need a membrane permeable to helium but not neon).
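A quick numerical check of the round trip (ideal monatomic gas assumed, gamma = 5/3): the compression and re-expansion return the gas to its initial temperature exactly, for helium and neon alike, so any entropy of mixing must come entirely from the irreversible wall removal.

```python
# Quasistatic adiabatic process for an ideal monatomic gas:
# T V^(gamma-1) = const, with gamma = 5/3.
gamma = 5.0 / 3.0

def adiabatic_temperature(T_start, V_start, V_end):
    return T_start * (V_start / V_end) ** (gamma - 1.0)

T0, V0, V_small = 300.0, 1.0, 1.0e-6
T_hot = adiabatic_temperature(T0, V0, V_small)       # enormous rise
T_back = adiabatic_temperature(T_hot, V_small, V0)   # back to T0
print(T_hot, T_back)
```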
+Rahul Siddharthan said "As to what is entropy: I think +John Baez 's definition should be pretty uncontroversial: can we take that as generally accepted? Namely, for a pure state, entropy is zero; for a mixed state, entropy is - ∑ P(i) log P(i) where the sum is over the pure states that form the mixed state, and P(i) is the probability of the i th pure state." (By the way, how'd you get the sigma to appear?)

I never disagreed with the mathematical definition of entropy (that would be ridiculous). I disagreed with the interpretation. I do believe that the probabilities exist independent of our state of knowledge. My point all along has been that, since one can relate entropy to regular thermodynamic quantities, if we take it as being subjective as you suggest, then it implies that these other thermodynamic quantities are also subjective. But if that's true, then classical thermodynamics would have no consistency, i.e. I would have no reason to believe a temperature reading for London, for instance, if I wasn't there to measure it myself.

+Dmitri Manin Yes, I am fully aware of what you are referring to regarding symmetric and anti-symmetric states. This is precisely what Sakurai discusses at the beginning of Ch. 6. Photons, though they are bosons, can be used to mimic fermions in certain situations (this being one of them).
+Ian Durham - well, it works because small differences average out: given a pressure and volume for an ideal gas, the distribution of microstates is sharply peaked about the maximum entropy value and additional knowledge will shift that very little... It is well known that, for small systems, concepts like temperature are not well defined.
ps - I got the sigma this time by cutpasting; originally, from a unicode table somewhere.
+Rahul Siddharthan Right, that's what I've been saying all along. One of my pet peeves has always been the way we use language in science. My point is that, while your definition of entropy as subjective works for quantum systems, it doesn't work for classical systems. But we ought to have one definition of entropy. If you think of it the way I do, there's no reason to think of it as all that mysterious - and the subjectivity at the micro-level is accounted for as merely being a manifestation of randomness.
+Ian Durham I thought I was saying the opposite: classically it's 'really' a pure state so the probabilities are entirely from our ignorance. Quantum mechanically it's 'really' a mixed state but we can't calculate the density matrix exactly so I don't think there is a practical difference.
+Ian Durham: I don't understand why you and I can't come to an agreement about distinguishability, but I reluctantly feel it's time to leave it at that.

+Rahul Siddharthan: you are certainly right that with an ideal gas, removal of the wall remains the irreversible step. Reducing volume doesn't change anything if the molecules have zero diameter, and everything is essentially scale-invariant. What I had in mind is compressing to the point where the idealization breaks down. But I need to think it through, it may not work.
+Ian Durham wrote: "I do believe that the probabilities exist independent of our state of knowledge. My point all along has been that, since one can relate entropy to regular thermodynamic quantities, if we take it as being subjective as you suggest, then it implies that these other thermodynamic quantities are also subjective. But if that's true, then classical thermodynamics would have no consistency, i.e. I would have no reason to believe a temperature reading for London, for instance, if I wasn't there to measure it myself."

I don't want to get into a discussion of this now; I just want to register my disagreement. As a Bayesian I believe probabilities have an inherent subjective aspect - but this does not at all mean 'anything goes', or that you have to be there yourself to believe something.

For example, if you flip a coin and cover it up, and I believe you're an honest guy and it's a fair coin, I'll say the chance of it landing heads up is 50%, even if you're in England and I'm here in Hong Kong. But if I believe you're a dishonest guy and you stand a lot to gain from the coin landing heads up, I'll give a different figure of the probability... especially if I have no way to see if you're cheating.

To me the probabilities in statistical mechanics are fundamentally no different. I use all the information I have at my disposal to guess probabilities, and from those I can compute the entropy, the expected values of observables, and so on.
+John Baez Believe it or not, I'm actually a Bayesian as well (or, perhaps "Jaynesian" is the better way of putting it). That is, in fact, precisely why I believe what I believe. The question revolves around the updating of our probabilities - is that a subjective or objective process? I say it is an objective process that results from outcomes of random behavior. I think what you're saying is that it is a subjective process that is based on the updating of our knowledge about the system. When I get back from my son's play I will put an example on my blog that demonstrates why the latter view leads to inconsistencies.

+Dmitri Manin Perhaps it is better just to agree to disagree on this point. It is certainly a tricky subject.
+Rahul Siddharthan: building on the idea of selectively permeable walls: if the species are distinct, it is conceptually possible to create a gate (channel, membrane) selectively permeable for one and not for the other and extract work from mixing (expansion into the full volume). If they are not distinct, it is conceptually impossible. That's where discontinuity comes from. Technically, as the species become ever closer, the efficiency of the gate may go down (maybe even there will be a limit on theoretically achievable efficiency), and so the discontinuity may be smoothed out.
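For scale, the maximum work such an ideal gate could extract is given by the standard ideal-mixing formula (treating the species as ideal gases purely for illustration; real methanol would be far messier):

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K

def max_mixing_work(N, T, x=0.5):
    # Maximum isothermal work from mixing two distinguishable ideal-gas
    # species (fractions x and 1-x of N molecules in total) through
    # ideal semipermeable pistons:
    # W = -N k_B T [x ln x + (1-x) ln(1-x)]
    return -N * k_B * T * (x * math.log(x) + (1.0 - x) * math.log(1.0 - x))

# One mole at room temperature, 50/50 mix: roughly 1.7 kJ.
W = max_mixing_work(6.022e23, 298.0)
print(W)
```

Note the discontinuity Dmitri describes lives entirely in the gate: the formula gives the same ~1.7 kJ whether the species differ by an oxygen isotope or by a whole functional group, while a real gate's efficiency presumably collapses as the species become harder to tell apart.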
+Rahul Siddharthan Can you provide the links to those versions of methanol you refer to? I found a Wiki page for deuterated methanol (CD3OD) but not to CH3OD. I want to make sure we're using the same data.
+Rahul Siddharthan I don't see how pure and mixed states have anything to do with the argument I presented in that blog post.
+Ian Durham I was answering your question about the methanol versions. To get their physical properties, first you have to purify them. I predict that though their spectra are different, their physical properties are too similar for them to be separated.
+Rahul Siddharthan Aaaaaaahh. Doh! Sorry about that. I still disagree with you on that, though. Aside from the fact that a spectrum is a physical property in my book, since they have different binding structures, my guess is that their total binding energy will be different and, as such, they should have ever-so-slightly different masses. Whether that difference is detectable is another thing. It may be so small that any attempt at measurement bumps up against the uncertainty principle.
Again, that was a practical and not a theoretical observation... yes, there will be a tiny relativistic mass difference. Also a not-quite-so-tiny non-relativistic difference in moment of inertia. I doubt that's exploitable but who knows. One can imagine even tinier differences, eg my decane example with D on the 4th or 5th carbon atoms. I don't hesitate to say it is impossible to separate those by any technology available today, and I expect even their spectra will be indistinguishable.
Just for fun, here's how we can exploit the difference in moments of inertia. At the same T, avg. rotational energy will be the same for species A and B, so the one with lower moment of inertia will have a little higher avg angular velocity, but the linear velocity distributions will be the same, because the masses are equal.

If the molecules are roughly stick-like in shape (long cylinders), we can create a microporous membrane with holes through which the molecules can only pass length-wise. So the slower-rotating molecules will have a slightly higher chance to pass through it, and that's enough. Needless to say, the efficiency of this membrane will be minuscule, but this is a matter of principle.
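To put a rough number on the effect (an equipartition estimate with made-up moments of inertia):

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K

def omega_rms(I, T):
    # Equipartition per rotational axis: (1/2) I <omega^2> = (1/2) k_B T,
    # so the rms angular velocity is sqrt(k_B T / I).
    return math.sqrt(k_B * T / I)

# Hypothetical species with equal mass but moments of inertia differing
# by 1%: the smaller-I one rotates about 0.5% faster on average, which
# is all the membrane needs -- in principle.
I_A, I_B = 1.00e-46, 1.01e-46  # kg m^2, illustrative values
ratio = omega_rms(I_A, 300.0) / omega_rms(I_B, 300.0)
print(ratio)  # sqrt(1.01) ≈ 1.005
```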
+Rahul Siddharthan Right, but we can't say for certain that technology won't someday find a way to measure these (as long as there is nothing that theoretically prevents it). So my whole point is simply that "reality" shouldn't depend on our technological capabilities since, if it did, paradoxes would arise.
+Ian Durham again we are back to the question: is entropy "reality"? If we assume that it is knowledge, I don't think any paradoxes arise. I haven't seen any mentioned so far. As Jaynes points out, even the Gibbs paradox is not a paradox if you think of entropy in terms of specifying a macroscopic state appropriately for your knowledge of the system.
+Rahul Siddharthan Did you read my blog post by any chance? The paradox is that, since we can set entropy equal to things that are generally interpreted as being states of reality, you could interpret that changing our knowledge can change reality. That, then, leads to paradoxes (e.g. gravity didn't exist until Newton discovered it).
I hadn't (travelling and on slow 2G link, posting these from my mobile). Still haven't read carefully. Two quick reactions. 1: what's dV? 2. TdS = pdV is an infinitesimal equation and you need to integrate it over a path. Ditto for pdV. To me it seems, in your setup, both observers would agree it tells us nothing.
Ps - also you need to bring in chemical potentials, at least for the observer who thinks they are different gases. My guess is, for that observer the entropy increase will then work out. For the other, zero = zero.