Big data

A petabyte is a lot of information.  But how many petabytes does it take to completely describe one gram of water?  

Let's see:

A bit is the information in one binary decision — a no or yes, a 0 or 1.

• 5 bits: approximate information in one letter of the Roman alphabet.

A byte is 8 bits.

A kilobyte is about a thousand bytes (actually 1024 of them).

• 2 kilobytes: a typewritten page.
• 100 kilobytes: a low-resolution photograph.

A megabyte is about a million bytes.

• 1 megabyte: a small novel or a 3.5 inch floppy disk.
• 2 megabytes: a high-resolution photograph.
• 5 megabytes: the complete works of Shakespeare.
• 500 megabytes: a CD-ROM.

A gigabyte is about a billion bytes.

• 1.25 gigabytes: the human genome, or a pickup truck full of books.
• 20 gigabytes: a good collection of the works of Beethoven.
• 100 gigabytes: a library floor of academic journals.

A terabyte is about a trillion bytes.

• 2 terabytes: an academic research library.
• 6 terabytes: all academic journals printed in 2002.
• 10 terabytes: the print collections of the U.S. Library of Congress.
• 40 terabytes: all books printed in 2002.
• 60 terabytes: all audio CDs released in 2002.
• 80 terabytes: capacity of all floppy disks produced in 2002.
• 140 terabytes: all newspapers printed in 2002.
• 170 terabytes: the searchable part of the World-Wide Web in 2002.
• 250 terabytes: capacity of all zip drives produced in 2002.

A petabyte is about 10^15 bytes.

• 1.5 petabytes: all office documents generated in 2002.
• 2 petabytes: all U.S. academic research libraries.
• 6 petabytes: all cinema release films in 2002.
• 90 petabytes: the "Deep Web" in 2002.
• 130 petabytes: capacity of all audio tapes produced in 2002.
• 400 petabytes: all photographs taken in 2002.
• 440 petabytes: all emails sent in 2002.

An exabyte is about 10^18 bytes.

• 1.3 exabytes: capacity of all videotapes produced in 2002.
• 2 exabytes: capacity of all hard disks produced in 2002.
• 5 exabytes: all the words ever spoken by human beings.
• 9 exabytes: all the genomes of every living person in 2015.

A zettabyte is about 10^21 bytes.

• 50 zettabytes: the information needed to completely describe the state of a gram of water at room temperature.

So, the answer is:

It takes 50,000,000 petabytes to completely describe one gram of water, down to the positions and velocities of the individual subatomic particles...

... limited, of course, by the Heisenberg uncertainty principle!   That's what makes the amount of information finite.
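
If you want to hop up and down the ladder of units yourself, here is a rough sketch of a converter. It assumes the decimal convention, where each step is a factor of 1000 (that's what "about 10^15 bytes" and so on suggest), and the names UNITS and convert are just made up for illustration.

```python
# Decimal prefixes: each step up the ladder is a factor of 1000.
# (Assumption: powers of 10, not the 1024-based binary convention.)
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte",
         "terabyte", "petabyte", "exabyte", "zettabyte"]

def convert(amount, from_unit, to_unit):
    """Convert an amount of data between two of the units in UNITS."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return amount * 1000.0 ** steps

# How many petabytes are in one zettabyte?
print(convert(1, "zettabyte", "petabyte"))

# The 10 terabyte figure from the list, expressed in petabytes.
print(convert(10, "terabyte", "petabyte"))
```

So going from zettabytes to petabytes just means moving two steps down the ladder, which multiplies the number by a million.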

How can we calculate this?  It sounds hard, but it's not if you look up a few numbers.

First of all, the entropy of water! At room temperature (25 degrees Celsius) and normal pressure (1 atmosphere), the entropy of a mole of water is 69.91 joules per kelvin.

To understand this, first you need to know that chemists like moles — and by a 'mole', I don't mean that fuzzy creature that ruins your lawn: I mean a certain ridiculously large number of molecules or atoms, invented to deal with the fact that even a tiny little thing is made of lots of atoms.  By definition, a mole is about the number of atoms in one gram of hydrogen.

A guy named Avogadro figured out that this number is about 6.022 × 10^23. People now call this Avogadro's number. So, a mole of water means 6.022 × 10^23 molecules of water. And since a water molecule is about 18 times heavier than a hydrogen atom, this is about 18 grams of water.

So, if we prefer grams to moles, the entropy of a gram of water is 69.91/18 = 3.88 joules per kelvin.  By the way, I don't want to explain why entropy is measured in joules per kelvin: that's another fun story.
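
Here is the mole-to-gram bookkeeping as a small sketch. The variable names are mine, the molar mass is the round figure of 18 grams per mole used above, and Avogadro's number is rounded to four digits.

```python
AVOGADRO = 6.022e23          # molecules per mole (rounded; the exact SI value is 6.02214076e23)
MOLAR_MASS_WATER = 18.0      # grams per mole, the round figure used above
MOLAR_ENTROPY_WATER = 69.91  # joules per kelvin per mole, at 25 °C and 1 atm

# Dividing a per-mole quantity by grams-per-mole gives a per-gram quantity.
molecules_per_gram = AVOGADRO / MOLAR_MASS_WATER
entropy_per_gram = MOLAR_ENTROPY_WATER / MOLAR_MASS_WATER

print(f"{molecules_per_gram:.3g} molecules in a gram of water")
print(f"{entropy_per_gram:.2f} joules per kelvin in a gram of water")
```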

But what does all this have to do with information? Well, Boltzmann, Shannon and others figured out how entropy and information are related, and the formula is pretty simple: one nat of information equals 1.3806 × 10^(-23) joules per kelvin of entropy. This number is called Boltzmann's constant.

What's a 'nat' of information?  Well, bits of information are a good unit when you're using binary notation (0's and 1's), but trits would be a good unit if you were using base 3, and so on.  For physics, the most natural unit is a nat, where we use base e.  So, 'nat' stands for 'natural'.

Don't get in a snit over the fact that we can't actually write numbers using base e — if you do, I'll just say you're nitpicking, or natpicking! The point is, information in the physical world is not binary — so base e turns out to be the best.
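
To see that bits, trits and nats are just the same quantity measured with different logarithms, here is a minimal sketch. The function name information is made up, and the example assumes all 26 letters of the alphabet are equally likely, which is a simplification.

```python
import math

def information(n_outcomes, base):
    """Information needed to pick one of n equally likely outcomes,
    measured using logarithms to the given base."""
    return math.log(n_outcomes) / math.log(base)

letters = 26  # the Roman alphabet, treating every letter as equally likely

print(information(letters, 2))       # in bits
print(information(letters, 3))       # in trits
print(information(letters, math.e))  # in nats

# One bit is ln(2) nats, so multiplying bits by ln(2) gives nats.
bits = information(letters, 2)
nats = information(letters, math.e)
print(math.isclose(bits * math.log(2), nats))
```

The number in bits comes out a little under 5, which is where the "5 bits per letter" estimate comes from.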

Okay: so, by taking the reciprocal of Boltzmann's constant we see that one joule per kelvin of entropy equals 7.24 × 10^22 nats of information. 

That's all we need to look up.  We can now just multiply and see that a gram of water (at room temperature and pressure) holds

3.88 × 7.24 × 10^22 = 2.81 × 10^23 nats

of information. In other words, this is how much information it takes to completely specify the state of one gram of water.

Or if you prefer bits, use the fact that a bit equals ln(2) or 0.693 nats. Dividing by this, we see a gram of water holds

4.05 × 10^23 bits

of information.  And amazingly, this is something we know quite precisely!  I've rounded off the numbers, but we could actually work it out to more decimal places if we wanted. 
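
If you'd like to redo the whole calculation with more decimal places, here is one way to sketch it. The function name information_in_water is hypothetical, the molar mass is the round 18 grams per mole used above, and the last step assumes 8 bits per byte and 10^21 bytes per zettabyte.

```python
import math

BOLTZMANN = 1.380649e-23     # joules per kelvin per nat (exact SI value)
MOLAR_ENTROPY_WATER = 69.91  # J/K per mole at 25 °C and 1 atm, as quoted above
MOLAR_MASS_WATER = 18.0      # grams per mole, the round figure used above

def information_in_water(grams=1.0):
    """Return (nats, bits, bytes) needed to describe `grams` of water."""
    entropy = grams * MOLAR_ENTROPY_WATER / MOLAR_MASS_WATER  # J/K
    nats = entropy / BOLTZMANN    # one nat is BOLTZMANN J/K of entropy
    bits = nats / math.log(2)     # one bit is ln(2) nats
    return nats, bits, bits / 8   # 8 bits per byte

nats, bits, n_bytes = information_in_water()
print(f"{nats:.3g} nats")
print(f"{bits:.3g} bits")
print(f"{n_bytes / 1e21:.3g} zettabytes")
```

Swapping in a more precise molar mass or entropy value only changes the later digits, not the order of magnitude.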

If you want to learn more about this, study statistical mechanics: that's where physics meets information theory.

A bunch of my figures came from here:

• Peter Lyman, Hal R. Varian, Kirsten Swearingen, et al, How much information? 2003,

and the chart originally came from here:

though it was edited by folks at Gizmodo.  All this stuff and more is on this page of mine:

#informationtheory    #bigness  