The reason that professionals care about the curve more than the numbers is that the curve is full of secrets of its own. One of the most interesting secrets is what's called "universality": it turns out that there is a very small set of shapes, and almost every distribution curve you actually encounter in the world, whether you're measuring the frequency at which people use words, the time it takes to get through a checkout line, or the number of marmots in a field, turns out to have one of these shapes.
(Or more specifically, real distributions are almost always combinations of these shapes: say, a bump of one shape here and a curve of another shape there. That usually means that two distinct things are going on, each contributing its own shape, and spotting this can be one of the fastest ways to understand how the system is really working.)
Why are distributions more interesting? Apart from being able to read the secrets of their shapes, the problem with just looking at numbers is what they hide. There's an old joke about two guys in a bar in Seattle, grousing about how broke they are, when Bill Gates walks in. One of them pauses for a sec, thinks hard, then jumps up and yells "Drinks are on me, everyone!" His friend asks him, "Are you crazy? I thought you said you were broke!" "No, I just did the math! On the average, everyone in here is a millionaire!"
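The joke is really about the gap between the mean and the median. Here is a minimal sketch in Python. The net worths of the regulars are made up, and the billionaire's figure is only a rough order of magnitude, so none of this is real data:

```python
import statistics

# Hypothetical net worths (in dollars) of ten regulars: illustrative, not real data.
regulars = [20_000, 35_000, 5_000, 60_000, 12_000, 48_000, 0, 27_000, 15_000, 40_000]
bill_gates = 100_000_000_000  # rough order of magnitude, for illustration only

patrons = regulars + [bill_gates]

mean_worth = statistics.mean(patrons)      # pulled up enormously by one outlier
median_worth = statistics.median(patrons)  # the middle patron barely notices him

print(f"mean:   ${mean_worth:,.0f}")
print(f"median: ${median_worth:,.0f}")
```

The median, which is the worth of the patron in the middle, stays at an ordinary figure when the billionaire walks in. The mean jumps into the billions. No single summary number can show that gap, but the distribution does.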
So let's talk about these shapes a bit more. There are four particularly common shapes. (There are a few others as well, but 99.5% of the time what you see are combinations of these four.)
The first is the Gaussian, or bell curve. Gaussians generally happen when something happens roughly the same way every time, and the deviations from that "same way" come from lots of small, independent, random disruptions. For example, if you measure how long it takes you to walk down the same hallway every day, the distribution curve will probably be a Gaussian. The position of the center of the bell curve tells you the average, and its width tells you how big those random disruptions typically are.
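Here is one way to see why bell curves show up. Suppose each walk is a fixed base time plus many small, independent random nudges. Their sum comes out close to a Gaussian, which is the central limit theorem at work. In this minimal Python sketch, the 30-second base time and the fifty nudges of up to ±0.5 seconds each are purely illustrative assumptions:

```python
import random
import statistics

random.seed(1)

def hallway_time(base_seconds=30.0, n_disruptions=50):
    """One simulated walk: a fixed base time plus many small, independent
    random delays or speed-ups (assumed uniform on [-0.5, 0.5] s each)."""
    return base_seconds + sum(random.uniform(-0.5, 0.5) for _ in range(n_disruptions))

walks = [hallway_time() for _ in range(10_000)]
mu = statistics.mean(walks)       # center of the bell
sigma = statistics.stdev(walks)   # width of the bell

# For a Gaussian, about 68% of samples fall within one sigma of the mean
# and about 95% within two sigma.
within_1 = sum(abs(w - mu) <= sigma for w in walks) / len(walks)
within_2 = sum(abs(w - mu) <= 2 * sigma for w in walks) / len(walks)
print(f"mean={mu:.2f}s  stdev={sigma:.2f}s  within 1σ={within_1:.3f}  within 2σ={within_2:.3f}")
```

Try doubling the size of each nudge, for example by using `random.uniform(-1, 1)`. That doubles σ while the center stays at 30 s. This is the sense in which the width measures how big the disruptions are.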
The second is the exponential curve. This looks like a sharp spike followed by a decay. (Specifically, a decay shaped like $F = e^{-ax}$, where $x$ is the value that we're measuring, $F$ is the number of times we saw $x$, and $a$ is a positive constant.) This is most often a sign of queues. For example, if you measure how long it takes to read a small file from a hard disk, and do it over and over again, the frequency distribution of times will look like a Gaussian added to an exponential. Why? Because the time it takes to read from an idle disk is a Gaussian: there's a random amount of time it takes the head to seek to the right position, and then it takes a constant amount of time to do the reading. However, if the computer is busy, then you have to wait in line before that can start, and it turns out that the time spent waiting in line is usually very close to an exponential. (That fact is something we regularly use in computing to identify when the problem is that things are getting stuck waiting in line.)
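A small simulation can check the queueing claim. This sketch assumes the textbook M/M/1 queue: requests arrive at random with rate λ and are served one at a time, with exponentially distributed service times at rate μ. It tracks each request's wait with Lindley's recursion and compares the result with the known answers. A fraction λ/μ of requests have to wait at all, and those waits are exponential with mean 1/(μ - λ). The two rates are arbitrary illustrative values:

```python
import random
import statistics

random.seed(2)

ARRIVAL_RATE = 0.6   # lambda: requests arriving per ms (illustrative assumption)
SERVICE_RATE = 1.0   # mu: requests the disk can finish per ms (illustrative assumption)

def simulate_waits(n, lam, mu):
    """Time each of n requests spends waiting in line in an M/M/1 queue
    (random arrivals, one server, exponential service times), computed with
    Lindley's recursion: W[k+1] = max(0, W[k] + S[k] - A[k+1])."""
    waits = [0.0]  # the first request finds the queue empty
    for _ in range(n - 1):
        service = random.expovariate(mu)  # S[k]: how long the previous request holds the disk
        gap = random.expovariate(lam)     # A[k+1]: time until the next request arrives
        waits.append(max(0.0, waits[-1] + service - gap))
    return waits

waits = simulate_waits(200_000, ARRIVAL_RATE, SERVICE_RATE)
queued = [w for w in waits if w > 0]

frac_queued = len(queued) / len(waits)
mean_wait = statistics.mean(queued)
sd_wait = statistics.stdev(queued)

# Theory: a fraction lambda/mu of requests wait, with mean wait 1/(mu - lambda).
# For an exponential, the standard deviation equals the mean.
print(f"fraction that had to wait: {frac_queued:.3f} (theory {ARRIVAL_RATE / SERVICE_RATE:.3f})")
print(f"mean wait: {mean_wait:.2f} ms (theory {1 / (SERVICE_RATE - ARRIVAL_RATE):.2f}), stdev: {sd_wait:.2f} ms")
```

Exponential service times are assumed here only because that case has a clean closed-form answer. With other service-time distributions, such as a roughly Gaussian seek-and-read time, the waits are no longer exactly exponential. In heavy traffic, though, they are still approximately so.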
The third shape is a power law. This looks sort of like an exponential, but it has a much heavier tail. (The formula is $F = (x/x_0)^{-a}$, and if you plot it on a log-log chart, it looks like a straight line.) Power laws turn out to be extremely common whenever humans are involved. For example, if you take a large body of text and ask "what fraction of words show up frequently versus rarely," the shape is a power law in essentially every language that's been studied. (In that case, $a$ is about 1.8, in case you're curious. Syntax words like "and" and "the" are the most common, obscure nouns like "zymurgy" are among the rare ones, and there's a long tail of words that appear only once in a corpus.) But power laws show up all over the place: look at the number of pictures people take each week, the number of friends people have, or the size of cities.
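To see the log-log straight line and estimate $a$ from data, this sketch draws samples from a known power law with Python's `random.paretovariate`. It then recovers the exponent two ways. The first is the standard maximum-likelihood estimator for a continuous power law. The second is the slope of the survival curve P(X > x) on log-log axes. The exponent a = 2.5 and x0 = 1 are assumptions chosen for illustration, not the word-frequency value:

```python
import math
import random

random.seed(3)

A = 2.5    # assumed exponent a in F = (x/x0)^(-a); illustrative only
X0 = 1.0   # assumed smallest value x0

# random.paretovariate(alpha) samples the density alpha / x**(alpha + 1) for x >= 1,
# which is a power law with exponent a = alpha + 1.
samples = [X0 * random.paretovariate(A - 1) for _ in range(100_000)]

# Standard maximum-likelihood estimate for a continuous power law:
#   a_hat = 1 + n / sum(ln(x_i / x0))
a_hat = 1 + len(samples) / sum(math.log(x / X0) for x in samples)

def survival(threshold):
    """Empirical fraction of samples greater than threshold, i.e. P(X > threshold)."""
    return sum(x > threshold for x in samples) / len(samples)

# For this power law P(X > x) = (x/x0)^(-(a - 1)), a straight line of slope
# -(a - 1) on log-log axes; measure that slope between two points.
x_lo, x_hi = 2.0, 20.0
slope = (math.log(survival(x_hi)) - math.log(survival(x_lo))) / (math.log(x_hi) - math.log(x_lo))

print(f"estimated a = {a_hat:.3f}; log-log slope of P(X > x) = {slope:.3f} (expected {-(A - 1):.3f})")
```

On real data, the hard part is usually choosing where the power-law tail starts (x0). The estimator above assumes that value is already known.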
It turns out that there are some fairly deep reasons that power laws are so common. For example, imagine you're looking at a network: say, the phone network, or the network of people's friends. Suppose the network formed by "random attachment," meaning any new node that shows up is equally likely to attach to any existing node. Then the distribution of how many neighbors everyone has falls off quickly. It's exponential or bell-shaped, depending on exactly how the links get made, and nobody ends up with vastly more neighbors than average. But if it forms by "preferential attachment," meaning a newcomer is more likely to attach to someone who already knows a lot of people, then you can show mathematically that you get a power law. Lots of real networks grow this way. New people in a social group, for example, are more likely to start out by meeting the really gregarious person in the middle.
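A toy simulation makes the difference between the two growth rules visible. This sketch makes a simplifying assumption: each newcomer makes exactly one link. The only thing that changes between the two runs is how the newcomer picks whom to link to:

```python
import random

random.seed(4)

def grow_network(n_nodes, preferential):
    """Grow a network one node at a time, each newcomer making a single link.
    Returns the degree (number of neighbors) of every node.
    preferential=False: the newcomer links to a uniformly random existing node.
    preferential=True: it links to an existing node with probability
    proportional to that node's current degree."""
    degree = [1, 1]      # start with two nodes joined by one link
    endpoints = [0, 1]   # both endpoints of every link, for degree-weighted picks
    for new in range(2, n_nodes):
        if preferential:
            # A uniformly random link endpoint is a node picked in proportion to its degree.
            target = random.choice(endpoints)
        else:
            target = random.randrange(new)
        degree[target] += 1
        degree.append(1)
        endpoints.extend((target, new))
    return degree

N = 50_000
uniform = grow_network(N, preferential=False)
pref = grow_network(N, preferential=True)

for name, deg in (("uniform", uniform), ("preferential", pref)):
    big = sum(d >= 20 for d in deg)
    print(f"{name:>12}: max degree = {max(deg)}, nodes with 20+ neighbors = {big}")
```

Under uniform attachment, the share of nodes with each degree roughly halves at every step up, so degrees stay small. Under preferential attachment, a handful of early nodes become enormous hubs, which is the heavy tail of a power law.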
(This is really important for practical applications, too, because preferential-attachment networks have all sorts of other interesting features. For example, a random-attachment network doesn't become disconnected (no longer joining everyone) until you blow up a lot of links. A preferential-attachment network, on the other hand, can disconnect very quickly if you lose its most central nodes. If your network is the Internet, then you really want to keep it from being disconnected, so it's important to know which parts are the most critical. On the other hand, consider the graph of people who were in physical contact with each other, which is a preferential-attachment network. It also happens to be the graph along which contagious diseases spread, so you might be very interested in making this network disconnect. That tells you that investing your money in making sure contagious diseases don't spread through the people with the most connections is a much better bet than trying to protect everyone equally. It turns out that immunizing a hermit doesn't do nearly as much as immunizing, say, the janitor of a large building.)
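A small simulation can also check the fragility claim. This sketch grows a preferential-attachment network in which each newcomer links to two existing nodes, which is an illustrative assumption. It then compares the size of the largest connected piece after deleting a random 10% of nodes versus the best-connected 10%. The helper names are hypothetical, written only for this example:

```python
import random

random.seed(5)

def preferential_network(n_nodes, links_per_node=2):
    """Edge list of a network grown by preferential attachment: each newcomer
    links to `links_per_node` distinct existing nodes, picked in proportion to degree."""
    edges = [(0, 1), (1, 2), (2, 0)]   # start from a small triangle
    endpoints = [0, 1, 1, 2, 2, 0]     # each edge lists both endpoints here
    for new in range(3, n_nodes):
        targets = set()
        while len(targets) < links_per_node:
            # A uniformly random edge endpoint is a node chosen in proportion to its degree.
            targets.add(random.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

def largest_component_fraction(n_nodes, edges, removed):
    """Fraction of all n_nodes that sit in the largest connected piece
    once the nodes in `removed` (and their links) are deleted. Uses union-find."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        if a not in removed and b not in removed:
            parent[find(a)] = find(b)
    sizes = {}
    for v in range(n_nodes):
        if v not in removed:
            root = find(v)
            sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n_nodes

N = 20_000
edges = preferential_network(N)
degree = [0] * N
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

k = N // 10  # remove 10% of the nodes either way
hubs = set(sorted(range(N), key=degree.__getitem__, reverse=True)[:k])
randoms = set(random.sample(range(N), k))

intact_frac = largest_component_fraction(N, edges, set())
random_frac = largest_component_fraction(N, edges, randoms)
hub_frac = largest_component_fraction(N, edges, hubs)
print(f"intact: {intact_frac:.3f}  random 10% removed: {random_frac:.3f}  hubs removed: {hub_frac:.3f}")
```

Deleting the hubs here plays the role of immunizing the best-connected people. The same number of removals shrinks the largest connected piece much more when they are aimed at the hubs.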
And then there's a fourth common shape, the Tracy-Widom curve. This one looks sort of like a lopsided bell curve: it falls off very steeply on the left and more gently on the right. It shows up all over the place as well, especially in systems whose parts interact strongly with one another. However, we don't really understand why it's so common yet: it's proven a tougher nut to crack than the power law.
However, there's been some recent progress: it turns out that this curve may be happening whenever there's a certain kind of phase transition behavior in the system, similar to ice melting or water boiling. And this article will tell you more about that.
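Here is a concrete case where the Tracy-Widom curve is proven to appear (Baik, Deift and Johansson, 1999). Take a random shuffle of n numbers and measure the length L of its longest increasing subsequence. For large n, (L - 2√n)/n^(1/6) follows the Tracy-Widom distribution, the same one that describes the largest eigenvalue of large random matrices. This Python sketch estimates that quantity by simulation. The choices of n = 1000 and 1000 trials are arbitrary:

```python
import bisect
import math
import random
import statistics

random.seed(6)

def longest_increasing_subsequence(seq):
    """Length of the longest strictly increasing subsequence (patience sorting, O(n log n))."""
    tails = []  # tails[k] = smallest possible last element of an increasing run of length k + 1
    for x in seq:
        i = bisect.bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

def scaled_lis(n):
    """(L_n - 2*sqrt(n)) / n**(1/6) for one uniformly random permutation of size n."""
    perm = list(range(n))
    random.shuffle(perm)
    return (longest_increasing_subsequence(perm) - 2 * math.sqrt(n)) / n ** (1 / 6)

samples = [scaled_lis(1_000) for _ in range(1_000)]
mean = statistics.mean(samples)
sd = statistics.stdev(samples)
skew = sum(((s - mean) / sd) ** 3 for s in samples) / len(samples)  # sample skewness

# The limiting Tracy-Widom (GUE) values are roughly mean -1.77, sd 0.90, skewness 0.22;
# convergence in n is slow, so expect only rough agreement at this size.
print(f"mean={mean:.2f}  sd={sd:.2f}  skewness={skew:.2f}")
```

The sample should be slightly skewed toward the right. Convergence is slow, though, so at this size the match with the limiting curve is only rough.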
The key takeaways are:
* Distributions matter! Averages and so on lie, because they hide things. If you really want to understand something, always demand the distribution.
* Distributions always seem to be combinations of a handful of standard shapes. If there's more than one shape in a distribution, you're seeing several different physical processes at once, and each shape tells you a story.
* There are four standard shapes, and we understand three of them pretty well and know how to read stories from them: Gaussians telling you about random events, exponentials about something waiting in line, power laws about humans or biology being somehow involved, or about preferential attachment, or a few other similar things. There's a fourth one which we're only starting to understand, but it keeps showing up, too.
* Statistics is cool, because it reveals the secrets of the universe and helps you fix problems.
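The first takeaway is easy to act on. A tiny text histogram is often enough to see the shape behind an average. This sketch draws a bell-shaped sample and an exponential one with the same 10 ms mean, then prints both. The helper name and the two latency samples are hypothetical:

```python
import random
import statistics

random.seed(7)

def text_histogram(values, bins=10, width=40):
    """Return one text row per equal-width bin, with a bar of '#' characters
    whose length is proportional to the count in that bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # avoid a zero step if all values are equal
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1  # clamp the max value into the last bin
    peak = max(counts)
    return [f"{lo + i * step:8.2f} | {'#' * round(width * c / peak)}" for i, c in enumerate(counts)]

# Two hypothetical latency samples with the same mean (10 ms) but different shapes.
bell = [random.gauss(10, 1) for _ in range(5_000)]
queue = [random.expovariate(1 / 10) for _ in range(5_000)]

for name, data in (("bell-shaped", bell), ("exponential", queue)):
    print(f"{name}: mean = {statistics.mean(data):.1f} ms")
    print("\n".join(text_histogram(data)))
```

The two means match, but the histograms tell completely different stories. One shows a steady process with a little noise. The other looks like time spent waiting in line.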
These questions sound like those old "If cars improved as much as computers, they'd run on a thimble of gas" jokes. (GM once responded to those with a list of its own "if cars were like computers" items, such as: "Sometimes your car would inexplicably stop while you're driving it, and you would be expected to pull off the highway and start it again before continuing. This would somehow be accepted as normal.")