What does a scientist mean when she talks about a theory?
Scientists and mathematicians often borrow words from common speech and give them new meanings, usually only loosely related to the original ones. In math, for example, we have groups, fields, rings, sheaves, stacks, and so on. In physics, the color of a quark has nothing to do with visible light. We physicists also talk about fields, but we mean something completely different from what mathematicians mean: the finite field ℤ₂ has absolutely nothing to do with the electromagnetic field, and a plot of treeless land is only a very weak metaphor for either meaning.
The same thing happens when we speak of a scientific theory. The colloquial meaning of the word "theory" is a "hunch" or a "guess". To a scientist, the word "theory" means something much more complicated.
In 1960 the psychologist Peter Wason devised a simple game he called the "2-4-6 game" to illustrate the process of scientific exploration and theory building, and also to try to understand how untrained minds think.
The game has two players; let's call them Alice and Bob. Alice thinks of a rule describing triples of natural numbers, such as "any three even natural numbers", "any three strictly increasing natural numbers", or "any three consecutive natural numbers". She writes the rule down on a piece of paper, then tells Bob one example triple that passes her test. In Wason's experiment, the rule was "any three strictly increasing natural numbers", and Bob was told that the triple (2,4,6) passed the test.
Bob's job is to figure out what the rule is. He can propose triples and Alice has to answer truthfully whether the triple fits the rule or not. When Bob thinks he knows what the rule is, the game ends. If he is correct, he wins; if he is incorrect, Alice wins.
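The rules of the game are simple enough to write down as a short program. The Python sketch below is only an illustration: the class `Alice`, its methods `ask` and `judge`, and the guess `equal_steps` are hypothetical names chosen here, not anything from Wason's paper. Alice's rule is just a function that says yes or no to a triple, and she keeps a log of every question. Because two rules cannot be compared exhaustively over all natural numbers, `judge` assumes that agreeing on every triple with entries from 1 to `limit` (an arbitrary bound, 20 by default) is good enough to count as "the same rule".

```python
from itertools import product
from typing import Callable, List, Tuple

Triple = Tuple[int, int, int]
Rule = Callable[[Triple], bool]


def strictly_increasing(t: Triple) -> bool:
    """The rule from Wason's experiment: any three strictly increasing natural numbers."""
    a, b, c = t
    return a < b < c


def equal_steps(t: Triple) -> bool:
    """A hypothetical guess: the numbers go up by the same amount each time."""
    a, b, c = t
    return b - a == c - b > 0


class Alice:
    """Holds the secret rule and answers Bob's questions truthfully."""

    def __init__(self, rule: Rule, example: Triple) -> None:
        if not rule(example):
            raise ValueError("the example triple must pass the rule")
        self._rule = rule
        self.example = example
        self.log: List[Tuple[Triple, bool]] = []  # every question and answer, in order

    def ask(self, t: Triple) -> bool:
        """Answer one of Bob's questions and record it in the log."""
        answer = self._rule(t)
        self.log.append((t, answer))
        return answer

    def judge(self, guess: Rule, limit: int = 20) -> bool:
        """Bob wins if his guess agrees with the secret rule on every triple
        whose entries lie in 1..limit (a finite stand-in for 'the same rule')."""
        return all(guess(t) == self._rule(t)
                   for t in product(range(1, limit + 1), repeat=3))


alice = Alice(strictly_increasing, (2, 4, 6))
for t in [(8, 10, 12), (3, 6, 9), (10, 9, 8)]:
    print(t, "Yes" if alice.ask(t) else "No")
print("Bob wins with equal steps?", alice.judge(equal_steps))  # False
print("Bob wins with strictly increasing?", alice.judge(strictly_increasing))  # True
```

The guess `equal_steps` loses because a triple such as (1,2,4) is strictly increasing without going up by equal amounts, so the two rules disagree somewhere in the checked range.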
Wason's results were staggering: only six of the twenty-nine adults got the rule on their first try! A typical session went as follows:
B: (8,10,12)? A: Yes.
B: (14,16,18)? A: Yes.
B: (4,6,8)? A: Yes.
B: (3,6,9)? A: Yes.
B: (5,10,15)? A: Yes.
B: I think it's any three numbers that go up by the same amount each time.
Having been forewarned, we can easily see where Bob went wrong: he saw a pattern in the first example and then tested only whether that pattern continued to hold. Unfortunately, the pattern he perceived was too specific. How could Bob have discovered this?
The only way to test his hypothesis is to look at a case where Bob expects the answer to be "no". He never tried a triple like (1,4,5), where the numbers go up by different amounts; if he had, he would have seen that his rule predicted "no" while Alice answered "yes". In fact, given the transcript above, Alice's rule could have been "any three numbers", because Bob never got a "no" answer!
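Here is a small Python sketch of that argument. The three candidate rules and the helper names `predictions` and `is_informative` are illustrative choices, not something Bob or Wason wrote down. The idea is that a question can only eliminate a hypothesis when at least two hypotheses predict different answers for it; a question on which every candidate predicts "yes" teaches Bob nothing, no matter how many times he asks it.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[int, int, int]
Rule = Callable[[Triple], bool]

# Candidate rules that all fit the transcript (illustrative; Bob only considered the first).
HYPOTHESES: Dict[str, Rule] = {
    "equal steps": lambda t: t[1] - t[0] == t[2] - t[1] > 0,
    "strictly increasing": lambda t: t[0] < t[1] < t[2],
    "any three numbers": lambda t: True,
}

# The triples Bob asked about in the session above.
TRANSCRIPT: List[Triple] = [(8, 10, 12), (14, 16, 18), (4, 6, 8), (3, 6, 9), (5, 10, 15)]


def predictions(t: Triple) -> Dict[str, bool]:
    """What each hypothesis predicts Alice will answer for the triple t."""
    return {name: rule(t) for name, rule in HYPOTHESES.items()}


def is_informative(t: Triple) -> bool:
    """True if the hypotheses disagree about t, so Alice's answer rules at least one out."""
    return len(set(predictions(t).values())) > 1


# Every hypothesis predicts "yes" for every triple Bob asked about, so none is eliminated.
print([is_informative(t) for t in TRANSCRIPT])  # [False, False, False, False, False]

# A triple with unequal steps separates "equal steps" from the other two.
print(predictions((1, 4, 5)))
# {'equal steps': False, 'strictly increasing': True, 'any three numbers': True}

# A decreasing triple such as (10, 9, 8) separates "any three numbers" from the others.
print(predictions((10, 9, 8)))
# {'equal steps': False, 'strictly increasing': False, 'any three numbers': True}
```

Note that "any three numbers" predicts "yes" for everything, which is exactly why a transcript of nothing but "yes" answers can never rule it out.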
When I played this with some friends a couple of weeks ago (I was Alice and everyone else was Bob), several interesting things happened. First, a teenager named Andre noticed that he couldn't remember which triples had already been guessed. He wanted someone to write down the questions and my answers.
Keeping careful records is exactly what Tycho Brahe did. Brahe was a Danish astronomer who spent his life making very accurate measurements of the positions of the planets. Telescopes hadn't been invented yet, but he had a large quadrant mounted on a graduated, rotating base. He rotated the base and measured the angular distances between the planets and nearby fixed stars night after night for decades. Tycho, one of the most prominent craters on the Moon, famous for its bright rays, is named after him. His assistant Johannes Kepler discovered that the planets move in ellipses rather than circles, but he was able to do so only because Brahe had collected the data.
The next thing that happened in our game was that one of the players made exactly the same mistake as Bob above and got eliminated.
Next, Andre guessed (10,9,8), a triple that didn't fit the pattern. When I said "no", he was disappointed, as though he had failed. I explained that about 80% of Wason's subjects failed, typically because they never tried a test like that and so guessed rules that were too specific: he was winning!
Finally, people's faces lit up when I pointed out that they were doing physics experiments: making measurements to try to learn a law of nature.
A scientific theory, then, is a set of principles and equations describing some measurable quantities that
1) explains all the observations that the previous theory did,
2) explains some observations that weren't explained by the previous theory,
3) makes predictions about what we should observe when we look in places or ways we haven't before, and
4) says what should not happen.
Note that a theory can restrict itself to a range of conditions: Newtonian physics works well in a "goldilocks" range: neither too big nor too small, neither too hot nor too cold, and not too fast. When things get too small, we have to use quantum mechanics. When things get too fast, we have to use special relativity. When things get too massive, we have to use general relativity. And when things are both very heavy and very small, we have no good theory; quantum gravity is a very hard problem that no one has solved satisfactorily yet. String theory and loop quantum gravity are the major contenders, but neither has yet produced predictions we can test.
The website "29+ Evidences for Macroevolution" is an excellent resource that explains how evolution does all four of the things above, and is therefore a theory in the scientific sense. For every principle of the theory, it gives existing confirmations, predictions that came true, and examples of observations that could falsify that principle. For example, evolution predicts atavisms: features of ancestors that reappear from time to time in groups that have, for the most part, lost those features. Atavisms include humans with tails, extra toes on horses, whales with hindlimbs, and so on. The same logic constrains vestigial structures: "No organism can have a vestigial structure that was not previously functional in one of its ancestors. Thus, for each species, the standard phylogenetic tree makes a huge number of predictions about vestigial characters that are allowed and those that are impossible for any given species."
Wason, Peter C. (1960), "On the failure to eliminate hypotheses in a conceptual task", Quarterly Journal of Experimental Psychology (Psychology Press) 12 (3): 129–140, doi:10.1080/17470216008416717, ISSN 1747-0226