Natural language understanding (NLU) is one of the hardest problems for computers to solve -- but one in which we've made tremendous advances in the past few years. Today, Google open-sourced SyntaxNet, one of the latest such systems, which uses deep neural nets to parse sentences into trees -- the first step in understanding what a sentence means. (It also released a prebuilt SyntaxNet model for parsing English, named Parsey McParseface.)
There are a lot of reasons why language processing is hard. The ambiguity of the meaning of words (bank as in river, or bank as in finance?) is pretty easy to deal with, but often the entire structure of the sentence is ambiguous, and you need not only grammar but world knowledge to parse it. To take an example from this article, in the sentence "Alice drove down the street in her car," does it mean (Alice drove down the street) (in her car), or (Alice drove down) (the street in her car)? You need to know something about how streets and cars work to realize that the first is a lot more sensible than the second.
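To make the role of world knowledge concrete, here is a minimal sketch of choosing between the two attachments of "in her car." The plausibility table and the function name `best_attachment` are invented for illustration, and the numbers are made up; this is not how SyntaxNet works, just a toy picture of "knowing how streets and cars work."

```python
# Hypothetical hand-written "world knowledge": how plausible is it for the
# phrase (preposition, noun) to modify a given head word? Values are invented.
PLAUSIBILITY = {
    ("drove", "in", "car"): 0.9,    # people drive in cars
    ("street", "in", "car"): 0.05,  # streets are rarely found inside cars
}


def best_attachment(candidate_heads, prep, noun):
    """Return the candidate head that the prepositional phrase most plausibly modifies."""
    return max(
        candidate_heads,
        key=lambda head: PLAUSIBILITY.get((head, prep, noun), 0.0),
    )


# "Alice drove down the street in her car": does "in her car" attach to
# the verb "drove" or to the noun "street"?
choice = best_attachment(["drove", "street"], "in", "car")
print(choice)  # drove
```

Grammar alone produces both candidates; only the table, standing in for knowledge about the world, picks between them.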
In fact, language understanding is what we refer to as "AI-complete": to solve the problem requires solving the entire problem of human-level intelligence. (Fortunately, we can still solve the large majority of language understanding -- enough to be practically very useful -- without doing that.)
You can think of language understanding as happening in several steps, with a lot of ambiguity that is only gradually resolved. (I'm going to skip speech understanding, which has even more ambiguities in it as we try to resolve sounds into words.) First, you might break the sentence into a tree, showing that this is a verb and it affects these nouns and so on. As in the Alice driving case, you may end up with a few possibilities. Then, you have to understand what each item in the sentence means; for example, resolving pronouns so that you know who "you" is in this sentence. Then a lot of world knowledge comes in, because many sentences only make sense if you use that information. And if you're lucky, at this point you have a clear and unique interpretation left.
Except, you're rarely that lucky. Here are some fun examples:
I poured water into a glass.
I poured Harry into a glass.
*I poured water after a glass.
The third of these sentences seems grammatically fine at first -- we just switched one preposition for another -- but it makes no sense to pour anything "after" anything else, and that's a property of the idea of pouring. The second sentence is grammatically sensible, but it's a pretty surprising use of the word "pour"; it suggests that Harry was some kind of fluid. The meaning of a surprising sentence is generally the thing that wasn't obvious about it.
You can also look at how sentences relate. Consider:
I poured water into the glass. I poured the glass into the sink.
After the first sentence, you could say "the glass is full of water" -- but saying that requires you to understand that a glass is a container, and that pouring fills its target. In the second sentence, you first resolve the determiner "the," so you know that we're talking about the same glass as before. You also need to know that the object of pouring is either the fluid being poured or the container being poured out of, and that the latter case implies the container is now empty. So the second sentence means that you poured water into the sink from the glass.
But this isn't all! You could also interpret "the glass" in the second sentence as the fluid, and these sentences make perfect sense if we're talking about a pile of molten glass. (In which case your sink has probably had it.) You need world knowledge to differentiate the two.
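The bookkeeping behind the two "pour" sentences can be sketched as a tiny state model. Everything here (the `Container` class, the `pour` function, treating a substance as a plain string) is a hypothetical illustration of the reasoning above, not a real NLU technique:

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Something that can hold substances, such as a glass or a sink."""
    name: str
    contents: list = field(default_factory=list)


def pour(obj, target):
    """Pour `obj` into `target`.

    If `obj` is a Container, read the sentence as "pour out of it": its
    contents move to the target and it ends up empty. Otherwise treat
    `obj` as the substance being poured.
    """
    if isinstance(obj, Container):
        target.contents.extend(obj.contents)
        obj.contents.clear()
    else:
        target.contents.append(obj)


glass = Container("glass")
sink = Container("sink")

pour("water", glass)  # "I poured water into the glass."
pour(glass, sink)     # "I poured the glass into the sink."

print(glass.contents, sink.contents)  # [] ['water']
```

The molten-glass reading corresponds to calling `pour("glass", sink)` instead, with the string naming a substance rather than a container. Nothing in the sentence itself tells you which call to make; that choice is exactly the world-knowledge step.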
To see how even resolving pronouns can be AI-complete, here's a classic example:
WOMAN: I'm leaving you.
MAN: .... Who is he?
To understand who "he" is referring to in the second sentence, you need to understand a huge number of things. The first sentence implies that the woman and the man had a romantic relationship, because to leave a person implies termination of such a relationship, which means one existed beforehand. Now the man is inferring from the woman's statement that she's leaving him for another man, and is really asking who that man is, which requires a fairly complex understanding of the way romantic relationships work in certain societies. And for us to understand his sentence requires us to understand what the man thinks that the woman is thinking, which means that we need to understand how people understand other people just to resolve the pronoun "he"!
Fortunately, most natural language problems aren't this complicated. That's useful, because people increasingly want to interface with computers using natural language (people don't feed bare noun phrases to speech interfaces the way they do to search engines), people want computers to handle natural language jobs (remember how bad voice mail trees used to be? Not that they're fantastic now), and computers need to understand language in a whole range of contexts, from captioning videos to understanding documents.
The past few years have brought about tremendous shifts in this field, just as they have in the related fields of speech recognition and synthesis and language translation. Things which used to be nearly impossible (even part-of-speech tagging was considered extremely hard just ten years ago!) are now routine -- and I expect that in the next few decades, we'll continue to see tremendous shifts in computers' ability to understand us and speak with us naturally.