First, I'm not sure that the paper actually says "instead of" in there. I couldn't find it. The news sites say "instead of links" but my reading makes it seem like they are talking about "in addition to" links. There was a time on the web when popularity and quality shared the same metrics. That's changed recently - especially with content marketers like us, fake news sites, and sites like BuzzFeed and whatnot.
Links are still a good way of measuring popularity, but not quality. The web has changed such that popular doesn't always mean correct. For some reason we like to share and link to and talk about stupid things (see: the blue and black dress), and the algorithm needs to adapt to handle that for a certain subset of queries.
If you're searching for medical advice, you don't want the "man thought this was a zit, you'll never believe what happens next" video. You want real medical advice. The same is true for all factual queries.
Some people have said that it will require a ton of humans verifying facts. Some have said that it will require a lot more expansion of the knowledge graph. Some have suggested this will be easy to game, and some have suggested adding random facts to their sites.
People won't need to validate facts. More KG info will help, but shouldn't be necessary. Yeah, I'm sure you can game parts of it, but it won't be easy, and adding random facts about dresses to your e-commerce site won't help you rank higher.
These people aren't thinking like an algorithm. How does the current algorithm determine relevancy to a term? It uses information retrieval theories similar to but more advanced than tf-idf. It basically means they look at every result that has that term and use math to see how "important" that term is. It's not based on any inherent value that the term has, it's based on the set of results. If every site has the term it's probably not as important. If only a subset of sites mention that term, it's probably more important. Again, this is simplified - there's TONS of math involved here.
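To make the tf-idf idea concrete, here's a minimal sketch of the textbook version. This is not Google's actual scoring, just the classic formula; the function name `tf_idf` and the tiny three-page corpus are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """Classic tf-idf: how often the term appears in this document,
    discounted by how many documents in the corpus contain it."""
    if not doc_tokens:
        return 0.0
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    docs_with_term = sum(1 for doc in corpus if term in doc)
    if docs_with_term == 0:
        return 0.0
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf


# Hypothetical corpus: three "pages", already split into words.
corpus = [
    "the dress is blue and black".split(),
    "the dress is on sale".split(),
    "the recipe is easy".split(),
]

page = corpus[0]
for term in ("the", "dress", "blue"):
    print(term, round(tf_idf(term, page, corpus), 3))
```

"the" shows up on every page, so it scores zero. "blue" only shows up on one page, so it scores higher than "dress", which shows up on two. The importance comes from the set of results, not from the word itself.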
Now, take that concept and apply it to "facts." Actually, a better word than facts would be "statements." If 10,000 websites say "the dress is blue and black" and only 300 say "the dress is gold and white", then to Google the dress is definitely blue and black. It's not a "fact" but it's a "statement" that the corpus of results agrees with.
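Here's a rough sketch of that counting idea, using the same numbers. Big assumption: statements are matched as exact strings after lowercasing, which a real system obviously couldn't get away with (it would have to extract and normalize statements first). The `consensus` function is hypothetical, not anything from the paper.

```python
from collections import Counter


def consensus(statements):
    """Return the most common statement and the share of sources making it."""
    counts = Counter(s.strip().lower() for s in statements)
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    top_statement, top_count = counts.most_common(1)[0]
    return top_statement, top_count / total


# One entry per website making the statement (numbers from the example above).
observed = (
    ["The dress is blue and black"] * 10_000
    + ["The dress is gold and white"] * 300
)

statement, share = consensus(observed)
print(statement, f"{share:.1%}")
```

With those counts, about 97% of the sources agree on "blue and black", so that's the statement the corpus backs.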
When applied to the web as a whole, this is generally pretty reliable. Unless a LOT of sites all have the same false information (see: SEO blogs) the algorithm will mostly get the right answer. Sure, there will be edge cases, and other algorithms will be needed to detect them, but that's true of everything.
The interesting part is, this is how Google Translate works now. It's not built off a database of words and definitions. It's built off of crawling the web. They look at how the language is used, and train the algorithm based on that.
If I were coding this new "fact" algorithm I'd be mixing the concepts of term relevance with how the Google Translate algorithm works. If enough places on the web say it's true, then by golly it's true.
Will there be some edge case issues? I'm certain there will be, just as I'm certain that we have always been at war with Oceania. See how that could be gamed? See how it won't be easy though?
Now, quit panicking and worrying how you can "trick" this algorithm. You can "trick" it just like every other algorithm - by creating something valuable that helps a user solve a problem and telling people about it.