Some things we seriously take for granted, and we shouldn't. This article pokes and prods at one of those areas.
I've said this a few times before, but understanding how search engines might treat named entities, or specific people, places, or things, when they see them in queries is a good thing to know.
The patent I've linked to in this post is about how, when Google recognizes a named entity in a query, it might also include other results that show off related entities. For example, if the query included William Shakespeare, the search results might include plays by Shakespeare.
The patent describes how "authoritative" resources for an entity might be found. One potential source might be an online database such as Freebase, or an online encyclopedia such as Wikipedia.
In place of those, or in addition to them, an authoritative resource might be found through links from a source such as an online encyclopedia that point to something like the "official" web page for the entity.
If those don't exist, or to augment the ones that do, authoritative sources for an entity might be found through a search, taken from the results that rank highly or that exceed a certain threshold score for that search.
In the Google patent, "Query rewriting with entity detection", if Google identified a named entity in a query along with additional words, it might assume that the searcher intended to do a site search on the authoritative pages for that entity. The patent has been republished a couple of times, and the newest version drops the inferred site search idea completely, but that doesn't mean that the original isn't still in use; multiple pages from the same site are still showing up for such queries. The address of the first version of the patent is at:
If you are attempting to optimize for a named entity, whether a specific person, place, or thing (including a brand), learning as much as you can about how Google might treat that named entity, and how it might associate the entity with a specific web page, is probably a good idea.
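To make the inferred site search idea a little more concrete, here is a minimal sketch of how that kind of rewrite could work. The entity-to-site table and the function here are hypothetical stand-ins; the patent describes finding those authoritative pages through sources like Freebase, Wikipedia, or high-ranking search results.

```python
# A toy sketch (not Google's implementation) of the inferred site search from
# "Query rewriting with entity detection". The entity-to-site table below is a
# hypothetical stand-in for the authoritative resources the patent describes.

AUTHORITATIVE_SITES = {
    "william shakespeare": "shakespeare.mit.edu",
    "white house": "whitehouse.gov",
}

def rewrite_query(query: str) -> str:
    """If a query contains a known named entity plus additional words,
    rewrite it as a site search against the entity's authoritative pages."""
    q = query.lower()
    for entity, site in AUTHORITATIVE_SITES.items():
        if entity in q:
            remainder = q.replace(entity, "").strip()
            if remainder:  # entity plus additional words
                return f"{remainder} site:{site}"
    return query  # no entity detected; leave the query alone

print(rewrite_query("william shakespeare sonnets"))
# -> "sonnets site:shakespeare.mit.edu"
```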
I didn't know that he wanted a site online, and it came with his Web access. So he did it.
Took me months before I had the chance to replace it with something better.
Thanks. Verizon. For. Nothing.
You may find yourself asking why the Microsoft link, and honestly it's probably because Microsoft has written much more about page segmentation, and has carried the ideas and concepts behind it much further than Google or Yahoo.
Google does have a few patents directly on the concept of page segmentation, and I included it in my "10 most important SEO Patents" series.
Here are some of the things that Microsoft described in white papers, though:
A block-level PageRank, where links from different blocks or sections on pages would carry and pass along PageRank as if those blocks were pages under the older approach to PageRank.
A way to decide which block on a page is the most important, especially on pages with multiple main content sections, like a magazine-style page with multiple stories, so that the text in the most important block carries the most relevance value.
A way to analyze and understand the different blocks or segments of a page based upon the linguistic features of those blocks or sections, asking questions like the ones below (there's a rough code sketch of these heuristics after this list):
Does the content of that block use mostly full sentences in sentence case, with only the first words capitalized, and full punctuation?
Does the block only contain lists of words/phrases, in title case, and mostly links elsewhere?
Does the block contain a copyright notice, so that it's most likely a footer for the page, and the text within it should be weighted very low from a relevance standpoint?
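Here is a minimal sketch of what classifying blocks on those kinds of linguistic features might look like. The thresholds and rules are my own illustrative guesses, not anything taken from the Microsoft papers or a patent.

```python
import re

def classify_block(text: str, link_ratio: float) -> str:
    """Rough, illustrative heuristics for labeling a page block based on its
    linguistic features; link_ratio is the share of the block's words that
    are link anchor text."""
    # A copyright notice strongly suggests a footer, which would be weighted
    # very low for relevance.
    if re.search(r"(©|copyright\s+\d{4})", text, re.IGNORECASE):
        return "footer"
    words = text.split()
    sentences = re.findall(r"[^.!?]+[.!?]", text)
    # Full sentences with punctuation and few links look like main content.
    if sentences and len(words) / len(sentences) > 8 and link_ratio < 0.2:
        return "main content"
    # Short title-case phrases that are mostly links look like navigation.
    title_case = sum(w.istitle() for w in words) / max(len(words), 1)
    if link_ratio > 0.5 and title_case > 0.5:
        return "navigation"
    return "other"

print(classify_block("Copyright 2014 SEO by the Sea. All rights reserved.", 0.0))
# -> "footer"
```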
Here are a few posts I've written on web page segmentation, for those of you who want to do a little more investigation on the subject:
Google and Document Segmentation Indexing for Local Search
Google's Page Segmentation Patent Granted
Search Engines, Web Page Segmentation, and the Most Important Block
Breaking Pages Apart: What Automatic Segmentation of Webpages Might Mean to Design and SEO
I've been pretty much avoiding those for some time now. In fact, don't even write at SEL/SEW either anymore. Peeps can fckn PAY ME or I'll stick to things that make me some coin
Google doesn't trust the meta tags people use to tell it what language a page is in, and would rather figure that out on its own. The patent tells us that people rarely use character set and language markup like the following, and that it's often wrong when it is used:
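The kind of markup being referred to typically looks something like this (these particular tags are just illustrative examples):

```html
<html lang="en">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="Content-Language" content="en">
```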
This patent was filed in 2003, and it's possible that Google has been using this process for over a decade. It's quite possible that Google might take passages like the following from many pages on the web and break them down into smaller pieces, often referred to as "n-grams," where "n" refers to a number. With a very large set of n-grams, the search engine might then be able to create statistics about languages, including which language is being used. So, this passage from the patent might be broken down as follows into tri-grams, or three-word bites:
Search engines operate in two capacities, which both require identifying the language in which the content is expressed. First, search engines collect information about potentially retrievable Web content and news messages
Search engines operate
engines operate in
operate in two
in two capacities
two capacities, which
capacities, which both
which both require
.... and so on.
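Generating that kind of breakdown is simple enough to sketch in a few lines; this is just an illustration of sliding a three-word window over text, not anything from the patent itself:

```python
def word_ngrams(text: str, n: int = 3) -> list[str]:
    """Slide an n-word window across the text, as in the tri-gram breakdown
    above (punctuation stays attached to words to keep the sketch simple)."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

passage = ("Search engines operate in two capacities, which both require "
           "identifying the language in which the content is expressed.")
for gram in word_ngrams(passage)[:6]:
    print(gram)
```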
Statistics about the words used can feed into models that can do things such as help identify paraphrases, or find unnatural language such as spun or scraped content.
Documents might be classified through a method like this, and things like meta language tags, and even the country code TLD that a page is on, can be ignored as clues.
The co-occurrence of words within these smaller pieces of content of a page can be used to help understand the language the page is in, and the meaning of the page.
If the word "football" is identified within a page, and the page is classified as being US English, it's likely tied to a different meaning than it would be on a page classified as British English, or one classified as Australian English.
A language attribution model can also be used to identify pages that might be considered gibberish, and webspam generated by scraping together different pieces of content from multiple pages, or by automated translation into one language and then into another.
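As a rough illustration of how n-gram statistics might separate languages (and flag text that doesn't fit any of them well), here is a toy scorer. The tiny frequency tables are hypothetical stand-ins for the very large language models a search engine would actually build:

```python
import math
from collections import Counter

# Hypothetical per-language trigram counts (character trigrams, for brevity);
# a real system would build these tables from very large text collections.
LANGUAGE_MODELS = {
    "english": Counter({"the": 50, "ing": 30, "and": 28, "ion": 20}),
    "german":  Counter({"der": 40, "ein": 30, "sch": 25, "und": 35}),
}

def char_trigrams(text: str) -> list[str]:
    text = text.lower()
    return [text[i:i + 3] for i in range(len(text) - 2)]

def avg_log_prob(text: str, model: Counter) -> float:
    """Add-one-smoothed average log probability of the text's trigrams."""
    total = sum(model.values()) + len(model)
    grams = char_trigrams(text)
    return sum(math.log((model[g] + 1) / total) for g in grams) / max(len(grams), 1)

def likely_language(text: str) -> str:
    # The language whose model best explains the text wins. Text that scores
    # poorly under *every* model would be a candidate for the gibberish or
    # spun-content flag described above (that cutoff would have to be tuned
    # against real models, so it isn't shown here).
    scores = {lang: avg_log_prob(text, m) for lang, m in LANGUAGE_MODELS.items()}
    return max(scores, key=scores.get)

print(likely_language("the man standing there was singing"))  # -> english
```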
There is also a Google Books N-Gram Viewer that can be used to look at the appearance of different words in the English language and when they might have peaked in usage at different times over many generations of books:
Building language models isn't a complete answer to information retrieval, but it is a useful avenue of exploration, and can impact how pages might be ranked in search results by providing more context for the words being used on a page.
There are a lot of steps that you can take if you receive a message or a warning in Google Webmaster Tools, or notice a large drop-off in traffic to your site, but you shouldn't panic. We're going to discuss penalties, and both of us have been working with a lot of clients to diagnose these kinds of problems with their sites. This should be a great discussion, which I'm looking forward to very much.
We'll get to know Panda (again) and our old friend Penguin. We'll discuss the differences between algorithmic penalties and manual penalties... and other types of penalties you may have encountered along the way too.
How do you know if your website is under the dark cloud of a penalty?
If you are under a penalty, what are the things you need to do to start the recovery process? Should you use Google's link "Disavow Tool"? If so, when... and are there some best practices that can be highlighted to improve the chances of a manual reconsideration review after submitting a disavow link file?
What kind of timeline should you expect for a recovery to occur if those processes are implemented.... and is recovery even a possibility?
It can be a very complicated subject for business owners to grasp if they are not immersed in search engine marketing on a daily basis. What also makes it worse is that sometimes business owners have found themselves in this situation because of techniques implemented by agencies they had hired to help them.
We'll try to break it down in a simple way to give the viewers who find themselves in this situation the answers and resources they need so they can turn the page.... and start the road to recovery.
Did the site in question make changes of some type? Did their server or host have problems or make changes?
If they lost some rankings, were their competitors making changes to their site that may have caused those to rank higher?
Did searchers change the way or words that they used to search for the products or services offered by the site?
Usually, a penalty or a search update is best saved for last, after looking at those types of things.
I did have a client who started losing traffic over this past Christmas holiday, and the reason was some server issues that were quickly resolved, and traffic started shooting right back up after they were fixed. Fortunately for them, it was a slow time for what they offered on their site anyway. It does happen, though.
I'm looking forward to the Hangout. Talk to you tomorrow!
PageRank is the algorithm that seems to have set Google apart from the other search engines of its day, but chances are that it started changing from the moment it was set loose on the world. I can't in good faith write about the PageRank of the late 90s, but I wanted to point to a different model.
Not every link on a page passes along the same weight, the same amount of PageRank, and likely not even the same amount of hypertextual relevance. We heard this from Google representatives for a few years, and even from other search engines like Yahoo and Blekko, where we've been told that some links are likely completely ignored, such as those that might show up in comments on blog posts.
As this patent tells us, Google might see the anchor text of "terms of service" on a page, and automatically not send much PageRank to that page.
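To contrast that with the older random surfer model, here is a toy version of feature-weighted link scoring. The specific weights and features are invented for illustration; the patent describes learning weights like these from link and document features and user data.

```python
# Toy contrast between the random-surfer split (equal weight per link) and a
# reasonable-surfer style split (feature-based weights). The feature weights
# below are invented purely for illustration.

def link_weight(anchor_text: str, in_footer: bool) -> float:
    weight = 1.0
    if in_footer:
        weight *= 0.1   # boilerplate areas pass along much less
    if anchor_text.lower() in {"terms of service", "privacy policy"}:
        weight *= 0.05  # links a reasonable surfer rarely clicks
    return weight

def distribute_pagerank(page_rank: float, links: list[dict]) -> dict:
    """Split a page's PageRank across its links in proportion to their weights."""
    weights = [link_weight(l["anchor"], l["in_footer"]) for l in links]
    total = sum(weights) or 1.0
    return {l["url"]: page_rank * w / total for l, w in zip(links, weights)}

links = [
    {"url": "/guide",            "anchor": "PageRank explained", "in_footer": False},
    {"url": "/terms-of-service", "anchor": "Terms of Service",   "in_footer": True},
]
print(distribute_pagerank(1.0, links))
# The main-content link receives nearly all of the PageRank being passed along.
```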
You'll see the name "Jeffrey Dean" listed as one of the inventors on this patent, and if you start digging through other Google patents, you'll see it frequently. He often writes about technical issues involving the planet-wide data center that Google has been building, and how the whole of that machinery works together. If you have a few days to spare for looking at patents from Google, it wouldn't hurt to look for ones written by him. His "Research at Google" page might overwhelm you:
Jeffrey Dean - Research at Google
There have been a lot of things written about PageRank over the years, but if you haven't read about the Reasonable Surfer and don't understand the transformation it describes from a random surfer model, you really should.
Here's a blog post I wrote about it that you can use as a kick start:
Google's Reasonable Surfer: How the Value of a Link May Differ Based upon Link and Document Features and User Data
- Go Fish Digital, Director of Search Marketing, 2013 - present
- SEO by the Sea, President and Internet Marketing Consultant, 2005 - present
I presently live in the Virginia Piedmont, about 50 miles west of Washington, DC in a county filled with horse pastures and farm fields.
I enjoy reading fiction and science fiction, listening to most types of music, delving into the history behind small towns, outdoor photography, and exploring nature.
I am the Founder and President of SEO by the Sea, and I like working with people on their web sites, to help make them easier to find and easier to use.
Some posts I've written in the past that focus upon analyzing patents from the search engines:
- Google's Reasonable Surfer: How the Value of a Link May Differ Based upon Link and Document Features and User Data
- The Google Rank-Modifying Spammers Patent
- Google’s Agent Rank / Author Rank Patent Application
- The Google Hummingbird Patent?
I am the Director of Search Marketing at Go Fish Digital.
I am often called a patent analyst or patent expert or patent guru by many people, bloggers, and media writers, but my job is not to analyze or interpret patents. I do that for fun, and to learn things about search engines and search that I otherwise couldn't. I consider it performing due diligence and feeding my curiosity: the information behind many of the business models and algorithms that search engines use is being made public, and it's worth taking the time and making the effort to read through it and try to understand what it says and why.
- Widener University School of Law, Law
- University of Delaware, English
- Hillsborough High School