Hash bang AJAX URLs no longer supported by Google Webmaster Tools fetch utility?

As I detailed in a Product Forum post...
https://productforums.google.com/d/msg/webmasters/31GEUoFgd9k/g2cQHDA6pXMJ
... the improved fetch functions for Google Webmaster Tools seemingly make it impossible to retrieve either fetch or fetch-and-render data on hash-bang (#!) URLs.

Prior to the change, Google would report on what it saw when the "#!" was transformed to "?_escaped_fragment_=", as per Google's scheme for crawling AJAX (where the "ugly URL" is used to fetch an HTML snapshot).

I can transform "#!" to "?_escaped_fragment_=" myself, but a benefit of the old fetch function was that I could use it to verify that Google was correctly fetching the ugly URL when a #! URL was requested in the first place.
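For reference, the manual rewrite is mechanical. Below is a minimal Python sketch, assuming the mapping described in Google's AJAX crawling scheme: everything after "#!" becomes the value of an `_escaped_fragment_` query parameter (joined with "&" if the URL already has a query string), with characters such as %, #, & and + percent-escaped. The function name `to_escaped_fragment_url` is hypothetical, and the escaping is an approximation of the spec's rules rather than a verbatim implementation.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Printable ASCII left unescaped; "%", "#", "&" and "+" are deliberately absent,
# so quote() percent-escapes them along with spaces, controls and non-ASCII.
SAFE_CHARS = "!\"$'()*,-./:;<=>?@[\\]^_`{|}~"


def to_escaped_fragment_url(pretty_url: str) -> str:
    """Map a #! ("pretty") URL to its ?_escaped_fragment_= ("ugly") form."""
    scheme, netloc, path, query, fragment = urlsplit(pretty_url)
    if not fragment.startswith("!"):
        return pretty_url  # not a hash-bang URL, so there is nothing to rewrite
    state = quote(fragment[1:], safe=SAFE_CHARS)  # drop the "!" and escape the rest
    param = "_escaped_fragment_=" + state
    # Append to an existing query string with "&", otherwise start a new one.
    new_query = f"{query}&{param}" if query else param
    # An empty fragment means urlunsplit() emits no trailing "#".
    return urlunsplit((scheme, netloc, path, new_query, ""))
```

For example, `to_escaped_fragment_url("http://example.com/#!key=value")` returns `"http://example.com/?_escaped_fragment_=key=value"`, which is the kind of URL that can then be submitted to the fetch tool directly.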

Anyone have any clever ideas about how I could get a bot's-eye view of AJAX now?

cc: +John Mueller +Maile Ohye
 
+Ben Hicks Ha ... well the WMT changes shouldn't impact the spiderability of Wix sites. :)
 
Oh that's quite the evolution. Does this mean Google thinks we should simply trust developers with AJAX all of a sudden? That's going to be fun.
 
+Aaron Bradley - I would assume this is related to the fact that Google just announced that it can execute properly formed JavaScript on the page (http://googlewebmastercentral.blogspot.com/2014/05/understanding-web-pages-better.html).

To see how Google currently views a web document before parsing it, you would run headless browser software. Google itself uses a headless-browser type of setup; see the link above.

Selenium is an industry-standard browser automation tool that can drive a headless browser, and PhantomJS is a headless browser in its own right.

Cheers!
 
That seems like something we should fix / support here too - thanks for posting! In the meantime, you can just rewrite the URLs yourself and submit those.
 
Thanks, +Mark Keller. Despite the improved JavaScript handling capabilities, I'm working on the assumption that the special handling for AJAX is still in place.

That is, I would expect that if Google encounters example.com/#! it will still fetch the content found at example.com/?_escaped_fragment_=, regardless of how well it could execute the JavaScript it would encounter if it tried to parse the code of example.com/#! directly. In other words, Google will continue to avail itself of HTML snapshots when directed to do so. So far, judging by the cache of AJAX pages in the index, that indeed has not changed.
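The server side of that arrangement runs the mapping in reverse: when a request arrives for the ugly URL, the site has to recover the #! state in order to serve the matching HTML snapshot. A minimal Python sketch, assuming the same `_escaped_fragment_` convention (the function name `to_pretty_url` is hypothetical and not part of any Google tool), which can also be used to spot-check that a list of ugly URLs maps back to the intended #! URLs:

```python
from typing import Optional
from urllib.parse import unquote, urlsplit, urlunsplit

PARAM = "_escaped_fragment_"


def to_pretty_url(ugly_url: str) -> Optional[str]:
    """Recover the #! URL from an ?_escaped_fragment_= URL, or None if absent."""
    scheme, netloc, path, query, _fragment = urlsplit(ugly_url)
    kept_params = []
    state = None
    for part in query.split("&"):
        if not part:
            continue  # skip empty segments, e.g. the one produced by an empty query
        name, _sep, value = part.partition("=")
        if name == PARAM and state is None:
            # Undo the percent-escaping; unquote (not unquote_plus) leaves "+" alone.
            state = unquote(value)
        else:
            kept_params.append(part)  # keep any other parameters verbatim
    if state is None:
        return None  # no _escaped_fragment_ parameter: not a snapshot request
    return urlunsplit((scheme, netloc, path, "&".join(kept_params), "!" + state))
```

For example, `to_pretty_url("http://example.com/?_escaped_fragment_=")` returns `"http://example.com/#!"`, while a URL without the parameter returns `None`, so a snapshot handler can fall through to normal page serving.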

Thanks for the headless browser links, though. I knew of PhantomJS, but will now give Selenium a spin too! :)