HELP! MY SITE HAS 939 CRAWL ERRORS!!1

I see this kind of question several times a week; you’re not alone - many websites have crawl errors. 

1) 404 errors on invalid URLs do not harm your site’s indexing or ranking in any way. It doesn’t matter if there are 100 or 10 million, they won’t harm your site’s ranking. http://googlewebmastercentral.blogspot.ch/2011/05/do-404s-hurt-my-site.html 

2) In some cases, crawl errors may come from a legitimate structural issue within your website or CMS. How can you tell? Double-check the origin of the crawl error. If there’s a broken link on your site, in your page’s static HTML, then that’s always worth fixing. (thanks +Martino Mosna)

3) What about the funky URLs that are “clearly broken?” When our algorithms like your site, they may try to find more great content on it, for example by trying to discover new URLs in JavaScript. If we try those “URLs” and find a 404, that’s great and expected. We just don’t want to miss anything important (insert overly-attached Googlebot meme here). http://support.google.com/webmasters/bin/answer.py?answer=1154698

4) You don’t need to fix crawl errors in Webmaster Tools. The “mark as fixed” feature is only to help you, if you want to keep track of your progress there; it does not change anything in our web-search pipeline, so feel free to ignore it if you don’t need it.
http://support.google.com/webmasters/bin/answer.py?answer=2467403

5) We list crawl errors in Webmaster Tools by priority, which is based on several factors. If the first page of crawl errors is clearly irrelevant, you probably won’t find important crawl errors on further pages. 
http://googlewebmastercentral.blogspot.ch/2012/03/crawl-errors-next-generation.html

6) There’s no need to “fix” crawl errors on your website. Finding 404s is normal and expected of a healthy, well-configured website. If you have an equivalent new URL, then redirecting to it is a good practice. Otherwise, you should not create fake content, you should not redirect to your homepage, and you shouldn’t robots.txt-disallow those URLs -- all of these things make it harder for us to recognize your site’s structure and process it properly. We call these “soft 404” errors. (There’s a small sketch of this after point 7.)
http://support.google.com/webmasters/bin/answer.py?answer=181708

7) Obviously - if these crawl errors are showing up for URLs that you care about, perhaps URLs in your Sitemap file, then that’s something you should take action on immediately. If Googlebot can’t crawl your important URLs, then they may get dropped from our search results, and users might not be able to access them either. 
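
To make point 6 a bit more concrete, here is a minimal sketch - Flask is used purely as an illustration, and the URLs and the MOVED mapping are made-up examples; any framework or server configuration that produces the same HTTP responses works just as well. The idea: redirect to an equivalent URL when one exists, otherwise return an honest 404 rather than a homepage redirect or a thin 200 page.

```python
# Sketch only: hypothetical URLs, Flask used for illustration.
from flask import Flask, redirect

app = Flask(__name__)

# Hypothetical map of retired URLs that have an equivalent new URL.
MOVED = {
    "/old-widgets": "/widgets",
    "/2009/summer-sale": "/offers",
}

@app.route("/<path:path>")
def catch_all(path):
    url = "/" + path
    if url in MOVED:
        # An equivalent page exists: a permanent (301) redirect is good practice.
        return redirect(MOVED[url], code=301)
    # No equivalent: return a real 404 status (optionally with a helpful page),
    # not a redirect to the homepage and not a 200 placeholder ("soft 404").
    return ("Sorry, that page doesn't exist.", 404)
```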
 
"404 errors on invalid URLs do not harm your site’s indexing or ranking in any way" as long as the 404 isn't coming from pages you want to see ranking ;)
 
+Raul Marengo Lopez I went into that in the last point, you're right that you will want to keep important URLs in the search results (but having invalid ones on your site really doesn't affect the rest). 
 
(as a footnote, I would add that a lot of 404s may suggest a structural issue, common in badly designed CMSs, that can effectively harm rankings. Of course it depends on the URLs reported...)
 
Good point, +Martino Mosna! I'll add something about that (sorry to those who already shared this :))
 
What about soft 404 URLs? I am getting soft-404s on lots of pages that show some meta-information for the user at the top, but the user needs to log in to access the rest, like for instance this one: http://goo.gl/ZbIyD . I have set the robots meta tag to noindex because I suppose it is not good for Google to index these pages that are only for registered users, but Webmaster Tools does not stop reporting them as soft-404s. Do these pages hurt my site’s rankings, and should I keep the robots noindex tag or not? Would it be possible to make Webmaster Tools stop reporting them as soft-404s?
 
Regarding #6 - I would say that you should fix a link to a page where the URL has changed, and redirect to the new page. Surely it’s better to have an external link that points to relevant content as opposed to a 404? Whilst 404s won’t affect your rankings, they could have an adverse effect on other aspects such as bounce rate.
 
+John Mueller : Thanks for the list! Can I suggest you add the exception from the last point directly to point 1? I think that the “unless...” part is a key one. I know how information can be accidentally distorted or misunderstood among SEOs, and I’m sure that sooner or later I’ll find some client or colleague who will tell me “Google said that 404s are irrelevant”.  :-D
 
+Manuel Lemos if you don't want those pages indexed, then it doesn't matter either way. We warn about soft-404s so that you can fix them if you did want to have them indexed.
 
+John Mueller yes, I put the noindex on those pages so as not to lead users from search to pages where they need to register; they would not like that, and they would bounce back to search, which Google might take as a sign of a bad-quality page.

If I am reasoning correctly, I would now like Webmaster Tools not to notify me about soft-404s on noindexed pages, because there are too many of them (thousands) and it makes it hard for me to spot other soft-404s on pages that I really wanted indexed. Is it possible to configure WMT not to warn me about soft-404s on noindexed pages?
 
+Manuel Lemos There are some ways around it: don’t return a 200 HTTP result code (e.g. use HTTP auth, or return 403/410 with the login page), redirect to a single login page and redirect to the content from there, or move the private content to a separate hostname (subdomain/domain).
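
As a rough illustration of the “return 403 with the login page” option - a sketch only, where the /file route and the session check are hypothetical stand-ins for whatever the real site uses (HTTP auth, a redirect to a single login URL, or a separate hostname would work just as well):

```python
from flask import Flask, session

app = Flask(__name__)
app.secret_key = "replace-me"  # required for Flask sessions; placeholder only

@app.route("/file/<int:file_id>")
def member_file(file_id):
    if not session.get("user_id"):
        # Not logged in: serve the login prompt, but with a 403 status so
        # crawlers see "restricted" rather than a thin 200 page (soft 404).
        return ("Please log in to view this file.", 403)
    # Logged in: return the actual content with a normal 200.
    return f"Contents of file {file_id}"
```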
 
+John Mueller I see; the problem is that only part of the content on that page (a programming source file) is restricted to logged-in users.

The other part is at the top: metadata that tells the user what the file is, so they can decide whether it is worth registering or not.

The page is not really a soft-404. Google may be considering it a soft-404 because the metadata is not a lot of content.

I just added the robots noindex meta tag because Google started considering the page a soft-404, even though the metadata at the top exists and is useful for the user.

Also, this is user-contributed content. As site moderator I encourage contributors to lift login requirements on as many files/pages as possible, so a contributor may restore access to the content for non-logged-in users. In that case, would 403 or 410 still be a good idea to return on that page? Or maybe a 302 to another URL to show a copy of that page with the login requirements?
 
That looks suspiciously like an Auto-Response approach :D
How many times have you now answered this question? :D
 
we are counting in hundreds? :D
 
But yes, this is - as I mentioned in the beginning of the post - a fairly common question :)
 
The main problem with error reports in Google Webmaster Tools is the way the graphs are made. On the graph, yesterday I had 100 errors, today I have 200 errors. All my clients call me with a dramatic “Alert, the number of errors is increasing!!” Usually that is a wrong statement: it’s because the graph aggregates previous errors, so my error numbers are stable, at 100 errors/day.
Well, this is the explanation I give my clients, but is it correct? Anyway, this precise point is not really well documented, and it leads to arguments with colleagues and clients.
 
+Tom Acheté-Eymel yes, sometimes that can be misleading & confusing. It's good that your clients have you for things like this :)
 
Other issues can/do include it being a little laggy/slow - sometimes you get asked to look into something you've already fixed (but you don't know without looking :( )
 
I don’t know the team behind GWT (they don’t have a Google+ page!) but you should point them to this thread, John ;-) Hope it helps.
To sum up, non-SEO people are afraid of the term “error”, because they don’t know if it’s a real problem for them, for their users, or just for Googlebot.
 
I'm the poster child over in the GWT forum for this type of question. ;-)  I had over a million 404s at one point (long story).  It had zero effect on my rankings. In fact, at the time, ranking had never been better.
 
+John Mueller any specific suggestion for pages that are reported as 403 when they are currently returning 200 unconditionally?
 
+Andrea Moro I'm not aware of any "soft-403" but feel free to ping me the link so that I can take a look.
 
+John Mueller going to send you the URL in private once I’m at work. It’s from a user on the forum, and the URL isn’t popping into my head ATM.
 
This is a good post. Thanks +John Mueller . I used to get a bit freaked about the errors in WMT. I now feel more comfortable with them and use them to assess the health and structure of our site. If there is something coming up regularly or a sudden change then I know I need to investigate deeper, otherwise I just look out for issues on pages that I know shouldn't be 404ing so that I can take appropriate action to ensure our users aren't getting a frustrating experience and that we aren't losing traffic, for example, from inbound links. Your post and subsequent comments confirm that this is a good approach. Thanks.
 
Thank you for the good information. 
 
We recently re-launched our site, so we now have lots of 404s for many of the old pages that are no longer relevant, which is expected. Firstly, will these eventually disappear from our crawl errors section, and how long should it take for them to clear? They seem to be increasing rather than decreasing.
Secondly, our developer has set up our old URLs to 301 redirect to our 404 page, so essentially these pages return a 301 and then eventually a 404 header response. Is this correct, or should these old URLs directly return a 404 response?
 
1) You should redirect (301) from the old URL to the new one.
2) You should return a straight 404 (or 410); you should not really use a redirect to a 404/410 page.

G will repeatedly attempt to access any URL that it knows of.  That is normal.  Over time, the volume and frequency of such requests should go down - but it may take a while.
If G still sees things like internal links or Sitemaps, or lots of inbound links, don’t expect it to stop asking any time soon (if at all). (A quick way to check what your old URLs actually return is sketched below.)
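
A sketch of such a check - the URL list is made up, and it assumes the third-party `requests` package is available. It fetches each old URL without following redirects and prints the status code plus any Location header, so you can spot 301s that land on a 404 page versus direct 404/410 responses:

```python
import requests

# Hypothetical list of old URLs to audit.
old_urls = [
    "https://www.example.com/old-page",
    "https://www.example.com/retired-blog-post",
]

for url in old_urls:
    resp = requests.get(url, allow_redirects=False, timeout=10)
    location = resp.headers.get("Location", "-")
    print(f"{resp.status_code}  {url}  ->  {location}")
```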
 
Hey Lyndon,
That's how I thought it should be set up, but I thought I would get some further clarification. I'll see if we can get our old URLs to return straight 404s from now on!
Thanks!
 
I'll just double check ....

You had Old pages.
They have been replaced with New pages.
Is the content on the New related to the content on the Old?
Are there strong similarities?

If so - you should be redirecting - else you will lose the relevancy and any PR those Old pages had.
If not, then indeed, 404 the lot of them and let them fall out of the index over time.
 
Yeah, that's what we've done already; anything that could be redirected from the old URLs to the new URLs has been.
It's really the pages that are no longer relevant that are 404ing, although they go 301 > 404 as I mentioned, which was my worry, as I had thought they should just 404 - which you confirmed.

Thanks for double checking! :)
 
Well, just so there is "no waste" ... can some of the Old pages that are irrelevant point elsewhere?
Are there possibly relevant categories, merged content or substitute content that would still be of use to the audience?
If so, those could possibly be redirected as well.

That aside - yes, 404 the irrelevant pages (or 410 if possible).
 
I'll have a look, but as far as I could see, most of the URLs that were listed were URLs that were generated automatically each day from our old blog - pages that had no content, so there is nowhere to redirect to. I wasn't aware they were even being generated until they appeared in our crawl errors.

I'm pretty sure 90% of the pages in there should be 404ing; I'll have a quick look through and redirect anything I can though! Cheers.
 
+John Mueller Thank you for a very clear post.
 
This is a useful article. I'm not sure what to do; my website has 56 errors. In the past there were more than 400 errors, all of them URLs from the previous owner of the domain. During that period the site didn't have good positions in the SERPs. This situation continued for maybe 6 months. One day I checked the crawl errors in Webmaster Tools and found just 20, and the site started to get better positions in the SERPs. Now I'm worried because the errors have started increasing again - 56. The number is not so big, but all of these pages are from the previous website, with a different theme, and I don't want Google to connect us with them. Could you help me? What should I do?
 
As Mr. Mueller said, they won't hurt you, but if you know they are normal 404s caused by, say, sold properties in a dynamic IDX, it does hurt to leave them there. What I mean is that it hurts my head when I try to pick out the URLs that truly need repair from amongst thousands of soft 404s... So they may not hurt you from an SEO point of view, but they sure can hurt your head - and they may hurt your SEO when there are a dozen URLs that should be fixed amongst thousands that don't need fixing. So use your head for something other than a hat rack, and at least weekly clear those soft 404s out so that you can see the ones that do need work. That is my two cents.
 
+John Mueller Hmm.. But actually it works. It will redirect all your errors to the homepage, which will actually remove your 404 errors from your site. After some time, when Google has removed those URLs from the index, you can remove the script again. :)
 
Instead, G will see them as "soft 404s" ... something G had to develop for because people failed to provide proper 404 behaviour!

If a URL is temporarily unavailable - return a 404.
Once it has returned, G will see the 200 and continue as normal.

If a URL is removed/gone (a small sketch of the first two options follows at the end of this comment):
1) Return a 410 (if possible), or
2) Return a 404 (far more common), or
3) Return a 301 to a suitably relevant/useful page

Doing 302 redirects elsewhere, or 301s to an irrelevant page is pointless.
Yes, you may lose some PR ... but that's what happens when you remove content (and the reason for (3) being suggested).

You should not provide soft 404s if it can be avoided.
(Come on, this is 2013 not 2005)
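
As a rough sketch of options (1) and (2) above - the GONE set and the catch-all route are hypothetical, and Flask is used purely for illustration; in a real site this check would sit in front of the normal routing:

```python
from flask import Flask, abort

app = Flask(__name__)

# Hypothetical set of URLs known to be permanently removed.
GONE = {"/discontinued-product", "/old-press-release"}

@app.route("/<path:path>")
def missing(path):
    url = "/" + path
    if url in GONE:
        abort(410)  # "Gone": a stronger hint than 404 that it won't come back
    abort(404)      # plain 404 for anything else that doesn't exist
```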
 
+Mehul Mohan Placing JavaScript on a page that returns a 404 HTTP result code changes absolutely nothing and is confusing to users at the same time. Redirecting to the homepage instead of returning 404 for missing URLs results in users and search engines getting confused. Both are really not something you'd want to do. 404s are fine.
 
I'd rather not +Mehul Mohan ... I kind of take pride in the fact that I follow standards and do things properly for my clients, rather than risk stuffing their sites/rankings.
 
+Lyndon NA Okay, +John Mueller, as you wish. That is basically a shortcut fix, because setting up a 301 permanent redirect for each and every 404 Not Found is a nightmare...
 
?
If something is "gone", then a 404 is the Right response!
You only supply a 301 if the content has moved, or if you have a viable alternative.
 
Hey John,
How does Google keep from penalizing you for tons of bad 404s that cause folks to keep bouncing back on tons of pages they hit? Are you sure Google is sophisticated enough to deduct all those bounces from my bounce stats? I am not sure about that, so based on that, Google will punish me for 404s without even realizing they are doing it. Also, they crud up my toolbox so badly that I cannot keep up with the ones I need to fix, and how the heck do I know which of the tons of 404s are OK and which ones need a fix? The only way is to visit each one to see if it's what I consider a good URL. In my case it's not as difficult as it is for some, since most of my URLs are IDX listings for real estate that has sold, but they have a good identifier - MLS- with a 6-digit number - so each day I can quickly go through my toolbox and mark them fixed so they don't totally clutter it up, hiding other things that I need to see and fix.
 
+Mehul Mohan I just reviewed your site and what you have done is really dumb. Google does not speak JavaScript, so they totally ignore your script and will never be forwarded by it. They have been looking at following JS for a while, but as far as I know they still ignore it. And something you did that was even more horrible is that you probably talked a lot of folks into placing thousands upon thousands of pages of duplicate content on their sites. When Google hits these tons of pages and you forward it to your home page, you just stuffed the same index page (same content) down Google's throat thousands of times. Can you say DUPE content? CAN YOU SAY DUMB, DUMB?
The reason the 404s left your toolbox is that you lied to Google, and now when they visit they will still see the same old 404s, since you did absolutely nothing with the JS except forward all your visitors to your home page. Wow, is Google pissed at you... actually, they may not be pissed; they will have no idea what you did, since they cannot follow JavaScript. If you do not believe me, test it this way: set up a forward script on a single page, point it to one of your other pages, then set up a logger script that can see bots, and watch how many bots do not show up on your other page. It's really simple to watch the spider activity on the 404 pages: let's say you had 2 Google spiders on those old 404 pages for 2 hours on a certain day, so you can see maybe 200 pages hit. Now check the same time frame on the JS landing pages, and you may see some spiders, but it will not coincide with the exact time you saw them on the 404 pages.
If you really don't believe this, I will be happy to personally demo it for you. For the rest of the folks: please do not do this. It is a horrible thing to do and can get you banned if a Google human comes across it.
 
Hey +John Mueller,
I have so many URLs returning 403 error codes and soft 404s.
Do they harm my website?
 
+Virender Kumar if those are URLs that don't exist, or shouldn't be indexed, then that's fine. Fixing soft-404s to be real 404s helps to optimize crawling, but it's not critical for most sites.
 
Had a bunch of 401 errors on our CDN server, and our download page missed a nofollow tag for a day or two, and guess what - I dropped from first-page position 2 to around page 12 for a particular keyword, and also lost most of my keyword rankings. John, do these kinds of errors recover, or will you put me on page 12 for a lifetime? :(
 
Thank you all, I found something really useful from the post and also from the comments.
 
+John Mueller Are you looking into all the SEO people who most of the time never use a 404, because then they would lose all the incoming links to that page? They often tend to redirect to a similar page, or the main page if needed. E.g. if a product in a shop is removed, redirect to a similar one; if no similar one is found, redirect to the category page; and if that is also missing, redirect to the start page.
To me it seems like misuse just to keep all the link juice.
 
I have a question here: when I check indexed pages in Webmaster Tools I can see just 70-80 pages there, but when I check with the site: operator, more than 1,000 pages are there... Why does this happen? Why isn't the data the same?
 
+John Mueller I am seeing lots of URLs that are part of AJAX/JS, like .com/ajax, .com/getsession, .com/insert - those aren't real URLs (I know you explained them in your post). My question is: are the URLs you find this way as valuable as HTML links? For example, if I am using a real URL in AJAX, is it as valuable as an HTML link?
 
+Stefan Janson if they redirect instead of returning a 404, we'd generally recognize that as a "soft-404", which we would treat like a 404. Our main problem with that is that it makes it hard to determine which URLs need to be crawled. 
 
+Charu Rastogi That can be normal in some cases -- I wouldn't focus so much on the indexed page count and instead work to make those happy who come and visit :)
 
+Serbay Arda Ayzit Our main goal when trying out URLs like that is to find new & interesting content, not to use that as a source of links to existing pages. If we find interesting content, index it, and other people end up linking to it, then those links are usually a good sign.
 
Hi, can anyone help?! 
I have 14 crawl errors on my site - they all appear to be photos that I have in a slideshow gallery. 
My website is ranking so low for the keywords I'm typing in that I'm getting to page 30 of Google and still not seeing my site.  Could these errors be the reason? 
I also see in the Webmaster tools that my site has 0 pages indexed, whereas on my other website it says 35 pages indexed. 
Really stuck here and don't understand why photos are causing these errors :(
Any advice would be appreciated.
I am using Weebly Pro.
 
+ABBA Chique Doubt the photos are causing your website to rank lowly in the SERPs. Without seeing your site, it's hard to say what the issue could be. 
+John Mueller I believe your rankings would be affected if, for example, you have links going to a URL that was once a definitive page on your site (i.e. during a site migration).
 
Hi +John Mueller, thanks for this post, very interesting to see Googlebot's mindset. I'm currently getting invalid pages indexed at biochemistri.es/post/:id/:summary and /page/:page from JavaScript (added server-side by Yahoo), and can't set this address as a 404 given the platform's limitations - is there any way to remove that (as it's currently for another page)? webmasters.stackexchange.com/questions/61724

Hope it's not too off topic for this comment thread, but I wasn't able to remove it specifically since Google sees it still pointing to a "live" page (i.e. the root URL) and visitors to the link won't reach the article they were looking for, so a bit of a pain
 
+Louis Maddox I wouldn't worry about that. It's not perfect if they don't return 404, but we generally pick them up as soft-404s and treat them in about the same way. 
 
+J James Yes, if those URLs are ones that you care about (where you have or had content; my point #7), then having them return 404 will of course result in us dropping them. The right solution in that situation is to 301 redirect from the "bad" URLs to your preferred ones.
 
Sir, if you could reply to me too: could you also tell me the effect of Panda 4.0 on the same, please?
 
Thanks for the article, which resolved my worry, since I just found out that one of my websites has thousands of these 404 crawl errors in Webmaster Tools.
 
Thanks +John Mueller for the great insight and the bullet list that answers many questions. I have practiced fixing and redirecting 404s using 301s on tens of websites, the same way you've mentioned. I saw a noticeable positive impact on overall bounce rate (it went down) and some positive impact on conversions.
 
+John Mueller You mean 404s don't harm a site. But what about crawling quality? If the bot crawls too many pages with 404s, doesn't that mean the crawl budget will be exhausted, and the bot will have no crawling capacity left for the "good" pages?
 
Question: when moving a site from one company to another you are going to have 404 errors... Is this going to affect your site's ranking?
 
Thanks +John Mueller for this valuable information on 404s. I see 404s more as a UX problem. If a site is showing a couple of 404 pages (for a long time), that simply means the webmaster doesn't care much about his site and his audience. Google Webmaster Tools gives us valuable info on 404 pages, and we should take immediate action on them (a 301 redirect to another relevant or identical page, or at least a custom 404 page with relevant links and a search option), which will help users and search crawlers browse through the site. Eventually this will reduce the bounce rate and increase dwell time and page views.