We're working on a spam research project here at SEOmoz with the eventual goal of classifying, identifying and removing/limiting link juice passed from sites/pages we're pretty sure Google would call webspam. A score (or scores) of some kind would (eventually, assuming the project goes well) be included in Mozscape/OSE showing the spamminess of inlinks/outlinks.

We're certainly not going to be as good at it or as scaled as Google (and thus our algorithms will be relatively conservative and focused on only the most obvious and manipulative forms), but it's making for interesting research already.

Some of our team members, though, do have concerns about whether SEOs will be angry that we're "exposing" spam. My feeling is that it's better to have the knowledge out there (and that anything we can catch, Google/Bing can surely catch and discount even better) than to keep it hidden. I'm also hopeful this can help a lot of marketers who are trying to decide whether to acquire certain links or who have to dig themselves out of a penalty (or reverse what might have caused it).

Would love your thoughts/feedback! Obviously, this is at the very early stages, but want to make sure we consider all the perspectives.
106 comments
AJ Kohn
 
Information and transparency are good things and I look forward to seeing what you and team come up with.
 
I think outing the spam will just help good SEOs do their job better. That way we don't get out-ranked by a $150 / mo BuildMyRank budget. Plus, you're right, if you, me, or anyone can already detect spam, chances are Google has known about it anyways. I say be as transparent as possible!
 
Goodie :) Nobody should be afraid of exposing spammers, the web shouldn't be built on oppression of opinions. If they're running out of space/ideas well... too bad!
 
I agree with +AJ Kohn. The information is valuable enough to make something like this worthwhile.

If an individual feels he/she is being "outed" specifically, they can make their case for why they feel their tactics aren't spammy, and it'll be a valuable source of feedback for developing your algorithm.
 
I would rather invest my time in learning how to produce a great user/brand experience than jumping on the latest "aren't I such a clever geek" bandwagon. While I don't support any form of "outing" other SEOs, it would certainly make my job a helluva lot easier if I didn't have to screw about with solutions that bring what every client wants i.e. short term gains.
 
I also agree with +AJ Kohn and think it is a great idea. Google does manual assessments that are not made public. It may be that the way to move the Moz index closer to Google's is through manual reviews.
 
I agree that transparency is key. The community feedback is most likely going to hinge on whether or not other SEOs feel your classification of "spam" is accurate and fair. Stating upfront the criteria you consider to be "spam" should help with this. It may be a good idea to contact anyone you are considering "outing" ahead of making it public, if that's even an option.
 
+Jordan Silton Very good point. We're going to be writing about the forms of classification we're assembling on the blog, then taking community feedback and trying to ID and score only those that are the most manipulative and obvious (for example, we'd probably catch a lot of very obvious/bad link farms, but not necessarily many private blog networks or paid links from reputable sites).

The criteria of what we're trying to measure will definitely be public and open to critique/feedback before we build it.

+Luella Ben Aziza Our data scientist, Matt, will be presenting on his findings at Mozcon, and hopefully building something thereafter, so my guess would be sometime in Q4 2012 or Q1 2013 for a launch.

+Thomas Høgenhaven Yeah, to be fair and upfront, since our business is providing marketers with data, this would be a "self serving" move, in that we'd benefit from folks using it (as it would mean they use our tools/membership/API). My hope is that it is providing broad value, though, and that we're doing it in a balanced, universal way (outing millions of sites, not anyone in particular, and certainly with no motivation other than making information the engines already have more accessible).
 
+Rand Fishkin - Seems like this would be a nice way to become a valuable, if not critical, component of someone wanting to have an alternative search engine to Google. Good corporate strategy for you guys in my opinion :)
 
Perhaps this is already apparent, but I just want to mention that in general the type of people that follow you are going to agree with you, while the people that don't follow you on Google+ are the ones that would be 'upset' about something like this happening.

Just my two cents on the issue, as if anyone cares.
 
+Mitchell Wright true, but sod them if they have something to hide - can't be doing SEO any good, and we've had enough of that!
 
Transparency is only a negative thing if you have something to hide - if it is possible to get information out there, do it. Information should be made available to everyone that needs it - hence why I set up a grassroots SEO site ( www.grassrootsseo.co.uk ) to get the basic info to people - it saves so-called SEO companies charging vast amounts for basic services.
 
I think any guidance for identifying potentially harmful links is very welcome. After a few email conversations with people at Google, including Matt Cutts, it seems link removals or link profile cleansing are going to be a big part of the SEO role moving forward; it pains me to say that, but it is proving true. So a tool that can help identify them could be essential.

As stated above, the problem is going to be classification, as Google only reconsiders a site with a link penalty when every effort has been made to remove 'all' unnatural links. If the tool only focuses on the obvious, it could miss the mark for removal purposes. Just this week we've seen brand anchor text links (in blogrolls) and content syndicated from company blogs onto sites like the IBTimes classed as unnatural and in need of removal.

So, not classing brand anchor text as spam could be misleading etc... However I still think with a few caveats you guys can put something really useful together.
 
Hear, hear! Go for it.
I've done my very best on my tiny little part of the net and the globe. I agree with you that this is the best way to handle what is spam and what is not. Let people/SEO customers decide for themselves. Give examples!

E.g. Google is the largest, most accessible, most public scene on earth. How do SEOs think they will manage to hide?
 
I recently said at a conference: If you're doing Black Hat SEO you're shitting where you eat now and in the future. Contributing to a polluted industry where people tag SEO as snake oil, black magic and tricks that die every 2 years.
 
Professional SEO has to be effective. Today it is more effective than ever before to focus on a long-term strategy. Of course it is the right thing to extract and name such spammy sites. Link building gets harder - and it is necessary to get as much important information as possible.

The question is: are you good enough to reach "the eventual goal of classifying, identifying and removing/limiting link juice"? If not, it's a risk. Good luck :-)
 
Rand, you need a blackhat to add to, if not dominate, this research, as unless you do blackhat "spammy" SEO all day every day, unless you are up on all the latest tricks, you really won't know what a "spammy" link is, and the information will be increasingly skewed and thus increasingly useless. The tendency will be that anything that looks distasteful to white hats will be called spammy. At any rate, I do blackhat research all day every day. I've gotten very good at providing research for large affiliate networks and SOHOs, and getting SOHOs out of penalties as well. I recently published my findings on Penguin.

Basically you need a hacker to catch one. LMK if I can help.
 
Exactly +Pedro Dias. It can't be sustainable. SEO has a perception problem already, I'm always hoping that the next update / milestone will be THE fresh start. I have some ideas for this based on Rory Sutherland's awesome TED Talk on http://whatwillgoogledonext.com/?p=165
 
+Rand Fishkin as you know I am usually against the outing of what some call spam, or for that matter any outing at all. But I think most smart SEOs would agree that all metrics are subjective, so I don't see too many being angry about this. And I think most SEOs will end up using it in conjunction with lots of other metrics to gauge value.

So, as someone that is against outing, I say go for it. :)
 
Do it. I am sick of working in an industry destroying what I love: the internet.
I don't think it will change. Just too much money in the game. But it can't hurt to try.
 
+Rand Fishkin As I am working, in part, in a few industries with less exhaustive ethical rule-sets (for SEO), I know that this would create quite a bit of havoc deep in the culture of those industries. In the past, every time we have been faced with such large-scale discovery it has sparked new innovations both in SEO and in business so I have always looked at these shifts as essential and overall very productive.

Maybe expect some blackhats not to like this and to take measures to keep their practices sitting on the "line" hidden from you (your crawler). And then what do you do to fight that? Go blackhat on the blackhats (cloak the crawler)? I guess what I am hinting at is: are you not concerned, in some ways, that if there's a major reaction against this project it will be as hard as before to identify the problem, because it will be hidden from your crawling infrastructure? Will it introduce long-term effects on your crawling capacities? Thus far you have provided an all-positive service; this is a significant sea change.

I have supported and welcomed anyone who promoted transparency in the past and I'm not about to stop. More transparency = more maturity = a more useful internet.

Can't wait to see the criteria, maybe I can help by identifying a few. What are your plans to keep this up-to-date over time? (I'm guessing that this project will have to gain in scale and classifier quality before we see its real value, and this may take some time.)
 
The web according to Google is bad enough. I don't need the web according to SEOmoz. Not a fan.
 
Leave that to the engines. You can't re-engineer their view of and judgment on links, so everything you'd mark as possible link spam is a candidate for a false positive, and if you make that happen you will surely harm webmasters whose behavior is in fact a true negative.
Ben Cook
 
Hey Rand, I think this is a brilliant idea!

After all, as your oh so intelligent followers have stated, the only reason transparency could be a bad thing is if you have something to hide!

In fact, they're only echoing Google's Eric Schmidt in that statement so of course they're right.

And, if the churn rate for SEOmoz's tool sets remains high, you could always become the Ripoffreport of the SEO industry, labeling sites as "spam" links unless they pay you enough to remove them.

Or better yet, why not allow friends of SEOmoz to pay a bit extra to keep their link schemes out of your network?

As mentioned before, most of your followers already thank you for not delivering an index update for over two months so I'm sure most of them will trip over themselves thanking you for such a great tool.

And, given the way in which your community typically responds to any type of criticism, I'm not sure you'll get many dissenting opinions even if people were to disagree with you.

SEOmoz is so good at marketing to the masses that you've created a substantial financial incentive to staying on your good side and not providing you the type of feedback that might lead you to create a more valuable company.

Since that boat sailed for me years ago, allow me to be one of the few to say I think it's a horrible idea and you'd be much better off focusing on Linkscape, which is one of the only tools you offer that I see any value in.
 
I like it. I think the more education/tools the better. And as +Tim Grice mentioned, link cleaning is now another responsibility of an SEO. It could also be helpful as an alert if you have a site in the danger zone.
 
+josh bachynski Thanks for the offer! Let's connect over email (rand@seomoz.org) and see if there's some ways we could leverage your help. Might be in later project stages, but certainly appreciated.

+Joe Hall Well, our goal would be to have a reasonably good methodology and to map against sampling from Google (to see whether sites/pages we mark as spam have been penalized or banned) so we could get more than a "subjective" metric, but glad that you support the idea!

+Meg Geddes OK fair enough. Certainly no obligation to use if you don't find it valuable/useful.

+Sebastian X We certainly don't think we could be as comprehensive as Google, but our goal would be more around catching only the worst/most obvious stuff (and sampling our findings against things Google's penalizing/banning so we can have a reportable degree of accuracy/confidence around the score's ability to successfully ID spam).

+Benjamin Cook Missed you buddy :-)

+Jason Nelson +Tim Grice Yeah - given how much "link cleaning" appears to be a common SEO task these days, we surmised that a scoring system like this could be very helpful to those who need to ID and make efforts to remove low quality links.
 
+Rand Fishkin you state this in your post:
"I'm also hopeful this can help a lot of marketers who are trying to decide whether to acquire certain links or who have to dig themselves out of a penalty (or reverse what might have caused it)."

Does that last part mean there will be a component for webmasters to devalue links themselves if they do not know where the links came from? i.e. a combative measure against negative SEO?

I have been mentioning this on SEOMoz this past week due to it happening to several clients - and I am not really making much headway with that argument.

I have several examples of websites that definitely have spammy links coming to them that I can guarantee were not placed by the client, and where we specifically know that they had someone spamming their site.

I also have some personal websites that have links (due to us making the websites) that have been in place since 2004. +Matt Cutts pointed one such link out to me, on a blog we created for a guy in England many years ago that still had us linked in the credits / sponsors section. Because it did not have a nofollow, he pointed it out to me. We have hundreds of sites linking to one of our web development companies like this, dating back to 2001. Many we could rectify, since we host a lot, but the one example Matt pointed out was no longer hosted with us and it took me 7 weeks to track the owner down and get him to nofollow it.

Google should give website administrators the ability to devalue links themselves if they feel spammy links are present to their site.

I agree a tool such as this would be helpful. I just would want some kind of proactive component to it as opposed to it just being an informational dataset. I can, as most SEOs could, point out the crap already. The question is how did it get there and what can be done to remove it.
 
At the risk of sounding anything like Ben, it does sound a bit like a distraction from what I'd rather see Moz build on and develop, aka a bigger, badder index and more quantitative metrics (versus qualitative, which gets hairy, IMO). As a marketer, I'd rather derive the qualitative insights on my own based on MY criteria.
 
I wonder if the $18M in new investments from +seomoz will be spent on building the Spam index, with SpamRank as the TM.
Any Spam index is only as good as the underlying data, and you will need a lot of data to avoid any false positives, throwing some websites under the train in the process.

Not a fan!
 
Oh I wasn't talking about using it myself; that was never on the table. I meant I don't need my sites, my friends' sites, my family's sites, my client's sites, my neighbor's sites, my cat's sites - heck, anyone's sites - subject to the SEOmoz seal of Anti-Spam approval.
 
Isn't all the data Moz provides now based on a judgement as to whether one factor or another has a positive or negative influence on a site? Isn't that the whole point of the Moz tools? To NOT include an evaluation of links at the level you're now finally striving for has, in my opinion, been a shortcoming of the data.

Does anyone think we as SEOs should NOT have been looking manually at links within the data all along, to determine, based on our own personally limited understanding, what's good, what's possibly harmful, and what's possibly been wasted effort in link building?

This is invaluable additional data viewing. And just like DA, on-page report grades, crawl diagnostics, and every other individual judgement data point in the Moz campaign eco-system, it's up to the individual reviewing the data to still have to make a determination as to how accurate/valuable/useful that data is.

I say YES! a resounding yes!
 
+Dennis Goedegebuure Umm... I don't think we could "throw websites under the train." Our metrics have no influence on how Google/Bing actually rank things; they're merely correlated representations of what the engines are already doing.

+David Minchala We are definitely focusing on that. Bigger, faster, fresher, more reliable are all in the pipeline. This is work outside of those specific team members, and it's designed to help make the metrics more useful and the index more unique in its value proposition (since there are now multiple places to get raw link info).
 
+David Minchala the problem, though, is that it's not your view or my view of quality that matters. If we cross an invisible set of guidelines we run the risk of being penalized.

Tools like this could certainly help business owners.

So a web company cannot have a link at the bottom of a website it created - and can get penalized for that because it was deemed spammy (their view) - (NOTE: we did not pay for it) - BUT I could get a link on +Barry Schwartz 's SER website and pay for it and rank for SEO India? Like rank first place? And that's a quality link. But it's a paid, followed link - so shouldn't that be classed as spammy?

It's their judgement. It always will be. But to be able to benefit from targeted search, it's a good idea to have more transparency in those rules.

I just want something that is more than an "ALERT" or "WARNING" after some idiot has done something to a client.

Or after a link that was ok to place 8 years ago - (web design credit) now needs to be nofollow - and begins to get a website penalized.

In GWT - we can change crawl rates, upload sitemaps, hell we can even geo target our domain to a specific country and tinker with our site preview. All nice!

How about a tool or function in GWT that has NO DOWNSIDE, i.e. the functionality I mentioned above to devalue links we want to devalue or have ignored.
 
+Rand Fishkin So you would say that when you flag a certain site as having been penalized based on your data, these sites will never be used in presentations or examples by the consumers of the data?

I've seen so many organizations whose rankings tumbled because of internal errors, not penalties, yet which might be misclassified as penalized.

Your data might not influence if a website gets hit by a train, but the people who use the data might... intentionally or by accident.

I'm wondering how much data you are planning to use for such an algorithm, as it would need a considerable investment!

That's where my comment comes from: a lack of data can show false positives, and then what...
 
+Dennis Goedegebuure people use data in presentations already - I've seen references to DA, and a lot more, in some presentations. Some people may have allowed that info to sway them one way or another. Personally, I haven't. The value of the tool should not be judged negatively based on how some individuals choose to use the data.
 
First off: I like your product and pay for it each month. I also have no problems with you personally (seeing as we have never spoken).

Now: the concept of what you want to do is fine, but you had better be damned sure you get it right before you release this. And I'd be lying if I didn't say I have my apprehensions.

You're already collecting data on our sites without permission (we can't block inclusion in your index through robots.txt). You preach transparency, but that fiasco over whether or not you really have a crawler a few years back causes me to raise a skeptical eye each time you use the word. You've kind of conditioned me to believe that you're transparent when it benefits you (ie: showing off traffic stats, large indexes, and revenue), and there's nothing wrong with that. It's good marketing, and buys you good will among your fans. And yes, you're 'transparent' when you screw up (like with the delays on the last update). But is that transparency? Or is it just good customer relations and marketing?

Now you're considering creating your own rules to decide what spam is, and volunteering us all into a potential reputation management problem. I don't spam. Google knows I don't spam. But what will your algorithm think? I don't know. And to be honest, I don't really trust you to decide that.

Here are a few of my concerns:

1. It's impressive that you're indexing as many links as you are. But in an effort to improve the quality of your link index, you decided not to index lower quality pages. Aren't those the pages most likely to be the spam? Without those, what's the point?

2. You haven't figured out how to fix basic problems with your core products. For instance, Open Site Explorer has no idea that www and non www are different versions of the same page. As far as I can tell, it fails at all canonicalization issues (minus 301s).

3. The latest link index shows search results pages from some search engines as backlinks (I'm seeing tons from cox.net, for instance). That's kind of a pretty basic screw up.

4. Your index is very stale. I find 404s in 'fresh' indexes all the time, as well as recorded links that haven't existed for six months or more. Like I said earlier, I find your product useful for a lot of things, but how am I supposed to trust it when it comes to things like spam, when it can literally be turned on and off with the push of a button? How can you catch that with such a slow index?

And even if you do get it right, what actionable thing am I supposed to do with this data?

I don't mean to be an ass, but I really don't see how this is a good idea. Unless you just want a lot of free PR and links.
 
+Alan Bleiweiss The value of the tool should be judged based on the amount of data used to build the list & algorithms.
When people use the data for examples, there can be a lot of false positives. Would you like to see a client's website flagged as spam based on an incomplete dataset, presented in front of hundreds of people as an example of how bad their practices are... aka yours..?
 
if Google is foolhardy enough to surrender their own algorithms in favor of this new feature of the Moz tools, yeah, we'd have a problem. That's about the only situation I'd be concerned with. If Google was alerted to a site based on the link profile and evaluation of a site within Moz that someone alerts them to, and then from there, they take a closer look and make their OWN determination based on their broader data set, I have no problem with that.

Do I believe anything other than those two scenarios is reason to be concerned if Moz starts providing an evaluation on inbound links? No more than I am now with every other data point evaluation they already provide. Which is zero. I just don't see how this is any different. Every data point already in the system, and every data point already resulting in an evaluation is already ripe for potential false positives. So why the big deal over this one?
AJ Kohn
 
I don't view what this produces to be the authority on things. In fact, no tool should deliver that type of certainty. Do we trust the volume data from Google's Keyword Tool? Do we think the site: operator is perfect?

The SearchMetrics data of Penguin 'winners and losers', do you trust that? It's interesting but I know it's not 100% correct.

You have to apply some critical thinking to any data, whether raw or processed. When possible, validating it against other tools is always a nice way to increase confidence.

Will some people view this data and take it as 'the truth'? Yes. Is that Rand's problem? I don't think so.

For me, additional data of this sort simply gives me another source for validation and improves my own ability in terms of pattern recognition.
 
+Rand Fishkin

"Our metrics have no influence on how Google/Bing actually rank things; they're merely correlated representations of what the engines are already doing."
====

So the tools are always designed to show a representation of how Google and Bing "may" view our sites and the "neighborhoods" they are in... like a lagging indicator?

So websites with high MozRank could equally well have been hurt too (as they were) - because the tools don't predict where the SE's are going with particular intricacies of quality and its bearing on ranking.

The quality calculations of SEOmoz and its tools are therefore not a leading principle... they do not classify what is quality in their own view... they represent what is a given.

If we had seen a drop in MozRank on websites prior to the Penguin update - now that would be a tool worth building. Prediction based on your own quality guidelines. Your own spam index, let's say. True - those of your own engineers and quality team - but then you would have a different product to the array of tools out there that sheepishly follow the givens and the current ranking factors of the SEs at any given time. What's the point in a bunch of parameters that in essence tell you where you are? I like a lot of the SEOMoz tools and we use them and recommend them - but I think based on recent happenings... webmasters are looking for something more.

It needn't even have to be a visible indicator like MozRank... it could be, as you have hinted at, warnings or alerts. Your tools already give warnings about other items - some that people may not agree with - but it's all helpful... it's there whether you take the warning / notice or not.

These could be designed to be ONLY visible for a website owner after they do some kind of site verification with SEOMoz.

I would be interested in that.

However, without the other part of this jigsaw - the ability to do anything about it in GWT - then again we are just left with information which, in this particular set of circumstances, is hard for many webmasters and business owners to act upon. With the other warnings and notices you give in your SEOMoz tools, they are usually actionable items. With these warnings my feeling is they won't be. They are after the event, with no "meaningful way" of correcting it in GWT.
 
+AJ Kohn I agree. While Google makes flawed decisions on data THEY control, we in the industry should now, as we always should have been doing - understand and recognize data for trends. And if someone wants to try and out another site by pointing to the new Moz LA (or whatever it will be called), they can do that now without any new process being created. In my opinion, the benefits far outweigh the paranoia in this one.
 
+Carlos Fernandes understood that many are at the mercy of big G. For those that also care about other channels, and who prefer to form an understanding of search ranking signals based on multiple inputs, this feature wouldn't be as useful.

+Rand Fishkin which leads me to my thesis: I prefer a tool that does my heavy lifting, not my thinking. It's the same reason I don't make purchasing decisions on Google but I will go there to start down the funnel. A tool can be manipulated to pull info I care about; it couldn't possibly anticipate what I'm going to care about in every situation (until G goes self-aware, that is). I for one would like to see Moz develop an author/influencer index based on hard, objective metrics. That sh*t might just give me a hard-on. My $.02.
 
You might like to have a look at http://mwaves.org then. That site's front page is still a PR5 even though I reported it days ago. It has better than 100 paid follow links on it, in crystal clear violation of Webmaster Guidelines.
 
Bummer .. I'd really like to make the "mwaves" link above a nofollow
 
+David Minchala Stop giving away the great idea that we've already built and is the basis of my MozCon presentation
 
To me, adding this type of thing comes down to intent. Is the goal to help SEOs clean their link profiles? Or is it to identify the sites that have "spammed" and gotten away with it?

If this is being done in the vein of transparency, then I think you should make the "spam score" an inseparable part of the normal reports SEOs rely on to share with their clients. Then it's truly transparent, right?

My biggest concern is an entirely new form of negative SEO that's solely intended to make people's moz scores full of negative "marks". As you said...it will only show the most negative/nasty/egregious/obvious spam links...but I would imagine those are also the easiest to create. Point being...do you plan on including a metric that indicates a "likely negative SEO effort"?

Again, for me personally, it's difficult to say whether I'd be for or against until I see implementation. But, I would make sure to think through the implications it could have on (folks like myself) that use moz reports for client communication. Sending a report...which is transparent...indicating that there are spam links (whether they be real spam or not) is going to have a serious effect on client relationships.

Just my two cents...
Rob May
 
Rand, I love this idea and project. For #SEO engineers all around, it will prove to be a very useful tool to anyone who has access. I can't wait to see it in action, read up on development progress and implementation. Invaluable to us in so many ways if you look to use it correctly. Keep us posted!
 
+Rand Fishkin most of us would prefer hard metrics in addition to a final magic 'SpamRank' score. So if you do release one please ensure that we can expand it and understand the underlying data.

For example:

- Anchor text distribution
- Outbound links from the linking page
- mR to C-class ratio

+Dennis Goedegebuure spam detection can be made lightweight and produce reasonable results. We experimented with the deviation of the PageRank to linking-domain ratio across a collection of domains (http://goo.gl/3tMVZ) and found a threshold point at which we could easily flag spam. Manual inspection found 1/9 to be a false positive.

But I do agree with what you're saying... the end metric should be presented as "suspicion level" based on metrics and patterns involved and not so much as a definite spam identifier.
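For anyone curious how a lightweight check like the one Dan describes might look in practice, here is a minimal Python sketch. The field names, the sample structure, and the z-score threshold are illustrative assumptions only, not Dan's actual methodology or data.

```python
from statistics import mean, stdev

def flag_suspicious(domains, z_threshold=2.0):
    """domains: list of dicts with "host", "pagerank", and "linking_domains" keys (hypothetical schema)."""
    # PageRank-to-linking-domain ratio for each domain in the sample
    ratios = [d["pagerank"] / max(d["linking_domains"], 1) for d in domains]
    mu, sigma = mean(ratios), stdev(ratios)  # needs at least two domains
    flagged = []
    for d, r in zip(domains, ratios):
        z = (r - mu) / sigma if sigma else 0.0
        if abs(z) > z_threshold:  # ratio deviates sharply from the sample mean
            flagged.append((d["host"], round(z, 2)))
    return flagged
```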
 
+Dan Petrovic 1/9 false positive, that's a rather large % IMHO.
Any chance you're publishing your findings... with all the caveats?
 
+Dan Petrovic Yeah - I think we'd present it as a "matches patterns of sites we've seen Google penalize/ban" more so than a "you are definitely webspam" type of thing. One idea currently floating around is not to show the actual spam score itself (at least on the high level), but rather a deviation from the spamminess of inlinks/outlinks of sites like you. So, for example, if you have a site that's ~DA 50 and ~1,000 linking root domains, we'd give a sense of the number of standard deviations you are from average in terms of the spamminess of your inlinks/outlinks.

Every site has spam links pointing in, most have a few (by accident or through UGC or through age) pointing out. This score would just show you how far you are from average and then you could dig in to see the inlinks/outlinks distributions, patterns, etc.

+Spencer Belkofer Totally agree. In terms of reporting, we'd definitely make inclusion of spam metrics optional for custom reports.
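As a rough illustration of the mechanics Rand describes above, the "standard deviations from average" statistic could be computed like this. This is a hypothetical sketch, not SEOmoz's implementation; the function names and cohort values are invented.

```python
from statistics import mean, stdev

def spam_inlink_deviation(site_spam_fraction, cohort_spam_fractions):
    """How many standard deviations a site's spammy-inlink share sits from its cohort's mean."""
    mu = mean(cohort_spam_fractions)
    sigma = stdev(cohort_spam_fractions)
    return (site_spam_fraction - mu) / sigma if sigma else 0.0

# e.g. a ~DA 50 site with 1,000 linking root domains, 120 of which look spammy,
# compared against other sites in the same DA / linking-root-domain range:
# spam_inlink_deviation(120 / 1000, [0.05, 0.08, 0.11, 0.06, 0.09])
```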
 
+Rand Fishkin: deviation from the spamminess of inlinks/outlinks of sites like you

This is spot on if you're looking for a lightweight spam analysis. How do you determine the "sites like you" part?
 
Perhaps a domain analysis of what constitutes 'bad neighbourhoods' as per Google's webmaster tools guidelines.

i.e. what factors would rank a web 2.0 link wheel as more spammy than, say, a link network between pharma-based websites?
 
+Kevin Spence Sorry for missing your comment earlier - I'll be getting your feedback to the product team, but we are definitely aware of a lot of those challenges/setbacks and are working in tandem to grow index size/freshness, get more accurate with new TLD extensions, etc. Thank you for the excellent feedback.

+Dan Petrovic For "sites like you" it would likely be sites in your range of DA/linking root domains/etc as I noted in the example above. We may have other classification methodologies in the future, but those would be the basic ones to start.
 
I thought the term 'link juice' was now known as 'link equity'... :)
 
+Rand Fishkin roger that, I thought you were going to classify websites by type e.g. blog, forum, business or by niche/topic and draw norms from that.
 
+Rand Fishkin This is a great idea, but please ensure that these websites are left in the index - one of the only reasons we use alternate sources when doing backlink discovery is due to the depth of the index.

We want to know if a client's competitors are using spam techniques (and they are working) so we can advise the client - if you remove spam from your index it removes that metric from our reporting and we would need to go elsewhere for this information!!
 
I shall post what I posted on my re-post of this:

I think it's a damn good idea to help the white hat SEO community, and if done correctly it will help people find more quality links rather than being used to help Google remove those that acquire spammy links.
 
What Alan Bleiweiss said works for me. Just hope that we "Canucks" will also be looked at, metrics-wise, too...! :-)
 
How does really expensive outreach to get exact-match anchor text classify? ;)
 
I guess I don't fully see the point.

So this tool may show we have 10% of our links from "spammy" sites....so what? We didn't ask for a single one of them, they're all from scraper type sites. I'm sure google.com has hundreds of thousands of "spammy" links pointing to it.
 
I'm curious just how many people there are out there that can't look at a website for just a few seconds and determine if it is worth getting a link from?

I just don't understand how a tool like that can possibly be used for "good" over the long term, but I can see an awful lot of ways it can be used for "evil"
 
SEO changes over time, as we well know, and some of us have been blogging for years, using tactics that worked at the time of writing, and not so much now, perhaps. If we have issues on our sites that your tool can uncover so that we can change the problems, I can see this being a good thing.

It will also be good to ferret out those not-so-luscious linkage sites that we see as good on the surface, but maybe not so much underneath, for the very reasons I describe above.

My only concern is (and Google has admitted to their algo doing this) that the algorithm will return false positives. In the scenario of linking, what happens to the poor webmaster whose site is OK, but that is seen as not-so-OK by SEOMoz? Passing bad reputation to a site that doesn't deserve it could be an issue for you, too.

Of course, we all want things to work 100% of the time, 100% of the way they were intended, but we know full well that's pie in the oh-so-proverbial sky.
 
Diagnosis can be one of the fiendishly tricky aspects to nail down, so as a tool for aiding marketers to dig themselves out of some inherited agency or client-side SEO hole, it would be very welcome, even if it is limited in its dataset. Here's my question: why would SEOMoz be any better than a future technical or policy iteration of GWT? Because you can benchmark more openly?
 
+William Vicary Yes, we'll try to leave them in the index. We've heard some loud and clear feedback that folks would prefer we index/show the bad stuff, even if it's not passing link equity or is very spammy.

+David Ingram We worry a bit about that, but we haven't heard much outrage regarding PA/DA/mozTrust/etc. so I'd be surprised if this one was particularly bad. Perhaps some good name branding could help (e.g. rather than "spamscore," something like "inlink deviation" and "outlink deviation" to indicate a deviation from the average).

+Philip Rudy I suspect we'd not be able to catch those more advanced and hard-to-detect types of link manipulation.

+Brad Livermore Yeah, that's why rather than just showing "you have X spammy links," we'd try to show that "for your size and relative quantity/quality of inlinks, here's how your spam inlinks/outlinks deviate from the average."

+Steve Gerencser Sadly, I can tell you from experience, it's a HUGE number who can't.

+Pat Marcello Hopefully, the false positives will be largely a wash, since we'll be highlighting deviation from averages, and the average will also be affected by the false positives.

+Paul Gailey Alburquerque If I thought Google would start to show this data/metric, we probably wouldn't invest, as theirs would be much more accurate/higher quality. However, they seem to be shying away from information of this nature, thus creating a need.
 
Define Spam. Wouldn't you agree that is the first step to identifying these links? And I mean - really define it. Would it have to be agenda based?
 
+Rand Fishkin I'm sorry, Rand, but I see this as a bad idea, from nearly every angle. If it were a tool that one could only use to look at their own site, after verifying ownership, I could see some forensic value in it. But the potential for harm is just too great, IMO, the way you've painted it.

You're talking about a significant investment of time and energy (and the attendant expense), for what? So that others, benignly or not, can examine and expose the practices of other sites? We've all noted Google's rather inept handling of their recent update, that did little more than facilitate negative SEO. Don't you see the parallel here?

Personally, I think you'd be better serving your community and SEO in general, to direct that energy at fine-tuning your existing tools, and/or developing new tools that DON'T facilitate the harming of sites by others, regardless of their motives.

As Steve said above, anyone worth their salt can readily detect most of those issues on a site. Even if they can't, it's Google's job to sort that out, not yours or mine. I, for one, am tired of Google getting the public to do their job. Do you really want to facilitate that, too?

As someone said above, I'm afraid this can only end badly, for Moz and for many site owners. And you know that some of those site owners will be innocent of any spammy behavior.
 
+Rand Fishkin then perhaps they should be looking for work in an industry better suited to their talents and abilities. There are already far too many people in this industry that can't do anything more than regurgitate what someone else says, and giving them even more tools to proclaim their awesome guruness doesn't seem useful to me.
 
Noting that I work with +Rand Fishkin and people can make whatever assumptions they like about my own bias, I just want to clarify something (and Rand can correct me if I'm wrong). Spam detection would be a part of the SEOmoz link-graph in order to help better gauge the "real" (i.e. Google's perception) strength of a given site's link-graph. We're not going to end up with a "Top 100 Spammers" list or some kind of Outing Machine. At best, it might send notifications to people's individual campaigns and adjust their DA/PA if SEOmoz thinks links are spammy.

This isn't about policing the internet - it's about emulating Google's behavior as best we can. That isn't an endorsement of Google - it's a way to help people understand how Google perceives (positively and negatively) their sites, so that they can adjust to Google's changing demands. Of course, getting it right will be tricky, but equating the core concept to mass outing is a wild stretch, IMO.
 
Thanks +Pete Meyers I guess the main challenge is figuring out whether correlation (Google vs SEOMoz) in results/link metrics is due to link discounting or some other factor as you don't know whether Google is applying dampening on any given link with 100% certainty.
 
I think it'll be a good link building tool. It'll be very interesting to see how much "spam" websites perceived as "quality" actually have...
 
Rand and Pete,

consider including the following parameters in your algo:
1. # of outbound links from a page; more than 50 = possible spam
2. outbound links contain the keyword that is in the page title? yes = possible spam, no = good
3. amount and types of media on the page (are there any images in the body? yes = good, no = possible spam; same with video)
4. website IP in any online spam database? (obvious)
5. ratio of website age to # of backlinks
6. ratio of links to the main page vs. internal pages of the website (too many links acquired quite fast = possible spam)
7. backlink distribution curve (there should be some footprint for spam sites, e.g. too many PR0 links = possible spam)
8. PR and DA of the sites that this site links to; low metrics = possible spam.

let's make the web better.

Slava Rybalka
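A toy rule-based scorer built from a few of the parameters Slava lists above might look like the following. The thresholds, weights, and field names are illustrative assumptions only and don't reflect anything SEOmoz has said it will build.

```python
def suspicion_score(page):
    """page: dict with outbound_links, title_keyword_in_anchors, has_images,
    ip_in_spam_database, age_years, backlinks (hypothetical fields)."""
    score = 0
    if page["outbound_links"] > 50:                    # rule 1: too many outbound links
        score += 2
    if page["title_keyword_in_anchors"]:               # rule 2: title keyword repeated in outbound anchors
        score += 1
    if not page["has_images"]:                         # rule 3: no media in the body
        score += 1
    if page["ip_in_spam_database"]:                    # rule 4: IP listed in a spam database
        score += 3
    if page["backlinks"] / max(page["age_years"], 0.1) > 5000:  # rough take on rule 5: links acquired too fast
        score += 2
    return score  # higher = more suspicious; a "suspicion level", not a verdict
```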
 
if the concern is centered on the notion of outing/exposing/pointing out the exact activity being labeled as potential spam, I suppose you could just include it as an aggregate "ding" against the totality of a site's link graph. So instead of saying "hey, this link might be spam", the tool could say "the raw domain authority of this site is 86, but adjusted for potential spam, the domain authority is 83"

That would at least give a site owner the heads-up that something may be amiss. With any knowledge of SEO, it probably wouldn't take long to identify the potential offenders.

Or, I like +Doc Sheldon 's idea of giving more granular detail upon site verification. Not sure if that is something that Moz would consider, though.
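To make the "aggregate ding" idea concrete, a purely hypothetical adjustment might look like this; the penalty formula is invented and simply reproduces the 86-to-83 example above.

```python
def adjusted_domain_authority(raw_da, spam_inlink_fraction, max_penalty=10):
    """Subtract a capped penalty scaled by the share of inlinks that look spammy (hypothetical formula)."""
    penalty = min(max_penalty, round(max_penalty * spam_inlink_fraction * 2))
    return max(0, raw_da - penalty)

# adjusted_domain_authority(86, 0.15) -> 83, i.e. "raw domain authority 86, adjusted for potential spam 83"
```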
 
You need to have site ownership validation in order to run a report. It is a great tool... but spam is a dirty word. It would be irresponsible for you to provide a tool to trash other site owners... not only would the tool be imperfect... but it would feed the finger-pointing, name-calling noise in the industry. People like being called a spammer as little as I am sure you like being called a Google-loving, backstabbing traitor... y'know, if anyone ever would say/think that.

 
Great initiative +Rand Fishkin. If more organizations like +SEOmoz get involved in eradicating spam by creating tools to make the web cleaner and safer, the web could realistically be "spam free" within the next 5 years. Great work. Keep going.
 
Will look forward to seeing how you guys figure out the tool for checking a website's data/metrics and deciding whether it's spam or not.
 
When your site is sick, you like to know which bug (link) is to blame! Good initiative. In the long term this is a must (look at all the fuss around negative SEO nowadays). And you guys are right on top of it, search and destroy ;-)
 
+Rand Fishkin pull your fingers away from the comments and get us something up to look at so we can see what to expect please :) Even if it's just a screenshot of what the data might look like, it would be good to know and see a little more on this project, if you have anything to share yet?

Apart from that, ignore those that don't want this because they clearly have something to hide, and I'm sure we will see spam from their sites in this once it's finished lol

This (like most other SEOmoz projects) sounds like it could be a good help to real SEO, not only to help find more quality links but also to warn you of any bad links that you didn't know of, so they can be removed or reported by yourself to Google so they don't think you have built them.

See, this tool could offer more than just a way to report link spammers; it could also help save you from bad links you didn't know spammy auto sites had pointing at you without your knowledge!
 
+Rand Fishkin I think it's great. OSE rolls in pages discounted by Google now - hundreds of thousands - but excluding them isn't the same as providing a, say, report of sites that you do consider spam.

In consideration of blog link networks and penguins, I think this would be the best update you could make, and a real value to SEOs.

Glad to see it. I can stop asking you for it now.
Eye Paq
 
It's like a private police department, judge and jury. It's a great idea but it comes with great responsibility. Some live on the edge. Those might get caught in the net of SEOmoz but pass Google, or the other way around - I am more worried about the grey area, the ones on the edge, than the ones that are clearly one way or another.
Will there be a classification, or just guilty or not guilty?
In my opinion a grade should be in place... like A+, B, and so on down to F (A+ being clean, that is).

Just a thought...
 
+Eye Paq I think the whole point of this is for it to be done by algorithms so that people can see what's good and what's not. Having people able to vote would mess it all up, because the spammers could vote their spam sites as being good, so that simply wouldn't work!
Eye Paq
 
+Nathaniel Bailey Yes, by algorithm, but not 0 or 1, black or white. I meant to set some levels on the type of page / site that the algo will find.
 
+Martin Evans ah yes that would make sense, sorry I misunderstood what you meant lol :)

So Mr +Rand Fishkin do you have any comments on the above about how you would rank different links as to how spammy they are? Or would it just be a case of spam or not spam?
 
I hate to be considered a "troll" but you guys are all dense.

Fact is, there is no more "black hat", "grey hat", or even "white hat"; they are buzz terms that people like Rand like to use. If you are building links to improve the rank of your sites, or a client's site, you are manipulating the Google algorithm to place results based on what you believe is the best listing, and there is no arguing that. Google wants SEO to be NON EXISTENT. I have no idea what "clients" you guys have other than commenting on Rand's status all day, but mine are interested in one thing: profit and results.

With all that funding, Moz, you need to find better things to do. Google can handle their own research. And the rest of you, stop reading SEO blogs all day and actually test, and get some work done.
 
I'm with you +Rand Fishkin and I believe your approach is helpful for our community in distinguishing good backlinks from just crappy ones. That said, the concern will be how you'll report on spammy websites. Proceeding as per the Blekko Spam Clock is too aggressive, and the SEO who gets caught could be your neighbour, your friend, me... OK for the project; it just needs refining how to communicate the findings ;)
 
+Pedro Dias , you take the point.
 
+Justin Garza ROFL is all I can say to you without sounding rude lol

Ok, so you don't read about SEO, changes with Google, algo updates and so on? Maybe that's why you don't understand why a tool like this will be of value to people that know what's going on in the SEO world!

Not here to get in an argument with an SEO hater, so if you don't like the idea maybe you simply shouldn't comment on it!? You wouldn't go into a shop and complain to customers about a product you have no interest in, so why do that here?
 
+Rand Fishkin I value your expertise in this field immensely. Your intelligence and insight is something I greatly respect.

That being said, and with all due respect, I think this is a bad idea. As a marketer it's your job to get your clients good rankings or to create products that help other marketers get their clients good rankings. As the SE, it's Google's job to make rules about what is and isn't spam, to determine who's breaking those rules, and how to block the efforts of those "doing evil".

You're crossing a line from marketer to regulator of some sort and I'm not entirely sure it's SEOmoz's place to be that.

I might understand if you wanted to start some separate organization that helps Google and Bing regulate the web by calling out spam, but SEOmoz is first and foremost a marketing organization that, to some degree, has to work in conflict with the SEs as you try to manipulate their results, not working with them to help police the web. I think there are immense conflicts of interest here that need to be addressed before you can move forward on this.

To be clear: I hate spam. I hate bad results. I hate seeing crap rank ahead of quality. But it's not my job to destroy those sites or call out their shady tactics; it's my job to rank my sites. It's unrealistic to think the entire world is going to employ only white hat tactics, but you serve the web best by touting the benefits of white hat and staying on the side of benevolence instead of delving into the depths of maliciousness (which this is, despite your good intentions). Flaming a black hatter, however deserved, for the good of the web as a whole is no different than unconstitutionally silencing a racist voice for the good of the people. It's just not right.

Again, I respect your opinion immensely in all matters of SEO and OLM but not this one. I look forward to an interesting discussion on the topic.
 
Rand you are opening up an unbelievably smelly can of worms here.
 
Fantastic idea, Rand. We have PageRank from Google, but that's been a meaningless number for a long time. There hasn't been much else useful until SEOMoz came along (I think it was Page Strength at first, now it's stuff like Domain Authority, MozRank, etc.). While what you're doing might be a "guess" at Google's and Bing's evaluations, we can still use it to help understand why something is (or isn't) ranked as we'd expect. We can learn from our mistakes (or other people's). There's this worry about being "outed", but the information is already out there (so for an SEO who did a spammy job, the client can take the time - if they know how - to establish what was done). It's not like names are going to be associated with each score.
 
This is a good idea. Can't wait to see the results
 
Rand, I swear to you, when I wrote this year's April Fools' post on my blog about the "New Google Fairness Algorithm," another option I was considering was almost this exact announcement you've just made for real!

The headline was going to be "Rand Fishkin Will Determine Legitimacy" and I was including the announcement of a new metric on your toolbar called CrapRank - and I had you entering into the partnership with +Jill Whalen - Now I'll never get to write that post...
:(
 
+Nathaniel Bailey did you even read my message? I guess you're too busy with your 2-website portfolio and playing video games.

Let the big boys talk optimization and hit $1k-5k days off the first page of Google, while you focus on your 9-5 job in your agency that sells overpriced solutions.

I guess now that WordPress is popular, and the Yoast SEO plugin is out, you consider yourself good at "seo".
 
I see the world of SEO is not as easy as we think; each time Google creates a new algorithm, all the work can be changed in one second. So we must be careful and pay attention.
 
Hey Rand, let me know if you want/need any help on this project. I'm so sick and tired of wordpress sponsorship links, widget links, and the rest.
 
BHW will be sitting in the back seat of your car if you do this. Good luck!
 
Who do you think you and SEOMoz are to point fingers at a site and say it's spam? Don't try to play God mode here Rand, you're just another guy in this whole industry!
Rob May
 
I think this type of tool Moz is working on will be beneficial, if only as another tool to analyze data and find patterns. Imagine being able to look at data from this research tool and find that your site, content, or whatever it may be, COULD be classified by Google as SPAM. Wouldn't you think that to be an asset and a tool worth the value? I would :) Remember the Moz tools are designed on a platform to mimic that of Google's (or as close as possible).. it could be huge for us as a tool once a site is developed, launched and ranking.. could avoid long headaches later!! Way to go Rand!
 
+Rand Fishkin Needed.

+josh bachynski like you, I've also been dedicating/investing time in finding webspam and the amount of time that is needed to locate it, well, it can be a tedious task. As you mentioned, having input from someone that is up-to-date on all the latest blackhat trickery could be quite useful. I'd be interested in sharing notes.

+Meg Geddes Penalized lately?

+Gary A. Fowler Unfortunately that's a common comment I see everywhere on the net ("*I reported it days ago*") and I think it's an area Google clearly needs to improve on. There are two sides to this: typically you don't report search engine spam unless you have a self-gratifying MO - nothing wrong with that. The other side of the coin is that Google is requesting the community to report spam to improve their algo, so there should be some type of transparency or acknowledgment that the time invested reporting webspam is bearing fruit. Gary, one thing I would suggest, if you did not do it this way, is to report one link at a time. When you find sites providing the links, it's probably not a good idea to list the site that is selling the link, the site buying the link, and then all the other links in the additional info section of the form. I have it from a pretty reliable source that when you submit to the Google webspam team it is looked at almost immediately, but because of the amount of spam that is submitted, the individual reviewing it has about a minute to go through the report and then moves on to the next one; that's why it's important to be as precise as possible and only list the two links. Here's an example of how I would submit the report: http://cl.ly/image/2t2J2y0E161L