
Influence Finder is a new link analysis tool that aims to make link research more targeted and less time-consuming while producing better results.
Despite how SEO has evolved over the years, one aspect remains crucial to the success of any SEO campaign: links. So just about any tool that claims to make the process faster, smarter, and higher quality is worth taking a look at.
Influence Finder is a web-based tool with a clean interface, and it is pretty easy to use. When you log in, the first thing you’ll see is the project dashboard, where all your current projects are located.

The projects you see there are templates they provide; however, you are free to choose a custom project and name it whatever you’d like. The project options are:
It’s important to note that the report creation interface is exactly the same whether you choose Competitor Profiling, Vertical Media, or Custom. These initial report types are just there to give the user an idea of what they might want to cover in their research.

We ran through a report as if we were running a “Brand” report so you can see how the system works.
Let’s say we work for Waste Management, a leading provider of trash removal and recycling services here in the US. So we selected the first project type in the image above and clicked “go to step 2”.
The interface is simple to work with. You can do the following in this screen:


Once you move on to step 3 you are presented with some more options. Here you can add keywords manually or via the anchor text they found when crawling the targeted URLs. They will look for occurrences of these keywords in the following places:
You can choose whether they are brand or non-brand keywords. As of this writing, actual anchor text is not available; however, I have been told that this will be an enhancement in version 2.
So basically if you choose “trash removal” as a non-brand keyword and “recycling” as a non-brand keyword, then they will be grouped under the “non-brand” keyword data point in the results section.

The second place you can add them is via the keywords found during the initial crawl by Influence Finder’s bots (over Majestic SEO’s data). They are sorted by frequency.


When they are looking for these keywords they are looking based on phrase match and not exact match. The idea here is that you are looking for link opportunities around a keyword or phrase rather than for specific data about an exact match keyword. So if you have a site about auto insurance you’ll get results that will show linking opportunities based on auto insurance, online auto insurance, dirt cheap auto insurance, and so on.
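To make the distinction concrete, here is a minimal Python sketch of phrase match versus exact match. It only illustrates the general idea; the function names and the simple tokenizer are our own assumptions, not Influence Finder's actual matching logic.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase a string and split it into simple alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_match(keyword: str, text: str) -> bool:
    """True only when the text consists of the keyword and nothing else."""
    return tokenize(text) == tokenize(keyword)


def phrase_match(keyword: str, text: str) -> bool:
    """True when the keyword's words appear together, in order, anywhere in the text."""
    kw, words = tokenize(keyword), tokenize(text)
    if not kw:
        return False
    # Slide a window of len(kw) words across the text and compare each window.
    return any(words[i:i + len(kw)] == kw for i in range(len(words) - len(kw) + 1))


# Hypothetical phrases, echoing the auto insurance example above.
candidates = [
    "auto insurance",
    "online auto insurance",
    "dirt cheap auto insurance",
    "insurance for your auto",
]
phrase_hits = [c for c in candidates if phrase_match("auto insurance", c)]
exact_hits = [c for c in candidates if exact_match("auto insurance", c)]
```

Under these assumptions the first three phrases count as phrase matches, while only the first is an exact match. The last phrase contains both words but not as a contiguous phrase, so it matches neither.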
Because matching is phrase based, I think the addition of the actual anchor text will be helpful in making this tool both a link opportunity research tool and a competitive research tool with respect to competitor backlink profiles.
When you are ready to begin the full index, simply click “create index”. Above the “create index” tab you can show more keywords from the initial crawl if you want. Building the index can take anywhere from an hour to a few hours, depending on the size of the backlink profile.

So here is the results pane for this report. There are 2 panes: the left pane, which is for Link Sources, and the right pane, which shows Page Level details related to the domain you highlight in Link Sources (we’ll get to the numerous data points in just a moment):

Here is the right pane. When you highlight a source in the left column (Link Sources), the right pane (Page Level) lists the pages within that site that reference either the brand or non-brand keyword (note: these include sites that do and do not link to the current domain, which can be filtered as discussed later on in this review):

When you highlight a page you can see a screenshot and open it in a new tab, as shown above.
For the left-side pane, Link Sources, you have the following data points available:
You have the same options within the Page Level area in the right pane. Both sets of options are available from the Change Filters -> Link Source or Page Level Filters options within the tool.
The left pane (Link Sources) of the application is where your results are populated, while the right pane shows domain- or page-specific information (Page Level) based upon what is highlighted on the left (more on that in a moment). The left side has the following options, as shown below:



They also have a flagging system, which is purely optional:

Flags are color coded, with the following colors available. Use them for whatever system you devise:
In addition to the data points mentioned earlier (Max Authority, Heartbeat, Affiliate relationships, etc.), the custom sorting feature gives you additional options that you can include in the dropdown referenced above. In case you missed it, here it is again:

(click the more button to add additional sorting options)
The additional options include:
Clearly there are lots and lots of options here. As just one usage example, say you want to see sites that are not currently linking to you but talk about your brand on their site (in key areas like title tags). These could be good link prospects. The first thing to do is change the link display option to “no target links found”.

The next thing is to change the sorting options to include the Brand keyword in the H1 and the Body. Sites that match should be good link targets: they do not link to us, and they have our brand keyword in the H1 and/or Body copy.
To show those columns, go to “Change Filters” as shown below and click the checkboxes on the right, as we did with Brand keyword in H1 and Brand keyword in Body. The columns will then appear in the Link Sources (left) pane:
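If it helps to think of this filtering in code, here is a rough Python sketch of the same logic: keep sources with no link to the target, require the brand keyword in the H1 and/or body, and rank the strongest matches first. The `LinkSource` record, its field names, and the sample domains are all hypothetical. The tool does this through its interface, and nothing here reflects its internals or an export format.

```python
from dataclasses import dataclass


@dataclass
class LinkSource:
    """Hypothetical record for one row of the Link Sources pane."""
    domain: str
    links_to_target: bool   # does the site already link to our domain?
    brand_in_h1: bool       # brand keyword found in an H1 tag
    brand_in_body: bool     # brand keyword found in body copy
    max_authority: int


def find_prospects(sources: list[LinkSource]) -> list[LinkSource]:
    """Sites that mention the brand (H1 and/or body) but do not link to us yet."""
    prospects = [
        s for s in sources
        if not s.links_to_target and (s.brand_in_h1 or s.brand_in_body)
    ]
    # Rank sites with the brand in both H1 and body first, then by authority.
    return sorted(
        prospects,
        key=lambda s: (s.brand_in_h1 and s.brand_in_body, s.max_authority),
        reverse=True,
    )


# Made-up sample rows for illustration only.
sources = [
    LinkSource("news.example", False, True, True, 12),
    LinkSource("blog.example", False, False, True, 20),
    LinkSource("partner.example", True, True, True, 30),
    LinkSource("unrelated.example", False, False, False, 40),
]
prospect_domains = [s.domain for s in find_prospects(sources)]
```

With this sample data, `partner.example` drops out because it already links to us and `unrelated.example` drops out because it never mentions the brand. The two remaining domains are the prospects.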

And here you can see the new columns, noted with red dots:

We can see that Earthtimes.org appears to be a worthy link prospect: it has a Max Authority of 12, is possibly a blog (guest post), has a strong heartbeat, and has pages with our brand name not only in an H1 tag but also within the body copy.
When we highlight a domain in the Link Sources area, the right pane populates the Page Level data like so:

What’s great here is that now you have pages that are targeted to your content, most of which use your brand keyword in the H1 tag and Body copy. Remember too that there are many, many other filters available as mentioned above. This is just one example of what you can do. It certainly is a pretty targeted way of building links. Now, you know the following:
You have a whole host of other filters available as well, but this makes for a fairly targeted link prospect.
In order to get custom columns, like we did with Link Sources, you have to go into Advanced Page Filters on the right to select those custom columns (Brand keyword in H1 and Body in this example):

We have discussed some of these already, as they are used in the normal flow of how you would use Influence Finder. There is an enormous number of data points available to you within this tool, and it’s likely that you will not use all of them on every report you run.
The interface for this part of the tool looks like this:

You have 4 options here:
These are essential tools for slicing and dicing the data to suit your report needs (link research, competitive research, link prospecting, and so on).
Influence Finder has a lot of features. Chances are you have a link tool or two already. As more and more tools enter the online marketing space it’s important to consider the overlap and unique features of the tool you are considering and the tool(s) you might already have.
Influence Finder, as we have outlined for you, has a seemingly endless array of filters you can use to target link prospects. The 3 bigger players in the link research and/or management space are typically thought to be:
When comparing tools in the same space it’s important to make sure they are designed to do the same things. In this case, Influence Finder is unique in its stated purpose: it is much more about finding worthwhile link prospects in a very targeted manner.
These other tools are much more about pure backlink research (like Open Site Explorer and Majestic) or backlink management, tracking, and workflow (like Raven, which also has Majestic functionality baked into their research features).
Influence Finder runs off of Majestic’s data. When you run a report in Influence Finder, their bots re-crawl the Majestic data to make it a bit more fresh and to customize it to your chosen parameters. The key points of differentiation on Majestic’s side are:
Open Site Explorer is a solid link research tool from SEOmoz. It doesn’t quite have the size that Majestic does but it’s certainly big enough to be a worthy link research tool. The UI is top notch and it is very easy to use. Some of the cool things you can do with Open Site Explorer:
So much like Majestic, Open Site Explorer is more of a link research tool/competitive analysis tool. Though, with either, you can certainly find worthwhile linking partners off of a competing site and you can look up sites of “influence” and check backlinks that way too.
Influence Finder’s core benefit is finding linking partners that are relevant to your brand and non-brand keywords, so it is naturally much stronger in this area than Open Site Explorer and Majestic. Conversely, Open Site Explorer and Majestic are much stronger in the area of competitive link research.
We recently reviewed Raven and Raven certainly sets the standard for link workflow, management, and reporting at the moment. Raven uses Majestic’s data in their link research feature set.
Raven is kind of in the middle here. They have Majestic built in so they are part competitive research plus part link management plus part link building workflow.
While Influence Finder is planning on introducing reporting and workflow into an upcoming version, their current tool combined with Raven’s link building and monitoring tools makes for a powerful link building toolset. So with Raven:
With just about anything you buy, generally you’ll get features you either don’t need or are just a bit beyond what you need them for in terms of depth. The nice thing with Raven is you get access to a bunch of tools in one spot for a fair price.
Do they have all the features? Nope, but do you really need every single option on every single tool? There’s something to be said for managing most aspects of a campaign in one spot.
So if you take Influence Finder’s unique core features and combine them with Raven for reporting, workflow, and research and/or with another link research tool like Open Site Explorer, then you’ll have a really strong set of tools.
The point is, none of these tools does everything the others do, so it’s a good idea to take a look at each of them and weigh the features, benefits, and costs against what you “need” for your campaigns.
Lots of data here, so we’ll outline how it all ties together.
You can use this tool for many different purposes and they even give you some guides as to what you might want to use the reports for. I just want to stress that those reports differ from each other in name only; the functionality of the tool after you select the report “type” is the same irrespective of which report you choose or if you just go with custom.
We talked about the left pane and right pane a lot, so here’s a condensed screenshot of the interface:

The left pane also houses the Custom Sort data when selected, while the right pane houses the Change Filters options as mentioned earlier.
So this was an example of a report on your domain for one core keyword and some brand related keywords. This is a pretty powerful tool, and if they add the actual anchor text where a link exists as well as some stronger workflow (assignments, notes, etc.) and reporting features, then I think this will be a tool well worth a look for you or your company.
They did tell me the features I mentioned above will be a part of version 2 which they are working on as we speak. When that comes out, we will certainly take a look and post that new information as well as our thoughts. As it stands now this is a really comprehensive tool for link prospecting and link building.
You can find out more at InfluenceFinder.Com.
It is no secret that in the past Rand and I have had some minor differences of opinion (mainly on outing).
But in spite of those, there is no denying that he is an astute marketer. So I thought it would be fun to ask him about his background in SEO and to articulate his take on where some of our differences of opinion lie. Interestingly, it turns out we shared far more views than I thought! Hope you enjoy the interview.
Throughout your history in the SEO field, what are some of your biggest personal achievements?
The first one would have to be digging myself (and my Mom) out of bankruptcy when we were still a small, sole proprietorship. Since then, there have been a lot of amazing times:

My wife and I in San Francisco (via her blog)
What are the biggest counter-intuitive things you have learned in SEO (e.g., things that theoretically shouldn't work but, wow, they do, or the opposite: things that should work but don't)?
The most obvious one I think about regularly is that the "best content rarely wins." The content that best leverages (intentionally or not) the system's pulleys and levers will rise up much faster than the material the search engines "intended" to rank first.
Another big one includes the success of very aggressive sales tactics and very negative, hateful content and personalities. Perhaps because of the way I grew up or my perspective on the world, I always thought of those things as being impediments to financial success, but that's not really the case. They do, however, seem to have a low correlation with self-satisfaction and happiness, and I suppose, for the people/organizations with those issues, that's even worse.
A very specific, technical tactic that I'm always surprised to see work is the placement of very obvious paid text links. We realized a few months back that with Linkscape's index, we could ID 90%+ of paid link spam with a fairly simple process:
We've not done the work to implement this, so perhaps there's some peculiar reason why applying it is harder than we think. But, it strikes me that even if you could only do it for pages with 3 or 4+ links in this fashion, you'd still eliminate a ton of the web's "paid" link graph. The fact that Google clearly hasn't done this makes me think it must not work, but I'm still struggling to understand why.
BTW – I asked some SEOs about making this a metric available through Linkscape/Open Site Explorer (like a "likelihood this page contains paid links" metric) and they all said "don't build it!" so we probably won't in the near term.
One of the big marketing angles you guys tried to push hard on was the concept of transparency. Because of that you got some pretty bad blowback when Linkscape launched (& perhaps on a few other occasions). Do you feel pushing on the transparency angle has helped or hurt you overall?
I think those inside the SEO community often perceive a conflict or tiff internally as having a much broader reach than it really does. I'd agree that folks like you and I, and maybe even a few hundred or even a thousand industry insiders are aware of and take something away from those types of events, but SEOmoz as a software company with thousands of paying subscribers and hundreds of thousands of members seems to be far less impacted than I am personally.
Re: Linkscape controversy – there have been a few – but honestly, the worst reputation/brand problems we ever had have always been with regards to personal issues or disputes (a comment on someone's blog or something we wrote or allowed to be published on YOUmoz). I don't have a good explanation for why they crop up, but I can say that they seem to have a nearly predictable pattern at this point (I'm sure you recognize this as well – think I've seen you write fairly eloquently on the subject). That does make it easier to handle – it's the unpredictable that's scary.
We certainly maintain transparency as a core value and we're always trying to do more to promote it. To me, core value means "things we value more than revenue or profits" and so even if it's had some hard-to-measure, adverse impact, we'd maintain it. We've actually got a poster hanging up in the office that our design team made:

An excerpt from our TAGFEE poster
There's a quote I love on this topic that explains it more eloquently than I can:
"(Our) core values might become a competitive advantage, but that is not why we have them. We have them because they define for us what we stand for, and we would hold them even if they became a competitive disadvantage." – Ralph Larsen, CEO of Johnson & Johnson
What type of businesses do you think do well with transparency? What type of businesses do you feel do poorly with it?
Hmm… Not something I've tried to apply to every type of business, but my feeling is that nearly every company can benefit from it, though it also exposes you to new risk. Even being the transparency-loving type, I'd probably say that military contractors, patent trolls and sausage manufacturers wouldn't do so well.
How have you been able to manage the transparency angle while having investors?
I thought it would be tougher after taking investment, but they've actually been very supportive in nearly every case (some parts of Linkscape, particularly those re: our patent filings being exceptions). I don't know if that would be true had we taken on different backers, but that's why the startup advice to choose your investors like you choose your husband/wife is so wise.
When you took investment money did you mainly just get capital? What other intangibles came with it? How have your investors helped shape your business model?
It certainly made us much more focused on the software model. As you noted, we dropped consulting in 2010 entirely, and we've generally limited any form of non-scalable revenue to help fit with the goals of a VC-backed business. We did gain some great advisors and a lot more respect in many technology and startup circles that would have been tough without the presence of venture funds (although I think that's shifting somewhat given the changes of the past 2-3 years in the startup world).
Have you guys ever considered buying out your investors? Are you worried what might happen to your company if/when it gets sold?
While we'd love to, I doubt that would ever be possible (barring some sort of massive personal windfall outside of SEOmoz). Every dollar we make gets our investors more excited about the future of the company and less likely to want to sell their shares before we reach our full potential. Remember that with VC, the idea is high risk, high reward, so technically, they'd rather we go for broke and fall to pieces than do a mid-size, but profitable deal. Adding $5 or $10 million back to a $300+ million fund is largely useless to a VC, so a bankruptcy while trying to return $50 or $100 million is a well-tolerated, sometimes preferable result.

I wrote about this more in my Venture Capital Process post (where I talked about failing to raise money in summer 2009)
Now that you are already well known & well funded you are taking a fairly low risk strategy to SEO, but if you were brand new to the space & had limited capital would you spam to generate some starting capital? At what point would you consider spamming being a smaller risk than obscurity?
You ask great questions.
While I don't think spam has any moral or ethical problems, I don't know that I'd ever be able to convince myself that spam would be a more worthwhile endeavor than brand building for a white hat property. Overnight successes take years of hard work, and I'd much rather get started as a scrappy, bootstrapping company than build up a reserve with spam dollars and waste that time. However, I certainly don't think that applies to everyone. As you know, I've got lots of friends who've done plenty of shady stuff (probably a lot I don't even want to know about!), but that doesn't mean I respect them any less.
Speaking of low risk SEO, why do you think neither of our sites has hit the #1 slot yet in Google for "seo"? And do you think that ranking would have much business impact?
We've looked at the query in our ranking models and I think it's unlikely we could ever beat out the Wikipedia result, Google or SEO.com (unless GG pulls back on their exact-match domain biasing preference). That said, we should both be overtaking SEOchat.com fairly soon (and some of the spammier results that temporarily pop in and out). Some of our engineers think that more LDA work might help us to better understand these super-high competitive queries.

SERPs analysis of "SEO" in Google.com w/ Linkscape Metrics + LDA (click for larger)
In terms of business impact – yeah, I think for either of us it would be quite a boon actually (and I rarely feel that way about any particular single term/phrase). It would really be less the traffic than the associated perception.
As an SEO selling something unique (eg: not selling a commodity that can be found elsewhere & not as an affiliate) I have found word of mouth marketing is a much more effective sales channel than SEO. Do you think the search results are overblown as a concern within the SEO industry? Do you find most of your sales come from word of mouth?
I see where you're coming from, but in our analyses, it's always been a combination of things that leads to a sale. People search and find us, then browse around. Or they hear of us and search for information about us. Then they'll find us through social media or a referring site and maybe they'll sign up for a free account. They'll get a few emails from us, have a look at PRO and go away. Then a couple months later they'll be more serious about SEO and search for a tool or answer and come across us again and finally decide, "OK, these guys are clearly a good choice."
This is what makes last touch attribution so dangerous, but it also speaks to the importance of having a marketing/brand presence across multiple channels. I think you could certainly make the case that many of us in the SEO field see every problem as a nail and our profession as the hammer.
What business models do you feel search fits well with, and what business models do you feel search is a poor fit for?
I think it's terrific for a business that has content or products they can monetize over the web that also relate to things people are already searching for. It's much less ideal for a product/service/business that's "inventing" something new that's yet to be in demand by a searching population. If you're solving a problem that people already have an identified pain point around, whether that's informational, transactional or entertainment-driven, search is fantastic. If that pain point isn't sharp enough or old enough to have generated an existing search audience, branding, outreach, PR and classic advertising may actually do better to move the needle.
Have you ever told a business that you felt SEO would offer too low of a yield to be worth doing?
Actually yes! I was advising a local startup in Seattle a couple years ago called Gist and told them that SEO couldn't really do much for them until people started realizing the need for social-plugins to email and searching for them. This is the case with a lot of startups I think.
In an interview on Mixergy you mentioned racking up a good bit of debt when you got started in search. If a person is new to the web, when would you recommend they use debt leverage to grow?
Never, if you're smart. Or, at least, never in the quantities I did. The web is so much less costly to build on nowadays and the lean startup movement has produced so many great companies (many of them only small successes, but still profitable) from $10K or less that it just doesn't make sense, especially with the horror that is today's debt market, to go too far down that route. If you can get a low-cost loan from a family member or a startup grant through a government-backed, low interest program, sure, but credit card debt (which is where I started) is really not an option anymore.
How were you able to maintain presence and generally seem so happy publicly when you first got started, even with the stress of that debt?
To be honest, I really just didn't think about it much. If you have $30K in debt, you're constantly thinking about how to pay it off month by month and day by day. When you're $450K in debt with collectors coming after you and your wife paying the rent, you think about how to make a success big enough to pay it all off or declare bankruptcy – might as well go with the former until life runs you into the latter. There's just not much else to do.
As Bob Dylan says – "when you got nothing, you got nothing to lose."
Many people new to the field are afraid to speak publicly, but you were fairly well received right off the start. What prepared you for speaking & what are keys to making a good presentation?
Oh man – I sucked pretty hard my first few presentations. I think everyone does. The only reason I was well received, at least in my opinion, is because I'd already built a following on the web and had a positive reputation that carried over from that. The only thing that really prepared me for big presentations (things like the talk to Google's webspam/search quality team or keynotes at conferences) was lots and lots of experience and for that I'll always be grateful to Danny Sullivan for giving me a shot.
I'd say to others – start small, get as many gigs as you can, use video to help (if you're great on camera, you'll be good in front of a live audience) and try to emulate speakers and presentations you've loved.
When large companies violate Google's guidelines repeatedly usually nothing happens. To cite a random example…I don't know…hmm Mahalo. And yet smaller companies when outed often get crushed due to Google's huge marketshare. Because of the delta between those 2 responses, I believe that outing smaller businesses is generally bogus because it strips freedoms away from individuals while promoting large corporations that foist ugly externalities onto society. Do you disagree with any of that?
I think I agree with nearly all of that statement, though I'd still say it's no more "bogus" to out small spammers than it is to spam. I would agree it's not cool that Google applies its standards unfairly, but it's hard to imagine a world where they didn't. If mikeyspaydayloans.info isn't in Google's index, no one thinks worse of Google. If Disney.com isn't in Google (even if they bought every link in the blogosphere), searchers are going to lose faith and switch engines. The sensible response from any player in such an environment is to only violate guidelines if you're big enough to get away with it or diversified enough to not care.
I'm unhappy with how Google treats these issues, but I'm equally unhappy with how spam distorts the perception of the SEO field. Barely a day goes by without a thought leader in the technology field maligning our industry – and 9 times out of 10 that's because of the "small" spammers. If we protect them by saying SEOs shouldn't "out" one another, we bolster that terrible impression. I don't think most web spam should even have the distinction of being classified as "SEO" and I don't think any SEO professionals who want our field to be taken seriously by marketing and engineering departments should protect those who foist their ugly externalities onto us.
I know we disagree on this, but it's always an interesting discussion.
One of the most remarkable things about the SEO industry is the gap in earnings potential between practicing it (as a publisher) and teaching it / consulting. Why do you think such a large gap exists today?
Teaching has always been an altruist's pursuit. Look at teachers in nearly every other field – they earn dramatically less than their production/publishing oriented peers. Those who teach computer science never earn what computer scientists who work at Google or Microsoft make. Those who teach math are far less well compensated than their compatriots working as "quants" on Wall Street. It's a sad reality, but it's why I have so much respect for people like Market Motive, Third Door Media and Online Marketing Connect, who are trying to both teach and build profitable businesses. I love the alignment of noble pursuits with profitable ones.
You guys exited the consulting area in spite of being able to charge top rates due to brand recognition. Do you think lots of consultants will follow suit and move into other areas? How do you see SEO business models evolving over the next 3 to 5 years?
I don't think so – our consulting business was going very well and I've heard and seen a lot of growth from my friends who run SEO consulting firms. The margins and exit price valuations wouldn't have made sense for VCs, but I don't think it was a bad business at all and others are clearly doing remarkable things. Just look at iCrossing's recent sale to Hearst for $325 million. You can build an amazing company with consulting – it's just not the route we took.
In regards to the evolution of the SEO business model, I'd say we're likely to see more sophistication, more automation, more scalability (and hopefully, more software to help with those) over the next few years from both in-house SEOs and external agencies/consultants. It's sometimes surprising to me how little SEO consulting has progressed from 2002 vs. things like email marketing or analytics, where software has become standard and tons of great companies compete (well, Google's actually made competition a bit more challenging in the analytics space, but creative companies like KissMetrics and Unbounce are still doing cool, interesting things).
Small businesses in many ways seem like the most under-served market, but also the hardest to serve (since they have limited time AND small budgets). Do you think the rise of maps & other verticals gives them a big opportunity, or is it just more layers of complexity they need to learn?
Probably more the former than the latter. The small business owners I know and interact with in my area (and wherever I seem to visit) are only barely getting savvy to the web as a major driver of revenue. I think it might take another 10 years or more before we see true maturity and savvy from local businesses. Of course, that gives a huge competitive advantage to those who are willing to invest the time and resources into doing it right, but it means a less "complete" map of the local world in the online one, which as a consumer (or a search engine) is less than ideal.
When does the delta between paid search & SEO investment begin to shrink (if ever)?
I think it's probably shrinking right now. Paid search is so heavily invested in that I think it's fair to call it a mature market (at least in global web search, though, re: your previous question, probably not in local). SEO is ramping up with a higher CAGR (Compound Annual Growth Rate) according to Forrester, so that delta should be shrinking.

via Forrester Research's Interactive Marketing Forecast 2009-2014
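For readers who do not work with CAGR regularly, here is a small Python sketch of how it is calculated and why a higher growth rate narrows the relative gap between two markets. The spend figures below are invented purely for illustration and are not Forrester's numbers. Note that the sketch tracks the ratio between the two markets; the absolute dollar difference can keep growing even while that ratio shrinks.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(value: float, rate: float, years: int) -> float:
    """Grow a value at a constant annual rate for a number of years."""
    return value * (1 + rate) ** years


# Hypothetical starting spend (arbitrary units) and growth rates.
paid_start, seo_start = 1000.0, 100.0
paid_rate, seo_rate = 0.10, 0.20
years = 5

ratio_before = paid_start / seo_start
ratio_after = project(paid_start, paid_rate, years) / project(seo_start, seo_rate, years)

# Recover the growth rate from the projected start and end values.
seo_cagr = cagr(seo_start, project(seo_start, seo_rate, years), years)
```

In this made-up scenario the smaller market grows faster, so `ratio_after` comes out lower than `ratio_before`: paid search is still larger, but by a smaller multiple.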
Often times a Google policy sounds like something coming out of a conflicted government economist's mouth. But even Google has invested in an affiliate network which suggests controlling your HTML links based on payment. How much further do you think Google can grow before they collapse under complexity or draw enough regulatory attention to be forced to change?
I think if they tread carefully and invest heavily in political donations and public relations, they can likely maintain another very positive 5-10 years. What the web looks like at that time is anyone's guess, and the unpredictable nature and wild shifts probably help them avoid most regulation. Certainly the rise of Facebook has been a boon to their risk exposure from government intervention, even if they may not be entirely happy with their inability to compete in the social web.
I remember you once posted about getting lots of traffic from Facebook & Twitter, but almost 0 sales from it. Is there a point at which search is no longer the center of the web (in terms of monetization), or are most of these networks sorta only worthwhile from a branding perspective?
As direct traffic portals, it's hard to imagine a Facebook/Twitter user being as engaged in the buying/researching process as a Google searcher. Those companies may launch products that compete with Google's model or intent, but as they exist today, I don't foresee them being a direct sales channel. They're great for traffic, branding, recognition and ad-revenue model sites, but they're of little threat to marketers concerned with the relevance or value of search disappearing.
What are the major differences between LDA & LSI?
They’re both methodologies for building a vector space model of terms/phrases and measuring the distance between them as a way to find more “relevant” content. My understanding is that LSI, which was first developed in 1988, has lots of scaling issues. Its cousin, PLSI (probabilistic LSI), attempted to address some of those when it came out in 1999, but still has scaling problems (the Internet is really big!) and often will bias to more complex solutions when a basic one is the right choice.

LDA (Latent Dirichlet Allocation), which started in 2002, is a more scalable (though still imperfect) system with the same intuition and goals – it attempts to mathematically show distances between concepts and words. All of the major search engines have lots of employees who’ve studied this in university and many folks at Google have written papers and publications on LDA. Our understanding is that it’s almost universally preferred to LSI/PLSI as a methodology for vector space models, but it’s also very likely that Google’s gone above and beyond this work, perhaps substantially.
The “brand” update was subsequently described as being due to looking at search query chains. In a Wired article Amit Singhal also highlighted how Google looks for entities in their bi-gram breakage process & how search query sequences often help them figure out such relationships. How were you guys able to build a similar database without access to the search sessions, or were you able to purchase search data?
In a vector space model for a search function, the distances and datasets leverage the corpus rather than query logs. Essentially, with LDA (or LSI or even TF*IDF), you want to be able to calculate relevance before you ever serve up your first search query. Our LDA work and the LDA tool in labs today use a corpus of about 8 million documents (from Wikipedia). Google’s would almost certainly use their web index (or portions of it).
It’s certainly possible that query data is also leveraged for a similar purpose (though due to how people search – with short terms and phrases rather than long, connected groups of words – it’s probably in a different way). This might even be something that helps extend their competitive advantage (given their domination of market share).
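To make the "corpus rather than query logs" point concrete, here is a minimal sketch using TF*IDF, the simplest of the techniques named above. It is not LDA and not SEOmoz's implementation; the three-document corpus, the whitespace tokenizer, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would do far more.
    return text.lower().split()


def build_idf(corpus):
    """Smoothed inverse document frequency, computed from the corpus alone."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Sparse TF*IDF vector; terms unseen in the corpus are ignored."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Tiny stand-in corpus (hypothetical); the interview mentions ~8 million documents.
corpus = [
    "recycling and trash removal services",
    "trash removal and waste hauling",
    "search engine ranking models",
]
idf = build_idf(corpus)                      # built before any query exists
doc_vectors = [tfidf_vector(d, idf) for d in corpus]

query_vector = tfidf_vector("trash removal", idf)
scores = [cosine(query_vector, v) for v in doc_vectors]
for doc, score in zip(corpus, scores):
    print(f"{score:.3f}  {doc}")
```

The term weights come entirely from the documents, so the model can score any query on its first use; LDA replaces the raw term vectors with topic distributions but keeps the same corpus-first workflow.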
Sometimes one can see Google’s ontology change over time (based on sharp ranking increases and drops for outlier pages which target related keywords but not the core keyword, or when search results for 2 similar keywords keep bouncing between showing the exact same results to showing vastly different results). How do you guys account for these sorts of changes?
Thus far, we haven’t been changing the model – it just launched last week. However, one nice thing we get to do consistently is to run our models against Google’s search results. Thus, if Google does change, our scores (and eventually, the recommendations we hope to make) should change as well. This is the nice part about not having to “beat” Google in relevance (as a competing search engine might want to do) but simply to determine where Google’s at today.
For a long time one of the things I have loathed most in the SEO space was clunky all-in-one desktop tools that often misguide you into trying to change your keyword density on the word “the” and other such idiocy. Part of the reason we have spent thousands of dollars offering free Firefox extensions was my disgust toward a lot of those all-in-one tools. A lot of the best SEOs tend to prefer a roll-your-own mix and match approach to SEO. Recently you launched a web application which aims to sorta do all-in-one. What were the key things you felt you had to get right with it to make it better than the desktop software so many loathe?
I think our impetus for building the web app was taken from the way software has evolved in nearly every other web marketing vertical. In online surveys, you had one-time, self built systems and folks like Wufoo and SurveyMonkey have done a great job making that a consolidated, simple, powerful software experience. That goes for lots of others like:
You’re likely spot-on in thinking that power players will continue to mash up and hack their own solutions, build their own tools and protect their secret processes to make them more exclusive in the market and (hopefully) competitive. But, these folks are on the far edge of the bell curve. In every one of the industries above (and many others), it looks like the way to build a scalable software product that many, many people adopt, use and love is to optimize for the middle to upper-end of the bell curve (what we’d probably call “intermediate” to “advanced” SEOs, rather than the outlier experts).
When you gather ranking data do you use APIs to do so? If not, how hard has it been on the technical front scaling up to that level of data extraction?
Some data we can get through APIs, but most isn’t available in that fashion, so relatively robust networks are required to effectively get the information. Luckily, we’ve got a pretty terrific team of engineers and a VP of Engineering who’s done data extraction work previously for Amazon, Microsoft and others. I’d certainly say that it ranks in the top 10 technical challenges we’ve faced, but probably not the top 3.
What do you gain by doing the all-in-one approach that a roll your own type misses out on?
Convenience, consistency, UI/UX, user-friendliness and scalability are all big gains. However, the compromise is that you may lose some of that “secret-sauce” feeling and the power that comes from handling any weird situation or result in a hands-on, one-to-one fashion. Plenty of folks using our web app have already pointed out edge-case scenarios where we’re probably not taking the ideal approach, and those kinks will take time to be ironed out.
Some firms use predictive analytics to automatically change page titles & other attributes on the fly. Do you see much risk to that approach? Do you eventually see SEO companies offering CMS tools as part of their packages to lock in customers, while integrating the SEO process at a much deeper level?
When we were out pitching to take venture capital last summer, a lot of VCs felt that this was the way to go and that we should have products on this front.
Personally, I don’t like it, and I’d be surprised if it worked. Here’s why:
There are cases I could see where something like this would be pretty awesome, though – e.g. a 404 detector that automatically 301s pages it sees earning real links back to the page it thinks was the most likely intended target.
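The 404-detector idea could be sketched roughly as follows. This is only one possible reading of it: the fuzzy-matching approach, the thresholds, and the names `suggest_redirects`, `min_links`, and `cutoff` are all assumptions, and the paths are invented.

```python
from difflib import get_close_matches


def suggest_redirects(not_found_hits, live_paths, min_links=1, cutoff=0.8):
    """Suggest 301 targets for 404ing paths that are earning external links.

    not_found_hits: dict mapping a 404ing path to the number of external
                    links observed pointing at it (hypothetical input).
    live_paths:     list of paths that currently resolve.
    """
    redirects = {}
    for path, link_count in not_found_hits.items():
        if link_count < min_links:
            continue  # nobody links here, so nothing to reclaim
        match = get_close_matches(path, live_paths, n=1, cutoff=cutoff)
        if match:
            redirects[path] = match[0]  # most similar live URL
    return redirects


live = ["/blog/link-building-guide", "/tools/rank-tracker", "/about"]
hits = {
    "/blog/link-building-guid": 12,  # truncated link, well linked
    "/tools/rank-trackr": 3,         # typo in the link
    "/zzz": 5,                       # linked, but nothing similar exists
    "/abuot": 0,                     # typo, but no links point at it
}
for source, target in suggest_redirects(hits, live).items():
    print(f"301 {source} -> {target}")
```

A conservative cutoff matters here: a wrong automatic redirect is arguably worse than a 404, so anything below the similarity threshold is left for a human to review.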
On your blog recently there was a big fuss after you changed your domain authority modeling scores. Were you surprised by that backlash? What caused such a drastic change to your scores?
We were surprised only until we realized that somehow, our internal testing missed some pretty obvious boneheaded scores.
Basically, we calculate DA and PA using machine learning models. When those models find better “correlated” results, we put them in the system and build new scores. Unfortunately, in the late August release, the models had much better average correlation but some really terrifically bad outliers (lots of junky single-page keyword-match domains got DAs of 100 for example).
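One generic way to catch this kind of problem before release is to check per-domain changes alongside the average metric. The sketch below is a hypothetical sanity check, not a description of SEOmoz's testing; the threshold, the function name, and the sample scores are assumptions.

```python
def flag_score_jumps(old_scores, new_scores, max_jump=30):
    """Return domains whose score moved by more than `max_jump` points.

    old_scores / new_scores: dicts mapping domain -> score on a 0-100 scale.
    Domains missing from the old release are skipped.
    """
    flagged = {}
    for domain, new in new_scores.items():
        old = old_scores.get(domain)
        if old is None:
            continue
        if abs(new - old) > max_jump:
            flagged[domain] = (old, new)
    return flagged


# Invented example data.
previous = {"example.com": 62, "cheap-widgets-online.example": 8, "news.example": 85}
candidate = {"example.com": 65, "cheap-widgets-online.example": 100, "news.example": 83}

for domain, (old, new) in flag_score_jumps(previous, candidate).items():
    print(f"review before release: {domain} went {old} -> {new}")
```

An average can improve while a handful of individual scores go badly wrong, which is why a check on the largest individual moves complements a correlation figure.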
We just rolled out updated scores (far ahead of our expected schedule – we thought it would take weeks), and they look much better. We’re always open to feedback, though!
When I got into SEO (and for the first couple years) it seemed like you could analyze a person’s top backlinks and then literally just go out and duplicate most of them fairly easily. Since then people have become more aware of SEO, Google has cracked down on paid links, etc. etc. etc. Based on that, a lot of my approach to SEO has moved away from analysis and more toward just trying to do creative marketing & hope some % of it sticks. Do you view data as being a bit of a sacred cow, or more of just a rough starting point to build from? How has your perception as to the value of data & approach to SEO changed over time?
I think your approach is almost exactly the same as mine. The data about links, on-page, social stats, topic models, etc. is great for the analysis process, but it’s much harder to simply say “OK, I’ll just do what they did and then get one more link,” than it was when we started out.
That analysis and ongoing metrics tracking is still super-valuable, IMO, because it helps define the distance between you and the leaders and gives critical insight into making the right strategic/tactical decisions. It’s also great to determine whether you’re making progress or not. But, yes, I’d agree that it’s nowhere near as cut-and-dried as it once was.
The frustrating part for us at SEOmoz is we feel like we’re only now producing/providing enough data to be good at these. I wish that 6-7 years ago, we’d been able to do it (of course, it would have cost a lot more back then, and the market probably wasn’t mature enough to support our current business model).
How much time do you suggest people should spend analyzing data vs implementing strategies? What are some of the biggest & easiest wins often found in the data?
I think that’s actually the big win with the web app (or with competitive software products like Raven, Conductor, Brightedge, etc). You can spend a lot less time on the collection/analysis of data and a lot more on taking the problems/opportunities identified and doing the real work of solving those issues.
Big wins in our new web app for me have been ID’ing pages through the weekly crawl that need obvious fixing (404s and 500s are included, like Google Webmaster Tools, but so are 20+ other data points they don’t show like 302s, incorrect rel canonicals, etc.)
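As a rough illustration of what flagging crawl issues can look like, here is a small sketch that classifies crawl records by status code and canonical tag. The record format, the rule set, and the `crawl_issues` name are assumptions; the web app's actual checks are not described in the interview.

```python
def crawl_issues(page):
    """Return a list of issue labels for one crawled page.

    page: dict with "url" and "status", plus an optional "canonical"
          (hypothetical record format).
    """
    issues = []
    status = page["status"]
    if status == 404:
        issues.append("404 not found")
    elif 500 <= status < 600:
        issues.append("5xx server error")
    elif status == 302:
        issues.append("302 temporary redirect")
    canonical = page.get("canonical")
    if status == 200 and canonical and canonical != page["url"]:
        # Not necessarily wrong, but worth a human look.
        issues.append("rel canonical points to a different URL (review)")
    return issues


crawl = [
    {"url": "/pricing", "status": 200, "canonical": "/pricing"},
    {"url": "/old-page", "status": 404},
    {"url": "/promo", "status": 302},
    {"url": "/blog?page=2", "status": 200, "canonical": "/blog"},
    {"url": "/api/report", "status": 500},
]
for page in crawl:
    for issue in crawl_issues(page):
        print(f"{page['url']}: {issue}")
```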
Blekko has got a lot of good press by sharing their ranking models & link data. Their biggest downside so far in their beta is the limited size of their index, which is perhaps due to a cost benefit analysis & they will expand their index size before they publicly launch. In some areas of the web Google crawls & indexes more than I would expect, while not going too deeply into others. Do you try to track Google’s crawls in any way? How do you manage your crawl to try to get the deep stuff Google has while not getting the deep stuff that Google doesn’t have?
Yeah – we definitely map our crawls against Google, Bing and Majestic on a semi-regular basis. I can give you a general sense of how we see ourselves performing against these:

the problem with maintaining old URLs became more clear when we analyzed decay on the WWW
In terms of reaching the deep corners of the web, we’ve generally found that limiting spam and “thin” content is the big problem at those ends of the spectrum. Just as email traffic is estimated to be 90%+ spam, it’s quite possible that the web, if every page were truly crawled and included, would have similar proportions. Our big steps to help this are using metrics like mozTrust, mozRank and some of our PA/DA work to help guide the crawl. As we scale up index size (probably December/January of this year), that will likely become a bigger challenge.
—
Thanks Rand. You can read his latest thoughts on the SEOmoz blog and follow him on Twitter at @randfish.