I love a challenge, especially when it comes to learning new skills. Today I’m going to show you an incredibly simple way to get your SEOmoz data straight into Excel – hopefully, my post will inspire some of you hardcore Excel junkies to go and build some new toys. What’s not to like?
Do I need to learn to code?
Not for this post. All you’ll need to do is follow the instructions, think about what you’re reading and be willing to try something new.
What do I need to know?
Well, you need to be reasonably familiar with Microsoft Excel, not advanced, just familiar. You’ll need a basic primer on simple regex (this post by Rob will do nicely) and a reasonable sense of how an API call works – check out Ben’s introduction to the SEOmoz API here. Finally, you’ll need Niel’s awesome SEO Tools for Excel.
Basically, if you can form a URI and an Excel query, we’re good to go! Here’s how:
Form a valid API call
To get a response from the SEOmoz api, you just need to form a URL. For testing purposes, forming the following URL and pasting into your browser would get a response:
http://[memberid]:[secret-key]@lsapi.seomoz.com/linkscape/url-metrics/[domain]
Depending on the type of API access you have, you’ll get a lot of json back in your browser, like this:

Using bit flags, we can get to the data we actually want, say, links to the root domain if you have the full site intelligence API (like we do, yeah!) or juice passing links if you’re rocking the free api. Either way, it’s getting the data we care about, so check this out:

?Cols=32 is the bit flag for the external links API call. The insanely clever thing about bit flags is (are?) that, if you add the number of another bit flag to 32, let’s say, http status code (536870912) to make 536870944, you get this:

Anyway, we’re here to get API data into Excel. To get the job done, we’re going to use SEO Tools for Excel, and specifically the following functions:
=BuildHttpDownloaderConfig()
=DownloadString()
=RegexpFind()
You’ll need to understand =Concatenate, and understand that “\d+” in regex means “a character in the range 0-9), 1 or more times”.
Form a valid http request
If you attempt to request our API URL without authentification, you’ll get a 400 response, and obviously no data. For ages, I tried using the member ID and secret key in the request URI with no luck. Thankfully my buddy Neils taught me the =BuildHttpDownloaderConfig() function.
The function is designed to control HTTP requests made by SEO Tools functions, In our case, we need to use the function to authenticate with a member ID and secret key. So, the query looks like this:
=BuildHttpDownloaderConfig(FALSE,,,,,,[member-id],[secret-key])
BuildHttpDownloader’s output is in XML, which is used to configure the function we’re using to fetch the data, =DownloadString()
In its simplest form, the correct syntax to fetch the API output is now:
=DownloadString([API-request],[BuildHttpDownloader-config])
To make this work, create a cell for the api call URI (“A1”), and a cell for your BuildHttpDownloader output (“A2”). That would make your DownloadString function look like this:
=DownloadString(A1,A2)
And here’s what that looks like in Excel:

How good is that?! That, my friends is API output straight into Excel!
The last bit – how to extract that number
This is where the regex comes in. If you’re an actual coder, this is where you’ll tempted to mention JSON parsers and all that stuff. A JSON parser makes it easy for developers to fetch the data they need from a JSON output. There is actually a JSON parsing function in SEO Tools, so it’s perfectly possible to head down that route, and even write the results to an array. For the purpose of learning, I really wanted to extract my precious number as quickly and simply as possible. And anyway, if I give it up in one blog post, what is there left for you to work out?
So, assuming the fruits of your labour have been written to cell A3, here’s all you need to do:
=RegexpFind(A3,"\d+")
And just like that, you’ll get a nice clean result for which you can add to a table, combine with data from your other tools and generally have an awesome time dreaming up ways to cook your menu of new data skills.
Want to download the spreadsheet? Here’s a simple version: [download] – don’t forget you’ll need to install SEO Tools for Excel, too.
Image credit: DannyMCL
Update!
Danny Goodwin has created a better way to do this with the newer version of SEO Tools – check out his code here: (Very nice Dan!)
Fun With The SEOmoz API – Get Links Data Straight into Excel is one of our latest posts from: SEOgadget.co.uk.
Our Excel Extension for Google Adwords relies on a framework called ExcelDNA – a .NET Framework runtime that allows developers a great deal of freedom to develop add-ins using either C# or VB.NET. While I was checking out the web site for ExcelDNA I came across an incredible tool called SEOtools created by Neils Bosma.
SEOtools is a must have for Excel enabled SEOs

Here’s a screenshot of Excel extracting the H1 elements found on the URLs listed on the left hand side. That’s an incredibly efficient way to extract your competitors keywords. The platform does a lot – and I’m hopeful it may be extended to provide even more features in the future.
Functions
SEOtools functions are highly self explanatory and incredibly reliable – even with a lot of data. Here’s a run through of the best ones (refer to the guide page for the full list.)
=GooglePageRank([URL]) - Retrieves the Google PageRank by URL
=GoogleIndexCount([URL]) - Returns an approximation of the number of pages indexed by Google by domain (the equivalent to “site:yourdomain.com”)
=WhoIs(string domain) - Retrieves WhoIs for a domain
=IsDomainRegistered([DOMAIN]) - Returns “true” if a domain is registered. WhoIs servers are configured in SeoTools.config.xml
=FacebookLikes([URL]) – Retrieves the the number of total Facebook Likes of an url (same count as in a badge)
=AlexaReach([URL]) – Retrieves Alexa Traffic Reach Rank
=ResolveIp([URL]) – Resolves the IP address of the domain in an URL
=HttpStatus([URL]) – Returns the HTTP status code and its description. Also retrieves the Location header (useful for debugging redirects)
=ResponseTime([URL]) – Returns the number of milliseconds it takes for an url to load (cached)
=LinkCount([URL]) – Returns the number of links on a page (cached)
=HtmlTitle([URL]) – Returns the HTML title on a page (cached)
=HtmlMetaDescription([URL]) – Returns the HTML meta description on a page (cached)
=HtmlH1([URL]) – Returns the first HTML H1 element on a page (cached)
=HtmlH2([URL]) – Returns the first HTML H2 element on a page (cached)
=HtmlH3([URL]) – Returns the first HTML H3 element on a page (cached)
=XPathOnUrl([URL]; [XPATH]) – Fetches the url and returns the result from xpath expression (cached).
EG: =XPathOnUrl("http://google.com";"/html/head/title")
On XPath – a great little tool to help you construct XPaths for search engine results pages is the brilliant XPathBuilder which has been carefully constructed to help Google Docs hacks users form XPath to extract data from search engines results. You can take the XPath generated by that tool and use it in this query too.
Download this tool now
Download the spreadsheet extension here: http://nielsbosma.se/projects/seotools/ - SEO with Excel got even easier today. Thanks Niels, and if you’re listening, we’d love to see a few things added:
- Tweets to URL
- Facebook Shares
- Google +1′s
- Ranking for keyword in Bing and Google
Image credits:
Mikko Miettinen
Amazing SEO Tools for Excel: SEOtools By Neils Bosma is one of our latest posts from: SEOgadget.co.uk.
Capture search volumes and seasonality data straight into Excel
Yes, you heard that right. Introducing the SEOgadget Adwords API Extension for Microsoft Excel – easily capture large volumes of search query data directly into Excel using your Google Adwords API key. Save huge amounts of time gathering data and spend more time analysing!


Wow! Tell me about the features
- Exact, Broad and Phrase Match
- Generate local search volumes by country code
- Rapidly extract search volumes for 1000′s of keywords
- Acquire Mobile and Desktop Search Volume
- Generate 1000′s of related queries using suggest service
What are the standard functions?
The SEOgadget Adwords Excel plugin creates a series of functions designed to fetch data from the Google Adwords API. Once you’ve installed the plugin, you’ll get a handful of new functions to make fetching keyword data a far more effective and efficient process. Here they are:
getAdWordAvg()
getAdWordAvg(keyword,"[MATCHTYPE]","[COUNTRYCODE]","DEVICE")
Example: =getAdWordAvg(A1,"EXACT","GB","WEB")
Description: returns average search volume from the adwords API. Matchtype accepts broad, exact and phrase match. Country codes can be found in the Adwords documentation and devices can be mobile or web.
getAdWordStats()
getAdWordStats(keyword,"[MATCHTYPE]","[COUNTRYCODE]","DEVICE")
Example: =getAdWordStats(A1,"EXACT","GB","WEB")
Description: returns local search volume and previous 12 months separated by commas
getAdWordIdeas()
getAdWordIdeas(keyword,"[MATCHTYPE]","[COUNTRYCODE]","DEVICE",[NoOfResults])
Example: =getAdWordIdeas(A1,"EXACT","GB","WEB",100)
Description: returns keyword suggestions based on API suggest service
There are Array functions too?
Yes, there are! Use these with a little caution as they can request a lot of data (800 rows per API call) – I suggest you set Excel to manual calculation. Then press F9 (or click Calculate Now) to recalculate the workbook when ready. You’ll find the calculation options under the formulas tab (and also under Excel Options which also offers an option to stop auto-calculation on exit).
If you don’t disable auto-calculate, no problem – just use the array formula to capture data, then copy and paste values to avoid accidentally re-requesting very long lists of keywords.
arrayGetAdWordStats()
arrayGetAdWordStats([TABLE],"[MATCHTYPE]","[COUNTRYCODE]","DEVICE")
Example: =arrayGetAdWordStats(myKWlist,"EXACT","US","MOBILE")
Description: array formula (auto adds {} brackets) will return data from a list of keywords in a table (average search volume and seasonal data).
arrayGetAdWordIdeas()
arrayGetAdWordIdeas([TABLE],"[MATCHTYPE]","[COUNTRYCODE]","DEVICE",[NoOfResults])
Example: =arrayGetAdWordIdeas(Table1,"BROAD","US","WEB",20)
Description: array formula (auto adds {} brackets) will return suggstions from a list of keywords in a table (average search volume and seasonal data).
Performance & requirements
The SEOgadget Excel plugin works with Excel 2003, 2007 and 2010 in 32 and 64 bit. If you’re going to be doing big bits of research using arrays, you’ll need quite a powerful computer (a good excuse to ask for a faster machine with more memory!). Under load testing, my i7 Windows 7 machine took around 3 to 4 minutes to process 30,000 keywords. You probably don’t want to use that much data every day, though.
Obviously, you’ll need a Google Adwords API key to get search volume data using this tool.
How to install
1) Download the zip file, extract the “SEOgadget” folder and open “distribution/setup.xls” – enable the macros and follow these step by step instructions. When you open the setup file you’ll see this screen:
![]()
*Don’t forget to enable macros!*

2) Add your API credentials and click “Update Credentials”

3) Click the “Install API Add-in” button and wait for this dialogue:

4) Restart Excel and start capturing your search data!
Now for the technical bit
SEOgadget hired Master Datasmith Tom Gleeson to develop this plugin – (here he is on Twitter). I highly recommend Tom (he’s our Excel developer of choice). If anyone knows how to automate the hell out of Excel, it’s Tom. Here’s his technical bit.
The add-is was developed using a marvelous open-source tool called ExcelDNA. This allows for the fast (and cheap) development of .NET add-ins for Excel. Microsoft would prefer if everybody used their Visual Studio (not the free express version, the expensive one) based VSTO technology to build such add-ins; don’t, ExcelDNA is way better and easier to use. ExcelDNA is built around a C core (making it fast and robust) and allows for the development of add-ins using either C# or VB.NET.
The biggest downside with ExcelDNA is the lack of documentation, but the user forum is very active and top notch. For a good introduction see Ross Mclean’s series of blog posts on the subject.
So why not use plain old VBA to build the add-in? The answer to that lies with the other piece of technology used i.e. Google’s v201101 Adwords API. Most APIs these days use either JSON or simple XML formats to pass data and requests back and forth between servers and clients. In the early days of Web Services it was envisaged that a highly structured (and extremely complex) variation of XML, namely SOAP, would be used for this task. Thankfully common sense prevailed and SOAP is now very much a minority sport.
Unfortunately, the Adwords API developers went with SOAP, but, made up for it by providing developers with a set of excellent client libraries for all the major development platforms, such as the JVM, CPython and .NET but alas, not for VBA.
This .NET client library also accounts for the choice of the C# , rather than VB.NET, as the add-in’s development language, as all the examples (like most .NET libraries) were coded in the language, using C# makes its easier to get up to speed.
(The ability to package the resulting code in a single DLL (a .xll rather than VBA’s .xla) is also another good reason for using ExcelDNA if VBA is not an option.)
The add-in is available for three platform types:
- Excel 2007/2010 32 bit (requires NET 2.0 run time, which is now to be found on all but the most ancient of machines).
- Excel 2003 (should also support back to Excel 2000, and possibly ‘97, but not tested). This too requires NET 2.0. It lacks however, the very useful helper “array” functions that auto-resize arrays to fit the returned tables and the ability to run in multi-threaded mode.
- Excel 2010 64 bit. This requires not just a 64bit OS, but a 64 bit installation of Excel (the default for Excel is to install the 32 bit version even on 65 bit machines, if installed using the supplied SETUP.xls it will check this for you). The 64bit version requires the NET 4.0 run time, which is very likely to be already installed if using a modern Windows 7 64 bit OS. But if not or you’re not sure, then power up Internet Explorer and visit http://www.hanselman.com/smallestdotnet/ the site will check what versions of .NET are installed and suggest the most suitable download.
I want to play! Where can I download this plugin?
This plugin is completely free to download and use. As it’s a free tool, we may only be able to provide very limited support and by downloading and installing the plugin, you agree that the tool is not provided under any warranty, and that the software author and SEOgadget are excluded from any liability to the users of this software.
DOWNLOAD HERE
Google Adwords API Extension for Excel is one of our latest posts from: SEOgadget.co.uk.
Categorising Your Links with Excel and Open Site Explorer
Having an understanding of the link profile data associated with a website has always been key to understanding the likely competitiveness of the site in organic search. With some luck, this post will help you discover a way to determine the types of links in yours, or your clients back link profiles.
Link data deep diving
For me, there are a few key metrics at play – linking C blocks, quality metrics such as page authority, domain authority and of course, anchor text. There are so many tools available that I probably barely need to mention the fact that Open Site Explorer, Majestic SEO are as critical to link analysis as oxygen is to breathing.
Because of such tools, it’s far easier to understand how many links you have and their overall quality, it’s still pretty tough to pin down what type of links you have. I’m talking about directories, “quality” article websites, forums, “academic” links and so on.
Today I’m going to share a snippet of my link assessment methodology – a method to help you determine what type of links might be lurking in the back link profile of your latest client acquisition.
Background reading
Before you get started, you’ll need some background reading. I’ve already covered the Excel skills you’ll need to assimilate over on SEOmoz, so if you missed out, skip over to “Keyword Research – Using Categories to Make Your Process More Actionable“, have a play around with arrays and the categorisation query and head back here for the rest of this post.
How to categorise your back links by link type
What type of links do you have? What category of websites might be out there, linking to you? You can’t categorise them all, but, you can have fun trying to get a rough idea. Check out this beauty – a simple dashboard counting the types of links to an example website I created shortly after the first US Panda update.
If your site has a massive directory submission footprint, or has a very large number of links from article sites such as ezinearticles.com, you’ll be able to see what’s what in this handy dashboard view.
How to get started
To get started, you’ll need to download the file at the bottom of this blog post, and of course, fetch the “linking pages” data from your favourite website via Open Site Explorer:

Then, simply open the file and paste your data in the top right hand cell of the “OSE Data” tab:

Finally, head to the “DashBoard” tab, and select: “Data > Refresh”

As soon as you click refresh, the pivot tables will all update and you’ll get a gorgeous, infographic style report on your back link profile.
How do I drill down on these links?
Being able to “see”, visually, the make up of your back link profile is great to a point – but it’s nice to be able to drill down and get to the data. In the “DashBoard” pivot table view, simply double click a value (see right: “Directory”). The double click action will open a new tab with only the directory links identified.
How does this work (roughly) and how can I improve it?
This particular version of the spreadsheet is ready to be extended to meet whatever purpose you see fit. Essentially, the formula in the spreadsheet checks each row in the “URL” column in the “OSE Data” to see if it matches with any of the domains in the “Domains” tab. To see the domains that have been included, right mouse click the tabs and unhide the “Domains” sheet.


The “Panda Winners” data is calculated from the original Sistrix data set on the topic, while the free directory list is available on the Directory Maximiser website. It’s easy to update the lists or rework them as you see fit. For example, you may wish to extend the directories based on your own data, or include blog links you’ve built to look at crossover between your own, and other’s link building strategies.
Anchor Text Distribution
Scroll down to the bottom of the dashboard to see your top 10 anchor text term distribution. As a side note, we have a Linkscape powered anchor text tool here – give it a whirl!

Ok, this is awesome – where can I download this spreadsheet?
Download the spreadsheet here – if you find it useful, or have suggestions, modifications and improvements, feel free to add your comments and downloads right here. The best files will get listed on this blog post with a link back to your site. Happy link data deep diving!
Here’s one I made earlier
I couldn’t resist one last demo – click the image for the full sized version
Image credits:
spacepleb
Categorising Your Links with Excel and Open Site Explorer is one of our latest posts from: SEOgadget.co.uk.
Originally posted on SEOmoz, this post still has legs, having been recently referred to in a number of Excel related posts including Mike’s Excel Guide to SEO. I thought it would be great to revisit the post, give it a bit of a rewrite and share it with our readers.
Despite what some say, tracking rankings for your target terms is an important part of an organic search marketing campaign. On occasion, SEO people who follow specific niches tend to describe ranking fluctuations with smart looking charts describing ranking position, by keyword over time. If you’d like to be able to build those charts quickly and easily, you’ll need this step by step guide. By the end of the guide you’ll have a rankings chart that looks just like this:

Step 1: Collect your data
To be able to produce a chart like the example above, you’ll need Microsoft Excel, and a rankings checker that will export rankings data, by search engine and by date. For tracking rankings over time, I’m using Advanced Web Ranking.
Start by putting your data in an Excel table named “rankings” just like this:

Create a pivot chart
Pivot tables were designed for exactly this type of application, and making them is heaps of fun. Let’s start by selecting “Insert > PivotTable >PivotChart” in the options along the top of your Excel ribbon.
You should see a window appear like the one below. Make sure you’ve named the correct range (our table name: “rankings”) and select “New Worksheet“, followed by OK.

Drag and drop your legend, axis and value fields
The cool thing about making a pivot table is the drag and drop functionality when you’re creating the row labels and values for the table. Here’s the visual explanation of where to put your keyword, date and position data:

Now, filter for the keywords you’d like to create a chart for. It’s quite inpractical to create a chart with hundreds of keywords, but you can add a good number for comparision purposes. Head to the “Column labels” drop down and filter for the keywords you’d like to build the chart for:

Filter by search engine
If you’ve collected data on multiple search engines, you’ll need to add a filter. Drag the “Search Engine” field down into the “Report Filter” section, and select the search engine you’re interested in using the drop down at the top of your pivot table.
Format your chart nicely
If you’ve followed the instrutions so far, you’ll see a slightly noisy and weird looking bar chart, so next we’ll create a line chart to show the positional changes over time.
For pure charting awesomeness, a simple right mouse click on the chart, followed by “Change chart type > Line“, will do the trick. Finally, you’ll need to reverse your Y axis, leaving position 1 at the top and your lower rankings at the bottom. Using your right mouse button, click on the axis and select “Format axis” – you should see a window like this:

The end result
After spending some time having fun with formatting, you can create really nice charts. Here’s mine:

Hope you find this approach useful – it’s certianly handy for visualising rankings over time. Enjoy!
[twitter-follow username="richardbaxter" scheme="light" count="yes"]
Image credits:
Alamodestuff
Visualising Search Engine Rankings by Keyword in Excel is one of our latest posts from: SEOgadget.co.uk.
Originally posted on SEOmoz, this post still has legs, having been recently referred to in a number of Excel related posts including Mike’s Excel Guide to SEO. I thought it would be great to revisit the post, give it a bit of a rewrite and share it with our readers.

Image credit: Alamodestuff
Despite what some say, tracking rankings for your target terms is an important part of an organic search marketing campaign. On occasion, SEO people who follow specific niches tend to describe ranking fluctuations with smart looking charts describing ranking position, by keyword over time. If you’d like to be able to build those charts quickly and easily, you’ll need this step by step guide. By the end of the guide you’ll have a rankings chart that looks just like this:

Step 1: Collect your data
To be able to produce a chart like the example above, you’ll need Microsoft Excel, and a rankings checker that will export rankings data, by search engine and by date. For tracking rankings over time, I’m using Advanced Web Ranking.
Start by putting your data in an Excel table named “rankings” just like this:

Create a pivot chart
Pivot tables were designed for exactly this type of application, and making them is heaps of fun. Let’s start by selecting “Insert > PivotTable >PivotChart” in the options along the top of your Excel ribbon.
You should see a window appear like the one below. Make sure you’ve named the correct range (our table name: “rankings”) and select “New Worksheet“, followed by OK.

Drag and drop your legend, axis and value fields
The cool thing about making a pivot table is the drag and drop functionality when you’re creating the row labels and values for the table. Here’s the visual explanation of where to put your keyword, date and position data:

Now, filter for the keywords you’d like to create a chart for. It’s quite inpractical to create a chart with hundreds of keywords, but you can add a good number for comparision purposes. Head to the “Column labels” drop down and filter for the keywords you’d like to build the chart for:

Filter by search engine
If you’ve collected data on multiple search engines, you’ll need to add a filter. Drag the “Search Engine” field down into the “Report Filter” section, and select the search engine you’re interested in using the drop down at the top of your pivot table.
Format your chart nicely
If you’ve followed the instrutions so far, you’ll see a slightly noisy and weird looking bar chart, so next we’ll create a line chart to show the positional changes over time.
For pure charting awesomeness, a simple right mouse click on the chart, followed by “Change chart type > Line“, will do the trick. Finally, you’ll need to reverse your Y axis, leaving position 1 at the top and your lower rankings at the bottom. Using your right mouse button, click on the axis and select “Format axis” – you should see a window like this:

The end result
After spending some time having fun with formatting, you can create really nice charts. Here’s mine:

Hope you find this approach useful – it’s certianly handy for visualising rankings over time. Enjoy!
Visualising Search Engine Rankings by Keyword in Excel is one of our latest posts from: SEOgadget.co.uk. Going to this year's SMX Advanced London? Use this discount code - SEOGADGET011 to get 15% off your entrance fee!
Page Level Search Engine Indexation [Data & Collection Methodology]
Our first post of the New Year looks at the topic of measuring SEO indexation. Inspired by Rob’s post on Distilled, and a wish to revisit some of my previous work on the topic, I thought it might be interesting to share a method of collecting data to build a clearer understanding of page level indexation.

Image credit: Justin De La Ornellas
Hopefully by the end of this post you’ll have a few new methods to collect site index data for your own SEO studies.
Why do SEO’s need an understanding of the principles of indexation?
How hard is your website working for you? Which pages and content groups yield the most benefit, traffic wise? Are there any weak spots, groups of pages that don’t seem to be working well? How can you make changes to navigation, architecture and sometimes page layout to improve a website’s overall search engine visibility or long tail traffic performance? These are questions that should occur to an SEO on a regular basis – but coming to a reliable answer is not always straightforward.
Seeking answers to indexation and site architecture related questions is a worthy cause, but achieving a meaningful answer is a significant hurdle to overcome. All of the (excellent) resources on this topic tend to approach indexation from the perspective of analytics data, or content grouped together inside sitemaps. What about individual pages, though? I use the term page level indexation, because I’m seeking a granular, page level answer to my indexation questions.
What data sources can tell us how a website has been indexed by Google?
For me, there are a number of approximate indicators that a page (or page group) is indexed – for example, reporting on pages that receive at least one entry from Google via Analytics. You might wish to take a look at the “URLs in Web Index” report in the sitemaps section of Webmaster Tools. Savvy webmasters and SEO’s may even use multiple sitemaps to get clearer page group level insight.
Logfiles will tell you a lot about GoogleBot visits, but not indexation, so where else can we look for inspiration?

Expressing the number of pages included in a web index as a percentage, but exactly which URLs are included?
Collecting indexed pages data using the “cache” query
As a method to compliment your existing approach, you might find this methodology quite interesting. The outcome will be a page by page URL list for your site, where Google cache data, SEOmoz custom crawl and XENU data will give you a cracking starting point for you to diagnose your indexation problems. These steps involve having a Mozenda account, although you could do the same (or similar) by building your own crawler or using 80Legs.
Collect Google cache data via a proxy
Fundamentally we’re going to be executing a series of Google cache queries via a proxy. With Mozenda you have to have a method of distributing queries to Google via a proxy. even then it gets unreliable quickly if you overcook the requests. If you use a simple PHP proxy and go very, very slowly, you’ll probably be alright.

Get a URL list
For this, you’ll need a list of all of the URLs your site can generate. The easiest way to get to this list is to extract all of the URLs from your XML sitemap(s) or ask your developer. Remember that, if you crawl your site with XENU you might miss orphaned pages.
Build a Mozenda agent scraper
Your crawler needs to execute the Google cache query and should be configured to capture the URL and cache date.
http://webcache.googleusercontent.com/search?sourceid=chrome&ie=UTF-8&q=cache:[URL]
Would result with:

If no results are found, your agent needs to be able to record the alternative result. When you’re happy with the agent you’ve built, upload and run the agent. Execute this very slowly (proxy in image is a publicly available service – proceed with caution).

Mozenda running my cache agent:

Combine your new data with a few other sources
While your cache scraper is running, think about where else you could gain insight through combining your data sources. Let’s not forget we’re trying to locate pages that are not indexing. Some of the data points you could include may be;
- Click depth of URL from home page
- Internal links out from page
- Internal links in to page
- Meta robots
- X-Robots in server header
- Status code response
All of these data points can be gathered from two sources – Xenu’s link sleuth and the SEOmoz Custom Crawl tool. Xenu needs little introduction, but few know that click depth, internal links in and out of a page are part of the available data. SEOmoz’s Custom Crawl is awesome, and includes data on the server header response, contents of the X-Robots tag, meta title and rel canonical target.
Having a list of all URLs on your site, with a definitive answer on click depth, number of internal links and the Google Cache status is a very interesting piece of data to have, but (of course), it can be extended even further.
If you’re looking for a larger crawl of your site, but the same data, Adam from SEOmoz has pointed out you can get 10,000 pages + (depending on your membership level) crawled and exported from the SEOmoz Pro Account:

You can find this data via the “Crawl Diagnostics” tab in your campaign dashboard. Thanks Adam!
Content grouping
Most websites have a relatively simple approach to content types via their URL formation. This blog, for example, uses “/category/” in the URL to indicate the category content type. Paginated URLs might appear as “/page/*/”. If you’re a retail site, perhaps your product pages contain “/product/”.
By using an Excel query to group your contnet types, you’ll have the ability to get a sense of overall indexation in an area of your site, without having to group the sitemaps together. Try something like:
=NOT(ISERROR(SEARCH(“[URL CHUNK]“,Table3[[#This Row],[URL]],1)))
Where “[URL CHUNK]” could be “/page/”, “/products” or whatever. The outcome is “TRUE” if your URL belongs to a recognised group, and “FALSE” if it doesn’t.
Entries via Google to URL
With a simple VLOOKUP, you can combine traffic numbers by URL in your indexation data. This might help highlight pages that *should* have a little traffic from Google, but don’t – or at least you’ll have another point of reference for your investigations.

The end result
Here’s a screenshot of the example data I built while writing this post. You’ll see all of the data I’ve mentioned in this post, along with a number of “content groups” I found most relevant to my blog. There are some properly configured duplicated pages with SEOgadget which, I can confidently report, are not cached, nor are they generating traffic. My data tells me that the paginated URLs on the homepage, category and tag pages are properly set to noindex but that those pesky comment pages (where a blog post has more than a certain level of comments, we paginate them) are misbehaving (they should be set to noindex). Time to roll my sleeves up.
Click to enlarge…
I hope you can see from this screenshot how you might benefit from combining data into a single point to identify, diagnose and fix indexation issues on your site. Of course there are other data sources out there, and we’ve not touched on the visual aspect of representing this data, which I’m saving for another post.
In the meantime, I’d really like to hear your thoughts, particularly on the data you might choose to help diagnose your architecture and indexation issues.
Page Level Search Engine Indexation [Data & Collection Methodology] is one of our latest posts from: SEOgadget.co.uk.
Extract Your Competitor Keyword Strategy [Excel Skills]
At the recent SEOmoz / Distilled Pro Seminar in London, I gave a presentation on advanced keyword research. Today I’d like to share one of the tips I gave on how to extract your competitors target keyword list, with relative ease using tools we all have at our disposal.

Image credit: House of Sims
Here’s the pro training slide, just if you’re keen for a sneaky preview of this post:

Predictable, SEO people can be
That meticulous attention to detail SEM practitioners apply to the optimised pages of their big dynamic sites can be quite predictable sometimes. Even really big sites consist of only a few page templates, which makes extracting data from them simple. Let’s use this blog post to learn the basic skills needed to generate an instant keyword list, with no keyword research tools whatsoever. Yet.
It’s been a while since we mentioned Xenu’s Link Sleuth
How did you guess I was going to mention Xenu? Xenu’s a cracking piece of software, enjoyed by SEOs pretty much since the dark ages, and I totally recommend it. While I write this post, Xenu is dutifully crawling a jobs website ready for me to export the data to Excel. Note the “Title” column, an oft-overlooked column in Xenu’s captured data list:

Carry out a crawl of your target website, with the appropriate settings to avoid crawling external links. We’re really only interested in the onsite data, list of URLs and Titles.
Dust off your Excel Skills
Start by exporting the data you’ve gathered into Excel via a TAB separated text file. That’s quite easy if you’re unfamilar with the process, just go to file > export to tab separated file. Import the file into Excel, just like this:

PS – by a handy coincidence, Excel imports text files with the TAB delimiter selected by default. You can just click finish straight up when you’re importing one of these files.
Make a table and work out some formulas
There are some seriously cool functions in Excel for handling text. My favourites, in no particulalr order, are; “LEFT”, “MID”, “RIGHT”, “FIND” and “SEARCH”. An honourable mention goes out to “LEN”, too.
Here’s what they do:
LEFT – Returns the specified number of characters from the start of a text string
MID – Returns the characters from the middle of a text string, given the starting position and length
RIGHT – Returns the specified number of characters from the end of a text string
When you combine the queries above with the numerical output from the queries “FIND” (Returns the starting position of one text string from inside another), and “LEN” (returns the number of characters in a string), you can quickly construct mechanisms to extract and repurpose the keyword data extracted from your site crawl.
Here’s a simple example:
=IFERROR(LEFT(Table1[[#This Row],[Title]],FIND(” in”,Table1[[#This Row],[Title]],1)),”No Jobs”)
If you translated this Excel query into English, it might read:
Return the characters in column “Title” in this row, until the string “in” is found. If there’s an error with this formula, display the text “no jobs”.
Download the example
You can see this example in action in the template file I’ve created for you to download here. Check out the “Optimised KW 1″ column, which extracts generic job category terms from our test crawl data.
Here’s a more complex example, also included in the data
=IFERROR(MID(Table1[[#This Row],[Title]],(FIND(“in “,Table1[[#This Row],[Title]],1)+3),FIND(” | “,Table1[[#This Row],[Title]],1)-(FIND(“in “,Table1[[#This Row],[Title]],1)+3)),”No Location”)
This one extracts the location contained in the title data, which you can see in action in our “Location” column. The query looks a little more daunting, but if you de-construct it, bit by bit – you’ll see that the query is basically the same as the first, using the text “in ” as a starting point and returns characters until the ” | “. As FIND is used to return the position of the pipe, we subtract the starting position of the “in ” to get the number of characters between our two markers – thus extracting our precious location from the title. WIN.
This trick isn’t just for keyword research
If you can improve your skillset to include extracting data, like job titles, locations, stock / gift items, books, car data etc, you can repurpose it or make it “more unique” quite easily, too. Think about concatenating your new data points together in new sentences and title constructions, ready to upload to your own database (laughs evil laugh).
If you’re keen to get the data from other sources, such as search results pages, product listing pages and the like, you could also give importXML a try using Google Docs. Of course you’d probably be keen to get the data into Excel as quickly as possible either by importing it, or editing it within Excel using Google Cloud Connect. Have fun!
Extract Your Competitor Keyword Strategy [Excel Skills] is one of our latest posts from: SEOgadget.co.uk.
Put me in front of a Mac and it’s almost as if I never learned to use a computer. Put me in front of Google Spreadsheets and all of the time I’ve spent working with Excel feels a little like time wasted, and not in a good way. I’m just not very used to a spreadsheet that isn’t Excel.
Unafraid of a challenge, I recently decided to give Google’s (exceptional) importXML, importFEED and importHTML functions a try – the ability to fetch information from the web to retrieve the data you need. Mostly to make an interesting blog post, but partly out of envy that Excel doesn’t have this function.

Image credit: Tech109
It’s frustrating trying to get XML data into Microsoft Excel – unless you’ve got the time and patience to build some basic Macros or VBscript for your requirements. With Google Docs, it’s really easy.
A few resources
If you want to use Google Docs to extract data from the web, it would be a good idea for you to learn a little xPath. “XPath is used to navigate through elements and attributes in an XML document”, or, in simple terms, you can use xPath to fetch little bits of data contained in structured elements like <span>, <div> or links or pretty much anything, really.
Also, there are a few people who have been doing this a while, and probably have sample spreadsheets that blow some of the examples below away – but you have to start somewhere, right? If you’re already an importXML / Google Docs Ninja, maybe go and find something else to do instead of reading this post.
If you’re interested, I made a Google Docs Spreadsheet with all of the examples below:
Does anyone know?
“Does anyone know” is such an interesting search on Twitter – just combine that query with a keyword, like “restaurant” and a location for everyone on Twitter looking for a very specific, thing. Great if you happen to be trading in that thing.

Try a query like this to pull through results from the Twitter search RSS feed:
=Importfeed(“http://search.twitter.com/search.atom?q=+restaurant+%22anyone+know%22+london+OR+manchester+OR+birmingham”)

Twitter followers
A nod to Steven Foskett for this one, and particular kudos for the mention of vCard, the query for LinkedIn connections, Klout score and Alexa Rank. Nice!
Try this query: =importXML(“http://twitter.com/[your-username]“,”//span[@id='follower_count']“)
Which will give you the number of followers you have on your Twitter profile. I added together the total followers that my SEO team have (that’s the three of us) for kicks. I wonder how long it will be before someone totals up all followers counts for all UK agencies? I wonder if there’s a correlation between that data and turnover ![]()
Pull price data from the web
I think that, after some mild haranguing, Will might have purchased himself a pair of Etymotic headphones. Perhaps my pitch would have gone slightly more efficiently with a little xPath and Google Product search:

For something like this, a way smarter approach to get pricing data from Amazon would be to use their API – but you get the point with this brief example.
Get all of your (competitors) URLs from their sitemap
Try something like this: =ImportXML(“http://www.yourcompetitordomain.com/sitemap.xml”,”//url/loc”)
I mentioned doing this with Excel to find orphaned pages, but you can have a lot more fun with importXML. For one, theoretically you could go off and fetch all keywords contained in the <title> tag of each of the URLs – an instant keyword strategy!

Pull link data from Blekko
With a query like this: =ImportXML(“http://blekko.com/ws/http://seogadget.co.uk/+/links+/rss”,”//link”)
Blekko is everyone’s favourite new SEO tool, and fair enough, it is quite cool. As Blekko are happy to push their data out via RSS, we’re able to pull this data into our spreadsheets with ImportXML (to be fair this is really easy with Excel, unless you’d like to create multiple columns with different domain queries.

More Blekko – link data tables
Blekko have a feature that allows for a pretty insightful breakdown of their SEO data on your domain. If you want to pull some of that through in to Google Docs, no problem:

Try this query: =importhtml(“http://blekko.com/ws/www.smashingmagazine.com+/seo”,”table”,7)
Have fun
This wasn’t a particularly “advanced” post – I did quite enjoy the thought of what to do next with this data, though. Fetch IP addresses, WHOIS details, root domain links or keyword research data with Google Suggest, the Alchemy API, or plain scraping your competitor home pages. If you’re using importXML, I’d really like to hear how.
Anyway, as I mentioned earlier, please feel free to take the queries from here: http://bit.ly/9Fs7aF – improve them, and let me know what you did.
A little update
I got in touch with my friend Tom from Distilled to see if he wanted to contribute. He’s been out in Vegas, but came back with a tip to solve the problem of Google caching a result for around two hours at a time:
Google docs will cache a URL for ~2 hours and so if you want to crawl a URL more often than that then you need to add a modifier to the URL.
I use int(now()*1000) to generate a unique timestamp and then add that into the URL in a dummy query string. E.g.
http://www.google.com/search?q=seattle+seo+consulting&pws=0&gl=us&time=1354333
The search results won’t change when you change the time value but Google docs will treat it as a fresh URL and crawl it again.
Also – you can do lots of amazingly fancy things using Google Scripts (kind of like macros for google docs) but don’t have a huge amount of time to go into detail about that now!
Well, hopefully Tom will have time soon – thanks for contributing!
Playing Around with ImportXML in Google Spreadsheets is one of our latest posts from: SEOgadget.co.uk.
Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit
For a little while now I’ve been wrangling with an interesting problem in site architecture diagnostics. Do I have any orphaned pages on my site, and how do I find them?
Frequently, web developers will write code to generate an XML sitemap that will include all URLs in the database, linked to or no. How can we use this fact to our advantage?

Image credit: Nick Johnson
I thought I’d share a strategy using Excel to extract all the URLs on your site, comparing them to the raw export from the amazing IIS Toolkit. (PS – word is that Daniel’s writing a “how to” for IIS, including the installation…)
Import your XML sitemap URLs into Excel
Did you know you can import an XML sitemap into Microsoft Excel? It can be a little crash-tastic, but oh-so rewarding when it works. Let’s use SEOgadget’s XML sitemap as the example:

Then, pop your XML URL into the dialogue box:

I’d recommend you only do this if you have reasonably powerful machines, we rock i7′s with 4gbs of ram as a system minimum. Big sitemaps take a while!
The end result is something like this:

By the way, you could import Twitter search results via the RSS feed into Excel using the same method, if you’re into that kind of thing.
Get your IIS site crawl data
Dan’s working on his next post, so I’m not going to steal any thunder, I promise. Assuming you already have IIS Toolkit, go ahead and export the URL listfrom your site crawl:

Import the CSV export into Excel, and turn the data into a table called IIS. If you haven’t played with exporting data from IIS, you should – it’s amazing.

There’s a ton of brilliant data in the IIS export, including the contents of the server header reponse, internal links counts and the directory the page is hosted in. We’ll talk more about that later on, use for now we’re interested in orphaned pages.
Use VLOOKUP to find pages in your sitemap that have no internal links
At the end of the day, a page in your website with no internal links is very unlikely to rank. Many pages with no internal links indicates some problems with your site architecture that you’re going to need to fix. Alas, how to find them? With a trusty VLOOKUP, that’s how.
We’re most interested in the “linked by” column, which contains the total number of internal links a page has. All you have to do is pull this data into your imported sitemap XML list. Assuming you name your IIS table “IIS”, the query will look like this:
=IFERROR(VLOOKUP(XML[[#This Row],[ns1:loc]],IIS,18,0),”Not crawled”)
Why is this good?
The end result will be a list of URLs in your XML sitemap with corresponding internal link numbers. That’s cool for doing the following:
- Finding URLs in your XML sitemap with no internal links, and are as a result, “Not crawled”
- Finding URLs in your sitemap with very low numbers of internal links
Have fun, and have a great Bank Holiday weekend
Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit is one of our latest posts from: SEOgadget.co.uk, UK SEO consultants helping people and organisations succeed in search.







