Terence Chisholm
Terry Chisholm's Google+ Page


I built a JavaScript-based SEO site crawler that helps people scan entire sites, visiting all pages and capturing data about each page.

It does things like:

- Find the "needle in the haystack" (up to 5 needles per crawl)

- SEO site audits (automatically capture the links list, H1, H2, H3, and H4 headlines, URLs, body text, and more)

- View all pages of your site in both mobile and desktop view, side by side, for QA testing

- Crawl your site and (optionally) follow links to other domains and subdomains - so if you are crawling a site that has many subdomains, you can crawl them all in one go.

- Find broken links and redirected links easily - site-wide

- Crawl test sites that aren't public yet. So if you are building a new site to replace one that is live now, you can crawl and compare them both.
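The broken-link and redirect detection boils down to classifying HTTP status codes. A minimal sketch in JavaScript (Node 18+ with built-in fetch; `classifyStatus` and `checkLink` are illustrative names, not the crawler's actual code):

```javascript
// Classify a status the way the report above describes:
// 4xx/5xx = broken, 3xx = redirected, everything else = ok.
function classifyStatus(status) {
  if (status >= 400) return "broken";
  if (status >= 300) return "redirected";
  return "ok";
}

// Check one link without following redirects, so 3xx responses
// are reported rather than silently resolved.
async function checkLink(url) {
  const res = await fetch(url, { method: "HEAD", redirect: "manual" });
  return { url, status: res.status, note: classifyStatus(res.status) };
}
```

Run `checkLink` over every link the crawl collects and you have a site-wide broken/redirected links report.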

How to use it:
- Go to
- Type in your homepage URL
- Press CRAWL
It will then crawl your site, looping through a few steps on each page. When it has completed every page of the site, it will prompt you to copy and paste the data into a spreadsheet.
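The loop above can be sketched like this - a queue of pages to visit, a done list, and a growing SEO database. All names here are illustrative, and `fetchPage` is a stand-in for however you load and parse a page:

```javascript
// Simplified sketch of the crawl loop (not the tool's actual internals).
// Each page moves from the queue to the done list once its data is captured.
function crawl(startUrl, fetchPage) {
  const queued = [startUrl];
  const done = new Set();
  const seoDatabase = [];
  while (queued.length > 0) {
    const url = queued.shift();
    if (done.has(url)) continue;
    const page = fetchPage(url); // assumed to return { title, h1, links }
    seoDatabase.push({ url, title: page.title, h1: page.h1 });
    done.add(url);
    // Queue any links we haven't seen yet.
    for (const link of page.links) {
      if (!done.has(link) && !queued.includes(link)) queued.push(link);
    }
  }
  return seoDatabase;
}
```

The final `seoDatabase` array is what ends up pasted into the spreadsheet, one row per page.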

Options while it crawls:
- Adjust the speed of the crawl to match how fast your pages load by pressing SLOWER or FASTER a few times; each press changes the duration of the delay before the next step.
- Copy and paste CUED, DONE, and SEO DATABASE while the crawl is in progress, and let it carry on crawling
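The SLOWER/FASTER buttons boil down to nudging a delay value up or down. A tiny sketch of how that could work (the starting delay and step size here are assumptions, not the tool's actual values):

```javascript
// Hypothetical delay control: SLOWER adds to the per-page delay,
// FASTER subtracts, with a floor so the delay never reaches zero.
function makeDelayControl(initialMs = 1000, stepMs = 250) {
  let delay = initialMs;
  return {
    slower: () => (delay += stepMs),
    faster: () => (delay = Math.max(stepMs, delay - stepMs)),
    current: () => delay,
  };
}
```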

Options to set up before starting a crawl:
- If you want to crawl from more than one domain, you can. With this JavaScript crawler, you can crawl up to 30 domains/subdomains in one go. Or add a wildcard domain, and it will crawl any link that contains it. Great for crawling big corporate sites and internal intranet sites too.
- Type in some text or code that you are looking for on each page, and it will help you find those "needles in the haystack": click the (x) to close the top layer, scroll down to the NEEDLE IN HAYSTACK text fields, and type in up to 5 different bits of text or code you are looking for. Then start your crawl as normal, and it will list any pages that contain those "needles in the haystack".
- Extract the full HTML of each page and convert it to a single line of HTML that can be put into a spreadsheet.
- Extract by DIV-ID or CLASS-NAME if you like, too. That way you only capture the bits you want in your crawl data spreadsheet. (The class-name extraction only works on pages that contain that class name, but the DIV-ID based extraction works perfectly, as does the entire-page extraction.)
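The single-line HTML conversion and the needle search are both simple string operations underneath. A sketch, with illustrative names:

```javascript
// Collapse all whitespace runs (including newlines) to single spaces,
// so a page's HTML fits in one spreadsheet cell.
function toSingleLine(html) {
  return html.replace(/\s+/g, " ").trim();
}

// Return which of the (up to 5) needle strings appear in this page's HTML.
function findNeedles(html, needles) {
  return needles.filter((n) => n && html.includes(n));
}
```

During a crawl, any page where `findNeedles` returns a non-empty list gets added to the "needle in the haystack" results.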

Check it out at:

Use it and enjoy it! And if you have an idea to extend it, please DO share. Don't hack/blag/pretend you wrote that code and try to sell it. That'd be stupid. And illegal.

Geekery minified. Electric - Love it.