The entire point of having a website is to connect with your target audience and get traffic that produces revenue. All the backlinks and awesome content you put on it are worth nothing if search engines can’t crawl and index your pages.
This article explains what indexability and crawlability are, which factors affect them, and how you can make your website easier for search engines to crawl and index. We’ll close with a few useful tools for managing your digital property’s crawlability and indexability.
What Google Says about Indexability and Crawlability
Before we jump into the water, let’s get our feet wet with a quick look at how search engines discover and index pages, a topic Matt Cutts, a former Google employee, has also talked about.
According to Google:
“Crawlers look at web pages and follow links on those pages, much like you would if you were browsing content on the web. They go from link to link and bring data about those webpages back to Google’s servers.”
In other words, if you care about SEO, it’s a good idea to make your website as crawlable and indexable as possible.
What is Indexability
Indexability is the ability of search engines to add your web page to their index. Google may be able to crawl your site and still be unable to index it because of outstanding indexability issues.
Here’s a screenshot of a page that is indexable, along with a link to the tool in the Chrome Web Store.
What is Crawlability
Search engines need to access your site and crawl the content on your pages in order to understand what your site is about.
Spiders crawl your site by moving through links between pages. That’s why a good linking structure and a sitemap are useful.
Things like broken links and dead ends might hinder the search engine’s ability to crawl your site.
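To make the link-following idea concrete, here is a minimal Python sketch of how a crawler discovers pages: it starts at the homepage and follows links breadth-first. The site is modeled as a hypothetical in-memory dictionary mapping each page path to the links on it; a real crawler would fetch and parse HTML instead. Pages that no chain of links leads to are never discovered, and links to pages that don’t exist are reported as broken.

```python
from collections import deque


def crawl(site: dict[str, list[str]], start: str = "/") -> tuple[list[str], set[tuple[str, str]]]:
    """Breadth-first link crawl over a hypothetical in-memory site.

    Returns the pages discovered (in crawl order) and the (source, target)
    pairs whose target page does not exist (broken links / dead ends).
    """
    discovered = [start]
    seen = {start}
    broken: set[tuple[str, str]] = set()
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in site.get(page, []):
            if target not in site:
                broken.add((page, target))  # the link leads nowhere
                continue
            if target not in seen:
                seen.add(target)
                discovered.append(target)
                queue.append(target)
    return discovered, broken


site = {
    "/": ["/blog", "/about"],
    "/blog": ["/blog/post-1", "/old-page"],
    "/blog/post-1": ["/"],
    "/about": [],
    "/landing": ["/"],  # nothing links here, so the crawler never finds it
}
pages, broken = crawl(site)
print(pages)   # ['/', '/blog', '/about', '/blog/post-1']
print(broken)  # {('/blog', '/old-page')}
```

Note that "/landing" exists but is never reached: it has outgoing links, but no incoming ones.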
This is a screenshot of a URL that has passed a crawlability test.
What Affects Crawlability and Indexability
Whether you’re an experienced SEO or just a beginner looking for an SEO guide, keeping an eye on the following factors is crucial.
1. Site Structure
Having a weak site structure will hinder the robot’s ability to crawl and index your site. For example, pages with no incoming links pointing to them are a common structural issue.
2. Internal Link Structure
Having a good internal linking structure helps crawlers navigate your website with ease, so they don’t miss any content and can index your website correctly.
Google Search Console is a great tool to check your link structure, as you can see here:
3. Looped Redirects
Redirect loops, and broken redirects in general, will stop a crawler in its tracks and cause immediate issues.
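A redirect loop is easy to detect once you have a list of your redirects. The Python sketch below assumes a hypothetical dictionary mapping each redirecting URL to its target (for example, exported from your server config or a crawl) and follows each chain until it ends, revisits a URL, or exceeds a hop limit. The limit of 10 hops is an illustrative assumption, not a documented crawler setting.

```python
def follow_redirects(redirects: dict[str, str], url: str, max_hops: int = 10) -> tuple[str, list[str]]:
    """Follow a hypothetical redirect map (source URL -> target URL).

    Returns ("ok", chain), ("loop", chain) or ("too_long", chain).
    """
    chain = [url]
    visited = {url}
    while url in redirects:
        url = redirects[url]
        chain.append(url)
        if url in visited:
            return "loop", chain  # we've been here before: the chain never ends
        visited.add(url)
        if len(chain) - 1 > max_hops:
            return "too_long", chain
    return "ok", chain


redirects = {"/a": "/b", "/b": "/c", "/c": "/a", "/old": "/new"}
print(follow_redirects(redirects, "/old"))  # ('ok', ['/old', '/new'])
print(follow_redirects(redirects, "/a"))    # ('loop', ['/a', '/b', '/c', '/a'])
```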
4. Server Errors
Server-related issues will keep crawlers from doing their job properly.
This is what a server error might look like. Does it look familiar?
5. Unsupported Scripts and Other Technology Factors
6. Blocking Web Crawler Access
There are legitimate reasons to block crawlers from indexing some of your pages on purpose, such as pages with restricted public access.
However, make sure you’re careful not to block other pages by mistake.
These are the most common factors affecting crawlability and indexability; however, many more factors can make your website crawler-unfriendly.
How to Check Your Website for Indexability
Download the entire Checklist here.
You know the importance of link building and other SEO tactics. Regularly checking the aspects below is also good practice for keeping your site healthy.
1. Check Your Pages for Noindex Tags
Paying attention to detail is important. Even experienced SEOs can accidentally insert a “noindex, follow” tag or forget to remove one.
In the page’s HTML, it looks like this: <meta name="robots" content="noindex, follow">
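If you have many pages to check, a short script can flag noindex directives for you. This Python sketch uses the standard library’s html.parser to look for robots meta tags (name="robots" or name="googlebot") in a page’s HTML; how you fetch the HTML is left open. It does not check the X-Robots-Tag HTTP header, which can also carry noindex.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = {name: (value or "") for name, value in attrs}
        if attrs.get("name", "").lower() in ("robots", "googlebot"):
            # Split "noindex, follow" into individual lowercase directives.
            self.directives += [d.strip().lower() for d in attrs.get("content", "").split(",")]


def has_noindex(html: str) -> bool:
    finder = RobotsMetaFinder()
    finder.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in finder.directives or "none" in finder.directives


print(has_noindex('<head><meta name="robots" content="noindex, follow"></head>'))  # True
print(has_noindex('<head><meta name="robots" content="index, follow"></head>'))    # False
```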
2. Check your Robots.txt File
When you configure your robots.txt file, you can give search engine crawlers specific instructions as to which directories should be crawled.
Make sure you don’t accidentally exclude important directories or block any of your pages. There’s a good chance Googlebot will end up finding your pages through backlinks anyway, but if you configure your robots.txt file correctly, it will be easier for search engines to crawl your site regularly.
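One way to catch an accidental block is to test your most important URLs against your robots.txt rules. This sketch uses Python’s standard urllib.robotparser; the robots.txt content, the example.com URLs, and the mistaken "Disallow: /blog/" line are hypothetical. Keep in mind that robotparser applies rules in first-match order and doesn’t support wildcards, so it only approximates how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "/blog/" was blocked by mistake.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /blog/
"""

important_urls = [
    "https://www.example.com/",
    "https://www.example.com/blog/my-best-post",
    "https://www.example.com/admin/login",
]


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given user agent is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


for url in blocked_urls(ROBOTS_TXT, important_urls):
    print("BLOCKED:", url)
```

Here the admin page is blocked on purpose, but the blog post is not supposed to be, which is exactly the kind of mistake this check surfaces.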
3. Check your .htaccess File for Errors
The most common uses for .htaccess files are:
- Rewriting a URL
- Redirecting an old URL to a new URL
- Redirecting to the www version of a page
The .htaccess file is a configuration file stored in a directory on an Apache server. If it’s misconfigured, it can keep your pages from showing up in search results, for example by treating crawlers as unauthorized visitors.
For your .htaccess rules to be executed, the file must always be named exactly .htaccess, with the leading dot and no file extension. For example, when redirecting or rewriting URLs:
Rewriting requires enabling Apache’s rewrite engine (mod_rewrite):
RewriteEngine On
Then define the rule that the server is to execute:
RewriteRule ^seitea\.html$ /seiteb.html [R=301,L]
If the file is named incorrectly, the server won’t apply its rewrite or redirect rules, so users and crawlers may not be able to reach the pages, and crawlers won’t be able to crawl or index them.
4. Test your Canonical tags
Canonical tags help you prevent duplicate content issues by specifying the “preferred” version of a page for crawlers.
These are the most common errors you can make when setting your Canonical tags:
- The Canonical tag refers to a relative path instead of an absolute URL
- The Canonical tag refers to a URL that carries a noindex tag
- A paginated page points its Canonical tag at the first page of the pagination
- The Canonical tag refers to a URL without a trailing slash, while the preferred version of the page uses one
This is what a Canonical tag looks like: <link rel="canonical" href="https://www.example.com/page/">
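The four mistakes listed above can be checked mechanically once you’ve extracted each page’s canonical URL. The Python sketch below is illustrative only: it assumes your site uses trailing slashes, that pagination uses a hypothetical ?page=N parameter, and that you already have a set of URLs carrying noindex (for example, from a crawl).

```python
from urllib.parse import parse_qs, urlparse


def canonical_issues(page_url: str, canonical: str, noindex_urls: set[str]) -> list[str]:
    """Flag common canonical-tag mistakes (illustrative rules, see assumptions above)."""
    issues = []
    parsed = urlparse(canonical)
    path = parsed.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    if not parsed.scheme or not parsed.netloc:
        issues.append("relative path: use an absolute URL")
    elif not path.endswith("/") and "." not in last_segment:
        # Assumes the preferred URLs on this site end with a trailing slash.
        issues.append("no trailing slash")
    if canonical in noindex_urls:
        issues.append("points to a noindex URL")
    # Assumes pagination via a hypothetical ?page=N parameter.
    page_number = parse_qs(urlparse(page_url).query).get("page", ["1"])[0]
    if page_number != "1" and canonical != page_url:
        issues.append("paginated page canonicalized to another page")
    return issues


noindex = {"https://www.example.com/thin-page/"}
print(canonical_issues("https://www.example.com/shoes/?page=3",
                       "https://www.example.com/shoes/", noindex))
# ['paginated page canonicalized to another page']
```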
5. Monitor Your Server Availability and Status Error Messages
If your server goes down, crawlers can’t reach your pages any more than users can, so they won’t be able to index them.
To stay on top of any issues, regularly check your site for 404 pages and make sure your 301 redirects are working properly.
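Here is a minimal Python sketch of such a check using only the standard library. fetch_status() sends a HEAD request without following redirects, so 301s show up as 301s rather than as the final page, and summarize() groups the results. The helper names are hypothetical; some servers reject HEAD requests (you’d switch to GET there), and network failures raise urllib.error.URLError, which this sketch doesn’t handle.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Report 3xx responses instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 3xx, 4xx and 5xx responses all end up here


def summarize(statuses: dict[str, int]) -> dict[str, list[str]]:
    """Group URLs by the kind of problem their status code indicates."""
    report: dict[str, list[str]] = {"not_found": [], "server_error": [], "redirect": []}
    for url, code in statuses.items():
        if code == 404:
            report["not_found"].append(url)
        elif 500 <= code < 600:
            report["server_error"].append(url)
        elif 300 <= code < 400:
            report["redirect"].append(url)
    return report
```

Usage would look like statuses = {url: fetch_status(url) for url in your_urls}, followed by summarize(statuses); review the redirect list to confirm each one points where you expect.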
Here’s a screenshot of a server error message.
6. Find Orphaned Pages
Orphaned pages are pages that no other page on your site links to, so crawlers that follow links can’t reach them. Make sure new pages, new categories, and any restructured sections of your site are linked internally and listed in your sitemap.xml. The most important tip regarding orphaned pages is to avoid them no matter what.
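A simple way to find orphans is to compare the URLs in your sitemap with the URLs your pages actually link to. This sketch assumes you already have both: a set of sitemap URLs and a hypothetical dictionary of each page’s internal links (from a crawl). Self-links are ignored, and the homepage is excluded because it’s the crawler’s entry point.

```python
def find_orphans(sitemap_urls: set[str], links: dict[str, set[str]], home: str = "/") -> set[str]:
    """Return sitemap pages that no other page links to."""
    linked_to: set[str] = set()
    for source, targets in links.items():
        linked_to |= {target for target in targets if target != source}  # ignore self-links
    return sitemap_urls - linked_to - {home}


sitemap = {"/", "/blog", "/blog/post-1", "/spring-sale"}
links = {
    "/": {"/blog"},
    "/blog": {"/blog", "/blog/post-1"},
    "/spring-sale": {"/"},  # links out, but nothing links in
}
print(find_orphans(sitemap, links))  # {'/spring-sale'}
```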
7. Find External Duplicate Content
Content theft, to put it bluntly. Other sites may have copied your content, which might help them rank better than you or, even worse, keep your content from being indexed at all.
Google is pretty good at telling which version is the “original,” but you can find stolen copies by searching for some of the most distinctive phrases in your piece, wrapped in quotation marks for an exact-match search.
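If you want to speed up those exact-match searches, a small script can pick candidate phrases and build the search URLs for you. This Python sketch assumes that longer sentences tend to be more distinctive, which is only a rough heuristic; the function name and the minimum sentence length of 8 words are hypothetical choices, and you still run and review the searches yourself.

```python
import re
from urllib.parse import quote_plus


def exact_match_queries(text: str, count: int = 3, min_words: int = 8) -> list[str]:
    """Build quoted Google search URLs for the longest sentences in text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    candidates.sort(key=lambda s: len(s.split()), reverse=True)  # longest first
    return [
        "https://www.google.com/search?q=" + quote_plus(f'"{s.rstrip(".!?")}"')
        for s in candidates[:count]
    ]


article = (
    "Short one. This sentence is long enough to be a distinctive phrase worth searching. "
    "Another fairly long sentence that could also identify copied content online!"
)
for url in exact_match_queries(article):
    print(url)
```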
8. Identify Internal Nofollow Links
If internal links to your pages carry the rel="nofollow" attribute, Googlebot may not follow them, so the pages they point to may not be crawled or indexed. Make sure you check these links and adjust them accordingly.
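To find internal nofollow links, you can scan each page’s HTML for <a> tags whose rel attribute includes nofollow and whose resolved URL stays on your own domain. This is a sketch using Python’s standard html.parser; the example URLs are hypothetical, and treating "same host name" as "internal" is a simplifying assumption (subdomains count as external here).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class NofollowLinkFinder(HTMLParser):
    """Collects internal links that carry rel="nofollow"."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.host = urlparse(page_url).netloc
        self.nofollow_internal: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = {name: (value or "") for name, value in attrs}
        if "href" not in attrs:
            return
        href = urljoin(self.page_url, attrs["href"])  # resolve relative links
        rel_values = attrs.get("rel", "").lower().split()
        if "nofollow" in rel_values and urlparse(href).netloc == self.host:
            self.nofollow_internal.append(href)


def internal_nofollow_links(html: str, page_url: str) -> list[str]:
    finder = NofollowLinkFinder(page_url)
    finder.feed(html)
    return finder.nofollow_internal


html = (
    '<a href="/pricing" rel="nofollow">Pricing</a>'
    '<a href="https://other.com/" rel="nofollow">Partner</a>'
    '<a href="/about">About</a>'
)
print(internal_nofollow_links(html, "https://www.example.com/blog/"))
# ['https://www.example.com/pricing']
```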
9. Check your XML Sitemap
If your XML sitemap does not contain all the URLs to be indexed, you’ll have to deal with a problem similar to orphaned pages.
The screenshot below shows Sitemaps submitted in Google Search Console.
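To check whether your sitemap lists every URL you want indexed, you can parse it and compare it with your own list of indexable URLs. This sketch uses Python’s xml.etree.ElementTree with the standard sitemap namespace (http://www.sitemaps.org/schemas/sitemap/0.9); where your list of indexable URLs comes from (a crawl, your CMS) is up to you, and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> set[str]:
    """Return every <loc> value from a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text}


def missing_from_sitemap(xml_text: str, indexable_urls: set[str]) -> set[str]:
    """URLs you want indexed that the sitemap doesn't list."""
    return indexable_urls - sitemap_urls(xml_text)


SAMPLE = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://www.example.com/</loc></url>"
    "<url><loc>https://www.example.com/blog/</loc></url>"
    "</urlset>"
)
print(missing_from_sitemap(SAMPLE, {"https://www.example.com/", "https://www.example.com/pricing/"}))
# {'https://www.example.com/pricing/'}
```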
10. Regularly Check Whether Your Pages Have Been Hacked
To reduce the risk of your pages getting hacked, take the following actions:
- Check Google Search Console regularly for clues.
- Change the passwords to your backend regularly.
- Always install all offered updates.
- Check out Google’s Webmaster Central Blog for more tips.
How to Make Your Website Easier to Crawl and Index
Besides making sure the issues listed above don’t happen to you, you can also take proactive steps to make sure your site is configured to be crawled and indexed correctly.
1. Submit Sitemap to Google
Your sitemap will help Google and other search engines better crawl and index your site.
This is what submitting a sitemap looks like in Google Search Console.
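If your CMS doesn’t generate a sitemap for you, a basic one is easy to build. This Python sketch writes a <urlset> in the standard sitemap namespace with a <loc> and <lastmod> for each page; the example URLs and dates are hypothetical, and a real site would pull them from its database or file system.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: list[tuple[str, str]]) -> str:
    """entries: (absolute URL, last-modified date as YYYY-MM-DD)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    ("https://www.example.com/", "2024-01-15"),
    ("https://www.example.com/blog/", "2024-02-01"),
]))
```

To save it as a file with an XML declaration, you could call ET.ElementTree(...).write("sitemap.xml", encoding="utf-8", xml_declaration=True) on the element instead; you can then submit it in Google Search Console and reference it in robots.txt with a Sitemap: line.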
2. Strengthen Internal Links
A strong interlinking profile will certainly make it easier for search engines to crawl and index your site. It will also help with your general SEO and user experience.
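A quick way to see where your internal linking is weak is to count how many internal links point to each page. This Python sketch assumes a hypothetical dictionary of each page’s internal links (from a crawl) and flags pages with fewer inbound links than a threshold; the threshold of 2 is an arbitrary illustration, not an SEO rule.

```python
from collections import Counter


def weakly_linked_pages(links: dict[str, set[str]], min_inbound: int = 2) -> dict[str, int]:
    """Return pages with fewer than min_inbound internal links pointing to them."""
    inbound: Counter[str] = Counter()
    for source, targets in links.items():
        for target in targets:
            if target != source:  # self-links don't help crawlers find a page
                inbound[target] += 1
    pages = set(links) | set(inbound)
    return {page: inbound[page] for page in sorted(pages) if inbound[page] < min_inbound}


links = {
    "/": {"/blog", "/pricing"},
    "/blog": {"/blog/post-1", "/pricing"},
    "/blog/post-1": {"/"},
    "/pricing": {"/"},
}
print(weakly_linked_pages(links))  # {'/blog': 1, '/blog/post-1': 1}
```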
3. Regularly Update and Add New Content
Updating and adding new content to your site is a great recipe for improving your rankings and SEO health, as well as user experience. Another advantage is that crawlers tend to revisit a frequently updated site more often, so your changes get indexed sooner.
Once you do, you can ask Google to reindex your page. It looks like this:
4. Avoid Duplicating Any Content
Duplicate content on your site can waste crawl budget and decrease how often crawlers visit your site. It’s also bad practice for your overall SEO health.
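Exact duplicates, such as the same product page reachable under two URLs, can be found by hashing each page’s text. This sketch assumes you already have each page’s main text (how you extract it is up to you) and normalizes whitespace and letter case before hashing with SHA-256. It only catches identical content; near-duplicates need a fuzzier comparison.

```python
import hashlib
import re
from collections import defaultdict


def duplicate_groups(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose normalized text is identical."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        normalized = re.sub(r"\s+", " ", text).strip().lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


pages = {
    "/shoes/": "Red running shoes.\n  Free shipping.",
    "/shoes/?sort=price": "red running shoes. Free shipping.",
    "/hats/": "Wool hats.",
}
print(duplicate_groups(pages))  # [['/shoes/', '/shoes/?sort=price']]
```

Once you’ve found a group, you can consolidate it with a 301 redirect or a Canonical tag pointing at the preferred URL.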
5. Speed Up Your Page Load Time
Crawlers work within a crawl budget, and they won’t spend all of it on a slow website. If your site loads fast, they’ll have time to crawl it properly. If it loads too slowly and the crawl budget runs out, the crawler moves on to the next website before it has crawled all your pages.
The screenshot above shows part of the results Google’s PageSpeed Insights gives you.
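The effect of load time on crawling can be illustrated with a toy model that treats the crawl budget as a fixed amount of time, as described above. Real crawl budgets are not a simple stopwatch, so the function and the numbers below are purely hypothetical; the point is how quickly slow pages eat into coverage.

```python
def pages_crawled(budget_seconds: float, seconds_per_page: float, total_pages: int) -> int:
    """Toy model: a crawler with a fixed time budget fetching pages one at a time."""
    return min(total_pages, int(budget_seconds // seconds_per_page))


# A hypothetical 500-page site and a 60-second budget:
print(pages_crawled(60, 0.5, 500))  # 120 pages at 0.5 s per page
print(pages_crawled(60, 2.0, 500))  # 30 pages at 2 s per page
```

In this model, a site that takes four times as long to respond gets a quarter of the pages crawled in the same budget.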
Tools for Managing Crawlability and Indexability
The web is filled with tools that will help you monitor your website and detect indexability and crawlability issues in time. Most of them offer free versions or free trials that let you check your site.
Once you land on Google PageSpeed Insights, you get a screen like this:
Making sure your website is properly configured to be crawled and indexed by search engines is a smart business decision. Most of the time, websites are business tools meant to attract and convert visitors. That’s why taking all the required steps to be crawled and indexed correctly needs to be part of your overall SEO strategy and maintenance.