A big part of SEO is managing what does and does not show up in Google. Although there is no way to guarantee that a specific page will or will not appear in search results, there are several tactics for managing how Google crawls and indexes your site. In this post we will go over some of these strategies and the steps for managing your Google index.
Please Note: Incorrectly using some of these tools and techniques to manage your index can cause significant damage to your website’s search ranking. This post should be used strictly for educational purposes. Before making any major adjustments, always speak with your web developer.
Definitions You Will Need To Understand
- The index is the comprehensive list of URLs Google uses as potential targets for search queries. Essentially, it is the list of web pages that Google thinks it can serve to users when they perform a search.
- Robots are automated scripts that help websites index the internet. They follow links on pages and take note of each location they visit. Robots help Google find new pages on your site when they re-crawl your website.
- Robots.txt is a file placed on your website that tells robots not to crawl specific pages or portions of your site. It can be set to exclude specific robots or all robots and is useful for keeping crawlers away from areas such as login pages. Compliance is voluntary and the file itself is publicly readable, so it should not be relied on to protect sensitive content.
- Crawl is the term used for when a website such as Google sends its robots to view your website. If a robot is following the links on your site and indexing them, then it is crawling your site.
- Sitemap is an itemized and organized list that exists on the website to aid robots in finding all your discoverable links. The sitemap should contain all URLs you want indexed as well as information about the importance of each page and suggestions for re-crawling the specific pages.
NoIndex With Yoast
The Yoast SEO plugin provides built-in noindex functionality. It adds a meta robots noindex tag to the page, which tells search engines that the page should not be included in their index. Note that a robot still has to crawl the page in order to see the tag.
How to set a page to noindex using Yoast
- Go to the Yoast module on the desired page and select the gear icon.
- Click the dropdown for the “Meta Robots Index” item.
- Select the noindex option and save the page.
Now your web page is set to noindex and should be left out of search results. Major search engines such as Google honor the tag once they re-crawl the page, but other robots may ignore it, so there is no way to guarantee a page will never surface. If your page is still surfacing with a noindex tag in place, you can request its removal from the index. The removal process is very specific and should not be attempted by the average user.
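If you want to confirm that the setting took effect, the following is a minimal Python sketch (not part of Yoast) that checks a page's HTML for a meta robots noindex tag. The `sample_page` markup and the `has_noindex` helper are illustrative assumptions; in practice you would pass in the HTML source of your own page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a meta robots tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page source, for illustration only.
sample_page = """
<html><head>
<title>Thank you</title>
<meta name="robots" content="noindex, follow">
</head><body>Thanks for signing up.</body></html>
"""

print(has_noindex(sample_page))
```

A result of True only means the tag is present in the page source. Search engines still need to re-crawl the page before they act on it.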
Google Search Console
Google Search Console (GSC) is the program used to manage your search index. It allows you to see how many pages are indexed, review data on your rankings and clicks, submit a sitemap, and request the removal of specific pages from the index.
Note: Google is currently testing a new layout for GSC that looks slightly different. If you are reading this post and your GSC dashboard looks different, not to worry: it still functions the same way and everything is in roughly the same location.
See Your Indexed Pages
- Select Index Status under the Google Index dropdown menu
- Review your total number of pages indexed (Total Indexed: 422)
What you want to see here is a flat line or steady growth. If you see sudden spikes or drops in the chart, there may be an issue that is making your site un-crawlable or causing Google to de-index it. Contact your marketing team.
- Use site:yoururl.com in a Google search to see the full index on the front end
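The "flat or steady growth" check can also be approximated in code if you record your Total Indexed number each week. This is a rough sketch only: the weekly counts below are made up for illustration (apart from the 422 figure used earlier), and the 20% threshold is an assumption you would tune for your own site, not a rule from Google.

```python
def flag_sudden_changes(counts, threshold=0.20):
    """Return (week_index, fractional_change) for each jump beyond the threshold.

    counts: indexed-page totals in chronological order.
    threshold: assumed cutoff for a "sudden" change (0.20 means 20%).
    """
    flagged = []
    for i in range(1, len(counts)):
        previous, current = counts[i - 1], counts[i]
        if previous == 0:
            continue  # a percentage change from zero is undefined
        change = (current - previous) / previous
        if abs(change) > threshold:
            flagged.append((i, change))
    return flagged


# Hypothetical weekly totals copied by hand from the Index Status report.
weekly_indexed = [410, 415, 418, 422, 290, 295]

for week, change in flag_sudden_changes(weekly_indexed):
    print(f"Week {week}: {change:+.0%} change in indexed pages")
```

Here only the drop from 422 to 290 is flagged; the small week-to-week increases count as steady growth.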
Managing Your Sitemap
Once we have a clear view of how many pages are being indexed, we can check how many of these pages were manually submitted through sitemaps. A high number of pages that have been submitted but not indexed signals that Google considers these pages low quality, and they will require some attention to get indexed. Low quality refers to the value Google perceives a page to have for users. For more information about low quality pages see our blog post about Reviewing Content Quality for SEO.
Conversely, if a high number of pages are in the index but have not been submitted through the sitemap, we need to adjust how our sitemap is structured and what is and isn’t included.
- Click the Sitemap item under the Crawl dropdown menu
- Review the number of Pages & Images indexed vs submitted
- Add/Test a new Sitemap
Click the Add/Test Sitemap button and type in the URL of the sitemap you wish to test or add. If you are using Yoast, your URL will typically be /sitemap_index.xml. Once submitted, your new sitemap will be listed at the bottom of the page. Please note that if this is a new sitemap, you will not get Submitted vs Indexed data for a few days.
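To make the submitted vs indexed comparison concrete, here is a small Python sketch that reads the URLs out of a sitemap and compares them with a list of indexed URLs. Everything in it is an illustrative assumption: the example.com URLs, the inline `sample_sitemap`, and the hand-compiled `indexed_urls` set (GSC does not hand you this list in this form). The only fixed detail is the standard sitemap XML namespace.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a sitemap's <url> entries."""
    root = ET.fromstring(xml_text.strip())
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)}


# Hypothetical sitemap contents, for illustration only.
sample_sitemap = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/services/</loc></url>
  <url><loc>https://www.example.com/blog/old-post/</loc></url>
</urlset>
"""

# Hypothetical list of URLs you have found in the index.
indexed_urls = {
    "https://www.example.com/",
    "https://www.example.com/services/",
    "https://www.example.com/tag/misc/",
}

submitted = sitemap_urls(sample_sitemap)

# Submitted but not indexed: candidates for a content quality review.
submitted_not_indexed = sorted(submitted - indexed_urls)
# Indexed but not submitted: candidates for adding to (or excluding via) the sitemap.
indexed_not_submitted = sorted(indexed_urls - submitted)

print("Submitted but not indexed:", submitted_not_indexed)
print("Indexed but not submitted:", indexed_not_submitted)
```

The two lists correspond to the two situations described above: pages that need attention before Google will index them, and pages that suggest the sitemap structure needs adjusting.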
Ranking & Click Data
If you, like most business owners, want to know what traffic you are getting from search, what queries people are using to find your site, what devices they are using, and more, you can access this information using GSC and Google Analytics.
Using GSC to review data
To access data using Google Search Console, select Search Analytics from the dropdown under Search Traffic. You will be presented with a graph where you can explore and manipulate the data based on a number of variables including Queries, Pages, Countries, Devices, Search Type, Appearance, and Date Range. You can also select which data points you would like to see, whether it’s Clicks, Impressions, Click Through Rate (CTR), or Avg. Position.
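As a quick illustration of how these data points relate, CTR is simply clicks divided by impressions. The sketch below computes it for a few rows; the queries and numbers are invented for the example, and the row layout is an assumption rather than the exact format of a GSC export.

```python
def click_through_rate(clicks, impressions):
    """CTR as a fraction: clicks divided by impressions (0.0 when there are none)."""
    if impressions == 0:
        return 0.0
    return clicks / impressions


# Hypothetical rows, e.g. typed in from the Queries view.
rows = [
    {"query": "seo audit", "clicks": 12, "impressions": 400},
    {"query": "manage google index", "clicks": 3, "impressions": 60},
    {"query": "noindex yoast", "clicks": 0, "impressions": 0},
]

for row in rows:
    ctr = click_through_rate(row["clicks"], row["impressions"])
    print(f"{row['query']}: {ctr:.1%} CTR")
```

A query with many impressions but a low CTR is showing up in search without attracting clicks, which is often a cue to review the page title and description.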
Using Analytics to review data
Although the GSC data is useful in certain instances, we find it can be inaccurate on smaller websites and is not 100% reliable. Instead, we prefer to use Google Analytics, which pulls referral data directly from users who have visited your website. We have more control over it, which means we can trust the data slightly more. To log in to your Google Analytics account, go to analytics.google.com.
- Navigate to Acquisition > All Traffic > Channels
This will give you a list of all the traffic from each source – organic, social media, email campaigns, specific marketing campaigns etc.
- Select Organic Search from the list of sources
- Review Traffic for that channel
You can view trends and spikes, or compare against previous periods, to see how performance has changed over time. To change the range or compare to a previous range, use the date selector in the top right of the screen.
- Create Custom Filters to answer specific questions
You can add secondary filtering to see which landing pages were visited the most, which devices were used most often, or any number of other dimensions. To do this, select the Secondary Dimension tab and search for the dimension you wish to sort by.
Requesting Removal of Pages
Note: This is a very sensitive system. A small error in a page removal request can cause an entire website to be de-indexed. We DO NOT RECOMMEND the average user attempt this themselves.
If you have already tried setting pages to noindex and they are still displaying in search or getting lots of robot traffic, you can request that a page be removed from the search index. This can be done through the Remove URLs section of the dropdown for Google Index in GSC. Again, we strongly recommend not using this tool unless you are extremely familiar with indexation and URL structures.
If you have a WordPress website that has the Yoast plugin you will have a sitemap created automatically for you. You will simply need to choose what specifically you want to include and what you want to exclude.
Here is how you can add/subtract portions of your website from the sitemap.
- Navigate to the Sitemap Section of the Yoast Plugin
- Select Post Types
- Add/Exclude any Post Types you would/wouldn’t want in your sitemap
There is a variety of reasons for wanting to include/exclude different sections of your website from your sitemap. Each website is different. In addition, what is included in the Post Types section will depend on the way your website is built. Additional sections of your site could be found under the Taxonomies section of the Sitemap Generator for Yoast.
The robots.txt file is a great way to stop robots from crawling your website. You can stop specific robots from crawling your site, stop all robots from crawling specific pages, or stop specific robots from crawling specific pages. There are thousands of different robots out there and they behave in just as many ways. The most common robots to crawl your site will be from major search engines such as Google and Bing.
Next would be SEO websites such as Majestic, Moz, and Ahrefs. These companies study large data sets and trends about Google Rankings. As such, they are continually crawling and indexing websites for their data set.
If you are using analysis software for your website such as Raven, Screaming Frog, WebCEO, etc., you will also see some robot traffic from these tools as they crawl your website and identify any errors.
Lastly, there are the other robots. There are countless robots, each with its own purpose. Some are spammy and some are harmless. Generally we find it best to just allow all robots unless you find ones that are causing issues for your site. At that point you can block these robots using the robots.txt file.
Typically robots are harmless to your site. However, occasionally a few may use too much bandwidth, access pages you do not want accessed, or generally cause issues on your site. When this is the case, you should ask your developer to add exclusions for them to your robots.txt file. For 95% of the users reading this post this will be unnecessary. However, it is another tool to help guide the way your website is crawled, indexed, and managed in search engines.
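To show what such exclusions can look like, here is a short Python sketch that defines a sample robots.txt and tests it with the standard library's `urllib.robotparser`. The robot names ExampleBadBot and ExampleSeoBot, the paths, and the example.com URLs are all hypothetical; your developer would substitute the robots and pages that are actually causing trouble.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt covering the three cases described above.
robots_txt = """
# Stop one specific robot from crawling the whole site
User-agent: ExampleBadBot
Disallow: /

# Stop one specific robot from crawling specific pages
User-agent: ExampleSeoBot
Disallow: /reports/

# Stop all other robots from crawling specific pages
User-agent: *
Disallow: /wp-admin/
"""

rules = RobotFileParser()
rules.parse(robots_txt.splitlines())

checks = [
    ("ExampleBadBot", "https://www.example.com/blog/"),
    ("ExampleSeoBot", "https://www.example.com/reports/q1/"),
    ("ExampleSeoBot", "https://www.example.com/blog/"),
    ("Googlebot", "https://www.example.com/wp-admin/"),
    ("Googlebot", "https://www.example.com/blog/"),
]

for robot, url in checks:
    verdict = "allowed" if rules.can_fetch(robot, url) else "blocked"
    print(f"{robot} -> {url}: {verdict}")
```

Testing rules this way before publishing them helps catch the kind of mistake that blocks more than intended. Keep in mind that robots.txt only works for robots that choose to follow it.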
If you are having trouble managing your website’s search index or are unhappy with how it is currently displaying in search, please contact us and a member of our SEO team will be happy to help!