How to disable indexing of a site, its pages, and individual elements on a page?

The purpose of this article is to show all the ways you can close a site, its pages, or parts of a page from indexing; in which cases each method is best used; and how to correctly explain to a programmer what needs to be done to set up indexing by search engines properly.

Closing site pages from indexing

There are three ways to close site pages from indexing:

  1. using the "robots" meta tag (<meta name="robots" content="noindex, nofollow" />);
  2. creating a root robots.txt file;
  3. using the service file of the Apache web server (.htaccess).

These are not mutually exclusive options, they are most often used together.

Close the site from indexing using robots.txt

The robots.txt file is located at the root of the site and is used to control how search robots index it. Using a set of instructions, you can allow or forbid indexing of the entire site, individual pages, directories, or pages with parameters (sorting, filters, and so on). Its distinctive feature is that robots.txt lets you write clear instructions for a specific search robot (User-agent), be it googlebot, YandexImages, etc.

To address all search bots at once, write the directive "User-agent: *". In that case a search robot, having read the whole file and found no instructions meant specifically for it, will follow the general ones.
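
For example, here is a minimal robots.txt sketch (the directory and parameter names are illustrative; the * wildcard in paths is understood by Google and Yandex): the general section closes a service directory and pages with a sorting parameter for every robot, while a separate section closes the whole site for one specific bot:

# Applies to every robot that finds no section addressed to it
User-agent: *
Disallow: /admin/
Disallow: /*?sort=

# A separate, stricter section for one specific robot
User-agent: YandexImages
Disallow: /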

Read everything about the robots.txt file and how to compose it correctly here, as well as recommendations for using this file from Yandex and Google.

For example, below is the robots.txt file of the Rozetka website:

[Screenshot: a fragment of Rozetka's robots.txt]

As you can see, the site is closed from indexing for the Yahoo! search bot.

Why close the site from search engines?

It is best to use robots.txt in cases like these:

  • when the site is completely closed from indexing during its development;
  • to close the site from non-target search engines, as in the case of Rozetka, so as not to load your servers with "extra" requests (see the sketch below).
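
Both cases as separate robots.txt variants (a sketch; Slurp is the name of Yahoo's crawler, and the choice of bot is illustrative):

# Variant 1: the site is under development, close it from everyone
User-agent: *
Disallow: /

# Variant 2: the site is live, turn away only a non-target crawler
User-agent: Slurp
Disallow: /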

In all other cases, it is better to use the methods described below.

Prohibition of indexing using the "robots" meta tag

The "robots" meta tag tells the search robot whether a particular page and the links on it may be indexed. Its difference from the robots.txt file is that it is impossible to write separate directives in it for each of the search bots.

There are four ways to tell a search engine how to index a given URL.

1. Index both text and links

<meta name="robots" content="index, follow"> (used by default) is equivalent to <META NAME="Robots" CONTENT="ALL">
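
Whichever of the four variants you choose, the tag is placed in the <head> of the page; a minimal sketch, here with the directives from point 2 below (the title text is made up):

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <!-- the "robots" meta tag must sit inside <head> -->
  <meta name="robots" content="noindex, nofollow">
  <title>A page closed from indexing</title>
</head>
<body><!-- page content --></body>
</html>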

2. Do not index text or links

<meta name="robots" content="noindex, nofollow">

This option can be used for confidential information that should not be findable through a search engine, and for content that site visitors need but that search engines may penalize: duplicate pages, intersections of filters in an online store, and so on.

3. Do not index the text on the page, but index the links

<meta name="robots" content="noindex, follow">

Such a record means that this page should not be indexed, but the robot may follow links from it to explore other pages. This is useful for distributing the internal citation index (ICI).

4. Index the text on the page, but do not index the links

<meta name="robots" content="index, nofollow">

This option can be used for sites that have many links to other sources, for example media sites. The search engine will then index the page but will not follow the links.

Should I choose the "robots" meta tag or robots.txt?

Using the "robots" meta tag and the robots.txt file in parallel has a real benefit: an additional guarantee that a particular page will not be indexed. But even this does not insure you against the arbitrariness of search engines, which can ignore both directives. Google especially likes to ignore the rules of robots.txt, showing data like this in the SERP (search engine results page):

[Screenshot: an example of such a result in the Google SERP]

When we close a directory in robots.txt but still need certain pages from that directory indexed, we can use the "robots" meta tag on those pages. It also works in reverse: individual pages in an indexed folder (site directory) can be banned from indexing with the meta tag.

In general, you need to remember the rule: the robots meta tag takes precedence over the robots.txt file.
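
A sketch of the first case (the directory and page names are illustrative), relying on that rule:

# robots.txt: the whole directory is closed
User-agent: *
Disallow: /filters/

And on the one page from /filters/ that should remain indexable:

<meta name="robots" content="index, follow">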

Read more about using meta tags at Yandex and Google.

Close the site from indexing using .htaccess

.htaccess is a service file of the Apache web server. Matt Cutts, former head of Google’s web spam team, argues that using .htaccess to block a site from indexing is the best option and draws a happy smiley face in the video.

Using regular expressions, you can close the entire site, its parts (sections), individual links, or subdomains.
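
A minimal sketch, assuming Apache with mod_rewrite and mod_headers enabled (the list of bots is illustrative): matching crawlers receive a 403 and never fetch the pages, while the X-Robots-Tag header asks robots not to index whatever is served:

# .htaccess: turn away search robots by their User-Agent
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (googlebot|yandex|bingbot) [NC]
RewriteRule .* - [F,L]

# And/or ask all robots not to index the responses
Header set X-Robots-Tag "noindex, nofollow"

Note that these are different levers: the 403 stops crawling, while X-Robots-Tag stops indexing of pages that are still fetched.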

Closing elements on site pages from indexing

SEO tag <noindex>

The <noindex> SEO tag is not part of the official HTML specification; it was invented by Yandex as an alternative to the nofollow attribute. An example of the correct use of this tag:

<!--noindex-->Any part of the site page: code, text to be closed from indexing<!--/noindex-->

Examples of using the <noindex> tag to close elements on site pages from indexing:

  • you need to hide the codes of counters (liveinternet, TIC and other service ones; see the sketch after this list);
  • hide non-unique or duplicated content (copy-paste, quotes, etc.);
  • hide dynamic content from indexing (for example, content that is displayed depending on the parameters with which the user entered the site);
  • close mailing-list subscription forms from indexing, in order to protect yourself at least minimally from spam bots;
  • close information in the sidebar (for example, an advertising banner or text information, as Rozetka did).
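
A sketch of the first case with a hypothetical counter script (the URL is made up); the comment form of the tag is used so that the HTML still validates:

<!--noindex-->
<script src="https://counter.example.com/hit.js"></script>
<!--/noindex-->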

rel="nofollow" attribute

If you add the rel="nofollow" attribute to a link, all search engines that support the standards of the World Wide Web Consortium (including Yandex and Google) will not take the weight of the link into account when calculating the site's citation index.
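
A minimal sketch (the URL is illustrative):

<a href="https://example.com/partner/" rel="nofollow">Partner site</a>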

Examples of using the rel="nofollow" attribute of the <a> tag:

  • rewarding and punishing commenters on your site: spam links in comments can either be deleted or closed with nofollow (if a link is on-topic but you are not sure of its quality);
  • advertising links or links placed by barter (exchanges of paid mentions);
  • not passing weight to a very popular resource, such as Wikipedia, Odnoklassniki, etc.;
  • prioritizing search engine crawling: it is better to close links such as those to your registration forms so that bots do not follow them.

SEOhide

A controversial technology whose essence is to hide, with the help of JavaScript, content that is unnecessary from the point of view of an SEO specialist from search engines. This "smells" like cloaking, where users see one thing and search engines see another. Below is a sketch of the idea; after it, let's look at the pros and cons of this technology.
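
A minimal sketch, assuming the crawler does not execute JavaScript (the element id and the text are made up): the content appears only after the script runs in the browser, so a robot reading the raw HTML never sees it:

<div id="seo-hidden"></div>
<script>
  // SEOhide sketch: the text exists only after JavaScript runs in the browser.
  // A crawler that does not execute JS sees only an empty <div>.
  document.addEventListener("DOMContentLoaded", function () {
    document.getElementById("seo-hidden").textContent =
      "Text the SEO specialist wants to hide from search engines";
  });
</script>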

Pros:

+ correct control of static and anchor link weight;

+ fighting over-optimization (reducing the number of keywords on the page, the so-called "nausea" of the text);

+ works for all search engines without restrictions, unlike <noindex>;

+ the technology is used in practice by large online stores.

Cons:

– search engines will sooner or later learn to index JS, and the hidden content will become visible;

– at the moment, this technology can be perceived by search engines as cloaking.