Drupal 8 – How to prevent some content from being indexed by search engine robots
Under certain circumstances, you might not want some of your content to be indexed by search engine robots. For example:
1. Some of the published content is intended for internal use, but you don’t want to set up users and permissions
2. Your data structure is hierarchical, and until some “second level” content gets published, the “first level” parent has little or no meaning
3. You don’t want some article (or any content type) to be indexed until some flag is checked by a supervisor
4. You don’t want some user comment to be indexed, even if published, until some flag is checked by a moderator
5. A certain type of page should be indexed only after some date in the future
You name it.
The most common case
When “some of your content” means exactly one content type or entity type, the well-known Metatag module fits the bill, and it is very easy to configure; see the snapshot:
The most interesting case
But what if you need a more flexible solution? For example, what if the decision on whether or not to index a piece of content depends on some custom business logic (which is the reason I came to this topic in the first place)?
Well, it’s just about adding a ‘robots’ meta tag in the <head>, right?
Right. But how?
Hooks to the rescue!!!
The one that we actually need is “hook_page_attachments_alter”, which allows you to modify a page’s “attachments” before the page is rendered.
And among those “attachments” we find ‘html_head’, which is exactly what we’re going to alter.
Let’s dive into a practical example:
/**
 * Implements hook_page_attachments_alter().
 */
function mytheme_page_attachments_alter(array &$attachments) {
  // We want to prevent the search engines from indexing this content.
  if (putYourBusinessLogicHere()) {
    // Build the robots meta tag.
    $newtag = [
      '#tag' => 'meta',
      '#attributes' => [
        'name' => 'robots',
        'content' => 'noindex, nofollow',
      ],
    ];
    // Each html_head entry is a pair: the render array and a unique key.
    $attachments['#attached']['html_head'][] = [$newtag, 'robots'];
  }
}
That’s it.
The next time a page that meets the criteria is rendered, you’ll find that
<meta name="robots" content="noindex, nofollow">
was added in the <head> section.
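As for putYourBusinessLogicHere(), it is only a placeholder in the example above. A minimal sketch of what it could look like for the “flag checked by a supervisor” case follows. It assumes a boolean field with the hypothetical machine name field_approved_for_indexing on the content types you care about; the field name and the rule itself are illustrations, not something Drupal provides out of the box.

```php
/**
 * Returns TRUE when the current page should be hidden from search engines.
 *
 * Sketch only: 'field_approved_for_indexing' is a hypothetical boolean field
 * that a supervisor ticks once the content may be indexed.
 */
function putYourBusinessLogicHere() {
  // On a node page the 'node' route parameter holds the loaded node object;
  // on other routes it is NULL.
  $node = \Drupal::routeMatch()->getParameter('node');
  if (!$node instanceof \Drupal\node\NodeInterface) {
    return FALSE;
  }

  // Content types without the flag field stay indexable.
  if (!$node->hasField('field_approved_for_indexing')) {
    return FALSE;
  }

  // Hide the page until the supervisor has ticked the box.
  return !$node->get('field_approved_for_indexing')->value;
}
```

The function could live in the same file as the hook. Any route that is not a node page, and any node without the field, is left untouched, so the tag only appears where the flag actually exists and is still unchecked.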
These are a few methods that give you good control over search engine indexing, without messing with the robots.txt file or Google Search Console.