Block or block not

Posted by Dave Collins | AI, Content

Based on recent emails, you fall into one of two categories:

  • You’re considering blocking AI bots from your content or have done it already.
  • You couldn’t care less.

Most of you fall into the latter for now.

You should care. Here’s why.

You work hard on your content.

After publishing, an AI tool can swoop in and take all the glory, with little attribution.

Here’s an example.

Some time ago, HubSpot invested time and money in creating a huge post on “The Ultimate Guide to Content Creation”:

https://blog.hubspot.com/marketing/content-creation

10,000 words, 70 images and 264 links.

This is a lengthy and wordy piece of content.

When I asked Perplexity for [10 reasons to create good content for your business], it used the article as a source but reduced it to 41 words.

Perplexity borrowing content from HubSpot
The link to the post is the “4”.

Describing this as easy to miss is an understatement.

Should you block AI from indexing your content?

You can try.

But there are hurdles.

Blocking the AI bots may be difficult or impossible.

In September 2023, The Guardian newspaper announced it was blocking OpenAI from its content.

Their website’s robots.txt shows they’re also trying to block more AI platforms:

The Guardian's robots.txt
This is a small sample of the entire robots.txt file — https://www.theguardian.com/robots.txt

The file also contains the following comment:

# Guardian content is made available under our terms and conditions of use.
# Any other uses are not permitted, incl. but not limited to: for large language
# models (LLMs), machine learning and/or artificial intelligence-related
# purposes; with any of the aforementioned technologies; and/or for any
# commercial purposes. Contact licensing@theguardian.com for assistance
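
If you want to see what a robots.txt file asks of each bot, here is a minimal sketch using Python's standard library. The robots.txt text, the example URL and the list of bot names are illustrative assumptions, not The Guardian's actual rules. Bear in mind that robots.txt is a request that crawlers can choose to honour, not a lock.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; NOT The Guardian's actual file.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

# Bot names to check. This list is an assumption: names change over time,
# so confirm each one against the crawler operator's own documentation.
AI_USER_AGENTS = ["GPTBot", "CCBot", "PerplexityBot"]


def blocked_agents(robots_txt, agents, url="https://example.com/some-post"):
    """Return the agents that the robots.txt text disallows for the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # With the sample rules above this prints ['GPTBot', 'CCBot'].
    print(blocked_agents(SAMPLE_ROBOTS_TXT, AI_USER_AGENTS))
```

In this sample, the third bot is not named in the file, so it falls through to the catch-all rule and is allowed. A bot you haven't heard of yet can't be blocked by name.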

It’s hard to gauge the effectiveness, but when I searched for [Give me a list of 10 recent articles from The Guardian newspaper in the UK] in Perplexity, it listed some recent stories and summarised them.

So much for blocking.

There’s another reason why you might want to hold off blocking the bots.

You’re almost certainly happy for Google to index your content.

Even if this can result in your content being displayed without a click.

If Perplexity linked to its sources more clearly, websites would receive significantly more visitors from their scraped content.

Would you reject this opportunity?

Imagine a lawsuit leading to a policy change making links more visible.

A potential future version of how Perplexity presents links to original content source.
Currently, a small, easy-to-miss link that doesn’t look like a link is the only credit you get when Perplexity uses your content.

If you’re considering blocking them for now and revisiting if their attribution changes, this carries a risk.

I doubt Perplexity would start indexing your content and including you in the results the moment you changed your mind.

Blocking them today could be a bad move tomorrow.

There’s a history of bot blocking.

By way of historical context, blocking AI bots is only the latest concern of organisations worried that other companies might misuse their content.

I remember a conference conversation with an attendee who was so incensed by Google displaying his content in their Knowledge Graph that he blocked Google from indexing it.

Predictably, his organic traffic decreased to almost nothing.

No matter how indignant and self-righteous he felt, he didn’t gain from this at all.

I understand the frustration of having your work scraped and presented by a tool as its own.

But giving in to the knee-jerk response of blocking may not be in your interest.

Burning bridges is bad for business.

Amid the hype around Artificial Intelligence, it’s easy to forget that the technology is still in its early stages.

So before blocking, I’d hesitate.

If you manage to block all the bots, which is unlikely, you’d never see how much traffic you lost.

In other words, if AI tools would have sent people to your content 100 times a day, blocking the bots means you’d never know.

This makes it a highly uninformed decision.
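
One way to make the decision less of a guess is to count the visits AI tools already send you before you block anything. The sketch below is a rough illustration, not a finished analytics tool. It assumes your server writes access logs in the common "combined" format and that the AI tool passes a referrer when someone clicks through. The sample log lines and the referrer domain are made up for the example.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical access-log lines in the "combined" format, for illustration.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /post HTTP/1.1" 200 512 '
    '"https://www.perplexity.ai/search?q=x" "Mozilla/5.0"',
    '203.0.113.6 - - [10/Oct/2023:13:56:01 +0000] "GET /post HTTP/1.1" 200 512 '
    '"https://www.google.com/" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2023:13:57:12 +0000] "GET /about HTTP/1.1" 200 256 '
    '"https://www.perplexity.ai/" "Mozilla/5.0"',
    '203.0.113.8 - - [10/Oct/2023:13:58:40 +0000] "GET /post HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0"',
]

# Referrer domains to count. An assumption: adjust to the tools you care about.
AI_REFERRER_DOMAINS = ("perplexity.ai",)

# Matches the tail of a combined-format line:
# "request" status size "referrer" "user agent"
LINE_PATTERN = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)" "[^"]*"$')


def count_ai_referrals(lines, domains=AI_REFERRER_DOMAINS):
    """Count log lines whose referrer host is one of the given domains."""
    counts = Counter()
    for line in lines:
        match = LINE_PATTERN.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        host = urlparse(match.group("referrer")).hostname or ""
        for domain in domains:
            if host == domain or host.endswith("." + domain):
                counts[domain] += 1
    return counts


if __name__ == "__main__":
    print(count_ai_referrals(SAMPLE_LOG))
```

Run against a few weeks of your own logs, a count like this gives you a baseline to weigh against whatever you hope to gain by blocking.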

But the biggest question to consider is:

What do you gain by blocking?

What would you lose?

If the person searching on Perplexity doesn’t find your content, they probably won’t then look for it on Google.

By blocking them, you have more to lose than the bots.
