Cloudflare dropped a policy bomb yesterday that could reshape how AI companies access web content. Starting September 15, 2026, any AI crawler that doesn’t clearly separate its search, training, and agent functions will be blocked by default on pages running Cloudflare’s ad infrastructure.

Image: Cloudflare office entrance area by HaeB via Wikimedia Commons (CC BY-SA 4.0)

That’s not a suggestion. It’s a deadline.

The Core Problem: When Bots Outnumber Humans

Here’s a stat that stopped me mid-scroll: non-human traffic just surpassed human traffic on the internet for the first time, and Cloudflare says the milestone arrived a year earlier than expected. We’re talking about a web where most requests come from crawlers, scrapers, and automated systems, many of them feeding the AI models that now compete with the very sites they’re scraping.

Matthew Prince, Cloudflare’s CEO, put it bluntly: “Now that the majority of traffic on the Internet is non-human, we must go further and act faster so that a sustainable ecosystem can emerge.”

And here’s the part that makes my blood boil as someone who runs a blog: over 50% of AI crawler traffic is wasted re-fetching pages that haven’t changed. That’s not just inefficient; it’s predatory. These companies burn through publishers’ bandwidth and compute to hoover up content they may use to train models that compete directly with those same publishers.

What Actually Changes on September 15

Let me break down what this means in practical terms:

  • Separation requirement: AI companies must run distinct crawlers for search indexing, AI training data collection, and AI agent operations. No more mixed-use bots that scrape everything under a single user agent.
  • Default blocking: Starting September 15, any crawler that doesn’t meet this separation standard will be blocked automatically on pages with Cloudflare’s ad infrastructure enabled.
  • Scope: The default applies to new Cloudflare customers, new sites added by existing customers, and all free-tier users. Site owners can override it if they choose.
  • Who’s safe: Companies that run separate, clearly labeled bots that state their purpose won’t be affected.
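The separation rule boils down to a check on what each crawler declares it is for. Here is a minimal sketch of that logic in Python, assuming each bot publishes a set of purposes under a single user-agent token. It is a hypothetical model, not Cloudflare’s implementation: the purpose labels, the bot names, and the `blocked_by_default` helper are all made up for illustration, and it ignores the scope conditions (customer tier, ad infrastructure) listed above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical purpose labels for the three functions the policy names.
SEARCH, TRAINING, AGENT = "search", "training", "agent"


@dataclass
class Crawler:
    token: str  # user-agent product token (the names used below are made up)
    purposes: set[str] = field(default_factory=set)

    @property
    def is_mixed_use(self) -> bool:
        # Declaring more than one purpose under one token fails the separation test.
        return len(self.purposes) > 1


def blocked_by_default(crawler: Crawler, owner_allows: frozenset[str] = frozenset()) -> bool:
    """Hypothetical default rule: block mixed-use crawlers unless the site owner overrides."""
    if crawler.token in owner_allows:
        return False  # the site owner explicitly opted this crawler back in
    return crawler.is_mixed_use


if __name__ == "__main__":
    crawlers = [
        Crawler("ExampleSearchBot", {SEARCH}),
        Crawler("ExampleTrainingBot", {TRAINING}),
        Crawler("ExampleAllInOneBot", {SEARCH, TRAINING, AGENT}),
    ]
    for c in crawlers:
        print(c.token, "blocked" if blocked_by_default(c) else "allowed")
```

Running it prints `allowed` for the two single-purpose bots and `blocked` for `ExampleAllInOneBot`. Passing that bot’s token in `owner_allows` models the site-owner override.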

The key word here is “default.” Site owners can still allow mixed-use crawlers if they want to. But Cloudflare is making the safe choice the default choice, which is exactly how these things should work.

The Google Elephant in the Room

Now let’s talk about the real target here: Google. Cloudflare specifically calls out the “world’s largest search engine” for having access to “2x more information” than other AI companies.

Here’s the conflict: Googlebot, Google’s main crawler, powers traditional search results, but it’s also the engine behind AI Overviews and AI Mode. If you block Googlebot to protect your content from AI training, you simultaneously tank your search visibility. That’s a lose-lose situation, and it’s exactly what Cloudflare’s policy is trying to break. (I’ve written about detecting these kinds of mixed-use crawlers in Nginx and Cloudflare: How to Detect Cloners, Crawlers and IPs with Malicious Requests.)

Google’s defense? It offers Google-Extended, a robots.txt token that lets sites opt out of AI training. But that doesn’t address the fundamental issue: the same crawler that indexes your content for search also feeds AI features that summarize your work and keep users on Google’s results page instead of clicking through to your site.
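To see what that opt-out does and doesn’t cover, here is a minimal sketch using Python’s standard-library `urllib.robotparser`. It assumes a robots.txt that disallows the `Google-Extended` token and allows everything else. `example.com` and the `build_parser` helper are placeholders. Python’s parser is simpler than Google’s own matching rules, so treat this as a sanity check on the file’s logic, not a guarantee of how Google reads it.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: opt out of AI training via the Google-Extended token
# while leaving every other user agent, including Googlebot, unrestricted.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/some-post"
    for token in ("Googlebot", "Google-Extended"):
        print(token, rp.can_fetch(token, url))
```

It prints `Googlebot True` and `Google-Extended False`. That’s the problem in miniature: the opt-out covers only the `Google-Extended` token, while Googlebot, the crawler behind both search results and AI Overviews, is still allowed everywhere.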

I’ve been watching this tension build for months. As someone who relies on search traffic for Bleuken, I’ve felt this squeeze personally. You want to be discoverable, but you don’t want your intellectual property scraped and repackaged without compensation.

“Pay Per Use” vs. “Pay Per Crawl”

Cloudflare isn’t just blocking bad actors; it’s building a marketplace. Last year it launched “Pay Per Crawl,” which lets publishers charge AI bots for scraping. Now it’s evolving that into “Pay Per Use,” where publishers get paid when their content actually creates value.

The difference matters. Pay Per Crawl is a toll booth—pay to access. Pay Per Use is a royalty system—pay when you profit from what you accessed.

Initial partners include Ceramic.ai and You.com. Ceramic pays publishers when their content gets indexed for AI search results. You.com pays for access to premium or paywalled content. Other AI companies can customize this model for their own use cases.

This feels like the right direction. It’s not about blocking AI; it’s about making sure AI companies treat publisher content as the valuable asset it is, not a free resource to be mined.

What This Means for Small Publishers

If you’re running a blog like Bleuken, here’s what I’d recommend (I cover this in detail in my guide How to Block AI Crawlers from Scraping Your Website Using Cloudflare):

  1. Enable Cloudflare’s protection if you haven’t already. The default blocking only applies to sites that run through Cloudflare.
  2. Check your bot traffic. Cloudflare’s dashboard shows you which crawlers are hitting your site and what they’re doing.
  3. Watch for the September 15 rollout. You’ll need to understand the new defaults and decide if you want to override them.
  4. Consider the “Pay Per Use” opportunity. If your content has value to AI companies, this could be a revenue stream instead of just a cost.
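Cloudflare’s dashboard is the easiest way to do step 2. If you also keep raw server logs, a quick count of crawler user agents gives you a second view. Here is a minimal Python sketch, assuming logs in the “combined” format that Nginx uses by default. The token list is an example set of well-known crawler names, not a complete or authoritative registry, and `count_crawlers` is a hypothetical helper.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable

# Example crawler tokens only. Vendors rename and add bots, so check each
# vendor's published crawler documentation before relying on this list.
CRAWLER_TOKENS = (
    "Googlebot",
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "ClaudeBot",
    "PerplexityBot",
    "CCBot",
    "Bytespider",
)

# In the "combined" log format the user agent is the last double-quoted field.
USER_AGENT_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_crawlers(lines: Iterable[str], tokens: tuple[str, ...] = CRAWLER_TOKENS) -> Counter:
    """Count log lines whose user agent contains one of the given tokens."""
    counts: Counter = Counter()
    for line in lines:
        match = USER_AGENT_FIELD.search(line)
        if match is None:
            continue  # not a combined-format line
        user_agent = match.group(1).lower()
        for token in tokens:
            if token.lower() in user_agent:
                counts[token] += 1
                break  # attribute each request to the first matching token
    return counts


if __name__ == "__main__":
    # Made-up sample lines; the IPs come from documentation-only ranges.
    sample = [
        '203.0.113.5 - - [02/Jul/2026:10:00:00 +0000] "GET /post HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
        '198.51.100.7 - - [02/Jul/2026:10:00:03 +0000] "GET /post HTTP/1.1" 304 0 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
        '192.0.2.9 - - [02/Jul/2026:10:00:05 +0000] "GET / HTTP/1.1" 200 8000 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    ]
    print(count_crawlers(sample))
```

To run it against a real log, pass an open file (the path varies by system). User-agent strings are self-reported and easy to spoof, so treat the counts as a rough picture; Cloudflare’s own bot classification is the better source for which crawlers are verified.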

The publishing industry has been getting squeezed from both sides: ad revenue declining while AI companies scrape content for free. Cloudflare’s move doesn’t solve everything, but it at least shifts the default from “everyone can take everything” to “you need to prove your intent.”

The Bigger Picture

This isn’t just about Cloudflare or even about one policy. It’s about who controls the economics of the web. For years, the implicit deal was: you create content, search engines index it, users find it, everyone benefits. But AI has warped that deal. Now, the same content that drives search traffic is also training the models that might make search traffic obsolete. This is exactly the kind of dynamic that makes posts like AI Chatbots Are Not Your Friends so important.

Cloudflare’s September 15 deadline forces a reckoning. AI companies can no longer pretend that scraping for training and scraping for search are the same thing. They’re not. One helps users find your work. The other extracts your work’s value to build competing products.

I don’t know if this policy will stick. Tech companies have a way of finding workarounds. But the principle is sound: if you’re going to use someone’s content to build your business, you should at least be transparent about what you’re doing and fair about how you compensate them.

September 15 is going to be interesting.

Filed under Tech & Gadgets
Last Update: July 2, 2026 by Felix AlterEgo