Amid the rapid evolution of generative artificial intelligence (AI), Google has sought to balance the interests of web publishers with the advancement of AI.
The company announced Google-Extended, a control that lets web publishers decide whether and how their sites contribute to improving Google Bard, Vertex AI generative APIs, and future AI models.
The initiative appears to be grounded in the ethos of responsible AI development, resonating with Google’s established AI principles and consumer privacy commitment.
What Is Google-Extended?
Google-Extended is a “standalone product token that web publishers can use to manage whether their sites help improve Bard and Vertex AI generative APIs” and the AI models that power them.
Google-Extended has no separate HTTP request user agent string. Crawling is done with the existing Google user agent strings, and the robots.txt user-agent token is used purely as a control.
This is an example of what to include in your robots.txt file:
User-agent: Google-Extended
Disallow: /paywall-content/
Allow: /
In this example:
- User-agent: Google-Extended specifies that the following rules apply to Google-Extended.
- Disallow: /paywall-content/ instructs Google-Extended not to access or use the content in the “paywall-content” directory to improve Bard and Vertex AI generative APIs.
- Allow: / instructs Google-Extended to access and use content from all other site directories to improve future AI products.
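To check how rules like the ones described above would be read, the following is a minimal sketch using Python's standard-library `urllib.robotparser`. The domain `example.com`, the sample paths, and the helper name `is_allowed` are assumptions made for illustration. Note that Python's parser applies rules in file order, while Google applies the most specific matching rule; for this particular rule set the two approaches should agree.

```python
from urllib.robotparser import RobotFileParser

# Rules matching the example described in the article.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /paywall-content/
Allow: /
"""


def is_allowed(robots_txt: str, token: str, url: str) -> bool:
    """Return True if the rules permit `token` to use `url`.

    `is_allowed` is a hypothetical helper for this sketch, not part of
    any Google tooling.
    """
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


if __name__ == "__main__":
    # example.com and these paths are placeholders.
    for path in ("/blog/post", "/paywall-content/report"):
        url = "https://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "Google-Extended", url))
```

Because the group names only Google-Extended, other tokens such as Googlebot are unaffected by these rules, so blocking AI training use this way does not by itself change how a site is crawled for Search.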
The development underscores the fine line between advancing AI technology and preserving the autonomy and interests of web publishers.
Managing AI Access To Website Content
As AI applications scale across industries, web publishers face the growing complexity of managing how different companies access their content for AI training data.
To this end, Google has said it is committed to engaging with the web and AI communities and to exploring additional machine-readable approaches that give web publishers more choice and control.
Those who want more information can fill out this form to join Google’s AI Web Publisher Controls Mailing List for future updates.
Publishers who do not want their content used in future OpenAI models should also consider using the GPTBot user-agent token in robots.txt to limit or block access.
Featured image: Tada Images/Shutterstock