Cloudflare AI Audit Tool: A Defense Against Data Scraping Bots

Cloudflare Introduces AI Audit Tool to Block AI Bots
With the growing popularity of generative AI, there has been a rush to train large language models (LLMs) on human-created data. Such data is needed not only to build a foundation model but also to keep improving it. The main obstacle is the supply of publicly available data: most AI firms have already worked through these datasets and now need more data to train their models.
AI bots are designed to help developers gather that additional data. They are automated programs that imitate a real user, visiting websites and copying text, images, and video. Such bots can scrape large volumes of data in a short period and deliver it for model training. Recently, several media firms and large websites have filed lawsuits against AI firms, alleging plagiarism and the illegal use of their data to train LLMs.
Enhanced Control and Features
Cloudflare's AI Audit tool acts as a protective layer that can block such bots from accessing websites. The company said in a press release that it has also improved the tool to give users more control over which bots are blocked and which are allowed access. This is useful when a platform has struck a deal with an AI firm and does not mind that firm's bots taking its data. Alternatively, a website owner might want to grant access to AI models that attribute the source of their data, in order to gain wider reach.
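The per-bot control described above can be pictured as a lookup from a crawler's User-Agent string to an allow or block decision. What follows is only a minimal sketch in Python, not Cloudflare's implementation or API: the bot names, the `BOT_POLICY` table, and the `decide` function are all hypothetical, and the sketch assumes the request has already been identified as coming from an AI crawler.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"


# Hypothetical policy table; the bot names are invented for illustration.
BOT_POLICY = {
    "partnerbot": Action.ALLOW,      # the site has struck a deal with this firm
    "attributingbot": Action.ALLOW,  # this bot credits the source of its data
    "scraperbot": Action.BLOCK,      # no agreement, so access is refused
}


def decide(user_agent: str, default: Action = Action.BLOCK) -> Action:
    """Return the action for the first policy token found in the User-Agent.

    Assumes the caller has already classified the request as an AI crawler;
    crawlers that are not listed fall back to `default`.
    """
    ua = user_agent.lower()
    for token, action in BOT_POLICY.items():
        if token in ua:
            return action
    return default


if __name__ == "__main__":
    for ua in ("Mozilla/5.0 (compatible; PartnerBot/1.0)", "ScraperBot/2.3"):
        print(ua, "->", decide(ua).value)
```

A User-Agent header can be spoofed, so a real bot filter would have to rely on more than string matching; the sketch only illustrates the allow-or-block choice a site owner makes for each bot.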
Cloudflare said it is also building a workflow in which website owners can set a fair price for their content. AI bot owners, in turn, will be able to transact through the firewall, and once the amount has been paid, they will be granted the right to scan the content. The company says this marketplace-like tool can benefit website owners who lack the bandwidth or resources to negotiate and strike deals with each AI firm that approaches their website.
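The pay-then-scan workflow can be sketched as a small gate that records which bots have paid the owner's asking price. This is an illustrative Python model under stated assumptions, not Cloudflare's design: the `ContentGate` class, its method names, and the use of a flat price in cents are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContentGate:
    """Hypothetical gate: a bot may scan content only after paying the set price."""

    price_cents: int  # asking price chosen by the website owner (assumed flat fee)
    paid_bots: set[str] = field(default_factory=set)

    def pay(self, bot_id: str, amount_cents: int) -> bool:
        """Grant access if the payment covers the asking price; otherwise refuse."""
        if amount_cents >= self.price_cents:
            self.paid_bots.add(bot_id)
            return True
        return False

    def may_scan(self, bot_id: str) -> bool:
        """Report whether this bot has already paid for access."""
        return bot_id in self.paid_bots


if __name__ == "__main__":
    gate = ContentGate(price_cents=500)
    print(gate.may_scan("bot-a"))   # no payment has been made yet
    gate.pay("bot-a", 500)
    print(gate.may_scan("bot-a"))   # payment met the asking price
```

The point of the sketch is that the owner sets one price up front and every bot is checked against it, which is what removes the need to negotiate with each AI firm separately.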