Baidu Baike Takes Action Against Googlebot and Bingbot Amid AI Data Scraping
Baidu Baike's Block on Google and Bing
Chinese internet search giant Baidu has recently changed the robots.txt file of its Baidu Baike service to block Googlebot and Bingbot from indexing its content. The decision reflects Baidu's strategic move to tighten control over a valuable data asset as demand for AI training datasets surges.
Details of the Implementation
- Wayback Machine records show the change took effect on August 8.
- Previously, Baidu Baike's robots.txt allowed partial indexing; the updated file imposes broader restrictions on both crawlers (see the sketch after this list for how such rules behave).
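For readers who want to see how crawler rules of this kind work mechanically, the sketch below uses Python's standard-library urllib.robotparser to evaluate an illustrative set of robots.txt directives. The directives, user-agent names, and page URL are assumptions for demonstration only and are not the actual contents of Baidu Baike's robots.txt.

```python
# A minimal sketch of how robots.txt rules gate specific crawlers, using only
# the Python standard library. The directives below are ILLUSTRATIVE and are
# not the actual contents of Baidu Baike's robots.txt.
import urllib.robotparser

ILLUSTRATIVE_RULES = """\
User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /

User-agent: Baiduspider
Allow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ILLUSTRATIVE_RULES.splitlines())

# Hypothetical page URL to test against the rules above.
page = "https://baike.baidu.com/item/example"

for agent in ("Googlebot", "Bingbot", "Baiduspider"):
    verdict = "allowed" if parser.can_fetch(agent, page) else "blocked"
    print(f"{agent}: {verdict}")

# To inspect the live file instead of the illustrative rules, one could use:
#   parser.set_url("https://baike.baidu.com/robots.txt")
#   parser.read()
```

Note that robots.txt is advisory: compliant crawlers such as Googlebot and Bingbot honor these directives, which is why a rule change alone is enough to stop their indexing.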
Implications for AI and Data Sharing
This shift follows similar moves by Reddit, which blocked most search engines from indexing its content while preserving its licensing partnership with Google. Microsoft has likewise moved to safeguard its data, reportedly threatening to cut off access to its search data for rivals that misuse it.
Comparisons with Other Platforms
Notably, the Chinese-language edition of Wikipedia remains open to search crawlers, underscoring the contrast in strategy. Despite the block, older Baidu Baike entries may still surface through cached copies on US search engines.
Market Consequences
The current landscape points to a broader push by digital platforms to lock down their data as generative AI continues to attract interest and investment. These shifts may have a lasting impact on how AI developers source reliable content.