Highlights:
ByteDance's bot, Bytespider, scrapes data 25 times faster than OpenAI's GPTBot.
The bot ignores robots.txt, the file websites use to tell crawlers which pages they may not access.
ByteDance, the parent company of TikTok, is increasing its web scraping efforts with a new tool called Bytespider, which was launched in April, Fortune reports.
According to Sam Crowther, CEO of Kasada, a bot management company, Bytespider's data collection has increased significantly over the past six weeks. Bytespider has been gathering data at a high speed, reportedly 25 times faster than OpenAI's GPTBot and 3,000 times faster than Anthropic's ClaudeBot. “It’s like they’re trying desperately to catch up,” Crowther noted.
Potential reason for ByteDance's data scraping activity
It remains unclear why ByteDance is gathering data at a high rate, but it could be tied to its recent AI initiatives. There is speculation that ByteDance could be working on a new Large Language Model (LLM). It is also rumored that the company is developing an internal search engine powered by AI, potentially leveraging tools like ChatGPT. Recently, TikTok introduced its ‘Search Ads Campaign’ for advertisers to target users in search results.
This spike in scraping activity is also interesting given the increasing pressure from the U.S. government to restrict TikTok's operations. President Joe Biden has signed legislation pushing ByteDance to sell TikTok or shut it down due to national security concerns.
Controversies surrounding data scraping
Generative AI tools have changed web scraping practices. Bytespider has been collecting web data while ignoring robots.txt, the file publishers use to tell crawlers, including those run by AI companies, which content they may not access without permission.
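For readers unfamiliar with how robots.txt works, here is a minimal sketch in Python of the check a compliant crawler would run before fetching a page. The robots.txt rules, the example.com URL, and the is_allowed helper are illustrative assumptions, not taken from any publisher's actual configuration. The file is advisory only: nothing technically stops a crawler that skips this check, which is the behavior attributed to Bytespider above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve (illustrative rules only):
# block the Bytespider user agent everywhere, allow every other crawler.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"  # placeholder URL
    for agent in ("Bytespider", "GPTBot"):
        print(agent, "allowed" if is_allowed(agent, url) else "blocked")
```

A well-behaved crawler would skip any URL for which this check returns False. Publishers who need actual enforcement typically rely on server-side measures such as bot management tools instead.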
Many individuals and organizations claim their copyrights are violated when their work is scraped. In August, Reddit's Chief Executive Officer, Steve Huffman, accused Microsoft of using Reddit’s data to improve its artificial intelligence models without Reddit's permission.
Previously, publishers including The New York Times and Condé Nast blocked Apple Intelligence from scraping their data.
10/07/2024