
US Publishers Demand Common Crawl Stop Scraping Content for AI Training

By Matt G. Southern

24d ago · 4 min read · en · News

Summary

Digital Content Next (DCN), a trade body representing US digital publishers, has sent a cease and desist letter to the Common Crawl Foundation, demanding it stop scraping publisher content and remove protected material from its datasets. Common Crawl has been crawling billions of pages monthly since 2007 to build a free public archive, which has been widely used to train AI models, including OpenAI's GPT-3. DCN CEO Jason Kint announced the legal notice, escalating tensions between publishers and AI companies over the use of copyrighted content for AI training without permission or compensation.

Source

Bluesky: US Publishers Demand Common Crawl Stop Scraping Content for AI Training (buff.ly)

Key quotes

Digital Content Next, a trade body representing US digital publishers, has sent a cease and desist letter to the Common Crawl Foundation.
The letter demands Common Crawl stop collecting publisher content and remove material already in its datasets.
Common Crawl has crawled several billion new pages each month since 2007 to build a free public archive.
Snippet from the RSS feed
Digital Content Next sent Common Crawl a cease and desist letter demanding it stop scraping publisher content and remove protected material from its datasets.
