
AI Search - Website Source CSS content selectors for precise content extraction in AI Search

2mo ago

Source: Cloudflare (cloudflare.com)
Snippet from the RSS feed
AI Search now supports CSS content selectors for website data sources. You can define which parts of a crawled page are extracted and indexed by pairing CSS selectors with URL glob patterns.

Content selectors let you index only the relevant content and skip navigation, sidebars, footers, and other boilerplate. When a page URL matches a glob pattern, only the elements matching the corresponding CSS selector are extracted and converted to Markdown for indexing.

Configure content selectors via the dashboard or the API:

    curl "..." \
      -H "Authorization: Bearer {api_token}" \
      -H "Content-Type: application/json" \
      -d '{
        "id": "my-ai-search",
        "source": "...",
        "type": "web-crawler",
        "source_params": {
          "web_crawler": {
            "parse_options": {
              "content_selector": [
                {
                  "path": "**/blog/**",
                  "selector": "article .post-body"
                }
              ]
            }
          }
        }
      }'

Selectors are evaluated in order, and the first matching pattern wins. You can define up to 10 content selector entries per instance. For configuration details and examples, refer to the content selectors documentation.
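To make the "evaluated in order, first matching pattern wins" rule concrete, here is a minimal Python sketch of how such a lookup could behave. It is an illustration only, not Cloudflare's implementation. The glob semantics (`**` crosses `/`, `*` does not), matching against the full page URL, returning `None` when nothing matches, and the names `glob_to_regex`, `pick_selector`, and the `**/docs/**` entry are all assumptions made for this example.

```python
from __future__ import annotations

import re

MAX_ENTRIES = 10  # limit stated in the announcement


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob into a regex.

    Assumed semantics (not confirmed by the source):
    '**' matches any characters including '/', '*' matches within one segment.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def pick_selector(url: str, entries: list[dict]) -> str | None:
    """Return the selector of the first entry whose glob matches the URL."""
    if len(entries) > MAX_ENTRIES:
        raise ValueError(f"at most {MAX_ENTRIES} content selector entries allowed")
    for entry in entries:  # order matters: the first match wins
        if glob_to_regex(entry["path"]).fullmatch(url):
            return entry["selector"]
    return None  # assumption: the source does not say what happens on no match


# The first entry comes from the announcement; the second is hypothetical.
ENTRIES = [
    {"path": "**/blog/**", "selector": "article .post-body"},
    {"path": "**/docs/**", "selector": "main .content"},
]

if __name__ == "__main__":
    print(pick_selector("https://example.com/blog/post-1", ENTRIES))
    print(pick_selector("https://example.com/docs/intro", ENTRIES))
    print(pick_selector("https://example.com/about", ENTRIES))
```

Because the first match wins, more specific patterns should be listed before broader ones. A catch-all such as `**` placed first would shadow every entry after it.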
