Configure Scraping
Before WebSpeaker can power AI search on your website, it needs to build a knowledge base from your content. This is done through the scraping configuration, where you define which pages to crawl, what content to extract, and what to skip. Proper configuration ensures that the AI has access to relevant, high-quality content and ignores noise like navigation menus, footers, or irrelevant pages.

Add Website URLs
In the management portal, navigate to your project and open the AI Search -> Content -> Scraping section. Here you can add one or more root URLs that WebSpeaker should start crawling from. The scraper follows links found on these pages to discover additional content, building a comprehensive map of your site. You can add multiple starting URLs if your content is spread across different sections or subdomains.
Each URL entry defines a crawling scope. The scraper will follow internal links within the same domain by default, discovering pages automatically. You do not need to list every individual page; just provide the entry points and the scraper will take care of the rest.
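The default same-domain scope rule can be pictured with a short Python sketch (illustrative only; WebSpeaker's internal logic is not exposed, and the URLs are placeholders):

```python
from urllib.parse import urlparse

def in_scope(link: str, root_url: str) -> bool:
    """Follow a discovered link only if it stays on the same domain as the root URL."""
    return urlparse(link).netloc == urlparse(root_url).netloc

# Links on the root domain are crawled; external links are not.
in_scope("https://example.com/docs/setup", "https://example.com/")   # True
in_scope("https://other.example.org/page", "https://example.com/")   # False
```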
Configure CSS Selectors
To ensure that only meaningful content is extracted from each page, you can define CSS selectors that target specific areas of your HTML. For example, if your main content is inside an <article> tag or a <div class="content"> element, you can specify that selector so the scraper ignores headers, sidebars, footers, and other peripheral elements. This produces cleaner text that leads to more accurate and relevant search results.
You can configure both include selectors (which elements to extract content from) and exclude selectors (which elements to skip even if they appear within the included area). This gives you fine-grained control over what ends up in your knowledge base.
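The include/exclude interplay can be sketched in plain Python. This simplified example matches elements by tag name rather than full CSS selectors (the standard library has no CSS selector engine), but it shows the behavior described above: text inside the included element is kept, and excluded subtrees are dropped even when nested within it.

```python
import xml.etree.ElementTree as ET

def collect_text(node, exclude):
    """Gather text from a node's subtree, skipping any excluded elements."""
    if node.tag in exclude:
        return ""
    parts = [node.text or ""]
    for child in node:
        parts.append(collect_text(child, exclude))
        parts.append(child.tail or "")  # text after the child still belongs to this node
    return "".join(parts)

def extract(html, include="article", exclude=("aside", "nav")):
    root = ET.fromstring(html)
    chunks = [collect_text(node, exclude) for node in root.iter(include)]
    return " ".join(" ".join(chunks).split())  # normalize whitespace

page = """<html><body>
<nav>Menu</nav>
<article>
  <h1>Title</h1>
  <p>Body text.</p>
  <aside>Related links</aside>
</article>
<footer>Footer</footer>
</body></html>"""

extract(page)  # 'Title Body text.'
```

Note how the nav, footer, and the aside inside the article are all excluded, leaving only the meaningful body text.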
Set Up Skip URL Patterns
Not every page on your website is relevant for the knowledge base. You can define URL patterns to skip during crawling. For example, you might want to exclude login pages, admin panels, paginated listing pages, or URLs containing specific query parameters. Skip patterns accept standard pattern matching, allowing you to filter out entire sections of your site efficiently.
Common patterns to skip include URLs containing /admin, /login, /cart, or pagination parameters like ?page=. Setting up appropriate skip patterns reduces the amount of irrelevant content in your knowledge base and speeds up the scraping process.
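A minimal sketch of how such a filter behaves, using simple substring matching (the portal's actual pattern syntax may be richer; the skip list below just mirrors the examples above):

```python
# Hypothetical skip list mirroring the examples in this section.
SKIP_PATTERNS = ["/admin", "/login", "/cart", "?page="]

def should_skip(url: str) -> bool:
    """Skip a URL if it contains any configured pattern fragment."""
    return any(pattern in url for pattern in SKIP_PATTERNS)

should_skip("https://example.com/admin/users")      # True
should_skip("https://example.com/blog?page=3")      # True
should_skip("https://example.com/blog/first-post")  # False
```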
JavaScript Hooks for Dynamic Content
Some websites load content dynamically using JavaScript. If your pages rely on client-side rendering or lazy loading, the scraper may not capture all the content from the initial HTML alone. WebSpeaker supports JavaScript hooks that allow you to configure how the scraper handles dynamic content. This ensures that content rendered after the initial page load is also captured and included in the knowledge base.
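The hook mechanics themselves are configured in the portal, but the underlying idea, waiting until dynamically rendered content is actually present before extracting it, can be sketched as a generic polling loop (illustrative Python, not WebSpeaker's hook API):

```python
import time

def wait_for(condition, timeout=5.0, interval=0.1):
    """Poll until condition() is truthy or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# e.g. wait until a (hypothetical) rendered_html contains the main content element:
# wait_for(lambda: "<article" in rendered_html)
```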
Content Age Limits
You can configure content age limits to exclude pages that were published a long time ago. When you set an age limit in months, the scraper skips any page whose publication date is older than the specified threshold. This is useful for websites with a large archive of outdated content, such as news sites or blogs, where old articles are no longer relevant. By setting an age limit, you ensure that only recent content is included in your knowledge base and that visitors receive up-to-date answers from the AI search.
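The comparison the age limit performs can be sketched as follows (a simplification that treats one month as 30 days; the dates and limit values are examples):

```python
from datetime import date, timedelta

def within_age_limit(published: date, limit_months: int, today: date) -> bool:
    """Keep a page only if it was published within the limit.
    Assumption for this sketch: one month is approximated as 30 days."""
    return today - published <= timedelta(days=30 * limit_months)

# A 6-month limit keeps a 2-month-old article and drops a year-old one.
within_age_limit(date(2024, 3, 1), 6, date(2024, 5, 1))  # True
within_age_limit(date(2023, 5, 1), 6, date(2024, 5, 1))  # False
```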
Use the Date selector field to specify a CSS selector pointing to the HTML element that contains the publication date on your pages (e.g. time.published, .post-date). The scraper reads the date from that element and compares it against the age limit to decide whether the page should be included.
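A rough Python sketch of that lookup, simplified to a tag search since the standard library has no CSS selector engine (the HTML snippet and date format are assumptions for illustration):

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime

snippet = '<article><time class="published" datetime="2024-05-01">May 1, 2024</time></article>'

def page_date(html: str) -> date:
    """Read the publication date from the first <time> element's datetime attribute."""
    node = ET.fromstring(html).find(".//time")
    return datetime.strptime(node.get("datetime"), "%Y-%m-%d").date()

page_date(snippet)  # date(2024, 5, 1)
```

In practice a machine-readable attribute like datetime is more reliable to target than the element's visible text, which may be formatted for humans.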
Review your scraping configuration carefully before triggering a run. A well-tuned configuration produces a clean, focused knowledge base that directly improves the quality of AI search results.