Cheerio Scraper
The request queue enables recursive crawling and the use of Pseudo-URLs and the Link selector.
Pseudo-URLs match links on the page that you want to enqueue. Combine them with the Link selector to tell the crawler where to find links.
A CSS selector matching elements with 'href' attributes that should be enqueued. To enqueue URLs from '<div class="my-class" href=...>' tags, you would enter 'div.my-class'.
Choose to use no proxy, Apify Proxy, or provide custom proxy URLs.
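For example, to route all requests through Apify Proxy, the proxy configuration part of the input might look like this (a minimal sketch; verify the field names against the actor's current input schema):

```json
{
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```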
Debug messages will be included in the log. Use context.log.debug('message') to log your own debug messages.
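For illustration, a pageFunction that emits its own debug messages might look like the sketch below. The context fields shown ($, request, log) follow the Cheerio Scraper documentation; inside the actor this function is not run standalone:

```javascript
// Sketch of a pageFunction that logs debug messages via context.log.debug.
// It receives the Cheerio handle ($), the current request, and the logger
// from the scraper's context object.
async function pageFunction(context) {
    const { $, request, log } = context;
    log.debug(`Processing ${request.url}`);
    const title = $('title').text();
    log.debug(`Extracted title: ${title}`);
    // The returned object is pushed to the default dataset.
    return { url: request.url, title };
}
```

The debug messages only appear in the log when the debug-log option is enabled.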
The crawler will ignore SSL certificate errors.
The scraper will use a cookie jar to persist cookies between requests. This is a temporary solution and the feature is UNSTABLE, meaning that it will most likely be removed in the future and replaced with a different API. Use at your own risk.
Maximum number of times the request for the page will be retried in case of an error. Setting it to 0 means that the request will be attempted once and will not be retried if it fails.
Maximum number of pages that the crawler will open. 0 means unlimited.
Maximum number of results that will be saved to the dataset. The crawler will terminate afterwards. 0 means unlimited.
Defines how many links away from the Start URLs the crawler will descend. 0 means unlimited.
Defines how many pages can be processed by the scraper in parallel. The scraper automatically increases and decreases concurrency based on available system resources. Use this option to set a hard limit.
Maximum time, in seconds, the crawler will allow a web page to load.
Maximum time, in seconds, the crawler will wait for the page function to execute.
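Taken together, the limits above map to numeric input fields. A sketch of an input fragment (field names as commonly used in the Cheerio Scraper input schema; check the current schema before relying on them):

```json
{
  "maxRequestRetries": 3,
  "maxPagesPerCrawl": 100,
  "maxResultsPerCrawl": 0,
  "maxCrawlingDepth": 2,
  "maxConcurrency": 10,
  "pageLoadTimeoutSecs": 60,
  "pageFunctionTimeoutSecs": 60
}
```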
This object will be available on pageFunction's context as customData.
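For example, if you pass an object such as { "label": "test-run" } as customData (a hypothetical value for illustration), the pageFunction can read it from its context:

```javascript
// Sketch: reading the customData input inside a pageFunction.
// customData is the JSON object supplied in the scraper's input.
async function pageFunction(context) {
    const { request, customData } = context;
    // e.g. customData === { label: 'test-run' }
    return { url: request.url, label: customData.label };
}
```

This is a convenient way to parametrize a run (for instance, tagging results) without editing the page function itself.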