Pricing

Pay per usage

Puppeteer Scraper

Rating

0.0

(0)

Developer

Apify Technologies

Last modified

7 years ago

Categories

Open source

Start URLs

startUrls

Required

URLs to start with

Type:array

Use request queue

useRequestQueue

Optional

Request queue enables recursive crawling and the use of Pseudo-URLs and Link selector.

Type:boolean

Default:true

Pseudo-URLs

pseudoUrls

Optional

Pseudo-URLs to match links in the page that you want to enqueue. Combine with Link selector to tell the crawler where to find links.

Type:array

Default:

[]
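The pseudo-URL syntax is not spelled out here. The following is a minimal sketch, assuming the common Apify convention in which a pseudo-URL is a plain URL with regular expressions embedded in square brackets, for example `https://example.com/products/[\d+]`. The helper names `escapeLiteral` and `pseudoUrlToRegExp` are hypothetical and are not part of the actor.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeLiteral(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Convert a pseudo-URL into a RegExp (assumed syntax: regex parts in [...]).
// Text outside the brackets is matched literally.
function pseudoUrlToRegExp(pseudoUrl) {
  let pattern = '';
  let buffer = '';
  let depth = 0; // bracket nesting, so character classes like [[a-z]+] work

  for (const ch of pseudoUrl.trim()) {
    if (ch === '[') {
      if (depth === 0) {
        pattern += escapeLiteral(buffer); // flush the literal part
        buffer = '';
      } else {
        buffer += ch;
      }
      depth += 1;
    } else if (ch === ']' && depth > 0) {
      depth -= 1;
      if (depth === 0) {
        pattern += `(?:${buffer})`; // flush the regex part
        buffer = '';
      } else {
        buffer += ch;
      }
    } else {
      buffer += ch;
    }
  }
  if (depth !== 0) throw new Error(`Unbalanced brackets in "${pseudoUrl}"`);
  pattern += escapeLiteral(buffer);
  return new RegExp(`^${pattern}$`, 'i');
}

// Usage: decide whether a link found on the page should be enqueued.
const productPattern = pseudoUrlToRegExp('https://example.com/products/[\\d+]');
console.log(productPattern.test('https://example.com/products/42'));
```

Only links that match at least one such pattern would be added to the request queue; everything else found by the Link selector is ignored.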

Link selector

linkSelector

Optional

CSS selector matching elements with 'href' attributes that should be enqueued. To enqueue URLs from '<div class="my-class" href=...>' tags, you would enter 'div.my-class'.

Type:string
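To show how Start URLs, Use request queue, Pseudo-URLs and Link selector fit together, here is a hypothetical input fragment. The item shapes (`{ url }` for start URLs and `{ purl }` for pseudo-URLs) are assumptions based on common Apify conventions; the listing above only states that both fields are arrays.

```javascript
// Hypothetical actor input; only the top-level field names come from the listing.
const input = {
  // Where the crawl begins (item shape is an assumption).
  startUrls: [{ url: 'https://example.com/' }],
  // Must stay enabled for recursive crawling to work.
  useRequestQueue: true,
  // Which discovered links may be enqueued (item shape is an assumption).
  pseudoUrls: [{ purl: 'https://example.com/products/[.*]' }],
  // Where on the page to look for links.
  linkSelector: 'a[href]',
};

console.log(JSON.stringify(input, null, 2));
```

With this input the crawler would look at every `a[href]` element on each page and follow only those links that match the product pattern.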

Page function

pageFunction

Required

Function executed for each request

Type:string
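A page function might look like the sketch below. The field is a string, so the function source is passed as text. The `context.log` and `context.customData` properties are mentioned elsewhere in this listing; `context.request` and `context.page` (a Puppeteer page) are assumptions, as is the idea that the returned object becomes one result record. The stub context exists only so the sketch can run outside the actor.

```javascript
// Sketch of a page function: receives a context, returns one result object.
async function pageFunction(context) {
  const { request, page, log } = context; // `request` and `page` are assumed
  const title = await page.title(); // Puppeteer's Page.title()
  log.debug(`Scraped ${request.url}`);
  return { url: request.url, title };
}

// Hypothetical stand-in for the context the scraper would provide.
function makeStubContext(url, title) {
  return {
    request: { url },
    page: { title: async () => title },
    log: { debug: () => {} },
  };
}

// Usage outside the actor, for local experimentation only.
pageFunction(makeStubContext('https://example.com/', 'Example Domain'))
  .then((result) => console.log(result));
```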

Proxy configuration

proxyConfiguration

Optional

Choose to use no proxy, Apify Proxy, or provide custom proxy URLs.

Type:object

Default:

{}

Debug log

debugLog

Optional

Debug messages will be included in the log. Use context.log.debug('message') to log your own debug messages.

Type:boolean

Default:false

Browser log

browserLog

Optional

Console messages from the browser will be included in the log. This may flood the log with errors, warnings and other messages of little value, especially at high concurrency.

Type:boolean

Default:false

Download media

downloadMedia

Optional

When enabled, the crawler downloads media such as images, fonts, videos and sounds. When disabled (the default), the crawler skips them, which speeds up the crawl but may break certain websites.

Type:boolean

Default:false

Download CSS

downloadCss

Optional

When enabled, the crawler downloads CSS stylesheets. When disabled (the default), the crawler skips them, which speeds up the crawl but may break certain websites.

Type:boolean

Default:false
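How the actor skips media and CSS internally is not documented here. One plausible way to get the same effect with Puppeteer is request interception, sketched below. The helper names `shouldBlock` and `applyBlocking` are hypothetical; `page.setRequestInterception`, `request.resourceType()`, `request.abort()` and `request.continue()` are standard Puppeteer calls.

```javascript
// Decide whether a request should be aborted, given the two options.
// Both default to false, which means "skip the download".
function shouldBlock(resourceType, { downloadMedia = false, downloadCss = false } = {}) {
  const mediaTypes = ['image', 'media', 'font']; // Puppeteer resource types
  if (!downloadMedia && mediaTypes.includes(resourceType)) return true;
  if (!downloadCss && resourceType === 'stylesheet') return true;
  return false;
}

// Wire the decision into a Puppeteer page (not called here; needs a real page).
async function applyBlocking(page, options) {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (shouldBlock(request.resourceType(), options)) {
      request.abort();
    } else {
      request.continue();
    }
  });
}

// Usage: with the defaults, images and stylesheets are blocked, documents are not.
console.log(shouldBlock('image'), shouldBlock('stylesheet'), shouldBlock('document'));
```

Blocking stylesheets is the riskier of the two, since some pages only render their content correctly once CSS has loaded.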

Ignore SSL errors

ignoreSslErrors

Optional

Crawler will ignore SSL certificate errors.

Type:boolean

Default:false

Max request retries

maxRequestRetries

Optional

Maximum number of times the request for the page will be retried in case of an error. Setting it to 0 means that the request will be attempted once and will not be retried if it fails.

Type:integer

Minimum:0

Default:3
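The retry semantics described above (one initial attempt plus up to `maxRequestRetries` retries) can be sketched as follows. The helper `runWithRetries` is hypothetical and only illustrates the counting; it is not the actor's own retry logic.

```javascript
// Run an async task once, then retry it up to maxRequestRetries more times.
// With maxRequestRetries = 0 the task is attempted exactly once.
async function runWithRetries(task, maxRequestRetries = 3) {
  let lastError;
  for (let attempt = 0; attempt <= maxRequestRetries; attempt += 1) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err; // remember the failure and try again if allowed
    }
  }
  throw lastError;
}

// Usage: a task that fails twice and then succeeds.
let calls = 0;
runWithRetries(async () => {
  calls += 1;
  if (calls < 3) throw new Error('temporary failure');
  return 'page loaded';
}).then((result) => console.log(result, 'after', calls, 'attempts'));
```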

Max pages per crawl

maxPagesPerCrawl

Optional

Maximum number of pages that the crawler will open. 0 means unlimited.

Type:integer

Minimum:0

Default:0

Max result records

maxResultsPerCrawl

Optional

Maximum number of results that will be saved to dataset. The crawler will terminate afterwards. 0 means unlimited.

Type:integer

Minimum:0

Default:0

Max crawling depth

maxCrawlingDepth

Optional

Defines how many links away from the Start URLs the crawler will descend. 0 means unlimited.

Type:integer

Minimum:0

Default:0
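To make the depth and page limits concrete, here is a small simulation over an in-memory link graph. It assumes that Start URLs sit at depth 0 and that each followed link adds 1, which is one reading of "links away from the Start URLs". The `crawl` function and the graph are hypothetical.

```javascript
// Breadth-first crawl over a link graph, honoring two limits.
// A value of 0 means "unlimited" for both options.
function crawl(linkGraph, startUrls, { maxCrawlingDepth = 0, maxPagesPerCrawl = 0 } = {}) {
  const seen = new Set(startUrls);
  const queue = startUrls.map((url) => ({ url, depth: 0 }));
  const opened = [];

  while (queue.length > 0) {
    if (maxPagesPerCrawl > 0 && opened.length >= maxPagesPerCrawl) break;
    const { url, depth } = queue.shift();
    opened.push(url);

    // Do not enqueue links from pages that are already at the depth limit.
    if (maxCrawlingDepth > 0 && depth >= maxCrawlingDepth) continue;

    for (const link of linkGraph[url] || []) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push({ url: link, depth: depth + 1 });
      }
    }
  }
  return opened;
}

// Usage: a chain of four pages, a -> b -> c -> d.
const graph = { a: ['b'], b: ['c'], c: ['d'], d: [] };
console.log(crawl(graph, ['a'], { maxCrawlingDepth: 1 })); // [ 'a', 'b' ]
```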

Max concurrency

maxConcurrency

Optional

Defines how many pages can be processed by the scraper in parallel. The scraper automatically increases and decreases concurrency based on available system resources. Use this option to set a hard limit.

Type:integer

Minimum:1

Default:50
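The hard limit can be pictured as a fixed-size worker pool, sketched below. This does not model the automatic scaling based on system resources; it only shows the cap. The helper `runWithConcurrency` is hypothetical.

```javascript
// Run async tasks with at most maxConcurrency of them in flight at once.
// Results are returned in the same order as the tasks.
async function runWithConcurrency(tasks, maxConcurrency = 50) {
  if (maxConcurrency < 1) throw new RangeError('maxConcurrency must be at least 1');
  const results = new Array(tasks.length);
  let nextIndex = 0;

  // Each worker keeps pulling the next unclaimed task until none remain.
  async function worker() {
    while (nextIndex < tasks.length) {
      const index = nextIndex;
      nextIndex += 1;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(maxConcurrency, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage: process five hypothetical pages, two at a time.
const pageTasks = [1, 2, 3, 4, 5].map((n) => async () => `page ${n} done`);
runWithConcurrency(pageTasks, 2).then((results) => console.log(results));
```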

Page load timeout

pageLoadTimeoutSecs

Optional

Maximum time, in seconds, that the crawler will allow a web page to load.

Type:integer

Minimum:1

Maximum:360

Default:60

Page function timeout

pageFunctionTimeoutSecs

Optional

Maximum time, in seconds, that the crawler will wait for the page function to execute.

Type:integer

Minimum:1

Maximum:360

Default:60
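Both timeouts follow the same idea: race the work against a timer and fail if the timer wins. A minimal sketch, with the hypothetical helper `withTimeout` standing in for whatever the actor does internally:

```javascript
// Reject if the given promise does not settle within timeoutSecs seconds.
function withTimeout(promise, timeoutSecs, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${timeoutSecs} s`)),
      timeoutSecs * 1000,
    );
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage: a page function that finishes well within the 60 second default.
const quickPageFunction = async () => ({ title: 'Example' });
withTimeout(quickPageFunction(), 60, 'Page function')
  .then((result) => console.log(result));
```

A page that exceeds the limit would count as a failed request, which is presumably where Max request retries comes into play.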

Custom data

customData

Optional

This object will be available on pageFunction's context as customData.

Type:object

Default:

{}
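As an illustration, a page function could read settings from `context.customData` instead of hard-coding them. The `minPrice` key and the `price` result field below are made up for the example, and the stub context only exists so the sketch runs outside the actor.

```javascript
// Page function that takes a threshold from customData (hypothetical key).
async function pageFunction(context) {
  const { customData } = context;
  const price = 25; // stand-in for a value scraped from the page
  return { price, aboveMinimum: price >= customData.minPrice };
}

// Usage with a stub context; in the actor, customData comes from the input.
pageFunction({ customData: { minPrice: 10 } })
  .then((result) => console.log(result));
```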

Initial cookies

initialCookies

Optional

The provided cookies will be pre-set on all pages the scraper opens.

Type:array

Default:

[]

Pre goto function

preGotoFunction

Optional

This function is executed before navigation to a given URL. It is useful for pre-processing, for changing the page in ways that help bypass anti-scraping protections, or simply for setting cookies.

Type:string
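A pre goto function that sets a cookie might look like the sketch below. The argument shape `{ request, page }` is an assumption, since the listing does not state what the function receives; `page.setCookie` is a standard Puppeteer call. The cookie name and value are invented for the example.

```javascript
// Sketch of a pre goto function: set a consent cookie before navigating.
async function preGotoFunction({ request, page }) {
  const { hostname } = new URL(request.url);
  await page.setCookie({
    name: 'cookie_consent', // hypothetical cookie
    value: 'accepted',
    domain: hostname,
  });
}

// Stub page that records cookies, so the sketch runs outside the actor.
function makeStubPage() {
  const cookies = [];
  return {
    cookies,
    setCookie: async (...items) => { cookies.push(...items); },
  };
}

// Usage outside the actor, for local experimentation only.
const stubPage = makeStubPage();
preGotoFunction({ request: { url: 'https://example.com/products' }, page: stubPage })
  .then(() => console.log(stubPage.cookies));
```

For cookies that are known up front and identical for every page, the Initial cookies option is the simpler choice; a function like this is more useful when the cookie depends on the request.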