Live View

vratous/live-view

Dockerfile

# This is a template for a Dockerfile used to run acts in the Actor system.
# The base image name below is set during the act build, based on user settings.
# IMPORTANT: The base image must set a correct working directory, such as /usr/src/app or /home/user
FROM apify/actor-node-chrome:beta

# Second, copy just package.json and package-lock.json since they should be
# the only files that affect "npm install" in the next step, to speed up the build
COPY package*.json ./

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging.
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && (npm list --all || true) \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Copy source code to the container.
# Do this in the last step, to have a fast build if only the source code changed.
COPY --chown=myuser:myuser . ./

# NOTE: The CMD is already defined by the base image.
# Uncomment this for local Node.js inspector debugging:
# CMD [ "node", "--inspect=0.0.0.0:9229", "main.js" ]

package.json

{
    "name": "apify-project",
    "version": "0.0.1",
    "description": "",
    "author": "It's not you it's me",
    "license": "ISC",
    "dependencies": {
        "apify": "1.0.0-beta.22"
    },
    "scripts": {
        "start": "node main.js"
    }
}

main.js

const Apify = require('apify');

Apify.main(async () => {
    // Get the queue and enqueue the first URL.
    const requestQueue = await Apify.openRequestQueue();
    const enqueueUrl = async url => requestQueue.addRequest(new Apify.Request({ url }));
    await enqueueUrl('https://news.ycombinator.com/');

    // Create the crawler.
    const crawler = new Apify.PuppeteerCrawler({
        requestQueue,
        disableProxy: true,
        launchPuppeteerOptions: {
            liveView: true,
            slowMo: 0,
        },

        // This function is executed for each request.
        // If the request fails, it is retried 3 times.
        // The "page" parameter is Puppeteer's page object with the loaded page.
        handlePageFunction: async ({ page, request }) => {
            console.log(`Request ${request.url} succeeded!`);

            // Extract all posts.
            const data = await page.$$eval('.athing', (els) => {
                return els.map(el => el.innerText);
            });

            // Save the data.
            await Apify.pushData({
                url: request.url,
                data,
            });

            // Enqueue the next page.
            try {
                const nextHref = await page.$eval('.morelink', el => el.href);
                await enqueueUrl(nextHref);
            } catch (err) {
                console.log(`Url ${request.url} is the last page!`);
            }
        },

        // If a request failed 4 times, this function is executed.
        handleFailedRequestFunction: async ({ request }) => {
            console.log(`Request ${request.url} failed 4 times`);

            await Apify.pushData({
                url: request.url,
                errors: request.errorMessages,
            });
        },
    });

    // Run the crawler.
    await crawler.run();
});
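The crawl above follows a simple queue discipline: process a page, push its extracted data, and enqueue the next page until no `.morelink` is found. A minimal sketch of that pattern in plain Node.js (no Puppeteer; the in-memory `pages` map and its URLs are hypothetical stand-ins for live web pages) looks like:

```javascript
// Hypothetical stand-in for live pages: each entry has extracted items
// and an optional link to the next page (like the '.morelink' selector above).
const pages = {
    'https://example.com/news': { items: ['post 1', 'post 2'], nextHref: 'https://example.com/news?p=2' },
    'https://example.com/news?p=2': { items: ['post 3'], nextHref: null },
};

async function crawl(startUrl) {
    const queue = [startUrl];
    const results = [];
    while (queue.length > 0) {
        const url = queue.shift();
        const page = pages[url];
        // "Save" the data for this page.
        results.push({ url, data: page.items });
        if (page.nextHref) {
            queue.push(page.nextHref); // enqueue the next page
        } else {
            console.log(`Url ${url} is the last page!`);
        }
    }
    return results;
}

crawl('https://example.com/news').then((results) => {
    console.log(results.length); // → 2
});
```

In the real actor, `Apify.openRequestQueue()` plays the role of the `queue` array (with persistence and de-duplication) and `Apify.pushData()` plays the role of `results`.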
Developer
Maintained by Community

Actor Metrics

  • 0 monthly users

  • No stars yet

  • Created in Jul 2018

  • Modified 3 years ago
