Live View
vratous/live-view
Dockerfile
# This is a template for a Dockerfile used to run acts (actors) on the Apify Actor platform.
# The base image name below is set during the act build, based on user settings.
# IMPORTANT: The base image must set a correct working directory, such as /usr/src/app or /home/user
FROM apify/actor-node-chrome:beta

# Next, copy only package.json and package-lock.json, since they should be
# the only files that affect "npm install" in the next step. This lets Docker
# reuse the cached install layer and speeds up the build.
COPY package*.json ./

# Install NPM packages, skipping optional and development dependencies to
# keep the image small. Avoid logging too much, and print the dependency
# tree for debugging.
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && (npm list --all || true) \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Copy the source code into the container.
# Do this as the last step, so that rebuilds are fast when only the source code changes.
COPY . ./

# NOTE: The CMD is already defined by the base image.
# Uncomment this for local Node.js inspector debugging:
# CMD [ "node", "--inspect=0.0.0.0:9229", "main.js" ]
package.json
{
    "name": "apify-project",
    "version": "0.0.1",
    "description": "",
    "author": "It's not you it's me",
    "license": "ISC",
    "dependencies": {
        "apify": "1.0.0-beta.22"
    },
    "scripts": {
        "start": "node main.js"
    }
}
main.js
const Apify = require('apify');

Apify.main(async () => {
    // Open the default request queue and enqueue the first URL.
    const requestQueue = await Apify.openRequestQueue();
    const enqueueUrl = async url => requestQueue.addRequest(new Apify.Request({ url }));
    await enqueueUrl('https://news.ycombinator.com/');

    // Create the crawler.
    const crawler = new Apify.PuppeteerCrawler({
        requestQueue,
        disableProxy: true,
        launchPuppeteerOptions: {
            liveView: true,
            slowMo: 0,
        },

        // This function is executed for each request.
        // If it fails, the request is retried up to 3 times.
        // The `page` parameter is Puppeteer's Page object with the requested page already loaded.
        handlePageFunction: async ({ page, request }) => {
            console.log(`Request ${request.url} succeeded!`);

            // Extract the text of all posts (each post is an `.athing` row).
            const data = await page.$$eval('.athing', (els) => {
                return els.map(el => el.innerText);
            });

            // Save the data to the default dataset.
            await Apify.pushData({
                url: request.url,
                data,
            });

            // Enqueue the next page. page.$eval() throws when no `.morelink`
            // element exists, which means this is the last page.
            try {
                const nextHref = await page.$eval('.morelink', el => el.href);
                await enqueueUrl(nextHref);
            } catch (err) {
                console.log(`URL ${request.url} is the last page!`);
            }
        },

        // This function is executed after the request has failed 4 times
        // (the initial attempt plus 3 retries).
        handleFailedRequestFunction: async ({ request }) => {
            console.log(`Request ${request.url} failed 4 times`);

            await Apify.pushData({
                url: request.url,
                errors: request.errorMessages,
            });
        },
    });

    // Run the crawler.
    await crawler.run();
});
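The control flow in main.js (enqueue a URL, run the page handler, retry on failure, then hand the request to the failure handler) can be modeled without a browser. The sketch below is a simplified, in-memory illustration under stated assumptions, not the Apify SDK's implementation: `MiniQueue`, `runCrawl` and `demo` are hypothetical names, deduplication here compares exact URL strings (the real request queue deduplicates by a `uniqueKey` derived from a normalized URL), and `maxRequestRetries = 3` simply mirrors the 3 retries described in the comments above.

```javascript
// Simplified, in-memory model of the crawl loop in main.js.
// NOT the Apify SDK implementation: MiniQueue, runCrawl and demo are
// hypothetical names used only for illustration.

class MiniQueue {
    constructor() {
        this.pending = [];
        this.seen = new Set(); // every URL ever added, used for deduplication
    }

    // Adds a request unless the same URL was added before.
    // Returns true if the request was new.
    addRequest(url) {
        if (this.seen.has(url)) return false;
        this.seen.add(url);
        this.pending.push({ url, retryCount: 0, errorMessages: [] });
        return true;
    }

    // Returns the next request, or null when the queue is empty.
    fetchNextRequest() {
        return this.pending.length > 0 ? this.pending.shift() : null;
    }

    // Puts a failed request back at the end of the queue for another attempt.
    reclaimRequest(request) {
        this.pending.push(request);
    }
}

async function runCrawl(queue, { handlePageFunction, handleFailedRequestFunction, maxRequestRetries = 3 }) {
    let request;
    while ((request = queue.fetchNextRequest()) !== null) {
        try {
            await handlePageFunction({ request });
        } catch (err) {
            request.errorMessages.push(err.message);
            if (request.retryCount < maxRequestRetries) {
                request.retryCount += 1;
                queue.reclaimRequest(request);
            } else {
                // All 1 + maxRequestRetries attempts have failed.
                await handleFailedRequestFunction({ request });
            }
        }
    }
}

// Usage: one healthy page that "links" to a second page, plus a URL that always fails.
async function demo() {
    const queue = new MiniQueue();
    queue.addRequest('https://example.com/page1');
    queue.addRequest('https://example.com/page1'); // duplicate, ignored
    queue.addRequest('https://example.com/broken');

    const attempts = {};
    const failed = [];
    await runCrawl(queue, {
        handlePageFunction: async ({ request }) => {
            attempts[request.url] = (attempts[request.url] || 0) + 1;
            if (request.url.endsWith('/broken')) throw new Error('simulated failure');
            // Mimics following the `.morelink` pagination link.
            if (request.url.endsWith('/page1')) queue.addRequest('https://example.com/page2');
        },
        handleFailedRequestFunction: async ({ request }) => {
            failed.push({ url: request.url, errors: request.errorMessages.length });
        },
    });
    return { attempts, failed };
}

demo().then((result) => console.log(JSON.stringify(result)));
```

In this model the two healthy pages are processed once each, while the failing URL is attempted four times before the failure handler receives it with four error messages, matching the "failed 4 times" message in main.js. A failed request is simply appended to the end of the queue here; the exact retry ordering in the real SDK may differ.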
Developer
Maintained by Community
Actor Metrics
0 monthly users
No stars yet
Created in Jul 2018
Modified 3 years ago