One‑Page HTML Scraper with Cheerio
Scrape a single page from the provided URL with Axios and extract data from the page's HTML with Cheerio.
src/main.js
// Apify SDK - toolkit for building Apify Actors (Read more at https://docs.apify.com/sdk/js/).
import { Actor } from 'apify';
// Axios - Promise-based HTTP client for the browser and Node.js (Read more at https://axios-http.com/docs/intro).
import axios from 'axios';
// Cheerio - The fast, flexible & elegant library for parsing and manipulating HTML and XML (Read more at https://cheerio.js.org/).
import * as cheerio from 'cheerio';
// This is an ESM project, so relative imports must include the file extension.
// Read more about this here: https://nodejs.org/docs/latest-v18.x/api/esm.html#mandatory-file-extensions
// import { router } from './routes.js';

// The init() call configures the Actor for its environment. It's recommended to start every Actor with an init().
await Actor.init();

// The structure of the input is defined in input_schema.json.
const input = await Actor.getInput();
const { url } = input;

// Fetch the HTML content of the page.
const response = await axios.get(url);

// Parse the downloaded HTML with Cheerio to enable data extraction.
const $ = cheerio.load(response.data);

// Extract all headings from the page (tag name and text).
const headings = [];
$('h1, h2, h3, h4, h5, h6').each((_i, element) => {
    const headingObject = {
        level: $(element).prop('tagName').toLowerCase(),
        text: $(element).text(),
    };
    console.log('Extracted heading', headingObject);
    headings.push(headingObject);
});

// Save the headings to the dataset, a table-like storage.
await Actor.pushData(headings);

// Gracefully exit the Actor process. It's recommended to quit all Actors with an exit().
await Actor.exit();

Scrape single-page in JavaScript template
A template for scraping data from a single web page in JavaScript (Node.js). The URL of the web page is passed in via input, which is defined by the input schema. The template uses the Axios client to get the HTML of the page and the Cheerio library to parse the data from it. The data are then stored in a dataset where you can easily access them.
In this template, the scraped data are the page headings, but you can edit the code to scrape anything else from the page.
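For example, if you switch the selector from headings to links, the `href` values you collect are often relative (such as `/about`) and need to be resolved against the page URL before they are useful in a dataset. The sketch below assumes you have already collected the raw `href` strings with Cheerio; `resolveLinks` is a hypothetical helper, not part of the template or the Apify SDK, and it relies only on the standard `URL` class.

```javascript
// Hypothetical helper (not part of the template or the Apify SDK).
// Turns raw href values scraped from a page into unique, absolute http(s) URLs.
function resolveLinks(hrefs, pageUrl) {
    const seen = new Set();
    const links = [];
    for (const href of hrefs) {
        let absolute;
        try {
            // Resolve relative paths like "/about" or "docs/intro" against the page URL.
            absolute = new URL(href, pageUrl);
        } catch {
            continue; // skip values that cannot be parsed as URLs
        }
        // Keep only web links; this drops mailto:, tel:, javascript: and similar.
        if (absolute.protocol !== 'http:' && absolute.protocol !== 'https:') {
            continue;
        }
        // Treat "#section" anchors as the same page.
        absolute.hash = '';
        if (!seen.has(absolute.href)) {
            seen.add(absolute.href);
            links.push({ url: absolute.href });
        }
    }
    return links;
}
```

In `src/main.js` you could call it as `await Actor.pushData(resolveLinks($('a[href]').map((_i, el) => $(el).attr('href')).get(), url));` in place of the heading extraction. Dropping fragments and duplicates is a design choice; remove those lines if you need every raw link.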
Included features
- Apify SDK - toolkit for building Actors
- Input schema - define and easily validate a schema for your Actor's input
- Dataset - store structured data where each object stored has the same attributes
- Axios client - promise-based HTTP client for Node.js and the browser
- Cheerio - library for parsing and manipulating HTML and XML
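The input schema lets the Apify platform validate input before the Actor starts, but a defensive check in the code can still help, for example during local development, where `Actor.getInput()` may resolve to `null` if no input record exists. The sketch below is an assumption rather than part of the template: `getStartUrl` is a hypothetical helper that accepts only an object with an absolute `http(s)` `url` string, checked with the standard `URL` class.

```javascript
// Hypothetical helper (not part of the template): validates the Actor input
// and returns a normalized start URL, or throws a descriptive error.
// Assumes the input has a `url` string field, as in this template.
function getStartUrl(input) {
    if (!input || typeof input.url !== 'string') {
        throw new Error('Input must be an object with a "url" string field.');
    }
    let parsed;
    try {
        // new URL() throws on strings without a scheme, such as "example.com".
        parsed = new URL(input.url.trim());
    } catch {
        throw new Error(`Input "url" is not a valid absolute URL: ${input.url}`);
    }
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
        throw new Error(`Only http(s) URLs are supported, got ${parsed.protocol}`);
    }
    return parsed.href;
}
```

In `src/main.js` this would replace the destructuring step, e.g. `const url = getStartUrl(await Actor.getInput());`, so a bad input fails early with a clear message instead of inside `axios.get()`.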
How it works
- `Actor.getInput()` gets the input, where the page URL is defined
- `axios.get(url)` fetches the page
- `cheerio.load(response.data)` loads the page data and enables parsing the headings
- `$("h1, h2, h3, h4, h5, h6").each((_i, element) => {...});` parses the headings from the page; this is where you can edit the code to parse whatever you need from the page
- `Actor.pushData(headings)` stores the headings in the dataset
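The heading step is the part you are most likely to edit. As one possible refinement, the sketch below cleans each heading before it is stored: Cheerio's `.text()` returns the text including any line breaks and indentation from the HTML source, so whitespace is collapsed, and headings with no visible text are skipped. `toHeadingItem` is a hypothetical helper; the `level`/`text` item shape follows the template.

```javascript
// Hypothetical helper (not part of the template): turns a heading's tag name
// and raw text into one dataset item, or returns null if the text is empty.
function toHeadingItem(tagName, rawText) {
    // Collapse runs of whitespace (newlines, indentation) into single spaces.
    const text = rawText.replace(/\s+/g, ' ').trim();
    if (text === '') {
        return null; // nothing visible to store
    }
    // Lower-case the tag name, as the template does ("H2" -> "h2").
    return { level: tagName.toLowerCase(), text };
}
```

Inside `src/main.js` it could replace the `.each()` loop as `const headings = $('h1, h2, h3, h4, h5, h6').map((_i, el) => toHeadingItem($(el).prop('tagName'), $(el).text())).get();`, relying on Cheerio's `.map()` leaving out elements for which the callback returns `null`.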
Resources
- Web scraping in Node.js with Axios and Cheerio
- Web scraping with Cheerio in 2023
- Video tutorial on building a scraper using CheerioCrawler
- Written tutorial on building a scraper using CheerioCrawler
- Integration with Zapier, Make, Google Drive, and others
- Video guide on getting data using Apify API
- A short guide on how to build web scrapers using code templates
Crawlee + Cheerio (Quick start)
A scraper example that uses Cheerio to parse HTML. It's fast, but it can't run the website's JavaScript or pass JS anti-scraping challenges.
Crawlee + Puppeteer + Chrome
Example of a Puppeteer and headless Chrome web scraper. Headless browsers render JavaScript and are harder to block, but they're slower than plain HTTP.
Crawlee + Playwright + Chrome
Web scraper example with Crawlee, Playwright and headless Chrome. Playwright is more modern, user-friendly and harder to block than Puppeteer.
Crawlee + Playwright + Camoufox
Web scraper example with Crawlee, Playwright and Camoufox. Camoufox is a custom stealthy fork of Firefox. Try this template if you're facing anti-scraping challenges.
Bootstrap CheerioCrawler
Skeleton project that helps you quickly bootstrap `CheerioCrawler` in JavaScript. It's best for developers who already know the Apify SDK and Crawlee.
Cypress
Example of running Cypress tests and saving their results on the Apify platform. JSON results are saved to a dataset and videos to a key-value store.