My Actor 24
honzakirchner43/my-actor-24

.actor/Dockerfile
# First, specify the base Docker image.
# You can see the Docker images from Apify at https://hub.docker.com/r/apify/.
# You can also use any other image from Docker Hub.
FROM apify/actor-python:3.11

# Second, copy just requirements.txt into the Actor image,
# since it should be the only file that affects the dependency install in the next step,
# in order to speed up the build.
COPY requirements.txt ./

# Install the packages specified in requirements.txt,
# then print the installed Python version, pip version
# and all installed packages with their versions for debugging.
RUN echo "Python version:" \
 && python --version \
 && echo "Pip version:" \
 && pip --version \
 && echo "Installing dependencies:" \
 && pip install -r requirements.txt \
 && echo "All installed Python packages:" \
 && pip freeze

# Next, copy the remaining files and directories with the source code.
# Since we do this after installing the dependencies, a quick build will be really fast
# for most source file changes.
COPY . ./

# Specify how to launch the source code of your Actor.
# By default, the "python3 -m src" command is run.
CMD ["python3", "-m", "src"]
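To try the image locally, a common approach is the one sketched below, assuming Docker is installed; the tag name `my-actor-24` is arbitrary, and the Apify platform builds the image for you when you push the Actor:

```shell
# Build the image from the repository root; the -f flag points at the
# Dockerfile inside the .actor directory.
docker build -t my-actor-24 -f .actor/Dockerfile .

# Run it once; the CMD above starts "python3 -m src" inside the container.
docker run --rm my-actor-24
```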
.actor/actor.json
{
    "actorSpecification": 1,
    "name": "my-actor-24",
    "title": "Scrape single page in Python",
    "description": "Scrape data from single page with provided URL.",
    "version": "0.0",
    "meta": {
        "templateId": "python-start"
    },
    "input": "./input_schema.json",
    "dockerfile": "./Dockerfile"
}
.actor/input_schema.json
{
    "title": "Scrape data from a web page",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "url": {
            "title": "URL of the page",
            "type": "string",
            "description": "The URL of the website you want to get the data from.",
            "editor": "textfield",
            "prefill": "https://www.apify.com/"
        }
    },
    "required": ["url"]
}
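On the platform, this schema drives the input form and validation; for a quick local sanity check you can mirror its rules in plain Python. A minimal sketch (the `check_input` helper is made up for illustration, not part of the SDK):

```python
# Hypothetical minimal check mirroring the schema above: the input must
# contain every field in "required", and "url" must be a string.
SCHEMA = {
    "required": ["url"],
    "properties": {"url": {"type": "string"}},
}


def check_input(actor_input: dict) -> list:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    for field in SCHEMA["required"]:
        if field not in actor_input:
            problems.append(f"missing required field: {field}")
    url = actor_input.get("url")
    if url is not None and not isinstance(url, str):
        problems.append("url must be a string")
    return problems


print(check_input({"url": "https://www.apify.com/"}))  # []
print(check_input({}))  # ['missing required field: url']
```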
.dockerignore
# configurations
.idea

# crawlee and apify storage folders
apify_storage
crawlee_storage
storage

# installed files
.venv

# git folder
.git
.editorconfig
root = true

[*]
indent_style = space
indent_size = 4
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
end_of_line = lf
.gitignore
# This file tells Git which files shouldn't be added to source control

.idea
.DS_Store

apify_storage
storage/*
!storage/key_value_stores
storage/key_value_stores/*
!storage/key_value_stores/default
storage/key_value_stores/default/*
!storage/key_value_stores/default/INPUT.json

.venv/
.env/
__pypackages__
dist/
build/
*.egg-info/
*.egg

__pycache__

.mypy_cache
.dmypy.json
dmypy.json
.pytest_cache

.scrapy
*.log
requirements.txt
# Add your dependencies here.
# See https://pip.pypa.io/en/latest/reference/requirements-file-format/
# for how to format them.
apify ~= 1.1.1
beautifulsoup4 ~= 4.12.0
requests ~= 2.31.0
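The `~=` operator is PEP 440's compatible-release specifier: `apify ~= 1.1.1` allows any version `>= 1.1.1` but `< 1.2.0`, so patch releases are picked up automatically while minor bumps are excluded. A rough sketch of those semantics in plain Python (pip's actual resolver is more involved; the function name is made up for illustration):

```python
def compatible_release(version: str, base: str = "1.1.1") -> bool:
    # "~= 1.1.1" (PEP 440) means: >= 1.1.1, with all but the last
    # component of the base pinned (i.e. still 1.1.x, so < 1.2.0).
    v = tuple(int(part) for part in version.split("."))
    b = tuple(int(part) for part in base.split("."))
    return v >= b and v[:len(b) - 1] == b[:len(b) - 1]


print(compatible_release("1.1.5"))  # True: patch upgrades are allowed
print(compatible_release("1.2.0"))  # False: minor bump is excluded
print(compatible_release("1.1.0"))  # False: below the base version
```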
src/__init__.py
src/__main__.py
import asyncio
import logging

from apify.log import ActorLogFormatter

from .main import main

# Set up logging of messages from the Apify SDK
handler = logging.StreamHandler()
handler.setFormatter(ActorLogFormatter())

apify_client_logger = logging.getLogger('apify_client')
apify_client_logger.setLevel(logging.INFO)
apify_client_logger.addHandler(handler)

apify_logger = logging.getLogger('apify')
apify_logger.setLevel(logging.DEBUG)
apify_logger.addHandler(handler)

asyncio.run(main())
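The handler/formatter wiring above is the standard-library `logging` pattern; here is a minimal sketch with a plain `logging.Formatter` in place of `ActorLogFormatter` (the format string is chosen for illustration, not what the SDK emits):

```python
import logging

# Same pattern as in __main__.py: one handler, one formatter, attached
# to a named logger whose level controls what gets through.
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("[%(name)s] %(levelname)s %(message)s"))

logger = logging.getLogger("demo")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

# Format a record directly to see what the handler would emit.
record = logging.LogRecord("demo", logging.INFO, __file__, 1, "hello", None, None)
print(handler.format(record))  # [demo] INFO hello
```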
src/main.py
# Apify SDK - toolkit for building Apify Actors (Read more at https://docs.apify.com/sdk/python).
from apify import Actor
# Requests - library for making HTTP requests in Python (Read more at https://requests.readthedocs.io)
import requests
# Beautiful Soup - library for pulling data out of HTML and XML files (Read more at https://www.crummy.com/software/BeautifulSoup/bs4/doc)
from bs4 import BeautifulSoup


async def main():
    async with Actor:
        # Structure of input is defined in input_schema.json
        actor_input = await Actor.get_input() or {}
        url = actor_input.get('url')

        # Fetch the HTML content of the page.
        response = requests.get(url)

        # Parse the HTML content using Beautiful Soup.
        soup = BeautifulSoup(response.content, 'html.parser')

        # Extract all headings from the page (tag name and text).
        headings = []
        for heading in soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6']):
            heading_object = {'level': heading.name, 'text': heading.text}
            print('Extracted heading', heading_object)
            headings.append(heading_object)

        # Save headings to Dataset - a table-like storage.
        await Actor.push_data(headings)
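The extraction step above can be tried without installing `beautifulsoup4` or hitting the network; here is a standard-library sketch that mirrors the same heading-collection logic on a fixed HTML string (the `HeadingParser` class is made up for illustration):

```python
from html.parser import HTMLParser


class HeadingParser(HTMLParser):
    """Collect {'level': tag, 'text': ...} dicts for h1-h6, like the bs4 loop."""
    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._current = {"level": tag, "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current["level"]:
            self.headings.append(self._current)
            self._current = None


parser = HeadingParser()
parser.feed("<h1>Title</h1><p>body</p><h2>Section</h2>")
print(parser.headings)
# [{'level': 'h1', 'text': 'Title'}, {'level': 'h2', 'text': 'Section'}]
```

Unlike Beautiful Soup, `html.parser.HTMLParser` is event-based and does not recover from malformed markup as gracefully, which is why the template pulls in `beautifulsoup4` for real pages.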
Developer
Maintained by Apify

Actor Metrics
1 monthly user
2 stars
>99% runs succeeded
Created in Aug 2023
Modified a year ago