Page Analyzer
Analyzes pages and searches for provided query strings
This Apify actor analyzes a web page at a given URL. It extracts HTML and JavaScript variables from the main response and HTML/JSON data from XHR requests. It then analyzes the loaded data:
- Analyzes the initial HTML (the HTML loaded directly from the response):
  - Looks for Schema.org data and, if it finds any, saves it to the output as schemaOrgData.
  - Looks for JSON-LD script tags and parses the JSON they contain; anything found is output as jsonLDData.
  - Looks for meta and title tags and outputs their content as metaData.
- Loads all XHR requests, discards requests that do not contain HTML or JSON, and parses the HTML and JSON into objects.
- When all XHR requests have finished, loads the HTML from the rendered page and repeats the analysis from the first step, because JavaScript may have changed the HTML of the page.
- Loads all window variables, discards common global variables (console, innerHeight, navigator, ...), cleans the output (removes all functions and circular paths), and outputs it as allWindowProperties.
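The cleaning of window properties described in the last step can be sketched roughly as follows. This is an illustrative sketch, not the actor's actual code: the names cleanData, collectProperties and IGNORED_GLOBALS are hypothetical, and the ignore list is only the three globals named above.

```javascript
// Hypothetical ignore list; the actor's real list of common globals is longer.
const IGNORED_GLOBALS = new Set(['console', 'innerHeight', 'navigator']);

// Returns a copy of `value` with functions and circular references removed.
// `ancestors` holds the objects on the current path from the root.
function cleanData(value, ancestors = new Set()) {
    if (typeof value === 'function') return undefined;
    if (value === null || typeof value !== 'object') return value;
    if (ancestors.has(value)) return undefined; // circular path, drop it

    ancestors.add(value);
    let result;
    if (Array.isArray(value)) {
        result = value
            .map((item) => cleanData(item, ancestors))
            .filter((item) => item !== undefined);
    } else {
        result = {};
        for (const [key, child] of Object.entries(value)) {
            const cleaned = cleanData(child, ancestors);
            if (cleaned !== undefined) result[key] = cleaned;
        }
    }
    ancestors.delete(value);
    return result;
}

// Collects the non-ignored properties of a window-like object.
function collectProperties(windowLike) {
    const output = {};
    for (const key of Object.keys(windowLike)) {
        if (IGNORED_GLOBALS.has(key)) continue;
        // Seed the ancestors with the root so that window.window is dropped.
        const cleaned = cleanData(windowLike[key], new Set([windowLike]));
        if (cleaned !== undefined) output[key] = cleaned;
    }
    return output;
}
```

Tracking only the objects on the current path, rather than every object seen so far, keeps values that are referenced from two different places while still breaking true cycles.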
When the analysis is finished, the actor checks the INPUT parameters for strings to search for. If there are any, it attempts to find them in all of the content it collected.
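A search over parsed data could look something like the sketch below. It assumes the data has already been cleaned of circular references. The function name findStrings, the case-insensitive matching and the path format are assumptions made for illustration, not the actor's documented behavior.

```javascript
// Hypothetical helper: walks nested data and returns every string value
// that contains at least one of the searched strings, with its path from the root.
function findStrings(data, searchFor) {
    // Assumption: matching is case-insensitive.
    const needles = searchFor.map((s) => s.toLowerCase());
    const matches = [];

    const visit = (value, path) => {
        if (typeof value === 'string') {
            const haystack = value.toLowerCase();
            if (needles.some((needle) => haystack.includes(needle))) {
                matches.push({ path, value });
            }
        } else if (Array.isArray(value)) {
            value.forEach((item, index) => visit(item, `${path}[${index}]`));
        } else if (value !== null && typeof value === 'object') {
            for (const [key, child] of Object.entries(value)) {
                visit(child, `${path}.${key}`);
            }
        }
    };

    visit(data, '');
    return matches;
}

// Example: one match, at path ".company.links[1]".
const found = findStrings({ company: { links: ['Home', 'About us'] } }, ['about us']);
```

With an empty searchFor array the function returns an empty list, which mirrors the search being skipped when no strings are provided.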
The actor ends when all output has been parsed and searched. If the connection to the URL fails or any part of the actor crashes, the actor ends with the error recorded in the output and the log.
Input to the actor is provided through the INPUT file. If the actor is run through Apify, INPUT comes from the key-value store. To start the actor locally, call npm run start-local and provide the input as a file in the kv-store-dev directory.
INPUT
```json
{
  // URL of the website to analyze
  "url": "http://example.com",
  // array of strings to look for on the website; if empty, the search is skipped during analysis
  "searchFor": ["About us"]
}
```
During the run, the actor saves its output to the OUTPUT file, which is stored in the key-value store if the actor is run through Apify, or in the kv-store-dev folder if the actor is run locally.
OUTPUT
```json
{
  // Initial response headers
  "initialResponse": {
    "url": "https://www.flywire.com/",
    "headers": {...}
  },
  // True if window variables were parsed after XHR requests finished
  "windowPropertiesParsed": true,
  // True if meta tags were parsed from initial response
  "metaDataParsed": true,
  // True if Schema.org was loaded and parsed from initial response
  "schemaOrgDataParsed": true,
  // True if JSON-LD was loaded and parsed from initial response
  "jsonLDDataParsed": true,
  // True if HTML was loaded and parsed from initial response
  "htmlParsed": true,
  // True if HTML was loaded and parsed after XHR requests finished
  "htmlFullyParsed": true,
  // True if XHR requests were all parsed
  "xhrRequestsParsed": true,
  // Filtered window properties by search strings
  "windowProperties": {},
  // Object containing cleaned up window object properties
  "allWindowProperties": {...},
  // Array of properties which contain searched strings (at least one) with path to variable from root
  "windowPropertiesFound": [],
  // Schema.org data filtered by search strings.
  "schemaOrgData": {},
  // Array of schema org properties which contain searched strings (at least one) with path to variable from root
  "schemaOrgDataFound": [],
  // Complete output of found schema.org data
  "allSchemaOrgData": [],
  // Complete output of all found meta tags
  "metaData": {
    "viewport": "width=device-width, initial-scale=1",
    "og:title": "International Payments Solution",
    ...
  },
  // List of meta tags matching the searched strings
  "metaDataFound": [],
  // JSON-LD Data filtered by search strings.
  "jsonLDData": {},
  // Array of JSON-LD data properties which contain searched strings (at least one) with path to variable from root
  "jsonLDDataFound": [],
  // Complete output of found JSON-LD
  "allJsonLDData": [],
  // Array of selectors to HTML elements that contain the searched values
  "htmlFound": [],
  // Array of parsed XHR requests with content type of JSON or HTML
  "xhrRequests": [
    {
      "url": "https://www.flywire.com/destinations",
      "method": "GET",
      "responseStatus": 200,
      "responseHeaders": {...},
      "responseBody": {
        // Valid provides information whether JSON was parsed successfully
        "valid": true/false,
        // Data contains the parsed JSON
        "data": [...],
      }
    },
    {
      "url": "https://www.flywire.com/asdasd",
      "method": "GET",
      "responseStatus": 200,
      "responseHeaders": {...},
      // For HTML requests responseBody contains HTML as string
      "responseBody": "<html>...."
    },
  ],
  // same list as above, but filtered by search strings
  "xhrRequestsFound": [...],
  // contains error if actor failed outside of page function
  "error": null,
  // contains error if actor failed in page.evaluate
  "pageError": null,
  "outputFinished": true,

  // timestamps for debugging
  "analysisStarted": "2018-02-09T12:34:49.938Z",
  "scrappingStarted": "2018-02-09T12:34:50.050Z",
  "pageNavigated": "2018-02-09T12:34:53.495Z",
  "windowPropertiesSearched": "2018-02-09T12:34:53.810Z",
  "metadataSearched": "2018-02-09T12:34:51.624Z",
  "schemaOrgSearched": "2018-02-09T12:34:51.627Z",
  "jsonLDSearched": "2018-02-09T12:34:51.625Z",
  "htmlSearched": "2018-02-09T12:34:53.746Z",
  "xhrRequestsSearched": "2018-02-09T12:34:53.517Z",
  "analysisEnded": "2018-02-09T12:34:53.810Z",
}
```
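The two responseBody shapes in the xhrRequests entries above (an object with valid and data for JSON, a plain string for HTML) suggest a classification step along the lines of the sketch below. The function name parseResponseBody is hypothetical, and what data holds when parsing fails is not stated in this document; keeping the raw text there is an assumption.

```javascript
// Hypothetical helper: turns a raw XHR response into the responseBody shape
// shown in the OUTPUT example, or returns null when the request should be discarded.
function parseResponseBody(contentType, bodyText) {
    const type = (contentType || '').toLowerCase();

    if (type.includes('json')) {
        try {
            return { valid: true, data: JSON.parse(bodyText) };
        } catch (err) {
            // Assumption: on a parse failure the raw text is kept in `data`.
            return { valid: false, data: bodyText };
        }
    }

    // For HTML responses the body is kept as a string.
    if (type.includes('html')) return bodyText;

    // Neither HTML nor JSON: the request is discarded.
    return null;
}

// Example: a JSON response and an image response.
const jsonBody = parseResponseBody('application/json; charset=utf-8', '[1, 2]');
const imageBody = parseResponseBody('image/png', '');
```

Checking the Content-Type header with a substring match also covers variants such as application/ld+json and application/xhtml+xml, which may or may not be what the actor does.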
Created in Jun 2019
Modified 2 years ago