Scrape Data
This endpoint scrapes data from a specified URL within the session’s environment.
Endpoint
POST /env/scrape
Authorizations
Authorization (required):
Type: string
Location: Header
Description: The access token received from the authorization server in the OAuth 2.0 flow.
Body
Content Type: application/json
keep_alive:
Type: boolean
Default: false
Description: If true, the session will not be closed after the operation is completed.
max_nb_actions:
Type: integer
Default: 100
Description: The maximum number of actions to list after which the listing will stop. Used when min_nb_actions is not provided.
min_nb_actions:
Type: integer | null
Description: The minimum number of actions to list before stopping. If not provided, the listing will continue until max_nb_actions is reached.
only_main_content:
Type: boolean
Default: true
Description: Whether to only scrape the main content of the page. If true, navbars, footers, etc., are excluded.
scrape_images:
Type: boolean
Default: false
Description: Whether to scrape images from the page. Images are not scraped by default.
screenshot:
Type: boolean | null
Description: Whether to include a screenshot in the response.
session_id:
Type: string | null
Description: The ID of the session. A new session is created if not provided.
session_timeout_minutes:
Type: integer
Default: 5
Description: Session timeout in minutes. Cannot exceed the global timeout.
Range: 0 < x ≤ 30
url:
Type: string | null
Description: The URL to scrape. If not provided, the current page URL is used.
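The body parameters above can be sketched as a small Python dataclass that carries the documented defaults and checks the documented timeout range before anything is sent. This is only an illustration: the names ScrapeBody and to_json are hypothetical, and dropping null fields from the payload assumes the API treats an omitted field the same as null.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ScrapeBody:
    """Hypothetical container for the POST /env/scrape body parameters."""

    keep_alive: bool = False
    max_nb_actions: int = 100
    min_nb_actions: Optional[int] = None
    only_main_content: bool = True
    scrape_images: bool = False
    screenshot: Optional[bool] = None
    session_id: Optional[str] = None
    session_timeout_minutes: int = 5
    url: Optional[str] = None

    def __post_init__(self) -> None:
        # Documented range for session_timeout_minutes: 0 < x <= 30.
        if not 0 < self.session_timeout_minutes <= 30:
            raise ValueError("session_timeout_minutes must satisfy 0 < x <= 30")

    def to_json(self) -> str:
        # Assumption: omitting a nullable field is equivalent to sending null.
        payload = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(payload)
```

Validating the range on the client side only saves a round trip; the server remains the authority, including for the global timeout, whose value is not stated here.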
Response
Response Parameters
metadata (required):
Type: object
Description: Metadata of the current page, including URL, title, and snapshot timestamp.
Attributes:
metadata.title (required): string - The title of the page.
metadata.url (required): string - The URL of the page.
metadata.timestamp (required): string - The timestamp when the scrape was performed.
session (required):
Type: object
Description: Browser session information.
Attributes:
session.created_at (required): string - Session creation time.
session.duration (required): string - Session duration.
session.last_accessed_at (required): string - Last access time.
session.session_id (required): string - The ID of the session.
session.status (required): enum<string> - Session status. Options: active, closed, error, timed_out.
session.timeout_minutes (required): integer - Session timeout in minutes.
session.error (optional): string | null - Error message if the operation failed to complete.
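As a rough sketch of how a client might read the session object, the following Python maps the documented attributes onto a typed structure and rejects status values outside the four documented options. The class and function names (SessionStatus, Session, parse_session) are hypothetical, and the timestamp and duration fields are kept as plain strings because their format is not specified here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Mapping, Optional


class SessionStatus(str, Enum):
    """The four documented values of session.status."""

    ACTIVE = "active"
    CLOSED = "closed"
    ERROR = "error"
    TIMED_OUT = "timed_out"


@dataclass(frozen=True)
class Session:
    session_id: str
    status: SessionStatus
    timeout_minutes: int
    created_at: str         # format not specified in the reference; kept as-is
    last_accessed_at: str   # format not specified in the reference; kept as-is
    duration: str           # format not specified in the reference; kept as-is
    error: Optional[str] = None


def parse_session(raw: Mapping[str, Any]) -> Session:
    """Build a Session from the `session` object of a response.

    Raises KeyError if a required attribute is missing and ValueError if
    the status is not one of the documented options.
    """
    return Session(
        session_id=raw["session_id"],
        status=SessionStatus(raw["status"]),
        timeout_minutes=int(raw["timeout_minutes"]),
        created_at=raw["created_at"],
        last_accessed_at=raw["last_accessed_at"],
        duration=raw["duration"],
        error=raw.get("error"),  # optional and nullable
    )
```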
data (optional):
Type: object
Description: Extracted data from the page.
Attributes:
data.images (optional): object[] - List of images extracted from the page (ID and download link).
data.markdown (optional): string | null - Markdown representation of the extracted data.
data.structured (optional): object[] | null - Structured data extracted from the page in JSON format.
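Because data and each of its attributes are optional, and markdown and structured may also be null, a client has to tolerate several kinds of absence. A minimal Python sketch of that handling follows. The helper name extract_content is hypothetical. The image objects are left out because their field names are not given here.

```python
from typing import Any, List, Mapping, Optional, Tuple


def extract_content(response: Mapping[str, Any]) -> Tuple[Optional[str], List[Any]]:
    """Return (markdown, structured) from a scrape response.

    A missing or null `data` object yields (None, []). A null or missing
    `structured` list is normalized to an empty list.
    """
    data = response.get("data") or {}
    markdown = data.get("markdown")
    structured = data.get("structured") or []
    return markdown, list(structured)
```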
screenshot (optional):
Type: file | null
Description: Base64-encoded screenshot of the current page.
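Since the screenshot arrives Base64-encoded, it has to be decoded before it can be written to disk or displayed. The following Python is a sketch under two assumptions: the field holds a plain Base64 string without a data-URI prefix, and it is simply absent or null when no screenshot was requested. The helper name decode_screenshot is hypothetical, and the image format is not stated here, so no file extension is implied.

```python
import base64
from typing import Any, Mapping, Optional


def decode_screenshot(response: Mapping[str, Any]) -> Optional[bytes]:
    """Return the raw screenshot bytes, or None if no screenshot is present."""
    encoded = response.get("screenshot")
    if encoded is None:
        return None
    # Assumption: plain Base64 text with no "data:image/...;base64," prefix.
    return base64.b64decode(encoded)
```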
space (optional):
Type: object
Description: Available actions in the current state.
Attributes:
space.actions (required): object[] - List of available actions in the current state.
space.description (required): string - Human-readable description of the current webpage.
space.special_actions (optional): object[] - List of special browser actions.
Example Request
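A request along these lines could be assembled with Python's standard library. Treat it as a sketch: the base URL https://api.example.com is a placeholder because the API host is not given here, the Bearer scheme for the Authorization header is an assumption based on common OAuth 2.0 usage, and the function name make_scrape_request is hypothetical.

```python
import json
import urllib.request

# Placeholder host: the reference above does not state the API base URL.
BASE_URL = "https://api.example.com"


def make_scrape_request(access_token: str, url: str) -> urllib.request.Request:
    """Build (but do not send) a POST /env/scrape request for `url`."""
    body = json.dumps({"url": url, "only_main_content": True}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/env/scrape",
        data=body,
        headers={
            # Assumption: the access token is sent with the Bearer scheme.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending the request performs a real network call, so it is left commented out:
# with urllib.request.urlopen(make_scrape_request(token, "https://example.com")) as resp:
#     result = json.load(resp)
```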
Example Response
200 - application/json