Scrape Data
This endpoint scrapes data from a specified URL within the session’s environment.
Endpoint
POST /env/scrape
Authorizations
Authorization (required):
Type: string
Location: Header
Description: The access token received from the authorization server in the OAuth 2.0 flow.
Body
Content Type: application/json
keep_alive:
Type: boolean
Default: false
Description: If true, the session will not be closed after the operation is completed.
max_nb_actions:
Type: integer
Default: 100
Description: The maximum number of actions to list; listing stops once this number is reached. Used when min_nb_actions is not provided.
min_nb_actions:
Type: integer | null
Description: The minimum number of actions to list before stopping. If not provided, the listing will continue until max_nb_actions is reached.
only_main_content:
Type: boolean
Default: true
Description: Whether to only scrape the main content of the page. If true, navbars, footers, etc., are excluded.
scrape_images:
Type: boolean
Default: false
Description: Whether to scrape images from the page. Images are not scraped by default.
screenshot:
Type: boolean | null
Description: Whether to include a screenshot in the response.
session_id:
Type: string | null
Description: The ID of the session. A new session is created if not provided.
session_timeout_minutes:
Type: integer
Default: 5
Description: Session timeout in minutes. Cannot exceed the global timeout.
Range: 0 < x ≤ 30
url:
Type: string | null
Description: The URL to scrape. If not provided, the current page URL is used.
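The body parameters above can be assembled and checked on the client before sending. The following is a minimal sketch, not part of the API: build_scrape_payload is a hypothetical helper name, and the choice to omit nullable fields that were left unset is an assumption about how the server treats missing keys. The defaults and the 0 < x ≤ 30 range come from the parameter list above.

```python
import json
from typing import Any, Optional

# Upper bound of the documented range 0 < x <= 30.
MAX_SESSION_TIMEOUT_MINUTES = 30


def build_scrape_payload(
    url: Optional[str] = None,
    session_id: Optional[str] = None,
    keep_alive: bool = False,
    only_main_content: bool = True,
    scrape_images: bool = False,
    screenshot: Optional[bool] = None,
    session_timeout_minutes: int = 5,
    max_nb_actions: int = 100,
    min_nb_actions: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble the JSON body for POST /env/scrape."""
    if not 0 < session_timeout_minutes <= MAX_SESSION_TIMEOUT_MINUTES:
        raise ValueError("session_timeout_minutes must satisfy 0 < x <= 30")

    payload = {
        "url": url,
        "session_id": session_id,
        "keep_alive": keep_alive,
        "only_main_content": only_main_content,
        "scrape_images": scrape_images,
        "screenshot": screenshot,
        "session_timeout_minutes": session_timeout_minutes,
        "max_nb_actions": max_nb_actions,
        "min_nb_actions": min_nb_actions,
    }
    # Assumption: nullable fields left unset are simply omitted from the body.
    return {key: value for key, value in payload.items() if value is not None}


# Example: scrape a page, keep the session open, and include images.
body = build_scrape_payload(url="https://example.com", keep_alive=True, scrape_images=True)
print(json.dumps(body, indent=2))
```

Validating the timeout locally only mirrors the documented range; the server remains the authority, since the global timeout mentioned above is not specified here.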
Response
Response Parameters
metadata (required):
Type: object
Description: Metadata of the current page, including URL, title, and snapshot timestamp.
Attributes:
metadata.title (required): string - The title of the page.
metadata.url (required): string - The URL of the page.
metadata.timestamp (required): string - The timestamp when the scrape was performed.
session (required):
Type: object
Description: Browser session information.
Attributes:
session.created_at (required): string - Session creation time.
session.duration (required): string - Session duration.
session.last_accessed_at (required): string - Last access time.
session.session_id (required): string - The ID of the session.
session.status (required): enum<string> - Session status. Options: active, closed, error, timed_out.
session.timeout_minutes (required): integer - Session timeout in minutes.
session.error (optional): string | null - Error message if the operation failed to complete.
data (optional):
Type: object
Description: Extracted data from the page.
Attributes:
data.images (optional): object[] - List of images extracted from the page (ID and download link).
data.markdown (optional): string | null - Markdown representation of the extracted data.
data.structured (optional): object[] | null - Structured data extracted from the page in JSON format.
screenshot (optional):
Type: file | null
Description: Base64-encoded screenshot of the current page.
space (optional):
Type: object
Description: Available actions in the current state.
Attributes:
space.actions (required): object[] - List of available actions in the current state.
space.description (required): string - Human-readable description of the current webpage.
space.special_actions (optional): object[] - List of special browser actions.
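The required and optional markers above can be turned into a simple client-side check on a parsed response. This is a rough sketch under assumptions: find_response_problems and the two lookup tables are hypothetical names, and the check covers only the presence of required fields and the documented session.status options, not value formats.

```python
# Required attributes per required object, as listed in the response parameters.
REQUIRED_FIELDS = {
    "metadata": ("title", "url", "timestamp"),
    "session": (
        "created_at",
        "duration",
        "last_accessed_at",
        "session_id",
        "status",
        "timeout_minutes",
    ),
}
# Documented options for session.status.
SESSION_STATUSES = {"active", "closed", "error", "timed_out"}


def find_response_problems(response: dict) -> list[str]:
    """Hypothetical helper: list the ways a response deviates from the documented shape."""
    problems = []
    for section, attributes in REQUIRED_FIELDS.items():
        value = response.get(section)
        if not isinstance(value, dict):
            problems.append(f"missing required object: {section}")
            continue
        for attribute in attributes:
            if attribute not in value:
                problems.append(f"missing required attribute: {section}.{attribute}")

    session = response.get("session")
    if isinstance(session, dict) and "status" in session:
        if session["status"] not in SESSION_STATUSES:
            problems.append(f"unexpected session.status: {session['status']!r}")

    # space is optional, but when present it must carry actions and description.
    space = response.get("space")
    if isinstance(space, dict):
        for attribute in ("actions", "description"):
            if attribute not in space:
                problems.append(f"missing required attribute: space.{attribute}")
    return problems
```

An empty list means the response carries every field marked required above; a status of error or timed_out is still a valid shape and should be handled separately by the caller.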
Example Request
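A request to this endpoint might be built as in the sketch below, using only the Python standard library. Two things here are assumptions not stated in this section: the base URL (https://api.example.com is a placeholder) and the use of the Bearer scheme for the OAuth 2.0 access token. make_scrape_request is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: the API host is not given in this section; replace with the real one.
BASE_URL = "https://api.example.com"


def make_scrape_request(access_token: str, body: dict) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST /env/scrape request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/env/scrape",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: the access token is sent using the Bearer scheme.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = make_scrape_request(
    "YOUR_ACCESS_TOKEN",
    {"url": "https://example.com", "only_main_content": True, "scrape_images": False},
)

# To send the request and parse the JSON response:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```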
Example Response
200 - application/json
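The sketch below shows the documented response shape as a Python dictionary and two small helpers for reading it. Every value is a placeholder, not real output: the timestamp and duration formats are not specified in this section, so they appear as angle-bracket markers, and the screenshot is a stand-in byte string. extract_markdown and decode_screenshot are hypothetical helper names.

```python
import base64
from typing import Optional

# Placeholder response following the documented shape; values are illustrative only.
example_response = {
    "metadata": {
        "title": "Example Domain",
        "url": "https://example.com",
        "timestamp": "<timestamp>",
    },
    "session": {
        "created_at": "<timestamp>",
        "duration": "<duration>",
        "last_accessed_at": "<timestamp>",
        "session_id": "<session-id>",
        "status": "active",
        "timeout_minutes": 5,
        "error": None,
    },
    "data": {"markdown": "# Example Domain", "images": [], "structured": None},
    # Stand-in for a Base64-encoded image.
    "screenshot": base64.b64encode(b"placeholder-image-bytes").decode("ascii"),
}


def extract_markdown(response: dict) -> Optional[str]:
    """Return data.markdown, or None when data or markdown is absent or null."""
    data = response.get("data") or {}
    return data.get("markdown")


def decode_screenshot(response: dict) -> Optional[bytes]:
    """Decode the Base64 screenshot into raw bytes, or return None if not included."""
    encoded = response.get("screenshot")
    return base64.b64decode(encoded) if encoded else None


markdown = extract_markdown(example_response)
image_bytes = decode_screenshot(example_response)
```

Because data, screenshot, and space are all optional, both helpers return None instead of raising when the field is missing.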