Scrape Data
This endpoint scrapes data from a specified URL within the session’s environment.
Endpoint
POST /env/scrape
Authorizations
Authorization (required):
Type: string
Location: Header
Description: The access token received from the authorization server in the OAuth 2.0 flow, sent as "Bearer <token>".
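Here is a minimal shell sketch for building this header. It assumes the token is kept in an environment variable named NOTEXAI_TOKEN, which is a hypothetical name and not something the API defines. The Bearer scheme matches the example request below.

```shell
# Print the Authorization header line for curl.
# NOTEXAI_TOKEN is a hypothetical variable name chosen for this sketch.
auth_header() {
  if [ -z "${NOTEXAI_TOKEN:-}" ]; then
    echo "NOTEXAI_TOKEN is not set" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s\n' "$NOTEXAI_TOKEN"
}
```

You can then pass it to curl as --header "$(auth_header)".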
Body
Content Type: application/json
keep_alive:
Type: boolean
Default: false
Description: If true, the session is not closed after the operation completes.
max_nb_actions:
Type: integer
Default: 100
Description: The maximum number of actions to list. Listing stops once this many actions have been collected. Used when min_nb_actions is not provided.
min_nb_actions:
Type: integer | null
Description: The minimum number of actions to list before stopping. If not provided, listing continues until max_nb_actions is reached.
only_main_content:
Type: boolean
Default: true
Description: Whether to scrape only the main content of the page. If true, navigation bars, footers, and similar elements are excluded.
scrape_images:
Type: boolean
Default: false
Description: Whether to scrape images from the page. Images are not scraped by default.
screenshot:
Type: boolean | null
Description: Whether to include a screenshot in the response.
session_id:
Type: string | null
Description: The ID of an existing session. If not provided, a new session is created.
session_timeout_minutes:
Type: integer
Default: 5
Description: Session timeout in minutes. Cannot exceed the global timeout.
Range: 0 < x ≤ 30
url:
Type: string | null
Description: The URL to scrape. If not provided, the session's current page URL is used.
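You can assemble the body parameters above from shell arguments. This sketch is illustrative only, and several parts of it are assumptions:

- The function names scrape_body and valid_session_timeout are hypothetical.
- keep_alive is always set to true so the session can be reused. This is a choice made for the sketch, not an API requirement.
- Values are inserted with printf and are not JSON-escaped.

The timeout check mirrors the documented range 0 < x ≤ 30, which for an integer means 1 to 30. The server may still reject a value that exceeds the global timeout.

```shell
# Sketch: compose a JSON body for POST /env/scrape from shell arguments.
#   $1 url (empty = use the session's current page)
#   $2 session_id (empty = let the API create a new session)
#   $3 session_timeout_minutes (empty = server default of 5)
# Values are NOT JSON-escaped, so they must not contain double quotes
# or backslashes.

# Succeeds if $1 is an integer with 0 < x <= 30.
valid_session_timeout() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, negative, or not an integer
  esac
  [ "$1" -gt 0 ] && [ "$1" -le 30 ]
}

scrape_body() {
  # keep_alive is always true in this sketch so the session can be reused.
  fields='"keep_alive": true'
  if [ -n "$1" ]; then
    fields="$fields, \"url\": \"$1\""
  fi
  if [ -n "$2" ]; then
    fields="$fields, \"session_id\": \"$2\""
  fi
  if [ -n "$3" ]; then
    valid_session_timeout "$3" || {
      echo "session_timeout_minutes must be between 1 and 30" >&2
      return 1
    }
    fields="$fields, \"session_timeout_minutes\": $3"
  fi
  printf '{%s}\n' "$fields"
}
```

For example, --data "$(scrape_body 'https://example.com' '' 10)" requests a scrape in a new session with a 10-minute timeout.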
Response
Response Parameters
metadata (required):
Type: object
Description: Metadata of the current page, including URL, title, and snapshot timestamp.
Attributes:
metadata.title (required): string - The title of the page.
metadata.url (required): string - The URL of the page.
metadata.timestamp (required): string - The timestamp when the scrape was performed.
session (required):
Type: object
Description: Browser session information.
Attributes:
session.created_at (required): string - Session creation time.
session.duration (required): string - Session duration.
session.last_accessed_at (required): string - Last access time.
session.session_id (required): string - The ID of the session.
session.status (required): enum<string> - Session status. Options: active, closed, error, timed_out.
session.timeout_minutes (required): integer - Session timeout in minutes.
session.error (optional): string | null - Error message if the operation failed to complete.
data (optional):
Type: object
Description: Extracted data from the page.
Attributes:
data.images (optional): object[] - List of images extracted from the page, each with an ID and a download link.
data.markdown (optional): string | null - Markdown representation of the extracted data.
data.structured (optional): object[] | null - Structured data extracted from the page in JSON format.
screenshot (optional):
Type: file | null
Description: Base64-encoded screenshot of the current page.
space (optional):
Type: object
Description: Available actions in the current state.
Attributes:
space.actions (required): object[] - List of available actions in the current state.
space.description (required): string - Human-readable description of the current webpage.
space.special_actions (optional): object[] - List of special browser actions.
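Omitting session_id creates a new session. A client therefore typically reads session.session_id and session.status from each response to decide whether the next call can reuse the session.

This sketch reads both values with sed so it needs no extra tools. It assumes each key appears once, which holds for the documented response shape. If jq is available, jq -r '.session.session_id' and jq -r '.session.status' are more robust. The function names are hypothetical. The mapping from status to next step is the sketch's own policy, not documented API behavior.

```shell
# Read session.session_id from a scrape response on stdin.
# Assumes the key appears once and its value has no escaped quotes.
extract_session_id() {
  sed -n 's/.*"session_id": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Read session.status the same way.
extract_status() {
  sed -n 's/.*"status": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Map session.status to a next step (this sketch's own policy).
next_step_for_status() {
  case "$1" in
    active) echo "reuse" ;;                   # send session_id on the next call
    closed|timed_out) echo "new-session" ;;   # omit session_id to get a new one
    error) echo "inspect-error" ;;            # read session.error before retrying
    *) echo "unknown"; return 1 ;;
  esac
}
```

For example: sid=$(printf '%s\n' "$resp" | extract_session_id), where resp holds the body returned by curl.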
Example Request
curl --location \
--request POST 'https://api.notexai.pro/env/scrape' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer your-api-key' \
--data '{
"session_id": "abcd1234-5678-90ef-ghij-klmnopqrstuv",
"url": "https://example.com",
"scrape_images": true,
"only_main_content": true,
"screenshot": true
}'
Example Response
200 - application/json
{
"metadata": {
"title": "Example Page Title",
"url": "https://example.com",
"timestamp": "2025-01-24T16:00:00Z"
},
"session": {
"created_at": "2025-01-24T15:00:00Z",
"duration": "10 minutes",
"last_accessed_at": "2025-01-24T15:50:00Z",
"session_id": "abcd1234-5678-90ef-ghij-klmnopqrstuv",
"status": "active",
"timeout_minutes": 10,
"error": null
},
"data": {
"images": [
{
"id": "image1",
"url": "https://example.com/image1.jpg"
}
],
"markdown": "# Example Page\nContent goes here.",
"structured": null
},
"screenshot": "...base64-encoded-data...",
"space": {
"description": "This page allows users to perform various actions.",
"actions": [
{
"id": "action1",
"description": "Search for items."
}
]
}
}
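The sketch below works from a response shaped like the example above. It lists the image links and saves the screenshot to a file. It rests on these assumptions:

- The JSON is pretty-printed with one key per line.
- Each image stores its link under "url". The example uses this name, but the schema only says "ID and download link".
- screenshot holds plain base64 with no data: URI prefix.

base64 -d is the GNU coreutils and BusyBox spelling; older macOS releases use -D. With jq, jq -r '.data.images[]?.url' is a sturdier way to list the links. The function names are hypothetical.

```shell
# Print the link of every entry in data.images, one per line.
# Assumes pretty-printed JSON (one key per line) and a "url" key per image.
image_urls() {
  sed -n '/"images": *\[/,/]/p' | sed -n 's/.*"url": *"\([^"]*\)".*/\1/p'
}

# Print the base64 screenshot string from a response on stdin.
extract_screenshot() {
  sed -n 's/.*"screenshot": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Decode base64 text ($1) into the file $2. The image format is not
# documented, so the caller chooses the file extension.
save_screenshot() {
  printf '%s' "$1" | base64 -d > "$2"
}
```

For example: image_urls < response.json | while read -r u; do curl -sSO "$u"; done downloads each image. save_screenshot "$(extract_screenshot < response.json)" screenshot.png writes the screenshot.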