Scraping Flow

Creating a Scraping Flow with NotexAI

This guide demonstrates how to use NotexAI to scrape an HBS news article and summarize it using OpenAI.


1. Use the Scrape Endpoint

Send a request to scrape a webpage:

curl --location 'https://api.notexai.pro/env/scrape/' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer your-api-key' \
--data '{
    "url": "https://www.hbs.edu/news/articles/Pages/awa-ambra-seck-profile-2024.aspx",
    "scrape_images": false,
    "screenshot": false
}'

Response:

{
    "session_id": "31f613a4-068e-464d-88b1-b8eb5d4d5c6f",
    "error": null,
    "title": "New Faculty Profiles: Awa Ambra Seck - News - Harvard Business School",
    "url": "https://www.hbs.edu/news/articles/Pages/awa-ambra-seck-profile-2024.aspx",
    "timestamp": "2025-01-08T14:23:32.969858",
    "screenshot": null,
    "data": {
        "markdown": "...",  
        "images": null,
        "structured": null
    },
    "space": null
}

2. Extracted Data

The response includes structured data extracted from the page, such as the article title, metadata, and content. Here is an example:


3. Summarize with OpenAI

Use OpenAI’s API to summarize the extracted content:

Summary Output:


That’s it! You’ve successfully created a scraping flow and summarized the content using NotexAI and OpenAI. 🌌

Last updated