Scraping Flow

Creating a Scraping Flow with NotexAI

This guide shows how to use NotexAI to scrape a Harvard Business School (HBS) news article and then summarize it with OpenAI.

1. Use the Scrape Endpoint

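Send a POST request to the scrape endpoint with the URL of the page to scrape, replacing your-api-key with your NotexAI API key. In this example, scrape_images and screenshot are set to false, so the response contains the page content without images or a screenshot: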

curl --location 'https://api.notexai.pro/env/scrape/' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer your-api-key' \
--data '{
    "url": "https://www.hbs.edu/news/articles/Pages/awa-ambra-seck-profile-2024.aspx",
    "scrape_images": false,
    "screenshot": false
}'
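If you prefer to call the endpoint from Python rather than the shell, the curl command above can be reproduced with the standard library. This is a minimal sketch that assumes the endpoint accepts exactly the JSON body shown; `build_scrape_request` and `scrape` are hypothetical helper names, not part of a NotexAI SDK.

```python
import json
import urllib.request

# Endpoint copied from the curl example.
SCRAPE_ENDPOINT = "https://api.notexai.pro/env/scrape/"


def build_scrape_request(api_key: str, target_url: str,
                         scrape_images: bool = False,
                         screenshot: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request matching the curl example."""
    body = {"url": target_url, "scrape_images": scrape_images, "screenshot": screenshot}
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def scrape(api_key: str, target_url: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response.

    The 60-second timeout is an arbitrary choice for this sketch.
    urlopen raises urllib.error.HTTPError on non-2xx status codes.
    """
    with urllib.request.urlopen(build_scrape_request(api_key, target_url),
                                timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, scrape(os.environ["NOTEXAI_API_KEY"], article_url) would return the decoded response; the environment variable name is only an illustration. Keeping request construction separate from sending makes the body easy to inspect before any network call.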

Response:

{
    "session_id": "31f613a4-068e-464d-88b1-b8eb5d4d5c6f",
    "error": null,
    "title": "New Faculty Profiles: Awa Ambra Seck - News - Harvard Business School",
    "url": "https://www.hbs.edu/news/articles/Pages/awa-ambra-seck-profile-2024.aspx",
    "timestamp": "2025-01-08T14:23:32.969858",
    "screenshot": null,
    "data": {
        "markdown": "...",  
        "images": null,
        "structured": null
    },
    "space": null
}
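Before using the content, it is worth checking the error field. The sketch below assumes that a non-null error means the scrape failed and that the page text is in data.markdown, as in the example response; `ScrapeError` and `extract_markdown` are hypothetical names.

```python
import json
from typing import Any


class ScrapeError(RuntimeError):
    """Hypothetical error type for failed or empty scrapes."""


def extract_markdown(response: dict[str, Any]) -> str:
    """Return data.markdown from a scrape response, or raise ScrapeError."""
    # Assumption: a non-null "error" field means the scrape failed.
    if response.get("error") is not None:
        raise ScrapeError(f"scrape failed: {response['error']}")
    data = response.get("data") or {}
    markdown = data.get("markdown")
    if not markdown:
        raise ScrapeError("response contains no markdown content")
    return markdown


# The example response from above ("..." stands in for the elided Markdown).
sample_response = json.loads("""
{
    "session_id": "31f613a4-068e-464d-88b1-b8eb5d4d5c6f",
    "error": null,
    "title": "New Faculty Profiles: Awa Ambra Seck - News - Harvard Business School",
    "url": "https://www.hbs.edu/news/articles/Pages/awa-ambra-seck-profile-2024.aspx",
    "timestamp": "2025-01-08T14:23:32.969858",
    "screenshot": null,
    "data": {"markdown": "...", "images": null, "structured": null},
    "space": null
}
""")

markdown = extract_markdown(sample_response)
```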

2. Extracted Data

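The extracted page content is returned as Markdown in the "markdown" field of "data" (abbreviated as "..." in the response above); "structured" is null in this example. For this article, the content includes the title, metadata, and body sections: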

# New Faculty Profiles: Awa Ambra Seck
## Article Metadata
- Date: 18 DEC 2024
- Title: New Faculty Profiles: Awa Ambra Seck
- Author: Harvard Business School

## Article Content
### Introduction
HBS faculty comprises scholars and practitioners who bring leading-edge research, extensive experience, and deep insights into the classroom, to organizations, and to leaders across the globe.

### Interview with Awa Ambra Seck
#### Educational Background
Awa Ambra Seck has a rich educational background, including a bachelor’s degree from the University of Torino and a PhD from Harvard University, with a focus on development economics and political economy.

#### Area of Research
Her research focuses on the intersection of development economics and economic history, particularly in Africa.

#### Teaching and Interests
She teaches Business, Government & the International Economy and has interests in cooking, dancing, and painting.
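Because the content is Markdown, individual fields can be pulled out with plain text processing. The sketch below assumes the layout shown above (an "Article Metadata" heading followed by "- Key: value" bullets); other pages may be structured differently, and `parse_metadata` is a hypothetical helper.

```python
import re

# Matches Markdown ATX headings such as "## Article Metadata".
HEADING = re.compile(r"^#+\s+(.*\S)\s*$")


def parse_metadata(markdown: str, section: str = "Article Metadata") -> dict[str, str]:
    """Collect '- Key: value' bullets that appear under the given heading."""
    metadata: dict[str, str] = {}
    in_section = False
    for line in markdown.splitlines():
        heading = HEADING.match(line)
        if heading:
            # Enter the section on its heading; leave it at the next heading.
            in_section = heading.group(1) == section
            continue
        if in_section and line.startswith("- ") and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, value = line[2:].split(":", 1)
            metadata[key.strip()] = value.strip()
    return metadata


excerpt = """# New Faculty Profiles: Awa Ambra Seck
## Article Metadata
- Date: 18 DEC 2024
- Title: New Faculty Profiles: Awa Ambra Seck
- Author: Harvard Business School

## Article Content
"""
metadata = parse_metadata(excerpt)
# {'Date': '18 DEC 2024', 'Title': 'New Faculty Profiles: Awa Ambra Seck', 'Author': 'Harvard Business School'}
```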

3. Summarize with OpenAI

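Send the extracted content to OpenAI’s Chat Completions API to summarize it, replacing the placeholder in the user message with the Markdown from step 2: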

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
     "model": "gpt-4o-mini",
     "messages": [
        {"role": "system", "content": "You are a helpful assistant that summarizes news articles in max 100 words."},
        {"role": "user", "content": "...extracted article content..."}
     ],
     "temperature": 0.7
   }'
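Pasting Markdown directly into the single-quoted -d string can break the command: an apostrophe ends the shell string, and raw newlines or double quotes are not valid inside a JSON string. A safer option is to build the same request body programmatically and let a JSON encoder do the escaping. A minimal sketch that mirrors the curl request above; the function names are hypothetical.

```python
import json
import urllib.request

CHAT_COMPLETIONS_URL = "https://api.openai.com/v1/chat/completions"
SYSTEM_PROMPT = "You are a helpful assistant that summarizes news articles in max 100 words."


def build_summary_payload(article_markdown: str, model: str = "gpt-4o-mini",
                          temperature: float = 0.7) -> dict:
    """Same body as the curl example, with the article as the user message."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": article_markdown},
        ],
        "temperature": temperature,
    }


def build_summary_request(api_key: str, article_markdown: str) -> urllib.request.Request:
    """Build (but do not send) the POST request; json.dumps handles all escaping."""
    body = json.dumps(build_summary_payload(article_markdown)).encode("utf-8")
    return urllib.request.Request(
        CHAT_COMPLETIONS_URL,
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )
```

Sending it is then a matter of urllib.request.urlopen(build_summary_request(api_key, markdown)), or the equivalent call in any HTTP client.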

Summary output (your result may differ, since a temperature of 0.7 allows variation between runs):

Awa Ambra Seck is a new faculty member at Harvard Business School specializing in development economics and political economy with a focus on Africa. She has degrees from the University of Torino, Bocconi University, and a PhD from Harvard. Her research explores the impact of cultural traits and colonial history on Africa’s economic systems. Outside academia, she enjoys cooking, dancing, and painting.
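The summary is the text of the first choice in the Chat Completions response (choices[0].message.content). The "max 100 words" instruction in the system prompt is a request to the model, not a hard limit, so you may want to check it. A sketch with hypothetical helper names, using whitespace-separated tokens as a rough word count.

```python
def extract_summary(completion: dict) -> str:
    """Return the assistant message text from a Chat Completions response body."""
    choices = completion.get("choices") or []
    if not choices:
        raise ValueError("completion has no choices")
    return (choices[0]["message"]["content"] or "").strip()


def word_count(text: str) -> int:
    # Whitespace-delimited count; only a rough proxy for "words".
    return len(text.split())


def check_summary(completion: dict, max_words: int = 100) -> str:
    """Extract the summary and raise if it exceeds max_words."""
    summary = extract_summary(completion)
    count = word_count(summary)
    if count > max_words:
        raise ValueError(f"summary has {count} words, limit is {max_words}")
    return summary


# Minimal mock of a response body, for illustration only.
mock = {"choices": [{"message": {"role": "assistant", "content": " Short summary. "}}]}
result = check_summary(mock)  # "Short summary."
```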

That’s it! You’ve successfully created a scraping flow and summarized the content using NotexAI and OpenAI. 🌌


Last updated 1 year ago
