Scraping Flow
Creating a Scraping Flow with NotexAI
This guide demonstrates how to use NotexAI to scrape an HBS news article and summarize it using OpenAI.
1. Use the Scrape Endpoint
Send a request to scrape a webpage:
curl --location 'https://api.notexai.pro/env/scrape/' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer your-api-key' \
--data '{
  "url": "https://www.hbs.edu/news/articles/Pages/awa-ambra-seck-profile-2024.aspx",
  "scrape_images": false,
  "screenshot": false
}'
Response:
{
  "session_id": "31f613a4-068e-464d-88b1-b8eb5d4d5c6f",
  "error": null,
  "title": "New Faculty Profiles: Awa Ambra Seck - News - Harvard Business School",
  "url": "https://www.hbs.edu/news/articles/Pages/awa-ambra-seck-profile-2024.aspx",
  "timestamp": "2025-01-08T14:23:32.969858",
  "screenshot": null,
  "data": {
    "markdown": "...",
    "images": null,
    "structured": null
  },
  "space": null
}
2. Extracted Data
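The two fields that matter for the rest of the flow are error and data.markdown. As a minimal sketch, assuming the response always has the shape shown in step 1, the Markdown can be pulled out in Python like this (extract_markdown is a hypothetical helper name, not part of NotexAI):

```python
import json


def extract_markdown(response_text: str) -> str:
    """Return data.markdown from a scrape response, or raise if the scrape failed."""
    payload = json.loads(response_text)

    # The sample response carries "error": null on success.
    if payload.get("error") is not None:
        raise RuntimeError(f"Scrape failed: {payload['error']}")

    # "data" may be null or missing, so fall back to an empty dict.
    markdown = (payload.get("data") or {}).get("markdown")
    if not markdown:
        raise ValueError("Response contains no Markdown content")
    return markdown
```

Failing early on a non-null error keeps an empty or partial page from being passed on to the summarization step.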
The page content is returned as Markdown in data.markdown (data.structured is null in this response). It covers the article title, metadata, and body text. Here is an example:
# New Faculty Profiles: Awa Ambra Seck
## Article Metadata
- Date: 18 DEC 2024
- Title: New Faculty Profiles: Awa Ambra Seck
- Author: Harvard Business School
## Article Content
### Introduction
HBS faculty comprises scholars and practitioners who bring leading-edge research, extensive experience, and deep insights into the classroom, to organizations, and to leaders across the globe.
### Interview with Awa Ambra Seck
#### Educational Background
Awa Ambra Seck has a rich educational background, including a bachelor’s degree from the University of Torino and a PhD from Harvard University, with a focus on development economics and political economy.
#### Area of Research
Her research focuses on the intersection of development economics and economic history, particularly in Africa.
#### Teaching and Interests
She teaches Business, Government & the International Economy and has interests in cooking, dancing, and painting.
3. Summarize with OpenAI
Use OpenAI’s API to summarize the extracted content:
curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant that summarizes news articles in max 100 words."},
    {"role": "user", "content": "...extracted article content..."}
  ],
  "temperature": 0.7
}'
Summary Output:
Awa Ambra Seck is a new faculty member at Harvard Business School specializing in development economics and political economy with a focus on Africa. She has degrees from the University of Torino, Bocconi University, and a PhD from Harvard. Her research explores the impact of cultural traits and colonial history on Africa’s economic systems. Outside academia, she enjoys cooking, dancing, and painting.
That’s it! You’ve successfully created a scraping flow and summarized the content using NotexAI and OpenAI. 🌌
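The two curl calls can also be chained in a single script. The following is a minimal Python sketch using only the standard library, not an official client. It reuses the endpoints, headers, and request bodies from the curl examples. The helper names (post_json, scrape_body, summary_body, summarize_page) are hypothetical, and the sketch assumes the scrape response has the shape shown in step 1.

```python
import json
import urllib.request

SCRAPE_URL = "https://api.notexai.pro/env/scrape/"
OPENAI_URL = "https://api.openai.com/v1/chat/completions"


def post_json(url: str, token: str, body: dict) -> dict:
    """POST a JSON body with a bearer token and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as reply:
        return json.load(reply)


def scrape_body(page_url: str) -> dict:
    """Same body as the scrape curl example."""
    return {"url": page_url, "scrape_images": False, "screenshot": False}


def summary_body(markdown: str) -> dict:
    """Same body as the OpenAI curl example, with the scraped Markdown as user content."""
    return {
        "model": "gpt-4o-mini",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant that summarizes news articles in max 100 words.",
            },
            {"role": "user", "content": markdown},
        ],
        "temperature": 0.7,
    }


def summarize_page(page_url: str, notex_key: str, openai_key: str, post=post_json) -> str:
    """Scrape a page, then ask the chat completions endpoint for a summary."""
    scraped = post(SCRAPE_URL, notex_key, scrape_body(page_url))
    if scraped.get("error") is not None:
        raise RuntimeError(f"Scrape failed: {scraped['error']}")
    completion = post(OPENAI_URL, openai_key, summary_body(scraped["data"]["markdown"]))
    # Chat completions return the generated text under choices[0].message.content.
    return completion["choices"][0]["message"]["content"]
```

To run it, call summarize_page with the article URL and both API keys, for example read from environment variables of your choosing. The post parameter exists so the HTTP layer can be swapped for a stub when testing without network access.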