Parsing
If you want to turn websites into LLM-ready text, you can use Fixpoint's parsing APIs. Fixpoint can either:
- turn the HTML into plain text or markdown
- turn the HTML into chunked plain text for your RAG application (coming soon)
Parsing APIs
Parsing a single webpage
- HTTP API: `POST /v1/parses/webpage_parses`
- Python SDK: `FixpointClient.parses.webpage.create(...)`
- API spec
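```python
import asyncio
from fixpoint.client import AsyncFixpointClient
from fixpoint.client.types import CreateWebpageParseRequest, WebpageSource

site = "https://example.com"  # the webpage you want to parse

async def main() -> None:
    client = AsyncFixpointClient(api_key="...")  # async client, since create() is awaited
    parsed = await client.parses.webpage.create(
        CreateWebpageParseRequest(
            workflow_id="my-parsing-workflow",
            source=WebpageSource(url=site),
        )
    )
    print(parsed.content)  # the page as LLM-ready text

asyncio.run(main())
```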
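If you call the HTTP API directly instead of going through the SDK, a request to `POST /v1/parses/webpage_parses` might be built as in this sketch, which uses only Python's standard library and does not send anything. Several details are assumptions not stated on this page: the placeholder host in `BASE_URL`, the Bearer-token `Authorization` header, and a JSON body that mirrors `CreateWebpageParseRequest`. The helper `build_webpage_parse_request` is a hypothetical name; check the API spec for the real request shape.

```python
import json
import urllib.request

# ASSUMPTION: placeholder host; substitute the real Fixpoint API base URL.
BASE_URL = "https://fixpoint-api.example.com"


def build_webpage_parse_request(api_key: str, workflow_id: str, url: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/parses/webpage_parses.

    ASSUMPTIONS: the body mirrors the SDK's CreateWebpageParseRequest,
    and the API key is passed as a Bearer token.
    """
    body = {"workflow_id": workflow_id, "source": {"url": url}}
    return urllib.request.Request(
        f"{BASE_URL}/v1/parses/webpage_parses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_webpage_parse_request("...", "my-parsing-workflow", "https://example.com")
    print(req.get_method(), req.full_url)
    # Sending it requires a real host and API key. ASSUMPTION: the response
    # JSON carries a "content" field like the SDK's parsed.content.
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["content"])
```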
Crawling and parsing
- HTTP API: `POST /v1/parses/crawl_url_parses`
- Python SDK: `FixpointClient.parses.crawl.create(...)`
- API spec
As with crawl extractions, you specify a crawl source:
```python
source = CrawlUrlSource(crawl_url=site, depth=2, page_limit=3)
```
You can set the `depth` (how many links deep to follow from your starting URL) and the `page_limit` (the maximum number of pages to crawl in total).
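To make the two limits concrete, here is a minimal breadth-first crawl over a hypothetical in-memory link graph. It is not Fixpoint's crawler. It assumes `depth` counts link hops from the starting URL (depth 0 means only the start page), as in most crawlers, and that `page_limit` caps the total number of pages visited. The `crawl_order` function and the `example_links` graph are illustrative names.

```python
from __future__ import annotations

from collections import deque


def crawl_order(links: dict[str, list[str]], start: str, depth: int, page_limit: int) -> list[str]:
    """Return the pages a depth- and page-limited breadth-first crawl would visit.

    `links` stands in for fetching a page and extracting its outgoing links.
    """
    visited: list[str] = []
    seen = {start}
    queue = deque([(start, 0)])  # (url, link hops from the starting URL)
    while queue and len(visited) < page_limit:
        url, hops = queue.popleft()
        visited.append(url)
        if hops == depth:
            continue  # don't follow links beyond the depth limit
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return visited


# Hypothetical site: "/" links to two pages, each of which links one level deeper.
example_links = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
}

if __name__ == "__main__":
    print(crawl_order(example_links, "/", depth=2, page_limit=3))   # ['/', '/a', '/b']
    print(crawl_order(example_links, "/", depth=2, page_limit=10))  # ['/', '/a', '/b', '/a/1', '/b/1']
```

With `depth=2, page_limit=3`, the page limit stops the crawl first; raising `page_limit` lets it reach two hops out, but `/a/1/x` (three hops away) is never visited.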
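```python
import asyncio
from fixpoint.client import AsyncFixpointClient
from fixpoint.client.types import CreateCrawlUrlParseRequest, CrawlUrlSource

site = "https://example.com"  # the URL to start crawling from

async def main() -> None:
    client = AsyncFixpointClient(api_key="...")
    parsed = await client.parses.crawl.create(
        CreateCrawlUrlParseRequest(
            workflow_id="my-parsing-workflow",
            source=CrawlUrlSource(
                crawl_url=site,
                depth=2,
                page_limit=3,
            ),
        )
    )
    # One entry per crawled page, with its source URL and parsed text
    for page in parsed.page_contents:
        print(page.source.url)
        print(page.content)
        print("\n\n")

asyncio.run(main())
```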
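If you want to hand the crawl results to an LLM rather than print them, one option is to collect them by URL or join them into a single text. This sketch relies only on the attributes used in the loop above (`page.source.url` and `page.content`); the helper names `pages_by_url` and `as_single_document` are hypothetical, and `SimpleNamespace` objects stand in for real SDK results so the example runs offline.

```python
from __future__ import annotations

from types import SimpleNamespace


def pages_by_url(page_contents) -> dict[str, str]:
    """Map each crawled page's URL to its parsed content (later duplicates win)."""
    return {page.source.url: page.content for page in page_contents}


def as_single_document(page_contents) -> str:
    """Join all pages into one text, each preceded by a line naming its source URL."""
    return "\n\n".join(f"Source: {p.source.url}\n{p.content}" for p in page_contents)


if __name__ == "__main__":
    # Stand-ins for parsed.page_contents
    fake_pages = [
        SimpleNamespace(source=SimpleNamespace(url="https://example.com/"), content="Home"),
        SimpleNamespace(source=SimpleNamespace(url="https://example.com/about"), content="About us"),
    ]
    print(pages_by_url(fake_pages))
    print(as_single_document(fake_pages))
```

Keeping the source URL next to each page's text lets a downstream prompt or retriever cite where each passage came from.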