Parsing

If you want to turn websites into LLM-ready text, you can use Fixpoint's parsing APIs. Fixpoint can either:

turn the HTML into plain text or markdown
turn the HTML into chunked plain text for your RAG application (coming soon)

Parsing APIs

Parsing a single webpage

HTTP API: POST /v1/parses/webpage_parses
Python SDK: FixpointClient.parses.webpage.create(...)
API spec

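from fixpoint.client import FixpointClient
from fixpoint.client.types import CreateWebpageParseRequest, WebpageSource

client = FixpointClient(api_key="...")

# The webpage you want to parse
site = "https://example.com"

# Called without `await` on the synchronous FixpointClient;
# use AsyncFixpointClient if you want to await the call instead.
parsed = client.parses.webpage.create(
    CreateWebpageParseRequest(
        workflow_id="my-parsing-workflow",
        source=WebpageSource(url=site),
    )
)
# The parsed, LLM-ready text of the page
print(parsed.content)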
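
If you are not using the Python SDK, the same parse can in principle be requested through the HTTP endpoint listed above. The sketch below only builds the request and does not send it. The endpoint path comes from this page; the base URL, the Bearer auth header, and the JSON body shape (mirrored from CreateWebpageParseRequest) are assumptions, so check the API spec before relying on them.

```python
import json
import urllib.request

# Hypothetical placeholder host; substitute the real Fixpoint API base URL.
BASE_URL = "https://api.example.com"


def build_webpage_parse_request(
    api_key: str, workflow_id: str, url: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/parses/webpage_parses."""
    # Assumed body shape, mirroring the fields of the SDK's CreateWebpageParseRequest.
    body = {"workflow_id": workflow_id, "source": {"url": url}}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/parses/webpage_parses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; the real header may differ.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_webpage_parse_request(
    "...", "my-parsing-workflow", "https://example.com"
)
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out here because the host and auth details are placeholders.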

Crawling and parsing

HTTP API: POST /v1/parses/crawl_url_parses
Python SDK: FixpointClient.parses.crawl.create(...)
API spec

As with crawl extraction, you specify your source:

source = CrawlUrlSource(crawl_url=site, depth=2, page_limit=3)

You can specify the depth (how many link hops to follow from your starting URL) and the page_limit (the maximum number of pages to crawl in total).

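import asyncio

from fixpoint.client import AsyncFixpointClient
from fixpoint.client.types import CreateCrawlUrlParseRequest, CrawlUrlSource

# The URL to start crawling from
site = "https://example.com"


async def main() -> None:
    client = AsyncFixpointClient(api_key="...")

    parsed = await client.parses.crawl.create(
        CreateCrawlUrlParseRequest(
            workflow_id="my-parsing-workflow",
            source=CrawlUrlSource(
                crawl_url=site,
                depth=2,
                page_limit=3,
            ),
        )
    )

    # Each crawled page carries its source URL and its parsed content
    for page in parsed.page_contents:
        print(page.source.url)
        print(page.content)
        print("\n\n")


asyncio.run(main())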
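
To build intuition for how depth and page_limit interact, here is a minimal sketch of a breadth-first crawl over a hypothetical in-memory link graph. It is not Fixpoint's implementation. It assumes that depth counts link hops from the starting URL and that the starting page counts toward page_limit; the LINKS graph and the crawl function are made up for illustration.

```python
from collections import deque

# Hypothetical in-memory link graph standing in for real webpages.
LINKS = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/parsing", "/docs/extraction"],
    "/blog": ["/blog/post-1"],
    "/docs/parsing": [],
    "/docs/extraction": [],
    "/blog/post-1": [],
}


def crawl(start: str, depth: int, page_limit: int) -> list[str]:
    """Breadth-first crawl that stops at `depth` link hops or `page_limit` pages."""
    visited = [start]  # assumption: the starting page counts toward page_limit
    seen = {start}
    queue = deque([(start, 0)])
    while queue and len(visited) < page_limit:
        url, hops = queue.popleft()
        if hops >= depth:
            continue  # do not follow links beyond the depth limit
        for link in LINKS.get(url, []):
            if link not in seen and len(visited) < page_limit:
                seen.add(link)
                visited.append(link)
                queue.append((link, hops + 1))
    return visited


print(crawl("/", depth=2, page_limit=3))   # page_limit is the binding limit
print(crawl("/", depth=1, page_limit=10))  # depth is the binding limit
print(crawl("/", depth=2, page_limit=10))  # neither limit cuts the crawl short
```

In this toy graph, a small page_limit ends the crawl before the depth limit is reached, while a small depth ends it even though the page budget is not used up.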
