Datasets Overview
Fixpoint has two types of datasets:
- pre-built: datasets already built by Fixpoint, such as company data, people data, patents, and academic papers
- custom: datasets built by you, which combine your own custom web search and scraping, other datasets, and your own documents
We are continually adding new pre-built datasets.
Pre-built Datasets
Company Data
Fixpoint pre-extracts structured data about companies, giving you a comprehensive starting point for the company data you need for your applications, data workflows, or AI agents. You can use the pre-extracted company data as-is, or enrich it further via web scraping.
See the Company Data page for more information.
Patents
Fixpoint indexes patents and lets you search them, turn them into LLM-ready plain text or RAG chunks, or extract structured data from them. Fixpoint can also resolve all of the people and companies involved in a patent.
See the Patents page for more information.
Custom Datasets
Research Documents
One type of custom dataset is a Research Document, which is essentially a table of extracted data built when you run Record Extraction. When you run record extraction, specify a document_id and your extractions will be grouped into that Research Document.
import os

from fixpoint.client import FixpointClient
from fixpoint.client.types import CreateRecordExtractionRequest, WebpageSource

# Placeholder URL; replace with the page you want to extract from.
site = "https://www.example.com"

client = FixpointClient(api_key=os.environ["FIXPOINT_API_KEY"])

extraction = client.extractions.record.create(
    CreateRecordExtractionRequest(
        # Specify a `document_id` to group extractions into a Research Document.
        document_id="example-research",
        # Optionally, set a human-readable display name.
        document_name="My Example Research",
        source=WebpageSource(url=site),
        questions=[
            "What is the product summary?",
            "What are the industries the business serves?",
            "What are the use-cases of the product?",
        ],
    )
)
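To illustrate the grouping behavior, here is a minimal sketch of reusing one document_id across several pages so that every extraction lands in the same Research Document. It does not call the Fixpoint API. Plain dictionaries stand in for CreateRecordExtractionRequest, and the helper names build_requests and group_by_document, the dictionary layout, and the example URLs are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_requests(document_id, urls, questions):
    """Build one request payload per URL, all sharing the same document_id.

    Hypothetical helper: the dict layout loosely mirrors the request fields
    shown above and is an assumption, not the SDK's wire format.
    """
    return [
        {
            "document_id": document_id,
            "source": {"url": url},
            "questions": list(questions),
        }
        for url in urls
    ]


def group_by_document(requests):
    """Group source URLs by document_id, as a local stand-in for how
    extractions are grouped into a Research Document."""
    grouped = defaultdict(list)
    for req in requests:
        grouped[req["document_id"]].append(req["source"]["url"])
    return dict(grouped)


requests = build_requests(
    "example-research",
    ["https://example.com/a", "https://example.com/b"],
    ["What is the product summary?"],
)
print(group_by_document(requests))
```

In real use, each payload would presumably be turned into a CreateRecordExtractionRequest and passed to client.extractions.record.create, as in the example above, with the shared document_id doing the grouping.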