SEARCH THE DATABASE.
RESOLVE ENTITIES.

A Python library for searching and resolving organizations, people, roles, and locations. 9.7M+ organizations and 63M+ people with embedding-based USearch HNSW indexes for fast lookups. Data sourced from GLEIF, SEC Edgar, UK Companies House, and Wikidata.

Heads up: the search server runs on-demand serverless GPU, so the first query after an idle period can take a minute or two while the worker spins up and loads the model. Subsequent queries are fast.

Quick Start

Install & Search

# Install from PyPI
pip install corp-entity-db

# Download lite database + indexes
corp-entity-db download

# Search organizations
corp-entity-db search "Microsoft"
corp-entity-db search "Microsoft" --hybrid

# Search people
corp-entity-db search-people "Tim Cook"

Python API

from corp_entity_db import (
    OrganizationDatabase,
    get_database_path
)

db = OrganizationDatabase(
    get_database_path()
)
matches = db.search(
    "Microsoft", limit=10
)
for m in matches:
    print(m.record.name, m.score)
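
When resolving an entity rather than browsing results, you usually want a single confident candidate or nothing at all. The sketch below shows one way to do that on top of the results returned by `db.search(...)`. It assumes only what the snippet above shows (each match has `.record.name` and `.score`) plus one unconfirmed assumption: that a higher score means a closer match. The `pick_best` helper, the 0.8 threshold, and the stand-in objects are hypothetical and not part of corp-entity-db.

```python
from types import SimpleNamespace


def pick_best(matches, min_score=0.8):
    """Return the highest-scoring match at or above min_score, else None.

    Assumes higher score = better match; the threshold is illustrative only.
    """
    candidates = [m for m in matches if m.score >= min_score]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.score)


# Stand-in objects shaped like search results; these are not real records.
fake_matches = [
    SimpleNamespace(record=SimpleNamespace(name="Example Corp"), score=0.93),
    SimpleNamespace(record=SimpleNamespace(name="Example Holdings"), score=0.85),
    SimpleNamespace(record=SimpleNamespace(name="Exemplar Ltd"), score=0.61),
]

best = pick_best(fake_matches)
print(best.record.name if best else "no confident match")
```

In real use you would pass the list returned by `db.search("Microsoft", limit=10)` in place of `fake_matches`, and tune the threshold against queries whose correct answer you already know.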

How It Works

Architecture

Embedding-Based Search

Search is semantic rather than textual, so "CEO" and "boss" land close together, as do "Andy" and "Andrew". The index is a USearch HNSW index, which enables very fast approximate nearest-neighbor lookups over millions of embeddings. We use the Gemma Embedding model (300M params) to generate vector embeddings and quantize them to 8-bit integers to reduce storage and memory overhead.
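
To illustrate why 8-bit quantization costs little accuracy, here is a minimal sketch of scalar quantization on unit-length vectors, comparing cosine similarity before and after. The exact quantization scheme used by the library is not described here, so the scaling to [-127, 127] is an assumption, and the helper names and toy 4-dimensional vectors are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so every component lies in [-1, 1]."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def quantize_i8(unit_vec):
    """Map each component of a unit vector to an integer in [-127, 127]."""
    return [max(-127, min(127, round(x * 127))) for x in unit_vec]


def cosine(a, b):
    """Cosine similarity; works for float or integer vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


# Toy vectors for illustration; real embeddings have hundreds of dimensions.
a = normalize([0.9, 0.1, 0.3, -0.2])
b = normalize([0.8, 0.2, 0.25, -0.1])

qa, qb = quantize_i8(a), quantize_i8(b)

exact = cosine(a, b)
approx = cosine(qa, qb)
error = abs(exact - approx)
print(f"float cosine={exact:.4f}  int8 cosine={approx:.4f}  error={error:.4f}")
```

Each int8 component takes 1 byte instead of the 4 bytes of a float32, so the stored vectors shrink by a factor of four while the similarity ranking stays nearly unchanged.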

SQLite Database

The details of each person and organization are stored in a SQLite database, along with which record is the canonical representation. Each person+role+org combination has its own entry, but when several entries are known to refer to the same underlying person, one is taken as canonical, e.g. 'Barack Obama, President' over 'Barack Obama, Senator'.
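
The idea of picking one canonical entry per person can be sketched as follows. The rule the library actually uses to choose the canonical entry is not documented here, so the role-priority table is purely hypothetical, as are the `PersonEntry` class, the `person_id` field, and the `choose_canonical` helper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonEntry:
    person_id: str  # hypothetical key identifying the underlying person
    name: str
    role: str
    org: str


# Hypothetical ranking: a lower number wins. Not the library's actual rule.
ROLE_PRIORITY = {"President": 0, "Senator": 1}


def choose_canonical(entries):
    """Group entries by person_id and pick one canonical entry per group."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry.person_id].append(entry)
    return {
        pid: min(group, key=lambda e: ROLE_PRIORITY.get(e.role, len(ROLE_PRIORITY)))
        for pid, group in groups.items()
    }


entries = [
    PersonEntry("p1", "Barack Obama", "Senator", "United States Senate"),
    PersonEntry("p1", "Barack Obama", "President", "United States"),
]
canonical = choose_canonical(entries)
print(canonical["p1"].role)
```

Both entries remain available for search; the canonical choice only decides which one represents the person when results are collapsed.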

Multi-Source Data

Data comes from multiple sources: ~9.9M organizations from GLEIF, SEC Edgar, Companies House, and Wikidata, and ~66.9M people from Wikidata and Companies House officers. Canonicalization links equivalent records across sources.

Compact Storage

The lite database variant ships without embeddings — just the USearch HNSW indexes for fast ANN search. No need to download or store raw embedding vectors.

Data Sources

| Source | Description | Scale |
|---|---|---|
| Companies House | UK registered companies + officers | ~5.5M orgs, ~27.5M people |
| Wikidata | Organizations & notable people | ~1.7M orgs, ~39.4M people |
| GLEIF | Legal Entity Identifier records | ~2.6M orgs |
| SEC Edgar | US public company filers & officers | ~73K orgs |
| Total | Organizations, people, roles & locations | ~9.9M orgs, ~66.9M people |

We Need Your Feedback

The entity database is actively being expanded. If you find missing organizations, incorrect data, or have suggestions for new data sources, we'd love to hear from you.

neil@corp-o-rate.com

Who We Are

About Corp-o-Rate

The Glassdoor of ESG

Real corporate intelligence from real people. Track what companies actually do, not what they claim.

Corp-o-Rate is building a community-powered corporate accountability platform. We believe that glossy sustainability reports and PR-polished ESG claims don't tell the full story. Our mission is to surface the truth about corporate behavior through crowdsourced intelligence, AI-powered analysis, and transparent data.

The entity database is a core component of the Corp-o-Rate platform — providing fast, reliable entity resolution across 9.7M+ organizations and 63M+ people. Available as the corp-entity-db Python library on PyPI.

Community-Driven

Powered by employees, consumers, and researchers sharing real knowledge about corporate practices.

AI-Powered

Using NLP and knowledge graphs to structure, connect, and analyze corporate claims at scale.

100% Independent

No corporate sponsors. No conflicts of interest. Just transparent corporate intelligence.

We're Pre-Funding & Running on Fumes

Corp-o-Rate is currently bootstrapped and self-funded. We're building in public, shipping what we can, and working toward our mission one step at a time. If you believe in corporate accountability and transparent business intelligence, we'd love your support.

GPU Credits

Help us train better models

Angel Investment

Help us scale the platform

Partnerships

Data, research, or distribution

Shop smarter. Invest better. Know which companies match your values.