SEARCH THE DATABASE.
RESOLVE ENTITIES.
A Python library for searching and resolving organizations, people, roles, and locations. 9.7M+ organizations and 63M+ people with embedding-based USearch HNSW indexes for fast lookups. Data sourced from GLEIF, SEC Edgar, UK Companies House, and Wikidata.
Quick Start
Install & Search
# Install from PyPI
pip install corp-entity-db
# Download lite database + indexes
corp-entity-db download
# Search organizations
corp-entity-db search "Microsoft"
corp-entity-db search "Microsoft" --hybrid
# Search people
corp-entity-db search-people "Tim Cook"
Python API
from corp_entity_db import (
    OrganizationDatabase,
    get_database_path,
)

# Open the downloaded database at its default location
db = OrganizationDatabase(
    get_database_path()
)

# Return the top 10 matches for the query
matches = db.search(
    "Microsoft", limit=10
)
for m in matches:
    print(m.record.name, m.score)
How It Works
Architecture
Embedding-Based Search
Search is semantic rather than textual, so "CEO" and "boss" score as similar, as do "Andy" and "Andrew". The index is a USearch HNSW index, which enables fast approximate nearest-neighbor lookups over millions of embeddings. We use the Gemma Embedding model (300M params) to generate vector embeddings and quantize them to 8-bit integers to reduce storage and memory overhead.
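To make the quantization step concrete, here is a minimal pure-Python sketch of 8-bit scalar quantization on unit-length vectors. The scaling scheme (multiply by 127 and round), the function names, and the toy 4-dimensional vectors are assumptions for illustration only; the library's actual quantization may differ, and the HNSW index itself is not shown.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def quantize_int8(unit_vec):
    """Map each component from [-1, 1] to an integer in [-127, 127]."""
    return [max(-127, min(127, round(x * 127))) for x in unit_vec]


def cosine_int8(a, b):
    """Approximate cosine similarity of two int8-quantized unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (127 * 127)


# Toy vectors standing in for real embeddings (hypothetical values).
ceo = normalize([0.9, 0.1, 0.3, 0.2])
boss = normalize([0.8, 0.2, 0.35, 0.1])
river = normalize([-0.1, 0.9, -0.4, 0.3])

exact = sum(x * y for x, y in zip(ceo, boss))
approx = cosine_int8(quantize_int8(ceo), quantize_int8(boss))
unrelated = cosine_int8(quantize_int8(ceo), quantize_int8(river))

print(f"float cosine: {exact:.3f}, int8 cosine: {approx:.3f}")
print(f"unrelated pair (int8): {unrelated:.3f}")
```

Each component is stored in one byte instead of a four-byte float, and the similarity ranking between related and unrelated pairs is preserved up to a small rounding error.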
SQLite Database
The details of each person and organization are stored in a SQLite database, along with the choice of canonical representation. Each person+role+org combination has a separate entry, but when several entries are known to represent the same underlying person, one is taken as canonical, e.g. 'Barack Obama, President' over 'Barack Obama, Senator'.
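One way to picture the canonical-entry idea is a self-referencing column that points every entry at its canonical row. The sketch below uses Python's built-in sqlite3 module; the table name, column names, and organization values are hypothetical and are not the library's actual schema.

```python
import sqlite3

# In-memory database with an assumed, illustrative schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE people (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        role TEXT NOT NULL,
        org TEXT NOT NULL,
        canonical_id INTEGER REFERENCES people(id)
    )
    """
)

# One row per person+role+org; canonical_id points at the preferred entry.
rows = [
    (1, "Barack Obama", "President", "United States", 1),
    (2, "Barack Obama", "Senator", "United States Senate", 1),
    (3, "Tim Cook", "CEO", "Apple", 3),
]
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?, ?)", rows)


def resolve_canonical(conn, record_id):
    """Return (name, role) of the canonical entry for any record id."""
    return conn.execute(
        """
        SELECT c.name, c.role
        FROM people AS p
        JOIN people AS c ON c.id = p.canonical_id
        WHERE p.id = ?
        """,
        (record_id,),
    ).fetchone()


print(resolve_canonical(conn, 2))  # ('Barack Obama', 'President')
```

A search hit on the 'Senator' entry can then be reported under the canonical 'President' entry with a single join.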
Multi-Source Data
Data comes from multiple sources: ~9.9M organizations from GLEIF, SEC Edgar, Companies House, and Wikidata, and ~66.9M people from Wikidata and Companies House officers. Canonicalization links equivalent records across sources.
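The source does not describe how cross-source linking is done, so the following is only a sketch of one plausible approach: records that share an identifier or a case-folded name are merged with a union-find structure, and one record per group is picked as canonical by an assumed source priority. The records, identifiers, matching rules, and priority order are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (source, source_id, name, shared identifier or None).
records = [
    ("gleif", "G1", "Example Holdings Ltd", "LEI-0001"),
    ("wikidata", "Q-1", "Example Holdings", "LEI-0001"),
    ("companies_house", "CH-1", "EXAMPLE HOLDINGS LTD", None),
    ("sec_edgar", "S1", "Other Corp", "LEI-0002"),
]

# Assumed preference order when choosing the canonical record in a group.
SOURCE_PRIORITY = ["gleif", "sec_edgar", "companies_house", "wikidata"]

parent = list(range(len(records)))


def find(i):
    """Return the root of record i's group, compressing the path as we go."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(i, j):
    """Merge the groups containing records i and j."""
    parent[find(i)] = find(j)


# Link any two records that share a case-folded name or an identifier.
first_seen = {}
for idx, (_, _, name, identifier) in enumerate(records):
    keys = [("name", name.casefold())]
    if identifier is not None:
        keys.append(("id", identifier))
    for key in keys:
        if key in first_seen:
            union(idx, first_seen[key])
        else:
            first_seen[key] = idx

groups = defaultdict(list)
for idx, record in enumerate(records):
    groups[find(idx)].append(record)

# Map every source_id to the source_id of its group's canonical record.
canonical = {}
for members in groups.values():
    best = min(members, key=lambda r: SOURCE_PRIORITY.index(r[0]))
    for member in members:
        canonical[member[1]] = best[1]

print(canonical)
```

Here the Wikidata record joins the GLEIF record through the shared identifier and the Companies House record joins through the matching name, so all three resolve to the same canonical entry even though no single key connects them all.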
Compact Storage
The lite database variant ships without raw embedding vectors and includes only the USearch HNSW indexes needed for fast ANN search, so there are no embeddings to download or store.
Data Sources
| Source | Description | Scale |
|---|---|---|
| Companies House | UK registered companies + officers | ~5.5M orgs, ~27.5M people |
| Wikidata | Organizations & notable people | ~1.7M orgs, ~39.4M people |
| GLEIF | Legal Entity Identifier records | ~2.6M orgs |
| SEC Edgar | US public company filers & officers | ~73K orgs |
| Total | Organizations, people, roles & locations | ~9.9M orgs, ~66.9M people |
We Need Your Feedback
The entity database is actively being expanded. If you find missing organizations, incorrect data, or have suggestions for new data sources, we'd love to hear from you.
neil@corp-o-rate.com
About Corp-o-Rate
The Glassdoor of ESG
Real corporate intelligence from real people. Track what companies actually do, not what they claim.
Corp-o-Rate is building a community-powered corporate accountability platform. We believe that glossy sustainability reports and PR-polished ESG claims don't tell the full story. Our mission is to surface the truth about corporate behavior through crowdsourced intelligence, AI-powered analysis, and transparent data.
The entity database is a core component of the Corp-o-Rate platform, providing fast, reliable entity resolution across 9.7M+ organizations and 63M+ people. It is available as the corp-entity-db Python library on PyPI.
Community-Driven
Powered by employees, consumers, and researchers sharing real knowledge about corporate practices.
AI-Powered
Using NLP and knowledge graphs to structure, connect, and analyze corporate claims at scale.
100% Independent
No corporate sponsors. No conflicts of interest. Just transparent corporate intelligence.
We're Pre-Funding & Running on Fumes
Corp-o-Rate is currently bootstrapped and self-funded. We're building in public, shipping what we can, and working toward our mission one step at a time. If you believe in corporate accountability and transparent business intelligence, we'd love your support.
Help us train better models
Help us scale the platform
Data, research, or distribution
Shop smarter. Invest better. Know which companies match your values.