Hi everyone,
we are currently working on the EU Risk Intelligence Platform—a monitoring system for institutional credit risk analysis of European issuers.
To support structured financial extraction (iXBRL/PDF) and covenant monitoring, we need to ingest and process 100% of UK company filings. While we are currently integrating the existing API and Bulk products, we have identified a significant efficiency gap regarding historical data acquisition.
1. The Challenge: The “75–90% Gap” and REST Overhead
Our current strategy is to:
- Seed via the Free Company Data and Free Accounts Data (Historical/Daily) products.
- Sync via the Streaming API and the Daily Accounts Bulk files.
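For the Sync step, a consumer of the filing-history stream might apply events to a local index roughly as follows. This is a minimal sketch that assumes newline-delimited JSON events carrying `resource_id`, `data` and an `event` object with `timepoint` and `type` (`changed`/`deleted`), and it treats blank lines as heartbeats. Those field names should be checked against the Streaming API documentation. `apply_stream_events` is a hypothetical helper, and the HTTP connection itself is left out.

```python
import json
from typing import Dict, Iterable, Optional


def apply_stream_events(lines: Iterable[str], index: Dict[str, dict]) -> Optional[int]:
    """Apply newline-delimited JSON filing events to a local index (hypothetical helper).

    Assumed event shape (verify against the Streaming API docs):
      {"resource_id": ..., "data": {...},
       "event": {"timepoint": <int>, "type": "changed" | "deleted"}}
    Returns the highest timepoint seen so a reconnect can resume from it.
    """
    last_timepoint: Optional[int] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            # Assumption: empty lines are keep-alive heartbeats, not events.
            continue
        event = json.loads(line)
        key = event["resource_id"]
        if event["event"].get("type") == "deleted":
            index.pop(key, None)  # drop records removed upstream
        else:
            index[key] = event["data"]  # insert or overwrite with the latest state
        timepoint = event["event"]["timepoint"]
        if last_timepoint is None or timepoint > last_timepoint:
            last_timepoint = timepoint
    return last_timepoint


# Usage with made-up events (TX1/TX2 are placeholder ids):
index: Dict[str, dict] = {}
sample = [
    '{"resource_id": "TX1", "data": {"category": "accounts"}, "event": {"timepoint": 101, "type": "changed"}}',
    "",
    '{"resource_id": "TX2", "data": {"category": "mortgage"}, "event": {"timepoint": 102, "type": "changed"}}',
    '{"resource_id": "TX2", "data": {}, "event": {"timepoint": 103, "type": "deleted"}}',
]
print(apply_stream_events(sample, index), sorted(index))  # 103 ['TX1']
```

Persisting the returned timepoint means a dropped connection can be resumed from a known position rather than forcing a re-seed.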
The Problem: As noted by the community, the Free Accounts Data Product covers only ~75–90% of accounts filings. For institutional-grade data, we must backfill the remaining 10–25%, plus all non-accounts filings (Mortgages, PSCs, etc.).
With 5M+ companies, fetching full filing histories and document metadata via the “Get the filing history list of a company” endpoint of the Companies House Public Data API and the “Fetch a document’s metadata” endpoint of the Document API would require tens of millions of requests: at least one paginated filing-history call per company, plus one metadata call per document. This seems like an unnecessary burden on Companies House infrastructure.
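To make the “tens of millions” figure concrete, here is a rough back-of-the-envelope estimate. It is a sketch under stated assumptions: a filing-history page size of 100, one Document API metadata call per filing, and a rate limit of 600 requests per 5 minutes per API key (the commonly cited default; check the current documentation). The average of 10 filings per company is purely illustrative, not a measured figure, and `estimate_backfill` is a hypothetical helper.

```python
import math
from typing import Tuple


def estimate_backfill(companies: int, avg_filings: float, page_size: int = 100,
                      requests_per_window: int = 600,
                      window_seconds: int = 300) -> Tuple[int, float]:
    """Rough request count and wall-clock days for a REST-only backfill.

    Assumes ceil(avg_filings / page_size) filing-history pages per company (at least 1)
    and one metadata call per filing. Rate-limit defaults are assumptions to verify.
    """
    pages_per_company = max(1, math.ceil(avg_filings / page_size))
    history_calls = companies * pages_per_company
    metadata_calls = math.ceil(companies * avg_filings)
    total = history_calls + metadata_calls
    days = total / requests_per_window * window_seconds / 86_400
    return total, days


# 10 filings per company is an illustrative placeholder, not a measured average.
total, days = estimate_backfill(companies=5_000_000, avg_filings=10)
print(f"{total:,} requests, ~{days:.0f} days at the assumed rate limit")
# 55,000,000 requests, ~318 days at the assumed rate limit
```

Averaging hides companies with long histories that need several pages, so this understates pagination; it is only meant to show the order of magnitude for a single API key.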
2. Proposal: New Bulk Metadata Products
We’ve noticed that bulk Filing History and Document Metadata lists are still missing, a request I’ve seen echoed here for years.
To enable devs to be “good citizens” and stay off the REST API for historical syncing, could Companies House provide daily/monthly CSV bulk files for these two datasets?
Suggested Schema Examples:
FilingHistory Product (approx. 65k rows/day, ~7 MB unzipped)
| CompanyNumber | TransactionId | Category | Type | Date | ActionDate | MadeUpDate |
|---|---|---|---|---|---|---|
| 00013882 | MzUxMjQ5MTExMmFkaXF6a2N4 | accounts | AA | 2026-03-26 | 2025-12-31 | 2025-12-31 |
| … other filings for that day/month |
DocumentMetadata Product (similar in size)
| TransactionId | DocumentId | DocumentContentTypes |
|---|---|---|
| MzUxMjQ5MTExMmFkaXF6a2N4 | 7Hc28PddVSKdrVEp17WnjqHWW0_03isnyXxumd0_lY0 | application/pdf,application/xhtml+xml |
| … other documents for that day/month |
For historical data, you could produce monthly bulk ZIPs, just as you already do for the historical Free Accounts Data Product.
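If the two proposed files existed, linking them would be a simple keyed join on `TransactionId`. The sketch below assumes the column names from the suggested schemas above, standard CSV quoting (so the comma-separated `DocumentContentTypes` value is wrapped in quotes), and that `application/xhtml+xml` marks an iXBRL rendition. `ixbrl_documents` is a hypothetical helper, not part of any existing product.

```python
import csv
import io
from typing import Dict

# Column names follow the schemas suggested above; neither file exists today.
FILINGS_CSV = """CompanyNumber,TransactionId,Category,Type,Date,ActionDate,MadeUpDate
00013882,MzUxMjQ5MTExMmFkaXF6a2N4,accounts,AA,2026-03-26,2025-12-31,2025-12-31
"""
DOCUMENTS_CSV = """TransactionId,DocumentId,DocumentContentTypes
MzUxMjQ5MTExMmFkaXF6a2N4,7Hc28PddVSKdrVEp17WnjqHWW0_03isnyXxumd0_lY0,"application/pdf,application/xhtml+xml"
"""


def ixbrl_documents(filings_csv: str, documents_csv: str) -> Dict[str, str]:
    """Map TransactionId -> DocumentId for accounts filings offering an iXBRL rendition."""
    # Index documents by TransactionId, keeping only those with an XHTML (iXBRL) type.
    ixbrl = {}
    for row in csv.DictReader(io.StringIO(documents_csv)):
        content_types = row["DocumentContentTypes"].split(",")
        if "application/xhtml+xml" in content_types:
            ixbrl[row["TransactionId"]] = row["DocumentId"]
    # Join against the filing index, restricted to the accounts category.
    return {
        row["TransactionId"]: ixbrl[row["TransactionId"]]
        for row in csv.DictReader(io.StringIO(filings_csv))
        if row["Category"] == "accounts" and row["TransactionId"] in ixbrl
    }


print(ixbrl_documents(FILINGS_CSV, DOCUMENTS_CSV))
# {'MzUxMjQ5MTExMmFkaXF6a2N4': '7Hc28PddVSKdrVEp17WnjqHWW0_03isnyXxumd0_lY0'}
```

Reading every field as a string keeps leading zeros in company numbers such as `00013882`, which a numeric CSV import would silently drop.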
3. Why this helps
By providing these indexes in bulk:
- Infrastructure Load: We (and other devs) can bypass millions of REST calls for historical backfilling entirely.
- Data Integrity: We can programmatically identify exactly which accounts (the missing 10–25%) are absent from the bulk accounts ZIPs and target only those via the API.
- Completeness: It allows a full local mirror of the register without the “missing data” risks of the current public bulk products.
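The Data Integrity point amounts to a set difference between a filing-history index and what the bulk accounts ZIPs already contain. This sketch assumes the files inside the ZIPs are named `<prefix>_<batch>_<CompanyNumber>_<YYYYMMDD>.html` (or `.xml`), with the date being the made-up date, and keys filings on (company number, made-up date). Both are assumptions to verify, and the helper names and sample file name are hypothetical. In practice, the names would come from `zipfile.ZipFile(path).namelist()`.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Assumed naming for files inside the bulk accounts ZIPs (verify before relying on it):
#   <prefix>_<batch>_<CompanyNumber>_<YYYYMMDD>.html or .xml
BULK_NAME = re.compile(r"^[^_]+_[^_]+_([A-Z0-9]{8})_(\d{8})\.(?:html|xml)$", re.IGNORECASE)


def covered_by_bulk(names: Iterable[str]) -> Set[Tuple[str, str]]:
    """(CompanyNumber, YYYY-MM-DD made-up date) pairs present in the bulk ZIPs."""
    keys = set()
    for name in names:
        match = BULK_NAME.match(name.rsplit("/", 1)[-1])  # ignore any folder prefix
        if match:
            company, ymd = match.groups()
            keys.add((company.upper(), f"{ymd[:4]}-{ymd[4:6]}-{ymd[6:]}"))
    return keys


def missing_accounts(index_rows: Iterable[Dict[str, str]], bulk_names: Iterable[str]) -> List[str]:
    """TransactionIds of accounts filings in the index with no matching bulk file."""
    covered = covered_by_bulk(bulk_names)
    return [
        row["TransactionId"]
        for row in index_rows
        if row["Category"] == "accounts"
        and (row["CompanyNumber"].upper(), row["MadeUpDate"]) not in covered
    ]


# Usage: the second row and the file name are made-up placeholders.
index_rows = [
    {"TransactionId": "MzUxMjQ5MTExMmFkaXF6a2N4", "CompanyNumber": "00013882",
     "Category": "accounts", "MadeUpDate": "2025-12-31"},
    {"TransactionId": "TX-PLACEHOLDER", "CompanyNumber": "SC000001",
     "Category": "accounts", "MadeUpDate": "2025-06-30"},
]
names = ["Prod224_0001_00013882_20251231.html"]
print(missing_accounts(index_rows, names))  # ['TX-PLACEHOLDER']
```

Keying on the made-up date means an amended set of accounts (e.g. type `AAMD`) for the same period would look covered; a production version would need a tighter match, which is exactly what a `DocumentId` in a bulk metadata file would make possible.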
Is there any possibility of these being added to the Bulk Cloud Gateway/FTP in the near future? Or have we missed an existing internal “Prod” product that already covers this?
Any feedback from the CH team or other devs who have solved this at scale would be greatly appreciated.
Thanks!