Hi, I have been transforming and loading the officer and company bulk snapshots kindly provided by your bulk data team. While comparing my data with the data returned by the Companies House API, I noticed a few discrepancies that I would like to resolve.
The officer data comes from the Prod195_3925_XX_X_28032025020001 snapshot and the company data from BasicCompanyDataAsOneFile-2025-05-01.
- The officer record for person number 282852570001 is missing from company number 13017028 in the snapshot data. It does not appear anywhere in the original .dat file, and since he resigned in 2024, before the snapshot was taken, the record should be in the bulk snapshot rather than only in the streaming service. This is just one example from my test data, but it lowers my confidence that the snapshot contains all the same data as the CH API.
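A check like this can be automated by diffing appointment keys from both sources. A minimal sketch, assuming each appointment has already been reduced to a (company_number, person_number) pair of strings; the function name missing_from_snapshot is hypothetical:

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical key for one appointment: (company_number, person_number).
# Kept as strings so leading zeros in company numbers are preserved.
Appointment = Tuple[str, str]


def missing_from_snapshot(
    snapshot: Iterable[Appointment], api: Iterable[Appointment]
) -> Set[Appointment]:
    """Return appointments present in the API results but absent from the snapshot."""
    return set(api) - set(snapshot)


if __name__ == "__main__":
    # Illustrative input only: the snapshot side has no row for this
    # appointment, mirroring the missing record described above.
    snapshot_rows: List[Appointment] = []
    api_rows: List[Appointment] = [("13017028", "282852570001")]
    for company, person in sorted(missing_from_snapshot(snapshot_rows, api_rows)):
        print(f"{person} missing from {company} in snapshot")
```

The reverse difference, set(snapshot) - set(api), would surface records that exist only in the snapshot.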
- What logic is used to extract premises from address_line_1? For example, compare the snapshot data below with the API data below that:
{
"address_line_1": "The Basement Swan Buildings"
}
{
"address_line_1": "Swan Buildings",
"premises": "The Basement"
}
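In this example, joining the API's premises and address_line_1 with a space reproduces the snapshot's address_line_1 exactly. Until the extraction logic is confirmed, a comparison can tolerate the difference by comparing the joined form. A minimal sketch; the helper names are hypothetical, and it does not attempt to reproduce how premises is actually derived:

```python
def first_line(record: dict) -> str:
    """Join premises (if present) and address_line_1 into a single string."""
    parts = (record.get("premises"), record.get("address_line_1"))
    return " ".join(p.strip() for p in parts if p and p.strip())


def same_first_line(snapshot: dict, api: dict) -> bool:
    """Compare joined first lines case-insensitively, ignoring repeated whitespace."""

    def norm(text: str) -> str:
        return " ".join(text.lower().split())

    return norm(first_line(snapshot)) == norm(first_line(api))


snapshot = {"address_line_1": "The Basement Swan Buildings"}
api = {"address_line_1": "Swan Buildings", "premises": "The Basement"}
print(same_first_line(snapshot, api))  # True
```

This deliberately treats the two records as matching even though the split differs, so it would also hide a genuine disagreement about where premises ends.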
- How can we map company_category/type effectively? The documentation recommends using the keys at
https://github.com/companieshouse/api-enumerations/blob/master/constants.yml#L54-L93
but many company_category values in the CSV do not directly match any of these keys. A few we cannot currently map:
- PRI/LTD BY GUAR/NSC (Private, limited by guarantee, no share capital)
- Community Interest Company
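One way to keep such a mapping maintainable is to match CSV labels against the descriptions in constants.yml after normalising case and punctuation, and to fall back to a hand-maintained override table for labels with no textual match, such as the two above. A minimal sketch, assuming the key-to-description pairs have already been loaded from constants.yml (for example with a YAML parser); build_category_mapper and the override table are hypothetical, and any override values would need to be confirmed with Companies House:

```python
import re
from typing import Callable, Dict, Optional


def normalise(label: str) -> str:
    """Lower-case and collapse punctuation and whitespace to single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", label.lower()).split())


def build_category_mapper(
    descriptions: Dict[str, str], overrides: Dict[str, str]
) -> Callable[[str], Optional[str]]:
    """Build a function mapping a CSV company_category label to a company_type key.

    descriptions: key -> description, e.g. loaded from constants.yml.
    overrides: exact CSV label -> key, for labels with no textual match.
    """
    by_text = {normalise(text): key for key, text in descriptions.items()}

    def map_category(csv_label: str) -> Optional[str]:
        if csv_label in overrides:
            return overrides[csv_label]
        # Try the whole label, then any bracketed expansion such as
        # "(Private, limited by guarantee, no share capital)".
        candidates = [csv_label, *re.findall(r"\(([^)]*)\)", csv_label)]
        for candidate in candidates:
            key = by_text.get(normalise(candidate))
            if key is not None:
                return key
        return None  # unmapped: report it and extend `overrides`

    return map_category


# Illustrative input only; real descriptions come from constants.yml.
mapper = build_category_mapper({"ltd": "Private limited company"}, overrides={})
print(mapper("Private Limited Company"))     # ltd
print(mapper("Community Interest Company"))  # None
```

Returning None rather than guessing keeps unmapped categories visible, so each one can be resolved explicitly instead of being silently misclassified.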
Thank you in advance,
Harry