Discrepancies between bulk snapshot and CH API

Hi, I have been transforming and loading the officer and company bulk snapshot kindly provided by your bulk data team. I have been comparing my data to the data returned by the Companies House API and noticed a few discrepancies I would like to resolve.

The officer data is from the Prod195_3925_XX_X_28032025020001 snapshot and the company data is from BasicCompanyDataAsOneFile-2025-05-01.

  1. The officer record for person number 282852570001 is missing from company number 13017028 in the snapshot data. It is nowhere in the original .dat file, and since he resigned in 2024 it should be in the bulk snapshot rather than the streaming service. This is just one example I noticed in my test data, but it lowers my confidence that the snapshot provides all the same data as the CH API.

  2. What is the logic used to extract premises from address_line_1? For example, see the data from the snapshot below, followed by the data from the API:

{
   "address_line_1": "The Basement Swan Buildings"
}

{
   "address_line_1": "Swan Buildings",
   "premises": "The Basement"
}
  3. How can we map company_category/type effectively? The documentation recommends https://github.com/companieshouse/api-enumerations/blob/master/constants.yml#L54-L93 but there are many company_category values in the CSV that do not directly match any of these keys. A few we cannot currently map:
  • PRI/LTD BY GUAR/NSC (Private, limited by guarantee, no share capital)
  • Community Interest Company

Thank you in advance,
Harry

Not from Companies House and I can only possibly help you with the last part (3), but:

A quick way to check things such as Company Types / Status is to use the Advanced Search (which you can do on the CH website), and/or to find an example and request its Company Profile via the Public Data API using the Company Number.

So, for example, a quick search for CICs (limiting this to active ones) turns up company 15716837.
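In case it helps, here is a minimal sketch of how that profile request could be built in Python. It assumes the Public Data API lives at https://api.company-information.service.gov.uk and that the API key is sent as the HTTP Basic username with an empty password; `build_profile_request` and `YOUR_API_KEY` are just illustrative names, not anything official.

```python
import base64
import urllib.request

# Assumed base URL of the Companies House Public Data API.
API_BASE = "https://api.company-information.service.gov.uk"


def build_profile_request(company_number: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a company profile."""
    # Assumption: the API key is the Basic auth username, with an empty password.
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return urllib.request.Request(
        f"{API_BASE}/company/{company_number}",
        headers={"Authorization": f"Basic {token}"},
    )


# Usage (needs a real key and network access):
# import json
# req = build_profile_request("15716837", "YOUR_API_KEY")
# with urllib.request.urlopen(req) as resp:
#     profile = json.load(resp)
#     print(profile["type"], profile.get("subtype"))
```

The request is built separately from being sent so you can inspect the URL and headers before spending any of your rate limit.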

The Company Profile is (most information snipped for clarity):

{
    "company_number": "15716837",
    "company_status": "active",
    "is_community_interest_company": true,
    "subtype": "community-interest-company",
    "type": "private-limited-guarant-nsc",
    ...
}

So - you can see that these are actually marked in the API data by a boolean and the “subtype” field. (The fields are mentioned in the Company Profile documentation, though that won’t tell you exactly what to expect from which type of entities…)

You’ll also note that this does have a type of “Private, limited by guarantee, no share capital”.

You can run the Advanced search just for (active) “private/ltd/nsc” companies too (these are categorised as two “types”).
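As a rough sketch of how you might handle this in your loader, the lookup below maps the two CSV categories from your question onto the API fields seen in the profile above. `CATEGORY_MAP` and `map_category` are hypothetical names, and only the two entries evidenced in this thread are filled in. I have left the CIC "type" empty on the assumption that other CICs may not share the type of the one example above, so the category alone would not tell you.

```python
from typing import Dict, Optional

# Hypothetical lookup from CSV company_category (casefolded) to API fields.
# Only the values seen in this thread are included; extend as you verify more.
CATEGORY_MAP: Dict[str, Dict[str, Optional[str]]] = {
    "pri/ltd by guar/nsc (private, limited by guarantee, no share capital)": {
        "type": "private-limited-guarant-nsc",
        "subtype": None,
    },
    # Assumption: CIC is expressed as a subtype, and the underlying type
    # cannot be inferred from the CSV category alone.
    "community interest company": {
        "type": None,
        "subtype": "community-interest-company",
    },
}


def map_category(category: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the API type/subtype for a CSV category, or None if unmapped."""
    # Collapse repeated whitespace and ignore case before looking up.
    key = " ".join(category.split()).casefold()
    return CATEGORY_MAP.get(key)
```

Returning None for unknown categories makes it easy to log the ones you still need to check by hand via the Advanced Search.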

Hope this helps (a little).

With regard to point 1, Prod195 only contains active officers. If you want historical appointments (resigned officers), you’ll need Prod216.
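If you are comparing against the API, one option is to drop resigned officers from the API side before checking it against the active-only snapshot. This is only a sketch: it assumes each item in the API officer list carries a `resigned_on` field once the officer has resigned, and `split_officers` and the sample records are made up for illustration.

```python
from typing import Any, Dict, Iterable, List, Tuple

Officer = Dict[str, Any]


def split_officers(items: Iterable[Officer]) -> Tuple[List[Officer], List[Officer]]:
    """Split API officer items into (active, resigned) lists."""
    active: List[Officer] = []
    resigned: List[Officer] = []
    for item in items:
        # Assumption: a non-empty "resigned_on" marks a resigned officer.
        if item.get("resigned_on"):
            resigned.append(item)
        else:
            active.append(item)
    return active, resigned


# Made-up sample records, not real Companies House data.
sample = [
    {"name": "OFFICER A", "appointed_on": "2020-01-01"},
    {"name": "OFFICER B", "appointed_on": "2020-01-01", "resigned_on": "2024-06-30"},
]
active, resigned = split_officers(sample)
print(len(active), len(resigned))
```

Only the active list would then be expected to line up with the active-only snapshot.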