Format of Officer/PWSC names

Hi,

We are looking to consume the Officers (list) and Persons with Significant Control (list) endpoints and we have been using the online public OpenApi specifications as guidance.

  • GET /company/{company_number}/persons-with-significant-control
  • GET /company/{company_number}/officers

However, we’re unable to find any information on which characters we can expect back in individuals’ names (e.g. the “names” or “name_elements.*” properties). Is there a specific character set or regex that defines which characters we can expect?

Thanks

(I’m not from Companies House - just another API user).

Short answer: probably the best thing you can do is (request and) download the bulk data and process it yourself to see what is there. (For the character set, see below.)
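A minimal sketch of that kind of analysis, assuming you have already extracted the name strings from the bulk file into a list. The function names and the sample names are made up for illustration; how you get the names out of the bulk data is not shown.

```python
import unicodedata
from collections import Counter


def character_inventory(names):
    """Count every character seen across an iterable of name strings."""
    counts = Counter()
    for name in names:
        counts.update(name)  # a str is iterated character by character
    return counts


def describe(counts):
    """Return (char, code point, Unicode category, Unicode name, count) rows, rarest first."""
    rows = []
    for ch, n in sorted(counts.items(), key=lambda item: item[1]):
        rows.append((
            ch,
            f"U+{ord(ch):04X}",
            unicodedata.category(ch),
            unicodedata.name(ch, "<unnamed>"),  # control characters have no name
            n,
        ))
    return rows


# Made-up sample names, NOT taken from Companies House data.
sample = ["O'BRIEN, Se\u00e1n", "M\u00dcLLER-SMITH, Anna", "DE LA CRUZ, Jos\u00e9"]

for row in describe(character_inventory(sample)):
    print(row)
```

Listing the rarest characters first puts the surprises (stray control characters, unusual punctuation, and so on) at the top, which is usually what you want when deciding on a validation rule.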

In theory this would be spelled out in the JSON schema (the “definitions”) for the REST API, which you can obtain here:

https://developer-specs.company-information.service.gov.uk/api.ch.gov.uk-specifications/swagger-2.0/spec/swagger.json

Caveat - you have to adjust the paths in the links there yourself, because (for a very long time now) the URIs inside the file have pointed at a “localhost” host rather than the public one.
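One way to do that adjustment in bulk, as a sketch only: I am assuming the broken links are absolute http(s) URIs whose host is localhost, and that swapping in the public host while keeping the path is enough. The exact shape of the links in the real file may differ, and the sample input below (including its port number) is invented.

```python
from urllib.parse import urlsplit, urlunsplit

# Host taken from the spec URL above.
PUBLIC_HOST = "developer-specs.company-information.service.gov.uk"


def fix_ref(value):
    """Point a localhost URI at the public host; return anything else unchanged."""
    if not value.startswith(("http://", "https://")):
        return value
    parts = urlsplit(value)
    if parts.hostname in ("localhost", "127.0.0.1"):  # assumed form of the broken links
        return urlunsplit(("https", PUBLIC_HOST, parts.path, parts.query, parts.fragment))
    return value


def rewrite_refs(node):
    """Walk a parsed JSON document and fix every string value in it."""
    if isinstance(node, dict):
        return {key: rewrite_refs(val) for key, val in node.items()}
    if isinstance(node, list):
        return [rewrite_refs(val) for val in node]
    if isinstance(node, str):
        return fix_ref(node)
    return node


# Invented sample; the real swagger.json may look different.
sample = {"$ref": "http://localhost:1234/some/spec.json#/definitions/example"}
print(rewrite_refs(sample))
```

You would run `rewrite_refs` over the result of `json.load` on the downloaded swagger.json before feeding it to any tooling.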

Here’s an example of the definitions for the officer appointments:

https://developer-specs.company-information.service.gov.uk/api.ch.gov.uk-specifications/swagger-2.0/spec/officerAppointmentList.json

As you can see - this only says “you get a string”. So … not very helpful!

Can you do better? Sometimes, by looking at the XML definitions for the companion Companies House XML Gateway. These presumably reflect the same underlying dataset (though they are not guaranteed to be the same…). The Gateway also has schemas (XML schemas), and some of those may be more strictly defined.

These are available at:

https://xmlgw.companieshouse.gov.uk/v1-0/xmlgw/SchemaStatus

Here’s the current one for PSCs - PSCBaseTypes-v1-4.xsd (note: these change from time to time so you should always start from the link above and find the current one): https://xmlgw.companieshouse.gov.uk/v1-0/schema/PSCBaseTypes-v1-4.xsd

It looks like there may not be a firmer definition here though. There are limits for things like company names - see the base types (currently baseTypes-v3-7.xsd): https://xmlgw.companieshouse.gov.uk/v1-0/schema/baseTypes-v3-7.xsd
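If you want to check an XSD for such limits without reading it by eye, something like the following sketch lists the length and pattern facets declared on its simple types. The inline schema is a made-up example, not copied from the Companies House files, and the function name is my own.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
FACETS = ("pattern", "length", "minLength", "maxLength")

# Hypothetical schema fragment for illustration only.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="ExampleNameType">
    <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:maxLength value="160"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""


def list_string_facets(xsd_text):
    """Return (type name, {facet: [values]}) for each simpleType with a restriction."""
    root = ET.fromstring(xsd_text)
    found = []
    for simple in root.iter(f"{XS}simpleType"):
        restriction = simple.find(f"{XS}restriction")
        if restriction is None:
            continue
        facets = {}
        for facet in restriction:
            tag = facet.tag.replace(XS, "")
            if tag in FACETS:
                facets.setdefault(tag, []).append(facet.get("value"))
        if facets:
            found.append((simple.get("name", "<anonymous>"), facets))
    return found


for type_name, facets in list_string_facets(SAMPLE_XSD):
    print(type_name, facets)
```

A type that shows only length facets and no `pattern` tells you the schema constrains size but not which characters are allowed.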

Character set: well, since the responses are in JSON you should be getting the data in the UTF-8 character set. I know that’s not really what you’re asking…
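For what it’s worth, here is a small sketch of why the transport encoding tells you little about the names themselves: a JSON body may carry a non-ASCII character either as raw UTF-8 bytes or as a `\uXXXX` escape, and both decode to the same string. The payloads are invented, not real API responses.

```python
import json

# The same made-up name, once as raw UTF-8 bytes and once with JSON \u escapes.
raw_utf8 = '{"name": "M\u00dcLLER, Se\u00e1n"}'.encode("utf-8")
escaped = b'{"name": "M\\u00dcLLER, Se\\u00e1n"}'

decoded_raw = json.loads(raw_utf8)      # json.loads accepts UTF-8 bytes directly
decoded_escaped = json.loads(escaped)

print(decoded_raw == decoded_escaped)
print([f"U+{ord(ch):04X}" for ch in decoded_raw["name"] if ord(ch) > 127])
```

Either way you end up with arbitrary Unicode strings, so any restriction on name characters has to come from the data or the schema, not from the encoding.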

Good luck.


Thanks @voracityemail , really appreciate the thorough write-up. Going through the raw OpenApi schemas and the XML definitions is also where I ended up, but as you mentioned, there didn’t seem to be any evidence of which special characters to expect beyond it just being a string.

Yes, you’re correct to assume UTF-8 for a JSON API; I misspoke there :)

I guess unless someone from CH confirms otherwise I will take your advice and analyse a bulk dataset to get an idea of what to expect.

Thanks again!