(I’m not from Companies House - just another API user).
Short answer: probably the best thing you can do is (request and) download the bulk data and process it yourself to see what is actually there. (For the character set, see the end of this post.)
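As a rough illustration of "process it yourself", here is a minimal Python sketch that reports the longest value seen in each column and which characters occur. It assumes the file you have is a CSV with a header row; the column names and rows in the sample are made up, and `profile_csv` is just a hypothetical helper name, so adjust it to whatever format your bulk product actually comes in.

```python
import csv
import io
from collections import Counter


def profile_csv(text_stream):
    """Return (max_len_by_column, char_counts) for a CSV stream with a header row."""
    reader = csv.DictReader(text_stream)
    max_len = {}
    chars = Counter()
    for row in reader:
        for column, value in row.items():
            if column is None:
                continue  # surplus fields beyond the header; ignored in this sketch
            value = value or ""  # short rows yield None for missing fields
            max_len[column] = max(max_len.get(column, 0), len(value))
            chars.update(value)
    return max_len, chars


# Made-up sample data standing in for a real bulk file (ASSUMPTION: CSV layout).
sample = io.StringIO(
    "CompanyName,CompanyNumber\n"
    "EXAMPLE LTD,00000001\n"
    "CAFÉ & CO LIMITED,00000002\n"
)
max_len, chars = profile_csv(sample)
non_ascii = sorted(c for c in chars if ord(c) > 127)
print(max_len)
print(non_ascii)
```

For a real file you would pass something like `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.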
In theory this would be spelled out in the JSON schema (the “definitions”) for the REST API, which is available here:
https://developer-specs.company-information.service.gov.uk/api.ch.gov.uk-specifications/swagger-2.0/spec/swagger.json
Caveat: you have to adjust the paths in the links there yourself, because (for a very long time now) the published spec has used URIs that point at “localhost” rather than the public host.
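If you would rather not fix those links by hand, something like the following Python sketch could do it once the spec is loaded. The `LOCAL_PREFIX` value is an assumption (I have not reproduced the exact form used in the file), the sample document is made up, and `rewrite_refs` is a hypothetical helper, so check the real swagger.json and adjust the prefixes first.

```python
import json

# ASSUMPTION: hypothetical prefix; inspect swagger.json for the one actually used.
LOCAL_PREFIX = "http://localhost/"
PUBLIC_PREFIX = "https://developer-specs.company-information.service.gov.uk/"


def rewrite_refs(node, old=LOCAL_PREFIX, new=PUBLIC_PREFIX):
    """Return a copy of parsed JSON with every string starting with `old` re-based onto `new`."""
    if isinstance(node, dict):
        return {key: rewrite_refs(value, old, new) for key, value in node.items()}
    if isinstance(node, list):
        return [rewrite_refs(item, old, new) for item in node]
    if isinstance(node, str) and node.startswith(old):
        return new + node[len(old):]
    return node


# Made-up fragment standing in for the downloaded spec.
spec = json.loads(
    '{"paths": {"/example": {"$ref": '
    '"http://localhost/api.ch.gov.uk-specifications/swagger-2.0/spec/'
    'officerAppointmentList.json#/example"}}}'
)
fixed = rewrite_refs(spec)
print(fixed["paths"]["/example"]["$ref"])
```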
Here’s an example of the definitions for the officer appointments:
https://developer-specs.company-information.service.gov.uk/api.ch.gov.uk-specifications/swagger-2.0/spec/officerAppointmentList.json
As you can see, this only says “you get a string”. So … not very helpful!
Can you do better? Sometimes, by looking at the XML schema definitions for the companion Companies House XML Gateway. That presumably reflects the same underlying dataset (although this is not guaranteed), and some of its XML schemas may be more strictly defined.
These are available at:
https://xmlgw.companieshouse.gov.uk/v1-0/xmlgw/SchemaStatus
Here’s the current one for PSCs - PSCBaseTypes-v1-4.xsd (note: these change from time to time so you should always start from the link above and find the current one): https://xmlgw.companieshouse.gov.uk/v1-0/schema/PSCBaseTypes-v1-4.xsd
It looks like there may not be a firmer definition there either, though. There are limits for things like company names; see the base types (currently baseTypes-v3-7.xsd): https://xmlgw.companieshouse.gov.uk/v1-0/schema/baseTypes-v3-7.xsd
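To see quickly which types in one of those XSD files carry a length limit, a Python sketch along these lines may help. It only looks at named `simpleType` definitions with a `maxLength` facet; the sample schema, its type names and the value 160 are made up for illustration and are not taken from the Companies House files, and `max_lengths` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def max_lengths(xsd_text):
    """Map each named simpleType to its maxLength facet, where one is declared."""
    root = ET.fromstring(xsd_text)
    limits = {}
    for simple_type in root.iter(XS + "simpleType"):
        name = simple_type.get("name")
        if name is None:
            continue  # anonymous inline type; skipped in this sketch
        for facet in simple_type.iter(XS + "maxLength"):
            limits[name] = int(facet.get("value"))
    return limits


# Made-up schema standing in for a downloaded XSD (names and limits are hypothetical).
sample_xsd = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="ExampleNameType">
    <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:maxLength value="160"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="ExampleFreeText">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
</xs:schema>"""
limits = max_lengths(sample_xsd)
print(limits)
```

Types that come back without an entry are the “you get a string” cases, where the schema gives no upper bound.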
Character set: since the responses are JSON, you should be getting the data encoded as UTF-8. I know that’s not really what you’re asking…
Good luck.