00365818 - companies search has "date_of_cessation": “Unknown”, company profile just doesn’t have this field. Company “status”: “dissolved”.
NF002764 - companies search: “date_of_cessation”: “Unknown”, company profile doesn’t have this field. “company_status”: “converted-closed”
SZ000001 - companies search has “date_of_creation”: null, company profile has “date_of_creation”: “”. Company “status”: “active”.
Related issues have been noted before. IIRC we’ve also found epoch-format values in date fields (sorry, no example recorded).
P.S. What about dates inside “description_values”: { “description” } in “legacy” filing-history entries? Presumably these won’t be marked up and will stay as plain data? Example from company 00229606 (some fields removed for clarity):
{
  "description": "legacy",
  "links": {
    "self": "/company/00229606/filing-history/MDAxMjU4NjA0NGFkaXF6a2N4"
  },
  "description_values": {
    "description": "Return made up to 25/07/05; full list of members"
  },
  "type": "363s",
  "date": "2005-08-18",
  "category": "annual-return",
  "transaction_id": "MDAxMjU4NjA0NGFkaXF6a2N4"
}
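If those legacy descriptions do stay as plain data, the dates in them can still be pulled out on the consumer side. A minimal sketch in Python, assuming the dates appear as DD/MM/YY tokens like the one above; the function name is hypothetical, and two-digit years are resolved by Python’s `%y` rule (00-68 become 20xx, 69-99 become 19xx), which will guess wrong for old enough filings:

```python
import re
from datetime import date, datetime

# A DD/MM/YY token such as "25/07/05" embedded in free text.
_DDMMYY = re.compile(r"\b\d{2}/\d{2}/\d{2}\b")

def dates_in_legacy_description(text: str) -> "list[date]":
    """Return every DD/MM/YY date found in a legacy description string.

    Two-digit years follow strptime's %y rule (00-68 -> 20xx, 69-99 -> 19xx),
    which is a guess and will be wrong for old enough filings.
    """
    found = []
    for token in _DDMMYY.findall(text):
        try:
            found.append(datetime.strptime(token, "%d/%m/%y").date())
        except ValueError:
            pass  # looks like a date but isn't one, e.g. "31/02/05"
    return found

print(dates_in_legacy_description("Return made up to 25/07/05; full list of members"))
# [datetime.date(2005, 7, 25)]
```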
As an external consumer and refiner of Companies House data I am far from clear what you are trying to achieve and, more importantly, what you expect their systems to supply you with.
Taking 00365818 as the example, this company was dissolved; the last accounts, which were of DORMANT type, covered the accounting period to 2 April 1994. Interpretation of the data suggests it was dissolved in the succeeding 2 or 3 years by Compulsory Strike-off, so it effectively moved to the Companies Closed Register a good 20 years ago.
If you were permitted to use our system you would not, as a user, be allowed to see the closed archive; I think that we are very fortunate to be able to see anything at all like this on the open, ACTIVE, public system.
This is an issue with the quality and consistency of data on a beta service that we’re testing. What someone is trying to achieve by reporting this issue seems fairly self-evident. Given that Companies House also consumes and refines their own API through the beta search site, raising awareness of these issues benefits them as well.
Taking 00365818 as an example…
The fact that this company has an “Unknown” date_of_cessation in the company search results means that the beta search site doesn’t know how to deal with it, and ends up with this:
Which is not ideal in terms of presentation and something I’d be surprised if they didn’t want to fix at some point.
It’s also something I wasn’t aware could happen, and might never have known about if I hadn’t read this post, so now I can go and check how the systems I’m responsible for that consume and refine Companies House data deal with it.
This could be relevant in some sectors (legal being the one we’re most likely to deal with, but there may be others), but from my (developer) perspective it’s not about relevancy and availability, but the ability to create an interface that returns…
00365818 - Dissolution date unknown
instead of
00365818 - Dissolved on
or even just crashing, when it encounters something other than the expected data. In the documentation, date_of_cessation and date_of_creation are both listed as optional date fields in search and profile. The fact that these fields can also be null, or the non-date string “Unknown”, is an issue whether or not the data itself is relevant.
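A minimal sketch of that defensive display logic in Python. It assumes each search or profile item is a decoded JSON dict with `company_number` and an optional `date_of_cessation`; the function name is hypothetical, and anything that isn’t an ISO YYYY-MM-DD string is treated as unknown rather than allowed to break the output:

```python
from datetime import date

def dissolution_label(item: dict) -> str:
    """Build a display line for a dissolved company.

    `item` is a decoded search/profile JSON object. date_of_cessation may be
    absent, null, "" or a non-date string such as "Unknown"; any value that
    doesn't parse as an ISO date is reported as unknown instead of raising.
    """
    number = item.get("company_number", "?")
    raw = item.get("date_of_cessation")
    when = None
    if isinstance(raw, str):
        try:
            when = date.fromisoformat(raw)
        except ValueError:
            pass  # e.g. "Unknown" or ""
    if when is None:
        return f"{number} - Dissolution date unknown"
    return f"{number} - Dissolved on {when.isoformat()}"

print(dissolution_label({"company_number": "00365818", "date_of_cessation": "Unknown"}))
# 00365818 - Dissolution date unknown
```

The same result comes back when the field is missing entirely or null, so the interface never renders a dangling “Dissolved on”.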
OK - I can see the potential relevance to the legal sector but probably in a more recent past than twenty years ago. I may also have to change my views on relevance as well; some around me would say that was a good thing!
I totally agree with you that if the data is available, the interface should actually return that data and not “Unknown” or null. Perhaps the issue here will actually prove to be one of availability within what was loaded.
I even more totally agree that the interface must not crash on encountering the unexpected.
We are on the same side really.
It will be interesting to see a response on this topic from the Companies House team!
Thanks for the replies. My perspective was as a developer wanting to share what I’d found as much as hoping for definitive answers. I’ve learned quite a bit from reading around here. I expect changes here and appreciate that such a large and long-lived dataset will have exceptions. It’s because CH do seem to be helpful and responsive that I’m asking whether there are “known unknowns”.
By the by, I do work on the legal side, so “old data” can sometimes be relevant, although the particular data here is just by way of example.
…for a more recent date example, this time in Unix epoch (milliseconds), filing history, company #03888792, again some fields removed:
{
  "associated_filings": [
    {
      "action_date": 1447113600000,
      "category": "capital",
      "date": "2015-11-10",
      "description": "statement-of-capital",
      "description_values": {
        …
        "date": "2015-11-10"
      },
      "original_description": "10/11/15 Statement of Capital;GBP 1339546478.75",
      "type": "SH01"
    }
  ],
  "type": "AR01",
  "description": "annual-return-company-with-made-up-date-no-member-list",
  "date": "2015-11-10",
  "category": "annual-return",
  "barcode": "A4IG6XHU",
  "transaction_id": "MzEzNDg4NjczOGFkaXF6a2N4"
  …
}
Yes, here we get the date correctly in description_values. But does this represent a bug, or will dates be “mostly ISO format, occasionally ‘Unknown’ or null, and rarely epoch”?
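One way to gather evidence on that question is to tally the shapes a date field actually takes across the records you receive. A rough sketch in Python; the category labels and function names are my own, the ISO check is on shape only, and treating integers or digit-only strings as epoch milliseconds is an assumption based on the `action_date` example above:

```python
import re
from collections import Counter

# Shape check only: "2015-13-45" would still count as "iso".
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

# Sentinel so a missing key can be told apart from an explicit null.
_MISSING = object()

def classify_date_value(value=_MISSING) -> str:
    """Label the shape of a single date-field value."""
    if value is _MISSING:
        return "missing"
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool is a subclass of int; not a date
        return "other"
    if isinstance(value, int):
        return "epoch_ms"  # assumption: integers are epoch milliseconds
    if isinstance(value, str):
        if value == "":
            return "empty"
        if ISO_DATE.fullmatch(value):
            return "iso"
        if value.isdigit():
            return "epoch_ms"
        if value.strip().lower() == "unknown":
            return "unknown"
    return "other"

def tally(records, field):
    """Count each value shape for `field` across a list of decoded JSON dicts."""
    return Counter(classify_date_value(r.get(field, _MISSING)) for r in records)
```

Running `tally` over a batch of search results for, say, `date_of_cessation` gives a rough picture of how often each shape turns up, without needing an answer from CH first.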
I have just come across this problem when searching for ‘Axa’. Some of the records returned have date_of_cessation set to ‘Unknown’.
Since the date format used by Companies House is of the form “2016-12-13” and we want to display it as ‘13-Dec-2016’, these ‘Unknown’ values make the conversion very difficult for us. I’m not sure that even checking the first character is numeric would do the trick, since one can apparently also get dates in the format ‘1405327611000’.
Would checking that the first two characters are ‘18’, ‘19’ or ‘20’ do it? How far back do Companies House records go?
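Rather than guessing the format from the first couple of characters, it may be simpler to try each format you expect explicitly and treat anything that fails as “no date”. A sketch in Python, assuming values are either YYYY-MM-DD strings or epoch milliseconds (as an int or a digit-only string) and that epoch values are UTC; `to_display` is a hypothetical name. Note that `%b` follows the current locale, which gives English month abbreviations under Python’s default C locale:

```python
from datetime import datetime, timezone

def to_display(value):
    """Format a date value as DD-Mon-YYYY, or return None if it isn't a date.

    Accepts "YYYY-MM-DD" strings and epoch milliseconds (int or digit-only
    string, assumed UTC). Anything else, e.g. "Unknown", "" or None, gives None.
    """
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, str) and not value.isdigit():
        try:
            parsed = datetime.strptime(value, "%Y-%m-%d")
        except ValueError:
            return None
    else:
        try:
            parsed = datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)
        except (TypeError, ValueError, OverflowError, OSError):
            return None
    # %b is locale-dependent; Python's default C locale gives "Dec", "Jul", ...
    return parsed.strftime("%d-%b-%Y")
```

With that, `to_display("2016-12-13")` gives “13-Dec-2016” and `to_display("1405327611000")` gives “14-Jul-2014”, while “Unknown” and null both come back as None for the caller to render however it likes. It also sidesteps the question of how far back the records go, since no year range is assumed.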
CH seems to have gone a bit quiet on responding here, whether on the documentation front or on data issues. I suspect this will be the case until the Swagger-compatible docs are available. However, they clearly do take note, and their documentation is quietly being updated over time.
As noted, I’ve found it’s wise to trap the following:
- Values of “unknown” and “null” for non-text fields.
- Values of “null” for objects, e.g. address / registered_office_address.
- Integers possibly starting with 0, e.g. month and year in date_of_birth.
- Dates as YYYY-MM-DD or epoch format (and, if you want to poke inside fields such as filing history, good old DD/MM/YY put in by humans).
- Undocumented constant values in lookup fields.
- Unusual-looking markdown in e.g. filing history description constants: “[ < link text > ]” for links.
- UK company registration numbers missing leading / internal zeros (for corporate officers / PSCs).
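A couple of the items above can be trapped with small helpers. A sketch in Python; the function names are hypothetical, and the padding rule assumes registration numbers are eight characters, either eight digits or a two-letter prefix plus six digits (as with 00365818 or NF002764). Numbers in any other shape come back as None rather than being guessed at:

```python
import re

# Assumption: an optional two-letter prefix followed by the numeric part.
_NUMBER = re.compile(r"([A-Z]{2})?(\d{1,8})")

def normalise_company_number(raw):
    """Pad a company number that has lost its leading/internal zeros.

    "365818" -> "00365818", "NF2764" -> "NF002764". Returns None if the value
    doesn't fit the assumed 8-character shape at all.
    """
    if raw is None:
        return None
    text = str(raw).strip().upper().replace(" ", "")
    match = _NUMBER.fullmatch(text)
    if not match:
        return None
    prefix, digits = match.group(1) or "", match.group(2)
    width = 8 - len(prefix)
    if len(digits) > width:
        return None
    return prefix + digits.zfill(width)

def registered_address(profile: dict) -> dict:
    """Return registered_office_address, treating a missing or null object as empty."""
    return profile.get("registered_office_address") or {}
```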
I think the policy is one of “just recording what we get on the form”. (In a perfect world some of these might be validated / corrected where it’s “obvious” what is meant… some things, e.g. dates and filings, do get put into formats…)
Where (presumably) underlying data fields are missing / unknown this may affect several fields. An example from Companies search: NF003350 (admittedly this looks like a rather odd old entry).
“date_of_cessation” is “Unknown” and the description is incomplete also:
By chance, I encountered another “Closed on …” company earlier today, also an NF company: NF002699
There’s an enumeration value for a status of “closed”, as opposed to “closed-on”, which does not require a date. That suggests to me that when the date is unknown, the company should have been given that status instead.
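On the consumer side, a small check can flag records whose status implies a date that isn’t there, and fall back to the undated “closed” status as suggested above. A sketch in Python; the set of “dated” statuses is my own assumption drawn from this thread, and the caller passes in whichever field actually carries the status:

```python
import re

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

# Assumption drawn from this thread: statuses that should come with a date.
DATED_STATUSES = {"closed-on", "dissolved"}

def has_usable_date(value) -> bool:
    """True only for a YYYY-MM-DD string; "Unknown", "", None, etc. are not."""
    return isinstance(value, str) and ISO_DATE.fullmatch(value) is not None

def needs_review(status, date_of_cessation) -> bool:
    """Flag a record whose status implies a cessation date that isn't there."""
    return status in DATED_STATUSES and not has_usable_date(date_of_cessation)

def suggested_status(status, date_of_cessation):
    """Fall back from "closed-on" to the undated "closed" when no date is known."""
    if status == "closed-on" and not has_usable_date(date_of_cessation):
        return "closed"
    return status
```

Collecting the company numbers for which `needs_review` is true would also give a ready-made list to report back here.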
I’ve been a little frustrated with this stuff today. As we don’t have access to the underlying data or the logic that CH employs to consume the API on the beta search site, there ends up being a lot of trial and error and guesswork involved. It’s made even more frustrating when we don’t get a response to many of the documentation/data issues raised here. Please keep posting them though, they’re helpful to me at least!
It’s been a slightly odd experience working with this system - lack of documentation is annoying but then people have been very helpful (sometimes even CH although not with a direct response).
Roundabouts and swings, and maybe sometime we’ll have it all in Swagger… Good luck!
Late response, but are any of those with date_of_cessation set to ‘unknown’ AC / IC / BR companies? CH doesn’t hold info on those. However, from the data I’ve seen, I’d then not expect any date_of_cessation field to be returned…