Formal complaint: document API not fit for purpose

The document API is not fit for purpose. Allowing us to access a tiny fraction of data renders it useless! When data is stored in iXBRL format, meaning source filings on CH, we can access this data.

However, most filings are stored on AWS as PDFs. Yes, we can load and parse (read) the PDFs, except all the data says object: protected, rendering your API utterly useless.

There needs to be a link so that those with a valid CH API key do have READ-ONLY access, else make all filings available in iXBRL format.

I’m sorry that it fails to meet your expectations; however, only accounts are filed in iXBRL format. How would you suggest we display our paper filings on the document API?

I am referring to accounts. Not all accounts have iXBRL as well as PDF.
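Whether a given accounts filing has an iXBRL version can be checked before downloading anything. Below is a minimal sketch, assuming the document metadata returned by the document API lists the available content types under a `resources` key. The function name and the sample dictionaries are made up for illustration; check the shape against a real metadata response.

```python
IXBRL_TYPE = "application/xhtml+xml"  # content type assumed for iXBRL documents


def has_ixbrl(metadata: dict) -> bool:
    """Return True if the metadata lists an iXBRL (XHTML) version of the filing."""
    # Assumption: available formats are the keys of metadata["resources"].
    return IXBRL_TYPE in metadata.get("resources", {})


# Hypothetical metadata for two filings (not real API output).
pdf_only = {"resources": {"application/pdf": {"content_length": 120000}}}
pdf_and_ixbrl = {
    "resources": {
        "application/pdf": {"content_length": 98000},
        "application/xhtml+xml": {"content_length": 45000},
    }
}

for label, meta in (("pdf_only", pdf_only), ("pdf_and_ixbrl", pdf_and_ixbrl)):
    print(label, "has iXBRL:", has_ixbrl(meta))
```

A filing where this returns False is one of the PDF-only cases described above.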

I thought all paper filings had already been stopped. Do you mean that a PDF without iXBRL represents a firm that filed on paper, which will be phased out with the 2025 changes, so that in 2026 all companies will show iXBRL?

What else could/should you do? Provide APIs for specific data. What we want is 1) Equity/Shareholder Funds and 2) Number of staff/employees. That’s it. We are looking for very simple data.

Address should already segment out town and state. We had to write code to turn town into state. Why not provide website and reception phone, as these are not subject to GDPR? This helps confirm that it is the correct company, as some companies have similar names.

Hi Jenkins

There are open-source libraries available to convert the iXBRL and XBRL data CH makes available, but building a pipeline to extract this info from the millions of files requires quite a bit of engineering.
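For a single file and a couple of facts, the idea can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: the concept names `Equity` and `AverageNumberEmployeesDuringPeriod`, the `core:` prefix and the sample document are assumed for the example (older filings may use different names), numbers are assumed to use commas as thousands separators, and real filings may need a more forgiving parser than `xml.etree`.

```python
import xml.etree.ElementTree as ET

# Namespaces used by Inline XBRL 1.1 and 1.0 respectively.
IX_NAMESPACES = (
    "http://www.xbrl.org/2013/inlineXBRL",
    "http://www.xbrl.org/2008/inlineXBRL",
)


def extract_facts(xhtml: str, wanted: set[str]) -> list[dict]:
    """Collect numeric facts whose concept name (without prefix) is in `wanted`."""
    root = ET.fromstring(xhtml)
    facts = []
    for ns in IX_NAMESPACES:
        for el in root.iter(f"{{{ns}}}nonFraction"):
            local_name = el.get("name", "").split(":")[-1]
            if local_name not in wanted:
                continue
            # Assumption: commas are thousands separators in the displayed value.
            text = "".join(el.itertext()).strip().replace(",", "")
            if not text or text == "-":
                continue  # blank or dash placeholder, skipped in this sketch
            # "scale" is a power of ten applied to the displayed number.
            value = float(text) * 10 ** int(el.get("scale", "0"))
            if el.get("sign") == "-":
                value = -value
            facts.append(
                {"name": local_name, "context": el.get("contextRef"), "value": value}
            )
    return facts


# Made-up sample document, not a real filing.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL">
  <body>
    <p>Shareholders funds:
      <ix:nonFraction name="core:Equity" contextRef="c1" unitRef="GBP"
                      decimals="0" scale="3">1,250</ix:nonFraction>
    </p>
    <p>Average number of employees:
      <ix:nonFraction name="core:AverageNumberEmployeesDuringPeriod"
                      contextRef="c2" unitRef="pure" decimals="0">14</ix:nonFraction>
    </p>
  </body>
</html>"""

facts = extract_facts(SAMPLE, {"Equity", "AverageNumberEmployeesDuringPeriod"})
for fact in facts:
    print(fact)
```

The `contextRef` is kept because the same concept usually appears once per reporting period, so picking the current-year figure means resolving the context as well. None of this helps with PDF-only filings.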

I have built https://convert-ixbrl.co.uk, a UK companies search engine, making the filed company data available through a website and also through a programmable API. Currently 9 filters are available, which include net assets, turnover, cash and number of employees, but the next version, which makes 100+ data points searchable, is expected to go live this month. It is going through a final round of testing at the moment.

The website and API are currently free to use. At some point, if there is enough commercial interest, I will be considering offering a paid tier.

Thanks for sharing. There may well be demand. We are a software firm, but the data was for our own use. We are not going to be selling data to anyone. We already extracted some financial data from iXBRL format. But there are many cases to handle, depending on layout. We’re adding in a few cases and ignoring the rest - not worth our time. The main issue though is that many firms, and certainly most of the group accounts/largest firms, have filed in PDF only. Images are stored. So you’d need to use OCR to extract. Unless you also include OCR, I don’t see how you can monetise. I’m not sure how that will change after the new rules come into force. Ask CH directly.

(I am not part of Companies House, a lawyer, journalist or expert on UK Companies law either so the following is commentary, but may explain some things?)

Good luck with this one - what you probably need to be doing is lobbying for lots more money for Companies House and possibly changes in the law. (I don’t know how much latitude Companies House has on setting “process” - but that all requires developers / support staff).

First - if you’re using this REST API you’re effectively getting data (which some clearly consider saleable) for free. While Companies House have made some changes over time I think there is a measure of “get what you pay for” here? You can opt to pay for the service by using the XML Gateway. (We used to, though we haven’t for some time, and last time I checked it wasn’t a straight replacement). At that point your queries may get more of a listen - although I don’t know if that will change overall policy.

The main legal duty of Companies House (as I understand it) is as a registrar - to collect, hold and make available the information. That doesn’t cover “everything in a format that’s convenient for my use case”. Or even “checking all data is valid / sensible” apparently! Although they do seem to act on specific issues reported here.

It seems in the UK there is a general lack of emphasis put on Companies Law - or at least enforcing the mandatory reporting requirements. From some experience of this data set plus following the news it seems lots is allowed to slide. And there have been cases of gaming of the system / using this for opaque but probably dubious purposes.

With all that in mind perhaps it’s understandable - albeit frustrating - that it sometimes appears that this dataset only covers “the public can access whatever was submitted, somehow”.

Still a lot easier than having to call up companies in turn and request an inspection of their registers, or purchase filing documents hoping the one you’ve requested will contain what you’re looking for!

If things do become more easily accessible / APIs more regular / data more reliable, I will of course be grateful…

Hi “voracity”,
Apologies for the slow response. I didn’t think you’d still be chiming in ;-). CH needs more money after the recent £100m grab? Hmm. I’m sure this will cover quite a few developers. Confirmation statements +162% - well beyond any country’s inflation.

I agree of course, that CH’s primary duty is to act as Registrar and compared to some countries its data provision is already superior. I’m arguing based on principles. I detest rules in society where there is one rule for the top 1% and another rule for others; same for large corporates vs. the rest. Everyone should be made to file in the same format, so that their data is easily accessible.

Thanks for sharing the link and mentioning the XML Gateway. I was unaware. The API page seemed to suggest it was API only. We’ll take a look.

Update on XML, suggested by “voracity”. We reached out to the XML team directly at CH, as the forum and website did not provide clarity. We can subscribe for a small monthly fee (currently £4/month) but that only gives access to an image of the PDF. In our case, we already had access to download and read the image from AWS. The only solution is OCR to extract and read text. I sent CH some links for OCR in case they wish to integrate with their product.