StreamingAPI and Document_Id

Hi there,

I’m consuming the Companies House filingHistoryStream API.

From this I was hoping to extract some form of document_Id to ultimately download the associated document.

One of the fields returned from the filingHistoryStream is “document_metadata”, which has a URL of:

https://document-api.company-information.service.gov.uk/document/Q0Lbp7gVAnynCcIVXvLF9vS5D3uYpR-uCSl30sv6sS4

I assume the string at the end is the document ID - is this correct?

So, when I then hardcode (to test) the value from the end of that URL as an input to the Document API, I get a 500 response.

https://document-api.company-information.service.gov.uk/document/Q0Lbp7gVAnynCcIVXvLF9vS5D3uYpR-uCSl30sv6sS4/content

The above is what I’m using, with “application/pdf” in the Accept input as well as the API key etc…

I can’t quite work out where I’m going wrong with this

Assuming that the API key you have for the Streaming API will work with the Documents API (which I think is the case, but don’t know for sure)…

I don’t know what tool you’re using there but should you have quote marks round the document ID (rather than none)?

As usual I recommend trying this with e.g. curl if possible (that way you can step through the process and with the verbose flag see exactly what is sent / received each step).

This thread (and the one linked in my comment) should help.

I’ve shown requesting the metadata first - you should check you can get that. Using your link there, I get:

curl -u APIKEYHERE: https://document-api.company-information.service.gov.uk/document/Q0Lbp7gVAnynCcIVXvLF9vS5D3uYpR-uCSl30sv6sS4
{"company_number":"10073158","barcode":"XE423ACA","significant_date":null,"significant_date_type":"","category":"annual-returns","pages":3,"filename":"10073158_cs01_2025-06-05","created_at":"2025-06-05T11:28:02.012299615Z","etag":"","links":{"self":"https://document-api.company-information.service.gov.uk/document/Q0Lbp7gVAnynCcIVXvLF9vS5D3uYpR-uCSl30sv6sS4","document":"https://document-api.company-information.service.gov.uk/document/Q0Lbp7gVAnynCcIVXvLF9vS5D3uYpR-uCSl30sv6sS4/content"},"resources":{"application/pdf":{"content_length":85358}}}

Then requesting the document (depending on what is offered in the metadata you may be able to choose something other than PDF), the first response you’ll get back is from Companies House with a redirect:

curl -v -u APIKEYHERE: --header "Accept: application/pdf" https://document-api.company-information.service.gov.uk/document/Q0Lbp7gVAnynCcIVXvLF9vS5D3uYpR-uCSl30sv6sS4/content

(Just showing the headers returned):

HTTP/1.1 302 Found
Date: Thu, 05 Jun 2025 17:22:41 GMT
Location: https://s3.eu-west-2.amazonaws.com/document-api-images-live.ch.gov.uk/docs/Q0Lbp7gVAnynCcIVXvLF9vS5D3uYpR-uCSl30sv6sS4/application-pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...

Your tool is probably trying to follow that redirect and sending the Companies House authorisation header along with it. It shouldn’t do that (many do though…). curl handles this fine, but you may need to catch the redirect and then request the new URL without the API key header. It works for me without sending the Accept header to AWS either; that seems to be passed in the request URL.

So: no API key passed here!

curl "https://s3.eu-west-2.amazonaws.com/document-api-images-live.ch.gov.uk/docs/Q0Lbp7gVAnynCcIVXvLF9vS5D3uYpR-uCSl30sv6sS4/application-pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=..." > a_test.pdf

That gets me an 85kB file.
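The two steps above can be sketched as a small shell function: catch the redirect without following it, then fetch the signed URL with no credentials. This is only a sketch under assumptions: the function name `fetch_document` and its arguments are made up for illustration, and it relies on curl's `%{redirect_url}` write-out variable to read the Location target.

```shell
#!/bin/sh
# Hypothetical helper: two-step download of a Companies House document.
fetch_document() {
    api_key=$1       # Companies House API key (username part, blank password)
    content_url=$2   # the ".../content" URL from the metadata "document" link
    out_file=$3      # where to save the PDF

    # Step 1: do not follow the redirect; just capture where it points.
    signed_url=$(curl -s -o /dev/null -w '%{redirect_url}' \
        -u "${api_key}:" -H 'Accept: application/pdf' "$content_url")

    if [ -z "$signed_url" ]; then
        echo "no redirect returned for $content_url" >&2
        return 1
    fi

    # Step 2: plain GET of the signed URL, with no API key attached.
    curl -s -o "$out_file" "$signed_url"
}
```

Usage would be along the lines of `fetch_document APIKEYHERE https://document-api.company-information.service.gov.uk/document/<id>/content a_test.pdf`. A tool that cannot switch off automatic redirects would need its own equivalent of step 1.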

Good luck.

Hi, thank you for your response.

I’ve tried every which way of formatting those inputs (quotes / no quotes)… the program I’m using is OutSystems. It’s quirky and sometimes requires things to be in quotes.

Either way, I end up with a 500 response.

With regard to the API key: I generated this yesterday specifically for this test.

So I have one key that’s been used for the Streaming API and a different one being used for the MetaData API (someone else set up the Streaming API and has a different account).

Did you register live applications with CH or test / sandbox? Make sure you have live ones.

Did you try with curl?

I think your issue is most likely with the redirect / auth as noted.
I don’t use OutSystems but this thread may help you with that.

https://www.outsystems.com/forums/discussion/72735/turn-off-automatically-follow-redirects-in-outsystems/#

Good luck.

Hi,

Yep it’s deffo Live.

I’ve not tried with curl, no… work system restrictions etc…

I’ll look over that thread. Thanks

Although, having said that, we have plenty of API calls to Companies House that don’t appear any more complicated than providing a key and whatever is required in the URL.

In the sample below (taken from the streaming API), the section following ‘document/’ is the document ID, isn’t it?

"links":{
         "document_metadata":"https://document-api.company-information.service.gov.uk/document/-VMhpeD29uFYiX8KdzzAzG_RKhv0b4ZGDQPWWsq99SY",
         "self":"/company/15758378/filing-history/MzQ2OTA3NTY3MWFkaXF6a2N4"
}

Removing the document ID from the API test in OutSystems gives a 404 error rather than a 500 (if that helps at all).
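For reference, a minimal sketch of splitting that link with plain shell string handling. It assumes (as the sample suggests) that the ID is simply the last path segment and that the content URL is the metadata URL plus `/content`; the variable names are illustrative only.

```shell
# The document_metadata link from the streaming sample above.
metadata_url='https://document-api.company-information.service.gov.uk/document/-VMhpeD29uFYiX8KdzzAzG_RKhv0b4ZGDQPWWsq99SY'

# Everything after the last "/" is treated as the document ID (assumption).
document_id=${metadata_url##*/}

# Content URL built by appending /content, stripping any trailing slash first.
content_url="${metadata_url%/}/content"

printf '%s\n' "$document_id"
printf '%s\n' "$content_url"
```

Note that the ID in this sample starts with a hyphen, which is part of the ID rather than a separator.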

I have managed to get this to work after a lot of googling.

For anyone else who is using this API with OutSystems… the Authorization header requires the API key converted to Base64, i.e. BinaryToBase64(TextToBinary(APIKEY)), and prefixed with 'Basic '.

So the header value will look like this when actually processing:

Basic ZGQwZmNjNGQtMjBiYS00Y2VkLWJiNTItNDIwZTgwNjYyMGMw

Thanks for the suggestions and help along the way :slight_smile:

I don’t know if there is any way to flag a solution on this forum?

I think so.

However, it seems to me that Companies House designed the API with the idea of “follow the links provided in the links members” rather than “find IDs and use them in various places”. The reason (apart from them providing the whole URLs) is that in several cases IDs which an external user might assume reference the “same thing logically” don’t, as it appears Companies House have several datasets which may be related or overlapping but operate on different IDs (e.g. officers and disqualified officers).

Anyway, that’s all a detail - but indeed lots of people seem to focus on “where do I find ID x” whereas Companies House seem to have envisaged “follow the links”…
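As a rough sketch of the “follow the links” approach: the metadata response shown earlier in the thread carries the content URL under links.document, so it can be lifted straight out instead of being assembled from an ID. The snippet uses sed purely to stay dependency-free; a proper JSON parser would be more robust, and the variable names and the trimmed sample JSON are only illustrative.

```shell
# Trimmed copy of the metadata response shown earlier in the thread.
metadata_json='{"company_number":"10073158","links":{"self":"https://document-api.company-information.service.gov.uk/document/Q0Lbp7gVAnynCcIVXvLF9vS5D3uYpR-uCSl30sv6sS4","document":"https://document-api.company-information.service.gov.uk/document/Q0Lbp7gVAnynCcIVXvLF9vS5D3uYpR-uCSl30sv6sS4/content"},"resources":{"application/pdf":{"content_length":85358}}}'

# Pull out the value of the "document" key. Fragile: assumes compact JSON
# with exactly one "document" key and no escaped quotes inside the URL.
content_url=$(printf '%s' "$metadata_json" | sed -n 's/.*"document":"\([^"]*\)".*/\1/p')

printf '%s\n' "$content_url"
```

The resulting URL is the one to request with the API key and an Accept header, as in the curl example earlier.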

I’m not sure how you were able to access the Streaming API - I don’t use that but according to the documentation it uses exactly the same authorisation mechanism?

Certainly authorisation issues are the most common ones people find. There seem to be several related issues:

  • Lack of familiarity with the HTTP Basic authorisation mechanism.
  • Not knowing how their own system / language handles this.
  • Not knowing how to work with HTTP headers in their system / language (if it doesn’t have a built-in mechanism for this, or they can’t find it).
  • Companies House having chosen a slightly unusual way of doing this, e.g. the API key is actually the username part of a username:password string and the password is blank.
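To make the last point concrete, here is a small sketch of building the header by hand in shell, following the standard HTTP Basic scheme: Base64 of "username:password", where the username is the API key and the password is empty. The key shown is a placeholder, not a real one.

```shell
# Placeholder key for illustration only.
api_key='my_api_key'

# Base64-encode "<api key>:" (note the trailing colon for the blank password).
basic_value=$(printf '%s:' "$api_key" | base64)

auth_header="Authorization: Basic ${basic_value}"
printf '%s\n' "$auth_header"
# prints: Authorization: Basic bXlfYXBpX2tleTo=
```

This is the same header that `curl -u "my_api_key:"` sends, so systems without built-in Basic auth support can set it directly.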

This was covered (well enough for me, which it seems isn’t clear enough for everyone) in the original API documentation here:

There’s a slightly longer one here:

Thanks for posting notes on using this with OutSystems - I think there are examples for many systems here now.