Live Streams giving a 504

Hello All

The live streams are giving a 504 error.

Started yesterday morning but thought I’d let you have a peaceful Easter.

Rgs
Robbie

Thank you for reporting this. There was an issue, which was resolved this morning.
This should have been detected by our monitoring systems and this is being investigated of course.
We apologise for any inconvenience.

Experiencing the same issue

All good now Mark - thanks.

Hello @robbie_crawford, can you explain to me how to use the Live Streams API?

Documentation for the Streaming API is available. I recommend starting with the Companies House Streaming API intro.

You should ensure that the Streaming API meets your requirements - there are several other sources of Companies House data available (bulk data, other APIs).

I am not experienced with the Streaming API. I would, however, recommend some familiarity with the main Public Data API as well. That lets you interactively investigate the data structures and data you may receive (the two APIs use very similar data structures, and the underlying data set is presumably the same).

This forum has ended up being a valuable source of “documentation” since the Companies House documentation is intermittently updated and there are a few outstanding “features” of the system which have not been fixed. Search it using the “magnifying glass” icon at the top right of the page.

Good luck.

I have written a little guide about using the streaming API: Streams | CH Guide. Have a look at some of the links on there and give it a go, and give me a shout if you have any questions.
For an open source working example using NodeJS see https://companies.stream.
As voracityemail has said there are plenty of answered questions about the streaming API on this forum.

All the best

Does the Stream API include a counter on the filings endpoint to track the number of processed files?

In case of connectivity issues, how can I ensure that no filings were lost between the disconnection and reconnection?

The timepoint parameter is used for this purpose.

Each event that comes through has a timepoint field under event, for example:

{
  "resource_kind": "filing-history",
  "resource_uri": "/company/16340168/filing-history/MzQ1OTgwNDMzMWFkaXF6a2N4",
  "resource_id": "MzQ1OTgwNDMzMWFkaXF6a2N4",
  "data": {
    "type": "NEWINC" 
  },
  "event": {
    "fields_changed": [
      "links.document_metadata"
    ],
    "timepoint": 188283700,
    "published_at": "2025-03-25T09:06:35",
    "type": "changed"
  }
}

If this was the last event you received, when reconnecting, specify timepoint=188283700 in the URL query string to pick up where you left off.
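
To make the reconnect step concrete, here is a minimal Python sketch that reads the timepoint from an event and builds the URL to reconnect with. The base URL constant and the helper names (timepoint_of, stream_url) are my own assumptions for illustration, so check the official documentation for the exact endpoint. Authentication and the HTTP request itself are left out.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed base URL for the filings stream; confirm it against the official docs.
FILINGS_STREAM_URL = "https://stream.companieshouse.gov.uk/filings"


def timepoint_of(raw_event: str) -> int:
    """Return the event.timepoint value from one JSON event line."""
    return int(json.loads(raw_event)["event"]["timepoint"])


def stream_url(last_timepoint: Optional[int] = None,
               base_url: str = FILINGS_STREAM_URL) -> str:
    """Build the URL to connect to, resuming from last_timepoint if one is known."""
    if last_timepoint is None:
        return base_url  # first ever connection: no timepoint to resume from
    return f"{base_url}?{urlencode({'timepoint': last_timepoint})}"


# Trimmed-down version of the event shown above.
sample = '{"resource_kind": "filing-history", "event": {"timepoint": 188283700, "type": "changed"}}'
print(stream_url(timepoint_of(sample)))
# prints https://stream.companieshouse.gov.uk/filings?timepoint=188283700
```

In practice you would save the timepoint somewhere durable (a file or database row) after each event is processed, so it survives a restart of your own process too.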

Open source example: companies-house-stream/server/src/streams/listenOnStream.ts at master · mrbrianevans/companies-house-stream · GitHub

Ok, thanks for your answer, I have a follow up question.

  • Are timepoints sequential? If I specify the timepoint where I left off in the URL, I will end up reprocessing the same filing and have a duplicate. Could I increment the timepoint by 1 and pass that in the URL?

Yes they are sequential. So you could increment it by one for future requests.

However, they send multiple events per filing. They send an initial event for the filing before the document has been fully processed, and then another event once the PDF is available for download. They may also send some pure duplicates of a small number of events.
Therefore your system should be able to handle duplicates without issue.
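
One way to handle this is to treat each event as an upsert keyed on the filing's resource_id, keeping only the newest event per filing. The sketch below is an in-memory illustration of that idea, not how Companies House or any particular library does it. The FilingStore class is hypothetical, and it assumes a later event for the same filing always carries a higher timepoint.

```python
from typing import Any, Dict


class FilingStore:
    """In-memory store that tolerates repeated and duplicate events."""

    def __init__(self) -> None:
        # resource_id -> newest event seen for that filing
        self.filings: Dict[str, Dict[str, Any]] = {}

    def apply(self, event: Dict[str, Any]) -> bool:
        """Store the event; return False if it is a repeat or older than what we hold."""
        resource_id = event["resource_id"]
        timepoint = event["event"]["timepoint"]
        current = self.filings.get(resource_id)
        if current is not None and current["event"]["timepoint"] >= timepoint:
            return False  # same or older timepoint: nothing new to record
        self.filings[resource_id] = event  # first sighting or a newer update
        return True


store = FilingStore()
# Hypothetical events: an initial event, a pure duplicate, then the later update.
initial = {"resource_id": "abc", "event": {"timepoint": 10}, "data": {"type": "NEWINC"}}
update = {"resource_id": "abc", "event": {"timepoint": 12}, "data": {"type": "NEWINC"}}
print(store.apply(initial), store.apply(initial), store.apply(update))
# prints True False True
```

If a duplicate ever arrives with a new timepoint, the store simply overwrites the filing with identical data, which is harmless. With a real database the same idea is an upsert on resource_id.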

Thanks very much for your prompt response, this is very helpful. I have one more if I may:

Regarding the nightly disconnection: do they deliberately disconnect every 24 hours? Is this based on the connection or the key? For example, in our system we have only one API key shared across dev/staging/prod.
Is restarting at 0 the best thing to do? Or are we safe just continually incrementing the timepoint?

It looks like when we reconnected with a given timepoint, we received nothing at all.

The timepoints keep incrementing across days, so you’ll never reset to zero.
They disconnect all connections each night, regardless of which API key was used to make the request. You just need to reconnect with the last timepoint you received on the connection.
It’s quite simple to create more keys for different environments, but not essential.
If you’re not receiving data at all, try looking at https://companies.stream to see if there are currently events coming through. If you’ve used the latest timepoint in the URL parameter, then it might take 30 seconds or longer for a new filing event to come in.
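
Putting the reconnect advice together, a loop along these lines would survive the nightly disconnect. This is only a sketch: follow_stream and its parameters are hypothetical names, and connect stands in for whatever function opens the HTTP stream (with your API key) and yields one JSON line per event. It also assumes blank lines on the stream are keep-alives that can be skipped.

```python
import json
import time
from typing import Callable, Iterable, Optional


def follow_stream(
    connect: Callable[[Optional[int]], Iterable[str]],
    handle: Callable[[dict], None],
    last_timepoint: Optional[int] = None,
    max_connections: Optional[int] = None,
    retry_delay: float = 5.0,
) -> Optional[int]:
    """Reconnect after every disconnect, resuming from the last timepoint received."""
    connections = 0
    while max_connections is None or connections < max_connections:
        connections += 1
        try:
            # connect() is expected to pass last_timepoint as the timepoint URL parameter.
            for line in connect(last_timepoint):
                if not line.strip():
                    continue  # assumption: blank lines are keep-alives
                event = json.loads(line)
                handle(event)
                # Never reset this to zero; it keeps growing across days.
                last_timepoint = event["event"]["timepoint"]
        except OSError:
            pass  # connection dropped mid-stream; fall through and reconnect
        time.sleep(retry_delay)  # brief pause so we do not hammer the server
    return last_timepoint
```

Because the loop reconnects with the last timepoint it received, the first event after a reconnect may be one you have already seen, so handle should tolerate duplicates. Leave max_connections as None to run indefinitely; it is only there to make the loop easy to test with a fake connect function.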