getData returns null array php curl

If you can download a readable file using the Documents API then I’d say the API itself is fine. What you have there are other issues, along the lines of “We expected to find certain data in the files, or files which we can perform certain operations on, but we can’t!”

Are you looking for particular data in accounts filings? From your mention of shareholder funds / number of staff it sounds like it. (That info is not available as JSON via the API; you have to look in the filings.)

If so then probably the easiest route is to download the iXBRL data rather than the PDF data *. There is quite a lot of information on this forum on:

a) how to request iXBRL rather than PDF. In short: use the HTTP Accept header to request the appropriate MIME type, having first checked the MIME types listed in the document metadata to ensure the requested format is available, e.g. see:

and b) what is in there / how to parse it. See e.g. links in this summary:

* Note: this depends on firms filing this data in the appropriate format. I am not sure, but I believe it is still possible for them to submit filings on paper, or just as a (raster) PDF, rather than as formatted accounts data. Plus historic filings from before a certain date will not have formatted information available - they will be scans of (sometimes handwritten) forms! See the Companies House note on this from 2017 here:

Examples:
Company number 09540283 has accounts filings in PDF and iXBRL. Getting the particular filing information (just for illustration - you could e.g. get this information by requesting the filing history list, possibly passing in the parameter to filter this to only return accounts categories etc):


curl -u APIKEY_HERE: https://api.company-information.service.gov.uk/company/09540283/filing-history/MzQzOTU1MTQ3MmFkaXF6a2N4
{"transaction_id":"MzQzOTU1MTQ3MmFkaXF6a2N4","barcode":"XDDTAN3U","type":"AA","date":"2024-10-15","category":"accounts","description":"accounts-with-accounts-type-total-exemption-full","description_values":{"made_up_date":"2024-04-30"},"pages":8,"action_date":"2024-04-30","links":{"self":"/company/09540283/filing-history/MzQzOTU1MTQ3MmFkaXF6a2N4","document_metadata":"https://document-api.company-information.service.gov.uk/document/clSoJ2xbeFSFypK5EcCgR26SpkVe8H4k1nlRrgqpD1Q"}}

Requesting the document metadata using the link above:

curl -u APIKEY_HERE: https://document-api.company-information.service.gov.uk/document/clSoJ2xbeFSFypK5EcCgR26SpkVe8H4k1nlRrgqpD1Q
{"company_number":"09540283","barcode":"XDDTAN3U","significant_date":"2024-04-30T00:00:00Z","significant_date_type":"made-up-date","category":"accounts","pages":8,"filename":"09540283_aa_2024-10-15","created_at":"2024-10-15T11:32:51.898330388Z","etag":"","links":{"self":"https://document-api.company-information.service.gov.uk/document/clSoJ2xbeFSFypK5EcCgR26SpkVe8H4k1nlRrgqpD1Q","document":"https://document-api.company-information.service.gov.uk/document/clSoJ2xbeFSFypK5EcCgR26SpkVe8H4k1nlRrgqpD1Q/content"},"resources":{"application/pdf":{"content_length":108473},"application/xhtml+xml":{"content_length":76922}}}
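The "resources" object at the end of that response is the list of formats on offer. As a rough sketch of checking it from the shell, assuming nothing fancier than grep is to hand (has_ixbrl is just a name I've made up for this example, and a proper JSON parser would be more robust than a text match):

```shell
# Hypothetical helper: succeeds if the metadata JSON on stdin lists
# application/xhtml+xml as a key (crude text match, not a real JSON parse).
has_ixbrl() {
  grep -q '"application/xhtml+xml"[[:space:]]*:'
}

# Trimmed-down copy of the "resources" part of the metadata shown above.
metadata='{"resources":{"application/pdf":{"content_length":108473},"application/xhtml+xml":{"content_length":76922}}}'

if printf '%s' "$metadata" | has_ixbrl; then
  echo "iXBRL available"
else
  echo "PDF only"
fi
```

If the check fails, only the PDF is available for that filing and there is no point asking for the other MIME type.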

To get the iXBRL you’d request the file as you did before, but set the HTTP Accept header to the application/xhtml+xml MIME type.
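Something like the following, as a sketch only. It reuses the "document" link from the metadata above; fetch_ixbrl and the output filename are names I've made up, and -L is there on the assumption that the content endpoint answers with a redirect to the actual file location, so check that against what you see:

```shell
API_KEY="APIKEY_HERE"   # placeholder, as in the commands above
DOC_URL="https://document-api.company-information.service.gov.uk/document/clSoJ2xbeFSFypK5EcCgR26SpkVe8H4k1nlRrgqpD1Q/content"

# Hypothetical helper: download the URL in $1 as iXBRL into the file in $2.
# -u sends the key as the basic-auth username with an empty password,
# -H sets the Accept header, -L follows any redirect, -o names the output file.
fetch_ixbrl() {
  curl -sS -L -u "${API_KEY}:" -H "Accept: application/xhtml+xml" -o "$2" "$1"
}

# Usage (needs a real key and network access):
# fetch_ixbrl "$DOC_URL" 09540283_aa_2024-10-15.xhtml
```

Swapping the Accept value for application/pdf should give you the PDF version of the same filing instead.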

On the PDF issue - I have no idea what PDF parser you’re using, nor have you mentioned which PDF, so I can’t help with that. (We don’t currently do any PDF parsing ourselves either.)

I’ve just checked a downloaded document which does have textual data (a Confirmation Statement - these are now usually machine-generated) and the PDF doesn’t seem to have much restricted in its Security settings, e.g. content can be copied.

If you search this forum there may be information on parsing of PDFs which could help you.