Good that you managed to answer your own question, i.e. how to download documents for a given company number:
- Get the filing history
- For a given filing in the list, locate the “links” object, “document_metadata” member to get the metadata link
- Request the metadata link to get the document metadata object.
- Select your chosen data format (PDF, XML etc.), set the appropriate “Accept” HTTP header, append “/content” to the metadata link and request that to get the actual file.
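As a rough illustration of the last three steps, here is a minimal Python sketch that only builds the request and does not send it. The helper names (`auth_header`, `metadata_link`, `content_request`) are my own and not part of any API, the example metadata URL is a placeholder, and I'm assuming the API key is sent as the basic authentication username with an empty password.

```python
import base64
import urllib.request


def auth_header(api_key: str) -> dict:
    # Assumption: the API key is the basic-auth username and the password is empty.
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def metadata_link(filing: dict) -> str:
    # One item from the filing history list: "links" object, "document_metadata" member.
    return filing["links"]["document_metadata"]


def content_request(metadata_url: str, api_key: str,
                    accept: str = "application/pdf") -> urllib.request.Request:
    # Add "/content" to the metadata link and set the "Accept" header for the chosen format.
    headers = {"Accept": accept, **auth_header(api_key)}
    return urllib.request.Request(metadata_url.rstrip("/") + "/content", headers=headers)


# Placeholder filing item; a real one comes from the filing history response.
filing = {"links": {"document_metadata": "https://example.invalid/document/abc"}}
req = content_request(metadata_link(filing), "my-api-key")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; I've left that out so the sketch runs without network access.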
So you’re now at the last point in the list above: requesting the amazonaws URL of the PDF document gives an “access denied” error.
Most likely this is because you’re still sending the Companies House http basic authentication header to Amazon. You need to send this header with each request up to this point, but when you make the last request for the amazonaws url you shouldn’t. This is because:
- Amazon don’t use it (don’t send your “passwords” to others) - the documentation could be clearer here.
- Amazon do have an authorisation mechanism but it is done using the query string parameters (on the end of the url). If you also send an http authorisation header it will confuse Amazon.
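One way to make sure the header never reaches Amazon is to choose the headers per request based on the host. This is only a sketch: `headers_for` is a hypothetical helper of mine, and matching on the `amazonaws.com` host name is an assumption about what the final URL looks like.

```python
from urllib.parse import urlsplit


def headers_for(url: str, ch_auth: dict) -> dict:
    # Hypothetical helper: return the Companies House auth header for every
    # request except those going to an amazonaws.com host.
    host = urlsplit(url).hostname or ""
    if host == "amazonaws.com" or host.endswith(".amazonaws.com"):
        # Amazon authorises via the query string already present in the URL,
        # so send no authorisation header at all.
        return {}
    return dict(ch_auth)
```

The amazonaws URL should be requested exactly as received, query string included, since that is where Amazon's authorisation lives.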
See my response at:
http://forum.aws.chdev.org/t/cant-access-documents-from-amazons3-server/1871/2
For a general overview of downloading documents see my response at:
http://forum.aws.chdev.org/t/fetch-a-document-api/978/4
And do make use of the search facility on this forum; it’s helped me…