Can't access documents from AmazonS3 server

voracityemail · June 27, 2018, 9:36am

You’re almost there. As it says in the thread you mentioned:

So your final request:

# Sends the Companies House Authorization header to Amazon as well
final_test <- GET(final_url,
                  add_headers(Authorization = decoded_auth,
                              Accept = accept))

… should be like:

# Accept only; the signed URL already carries the Amazon authentication
final_test <- GET(final_url,
                  add_headers(Accept = accept))

(Apologies for formatting - I don’t speak R).

Amazon require authentication, but all of it is carried in the contents of the “final_url”, i.e. the authentication is passed as parameters in the query string. So if you also include the HTTP Authorization header meant for Companies House (the “Authorization = decoded_auth”), Amazon sees two authentication mechanisms at once and rejects the request.

You can check this by looking at the body of the response you got back (this is what’s being returned with Content-Type: application/xml in your last example). It will be something like:

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidArgument</Code>
<Message>Only one auth mechanism allowed; only the X-Amz-Algorithm query parameter,
Signature query string parameter or the Authorization header should be specified</Message>
<ArgumentName>Authorization</ArgumentName>
<ArgumentValue>Basic {your api key would be here}</ArgumentValue>
<RequestId>{blah}</RequestId>
<HostId>{long hostid string}</HostId></Error>

Again, you can check this: among all the parameters in the final Amazon URL you get in the HTTP redirect (302) from Companies House you’ll see, for example:

AWSAccessKeyId={their access key}&Expires={token expiry time}
&Signature={signature}&x-amz-security-token={very long security token}
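A quick way to run that check from a shell is to list only the parameter names in the redirect URL, which avoids printing the secrets themselves. This is a minimal sketch: the function name query_param_names and the example URL are made up for illustration.

```shell
# List the query-string parameter names in a URL, one per line.
# query_param_names is a hypothetical helper, not part of any tool.
query_param_names() {
    # Strip everything up to the first "?", split on "&",
    # then keep only the text before the first "=".
    printf '%s\n' "${1#*\?}" | tr '&' '\n' | cut -d= -f1
}

# Example with a made-up URL (note the quotes around it):
# query_param_names 'https://bucket.example.invalid/doc?AWSAccessKeyId=a&Expires=1&Signature=s'
```

If names such as AWSAccessKeyId and Signature show up, the URL already carries its own authentication.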

So just omit the Companies House Authorization header at the last step and you should be fine.

Here’s (yet another) plug for the free curl library / command line utility for diagnosing these issues / investigating REST interfaces. Although it’s old and there are more specialised tools (one I’ve used is SoapUI), it’s fast and simple. For info, the most useful curl switches here are:

Send username (and optionally password): -u USERNAME:PASSWORD or --user USERNAME:PASSWORD
Fetch the HTTP headers only (this sends a HEAD request): -I or --head
Include the response headers in the output of a normal request: -i or --include
Dump HTTP headers to a file: -D filename or --dump-header filename
Add a header line (e.g. Accept: content-type): -H headerline or --header headerline
Automatically follow redirects: -L or --location

Don’t forget to quote URLs, header lines etc. In particular, unquoted URL characters like “&” will cause problems in most shells / command line environments…
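As a rough sketch of how those switches fit together for this case, the two steps can be done by hand: ask Companies House for the document with the API key but without following the redirect, then fetch the Amazon URL with no credentials at all. The function names, the argument order and the idea of passing the key with a blank password (`-u "KEY:"`) are my assumptions, so adjust them to whatever your request actually uses. The -s (silent) and -o (output file) switches are standard curl options not listed above.

```shell
# Print the Location header value from a header dump written by curl -D.
# location_from_headers is a hypothetical helper name.
location_from_headers() {
    # HTTP header lines end in CRLF, so drop the carriage returns first;
    # header names are case-insensitive, so compare in lower case.
    tr -d '\r' < "$1" | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Hypothetical two-step download.
# Usage: fetch_document API_KEY DOCUMENT_URL ACCEPT_TYPE OUTPUT_FILE
fetch_document() {
    hdr_file=$(mktemp) || return 1

    # Step 1: authenticate to Companies House and record the 302 headers.
    # No -L here, so curl stops at the redirect instead of following it.
    curl -s -o /dev/null -D "$hdr_file" \
         -u "$1:" -H "Accept: $3" "$2"

    signed_url=$(location_from_headers "$hdr_file")
    rm -f "$hdr_file"
    if [ -z "$signed_url" ]; then
        echo "no Location header in the response" >&2
        return 1
    fi

    # Step 2: fetch the signed Amazon URL with no -u and no Authorization
    # header; the query string carries the authentication.
    curl -s -o "$4" -H "Accept: $3" "$signed_url"
}
```

Every URL and header line is quoted, so the “&” characters in the signed URL never reach the shell unprotected.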