I’m starting to use this API in my application and am struggling to understand how to properly use the paging options for the search/company endpoint.
When I set items_per_page=100, it seems I can only read a total of 1000 records, because start_index only accepts values below 1000. The page_number only changes when I move the index in steps of 100, so I’m assuming I can read only 10 pages of 100 items.
There are limits on the number of results returned by all the Search endpoints (that is, the highest-numbered result you can actually retrieve, not just how many you can retrieve per request). I can’t find a post from Companies House to confirm this, but I believe that for the Companies Search endpoint the maximum is 1000. For example, querying:
… will not return results and will give a 416 HTTP response code.
See e.g.
In general the paging is simple:
- Avoid setting items_per_page to 1, as there was a bug with that (I haven’t checked recently, though).
- start_index should be a multiple of items_per_page (I believe start_index starts from zero).
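As a rough illustration of the two points above, the offsets for a run of pages can be worked out up front. This is only a sketch: page_offsets is a hypothetical helper, not part of any Companies House library, and the 1000-result cap is the believed limit mentioned earlier, not a confirmed figure.

```python
def page_offsets(items_per_page, max_results=1000):
    """Return the start offsets to request, each a multiple of items_per_page.

    max_results=1000 is an assumption: the believed cap on company search results.
    """
    if items_per_page < 1:
        raise ValueError("items_per_page must be at least 1")
    # Offsets start from zero and stop before the assumed cap.
    return list(range(0, max_results, items_per_page))


print(page_offsets(100))
# [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
```

With items_per_page=100 this gives ten offsets, which matches the ten pages of 100 items described in the question.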
Whatever you request in items_per_page, Companies House doesn’t have to honour it, of course, and indeed, as you note, some values may be deemed too high. I would always check how many results you actually got back and, if your items_per_page was greater than that, perhaps set it to that value on your next call.
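That check could be sketched roughly as follows. Here fetch_page is a hypothetical callable you would supply yourself (for example a thin wrapper around your HTTP client) that takes a start index and a page size and returns the list of items for that page, or an empty list once the server stops returning results (for instance on the 416 response). The 1000 cap is again an assumption rather than a documented limit.

```python
def collect_results(fetch_page, items_per_page=100, max_results=1000):
    """Page through search results, adapting to the page size actually returned.

    fetch_page(start_index, items_per_page) is a caller-supplied (hypothetical)
    function returning a list of items, or an empty list when nothing is left.
    max_results=1000 is an assumed cap, not a confirmed one.
    """
    results = []
    start_index = 0
    while start_index < max_results:
        items = fetch_page(start_index, items_per_page)
        if not items:
            break  # no more results (or the server refused the range)
        results.extend(items)
        # Advance by what actually came back, not by what was requested.
        start_index += len(items)
        if len(items) < items_per_page:
            # The server returned fewer than requested: use that size next time.
            items_per_page = len(items)
    return results[:max_results]
```

Advancing by the number of items actually received keeps the next offset correct even when the server silently returns smaller pages than requested, at the cost of one extra request at the end of the result set.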
What goal are you trying to achieve here? Companies House have repeatedly said that the API is intended for focussed and limited queries. If you need thousands of results, perhaps the API is not ideal for your use case. There are several alternatives, depending on what you’re trying to achieve: for example, they have bulk datasets, plus a streaming API if you want all changes as they occur (or want to maintain your own up-to-date copy of the entire data set).