Fixing & Aggregating Broken OpenAPI Specifications

The API specifications for Companies House contain a few issues that make them unfriendly to use. For example, JSON References in the specifications (specs) use a loopback address, 127.0.0.1:10000, rather than the actual developer spec address, developer-specs.company-information.service.gov.uk. (This makes sense from a development and testing standpoint: simply cd to the directory where all the specs are and serve it over HTTP on 127.0.0.1:10000, e.g. using python -m http.server 10000 --bind 127.0.0.1. For clients trying to download the specs, however, it is a headache.) This issue is compounded by how widely the files for the various API products are scattered. For example, the Companies House Public Data API product is, by itself, spread over 22 files! This makes aggregating the specs tedious and time-consuming, and it has been an issue brought up on the forum for years.
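To make the loopback problem concrete, here is a minimal sketch of the kind of rewrite a client has to do before the references can be followed. It is not the code used by companies-house-codegen; the function name rewrite_refs, the http/https schemes, and the example path are all assumptions made for illustration.

```python
from typing import Any

# Assumed schemes: the source only gives the host names, not the URL schemes.
LOOPBACK = "http://127.0.0.1:10000"
PUBLIC = "https://developer-specs.company-information.service.gov.uk"


def rewrite_refs(node: Any) -> Any:
    """Return a copy of `node` with loopback $ref URLs pointed at the public host."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str) and value.startswith(LOOPBACK):
                # Keep the path and fragment, swap only the scheme and host.
                out[key] = PUBLIC + value[len(LOOPBACK):]
            else:
                out[key] = rewrite_refs(value)
        return out
    if isinstance(node, list):
        return [rewrite_refs(item) for item in node]
    return node  # scalars are returned unchanged


# Hypothetical fragment of a spec; the path is made up for this example.
spec = {"paths": {"/thing": {"$ref": LOOPBACK + "/hypothetical/thing.json#/get"}}}
fixed = rewrite_refs(spec)
```

The function builds a new structure instead of mutating its input, so the original document stays available for comparison.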

Further to this, some of the specs have technical issues and require formatting to make them usable. For example, some of the Companies House Swagger specifications contain data types that are non-standard for Swagger 2.0: date is used as a type rather than as a string format, booleans are declared as string enums, et cetera.
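The two type problems mentioned above could be normalised roughly as follows. This is a sketch under assumptions, not the formatter shipped with companies-house-codegen: the name normalize_schema is hypothetical, and I am assuming that a "boolean as string enum" looks like a string type whose enum holds exactly "true" and "false".

```python
from typing import Any


def _is_bool_enum(values: Any) -> bool:
    """True if `values` is an enum list containing exactly 'true' and 'false'."""
    return isinstance(values, list) and {str(v).lower() for v in values} == {"true", "false"}


def normalize_schema(node: Any) -> Any:
    """Return a copy of `node` with non-standard Swagger 2.0 types repaired."""
    if isinstance(node, list):
        return [normalize_schema(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {key: normalize_schema(value) for key, value in node.items()}
    if out.get("type") == "date":
        # Swagger 2.0 has no "date" type; dates are strings with format "date".
        out["type"] = "string"
        out["format"] = "date"
    elif out.get("type") == "string" and _is_bool_enum(out.get("enum")):
        # Assumed shape of a boolean declared as a string enum.
        out["type"] = "boolean"
        del out["enum"]
    return out


schema = {
    "properties": {
        "date_of_creation": {"type": "date"},
        "is_active": {"type": "string", "enum": ["true", "false"]},
    }
}
clean = normalize_schema(schema)
```

Comparing against the string "date" means a property that merely happens to be named type (whose value is a schema object) is left alone.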

To fix these issues, I wrote companies-house-codegen, a code generator library in Python that can dynamically download all the files related to a specification, format them, compress them into a single Swagger 2.0 specification file, and even convert the specification to OpenAPI 3.0.1 to make it usable by modern code generation tools like OpenAPI Generator, Speakeasy, and Stainless.
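The "compress into a single file" step boils down to following every remote JSON Reference and inlining what it points to. The sketch below shows that general idea only; it is not the library's implementation or API. The names inline_remote_refs and fetch are hypothetical, and fetch stands in for whatever downloads and parses a document (it is a plain dictionary lookup in the example).

```python
from typing import Any, Callable, Tuple
from urllib.parse import urldefrag, urljoin


def resolve_pointer(doc: Any, pointer: str) -> Any:
    """Walk a JSON Pointer such as '/definitions/thing' through `doc`."""
    for part in (p for p in pointer.split("/") if p):
        part = part.replace("~1", "/").replace("~0", "~")  # JSON Pointer escapes
        doc = doc[int(part)] if isinstance(doc, list) else doc[part]
    return doc


def inline_remote_refs(node: Any, base_url: str, fetch: Callable[[str], Any],
                       stack: Tuple[str, ...] = ()) -> Any:
    """Return a copy of `node` with every non-local $ref replaced by its target.

    Local references (starting with '#') are left untouched, so a real bundler
    would also need to carry over the definitions they point to.
    """
    if isinstance(node, list):
        return [inline_remote_refs(item, base_url, fetch, stack) for item in node]
    if not isinstance(node, dict):
        return node
    ref = node.get("$ref")
    if isinstance(ref, str) and not ref.startswith("#"):
        target = urljoin(base_url, ref)  # handles relative and absolute refs
        if target in stack:
            raise ValueError(f"circular reference: {target}")
        url, fragment = urldefrag(target)
        resolved = resolve_pointer(fetch(url), fragment)
        return inline_remote_refs(resolved, url, fetch, stack + (target,))
    return {key: inline_remote_refs(value, base_url, fetch, stack)
            for key, value in node.items()}


# Hypothetical two-file spec held in memory instead of being downloaded.
documents = {"http://example.test/parts/a.json": {"thing": {"type": "string"}}}
root = {"definitions": {"a": {"$ref": "parts/a.json#/thing"}}}
bundled = inline_remote_refs(root, "http://example.test/root.json", documents.__getitem__)
```

Passing fetch in as a parameter keeps the traversal testable without a network connection, and the stack of visited targets turns a circular chain of references into an error instead of infinite recursion.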

It is available on the Python Package Index (PyPI), is fully documented, has minimal dependencies, works on Python 3.8+, and has a convenient CLI tool.