Hello
I am trying to read the iXBRL files that you can download from the Companies House page, but when I use the code that has worked perfectly for all previous months (except October 2024), I get failure after failure.
The main error is: Failed to process zip: 'NoneType' object has no attribute 'groups'
I am sure this is a fault in how the data was created, but Companies House pointed me here.
Does anyone else have this problem with the May data?
To check the code, I re-downloaded and processed April. It works fine.
I then thought I would try downloading the data a day at a time. Only the first day of the month worked, so I know my code works when the file is structured correctly.
Any ideas anyone?
Thanks
I have gone back to the basic example code:
import httpx
from stream_read_xbrl import stream_read_xbrl_zip

# A URL taken from http://download.companieshouse.gov.uk/en_accountsdata.html
if __name__ == '__main__':
    url = 'http://download.companieshouse.gov.uk/Accounts_Bulk_Data-2023-03-02.zip'
    with \
            httpx.stream('GET', url) as r, \
            stream_read_xbrl_zip(r.iter_bytes(chunk_size=65536)) as (columns, rows):
        r.raise_for_status()
        for row in rows:
            print(row)
I ran it on the 2025-05-03 file (changing the URL to https), and it won't even print to the screen. It runs for about a thousand records and then crashes. I can't catch and filter the error, because the traceback reports it as coming from inside stream_read_xbrl_zip, so the code crashes before my own code gets a chance to handle it.
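One way to see which files trip the filename check without waiting for the stream to crash is to list the zip's member names and test each against the pattern. The sketch below assumes the bulk zip has already been downloaded to disk; the pattern is the one from the modified stream_read_xbrl.py later in this thread, while STRICT is a hypothetical variant without the optional suffix group, included only for comparison. The sample member names are made up to fit the pattern.

```python
import io
import re
import zipfile

# Pattern from the modified stream_read_xbrl.py posted in this thread
PATTERN = re.compile(r'^(Prod\d+_\d+)_([^_]+)_(\d{8})(?:_[^.]*)?\.(html|xml|zip)')
# Hypothetical stricter variant (no optional suffix), for comparison only
STRICT = re.compile(r'^(Prod\d+_\d+)_([^_]+)_(\d{8})\.(html|xml|zip)')


def unmatched_members(zip_source, pattern):
    """Return the member basenames in a zip that `pattern` does not match."""
    with zipfile.ZipFile(zip_source) as zf:
        # Strip any folder prefix, mirroring os.path.basename in the library code
        names = [name.rsplit('/', 1)[-1] for name in zf.namelist()]
    # Skip empty names (directory entries end in '/')
    return [n for n in names if n and not pattern.match(n)]


def build_sample_zip(names):
    """Build an in-memory zip with empty members, for trying the check offline."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        for n in names:
            zf.writestr(n, '')
    buf.seek(0)
    return buf


# Hypothetical member names: one plain, one with an extra suffix after the date
sample = ['Prod223_0001_00000001_20250501.html',
          'Prod223_0001_00000002_20250501_extra.html']
print(unmatched_members(build_sample_zip(sample), STRICT))
print(unmatched_members(build_sample_zip(sample), PATTERN))
```

On a real download you would call something like unmatched_members('Accounts_Bulk_Data-2025-05-03.zip', PATTERN) and inspect whatever names come back.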
Thanks
Hello
For anyone having the same problem: you need to modify the stream_read_xbrl.py file from line 423 onwards.
context_dates = {
    e.get('id'): _get_dates(period)
    for e in document.xpath("//*[local-name()='context']")
    for period in e.xpath("./*[local-name()='period']")[:1]
}

# Split the member filename into run code, company ID, date and file type
fn = os.path.basename(name)
mo = re.match(r'^(Prod\d+_\d+)_([^_]+)_(\d{8})(?:_[^.]*)?\.(html|xml|zip)', fn)
if not mo:
    raise ValueError(f"Filename does not match expected pattern: {fn}")
run_code, company_id, date, filetype = mo.groups()

allowed_taxonomies = [
    'http://www.xbrl.org/uk/fr/gaap/pt/2004-12-01',
    'http://www.xbrl.org/uk/gaap/core/2009-09-01',
    'http://xbrl.frc.org.uk/fr/2014-09-01/core',
]
core_attributes = (
    run_code,
    company_id,
    _date(date),
    filetype,
    ';'.join(set(allowed_taxonomies) & set(root.nsmap.values())),
)
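To see what the modified filename check accepts, here is a standalone sketch that pulls the same regex out into a small function. parse_filename and the sample filenames are hypothetical, made up for illustration. The (?:_[^.]*)? group is non-capturing, so an extra underscore-prefixed suffix between the date and the extension is tolerated while groups() still returns exactly the four values the library unpacks.

```python
import re

# Regex from the modified stream_read_xbrl.py above
PATTERN = r'^(Prod\d+_\d+)_([^_]+)_(\d{8})(?:_[^.]*)?\.(html|xml|zip)'


def parse_filename(fn):
    """Split an accounts filename into (run_code, company_id, date, filetype)."""
    mo = re.match(PATTERN, fn)
    if not mo:
        # Calling mo.groups() here would fail with
        # AttributeError: 'NoneType' object has no attribute 'groups'
        raise ValueError(f"Filename does not match expected pattern: {fn}")
    return mo.groups()


print(parse_filename('Prod223_0001_00000001_20250501.html'))
# ('Prod223_0001', '00000001', '20250501', 'html')
print(parse_filename('Prod223_0001_00000002_20250501_extra.html'))
# ('Prod223_0001', '00000002', '20250501', 'html')
```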
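Specifically, I altered these lines so that the date may be followed by an optional suffix before the file extension (the (?:_[^.]*)? group), and so that a name that still doesn't match raises a clear ValueError instead of the 'NoneType' object has no attribute 'groups' error:
fn = os.path.basename(name)
mo = re.match(r'^(Prod\d+_\d+)_([^_]+)_(\d{8})(?:_[^.]*)?\.(html|xml|zip)', fn)
if not mo:
    raise ValueError(f"Filename does not match expected pattern: {fn}")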
Now it doesn't crash; it skips the bad data. I have tested it on the full monthly file for 2025-05 and it worked.
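If you would rather have files with unexpected names skipped and logged explicitly than raised as errors, a variant of the check could return None instead. This is a hypothetical sketch (try_parse_filename and parse_all are made-up names, and the sample filenames are invented); it does not reproduce how stream_read_xbrl itself handles the ValueError.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Same regex as the modified stream_read_xbrl.py
PATTERN = re.compile(r'^(Prod\d+_\d+)_([^_]+)_(\d{8})(?:_[^.]*)?\.(html|xml|zip)')


def try_parse_filename(fn):
    """Return (run_code, company_id, date, filetype), or None if fn doesn't match."""
    mo = PATTERN.match(fn)
    if mo is None:
        logger.warning("Skipping file with unexpected name: %s", fn)
        return None
    return mo.groups()


def parse_all(filenames):
    """Parse every filename, dropping (and logging) the ones that don't match."""
    return [parsed for fn in filenames
            if (parsed := try_parse_filename(fn)) is not None]


print(parse_all(['Prod223_0001_00000001_20250501.html', 'notes.txt']))
```

Returning None keeps the decision about what to do with a bad file in the caller, at the cost of having to check for None everywhere the parser is used.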
Neil