Companies House Bulk Data

Morning to all.
I am trying to read the data from the May bulk data files from Companies House.
But when I run the same code I have always run, with no changes at all, none of the files can be opened and imported.
I am getting a lot of errors; here is an example:
```
Traceback (most recent call last):
  File "/usr/lib/python3.11/concurrent/futures/process.py", line 256, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/neil/.local/lib/python3.11/site-packages/stream_read_xbrl.py", line 432, in _xbrl_to_rows
    run_code, company_id, date, filetype = mo.groups()
                                           ^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'groups'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/www/investor/storage/accounts/sr3.py", line 15, in <module>
    df = pd.DataFrame(rows, columns=columns)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/neil/.local/lib/python3.11/site-packages/pandas/core/frame.py", line 843, in __init__
    data = list(data)
           ^^^^^^^^^^
  File "/home/neil/.local/lib/python3.11/site-packages/stream_read_xbrl.py", line 557, in
    yield _COLUMNS, (
                    ^
  File "/home/neil/.local/lib/python3.11/site-packages/stream_read_xbrl.py", line 549, in imap
    yield queue.popleft().result()
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
AttributeError: 'NoneType' object has no attribute 'groups'
```
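The first traceback points at the root cause: inside `_xbrl_to_rows`, `re.match` returns `None` when a filename in the zip does not fit the expected pattern, and calling `.groups()` on `None` raises the `AttributeError`. The sketch below reproduces that and lists which zip members would trip it. It assumes the library's original pattern was `^(Prod\d+_\d+)_([^_]+)_(\d{8})\.(html|xml)` (a reconstruction; compare it with your installed copy), and the filenames, including the `_extra` suffix, are hypothetical, not the actual May naming.

```python
import posixpath
import re
import zipfile

# ASSUMPTION: the pattern stream_read_xbrl used before the fix in this thread
# (reconstructed; compare with line ~430 of your installed copy)
ORIGINAL_PATTERN = re.compile(r'^(Prod\d+_\d+)_([^_]+)_(\d{8})\.(html|xml)')


def unmatched_names(names, pattern=ORIGINAL_PATTERN):
    """Return the names whose base name the pattern does not match."""
    return [n for n in names if pattern.match(posixpath.basename(n)) is None]


def unmatched_in_zip(zip_path, pattern=ORIGINAL_PATTERN):
    """List members of a bulk-data zip that would make mo.groups() fail."""
    with zipfile.ZipFile(zip_path) as zf:
        return unmatched_names(zf.namelist(), pattern)


# HYPOTHETICAL filenames, for illustration only
good = 'Prod223_0001_01234567_20240531.html'
odd = 'Prod223_0001_01234567_20240531_extra.html'

print(ORIGINAL_PATTERN.match(good).groups())
print(ORIGINAL_PATTERN.match(odd))  # None, so calling .groups() on it raises AttributeError
print(unmatched_names([good, odd]))
```

Calling `unmatched_in_zip()` on one of the downloaded bulk files would show whether any member names fall outside the pattern, which is a quick way to confirm the input changed rather than the code.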

None of my code has changed. I asked Companies House, and they referred me back here.
Has anybody else encountered this recently?
Thanks
Neil

Hi,

Can you please email this issue to bulkproducts@companieshouse.gov.uk?

SDN

Good morning,
I have already sent this issue to that email address, along with the fix, which was to alter the stream_read_xbrl.py file.

Specifically, I altered the following:
```python
fn = os.path.basename(name)
# Changed: allow an optional "_..." segment after the date, and .zip as an extension
mo = re.match(r'^(Prod\d+_\d+)_([^_]+)_(\d{8})(?:_[^.]*)?\.(html|xml|zip)', fn)

# Added: fail with a readable message instead of AttributeError on None
if not mo:
    raise ValueError(f"Filename does not match expected pattern: {fn}")

run_code, company_id, date, filetype = mo.groups()
allowed_taxonomies = [
    'http://www.xbrl.org/uk/fr/gaap/pt/2004-12-01',
    'http://www.xbrl.org/uk/gaap/core/2009-09-01',
    'http://xbrl.frc.org.uk/fr/2014-09-01/core',
]
```

The changed block starts at line 429 of stream_read_xbrl.py.
With this change, it now works.
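To try the altered pattern before editing the installed library file, a standalone check like the one below may help. `parse_name` is a hypothetical helper that mirrors the edited lines, and the filenames are made up; whether the May files really use a suffixed name or a `.zip` member is an assumption drawn only from the shape of the altered regex.

```python
import re

# Pattern as altered in the post above (underscores and "\." restored)
PATCHED_PATTERN = re.compile(
    r'^(Prod\d+_\d+)_([^_]+)_(\d{8})(?:_[^.]*)?\.(html|xml|zip)'
)


def parse_name(fn):
    """HYPOTHETICAL helper: split a bulk-data filename into
    (run_code, company_id, date, filetype), as _xbrl_to_rows does."""
    mo = PATCHED_PATTERN.match(fn)
    if not mo:
        # Readable error instead of AttributeError on None
        raise ValueError(f"Filename does not match expected pattern: {fn}")
    return mo.groups()


for fn in [
    'Prod223_0001_01234567_20240531.html',        # hypothetical old-style name
    'Prod223_0001_01234567_20240531_extra.html',  # hypothetical suffixed name
    'Prod223_0001_01234567_20240531.zip',         # hypothetical zip member
]:
    print(parse_name(fn))
```

All three names yield `('Prod223_0001', '01234567', '20240531', ...)` with the extension as the last element, while the suffixed name would have failed the original pattern.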