PROBABLY HAVE ALL THE DATA YOU WANT ALREADY. Access Companies House data using natural language (NO CODE)

Not selling anything… I’m just asking for help, guidance, support, approval…

I created a tool that allows users to access Companies House data using natural language.

So I already have the Python code that automates collecting the company data, officers, people with significant control and accounts. It is all kept up to date in a self-managed MySQL database. This is probably what some of you are trying to automate yourselves… I have already done this, and built in the streaming APIs to keep the data updated.

Here is a link to the tool https://www.datadini.ai (it will only work on desktop).

I initially designed this for small businesses, startups and solo entrepreneurs to make data-driven decisions effortlessly and affordably. It’s designed to be as simple as possible.

The data is updated in real time using the streaming APIs, so changes are applied within minutes of the stream data being provided.

All XBRL files have been translated into database format. That’s over 4 million company accounts available in formattable text.

If you would like to test the prototype try typing this simple query into the chat “show me companies beginning with a”.

My ambitions were to have more complex queries such as “show me companies with a turnover of over £100,000 and all directors aged over 70 years old”, however it has been a struggle to consistently get the AI to perform as well as I expected.
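For anyone wondering what that example question would need to become once translated, here is a rough sketch in Python. The table and column names are hypothetical (the real schema is not shown in this thread), it uses an in-memory SQLite database rather than MySQL so it runs anywhere, and it assumes only the month and year of birth are stored for each officer. The awkward part is "all directors": it is written as "no director is 70 or younger".

```python
import sqlite3
from datetime import date

# Hypothetical, simplified schema; the real tool's tables are not shown in the thread.
SCHEMA = """
CREATE TABLE companies (company_number TEXT PRIMARY KEY, name TEXT, turnover REAL);
CREATE TABLE officers (company_number TEXT, name TEXT, role TEXT,
                       birth_year INTEGER, birth_month INTEGER);
"""

# A company qualifies if its turnover exceeds the threshold, it has at least one
# director, and no director was born in or after the cutoff month.
QUERY = """
SELECT c.company_number, c.name
FROM companies AS c
WHERE c.turnover > :min_turnover
  AND EXISTS (SELECT 1 FROM officers AS o
              WHERE o.company_number = c.company_number
                AND o.role = 'director')
  AND NOT EXISTS (SELECT 1 FROM officers AS o
                  WHERE o.company_number = c.company_number
                    AND o.role = 'director'
                    AND (o.birth_year * 12 + o.birth_month) >= :cutoff)
ORDER BY c.company_number
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO companies VALUES (?, ?, ?)", [
        ("00000001", "ALPHA LTD", 250000),
        ("00000002", "BETA LTD", 500000),
        ("00000003", "GAMMA LTD", 50000),
    ])
    conn.executemany("INSERT INTO officers VALUES (?, ?, ?, ?, ?)", [
        ("00000001", "A ONE", "director", 1940, 5),
        ("00000001", "A TWO", "director", 1950, 1),
        ("00000001", "A THREE", "secretary", 1990, 2),  # not a director, ignored
        ("00000002", "B ONE", "director", 1945, 3),
        ("00000002", "B TWO", "director", 1980, 7),     # too young, excludes BETA
        ("00000003", "C ONE", "director", 1930, 1),     # turnover too low
    ])
    return conn


def companies_with_older_directors(conn, min_turnover, min_age, today):
    """Return (company_number, name) where every director is older than min_age."""
    # Month index of a birth date exactly min_age years before the current month.
    cutoff = (today.year - min_age) * 12 + today.month
    rows = conn.execute(QUERY, {"min_turnover": min_turnover, "cutoff": cutoff})
    return [tuple(row) for row in rows]
```

Calling `companies_with_older_directors(build_demo_db(), 100000, 70, date.today())` runs it against the demo rows. The same SQL should port to MySQL with little change, but I have not tested that.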

Slowly it is improving, however after working on it single-handedly for nearly a year, I am running out of steam. It seems a shame to lose all this data, and all the tools created to keep it up to date.

In short, the processes include automating the downloads of snapshot data, then transforming, normalising, cleansing and uploading it to a self-managed MySQL database. The streams run continuously, taking the 23 hour break to comply with the rules. The most complex part is identifying which data provided by the streams matches the snapshot data.
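To illustrate the matching step in its simplest form, here is a minimal sketch in Python. It is not the code behind the tool. It assumes each stream event has already been reduced to a dict with hypothetical keys `company_number`, `timepoint` and `data`, and that the snapshot is held as a dict keyed on company number. It only covers company records, where the company number is a natural key.

```python
def normalise_company_number(raw):
    """Strip whitespace, upper-case, and zero-pad purely numeric values to 8 characters."""
    value = raw.strip().upper()
    return value.zfill(8) if value.isdigit() else value


def apply_stream_events(snapshot, events):
    """Merge stream events into the snapshot in place and return simple counts.

    snapshot: dict mapping company number -> record dict (may hold a 'timepoint').
    events:   iterable of dicts with 'company_number', 'timepoint' and 'data'
              (a hypothetical, simplified shape, not the raw stream payload).
    """
    counts = {"updated": 0, "inserted": 0, "skipped": 0}
    # Apply events oldest first so a later change always wins.
    for event in sorted(events, key=lambda e: e["timepoint"]):
        key = normalise_company_number(event["company_number"])
        current = snapshot.get(key)
        if current is None:
            # No snapshot row yet: treat the event as a new company.
            snapshot[key] = {**event["data"], "timepoint": event["timepoint"]}
            counts["inserted"] += 1
        elif event["timepoint"] > current.get("timepoint", -1):
            # Newer than what is stored: overwrite only the changed fields.
            current.update(event["data"])
            current["timepoint"] = event["timepoint"]
            counts["updated"] += 1
        else:
            # Stale or duplicate event: leave the stored row alone.
            counts["skipped"] += 1
    return counts
```

Storing the last applied `timepoint` against each row is a design choice here: it makes replaying a stream after a restart harmless, because older events are skipped.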

I have all the company profiles, officers, people with significant control and all XBRL accounts files converted to mutable text.

To the best of my ability, I am abiding by Companies House’s terms of use because the data is kept up to date and accurate, and copyright statements display the source of the data.

I would either like help continuing with this, maybe funding to keep it going if anybody finds it interesting, approval from Companies House to say what I am doing is permitted, or I will just shut it down and pivot to something else.

I spent a year of my life building this, while earning nothing. Now it is potentially all going to go to waste.

Let me know what you think.

Hi Huwl, love the idea man, sounds amazing. Though when I tried to access your site it says not accepting new registrations at the moment?

Hi James.

I really appreciate you showing an interest and your kind words.

I decided to make the data accessible to everyone, for free, without any need for registration.

This way anyone can get up to 50,000 rows of results for each “finessed” successful query.

Despite all my efforts to get the project going, lack of interest means that I have very recently shut off any real-time updating.

Still, the source code is intact to rebuild the database at any time, and link in the live updates from the Companies House stream.

I’m not sure if I would be breaching any policies on this forum by mentioning that I would love to see my work going to some good use, after all the hard work put in. If you are interested in more information, please send a message via the contact widget in the bottom right-hand corner of the website, leaving your email, and I will get in touch.

Kind regards

H

Hi H,

Yeah I think it has real utility in the lead generation space, where agencies often get company data for clients via Facebook lead forms etc and then usually someone is manually going through the leads to see if they are any good.

But it is hard to match a free-text company name to an actual company, and it would also be good to scrape company accounts and filter out dissolved companies etc.
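As a rough idea of how the name matching could start, here is a small Python sketch using only the standard library's `difflib`. The `register` rows, the suffix table and the 0.8 cutoff are all assumptions for illustration. A real register with millions of names would need an index or a database-side search rather than a linear scan like this.

```python
import difflib
import re

# Illustrative, non-exhaustive table of tokens treated as equivalent.
_EQUIVALENTS = {"LIMITED": "LTD", "COMPANY": "CO", "AND": "&"}


def normalise_name(name):
    """Upper-case, drop punctuation and collapse common legal-form variants."""
    cleaned = re.sub(r"[^A-Z0-9& ]", " ", name.upper())
    return " ".join(_EQUIVALENTS.get(token, token) for token in cleaned.split())


def match_company(free_text, register, cutoff=0.8):
    """Return (company_number, registered_name) for the closest active company, or None.

    register: iterable of (company_number, name, status) tuples (hypothetical shape).
    Dissolved companies are filtered out before matching.
    """
    # If two active companies normalise to the same name, the later one wins.
    active = {
        normalise_name(name): (number, name)
        for number, name, status in register
        if status == "active"
    }
    hits = difflib.get_close_matches(
        normalise_name(free_text), list(active), n=1, cutoff=cutoff
    )
    return active[hits[0]] if hits else None
```

For example, with a register containing `("01234567", "ACME WIDGETS LIMITED", "active")`, the lead-form input `"Acme Widgets Ltd."` normalises to the same string as the registered name and so matches exactly.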

Sure, I’ll reach out on your website!

Thanks
James

Hello Huwl

Read your post.

I have an organisation that publishes the top 100 companies in turnover by county.

If your tool can produce this we can identify sponsorship to raise funds to pay for your tool.

Please message me at therestistech dot com or connect with me on LinkedIn (therestistech).

This was a pretty cool way to search, and I say that even though I built a competing tool of my own at convert-ixbrl.co.uk around the same time. A key difference though is that mine doesn’t use AI, relying on traditional conversion methods to turn submitted financial statements into a searchable format. That was the only way I could get reliable search results so I admire the work you had done to get the natural language search working.

Your search seems to be turned off now :(

Hi,

Thank you for showing an interest. I wasn’t maintaining this for a long time, but I am bringing it back to life now. If you would like to discuss any further, please leave your email in the chat on my website.

Kind Regards
Huw

Is this back up and running?

I’m trying to get my own API key, but this tool seems awesome!

Thank you.

The Company Data table and the People with Significant Control table are updated within seconds of any changes being published by Companies House.

The Company Officers and Company Accounts tables have not been updated since November 2024, but I expect to bring them fully up to date within the next month.

Please be patient with the tool. Queries targeting data you know exists in a single table will return results much faster than those requiring information from multiple tables. The Company Accounts table contains over 400 million rows, so it’s currently slower to process.

If you would like access to any specific datasets, just let me know.

Thank you,
Huw

If you don’t need very fresh data, you can try the Companies House dataset on Kaggle.
It contains officers, PSCs and some accounts data.