Company Name - Regex pattern

The Companies House XML Gateway defines its data formats in its schemas (these are available from the Companies House XML Gateway Input - Schema Status page), including the base schema at:
http://xmlgw.companieshouse.gov.uk/v1-0/schema/chbase-v2-5.xsd

For the registered company name it specifies minLength 1, maxLength 160.
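If all you need is a pattern for those two length limits, something like the following might do. This is only a sketch: the names NAME_LENGTH_RE and is_valid_length are my own, and the pattern checks length only, since I have not confirmed whether the schema restricts which characters are allowed.

```python
import re

# Length-only check based on the minLength 1 / maxLength 160 quoted above.
# Assumption: any character is allowed; only the length is constrained.
NAME_LENGTH_RE = re.compile(r".{1,160}", re.DOTALL)


def is_valid_length(name: str) -> bool:
    """Return True if the name is between 1 and 160 characters long."""
    return NAME_LENGTH_RE.fullmatch(name) is not None


if __name__ == "__main__":
    print(is_valid_length("COÖPERATIEVE RABOBANK U.A."))
    print(is_valid_length(""))
```

Python counts Unicode code points here, so a name containing "Ö" is measured in characters, not bytes.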

They capitalise company names (see http://forum.aws.chdev.org/t/formatting-of-company-names-using-all-caps/2051), but I believe search is case-insensitive here. In theory they’d normalise spacing, e.g. convert multiple spaces to a single space, but it seems this is not the case (http://forum.aws.chdev.org/t/names-with-double-spaces/1539). I don’t know if the matching algorithm ignores the number of spaces, but I suspect so.
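If you are comparing names on your own side, you could sidestep both issues by normalising case and spacing before comparing. A minimal sketch, assuming you only care about equality after normalisation (normalise_name is a hypothetical helper of mine, not anything Companies House provides):

```python
def normalise_name(name: str) -> str:
    """Collapse runs of whitespace to one space and ignore case."""
    # str.split() with no arguments splits on any run of whitespace
    # and drops leading/trailing whitespace.
    return " ".join(name.split()).casefold()


if __name__ == "__main__":
    a = normalise_name("ACME  WIDGETS   LTD")
    b = normalise_name("Acme Widgets Ltd")
    print(a == b)
```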

company names are more of a mix of special characters

Do you mean that they allow characters from the Unicode range? Here’s an example:

FC011780 - this is COÖPERATIEVE RABOBANK U.A.

however, I am just concerned about what happens if we add an XML tag or some character in a different format (a different language…)

I’m not sure what you’re asking - the JSON input to search in the API is in the UTF-8 character set. The search seems to be reasonably sensible, e.g. searching for “Cooperatieve Rabobank” (without the umlaut on the o) brings up the company FC011780 above as the first item. I don’t know how far you can take this - but then I believe most company names are registered using standard ASCII characters.
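I don’t know how the search does this internally, but if you want similar accent-insensitive matching in your own code, one common approach is to strip combining marks after Unicode decomposition. This is a sketch of that general technique, not the Companies House algorithm, and fold_diacritics is a name I made up:

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip accents (e.g. Ö -> O) and ignore case for comparison."""
    # NFKD splits "Ö" into "O" plus a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


if __name__ == "__main__":
    registered = fold_diacritics("COÖPERATIEVE RABOBANK U.A.")
    query = fold_diacritics("Cooperatieve Rabobank")
    print(registered.startswith(query))
```

This only handles characters that decompose into a base letter plus marks; letters such as "ø" have no such decomposition and would pass through unchanged.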

I don’t know what you’re referring to with the “XML tag” part. Is that something to do with the XML gateway? If so then maybe try posting to that forum.