Re: [xsl] Formatting string

Abel Braaksma wrote:

Jesper Tverskov wrote:

It is impossible to come up with a REGEX that can handle any
combination of upper case and lower case. What about PaulMcCartney or
JFK? If pascal notation is not used, XxxxXxxxx, or a similar strict
pattern, a REGEX solution is only possible if we know all input
strings from the start.

All provided solutions work with any combination of upper case and lower case. Which of the examples did you try?


PaulMcCartney  would become Paul Mc Cartney with any of them.

Perhaps I misunderstood what you are implying (should Mc Cartney be written McCartney? I didn't know). But if you mean that you want a list of exceptions that do not need to be split into words, then you are right: you'll need that list. We know little from the OP, so we are only guessing here. For instance, is the string in one field, or is it part of a larger string? Should consecutive capitals be ignored or not? Are there exceptions? Can a string contain non-Latin characters, or punctuation? For example:


1. O'Reilly        >>>> O'Reilly
2. McDonald's      >>>> McDonald's
3. Paul McCartney  >>>> Paul McCartney
4. J.K.Rowling     >>>> J.K. Rowling (?)
5. JKRowling       >>>> J K Rowling (?)
6. JFK             >>>> JFK
7. BankOfUSA       >>>> Bank Of USA

1, 5 and 6 go well with my last regex, using "\p{Lu}+".
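To make the discussion concrete, here is a minimal sketch of one simple splitting rule: insert a space wherever a lowercase letter is followed by a run of capitals. This is not the regex from the earlier message (that one is not quoted here), so its results need not match the ones mentioned above. The function name split_on_capitals is made up for illustration, and because Python's re module has no \p{Lu} or \p{Ll}, the ASCII classes [A-Z] and [a-z] stand in for them.

```python
import re

# ASCII stand-ins for \p{Ll} and \p{Lu}; Python's re module lacks Unicode
# category escapes, so this sketch only handles unaccented Latin letters.
BOUNDARY = re.compile(r"([a-z])([A-Z]+)")


def split_on_capitals(text):
    """Insert a space between a lowercase letter and the capitals after it."""
    return BOUNDARY.sub(r"\1 \2", text)


if __name__ == "__main__":
    for name in ["O'Reilly", "McDonald's", "JFK", "BankOfUSA", "PaulMcCartney"]:
        print(name, ">>>>", split_on_capitals(name))
```

Under this particular rule, O'Reilly, JFK and BankOfUSA come out as wanted, while McDonald's and PaulMcCartney are split at "Mc", which is exactly the kind of case that needs an exceptions list.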

For the rest, I think you need an exceptions list, which you can place as alternates at the start of the regex (which may yield funny results when the OP's text is from a larger corpus).
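A rough sketch of the "exceptions as alternates at the start" idea, under assumptions: the exception list (McCartney, McDonald) and the names EXCEPTIONS and split_with_exceptions are hypothetical, and [A-Z] / [a-z] are ASCII stand-ins for \p{Lu} / \p{Ll}, which Python's re module does not support. The exception branch optionally picks up one preceding lowercase letter, so an exception glued to the previous word is still separated from it.

```python
import re

# Hypothetical exception list; the thread never specifies one.
EXCEPTIONS = ["McCartney", "McDonald"]

# The exception alternates come first, so they win over the general rule
# whenever both branches could match at the same position.
PATTERN = re.compile(
    r"([a-z]?)(" + "|".join(re.escape(e) for e in EXCEPTIONS) + r")"
    r"|([a-z])([A-Z]+)"
)


def _fix(match):
    before, exception, lower, capitals = match.groups()
    if exception is not None:
        # Keep the exception intact; only detach it from a preceding
        # lowercase letter, if there is one.
        return (before + " " if before else "") + exception
    # General rule: space between a lowercase letter and the capitals.
    return lower + " " + capitals


def split_with_exceptions(text):
    return PATTERN.sub(_fix, text)
```

With this list, PaulMcCartney becomes Paul McCartney and McDonald's is left alone, while BankOfUSA is still split by the general rule. The exceptions match anywhere they occur, including inside longer words, which is where the funny results on a larger corpus would come from.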

But all I'm doing is guessing at the requirements. Perhaps Babu will enlighten us? ;)


Cheers
-- Abel Braaksma


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--