xsl-list

Re: [xsl] Formatting string

2007-05-16 06:38:59
Abel Braaksma wrote:
Jesper Tverskov wrote:
It is impossible to come up with a REGEX that can handle any
combination of upper case and lower case. What about PaulMcCartney or
JFK? If pascal notation is not used, XxxxXxxxx, or a similar strict
pattern, a REGEX solution is only possible if we know all input
strings from the start.


All provided solutions work with any combination of upper case and lower case. Which of the examples did you try?

PaulMcCartney would become Paul Mc Cartney with any of them.

Perhaps I misunderstood what you are implying (should Mc Cartney be written McCartney? I didn't know). But if you mean that you want a list of exceptions that should not be split into words, then you are right: you'll need that list. We know little from the OP; we are only guessing here. For instance, is the string in one field, or is it part of a larger string? Should consecutive capitals be ignored or not? Are there exceptions? Can a string contain non-Latin characters, or punctuation? For example:

1. O'Reilly        >>>> O'Reilly
2. McDonald's      >>>> McDonald's
3. Paul McCartney  >>>> Paul McCartney
4. J.K.Rowling     >>>> J.K. Rowling (?)
5. JKRowling       >>>> J K Rowling (?)
6. JFK             >>>> JFK
7. BankOfUSA       >>>> Bank Of USA

1, 5 and 6 go well with my last regex, using "\p{Lu}+".

For the rest, I think you need an exceptions list, which you can place as alternatives at the start of the regex (which may yield funny results when the OP's text is from a larger corpus).
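
To make the "exceptions first" idea concrete, here is a rough sketch in Python rather than XSLT. It is not the regex from my earlier post. The exceptions list and the names build_tokenizer and split_words are made up for illustration, and since Python's standard re module has no \p{Lu}, the ASCII class [A-Z] stands in for it, so this only covers unaccented Latin capitals.

```python
import re

# Hypothetical exceptions list; a real one would come from the requirements.
EXCEPTIONS = ["McCartney", "McDonald's", "O'Reilly"]

def build_tokenizer(exceptions):
    # Exceptions go first so they win over the generic rules; longest first
    # so that a longer exception is not cut short by a shorter one.
    alternatives = [re.escape(e) for e in sorted(exceptions, key=len, reverse=True)]
    alternatives += [
        r"[A-Z]+(?![a-z])",  # run of capitals not followed by a lowercase letter: USA, JFK
        r"[A-Z][a-z']*",     # one capitalised word: Bank, Of
        r"[^A-Z\s]+",        # anything else that is not a capital or whitespace
    ]
    return re.compile("|".join(alternatives))

def split_words(text, exceptions=EXCEPTIONS):
    # Collect the tokens in order and join them with single spaces.
    return " ".join(build_tokenizer(exceptions).findall(text))

print(split_words("PaulMcCartney"))
print(split_words("BankOfUSA"))
print(split_words("PaulMcCartney", exceptions=[]))
```

Note that this sketch treats a run of capitals as one word, so JKRowling comes out as "JK Rowling" and not "J K Rowling"; which of the two is wanted is exactly the kind of requirement we are still guessing at.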

But all I'm doing is guessing on the requirements. Perhaps Babu will enlighten us? ;)

Cheers
-- Abel Braaksma

