I have pointed Asmus Freytag, who is an author of UAX#14, to this thread.
The following is his reply, which I have copied here with his
permission.
Mr. Eliot:
In thinking about it, I think that the Annex 14 rules are stated in such
a way that the rules are appropriate for languages that do not use space
to determine line breaks without explicitly disallowing Western-style
line breaking behavior.
Kobayashi:
The question is whether UAX#14 is appropriate for Western languages or not.
(Freytag, until the end of the body of this mail:)
The answer to that is YES. The whole idea about UAX#14 is to have a single
default algorithm that does well in a Western (space based) and East Asian
environment, by giving special treatment to characters that are of concern
in both environments.
The results should be usable in standard text handling, perhaps with minor
tailoring as suggested in the document.
High-end publishing systems may need to apply some additional tailoring.
These systems often give users a choice of line-breaking rules. There may
be some languages that require tailoring in specific situations.
In the message you pointed me to, the following statements were made:
For background, Annex 14 is very permissive, implicitly allowing line
breaks wherever they are not explicitly disallowed and does not, for
example, disallow breaks following closing punctuation, allowing for
example, this break:
"e.
g., a thing"
That is, Annex 14 allows this break, even though it would be wrong in any
Western language I'm familiar with.
However, the statement is incorrect. UAX#14 allows breaks after closing
punctuation, but not when that punctuation is directly followed by
alphabetic characters.
There are no breaks in "e.g.", but there is a break in "...tailoring.
These...", since there is a space after the ".".
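To make the two cases concrete, here is a minimal sketch in Python. It is not the UAX#14 algorithm: the four classes (SP, ID, AL, PU) and the three rules are assumptions made up for illustration, and the real annex defines many more classes and rules. It only shows that a single pass can serve both space-based text and ideographic text.

```python
def char_class(ch):
    """Assign a coarse, hypothetical class (real UAX#14 classes are far finer)."""
    if ch == " ":
        return "SP"
    if "\u4e00" <= ch <= "\u9fff":
        return "ID"   # CJK unified ideographs
    if ch.isalnum():
        return "AL"   # letters and digits, lumped together here
    return "PU"       # everything else, treated as punctuation


def break_opportunities(text):
    """Return every index i at which a line may break before text[i]."""
    breaks = []
    for i in range(1, len(text)):
        before = char_class(text[i - 1])
        after = char_class(text[i])
        if after == "SP":
            continue              # never break in front of a space
        if before == "SP":
            breaks.append(i)      # break after a space
        elif before == "ID" and after == "ID":
            breaks.append(i)      # break between two ideographs
        # any other pair (e.g. punctuation then letter) stays together
    return breaks


print(break_opportunities("e.g., a thing"))
print(break_opportunities("tailoring. These"))
```

In this simplified model the only opportunities in "e.g., a thing" fall after the two spaces, so "e." and "g." are never separated, while "tailoring. These" can break in front of "These".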
Annex 14 is also informative--it does not require conforming Unicode
implementations to implement the Annex 14 rules except for those
characters that have normative line breaking properties, such as line
separator and soft hyphen.
This statement is correct. The rules in UAX#14 define what I would like to
call, for the purpose of this discussion, a 'best default practice with
normative nucleus'.
Some of the rules (and the properties they are based on) describe behavior
that is required. Usually, this is limited to special behaviors, such as
the non-breaking behavior of the NO-BREAK SPACE, for example. Without such
requirements, users would not be able to rely on the use of NO-BREAK SPACE
to express the kinds of line-break behavior for which NO-BREAK SPACE has
always been intended.
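As an illustration of why this behavior has to be dependable, the following Python sketch treats U+00A0 NO-BREAK SPACE as glue in a deliberately simplified, space-only breaker. The function name and the rule "break only after an ordinary space" are assumptions for the example, not the annex's rule set.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def break_opportunities(text):
    """Return every index i at which a line may break before text[i].

    Simplified model: break after an ordinary space, and never
    directly before or after a NO-BREAK SPACE.
    """
    breaks = []
    for i in range(1, len(text)):
        before, after = text[i - 1], text[i]
        if NBSP in (before, after):
            continue              # required: no break next to NBSP
        if before == " " and after != " ":
            breaks.append(i)
    return breaks


print(break_opportunities("10 km away"))        # ordinary space after "10"
print(break_opportunities("10\u00a0km away"))   # NBSP keeps "10 km" together
```

With an ordinary space the text may break between "10" and "km"; with the NO-BREAK SPACE that opportunity disappears and only the break before "away" remains.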
However, many of the other rules are subject to customization (tailoring)
to fit the requirements of particular languages more precisely, or to match
the needs of a particular in-house style at a large publisher's. In other
words, the main reason that those rules are informative is that there is no
single set of rules for line breaking, often not even a single one for a
given language.
However, using UAX#14 as the starting point will allow an implementation to
cover all Unicode characters, so that texts with foreign material inserted
will behave quite reasonably, without the need for all implementers to
become experts in *all* languages. In some instances, a small amount of
tailoring will be useful if texts are known to be predominantly in a given
language which has special requirements.
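One way to picture tailoring on top of a default is a rule table with overrides. The Python sketch below is hypothetical throughout: the class names (SP, AL, ID, GL), the table contents, and the `can_break` function are invented for illustration and are not taken from UAX#14. It shows informative defaults that a tailoring may replace, and a small required set that it may not.

```python
# Hypothetical informative defaults: may a line break between these classes?
DEFAULT_BREAKS = {
    ("SP", "AL"): True,    # after a space, before a letter
    ("SP", "ID"): True,    # after a space, before an ideograph
    ("ID", "ID"): True,    # between two ideographs
    ("AL", "AL"): False,   # inside a run of letters
}

# Hypothetical required rules: glue (GL) never allows a break beside it.
REQUIRED = {
    ("GL", "AL"): False,
    ("AL", "GL"): False,
}


def can_break(before, after, tailoring=None):
    """Decide a break for one class pair, letting tailoring override defaults."""
    pair = (before, after)
    if pair in REQUIRED:
        return REQUIRED[pair]         # required behavior wins over tailoring
    if tailoring and pair in tailoring:
        return tailoring[pair]        # language or house-style override
    return DEFAULT_BREAKS.get(pair, False)


# A made-up house style that keeps ideographs together unless spaced.
house_style = {("ID", "ID"): False}
print(can_break("ID", "ID"))
print(can_break("ID", "ID", house_style))
print(can_break("AL", "GL", {("AL", "GL"): True}))
```

An implementation built this way covers every class pair by default, and a language-specific profile only has to list the few pairs where it differs.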
--------------Up to here------
Best regards,
Tokushige Kobayashi
Antenna House, Inc.
E-mail koba(_at_)antenna(_dot_)co(_dot_)jp
WWW http://www.antenna.co.jp/XML/xml-top.htm
WWW http://www.antennahouse.com/xslformatter.html (English)
TEL +81-3-3234-1361(direct call)
FAX +81-3-3221-9975
Antenna House XSL School
http://www.antenna.co.jp/XML/school/xslday.htm
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list