Re: [xsl] constraining values with a pattern facet in relax ng?
2011-07-08 10:52:18
Dear David,
Given that your rules are complex, I think you need to think not only
about how to implement and enforce them, but about who will be doing so
... your code, sure, but also your users?
In other words, I think the best solution to the problem depends partly
on who the users are and how they'll be entering the data.
Especially if the rules are complex, I think most users would be happier
to enter, say, "101" and "105" and have the machine then figure out to
display "101-5", than they would be to remember that this one should be
"101-5" but another one should be "31-35".
If this is the case, this implies that you need your XSLT to know how to
crunch "101" and "105" into "101-5" rather than to match and validate
"101-5". (FWIW, I think your requirements for that would go beyond regex
checking, since for example "121-15" would presumably be incorrect. You
will need more power.)
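As a rough illustration of the "crunching" step, here is a minimal Python sketch (not XSLT) that builds the display form from two full page numbers, following the rules as stated in the quoted message below. The function name `format_page_range` and the `sep` parameter are made up for this example. Note that under those rules the end page always keeps at least two digits, so 101 and 105 come out as "101-05".

```python
from typing import Optional


def format_page_range(start: int, end: Optional[int] = None, sep: str = "-") -> str:
    """Build the display form of a page range from full page numbers.

    Hypothetical helper; pass sep="\u2013" to get an en dash in production.
    """
    if not 1 <= start <= 999:
        raise ValueError(f"start page out of range: {start}")
    if end is None:
        # The text falls on a single page, so no end page is shown.
        return str(start)
    if not start < end <= 999:
        raise ValueError(f"end page {end} must be greater than start page {start}")
    if start >= 100 and end // 100 == start // 100:
        # Same hundreds digit: drop it, but always keep two digits.
        return f"{start}{sep}{end % 100:02d}"
    # One- or two-digit start, or a different hundreds digit: full value.
    return f"{start}{sep}{end}"


print(format_page_range(101, 105))
print(format_page_range(31, 35))
print(format_page_range(123, 265))
```

Because the inputs are plain integers, the "end greater than start" check is a single comparison, and a pair that would have produced "121-15" is rejected before anything is formatted.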
It would also make other aspects of your problem easier, such as being
able to confirm that the last page has a higher number than the first
page (if this is, in fact, a rule), or to see that text numbers, volume
numbers and page numbers all align.
But your users, or your use case, may be different.
Say they're transcribing entries that already have this info. What if
the source data is actually incorrect? Should your scribes be correcting
"101-105" to "101-5"?
The bottom line is that I think your life will be easier if you collect
the data in the simplest, most granular form possible. The validation
(including the interdependency checks) and XSLT will be easier. And
unless there's a reason they resist this, your users will probably thank
you too.
Cheers,
Wendell
On 7/8/2011 11:26 AM, Birnbaum, David J wrote:
Dear xsl-list,
I'm writing a RelaxNG compact-syntax schema where users can enter a page range
as the value of an element, and I think (see below for reservations and an
alternative) that I'd like to constrain the allowed values (integer and
lexical) with a facet. I'm uncertain about how to proceed, or even whether I'm
conceptualizing the problem in a useful way, and I'd be grateful for advice.
A page range begins with a start page, which is a positive integer with a
lexical value that consists of 1-3 digits, where the leftmost cannot be 0. In
other words, it looks like the standard lexical representation of a positive
integer.
The page range can end there (that is, if the text falls on a single page, the
end page is not specified), but if the text spans more than one page, the start
page is followed by an en dash (I'll write a hyphen in the examples below for
typographic convenience, but in production there would be an en dash instead)
and then a second part that indicates the end page. The constraints on the
value (integer and lexical) of the end page are easy to conceptualize but
awkward (impossible?) to express as a regex, which is what makes me
wonder whether I'm thinking about the problem in a useful way:
1. If the start page consists of 1-2 digits, the lexical representation of the
end page contains the full value, e.g., 5-6, 5-25, 5-123, 12-15, 12-34, 12-165.
The lexical representation is the one we naturally expect for the integer.
2. If the first part consists of 3 digits, the second part contains either 2 or
3 digits, with the hundreds digit appearing only if it differs from the hundreds
digit of the start page, e.g., 103-06 (which means 103-106, but there must be
at least two digits, even though the tens value is zero for both the start and
end pages), 123-26 (which means 123-126; the initial digit is omitted because
it's the same for both the start and end pages, but the middle digit isn't,
even though it's the same, because there must be at least two), 123-265.
3. The integer value of the end page (including the first of three digits,
which may or may not be present in the lexical representation) must be greater
than the value of the start page. In other words, in 106-09 the 09 must be
recognized as greater than 106, etc.
Is it even possible to express these constraints with a regex?
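To make the three rules concrete, here is a hedged Python sketch (not RelaxNG or Schematron) of one way to split the work: a regular expression checks only the overall shape, and ordinary code expands the abbreviated end page and compares the integers. The names `parse_page_range` and `_SHAPE` are invented for this illustration, and a hyphen stands in for the en dash as in the examples above.

```python
import re
from typing import Optional, Tuple

# Shape only: 1-3 digits with no leading zero, optionally "-" plus 1-3 digits.
_SHAPE = re.compile(r"([1-9][0-9]{0,2})(?:-([0-9]{1,3}))?")


def parse_page_range(text: str) -> Tuple[int, Optional[int]]:
    """Return (start, end) as full integers; end is None for a single page."""
    m = _SHAPE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a page range: {text!r}")
    start_s, end_s = m.groups()
    start = int(start_s)
    if end_s is None:
        return start, None
    if len(start_s) < 3:
        # Rule 1: the end page is written in full, without a leading zero.
        if end_s.startswith("0"):
            raise ValueError(f"leading zero in end page: {text!r}")
        end = int(end_s)
    elif len(end_s) == 2:
        # Rule 2, abbreviated form: reuse the hundreds digit of the start page.
        end = (start // 100) * 100 + int(end_s)
    elif len(end_s) == 3 and end_s[0] not in ("0", start_s[0]):
        # Rule 2, full form: allowed only when the hundreds digit differs.
        end = int(end_s)
    else:
        raise ValueError(f"end page has the wrong form: {text!r}")
    if end <= start:
        # Rule 3: compare the expanded integer values.
        raise ValueError(f"end page {end} is not greater than start page {start}")
    return start, end


print(parse_page_range("106-09"))
print(parse_page_range("12-165"))
```

The regex on its own accepts strings such as "121-15" or "123-126"; it is the expansion and comparison steps that reject them, which is the part a pattern facet does not handle comfortably.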
If it is impossible, or possible but wrong-headed, here's an alternative:
Should I have the full start and end pages entered in different elements, with the
lexical space constrained as well as possible with a pattern facet (1-3 digits,
no leading zero) and then 1) use schematron to verify that the integer value of
the end page is greater than that of the start page and 2) use xpath/xslt to
format the lexical representation of the end page? I don't really do anything
with these values except print them, so having the user enter what I want to
print seemed more direct, but once I began to think about constraining the
values, that approach began to look unappealingly (and perhaps even impossibly)
complicated.
In case that wasn't bad enough, here are three further complications:
1. I know the number of pages in the books in question and would like to
specify maximum values, so that users couldn't try to enter a range like 456-98
for a 300-page book.
2. One set of entries consists of a three-volume series, so the page ranges are
actually something like "II: 123-34", meaning "volume 2, pp. 123-134". Each
volume begins numbering the pages at 1 and I know the last page number for each
volume. If I'm going to constrain the maximum page value, I'd like to do that
in a way that is sensitive to the different lengths of each volume.
3. The three-volume series is a numbered set of texts, where, say, volume 1
contains texts 1-84, volume 2 contains texts 85-147, and volume 3 contains
texts 148-210. The xml contains the text number as well as the page range, and
I'd like to constrain the page values to be credible. That is, I don't want a
user to be able to try to assign pages for text #99 to volume 3 because I know
that that text is in volume 2. (Or should I handle this by not having the user
enter a volume number, and just inferring that myself from the text number?)
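The last option, inferring the volume from the text number, can be sketched in a few lines of Python. This is an illustration only: the text-number boundaries are the example figures given above, the last-page numbers in `LAST_PAGE` are placeholders that do not come from the message, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import Optional

# Last text number in volumes 1, 2 and 3 (example figures from the message).
LAST_TEXT = [84, 147, 210]
# Last page number of each volume: placeholder values, NOT from the message.
LAST_PAGE = [300, 280, 320]


def volume_for_text(text_no: int) -> int:
    """Infer the volume (1-3) that contains a given text number."""
    if not 1 <= text_no <= LAST_TEXT[-1]:
        raise ValueError(f"unknown text number: {text_no}")
    # Index of the first volume whose last text number is >= text_no.
    return bisect_left(LAST_TEXT, text_no) + 1


def check_pages(text_no: int, start: int, end: Optional[int] = None) -> int:
    """Check a page range against the inferred volume and return that volume."""
    volume = volume_for_text(text_no)
    if start < 1:
        raise ValueError(f"start page must be positive: {start}")
    if end is not None and end <= start:
        raise ValueError("end page must be greater than start page")
    last = start if end is None else end
    if last > LAST_PAGE[volume - 1]:
        raise ValueError(f"page {last} is beyond the end of volume {volume}")
    return volume


print(volume_for_text(99))
print(check_pages(99, 123, 134))
```

With this arrangement the user never types a volume number, so text #99 cannot be assigned to volume 3 by mistake, and the per-volume page limit is looked up from the same inferred value.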
Thanks,
David
djbpitt(_at_)gmail(_dot_)com
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
--
======================================================================
Wendell Piez
mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================