xsl-list

Re: [xsl] constraining values with a pattern facet in relax ng?

2011-07-08 10:52:18
Dear David,

Given that your rules are complex, I think you need to think not only about how to implement and enforce them, but about who will be doing so ... your code, sure, but also your users?

In other words, I think the best solution to the problem depends partly on who the users are and how they'll be entering the data.

Especially if the rules are complex, I think most users would be happier to enter, say, "101" and "105" and have the machine then figure out to display "101-5", than they would be to remember that this one should be "101-5" but another one should be "31-35".

If this is the case, this implies that you need your XSLT to know how to crunch "101" and "105" into "101-5" rather than to match and validate "101-5". (FWIW, I think your requirements for that would go beyond regex checking, since for example "121-15" would presumably be incorrect. You will need more power.)
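A minimal sketch of that crunching step follows. It is written in Python rather than XSLT for brevity, and it assumes pages run from 1 to 999. The function name `format_range` and its `dash` parameter are illustrative and do not come from the thread. Under the rules David gives below (at least two digits after a three-digit start page), 101 and 105 come out as "101-05" rather than "101-5".

```python
from typing import Optional


def format_range(start: int, end: Optional[int] = None, dash: str = "-") -> str:
    """Format a page range using the abbreviation rules described in the thread.

    Assumes pages run from 1 to 999. In production `dash` would be an en dash
    (U+2013); a hyphen is the default here to match the examples.
    """
    if not 1 <= start <= 999:
        raise ValueError(f"start page out of range: {start}")
    if end is None:
        # Single-page text: no end page is written.
        return str(start)
    if not start < end <= 999:
        raise ValueError(f"end page {end} must be greater than {start} and at most 999")
    if start < 100:
        # Rule 1: after a 1- or 2-digit start page, the end page is written in full.
        return f"{start}{dash}{end}"
    if end // 100 == start // 100:
        # Rule 2: same hundreds digit, so drop it but always keep two digits.
        return f"{start}{dash}{end % 100:02d}"
    # Rule 2: the hundreds digit differs, so all three digits are written.
    return f"{start}{dash}{end}"
```

For example, `format_range(101, 105)` returns `"101-05"` and `format_range(31, 35)` returns `"31-35"`. In XSLT 2.0 the same arithmetic is available through `idiv`, `mod`, and `format-number($n, '00')`.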

It would also make other aspects of your problem easier, such as being able to confirm that the last page has a higher number than the first page (if this is, in fact, a rule), or to see that text numbers, volume numbers and page numbers all align.

But your users, or your use case, may be different.

Say they're transcribing entries that already have this info. What if the source data is actually incorrect? Should your scribes be correcting "101-105" to "101-5"?

The bottom line is that I think your life will be easier if you collect the data in the simplest, most granular form possible. The validation (including the interdependency checks) and XSLT will be easier. And unless there's a reason they resist this, your users will probably thank you too.

Cheers,
Wendell

On 7/8/2011 11:26 AM, Birnbaum, David J wrote:
Dear xsl-list,

I'm writing a RelaxNG compact-syntax schema where users can enter a page range 
as the value of an element, and I think (see below for reservations and an 
alternative) that I'd like to constrain the allowed values (integer and 
lexical) with a facet. I'm uncertain about how to proceed, or even whether I'm 
conceptualizing the problem in a useful way, and I'd be grateful for advice.

A page range begins with a start page, which is a positive integer with a 
lexical value that consists of 1-3 digits, where the leftmost cannot be 0. In 
other words, it looks like the standard lexical representation of a positive 
integer.

The page range can end there (that is, if the text falls on a single page, the 
end page is not specified), but if the text spans more than one page, the start 
page is followed by an en dash (I'll write a hyphen in the examples below for 
typographic convenience, but in production there would be an en dash instead) 
and then a second part that indicates the end page. The constraints on the 
value (integer and lexical) of the end page are easy to state but awkward 
(impossible?) to express as a regex, which is what makes me wonder whether I'm 
thinking about the problem in a useful way:

1. If the start page consists of 1-2 digits, the lexical representation of the 
end page contains the full value, e.g., 5-6, 5-25, 5-123, 12-15, 12-34, 12-165. 
The lexical representation is the one we naturally expect for the integer.

2. If the start page consists of 3 digits, the end page is written with either 
2 or 3 digits, with the hundreds digit appearing only if it differs from the 
hundreds digit of the start page, e.g., 103-06 (which means 103-106; there must 
be at least two digits, even though the tens digit is zero for both the start 
and end pages), 123-26 (which means 123-126; the hundreds digit is omitted 
because it's the same for both the start and end pages, but the tens digit is 
kept, even though it is also the same, because there must be at least two 
digits), 123-265.

3. The integer value of the end page (including the first of three digits, 
which may or may not be present in the lexical representation) must be greater 
than the value of the start page. In other words, in 106-09 the 09 must be 
read as 109 and recognized as greater than 106, etc.
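To make rules 1-3 concrete, here is a hedged Python sketch that reads a range in the abbreviated form and recovers the full integer end page. The name `parse_range` is hypothetical, and a hyphen stands in for the en dash, as in the examples above. It illustrates why rule 3 needs arithmetic rather than pattern matching alone: the end page is only known after the hundreds digit has been borrowed from the start page.

```python
import re
from typing import Optional, Tuple

PAGE = re.compile(r"[1-9][0-9]{0,2}")  # 1-3 digits, no leading zero


def parse_range(text: str, dash: str = "-") -> Tuple[int, Optional[int]]:
    """Expand an abbreviated page range into (start, end) integers.

    Raises ValueError if the text breaks any of rules 1-3.
    """
    start_text, sep, end_text = text.partition(dash)
    if not PAGE.fullmatch(start_text):
        raise ValueError(f"bad start page: {text!r}")
    start = int(start_text)
    if not sep:
        return start, None  # single page, no end page given
    if start < 100:
        # Rule 1: the end page is written in full.
        if not PAGE.fullmatch(end_text):
            raise ValueError(f"bad end page: {text!r}")
        end = int(end_text)
    elif re.fullmatch(r"[0-9]{2}", end_text):
        # Rule 2, short form: borrow the hundreds digit from the start page.
        end = (start // 100) * 100 + int(end_text)
    elif re.fullmatch(r"[1-9][0-9]{2}", end_text) and end_text[0] != start_text[0]:
        # Rule 2, long form: allowed only when the hundreds digit differs.
        end = int(end_text)
    else:
        raise ValueError(f"bad end page: {text!r}")
    # Rule 3: the expanded end page must come after the start page.
    if end <= start:
        raise ValueError(f"end page {end} is not after start page {start}")
    return start, end
```

For instance, `parse_range("106-09")` gives `(106, 109)`, while `"121-15"` is rejected because it expands to 115.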

Is it even possible to express these constraints with a regex?
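For the lexical part (rules 1 and 2), a regex is possible even though XSD regular expressions, the flavor a RELAX NG `pattern` facet uses with the XSD datatype library, have no back-references or lookahead: the "different hundreds digit" condition can be spelled out once per digit. The sketch below generates such a pattern in Python; the function name is hypothetical, and `[0-9]` is used instead of `\d` because in XSD `\d` matches any Unicode decimal digit.

```python
import re


def xsd_page_range_pattern(dash: str = "-") -> str:
    """Build an XSD-style pattern for rules 1 and 2 (lexical form only).

    XSD regexes have no back-references, so the 'different hundreds digit'
    condition is written out once for each possible hundreds digit.
    Rule 3 (end page greater than start page) is NOT enforced.
    """
    page = "[1-9][0-9]{0,2}"  # 1-3 digits, no leading zero
    alternatives = [
        page,                        # single page
        f"[1-9][0-9]?{dash}{page}",  # rule 1: 1-2 digit start, full end page
    ]
    for d in range(1, 10):
        others = "".join(str(x) for x in range(1, 10) if x != d)
        # Rule 2: start page d__, end page either __ or ___ with another hundreds digit.
        alternatives.append(f"{d}[0-9]{{2}}{dash}([{others}][0-9]{{2}}|[0-9]{{2}})")
    return "|".join(f"({alt})" for alt in alternatives)


pattern = xsd_page_range_pattern()
```

The generated string could be placed in a compact-syntax schema as `xsd:string { pattern = "…" }`, with the en dash substituted for the hyphen. It still accepts 123-15, which expands to 115, so rule 3 needs another mechanism. Because pages stop at 999 the set of valid ranges is finite, so a regex could in principle enumerate it, but the result would be enormous; a Schematron check on the expanded integers seems the more practical place for that rule.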

If it is impossible, or possible but wrong-headed, here's an alternative: 
Should I have the full start and end pages entered in separate elements, with the 
lexical space constrained as well as possible with a pattern facet (1-3 digits, 
no leading zero) and then 1) use schematron to verify that the integer value of 
the end page is greater than that of the start page and 2) use xpath/xslt to 
format the lexical representation of the end page? I don't really do anything 
with these values except print them, so having the user enter what I want to 
print seemed more direct, but once I began to think about constraining the 
values, that approach began to look unappealingly (and perhaps even impossibly) 
complicated.

In case that wasn't bad enough, here are three further complications:

1. I know the number of pages in the books in question and would like to 
specify maximum values, so that users couldn't try to enter a range like 456-98 
for a 300-page book.

2. One set of entries consists of a three-volume series, so the page ranges are actually something 
like "II: 123-34", meaning "volume 2, pp 123-134". Each volume begins numbering 
the pages at 1 and I know the last page number for each volume. If I'm going to constrain the 
maximum page value, I'd like to do that in a way that is sensitive to the different lengths of each 
volume.

3. The three-volume series is a numbered set of texts, where, say, volume 1 
contains texts 1-84, volume 2 contains texts 85-147, and volume 3 contains 
texts 148-210. The xml contains the text number as well as the page range, and 
I'd like to constrain the page values to be credible. That is, I don't want a 
user to be able to try to assign pages for text #99 to volume 3 because I know 
that that text is in volume 2. (Or should I handle this by not having the user 
enter a volume number, and just inferring that myself from the text number?)
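On the last question, one option is to infer the volume from the text number, using the text ranges given above, and to check page limits against each volume's last page. The sketch below assumes the page ranges have already been expanded to integers. The names `volume_for_text`, `check_entry`, and the `last_pages` mapping are hypothetical, and the actual page counts are not given in the message, so they are passed in rather than hard-coded.

```python
from typing import Dict, Optional

# Texts per volume, as given in the message (Python ranges exclude the stop value).
TEXTS_BY_VOLUME: Dict[int, range] = {
    1: range(1, 85),     # texts 1-84
    2: range(85, 148),   # texts 85-147
    3: range(148, 211),  # texts 148-210
}


def volume_for_text(text_no: int) -> int:
    """Infer the volume from the text number instead of asking the user."""
    for volume, texts in TEXTS_BY_VOLUME.items():
        if text_no in texts:
            return volume
    raise ValueError(f"no volume contains text #{text_no}")


def check_entry(text_no: int, volume: Optional[int], start: int,
                end: Optional[int], last_pages: Dict[int, int]) -> int:
    """Check one entry against its volume and return that volume.

    `last_pages` maps volume number to the volume's last page; the real
    values are not given in the thread, so callers must supply them.
    """
    expected = volume_for_text(text_no)
    if volume is not None and volume != expected:
        raise ValueError(f"text #{text_no} is in volume {expected}, not {volume}")
    last_page = last_pages[expected]
    highest = start if end is None else max(start, end)
    if highest > last_page:
        raise ValueError(f"page {highest} is past the end of volume {expected} "
                         f"({last_page} pages)")
    return expected
```

With the volume inferred this way, the "II:" prefix could be generated on output rather than typed, which is one way to approach the question in parentheses.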

Thanks,

David
djbpitt(_at_)gmail(_dot_)com

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



--
======================================================================
Wendell Piez                            
mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
