
Re: [xquery-talk] [xsl] Re: Random number generation : requirements

2014-05-07 01:31:41
I think that a random() in XSLT should be provided in a way that lets
you call several random number generators (of the same kind) in
parallel. Generators may exhibit a big difference between a sequence
where all elements are due to successive calls of the same generator
and one where a sufficient number of generators is called one by one.

For instance, in Dimitre's example the values returned alternate between
even and odd, and using this to generate random points (x,y) in 2D
omits 50% of the possible points. And this is typical for an entire
class of random generators.
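
As a rough illustration of that even/odd effect, here is a minimal Python sketch. It assumes a linear congruential generator with a power-of-two modulus and odd multiplier and increment; the constants and the names lcg_values and parity_pairs are illustrative assumptions, not taken from Dimitre's example. With such a generator the lowest bit flips on every call, so pairing successive values as (x, y) never yields a point whose coordinates have the same parity.

```python
# Illustrative LCG constants (an assumption, not taken from the thread).
A, C, M = 1103515245, 12345, 2**31


def lcg_values(seed, count):
    """Return `count` successive outputs of the LCG starting from `seed`."""
    values, state = [], seed
    for _ in range(count):
        state = (A * state + C) % M
        values.append(state)
    return values


def parity_pairs(values):
    """Pair successive outputs as (x, y) and collect the parities seen."""
    pairs = zip(values[0::2], values[1::2])
    return {(x % 2, y % 2) for x, y in pairs}


values = lcg_values(seed=1, count=1000)
seen = parity_pairs(values)
# (0, 0) and (1, 1) never show up: x and y always differ in parity.
print(seen)
```

At least the half of the grid where x and y share a parity is never reached, however many points are drawn.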

-W


On 07/05/2014, Michael Sokolov msokolov(_at_)safaribooksonline(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
My 2c:

I used an XQuery function based on Dimitre's version before; it works
fine although it's a little inconvenient to have to keep passing in the
prior value.

I would say the most convenient (or at least the most familiar)
signature for a random function is random($n) returning a random number
between 0 inclusive and $n exclusive; ideally it would return integers
if $n is an integer, floating point numbers if $n is a floating point
number, empty if $n is empty, and an error otherwise.  And I would like
a seed function.  Ideally this should be callable many times: I'm not
sure how that could be done non-deterministically though.
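
A minimal Python sketch of that signature, under stated assumptions: the name random_below is hypothetical, None stands in for the empty sequence, and the generator is passed in explicitly so that seeding stays visible to the caller. It is only meant to show the type-dependent behaviour described above, not how an XSLT or XQuery function would be specified.

```python
import math
import random


def random_below(n, rng):
    """Return a value in [0, n): int for int n, float for float n, None for None."""
    if n is None:
        return None                       # "empty in, empty out"
    if isinstance(n, bool) or not isinstance(n, (int, float)):
        raise TypeError(f"unsupported bound: {n!r}")
    if isinstance(n, int):
        if n <= 0:
            raise ValueError("bound must be positive")
        return rng.randrange(n)           # integer in 0 .. n-1
    if not (n > 0 and math.isfinite(n)):
        raise ValueError("bound must be positive and finite")
    while True:
        x = rng.random() * n              # can equal n only through rounding
        if x < n:                         # reject that rare endpoint
            return x


rng = random.Random(2014)                 # explicit seed makes runs repeatable
print(random_below(6, rng), random_below(1.0, rng), random_below(None, rng))
```

Creating a second random.Random with the same seed reproduces the same values, which is the kind of repeatability a seed function is meant to give.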

I suppose a sequence would be useful, but it isn't the first thing that
leaps to mind.  What if I'm not sure how many I'll need?

For example, one use case for me was to load a huge amount of data, and
only include 1% of it, in order to generate a predictable test data
sub-set. I want to write an XSLT template that returns nothing 99% of
the time, and for the other 1% of the time it processes the content
normally.  I want this to be based on an identifier in the content so
that for a given seed, the same "random" 1% are selected each time: it
should *not* be order-dependent, rather I would like to seed the random
number generator with a hash of a given seed that is a configuration
parameter, and a node-identifier, and then evaluate the next random
number to see if it is > 0.01 (say).  Maybe there are other ways to do
that, but that is what I did using Java.
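
The idea can be sketched in Python roughly as follows. This is not the Java code mentioned above: the function name keep_node, the choice of SHA-256, and the "seed:identifier" key format are all assumptions made for illustration. The hash is mapped directly to a fraction instead of seeding a generator and drawing once, and a node is kept when the fraction falls below the rate; the polarity of that comparison is a choice of this sketch.

```python
import hashlib


def keep_node(seed, node_id, rate=0.01):
    """Decide deterministically whether a node belongs to the sample.

    The decision depends only on (seed, node_id), never on processing order.
    """
    key = f"{seed}:{node_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the top 53 bits of the hash as an exact fraction in [0, 1).
    fraction = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return fraction < rate


ids = [f"node-{i}" for i in range(100000)]
kept = [i for i in ids if keep_node("test-config", i)]
print(len(kept), "of", len(ids), "nodes kept")
```

Changing the seed selects a different subset of about the same size, and processing the nodes in any other order selects exactly the same ones.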

-Mike


On 5/6/2014 6:58 PM, Michael Kay wrote:
The big problem with a nondeterministic random() function is not defining
the order of execution, but preventing it being optimised out of a loop.
For example, how do we ensure that

$xxx[random() gt 0.5]

doesn't select either all the values or none?
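
To make the concern concrete, here is a small Python sketch, with hypothetical function names, contrasting a predicate that is evaluated once per item with one that has been hoisted out of the loop, as an optimiser might do if it treated random() as an ordinary function with no arguments and no side effects.

```python
import random


def filter_per_item(items, rng):
    """Draw a fresh random number for every item (the intended meaning)."""
    return [x for x in items if rng.random() > 0.5]


def filter_hoisted(items, rng):
    """Draw once, outside the loop: what hoisting the call would produce."""
    keep = rng.random() > 0.5
    return [x for x in items if keep]


items = list(range(1000))
per_item = filter_per_item(items, random.Random(42))
hoisted = filter_hoisted(items, random.Random(42))
print(len(per_item), len(hoisted))
```

The per-item version keeps roughly half of the items, while the hoisted version can only keep all of them or none.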

Anyway, we're not planning to do non-determinism. This exercise is about
designing a deterministic way to meet the requirement.

Michael Kay
Saxonica

On 6 May 2014, at 23:48, Michael Sokolov 
<msokolov(_at_)safaribooksonline(_dot_)com>
wrote:

On 5/6/2014 6:41 PM, Michael Kay mike(_at_)saxonica(_dot_)com wrote:
My policy on side effects is: all expressions containing side effects
are going to be evaluated in order

I do something like that in Saxon as well. But I don't attempt to define
what "in order" means; for example, the order in which different global
variables are evaluated. Doing this in the spec would be much more
problematic.

You don't think it would be reasonable to say something to the effect
that the order in which non-deterministic expressions are evaluated is
non-deterministic (ie implementation-defined)? Certainly it would be
reasonable enough in the case of a random number generator.  Although I
suppose if you are going to seed it, you would like the seed to affect
the random numbers that are generated.

-Mike

