Perhaps of interest to some who live in counties that have an upcoming
election, I’ve been exploring the use of XSLT 3.0 Streaming for the
processing of very large data sets produced by voting equipment. These data
sets are commonly referred to as “Cast Vote Records” (CVRs) and describe
the selections made on ballots, the number of votes those selections
represent, and their countability, among other things. NIST has released a
Common Data Format specification
<https://github.com/usnistgov/CastVoteRecords> for such records that can be
serialized as XML.
There has been some concern that the XML representation of this information
is simply too large to process effectively. To test that premise, I
developed a test deck generator and tabulator
capable of naïvely tabulating the contests. Both are written in XSLT 3.0. I
generated test decks of various sizes to get a better idea of the
scalability of different processing approaches.
The first approach is to load the entire CVR set into memory and operate on
it. The second approach is to “burst-mode” stream each CVR using XSLT 3.0
Streaming. I ran each transform 25 times for each set and averaged the
processing times.
(Chart: throughput for different input sizes, plotted against input size in
CVRs and in megabytes)
(Table: example processing times for typical jurisdiction sizes using
streaming)
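For those curious what the burst-mode pattern looks like, the core of it is
roughly the following sketch (not my actual tabulator; the CVR element name
is illustrative of the CDF structure):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of burst-mode streaming; element names are illustrative -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">

  <!-- The unnamed mode is streamable: the processor never builds the
       whole document tree in memory -->
  <xsl:mode streamable="yes" on-no-match="shallow-skip"/>

  <xsl:template match="CVR">
    <!-- copy-of() grounds one CVR as a small in-memory tree, which
         ordinary (non-streaming) templates can then navigate freely -->
    <xsl:apply-templates select="copy-of()" mode="tabulate"/>
  </xsl:template>

  <!-- Non-streaming mode that operates on the grounded copy -->
  <xsl:mode name="tabulate" on-no-match="shallow-skip"/>

</xsl:stylesheet>
```

Only one CVR is ever held in memory at a time, which is what keeps the
memory footprint flat regardless of input size.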
Each 100k CVRs required 4GB of memory to load, which makes the in-memory
approach inherently hardware-bound. I surmise that the lower throughput at
smaller input sizes is due to the startup cost of the XSLT processor.
Streaming processing took around 800MB of memory and remained stable at or
above the 100k CVR input size.
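That stable memory profile is what one would expect from streaming: running
totals can be kept in streamable accumulators whose rules are motionless.
Again a sketch with an illustrative element name, not the actual tabulator:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: counting CVR elements in constant memory with a
     streamable accumulator -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">

  <xsl:mode streamable="yes"/>

  <xsl:accumulator name="cvr-count" initial-value="0" streamable="yes">
    <!-- Fires at each CVR start tag; the rule is motionless, so no
         part of the tree needs to be retained -->
    <xsl:accumulator-rule match="CVR" select="$value + 1"/>
  </xsl:accumulator>

  <xsl:template match="/">
    <!-- accumulator-after() consumes the stream and yields the final
         count once the whole document has been read -->
    <total><xsl:value-of select="accumulator-after('cvr-count')"/></total>
  </xsl:template>

</xsl:stylesheet>
```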
Large CVR datasets can be processed effectively, so long as the chosen
approach can scale.
Test Environment: Windows 10 x64, 32GB RAM, Intel i7-7500U, Saxon-EE 10.2J,
Java 1.8.0_151-1-redhat
Hilton Roscoe LLC
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com