User talk:HalFonts
Sorting ballots
Hi. In those 100,000 ballots, what are you sorting on? Do they have precinct numbers written on them, or are they absentee ballots that must be sorted by voter's address? If they have (or most of them have) a precinct number on them, then a radix sort could be reasonable. (Reply here; I will see it.) Glrx (talk) 15:23, 19 November 2010 (UTC)
Reply
Unfolded (7"x16") vote-by-mail paper-ballots have 120 distinct non-sequential 3-digit Precinct Numbers (labeled 101..801) printed on face-side foot-border (orientations are mixed). Precinct sizes vary (60 to 2500 population). We are looking at 1 county-wide Race, with 2-Candidates -- plus "Bads" (Under-votes, Over-votes, Write-Ins, comments).
Normally ballots are processed in random (arrival) order in batches of 100-120; machine-counted, with Results sorted later by the Tally/Reporting DataBase Software. For this mandatory Hand-Recount, result totals must be by Precinct.
Sort order is not mandated. We could first sort into 120 Precincts; and then Sort-Count each Precinct by Candidate (A,B,Bads). Or, we could Sort all 100,000 by Candidate, then Sorting each Candidate into Precincts. It's assumed that we would first sort by Precinct -- because the double operator A-B Sort+Count with Exchange and Verify process is traditional.
We can staff 15-20 experienced human Sorters, very adept at checking stacks of ballots. Physical Binary sorts go very fast; more than 6-8 piles slow down. Though, direct sort into a 10x12 array of 120 pigeon-holes is not out of the question. 120 piles on tables could involve miles of walking
Working with these numbers, manually at (paid) human speeds, efficiency can be much more critical than using EDP on such data. I've some CompSci/EDP training but this Real-World application has become a good puzzle. Thanks for any suggestions. HalFonts (talk) 23:07, 20 November 2010 (UTC)
Reply
First, can your machines do the sort? Sometimes the machines have two output bins. I wouldn't waste a lot of time looking at this possibility. A binary sort would have to do 7 passes on all of the ballots. If the machines are fast (or you have many of them) it may be viable. Moving on.
Yes, I would sort the ballots by precinct first and then examine and report the result for each precinct.
To sort the ballots, I'd first divide them by the hundreds precincts -- separate piles for 100s, 200s, ..., 800s. All the sorters could work that task in parallel. Then I'd take each pile, and separate them into decades piles. The result is about 80 piles of 1,250 ballots each. Then I'd separate each of the 80 piles into individual precincts. The result is each ballot gets examined 3 times for 300,000 looks (15,000 per sorter).
During each pass, the sorter would be distributing the input to 10 piles. There is a tradeoff about speed for a given number of piles. If a binary separate is more than 7/3 (about 2.3) times faster than a decade separate, then doing a binary separate could be better. An efficiently done binary sort would need to examine each ballot 7 times (27 = 128 -- more than the number of precincts (100)). At each decision point, the sorter would put precincts less than some number in one pile and the rest in the other pile. The split points should be chosen so there are no more than 64 precincts in any pile at the first split, no more than 32 at the second split, 16 at the third split, ... 1 at the seventh split. The last two split might be combined -- for example, separate into four final piles at split six rather than making the seventh split. Countering that, about 20% of the precincts won't need a seventh split (they will be unique at the sixth split).
A naive binary sort (eg, based on the three digit number) would take 11 looks per ballot. Many piles would terminate early.
I don't know about the relative speeds. Do you think a binary split is about 2.5 times faster than a decimal split?
Glrx (talk) 03:58, 21 November 2010 (UTC)
Reply
That's much as I was thinking. with some confirmation of my rusty math. This has to be simple, direct and straightforward, to get approval and for worker training. Complex algorithms or structures could be disastrous.
Counting machines run at MAX 400-ballots/minute (2000-4000/hour throughput ???), BUT likely only drop 1 Pct per pass -- (Hmmm 2-OutBins, MUST VERIFY AVAILABLE LOGIC !!!)
Major efficiency: Precincts are only: 101:157, 201:247, 301:303, 401, 501:505, 601:605, 701, 801 -- so that can greatly simplify bins if divided up correctly. Pcts 101:247 have 85% of population.
1st sort: 1nn, 2nn, (3nn, 4nn, 5nn, 6nn, 7nn, 8nn)
2nd sort: n0n, n1n, n2n, n3n, n4n, n5n
3rd sort: nn0, ... nn9 (Now got Precincts)
4th sort: Candidate-A, Candidate-B, Other-Bads
5th pass: Count
6th pass: Verify (A,B Sort and Counts)
I guess it depends on the most efficient number of bin-piles (2, 3, .. 10 or more), that workers can handle without slowing for confusion. Our workers check ballots every election, (looking for voter-intent, errors, contamination and stray dots) -- they're very good. I think they can sort to an array of 2x5=10 piles on a table (or into 10 vertical finger-slots).
Thanks much for ideas and encouragement. HalFonts (talk) 06:17, 21 November 2010 (UTC)
Additional
Looks like a Radix Sort, using 2, 5 or 10-bins. Now, physical reality enters the picture.
Normal processing (Opening, Checking) has Worker seated at a 2'x5' table with a source-pile in front, working left and right into 1 to 4-piles, 8-hrs/day.
- A binary-sort follows this fast efficient work-flow, with a simple flick of the thumb and wrist ("<" to the left; ">" to the right).
- Sorting to 8- or 10-bins, may require standing with a full-arm-reach motion (8-hrs/day). Special equipment (furniture, pigeonholes, buckets) might help.
- Sorting to 4- or 5-piles may be an optimal compromise. I can estimate ballots/hour factors.
HalFonts (talk) 18:12, 21 November 2010 (UTC)
Reply
I see no problem with your proposal. I don't know about sorting speed vs number of bins, so I cannot help you there. There is a nice item that three passes of five bins should be enough (53 = 125 > 120 precincts).
I'd be tempted to sort on the middle (ten's) digit first. It's a six way sort. The n0n-pile needs special handling, but the remaining piles only have 20 alternatives left. I might sort them as 1d0-1d4, 1d5-1d9, 2d0-2d4, 2d5-2d9 (4 bins) and follow with a 5 bin sort to separate out the units.
But 4 to 6 way sorts may be just as bad as a 10-way.
Good luck. Glrx (talk) 19:29, 23 November 2010 (UTC)
Reply
We escaped the Hand-Recount by half-a-dozen votes -- only a Machine-Recount is mandatory (two days of three-machines sorting 100-cartons of archived boxes. It'll go fast with minimal manual logging and ballot-checking this time -- and of course any sorts will be by the usual "merge and reporting" database software. Whew.
Nonetheless, I learned quite a bit and will try to summarize ideas in a report for the record (next time). It became a review of mathematical and algorithmic fundamentals, quickly merged with the real-world realities of available non-uniform keys, manual sort limitations, operator training and physical considerations.
1. Sequencing on Precinct Numbers makes the best sense -- and it's all we've got. One proposal was to pre-sort first by "District" but that required a "look-up table" -- as Precinct Numbers are not all subordinate to "District." Humans are slow and intelligent, but with limits. For humans to memorize or look-up each out-of-sequence Precinct-number, would have destroyed any "District" logic or productivity gain.
2. Sorting by digit, as you suggest makes good sense, using multiple "piles." A slightly-different variation is to define it to Sorters as "<nnn" or ">nnn" break-points (<115,>115) -- depending on current source group characteristics.
3. Humans can sort into 1 to 4 (or 5?) piles easily, however, more than that with large physical ballots becomes tedious and clumsy, affecting productivity. Special furniture (shelves, finger-racks) might help.
4. For efficiency, resulting "piles" should be as equal as possible, maximizing every laborious "pass" through the huge mass of ballots (data). Since Precinct populations vary (until re-balancing after the next census) sort-keys (digits or break-points) should produce equal piles of "ballots" rather than "precincts."
5. Math does matter. Theoretically sorting into 3-bins (vs. Binary-Sort, 2-bin splits)requires significantly fewer passes; and 3-piles are almost as easy as 2). Beyond 5-piles mathematical gains are less, and physical-motion handicaps rapidly increase.
6, Economics for 15-25 workers, perhaps some two-days work; were not as severe as I expected. However, something like this could easily explode way out-of-control, with larger ballot-counts, complex keys or inefficient workers.
7. Having two Sorters do a double-Sort-Count {Sort, Count, Sort, Count, Verify} is standard for Manual-Counts. I'm not sure if it is required for the initial Precinct pre-Sort. Precincts might be Verified as part of the last Candidate (A-B) Sort-Count process.
8. Other labor requirements include Supervisors, Ballot Handlers, and Observers.
So, were I in-charge, responsible for producing accurate results within budget, I'd study the keys closely, (with a careful eye towards population, and natural break-points). I'd confirm worker productivity rates, using 1-5 piles (5, while decimally useful) may be pushing a physical productivity limit (always need one more pile for "exceptions and weirds." I'd then set bin ranges for each pass, being very careful to simplify -- for both Management and Worker acceptance/approval.
Thanks much for your help, Glrx. ````
Reply
Well, congratulations on being lucky.
4. I disagree. It is a statement that requires proof. If one precinct were 99% percent of the ballots, then I'd want to separate that precinct from all the others first. That's unequal 99% and 1% piles. In one binary pass, I would have done 99% percent of the sort. Even if I needed 20 passes to sort the remaining precincts (one percent of the ballots), I would would have sorted all the ballots in effectively 1.20 passes. If I split to make the number of precincts in each pile (not the number of ballots) about equal, then I can guarantee that I don't need to look at any ballot more than N times. There may be a more efficient sort that looks at some ballots < N and others > N times. (It's a Huffman coding problem.)
7. I agree. I would keep the sorting simple in the beginning (to minimize mistakes) and only verify the sorts at the end.
Glrx (talk) 01:50, 27 November 2010 (UTC)
Reply
Another Co-conspirator also questioned #4. I agree, if Precinct sizes vary greatly. However in this application: Precincts are supposed to have "approximately equal" populations. Due to evolving boundaries between 10-yr Redistricting, our Precincts now vary randomly (24 to 2800)-- most Avg 970 Voters/Pct. My thinking was: In the extreme, extracting one big-Precinct out of each Pass would require <=119-Passes vs. 7-binary-Passes. Extracting the bigger-ones early could help, IF there are simple-rules for which PctKeys to sort-out first. I'll look at that.
Another efficiency would be that the Race being Recounted involves only 83 of 120 Precincts; however I have no simple-rules for defining which Precincts.
Back to the Wikipedia "Sorting Algorithms" -- I may attempt a brief reference to "Physical Sorts." There are many sorting tasks (Apples, Ballots, Logs, Lumber, Mail, etc.) that are done manually. When large populations, these can benefit from underlying mathematical principles -- while recognizing physical limitations. It takes both.
HalFonts (talk) 19:28, 28 November 2010 (UTC)
Reply
I spoke with a scary smart friend. His suggestion is using an old-technology needle sort / pin sort. Plan for a manual recount by punching 7 holes or notches in the ballots when they are printed (representing a binary encoding of the 120 precincts). Needles can then be used to separate the holes from the notches. (I don't think there is a Wikipedia article on this technology.) A needle is driven through a stack at one of the positions. The needle is then lifted. The cards with notches at that position fall away; the cards with just holes stay on the needle. One can process a hundred cards at a time....
If one wanted to be cheap, one could just print the holes and notches on the card. If there is a manual recount, the first pass could be manually punching the holes and notches -- it wouldn't require much thinking. Then all the cards could be needle sorted.
Glrx (talk) 05:55, 29 November 2010 (UTC)
I'm familiar with the notch-pin physical sort method. Problem is the (7"x16"? paper) ballots are precision formatted for high speed machine reading by the (400 ballot/minute) counting machines. It might take "an-act-of-congress" to modify formats, and physical holes would risk jamming. Unfortunately, the basic problem has escaped solution, until it surfaces again. HalFonts (talk) 22:50, 1 May 2011 (UTC)
- OK. At least you ducked this time out. Glrx (talk) 23:36, 1 May 2011 (UTC)
Sandra Fluke?
What are you talking about with her article? The current edit has a section on her speech, Casprings (talk) 05:57, 7 September 2012 (UTC)