This was kind of a cool trick.

I have a program with a clear bug.  Unfortunately the simplest test case I had only showed the bug at the end of a 40-minute run processing half a million records.  But if I ran my program again, it was able to notice things that it could prove it had done wrong the first time.  (It probably did more things wrong, but it only detected some of them.)

I decided to evolve a faster test case.

I first wrote a script to run the program twice, and report how many things it found wrong on the second run.
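A minimal sketch of that first script in Python, under assumptions of my own choosing: the program is launched as an external command, and on its second run it prints one line per detected mistake starting with a marker.  The `DETECTED` marker and the function name are hypothetical; the real program may report its errors quite differently.

```python
import subprocess

def count_detected_errors(command, marker="DETECTED"):
    """Run `command` twice and count the marker lines printed by the second run.

    `command` is an argument list such as ["./analyze", "records.txt"].
    The marker prefix is an assumption, not the real program's output format.
    """
    # First run: its output is ignored, it only leaves behind the state
    # that the second run can check.
    subprocess.run(command, capture_output=True, text=True)
    # Second run: count the lines where it reports a provable mistake.
    second = subprocess.run(command, capture_output=True, text=True)
    return sum(1 for line in second.stdout.splitlines()
               if line.startswith(marker))
```

A result of zero means this record set did not demonstrate the bug; a higher count means a better demonstration.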

I wrote a second script that would take a record set, sample half of it randomly, and then run the first script.
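The sampling step might look roughly like this in Python.  It is a sketch, not the original script: `sample_half` is a made-up name, and keeping the surviving records in their original order is my assumption.

```python
import random

def sample_half(records, rng=random):
    """Return a random half of `records`, kept in their original order."""
    keep = len(records) // 2
    # Pick positions rather than values so duplicate records are handled
    # correctly, then sort the positions to preserve the input order.
    chosen = sorted(rng.sample(range(len(records)), keep))
    return [records[i] for i in chosen]
```

Passing a seeded `random.Random` as `rng` makes a particular sample reproducible.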

I wrote a third script that would take all records, run 4 copies of my test program, and then save the smaller record set that best demonstrated the bug.  Wash, rinse, and repeat.  Once it could no longer demonstrate the bug, it would go through its list of "best small examples" from smallest to biggest and try again, repeating the whole process any time it made progress.
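Here is a rough Python sketch of that loop, with several assumptions.  `score` is a hypothetical stand-in for "run the program twice and count what it detects" (zero meaning the bug did not show), the 4 candidates are tried one after another rather than as parallel copies, and a fixed `rounds` budget replaces leaving it running for a few hours.

```python
import random

def evolve(records, score, tries=4, rounds=50, rng=random):
    """Shrink `records` while `score` still reports the bug.

    Returns every saved record set, smallest first.
    """
    saved = [list(records)]
    for _ in range(rounds):
        # Try the saved examples from smallest to biggest.
        for start in sorted(saved, key=len):
            if len(start) < 2:
                continue
            # Draw `tries` random halves and score each one once.
            scored = []
            for _ in range(tries):
                idx = sorted(rng.sample(range(len(start)), len(start) // 2))
                half = [start[i] for i in idx]
                scored.append((score(half), half))
            top, winner = max(scored, key=lambda pair: pair[0])
            if top > 0:
                # Progress: keep the best half and start over from the smallest.
                saved.append(winner)
                break
    return sorted(saved, key=len)
```

The first element of the result is the smallest record set that still showed the bug.  If no half ever reproduces it, the only entry is the original record set.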

All three scripts combined come to 150 lines.  I left them running for a few hours and just checked how they were doing.

I now have a dozen acceptably small reproduction cases for my bug, including one with 362 records that runs in a fraction of a second.  (The program is doing statistical analysis, so there is never going to be a single record that shows the bug.  362 records to reproduce it is tiny.)