I could use an extra set of eyeballs, if anybody's feeling charitable this morning. A friend/colleague from our sales department asked me to help him and his team identify redundant accounts in our system more easily. Duplicates usually happen when a rep makes little effort to search for an existing account and just creates a new one instead. I thought edit distance might help identify similarly named accounts, so I hacked together the attached.
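For reference, here is a minimal sketch of the edit-distance idea, assuming the Levenshtein variant (insertions, deletions, and substitutions each cost 1). The names `levenshtein` and `min3` and the sample account names are illustrative only and are not taken from the attached code.

```go
package main

import "fmt"

// levenshtein returns the number of single-rune insertions, deletions,
// and substitutions needed to turn a into b. It keeps only two rows of
// the dynamic-programming table, so memory is proportional to len(b).
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i // distance to the empty prefix of b
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

func main() {
	// Hypothetical account names that differ by one trailing character.
	fmt.Println(levenshtein("Acme Corp", "Acme Corp."))
}
```

A small distance relative to the name length would flag a pair as a likely duplicate; the threshold is a judgment call.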
There are, in total, just shy of 121 billion comparisons to be made, so I left it running overnight. More than a third of the region jobs had completed before I left last night. I wasn't profiling, but I monitored memory and CPU consumption in Task Manager, and both appeared stable with no sign of a leak. When I came in this morning, it had crashed. There isn't much useful information in the crash report, because it's so long that I can't scroll up to see most of it, and it appears that there were nearly 9000 goroutines? Here's the last one (all of the ones I can see are identical except for the goroutine number):
goroutine 8951 [runnable]:
main.process(0xc085f02470, 0x2, 0xc088385000, 0x5903, 0x5ae8, ...)
created by main.main
Am I leaking goroutines? I'm confused because I watched for a few hours and it never consumed more than around 160 MB of memory, and it was typically closer to 147 MB.
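One way to check would be to print runtime.NumGoroutine() while the job runs and see whether the count keeps climbing. Below is a minimal sketch of that check alongside a fixed-size worker pool, which keeps the goroutine count bounded regardless of the number of jobs. It assumes nothing about the attached code beyond the fact that it starts goroutines from main; `sumSquares`, the squaring "work", and the worker count are all hypothetical placeholders.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sumSquares processes jobs with a fixed number of worker goroutines,
// so the number of live goroutines does not grow with len(jobs).
// workers must be at least 1.
func sumSquares(jobs []int, workers int) int {
	in := make(chan int)
	out := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range in {
				out <- n * n // placeholder for the real comparison work
			}
		}()
	}

	// Feed the jobs, then signal the workers that no more are coming.
	go func() {
		for _, n := range jobs {
			in <- n
		}
		close(in)
	}()

	// Close the results channel once every worker has returned.
	go func() {
		wg.Wait()
		close(out)
	}()

	total := 0
	for v := range out {
		total += v
	}
	return total
}

func main() {
	jobs := make([]int, 10000)
	for i := range jobs {
		jobs[i] = i
	}
	fmt.Println("goroutines before:", runtime.NumGoroutine())
	fmt.Println("result:", sumSquares(jobs, 4))
	fmt.Println("goroutines after:", runtime.NumGoroutine())
}
```

If the count printed during a long run keeps rising instead of settling near the worker count, goroutines are being started faster than they finish.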
A note: the package "files", which should be the only import you don't recognize, is a helper package I use because I process files at work very frequently. The function used (files.ReadCSV) takes an io.Reader and a function, reads the CSV input from the Reader, and passes it line by line to the function argument.
I hate sharing code that was hacked together quickly like this, so cut me some slack if it's horrendous, please :). This is not production stuff, just a hack for a friend.