I came back to the non-blocking hash set to optimize it, since there were some bits I knew I could fix. Here's 10 million lookups (9,999,996) in 140ms. The average lookup time is now 66ns (down from 400ns!), throughput-wise it's about 15 million lookups/s, and I still start the example with 32 entries to stress test the concurrent resizing. Single-threaded use of the hash set gives me an average of 55ns and 18.5 million lookups/s, with the total time being 640ms:
https://gist.github.com/RealNeGate/7dd84f7b6ef37affedcbacf27bc4e52f
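
For anyone wanting to sanity-check numbers like these, here's a minimal sketch of how average latency and per-thread throughput relate. The helper names `avg_latency_ns` and `lookups_per_sec` are made up for illustration and are not from the gist; it also assumes throughput is just the inverse of the average latency, which only holds per thread.

```c
#include <stdint.h>

// Hypothetical helpers for illustration only, not taken from the gist.

// Average time per lookup in nanoseconds, given the total time spent
// doing lookups (in nanoseconds) and how many lookups were done.
static double avg_latency_ns(double total_ns, uint64_t lookups) {
    return total_ns / (double)lookups;
}

// Lookups per second one thread sustains at a given average latency.
// Assumes lookups run back to back with nothing else in between.
static double lookups_per_sec(double latency_ns) {
    return 1e9 / latency_ns;
}
```

Under that assumption, `lookups_per_sec(66.0)` comes out at a bit over 15 million lookups/s and `lookups_per_sec(55.0)` at a bit over 18 million, which is roughly in line with the figures above.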