Seems like I've addressed the bugs I had. Also found an updated version of the paper about sampling the microfacet normal.
Currently doing 8 samples per sub-pixel with 2x2 MSAA, which effectively makes it 32 samples per pixel. The variance on rough materials is too high for my liking, but it seems like it could be improved with a more sensible lighting scheme.
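
For reference, the sample count works out as 2 x 2 sub-pixels x 8 samples each = 32 samples per pixel. Below is a rough Python sketch of that layout, assuming each sample is jittered uniformly inside its sub-pixel cell. The function name, parameters, and the jittering itself are illustrative assumptions, not the renderer's actual code.

```python
import random


def pixel_sample_offsets(grid=2, samples_per_subpixel=8, rng=None):
    """Return (u, v) offsets in [0, 1) x [0, 1) for one pixel.

    Hypothetical helper: the pixel is split into grid x grid sub-pixel
    cells and each cell gets `samples_per_subpixel` jittered samples.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    cell = 1.0 / grid  # width of one sub-pixel cell
    offsets = []
    for sy in range(grid):
        for sx in range(grid):
            for _ in range(samples_per_subpixel):
                # Jitter uniformly inside cell (sx, sy).
                u = (sx + rng.random()) * cell
                v = (sy + rng.random()) * cell
                offsets.append((u, v))
    return offsets


offsets = pixel_sample_offsets()
print(len(offsets))  # 2 * 2 * 8 = 32
```

Splitting the pixel into cells stratifies the image-plane samples, but it does nothing for the variance that comes from sampling the BRDF or the lights, which is presumably where the noise on rough materials originates.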