Another program that hooks into timing functions is Hourglass, a program for tool-assisted speedrunning on Windows. https://github.com/TASVideos/hour...32/tree/master/src/wintasee/hooks
Saving the game state might be tricky; it depends on how the game is implemented. I'd go for the low-hanging fruit first: make a copy of Isaac's memory and restore it a short time later. This might let you rewind within a room or even within a floor, or it might not work at all.
If the game is sufficiently deterministic, it might be possible to restore the game state by capturing and replaying the input. Obviously it takes a while to play the entire game all over again, but for a tool that runs permanently, storing the input is far more convenient than taking periodic virtual machine snapshots. The replay could be sped up by temporarily disabling rendering.
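Here is a rough sketch of the record-and-replay idea in Python. It does not touch Isaac at all: ToyGame is a made-up stand-in for a game whose state depends only on a seed and the input sequence, and the names ToyGame, step, and replay are my own inventions for illustration. Whether the real game is deterministic enough for this is exactly the open question.

```python
import random


class ToyGame:
    """Hypothetical deterministic game: state depends only on seed + inputs."""

    def __init__(self, seed):
        self.rng = random.Random(seed)  # seeded, so "random" events repeat
        self.x = 0
        self.frame = 0

    def step(self, key, render=True):
        # Player movement from the input, plus a seeded pseudo-random drift.
        self.x += {"left": -1, "right": 1}.get(key, 0)
        self.x += self.rng.choice([-1, 0, 1])
        self.frame += 1
        if render:
            pass  # drawing would happen here; skipped during fast replay

    def state(self):
        return (self.frame, self.x)


def replay(seed, inputs, upto):
    """Rebuild the state after `upto` frames by re-feeding the recorded inputs."""
    game = ToyGame(seed)
    for key in inputs[:upto]:
        game.step(key, render=False)  # no rendering, so replay runs faster
    return game


# Record a short session, then "rewind" to frame 4 by replaying from the start.
recorded = ["left", "right", "none", "right", "right", "left"]
live = ToyGame(seed=42)
for key in recorded:
    live.step(key)

rewound = replay(42, recorded, upto=4)
print(rewound.state(), live.state())
```

The only thing that has to be stored is the seed and the list of inputs, which is why this is so much cheaper than snapshots. The catch is that any source of nondeterminism the tool does not control (wall-clock time, thread timing) breaks the replay.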
Allowing AIsaac to recognize all items and enemies will require a ton of work. You'll need to extract and label hundreds of different sprites. I guess extracting the sprites could be automated, but teaching AIsaac all the item effects and the enemies' movement and attack patterns seems like a tedious task. Initially it might be good enough to have AIsaac shoot at and avoid everything that isn't part of the background.
The sprites' black outlines should make it much easier to distinguish entities from the background. Turning the screen capture into a binary image by simply comparing each pixel to black might be a nice way to reduce the amount of data you push through your CV pipeline without losing much information. I just looked at a few screenshots (I don't own the game myself), and unfortunately there are some visual effects that influence the color of the outlines. Otherwise you could just do an exact comparison and immediately have really high-quality data.
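To make the thresholding idea concrete, here is a minimal sketch in plain Python. The function name outline_mask and the tolerance parameter are my own assumptions, and the pixel values are invented, not taken from the game. A tolerance of 0 is the exact comparison to black; a small positive value is one possible way to cope with effects that tint the outlines, though I have not checked what value would suit real screenshots.

```python
def outline_mask(frame, tolerance=0):
    """Return a 2D list of booleans, True where a pixel counts as black.

    frame: rows of (r, g, b) tuples with channel values 0..255.
    tolerance: largest channel value still treated as black (0 = exact match).
    """
    return [[max(pixel) <= tolerance for pixel in row] for row in frame]


# Tiny made-up 2x3 "screenshot": pure black, background colors,
# and two slightly tinted near-black pixels.
frame = [
    [(0, 0, 0), (120, 80, 40), (12, 8, 10)],
    [(200, 200, 200), (0, 0, 0), (30, 5, 5)],
]

exact = outline_mask(frame)           # only pure black pixels
loose = outline_mask(frame, 16)       # also accepts the (12, 8, 10) pixel
print(exact)
print(loose)
```

On a real capture you would probably do the same comparison with an array library instead of nested lists, since a full frame has far too many pixels for a Python loop to be fast.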