What's the plan?

How much machine learning will there be? Is the AI supposed to learn the entire game by itself with minimal human intervention? Or shall the program act exactly as programmed without keeping any state between runs? Or maybe something in-between?
Doing pure machine learning would seem a bit overambitious, especially since The Binding of Isaac is a rather complex game. Therefore I assume the program will mostly follow handwritten routines.

In the beginning it might be a good idea to freeze or slow down the game to give AIsaac more time to think. Computer vision can be quite slow when it isn't fully optimized.

I'm looking forward to seeing this project unfold. Watching an AI play at (or possibly above) the skill level of an experienced human player would be extraordinarily entertaining.
Yeah, initially at least I don't plan to take a machine learning approach (at least for playing the game). I think we'll definitely start with some handwritten routines and iterate towards more classical AI-style reasoning. I wasn't planning to do the "AI figures out how to play by itself via deep learning or whatever" thing with this project, though that could be fun in the future.

Right now I'm working on the low-level screen capture and the computer vision pieces. This is sort of a big hurdle for me as I've never really done anything CV-related before. I'm trying to make sense of all the different approaches and what might work best here. As you said, it needs to be FAST.

I really love the idea of freezing or slowing down the game. One thing I'd love to be able to do, at least during development, is run the game at any speed I want, save the running game state, rewind to old states, and even do looped live-update programming of the bot, Handmade Hero style. This is hard, though, because I think I'd need to emulate the game in some way that lets me snapshot and restart it. For an unknown program that does file accesses and has GPU state and everything, it seems like I pretty much need a full virtual machine / OS / game stack that gets snapshotted and restarted. I don't know of any way to do that in real time; VirtualBox snapshots take forever. I'm really interested in any ideas people have on ways to do that.
Slowing down the game is a pretty nice idea. kkapture from Farbrausch does this for capturing demoscene video & sound. https://github.com/rygorous/kkapture
https://webcache.googleuserconten...:www.farb-rausch.de/~fg/kkapture/

It was made to capture video & sound from demoscene productions. They are usually very CPU-intensive, so capturing them at a lower fixed rate is a more reliable process: you render everything slower and capture that. The quality will be exactly the same if you properly slow down time.

It does this by hooking the various Windows DLL functions responsible for providing time to the user application: https://github.com/rygorous/kkapt...master/kkapturedll/timing.cpp#L45
Another program that hooks into timing functions is Hourglass, a program for tool-assisted speedrunning on Windows. https://github.com/TASVideos/hour...32/tree/master/src/wintasee/hooks
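The core trick behind both tools is replacing the clock the game reads with a scaled one. Here's a minimal Python sketch of that idea as a standalone class (the real tools instead hook Windows timing APIs like timeGetTime and QueryPerformanceCounter so the game unknowingly reads the scaled value):

```python
import time

class ScaledClock:
    """Virtual clock that runs at a fixed multiple of real time.

    kkapture/Hourglass achieve this effect by hooking the Windows
    timing functions so the game sees scaled time instead of the
    real one; this class just models the arithmetic.
    """
    def __init__(self, scale):
        self.scale = scale              # 0.5 = half speed, 2.0 = double speed
        self.start = time.monotonic()

    def now(self):
        # Elapsed real time, stretched or compressed by the scale factor.
        return (time.monotonic() - self.start) * self.scale
```

A game polling `now()` at half speed would see one in-game second pass for every two real seconds, which is exactly the breathing room a slow CV pipeline needs.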

Saving the game state might be tricky; it depends on how the game is implemented. I'd try going for the low-hanging fruit first: make a copy of Isaac's memory and restore it a short time later. This might let you rewind within a room or even within a floor, or it might not work at all.
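Conceptually, that copy-and-restore scheme is just a ring buffer of state snapshots. A toy sketch (the dict here is a stand-in for the real thing, which would be raw process memory read and written back with something like ReadProcessMemory/WriteProcessMemory):

```python
import copy

class RewindBuffer:
    """Keep the last N snapshots of a (toy) game state so we can rewind.

    In the real bot each snapshot would be a raw copy of Isaac's
    process memory rather than a Python dict.
    """
    def __init__(self, max_snapshots=60):
        self.max_snapshots = max_snapshots
        self.snapshots = []

    def save(self, state):
        # Deep-copy so later mutations don't corrupt old snapshots.
        self.snapshots.append(copy.deepcopy(state))
        if len(self.snapshots) > self.max_snapshots:
            self.snapshots.pop(0)

    def rewind(self, frames_back=1):
        # Discard the newest snapshots and return the surviving one.
        for _ in range(frames_back):
            if len(self.snapshots) > 1:
                self.snapshots.pop()
        return copy.deepcopy(self.snapshots[-1])
```

Whether restoring raw memory actually works depends on how much state lives outside the process (GPU buffers, file handles), which is exactly the failure mode mentioned above.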

If the game is sufficiently deterministic, it might be possible to restore the game state by capturing and replaying the input. Obviously it takes a while to play the entire game all over again, but storing the input is way more convenient for a permanently running tool than periodic virtual machine snapshots. The replay could be sped up by temporarily disabling rendering.
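The determinism requirement is the whole trick: if the same input log always produces the same state, the log *is* the save file. A toy model with a hypothetical, fully deterministic step function:

```python
def step(state, inp):
    """One tick of a toy, fully deterministic 'game': same input in,
    same state out. Stands in for Isaac re-simulating recorded input."""
    x, y = state
    dx = {"left": -1, "right": 1}.get(inp, 0)
    dy = {"up": -1, "down": 1}.get(inp, 0)
    return (x + dx, y + dy)

def replay(inputs, start=(0, 0)):
    """Rebuild the final state purely from the recorded input log."""
    state = start
    for inp in inputs:
        state = step(state, inp)
    return state
```

Anything nondeterministic (an uncontrolled RNG seed, timing-dependent physics) breaks the equivalence, which is why the time-hooking tools above pair so well with this approach.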

Allowing AIsaac to recognize all items and enemies will require a ton of work. You'll need to extract and label hundreds of different sprites. I guess extracting the sprites could be automated, but teaching all the item effects and the enemies' movement and attack patterns seems like a tedious task. Initially it might be good enough to have AIsaac shoot at and avoid everything that isn't part of the background.
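That "shoot at everything that isn't background" heuristic still needs the non-background pixels grouped into individual targets. One simple way, assuming you already have a binary foreground mask, is connected-component labeling via flood fill:

```python
def find_blobs(mask):
    """Group foreground pixels (truthy) of a binary image into
    4-connected blobs via flood fill. Each blob is a candidate
    entity to shoot at or dodge; no sprite identification needed.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                blob, stack = [], [(y, x)]
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    blob.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx),
                                   (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                           and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                blobs.append(blob)
    return blobs
```

Blob centroids give aim points, and blob size could filter out noise like tears and small debris.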

The sprites' black outlines should make it much easier to distinguish entities from the background. Turning the screen capture into a binary image by simply comparing each pixel to black might be a nice way to reduce the amount of data to push through your CV pipeline without losing much information. I just looked at a few screenshots (I don't own the game myself), and unfortunately there are some visual effects that influence the color of the outlines. Otherwise you could just do an exact comparison and immediately have really high-quality data.
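Because of those tinted outlines, the comparison needs a tolerance rather than an exact match. A sketch of the near-black threshold idea (the threshold value is a guess to tune against real captures):

```python
def outline_mask(pixels, threshold=40):
    """Binary mask of 'near-black' pixels in an RGB frame.

    Sprites in Isaac have dark outlines, so this cheaply separates
    entity edges from the mostly lighter background. threshold=40
    is an assumption; color-shifting effects may require raising it.
    """
    return [
        [all(channel <= threshold for channel in px) for px in row]
        for row in pixels
    ]
```

The resulting boolean grid is tiny compared to the raw capture, which helps with the "it needs to be FAST" constraint.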
Wow, those time-slowdown APIs are cool! I had thought about slowing the game down, but speeding it up is a really cool idea too. Disabling rendering, speeding up time, and replaying all the same input events could be a good way to "reset" the game, at least to an early room (assuming we have a way to reset the game's state).

I think I'll probably focus on the image recognition stuff next, don't really need to reset or anything to test that.

I found a sprite unpacker that will give me a dump of all the sprites from the game files. They seem to be labeled pretty well, but a lot of the enemies are created by combining and layering body-part sprites and procedurally modifying the sprites' color or size.

I like the idea of starting with just separating the background from the entities, and shooting at and avoiding everything that isn't background. Have it move on to the next room once things stop moving. I want to get it playing ASAP and continually refine it rather than trying to build out a whole recognition layer first.
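The "move on once things stop moving" check could be as simple as differencing consecutive frames. A sketch, assuming grayscale frames as lists of pixel rows (the threshold is a guess to tune against animation noise like flickering torches):

```python
def motion_amount(prev, curr, threshold=20):
    """Fraction of pixels that changed noticeably between two
    grayscale frames. When this stays near zero for a while, the
    room is probably cleared and the bot can head for a door.
    """
    changed = total = 0
    for row_a, row_b in zip(prev, curr):
        for a, b in zip(row_a, row_b):
            total += 1
            if abs(a - b) > threshold:
                changed += 1
    return changed / total
```

Requiring the value to stay low for several consecutive frames would avoid advancing during a brief pause in an enemy's attack pattern.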