At first I managed to get really fast scrubbing, but then I realized that was only because IMFSourceReader_SetCurrentPosition was previewing key frames and nothing else. Things slowed down significantly once I started decoding all the frames around the cursor. It made me realize why scrubbing is slow in most video players: frames need to be decoded in real time when you seek, because the file doesn't actually store every frame in full. It mainly stores key frames and diffs.
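To put a rough number on that cost, here's a small sketch of how many frames have to be decoded just to show one arbitrary frame. It assumes a fixed key frame interval and that every diff frame depends only on the frames before it, neither of which real files guarantee. `framesToDecode` is a made-up helper for illustration, not a Media Foundation call.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Media Foundation).
// Assumes frame 0 is a key frame, a key frame occurs every `keyInterval`
// frames, and each diff frame depends only on earlier frames.
std::int64_t framesToDecode(std::int64_t target, std::int64_t keyInterval) {
    // Index of the closest key frame at or before the target.
    std::int64_t previousKey = (target / keyInterval) * keyInterval;
    // The key frame itself plus every diff up to and including the target.
    return target - previousKey + 1;
}
```

With a key frame every 60 frames, landing on frame 119 means decoding 60 frames, while landing on frame 120 means decoding just one.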
But that made me think of a slight optimization. Aside from caching all frames, you could display only key frames while scrubbing quickly, then decode all frames when scrubbing really slowly. Practically, the speed of the mouse cursor should be enough information to do this well. If I don't manage to actually finish the trimming part (handling video with the Media Foundation API is a bit trickier than I realized), I'll try implementing the seek optimization. &ZVideoTrimmer
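A minimal sketch of what that speed check could look like, with the decoding itself left out. Everything here is an assumption for illustration: `ScrubModeSelector`, `PreviewMode`, and the 200 px/s threshold are invented names and numbers, and the threshold would need tuning by hand.

```cpp
#include <cmath>

enum class PreviewMode { KeyFramesOnly, AllFrames };

// Hypothetical helper: picks a preview mode from how fast the cursor moves
// along the timeline. Not tied to any Media Foundation type.
struct ScrubModeSelector {
    double thresholdPxPerSec = 200.0;  // assumed cutoff between "fast" and "slow"
    double lastX = 0.0;
    double lastTimeSec = 0.0;
    bool hasSample = false;

    // Call once per mouse move with the cursor x position and a timestamp.
    PreviewMode update(double x, double timeSec) {
        // Default to full decoding: used for the first sample and for
        // samples whose timestamp did not advance.
        PreviewMode mode = PreviewMode::AllFrames;
        if (hasSample) {
            double dt = timeSec - lastTimeSec;
            if (dt > 0.0) {
                double speed = std::fabs(x - lastX) / dt;  // pixels per second
                if (speed > thresholdPxPerSec) {
                    mode = PreviewMode::KeyFramesOnly;
                }
            }
        }
        lastX = x;
        lastTimeSec = timeSec;
        hasSample = true;
        return mode;
    }
};
```

In key frame mode the preview would just be whatever comes back after IMFSourceReader_SetCurrentPosition, and in the other mode the frames around the cursor get decoded as before. A single threshold will probably flicker between modes near the cutoff, so smoothing the speed over a few samples might be worth trying.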