I think a common practice is to split the terrain triangle mesh into a grid of square chunks. You can then store those chunks in a spatial grid or quadtree and use it to quickly work out which sections of the mesh actually need collision detection. Square chunks are also handy for streaming terrain in and out if the world is too big to keep in memory all at once.
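To make the grid idea concrete, here is a minimal Python sketch of a uniform spatial grid over the terrain. It assumes triangles are given as three `(x, y, z)` vertices with `y` up, and the names `CELL_SIZE`, `cell_of`, `build_grid` and `candidate_triangles` are made up for illustration rather than taken from any engine. Each triangle is filed under every cell its XZ bounding box overlaps, which is conservative but simple.

```python
import math
from collections import defaultdict

CELL_SIZE = 32.0  # assumed chunk edge length in world units


def cell_of(x, z, cell_size=CELL_SIZE):
    """Return the integer grid cell that contains the point (x, z)."""
    return (math.floor(x / cell_size), math.floor(z / cell_size))


def build_grid(triangles, cell_size=CELL_SIZE):
    """Map each grid cell to the indices of triangles whose XZ bounds overlap it."""
    grid = defaultdict(list)
    for index, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        zs = [v[2] for v in tri]
        min_cx, min_cz = cell_of(min(xs), min(zs), cell_size)
        max_cx, max_cz = cell_of(max(xs), max(zs), cell_size)
        for cx in range(min_cx, max_cx + 1):
            for cz in range(min_cz, max_cz + 1):
                grid[(cx, cz)].append(index)
    return grid


def candidate_triangles(grid, min_x, min_z, max_x, max_z, cell_size=CELL_SIZE):
    """Return sorted indices of triangles in the cells overlapped by an XZ box."""
    min_cx, min_cz = cell_of(min_x, min_z, cell_size)
    max_cx, max_cz = cell_of(max_x, max_z, cell_size)
    found = set()
    for cx in range(min_cx, max_cx + 1):
        for cz in range(min_cz, max_cz + 1):
            # .get avoids creating empty cells in the defaultdict
            found.update(grid.get((cx, cz), ()))
    return sorted(found)
```

You would pass the moving object's bounding box to `candidate_triangles` and run the exact triangle tests only on the indices it returns. The same cell keys could also serve as the unit for streaming chunks in and out, and a quadtree could replace the dictionary if the terrain density varies a lot.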
For something like a cave, you might need a special case: first check whether you are in or near the cave, and only perform the cave's collision detection if you are.
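One way the cave special case might look is sketched below in Python. It assumes each cave has an axis-aligned bounding box and its own mesh, that the regular terrain is always tested, and that "near" means within some margin of the box. `Box`, `meshes_to_test` and the `margin` default are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box given by its min and max (x, y, z) corners."""
    min_corner: tuple
    max_corner: tuple

    def expanded(self, margin):
        """Return a copy grown by `margin` on every side."""
        return Box(
            tuple(c - margin for c in self.min_corner),
            tuple(c + margin for c in self.max_corner),
        )

    def contains(self, point):
        """Return True if `point` lies inside the box (boundaries included)."""
        return all(
            lo <= p <= hi
            for lo, p, hi in zip(self.min_corner, point, self.max_corner)
        )


def meshes_to_test(position, terrain_mesh, caves, margin=2.0):
    """Pick the meshes worth colliding against at `position`.

    `caves` is a list of (Box, cave_mesh) pairs. The terrain is always
    included (an assumption); a cave mesh is added only when the position
    is inside or within `margin` of that cave's bounds.
    """
    meshes = [terrain_mesh]
    for bounds, cave_mesh in caves:
        if bounds.expanded(margin).contains(position):
            meshes.append(cave_mesh)
    return meshes
```

The margin is there so the cave mesh becomes active slightly before you cross the entrance, which should help avoid missing the first contact with the cave walls.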