That's not how clang works. It creates objects specialized for the target architecture at the very beginning, so there is no "unused code" to fetch or mispredict (most of the time).
The slowness comes from the way the compiler is structured/architected, not from the fact that it is a cross compiler.
It keeps optimization and code generation as strictly separate passes, sometimes applies the same optimization steps multiple times, etc.
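To make the structure concrete, here's a toy Python model. All names are hypothetical and have nothing to do with LLVM's real API. Target-independent optimization passes run over an IR, some of them more than once, and only afterwards a separate code generation step lowers the result for the one target that was chosen up front. It's a sketch of the shape of such a pipeline, not of how clang actually implements it.

```python
from typing import Callable, Union

# Hypothetical toy IR: each instruction is (dest, op, operands).
Operand = Union[int, str]
Instr = tuple[str, str, tuple[Operand, ...]]
IR = list[Instr]

def constant_fold(ir: IR) -> IR:
    """Target-independent pass: fold add/mul whose operands are known constants."""
    known: dict[str, int] = {}
    out: IR = []
    for dest, op, args in ir:
        # Substitute operands whose values are already known.
        vals = [known.get(a, a) if isinstance(a, str) else a for a in args]
        if op in ("add", "mul") and all(isinstance(v, int) for v in vals):
            result = vals[0] + vals[1] if op == "add" else vals[0] * vals[1]
            known[dest] = result
            out.append((dest, "const", (result,)))
        else:
            if op == "const":
                known[dest] = args[0]
            out.append((dest, op, tuple(vals)))
    return out

def dead_code_elim(ir: IR) -> IR:
    """Target-independent pass: drop instructions whose results are never used."""
    live: set[str] = set()
    out: IR = []
    for dest, op, args in reversed(ir):
        if op == "ret" or dest in live:
            out.append((dest, op, args))
            live.update(a for a in args if isinstance(a, str))
    return list(reversed(out))

def codegen(ir: IR, target: str) -> list[str]:
    """Separate backend step: lower the optimized IR for exactly one target."""
    mnemonics = {
        "x86-64": {"const": "mov", "add": "add", "mul": "imul", "ret": "ret"},
        "aarch64": {"const": "mov", "add": "add", "mul": "mul", "ret": "ret"},
    }[target]
    asm: list[str] = []
    for dest, op, args in ir:
        if op == "ret":
            asm.append(f"{mnemonics['ret']} {args[0]}")
        else:
            asm.append(f"{mnemonics[op]} {dest}, " + ", ".join(map(str, args)))
    return asm

def compile_toy(ir: IR, target: str) -> tuple[list[str], dict[str, int]]:
    # The same passes are scheduled more than once, then codegen runs last.
    pipeline: list[Callable[[IR], IR]] = [
        constant_fold, dead_code_elim, constant_fold, dead_code_elim,
    ]
    runs: dict[str, int] = {}
    for p in pipeline:
        ir = p(ir)
        runs[p.__name__] = runs.get(p.__name__, 0) + 1
    return codegen(ir, target), runs

# "x" is treated as a function argument with an unknown value.
program: IR = [
    ("a", "const", (2,)),
    ("b", "const", (3,)),
    ("c", "add", ("a", "b")),
    ("d", "mul", ("c", "x")),
    ("e", "add", ("a", "a")),
    ("_", "ret", ("d",)),
]
asm, runs = compile_toy(program, "x86-64")
print(asm)   # ['imul d, 5, x', 'ret d']
print(runs)  # {'constant_fold': 2, 'dead_code_elim': 2}
```

In this toy the second fold/DCE round finds nothing new but still has to walk the whole IR. In a real pipeline a repeated pass can pick up opportunities exposed by earlier passes, which is why compilers schedule some passes more than once, at the cost of extra compile time.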
Even worse then