I've never done clipping manually, but in OpenGL (and probably DirectX and other APIs), once you've applied the perspective projection matrix you're in clip space.
In clip space, points that are "valid" (inside the view frustum) have each of their coordinates between -w and w, where w is the 4th component of the transformed point, not between -1 and 1. So points with coordinates outside the -w to w range need to be clipped (producing new triangles if needed).
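A minimal sketch of that inside/outside test, assuming the OpenGL convention where x, y and z are all compared against -w and w (as far as I know, Direct3D uses 0 to w for z instead). The function name `inside_clip_volume` is just something I made up for illustration:

```python
def inside_clip_volume(p):
    """Return True if a clip-space point (x, y, z, w) is inside the view volume.

    Assumption: OpenGL-style clip volume, i.e. -w <= x, y, z <= w.
    """
    x, y, z, w = p
    # Each coordinate is compared against this point's own w, not against 1.
    return all(-w <= c <= w for c in (x, y, z))


# x = 3 is outside [-2, 2], so this point would have to be clipped.
print(inside_clip_volume((0.5, -1.0, 1.5, 2.0)))  # True
print(inside_clip_volume((3.0, 0.0, 0.0, 2.0)))   # False
```

Real clipping also has to cut triangles that straddle the boundary, which this check does not attempt.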
After you are done clipping, the perspective divide is applied to produce values in normalized device coordinates (NDC): x, y and z are divided by w, which gives values in the -1 to 1 range.
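The divide itself is tiny. Here is a sketch, with `perspective_divide` being a hypothetical helper name and the input assumed to be a point that already passed clipping (so w is positive):

```python
def perspective_divide(p):
    """Convert a clip-space point (x, y, z, w) to NDC (x/w, y/w, z/w)."""
    x, y, z, w = p
    if w == 0:
        # Should not happen for points that survived clipping.
        raise ValueError("w is zero; point cannot be projected")
    return (x / w, y / w, z / w)


# A point inside the -w..w range ends up inside -1..1 after the divide.
print(perspective_divide((1.0, -2.0, 0.5, 2.0)))  # (0.5, -1.0, 0.25)
```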
I don't know if that answers your question.
Here is another thread with discussion about perspective projection.
ExTray2020
Also I don't understand why the perspective divide transforms from clip space into NDC space by only dividing each component by w (typically w = 1)
I may be mistaken, but I believe that w shouldn't be 1 most of the time, unless you're doing orthographic projection.
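To illustrate, here is a small sketch that pushes the same eye-space point through a perspective matrix and an orthographic matrix and prints the resulting w. I'm assuming the classic OpenGL-style matrices (the same layout as `gluPerspective` and `glOrtho`, with the camera looking down -z); the helper names and the sample values are made up for the example:

```python
import math


def perspective(fovy_deg, aspect, near, far):
    """OpenGL-style perspective matrix (row-major, column vectors)."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],  # this row makes w_clip = -z_eye
    ]


def orthographic(left, right, bottom, top, near, far):
    """OpenGL-style orthographic matrix (row-major, column vectors)."""
    return [
        [2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
        [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
        [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
        [0.0, 0.0, 0.0, 1.0],  # this row leaves w unchanged
    ]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


eye_point = (1.0, 2.0, -5.0, 1.0)  # 5 units in front of the camera
clip_persp = transform(perspective(60.0, 16.0 / 9.0, 0.1, 100.0), eye_point)
clip_ortho = transform(orthographic(-10.0, 10.0, -10.0, 10.0, 0.1, 100.0), eye_point)

print(clip_persp[3])  # 5.0 (distance in front of the camera)
print(clip_ortho[3])  # 1.0
```

So with these matrices, w after a perspective projection is the point's distance along the view direction, and it only stays at 1 in the orthographic case.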