1. It's the w value in clip space, which is camera space transformed by the projection matrix.
2. No, the perspective matrix is not an affine matrix at all. Its last row is not 0,0,0,1 but 0,0,1,0. This is what copies z into w, so the subsequent divide by w is effectively a divide by z.
The most basic perspective-like matrix looks something like:
| 1, 0, 0, 0 |
| 0, 1, 0, 0 |
| 0, 0, 0, 1 |
| 0, 0, 1, 0 |
However, this performs only one very specific projection, so a few scale and translate operations are added before and after the perspective transform.
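
To make the "z becomes w" point concrete, here is a minimal sketch in plain Python that applies the basic matrix above to one camera-space point and then does the divide by w. It assumes column vectors and a row-major matrix layout; the helper names (`transform`, `perspective_divide`) and the sample point are made up for illustration and do not come from any particular graphics API.

```python
# Basic perspective-like matrix from the answer (row-major, column-vector convention).
# Note the last row is 0,0,1,0 rather than the affine 0,0,0,1.
P = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def perspective_divide(clip):
    """Divide x, y, z by w, as the hardware does after the vertex stage."""
    x, y, z, w = clip
    return [x / w, y / w, z / w]


# Hypothetical camera-space point with w = 1.
camera_point = [2.0, 4.0, 8.0, 1.0]

clip = transform(P, camera_point)  # [2.0, 4.0, 1.0, 8.0]: w now holds the old z
ndc = perspective_divide(clip)     # [0.25, 0.5, 0.125]: x/z, y/z, 1/z

print(clip)
print(ndc)
```

The clip-space w ends up equal to the camera-space z, so the divide shrinks x and y in proportion to depth. A real projection matrix adds the scale and translate terms mentioned above to handle field of view, aspect ratio, and the near and far planes.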