Here is a far-from-easy explanation: http://physics.stackexchange.com/questions/14056/how-does-gravitational-lensing-account-for-einsteins-cross
My best quick attempt at simplifying it (I'm not certain this is correct, though) is something like this:
The lensing galaxy creates an asymmetric warp in space around it. If you imagine warping a flat disc, the simplest (and therefore most likely) thing you can do creates two "high points" and two "low points" (if you try to make just one of each, I think that's not a warp, that's just a rotation: no curvature and therefore no lensing). And then I think we get images at the extrema of the warp, places where the warped surface is locally "perpendicular" to us (first derivative zero), and therefore four images.
(Actually, the link makes a big deal that there should be five images, but the fifth is directly behind the lensing object and is therefore obscured.)
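To make the "asymmetric warp gives four images" idea a bit more concrete, here is a minimal numerical sketch. It is not the model from the linked answer. It assumes a standard toy lens (a singular isothermal sphere with Einstein radius `theta_e`, plus an external shear `gamma` to supply the asymmetry) and a source sitting exactly behind the lens center. The function names, parameter values, and the Newton-iteration approach are all my own illustrative choices.

```python
import math

def source_position(tx, ty, theta_e=1.0, gamma=0.1):
    """Lens equation: map image-plane angles (tx, ty) to the source plane.

    Assumed toy model: singular isothermal sphere + external shear.
    """
    r = math.hypot(tx, ty)
    bx = tx - theta_e * tx / r - gamma * tx
    by = ty - theta_e * ty / r + gamma * ty
    return bx, by

def find_images(theta_e=1.0, gamma=0.1, n_starts=16, tol=1e-10):
    """Find images of a source at (0, 0) by Newton iteration.

    Starts from n_starts points on a ring of radius theta_e and keeps
    every distinct position where the lens equation is satisfied.
    """
    images = set()
    for k in range(n_starts):
        phi = 2.0 * math.pi * k / n_starts
        tx, ty = theta_e * math.cos(phi), theta_e * math.sin(phi)
        for _ in range(50):
            r = math.hypot(tx, ty)
            if r < 1e-12:
                break  # too close to the singular lens center; give up
            bx, by = source_position(tx, ty, theta_e, gamma)
            if math.hypot(bx, by) < tol:
                images.add((round(tx, 6), round(ty, 6)))
                break
            # Analytic Jacobian of the lens map, [[a, b], [b, d]].
            r3 = r * r * r
            a = 1.0 - gamma - theta_e * ty * ty / r3
            d = 1.0 + gamma - theta_e * tx * tx / r3
            b = theta_e * tx * ty / r3
            det = a * d - b * b
            if abs(det) < 1e-14:
                break  # Jacobian is singular here; skip this start
            # Newton step: theta <- theta - J^{-1} * beta
            tx -= (d * bx - b * by) / det
            ty -= (a * by - b * bx) / det
    return sorted(images)

if __name__ == "__main__":
    for image in find_images():
        print(image)
```

For this particular setup the lens equation can also be solved by hand: the images sit at ±`theta_e / (1 - gamma)` along one axis and ±`theta_e / (1 + gamma)` along the other, which is the cross pattern. Because the toy lens is singular at its center, the solver does not report a fifth, central image, so this sketch says nothing about that part of the linked explanation.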
This explanation still seems to me to have some holes, and one day I will try to get a real GR specialist to tell me if I'm close.