I expect there are two ways these are made: either by recording someone actually walking and cutting the recording into individual steps, or by faking it (stepping or walking in place, or whacking things with shoes). Let's just worry about the first one, because it's better. It requires walking in such a way that the sounds don't overlap, keeping out intolerable noise (including from clothing or equipment), and orienting the microphone(s) so that left and right steps can't be told apart by ear. And that's most of the work right there. Unless you're recording a practiced marcher practicing marching, the steps will naturally sound distinct, so all you need to do is split the recording into individual steps, maybe even with a tool that splits automatically around silence.
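For what it's worth, the "split automatically around silence" part can be sketched in a few lines of plain Python. This is only a rough illustration, not how any particular tool does it: the function name `split_on_silence`, the amplitude threshold, and the minimum gap and step lengths are all assumptions of mine, and it expects mono samples already decoded to floats in the range -1.0 to 1.0.

```python
def split_on_silence(samples, sample_rate, threshold=0.02,
                     min_silence_s=0.15, min_step_s=0.03):
    """Return (start, end) sample index pairs, one per detected step.

    All parameter defaults are guesses and will need tuning per recording.
    """
    min_gap = round(min_silence_s * sample_rate)  # quiet run that ends a step
    min_len = round(min_step_s * sample_rate)     # shorter hits are dropped
    segments = []
    start = None      # index where the current step began
    last_loud = None  # index of the most recent sample above the threshold
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i
            elif i - last_loud > min_gap:
                # The quiet gap was long enough: close the previous step.
                segments.append((start, last_loud + 1))
                start = i
            last_loud = i
    if start is not None:
        segments.append((start, last_loud + 1))
    # Throw away clicks and other blips too short to be a footstep.
    return [(a, b) for a, b in segments if b - a >= min_len]
```

Each returned pair can then be used to slice the recording into one file per step. A real recording would probably want a little padding around each slice and a threshold set relative to the noise floor.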
On top of this, very small pitch changes (±1% or so) can help, if the engine in use supports on-the-fly pitch bending. Classic sample-rate-adjustment pitch shifting is fine at such small amounts (you heard me, Soul Calibur 5).
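If the engine doesn't do this for you, the sample-rate style of pitch change is easy enough to sketch. The following is a minimal illustration under my own assumptions: `resample_pitch` and `random_pitch_ratio` are made-up names, the audio is a list of mono float samples, picking a random ratio per playback is my guess at how the variation would be applied, and linear interpolation is just the simplest choice, not what any specific engine uses.

```python
import random


def random_pitch_ratio(max_percent=1.0, rng=random):
    """Pick a playback ratio within +/- max_percent of normal speed."""
    return 1.0 + rng.uniform(-max_percent, max_percent) / 100.0


def resample_pitch(samples, ratio):
    """Resample so the sound plays `ratio` times faster.

    A ratio above 1 raises the pitch and shortens the sound; below 1
    lowers the pitch and lengthens it. Uses linear interpolation.
    """
    if not samples:
        return []
    last = len(samples) - 1
    out_len = max(1, int(len(samples) / ratio))
    out = []
    for n in range(out_len):
        pos = n * ratio          # fractional read position in the source
        i = int(pos)
        frac = pos - i
        a = samples[min(i, last)]
        b = samples[min(i + 1, last)]  # clamp at the final sample
        out.append(a + (b - a) * frac)
    return out
```

Calling `resample_pitch(step, random_pitch_ratio())` each time a step plays would give every footstep a slightly different pitch and length, which at 1% should be too small to notice as a speed change.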
So the real challenges are getting appropriate equipment and finding a recording environment that has the desired terrain and is quiet enough. Midsummer in Louisiana is a terrible setting for recording footsteps in grass or leaves. 4 a.m. in October in a cooler region might work better.
All of the above comes from extrapolating from experience. If pros do it a better way, especially if step 1 isn't "build a studio with perfect acoustics and somehow bring nature inside", I'd be glad to hear it.
Some of my games
Keep up to date by following @Jeqofire on Twitter!
Ear Ninja?