(*sigh*) If we try to save the full, unmodded values of the S(i) so that we could backtrack as you suggest (or, in another attempted speedup, just perform the mods with different modulus), we will find that after just a small number (less than 1000) of iterations, the S(i) values become so large that the entire known universe is not large enough to store them. (Exercise: Calculate the number of digits in S(1000) if no mod is performed at each step. Compare that to the number of particles in the known universe!)
But when we perform the modulos to keep the S(i) sizes manageable, we lose information necessary to correctly backtrack (or to reuse the S(i) with a different modulus).
A classic time vs. space tradeoff.
