The following is my interpretation of the original poster requirements.
Suppose that we want to run the algorithm in reverse, starting with S_{p2} = 0 (supposing that the number is prime) and we know how to perform the modular square root at the same speed as multiplication (of course, if you know such a method, please let me know).
So the original poster wants to start computing S_{0} = 4, S_{1}, S_{2}, etc. and at the same time S_{p2} = 0, S_{p3}, etc. up to S_{(p3)/2}. Then we check whether both numbers are equal and if they are, the Mersenne number in question is prime.
I think there is a big problem here. What of the two square root do we have to compute in the reverse computation? It appears that there are two possible values of S_{p3}, four values for S_{p4}, eight values for S_{p5}, etc. How do we know which is the correct one?
