Actually, that algorithm is quite trivial, but it is also quite slow.

With a Y of 32000, you'd need 32000 multiplications and modular reductions.

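Using a simple binary exponentiation (square-and-multiply) algorithm, you'd only need on the order of 15 squarings, since 32000 fits in 15 bits, plus at most one extra multiplication per set bit of Y.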
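
To make the comparison concrete, here is a minimal sketch of right-to-left binary exponentiation in Python. The function name `mod_pow` and the parameter names `base` and `modulus` are my own choices for illustration (only Y, the exponent, comes from the discussion above), and the sketch assumes a non-negative exponent and a positive modulus.

```python
def mod_pow(base, exponent, modulus):
    """Compute (base ** exponent) % modulus by square-and-multiply.

    Assumes exponent >= 0 and modulus >= 1 (illustrative sketch only).
    """
    if modulus == 1:
        return 0  # everything is congruent to 0 modulo 1

    result = 1
    base %= modulus  # keep intermediate values below the modulus

    # Walk the exponent's bits from least to most significant.
    while exponent > 0:
        if exponent & 1:
            # Current bit is set: fold the current power into the result.
            result = (result * base) % modulus
        # Square the base for the next bit position.
        base = (base * base) % modulus
        exponent >>= 1

    return result


if __name__ == "__main__":
    # The loop runs once per bit of the exponent: 15 times for 32000.
    print(mod_pow(7, 32000, 1000003))
```

The loop body runs once per bit of the exponent rather than once per unit of it, which is where the saving comes from. In real Python code you would normally just call the built-in three-argument `pow(base, exponent, modulus)`, which computes the same value.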
