I did try implementing the algorithm. There possibly is some small bug as it is not producing all the factors. I can't find it right now. I do not have a debugger for GCC to help me.
For base 2
I initially sieved range nmin=10,000 to nmax=100,000 to remove all small p (under 1M). There were ~ 3,600 candidates left.
I used this sieved file to test the following:
The original code gives a speed of 33Kp/sec for p range 1G to 2G.
The new code gives a speed of 2.5Kp/sec for p range 1G to 2G.
Then I tested the code nmin=1,000,000 to nmax=2,000,000 from 1G to 2G (without removing any small terms).
The original code gives a speed of 128 p/sec for p range 1G to 2G.
The new code gives a speed of 804p/sec for p range 1G to 2G.
So 8 times faster but as the original code removes terms from the sieve the code would get faster and faster whereas the new algorithm would stay at almost the same speed.
For the new algorithm to be significantly faster than the old algorithm we would need to test a range of at least 100400M.
Since no one that I know of is planning on sieving a range that large I would hold off on further development of this code.
