So I went ahead and read the algorithm and reasoning for why Atkin's works, and then I ignored all of that and just multiplied everything by 2 and sieved in respect to 120 instead of 60, taking out all numbers that weren't +/-1 mod 8.

The result, surprisingly, appears to not only work, but eliminates the need for crossing off multiples of the squares in the final step. In other words, it appears to be a completely parallelizable algorithm that can start from any point in the set of integers instead of just starting with 1. Perhaps more interesting, this algorithm appears to be able to take a pre-sieved list with no problem, not just a contiguous section.

Granted, I only ran the algorithm on 1 through 240 by hand, but since the cycle length is 120, you'd think that means the results are good throughout. I'll do some more extensive testing and see if I find discrepancies. It'd be really neat if this turned out to be a fast way to search for primes = +/- 1 mod 8. I think the need to solve for the solutions to the curves might impact larger numbers too much to be too feasible, though.

If anyone's interested, the revised algorithm:

1. List all integers k to be sieved, marked as composite (well, composite or prime and not = +/-1 mod 8)

2. For all k:

a. If k is congruent to {1, 17, 41, 49, 73, 89, 97, 113} mod 120, find the number of solutions d to 4x^2 + y^2 = k, where x,y > 0. If d is odd, mark k as prime = +/- 1 mod 8.

b. If k is congruent to {7, 31, 79, 103} mod 120, find the number of solutions d to 3x^2 + y^2 = k, where x,y > 0. If d is odd, mark k as prime = +/- 1 mod 8.

c. If k is congruent to {23 47 71 119} mod 120, find the number of solutions d to 3x^2 - y^2 = k, where x > y > 0. If d is odd, mark k as prime = +/- 1 mod 8.

d. If k is congruent to any other residue mod 120, leave k as marked as it is (in other words, composite or prime and not = +/- 1 mod 8)

3. You appear to be done! But the original algorithm says to square each element marked prime and then mark multiples of the square as composite. I didn't have to do this, but that doesn't mean it's not necessary for other test sets.

EDIT: Now that I think about it, this algorithm should certainly be extendable to cover 7, 11, 13, and so on, by multiplying by 120 and increasing the cycle range then sieving out the 'bad' elements in the congruency tests. It may not be necessary, but it might prevent needless curve solving, and it makes the problem set larger to be split up by cards with more cores.