Hmm, I use Yamato's binary for ix CPUs.
Even though I have a Q6600, it is the fastest binary I have encountered.

Code:

GMP-ECM 6.3 [configured with GMP 5.0.1 and --enable-asm-redc] [ECM]
Input number is 136706880825115259982539443842705091783468464291671430295675943386422659594169058181440482075770537702060920760622000611512112885689065092620523670866594066846607254062548684572378570133925988891095869638457519627884304857454648091906348689643793750450491860963055333872304254699094803530215188251039326790936647157153998296243151854164728032282229395280189647684579768987768465433627263279941944046472476777894013907675229629713104163671640972996979234803324182054701014485387427791401515071184209033940171710831051015257912132716600121119440672925246514490069720038091747256362721224766490216603 (597 digits)
Using B1=220000, B2=119750412, polynomial Dickson(3), sigma=2738948513
Step 1 took 14944ms
Step 2 took 5320ms
Run 2 out of 800:
Using B1=220000, B2=119750412, polynomial Dickson(3), sigma=980891661
Step 1 took 14961ms
Step 2 took 5288ms
Run 3 out of 800:
Using B1=220000, B2=119750412, polynomial Dickson(3), sigma=3061363078
Step 1 took 14914ms
Step 2 took 5304ms
Run 4 out of 800:
Using B1=220000, B2=119750412, polynomial Dickson(3), sigma=1692573487
Step 1 took 14898ms
Step 2 took 5319ms
Run 5 out of 800:
Using B1=220000, B2=119750412, polynomial Dickson(3), sigma=3209822111
Step 1 took 14930ms
Step 2 took 5335ms
Run 6 out of 800:
Using B1=220000, B2=119750412, polynomial Dickson(3), sigma=54315659
Step 1 took 14898ms
Step 2 took 5288ms
Run 7 out of 800:
Using B1=220000, B2=119750412, polynomial Dickson(3), sigma=2684259486
Step 1 took 14914ms
Step 2 took 5226ms
Run 8 out of 800:
Using B1=220000, B2=119750412, polynomial Dickson(3), sigma=374069699
Step 1 took 14852ms
Step 2 took 5272ms
Run 9 out of 800:
Using B1=220000, B2=119750412, polynomial Dickson(3), sigma=438075411
Step 1 took 14898ms
Step 2 took 5304ms
Run 10 out of 800:
Using B1=220000, B2=119750412, polynomial Dickson(3), sigma=1278366962