mersenneforum.org  

2014-07-22, 19:38   #1
bsquared
"Ben"
Feb 2007

3521₁₀ Posts
phi-ecm

Lately I've been playing around with a Xeon Phi. I've managed to dust off an old copy of YAFU's ECM and vectorize it for the Phi. On a C150 it runs 3840 curves in parallel with B1=1e7 in 8 minutes and 37 seconds, or about 135 milliseconds per curve.

In comparison, GMP-ECM runs one curve on this input (stage 1 only) in about 25 seconds, so phi-ecm is about 185 times faster than a single thread of a Xeon E5-4650 (Ivy Bridge EP). This single Phi card appears to be equivalent to almost 3 entire $3600 server chips.
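For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the per-curve time and the speedup from the figures quoted above. The variable names are illustrative and not taken from phi-ecm or GMP-ECM.

```python
# Figures quoted in the post; all names here are illustrative.
phi_curves = 3840                  # curves run in parallel on the Phi
phi_seconds = 8 * 60 + 37          # 8 min 37 s for the whole batch
gmp_ecm_seconds_per_curve = 25.0   # one GMP-ECM stage-1 curve, one CPU thread

# Average wall time attributed to each curve in the batch.
phi_seconds_per_curve = phi_seconds / phi_curves

# How many single-thread GMP-ECM curves fit in the same time.
speedup = gmp_ecm_seconds_per_curve / phi_seconds_per_curve

print(f"{phi_seconds_per_curve * 1000:.1f} ms per curve")
print(f"{speedup:.1f}x a single CPU thread")
```

The per-curve number is an average over the batch: a single curve still takes the full 8 minutes 37 seconds to finish, so this measures throughput rather than latency.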

I know there is a CUDA-ECM floating around out there but I haven't run it... anyone know how this compares to that?

2014-07-22, 20:47   #2
wombatman
I moo ablest echo power!
May 2013

11011110101₂ Posts

What C150? I can try it out on the CUDA-ECM version I have. I suspect yours is a good bit faster though, especially if it's running both stages.

2014-07-22, 20:50   #3
bsquared
"Ben"
Feb 2007

DC1₁₆ Posts

The cofactor of: 9^229-7^229
Code:
377572270651617779506493206108874260118436109446199768359730032040155799061028584986245368745094941584241610763448528815511885067505602648309255344301
Oh, and no, it's just running stage 1.

Last fiddled with by bsquared on 2014-07-22 at 20:50

2014-07-22, 21:25   #4
wombatman
I moo ablest echo power!
May 2013

13×137 Posts

Ah, ok. I just grabbed a 150-digit composite from factordb, and it took 34 minutes to complete 480 curves of Stage 1 at B1=1e7 (~4 secs per curve). So, yeah, yours is crazy fast.

2014-07-23, 01:31   #5
VBCurtis
"Curtis"
Feb 2005
Riverside, CA

2×2,437 Posts

Quote:
Originally Posted by wombatman
Ah, ok. I just grabbed a 150-digit composite from factordb, and it took 34 minutes to complete 480 curves of Stage 1 at B1=1e7 (~4 secs per curve). So, yeah, yours is crazy fast.
Which video card?

You've been playing with altering GPU-ECM for various size numbers; which size did you test for this? A C150 could fit into a 512-bit version of GPU-ECM, which might double (?) speed.

A test of, say, a 200-digit number would produce a comparison more meaningful to me. Or, to show the best GPU-ECM has, a 300-digit candidate comparison.

Even if the C150 and C300 take the same time on GPU-ECM (as they would using the binary floating around) but take twice as long on a Phi, that's still a ten-fold increase in speed!
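To make the "fits into a 512-bit version" point concrete, here is a small sketch that measures the C150 posted earlier in the thread and picks the smallest build size it fits in. The list of sizes is just the set mentioned in this thread, and the check ignores any headroom bits a real GPU-ECM build may reserve, so treat it as an assumption-laden illustration.

```python
# The C150 cofactor of 9^229 - 7^229 posted in this thread.
n = int("377572270651617779506493206108874260118436109446199768359730032040155799061028584986245368745094941584241610763448528815511885067505602648309255344301")

digits = len(str(n))     # decimal length
bits = n.bit_length()    # binary length

# Build sizes mentioned in the thread (ASSUMPTION: no headroom bits needed).
kernel_sizes = [512, 1024, 2048, 4096, 8192]
smallest_fit = min(size for size in kernel_sizes if bits <= size)

# Sanity check that the number really divides 9^229 - 7^229.
is_cofactor = (9**229 - 7**229) % n == 0

print(f"{digits} digits, {bits} bits, smallest build that fits: {smallest_fit}")
print(f"divides 9^229 - 7^229: {is_cofactor}")
```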

Last fiddled with by VBCurtis on 2014-07-23 at 01:31

2014-07-23, 01:48   #6
wombatman
I moo ablest echo power!
May 2013

13·137 Posts

GTX 570. I haven't checked how the size of the GPU-ECM program affects the speed (that is, 512-bit vs 1024 bit vs 2048 vs 4096 and so on), but I don't know that it would. That said, there is some difference in speed as the size of the number increases. Again, though, I don't have any hard data for it. Maybe I could do some short test (with B1=1e6 or something) and see how it scales.


EDIT: Looks like I was completely wrong about the speed vs. composite size bit. I just ran a few numbers of increasing size on a 4096-bit and 8192-bit enabled CUDA-ECM. Using B1=1e6, they all take ~9-11 seconds of CPU time and 203-205 seconds of GPU time. For 4096-bit, I tested a C150 and C200. For 8192-bit, I tested C150, C200, C250, C400, C1234, and C2465. I don't have a 512-bit handy at the moment, but I'll build one and see if there's any difference.

Double-secret ninja edit: The 512-bit GPU-ECM performs exactly the same (10s CPU, 203-205s GPU), so there's no downside to using the highest bit version.

Last fiddled with by wombatman on 2014-07-23 at 02:47

2014-07-23, 03:30   #7
bsquared
"Ben"
Feb 2007

7×503 Posts

In my case the size of the input won't matter, as long as it fits within the max allowed bit size (a compile time option as of now). The above data was with 576 bits max. I re-ran the benchmark with a 1024 bit max and got 1506 seconds for 3840 curves at B1=1e7 or 395 milliseconds/curve. The time will be the same for any input up to 1024 bits.

10x faster than a GTX570 is better than I hoped!

On a CPU, GMP-ECM slows down as the input grows. At 1000 bits one curve at B1=1e7 takes 78 seconds, so at that size the Phi is almost 200x faster than a single CPU thread!

I'll continue to tinker with it but initial evidence says I'm bandwidth limited, so I don't know if there are any further speed gains to be had.
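A quick sketch of the numbers in this post, using only timings quoted in the thread. The quadratic-cost comparison at the end is an assumption for illustration (schoolbook multiplication cost growing with the square of the bit length); nothing in the thread says how phi-ecm actually multiplies.

```python
curves = 3840

# Phi batch times quoted in the thread, in seconds, at B1 = 1e7.
phi_576_seconds = 8 * 60 + 37     # 576-bit max build
phi_1024_seconds = 1506           # 1024-bit max build

phi_1024_ms_per_curve = 1000 * phi_1024_seconds / curves

# GMP-ECM on one CPU thread: 78 s per curve at about 1000 bits.
speedup_vs_cpu_thread = 78 / (phi_1024_seconds / curves)

# GTX 570: 34 minutes for 480 curves at B1 = 1e7.
gtx570_s_per_curve = 34 * 60 / 480
speedup_vs_gtx570 = gtx570_s_per_curve / (phi_1024_seconds / curves)

# Measured slowdown from 576 to 1024 bits versus a purely quadratic model.
measured_ratio = phi_1024_seconds / phi_576_seconds
quadratic_ratio = (1024 / 576) ** 2   # ASSUMPTION: cost ~ (bit length)^2

print(f"{phi_1024_ms_per_curve:.0f} ms per curve at 1024 bits")
print(f"{speedup_vs_cpu_thread:.0f}x a CPU thread, {speedup_vs_gtx570:.1f}x a GTX 570")
print(f"576 -> 1024 bits: measured {measured_ratio:.2f}x, quadratic model {quadratic_ratio:.2f}x")
```

Straight division of 1506 seconds by 3840 curves gives a figure a few milliseconds below the 395 quoted, which is well within rounding for a forum benchmark.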

2014-07-23, 04:15   #8
wombatman
I moo ablest echo power!
May 2013

13×137 Posts

Very impressive work!

2014-07-23, 08:17   #9
debrouxl
Sep 2009

977 Posts

Impressive indeed

2014-07-23, 09:21   #10
xilman
Bamboozled!
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

10786₁₀ Posts

Data for a GTX 460
Code:
pcl@anubis ~ $ ecm -gpu -v 1000000 0
GMP-ECM 7.0-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Running on anubis
377572270651617779506493206108874260118436109446199768359730032040155799061028584986245368745094941584241610763448528815511885067505602648309255344301
Input number is 377572270651617779506493206108874260118436109446199768359730032040155799061028584986245368745094941584241610763448528815511885067505602648309255344301 (150 digits)
Using MODMULN [mulredc:0, sqrredc:2]
Computing batch product (of 1442099 bits) of primes below B1=1000000 took 20ms
GPU: compiled for a NVIDIA GPU with compute capability 2.0.
GPU: will use device 0: GeForce GTX 460, compute capability 2.1, 7 MPs.
GPU: Selection and initialization of the device took 10ms
Using B1=1000000, B2=0, sigma=3:3259007427-3:3259007650 (224 curves)
dF=0, k=0, d=140209989058952, d2=0, i0=0
Expected number of curves to find a factor of n digits:
35	40	45	50	55	60	65	70	75	80
17880	221980	3168483	5.1e+07	9.4e+08	6.6e+09	Inf	Inf	Inf	Inf
Computing 224 Step 1 took 7950ms of CPU time / 233467ms of GPU time
Throughput: 0.959 curves by second (on average 1042.26ms by Step 1)
Expected time to find a factor of n digits:
35	40	45	50	55	60	65	70	75	80
5.18h	2.68d	38.21d	1.70y	31.18y	218.13y	Inf	Inf	Inf	Inf
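The log above is at B1=1e6 while the Phi timings are at B1=1e7, so a rough conversion is needed before comparing. The sketch below assumes stage-1 time is proportional to B1, which fits the batch product having about 1.44 bits per unit of B1 and is consistent with wombatman's GTX 570 timings (about 204 s at 1e6 versus 34 minutes at 1e7). The bit size of this GTX 460 build is not stated, so the final ratio is only indicative.

```python
# GTX 460 figures from the GMP-ECM log above (B1 = 1e6).
gtx460_curves = 224
gtx460_gpu_ms = 233467
gtx460_ms_per_curve = gtx460_gpu_ms / gtx460_curves   # log reports 1042.26 ms

# Size of the batch product relative to B1, taken from the log.
batch_bits = 1442099
bits_per_b1 = batch_bits / 1_000_000

# ASSUMPTION: stage-1 time scales linearly with B1 (one ladder step per bit).
b1_scale = 1e7 / 1e6
gtx460_est_s_per_curve_1e7 = gtx460_ms_per_curve * b1_scale / 1000

# Phi timing at B1 = 1e7 with the 1024-bit build (1506 s for 3840 curves).
phi_s_per_curve_1e7 = 1506 / 3840
phi_vs_gtx460 = gtx460_est_s_per_curve_1e7 / phi_s_per_curve_1e7

print(f"GTX 460: {gtx460_ms_per_curve:.2f} ms per curve at B1=1e6")
print(f"GTX 460 estimate at B1=1e7: {gtx460_est_s_per_curve_1e7:.1f} s per curve")
print(f"Phi (1024-bit build) vs GTX 460 estimate: {phi_vs_gtx460:.0f}x")
```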

2014-07-23, 13:16   #11
bsquared
"Ben"
Feb 2007

7×503 Posts

Thank you Paul.

So the next question is... does anyone have access to a Phi card besides me? I should note that the card is my employer's - cost likely prevents most individuals from owning one.

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.