mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

chalsall 2020-09-16 13:00

[QUOTE=kruoli;557116]That's weird. Why does it have an .exe file extension? That's usually not the case in Linux.[/QUOTE]

For as long as I've been using mfaktc, the Linux version has always been distributed thusly.

[QUOTE=kruoli;557116]And while you can omit the .exe extension in CMD, that's not valid for BASH etc. So if you have your mfaktc named with extension, you'll have to write [c]./mfaktc.exe[/c].[/QUOTE]

Yup. Exactly the way we like it! True Geeks don't like the command line guessing what it thinks we want to do... :smile:

mnd9 2020-09-19 19:52

[QUOTE=TheJudger;551813]Hi,

seems like mfaktc runs fine with CUDA 11 on Ampere (no specific changes for Ampere except Makefile). :smile:

[CODE]mfaktc v0.22-pre8 (64bit built)
[...]
CUDA version info
binary compiled for CUDA 11.0
CUDA runtime version 11.0
CUDA driver version 11.0

CUDA device info
name [COLOR="Red"][B]A100-SXM4-40GB[/B][/COLOR]
compute capability 8.0
max threads per block 1024
max shared memory per MP 167936 byte
number of multiprocessors 108
clock rate (CUDA cores) 1410MHz
memory clock rate: 1215MHz
memory bus width: 5120 bit
[...]
Starting trial factoring M66362159 from 2^74 to 2^75 (57.65 GHz-days)
k_min = 142321062303420
k_max = 284642124610180
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 19 21:19 | 0 0.1% | 0.829 13m15s | 6259.18 82485 n.a.%
Jul 19 21:19 | 4 0.2% | 0.779 12m26s | 6660.92 82485 n.a.%
Jul 19 21:19 | 9 0.3% | 0.780 12m26s | 6652.38 82485 n.a.%
[...]
Jul 19 21:31 | 4617 100.0% | 0.780 0m00s | 6652.38 82485 n.a.%
no factor for [COLOR="red"][B]M66362159 from 2^74 to 2^75[/B][/COLOR] [mfaktc 0.22-pre8 barrett76_mul32_gs CUDA 11.0 arch 8.0] 51D74917
tf(): total time spent: [COLOR="red"][B]12m 32.323s[/B][/COLOR]
[/CODE]

New absolute performance champion and I guess best performance per watt, too! :smile:


Older benchmark data for Turing (RTX 2080 Ti): [URL="https://mersenneforum.org/showpost.php?p=497430&postcount=2912"]https://mersenneforum.org/showpost.php?p=497430&postcount=2912[/URL]

Oliver[/QUOTE]

Sorry, I looked all over, but is 0.22 available to download anywhere? Or is there a prebuilt Win64 version compiled with CUDA 11?

Thanks!
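
For readers wondering where the k_min and k_max values in the quoted log come from: any factor q of a Mersenne number M_p = 2^p - 1 (p an odd prime) has the form q = 2kp + 1, so the bit range 2^74 to 2^75 maps to a range of k. The following is a minimal Python sketch of that mapping, not mfaktc's actual code. The helper name `k_range` is hypothetical, and the class count of 4620 is an assumption that is merely consistent with the class numbers shown in the log.

```python
def k_range(p, bit_lo, bit_hi):
    """Approximate k bounds so that q = 2*k*p + 1 lies between 2^bit_lo and 2^bit_hi."""
    k_lo = (1 << bit_lo) // (2 * p)
    k_hi = (1 << bit_hi) // (2 * p)
    return k_lo, k_hi


P = 66362159        # exponent from the quoted log
NUM_CLASSES = 4620  # assumption: number of residue classes used for sieving

k_lo, k_hi = k_range(P, 74, 75)
# Assumption: the lower bound is rounded down to a multiple of the class count.
k_lo_aligned = k_lo - (k_lo % NUM_CLASSES)

print("k_lo (raw)     =", k_lo)
print("k_lo (aligned) =", k_lo_aligned)
print("k_hi           =", k_hi)
```

For this exponent, the aligned lower bound and the upper bound from the sketch coincide with the k_min and k_max printed in the log, which suggests (but does not prove) that the program derives them in a similar way.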

storm5510 2020-09-19 23:57

[QUOTE=TheJudger;551813]Hi,

seems like mfaktc runs fine with CUDA 11 on Ampere (no specific changes for Ampere except Makefile). :smile:

[CODE]mfaktc v0.22-pre8 (64bit built)
[...]
CUDA version info
binary compiled for CUDA 11.0
CUDA runtime version 11.0
CUDA driver version 11.0

CUDA device info
name [COLOR=Red][B]A100-SXM4-40GB[/B][/COLOR]
compute capability 8.0
max threads per block 1024
max shared memory per MP 167936 byte
number of multiprocessors 108
clock rate (CUDA cores) 1410MHz
memory clock rate: 1215MHz
memory bus width: 5120 bit
[...]
Starting trial factoring M66362159 from 2^74 to 2^75 (57.65 GHz-days)
k_min = 142321062303420
k_max = 284642124610180
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 19 21:19 | 0 0.1% | 0.829 13m15s | 6259.18 82485 n.a.%
Jul 19 21:19 | 4 0.2% | 0.779 12m26s | 6660.92 82485 n.a.%
Jul 19 21:19 | 9 0.3% | 0.780 12m26s | 6652.38 82485 n.a.%
[...]
Jul 19 21:31 | 4617 100.0% | 0.780 0m00s | 6652.38 82485 n.a.%
no factor for [COLOR=red][B]M66362159 from 2^74 to 2^75[/B][/COLOR] [mfaktc 0.22-pre8 barrett76_mul32_gs CUDA 11.0 arch 8.0] 51D74917
tf(): total time spent: [COLOR=red][B]12m 32.323s[/B][/COLOR]
[/CODE]

New absolute performance champion and I guess best performance per watt, too! :smile:


Older benchmark data for Turing (RTX 2080 Ti): [URL]https://mersenneforum.org/showpost.php?p=497430&postcount=2912[/URL]

Oliver[/QUOTE]


With something like this, a person could do a lot of LL-DC work using [I]gpuOwl[/I]. There are tens of thousands of those needing to be done. IMHO, using this for TF is a [U]waste[/U]. :ermm:

kracker 2020-09-20 00:12

[QUOTE=storm5510;557377]With something like this, a person could do a lot of LL-DC work using [I]gpuOwl[/I]. There are many 10's of 1000's needing to be done. IMHO, using this for TF is a [U]waste[/U]. :ermm:[/QUOTE]

Care to elaborate? I couldn't find any benchmarks from gpuowl for this card.

James Heinrich 2020-09-20 00:17

[QUOTE=kracker;557379]Care to elaborate? I couldn't find any benchmarks from gpuowl for this card.[/QUOTE]

Neither can I. I expect we'll see some RTX 3080 data for gpuowl in the somewhat-near future, but few people have access to an A100. I expect mfaktc would run fairly similarly on the two, but gpuowl performance may differ significantly. If Oliver still has access to that A100, a quick benchmark of gpuowl (and possibly cudalucas) would be nice, as always.

But still, I don't think there's anything wrong with the developer of mfaktc spending 12 minutes testing that his program works on a new generation of hardware. :smile:
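
As a rough sanity check on the "12 minutes" figure, the work size and throughput from the quoted log can be combined directly. This is only back-of-the-envelope arithmetic using the two numbers printed in the log (57.65 GHz-days of work, about 6652.38 GHz-d/day sustained); the function name is hypothetical.

```python
def estimated_minutes(ghz_days, ghz_days_per_day):
    """Convert a work size and a sustained rate into wall-clock minutes."""
    days = ghz_days / ghz_days_per_day
    return days * 24 * 60


# Values taken from the quoted A100 log.
minutes = estimated_minutes(57.65, 6652.38)
print(f"estimated runtime: {minutes:.1f} minutes")
```

The estimate lands close to the reported total of 12m 32s; the small difference is plausibly startup overhead and the slower first class.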

kracker 2020-09-20 01:00

[QUOTE=James Heinrich;557380]Neither can I. I expect we'll see some RTX 3080 data for gpuowl in the somewhat-near future, but few people have access to an A100. I expect mfaktc would run fairly similarly on the two, but gpuowl performance may differ significantly. If Oliver still has access to that A100, a quick benchmark of gpuowl (and possibly cudalucas) would be nice, as always.

But still, I don't think there's anything wrong with the developer of mfaktc spending 12 minutes testing that his program works on a new generation of hardware. :smile:[/QUOTE]

My guess is that (at least at the consumer level) Ampere cards will perform quite poorly for LL/PRP and the like... According to techpowerup's GPU database, the [URL="https://www.techpowerup.com/gpu-specs/geforce-rtx-3080.c3621"]3080[/URL] has a [I][U]1:64[/U][/I] ratio for DP (compare with 1:32 for the RTX 2080 Ti).

Mark Rose 2020-09-20 15:42

[QUOTE=kracker;557384]My guess is that (at least at the consumer level) Ampere cards will perform quite poorly for LL/PRP and the like... According to techpowerup's GPU database, the [URL="https://www.techpowerup.com/gpu-specs/geforce-rtx-3080.c3621"]3080[/URL] has a [I][U]1:64[/U][/I] ratio for DP (compare with 1:32 for the RTX 2080 Ti).[/QUOTE]

That's more a case of the formerly INT32-only cores now also supporting FP32, which doubles the FP32 count per SM while the FP64 count stays the same. Both consumer Ampere and Turing have 2 FP64 cores per Streaming Multiprocessor (SM). The RTX 2080 Ti has 68 SMs and the RTX 3080 also has 68 SMs, so, clock speed being equal, they should perform similarly in FP64.
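
The ratio argument can be checked with a few lines of arithmetic. In this sketch the SM counts and the 2 FP64 cores per SM come from the post above; the FP32 counts per SM (64 for Turing, 128 for consumer Ampere) are an assumption taken from Nvidia's published specifications, not from this thread, and the dictionary layout is purely illustrative.

```python
from fractions import Fraction

# sms and fp64_per_sm are from the post; fp32_per_sm is an assumption
# based on Nvidia's public architecture specifications.
SM_CONFIG = {
    "RTX 2080 Ti (Turing)": {"sms": 68, "fp32_per_sm": 64, "fp64_per_sm": 2},
    "RTX 3080 (Ampere)": {"sms": 68, "fp32_per_sm": 128, "fp64_per_sm": 2},
}


def fp64_ratio(cfg):
    """FP64:FP32 unit ratio per SM, as an exact fraction."""
    return Fraction(cfg["fp64_per_sm"], cfg["fp32_per_sm"])


def fp64_units(cfg):
    """Total FP64 units on the chip."""
    return cfg["sms"] * cfg["fp64_per_sm"]


for name, cfg in SM_CONFIG.items():
    print(name, "ratio", fp64_ratio(cfg), "total FP64 units", fp64_units(cfg))
```

Under these assumptions the advertised ratio halves from 1:32 to 1:64 only because the denominator doubled; the total number of FP64 units is the same on both cards.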

storm5510 2020-09-20 22:31

[QUOTE=James Heinrich;557380]Neither can I. I expect we'll see some RTX 3080 data for gpuowl in the somewhat-near future, but few people have access to an A100. I expect mfaktc would run fairly similarly on the two, but gpuowl performance may differ significantly. If Oliver still has access to that A100, a quick benchmark of gpuowl (and possibly cudalucas) would be nice, as always.

But still, I don't think there's anything wrong with the developer of mfaktc spending 12 minutes testing that his program works on a new generation of hardware. :smile:[/QUOTE]

That was pure speculation, and I was referring to the A100 he used for his test. A web site I looked at says [URL="https://www.pcmag.com/news/nvidia-signals-rtx-3080-founders-edition-will-be-back-in-stock-next-week"]RTX 3080s[/URL] will be back in stock before the end of this week. Since that isn't coming from the horse's mouth, I don't know how reliable it is.

Until now, I never knew who the author of [I]mfaktc[/I] was. His TF GHz-d/day figure is 6x what I can do, for now. Something like this doesn't always translate into other work types. Even so, it still should be pretty good.

storm5510 2020-09-21 12:30

1 Attachment(s)
[B]A100[/B], photo attached. I've never seen anything like these before. Nvidia calls them "Data Center" GPUs. The TDP is 400 W for the one on the left and 250 W for the one on the right. Most of the specs are the same for both.

tServo 2020-09-21 17:28

[QUOTE=storm5510;557465][B]A100[/B], photo attached. I've never seen anything like these before. Nvidia calls them "Data Center" GPU's. TDP, 400W on the left and 250W on the right. Most of the specs are the same for both.[/QUOTE]

They have been around since Pascal. They are SXM modules (Pascal), and SXM2 and SXM4 for Volta and Ampere respectively. For data center machines, they are mounted on a carrier that can hold 4 or 8 of these and are connected via NVLink rather than PCIe. They will, of course, have a passive heat sink attached to their tops. The carrier boards are then attached to a "pizza box" server holding 2 Xeon CPUs, usually just above it. Then multiple servers are put in a rack, etc. This is how supercomputers are made these days.
When Jensen Huang announced Ampere in May, there was a ridiculous video Nvidia made of him pulling a populated carrier out of an ordinary oven, exclaiming "Look what we have cooked up!"

The board on the right has one of these SXM4 modules mounted on a board that has PCIe interface circuitry, and the SXM4 will have a heatsink on its top that is different from the ones I mentioned above. These boards are also passively cooled and are for workstations. Since their cooling is less effective than that of the bare SXM4 modules, they are de-tuned to keep from overheating. Hence, they draw 150 watts less than the datacenter SXM4 modules. They are usually referred to as Tesla boards.

Neutron3529 2020-09-26 16:04

[QUOTE=storm5510;557377]With something like this, a person could do a lot of LL-DC work using [I]gpuOwl[/I]. There are many 10's of 1000's needing to be done. IMHO, using this for TF is a [U]waste[/U]. :ermm:[/QUOTE]
A 3080 is enough for TF.
I borrowed a 3080 card and tested its performance.
It delivers about 2/3 of the A100's TF performance at only about 1/10 of the price.

[CODE]
Starting trial factoring M210230299 from 2^72 to 2^73 (4.55 GHz-days)
k_min = 11231412658620
k_max = 22462825317437
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Sep 26 23:57 | 0 0.1% | 0.088 n.a. | 4653.23 82485 n.a.%
Sep 26 23:57 | 5 0.2% | 0.089 n.a. | 4600.94 82485 n.a.%
Sep 26 23:57 | 9 0.3% | 0.089 n.a. | 4600.94 82485 n.a.%
Sep 26 23:57 | 12 0.4% | 0.090 n.a. | 4549.82 82485 n.a.%
Sep 26 23:57 | 17 0.5% | 0.088 n.a. | 4653.23 82485 n.a.%
Sep 26 23:57 | 20 0.6% | 0.089 n.a. | 4600.94 82485 n.a.%
Sep 26 23:57 | 21 0.7% | 0.090 n.a. | 4549.82 82485 n.a.%
Sep 26 23:57 | 24 0.8% | 0.089 n.a. | 4600.94 82485 n.a.%
Sep 26 23:57 | 29 0.9% | 0.091 n.a. | 4499.83 82485 n.a.%
Sep 26 23:57 | 36 1.0% | 0.092 n.a. | 4450.91 82485 n.a.%
Sep 26 23:57 | 41 1.1% | 0.092 n.a. | 4450.91 82485 n.a.%
Sep 26 23:57 | 44 1.2% | 0.091 n.a. | 4499.83 82485 n.a.%
Sep 26 23:57 | 45 1.4% | 0.092 n.a. | 4450.91 82485 n.a.%
Sep 26 23:57 | 56 1.5% | 0.092 n.a. | 4450.91 82485 n.a.%
Sep 26 23:57 | 57 1.6% | 0.092 n.a. | 4450.91 82485 n.a.%
Sep 26 23:57 | 65 1.7% | 0.092 n.a. | 4450.91 82485 n.a.%
Sep 26 23:57 | 69 1.8% | 0.092 n.a. | 4450.91 82485 n.a.%
Sep 26 23:57 | 72 1.9% | 0.091 n.a. | 4499.83 82485 n.a.%
Sep 26 23:57 | 77 2.0% | 0.092 n.a. | 4450.91 82485 n.a.%
Sep 26 23:57 | 80 2.1% | 0.092 n.a. | 4450.91 82485 n.a.%
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Sep 26 23:57 | 84 2.2% | 0.091 n.a. | 4499.83 82485 n.a.%
Sep 26 23:57 | 89 2.3% | 0.089 n.a. | 4600.94 82485 n.a.%
Sep 26 23:57 | 96 2.4% | 0.090 n.a. | 4549.82 82485 n.a.%
Sep 26 23:57 | 101 2.5% | 0.089 n.a. | 4600.94 82485 n.a.%
Sep 26 23:57 | 104 2.6% | 0.089 n.a. | 4600.94 82485 n.a.%
Sep 26 23:57 | 105 2.7% | 0.090 n.a. | 4549.82 82485 n.a.%
Sep 26 23:57 | 117 2.8% | 0.090 n.a. | 4549.82 82485 n.a.%
Sep 26 23:57 | 120 2.9% | 0.089 n.a. | 4600.94 82485 n.a.%
Sep 26 23:57 | 129 3.0% | 0.090 n.a. | 4549.82 82485 n.a.%
Sep 26 23:57 | 132 3.1% | 0.090 n.a. | 4549.82 82485 n.a.%
Sep 26 23:57 | 140 3.2% | 0.091 n.a. | 4499.83 82485 n.a.%
Sep 26 23:57 | 141 3.3% | 0.092 n.a. | 4450.91 82485 n.a.%
Sep 26 23:57 | 149 3.4% | 0.091 n.a. | 4499.83 82485 n.a.%
Sep 26 23:57 | 152 3.5% | 0.089 n.a. | 4600.94 82485 n.a.%
Sep 26 23:57 | 156 3.6% | 0.089 n.a. | 4600.94 82485 n.a.%
Sep 26 23:57 | 161 3.8% | 0.089 n.a. | 4600.94 82485 n.a.%
Sep 26 23:57 | 164 3.9% | 0.089 n.a. | 4600.94 82485 n.a.%
Sep 26 23:57 | 176 4.0% | 0.090 n.a. | 4549.82 82485 n.a.%
Sep 26 23:57 | 177 4.1% | 0.089 n.a. | 4600.94 82485 n.a.%
Sep 26 23:57 | 185 4.2% | 0.089 n.a. | 4600.94 82485 n.a.%[/CODE]
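
The "about 2/3" comparison can be reproduced from the two logs in this thread. Below is a minimal Python sketch that parses mfaktc-style status lines and averages the GHz-d/day column. The parser is an illustration written against the line layout shown above, not part of mfaktc; the function name is hypothetical, and only the first three 3080 lines are used as sample input to keep it short.

```python
def parse_status_line(line):
    """Parse one 'Date Time | class Pct | time ETA | GHz-d/day Sieve Wait' row.

    Returns a dict, or None for header rows and anything that does not match.
    """
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 4:
        return None
    class_pct = parts[1].split()
    timing = parts[2].split()
    rate = parts[3].split()
    try:
        return {
            "class": int(class_pct[0]),
            "pct": float(class_pct[1].rstrip("%")),
            "seconds_per_class": float(timing[0]),
            "ghz_days_per_day": float(rate[0]),
        }
    except (IndexError, ValueError):
        return None  # e.g. the column header row


SAMPLE_3080 = """\
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Sep 26 23:57 | 0 0.1% | 0.088 n.a. | 4653.23 82485 n.a.%
Sep 26 23:57 | 5 0.2% | 0.089 n.a. | 4600.94 82485 n.a.%
Sep 26 23:57 | 9 0.3% | 0.089 n.a. | 4600.94 82485 n.a.%
"""

A100_RATE = 6652.38  # steady-state GHz-d/day from the A100 log quoted earlier

rows = [r for r in map(parse_status_line, SAMPLE_3080.splitlines()) if r]
mean_rate = sum(r["ghz_days_per_day"] for r in rows) / len(rows)
ratio = mean_rate / A100_RATE

print(f"mean 3080 rate: {mean_rate:.2f} GHz-d/day")
print(f"fraction of A100 rate: {ratio:.2f}")
```

Note that the two runs use different exponents and bit levels, so the ratio is only indicative of relative throughput.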

