mersenneforum.org  

Old 2020-09-16, 13:00   #3312
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
22237₈ Posts

Quote:
Originally Posted by kruoli View Post
That's weird. Why does it have an .exe file extension? That's usually not the case in Linux.
For as long as I've been using mfaktc, the Linux version has always been distributed thusly.

Quote:
Originally Posted by kruoli View Post
And while you can omit the .exe extension in CMD, that's not valid for BASH etc. So if you have your mfaktc named with extension, you'll have to write ./mfaktc.exe.
Yup. Exactly the way we like it! True Geeks don't like the command line guessing what it thinks we want to do...

Old 2020-09-19, 19:52   #3313
mnd9
Jun 2019
Boston, MA
39₁₀ Posts

Quote:
Originally Posted by TheJudger View Post
Hi,

seems like mfaktc runs fine with CUDA 11 on Ampere (no specific changes for Ampere except Makefile).

Code:
mfaktc v0.22-pre8 (64bit built)
[...]
CUDA version info
  binary compiled for CUDA  11.0
  CUDA runtime version      11.0
  CUDA driver version       11.0

CUDA device info
  name                      A100-SXM4-40GB
  compute capability        8.0
  max threads per block     1024
  max shared memory per MP  167936 byte
  number of multiprocessors 108
  clock rate (CUDA cores)   1410MHz
  memory clock rate:        1215MHz
  memory bus width:         5120 bit
[...]
Starting trial factoring M66362159 from 2^74 to 2^75 (57.65 GHz-days)
 k_min =  142321062303420
 k_max =  284642124610180
Using GPU kernel "barrett76_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jul 19 21:19 |    0   0.1% |  0.829  13m15s |   6259.18    82485    n.a.%
Jul 19 21:19 |    4   0.2% |  0.779  12m26s |   6660.92    82485    n.a.%
Jul 19 21:19 |    9   0.3% |  0.780  12m26s |   6652.38    82485    n.a.%
[...]
Jul 19 21:31 | 4617 100.0% |  0.780   0m00s |   6652.38    82485    n.a.%
no factor for M66362159 from 2^74 to 2^75 [mfaktc 0.22-pre8 barrett76_mul32_gs CUDA 11.0 arch 8.0] 51D74917
tf(): total time spent: 12m 32.323s
New absolute performance champion and I guess best performance per watt, too!


Older benchmark data for Turing (RTX 2080 Ti): https://mersenneforum.org/showpost.p...postcount=2912

Oliver
Sorry, I looked all over, but is 0.22 available to download anywhere? Or is there any prebuilt Win64 version compiled with CUDA 11?

Thanks!

Old 2020-09-19, 23:57   #3314
storm5510
Random Account
Aug 2009
U.S.A.
11010100000₂ Posts

Quote:
Originally Posted by TheJudger View Post
Hi,

seems like mfaktc runs fine with CUDA 11 on Ampere (no specific changes for Ampere except Makefile).

[...]
New absolute performance champion and I guess best performance per watt, too!


Older benchmark data for Turing (RTX 2080 Ti): https://mersenneforum.org/showpost.p...postcount=2912

Oliver

With something like this, a person could do a lot of LL-DC work using gpuOwl. There are tens of thousands needing to be done. IMHO, using this for TF is a waste.

Old 2020-09-20, 00:12   #3315
kracker
ἀβουλία
"Mr. Meeseeks"
Jan 2012
California, USA
3²·241 Posts

Quote:
Originally Posted by storm5510 View Post
With something like this, a person could do a lot of LL-DC work using gpuOwl. There are tens of thousands needing to be done. IMHO, using this for TF is a waste.
Care to elaborate? I couldn't find any benchmarks from gpuowl for this card.

Old 2020-09-20, 00:17   #3316
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario
3·11·97 Posts

Quote:
Originally Posted by kracker View Post
Care to elaborate? I couldn't find any benchmarks from gpuowl for this card.
Neither can I. I expect we'll see some RTX 3080 data for gpuowl sometime in the somewhat-near future, but few people have access to an A100. I expect mfaktc would run fairly similarly on the two, but gpuowl performance may differ significantly. If Oliver still has access to that A100, a quick benchmark of gpuowl (and possibly cudalucas) would be nice, as always.

But still, I don't think there's anything wrong with the developer of mfaktc spending 12 minutes testing that his program works on a new generation of hardware.
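
The "12 minutes" can be cross-checked from the numbers in Oliver's log: the assignment is listed as 57.65 GHz-days and the steady-state rate as 6652.38 GHz-d/day. A minimal sketch, assuming the rate stays constant for the whole run (the function name is made up for illustration):

```python
def tf_runtime_minutes(work_ghz_days, rate_ghz_days_per_day):
    """Expected wall-clock minutes for a job of the given size at a steady rate."""
    return work_ghz_days / rate_ghz_days_per_day * 24 * 60

# Both values are taken from the A100 log quoted earlier in the thread.
minutes = tf_runtime_minutes(57.65, 6652.38)
print(f"{minutes:.1f} minutes")  # about 12.5 minutes
```

That is in line with the logged "total time spent: 12m 32.323s".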

Old 2020-09-20, 01:00   #3317
kracker
ἀβουλία
"Mr. Meeseeks"
Jan 2012
California, USA
100001111001₂ Posts

Quote:
Originally Posted by James Heinrich View Post
Neither can I. I expect we'll see some RTX 3080 data for gpuowl sometime in the somewhat-near future, but few people have access to an A100. I expect mfaktc would run fairly similarly on the two, but gpuowl performance may differ significantly. If Oliver still has access to that A100, a quick benchmark of gpuowl (and possibly cudalucas) would be nice, as always.

But still, I don't think there's anything wrong with the developer of mfaktc spending 12 minutes testing that his program works on a new generation of hardware.
My guess is that (at least at the consumer level) Ampere cards will perform quite poorly for LL/PRP and the like... According to TechPowerUp's GPU database, the 3080 has a 1:64 ratio for DP (compare with 1:32 for the RTX 2080 Ti).

Old 2020-09-20, 15:42   #3318
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
2,917 Posts

Quote:
Originally Posted by kracker View Post
My guess is that (at least at the consumer level) Ampere cards will perform quite poorly for LL/PRP and the like... According to TechPowerUp's GPU database, the 3080 has a 1:64 ratio for DP (compare with 1:32 for the RTX 2080 Ti).
That's more a case of the formerly INT32-only cores also now supporting FP32. Both Ampere and Turing have 2 FP64 cores per Streaming Multiprocessor (SM) block. The RTX 2080 Ti has 68 SMs and the RTX 3080 also has 68 SMs, so clock speed being equal they should perform similarly.
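
To put numbers on that: the ratio halves because the FP32 count per SM doubles while the FP64 count stays put. A small sketch of the arithmetic follows. The FP32 unit counts per SM (64 for Turing, 128 for consumer Ampere) are taken from NVIDIA's published architecture descriptions rather than from this thread, so treat them as assumptions; the FP64 and SM counts are the ones given above.

```python
from fractions import Fraction

# Per-SM unit counts. The fp32 values are assumptions from NVIDIA's public
# architecture descriptions; fp64 and sms are the figures quoted in the post.
SM_UNITS = {
    "Turing (RTX 2080 Ti)": {"fp32": 64, "fp64": 2, "sms": 68},
    "Ampere (RTX 3080)": {"fp32": 128, "fp64": 2, "sms": 68},
}

def fp64_ratio(units):
    """FP64:FP32 unit ratio within one SM."""
    return Fraction(units["fp64"], units["fp32"])

def fp64_units(units):
    """Total number of FP64 units on the whole card."""
    return units["fp64"] * units["sms"]

for name, units in SM_UNITS.items():
    r = fp64_ratio(units)
    print(f"{name}: FP64:FP32 = {r.numerator}:{r.denominator}, "
          f"{fp64_units(units)} FP64 units in total")
```

Both cards end up with the same total number of FP64 units, which is why the worse-looking ratio does not by itself imply lower double-precision throughput at equal clocks.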

Old 2020-09-20, 22:31   #3319
storm5510
Random Account
Aug 2009
U.S.A.
2⁵·53 Posts

Quote:
Originally Posted by James Heinrich View Post
Neither can I. I expect we'll see some RTX 3080 data for gpuowl sometime in the somewhat-near future, but few people have access to an A100. I expect mfaktc would run fairly similarly on the two, but gpuowl performance may differ significantly. If Oliver still has access to that A100, a quick benchmark of gpuowl (and possibly cudalucas) would be nice, as always.

But still, I don't think there's anything wrong with the developer of mfaktc spending 12 minutes testing that his program works on a new generation of hardware.
That was pure speculation, and I was referring to the A100 he used for his test. A web site I looked at says RTX 3080s will be back in stock before the end of this week. Since that isn't from the horse's mouth, I don't know how reliable it is.

Until now, I never knew who the author of mfaktc was. His TF GHz-d/day figure is 6x what I can do, for now. Something like this doesn't always translate into other work types. Even so, it still should be pretty good.

Old 2020-09-21, 12:30   #3320
storm5510
Random Account
Aug 2009
U.S.A.
2⁵·53 Posts

A100, photo attached. I've never seen anything like these before. Nvidia calls them "Data Center" GPUs. TDP is 400 W on the left and 250 W on the right. Most of the specs are the same for both.
[Attached image: A100-A.jpg]

Old 2020-09-21, 17:28   #3321
tServo
"Marv"
May 2009
near the Tannhäuser Gate
1043₈ Posts

Quote:
Originally Posted by storm5510 View Post
A100, photo attached. I've never seen anything like these before. Nvidia calls them "Data Center" GPUs. TDP is 400 W on the left and 250 W on the right. Most of the specs are the same for both.
They have been around since Pascal. They are SXM modules (Pascal), with SXM2 and SXM4 for Volta and Ampere respectively. For data center machines, they are mounted on a carrier that can hold 4 or 8 of them and are connected via NVLink rather than PCIe. They will, of course, have a passive heat sink attached to their tops. The carrier boards are then attached to a "pizza box" server holding 2 Xeon CPUs, usually just above it. Then multiple servers are put in a rack, and so on. This is how supercomputers are made these days.
When Jensen Huang announced Ampere in May, Nvidia made a ridiculous video of him pulling a populated carrier out of an ordinary oven, exclaiming "Look what we have cooked up!"

The board on the right has one of these SXM4 modules mounted on a board with PCIe interface circuitry, and its SXM4 has a heatsink on top that differs from the ones I mentioned. These boards are also passively cooled and are for workstations. Since their cooling is less effective than that of the bare SXM4 modules, they are de-tuned to keep from overheating. Hence, they take 150 watts less than the datacenter SXM4 modules. They are usually referred to as Tesla boards.

Old 2020-09-26, 16:04   #3322
Neutron3529
Dec 2018
China
2³·5 Posts

Quote:
Originally Posted by storm5510 View Post
With something like this, a person could do a lot of LL-DC work using gpuOwl. There are tens of thousands needing to be done. IMHO, using this for TF is a waste.
A 3080 is enough for TF.
I borrowed a 3080 card and tested its performance.
It delivers about 2/3 of the A100's TF performance at only about 1/10 of the price.
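
The 2/3 figure can be roughly checked by averaging the GHz-d/day column of mfaktc status lines from both cards. The sketch below is only illustrative: the regular expression is my own guess at the status-line layout seen in this thread, not something taken from mfaktc itself, and the sample lines are a handful copied from the two logs. The runs also use different exponents and bit ranges, so the comparison is approximate.

```python
import re
from statistics import mean

# Assumed layout of one status line, as it appears in this thread:
# "Mon DD HH:MM | class pct% | time ETA | GHz-d/day sieve wait"
STATUS = re.compile(
    r"^\w{3}\s+\d+\s+\d+:\d+\s*\|\s*\d+\s+[\d.]+%\s*\|\s*[\d.]+\s+\S+\s*\|\s*([\d.]+)"
)

def ghz_days_per_day(log_text):
    """Return the GHz-d/day value of every status line found in the log text."""
    rates = []
    for line in log_text.splitlines():
        m = STATUS.match(line.strip())
        if m:  # header lines and other output do not match and are skipped
            rates.append(float(m.group(1)))
    return rates

# Sample lines copied from the A100 log posted earlier in the thread.
a100_log = """\
Jul 19 21:19 |    4   0.2% |  0.779  12m26s |   6660.92    82485    n.a.%
Jul 19 21:19 |    9   0.3% |  0.780  12m26s |   6652.38    82485    n.a.%
"""
# Sample lines copied from the RTX 3080 log in this post.
rtx3080_log = """\
Sep 26 23:57 |    0   0.1% |  0.088    n.a. |   4653.23    82485    n.a.%
Sep 26 23:57 |    5   0.2% |  0.089    n.a. |   4600.94    82485    n.a.%
Sep 26 23:57 |   36   1.0% |  0.092    n.a. |   4450.91    82485    n.a.%
"""

ratio = mean(ghz_days_per_day(rtx3080_log)) / mean(ghz_days_per_day(a100_log))
print(f"RTX 3080 / A100 throughput: {ratio:.2f}")
```

With these samples the ratio comes out a little above 2/3.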

Code:
Starting trial factoring M210230299 from 2^72 to 2^73 (4.55 GHz-days)
 k_min =  11231412658620
 k_max =  22462825317437
Using GPU kernel "barrett76_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Sep 26 23:57 |    0   0.1% |  0.088    n.a. |   4653.23    82485    n.a.%
Sep 26 23:57 |    5   0.2% |  0.089    n.a. |   4600.94    82485    n.a.%
Sep 26 23:57 |    9   0.3% |  0.089    n.a. |   4600.94    82485    n.a.%
Sep 26 23:57 |   12   0.4% |  0.090    n.a. |   4549.82    82485    n.a.%
Sep 26 23:57 |   17   0.5% |  0.088    n.a. |   4653.23    82485    n.a.%
Sep 26 23:57 |   20   0.6% |  0.089    n.a. |   4600.94    82485    n.a.%
Sep 26 23:57 |   21   0.7% |  0.090    n.a. |   4549.82    82485    n.a.%
Sep 26 23:57 |   24   0.8% |  0.089    n.a. |   4600.94    82485    n.a.%
Sep 26 23:57 |   29   0.9% |  0.091    n.a. |   4499.83    82485    n.a.%
Sep 26 23:57 |   36   1.0% |  0.092    n.a. |   4450.91    82485    n.a.%
Sep 26 23:57 |   41   1.1% |  0.092    n.a. |   4450.91    82485    n.a.%
Sep 26 23:57 |   44   1.2% |  0.091    n.a. |   4499.83    82485    n.a.%
Sep 26 23:57 |   45   1.4% |  0.092    n.a. |   4450.91    82485    n.a.%
Sep 26 23:57 |   56   1.5% |  0.092    n.a. |   4450.91    82485    n.a.%
Sep 26 23:57 |   57   1.6% |  0.092    n.a. |   4450.91    82485    n.a.%
Sep 26 23:57 |   65   1.7% |  0.092    n.a. |   4450.91    82485    n.a.%
Sep 26 23:57 |   69   1.8% |  0.092    n.a. |   4450.91    82485    n.a.%
Sep 26 23:57 |   72   1.9% |  0.091    n.a. |   4499.83    82485    n.a.%
Sep 26 23:57 |   77   2.0% |  0.092    n.a. |   4450.91    82485    n.a.%
Sep 26 23:57 |   80   2.1% |  0.092    n.a. |   4450.91    82485    n.a.%
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Sep 26 23:57 |   84   2.2% |  0.091    n.a. |   4499.83    82485    n.a.%
Sep 26 23:57 |   89   2.3% |  0.089    n.a. |   4600.94    82485    n.a.%
Sep 26 23:57 |   96   2.4% |  0.090    n.a. |   4549.82    82485    n.a.%
Sep 26 23:57 |  101   2.5% |  0.089    n.a. |   4600.94    82485    n.a.%
Sep 26 23:57 |  104   2.6% |  0.089    n.a. |   4600.94    82485    n.a.%
Sep 26 23:57 |  105   2.7% |  0.090    n.a. |   4549.82    82485    n.a.%
Sep 26 23:57 |  117   2.8% |  0.090    n.a. |   4549.82    82485    n.a.%
Sep 26 23:57 |  120   2.9% |  0.089    n.a. |   4600.94    82485    n.a.%
Sep 26 23:57 |  129   3.0% |  0.090    n.a. |   4549.82    82485    n.a.%
Sep 26 23:57 |  132   3.1% |  0.090    n.a. |   4549.82    82485    n.a.%
Sep 26 23:57 |  140   3.2% |  0.091    n.a. |   4499.83    82485    n.a.%
Sep 26 23:57 |  141   3.3% |  0.092    n.a. |   4450.91    82485    n.a.%
Sep 26 23:57 |  149   3.4% |  0.091    n.a. |   4499.83    82485    n.a.%
Sep 26 23:57 |  152   3.5% |  0.089    n.a. |   4600.94    82485    n.a.%
Sep 26 23:57 |  156   3.6% |  0.089    n.a. |   4600.94    82485    n.a.%
Sep 26 23:57 |  161   3.8% |  0.089    n.a. |   4600.94    82485    n.a.%
Sep 26 23:57 |  164   3.9% |  0.089    n.a. |   4600.94    82485    n.a.%
Sep 26 23:57 |  176   4.0% |  0.090    n.a. |   4549.82    82485    n.a.%
Sep 26 23:57 |  177   4.1% |  0.089    n.a. |   4600.94    82485    n.a.%
Sep 26 23:57 |  185   4.2% |  0.089    n.a. |   4600.94    82485    n.a.%
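
The k_min and k_max lines in these logs follow from the fact that any factor q of a Mersenne number M_p has the form q = 2kp + 1, so a bit range translates directly into a range of k. A minimal sketch that reproduces the logged values is below. The rounding of k_min down to a multiple of 4620 is an assumption on my part: it is consistent with both logs and with the class column topping out at 4617, but it is not taken from the mfaktc source.

```python
NUM_CLASSES = 4620  # assumption: inferred from the logs, not from mfaktc's source

def k_range(p, bit_min, bit_max, num_classes=NUM_CLASSES):
    """Range of k for which q = 2*k*p + 1 falls between 2**bit_min and 2**bit_max."""
    k_min = (1 << bit_min) // (2 * p)
    k_min -= k_min % num_classes  # align to a class boundary (assumed behaviour)
    k_max = (1 << bit_max) // (2 * p)
    return k_min, k_max

print(k_range(66362159, 74, 75))   # (142321062303420, 284642124610180)
print(k_range(210230299, 72, 73))  # (11231412658620, 22462825317437)
```

Both results match the k_min and k_max lines printed for the A100 and RTX 3080 runs.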

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.