#2652
"Oliver"
Mar 2005
Germany
11×101 Posts
CUDA 9.0, CUDA driver 384.98, CUDALucas 2.05.1 (SVN rev. 99)
Benchmarking FFT sizes with './CUDALucas -cufftbench 2048 32768 20'. Code:
Device Tesla V100-PCIE-16GB
Compatibility      7.0
clockRate (MHz)    1380
memClockRate (MHz) 877

  fft     max exp   ms/iter
 2048    38492887    0.3978
 2187    41047411    0.5123
 2304    43194913    0.5183
 2401    44973503    0.5293
 2500    46787207    0.5429
 2592    48471289    0.5460
 2744    51250889    0.5997
 3136    58404433    0.6361
 3200    59570449    0.6514
 3456    64229677    0.7015
 4096    75846319    0.7591
 4375    80897867    0.9595
 4608    85111207    0.9649
 5184    95507747    1.0124
 5488   100984691    1.1235
 6272   115080019    1.2037
 6400   117377567    1.2445
 6561   120266023    1.3328
 6912   126558077    1.3391
 8000   146019329    1.5105
 8192   149447533    1.5230
 8575   156280961    1.8316
10368   188188471    1.9362
10976   198980129    2.1451
11907   215480183    2.3303
12544   226753511    2.3331
12800   231280639    2.3830
13824   249369863    2.5663
16384   294471259    2.9531
16807   301908293    3.3334
16875   303103441    3.5138
18225   326810201    3.7274
20736   370806323    3.7880
21952   392070229    4.2109
25088   446794913    4.5286
27783   493705637    5.5610
32000   566915989    5.8087
32768   580225813    5.8343
And timing a 100M exponent, './CUDALucas 332192879'. Code:
Starting M332192879 fft length = 20736K
|   Date  Time     |  Test Num   Iter   Residue            |  FFT    Error    ms/It  Time   |     ETA      Done  |
| Dec 06 21:52:36  | M332192879  10000  0xa19043095e213f4c | 20736K  0.01758  3.8055 38.05s | 14:15:09:00  0.00% |
| Dec 06 21:53:14  | M332192879  20000  0xcb7bc66ac81b24be | 20736K  0.01709  3.8051 38.05s | 14:15:07:16  0.00% |
| Dec 06 21:53:52  | M332192879  30000  0x38e4cc517de8fda3 | 20736K  0.01758  3.8051 38.05s | 14:15:06:19  0.00% |

Oliver
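The logged ETA follows from simple arithmetic: an LL test of M_p needs roughly p squarings, so total time is about p times the per-iteration time. A quick sanity check of the figures above (a sketch of the arithmetic, not CUDALucas's own code):

```python
# Rough ETA check for an LL test: ~p squarings at the benchmarked ms/iter.
p = 332192879          # exponent under test
ms_per_iter = 3.8055   # from the V100 log above

total_seconds = p * ms_per_iter / 1000.0
days = total_seconds / 86400
print(f"{days:.1f} days")  # ~14.6 days, matching the logged ETA of 14:15:09:00
```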
#2653
Oct 2014
Bari, Italy
2716 Posts
So ~351 × 145 Wh ≈ 51 kWh is the amount of energy consumed (the ~14.6-day run above is about 351 hours, at roughly 145 W).
#2654
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1010011110011₂ Posts
To follow up on http://www.mersenneforum.org/showpos...postcount=2649: testing several combinations of applications (among CUDALucas, CUDAPm1, and Mfaktc) run together on several GPU models, I have preliminary results per GPU-and-apps combination ranging from a few percent throughput reduction to over a thirteen percent throughput increase. Throughput is computed as the sum, over each instance running simultaneously on an individual GPU, of its rate of progress divided by the rate benchmarked when that application was the only one running on that GPU. (This approach treats all run types, LL, P-1, and trial factoring, as equally valuable; what's valued is a GPU-day of that model.) Estimated standard deviations so far are on the order of 0.2% to 0.5% for the cases I've evaluated, so the observed 1-13% gains are statistically significant. A spot check of a benchmark was quickly repeatable to 0.2%. The memory requirement is typically a small fraction of total GPU RAM.
Last fiddled with by kriesel on 2017-12-07 at 17:20
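kriesel's throughput metric can be written down compactly: for each app instance running concurrently on one GPU, divide its observed rate by its solo-benchmark rate on that same GPU, then sum. A value above 1.0 means the mix beats running the apps one at a time. A minimal sketch with hypothetical numbers (the function name and figures are mine, not his measurements):

```python
def combined_throughput(instances):
    """Sum of (rate while sharing the GPU / rate when running alone)
    over all instances on one GPU. Each entry is (shared, alone) in the
    same units (e.g. iterations/s), so the units cancel per instance."""
    return sum(shared / alone for shared, alone in instances)

# Hypothetical mix: CUDALucas keeps 92% of its solo rate while Mfaktc
# keeps 84% of its solo rate, for a net gain over running either alone.
mix = [(0.92, 1.00),   # CUDALucas instance
       (0.21, 0.25)]   # Mfaktc instance
print(f"{combined_throughput(mix):.2f}")  # 1.76 GPU-equivalents
```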
#2655
"Oliver"
Mar 2005
Germany
11·101 Posts
CUDA 9.1, CUDA driver 387.34, CUDALucas 2.05.1 (SVN rev. 99)
Updated P100-16GiB benchmark (the older CUDA 8 benchmarks are here and here). Benchmarking FFT sizes with './CUDALucas -cufftbench 2048 32768 20'. Code:
Device Tesla P100-PCIE-16GB
Compatibility      6.0
clockRate (MHz)    1328
memClockRate (MHz) 715

  fft     max exp   ms/iter
 2048    38492887    0.5972
 2187    41047411    0.7118
 2304    43194913    0.7301
 2401    44973503    0.7656
 2592    48471289    0.7971
 2744    51250889    0.8863
 3136    58404433    0.9482
 3200    59570449    0.9733
 3456    64229677    1.0467
 3584    66556463    1.1321
 4096    75846319    1.1423
 4608    85111207    1.4124
 5184    95507747    1.4988
 5488   100984691    1.6450
 6272   115080019    1.8127
 6400   117377567    1.8730
 6561   120266023    1.9556
 6912   126558077    2.0301
 7776   142017539    2.2474
 8192   149447533    2.2688
 8575   156280961    2.6593
 9261   168504209    2.8483
10368   188188471    2.9439
10976   198980129    3.1604
12544   226753511    3.5621
12800   231280639    3.6567
13824   249369863    3.9843
15552   279831199    4.4018
16384   294471259    4.5018
16807   301908293    5.1300
16875   303103441    5.5609
18225   326810201    5.7337
20736   370806323    5.8287
21952   392070229    6.2511
25088   446794913    7.0258
27783   493705637    8.2696
31104   551379091    8.7884
32000   566915989    9.0541
32768   580225813    9.0641
Code:
Starting M332192879 fft length = 20736K
|   Date  Time     |  Test Num   Iter   Residue            |  FFT    Error    ms/It  Time   |     ETA      Done  |
| Dec 23 20:37:12  | M332192879  10000  0xa19043095e213f4c | 20736K  0.01758  5.8218 58.21s | 22:09:12:09  0.00% |
| Dec 23 20:38:10  | M332192879  20000  0xcb7bc66ac81b24be | 20736K  0.01709  5.8218 58.21s | 22:09:11:04  0.00% |
| Dec 23 20:39:08  | M332192879  30000  0x38e4cc517de8fda3 | 20736K  0.01855  5.8249 58.24s | 22:09:15:49  0.00% |
#2656
Banned
"Luigi"
Aug 2002
Team Italia
12CF₁₆ Posts
Quote:
#2658
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
31·173 Posts
Quote:
#2659
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
31·173 Posts
The attachment is based on actual timed exponents on a GTX 480 clocked at 701 MHz. Times for a GTX 1070 scale by about 70%; that is, what takes the 480 ten days takes the 1070 a week.
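The 70% figure is a plain time ratio, so GTX 480 timings convert directly. A trivial sketch (the function name and the fixed 0.70 factor are mine; the post gives it only as an approximation):

```python
def gtx1070_days(gtx480_days, factor=0.70):
    """Estimate GTX 1070 runtime from a GTX 480 runtime, using the
    approximate ~70% time ratio quoted in the post."""
    return gtx480_days * factor

print(gtx1070_days(10.0))  # 7.0: the 480's ten days become about a week
```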
#2660
Romulan Interpreter
Jun 2011
Thailand
22613₈ Posts
I had a fast read; some of it I didn't understand (I need more time to read it more deeply, I am in a hurry now), but point 10 seems to be not actually true. What you see is an effect of the save file storing the time when the test was started. The program computes the ETA roughly as "how long you have worked on it" divided by "how many iterations you did", multiplied by "how many iterations you still have", and that gives a date in the future. You will experience the same effect if you interrupt your work for a while (days) and resume on the same computer. I remember a discussion in the past where we argued whether the interruption time should be considered or not (i.e. averaged into the calculation), and it seems to me that it is better to include it. It doesn't matter if you take one picosecond per iteration: if you spent a week doing half of the test (for whatever reasons, including interruptions), it would look normal to me that you spend another week on the other half. This way, your new computer doesn't know that the time per iteration is faster, but the ETA will "catch up" soon as the iterations progress to higher numbers.

The other way, displaying the ETA as the number of remaining iterations multiplied by the current iteration time, gives you an immediate result when you move the test to a faster toy, but a very, VERY jumpy ETA, because the iteration time varies a lot with how busy your computer is. Some of us use our computers for other activities too. So it is not "reliable". Some kind of averaging over past values (either SMA or EMA) needs to be done to avoid the jumpy ETA, and you would still see no effect when you move the test until the moving average's main period passes. Of course, it would be nice to have an option in the ini file to choose the averaging period: something like 255 could reproduce the current behavior and 0 could mean "no averaging" (jumpy), just as an example. But I feel we request too much already.
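The two ETA schemes described above can be sketched side by side: the current behavior (average over all wall-clock time since the test started, interruptions included) versus an exponential moving average of recent per-iteration times. This is an illustrative sketch, not CUDALucas's actual code; the function names and the period-to-alpha conversion are my own choices:

```python
def eta_wallclock(iters_done, iters_total, start_time, now):
    """Current scheme: average over all wall-clock seconds since the test
    began, so pauses and hardware upgrades are absorbed only gradually."""
    per_iter = (now - start_time) / iters_done
    return (iters_total - iters_done) * per_iter

def eta_ema(iters_remaining, latest_per_iter, ema, alpha=2 / (255 + 1)):
    """Alternative: exponential moving average of recent iteration times.
    The default alpha comes from LaurV's example period of 255; alpha=1.0
    is the jumpy 'no averaging' variant. Returns (updated EMA, ETA)."""
    ema = alpha * latest_per_iter + (1 - alpha) * ema
    return ema, iters_remaining * ema

# 50 of 200 iterations done in 100 s of wall clock: 2 s/iter on average,
# so 150 * 2 = 300 s remain, regardless of pauses inside those 100 s.
print(eta_wallclock(50, 200, 0.0, 100.0))  # 300.0
```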
#2661
"William Garnett III"
Oct 2002
Bensalem, PA
2×43 Posts
CUDALucas2.05.1-CUDA8.0-Windows-x64.exe
GeForce 1050 CUDALucas benchmarks below, followed by Intel i3-4150 Prime95 benchmarks for comparison. Quote:
Quote:
Last fiddled with by wfgarnett3 on 2018-01-28 at 13:07
#2662
Mar 2018
Shenzhen, China
2·3² Posts
Hello!
I'm not sure this is the correct place to ask, but I'm running into a problem while trying to compile the latest CUDALucas under Linux. The problem is: Quote:
I have CUDA Toolkit 9.1 installed. Any suggestions, please?
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| Don't DC/LL them with CudaLucas | LaurV | Data | 131 | 2017-05-02 18:41 |
| CUDALucas / cuFFT Performance on CUDA 7 / 7.5 / 8 | Brain | GPU Computing | 13 | 2016-02-19 15:53 |
| CUDALucas: which binary to use? | Karl M Johnson | GPU Computing | 15 | 2015-10-13 04:44 |
| settings for cudaLucas | fairsky | GPU Computing | 11 | 2013-11-03 02:08 |
| Trying to run CUDALucas on Windows 8 CP | Rodrigo | GPU Computing | 12 | 2012-03-07 23:20 |