mersenneforum.org  

Old 2013-01-07, 02:08   #2025
Chuck
 
May 2011
Orange Park, FL

1565₈ Posts

Quote:
Originally Posted by kladner
EDIT: [OT]PrimeNet still keeps sticking in 1 DC assignment (Worker #3) to 4 P-1s in P95. All the settings I can find, both online and in P95, are set to P-1. I have worked around this by moving all the assignments from the other 4 workers to Worker #3. This stops #3 from getting more assignments for now, and the other 4 then fill in with P-1s.[/OT]
That's funny, it did that to me too on a new machine I started up a couple of days ago. On initial startup the P-1 worker was given 4 P-1 assignments, then after finishing one it was assigned a DC.

Old 2013-01-07, 02:12   #2026
kladner

"Kieren"
Jul 2011
In My Own Galaxy!

2·3·1,693 Posts

Quote:
Originally Posted by Chuck
That's funny, it did that to me too on a new machine I started up a couple of days ago. On initial startup the P-1 worker was given 4 P-1 assignments, then after finishing one it was assigned a DC.
It's been an ongoing thing for me if I run more than two workers, though that's in another thread.

EDIT: Ongoing work on the GTX 460 with 64 and 32 bit exe's. First result on the 570 with 64 and 32 bit exe's:

Code:
64 bit exe

CUDA device info
  name                      GeForce GTX 460
  compute capability        2.1
  maximum threads per block 1024
  number of multiprocessors 7 (336 shader cores)
  clock rate                1660MHz

Automatic parameters
  threads per grid          917504

running a simple selftest...
Selftest statistics
  number of tests           92
  successfull tests         92

selftest PASSED!

got assignment: exp=64801397 bit_min=69 bit_max=73 (27.68 GHz-days)
Starting trial factoring M64801397 from 2^69 to 2^73 (27.68 GHz-days)
 k_min = 4554653426400
 k_max = 72874454895928
Using GPU kernel "barrett76_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jan 06 17:26 |    0   0.1% | 12.093   3h13m |    205.98    82485    n.a.%
Jan 06 17:26 |    3   0.2% | 12.130   3h13m |    205.35    82485    n.a.%
Jan 06 17:26 |    4   0.3% | 12.137   3h13m |    205.23    82485    n.a.%
--------------------------------
Jan 06 18:21 | 1312  28.5% | 12.095   2h18m |    205.94    82485    n.a.%
Jan 06 18:21 | 1315  28.6% | 12.129   2h18m |    205.36    82485    n.a.%
Jan 06 18:21 | 1320  28.8% | 12.140   2h18m |    205.18    82485    n.a.%
Jan 06 18:21 | 1323  28.9% | 12.127   2h18m |    205.40    82485    n.a.%
Jan 06 18:22 | 1327  29.0% | 12.140   2h17m |    205.18    82485    n.a.%
Jan 06 18:22 | 1332  29.1% | 12.128   2h17m |    205.38    82485    n.a.%
received signal "SIGINT"

Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jan 06 18:51 | 1339  29.3% | 12.079   2h16m |    206.21    82485    n.a.%
Jan 06 18:52 | 1344  29.4% | 12.128   2h17m |    205.38    82485    n.a.%
Jan 06 18:52 | 1348  29.5% | 12.121   2h16m |    205.50    82485    n.a.%
Jan 06 18:52 | 1360  29.6% | 12.135   2h16m |    205.26    82485    n.a.%
Jan 06 18:52 | 1363  29.7% | 12.149   2h16m |    205.03    82485    n.a.%
Jan 06 18:52 | 1368  29.8% | 12.142   2h16m |    205.14    82485    n.a.%
Jan 06 18:53 | 1372  29.9% | 12.145   2h16m |    205.09    82485    n.a.%
Jan 06 18:53 | 1375  30.0% | 12.146   2h16m |    205.08    82485    n.a.%
received signal "SIGINT"

32 bit exe
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jan 06 18:55 | 1384  30.2% | 12.122   2h15m |    205.48    82485    n.a.%
Jan 06 18:55 | 1392  30.3% | 12.123   2h15m |    205.47    82485    n.a.%
Jan 06 18:55 | 1395  30.4% | 12.123   2h14m |    205.47    82485    n.a.%
Jan 06 18:55 | 1399  30.5% | 12.119   2h14m |    205.53    82485    n.a.%
Jan 06 18:56 | 1404  30.6% | 12.123   2h14m |    205.47    82485    n.a.%
Jan 06 18:56 | 1407  30.7% | 12.112   2h14m |    205.65    82485    n.a.%
Jan 06 18:56 | 1419  30.8% | 12.122   2h14m |    205.48    82485    n.a.%
Jan 06 18:56 | 1420  30.9% | 12.122   2h13m |    205.48    82485    n.a.%
Jan 06 18:56 | 1423  31.0% | 12.121   2h13m |    205.50    82485    n.a.%
Jan 06 18:57 | 1428  31.1% | 12.121   2h13m |    205.50    82485    n.a.%
Jan 06 18:57 | 1432  31.3% | 12.119   2h13m |    205.53    82485    n.a.%


GTX 570
64 bit exe

Jan 06 18:21 | 3760  81.6% |  5.954  17m34s |    418.34    82485    n.a.%
Jan 06 18:21 | 3772  81.7% |  5.940  17m25s |    419.33    82485    n.a.%
Jan 06 18:21 | 3777  81.8% |  5.983  17m27s |    416.31    82485    n.a.%
Jan 06 18:21 | 3780  81.9% |  5.987  17m22s |    416.04    82485    n.a.%
Jan 06 18:21 | 3781  82.0% |  6.002  17m18s |    415.00    82485    n.a.%
Jan 06 18:21 | 3784  82.1% |  6.003  17m13s |    414.93    82485    n.a.%
Jan 06 18:21 | 3789  82.2% |  6.005  17m07s |    414.79    82485    n.a.%
Jan 06 18:21 | 3792  82.3% |  6.003  17m01s |    414.93    82485    n.a.%
Jan 06 18:21 | 3796  82.4% |  5.982  16m51s |    416.38    82485    n.a.%
Jan 06 18:22 | 3801  82.5% |  5.997  16m47s |    415.34    82485    n.a.%
Jan 06 18:22 | 3805  82.6% |  6.000  16m42s |    415.13    82485    n.a.%
Jan 06 18:22 | 3816  82.7% |  5.999  16m36s |    415.20    82485    n.a.%
Jan 06 18:22 | 3817  82.8% |  5.973  16m26s |    417.01    82485    n.a.%
Jan 06 18:22 | 3829  82.9% |  5.936  16m14s |    419.61    82485    n.a.%
received signal "SIGINT"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jan 06 18:51 | 3892  84.5% |  6.003  14m54s |    414.93    82485    n.a.%
Jan 06 18:51 | 3900  84.6% |  5.998  14m48s |    415.27    82485    n.a.%
Jan 06 18:51 | 3901  84.7% |  5.903  14m28s |    421.96    82485    n.a.%
Jan 06 18:51 | 3904  84.8% |  5.914  14m23s |    421.17    82485    n.a.%
Jan 06 18:51 | 3912  84.9% |  6.000  14m30s |    415.13    82485    n.a.%
Jan 06 18:51 | 3921  85.0% |  5.980  14m21s |    416.52    82485    n.a.%
Jan 06 18:52 | 3924  85.1% |  5.985  14m16s |    416.17    82485    n.a.%
Jan 06 18:52 | 3925  85.2% |  5.987  14m10s |    416.04    82485    n.a.%
Jan 06 18:52 | 3936  85.3% |  5.984  14m04s |    416.24    82485    n.a.%


32 bit exe
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jan 06 18:54 | 3945  85.5% |  5.968  13m50s |    417.36    82485    n.a.%
Jan 06 18:54 | 3949  85.6% |  5.980  13m45s |    416.52    82485    n.a.%
Jan 06 18:54 | 3957  85.7% |  5.920  13m31s |    420.74    82485    n.a.%
Jan 06 18:55 | 3960  85.8% |  5.910  13m24s |    421.46    82485    n.a.%
Jan 06 18:55 | 3961  85.9% |  5.973  13m26s |    417.01    82485    n.a.%
Jan 06 18:55 | 3964  86.0% |  5.972  13m20s |    417.08    82485    n.a.%
Jan 06 18:55 | 3969  86.1% |  5.972  13m14s |    417.08    82485    n.a.%
Jan 06 18:55 | 3976  86.3% |  5.972  13m08s |    417.08    82485    n.a.%
Jan 06 18:55 | 3981  86.4% |  5.971  13m02s |    417.15    82485    n.a.%
Jan 06 18:55 | 3984  86.5% |  5.973  12m56s |    417.01    82485    n.a.%
Jan 06 18:55 | 3997  86.6% |  5.973  12m51s |    417.01    82485    n.a.%
Jan 06 18:55 | 4005  86.7% |  5.966  12m44s |    417.50    82485    n.a.%
Jan 06 18:55 | 4009  86.8% |  5.972  12m38s |    417.08    82485    n.a.%
------------------
Jan 06 19:08 | 4596  99.6% |  5.968   0m24s |    417.36    82485    n.a.%
Jan 06 19:08 | 4597  99.7% |  5.959   0m18s |    417.99    82485    n.a.%
Jan 06 19:08 | 4600  99.8% |  5.953   0m12s |    418.41    82485    n.a.%
Jan 06 19:08 | 4605  99.9% |  5.973   0m06s |    417.01    82485    n.a.%
Jan 06 19:08 | 4617 100.0% |  5.972   0m00s |    417.08    82485    n.a.%
no factor for M64802879 from 2^69 to 2^73 [mfaktc 0.20 barrett76_mul32_gs]
tf(): time spent since restart:    0h 13m 59.077s
      estimated total time spent:  1h 35m 53.670s
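For anyone puzzled by the k_min and k_max lines near the top of the log: factor candidates for M(p) have the form f = 2kp + 1, so the bit range 2^69 to 2^73 translates into a range of k. Here is a minimal sketch that reproduces the two values printed for M64801397. The function name is made up, and rounding k_min down to a multiple of 4620 (the number of classes) is my assumption about how mfaktc aligns the start, although it does match the log.

```python
NUM_CLASSES = 4620  # assumption: class count used by this mfaktc build

def k_range(exponent, bit_min, bit_max):
    """Return (k_min, k_max) for factor candidates f = 2*k*exponent + 1
    lying roughly between 2^bit_min and 2^bit_max."""
    k_min = (1 << bit_min) // (2 * exponent)
    k_min -= k_min % NUM_CLASSES  # align to the start of class 0 (assumed)
    k_max = (1 << bit_max) // (2 * exponent)
    return k_min, k_max

k_min, k_max = k_range(64801397, 69, 73)
print("k_min =", k_min)
print("k_max =", k_max)
```

For exp=64801397, bit_min=69 and bit_max=73 this gives k_min = 4554653426400 and k_max = 72874454895928, the same numbers mfaktc printed.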
EDIT2: Second 570 result. Damn! This is fast!
Code:
Jan 06 20:43 | 4569  99.1% |  5.912   0m53s |    421.51    82485    n.a.%
Jan 06 20:43 | 4577  99.2% |  5.914   0m47s |    421.36    82485    n.a.%
Jan 06 20:43 | 4580  99.3% |  5.913   0m41s |    421.43    82485    n.a.%
Jan 06 20:43 | 4584  99.4% |  5.915   0m35s |    421.29    82485    n.a.%
Jan 06 20:43 | 4589  99.5% |  5.914   0m30s |    421.36    82485    n.a.%
Jan 06 20:43 | 4592  99.6% |  5.914   0m24s |    421.36    82485    n.a.%
Jan 06 20:43 | 4604  99.7% |  5.914   0m18s |    421.36    82485    n.a.%
Jan 06 20:43 | 4605  99.8% |  5.915   0m12s |    421.29    82485    n.a.%
Jan 06 20:43 | 4613  99.9% |  5.914   0m06s |    421.36    82485    n.a.%
Jan 06 20:44 | 4617 100.0% |  5.914   0m00s |    421.36    82485    n.a.%
no factor for M64773187 from 2^69 to 2^73 [mfaktc 0.20 barrett76_mul32_gs]
tf(): total time spent:  1h 35m 24.141s

Last fiddled with by kladner on 2013-01-07 at 02:47
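The GHz-d/day column in these logs can be cross-checked from the per-class time. A rough sketch, assuming (this is not stated anywhere in the thread) that mfaktc actually tests 960 of its 4620 classes per assignment; the function name is mine.

```python
SECONDS_PER_DAY = 86400
CLASSES_TESTED = 960  # assumption: classes left after mfaktc's pre-filtering

def ghz_days_per_day(assignment_ghz_days, seconds_per_class):
    """Throughput implied by the time spent on one class."""
    assignment_seconds = CLASSES_TESTED * seconds_per_class
    return assignment_ghz_days * SECONDS_PER_DAY / assignment_seconds

# 27.68 GHz-days is the credit printed for the 2^69 to 2^73 assignment
for card, class_time in (("GTX 460", 12.093), ("GTX 570", 5.914)):
    print(card, round(ghz_days_per_day(27.68, class_time), 1))
```

Both results land within a fraction of a percent of the logged 205.98 and 421.36. The small gap is presumably down to rounding of the 27.68 figure and the slightly different exponents.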

Old 2013-01-07, 02:49   #2027
owftheevil

"Carl Darby"
Oct 2012
Spring Mountains, Nevada

13B₁₆ Posts

I've always avoided TF in the past because of the high CPU load. Now it looks like I'll be sharing GPU time between TF and DC. Thank you to everyone involved in making this new release. The code looks pretty too. Maybe I can learn something from it.

Old 2013-01-07, 04:12   #2028
LaurV
Romulan Interpreter

Jun 2011
Thailand

2⁶×151 Posts

@Oliver: Sir, you made my day! Outstanding. For you, for George, and all the people involved in developing and testing, here you go!

Old 2013-01-07, 06:07   #2029
ixfd64
Bemusing Prompter

"Danny"
Dec 2002
California

7⁴ Posts

I've had the chance to complete some assignments using version 0.20, and I can say it's about three times as fast as 0.19 on the GTX 555. Granted, I haven't let mfaktc reach its full potential because I run Prime95 on all of my cores, but it's nonetheless a remarkable improvement.

The new version does cause my computer to lag a little, but that's something I can stand.

Last fiddled with by ixfd64 on 2013-01-07 at 06:08

Old 2013-01-07, 14:29   #2030
James Heinrich

"James Heinrich"
May 2004
ex-Northern Ontario

23×149 Posts

Quote:
Originally Posted by kladner
Sieve is running 82,485 on both GPUs, with SievePrimesAdjust=1
The ini settings "SievePrimes" and "SievePrimesAdjust" only apply to CPU sieving. GPU sieving is controlled by "SieveOnGPU", "GPUSievePrimes", "GPUSieveSize" and "GPUSieveProcessSize". There is no mechanism for auto-adjusting GPUSievePrimes. When balancing CPU sieving against GPU crunching you can check whether the GPU is waiting for the CPU or vice versa, but with GPU sieving the GPU is only ever waiting for itself, so there's never any idle time; it's just a question of how much effort is spent in the sieving portion. In my brief tests, (at least small) changes to the value of GPUSievePrimes don't make much difference in overall throughput, so I'm content to leave it at the default 82485.
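To make the CPU/GPU split concrete, here is a small sketch that reads mfaktc.ini-style "Key=Value" lines and reports which sieve settings are actually in effect. The setting names are the ones listed above. The parsing helper, the sample values other than GPUSievePrimes=82485, and the assumption that SieveOnGPU=1 means GPU sieving is switched on are mine, so check them against the comments in your own mfaktc.ini.

```python
CPU_SIEVE_KEYS = ("SievePrimes", "SievePrimesAdjust")
GPU_SIEVE_KEYS = ("GPUSievePrimes", "GPUSieveSize", "GPUSieveProcessSize")

def parse_ini(text):
    """Parse simple Key=Value lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings

def effective_sieve_settings(settings):
    """Return only the sieve settings that apply to the selected sieve."""
    on_gpu = settings.get("SieveOnGPU") == "1"  # assumption: 1 enables GPU sieving
    keys = GPU_SIEVE_KEYS if on_gpu else CPU_SIEVE_KEYS
    return {k: settings[k] for k in keys if k in settings}

# Illustrative fragment; only the GPUSievePrimes value comes from this thread
sample = """
SieveOnGPU=1
# the next two lines are ignored when sieving on the GPU
SievePrimes=25000
SievePrimesAdjust=1
GPUSievePrimes=82485
"""
print(effective_sieve_settings(parse_ini(sample)))
```

With SieveOnGPU=1 only the GPUSievePrimes entry survives, which is why changing SievePrimesAdjust has no visible effect on the Sieve column.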

Old 2013-01-07, 15:01   #2031
James Heinrich

"James Heinrich"
May 2004
ex-Northern Ontario

110101100011₂ Posts

Now that everyone has access to v0.20, I'd like to ask for a new round of benchmarks from everyone so I can update my GPU-TF benchmark page.

Please submit the results using the form on the benchmark page:
http://www.mersenne.ca/mfaktc.php#benchmark

Please keep these requests in mind:
  • mfaktc v0.20
  • 32-bit mfaktc is preferred, please mention 32/64 when submitting
  • GPU sieving enabled, GPUSievePrimes=82485 (default)
  • assignment somewhere around 60-70M, to 2^73 (whatever you're working on currently is probably fine, as long as it takes at least 30 minutes per assignment, preferably an hour or longer).
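
A quick way to tell in advance whether an assignment is long enough for a benchmark run is to divide its GHz-days credit by the card's throughput. A minimal sketch using figures mfaktc printed earlier in this thread (27.68 GHz-days for a 2^69 to 2^73 assignment, and roughly 206 and 417 GHz-d/day for the GTX 460 and GTX 570); the function name and the constant are just for illustration.

```python
MIN_BENCHMARK_MINUTES = 30  # threshold requested in the list above

def runtime_minutes(assignment_ghz_days, ghz_days_per_day):
    """Estimated wall-clock minutes for one assignment."""
    return assignment_ghz_days / ghz_days_per_day * 24 * 60

for card, rate in (("GTX 460", 205.98), ("GTX 570", 417.08)):
    minutes = runtime_minutes(27.68, rate)
    long_enough = minutes >= MIN_BENCHMARK_MINUTES
    print(f"{card}: about {minutes:.0f} min, long enough: {long_enough}")
```

The GTX 460 estimate works out to a little over 193 minutes, which agrees with the 3h13m ETA shown at the start of that card's log.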

Last fiddled with by James Heinrich on 2013-01-07 at 15:02

Old 2013-01-07, 15:54   #2032
TheJudger

"Oliver"
Mar 2005
Germany

10001010111₂ Posts

Hi James,

why not use a fixed exponent for the benchmark? The GHz-d rating (at least in the formulas I've seen so far) does not care much about the exponent. Those formulas account for the number of FCs (factor candidates) for the exponent, but there are other (minor) effects, too.
Take a look here: http://mersenne.org/various/math.php
  • if the exponent has many 1 bits (in its binary representation) then there are a lot of additional "multiply by 2" steps. OK, they are relatively cheap, but the effect is measurable.
  • bigger exponents need more iterations than smaller exponents. Again, for current exponents the effect is not that big... but it is there.

Oliver

P.S. My personal benchmark exponent is 66362159 ;)
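
The two effects listed above are easy to see in a toy version of the test. A factor candidate f divides 2^p - 1 exactly when 2^p mod f equals 1, and the usual left-to-right binary exponentiation does one squaring per bit of p plus one doubling per 1 bit. This is only a Python sketch of that idea (mfaktc's real kernels run on the GPU and are organised differently); the function name and the operation counters are mine.

```python
def divides_mersenne(p, f):
    """Check whether f divides 2^p - 1, counting the modular operations.

    Returns (is_factor, squarings, doublings).
    """
    result = 1
    squarings = doublings = 0
    for bit in bin(p)[2:]:          # most significant bit first
        result = result * result % f
        squarings += 1
        if bit == "1":              # a 1 bit costs an extra "multiply by 2"
            result = result * 2 % f
            doublings += 1
    return result == 1, squarings, doublings

# M11 = 2047 = 23 * 89, so 23 is a factor; 11 is 1011 in binary
print(divides_mersenne(11, 23))

# Operation counts for the benchmark exponent with an arbitrary candidate 2*p + 1
p = 66362159
_, squarings, doublings = divides_mersenne(p, 2 * p + 1)
print(squarings, doublings)
```

The squaring count grows with the bit length of the exponent and the doubling count with its number of 1 bits, so two exponents of similar size can still differ slightly in cost per candidate.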

Old 2013-01-07, 15:58   #2033
kladner

"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts

Quote:
Originally Posted by James Heinrich
The ini settings "SievePrimes" and "SievePrimesAdjust" only apply to CPU sieving. GPU sieving is controlled by "SieveOnGPU", "GPUSievePrimes", "GPUSieveSize" and "GPUSieveProcessSize". There is no mechanism for auto-adjusting GPUSievePrimes. When balancing CPU sieving against GPU crunching you can check whether the GPU is waiting for the CPU or vice versa, but with GPU sieving the GPU is only ever waiting for itself, so there's never any idle time; it's just a question of how much effort is spent in the sieving portion. In my brief tests, (at least small) changes to the value of GPUSievePrimes don't make much difference in overall throughput, so I'm content to leave it at the default 82485.
Oh. Of course. I read through mfaktc.ini, but the new settings obviously did not stick with me. Thanks.

Old 2013-01-07, 16:33   #2034
LaurV
Romulan Interpreter

Jun 2011
Thailand

2⁶×151 Posts

Time to make Uncwilly happy...

Old 2013-01-07, 16:36   #2035
kracker

"Mr. Meeseeks"
Jan 2012
California, USA

100001111000₂ Posts

Quote:
Originally Posted by LaurV
Time to make Uncwilly happy...
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.