mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing

2017-11-29, 03:56 #529
storm5510
Random Account
Aug 2009
1,956 Posts

Quote:
Originally Posted by kriesel
...The GTX480 says 1332MB for the same exponents....
Are you overclocking your GTX 480?
2017-11-29, 20:42 #530
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts

Quote:
Originally Posted by storm5510
Are you overclocking your GTX 480?
No. I have two, on the same machine. They came with different default clocks, 701 and 725 MHz. I downclock the 725 to 702. The 701 has been reliable; the 725/702 has repeatable memory errors in the middle of the address range, which downclocking once reduced but no longer does. So I use it only for trial factoring, which occupies memory not affected by the errors. From what I've learned on that second GTX480, I've become an advocate of testing as much GPU memory as possible.
2017-11-30, 00:06 #531
storm5510
Random Account
Aug 2009
1,956 Posts

Quote:
Originally Posted by kriesel
No. I have two, on the same machine. They came with different default clocks, 701 and 725. The 725 I downclock to 702. The 701 has been reliable; the 725/702 has repeatable memory errors in the middle of the address range, that at one time were reduced by downclocking but no longer are. So I use it only for trial factoring, which occupies memory not affected by the errors. I've become an advocate of testing as much gpu memory as possible, from what I've learned on that second GTX480.
Interesting! I tried it on mine once. The gain was insignificant. The one I have defaults to 700. If I run a GPU process that causes it to reset itself, then that number drops to 450. It takes a cold-boot to get back to 700.
2017-11-30, 05:40 #532
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts

Quote:
Originally Posted by storm5510
Interesting! I tried it on mine once. The gain was insignificant. The one I have defaults to 700. If I run a GPU process that causes it to reset itself, then that number drops to 450. It takes a cold-boot to get back to 700.
One of the two goes AWOL at varying intervals. I found that getting reliable P-1 or LL tests required making the 702 card, the one with memory errors, device zero. If it was device one and P-1 or LL was set to run on device zero, then when the other card went AWOL, the bad-memory card dropped to device zero and caused problems with the P-1 or LL run. By AWOL I mean that it's physically there, but GPU-Z finds only one device, a running GPU-Z already set to track device one stops displaying its sensor readings, the Windows event log shows a driver restart, and restarted CUDAPm1 and CUDALucas don't find a device one. Clearing that up requires a shutdown and restart, from the command line with shutdown -r.
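One way a wrapper script could sidestep this renumbering trap is to look a card up by its PCI bus ID, which does not change when another card drops out, and to refuse to start if that card is missing. Below is a minimal Python sketch, assuming the CSV text comes from nvidia-smi --query-gpu=index,pci.bus_id,name --format=csv,noheader; the sample output and its bus IDs are made up. CUDA's own device numbering matches nvidia-smi's index only when the environment variable CUDA_DEVICE_ORDER=PCI_BUS_ID is set (CUDA's default order is fastest-first). How the resulting ordinal would be passed to CUDAPm1 or CUDALucas is not shown.

```python
import csv
import io
from typing import Optional

# Made-up example of what
#   nvidia-smi --query-gpu=index,pci.bus_id,name --format=csv,noheader
# could print on a two-GPU machine (the bus IDs are hypothetical).
SAMPLE = """0, 00000000:01:00.0, GeForce GTX 480
1, 00000000:02:00.0, GeForce GTX 480
"""


def ordinal_for_bus_id(smi_csv: str, wanted_bus_id: str) -> Optional[int]:
    """Return the current index of the GPU at wanted_bus_id, or None if it is absent."""
    for row in csv.reader(io.StringIO(smi_csv), skipinitialspace=True):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        if row[1].strip().lower() == wanted_bus_id.lower():
            return int(row[0])
    return None


# If the card at 01:00.0 drops out, the survivor is renumbered to index 0,
# but a lookup by bus ID reports the missing card instead of silently
# handing back the other one.
after_dropout = "0, 00000000:02:00.0, GeForce GTX 480\n"
print(ordinal_for_bus_id(SAMPLE, "00000000:02:00.0"))         # 1
print(ordinal_for_bus_id(after_dropout, "00000000:02:00.0"))  # 0
print(ordinal_for_bus_id(after_dropout, "00000000:01:00.0"))  # None
```

Returning None rather than falling back to "whatever is device 0" is the point of the design: a P-1 or LL run is better not started at all than started on the card with bad memory.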
2017-12-01, 18:21 #533
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts
multiple instances or dissimilar instances per gpu

Hi,

Has anyone experimented with running more than one instance of CUDAPm1 on a single GPU?

The reason I ask is that I'm used to seeing 100% GPU load in GPU-Z with a single instance of CUDALucas or CUDAPm1 per GPU, but on a GTX1070 it varies between 99 and 100%. I have also found gains from running multiple mfaktc instances on a GTX480, which raised the GPU load from 98 to 100%.

In a quick test sharing a single GTX480 between one instance each of CUDALucas and CUDAPm1 running simultaneously, I calculate several percent more combined throughput than either program achieves running alone. Since I'm running numerous GPUs, if that holds up, it's the equivalent of adding another GPU.

Any light you can shed on the effects of multiple instances on various GPU models, whether confirming or negative results, would be appreciated.
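The combined-throughput comparison can be made concrete by normalizing each program to its own solo speed, so that CUDALucas iterations and CUDAPm1 iterations (different units of work) can be added. A minimal Python sketch, assuming the ms/iter figures are measured on the same exponents and FFT lengths in both the solo and the shared runs; the timings in the example are hypothetical.

```python
def combined_throughput(runs):
    """Total work rate of programs sharing one GPU, relative to running alone.

    runs: (solo_ms_per_iter, shared_ms_per_iter) pairs, one per program.
    Each program contributes solo/shared, the fraction of its solo rate it
    keeps while sharing. A sum above 1.0 means sharing gets more total work
    done than dedicating the GPU to any single one of the programs.
    """
    return sum(solo / shared for solo, shared in runs)


# Hypothetical timings: one program slows from 1.0 to 2.0 ms/iter when sharing,
# the other from 1.0 to 1.6 ms/iter; together they keep 0.5 + 0.625 of a GPU.
print(combined_throughput([(1.0, 2.0), (1.0, 1.6)]))  # 1.125
```

A result of about 1.05 would correspond to the "several percent" gain described; a result below 1.0 would mean sharing costs throughput.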
2017-12-04, 17:27 #534
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts

Quote:
Originally Posted by storm5510
Interesting! I tried it on mine once. The gain was insignificant. The one I have defaults to 700. If I run a GPU process that causes it to reset itself, then that number drops to 450. It takes a cold-boot to get back to 700.
I have never seen a 450 clock rate on either of my GTX480s, or a lower clock after a driver restart or program reset. I have seen them drop from 70x to 405, and then some seconds later to 50.6, when there's little or no GPU processing load, and go back up under load.
2017-12-04, 18:10 #535
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts

Quote:
Originally Posted by kriesel
Has anyone else seen something similar? A GTX480 had no equivalent problem on the same 84M exponents. This is CUDAPm1 v0.20 on Windows 64-bit Vista.

After a few successful stage 1 and stage 2 p-1 runs of ~83.5M, each following exponent >84M runs through stage 1, but not through stage 1 gcd or stage 2, crashing the program instead.
Behavior is reproducible for exponents 84M+, including after program restarts, logouts, and system restarts.

M83496143 found no factor (P-1, B1=685000, B2=12843750, e=2, n=4608K, aid=85D38BAC023FCFF8022AABA05F602C4C CUDAPm1 v0.20)
reported 11/1/17
M83496227 found no factor (P-1, B1=685000, B2=12843750, e=2, n=4608K, aid=A1656CF4111B3B15C4A71186811384FF CUDAPm1 v0.20)
reported 11/2/17
M83496247 found no factor (P-1, B1=685000, B2=12843750, e=2, n=4608K, aid=5F246BFB077E96AA450384EFEC8EC599 CUDAPm1 v0.20)
reported 11/3/17
M83496293 found no factor (P-1, B1=685000, B2=12843750, e=2, n=4608K, aid=725F9720C9179022C18CEA98F646F72E CUDAPm1 v0.20)
reported 11/4/17
M50001781 has a factor: 4392938042637898431087689 (P-1, B1=430000, B2=5000000, e=2, n=2688K CUDAPm1 v0.20)
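Result lines like these can be checked mechanically. Below is a minimal Python sketch: a regular expression written for the exact line format shown above (an assumption; other programs and versions format results differently), plus a direct test that a reported factor f divides 2^p - 1 by checking 2^p ≡ 1 (mod f). For odd prime p, every divisor of 2^p - 1 is also ≡ 1 (mod 2p) and ≡ ±1 (mod 8), which the sketch uses as cheap sanity checks.

```python
import re

RESULT_RE = re.compile(
    r"M(?P<p>\d+) (?:found no factor|has a factor: (?P<f>\d+)) "
    r"\(P-1, B1=(?P<b1>\d+), B2=(?P<b2>\d+), e=(?P<e>\d+), n=(?P<n>\d+)K"
)


def parse_result(line):
    """Pull exponent, factor (or None), bounds, e and FFT length (in K) from a result line."""
    m = RESULT_RE.search(line)
    if m is None:
        return None
    d = m.groupdict()
    return {
        "exponent": int(d["p"]),
        "factor": int(d["f"]) if d["f"] else None,
        "B1": int(d["b1"]),
        "B2": int(d["b2"]),
        "e": int(d["e"]),
        "fft_k": int(d["n"]),
    }


def is_mersenne_factor(p, f):
    """True if f > 1 divides 2**p - 1 (p an odd prime).

    The pow() test is decisive; the two congruences are necessary
    conditions that any genuine divisor must also satisfy.
    """
    return f > 1 and f % (2 * p) == 1 and f % 8 in (1, 7) and pow(2, p, f) == 1


print(is_mersenne_factor(11, 23), is_mersenne_factor(11, 7))  # True False (2**11 - 1 = 23 * 89)
```

For example, is_mersenne_factor(50001781, 4392938042637898431087689) independently checks the factor reported above.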

All 5 exponents attempted above 84M failed:
PFactor=A3B66EB4FAAE78E8F283D5C96AD37A__,1,2,84228073,-1,76,2
PFactor=DC8BDAFB8D89D04B3B35742B11D9CE__,1,2,84228097,-1,76,2
PFactor=C996CF4EA78E42F9610D9789BE1666__,1,2,84228103,-1,76,2
and two more
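The PFactor lines follow Prime95's worktodo format, PFactor=[assignment ID,]k,b,n,c,how_far_factored,tests_saved: each describes the number k·b^n+c (here 1·2^n-1, a Mersenne number), already trial factored to 2^76, with 2 primality tests saved if a factor is found. A minimal Python parser sketch (the field names are my own) that could be used, for example, to route exponents above 84M to a different GPU; note that the assignment IDs in the post are partly masked with underscores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PFactorTask:
    aid: Optional[str]  # assignment ID, if present
    k: int
    b: int
    n: int              # exponent; the number is k*b**n + c
    c: int
    tf_bits: int        # already trial factored up to 2**tf_bits
    tests_saved: int    # primality tests saved if a factor is found


def parse_pfactor(line: str) -> PFactorTask:
    key, _, value = line.strip().partition("=")
    if key.lower() != "pfactor":
        raise ValueError(f"not a PFactor line: {line!r}")
    fields = [f.strip() for f in value.split(",")]
    aid = fields.pop(0) if len(fields) == 7 else None  # optional leading ID
    if len(fields) != 6:
        raise ValueError(f"unexpected field count: {line!r}")
    k, b, n, c, tf_bits, saved = (int(x) for x in fields)
    return PFactorTask(aid, k, b, n, c, tf_bits, saved)


task = parse_pfactor("PFactor=A3B66EB4FAAE78E8F283D5C96AD37A__,1,2,84228073,-1,76,2")
print(task.n, task.n > 84_000_000)  # 84228073 True
```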

A typical event log entry follows. From entry to entry, the process ID and application start time change, but the other event data values do not.

Log Name: Application
Source: Application Error
Date: 11/4/2017 7:23:36 PM
Event ID: 1000
Task Category: (100)
Level: Error
Keywords: Classic
User: N/A
Computer: eagle
Description:
Faulting application CUDAPm1_win64_20131118_CUDA_50.exe, version 0.0.0.0, time stamp 0x5285815f, faulting module CUDAPm1_win64_20131118_CUDA_50.exe, version 0.0.0.0, time stamp 0x5285815f, exception code 0xc0000005, fault offset 0x000000000000dd20, process id 0xd78, application start time 0x01d355cc5142bacb.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Application Error" />
<EventID Qualifiers="0">1000</EventID>
<Level>2</Level>
<Task>100</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2017-11-05T00:23:36.000Z" />
<EventRecordID>256</EventRecordID>
<Channel>Application</Channel>
<Computer>eagle</Computer>
<Security />
</System>
<EventData>
<Data>CUDAPm1_win64_20131118_CUDA_50.exe</Data>
<Data>0.0.0.0</Data>
<Data>5285815f</Data>
<Data>CUDAPm1_win64_20131118_CUDA_50.exe</Data>
<Data>0.0.0.0</Data>
<Data>5285815f</Data>
<Data>c0000005</Data>
<Data>000000000000dd20</Data>
<Data>d78</Data>
<Data>01d355cc5142bacb</Data>
</EventData>
</Event>
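Two fields in this event can be decoded further. Exception code 0xc0000005 is STATUS_ACCESS_VIOLATION, an invalid memory access, here at offset 0xdd20 inside the CUDAPm1 executable itself. The application start time, assuming it is the usual Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC), works out to 2017-11-05 00:23:34.889 UTC, about a second before the event's TimeCreated of 00:23:36Z, so this instance crashed very shortly after starting. A minimal Python sketch of both decodings:

```python
from datetime import datetime, timedelta, timezone

# Windows FILETIME: 100-nanosecond ticks since 1601-01-01 00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

# Two well-known NTSTATUS exception codes (far from a complete table).
EXCEPTION_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def filetime_to_utc(ft: int) -> datetime:
    # Drop the sub-microsecond digit; timedelta works in whole microseconds.
    return FILETIME_EPOCH + timedelta(microseconds=ft // 10)


print(filetime_to_utc(0x01D355CC5142BACB).isoformat())
# 2017-11-05T00:23:34.889338+00:00
print(EXCEPTION_NAMES.get(0xC0000005, "unknown"))
# STATUS_ACCESS_VIOLATION
```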

Normal progression, 83M:
(end of stage 1)
Iteration 987000 M83496293, 0xf2fb4b229c8521b0, n = 4608K, CUDAPm1 v0.20 err = 0.16919 (0:37 real, 36.8380 ms/iter, ETA 0:39)
Iteration 988000 M83496293, 0x9ad528e521e85730, n = 4608K, CUDAPm1 v0.20 err = 0.16797 (0:37 real, 36.8401 ms/iter, ETA 0:03)
M83496293, 0x232eab21eaf81e92, n = 4608K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 10:10:44
Starting stage 1 gcd.
M83496293 Stage 1 found no factor (P-1, B1=685000, B2=12843750, e=2, n=4608K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 685000, b2 = 12843750, d = 2310, e = 2, nrp = 13
Zeros: 573917, Ones: 658723, Pairs: 125889
Processing 1 - 13 of 480 relative primes.
Inititalizing pass... done. transforms: 270, err = 0.16406, (5.09 real, 18.8644 ms/tran, ETA NA)
Transforms: 2106 M83496293, 0x52b341a257507f69, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:41 real, 19.4671 ms/tran, ETA 9:14:05)
Transforms: 2010 M83496293, 0x905f255bd35e844b, n = 4608K, CUDAPm1 v0.20 err = 0.17188 (0:39 real, 19.5838 ms/tran, ETA 9:15:02)
Transforms: 2014 M83496293, 0x673b942ac1fc4ae2, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:40 real, 19.5771 ms/tran, ETA 9:14:52)
...

Processing 469 - 480 of 480 relative primes.
Inititalizing pass... done. transforms: 357, err = 0.17090, (6.88 real, 19.2605 ms/tran, ETA 14:07)
Transforms: 2090 M83496293, 0x284e7914442300ef, n = 4608K, CUDAPm1 v0.20 err = 0.17090 (0:41 real, 19.4700 ms/tran, ETA 13:26)
Transforms: 2058 M83496293, 0xb1c240cc360984b8, n = 4608K, CUDAPm1 v0.20 err = 0.16797 (0:40 real, 19.5747 ms/tran, ETA 12:46)
Transforms: 2012 M83496293, 0xfa21edbaa82e8d9d, n = 4608K, CUDAPm1 v0.20 err = 0.16992 (0:40 real, 19.5721 ms/tran, ETA 12:07)
Transforms: 1958 M83496293, 0xfdc0e766f0aa5f44, n = 4608K, CUDAPm1 v0.20 err = 0.16992 (0:38 real, 19.5923 ms/tran, ETA 11:28)
Transforms: 1980 M83496293, 0xf808c66bf88da80d, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:39 real, 19.5757 ms/tran, ETA 10:50)
Transforms: 1998 M83496293, 0xed71c1b76d6c0757, n = 4608K, CUDAPm1 v0.20 err = 0.16602 (0:39 real, 19.5754 ms/tran, ETA 10:10)
Transforms: 1910 M83496293, 0x9587bca9e6a92d95, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:37 real, 19.5884 ms/tran, ETA 9:33)
Transforms: 1902 M83496293, 0xdd50dacef6b94028, n = 4608K, CUDAPm1 v0.20 err = 0.17383 (0:38 real, 19.5907 ms/tran, ETA 8:56)
Transforms: 1930 M83496293, 0x5c01c876ba23af0e, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:38 real, 19.6468 ms/tran, ETA 8:18)
Transforms: 1924 M83496293, 0x4967e5714a906dd8, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:37 real, 19.6022 ms/tran, ETA 7:40)
Transforms: 1914 M83496293, 0xb5338d4f9734dcbf, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:38 real, 19.5649 ms/tran, ETA 7:03)
Transforms: 1882 M83496293, 0xb3364da78f68767c, n = 4608K, CUDAPm1 v0.20 err = 0.17969 (0:37 real, 19.5884 ms/tran, ETA 6:26)
Transforms: 1916 M83496293, 0x63c6b998ac49a7a0, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:37 real, 19.5861 ms/tran, ETA 5:49)
Transforms: 1844 M83496293, 0x9b385d7b61a51d47, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:36 real, 19.5965 ms/tran, ETA 5:13)
Transforms: 1882 M83496293, 0xe0d8af2fcfffed20, n = 4608K, CUDAPm1 v0.20 err = 0.17188 (0:37 real, 19.5938 ms/tran, ETA 4:36)
Transforms: 1896 M83496293, 0x85a24d9c67bd9496, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:37 real, 19.5903 ms/tran, ETA 3:59)
Transforms: 1986 M83496293, 0x71a887caf40e5bb7, n = 4608K, CUDAPm1 v0.20 err = 0.17627 (0:39 real, 19.5874 ms/tran, ETA 3:20)
Transforms: 1978 M83496293, 0x65c7d9d6c70197bf, n = 4608K, CUDAPm1 v0.20 err = 0.16797 (0:39 real, 19.5815 ms/tran, ETA 2:41)
Transforms: 1986 M83496293, 0x8f7ecc43a94105ef, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:39 real, 19.5769 ms/tran, ETA 2:02)
Transforms: 1950 M83496293, 0xaac5ccee0aafbde0, n = 4608K, CUDAPm1 v0.20 err = 0.16797 (0:38 real, 19.5877 ms/tran, ETA 1:24)
Transforms: 2036 M83496293, 0x34e6f17ecab893b1, n = 4608K, CUDAPm1 v0.20 err = 0.17188 (0:40 real, 19.5862 ms/tran, ETA 0:44)
Transforms: 2024 M83496293, 0x4b29a8a5677c72db, n = 4608K, CUDAPm1 v0.20 err = 0.17578 (0:40 real, 19.5816 ms/tran, ETA 0:04)

Stage 2 complete, 1710522 transforms, estimated total time = 9:18:00
Starting stage 2 gcd.
M83496293 Stage 2 found no factor (P-1, B1=685000, B2=12843750, e=2, n=4608K CUDAPm1 v0.20)

(results.txt entry made, worktodo modified, next exponent started)
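The stage 2 progress lines can be cross-checked with a little arithmetic: d = 2310 = 2·3·5·7·11, so the number of residues below d that are coprime to it is φ(2310) = 1·2·4·6·10 = 480, matching "of 480 relative primes". Processing nrp = 13 of them per pass gives 37 passes, the last one covering 469 - 480. A minimal Python sketch that reproduces those numbers; it assumes consecutive batching, as the log lines suggest, and is not taken from CUDAPm1's source.

```python
from math import gcd


def relative_primes(d):
    """Residues 1 <= r < d with gcd(r, d) == 1; there are phi(d) of them."""
    return [r for r in range(1, d) if gcd(r, d) == 1]


def passes(total, nrp):
    """1-based (first, last) ranges covering 1..total, nrp at a time."""
    return [(s, min(s + nrp - 1, total)) for s in range(1, total + 1, nrp)]


rp = relative_primes(2310)
print(len(rp))                                   # 480
runs = passes(len(rp), 13)
print(len(runs), runs[0], runs[-1])              # 37 (1, 13) (469, 480)
print(len(passes(480, 10)), passes(480, 10)[0])  # 48 (1, 10)
```

The last line matches the "Processing 1 - 10 of 480 relative primes" seen in a later run that used nrp = 10.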



Abnormal 84M exponent:
(the run crashes at the end of stage 1, before the gcd, so the stage 1 gcd message is missing; restarted attempts to begin stage 2 also fail)
Iteration 994000 M84228073, 0xf6fe7d71235ae765, n = 4608K, CUDAPm1 v0.20 err = 0.21875 (0:37 real, 36.8486 ms/iter, ETA 0:55)
Iteration 995000 M84228073, 0xed35e0151d83c908, n = 4608K, CUDAPm1 v0.20 err = 0.22656 (0:36 real, 36.8537 ms/iter, ETA 0:19)
M84228073, 0xc840c55fb78fc6a2, n = 4608K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 10:15:26batch wrapper reports cudapm1 exited at Sat 11/04/2017 12:12:38.23
batch wrapper reports CUDAPm1 (re)launch at Sat 11/04/2017 12:12:39.17

(from here the output repeats, with only the batch wrapper date/time stamps changing, until the worktodo file is manually edited to remove the stuck exponent)
CUDAPm1 v0.20
Warning: Couldn't parse ini file option UnusedMem; using default.
------- DEVICE 0 -------
name Quadro 2000
Compatibility 2.1
clockRate (MHz) 1251
memClockRate (MHz) 1304
totalGlobalMem 1073741824
totalConstMem 65536
l2CacheSize 262144
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 1536
multiProcessorCount 4
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
textureAlignment 512
deviceOverlap 1

No Quadro 2000 fft.txt file found. Using default fft lengths.
For optimal fft selection, please run
./CUDAPm1 -cufftbench 1 8192 r
for some small r, 0 < r < 6 e.g.
CUDA reports 952M of 1024M GPU memory free.
No Quadro 2000 threads.txt file found. Using default thread sizes.
For optimal thread selection, please run
./CUDAPm1 -cufftbench 4608 4608 r
for some small r, 0 < r < 6 e.g.
Using threads: norm1 512, mult 128, norm2 128.
No stage 2 checkpoint.
Using up to 828M GPU memory.
Selected B1=690000, B2=12937500, 3.07% chance of finding a factor
Using B1 = 690000 from savefile.
Continuing stage 2 from a partial result of M84228073 fft length = 4608K
batch wrapper reports cudapm1 exited at Sat 11/04/2017 12:13:34.24
batch wrapper reports CUDAPm1 (re)launch at Sat 11/04/2017 12:13:36.14
The plot thickens. I've successfully run higher exponents (~84.9M) on another Quadro 2000 with the same CUDAPm1 executable image (CUDA 5.5, 64-bit, 20130923, v0.20). The BIOS versions on the two GPUs differ in the last six characters: the problem occurred on the GPU with the lower BIOS version, 70 06 0F 00 0A, and not on the one with 70 06 31 02 01. The failing GPU was first run with no fft or threads file (threads 512, 128, 128; fft length 4608K), then retried with fft and threads files (threads 256, 256, 32; fft length 4608K), and the program still failed. The other GPU had its fft and threads files created before any P-1 attempts, and its runs succeeded. I'm now attempting a new exponent (~84.9M) on the unit that had trouble with 84.2M. If that fails, I may run a thorough memory test on it. Other possibilities are swapping cards and retesting, and a BIOS update. Other ideas?
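The batch wrapper that produced the "(re)launch" lines in the log restarts CUDAPm1 after every exit, which is why the stuck exponent loops until the worktodo file is edited by hand. A minimal Python sketch of a relaunch loop with a cap on consecutive failures; the wrapper's actual contents aren't shown in the post, and the default cap of 3 is an arbitrary assumption. On Windows, a crash such as the access violation above normally surfaces to the parent as a nonzero exit code.

```python
import subprocess
import sys


def supervise(cmd, max_consecutive_failures=3):
    """Relaunch cmd after each nonzero exit, giving up after too many in a row.

    Returns 0 if cmd eventually exits cleanly, otherwise the number of
    consecutive failures at which it stopped (so a person can inspect the
    worktodo file instead of letting the loop spin forever).
    """
    failures = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return 0
        failures += 1
        print(f"{cmd[0]} exited with code {result.returncode} ({failures} in a row)",
              file=sys.stderr)
        if failures >= max_consecutive_failures:
            return failures


if __name__ == "__main__":
    # Stand-in for the real program: a child process that always fails.
    always_fails = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(supervise(always_fails))  # 3
```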
2017-12-05, 01:20 #536
storm5510
Random Account
Aug 2009
1,956 Posts

Code:
Transforms:  2024 M83496293, 0x4b29a8a5677c72db, n = 4608K, CUDAPm1 v0.20 err = 0.17578 (0:40 real, 19.5816 ms/tran, ETA 0:04)
I wish the part in red could be removed. It makes PowerShell, or Command Prompt, almost too wide to fit my screen.
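If console width is the main nuisance, the progress lines could be shortened by a filter rather than by the program itself. A minimal Python sketch; since the red highlighting did not survive, the fields it drops (the 64-bit residue and the version tag) are an arbitrary choice, and the residue is worth keeping in any saved log because it is useful for comparing runs.

```python
import re

# Fields to drop (an arbitrary choice for illustration): the 64-bit hex residue
# and the ", CUDAPm1 v0.20"-style version tag.
DROP = [re.compile(r", 0x[0-9a-f]{16}"), re.compile(r", CUDAPm1 v\S+")]


def shorten(line: str) -> str:
    """Remove the DROP fields from one progress line and squeeze repeated spaces."""
    for pattern in DROP:
        line = pattern.sub("", line)
    return re.sub(r" {2,}", " ", line)


def filter_stream(src, dst):
    """Copy lines from src to dst, shortened, flushing after each so progress shows promptly."""
    for raw in src:
        dst.write(shorten(raw.rstrip("\n")) + "\n")
        dst.flush()


print(shorten("Transforms:  2024 M83496293, 0x4b29a8a5677c72db, n = 4608K, "
              "CUDAPm1 v0.20 err = 0.17578 (0:40 real, 19.5816 ms/tran, ETA 0:04)"))
# Transforms: 2024 M83496293, n = 4608K err = 0.17578 (0:40 real, 19.5816 ms/tran, ETA 0:04)
```

A small script could call filter_stream(sys.stdin, sys.stdout) with CUDAPm1's output piped into it from PowerShell. Programs often buffer their output when it goes to a pipe rather than a console, so lines may then arrive in bursts.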

Last fiddled with by storm5510 on 2017-12-05 at 01:20
2017-12-05, 02:25 #537
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
2,930 Posts

Quote:
Originally Posted by storm5510
Code:
Transforms:  2024 M83496293, 0x4b29a8a5677c72db, n = 4608K, CUDAPm1 v0.20 err = 0.17578 (0:40 real, 19.5816 ms/tran, ETA 0:04)
I wish the part in red could be removed. It makes PowerShell, or Command Prompt, almost too wide to fit my screen.
Use smaller fonts?
2017-12-05, 08:19 #538
storm5510
Random Account
Aug 2009
1,956 Posts

Quote:
Originally Posted by Mark Rose
Use smaller fonts?
That's an option. I generally run this in PowerShell.
2017-12-05, 21:24 #539
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts

Quote:
Originally Posted by kriesel
The plot thickens. I've successfully run higher exponents (~84.9m) on another Quadro 2000, with the same CUDAPm1 executable image, CUDA5.5 64-bit 20130923 V0.20 executable. BIOS versions on the GPUs differ in the right 6 characters; the problem occurred on the gpu with the lower BIOS version number 70 06 0F 00 0A, and not with 70 06 31 02 01. It was run with no fft file or threads file initially, 512, 128, 128 threads 4608k fft length, then retried to complete with fft and threads files and 256, 256, 32 threads, 4608k fft length and program still failed. The other GPU had fft and threads files created before beginning to run any P-1 attempts, which succeeded. I'm now attempting a new exponent ~84.9m on the unit that had trouble with 84.2m. If that fails I may run a thorough memory test on it. Other possibilities are card swap and retest, and BIOS update. Other ideas?
OK. The same GPU and system that reliably choked on exponents 84228073, 84228097, 84228103, 84228119, and 84228229 just ran M84861479 to completion successfully, with the same fft length etc. I'd have expected the higher exponent to present more of a challenge, not less.

CUDA reports 830M of 1024M GPU memory free.
Index 64
Using threads: norm1 256, mult 256, norm2 32.
Using up to 720M GPU memory.
Selected B1=690000, B2=12420000, 3.05% chance of finding a factor
Starting stage 1 P-1, M84861479, B1 = 690000, B2 = 12420000, fft length = 4608K
Doing 995519 iterations
Iteration 5000 M84861479, 0x85dcbca418bb3656, n = 4608K, CUDAPm1 v0.20 err = 0.27344 (3:03 real, 36.6115 ms/iter, ETA 10:04:24)
...
Iteration 995000 M84861479, 0xb98ed42b48260d4a, n = 4608K, CUDAPm1 v0.20 err = 0.25000 (3:02 real, 36.5191 ms/iter, ETA 0:18)
M84861479, 0x4a2093b79c7bf108, n = 4608K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 10:06:31
Starting stage 1 gcd.
M84861479 Stage 1 found no factor (P-1, B1=690000, B2=12420000, e=0, n=4608K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 690000, b2 = 12420000, d = 2310, e = 2, nrp = 10
Zeros: 554802, Ones: 637038, Pairs: 121194
Processing 1 - 10 of 480 relative primes.
...

Stage 2 complete, 1766191 transforms, estimated total time = 9:31:47
Starting stage 2 gcd.
M84861479 Stage 2 found no factor (P-1, B1=690000, B2=12420000, e=2, n=4608K CUDAPm1 v0.20)

Weird, but I'll take it. A couple of other things I had thought of trying were matching the OS and system RAM on another box and GPU and retrying there. The system that ran the problem 84.2M exponents to completion had a newer Windows OS and twice the system RAM.
...
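The "Doing 995519 iterations" line in the M84861479 log is in line with standard P-1 stage 1, which raises a base to a large exponent E built from all prime powers up to B1 and so needs roughly one modular squaring per bit of E. A rough Python sketch estimating that bit count for B1 = 690000; it assumes E is the product of the largest prime powers not exceeding B1 and does not try to reproduce CUDAPm1's exact bookkeeping (for example, whether it also folds in the factor 2p), so expect an estimate close to the logged figure rather than equal to it. At about 36.5 ms per iteration, 995519 iterations take roughly 10.1 hours, consistent with the logged stage 1 total of 10:06:31.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def stage1_bits(b1):
    """Approximate bit length of E, the product of the largest prime powers q**k <= b1."""
    total = 0.0
    for q in primes_up_to(b1):
        k = 1
        while q ** (k + 1) <= b1:
            k += 1
        total += k * math.log2(q)
    return total


print(round(stage1_bits(690000)))  # compare with "Doing 995519 iterations"
```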




All times are UTC.





Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.