Trial division with CUDA (mmff) -- used, but runs like new! (https://www.mersenneforum.org/showthread.php?t=17162)

ATH 2012-09-22 15:02

I tried to run the known Fermat factors again, but I got this error on 2 of them:

[CODE]got assignment: k*2^45+1, k range 111310000000000 to 111320000000000 (92-bit factors)
Starting trial factoring of k*2^45+1 in k range: 111310G to 111320G (92-bit factors)
k_min = 111309999999660
k_max = 111320000000000
Using GPU kernel "mfaktc_barrett96_F32_63gs"
ERROR: GPU sieve problems. Factor divisible by 29


got assignment: k*2^45+1, k range 111310000000000 to 111320000000000 (92-bit factors)
Starting trial factoring of k*2^45+1 in k range: 111310G to 111320G (92-bit factors)
k_min = 111309999999660
k_max = 111320000000000
Using GPU kernel "mfaktc_barrett96_F32_63gs"
ERROR: GPU sieve problems. Factor divisible by 41


got assignment: k*2^54+1, k range 81900000000000 to 81911000000000 (101-bit factors)
Starting trial factoring of k*2^54+1 in k range: 81900G to 81911G (101-bit factors)
k_min = 81899999998740
k_max = 81911000000000
Using GPU kernel "mfaktc_barrett108_F32_63gs"
ERROR: GPU sieve problems. Factor divisible by 17


got assignment: k*2^54+1, k range 81900000000000 to 81911000000000 (101-bit factors)
Starting trial factoring of k*2^54+1 in k range: 81900G to 81911G (101-bit factors)
k_min = 81899999998740
k_max = 81911000000000
Using GPU kernel "mfaktc_barrett108_F32_63gs"
ERROR: GPU sieve problems. Factor divisible by 13
[/CODE]

I wonder if it's my card since it's not the same prime in the error every time?
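
The k_min values in the log sit slightly below the requested start of each k range. A minimal sketch of one possible explanation, assuming mmff rounds the start down to a multiple of 4620 (the class count that appears in the progress lines later in this thread, e.g. "1575/4620"). The function name is hypothetical and the rounding rule is an inference from the logged numbers, not something taken from the mmff source:

```python
# Assumption: k is split into 4620 residue classes and k_min is rounded
# down to a multiple of that count. This is inferred from the logs only.
NUM_CLASSES = 4620


def round_down_to_class_boundary(k_start: int, num_classes: int = NUM_CLASSES) -> int:
    """Return the largest multiple of num_classes that is <= k_start."""
    return k_start - (k_start % num_classes)


# Requested range start -> k_min value printed in the log above.
logged_k_min = {
    111310000000000: 111309999999660,
    81900000000000: 81899999998740,
}

for k_start, logged in logged_k_min.items():
    computed = round_down_to_class_boundary(k_start)
    print(k_start, "->", computed, "matches log:", computed == logged)
```

Both logged values agree with this rule, which means a few candidates below the requested range start also get tested.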

Prime95 2012-09-22 16:39

[QUOTE=ATH;312430]I wonder if it's my card since it's not the same prime in the error every time?[/QUOTE]

I wouldn't conclude that. Is this the Windows or Linux build? I reran finding the known Fermat factors last night before uploading the source.

Did you set GPUSievePrimes in mmff.ini or try the new auto-select feature?

ATH 2012-09-22 19:26

It's a Windows 64-bit build.

I think I found the problem: it happens when there are 2 or more assignments in worktodo.txt on the same n but using different GPU kernels. For example,
FermatFactor=63,88,89
FermatFactor=63,89,90
uses "mfaktc_barrett89_F32_63gs" for the first line and then "mfaktc_barrett96_F32_63gs" for the 2nd line.

It's not all kernel transitions, but most of them. Here is a list I started of the transitions, whether or not the problem occurs for each, and an example of the 2 lines in worktodo.txt that trigger it.

There are 62 more transitions to test, which I can do if it's needed.

EDIT: This seems to be an issue with auto-selecting GPUSievePrimes as it disappears when I set it.

[CODE]mfaktc_barrett89_F0_31gs to mfaktc_barrett96_F0_31gs ERROR: GPU sieve problems
FermatFactor=31,200000000e9,200000001e9
FermatFactor=31,300000000e9,300000001e9

mfaktc_barrett96_F0_31gs to mfaktc_barrett89_F0_31gs ERROR: GPU sieve problems
FermatFactor=31,300000000e9,300000001e9
FermatFactor=31,200000000e9,200000001e9



mfaktc_barrett89_F32_63gs to mfaktc_barrett96_F32_63gs ERROR: GPU sieve problems
FermatFactor=63,88,89
FermatFactor=63,89,90

mfaktc_barrett89_F32_63gs to mfaktc_barrett108_F32_63gs ERROR: GPU sieve problems
FermatFactor=63,88,89
FermatFactor=63,96,97

mfaktc_barrett89_F32_63gs to mfaktc_barrett120_F32_63gs ERROR: GPU sieve problems
FermatFactor=63,88,89
FermatFactor=63,40000e9,40001e9

mfaktc_barrett89_F32_63gs to mfaktc_barrett128_F32_63gs ERROR: GPU sieve problems
FermatFactor=63,88,89
FermatFactor=63,200000000e9,200000001e9



mfaktc_barrett96_F32_63gs to mfaktc_barrett89_F32_63gs ERROR: GPU sieve problems
FermatFactor=63,89,90
FermatFactor=63,88,89

mfaktc_barrett96_F32_63gs to mfaktc_barrett108_F32_63gs no error
FermatFactor=63,89,90
FermatFactor=63,96,97

mfaktc_barrett96_F32_63gs to mfaktc_barrett120_F32_63gs no error
FermatFactor=63,89,90
FermatFactor=63,40000e9,40001e9

mfaktc_barrett96_F32_63gs to mfaktc_barrett128_F32_63gs ERROR: GPU sieve problems
FermatFactor=63,89,90
FermatFactor=63,200000000e9,200000001e9



mfaktc_barrett108_F32_63gs to mfaktc_barrett89_F32_63gs ERROR: GPU sieve problems
FermatFactor=63,96,97
FermatFactor=63,88,89

mfaktc_barrett108_F32_63gs to mfaktc_barrett96_F32_63gs no error
FermatFactor=63,96,97
FermatFactor=63,95,96

mfaktc_barrett108_F32_63gs to mfaktc_barrett120_F32_63gs no error
FermatFactor=63,96,97
FermatFactor=63,40000e9,40001e9

mfaktc_barrett108_F32_63gs to mfaktc_barrett128_F32_63gs ERROR: GPU sieve problems
FermatFactor=63,96,97
FermatFactor=63,200000000e9,200000001e9



mfaktc_barrett120_F32_63gs to mfaktc_barrett89_F32_63gs ERROR: GPU sieve problems
FermatFactor=63,40000e9,40001e9
FermatFactor=63,88,89

mfaktc_barrett120_F32_63gs to mfaktc_barrett96_F32_63gs no error
FermatFactor=63,40000e9,40001e9
FermatFactor=63,89,90

mfaktc_barrett120_F32_63gs to mfaktc_barrett108_F32_63gs no error
FermatFactor=63,40000e9,40001e9
FermatFactor=63,96,97

mfaktc_barrett120_F32_63gs to mfaktc_barrett128_F32_63gs ERROR: GPU sieve problems
FermatFactor=63,40000e9,40001e9
FermatFactor=63,200000000e9,200000001e9



mfaktc_barrett128_F32_63gs to mfaktc_barrett89_F32_63gs ERROR: GPU sieve problems
FermatFactor=63,200000000e9,200000001e9
FermatFactor=63,88,89

mfaktc_barrett128_F32_63gs to mfaktc_barrett96_F32_63gs ERROR: GPU sieve problems
FermatFactor=63,200000000e9,200000001e9
FermatFactor=63,89,90

mfaktc_barrett128_F32_63gs to mfaktc_barrett108_F32_63gs ERROR: GPU sieve problems
FermatFactor=63,200000000e9,200000001e9
FermatFactor=63,96,97

mfaktc_barrett128_F32_63gs to mfaktc_barrett120_F32_63gs ERROR: GPU sieve problems
FermatFactor=63,200000000e9,200000001e9
FermatFactor=63,40000e9,40001e9[/CODE]
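
The F32_63 results in the list above can be tabulated to look for a pattern. The sketch below transcribes the 20 reported transitions and checks one hypothesis: that a transition fails exactly when the 89-bit or the 128-bit kernel is on either side. This is only an observation about the reported data, not a confirmed diagnosis, and the helper name is hypothetical:

```python
# Kernel widths for the mfaktc_barrett<width>_F32_63gs kernels listed above.
WIDTHS = [89, 96, 108, 120, 128]

# The transitions reported as "no error" in the list; every other
# F32_63 transition was reported as "ERROR: GPU sieve problems".
NO_ERROR = {(96, 108), (96, 120), (108, 96), (108, 120), (120, 96), (120, 108)}

# (from_width, to_width) -> True if the transition produced the error.
results = {
    (a, b): (a, b) not in NO_ERROR
    for a in WIDTHS
    for b in WIDTHS
    if a != b
}


def predicted_error(a: int, b: int) -> bool:
    """Hypothesis only: fails when barrett89 or barrett128 is involved."""
    return bool({a, b} & {89, 128})


mismatches = [t for t, failed in results.items() if failed != predicted_error(*t)]
print(len(results), "transitions,", sum(results.values()), "errors")
print("transitions that contradict the hypothesis:", mismatches)
```

The two F0_31 transitions in the list (89 to 96 and 96 to 89) also failed, which fits the same pattern.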

ATH 2012-09-22 19:38

While testing I have also run into the error:
ERROR: Exponentiation falure
(yes there is a typo in 'failure')

I get it with GPUSievePrimes off (auto-selecting) and this worktodo.txt:
FermatFactor=96,128,129
FermatFactor=97,128,129
FermatFactor=98,128,129
FermatFactor=99,128,129
FermatFactor=100,128,129

However, I know I also got this error before 0.24 and the auto-select feature; it is very elusive and hard to track down and reproduce.

Prime95 2012-09-22 20:03

[QUOTE=ATH;312430]I wonder if it's my card since it's not the same prime in the error every time?[/QUOTE]

Clarification. The GPU sieve does not give reproducible results. For performance reasons, bits are cleared from the sieve without using atomic operations. Thus, there are race conditions where two threads try to clear different bits in the same byte.

This causes us to test a few more trial factors than necessary, but is more than offset by the savings from not using atomic operations.
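
A minimal sketch of the lost update described above, written as a deterministic simulation in Python rather than real GPU code. It assumes a set bit means "candidate still needs testing" and a cleared bit means "sieved out"; that convention and the function names are assumptions for illustration, not taken from the mmff source:

```python
def clear_bit(byte: int, bit: int) -> int:
    """Return byte with the given bit cleared (8-bit value)."""
    return byte & ~(1 << bit) & 0xFF


def atomic_clears(byte: int, bits: list[int]) -> int:
    """Each clear sees the result of the previous one, as with atomics."""
    for bit in bits:
        byte = clear_bit(byte, bit)
    return byte


def racy_clears(byte: int, bit_a: int, bit_b: int) -> int:
    """Two threads read the same old byte, then both write back."""
    seen_by_a = byte              # thread A loads the byte
    seen_by_b = byte              # thread B loads it before A stores
    _written_by_a = clear_bit(seen_by_a, bit_a)   # A's store, overwritten below
    written_by_b = clear_bit(seen_by_b, bit_b)    # B stores last and wins
    return written_by_b


start = 0b11111111
safe = atomic_clears(start, [1, 5])
racy = racy_clears(start, 1, 5)

print(f"atomic result: {safe:08b}")   # 11011101
print(f"racy result:   {racy:08b}")   # 11011111
# Every bit set in the atomic result is also set in the racy result, so the
# race can only leave extra candidates in the sieve, never drop one.
print("racy result keeps a superset of candidates:", safe & ~racy == 0)
```

Under that assumption the race costs some redundant tests, but cannot cause a real factor to be skipped, and which clears get lost depends on thread timing, which would make the sieve output differ from run to run.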

Prime95 2012-09-22 21:54

[QUOTE=ATH;312451]I think I found the problem: it happens when there are 2 or more assignments in worktodo.txt on the same n but using different GPU kernels.[/QUOTE]

I have a fix for this. If you can reproduce the exponentiation failure with the -v 3 command line argument that might be helpful. I could not reproduce the trouble.

If I get timely feedback on the %g problem, I'd like to get that fixed in 0.25 too.

rcv 2012-09-22 22:05

[QUOTE=ATH;312454]While testing I have also run into the error:
ERROR: Exponentiation falure
(yes there is a typo in 'failure')

I get it with GPUSievePrimes off (auto-selecting) and this worktodo.txt:
FermatFactor=96,128,129
FermatFactor=97,128,129
FermatFactor=98,128,129
FermatFactor=99,128,129
FermatFactor=100,128,129

However, I know I also got this error before 0.24 and the auto-select feature; it is very elusive and hard to track down and reproduce.[/QUOTE]
I reported a similar problem to George a week or two ago. There wasn't enough information to determine whether it was flaky hardware, the mmff software, or an NVIDIA runtime bug. I can reproduce the above error, which seems to rule out flaky hardware.

I ran each of the above five assignments 11 times, and I got 3 failures of the "FermatFactor=96,128,129" assignment. [I immediately restarted mmff after each failure.]
[CODE]got assignment: k*2^96+1 bit_min=128 bit_max=129
Starting trial factoring k*2^96+1 from 2^128 to 2^129
k_min = 4294964520
k_max = 8589934592
Using GPU kernel "mfaktc_barrett140_F96_127gs"
class | candidates | time | ETA | raw rate | SievePrimes | CPU wait
...
<failure location not recorded>
ERROR: Exponentiation falure
...
1575/4620 | 0.93M | 0.009s | n.a. | 103.77M/s | 349749 | n.a.%
ERROR: Exponentiation falure
...
1575/4620 | 0.93M | 0.010s | n.a. | 93.39M/s | 349749 | n.a.%
ERROR: Exponentiation falure
...[/CODE]
Next, I put the failing assignment ("FermatFactor=96,128,129") in my worktodo.txt file 25 times.
[CODE]...
1575/4620 | 0.93M | 0.009s | n.a. | 103.77M/s | 349749 | n.a.%
ERROR: Exponentiation falure
...
1575/4620 | 0.93M | 0.008s | n.a. | 116.74M/s | 349749 | n.a.%
ERROR: Exponentiation falure[/CODE]
After two failures, I changed the command line to specify a different GPU.
[CODE] ...
1575/4620 | 0.93M | 0.024s | n.a. | 38.91M/s | 349749 | n.a.%
ERROR: Exponentiation falure
...
1575/4620 | 0.93M | 0.013s | n.a. | 71.84M/s | 349749 | n.a.%
ERROR: Exponentiation falure
...
1575/4620 | 0.93M | 0.018s | n.a. | 51.88M/s | 349749 | n.a.%
ERROR: Exponentiation falure
...
1575/4620 | 0.93M | 0.013s | n.a. | 71.84M/s | 349749 | n.a.%
ERROR: Exponentiation falure
...
1575/4620 | 0.93M | 0.016s | n.a. | 58.37M/s | 349749 | n.a.%
ERROR: Exponentiation falure[/CODE]
10 failures out of 46 runs on two different GPUs. Perhaps this is sufficiently reproducible to find the problem.

One more failure with -v3:
[CODE]1573/4620 | 0.93M | 0.009s | n.a. | 103.77M/s | 349749 | n.a.%
Verifying (2^(2^96)) % 340581321636451875144725492967785103361 = 202753569648208169353731391108513369608
1575/4620 | 0.93M | 0.009s | n.a. | 103.77M/s | 349749 | n.a.%
Verifying (2^(2^96)) % 340282272560196908974548533520923361281 = 213505026821406843026269288964103298839037
ERROR: Exponentiation falure[/CODE]
Note that the expected result is about 3 digits longer than the modulus.

Prime95 2012-09-22 22:56

[QUOTE=rcv;312470]Note that the expected result is about 3 digits longer than the modulus.[/QUOTE]

and the factor is less than 2^128...
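
Both remarks can be checked with plain integer arithmetic on the numbers from the failing -v3 line. The sketch below recovers k from the candidate, confirms the candidate is below 2^128 although the assignment covers 2^128 to 2^129, and computes a reference residue with Python's built-in three-argument pow. Variable names are illustrative, and nothing here says how mmff computes its own value:

```python
EXPONENT_BITS = 96  # the assignment is k*2^96+1

# Values copied from the failing "Verifying" line above.
candidate = 340282272560196908974548533520923361281
reported = 213505026821406843026269288964103298839037

# Recover k from candidate = k*2^96 + 1.
k, remainder = divmod(candidate - 1, 2**EXPONENT_BITS)
print("k =", k, "remainder =", remainder)

# The assignment was for factors from 2^128 to 2^129, but k_min was
# 4294964520, which is below 2^32, so some candidates fall under 2^128.
print("k below 2^32:", k < 2**32)
print("candidate below 2^128:", candidate < 2**128)

# A correct residue must be smaller than the modulus.
print("digits in modulus:", len(str(candidate)))
print("digits in reported value:", len(str(reported)))
print("reported value is a valid residue:", reported < candidate)

# Reference value of (2^(2^96)) % candidate for comparison.
residue = pow(2, 2**EXPONENT_BITS, candidate)
print("reference residue:", residue)
```

The candidate corresponds to k = 4294966105, which lies between the logged k_min and 2^32, so it is one of the trial factors below the 2^128 lower bound of the assignment.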

ATH 2012-09-23 01:14

[QUOTE=Prime95;312468]If you can reproduce the exponentiation failure with the -v 3 command line argument that might be helpful. I could not reproduce the trouble.[/QUOTE]

First one with GPUSievePrimes off (auto-select): [URL="http://www.hoegge.dk/mersenne/falure1.txt"]falure1.txt[/URL]
FermatFactor=96,128,129
FermatFactor=97,128,129
FermatFactor=98,128,129
FermatFactor=99,128,129
FermatFactor=100,128,129

Second one with GPUSievePrimes=650000 (optimal), same worktodo.txt: [URL="http://www.hoegge.dk/mersenne/falure2.txt"]falure2.txt[/URL]

Third one with GPUSievePrimes=100000 (too low, optimal ~ 950k): [URL="http://www.hoegge.dk/mersenne/falure3.txt"]falure3.txt[/URL]
FermatFactor=140,171,172
FermatFactor=151,182,183
FermatFactor=153,184,185
FermatFactor=156,187,188

ET_ 2012-09-23 10:08

[QUOTE=ATH;312430]I tried to run the known Fermat factors again, but I got this error on 2 of them: [...] I wonder if it's my card since it's not the same prime in the error every time?[/QUOTE]

I don't know if this is related... I had the same error trying to run mmff (v2.0) on a cc1.3 card. Did you modify your CUDA drivers/settings?

Luigi

bcp19 2012-09-23 12:52

This may have nothing to do with it, but I noticed new nVidia drivers were available recently (I have not upgraded mine yet). Could they be part of the cause?

