mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   genefer/CUDA (https://www.mersenneforum.org/showthread.php?t=14297)

msft 2012-01-04 01:10

[QUOTE=AG5BPilot;284535]I'm not sure how long this has been in the code, but the source of the occasional checkpoint read failures is due to the code within the checkpoint read function using !strcmp when it should be using strcmp AND also not adding a terminating null to the end of one of the strings it's comparing.
[/QUOTE]
You are right.
[code]
$ od -cx genefer.ckpt |head
0000000 335 \0 \0 \0 004 \0 \0 \0 C U D A 232 372 036 \0
00dd 0000 0004 0000 5543 4144 fa9a 001e

#define SaveFileVersion 221

oldSaveVer="00dd 0000"
byte="0004 0000"
build="5543 4144"
(x86 is little-endian format.)

http://research.microsoft.com/en-us/um/redmond/projects/invisible/src/crt/strcmp.c.htm
[/code]
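A minimal sketch of how the header bytes in the dump above could be decoded and checked. The field meanings follow the labels in the dump (version, a 4-byte count, the build tag); treating the count as the tag length is an assumption, and `ckpt_header`, `read_le32` and `tag_matches` are hypothetical names, not GeneferCUDA functions. Because the tag is stored without a terminating null, the sketch compares it with `memcmp` and an explicit length instead of `strcmp`. Note also that `strcmp` returns 0 when the strings are equal, so `!strcmp(a, b)` is true on a match.

```c
#include <stdint.h>
#include <string.h>

/* First 12 bytes of genefer.ckpt as shown by od in the post above. */
static const unsigned char ckpt_header[12] = {
    0xdd, 0x00, 0x00, 0x00,   /* save-file version 221, little-endian */
    0x04, 0x00, 0x00, 0x00,   /* assumed: length of the build tag */
    'C', 'U', 'D', 'A'        /* build tag, no terminating null on disk */
};

/* Decode a 32-bit little-endian integer regardless of host byte order. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return 1 when the tag stored in the header equals `expected`, else 0.
 * The length check short-circuits, so memcmp never reads past the tag. */
int tag_matches(const unsigned char *hdr, const char *expected)
{
    uint32_t len = read_le32(hdr + 4);
    return len == strlen(expected) && memcmp(hdr + 8, expected, len) == 0;
}
```

With this layout, `read_le32(ckpt_header)` yields 221, which agrees with `SaveFileVersion`.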

msft 2012-01-04 04:05

1 Attachment(s)
Ver 1.049 with linux64 exec file.
Fixed #99 issue.

msft 2012-01-04 12:33

1 Attachment(s)
Ver 1.05 with linux64 exec file.
Fixed "Corruption of error message" issue.

rroonnaalldd 2012-01-04 15:15

1 Attachment(s)
Minor cosmetic change in v1.051: (i*2) to ((i)*2).

Do you have any idea why your source produces 100% load on one CPU core?
Is it also by design that stopping the benchmark leaves the app hanging after it writes the message "writing checkpoint" to the screen?
Do you know what causes the underlined differences in error rates between the 32-bit and 64-bit apps?


Here are some timings for our apps, with the timings for genefer, genefer80 and geneferX64 for comparison:
[QUOTE]boinc@vmware2k-3: ./[B]GeneferCUDA.cuda4.0.Linux32[/B] -b

2009574^8192+1 Time: 663 us/mul. Err: 3.82e-01 51636 digits
1632282^16384+1 Time: 680 us/mul. Err: 2.53e-01 101791 digits
1325824^32768+1 Time: 746 us/mul. Err: 2.03e-01 200622 digits
1076904^65536+1 Time: 980 us/mul. Err: 1.88e-01 395325 digits
874718^131072+1 Time: 1.34 ms/mul. Err: 3.47e-01 778813 digits
710492^262144+1 Time: 2.07 ms/mul. Err: 4.21e-01 1533952 digits
577098^524288+1 Time: 4.06 ms/mul. Err: 2.01e-01 3020555 digits
468750^1048576+1 Time: 8.21 ms/mul. [U]Err: 1.64e-01[/U] 5946413 digits
380742^2097152+1 Time: 16.6 ms/mul. Err: 3.63e-01 11703432 digits
309258^4194304+1 Time: 35.8 ms/mul. [U]Err: 4.07e-01[/U] 23028076 digits
251196^8388608+1 Time: 73.2 ms/mul. [U]Err: 4.33e-01[/U] 45298590 digits
[/QUOTE]
[QUOTE]boinc@vmware2k-3: ./[B]GeneferCUDA.cuda4.0.Linux64[/B] -b

2009574^8192+1 Time: 696 us/mul. Err: 3.82e-01 51636 digits
1632282^16384+1 Time: 713 us/mul. Err: 2.53e-01 101791 digits
1325824^32768+1 Time: 779 us/mul. Err: 2.03e-01 200622 digits
1076904^65536+1 Time: 1.01 ms/mul. Err: 1.88e-01 395325 digits
874718^131072+1 Time: 1.37 ms/mul. Err: 3.47e-01 778813 digits
710492^262144+1 Time: 2.11 ms/mul. Err: 4.21e-01 1533952 digits
577098^524288+1 Time: 4.09 ms/mul. Err: 2.01e-01 3020555 digits
468750^1048576+1 Time: 8.21 ms/mul. [U]Err: 1.72e-01[/U] 5946413 digits
380742^2097152+1 Time: 16.7 ms/mul. Err: 3.63e-01 11703432 digits
309258^4194304+1 Time: 36.9 ms/mul. [U]Err: 1.56e-01[/U] 23028076 digits
251196^8388608+1 Time: 74.9 ms/mul. [U]Err: 1.56e-01[/U] 45298590 digits
[/QUOTE]

CPU:
[QUOTE]boinc@vmware2k-3: ./[B]genefer[/B] -b

6631258^256+1 Time: 6.1 us/mul. Err: 0.4991 1747 digits
5386256^512+1 Time: 13.4 us/mul. Err: 0.4361 3447 digits
4375000^1024+1 Time: 28.1 us/mul. Err: 0.4200 6801 digits
3553604^2048+1 Time: 59.8 us/mul. Err: 0.4686 13416 digits
2886422^4096+1 Time: 129 us/mul. Err: 0.4362 26462 digits
2344504^8192+1 Time: 327 us/mul. Err: 0.4133 52184 digits
1904328^16384+1 Time: 693 us/mul. Err: 0.4014 102888 digits
1546796^32768+1 Time: 1.46 ms/mul. Err: 0.4412 202816 digits
1256388^65536+1 Time: 3.05 ms/mul. Err: 0.4634 399713 digits
1020504^131072+1 Time: 6.33 ms/mul. Err: 0.3661 787588 digits
828906^262144+1 Time: 13.8 ms/mul. Err: 0.4133 1551501 digits
673282^524288+1 Time: 29.4 ms/mul. Err: 0.4228 3055654 digits
546874^1048576+1 Time: 71.2 ms/mul. Err: 0.3784 6016611 digits
444200^2097152+1 Time: 155 ms/mul. Err: 0.3295 11843831 digits
[/QUOTE]
[QUOTE]boinc@vmware2k-3: ./[B]genefer80[/B] -b

5683936^256+1 Time: 9.77 us/mul. Err: 0.0002 1730 digits
4616790^512+1 Time: 20.8 us/mul. Err: 0.0002 3413 digits
3750000^1024+1 Time: 47.6 us/mul. Err: 0.0002 6732 digits
3045946^2048+1 Time: 101 us/mul. Err: 0.0002 13279 digits
2474076^4096+1 Time: 227 us/mul. Err: 0.0002 26188 digits
2009574^8192+1 Time: 522 us/mul. Err: 0.0002 51636 digits
1632282^16384+1 Time: 1.08 ms/mul. Err: 0.0002 101791 digits
1325824^32768+1 Time: 2.25 ms/mul. Err: 0.0002 200622 digits
1076904^65536+1 Time: 5 ms/mul. Err: 0.0002 395325 digits
874718^131072+1 Time: 11.2 ms/mul. Err: 0.0002 778813 digits
710492^262144+1 Time: 22.7 ms/mul. Err: 0.0002 1533952 digits
577098^524288+1 Time: 55.3 ms/mul. Err: 0.0002 3020555 digits
468750^1048576+1 Time: 128 ms/mul. Err: 0.0002 5946413 digits
380742^2097152+1 Time: 270 ms/mul. Err: 0.0002 11703432 digits
[/QUOTE]
[QUOTE]boinc@vmware2k-3: ./[B]geneferX64[/B] -b

5683936^256+1 Time: 3.81 us/mul. Err: 0.2500 1730 digits
4616790^512+1 Time: 8.24 us/mul. Err: 0.2500 3413 digits
3750000^1024+1 Time: 17.7 us/mul. Err: 0.2500 6732 digits
3045946^2048+1 Time: 37.8 us/mul. Err: 0.2500 13279 digits
2474076^4096+1 Time: 80.6 us/mul. Err: 0.2500 26188 digits
2009574^8192+1 Time: 200 us/mul. Err: 0.2500 51636 digits
1632282^16384+1 Time: 420 us/mul. Err: 0.2500 101791 digits
1325824^32768+1 Time: 879 us/mul. Err: 0.2188 200622 digits
1076904^65536+1 Time: 1.88 ms/mul. Err: 0.2031 395325 digits
874718^131072+1 Time: 3.83 ms/mul. Err: 0.2188 778813 digits
710492^262144+1 Time: 7.97 ms/mul. Err: 0.1875 1533952 digits
577098^524288+1 Time: 17.5 ms/mul. Err: 0.1719 3020555 digits
468750^1048576+1 Time: 44.4 ms/mul. Err: 0.1875 5946413 digits
380742^2097152+1 Time: 98.8 ms/mul. Err: 0.1875 11703432 digits
[/QUOTE]

AG5BPilot 2012-01-04 16:26

[QUOTE=rroonnaalldd;284769]
Do you have any idea why your source produces 100% load on one CPU core?
Is it also by design that stopping the benchmark leaves the app hanging after it writes the message "writing checkpoint" to the screen?
Do you know what causes the underlined differences in error rates between the 32-bit and 64-bit apps?
[/QUOTE]

Ronald,

I can probably answer some of your questions, as my BOINC version is essentially a superset of Shoichiro's software.

The problem with not being able to terminate the benchmarks has to do with the way the CTRL-C trapping works. The handler for the signal capture is set up at the beginning of the program, BEFORE it's determined that you're going to run a benchmark. When you hit CTRL-C, the handler captures it and sets the quitting flag, which tells "check" to stop running, do a checkpoint, and exit.

Unfortunately, "check" isn't running. So the quitting flag is set, but nothing ever reads that.

I fixed this in my BOINC version of the code by only setting the CTRL-C handler if we're going to do a real PRP test. Otherwise, I let the default run-time handler catch the signal and terminate the program normally.
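The fix described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual GeneferCUDA or BOINC code: `setup_sigint`, `on_sigint` and the `is_benchmark` parameter are hypothetical names, and only the `quitting` flag is taken from the description.

```c
#include <signal.h>

/* Flag polled by the PRP test loop ("check"); set asynchronously on Ctrl-C. */
static volatile sig_atomic_t quitting = 0;

static void on_sigint(int sig)
{
    (void)sig;
    quitting = 1;   /* only records the request; something must poll the flag */
}

/* Install the handler only when a real PRP test will run.
 * Returns 1 if the handler was installed, 0 if the default was kept. */
int setup_sigint(int is_benchmark)
{
    if (is_benchmark)
        return 0;   /* default disposition stays: Ctrl-C terminates the benchmark */
    signal(SIGINT, on_sigint);
    return 1;
}
```

In benchmark mode nothing polls `quitting`, so leaving the default handler in place is what lets Ctrl-C end the program.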

As for the CPU core loading, mine doesn't do that, and the math code in my version is identical to Shoichiro's. But my build is 32-bit Windows.

By the way, when I did build a 64-bit version, it ran very slightly slower than the 32-bit build.

AG5BPilot 2012-01-04 16:52

[QUOTE=rroonnaalldd;284769]Minor cosmetic change in v1.051: (i*2) to ((i)*2).[/QUOTE]

I must be missing something. These are the only changes I can find between 1.05 and 1.051:

[code]Compare: (<)C:\GeneferCUDA test\genefercuda.1.051\GeneferCUDA.cu (37817 bytes)
with: (>)C:\GeneferCUDA test\genefercuda.1.05\GeneferCUDA.cu (37811 bytes)

137d137
<
476,477c475
< }
< while (f != 0);
---
> } while (f != 0);
898,899c896,897
< }
< while (maxError > ErrThreshold);
---
>
> } while (maxError > ErrThreshold);
1121c1119,1120
< while (fgets(str, 132, fp) != NULL)
---
>
> while (fgets(str, 132, fp) != NULL)
1267d1266
<
1271,1275d1269
<
<
<
<
< [/code]

I don't see anything about (i*2). Also, the only place I can find (i*2), i is a variable, not a macro parameter, so I don't understand the benefit of changing it to ((i)*2).
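A small illustration of the point about macro parameters. The macros and functions here are made up for the example and do not come from GeneferCUDA.cu.

```c
/* Unparenthesized parameter: the argument text is pasted in as-is. */
#define DOUBLE_BAD(i)  (i*2)
/* Parenthesized parameter: the argument is grouped before multiplying. */
#define DOUBLE_GOOD(i) ((i)*2)

/* Expands to (a + 1*2), i.e. a + 2. */
int bad_with_expr(int a)  { return DOUBLE_BAD(a + 1); }

/* Expands to ((a + 1)*2). */
int good_with_expr(int a) { return DOUBLE_GOOD(a + 1); }

/* Here i is an ordinary variable, so extra parentheses change nothing. */
int plain_variable(int i) { return (i*2); }
```

The extra parentheses only matter when `i` is a macro parameter that may receive an expression such as `a + 1`.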

AG5BPilot 2012-01-04 19:37

Shoichiro,

I just completed testing 4000^2097152+1 with older code (0.99) that uses SHIFT=8. Good news; the residual is the same as the residual that came from the 1.04 version with the new SHIFT logic.

Mike

AG5BPilot 2012-01-04 23:48

I'm hoping there's a simple answer to this question.

How do I convert an array of integers (which represents a very, very large integer) to and from the complex arrays that are the input and output of FFTsquareFFT?

Here's the background:

Once upon a time (say, when I was in high school some 35 years ago), I understood and really, really enjoyed things like Fourier transforms. Having never once had to actually use a Fourier transform since then, it's not difficult to imagine that I've forgotten nearly everything about them!

Until, lo and behold, I find myself needing to do multiplications on astoundingly big numbers.

So, here's the problem. Genefer, at the beginning of its test, starts by computing the value of b^N by repeatedly taking the square of b. The code in Genefer worked fine, for years, until some fool (that fool would be me, of course) had the audacity to go and discover that 25898^524288+1 was a prime number. So, of course, that sent us all thinking about finding even bigger primes with Genefer.

So far, so good. Except that I'm a programmer by training, so I started tinkering with Genefer. The first thing I did was convert GeneferCUDA to run under BOINC, so we can get lots and lots of people running it (more people are willing to run BOINC than PRPNet. That's a shame; I like PRPNet better. Mark rocks.)

Along the way, I found that at N=4194304 Genefer takes a very long time to start up. Looking at the code, I discovered that the really simple code at the beginning that computes the actual value of b^N, by repeatedly squaring b, takes a very long time when N=4194304. About 2 hours on my Core2Quad.
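A rough sketch of the repeated-squaring step, assuming N is a power of two as in the exponents above. It works modulo a small number so the values fit in 64 bits, whereas the real code squares the full multi-million-digit integer. Both function names are hypothetical.

```c
#include <stdint.h>

/* Number of squarings needed to reach b^n when n is a power of two (log2 n). */
int squarings_needed(uint64_t n)
{
    int k = 0;
    while (n > 1) {
        n >>= 1;
        k++;
    }
    return k;
}

/* b^(2^k) mod m by k successive squarings.
 * m must be below 2^32 so that x * x cannot overflow 64 bits. */
uint64_t pow2k_mod(uint64_t b, int k, uint64_t m)
{
    uint64_t x = b % m;
    for (int i = 0; i < k; i++)
        x = (x * x) % m;
    return x;
}
```

For N=4194304 this is only 22 squarings. Each squaring doubles the length of the number, though, so nearly all of the time goes into the last few, which is where a fast multiply would help.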

Now, I could live with it taking 2 hours. After all, the actual primality test takes about 200 hours on a GTX 460. But, I know it can be done faster. It would take me, for example, a few minutes to write the code to use the gwnum library to do it, and that could do the calculations very quickly.

Of course, it's silly to link the gwnum library into a program that's already doing math like that on the GPU.

The programmer in me won't let me accept that 2 hours of processing when I know it can be eliminated.

Which brings me to the part about my having forgotten what I learned about Fourier transforms 35 years ago. GeneferCUDA already has the code in it to do all the real work to do the square. All I have to do is call FFTsquareFFT. It does a forward FFT, calls a CUDA kernel to do the complex multiply on the FFT result, and does an inverse FFT. The result is the square of the input.

The problem is, I'm not sure how to convert the input number (an array of integers) into the complex array that's the input to the FFT, nor how to get the resulting number back out. I might figure it out eventually, but I thought I'd ask because there's probably an incredibly simple answer.

Thanks,
Mike
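For reference, a generic textbook sketch of the conversion being asked about. Load one digit per complex element (real part = digit, imaginary part = 0), zero-pad to twice the length, transform, square pointwise, inverse transform, then scale, round and propagate carries. This is not GeneferCUDA's actual data layout, which the thread does not show. The base, the transform length, the naive 8-point DFT standing in for FFTsquareFFT, and all names are assumptions made for illustration.

```c
#define N 8        /* transform length: twice the digit count, zero-padded */
#define BASE 10    /* illustrative base, one digit per array element */

typedef struct { double re, im; } cplx;

/* cos(2*pi*t/8) and sin(2*pi*t/8), tabulated so no math library is needed. */
#define HALF_SQRT2 0.70710678118654752440
static const double COS8[N] = { 1, HALF_SQRT2, 0, -HALF_SQRT2, -1, -HALF_SQRT2, 0, HALF_SQRT2 };
static const double SIN8[N] = { 0, HALF_SQRT2, 1, HALF_SQRT2, 0, -HALF_SQRT2, -1, -HALF_SQRT2 };

/* Naive O(N^2) DFT; sign = -1 for the forward transform, +1 for the inverse. */
static void dft(const cplx *in, cplx *out, int sign)
{
    for (int k = 0; k < N; k++) {
        double re = 0.0, im = 0.0;
        for (int j = 0; j < N; j++) {
            int t = (j * k) % N;
            double c = COS8[t], s = sign * SIN8[t];
            re += in[j].re * c - in[j].im * s;
            im += in[j].re * s + in[j].im * c;
        }
        out[k].re = re;
        out[k].im = im;
    }
}

/* digits: little-endian base-BASE digits, ndigits <= N/2.
 * result: N little-endian digits of the square. */
void square_digits(const int *digits, int ndigits, int *result)
{
    cplx x[N], X[N], y[N];
    for (int i = 0; i < N; i++) {        /* integers -> complex */
        x[i].re = (i < ndigits) ? digits[i] : 0.0;
        x[i].im = 0.0;
    }
    dft(x, X, -1);
    for (int k = 0; k < N; k++) {        /* pointwise complex square */
        double re = X[k].re * X[k].re - X[k].im * X[k].im;
        double im = 2.0 * X[k].re * X[k].im;
        X[k].re = re;
        X[k].im = im;
    }
    dft(X, y, +1);
    long carry = 0;
    for (int i = 0; i < N; i++) {        /* complex -> integers */
        long v = (long)(y[i].re / N + 0.5) + carry;   /* scale by 1/N, round */
        result[i] = (int)(v % BASE);
        carry = v / BASE;
    }
}
```

Zero-padding to twice the digit count keeps the cyclic convolution from wrapping around, so the output is the ordinary square.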

msft 2012-01-05 00:18

Hi, rroonnaalldd
[QUOTE=rroonnaalldd;284769]Do you have any idea why your source produces 100% load on one CPU core?
[/QUOTE]
Please see.
[url]http://mersenneforum.org/showpost.php?p=284251&postcount=86[/url]
[url]http://mersenneforum.org/showpost.php?p=284284&postcount=87[/url]
[url]http://mersenneforum.org/showpost.php?p=284292&postcount=88[/url]
[url]http://mersenneforum.org/showpost.php?p=284300&postcount=89[/url]

msft 2012-01-05 01:10

Hi,
[QUOTE=AG5BPilot;284813]How do I convert an array of integers (which represents a very, very large integer) to and from the complex arrays that are the input and output of FFTsquareFFT?
[/QUOTE]
I think the right way is to implement a library.
[url]http://mersenneforum.org/showpost.php?p=248716&postcount=63[/url]
Or use something like this (GMP?).
llrCUDA has the same problem.

rroonnaalldd 2012-01-05 02:21

[QUOTE]Please see.[/QUOTE] Thanks for the links.


All times are UTC.