Shoichiro,
While looking for the benchmark problem, I think I found a small error in the code. Unless I misunderstand what you're trying to do, the IDX macro has a mistake:
[quote]#define IDX(i) ((((i)>>(SHIFT*2+2))<<(SHIFT*2+2))+(((i)&((2<<SHIFT)*(2<<SHIFT)-1))>>(SHIFT+1))+((i&((2<<SHIFT)-1))<<(SHIFT+1)))[/quote]
I think this should be:
[quote]#define IDX(i) ((((i)>>(SHIFT*2+2))<<(SHIFT*2+2))+(((i)&((2<<SHIFT)*(2<<SHIFT)-1))>>(SHIFT+1))+(([color=red][b]([/b][/color]i[color=red][b])[/b][/color]&((2<<SHIFT)-1))<<(SHIFT+1)))[/quote]
There are a few places where you use an expression as an argument to the IDX macro, such as these:
[quote] zx = g_z[IDX(i*2)];
zy = g_z[IDX(i*2+1)];
ex = g_e1[IDX(i*2)];
ey = g_e1[IDX(i*2+1)];[/quote]
The code still works because "&" has lower precedence than "*" and "+", so i*2+1&mask still groups as (i*2+1)&mask. But the macro would be safer with the extra ( ) in there. |
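A minimal, self-contained check of the precedence point above, written as a sketch: it assumes SHIFT = 8 (the value mentioned later in the thread) and copies both macro bodies from the post, renamed to the hypothetical IDX_ORIG and IDX_FIXED. The helper functions are also hypothetical and exist only to exercise the macros.

```c
#include <assert.h>

/* Assumption for illustration: SHIFT is fixed at 8. The macro bodies are
   copied from the post and only renamed. */
#define SHIFT 8

/* Original: the last occurrence of i is not wrapped in parentheses. */
#define IDX_ORIG(i) ((((i)>>(SHIFT*2+2))<<(SHIFT*2+2))+(((i)&((2<<SHIFT)*(2<<SHIFT)-1))>>(SHIFT+1))+((i&((2<<SHIFT)-1))<<(SHIFT+1)))

/* Fixed: every occurrence of i is parenthesized. */
#define IDX_FIXED(i) ((((i)>>(SHIFT*2+2))<<(SHIFT*2+2))+(((i)&((2<<SHIFT)*(2<<SHIFT)-1))>>(SHIFT+1))+(((i)&((2<<SHIFT)-1))<<(SHIFT+1)))

/* Arguments like i*2 and i*2+1 are safe with either macro, because
   multiplication and addition bind more tightly than &.
   Checks every i in [0, n). */
static void check_arith_args_agree(int n)
{
    for (int i = 0; i < n; i++) {
        assert(IDX_ORIG(i*2) == IDX_FIXED(i*2));
        assert(IDX_ORIG(i*2+1) == IDX_FIXED(i*2+1));
    }
}

/* An argument built with | is not safe: | binds more loosely than &,
   so in the last term the original macro computes hi | (lo & 511)
   instead of (hi | lo) & 511. */
static int idx_orig_or(int hi, int lo)  { return IDX_ORIG(hi | lo); }
static int idx_fixed_or(int hi, int lo) { return IDX_FIXED(hi | lo); }
```

With SHIFT = 8 the macro swaps the two 9-bit fields in the low 18 bits of i, so idx_fixed_or(512, 0) returns 1, while idx_orig_or(512, 0) returns 262145 because the 512 bit escapes the mask in the last term.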
Hi, AG5BPilot
Thank you for your review. I'll fix it in the next version. |
Okay, I fixed this, but I can't find a solution for the following:
[QUOTE]GeneferCUDA.cu: In function ‘void check(Int32, UInt32, char*)’:
GeneferCUDA.cu:651: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘long int’
GeneferCUDA.cu:651: warning: format ‘%02d’ expects type ‘int’, but argument 6 has type ‘long int’
GeneferCUDA.cu:651: warning: format ‘%02d’ expects type ‘int’, but argument 7 has type ‘long int’[/QUOTE]
Code::Blocks points me to line 493:
[QUOTE]static void check(const Int32 b, const UInt32 m, char *expectedResidue)[/QUOTE] |
Change
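[CODE] sprintf(str2, " (%d digits) (err = %.4f) (time = (long)%d:(long)%02d:(long)%02d) %.8s\n", dgts, maxErr, hours, minutes, seconds, asctime(today)+11); [/CODE]
to
[CODE] sprintf(str2, " (%d digits) (err = %.4f) (time = %ld:%02ld:%02ld) %.8s\n", dgts, maxErr, hours, minutes, seconds, asctime(today)+11); [/CODE]
The variables hours, minutes and seconds are long int, so they need %ld and %02ld. The "(long)" text inside the format string is just printed literally; it doesn't cast anything. |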
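The corrected call can be exercised in isolation. This sketch wraps the format string from the post in a hypothetical format_status() helper; snprintf replaces sprintf only to bound the buffer, and the hms parameter stands in for asctime(today)+11 (asctime()'s fixed-format output puts the hh:mm:ss field at offset 11). Casting the arguments instead, e.g. (int)hours with %d, would also silence the warnings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper around the corrected format string from the post.
   hours, minutes and seconds are long (GCC reported arguments 5-7 of the
   original sprintf call as 'long int'), so they use %ld and %02ld.
   hms stands in for asctime(today)+11; %.8s prints at most 8 chars. */
static int format_status(char *buf, size_t size, int dgts, double maxErr,
                         long hours, long minutes, long seconds,
                         const char *hms)
{
    return snprintf(buf, size,
                    " (%d digits) (err = %.4f) (time = %ld:%02ld:%02ld) %.8s\n",
                    dgts, maxErr, hours, minutes, seconds, hms);
}

/* Self-check with made-up sample values. */
static void check_format_status(void)
{
    char buf[128];
    format_status(buf, sizeof buf, 51636, 0.382, 1L, 2L, 3L, "05:55:00 2012\n");
    assert(strcmp(buf,
                  " (51636 digits) (err = 0.3820) (time = 1:02:03) 05:55:00\n") == 0);
}
```

With matching conversions, GCC's format checking has nothing to warn about, and %02ld keeps the zero padding for minutes and seconds.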
[QUOTE=msft;283766]
Is GeneferCUDA Ver 1.03 correct?[/QUOTE]
Shoichiro,

I just realized I didn't understand your question correctly. (That was my fault. Your English was fine!) I went back and tested 1.03. It has the same problem. Here are the results.
[quote]C:\GeneferCUDA test\genefercuda.1.03>GeneferCUDA-X32.exe -b
GeneferCUDA 2.2.1 (CUDA) based on Genefer v2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009, 2011 Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010, 2011, Shoichiro Yamada (CUDA)
A program for finding large probable generalized Fermat primes.

Generalized Fermat Number Bench
2009574^8192+1 Time: 398 us/mul. Err: 3.82e-001 51636 digits
1632282^16384+1 Time: 420 us/mul. Err: 2.53e-001 101791 digits
1325824^32768+1 Time: 451 us/mul. Err: 1.88e-001 200622 digits
1076904^65536+1 Time: 590 us/mul. Err: 1.72e-001 395325 digits
874718^131072+1 Time: 716 us/mul. Err: 3.47e-001 778813 digits
710492^262144+1 Time: 944 us/mul. Err: 4.21e-001 1533952 digits
577098^524288+1 Time: 1.51 ms/mul. Err: 2.01e-001 3020555 digits
468750^1048576+1 Time: 2.31 ms/mul. Err: 1.56e-001 5946413 digits
380742^2097152+1 Time: 230 us/mul. Err: 3.63e-001 11703432 digits
309258^4194304+1 Time: 324 us/mul. Err: 1.48e-001 23028076 digits
251196^8388608+1 Time: 273 us/mul. Err: 1.41e-001 45298590 digits[/quote]
Same problem (the last three timings, from 380742^2097152+1 on, are far too low), which isn't surprising. Forcing SHIFT to stay at 8 fixes the problem.

I'm currently running 4000^2097152+1 through GeneferCUDA 1.04 to see if the residue matches GeneferCUDA 0.99. This will take a few days. I should know the answer in 2012.

In other news, I've got the Windows BOINC development environment running, including the sample BOINC-CUDA application. I'm going to try turning GeneferCUDA into a native BOINC application. This may be an adventure. I'll let you know how this goes. ;-) |
Hi, AG5BPilot
I changed the code around "SHIFT". |
1.042:
[quote]Generalized Fermat Number Bench
2009574^8192+1 Time: 396 us/mul. Err: 3.82e-001 51636 digits
1632282^16384+1 Time: 419 us/mul. Err: 2.53e-001 101791 digits
1325824^32768+1 Time: 451 us/mul. Err: 1.88e-001 200622 digits
1076904^65536+1 Time: 589 us/mul. Err: 1.72e-001 395325 digits
874718^131072+1 Time: 715 us/mul. Err: 3.47e-001 778813 digits
710492^262144+1 Time: 943 us/mul. Err: 4.21e-001 1533952 digits
577098^524288+1 Time: 1.5 ms/mul. Err: 2.01e-001 3020555 digits
468750^1048576+1 Time: 2.31 ms/mul. Err: 1.56e-001 5946413 digits
[color=red]380742^2097152+1 Time: 232 us/mul. Err: 3.63e-001 11703432 digits
309258^4194304+1 Time: 320 us/mul. Err: 1.48e-001 23028076 digits
251196^8388608+1 Time: 281 us/mul. Err: 1.41e-001 45298590 digits[/color][/quote]
It doesn't fix the problem in the benchmark. I'm still running a test to see if the calculations are OK, but that's going to take a few more days. |
Shoichiro,
Both 1.04 and 1.042 do the same thing, but the code in 1.04 is cleaner. If I can't figure out why it behaves differently, I'll make the Windows build force SHIFT to be 8. That's easier to do in 1.04. I like 1.04 better.
Mike |
Hi, AG5BPilot
I think I fixed the issue. The problem was using the CPU-time clock; this version switches to the wall clock. It also reduces CPU usage (from 100% to 20%). Happy New Year! |
[QUOTE=msft;284251]Hi, AG5BPilot
I think I fixed the issue. The problem was using the CPU-time clock; this version switches to the wall clock. It also reduces CPU usage (from 100% to 20%). Happy New Year![/QUOTE]
Happy New Year, Shoichiro!

Unfortunately, I just tried 1.045 and something is very wrong with it. The GPU is running at less than 60% utilization at N=8192, and of course it's taking a lot longer to run the benchmarks. The utilization does go up as N increases.

The PRP test on 2030234^8192+1 takes about 1:10 on previous versions of GeneferCUDA. It took 2:01 with v1.045.
Mike |
[QUOTE=msft;284251]Hi, AG5BPilot
I think I fixed the issue. The problem was using the CPU-time clock; this version switches to the wall clock. It also reduces CPU usage (from 100% to 20%). Happy New Year![/QUOTE]
Shoichiro,

I just did a bunch of research. This is somewhat convoluted. You were using clock(), which is *supposed* to return the CPU time. That made perfect sense in the CPU versions of Genefer, but not in the CUDA version, where most of the work runs on the GPU rather than the CPU. Therefore, switching to time() is the correct choice.

BUT.... The Microsoft implementation of clock() actually returns the WALL TIME. Their documentation is contradictory: one page of the same document says "CPU TIME" and another says "WALL TIME". I just tested it, and it does indeed report WALL TIME. (This is with the Visual Studio 2005 compiler, although I've seen identical benchmark timings from versions of GeneferCUDA compiled with lots of compilers, so I suspect they all report WALL TIME.)

So, switching to time() is the correct thing to do, but in practice it doesn't change anything on Windows because wall-clock time was being used anyway.

By the way, even though the changes in 1.045 greatly slowed down GeneferCUDA, the benchmarks are now working:
[quote]Generalized Fermat Number Bench
2009574^8192+1 Time: 702 us/mul. Err: 3.82e-001 51636 digits
1632282^16384+1 Time: 717 us/mul. Err: 2.53e-001 101791 digits
1325824^32768+1 Time: 763 us/mul. Err: 1.88e-001 200622 digits
1076904^65536+1 Time: 977 us/mul. Err: 1.72e-001 395325 digits
874718^131072+1 Time: 1.1 ms/mul. Err: 3.47e-001 778813 digits
710492^262144+1 Time: 1.46 ms/mul. Err: 4.21e-001 1533952 digits
577098^524288+1 Time: 2.44 ms/mul. Err: 2.01e-001 3020555 digits
468750^1048576+1 Time: 3.91 ms/mul. Err: 1.56e-001 5946413 digits
380742^2097152+1 Time: 7.81 ms/mul. Err: 3.63e-001 11703432 digits
309258^4194304+1 Time: 19.5 ms/mul. Err: 1.48e-001 23028076 digits
251196^8388608+1 Time: 39.1 ms/mul. Err: 1.41e-001 45298590 digits[/quote] |
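The CPU-time versus wall-time distinction can be seen with the two standard timers side by side. This is a sketch, not GeneferCUDA's code: the helper names are hypothetical, time() typically has only one-second resolution (the C standard leaves the encoding of time_t unspecified, which is why difftime() is used), and on Microsoft's runtime clock() behaves as a wall clock, as described in the post.

```c
#include <time.h>

/* Processor time used by this process since `start`, per the C standard's
   definition of clock(). Time spent blocked (for example, waiting on the
   GPU without spinning) is not counted, except on Microsoft's CRT, whose
   clock() reports elapsed wall time. */
static double cpu_seconds_since(clock_t start)
{
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Elapsed wall-clock time since `start`. difftime() is used because the
   encoding of time_t is implementation-defined. */
static double wall_seconds_since(time_t start)
{
    return difftime(time(NULL), start);
}

/* Hypothetical helper: the benchmark figure in microseconds per
   multiplication, given elapsed seconds and a multiplication count. */
static double us_per_mul(double elapsed_seconds, long muls)
{
    return elapsed_seconds * 1e6 / (double)muls;
}
```

A benchmark loop would record time(NULL) before the multiplications and pass wall_seconds_since(start) to us_per_mul(); with one-second resolution the loop has to run for several seconds to give a stable figure.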