I get an infinite loop at err = 2 with exponent 110503: the program claims to increase n from 16384 but never actually changes n. I am using a GTX 280.
|
Does anyone feel like converting this to LLR? I am sure the author of LLR wouldn't mind if you copied some of his code, with permission.
|
Hi, wavelet3000
I need more information: your OS and CUDA version. Also, please test version "K". |
Is there a Windows build available somewhere?
|
1 Attachment(s)
Hi,
Version "O" fixes the infinite loop. |
On version K, I get "too small exponent" for 110503.
Both versions N and O still give an infinite loop, as follows:
[code]
Iteration 10000 M( 110503 )C, 0xffff0003ffffc07e, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 20000 M( 110503 )C, 0x0000fe03f8003f7d, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 30000 M( 110503 )C, 0x1fc0fe0007e0007e, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 40000 M( 110503 )C, 0xe03ffe03ffffc07e, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 50000 M( 110503 )C, 0x1fc0fffc00003f7e, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 60000 M( 110503 )C, 0x1fc0000007dfbf7e, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 70000 M( 110503 )C, 0x0000fe0007dfbf7d, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 80000 M( 110503 )C, 0x1fc001fc07e0007e, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 90000 M( 110503 )C, 0x1fff01fc0000007e, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 100000 M( 110503 )C, 0x1fff0003ffe03f7d, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 110000 M( 110503 )C, 0xffff01fff800007e, n = 16384, MacLucasFFTW v8.1 Ballester
err = 2, increasing n from 16384
Iteration 10000 M( 110503 )C, 0xffff0003ffffc07e, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 20000 M( 110503 )C, 0x0000fe03f8003f7d, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 30000 M( 110503 )C, 0x1fc0fe0007e0007e, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 40000 M( 110503 )C, 0xe03ffe03ffffc07e, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 50000 M( 110503 )C, 0x1fc0fffc00003f7e, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 60000 M( 110503 )C, 0x1fc0000007dfbf7e, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 70000 M( 110503 )C, 0x0000fe0007dfbf7d, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 80000 M( 110503 )C, 0x1fc001fc07e0007e, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 90000 M( 110503 )C, 0x1fff01fc0000007e, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 100000 M( 110503 )C, 0x1fff0003ffe03f7d, n = 16384, MacLucasFFTW v8.1 Ballester
Iteration 110000 M( 110503 )C, 0xffff01fff800007e, n = 16384, MacLucasFFTW v8.1 Ballester
err = 2, increasing n from 16384
[/code]
The OS is 64-bit Linux with 64-bit CUDA 2.3. Could this be due to 64-bit/32-bit issues? I also get warnings during compilation:
[code]
setup.h(191): warning: omission of exception specification is incompatible with previous function "rename"
/usr/include/stdio.h(159): here
setup.h(200): warning: omission of exception specification is incompatible with previous function "sscanf"
/usr/include/stdio.h(413): here
setup.h(275): warning: omission of exception specification is incompatible with previous function "setvbuf"
/usr/include/stdio.h(313): here
setup.cu(1): warning: variable "RCSsetup_c" was declared but never referenced
setup.cu(5): warning: variable "RCSsetup_h" was declared but never referenced
setup.h(191): warning: omission of exception specification is incompatible with previous function "rename"
/usr/include/stdio.h(159): here
setup.h(200): warning: omission of exception specification is incompatible with previous function "sscanf"
/usr/include/stdio.h(413): here
setup.h(275): warning: omission of exception specification is incompatible with previous function "setvbuf"
/usr/include/stdio.h(313): here
setup.cu(1): warning: variable "RCSsetup_c" was declared but never referenced
setup.cu(5): warning: variable "RCSsetup_h" was declared but never referenced
[/code]
Thank you! |
Okay, this is definitely a 32-bit vs. 64-bit issue. When I installed the 32-bit CUDA toolkit and recompiled against it, the problem disappeared. In fact, I now notice that the 64-bit build, even when it didn't fall into the infinite-loop trap, still gave incorrect results: for example, it declared M44497 composite.
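A result like "M44497 is composite" can be cross-checked against an independent, much slower reference. The following is a minimal sketch of the textbook Lucas-Lehmer test using Python's built-in big integers; it is not the MacLucasFFTW code, the function name `lucas_lehmer` is made up for illustration, and it assumes the exponent passed in is already known to be prime.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if the Mersenne number 2**p - 1 is prime.

    Assumes p is itself a prime exponent (hypothetical helper for
    cross-checking results, not part of MacLucasFFTW).
    """
    if p == 2:
        return True  # M2 = 3 is prime; the loop below needs p > 2
    m = (1 << p) - 1  # the Mersenne number 2**p - 1
    s = 4             # standard starting value of the sequence
    for _ in range(p - 2):
        s = (s * s - 2) % m  # one squaring step per iteration
    return s == 0


if __name__ == "__main__":
    # Small known cases: M7, M13 and M521 are prime, M11 and M23 are not.
    for p in (7, 11, 13, 23, 521):
        print(p, lucas_lehmer(p))
```

M44497 is a known Mersenne prime, so `lucas_lehmer(44497)` should return `True`; in pure Python that call takes noticeably longer than the small cases above.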
|
Hello all, I am getting closer to assembling my new hardware to test this. It will take about two more weeks, but there are some things I need cleared up first.
I intend to run this on Ubuntu Server Edition 9.10. Is there a benefit to using the 64-bit edition over the 32-bit one? Would some of the instructions the software uses execute faster under 64-bit? I'm only going to run LL tests. Also, do I need to install both the proprietary Nvidia drivers and the CUDA toolkit, or would the CUDA toolkit be enough? And if anyone is familiar with Ubuntu, which packages must I install to compile and run this software? Thanks. |
[quote=wavelet3000;214124] for example, M44497 a composite.[/quote]
Your system may have a problem. :sad: |
[quote=odin;214128]Also if anyone is familiar with Ubuntu what packages must I install to compile and run this software? [/quote]
[URL]http://developer.nvidia.com/object/cuda_3_0_downloads.html[/URL] You need the Developer Drivers for Linux, the CUDA Toolkit for Ubuntu Linux 9.04, the GPU Computing SDK code samples, and more. |
1 Attachment(s)
The GTX 480 arrived and is installed. However, a bug in the 64-bit CUDA 3.0 Linux driver prevented me from running 32-bit CUDA runtime binaries, and of course the code is still not 64-bit safe. So I converted version "O" from the runtime API to the driver API, now that you can use runtime-style kernel calls with the driver API. The converted MacLucasFFTW.cu file is attached. It will only work with CUDA 3.0, not with 2.3. Note that I had to remove a bunch of safe call commands and didn't replace them with proper error checking, but I just wanted it to work! :smile: It worked correctly with a number of small known primes.
Anyway, bottom line. The 2048K FFT runs at 5.47 ms/iteration, and the 4096K FFT runs at 10.4 ms/iteration on a GTX 480. As expected, they have disabled DP units in the consumer card, so DP still runs at 1/8 SP. The speed increase is due to the increase in the number of compute cores. The Tesla version, when it is released, should run at about 4x this speed. As a check of both the code and the hardware, I'm going to run the test of 42643801 to completion. That should take a bit over 5 days. |
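The "bit over 5 days" estimate can be reproduced from the timings quoted above. This is a rough sketch, assuming a Lucas-Lehmer test of exponent p takes p - 2 iterations and that the per-iteration time stays constant; the post does not say which FFT length the run uses, so both are shown, and the helper name `ll_runtime_days` is made up for illustration.

```python
def ll_runtime_days(exponent: int, ms_per_iter: float) -> float:
    """Estimate wall-clock days for one LL test (hypothetical helper).

    Assumes exponent - 2 iterations at a constant ms_per_iter.
    """
    iterations = exponent - 2
    seconds = iterations * ms_per_iter / 1000.0
    return seconds / 86400.0  # 86400 seconds per day


if __name__ == "__main__":
    # Timings are the GTX 480 figures quoted in the post.
    for label, ms in (("2048K", 5.47), ("4096K", 10.4)):
        print(f"{label}: {ll_runtime_days(42643801, ms):.2f} days")
    # prints:
    # 2048K: 2.70 days
    # 4096K: 5.13 days
```

Only the 4096K timing gives a figure a bit over 5 days, which suggests (but does not confirm) that the 42643801 run uses the 4096K FFT.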