|
|
#122 |
|
Banned
"Luigi"
Aug 2002
Team Italia
5×7×139 Posts |
Here it is
http://www.moregimps.it/billion/expo_f.php
Let me know if you need it in some particular format. BTW, 5 minutes from 2^50 to 2^71 is amazing!
Luigi |
|
|
|
|
|
#123 |
|
"Oliver"
Mar 2005
Germany
5×223 Posts |
Hi Luigi,
that's exactly what I was looking for. :) I had this one in my bookmarks: http://home.earthlink.net/~elevensmooth/Billion.html
Oliver
Last fiddled with by TheJudger on 2010-02-05 at 13:33 |
|
|
|
|
|
#124 |
|
Sep 2008
Kansas
7561₈ Posts |
TheJudger,
Does this require a 64-bit CUDA library? I'm on a 64-bit OS (Mac, Snow Leopard), but NVIDIA does not yet offer 64-bit Mac libraries (they are still 32-bit). I noticed an old post of yours in the Math forum asking questions about 32-bit integers.
RichD. |
|
|
|
|
|
#125 |
|
"Oliver"
Mar 2005
Germany
5×223 Posts |
Hi RichD,
I'm running in 64-bit mode, but I think msft did some tests on a 32-bit Ubuntu. The size of the address space doesn't affect the size of the data types (usually ;)). E.g. a 32-bit OS has 64-bit ints as well, and on x86 you have 80-bit floats, too. It _SHOULD_ run on a 32-bit OS, but I haven't checked. If I remember correctly, the siever runs a bit slower in 32-bit mode. :/
Oliver |
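To illustrate the point about address width versus data-type width, here is a minimal C sketch. It is not taken from mfaktc; the helper names `mul_32x32`, `long_bits` and `check_widths` are made up for this example. It shows that the fixed-width types from `<stdint.h>` behave the same on 32-bit and 64-bit builds, while plain `long` is one of the types that does change.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Fixed-width types are the same size whether the build is 32-bit or 64-bit. */
_Static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "uint32_t must be 32 bits");
_Static_assert(sizeof(uint64_t) * CHAR_BIT == 64, "uint64_t must be 64 bits");

/* Full 32x32 -> 64-bit product; valid even when pointers are only 32 bits wide.
   (Hypothetical helper, for illustration only.) */
static uint64_t mul_32x32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * (uint64_t)b;
}

/* Width of `long` in bits. Unlike uint64_t, this depends on the platform ABI. */
static unsigned long_bits(void)
{
    return (unsigned)(sizeof(long) * CHAR_BIT);
}

/* Runtime check of the two points above. */
static void check_widths(void)
{
    /* (2^32 - 1)^2 = 2^64 - 2^33 + 1 */
    assert(mul_32x32(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFE00000001ULL);
    /* The C standard only guarantees that long has at least 32 bits. */
    assert(long_bits() >= 32);
}
```

Calling `check_widths()` passes on both kinds of build. `long_bits()` returns 32 on 32-bit Linux and on 64-bit Windows, and 64 on 64-bit Linux and macOS, so code that needs exactly 64 bits is safer with `uint64_t` than with `long`.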
|
|
|
|
|
#126 | |
|
Jan 2008
France
3·199 Posts |
Quote:
OTOH, I can't say if your program uses long :) |
|
|
|
|
|
|
#127 |
|
"Oliver"
Mar 2005
Germany
2133₈ Posts |
Hi!
ldesnogu: right, I've forgotten "long". :/
-----
The (not yet released) 0.05 is faster again. Raw speed on my GTX 275: ~73M candidates per second for M66362159 above 64 bits.
Single Process
THREADS_PER_GRID: 30 * 2^15 # this is specific for my GTX 275 since it has 30 multiprocessors
THREADS_PER_BLOCK: 256
SIEVE_PRIMES: 22500 # siever becomes limiting on my system... again
M66362159 from 2^ 1 to 2^64: 98386msec
M66362159 from 2^64 to 2^65: 90711msec
M66362159 from 2^65 to 2^66: 177915msec
M66362159 from 2^66 to 2^67: 353126msec
-----
No more ptx hacking needed! Code:
__device__ unsigned int __umul24hi(unsigned int a, unsigned int b)
{
    /* Returns bits 47..16 of the 48-bit product of the low 24 bits of a and b. */
    unsigned int r;
    asm("mul24.hi.u32 %0, %1, %2;" : "=r" (r) : "r" (a), "r" (b));
    /* _SLOW_ workaround if inline assembly above doesn't work (e.g. device emulation) */
    // r = (__umul24(a,b) >> 16) + (__umulhi(a&0xFFFFFF,b&0xFFFFFF)<<16);
    return r;
}
AFAIK inline assembly is an unsupported feature of nvcc, but I think it is better this way.
Oliver |
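For anyone who wants to convince themselves that the commented-out workaround matches `mul24.hi.u32`, here is a host-side C sketch. It assumes the documented semantics of the instruction and the intrinsics: `mul24.hi` returns the high 32 bits of the 48-bit product, `__umul24` the low 32 bits of the 24x24-bit product, and `__umulhi` the high 32 bits of the 32x32-bit product. All function names ending in `_host`, `_ref` or `_workaround` are stand-ins invented for this example; nothing here runs on the GPU.

```c
#include <assert.h>
#include <stdint.h>

/* Host stand-in for CUDA's __umul24: low 32 bits of the 24x24-bit product. */
static uint32_t umul24_host(uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)(a & 0xFFFFFFu) * (uint64_t)(b & 0xFFFFFFu);
    return (uint32_t)p;
}

/* Host stand-in for CUDA's __umulhi: high 32 bits of the 32x32-bit product. */
static uint32_t umulhi_host(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* Reference for mul24.hi.u32: bits 47..16 of the 48-bit product. */
static uint32_t umul24hi_ref(uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)(a & 0xFFFFFFu) * (uint64_t)(b & 0xFFFFFFu);
    return (uint32_t)(p >> 16);
}

/* The workaround expression from the post, rebuilt from the stand-ins. */
static uint32_t umul24hi_workaround(uint32_t a, uint32_t b)
{
    return (umul24_host(a, b) >> 16)
         + (umulhi_host(a & 0xFFFFFFu, b & 0xFFFFFFu) << 16);
}

/* Counts disagreements over a deterministic sweep of operands. */
static int count_mismatches(void)
{
    int bad = 0;
    uint32_t a = 1u, b = 0xFFFFFFu;
    for (int i = 0; i < 100000; ++i) {
        if (umul24hi_ref(a, b) != umul24hi_workaround(a, b))
            ++bad;
        a = a * 1664525u + 1013904223u;   /* simple LCG steps to vary the inputs */
        b = b * 22695477u + 1u;
    }
    return bad;
}

/* Aborts if the two formulations ever disagree. */
static void check_workaround(void)
{
    assert(count_mismatches() == 0);
}
```

The two halves never overlap: the first term contributes bits 31..16 of the product to the low half of the result, and the second term contributes bits 47..32 to the high half, so the sum needs no carry handling.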
|
|
|
|
|
#128 |
|
Banned
"Luigi"
Aug 2002
Team Italia
4865₁₀ Posts |
I hope you will include a short HOWTO_compile.txt in your next release...
Luigi |
|
|
|
|
|
#129 |
|
"Oliver"
Mar 2005
Germany
10001011011₂ Posts |
Hi Luigi,
for the upcoming 0.05 on Linux with a properly installed CUDA Toolkit: Code:
./compile.sh
Oliver |
|
|
|
|
|
#130 |
|
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
3×23×89 Posts |
|
|
|
|
|
|
#131 |
|
"Oliver"
Mar 2005
Germany
1115₁₀ Posts |
Hi,
good news: Yesterday I added more than 200 known factors to the selftest. Every single factor was verified using my code. :) In some cases it misses factors when there are multiple factors in one class close together, but this is not critical. This has been a known problem since the first version... It has nothing to do with the calculations themselves; it is just how the results are returned from the GPU to the CPU.
-----
Raw speed on my GTX 275 for M66362159 above 64 bits: ~74M candidates per second. The siever received a nice performance improvement for free by adding "-funroll-all-loops" to the gcc options. :) (only useful for CPU-limited scenarios)
Oliver
Last fiddled with by TheJudger on 2010-02-11 at 08:54 |
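As a rough idea of what verifying a known factor involves, here is a small C sketch. It is not mfaktc's code: mfaktc handles candidates above 64 bits with multi-word arithmetic on the GPU, while this toy version assumes the candidate q fits in 32 bits so that every product fits in a `uint64_t`. The function names `pow2_mod` and `is_factor_of_mersenne` are invented for illustration. The check itself is the standard one: q divides M_p = 2^p - 1 exactly when 2^p mod q equals 1.

```c
#include <assert.h>
#include <stdint.h>

/* Computes 2^p mod q by left-to-right binary exponentiation.
   Requires q != 0. With q < 2^32 every intermediate product fits in 64 bits. */
static uint32_t pow2_mod(uint32_t p, uint32_t q)
{
    uint64_t r = 1u % q;
    for (int bit = 31; bit >= 0; --bit) {
        r = (r * r) % q;              /* square for the next exponent bit */
        if ((p >> bit) & 1u)
            r = (r * 2u) % q;         /* multiply by the base 2 when the bit is set */
    }
    return (uint32_t)r;
}

/* Returns 1 if q divides M_p = 2^p - 1, else 0. */
static int is_factor_of_mersenne(uint32_t p, uint32_t q)
{
    return q > 1u && pow2_mod(p, q) == 1u;
}

/* Small selftest in the spirit of the post, using textbook factors. */
static void selftest(void)
{
    assert(is_factor_of_mersenne(11u, 23u));    /* M11 = 2047 = 23 * 89 */
    assert(is_factor_of_mersenne(11u, 89u));
    assert(is_factor_of_mersenne(23u, 47u));    /* M23 = 47 * 178481 */
    assert(!is_factor_of_mersenne(11u, 29u));   /* 29 does not divide M11 */
}
```

The examples also show the usual shape of the candidates: any factor of M_p has the form 2kp+1, e.g. 23 = 2·1·11+1 and 89 = 2·4·11+1.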
|
|
|
|
|
#132 |
|
Banned
"Luigi"
Aug 2002
Team Italia
5×7×139 Posts |
Luigi |
|
|
|