mersenneforum.org  

Old 2010-02-05, 10:13   #122
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

5×7×139 Posts

Here it is

http://www.moregimps.it/billion/expo_f.php

Let me know if you need it in some particular format.

BTW, 5 minutes from 2^50 to 2^71 is amazing!

Luigi
Old 2010-02-05, 13:32   #123
TheJudger
 
"Oliver"
Mar 2005
Germany

5×223 Posts

Hi Luigi,

that's exactly what I was looking for. :)

I had this one in my bookmarks: http://home.earthlink.net/~elevensmooth/Billion.html

Oliver

Last fiddled with by TheJudger on 2010-02-05 at 13:33
Old 2010-02-06, 01:03   #124
RichD
 
Sep 2008
Kansas

7561₈ Posts

TheJudger,

Does this require a 64-bit CUDA library? I'm on a 64-bit OS (Mac - Snow Leopard) but Nvidia does not offer 64-bit Mac libraries yet (they are still 32-bit). I noticed an old post of yours in the Math forum asking questions about 32-bit integers.

RichD.
Old 2010-02-06, 16:24   #125
TheJudger
 
"Oliver"
Mar 2005
Germany

5×223 Posts

Hi RichD,

I'm running in 64-bit, but I think that msft did some tests on a 32-bit Ubuntu.
The size of the address space doesn't affect the size of the datatypes (usually ;)).
E.g. on a 32-bit OS there are 64-bit ints as well. And on x86 you have 80-bit floats, too.

It _SHOULD_ run on a 32-bit OS but I haven't checked. If I remember correctly the siever runs a bit slower in 32-bit mode. :/

Oliver
Old 2010-02-07, 12:06   #126
ldesnogu
 
Jan 2008
France

3·199 Posts

Quote:
Originally Posted by TheJudger View Post
The size of the address space doesn't affect the size of the datatypes (usually ;)).
long is 32 bits on both 32-bit and 64-bit Windows (LLP64), while on Linux it is 32 bits on 32-bit systems and 64 bits on 64-bit systems (LP64).
OTOH, I can't say if your program uses long :)
Old 2010-02-09, 08:51   #127
TheJudger
 
"Oliver"
Mar 2005
Germany

2133₈ Posts

Hi!

ldesnogu: right, I forgot about "long". :/
-----
The (not yet released) 0.05 is faster again. Raw speed on my GTX 275: ~73M candidates per second for M66362159 above 64 bits.

Single Process
THREADS_PER_GRID: 30 * 2^15 # this is specific for my GTX 275 since it has 30 multiprocessors
THREADS_PER_BLOCK: 256
SIEVE_PRIMES: 22500 # siever becomes limiting on my system... again

M66362159 from 2^ 1 to 2^64: 98386msec
M66362159 from 2^64 to 2^65: 90711msec
M66362159 from 2^65 to 2^66: 177915msec
M66362159 from 2^66 to 2^67: 353126msec
-----
No more ptx hacking needed!
Code:
/* Upper 32 bits (bits 47..16) of the 48-bit product of the low 24 bits of a and b. */
__device__ unsigned int __umul24hi(unsigned int a, unsigned int b)
{
  unsigned int r;
  asm("mul24.hi.u32 %0, %1, %2;" : "=r" (r) : "r" (a), "r" (b));
  /* _SLOW_ workaround if the inline assembly above doesn't work (e.g. device emulation): */
  //  r = (__umul24(a,b) >> 16) + (__umulhi(a&0xFFFFFF,b&0xFFFFFF)<<16);
  return r;
}
This works fine on Linux with CUDA toolkit 2.3. Hopefully this is possible on Windows, too.
AFAIK inline assembly is an unsupported feature of nvcc, but I think it is better this way.

Oliver
Old 2010-02-09, 11:39   #128
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

4865₁₀ Posts

I hope you will include a short HOWTO_compile.txt in your next release...

Luigi
Old 2010-02-09, 12:23   #129
TheJudger
 
"Oliver"
Mar 2005
Germany

10001011011₂ Posts

Hi Luigi,

for the upcoming 0.05 on Linux with a properly installed CUDA Toolkit:
Code:
./compile.sh
;)

Oliver
Old 2010-02-09, 17:46   #130
henryzz
Just call me Henry
 
"David"
Sep 2007
Liverpool (GMT/BST)

3×23×89 Posts

Quote:
Originally Posted by TheJudger View Post
Hi Luigi,

for the upcoming 0.05 on Linux with a properly installed CUDA Toolkit:
Code:
./compile.sh
;)

Oliver
Have you changed the path to the more standard destination then?
Old 2010-02-11, 08:53   #131
TheJudger
 
"Oliver"
Mar 2005
Germany

1115₁₀ Posts

Hi,

good news:
Yesterday I added more than 200 known factors to the selftest.
Every single factor was verified using my code. :)
In some cases it misses factors when there are multiple factors in one class close together, but this is not critical. This has been a known problem since the first version... It has nothing to do with the calculations themselves; it is just how the results are returned from the GPU to the CPU.
-----
Raw speed on my GTX 275 for M66362159 above 64 bits: ~74M candidates per second.

The siever received a nice performance improvement for free by adding "-funroll-all-loops" to the gcc options. :) (only useful for CPU-limited scenarios)

Oliver

Last fiddled with by TheJudger on 2010-02-11 at 08:54
Old 2010-02-11, 21:16   #132
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

5×7×139 Posts



Luigi
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
