mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing

Reply
 
Thread Tools
Old 2010-02-12, 06:40   #133
S485122
 
S485122's Avatar
 
"Jacob"
Sep 2006
Brussels, Belgium

2×977 Posts
Default

Quote:
Originally Posted by TheJudger View Post
In some cases it misses factors when there are mutiple factors in one class close together but this is not critical. That is a known problem since the first version... This has nothing to do with the calculations itself, it is just how the results are returned from the GPU to the CPU.
But it does mean that, even if those cases are extremely (?) rare, the program can not yet be used to do trial factoring for GIMPS.

Jacob
S485122 is offline   Reply With Quote
Old 2010-02-12, 07:06   #134
ATH
Einyen
 
ATH's Avatar
 
Dec 2003
Denmark

1101011111002 Posts
Default

If if only happens with multiple factors and at least one of the factors is always found it doesn't matter. We only need one factor to skip doing LL for that exponent.
ATH is offline   Reply With Quote
Old 2010-02-12, 13:14   #135
TheJudger
 
TheJudger's Avatar
 
"Oliver"
Mar 2005
Germany

111510 Posts
Default

Hi,

it can happen if there is more than one factor found at the same time. On my GTX 275 there are 30720 (30 "multiprocessors" * 256 threads per block * 4 blocks per multiprocessor) threads "in flight" at the same time. The problem is that I haven't written code to handle access to the result array. I need some kind of locking and/or atomic read/writes.

S485122: did you know that prime95 missed some factors asweel?

Oliver

Last fiddled with by TheJudger on 2010-02-12 at 13:15
TheJudger is offline   Reply With Quote
Old 2010-02-12, 13:15   #136
TimSorbet
Account Deleted
 
TimSorbet's Avatar
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

11×389 Posts
Default

IIRC Prime95 always looks for the smallest factor in the bit range. If George considers that important to be a GIMPS-contributing TF algorithm/program, and this program isn't guaranteed to find the smallest of the factors, then that might be a roadblock.
TimSorbet is offline   Reply With Quote
Old 2010-02-12, 14:34   #137
kjaget
 
kjaget's Avatar
 
Jun 2005

100000012 Posts
Default

Quote:
Originally Posted by TheJudger View Post
Siever received a nice performace improvement for free by adding "-funroll-all-loops" to the gcc options. :) (only useful for CPU-limited scenarios)

Oliver
If you're looking for a set of useful optimizations, here's what I came up with for gcc 4.4.0 :

g++ -O1 -fno-dce -fno-ipa-reference -fno-split-wide-types -fno-tree-dominator-opts -fno-tree-copyrename -fno-tree-loop-optimize -momit-leaf-frame-pointer -fno-tree-sink -fno-inline-functions-called-once -frerun-cse-after-loop -ftree-pre -fipa-cp -freorder-blocks -fipa-cp-clone -fsee -mtune=nocona -msse4.1

This gained me ~20% over default -O2 or -O3 settings.

I'm testing on my Core2Duo laptop, so obviously results will be different for different CPUs and even different versions of the compilers. Still, give it a shot and see if it helps.
kjaget is offline   Reply With Quote
Old 2010-02-12, 15:56   #138
davieddy
 
davieddy's Avatar
 
"Lucan"
Dec 2006
England

2×3×13×83 Posts
Default

Quote:
Originally Posted by Mini-Geek View Post
IIRC Prime95 always looks for the smallest factor in the bit range. If George considers that important to be a GIMPS-contributing TF algorithm/program, and this program isn't guaranteed to find the smallest of the factors, then that might be a roadblock.
The principal reason for trying potential factors in ascending order is
that the probability of it being a factor decreases with the size of the
trial factor. That this procedure finds the smallest factor is merely an
added bonus.
Besides, does P-1 necessarily find the lowest factor?

David

PS The probability of 2kp + 1 being a factor goes as 1/klogk ?

Last fiddled with by davieddy on 2010-02-12 at 16:16
davieddy is offline   Reply With Quote
Old 2010-02-12, 16:15   #139
TimSorbet
Account Deleted
 
TimSorbet's Avatar
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

11×389 Posts
Default

Quote:
Originally Posted by davieddy View Post
The principal reason for trying potential factors in ascending order is
that the probability of it being a factor decreases with the size of the
trial factor. That this procedure finds the smallest factor is merely an
added bonus.
IIRC Prime95 doesn't just search it going from lowest to highest, but something based on the potential factor's value mod...something (120?). Something about being able to do it faster that way, I suppose. But I think then if it finds a factor it keeps looking for a smaller one.
Quote:
Originally Posted by davieddy View Post
Besides, does P-1 necessarily find the lowest factor?
No. Certainly not. But that is a quite different scenario, and doesn't necessarily mean that they don't care about the discovery order for factors found by TF.

Last fiddled with by TimSorbet on 2010-02-12 at 16:15
TimSorbet is offline   Reply With Quote
Old 2010-02-12, 16:30   #140
axn
 
axn's Avatar
 
Jun 2003

23×683 Posts
Default

Quote:
Originally Posted by Mini-Geek View Post
But I think then if it finds a factor it keeps looking for a smaller one
It used to ... way back when. Not anymore.
axn is offline   Reply With Quote
Old 2010-02-12, 16:39   #141
davieddy
 
davieddy's Avatar
 
"Lucan"
Dec 2006
England

194A16 Posts
Default

Quote:
Originally Posted by Mini-Geek View Post
something based on the potential factor's value mod...something (120?). Something about being able to do it faster that way, I suppose.
I implemented a TF program about five years ago, and I recall
that 120 business. I think it comes from neatly storing possible primes
mod 30 as eight bits, implicitly excluding multiples of 2,3 and 5.

Any factor found saves the LL tests and is stronger information than
"is composite".

David

Last fiddled with by davieddy on 2010-02-12 at 16:39
davieddy is offline   Reply With Quote
Old 2010-02-12, 16:46   #142
ATH
Einyen
 
ATH's Avatar
 
Dec 2003
Denmark

22×863 Posts
Default

Quote:
Originally Posted by Mini-Geek View Post
IIRC Prime95 doesn't just search it going from lowest to highest, but something based on the potential factor's value mod...something (120?). Something about being able to do it faster that way, I suppose. But I think then if it finds a factor it keeps looking for a smaller one.
It only checks factors with +-1, +-7, +-17, +-23, +-31, +-41, +-47, +-49 (mod 120), which is only 16 of the 30 classes that corresponds to +/- 1 (mod 8), so it doing it mod 120 "saves" checking factors from 14 of 30 residue classes.
ATH is offline   Reply With Quote
Old 2010-02-12, 17:28   #143
TimSorbet
Account Deleted
 
TimSorbet's Avatar
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

11·389 Posts
Default

Quote:
Originally Posted by ATH View Post
It only checks factors with +-1, +-7, +-17, +-23, +-31, +-41, +-47, +-49 (mod 120), which is only 16 of the 30 classes that corresponds to +/- 1 (mod 8), so it doing it mod 120 "saves" checking factors from 14 of 30 residue classes.
Does it still cover all factors? It seems to me that if you only cover 16 of the 30 potential values, you'd be missing about half the factors.
TimSorbet is offline   Reply With Quote
Reply



Similar Threads
Thread Thread Starter Forum Replies Last Post
mfakto: an OpenCL program for Mersenne prefactoring Bdot GPU Computing 1724 2023-06-04 23:31
gr-mfaktc: a CUDA program for generalized repunits prefactoring MrRepunit GPU Computing 42 2022-12-18 05:59
The P-1 factoring CUDA program firejuggler GPU Computing 753 2020-12-12 18:07
mfaktc 0.21 - CUDA runtime wrong keisentraut Software 2 2020-08-18 07:03
World's second-dumbest CUDA program fivemack Programming 112 2015-02-12 22:51

All times are UTC. The time now is 14:21.


Fri Jul 7 14:21:27 UTC 2023 up 323 days, 11:50, 0 users, load averages: 0.82, 1.12, 1.20

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2023, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔