mersenneforum.org  

2011-09-14, 01:40   #122
KingKurly
Sep 2010
Annapolis, MD, USA

BD₁₆ Posts

I was able to test the Linux build successfully:

Code:
Selftest statistics
  number of tests           3332
  successfull tests         3332

selftest PASSED!


real    43m31.235s
user    9m22.360s
sys    31m10.151s
I did find it a bit strange that test cases 1552 through 1557 had no output to the terminal, but I will transition to using 0.08 now and I will begin submitting results to PrimeNet as they become available.

Do we need to redo "no factor" work that was done under 0.07, or can those results be submitted? I have not downloaded the source to check a diff and see whether that is a reasonable thing to do.

2011-09-14, 08:53   #123
Bdot
Nov 2010
Germany

255₁₆ Posts

Quote:
Originally Posted by KingKurly
I was able to test the Linux build successfully:
Code:
selftest PASSED!
Great to see!
Quote:
Originally Posted by KingKurly
I did find it a bit strange that test cases 1552 through 1557 had no output to the terminal, but I will transition to using 0.08 now and I will begin submitting results to PrimeNet as they become available.
The reason is that test cases 1552 to 1557 are for factors of more than 91 bits. I removed the 95-bit kernel because it was so slow that you would not want to use it anyway. Therefore, mfakto currently supports TF only up to 91 bits.
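To illustrate why those cases stay silent, here is a minimal Python sketch of a selftest filter that skips any case whose known factor is wider than 91 bits. The names `MAX_FACTOR_BITS`, `kernel_can_test` and `run_selftests` are hypothetical and not taken from mfakto's source, and the factors in the example are made up.

```python
# Hypothetical sketch: skip selftest cases whose known factor is wider than
# the widest remaining kernel (91 bits). Not mfakto's actual code.
MAX_FACTOR_BITS = 91


def kernel_can_test(factor: int) -> bool:
    """True if the factor fits in MAX_FACTOR_BITS bits, i.e. factor < 2**91."""
    return factor.bit_length() <= MAX_FACTOR_BITS


def run_selftests(cases):
    """Split (case_id, factor) pairs into tested and skipped case ids.
    Skipped cases are never run, so they print nothing."""
    tested, skipped = [], []
    for case_id, factor in cases:
        (tested if kernel_can_test(factor) else skipped).append(case_id)
    return tested, skipped


if __name__ == "__main__":
    # Made-up factors: one 71-bit value and one 93-bit value.
    print(run_selftests([("small", 2**70 + 1), ("wide", 2**92 + 1)]))
```

Run as a script, this prints `(['small'], ['wide'])`.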

Quote:
Originally Posted by KingKurly
Do we need to redo "no factor" work that was done under 0.07, or can those results be submitted? I have not downloaded the source to check a diff and see whether that is a reasonable thing to do.
In version 0.07, the single-vectored MUL24 kernel did not work with Catalyst 11.8. In your self-compiled version you removed that kernel from the selftest, but not from the program. If you never changed the mfakto.ini parameter VectorSize (i.e. if you left it at 4), then that faulty kernel was never used and you can submit the previous results without re-running them.
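A minimal Python sketch of the check described above, assuming mfakto.ini consists of plain `Key=Value` lines with `#` comments and that VectorSize is 4 when the key is absent. The helper names are hypothetical, and mfakto's own ini parser may behave differently (for example on duplicate keys).

```python
# Hypothetical helper; assumes "Key=Value" lines and "#" comments in mfakto.ini.
def read_ini_value(text: str, key: str, default: str) -> str:
    value = default
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if "=" not in line:
            continue
        k, v = (part.strip() for part in line.split("=", 1))
        if k == key:
            value = v  # in this sketch, the last occurrence wins
    return value


def results_from_07_ok(ini_text: str) -> bool:
    """0.07 "no factor" results can be submitted if VectorSize stayed at 4,
    because the broken single-vectored MUL24 kernel was then never used."""
    return read_ini_value(ini_text, "VectorSize", "4") == "4"
```

For example, `results_from_07_ok("VectorSize=4\n")` is `True`, while `results_from_07_ok("VectorSize=1\n")` is `False`.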

2011-09-14, 14:50   #124
KingKurly
Sep 2010
Annapolis, MD, USA

3³×7 Posts

Quote:
Originally Posted by Bdot
In version 0.07, the single-vectored MUL24 kernel did not work with Catalyst 11.8. In your self-compiled version you removed that kernel from the selftest, but not from the program. If you never changed the mfakto.ini parameter VectorSize (i.e. if you left it at 4), then that faulty kernel was never used and you can submit the previous results without re-running them.
I have confirmed that VectorSize was never changed from 4. I am submitting the results, "closing the book" on 0.07, and moving to 0.08. Thank you so much!

2011-09-14, 16:03   #125
Bdot
Nov 2010
Germany

3·199 Posts

Quote:
Originally Posted by KingKurly
moving to 0.08. Thank you so much!
Have you already checked whether mfakto_cl_barrett79 or mfakto_cl_71 is faster on your GPU? I'm really interested in seeing the two compared on different GPUs. On my HD5770, barrett is about 10% faster ...

2011-09-14, 16:28   #126
Razor_FX_II
Jan 2009

43 Posts

Quote:
Originally Posted by Bdot
Have you already checked whether mfakto_cl_barrett79 or mfakto_cl_71 is faster on your GPU? I'm really interested in seeing the two compared on different GPUs. On my HD5770, barrett is about 10% faster ...
Using mfakto 0.08 on my HD4870s and HD4890s, mfakto_cl_barrett79 is about 10% faster.
mfakto_cl_barrett79 avg rate: 55M/s
mfakto_cl_71 avg rate: 50M/s

2011-09-14, 17:53   #127
KingKurly
Sep 2010
Annapolis, MD, USA

BD₁₆ Posts

Quote:
Originally Posted by Bdot
Have you already checked whether mfakto_cl_barrett79 or mfakto_cl_71 is faster on your GPU? I'm really interested in seeing the two compared on different GPUs. On my HD5770, barrett is about 10% faster ...
I am finding similar results. My HD5450 does about 8.6M/s with mfakto_cl_71 and about 9.1M/s with mfakto_cl_barrett79, doing TF on M41774351 from 68 to 69 bits.
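For reference, TF "from 68 to 69" bits means testing factor candidates q = 2kp + 1 with 2⁶⁸ ≤ q < 2⁶⁹. A small Python sketch computing the corresponding range of k; `k_range` is a hypothetical helper, not mfakto code:

```python
def k_range(p: int, lo_bits: int, hi_bits: int):
    """Return (k_min, k_max) such that every candidate q = 2*k*p + 1
    with k_min <= k <= k_max satisfies 2**lo_bits <= q < 2**hi_bits."""
    two_p = 2 * p
    k_min = -(-(2**lo_bits - 1) // two_p)  # ceil((2**lo - 1) / 2p)
    k_max = (2**hi_bits - 2) // two_p      # floor((2**hi - 2) / 2p), so q <= 2**hi - 1
    return k_min, k_max


k_lo, k_hi = k_range(41774351, 68, 69)
print(k_hi - k_lo + 1)  # number of k values at this bit level, before any filtering
```

For M41774351 this comes to roughly 3.5 × 10¹² values of k at that bit level, before any class or sieve filtering. A tiny sanity check: `k_range(11, 4, 5)` is `(1, 1)`, which picks out q = 23, a factor of M11 = 2047.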

2011-09-14, 23:13   #128
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario

3²×7×59 Posts

Since it appeared to be missing, I've created a stub article on MersenneWiki for mfakto:
http://www.mersennewiki.org/index.php/Mfakto

But since I don't actually use mfakto, perhaps someone else could fill in and fix all the details in the article.

2011-09-20, 15:33   #129
DigiK-oz
 
Jul 2008

30₈ Posts

In the GPUGRID forum:

there's a bug in the latest SDK that makes full use of a CPU core whenever an OpenCL app is running.
They promised a fix, but it's still not here in 11.8,
maybe in 11.9??

Maybe mfakto suffers from this as well? One of the threads using 100% of one CPU happens to be in the ATI libraries....

2011-09-20, 18:23   #130
Bdot
Nov 2010
Germany

3·199 Posts

Quote:
Originally Posted by DigiK-oz
In the GPUGRID forum:

there's a bug in the latest SDK that makes full use of a CPU core whenever an OpenCL app is running.
They promised a fix, but it's still not here in 11.8,
maybe in 11.9??

Maybe mfakto suffers from this as well? One of the threads using 100% of one CPU happens to be in the ATI libraries....
They seem to have implemented some kind of busy-wait (futex-based) whenever something needs to be synchronized with the GPU. As this is usually the CPU just waiting for the GPU to complete something, that is a total waste of CPU resources.

However, mfakto is not hit that badly, because it passes the prepared factor candidates to the GPU without immediately waiting for the results. Instead, it prepares the next block of factor candidates on the CPU. Only when the CPU prepares candidates faster than the GPU can process them does mfakto synchronize with the GPU, and of course at the end of each class.

So yes, mfakto will also consume a full CPU core, but it will do something useful most of that time.
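The overlap described above can be sketched as a bounded producer/consumer queue. This is a hypothetical Python model, not mfakto's OpenCL code: a worker thread stands in for the GPU, summing a block stands in for the kernel, and the queue depth of 2 is an assumption. The CPU side only waits when the queue is already full (i.e. when it is ahead of the GPU), plus one final wait at the end of the class.

```python
import queue
import threading


def run_pipeline(num_blocks: int, depth: int = 2) -> list:
    """Model of CPU/GPU overlap: the CPU prepares block n+1 while the
    'GPU' worker is still processing block n."""
    pending = queue.Queue(maxsize=depth)
    results = []

    def gpu_worker():
        while True:
            block = pending.get()
            if block is None:            # end-of-class marker
                break
            results.append(sum(block))   # stand-in for the GPU kernel

    worker = threading.Thread(target=gpu_worker)
    worker.start()
    for n in range(num_blocks):
        block = [n * 10 + i for i in range(10)]  # stand-in for sieved candidates
        pending.put(block)               # blocks only if `depth` blocks are queued
    pending.put(None)
    worker.join()                        # final synchronization at the end of the class
    return results


print(run_pipeline(3))
```

A blocking `put()` on a bounded queue is the simplest way to express "only wait when ahead"; a real OpenCL host program would instead wait on events or call a finish on its command queue.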

2011-09-20, 18:33   #131
Samoflan
 
Jan 2010

5₁₀ Posts

Quote:
Originally Posted by Razor_FX_II
Using mfakto 0.08 on my HD4870s and HD4890s, mfakto_cl_barrett79 is about 10% faster.
mfakto_cl_barrett79 avg rate: 55M/s
mfakto_cl_71 avg rate: 50M/s
I get similar results on my HD4890

mfakto_cl_barrett79 avg rate: 51.9M/s
mfakto_cl_71 avg rate: 48.7M/s

GPU load is 91-95%
CPU load almost maxes out 2 cores on my Phenom II X4 955
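To compare the numbers posted so far on a common footing, here is a small Python snippet computing the relative gain of mfakto_cl_barrett79 over mfakto_cl_71 from the reported average rates. These are just the figures quoted in this thread, not new measurements.

```python
# Average rates in M/s as reported in this thread: (mfakto_cl_71, mfakto_cl_barrett79)
reports = {
    "HD4870/HD4890 (Razor_FX_II)": (50.0, 55.0),
    "HD5450 (KingKurly)": (8.6, 9.1),
    "HD4890 (Samoflan)": (48.7, 51.9),
}


def speedup_percent(rate_71: float, rate_barrett: float) -> float:
    """Relative gain of barrett79 over cl_71, in percent."""
    return (rate_barrett / rate_71 - 1.0) * 100.0


for gpu, (r71, rb79) in reports.items():
    print(f"{gpu}: barrett79 is {speedup_percent(r71, rb79):.1f}% faster")
```

This prints gains of 10.0%, 5.8% and 6.6% respectively.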

2011-09-26, 19:14   #132
Bdot
Nov 2010
Germany

3·199 Posts
Bug warning

I'm sorry to report that yesterday I found a bug: mfakto up to 0.08 does not find the factor for k=3 of M6599953.

The reason is an invalid "optimization" that I made to the mfaktc code. Mfaktc itself does not have this problem. I have fixed the bug and added a test case for it to the selftest.

The mfakto kernel "mfakto_cl_71" (all vector sizes) sometimes calculated a wrong modulus when the factor candidate was below 2⁴⁸. Smaller FCs (around 2²⁴) had a higher chance of triggering the error, while FCs above 2⁴⁸ were always calculated correctly. The problem does not depend on the exponent size.

I'm sorry for possibly having wasted effort and resources, but I hope not too many tests need to be repeated, as only small FCs are affected. I will provide a fixed version within the next few days.
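For anyone wanting to double-check a specific candidate outside mfakto, here is a minimal Python sketch of the basic trial-factoring test: q = 2kp + 1 divides M(p) = 2ᵖ − 1 exactly when 2ᵖ ≡ 1 (mod q). It uses Python's arbitrary-precision integers rather than the fixed-width multi-word arithmetic of a GPU kernel, so it cannot reproduce the bug itself; `tf_divides` is a hypothetical name.

```python
def tf_divides(p: int, k: int) -> bool:
    """True if q = 2*k*p + 1 divides 2**p - 1, i.e. 2**p mod q == 1.
    Left-to-right binary exponentiation: square for every exponent bit,
    and additionally multiply by 2 when that bit is 1."""
    q = 2 * k * p + 1
    result = 1
    for bit in bin(p)[2:]:           # exponent bits, most significant first
        result = result * result % q
        if bit == "1":
            result = result * 2 % q
    return result == 1


print(tf_divides(11, 1))       # q = 23 divides M11 = 2047
print(tf_divides(6599953, 3))  # the k=3 case from the bug report
```

For the case in the bug report, `tf_divides(6599953, 3)` tests q = 39599719, a 26-bit candidate, squarely in the small-FC range the bug affected. Python's built-in `pow(2, p, q) == 1` gives the same answer and is the quicker way to check by hand.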

All times are UTC.





Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
