mersenneforum.org  

Old 2015-01-10, 08:28   #34
Batalov
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

3⁶×13 Posts

I had to recompile it (for some reason this binary doesn't produce factors, even though it is doing something on CPU - but not on GPU).
After recompilation, looks very nice; I am running the 0-1P range again and will compare to the earlier results.
Old 2015-01-11, 15:53   #35
axn
 
Jun 2003

5,051 Posts

So, a first (re)attempt at getting linux to work. Removed all asm code so that a single code base will work under both Win and Linux.

Haven't compiled/tested under linux, so could someone try it out?

There will be a performance regression, as the CPU code will be much slower, but that doesn't matter. If it works under Linux, I'll gradually add back the assembly routines.

EDIT:- Per usual, a Win32 compile under CUDA 3.2 is included.
Attached Files
File Type: zip CycloNoAsm.zip (211.2 KB, 203 views)

Last fiddled with by axn on 2015-01-11 at 15:54
Old 2015-01-11, 19:07   #36
Batalov
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

2505₁₆ Posts

Ok, will test.
Old 2015-01-15, 01:31   #37
Batalov
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

3⁶×13 Posts

Sorry that it took me so long to get to it.
It works fine under linux. The factors are all valid and none are missed (compared to the win version).

Could you please merge plain source and asm, and put multiple "#if 0 {... block of asm code} #else {block of c code} #endif" and I will gladly flip 0 to 1s for as many combinations as necessary to get to the bug.
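The "#if 0 ... #else ... #endif" arrangement requested above could look roughly like the sketch below. It is not taken from the sieve source: the macro `USE_ASM_ADD`, the type `u128_t` and the function `add128` are hypothetical names, and the asm branch assumes GCC or Clang on x86-64. Flipping the macro from 0 to 1 swaps one routine at a time, which is what makes it possible to bisect toward the faulty block.

```c
#include <stdint.h>

/* Hypothetical toggle: 0 = plain C path, 1 = inline-asm path
   (the asm path assumes GCC/Clang on x86-64). */
#define USE_ASM_ADD 0

/* A 128-bit unsigned value stored as two 64-bit limbs (illustrative type). */
typedef struct { uint64_t lo, hi; } u128_t;

/* Returns a + b modulo 2^128. */
static u128_t add128(u128_t a, u128_t b)
{
#if USE_ASM_ADD
    /* asm path: add the low limbs, then add the high limbs plus the carry flag */
    __asm__ ("addq %2, %0\n\t"
             "adcq %3, %1"
             : "+&r"(a.lo), "+r"(a.hi)
             : "r"(b.lo), "r"(b.hi)
             : "cc");
    return a;
#else
    /* C path: the carry has to be recovered with a comparison */
    u128_t r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);
    return r;
#endif
}
```

With one such toggle per routine, a mismatch in the factor output after flipping a single macro points at that routine.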
Old 2015-01-15, 03:34   #38
axn
 
Jun 2003

5,051 Posts

Quote:
Originally Posted by Batalov
It works fine under linux. The factors are all valid and none are missed (compared to the win version).

Phew! That's a relief.

Quote:
Originally Posted by Batalov
Could you please merge plain source and asm, and put multiple "#if 0 {... block of asm code} #else {block of c code} #endif" and I will gladly flip 0 to 1s for as many combinations as necessary to get to the bug.

I will. But I will probably tweak the plain code first before reintroducing the asm. Most of the asm was not in the performance-critical path; it was there to make implementing multi-precision arithmetic easier! (Who'd have thought that assembly is _easier_ than plain C?)
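As an illustration of why assembly can be the easier route for multi-precision work: on x86-64 a full 64x64 -> 128-bit product is a single `mul` instruction, whereas portable C has to assemble it from four 32-bit partial products. The sketch below shows the portable version; the function name `mul_64x64` is made up for this example and is not claimed to match anything in the sieve source.

```c
#include <stdint.h>

/* Full 64x64 -> 128-bit product in portable C.
   The result is returned as two 64-bit words through hi and lo. */
static void mul_64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    /* Four 32x32 -> 64-bit partial products. */
    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;

    /* Middle column: three values below 2^32 each, so the sum fits in 64 bits. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    *lo = (mid << 32) | (uint32_t)p0;
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

Every carry that the hardware tracks in a flag has to be pushed through by hand here, which is the bookkeeping the asm routines avoided.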

Meanwhile, could you compare the performance of the C version vs ASM version under Windows? I'd like to have hard numbers on how much performance is lost (or not) on mid/high end GPUs. If it is < 1-2% (especially when running multiple instances per GPU), then I'd rather not introduce assembly at all.

[Note to self - try out MPIR/GMP someday]
Old 2015-01-15, 06:24   #39
Batalov
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

3⁶·13 Posts

I don't have any dual boot machines. (And I have very different cards on one side and on the other.)
But the speed on linux "looks" fine. Your idea to leave it in C sounds fine to me.

I only tried the problematic 0-1P region once again, and then 1P-3P (and they ran in parallel with some other programs, too) - they finished fast; "organoleptically" similar to what I'd expect (from having run sieving on Windows up to 350P; I know how frequently I had to restart binaries). For the accuracy point, I've compared the outputs of 0-1P that I already had. After sorting (because the factors are dumped in differently sized chunks and are not ordered by design), the files match precisely, byte for byte. For validity, all were checked with reformatter_in_perl piped to gp (I prepare x=Mod(b,f), then check that x^(2n) - x^n + 1 = 0, obviously).
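The validity check described above (x = Mod(b,f), then x^(2n) - x^n + 1 = 0) can also be sketched in C for factors that fit in 64 bits. This is only an illustration of the same test, not the gp script used in the post: the function names are invented, n is taken to be the exponent exactly as it appears in the congruence, and f is assumed to be at least 2.

```c
#include <stdint.h>

/* (a * b) mod m by binary "double and add".
   Every intermediate stays below m, so nothing overflows 64 bits. */
static uint64_t mulmod_u64(uint64_t a, uint64_t b, uint64_t m)
{
    uint64_t r = 0;
    a %= m;
    while (b != 0) {
        if (b & 1)
            r = (r >= m - a) ? r - (m - a) : r + a;   /* r = (r + a) mod m */
        a = (a >= m - a) ? a - (m - a) : a + a;       /* a = (2 * a) mod m */
        b >>= 1;
    }
    return r;
}

/* base^exp mod m by square and multiply. */
static uint64_t powmod_u64(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1 % m;
    base %= m;
    while (exp != 0) {
        if (exp & 1)
            result = mulmod_u64(result, base, m);
        base = mulmod_u64(base, base, m);
        exp >>= 1;
    }
    return result;
}

/* Returns 1 if f divides b^(2n) - b^n + 1, else 0. Assumes f >= 2. */
static int divides_cyclo(uint64_t b, uint64_t n, uint64_t f)
{
    uint64_t y  = powmod_u64(b, n, f);                 /* x^n, with x = Mod(b, f) */
    uint64_t y2 = mulmod_u64(y, y, f);                 /* x^(2n) */
    uint64_t d  = (y2 >= y) ? y2 - y : y2 + (f - y);   /* (x^(2n) - x^n) mod f */
    return d == f - 1;                                 /* d + 1 == 0 (mod f) */
}
```

A small hand-checkable case: 10^6 - 10^3 + 1 = 999001 = 19 * 52579, so `divides_cyclo(10, 3, 19)` reports a valid factor.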
Old 2015-01-15, 07:10   #40
axn
 
Jun 2003

5,051 Posts

Quote:
Originally Posted by Batalov
I don't have any dual boot machines.

Actually, I was looking for asm (0.2)/windows vs plain C (0.3)/windows. Maybe one pair of results with n=18, and another pair with n=22, at, say, the 1000-1001 range.

EDIT:- Looking for speed as reported by the program. Also, 1 vs 2 instances running.

Linux should be approximately equal to Windows, but even if it isn't, there isn't much that I can do :-(

Last fiddled with by axn on 2015-01-15 at 07:12
Old 2015-01-15, 07:42   #41
Batalov
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

3⁶·13 Posts

Ah! Right, I forgot that there was a Windows binary included.
This test makes sense.

The asm code does have an influence. Let's have a look case-by-case (0.3 is the plain C build, 0.2 is the asm build): CPU is a 1055T and GPU is a 570 OC.
n=18 1001 B12: with 0.3) 45.4P/day single; 2*37.7P/day dual; CPU is busy, GPU is cold (53C)
n=18 1001 B12: with 0.2) 59.4P/day single; 2*46P/day dual; CPU is busy, GPU is a bit warmer (55C)

n=22 1001 B12: with 0.3) 137.1P/day single; 2*83.8P/day dual; CPU load is much less, GPU is warm (63C)
n=22 1001 B12: with 0.2) 171P/day single; 2*94.8P/day dual; CPU load is lesser still, GPU is warmer (65C)
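To set these figures against the "< 1-2%" threshold mentioned earlier in the thread, the relative loss of the C build can be computed as below. The helper name `slowdown_percent` is made up for this illustration; the inputs are simply the P/day rates listed above.

```c
/* Percentage of throughput lost by the plain C build relative to the asm build.
   Both rates are in the same unit (here P/day). */
static double slowdown_percent(double c_rate, double asm_rate)
{
    return 100.0 * (asm_rate - c_rate) / asm_rate;
}
```

For example, `slowdown_percent(45.4, 59.4)` (n=18, single instance) comes to roughly 24%, and `slowdown_percent(83.8, 94.8)` (n=22, two instances) to roughly 12%.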

For full load, I use n=18, 350P(going up) + BOINC genefer, CPU load is 100% and T = 66-67C
(I've recently replaced fans; T used to be 72C+ and started climbing to 78 with an occasional butterfly rattle)

Last fiddled with by Batalov on 2015-01-15 at 08:55
Old 2015-02-01, 17:20   #42
axn
 
Jun 2003

11673₈ Posts

Version 0.4

Highlights include:
1) Bmax = 1e8 (for PG). Had to increase the buffer size another 10x for the 0-1P range, and yet, that will overflow as well if a Block parameter above 8 is used :-(
2) Performance improvements compared to NoAsm. n=16-18 should be as fast as or faster than the last asm version. Still not as fast as the asm version at higher n's, but the gap should have narrowed significantly.

Needed:
Performance figures under windows.
Build & regression testing under linux.

Per usual, source and windows build attached.

If everything checks out, this will be the "production" version. I'll work on more performance improvements for higher n's, but hopefully, those won't be needed for a while.
Attached Files
File Type: zip cyclosv0.4.zip (80.4 KB, 192 views)
Old 2015-02-01, 18:03   #43
Batalov
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

3⁶·13 Posts
works fine!

Tested "18 1001 1002 B12" and "18 1002 1003 B12" on Windows and linux.

The speed is volatile on both platforms but generally similar (and similar to earlier runs: 58P/day single-load; 2*48P/day dual-load).
The output is identical (after removing Windows-style line breaks, and sorting each file [because factor dumps are asynchronous]).
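The comparison described above (strip Windows-style line breaks, sort, then compare) can be sketched in C as follows. It treats each output line as an opaque string, since the factor file format is not specified here, and the function names are illustrative only.

```c
#include <stdlib.h>
#include <string.h>

/* Remove trailing '\r' and '\n' in place, so "123\r\n" and "123\n" compare equal. */
static void strip_eol(char *s)
{
    size_t len = strlen(s);
    while (len > 0 && (s[len - 1] == '\n' || s[len - 1] == '\r'))
        s[--len] = '\0';
}

/* qsort comparator for an array of char pointers. */
static int cmp_lines(const void *pa, const void *pb)
{
    const char *a = *(char * const *)pa;
    const char *b = *(char * const *)pb;
    return strcmp(a, b);
}

/* Returns 1 if both line sets are identical after stripping line endings
   and sorting, else 0. Both arrays and their strings are modified in place. */
static int same_lines(char **a, size_t na, char **b, size_t nb)
{
    size_t i;
    if (na != nb)
        return 0;
    for (i = 0; i < na; i++) {
        strip_eol(a[i]);
        strip_eol(b[i]);
    }
    qsort(a, na, sizeof *a, cmp_lines);
    qsort(b, nb, sizeof *b, cmp_lines);
    for (i = 0; i < na; i++)
        if (strcmp(a[i], b[i]) != 0)
            return 0;
    return 1;
}
```

Sorting is needed because the factor dumps are asynchronous, so two correct runs can list the same factors in a different order.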
Good stuff! Kudos!

Last fiddled with by Batalov on 2015-02-02 at 03:17
Old 2015-02-02, 13:51   #44
axn
 
Jun 2003

11673₈ Posts

Many thanks! I'll post in the PG thread and let them know that the sieve is ready for use.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.