mersenneforum.org  

Old 2018-12-09, 22:56   #100
Chuck
May 2011
Orange Park, FL

885₁₀ Posts

Quote:
Originally Posted by Prime95
This is a link to the Windows executable I built for Chuck:

https://www.dropbox.com/s/sc4ib5v4f4...ime95.zip?dl=0

It contains some of the new swizzle optimizations discussed in another thread that seemed to work OK.

I'll be available for email and forum conversations, but unable to upload or download big files.

Good results continue with this version. I successfully finished a PRP test and started another with no error messages.
Old 2018-12-16, 18:06   #101
ATH
Einyen
Dec 2003
Denmark

2×1,579 Posts

In build 5 I had a PRP test at 88.22M where it immediately chose the 4704K FFT. It had SoftCrossoverAdjust=-0.004 in prime.txt.

Another instance had a PRP test at 88.26M. It started by testing the 4608K FFT, and when that showed an average roundoff error of 0.30835 it chose the 4800K FFT.

Granted, that instance had SoftCrossover=1.0 and SoftCrossoverAdjust=-0.008 in prime.txt, but why did it not test the 4704K FFT?


I have now removed all SoftCrossover and SoftCrossoverAdjust settings from all instances. They were left over from build 3, where I added them because I was getting so many roundoff > 0.4 errors.
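
One rough way to compare the FFT sizes mentioned above is the average number of bits stored per FFT word, which is the exponent divided by the FFT length. The sketch below only does that arithmetic. The exponents are rounded from the "88.22M" and "88.26M" figures in the post, so they are approximations, and the function name is made up for illustration. It says nothing about how gwnum actually picks a size.

```python
# Average bits per FFT word = exponent / FFT length.
# The exponents are rounded from the "88.22M" / "88.26M" figures in the post,
# so they are approximations, not the actual exponents that were tested.

def bits_per_word(exponent: int, fft_len_k: int) -> float:
    """Return exponent / FFT length, with the length given in K (1024 words)."""
    return exponent / (fft_len_k * 1024)

CASES = [
    (88_220_000, 4704),  # first instance: chose 4704K immediately
    (88_260_000, 4608),  # second instance: the size it tried first
    (88_260_000, 4704),  # the size it did not test
    (88_260_000, 4800),  # the size it finally chose
]

if __name__ == "__main__":
    for exponent, fft_k in CASES:
        ratio = bits_per_word(exponent, fft_k)
        print(f"{exponent:>11,d} @ {fft_k}K: {ratio:.3f} bits/word")
```

The smaller the FFT, the more bits each word has to carry and the larger the roundoff error tends to be, which is why exponents near a crossover are sensitive to small adjustments.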
Attached Files
File Type: txt log.txt (2.1 KB, 62 views)

Last fiddled with by ATH on 2018-12-16 at 18:09
Old 2018-12-16, 22:47   #102
ATH
Einyen
Dec 2003
Denmark

2×1,579 Posts

There might be a problem with choosing the best FFT parameters after restarting mprime; see here:
https://www.mersenneforum.org/showpo...&postcount=133
Old 2019-01-03, 21:37   #103
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

2·3,767 Posts

29.5 build 6.

1) Fixed testing tiny numbers with AVX-512 FFTs. Numbers less than about 5120 bits are forced to use FMA FFTs.
2) Torture test dialog box allows picking AVX-512, AVX2, AVX, SSE2 FFTs.
3) Changed FFT crossovers. ATH, let me know if this is better.
4) Eliminated soft crossovers. Instead, I hope to get the crossovers right in the gwnum code. Soft crossovers can be re-enabled (see undoc.txt).
5) Eliminated a memory leak when a benchmark is interrupted.
6) In Linux a pid file is created (mprime.pid).
7) New swizzle code is used. FFTs should be about 0% faster. Well, maybe small FFTs that run in the L2 cache will see a tiny speed bump.

Not fixed:
1) The hangs reported in benchmarking. Please try again with this release. I cannot get it to happen in Linux. Is this a Windows-only issue? Does this only happen benchmarking multi-threaded FFTs? Is CPU usage 0% at the time of the hang? Hangs are usually due to a deadlock. Benchmarks use locks to sync the start of all the worker threads. Multi-threaded FFTs also use locks to coordinate threads.

Linux 64-bit: ftp://mersenne.org/gimps/p95v295b6.linux64.tar.gz
Windows 64-bit: ftp://mersenne.org/gimps/p95v295b6.win64.zip
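
On the deadlock theory in the "Not fixed" item: the general pattern of syncing the start of worker threads, and how it can hang, can be sketched as follows. This is not the Prime95 bench code. It is a minimal Python illustration using a barrier, and `run_workers` and its `missing` parameter are hypothetical names. A timeout is added so that the sketch reports the hang instead of blocking forever.

```python
import threading

def run_workers(num_workers: int, timeout_s: float, missing: int = 0) -> str:
    """Start workers that all wait on one barrier before 'benchmarking'.

    `missing` simulates workers that never reach the barrier, which is one
    way a start-up sync can deadlock. Returns "ok" or "hang detected".
    """
    barrier = threading.Barrier(num_workers)
    results = []
    results_lock = threading.Lock()

    def worker() -> None:
        try:
            barrier.wait(timeout=timeout_s)  # block until every worker arrives
            outcome = "started"
        except threading.BrokenBarrierError:
            outcome = "timed out"            # at least one worker never arrived
        with results_lock:
            results.append(outcome)

    # Only (num_workers - missing) threads are launched, so with missing > 0
    # the barrier can never be satisfied.
    threads = [threading.Thread(target=worker) for _ in range(num_workers - missing)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return "ok" if all(r == "started" for r in results) else "hang detected"

if __name__ == "__main__":
    print(run_workers(4, timeout_s=1.0))             # all workers arrive
    print(run_workers(4, timeout_s=0.5, missing=1))  # one worker never arrives
```

Threads blocked on a barrier or lock use essentially no CPU, so a hang of this kind would show up as 0% CPU usage, which is what the question about CPU usage is probing for.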
Old 2019-01-04, 01:27   #104
GP2
Sep 2003

2585₁₀ Posts

Quote:
Originally Posted by Prime95
29.5 build 6.

Wait, does this build include the Chuck fix? You didn't mention anything about that (saving registers).


Quote:
Originally Posted by Prime95
I have an idea. Some routines were not saving and restoring xmm8 - xmm15 as per the Windows 64 ABI. If I can build an executable for you before I go on a cruise, you can test out that theory.

Quote:
Originally Posted by Prime95
I was wrong, xmm8-xmm15 arrived with SSE2 in 2004. Saving the registers fixed Chuck's problem.

Quote:
Originally Posted by Prime95
This is a link to the Windows executable I built for Chuck:
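
For context on the register-saving fix quoted above: the Windows x64 calling convention treats xmm6 through xmm15 as callee-saved (nonvolatile), while the System V convention used on Linux has no callee-saved xmm registers. That difference would be consistent with a missing save/restore causing trouble only on Windows. The small lookup below is a sketch to make the rule concrete. The set and function names are illustrative and not taken from the Prime95 sources.

```python
# Callee-saved (nonvolatile) registers under the two common x86-64 conventions.
WIN64_NONVOLATILE = (
    {"rbx", "rbp", "rdi", "rsi", "rsp", "r12", "r13", "r14", "r15"}
    | {f"xmm{i}" for i in range(6, 16)}  # xmm6 .. xmm15
)
SYSV_NONVOLATILE = {"rbx", "rbp", "rsp", "r12", "r13", "r14", "r15"}  # no xmm registers

def must_preserve(reg: str, abi: str) -> bool:
    """Return True if a routine that modifies `reg` must restore it before returning."""
    table = {"win64": WIN64_NONVOLATILE, "sysv": SYSV_NONVOLATILE}
    return reg.lower() in table[abi]

if __name__ == "__main__":
    for reg in ("xmm5", "xmm8", "xmm15"):
        print(reg, "win64:", must_preserve(reg, "win64"),
              "sysv:", must_preserve(reg, "sysv"))
```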
Old 2019-01-04, 01:47   #105
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

7534₁₀ Posts

Quote:
Originally Posted by GP2
Wait, does this build include the Chuck fix? You didn't mention anything about that (saving registers).

Yes, that is included.
Old 2019-01-04, 02:24   #106
LaurV
Romulan Interpreter
Jun 2011
Thailand

7²×197 Posts

Old 2019-01-04, 03:24   #107
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013

5562₈ Posts

Quote:
Originally Posted by Prime95
Not fixed:
1) The hangs reported in benchmarking. Please try again with this release. I cannot get it to happen in Linux. Is this a Windows-only issue? Does this only happen benchmarking multi-threaded FFTs? Is CPU usage 0% at the time of the hang? Hangs are usually due to a deadlock. Benchmarks use locks to sync the start of all the worker threads. Multi-threaded FFTs also use locks to coordinate threads.

I did have an mprime hang with build 5, during an 8000K benchmark. It was on a 48-core dual-socket Epyc system. I was testing combinations of workers and threads. I seem to remember it happening during a test with several workers and several threads. It wasn't during the single-worker test.
Old 2019-01-04, 10:00   #108
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·3²·7·43 Posts

Quote:
Originally Posted by Prime95
29.5 build 6.

1) Fixed testing tiny numbers with AVX-512 FFTs, numbers less than about 5120 bits are forced to use FMA FFTs.
2) Torture test dialog box allows picking AVX-512, AVX2, AVX, SSE2 FFTs.
3) Changed FFT crossovers. ATH, let me know if this is better.
4) Eliminated soft crossovers. Instead, I hope to get the crossovers right in the gwnum code. Soft crossovers can be re-enabled (see undoc.txt).
5) Eliminated a memory leak when a benchmark is interrupted.
6) In linux a pid file is created (mprime.pid).
7) New swizzle code is used. FFTs should be about 0% faster. Well, maybe small FFTs that run in the L2 cache will see a tiny speed bump.

Not fixed:
1) The hangs reported in benchmarking. Please try again with this release. I cannot get it to happen in Linux. Is this a Windows-only issue? Does this only happen benchmarking multi-threaded FFTs? Is CPU usage 0% at the time of the hang? Hangs are usually due to a deadlock. Benchmarks use locks to sync the start of all the worker threads. Multi-threaded FFTs also use locks to coordinate threads.

Linux 64-bit: ftp://mersenne.org/gimps/p95v295b6.linux64.tar.gz
Windows 64-bit: ftp://mersenne.org/gimps/p95v295b6.win64.zip

Benchmark tries on v29.5b6 on Windows X on an i7-8750H:
Set allowed memory to 8192M.
Set throughput benchmark to do 2048K - 32768K, 1-3,6 workers, hyperthreaded too.
First try: stalled early in 2048K. At this point there was an mfakto instance running on the UHD630 IGP and an mfaktc instance on the GTX1050Ti GPU. Halted mfakto on the UHD630 for the second try.
Another benchmark stall early in 2048K. Halted mfaktc and retried: application crash. Retried: benchmark stalled again. Try #4: nearly finished 2560K before stalling; at 8 minutes, this was the longest of 4 tries before a stall or crash. Try 5: no hyperthreading, made it to 23040K before stalling. Try 6: starting from 23040K, reached 32768K.

Last fiddled with by kriesel on 2019-01-04 at 10:12
Old 2019-01-04, 11:38   #109
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts

Quote:
Not fixed:
1) The hangs reported in benchmarking. Please try again with this release. I cannot get it to happen in Linux. Is this a Windows-only issue? Does this only happen benchmarking multi-threaded FFTs? Is CPU usage 0% at the time of the hang? Hangs are usually due to a deadlock. Benchmarks use locks to sync the start of all the worker threads. Multi-threaded FFTs also use locks to coordinate threads.
I tried one additional time with 29.5b5. It hung fairly quickly. As before, it was on a 2560K FFT.
i7-6700K, 4300 MHz, 16 GiB RAM, 3200 MHz, Win 7 Pro

I will try the latest version when I can check on it sooner than after work (~10 hours).
Attached Files
File Type: zip 2560K-FFT.zip (391 Bytes, 54 views)

Last fiddled with by kladner on 2019-01-04 at 11:42
Old 2019-01-04, 16:06   #110
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

1110101101110₂ Posts

Aaargh! Why can't I reproduce the hang! Makes debugging much harder.

For those who can reproduce it quickly, does a normal torture test or daily work ever hang? If you add "TortureTestThreads=<# of logical cores>" to prime.txt and then run a torture test with just one worker, does that hang?

In the meantime, I'll re-examine the bench code.
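
For anyone unsure what value to put in that setting, the line can be generated from the machine's logical core count. This is only a convenience sketch. The helper name is made up, and it assumes Python's `os.cpu_count()` reports the same logical core count that Prime95 sees, which may not hold on every system.

```python
import os
from typing import Optional

def torture_threads_line(logical_cores: Optional[int] = None) -> str:
    """Build the prime.txt line, defaulting to this machine's logical core count."""
    if logical_cores is None:
        logical_cores = os.cpu_count() or 1  # cpu_count() can return None
    return f"TortureTestThreads={logical_cores}"

if __name__ == "__main__":
    # Prints the line to paste into prime.txt; it does not edit the file.
    print(torture_threads_line())
```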

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.