[QUOTE=Madpoo;418308]Another (possibly?) good idea... use mlucas to replicate what's going on with Prime95/mprime?

Ernst would have to chime in since I'm totally unfamiliar with mlucas options and whether it can be forced to use AVX (not FMA) and essentially set it up to do the same thing that Prime95 is doing when it fails. At least then with a separate code branch (but same underlying technique) it might be useful in some way. Possibly eliminate code issues if mlucas also throws rounding errors.[/QUOTE]
Anyone with access to a Skylake system of the problematic kind running Linux is welcome to try it out. The auto-build setup included with the latest Mlucas release (the one which recently entered the Debian 'unstable' branch for testing) will invoke all distinct build modes (scalar-double, sse2, avx, avx2+fma) supported by the target hardware, and create a binary for each. You want the avx2+fma binary.

No idea whether Mlucas will hit the same issue, as its self-test setup is different and it is still somewhat less efficient than Prime95 (i.e. it may not push the hardware quite as severely, if that is the cause of the issue in question). But it's worth a shot. Here is my testing suggestion for would-be Skylake builders: assuming you get a working avx2+fma binary, run the standard small/medium/large self-tests like so (this assumes the avx2+fma binary is called Mlucas_avx2):

[CODE]Mlucas_avx2 -s s -iters 1000
Mlucas_avx2 -s m -iters 1000
Mlucas_avx2 -s l -iters 1000[/CODE]

Once we see what happens with those, we can take it from there - the closest thing to George's torture test is running an actual LL test at the desired FFT length.

George, please confirm or deny: the Skylake 768K torture-test failures are using single-threaded mode? (And if so, running on just 1 core or 1 job per physical core?)
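For readers unfamiliar with what these self-tests ultimately exercise: both Prime95 and Mlucas run the Lucas-Lehmer recurrence, doing the modular squaring with floating-point FFTs (which is where round-off errors can flag flaky hardware). As a hedged illustration only - this is not either program's actual code - here is the same recurrence in exact integer arithmetic:

```python
# Minimal Lucas-Lehmer test sketch using exact integer arithmetic.
# Prime95/Mlucas compute the same recurrence s -> s^2 - 2 (mod 2^p - 1)
# with floating-point FFT squaring, and check per-iteration round-off
# error to detect hardware faults; none of that machinery is shown here.
def lucas_lehmer(p: int) -> bool:
    """Return True iff the Mersenne number 2**p - 1 is prime (p an odd prime)."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

# M7 = 127 is prime; M11 = 2047 = 23 * 89 is not.
print(lucas_lehmer(7), lucas_lehmer(11))  # → True False
```

This direct version is hopelessly slow at real exponent sizes; the FFT-based squaring is the whole point of the 768K-length code paths under discussion.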
[QUOTE=ewmayer;418313]
George, please confirm or deny: the Skylake 768K torture-test failures are using single-threaded mode? (And if so, running on just 1 core or 1 job per physical core?)[/QUOTE] As far as I know, they are all single-threaded, one thread per virtual core (8 threads for 4 physical cores), but they fail only with hyper-threading turned on.
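Since the failure reportedly depends on hyper-threading being on, anyone reproducing this will want to confirm their SMT state. A rough Linux-only check (this reads sysfs topology files, an assumption that holds on mainstream kernels; `lscpu` gives the same answer) is to compare logical CPUs against distinct physical cores:

```python
# Heuristic Linux check: more logical CPUs than physical cores => SMT/HT on.
# Assumes /sys/devices/system/cpu/.../topology exists (mainstream Linux).
import os

def smt_enabled() -> bool:
    logical = os.cpu_count() or 1
    cores = set()
    for cpu in range(logical):
        base = f"/sys/devices/system/cpu/cpu{cpu}/topology"
        try:
            with open(f"{base}/physical_package_id") as f:
                pkg = f.read().strip()
            with open(f"{base}/core_id") as f:
                core = f.read().strip()
        except OSError:
            return False  # topology unavailable (non-Linux); report as off
        cores.add((pkg, core))  # a core is identified by (package, core_id)
    return logical > len(cores)

print(os.cpu_count(), "logical CPUs; SMT enabled:", smt_enabled())
```

On the 4-core/8-thread setup described above, this would report 8 logical CPUs with SMT enabled.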
As I already told you in post #[URL="http://www.mersenneforum.org/showpost.php?p=418274&postcount=77"][B]77[/B][/URL], the Linux test was different (4-6 tests compared to 21 tests). Nevertheless it might be a good hint, because it didn't fail for ~3 hours!

Settings: [url]http://www.bilder-hochladen.net/files/big/hb0a-9t-70f2.png[/url]

The test looks like this: [url]http://www.bilder-hochladen.net/files/big/hb0a-9u-dbf5.png[/url]

I'll try LaurV's worktodo.txt next.
[QUOTE=Aurum;418324]As I already told you in post #[URL="http://www.mersenneforum.org/showpost.php?p=418274&postcount=77"][B]77[/B][/URL], the Linux test was different (4-6 tests compared to 21 tests). Nevertheless it might be a good hint, because it didn't fail for ~3 hours![/QUOTE]
This triggered a recollection and I did some looking at the source code. There are three differences between version 27.9 and version 28.7:

1) There were several minor changes to the assembly macros used to build the FFTs. Thus a 27.9 AVX FFT is not identical to a 28.7 AVX FFT.

2) Due to the minor changes above, AVX FFTs were rebenchmarked. For the 768K AVX FFT, a different implementation was found to be faster. In version 27.9, prime95 breaks up 768K into 512 in pass 1 and 1536 in pass 2. In version 28.7, prime95 breaks up 768K into 768 in pass 1 and 1024 in pass 2. What this means is that the two versions are stress testing using completely different code paths. And it has been reported that both fail.

3) From whatsnew.txt for version 28: all-new torture test data for AVX CPUs. The new data runs more iterations, so more time is spent torturing the CPU rather than initializing the FFT routines. Also, the default time to run each FFT length was reduced from 15 minutes to 3 minutes.
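A quick sanity check on the two pass decompositions George describes: both versions split the same 768K-element transform, just along different axes, so the total work is identical even though the code paths differ. A trivial verification:

```python
# The 768K FFT has 768 * 1024 = 786432 elements. The two prime95
# versions factor it into different pass-1 x pass-2 shapes:
fft_len = 768 * 1024   # "768K"
v279 = (512, 1536)     # version 27.9: pass 1 size x pass 2 size
v287 = (768, 1024)     # version 28.7: pass 1 size x pass 2 size

assert v279[0] * v279[1] == fft_len
assert v287[0] * v287[1] == fft_len
print(fft_len)  # → 786432
```

Both products come out to the same 786432 elements, which is why "both versions fail" points away from a single buggy code path and toward the hardware.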
[QUOTE=Prime95;418358]This triggered a recollection and I did some looking at the source code. There are three differences between version 27.9 and version 28.7:

1) There were several minor changes to the assembly macros used to build the FFTs. Thus a 27.9 AVX FFT is not identical to a 28.7 AVX FFT.

2) Due to the minor changes above, AVX FFTs were rebenchmarked. For the 768K AVX FFT, a different implementation was found to be faster. In version 27.9, prime95 breaks up 768K into 512 in pass 1 and 1536 in pass 2. In version 28.7, prime95 breaks up 768K into 768 in pass 1 and 1024 in pass 2. What this means is that the two versions are stress testing using completely different code paths. And it has been reported that both fail.

3) From whatsnew.txt for version 28: all-new torture test data for AVX CPUs. The new data runs more iterations, so more time is spent torturing the CPU rather than initializing the FFT routines. Also, the default time to run each FFT length was reduced from 15 minutes to 3 minutes.[/QUOTE]
That definitely seems to point to a hardware failure then, since both versions are failing. But it's still incredibly strange that only 768K fails. Were there any other FFT lengths whose code path changed between the two versions?
[QUOTE=Aurum;418324]As I already told you in post #[URL="http://www.mersenneforum.org/showpost.php?p=418274&postcount=77"][B]77[/B][/URL], the Linux test was different (4-6 tests compared to 21 tests). Nevertheless it might be a good hint, because it didn't fail for ~3 hours![/QUOTE]
You should try version 27.9 of mprime. That version seems to fail very reliably for you in Windows. Download it from [url]ftp://mersenne.org/gimps[/url].
[QUOTE=Prime95;418364]You should try version 27.9 of mprime. That version seems to fail very reliably for you in Windows.[/QUOTE]
[QUOTE=Xyzzy;418236][CODE]wget http://www.mersenneforum.org/gimps/p95v287.linux64.tar.gz[/CODE][/QUOTE] :redface:
[QUOTE=Prime95;418364]You should try version 27.9 of mprime. That version seems to fail very reliably for you in Windows. Download it from [URL]ftp://mersenne.org/gimps[/URL][/QUOTE]
A worker stopped after 11 minutes: [url]http://www.bilder-hochladen.net/files/big/hb0a-9v-a59d.png[/url]
[QUOTE=Aurum;418377]A worker stopped after 11 minutes: [url]http://www.bilder-hochladen.net/files/big/hb0a-9v-a59d.png[/url][/QUOTE]
This is excellent. We've potentially eliminated one variable (OS). Is there any chance your friends on your forum could run the same test to expand the sample space with their different hardware configurations? Convergence....
[QUOTE=chalsall;418384]Convergence....[/QUOTE]
I agree. We started out strongly suspecting the CPU. Since then we've ruled out the RAM, the RAM manufacturer, the OS, and even a good chunk of prime95 code. The best move now is to rattle someone's cage at Intel, get the ASRock engineer to reproduce it and rattle their Intel contact's cage, or both. Alas, neither is likely to happen until after the weekend.

On another note, have you wondered how Intel would go about finding the cause? What a daunting task that must be.
[QUOTE=Prime95;418389]On another note, have you wondered how Intel would go about finding the cause? What a daunting task that must be.[/QUOTE]
If I may share... I once spent a week at Intel. I made the mistake of eating a burrito just before I made my presentation in my cubicle. Everyone was very polite, but even I found it very smelly. I was invited to others' cubicles afterwards. There I saw experimental equipment that costs tens if not hundreds of thousands of dollars. This is a true story. :smile: