[QUOTE=kriesel;507560]Updated TightVNC to latest available v2.8.11, on i7-8750H laptop (VNC server) and on VNC client box. This is after a Win 10 update and restart, and a BIOS update and other laptop manufacturer updates to current, and a system restart. After that, again, prime95 v29.5b9 had a stall early in a user-initiated throughput benchmark, within 10 minutes of start.[/QUOTE]
The screenshot won't be much help in solving the problem. Why not try Mysticial's idea? You wrote that the dump file is too big for email, but you can easily upload it for free to Dropbox or some similar cloud storage where George can access it. [QUOTE=Mysticial;505246]Here's an idea: If anybody manages to make it hang in Windows again. Open up Task Manager, right-click on the process and "Create dump file". Then send it to George. If George has the debug symbols for the binary, he should be able to load it up in Visual Studio and probe the stacks for every single thread that's alive to see what's waiting on what. That might be enough to figure out what the deadlock/hang is.[/QUOTE] [QUOTE=kriesel;505274]Thanks, and managing to make it hang is no issue, very reliable on my i7-8750H, making the dump file seems easy enough, but 155MB raw, and even 55MB compressed, it's too big for email etc.[/QUOTE] As an alternative, on Linux you can get a core dump of a running program and then stack trace all the threads. See for instance [URL="https://mersenneforum.org/showthread.php?p=506780&postcount=179"]this post[/URL]. You could try activating Windows Subsystem for Linux (WSL) on your Windows 10 box, and download a Linux distro for free from the Windows store, then try running the Linux64 version of mprime and see if you can reproduce the same problem. Note WSL is not a virtual machine, it's more like a container, so the Linux64 version of mprime should run at full speed. However, Mysticial's idea is much more straightforward, and the large size of the dump file shouldn't be an issue. |
[QUOTE=kriesel;507499]
[url]https://www.mersenneforum.org/showpost.php?p=496446&postcount=694[/url] says PRP-1 can only produce type 4 residues. What I observed with a single run was V4.3 produced a type 4 residue even when P-1 is not involved. See also [url]https://www.mersenneforum.org/showpost.php?p=468378&postcount=209[/url] I don't know if V6 returns to the preferred standard type 1 residue.[/QUOTE] Would there be any advantage to switching from gpuowl v5.0 to v3.8 (other than the tiny increase in speed that v3.8 offers) so my results produce type=1 residues? I.e., are type=1 residues more desirable than type=4? |
[QUOTE=tServo;507574]Would there be any advantage to switching from gpuowl v5.0 to v3.8 (other than the tiny increase in speed that v3.8 offers) so my results produce type=1 residues? I.e., are type=1 residues more desirable than type=4?[/QUOTE]
I'd find either of those reasons persuasive. However, a speedup with V3.8 is not guaranteed; it depends on your gpu model, and perhaps on the fft lengths you are running, the OS, the driver, etc. I have my AMD gpu system set up so I have multiple versions ready to run. Try it. I think prime95 does type 1 PRP3, with pseudorandom shift. As we saw recently, gpuowl does type 1 or 4 PRP3 depending on version, with zero shift (leaving out the complication of PRP/P-1 in some versions of gpuowl using a different base than 3). For double check residues to be comparable, they need to be the same PRP type. And also the same base? And it is preferable that the shifts be different, so that some potential errors don't affect the two separate runs the same way. Therefore, it's not ideal to run gpuowl type4 PRP to double check gpuowl type4 PRP; they'd have the same zero shift. (I'm not clear on what else is available to DC a type 4 PRP. Maybe prime95 can do it if the user specifies.) |
[QUOTE=R. Gerbicz;507449]ps. Of course at least at the first exponent the third run, that used type=4 was useless, that can't verify a type=1 result. Not to mention that you can't use type=2,4 for a fast PRPC test.[/QUOTE]
It would be easier to also calculate, for all 5 types, the a^(mp-1) mod mp residue (res64, or better, res2048); that costs zero, one, or at most a few extra iterations. As far as I remember, gpuowl also calculates this residue with a=3. [QUOTE=GP2;507466]As of today the exponent [M]79075979[/M] is confirmed bad, so we have a type-1 PRP test with a bad residue![/QUOTE] The other five already have a non-matching (masked) residue, so even without a triple check we know that those tests failed (most likely "only" Simon's tests). |
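As a rough illustration of the residue mentioned above, here is a minimal Python sketch. It assumes the residue is a^(Mp-1) mod Mp with a=3 and Mp = 2^p - 1, and that "res64" means the low 64 bits of that value. The function names are hypothetical; this is not how gpuowl or prime95 compute it, and it is only practical for tiny exponents.

```python
def type1_residue(p: int, a: int = 3) -> int:
    """Return a^(Mp-1) mod Mp for Mp = 2^p - 1 (assumed type-1 style residue)."""
    mp = (1 << p) - 1
    return pow(a, mp - 1, mp)


def res64(residue: int) -> str:
    """Low 64 bits of a residue as 16 hex digits (assumed meaning of res64)."""
    return f"{residue & 0xFFFFFFFFFFFFFFFF:016X}"


if __name__ == "__main__":
    # M7 and M13 are prime, so the residue is 1; M11 = 2047 is composite.
    for p in (7, 11, 13):
        print(p, res64(type1_residue(p)))
```

A residue of 1 means Mp is a probable prime to base a; anything else proves Mp composite.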
[QUOTE=kriesel;505274]Therefore, it's not ideal to run gpuowl type4 PRP to double check gpuowl type4 PRP; they'd have the same zero shift. (I'm not clear on what else is available to DC a type 4 PRP. Maybe prime95 can do it if the user specifies.)[/QUOTE]
Yes, Prime95/mprime can do type 4. If it's an automatic assignment of a PRP double-check ([c]WorkPreference=151[/c] in prime.txt), it will set the residue type to be the same as that of the first-time PRP test. |
[QUOTE=GP2;507566]The screenshot won't be much help in solving the problem.[/QUOTE]
I don't think this will be hard to track down. |
[QUOTE=GP2;507566]The screenshot won't be much help in solving the problem.
Why not try Mysticial's idea? You wrote that the dump file is too big for email, but you can easily upload it for free to Dropbox or some similar cloud storage where George can access it.[/QUOTE]May return to that. I offered George the choice of naming an existing ftp or dropbox location I could give it to him at, and never heard back. I don't have one of either set up currently, and easily find other ways to be busy. Probably the same is true for George. [QUOTE]As an alternative, on Linux you can get a core dump of a running program and then stack trace all the threads. See for instance [URL="https://mersenneforum.org/showthread.php?p=506780&postcount=179"]this post[/URL]. You could try activating Windows Subsystem for Linux (WSL) on your Windows 10 box, and download a Linux distro for free from the Windows store, then try running the Linux64 version of mprime and see if you can reproduce the same problem. Note WSL is not a virtual machine, it's more like a container, so the Linux64 version of mprime should run at full speed. However, Mysticial's idea is much more straightforward, and the large size of the dump file shouldn't be an issue.[/QUOTE]I don't know if the same system has the issue in linux as in Windows. Either way, figuring out what is going on with the Windows instance is the priority. Recently I've been focussing on ruling out unapplied updates as possible causes there, partly because of repeated questions to me about that. I think use of WSL might create as many questions as it answers. If there's still an issue, is that because of the underlying Windows, linux fans may ask or presume. Running mprime on WSL on Windows 10 is not the case I'm trying to help solve. Running prime95 on bare Win 10 is. I'm hesitant to complicate the environment or proliferate cases. I have a bootable USB stick for Ubuntu 18.04. I built it and tried it after Chalsall's suggestion to use that on the problem laptop. Well, that was a big waste of time. 
The recommended trial mode treated both hard drives in the laptop, and the USB stick as well, as read-only. Even linux mprime images placed on the empty hard drive, and double checked as unrestricted before and after, were read-only during the linux boot (read permission only, with no execute, modify or delete); their permissions could not be changed, and they could not be run. So mprime could not be run by the recommended method. Either getting through linux, a very foreign environment, or getting a dump file uploaded, is problematic with my crappy slow ISP. Individual web pages time out more often than they load. Downloads and uploads frequently fail. DVD size downloads are impossible. Even tiny downloads fail. Maybe I'll need to haul it into town to the library. It's taken hours just to look into a couple of online storage choices. Problem solving or relearning linux involves a lot of note taking, because the internet connection is so slow I often forget what I was looking up or why by the time a search returns results. Everything in linux seems to me to be named wrong or awkward to use. That and unfamiliarity make it orders of magnitude slower to use. Maybe I spent too many decades in VMS, RSX, DOS from 1.1, and Windows from 3.0 up. It may not sound like it, but I appreciate you taking the time to write about alternatives. It would be good to have it resolved. |
This has happened a few times in 29.5b9, but only to my 2 PRP CF DC instances:
[CODE][Comm thread Feb 5 09:54:40] PRP test successfully completes double-check of M6946969 --
[Comm thread Feb 5 09:54:40] CPU credit is 1.4084 GHz-days.
[Comm thread Feb 5 09:54:40] Getting assignment from server
[Comm thread Feb 5 09:54:40] PrimeNet error 9: Access denied
[Comm thread Feb 5 09:54:40] Invalid security signature
[Comm thread Feb 5 09:54:40] Visit http://mersenneforum.org for help.
[Comm thread Feb 5 09:54:40] Will try contacting server again in 70 minutes.
[Work thread Feb 5 10:54:39] Resuming.
[Work thread Feb 5 10:54:39] No work to do at the present time. Waiting.
[/CODE]
If I leave it, it just keeps failing every 70 minutes; I waited through at most 3 retries, I think. If I shut down mprime and start it again, it works right away and receives a new exponent. |
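For anyone who wants to spot this pattern without watching the window, a minimal Python sketch follows. It assumes the log lines look exactly like the excerpt above ("[Comm thread Feb 5 09:54:40] message"); the function name and the idea of scanning saved log text are hypothetical and not a feature of mprime.

```python
import re

# Assumed line layout, taken from the excerpt: "[<Name> thread <Mon> <day> <hh:mm:ss>] <message>"
LINE_RE = re.compile(
    r"\[(?P<thread>\w+) thread (?P<stamp>\w+ +\d+ \d\d:\d\d:\d\d)\] (?P<msg>.*)"
)


def access_denied_times(log_text: str) -> list[str]:
    """Return the timestamps of every 'PrimeNet error 9' line in the log text."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("msg").startswith("PrimeNet error 9"):
            hits.append(match.group("stamp"))
    return hits


if __name__ == "__main__":
    sample = (
        "[Comm thread Feb 5 09:54:40] Getting assignment from server\n"
        "[Comm thread Feb 5 09:54:40] PrimeNet error 9: Access denied\n"
        "[Comm thread Feb 5 09:54:40] Will try contacting server again in 70 minutes.\n"
    )
    print(access_denied_times(sample))
```

More than one timestamp in the result would indicate the retry loop described above, at which point restarting mprime is the workaround that worked here.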
[QUOTE=GP2;507590]Yes Prime95/mprime can do type 4. If it's an automatic assignment of a PRP double-check ([c]WorkPreference=151[/c] in prime.txt), it will set the residue type to be the same as that of the first-time PRP test.[/QUOTE]
So, if prime95 via primenet connection is the second check and gpuowl did the first test, all's well. Odds of prime95 rather than a gpu app being the second check are high. If gpuowl is the second check, manually assigned, not so good. Odds of a prime95 type 1 first test are high; gpuowl v5 or v6 would then do a type 4 PRP, and there's a type mismatch. If gpuowl did the first test, with offset zero, using gpuowl for the double check would match type (good), but would also match offset zero (not good). Gpuowl versions that do type 4 probably should not be used for double checks for now. |
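The cases above can be sketched as a small Python check. It encodes only the two rules stated in this thread: residue types must match for the results to be comparable, and shifts should differ. The class, the function, the result labels and the nonzero shift values are hypothetical illustrations, and the open question of whether the base must also match is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrpRun:
    program: str       # e.g. "prime95" or "gpuowl"
    residue_type: int  # PRP residue type, e.g. 1 or 4
    shift: int         # shift (offset) used for the run; gpuowl uses 0


def assess_double_check(first: PrpRun, second: PrpRun) -> str:
    """Classify a double check using the two rules stated in the thread."""
    if first.residue_type != second.residue_type:
        return "mismatch"  # residues are not comparable
    if first.shift == second.shift:
        return "weak"      # comparable, but some errors could hit both runs alike
    return "good"          # same type, different shift


if __name__ == "__main__":
    gpuowl_t4 = PrpRun("gpuowl", 4, 0)
    prime95_t1 = PrpRun("prime95", 1, 12345)  # shift value is made up
    prime95_t4 = PrpRun("prime95", 4, 67890)  # shift value is made up
    print(assess_double_check(gpuowl_t4, prime95_t4))  # good
    print(assess_double_check(prime95_t1, gpuowl_t4))  # mismatch
    print(assess_double_check(gpuowl_t4, gpuowl_t4))   # weak
```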
[QUOTE=Prime95;507597]I don't think this will be hard to track down.[/QUOTE]
George, when you have a few minutes, check your forum private message inbox re 3 dump files for prime95 v29.5b9 benchmark stalls on the "reliable" i7-8750H, Windows 10. |
Update with my situation: I installed an 850W Seagate PSU and instead of trying 29.5b9 I ran AIDA64 stress test with FPU option checked to see if it was stable.
The system is not stable in AIDA64, even with the 850W PSU. It frequently crashes AIDA64's stress test, citing "Error: hardware error". So it appears the problem is on my end and is not the result of Prime95's AVX512 optimizations. I do find it interesting that all 4 of my Skylake-X 9800X CPUs are stable in 29.4 or in stress tests without AVX512 optimizations, but the moment I include AVX512 in AIDA the system breaks down. I will have to do some digging about this situation. |