mersenneforum.org  

Old 2019-02-03, 17:31   #243
GP2
Sep 2003
3²×7×41 Posts

Quote:
Originally Posted by kriesel View Post
Updated TightVNC to the latest available v2.8.11, on the i7-8750H laptop (VNC server) and on the VNC client box. This was after a Win 10 update and restart, a BIOS update and other laptop manufacturer updates to current, and a system restart. After that, prime95 v29.5b9 again had a stall early in a user-initiated throughput benchmark, within 10 minutes of start.

The screenshot won't be much help in solving the problem.

Why not try Mysticial's idea? You wrote that the dump file is too big for email, but you can easily upload it for free to Dropbox or some similar cloud storage where George can access it.

Quote:
Originally Posted by Mysticial View Post
Here's an idea:

If anybody manages to make it hang in Windows again, open up Task Manager, right-click on the process and choose "Create dump file". Then send it to George.

If George has the debug symbols for the binary, he should be able to load it up in Visual Studio and probe the stacks for every single thread that's alive to see what's waiting on what. That might be enough to figure out what the deadlock/hang is.
Quote:
Originally Posted by kriesel View Post
Thanks. Managing to make it hang is no issue; it's very reliable on my i7-8750H. Making the dump file seems easy enough, but at 155MB raw, and even 55MB compressed, it's too big for email etc.

As an alternative, on Linux you can get a core dump of a running program and then stack trace all the threads. See for instance this post.
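For illustration, the Linux route could be sketched roughly as below. This is only a sketch under assumptions: it presumes `gdb` and `gcore` are installed and that you are allowed to attach to the process, and the helper names and the `mprime_core` prefix are hypothetical. It only builds the command lines; nothing is run until you pass them to `subprocess.run`.

```python
from typing import List


def gdb_all_threads_backtrace_argv(pid: int) -> List[str]:
    """Build a gdb command line that attaches to `pid` in batch mode,
    prints a backtrace for every thread, then exits."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def gcore_argv(pid: int, prefix: str = "mprime_core") -> List[str]:
    """Build a gcore command line; gcore names the dump `<prefix>.<pid>`.
    The default prefix here is a made-up example name."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    return ["gcore", "-o", prefix, str(pid)]
```

With the PID of the hung mprime in hand, something like `subprocess.run(gdb_all_threads_backtrace_argv(pid), check=True)` should print one stack per thread, which is the same information Mysticial suggests pulling out of the Windows dump.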

You could try activating Windows Subsystem for Linux (WSL) on your Windows 10 box, and download a Linux distro for free from the Windows store, then try running the Linux64 version of mprime and see if you can reproduce the same problem.

Note WSL is not a virtual machine, it's more like a container, so the Linux64 version of mprime should run at full speed.


However, Mysticial's idea is much more straightforward, and the large size of the dump file shouldn't be an issue.

Last fiddled with by GP2 on 2019-02-03 at 17:41
Old 2019-02-03, 18:26   #244
tServo
"Marv"
May 2009
near the Tannhäuser Gate
2²×3²×17 Posts

Quote:
Originally Posted by kriesel View Post


https://www.mersenneforum.org/showpo...&postcount=694 says PRP-1 can only produce type 4 residues.
What I observed with a single run was V4.3 produced a type 4 residue even when P-1 is not involved.

See also https://www.mersenneforum.org/showpo...&postcount=209
I don't know if V6 returns to the preferred standard type 1 residue.

Would there be any advantage to switching from gpuowl v5.0 to v3.8 (other than the tiny increase in speed that v3.8 offers) so my results produce type=1 residues? I.e., are type=1 residues more desirable than type=4?
Old 2019-02-03, 21:11   #245
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7×19×37 Posts

Quote:
Originally Posted by tServo View Post
Would there be any advantage to switching from gpuowl v5.0 to v3.8 (other than the tiny increase in speed that v3.8 offers) so my results produce type=1 residues? I.e., are type=1 residues more desirable than type=4?

I'd find either of those reasons persuasive. However, a speedup with V3.8 is not guaranteed; it depends on your gpu model and perhaps on what fft lengths you are running, the OS, driver, etc. I have my AMD gpu system set up so I have multiple versions ready to run. Try it.
I think prime95 does type 1 PRP3, with pseudorandom shift. As we saw recently, gpuowl does type 1 or type 4 PRP3 depending on version, with zero shift (leaving out the complication of PRP/P-1 in some versions of gpuowl using a different base than 3). For double check residues to be comparable, they need to be the same PRP type. And also the same base?

And it is preferable the shifts be different, so that some potential errors don't affect the two separate runs the same way. Therefore, it's not ideal to run gpuowl type4 PRP to double check gpuowl type4 PRP; they'd have the same zero shift. (I'm not clear on what else is available to DC a type 4 PRP. Maybe prime95 can do it if the user specifies.)

Last fiddled with by kriesel on 2019-02-03 at 21:18
Old 2019-02-03, 22:52   #246
R. Gerbicz
"Robert Gerbicz"
Oct 2005
Hungary
5·17² Posts

Quote:
Originally Posted by R. Gerbicz View Post
ps. Of course, at least for the first exponent, the third run, which used type=4, was useless: it can't verify a type=1 result. Not to mention that you can't use type=2,4 for a fast PRPC test.

It would be easier to also calculate, in all 5 types, the a^(M_p-1) mod M_p residue (res64 or, better, res2048); that means zero, one, or very few more iterations. As far as I remember, gpuowl also calculates this residue with a=3.
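As a small illustration of the residue being discussed, here is a minimal Python sketch for toy exponents. It assumes the usual convention that res64 is the low 64 bits of the final residue printed as 16 hex digits; the function names are made up for this example and this is not how prime95 or gpuowl compute it (they use FFT-based squaring).

```python
def fermat_residue(p: int, base: int = 3) -> int:
    """Return base^(M_p - 1) mod M_p, where M_p = 2^p - 1."""
    m_p = (1 << p) - 1
    return pow(base, m_p - 1, m_p)


def res64(residue: int) -> str:
    """Low 64 bits of a residue as 16 uppercase hex digits (assumed convention)."""
    return format(residue & ((1 << 64) - 1), "016X")


# M_7 = 127 is prime, so the Fermat residue is 1.
print(res64(fermat_residue(7)))
# M_11 = 2047 = 23 * 89 is composite; its base-3 residue is not 1.
print(res64(fermat_residue(11)))
```

If every program reported this one residue regardless of its internal type, any two runs of the same exponent and base could be compared directly.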

Quote:
Originally Posted by GP2 View Post
As of today the exponent 79075979 is confirmed bad, so we have a type-1 PRP test with a bad residue!

The other five already have a non-matching (masked) residue, so even without a triple check we know that those tests failed (most likely "only" Simon's tests).
Old 2019-02-03, 23:11   #247
GP2
Sep 2003
5027₈ Posts

Quote:
Originally Posted by kriesel View Post
Therefore, it's not ideal to run gpuowl type4 PRP to double check gpuowl type4 PRP; they'd have the same zero shift. (I'm not clear on what else is available to DC a type 4 PRP. Maybe prime95 can do it if the user specifies.)

Yes, Prime95/mprime can do type 4. If it's an automatic assignment of a PRP double-check (WorkPreference=151 in prime.txt), it will set the residue type to be the same as that of the first-time PRP test.
Old 2019-02-04, 01:24   #248
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
1CBD₁₆ Posts

Quote:
Originally Posted by GP2 View Post
The screenshot won't be much help in solving the problem.

I don't think this will be hard to track down.
Old 2019-02-04, 03:28   #249
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7·19·37 Posts

Quote:
Originally Posted by GP2 View Post
The screenshot won't be much help in solving the problem.

Why not try Mysticial's idea? You wrote that the dump file is too big for email, but you can easily upload it for free to Dropbox or some similar cloud storage where George can access it.

I may return to that. I offered George the choice of naming an existing ftp or dropbox location where I could give it to him, and never heard back. I don't have either one set up currently, and easily find other ways to be busy. Probably the same is true for George.
Quote:
As an alternative, on Linux you can get a core dump of a running program and then stack trace all the threads. See for instance this post.

You could try activating Windows Subsystem for Linux (WSL) on your Windows 10 box, and download a Linux distro for free from the Windows store, then try running the Linux64 version of mprime and see if you can reproduce the same problem.

Note WSL is not a virtual machine, it's more like a container, so the Linux64 version of mprime should run at full speed.

However, Mysticial's idea is much more straightforward, and the large size of the dump file shouldn't be an issue.

I don't know if the same system has the issue in linux as in Windows. Either way, figuring out what is going on with the Windows instance is the priority. Recently I've been focussing on ruling out unapplied updates as possible causes there, partly because of repeated questions to me about that.

I think use of WSL might create as many questions as it answers. If there's still an issue, linux fans may ask or presume that it is because of the underlying Windows. Running mprime on WSL on Windows 10 is not the case I'm trying to help solve. Running prime95 on bare Win 10 is. I'm hesitant to complicate the environment or proliferate cases.

I have a bootable USB stick for Ubuntu 18.04. I built it and tried it after Chalsall's suggestion to use that on the problem laptop. Well, that was a big waste of time. The recommended trial mode treated both hard drives in the laptop, and the USB stick too, as read-only. Even linux mprime images deposited on the empty hard drive, and double-checked as unrestricted before and after, were read-only during the linux boot (read only; no execute, modify or delete), and their permissions could not be changed, so they could not be run. In short, mprime could not be run through the recommended method.
Either getting through linux, a very foreign environment, or getting a dump file uploaded is problematic with my crappy slow ISP. Individual web pages time out more often than they load. Downloads and uploads frequently fail. DVD-size downloads are impossible. Even tiny downloads fail. Maybe I'll need to haul it into town to the library. It's taken hours just to look into a couple of online storage choices.

Problem solving or relearning of linux involves a lot of note taking because the internet connection is so slow I often forget what I was looking up or why by the time a search returns results. Everything in linux seems to me to be named wrong or awkward to use. That and unfamiliarity makes it orders of magnitude slower to use. Maybe I spent too many decades in VMS, RSX, DOS from 1.1, and Windows from 3.0 up.

It may not sound like it, but I appreciate you taking the time to write about alternatives. It would be good to have it resolved.

Last fiddled with by kriesel on 2019-02-04 at 03:56
Old 2019-02-05, 11:03   #250
ATH
Einyen
Dec 2003
Denmark
5735₈ Posts

This has happened a few times in 29.5b9, but only to my 2 PRP CF DC instances:

Code:
[Comm thread Feb 5 09:54:40] PRP test successfully completes double-check of M6946969 -- 
[Comm thread Feb 5 09:54:40] CPU credit is 1.4084 GHz-days.
[Comm thread Feb 5 09:54:40] Getting assignment from server
[Comm thread Feb 5 09:54:40] PrimeNet error 9: Access denied
[Comm thread Feb 5 09:54:40] Invalid security signature
[Comm thread Feb 5 09:54:40] Visit http://mersenneforum.org for help.
[Comm thread Feb 5 09:54:40] Will try contacting server again in 70 minutes.
[Work thread Feb 5 10:54:39] Resuming.
[Work thread Feb 5 10:54:39] No work to do at the present time.  Waiting.
If I leave it, it just keeps failing every 70 minutes; I waited at most 3 times, I think. If I shut down mprime and start it again, it works right away and receives a new exponent.
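To spot this condition without watching the console, one could scan the log for PrimeNet errors. The following is a rough sketch: the line format is inferred only from the excerpt above (`[<name> thread <Mon> <day> <hh:mm:ss>] <message>`), other prime95/mprime log lines may look different, and the function and pattern names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Line format assumed from the log excerpt in this post only.
LINE_RE = re.compile(
    r"^\[(?P<thread>\w+) thread (?P<stamp>\w{3} +\d+ \d\d:\d\d:\d\d)\] (?P<msg>.*)$"
)
ERROR_RE = re.compile(r"^PrimeNet error (?P<code>\d+): (?P<text>.*)$")

SAMPLE = [
    "[Comm thread Feb 5 09:54:40] Getting assignment from server",
    "[Comm thread Feb 5 09:54:40] PrimeNet error 9: Access denied",
    "[Comm thread Feb 5 09:54:40] Invalid security signature",
    "[Work thread Feb 5 10:54:39] Resuming.",
]


def primenet_errors(lines: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (timestamp, error code, error text) for each PrimeNet error line."""
    found = []
    for line in lines:
        line_match = LINE_RE.match(line.strip())
        if line_match is None:
            continue  # not in the assumed format; skip it
        error_match = ERROR_RE.match(line_match.group("msg"))
        if error_match is not None:
            found.append(
                (
                    line_match.group("stamp"),
                    int(error_match.group("code")),
                    error_match.group("text"),
                )
            )
    return found


print(primenet_errors(SAMPLE))
```

Counting repeated error 9 entries spaced about 70 minutes apart would flag the stuck state, at which point restarting mprime is the workaround described above.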
Old 2019-02-05, 15:44   #251
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
11471₈ Posts

Quote:
Originally Posted by GP2 View Post
Yes Prime95/mprime can do type 4. If it's an automatic assignment of a PRP double-check (WorkPreference=151 in prime.txt), it will set the residue type to be the same as that of the first-time PRP test.

So, if prime95 via a primenet connection is the second check and gpuowl did the first test, all's well. Odds of prime95 rather than a gpu app being the second check are high.

If gpuowl is the second check, manually assigned, not so good. Odds of a prime95 type 1 first test are high. gpuowl then does a type 4 PRP if it is v5 or v6, and there's a type mismatch.
If gpuowl did the first test, with offset zero, using gpuowl for the double check would match type (good), but would also match offset zero (not good).

Gpuowl versions that do type 4 probably should not be used for double check for now.
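The cases above can be written down as a small sketch. The `PrpRun` record and `assess` function are hypothetical, not anything from prime95 or gpuowl; requiring the same base is an assumption (it was raised as an open question earlier in the thread), and two pseudorandom shifts are simply assumed to differ.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PrpRun:
    program: str
    residue_type: int  # 1..5
    base: int          # PRP base, e.g. 3
    zero_shift: bool   # True for a fixed zero shift, False for a pseudorandom shift


def assess(first: PrpRun, second: PrpRun) -> Tuple[bool, bool]:
    """Return (comparable, shifts_differ) for a first test and its double check.

    comparable:    residues can be matched (same type; same base is assumed too).
    shifts_differ: the two runs do not share the same zero shift.
    """
    comparable = (
        first.residue_type == second.residue_type and first.base == second.base
    )
    shifts_differ = not (first.zero_shift and second.zero_shift)
    return comparable, shifts_differ


gpuowl_type4 = PrpRun("gpuowl v5", 4, 3, zero_shift=True)
prime95_type1 = PrpRun("prime95", 1, 3, zero_shift=False)
prime95_type4 = PrpRun("prime95", 4, 3, zero_shift=False)

print(assess(gpuowl_type4, prime95_type4))  # gpuowl first, prime95 DC
print(assess(prime95_type1, gpuowl_type4))  # prime95 first, gpuowl DC
print(assess(gpuowl_type4, gpuowl_type4))   # gpuowl checking gpuowl
```

Only the first pairing is both comparable and independent in shift, which matches the conclusion that type 4 gpuowl versions are a poor choice for double checks for now.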
Old 2019-02-05, 15:48   #252
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7·19·37 Posts

Quote:
Originally Posted by Prime95 View Post
I don't think this will be hard to track down.

George, when you have a few minutes, check your forum private message inbox re 3 dump files for prime95 v29.5b9 benchmark stalls on the "reliable" i7-8750H, Windows 10.

Last fiddled with by kriesel on 2019-02-05 at 16:12
Old 2019-02-05, 19:22   #253
simon389
Aug 2013
3·29 Posts

Update with my situation: I installed an 850W Seagate PSU and instead of trying 29.5b9 I ran AIDA64 stress test with FPU option checked to see if it was stable.

The system is not stable in AIDA64, even with the 850W PSU. It frequently crashes AIDA64's stress test, citing "Error: hardware error". So it appears the problem is on my end and is not the result of Prime95's AVX512 optimizations.

I do find it interesting that all 4 of my Skylake-X 9800X CPUs are stable in 29.4 or in stress tests without AVX512 optimizations, but the moment I include AVX512 in AIDA the system breaks down. I will have to do some digging about this situation.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.