mersenneforum.org


KarateF22 2013-10-25 02:53

P95 PrimeNet causes BSOD; small FFTs, large FFTs, and blend test don't
 
Hi.

So I have an interesting problem... I can get my overclocked computer 8 hour+ stable in Small FFTs, Large FFTs, and the Blend torture tests. Oddly enough, however, if I try to use PrimeNet... I will get a near instantaneous BSOD citing "Secondary Processor Interrupt". I can only get this to go away by dropping from 4.8 to 4.5 GHz, which is massive. Likewise I have never had a processor crash in any other program, including the Intel Burn Test. Does anyone know why it would be so unstable for this one specific program when in all other operating circumstances it is rock solid?

Lastly, sorry if this isn't exactly the place for this kind of question, I will admit that I found this forum by basically googling "Prime95 forums".

Prime95 2013-10-25 03:27

Haswell?

KarateF22 2013-10-25 04:25

Sandy Bridge, 3930K

Prime95 2013-10-25 05:00

[QUOTE=KarateF22;357351]Oddly enough, however, if I try to use PrimeNet... I will get a near instantaneous BSOD citing "Secondary Processor Interrupt". I can only get this to go away by dropping from 4.8 to 4.5 GHz, which is massive.[/QUOTE]

I'm sorry but I've not heard from anyone else having a similar problem. I can't even come up with a plausible explanation why network activity would trigger a hardware failure.

KarateF22 2013-10-25 05:25

[QUOTE=Prime95;357362]I'm sorry but I've not heard from anyone else having a similar problem. I can't even come up with a plausible explanation why network activity would trigger a hardware failure.[/QUOTE]

Which torture test does PrimeNet most resemble? Or does it cycle through each of them over its execution? Are there any subroutines that are only utilized in PrimeNet?

Prime95 2013-10-25 13:17

[QUOTE=KarateF22;357363]Which torture test does PrimeNet most resemble? [/QUOTE]

In-place large FFT torture test with an FFT size somewhere between 3072K and 4096K.

TheMawn 2013-10-25 20:51

See if you can set up a custom torture test with those parameters. When you say near instant, are you talking minutes, seconds, or less?

KarateF22 2013-10-26 03:55

[QUOTE=TheMawn;357444]See if you can set up a custom torture test with those parameters. When you say near instant, are you talking minutes, seconds, or less?[/QUOTE]

0-2 seconds after it finishes downloading the data on any core. As for testing the custom settings, I will be able to Sunday... am currently out of town.

retina 2013-10-26 06:10

I understand the desire to get as much out of the system as one can. You are pushing your system to the limits so naturally bad things can (and do) happen. As for finding the reason I think we already know what it is: overclocking and generally stressing the system. And you also already know the solution: reduce the clock.

But considering the utility function of gain vs loss, I think you have already lost much more than you can ever gain from now by overclocking. BSODs and general errored results cause a lot of work to be wasted and require a significant amount of overclocking to recover such losses, let alone actually getting ahead on the deal.

My suggestion is just to set it to 4.3GHz (or whatever you found stable less some comfort margin) and start getting good results immediately. Waiting another few days for more tests to complete, and other various bits of farting around, will take a lot of catching up (if you can ever catch up).
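The gain-versus-loss argument above can be made concrete with a rough calculation. The sketch below assumes throughput scales linearly with clock speed, which is a simplification not stated in the thread, and the function name `break_even_days` is hypothetical. The 4.5 GHz and 4.8 GHz figures come from the first post.

```python
def break_even_days(lost_days: float, stable_ghz: float, overclock_ghz: float) -> float:
    """Days of error-free running at the overclock needed to make up lost work.

    lost_days is measured in days of work at the stable clock.
    Assumes throughput is proportional to clock speed (an assumption).
    """
    if overclock_ghz <= stable_ghz:
        raise ValueError("overclock must be faster than the stable clock")
    # Extra work gained per day at the overclock, expressed in stable-clock days.
    gain_per_day = (overclock_ghz - stable_ghz) / stable_ghz
    return lost_days / gain_per_day


if __name__ == "__main__":
    # One day of lost work, 4.5 GHz stable vs 4.8 GHz overclocked.
    print(round(break_even_days(1.0, 4.5, 4.8), 1))
```

Under that assumption, each day of work lost to crashes or bad results takes about 15 days of flawless running at 4.8 GHz to win back, since the overclock adds less than 7% per day.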

Aramis Wyler 2013-10-26 16:29

I would normally agree with what retina says here, but in this case, if it can run the Prime95 torture test without error for 8 hours reliably, and the same program is dumping a BSOD in just a couple of seconds talking to PrimeNet, something else is wrong.

You may discover in the end that the CPU IS overclocked a little too high to give consistently accurate results. Still, something else is wrong here.

Prime95 2013-10-27 02:50

Posting the prime.log file might be useful.

cheesehead 2013-10-27 12:04

[QUOTE=KarateF22;357351]Lastly, sorry if this isn't exactly the place for this kind of question,[/QUOTE]It's the right place.

[quote]I will admit that I found this forum by basically googling "Prime95 forums".[/quote]Did you try Googling "BSOD Secondary Processor Interrupt" or "Sandy Bridge Secondary Processor Interrupt"?

When I try those and some variations, almost every find is talking about a _clock_ interrupt, as in "A clock interrupt was not received on a secondary processor". Does your message mention "clock"?

[quote]I can get my overclocked[/quote]Hmm... "clock interrupt", "overclocked". Coincidence?

As for it happening only if you use PrimeNet ... I don't know.

KarateF22 2013-10-27 21:50

Well, this just gets even more curious. Just deleted my current version of p95 and installed a new (presumably identical) one. Now PrimeNet is fine, though it will only utilize 6 of 12 available workers (though they each have their affinity set to use 2 threads). That said, it doesn't seem to be fully stressing the CPUs anymore using PrimeNet, while previously it had 12 workers running on full and fully stressed CPUs. I... don't even really have anything I could possibly say to justify why this would suddenly be the case.

*EDIT* Never mind on the number of workers, found the setting to increase that number. Now fully stresses CPU. No longer crashes CPU either... which confuses me as presumably it's an identical program.


tl;dr Deleting then reinstalling Prime95 magically fixed everything.

chalsall 2013-10-27 22:53

[QUOTE=KarateF22;357646]Deleting then reinstalling Prime95 magically fixed everything.[/QUOTE]

That doesn't make sense (if you are SURE the versions of Prime95 were identical).

Have you checked your HD's SMART data? You might have a failing hard drive.

If not, the temporal correlation with network activity might be worth thinking about. Could you have a bad NIC and/or bus?
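One way to check SMART health from a script is to call `smartctl -H` from smartmontools and read its verdict line. This is only a sketch: it assumes smartmontools is installed and that the drive reports the ATA-style "overall-health self-assessment test result" line (other drive types word it differently). The function names and the device path are hypothetical.

```python
import subprocess
from typing import Optional


def smart_health_passed(smartctl_output: str) -> Optional[bool]:
    """Interpret `smartctl -H` text: True if PASSED, False otherwise, None if no verdict line."""
    for line in smartctl_output.splitlines():
        if "overall-health self-assessment test result" in line:
            verdict = line.split(":", 1)[1].strip()
            return verdict == "PASSED"
    return None


def check_drive(device: str) -> Optional[bool]:
    """Run `smartctl -H` on a device (usually needs administrator rights)."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes to flag drive problems
    )
    return smart_health_passed(result.stdout)


if __name__ == "__main__":
    # "/dev/sda" is a placeholder; substitute the actual device name.
    print(check_drive("/dev/sda"))
```

A PASSED verdict is only a coarse summary; `smartctl -A` lists the individual attributes, such as reallocated sector counts, for a closer look.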

KarateF22 2013-10-27 23:27

[QUOTE=chalsall;357653]That doesn't make sense (if you are SURE the versions of Prime95 were identical).

Have you checked your HD's SMART data? You might have a failing hard drive.

If not, the temporal correlation with network activity might be worth thinking about. Could you have a bad NIC and/or bus?[/QUOTE]

I won't argue that it doesn't make sense. Unfortunately, I wasn't smart enough to keep the old version... though to be fair, that was because I did not think reinstalling would work.

My HD appears to be fine, just did a check and there were no errors.

TheMawn 2013-10-28 00:20

Hey yeah! Where the hell is our damn NIC stress tester?

chalsall 2013-10-28 00:34

[QUOTE=TheMawn;357665]Hey yeah! Where the hell is our damn NIC stress tester?[/QUOTE]

MTR (My Trace Route) is mine.


All times are UTC.