mersenneforum.org  

Old 2012-03-03, 02:56   #122
rjs5
 
Feb 2012

1 Posts

You can find Intel CPU information on the site below.
http://ark.intel.com/

A comment about performance: the source appears to contain software PREFETCH instructions in a loop that gang-prefetch data before it is actually used. You might check the build and analyze the prefetch pattern.

Each CPU has a limited number of memory read/write operations that can be in flight at any one time. Once that maximum was reached, earlier CPUs (the P4, I think) would cancel a SW prefetch and the cache line would not be read.

Today, a SW prefetch will be completed. If the maximum number of reads/writes has been reached, the prefetch instruction will stall until resources are available. An NTA prefetch should not evict a modified line, but it could evict earlier prefetched data. The earlier prefetch would then be wasted.

If you are running two different workloads on one core and its hyperthread, the prefetch loop of one might similarly evict valid, useful data from the cache. This is more likely to happen because a core and its hyperthread share the cache.

The number of memory buffers is likely to be somewhere in the 8 to 16 range PER CORE. Prefetching more than that will just cause a "read" stall.

To make sure the prefetching does not get too far ahead and stall execution or evict needed data, you can probably break up the prefetching so there is less pressure on the bus and on the cache shared with the hyperthread.
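
To illustrate that last point, here is a minimal C sketch of keeping software prefetch a small, fixed distance ahead of the loop. It is not taken from the Prime95 source: the function name, the `lines_ahead` parameter, and the 64-byte cache line are assumptions for illustration, and `__builtin_prefetch` is the GCC/Clang builtin rather than a hand-written PREFETCH instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, so 8 doubles per line. */
enum { DOUBLES_PER_LINE = 8 };

/*
 * Sum an array while prefetching a fixed number of cache lines ahead.
 * lines_ahead is a hypothetical tuning knob: keeping it well below the
 * 8 to 16 outstanding requests mentioned above limits how far the
 * prefetching runs ahead of the loads that consume the data.
 */
double sum_with_bounded_prefetch(const double *data, size_t n, size_t lines_ahead)
{
    assert(data != NULL || n == 0);
    const size_t distance = lines_ahead * DOUBLES_PER_LINE;
    double total = 0.0;

    for (size_t i = 0; i < n; i++) {
        /* Issue one prefetch per cache line, not one per element. */
        if (i % DOUBLES_PER_LINE == 0 && i + distance < n) {
            /* rw = 0 (read), locality = 0 (non-temporal hint). */
            __builtin_prefetch(&data[i + distance], 0, 0);
        }
        total += data[i];
    }
    return total;
}
```

Calling `sum_with_bounded_prefetch(data, n, 4)` keeps the prefetches roughly four cache lines ahead of the loads. The right distance is something to measure on the actual CPU, not a value to assume.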
Old 2012-03-03, 10:09   #123
fivemack
(loop (#_fork))
 
Feb 2006
Cambridge, England

3·2,141 Posts

It is an unstable box; the whole point of mprime is that it's been tested on enough machines that error messages of this kind mean your machine is broken.
Old 2012-03-03, 14:22   #124
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

6535₈ Posts

Immediate reproducible crash on very small FFT:
Code:
Pminus1=N/A,1,2,10061,-1,1000000000,100000000000,60

P-1 on M10061 with B1=1000000000, B2=100000000000
Chance of finding a factor is an estimated 24.5%
Using AVX FFT length 512
SUMOUT error occurred.
Waiting five minutes before restarting.
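
For anyone reproducing this, the worktodo line above can be read field by field. The sketch below is a hypothetical C parser, not Prime95's own. The field meanings (assignment ID, then k, b, n, c for k*b^n+c, then B1, B2, and the bits already trial-factored) are my reading of the line and the output it produced, so treat the struct, the function name, and the meaning of the last field as assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for one Pminus1 worktodo entry. */
struct pminus1_entry {
    char assignment_id[40];    /* "N/A" in the line above */
    long k, b, n, c;           /* candidate is k*b^n+c */
    unsigned long long b1, b2; /* stage 1 and stage 2 bounds */
    int factored_bits;         /* assumed: trial-factoring depth in bits */
};

/* Returns 0 on success, -1 if the line does not have all eight fields. */
int parse_pminus1(const char *line, struct pminus1_entry *out)
{
    assert(line != NULL && out != NULL);
    int fields = sscanf(line,
                        "Pminus1=%39[^,],%ld,%ld,%ld,%ld,%llu,%llu,%d",
                        out->assignment_id, &out->k, &out->b, &out->n,
                        &out->c, &out->b1, &out->b2, &out->factored_bits);
    return fields == 8 ? 0 : -1;
}
```

Parsed this way, the line gives k=1, b=2, n=10061, c=-1, which is M10061, with B1=1000000000 and B2=100000000000, matching the output shown.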
Old 2012-03-03, 16:20   #125
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2×3,767 Posts

Quote:
Originally Posted by James Heinrich
Immediate reproducible crash on very small FFT
Thanks. For now, turn on round-off checking and the error should go away.
Old 2012-03-03, 19:59   #126
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

7221₁₀ Posts

Quote:
Originally Posted by fivemack
It is an unstable box; the whole point of mprime is that it's been tested on enough machines that error messages of this kind mean your machine is broken.
His point was that he has two identical machines, each at the same hardware settings. One appears to be stable, the other isn't. However, I do agree with you that it is more likely to be bad hardware than an MPrime bug. MPrime runs fine on other OC'd SB-E machines (not sure about the RAM OC), so try to figure out what would cause that particular error. Have you tried MPrime's torture test or memtest?

Edit: Going back and rereading the posts, the "Illegal instruction" is not an error typically associated with hardware problems. Reading the post after his, talking about prefetching and stuff, might be of relevance (but why one box and not the other?). I guess more memtest :P

Last fiddled with by Dubslow on 2012-03-03 at 20:02
Old 2012-03-03, 22:00   #127
erg
 
Feb 2012

4₁₆ Posts

Both machines have the RAM set to 2133. I'm inclined to believe one of them is unstable, but this mprime is a beta version or something, so I thought I'd get a second opinion.

I will run more tests on the machine and perhaps put the RAM back to the default clock.
Old 2012-03-03, 22:06   #128
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts

The RAM speed might be messing with MPrime, seeing as I don't think anybody else has used RAM that high with AVX, but then again, your other machine appears to be working fine, so I think it's unlikely to be MPrime with the bug.
Old 2012-03-03, 22:14   #129
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

9,767 Posts

Quote:
Originally Posted by erg
I will run more tests on the machine and perhaps put the RAM back to the default clock.
That would be a really good idea.

Based on heuristics, before you think that Prime95/mprime has a bug, assume your hardware is the problem.

Particularly if you are trying to push your hardware past its manufacturing specs....
Old 2012-03-03, 22:17   #130
flashjh
 
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

Quote:
Originally Posted by erg
Both machines have the RAM set to 2133. I'm inclined to believe one of them is unstable, but this mprime is a beta version or something, so I thought I'd get a second opinion.

I will run more tests on the machine and perhaps put the RAM back to the default clock.
Any time you get errors with MPrime/Prime95, it is just about 100% likely to be a hardware problem (maybe 101%).


Overclocking is also a good way to get errors, but MPrime/Prime95 are used for burn-in testing all the time. People often sell their systems as Prime95 stable, and for good reason. If you can get 72+ hours on a full Prime95 burn-in test with no errors, you almost certainly have perfect hardware.


With that said, you have two options:


1. Remove the overclock and retest. This option will tell you about the overclock and also test the rest of the hardware again.

2. Start swapping components between the computers, one at a time so you know what causes the problem. I'm inclined to think the memory and then the motherboard, but you never know.

Just keep in mind that no hardware is ever exactly the same. Though it would be nice, you're not likely to get a 100% match for any two builds. It's quite possible you'll get one system 100% stable on overclock and another that won't get 100% stable unless you're at stock. Good luck.

Last fiddled with by flashjh on 2012-03-03 at 22:18
Old 2012-03-03, 22:20   #131
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts

He said the RAM was rated for 2133, so technically only the mobo's RAM interface is being overclocked, and the RAM itself should be fine.
Old 2012-03-03, 22:25   #132
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

11·311 Posts

Quote:
Originally Posted by Dubslow
He said the RAM was rated for 2133, so technically only the mobo is overclocking with the ram interface, but the RAM itself should be fine.
Not necessarily. My RAM is rated for 1600, but I can only get the system 98% stable at "stock" RAM speed no matter what I do. Dropping the RAM clock solved all my stability issues. RAM vendors are unfortunately aggressive with their marketing of high-speed RAM: there's not usually a lot of safe headroom, and sometimes, as in my case, even negative headroom.
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.