#166 |
|
"Mihai Preda"
Apr 2015
3·457 Posts |
Congrats for building :)
I think one way is to use MSYS2 http://www.msys2.org/ which may be a bit friendlier. I do think there's a Ctrl-C problem on Windows/MSYS. I don't know exactly what causes it or what the fix is, and I'd like to find out more. It seems Ctrl-C does work, but only after a delay or intermittently. |
|
|
|
|
|
#167 |
|
Romulan Interpreter
Jun 2011
Thailand
3²·29·37 Posts |
Thanks. Now (after your new commit) it also makes sense why the lines were skipped (the exponent limits changed; we had figured out as much in the meantime from the source code in worktodo.h).
Working right now. Testing 76000117 (DC-ing). We'll keep in touch.
Last fiddled with by LaurV on 2017-05-28 at 14:04 |
|
|
|
|
|
#168 | |
|
"Mr. Meeseeks"
Jan 2012
California, USA
23×271 Posts |
Quote:
Windows binary from the latest commit as of now: |
|
|
|
|
|
|
#169 |
|
"Mihai Preda"
Apr 2015
3×457 Posts |
In v0.4 I did a trade-off between memory-bandwidth and compute; in the transposition kernels, the trigonometric table is now much smaller at the cost of 2 additional complex multiplications. This improves performance in situations where memory bandwidth was the bottleneck. A side effect is slightly reduced accuracy (thus, increased round-off error).
This improves performance on "slow-memory" GPUs; in particular, Polaris cards (RX 580, RX 480) should see a ~8% performance increase. On "fast-memory" GPUs (e.g. the Fury series) the increase is much lower, at ~2% (but it did move my FuryX to under 2ms/it :) |
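To illustrate the general kind of trade-off described in this post (a smaller trigonometric table paid for with extra complex multiplications), here is a minimal Python sketch. It is not GpuOwL's kernel code: the split into a "coarse" and a "fine" table, and the names `full_table`, `split_tables`, `twiddle` and `block`, are assumptions made for illustration. This sketch spends one extra multiplication per twiddle factor; the post's figure of 2 additional multiplications refers to the actual transposition kernels, whose layout is not spelled out here.

```python
import cmath

def full_table(n):
    """Reference: one precomputed root of unity per index (n complex entries)."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]

def split_tables(n, block):
    """Two small tables instead of one large one (hypothetical layout).

    coarse[a] = w^(a*block), fine[b] = w^b, where w = exp(-2*pi*i/n).
    Together they hold n/block + block entries instead of n.
    """
    if n % block != 0:
        raise ValueError("block must divide n")
    coarse = [cmath.exp(-2j * cmath.pi * (a * block) / n) for a in range(n // block)]
    fine = [cmath.exp(-2j * cmath.pi * b / n) for b in range(block)]
    return coarse, fine

def twiddle(k, coarse, fine, block):
    """Rebuild w^k = w^(a*block) * w^b at the cost of one complex multiplication."""
    a, b = divmod(k, block)
    return coarse[a] * fine[b]

if __name__ == "__main__":
    n, block = 4096, 64
    ref = full_table(n)
    coarse, fine = split_tables(n, block)
    worst = max(abs(twiddle(k, coarse, fine, block) - ref[k]) for k in range(n))
    print("table entries:", len(ref), "->", len(coarse) + len(fine))
    print("largest deviation from the full table:", worst)
```

The rebuilt factors differ from the directly computed ones by a small rounding error, which is the same kind of effect as the slightly reduced accuracy mentioned above.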
|
|
|
|
|
#170 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
23·271 Posts |
Not sure exactly what's changed, but here's the latest (v0.5) binaries.
Also, I've noticed something interesting (and I'm not sure why I didn't notice this a long time ago): when an assignment from worktodo.txt finishes, its line isn't removed from the file. For the record, I've never put multiple assignments there.
Last fiddled with by kracker on 2017-06-11 at 19:12 |
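For reference, the expected behavior (dropping the finished assignment's line from worktodo.txt) could look roughly like the Python sketch below. This is not GpuOwL's code, which is not shown in this thread; the function name `remove_assignment` and the assumption that an assignment occupies exactly one line are hypothetical.

```python
import os
import tempfile

def remove_assignment(path, finished):
    """Remove the first line equal to `finished` (ignoring surrounding whitespace).

    Returns True if a line was removed, False if no line matched.
    The file is rewritten through a temporary file in the same directory,
    so an interrupted write does not leave a half-written worktodo file.
    """
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()

    kept, removed = [], False
    for line in lines:
        if not removed and line.strip() == finished.strip():
            removed = True          # skip only the first matching line
            continue
        kept.append(line)

    if removed:
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp = tempfile.mkstemp(dir=directory, text=True)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.writelines(kept)
        os.replace(tmp, path)       # swap the new file into place
    return removed
```

Calling `remove_assignment("worktodo.txt", line)` with the exact text of the finished line would leave any other lines in the file untouched.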
|
|
|
|
|
#171 |
|
"Mihai Preda"
Apr 2015
3×457 Posts |
I tried to fix the worktodo.txt delete (though I couldn't repro, so I didn't verify the fix).
In v0.5 I try a new amalgamation kernel, which reduces memory roundtrips by 2 (3 kernels merged into 1), but because of the poor OpenCL compiler the VGPR usage becomes > 128, so occupancy is lowered. Anyway, if I get reports that 0.5 is slower (compared to 0.3/0.4) I'll add a new "legacy" option to allow use of the previous kernels. |
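Why crossing 128 VGPRs matters can be shown with a rough occupancy estimate. The Python sketch below assumes the commonly cited GCN figures (256 vector registers per lane on each SIMD, at most 10 wavefronts per SIMD, registers allocated in groups of 4); these numbers and the function name `waves_per_simd` are assumptions for illustration, not something stated in this thread.

```python
def waves_per_simd(vgprs, total_vgprs=256, max_waves=10, granule=4):
    """Estimate how many wavefronts fit on one SIMD for a given VGPR count.

    Assumed GCN-style limits: `total_vgprs` registers per lane,
    `max_waves` wavefronts per SIMD, allocation rounded up to `granule`.
    """
    if vgprs <= 0:
        raise ValueError("vgprs must be positive")
    allocated = -(-vgprs // granule) * granule   # round up to the granule
    return min(max_waves, total_vgprs // allocated)

if __name__ == "__main__":
    for v in (24, 64, 84, 128, 129):
        print(v, "VGPRs ->", waves_per_simd(v), "wave(s) per SIMD")
```

Under these assumptions a kernel using exactly 128 VGPRs still fits two wavefronts per SIMD, while anything above 128 leaves room for only one, so there is less opportunity to hide memory latency.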
|
|
|
|
|
#172 | |
|
"Mr. Meeseeks"
Jan 2012
California, USA
23·271 Posts |
Quote:
|
|
|
|
|
|
|
#173 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3×13×139 Posts |
Quote:
Lessons I took from that are: Test fully; retest fully. Track reliability of individual hardware versus time. Built-in self-test features of number theory software are very useful.
Last fiddled with by kriesel on 2017-07-12 at 16:50 |
|
|
|
|
|
|
#174 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3×13×139 Posts |
Quote:
Thanks for the GpuOwL links, I've added them to the available software table. Can you think of any reason GpuOwL would not work on Intel integrated graphics processors with OpenCL support? |
|
|
|
|
|
|
#175 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3·13·139 Posts |
Quote:
Is a least-significant-64-bit residue matching that of the previous iteration ever valid? I think it would be useful to detect and warn on any successive iteration match. As long as the maximum 64-bit residue value is large compared to the exponent, the chance of a repeat or a premature zero is low. M11 needs at least 5 bits.
i  residue  binary
0  4        0100
1  14       1110
2  194      1100 0010
3  788      11 0001 0100
4  701      10 1011 1101
5  119      111 0111
6  1877     111 0101 0101
7  240      1111 0000
8  282      1 0001 1010
9  1736     110 1100 1000  not prime
Last fiddled with by kriesel on 2017-07-13 at 03:36 |
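The M11 residues listed above can be reproduced with a short Lucas-Lehmer sketch in Python, together with the kind of successive-match check suggested in the post. The function names `ll_residues` and `first_repeat` are made up for this illustration, and the code uses plain big-integer arithmetic rather than the FFT-based squaring a GPU program would use.

```python
def ll_residues(p):
    """Return the Lucas-Lehmer sequence s_0 .. s_(p-2) modulo M_p = 2^p - 1.

    s_0 = 4 and s_(i+1) = s_i^2 - 2 (mod M_p). M_p is prime exactly when
    the last value is 0 (for odd prime p).
    """
    m = (1 << p) - 1
    s = 4 % m
    residues = [s]
    for _ in range(p - 2):
        s = (s * s - 2) % m
        residues.append(s)
    return residues

def first_repeat(residues, bits=64):
    """Return the first index whose low `bits` bits equal the previous entry's.

    Returns None when no two successive residues match. A match would be a
    reason to warn, as proposed in the post.
    """
    mask = (1 << bits) - 1
    for i in range(1, len(residues)):
        if residues[i] & mask == residues[i - 1] & mask:
            return i
    return None

if __name__ == "__main__":
    res = ll_residues(11)
    for i, r in enumerate(res):
        print(i, r, format(r, "b"))
    print("prime" if res[-1] == 0 else "not prime")
    print("first successive match:", first_repeat(res))
```

For p = 11 the final residue is 1736, which is nonzero, so M11 is reported as not prime, in agreement with the listing.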
|
|
|
|
|
|
#176 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3×13×139 Posts |
GpuOwL (4M flavor) trial on an i7-7500U system
An i7-7500U is two CPU cores with hyperthreading, plus one HD620 iGP. http://ark.intel.com/products/95451/...p-to-3_50-GHz-
GpuOwL took approx 19 seconds to get started and launch the first selftest exponent. It identified and listed 4 devices.
Zero and two are identified as 24x1050Mhz Intel(R) HD Graphics 620; OpenCL 2.1
One and three are identified as 4x2700Mhz Intel(R) Core(TM) i7-7500U CPU @ 2.70Ghz; OpenCL 2.1 (build 2)
It raises the HD620 clock rate from the 300MHz idle value to 950-1000MHz. Selftest cases run at iteration times of 140-145 msec/iteration.
Prime95 throughput drops by about half during this use of the iGP by GpuOwL. (The prime95 43M LL test goes from 16 msec/iteration to 30-33 msec/iteration; the other prime95 worker goes from 96 seconds per screen output for ECM on 16.9M to 152 seconds per output.)
The 15W TDP total was not changed, but 8.4 watts of it is absorbed by the iGP, dropping CPU frequency to 1.6GHz and lowering CPU wattage. (CPUID HWMonitor and TechPowerUp GPU-Z.) Other elements are Uncore 1.5W, DRAM 1.6W, IA Cores 5.2W.
Task Manager shows prime95 dropping from ~72% CPU load before launch of GpuOwL to about 40% during GpuOwL operation. This is not due to CPU load by GpuOwL. The GpuOwL process shows no CPU usage in Task Manager (occasionally a fraction of a percent). The drop in prime95 CPU usage is assumed to be due to some combination of TDP management and contention for memory bandwidth, since the iGP uses shared system RAM.
Launching a second instance of GpuOwL in a separate directory near the end of the selftest (two exponents left) did not noticeably change prime95 CPU usage. There was no discernible difference in HD620 clock rate, power consumption or Prime95 throughput between one and two instances. I found later that the one started earlier had been accidentally put into Select status. Once that was cleared, the two instances ran simultaneously at half speed.
Run times were 140-145 msec/iteration with a single GpuOwL running, and approx 299 msec/iteration with two. So in regard to prime95, there is still no discernible difference between two instances and one instance of GpuOwL after clearing the Select status. Pilot error? May try it on a different older system too. |
|
|
|