#353 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
5×11×137 Posts |
Quote:
All Ryzens (to my knowledge) are faster using the 256-bit implementation, even if internally it is done 128 bits at a time. |
|
|
|
|
|
|
#354 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
5·11·137 Posts |
|
|
|
|
|
|
#355 | |
|
Jun 2019
21₁₀ Posts |
Quote:
|
|
|
|
|
|
|
#356 |
|
Bemusing Prompter
"Danny"
Dec 2002
California
5·479 Posts |
I added NumCPUs=2 to my local.txt file as an experiment, and it didn't make a difference. I'm at a loss as to why Prime95 won't let me set fewer than two cores per worker on a dual-core machine when using two worker windows. Have any other Mac users seen this issue?
Last fiddled with by ixfd64 on 2019-08-21 at 06:01 |
|
|
|
|
|
#357 |
|
Feb 2016
UK
3·5·29 Posts |
They do, just not in the same way Intel does it. I assume we're familiar with Intel's AVX offset: if there is AVX code running, the clock may be reduced by some fixed amount. It is crude but does the job.
Zen 2 doesn't have an AVX offset concept, but when running Prime95-like code with FMA, it still generates a lot of heat. Based on observation of actual behaviour, running stock, you will hit the PPT limit, and the current limit is also close, so it does still clock down compared to running lesser loads. From memory, my 3600 runs all-core at around 3900 MHz with 128k FFT per core, while under a lighter stress like Cinebench R15 it is well above 4 GHz. AMD took a more GPU-like approach on Zen 2: it adjusts its clock based on power, current, temperature... So they're not detecting AVX and dropping the clock, but the code still uses more power and hits the other limits earlier than it otherwise would, so the clock still drops. |
|
|
|
|
|
#358 | |
|
Jul 2019
the Netherlands
2×11 Posts |
Quote:
So I have been running AVX-256 bit FFTs for years on my Ryzen 1700. You learn something new every day. What's your secret? Good instruction scheduling? My own FFT implementation runs slightly faster on the 1700 when I use AVX-128 bit instead of AVX-256 bit. |
|
|
|
|
|
|
#359 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
7535₁₀ Posts |
I've written SSE2 (128-bit) and AVX (256-bit) versions of FFTs. I've never tried writing an AVX version that only uses half of the register width. I don't see why that would be beneficial.
|
|
|
|
|
|
#360 | |
|
Jul 2019
the Netherlands
2·11 Posts |
Quote:
2. implied SSE4_2 support
3. possible FMA support
4. faster on some processors (not the mainstream ones, however)
5. explicit zeroing of upper half of register (only important when switching between 128-bit and 256-bit)
Expect about a 5% speed increase compared to SSE2.
Last fiddled with by Evil Genius on 2019-08-21 at 18:23 |
|
|
|
|
|
|
#361 |
|
Jul 2019
the Netherlands
2×11 Posts |
I forgot an important one:
1.5 better scheduling on processors that chop 256-bit operations in half
To summarize:
* three-register non-destructive mode gives AVX-128 about a 5% speed advantage over SSE2
* better instruction scheduling gives AVX-128 about a 5% speed advantage over AVX-256 when the processor doesn't natively support 256 bit
YMMV
Last fiddled with by Evil Genius on 2019-08-21 at 20:01 |
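The "three-register non-destructive mode" point can be illustrated with a radix-2 butterfly, which needs both a + b and a - b from the same two inputs. The following is only a toy sketch that counts instructions: the function names are hypothetical, the register allocation is made up for illustration, and nothing here is taken from Prime95's or anyone else's FFT code. A lower instruction count does not translate directly into the roughly 5% figure quoted above.

```python
# Toy instruction count for one radix-2 butterfly: sum = a + b, diff = a - b.
# Assumption: a is in xmm0 and b is in xmm1 on entry (illustrative only).

def butterfly_sse2():
    # SSE2 uses a two-operand, destructive form: the destination register
    # is also the first source, so a must be copied before it is overwritten.
    return [
        "movapd xmm2, xmm0",        # copy a, because addpd overwrites xmm0
        "addpd  xmm0, xmm1",        # xmm0 = a + b
        "subpd  xmm2, xmm1",        # xmm2 = a - b
    ]

def butterfly_avx128():
    # VEX-encoded AVX uses a three-operand, non-destructive form:
    # the destination is separate, so no copy is needed.
    return [
        "vaddpd xmm2, xmm0, xmm1",  # xmm2 = a + b
        "vsubpd xmm3, xmm0, xmm1",  # xmm3 = a - b
    ]

if __name__ == "__main__":
    print("SSE2 instructions:   ", len(butterfly_sse2()))
    print("AVX-128 instructions:", len(butterfly_avx128()))
```

Whether the extra register copy costs anything in practice depends on the core, since some processors eliminate register-to-register moves at rename.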
|
|
|
|
|
#362 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
7535₁₀ Posts |
Quote:
No arguments about the improvements over SSE2. I contend that using the full 256-bit register should be significantly faster unless the chip developer completely screwed up.
a) Twice as much data in registers -- the fastest possible place to store data.
b) Half as many instructions to be read and decoded.
c) Guaranteed no data dependencies executing on data in the upper vs. lower 128 bits (makes it easier to schedule 128-bit uops).
IMO, when AMD screwed up their implementation of splitting 256-bit instructions into two 128-bit uops it was not my job to fix it. I do admit that when I worked on Bulldozer several years ago I did not think of timing AVX on 128-bit operands. |
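Point (b) above can be put into numbers with a deliberately simple model. This is a sketch under stated assumptions, not a performance predictor: the function name and parameters are hypothetical, it covers one element-wise pass over doubles only, and it ignores everything else a real core does (decode width, uop caches, scheduling, memory).

```python
def decode_and_uop_counts(n_doubles, vector_bits, native_bits):
    """Return (instructions, uops) for one element-wise pass over n_doubles.

    vector_bits: register width the code uses (128 or 256).
    native_bits: execution width of the core; a wider instruction is
                 assumed to be split into vector_bits / native_bits uops.
    """
    lanes = vector_bits // 64                    # doubles per instruction
    instructions = -(-n_doubles // lanes)        # ceiling division
    uops_per_instruction = max(1, vector_bits // native_bits)
    return instructions, instructions * uops_per_instruction

if __name__ == "__main__":
    n = 1024
    for label, vec, native in [
        ("128-bit code, 128-bit core", 128, 128),
        ("256-bit code, 128-bit core", 256, 128),
        ("256-bit code, 256-bit core", 256, 256),
    ]:
        instructions, uops = decode_and_uop_counts(n, vec, native)
        print(f"{label}: {instructions} instructions, {uops} uops")
```

In this model, a core that splits 256-bit operations executes the same number of uops either way, but the 256-bit code has half as many instructions to fetch and decode. It says nothing about the scheduling effects discussed in the posts above, which is where the two sides of the thread disagree.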
|
|
|
|
|
|
#363 | |
|
Jul 2019
the Netherlands
26₈ Posts |
Quote:
They were not the only ones. They deemed compatibility more important than throughput at the time. That is the reason their cores are more efficient power-wise. But if you have time to spare, give it a try on a Zen 1. The results may surprise you. |
|
|
|
|