mersenneforum.org

cbug 2019-01-19 19:02

unknown Intel
 
Hi,


I am using prime95 29.4b8 on a media center PC with an Intel Pentium N4200.
Since prime95 reports the CPU as "unknown Intel" with a 1MB L2 cache instead of 2MB, I thought I should report the details for this CPU as shown by [URL="https://docs.microsoft.com/en-us/sysinternals/downloads/coreinfo"]CoreInfo[/URL].



[code]
Intel(R) Pentium(R) CPU N4200 @ 1.10GHz
Intel64 Family 6 Model 92 Stepping 9, GenuineIntel
Microcode signature: 0000001C
HTT * Hyperthreading enabled
HYPERVISOR - Hypervisor is present
VMX * Supports Intel hardware-assisted virtualization
SVM - Supports AMD hardware-assisted virtualization
X64 * Supports 64-bit mode

SMX - Supports Intel trusted execution
SKINIT - Supports AMD SKINIT

NX * Supports no-execute page protection
SMEP * Supports Supervisor Mode Execution Prevention
SMAP * Supports Supervisor Mode Access Prevention
PAGE1GB * Supports 1 GB large pages
PAE * Supports > 32-bit physical addresses
PAT * Supports Page Attribute Table
PSE * Supports 4 MB pages
PSE36 * Supports > 32-bit address 4 MB pages
PGE * Supports global bit in page tables
SS * Supports bus snooping for cache operations
VME * Supports Virtual-8086 mode
RDWRFSGSBASE * Supports direct GS/FS base access

FPU * Implements i387 floating point instructions
MMX * Supports MMX instruction set
MMXEXT - Implements AMD MMX extensions
3DNOW - Supports 3DNow! instructions
3DNOWEXT - Supports 3DNow! extension instructions
SSE * Supports Streaming SIMD Extensions
SSE2 * Supports Streaming SIMD Extensions 2
SSE3 * Supports Streaming SIMD Extensions 3
SSSE3 * Supports Supplemental SIMD Extensions 3
SSE4a - Supports Streaming SIMDR Extensions 4a
SSE4.1 * Supports Streaming SIMD Extensions 4.1
SSE4.2 * Supports Streaming SIMD Extensions 4.2

AES * Supports AES extensions
AVX - Supports AVX intruction extensions
FMA - Supports FMA extensions using YMM state
MSR * Implements RDMSR/WRMSR instructions
MTRR * Supports Memory Type Range Registers
XSAVE * Supports XSAVE/XRSTOR instructions
OSXSAVE * Supports XSETBV/XGETBV instructions
RDRAND * Supports RDRAND instruction
RDSEED * Supports RDSEED instruction

CMOV * Supports CMOVcc instruction
CLFSH * Supports CLFLUSH instruction
CX8 * Supports compare and exchange 8-byte instructions
CX16 * Supports CMPXCHG16B instruction
BMI1 - Supports bit manipulation extensions 1
BMI2 - Supports bit manipulation extensions 2
ADX - Supports ADCX/ADOX instructions
DCA - Supports prefetch from memory-mapped device
F16C - Supports half-precision instruction
FXSR * Supports FXSAVE/FXSTOR instructions
FFXSR - Supports optimized FXSAVE/FSRSTOR instruction
MONITOR * Supports MONITOR and MWAIT instructions
MOVBE * Supports MOVBE instruction
ERMSB * Supports Enhanced REP MOVSB/STOSB
PCLMULDQ * Supports PCLMULDQ instruction
POPCNT * Supports POPCNT instruction
LZCNT - Supports LZCNT instruction
SEP * Supports fast system call instructions
LAHF-SAHF * Supports LAHF/SAHF instructions in 64-bit mode
HLE - Supports Hardware Lock Elision instructions
RTM - Supports Restricted Transactional Memory instructions

DE * Supports I/O breakpoints including CR4.DE
DTES64 * Can write history of 64-bit branch addresses
DS * Implements memory-resident debug buffer
DS-CPL * Supports Debug Store feature with CPL
PCID - Supports PCIDs and settable CR4.PCIDE
INVPCID - Supports INVPCID instruction
PDCM * Supports Performance Capabilities MSR
RDTSCP * Supports RDTSCP instruction
TSC * Supports RDTSC instruction
TSC-DEADLINE * Local APIC supports one-shot deadline timer
TSC-INVARIANT * TSC runs at constant rate
xTPR * Supports disabling task priority messages

EIST * Supports Enhanced Intel Speedstep
ACPI * Implements MSR for power management
TM * Implements thermal monitor circuitry
TM2 * Implements Thermal Monitor 2 control
APIC * Implements software-accessible local APIC
x2APIC * Supports x2APIC

CNXT-ID - L1 data cache mode adaptive or BIOS

MCE * Supports Machine Check, INT18 and CR4.MCE
MCA * Implements Machine Check Architecture
PBE * Supports use of FERR#/PBE# pin

PSN - Implements 96-bit processor serial number

PREFETCHW * Supports PREFETCHW instruction

Maximum implemented CPUID leaves: 00000015 (Basic), 80000008 (Extended).

Logical to Physical Processor Map:
*--- Physical Processor 0
-*-- Physical Processor 1
--*- Physical Processor 2
---* Physical Processor 3

Logical Processor to Socket Map:
**** Socket 0

Logical Processor to NUMA Node Map:
**** NUMA Node 0

No NUMA nodes.

Logical Processor to Cache Map:
*--- Data Cache 0, Level 1, 24 KB, Assoc 6, LineSize 64
*--- Instruction Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
**-- Unified Cache 0, Level 2, 1 MB, Assoc 16, LineSize 64
-*-- Data Cache 1, Level 1, 24 KB, Assoc 6, LineSize 64
-*-- Instruction Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
--*- Data Cache 2, Level 1, 24 KB, Assoc 6, LineSize 64
--*- Instruction Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
--** Unified Cache 1, Level 2, 1 MB, Assoc 16, LineSize 64
---* Data Cache 3, Level 1, 24 KB, Assoc 6, LineSize 64
---* Instruction Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64

Logical Processor to Group Map:
**** Group 0
[/code]


Cheers,
cbug

Prime95 2019-01-19 20:25

Thanks. I've updated the detection code. Note that the problem has little effect on prime95's behavior.

Your processor will be treated like an Atom processor in the next release.


The L2 cache reporting is not fixed, though prime95 does not use that information anyway. I believe your CPU has two 1MB L2 caches for a total of 2MB. The reported L1 cache size has a similar issue: your CPU has four 24KB L1 data caches.
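
A minimal sketch of the cache arithmetic described above, assuming the cache map lines are copied from the CoreInfo output earlier in this thread. The names `COREINFO_CACHE_MAP`, `LINE_RE`, and `total_cache_kb` are hypothetical helpers for illustration and are not part of prime95 or CoreInfo. It sums the per-instance sizes, which is how two 1MB L2 caches come out as 2MB in total.

```python
import re
from collections import defaultdict

# Cache map lines as printed by CoreInfo in the first post of this thread.
COREINFO_CACHE_MAP = """\
*--- Data Cache 0, Level 1, 24 KB, Assoc 6, LineSize 64
*--- Instruction Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
**-- Unified Cache 0, Level 2, 1 MB, Assoc 16, LineSize 64
-*-- Data Cache 1, Level 1, 24 KB, Assoc 6, LineSize 64
-*-- Instruction Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
--*- Data Cache 2, Level 1, 24 KB, Assoc 6, LineSize 64
--*- Instruction Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
--** Unified Cache 1, Level 2, 1 MB, Assoc 16, LineSize 64
---* Data Cache 3, Level 1, 24 KB, Assoc 6, LineSize 64
---* Instruction Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
"""

# One cache instance per line: processor mask, kind, index, level, size, unit.
LINE_RE = re.compile(
    r"^(?P<mask>[*-]+)\s+(?P<kind>Data|Instruction|Unified) Cache\s+\d+, "
    r"Level (?P<level>\d+), (?P<size>\d+) (?P<unit>KB|MB)"
)


def total_cache_kb(text):
    """Sum the size of every cache instance, grouped by (level, kind), in KB."""
    totals = defaultdict(int)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a cache line
        size_kb = int(match["size"]) * (1024 if match["unit"] == "MB" else 1)
        totals[(int(match["level"]), match["kind"])] += size_kb
    return dict(totals)


totals = total_cache_kb(COREINFO_CACHE_MAP)
for (level, kind), size_kb in sorted(totals.items()):
    print(f"L{level} {kind}: {size_kb} KB in total")
```

Reporting only a single instance (1MB, or 24KB) rather than the sum is the discrepancy the original post points out.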

ET_ 2019-01-20 10:41

[QUOTE=Prime95;506435]Your processor will be treated like an Atom processor in the next release.[/QUOTE]

I didn't know that Atom processors actually had AVX and FMA support :redface::cry:

cbug 2019-01-20 11:46

Based on CoreInfo, the Pentium N4200 has no AVX/FMA support.


I believe the main differences from the Atoms are AES-NI and virtualization support.


But I have another CPU, and I do not know whether it is detected correctly.


It is an AMD A8-7600, which is detected as:
[code][Main thread Jan 20 12:37] Optimizing for CPU architecture: AMD Bulldozer, L2 cache size: 2 MB[/code]
AMD Kaveri has AVX and FMA4. However, while my Xeon E5-2450L CPUs show "using AVX FFT" and my Xeon D-1541 shows "using FMA3 FFT" while calculating, the AMD CPU shows "using FFT" without any mention of AVX or FMA4.



[code]
~# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 21
Model: 48
Model name: AMD A8-7600 Radeon R7, 10 Compute Cores 4C+6G
Stepping: 1
CPU MHz: 3397.693
CPU max MHz: 3100.0000
CPU min MHz: 1400.0000
BogoMIPS: 6188.70
Virtualization: AMD-V
L1d cache: 16K
L1i cache: 96K
L2 cache: 2048K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb hw_pstate ssbd vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov
[/code]

ET_ 2019-01-20 12:12

[QUOTE=cbug;506469]Based on CoreInfo, the Pentium N4200 has no AVX/FMA support.[/QUOTE]


As always, I didn't pay attention to the symbols "*" and "-", sorry.

Mark Rose 2019-01-20 16:52

[QUOTE=cbug;506469]Based on CoreInfo, the Pentium N4200 has no AVX/FMA support.


I believe the main differences from the Atoms are AES-NI and virtualization support.[/QUOTE]

Depending on which Atom. The modern ones all support both AES-NI and VT-x. It's the lack of AVX that differentiates them.

Prime95 2019-01-20 17:40

[QUOTE=cbug;506469]
It is an AMD A8-7600, which is detected as:
[code][Main thread Jan 20 12:37] Optimizing for CPU architecture: AMD Bulldozer, L2 cache size: 2 MB[/code]
AMD Kaveri has AVX and FMA4. However, while my Xeon E5-2450L CPUs show "using AVX FFT" and my Xeon D-1541 shows "using FMA3 FFT" while calculating, the AMD CPU shows "using FFT" without any mention of AVX or FMA4.[/QUOTE]

Bulldozer's implementation of AVX was so bad that SSE2 FFTs were faster than AVX FFTs.

You can test this on your CPU by adding "CPUArchitecture=5" to local.txt.

cbug 2019-01-21 01:01

Oh, thanks for your answer.


I just tested it, running 1 worker with 1 thread.


[code][Main thread Jan 21 01:08] Mersenne number primality test program version 29.4
[Main thread Jan 21 01:08] Optimizing for CPU architecture: AMD Bulldozer, L2 cache size: 2 MB
[Work thread Jan 21 01:08] Setting affinity to run worker on CPU core #2
[Work thread Jan 21 01:08] Running Jacobi error check. Passed. Time: 27.771 sec.
[Work thread Jan 21 01:08] Resuming primality test of M51631579 using FFT length 2688K, Pass1=896, Pass2=3K, clm=1
[Work thread Jan 21 01:08] Iteration: 29518675 / 51631579 [57.17%].
[Work thread Jan 21 01:09] Iteration: 29520000 / 51631579 [57.17%], ms/iter: 36.019, ETA: 9d 05:13
[Work thread Jan 21 01:15] Iteration: 29530000 / 51631579 [57.19%], ms/iter: 36.063, ETA: 9d 05:24

[/code]


[code]
[Main thread Jan 21 01:16] Mersenne number primality test program version 29.4
[Main thread Jan 21 01:16] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 2 MB
[Work thread Jan 21 01:16] Worker starting
[Work thread Jan 21 01:16] Setting affinity to run worker on CPU core #2
[Work thread Jan 21 01:16] Running Jacobi error check. Passed. Time: 27.966 sec.
[Work thread Jan 21 01:17] Resuming primality test of M51631579 using AVX FFT length 2688K, Pass1=896, Pass2=3K, clm=1
[Work thread Jan 21 01:17] Iteration: 29531022 / 51631579 [57.19%].
[Work thread Jan 21 01:22] Iteration: 29540000 / 51631579 [57.21%], ms/iter: 32.380, ETA: 8d 06:42
[Work thread Jan 21 01:28] Iteration: 29550000 / 51631579 [57.23%], ms/iter: 32.314, ETA: 8d 06:12
[/code]



After those two runs I tried to enable FMA4 using the undoc.txt parameter CpuSupportsFMA4=1.

But that does not seem to work. It still says AVX FFT.
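
A rough sketch for comparing the two runs above, assuming the timing lines are copied verbatim from the logs in this post. The helper names (`ITER_RE`, `mean_ms_per_iter`, `eta_days`) are made up for illustration. The first run is labelled SSE2 only because of the earlier reply about Bulldozer; the log itself just says "FFT".

```python
import re

# Timing lines from the default run ("AMD Bulldozer"; plain "FFT" in the log).
SSE2_LOG = """\
[Work thread Jan 21 01:09] Iteration: 29520000 / 51631579 [57.17%], ms/iter: 36.019, ETA: 9d 05:13
[Work thread Jan 21 01:15] Iteration: 29530000 / 51631579 [57.19%], ms/iter: 36.063, ETA: 9d 05:24
"""

# Timing lines from the run with CPUArchitecture=5 ("AVX FFT" in the log).
AVX_LOG = """\
[Work thread Jan 21 01:22] Iteration: 29540000 / 51631579 [57.21%], ms/iter: 32.380, ETA: 8d 06:42
[Work thread Jan 21 01:28] Iteration: 29550000 / 51631579 [57.23%], ms/iter: 32.314, ETA: 8d 06:12
"""

# Captures: iterations done, total iterations, milliseconds per iteration.
ITER_RE = re.compile(r"Iteration: (\d+) / (\d+) \[[\d.]+%\], ms/iter: ([\d.]+)")


def mean_ms_per_iter(log):
    """Average the ms/iter values of all timing lines in a log excerpt."""
    values = [float(m.group(3)) for m in ITER_RE.finditer(log)]
    if not values:
        raise ValueError("no timing lines found")
    return sum(values) / len(values)


def eta_days(log):
    """Remaining time in days, estimated from the last timing line."""
    done, total, ms = list(ITER_RE.finditer(log))[-1].groups()
    remaining_iterations = int(total) - int(done)
    return remaining_iterations * float(ms) / 1000.0 / 86400.0


sse2 = mean_ms_per_iter(SSE2_LOG)
avx = mean_ms_per_iter(AVX_LOG)
speedup = sse2 / avx

print(f"default run: {sse2:.3f} ms/iter, AVX run: {avx:.3f} ms/iter")
print(f"AVX FFT is faster by a factor of {speedup:.3f}")
print(f"estimated time left, default run: {eta_days(SSE2_LOG):.2f} days")
print(f"estimated time left, AVX run: {eta_days(AVX_LOG):.2f} days")
```

With only two samples per run this is an indication rather than a benchmark, but on these numbers the AVX FFT comes out roughly 11% faster on this particular CPU.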

nomead 2019-01-21 02:11

Even though the Bulldozer family was bad, it did improve over time, just not enough to catch up with the competition. But of course the main feature affecting LL use (one FPU per "cluster" of two integer cores) didn't change along the way.

Piledriver (2nd gen) had, among other things, improvements to FPU/integer scheduling. Also at this point, the cores got FMA3 support in addition to FMA4.

Steamroller (3rd gen, what your A8-7600 actually is) got further integer IPC gains, but really no FP changes.

Excavator (4th gen) got AVX2 instruction support, but the underlying FP performance likely didn't change much.

Prime95 2019-01-21 03:01

[QUOTE=cbug;506525]
After those two runs I tried to enable FMA4 using the undoc.txt parameter CpuSupportsFMA4=1.

But that does not seem to work. It still says AVX FFT.[/QUOTE]

Only CpuSupportsFMA3 will affect FFT selection.

cbug 2019-01-21 10:29

Haha, I believe the documentation of AMD CPUs is really bad.
[URL="https://forums.anandtech.com/threads/when-is-fma3-better-than-fma4.2282248/"]Here[/URL] someone says that AMD got FMA3 as part of its SSE5 instruction set. Kaveri's CPU flags list neither SSE5 nor FMA3 by name.



I set CpuSupportsFMA3=1 anyway and it seems to be working. It may not be "the great catch", but it seems to be faster than before.


1 worker, 1 thread
[code]
[Work thread Jan 21 11:18] Worker starting
[Work thread Jan 21 11:18] Setting affinity to run worker on CPU core #2
[Work thread Jan 21 11:18] Running Jacobi error check. Passed. Time: 28.326 sec.
[Work thread Jan 21 11:18] Resuming primality test of M51631579 using FMA3 FFT length 2688K, Pass1=896, Pass2=3K, clm=1
[Work thread Jan 21 11:18] Iteration: 31642369 / 51631579 [61.28%].
[Work thread Jan 21 11:22] Iteration: 31650000 / 51631579 [61.29%], ms/iter: 28.933, ETA: 6d 16:35
[Work thread Jan 21 11:27] Iteration: 31660000 / 51631579 [61.31%], ms/iter: 28.899, ETA: 6d 16:19
[/code]
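
A small sketch for reading the flags, assuming an excerpt of the `Flags:` line from the lscpu output earlier in the thread. `FLAGS_EXCERPT`, `FEATURES`, and `detect_features` are hypothetical names. On Linux the plain `fma` flag stands for three-operand FMA (FMA3), while `fma4` is listed separately, so a flag literally named "fma3" will never appear.

```python
# Excerpt of the "Flags:" line from the lscpu output of the A8-7600 above.
FLAGS_EXCERPT = (
    "pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c "
    "lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch "
    "osvw ibs xop skinit wdt lwp fma4 tce"
)

# Feature name as used in this thread -> flag name as printed by Linux.
FEATURES = {
    "AVX": "avx",
    "AVX2": "avx2",
    "FMA3": "fma",   # Linux reports FMA3 simply as "fma"
    "FMA4": "fma4",
}


def detect_features(flags):
    """Return which of the FEATURES are present in a space-separated flag string."""
    present = set(flags.split())
    return {name: flag in present for name, flag in FEATURES.items()}


features = detect_features(FLAGS_EXCERPT)
for name, supported in features.items():
    print(f"{name}: {'yes' if supported else 'no'}")
```

Read this way, the flags suggest the A8-7600 does report FMA3 alongside FMA4, which fits the note above that FMA3 arrived with Piledriver and would explain why CpuSupportsFMA3=1 produces a working FMA3 FFT.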

