unknown Intel
Hi,
I am using prime95 29.4b8 on an Intel Pentium N4200 Mediacenter. Since prime95 reports unknown Intel with 1MB L2 cache size instead of 2MB, I thought I should report details for this CPU using [URL="https://docs.microsoft.com/en-us/sysinternals/downloads/coreinfo"]CoreInfo[/URL].
[code]
Intel(R) Pentium(R) CPU N4200 @ 1.10GHz
Intel64 Family 6 Model 92 Stepping 9, GenuineIntel
Microcode signature: 0000001C
HTT * Hyperthreading enabled
HYPERVISOR - Hypervisor is present
VMX * Supports Intel hardware-assisted virtualization
SVM - Supports AMD hardware-assisted virtualization
X64 * Supports 64-bit mode
SMX - Supports Intel trusted execution
SKINIT - Supports AMD SKINIT
NX * Supports no-execute page protection
SMEP * Supports Supervisor Mode Execution Prevention
SMAP * Supports Supervisor Mode Access Prevention
PAGE1GB * Supports 1 GB large pages
PAE * Supports > 32-bit physical addresses
PAT * Supports Page Attribute Table
PSE * Supports 4 MB pages
PSE36 * Supports > 32-bit address 4 MB pages
PGE * Supports global bit in page tables
SS * Supports bus snooping for cache operations
VME * Supports Virtual-8086 mode
RDWRFSGSBASE * Supports direct GS/FS base access
FPU * Implements i387 floating point instructions
MMX * Supports MMX instruction set
MMXEXT - Implements AMD MMX extensions
3DNOW - Supports 3DNow! instructions
3DNOWEXT - Supports 3DNow! extension instructions
SSE * Supports Streaming SIMD Extensions
SSE2 * Supports Streaming SIMD Extensions 2
SSE3 * Supports Streaming SIMD Extensions 3
SSSE3 * Supports Supplemental SIMD Extensions 3
SSE4a - Supports Streaming SIMDR Extensions 4a
SSE4.1 * Supports Streaming SIMD Extensions 4.1
SSE4.2 * Supports Streaming SIMD Extensions 4.2
AES * Supports AES extensions
AVX - Supports AVX intruction extensions
FMA - Supports FMA extensions using YMM state
MSR * Implements RDMSR/WRMSR instructions
MTRR * Supports Memory Type Range Registers
XSAVE * Supports XSAVE/XRSTOR instructions
OSXSAVE * Supports XSETBV/XGETBV instructions
RDRAND * Supports RDRAND instruction
RDSEED * Supports RDSEED instruction
CMOV * Supports CMOVcc instruction
CLFSH * Supports CLFLUSH instruction
CX8 * Supports compare and exchange 8-byte instructions
CX16 * Supports CMPXCHG16B instruction
BMI1 - Supports bit manipulation extensions 1
BMI2 - Supports bit manipulation extensions 2
ADX - Supports ADCX/ADOX instructions
DCA - Supports prefetch from memory-mapped device
F16C - Supports half-precision instruction
FXSR * Supports FXSAVE/FXSTOR instructions
FFXSR - Supports optimized FXSAVE/FSRSTOR instruction
MONITOR * Supports MONITOR and MWAIT instructions
MOVBE * Supports MOVBE instruction
ERMSB * Supports Enhanced REP MOVSB/STOSB
PCLMULDQ * Supports PCLMULDQ instruction
POPCNT * Supports POPCNT instruction
LZCNT - Supports LZCNT instruction
SEP * Supports fast system call instructions
LAHF-SAHF * Supports LAHF/SAHF instructions in 64-bit mode
HLE - Supports Hardware Lock Elision instructions
RTM - Supports Restricted Transactional Memory instructions
DE * Supports I/O breakpoints including CR4.DE
DTES64 * Can write history of 64-bit branch addresses
DS * Implements memory-resident debug buffer
DS-CPL * Supports Debug Store feature with CPL
PCID - Supports PCIDs and settable CR4.PCIDE
INVPCID - Supports INVPCID instruction
PDCM * Supports Performance Capabilities MSR
RDTSCP * Supports RDTSCP instruction
TSC * Supports RDTSC instruction
TSC-DEADLINE * Local APIC supports one-shot deadline timer
TSC-INVARIANT * TSC runs at constant rate
xTPR * Supports disabling task priority messages
EIST * Supports Enhanced Intel Speedstep
ACPI * Implements MSR for power management
TM * Implements thermal monitor circuitry
TM2 * Implements Thermal Monitor 2 control
APIC * Implements software-accessible local APIC
x2APIC * Supports x2APIC
CNXT-ID - L1 data cache mode adaptive or BIOS
MCE * Supports Machine Check, INT18 and CR4.MCE
MCA * Implements Machine Check Architecture
PBE * Supports use of FERR#/PBE# pin
PSN - Implements 96-bit processor serial number
PREFETCHW * Supports PREFETCHW instruction

Maximum implemented CPUID leaves: 00000015 (Basic), 80000008 (Extended).

Logical to Physical Processor Map:
*--- Physical Processor 0
-*-- Physical Processor 1
--*- Physical Processor 2
---* Physical Processor 3

Logical Processor to Socket Map:
**** Socket 0

Logical Processor to NUMA Node Map:
**** NUMA Node 0

No NUMA nodes.

Logical Processor to Cache Map:
*--- Data Cache 0, Level 1, 24 KB, Assoc 6, LineSize 64
*--- Instruction Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
**-- Unified Cache 0, Level 2, 1 MB, Assoc 16, LineSize 64
-*-- Data Cache 1, Level 1, 24 KB, Assoc 6, LineSize 64
-*-- Instruction Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
--*- Data Cache 2, Level 1, 24 KB, Assoc 6, LineSize 64
--*- Instruction Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
--** Unified Cache 1, Level 2, 1 MB, Assoc 16, LineSize 64
---* Data Cache 3, Level 1, 24 KB, Assoc 6, LineSize 64
---* Instruction Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64

Logical Processor to Group Map:
**** Group 0
[/code]
Cheers, cbug |
Thanks. I've updated the detection code. Note that the problem has little effect on prime95's behavior.
Your processor will be treated like an Atom processor in the next release. The L2 cache reporting is not fixed, not that prime95 uses that information anyway. I believe your CPU has two 1MB L2 caches for a total of 2MB. The L1 cache size reported has a similar issue: your CPU has four 24KB L1 data caches. |
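To make the cache arithmetic concrete, here is a minimal sketch that tallies the cache entries from the CoreInfo output in the first post. The function name `cache_totals` and the regular expression are illustrative assumptions, not anything prime95 or CoreInfo provides; the input lines are copied from the post.

```python
import re

# Cache map lines copied from the CoreInfo output in the first post.
COREINFO_CACHE_MAP = """\
*--- Data Cache 0, Level 1, 24 KB, Assoc 6, LineSize 64
*--- Instruction Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
**-- Unified Cache 0, Level 2, 1 MB, Assoc 16, LineSize 64
-*-- Data Cache 1, Level 1, 24 KB, Assoc 6, LineSize 64
-*-- Instruction Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
--*- Data Cache 2, Level 1, 24 KB, Assoc 6, LineSize 64
--*- Instruction Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
--** Unified Cache 1, Level 2, 1 MB, Assoc 16, LineSize 64
---* Data Cache 3, Level 1, 24 KB, Assoc 6, LineSize 64
---* Instruction Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
"""

UNIT_KB = {"KB": 1, "MB": 1024}

# Hypothetical pattern for one cache entry: kind, level, size, unit.
CACHE_RE = re.compile(
    r"(Data|Instruction|Unified) Cache\s+\d+, Level\s+(\d+),\s+(\d+) (KB|MB)"
)


def cache_totals(text):
    """Return {(level, kind): (count, total_kb)} summed over all entries."""
    totals = {}
    for kind, level, size, unit in CACHE_RE.findall(text):
        key = (int(level), kind)
        count, total_kb = totals.get(key, (0, 0))
        totals[key] = (count + 1, total_kb + int(size) * UNIT_KB[unit])
    return totals


totals = cache_totals(COREINFO_CACHE_MAP)
for (level, kind), (count, total_kb) in sorted(totals.items()):
    print(f"L{level} {kind}: {count} caches, {total_kb} KB in total")
```

With the lines above this counts two unified L2 caches (2048 KB in total) and four L1 data caches (96 KB in total), which matches the description in the reply.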
[QUOTE=Prime95;506435]Thanks. I've updated the detection code. Note that the problem has little effect on prime95's behavior.
Your processor will be treated like an Atom processor in the next release. The L2 cache reporting is not fixed, not that prime95 uses that information anyway. I believe your CPU has two 1MB L2 caches for a total of 2MB. The L1 cache size reported has a similar issue: your CPU has four 24KB L1 data caches.[/QUOTE] I didn't know that Atom processors actually had AVX and FMA support :redface::cry: |
Based on CoreInfo, Pentium N4200 has no AVX/FMA support.
The main difference from the Atoms is the AES-NI and virtualization support, I believe. But I have another CPU and I do not know whether it is detected correctly: an AMD A8-7600, which is detected as:
[code][Main thread Jan 20 12:37] Optimizing for CPU architecture: AMD Bulldozer, L2 cache size: 2 MB[/code]
AMD Kaveri has AVX and FMA4, but while my Xeon E5-2450L CPUs show "using AVX FFT" and my Xeon D-1541 shows "using FMA3 FFT" while calculating, the AMD CPU shows "using FFT" without any hint of AVX or FMA4.
[code]
~# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 21
Model: 48
Model name: AMD A8-7600 Radeon R7, 10 Compute Cores 4C+6G
Stepping: 1
CPU MHz: 3397.693
CPU max MHz: 3100.0000
CPU min MHz: 1400.0000
BogoMIPS: 6188.70
Virtualization: AMD-V
L1d cache: 16K
L1i cache: 96K
L2 cache: 2048K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb hw_pstate ssbd vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov
[/code] |
[QUOTE=cbug;506469]Based on CoreInfo, Pentium N4200 has no AVX/FMA support.
The main difference to the Atoms is the AES-NI and the Virtualizing support I believe. But I got another CPU which I do not know if its detected correctly. Got an AMD A8-7600, which is detected as: [code][Main thread Jan 20 12:37] Optimizing for CPU architecture: AMD Bulldozer, L2 cache size: 2 MB[/code] AMD Kaveri got AVX and FMA4, but while my Xeon E5-2450L CPUs shows "using AVX FFT" and my Xeon D-1541 shows "using FMA3 FFT" while calculating, the AMD CPU shows "using FFT" without any hints to AVX or FMA4. [code] ~# lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 2 Core(s) per socket: 2 Socket(s): 1 NUMA node(s): 1 Vendor ID: AuthenticAMD CPU family: 21 Model: 48 Model name: AMD A8-7600 Radeon R7, 10 Compute Cores 4C+6G Stepping: 1 CPU MHz: 3397.693 CPU max MHz: 3100.0000 CPU min MHz: 1400.0000 BogoMIPS: 6188.70 Virtualization: AMD-V L1d cache: 16K L1i cache: 96K L2 cache: 2048K NUMA node0 CPU(s): 0-3 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb hw_pstate ssbd vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov [/code][/QUOTE] As always, I didn't pay attention to the symbols "*" and "-", sorry. |
[QUOTE=cbug;506469]Based on CoreInfo, Pentium N4200 has no AVX/FMA support.
The main difference to the Atoms is the AES-NI and the Virtualizing support I believe. [/Quote] Depending on which Atom. The modern ones all support both AES-NI and VT-x. It's the lack of AVX that differentiates them. |
[QUOTE=cbug;506469]
Got an AMD A8-7600, which is detected as: [code][Main thread Jan 20 12:37] Optimizing for CPU architecture: AMD Bulldozer, L2 cache size: 2 MB[/code] AMD Kaveri got AVX and FMA4, but while my Xeon E5-2450L CPUs shows "using AVX FFT" and my Xeon D-1541 shows "using FMA3 FFT" while calculating, the AMD CPU shows "using FFT" without any hints to AVX or FMA4.[/QUOTE] Bulldozer's implementation of AVX was so bad that SSE2 FFTs were faster than AVX FFTs. You can test this on your CPU by adding "CPUArchitecture=5" to local.txt |
Oh thanks for your answer.
I just tested it. Running 1 worker, 1 thread.
[code]
[Main thread Jan 21 01:08] Mersenne number primality test program version 29.4
[Main thread Jan 21 01:08] Optimizing for CPU architecture: AMD Bulldozer, L2 cache size: 2 MB
[Work thread Jan 21 01:08] Setting affinity to run worker on CPU core #2
[Work thread Jan 21 01:08] Running Jacobi error check. Passed. Time: 27.771 sec.
[Work thread Jan 21 01:08] Resuming primality test of M51631579 using FFT length 2688K, Pass1=896, Pass2=3K, clm=1
[Work thread Jan 21 01:08] Iteration: 29518675 / 51631579 [57.17%].
[Work thread Jan 21 01:09] Iteration: 29520000 / 51631579 [57.17%], ms/iter: 36.019, ETA: 9d 05:13
[Work thread Jan 21 01:15] Iteration: 29530000 / 51631579 [57.19%], ms/iter: 36.063, ETA: 9d 05:24
[/code]
[code]
[Main thread Jan 21 01:16] Mersenne number primality test program version 29.4
[Main thread Jan 21 01:16] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 2 MB
[Work thread Jan 21 01:16] Worker starting
[Work thread Jan 21 01:16] Setting affinity to run worker on CPU core #2
[Work thread Jan 21 01:16] Running Jacobi error check. Passed. Time: 27.966 sec.
[Work thread Jan 21 01:17] Resuming primality test of M51631579 using AVX FFT length 2688K, Pass1=896, Pass2=3K, clm=1
[Work thread Jan 21 01:17] Iteration: 29531022 / 51631579 [57.19%].
[Work thread Jan 21 01:22] Iteration: 29540000 / 51631579 [57.21%], ms/iter: 32.380, ETA: 8d 06:42
[Work thread Jan 21 01:28] Iteration: 29550000 / 51631579 [57.23%], ms/iter: 32.314, ETA: 8d 06:12
[/code]
After those 2 runs I tried enabling FMA4 with the undoc.txt parameter CpuSupportsFMA4=1, but that does not seem to work. It still says AVX FFT. |
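As a sanity check on the logs above, the ETA values can be reproduced with simple arithmetic: remaining iterations times ms/iter. The sketch below assumes the ETA is truncated to whole minutes; that is a guess that happens to match the four logged values, not a statement about how prime95 computes it internally. The function names are illustrative.

```python
TOTAL_ITERS = 51631579  # exponent of M51631579, as shown in the log


def eta_seconds(current_iter, ms_per_iter, total_iters=TOTAL_ITERS):
    """Remaining time in seconds at a constant ms/iter rate."""
    return (total_iters - current_iter) * ms_per_iter / 1000.0


def format_eta(seconds):
    """Format as 'Nd HH:MM', truncating to whole minutes (assumption)."""
    total_minutes = int(seconds // 60)
    days, rem = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rem, 60)
    return f"{days}d {hours:02d}:{minutes:02d}"


# (iteration, ms/iter) pairs copied from the two runs in the post.
samples = [
    (29520000, 36.019),  # default Bulldozer setting
    (29530000, 36.063),
    (29540000, 32.380),  # CPUArchitecture=5, AVX FFT
    (29550000, 32.314),
]

for iteration, ms in samples:
    print(iteration, ms, format_eta(eta_seconds(iteration, ms)))
```

The first sample works out to 22111579 remaining iterations at 36.019 ms each, about 796437 seconds, which is the "9d 05:13" shown in the log.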
Even though the Bulldozer family was bad, they did improve over time, just not enough to catch the competition. But of course the main feature affecting LL use (one FPU per "cluster" of two integer cores) didn't change along the way.
Piledriver (2nd gen) had, among other things, improvements to FPU/integer scheduling. Also at this point, the cores got FMA3 support in addition to FMA4. Steamroller (3rd gen, what your A8-7600 actually is) got further integer IPC gains, but really no FP changes. Excavator (4th gen) got AVX2 instruction support, but the underlying FP performance likely didn't change much. |
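The generation summary above can be checked against the lscpu flags posted earlier in the thread. A small sketch follows; the flag string is copied from that post, and the helper name `report` is made up for illustration. One detail to keep in mind: Linux lists FMA3 under the flag name `fma`, while FMA4 has its own `fma4` flag.

```python
# Flags line copied from the lscpu output for the AMD A8-7600 earlier in the thread.
LSCPU_FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm "
    "constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni "
    "pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c "
    "lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse "
    "3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext "
    "perfctr_core perfctr_nb bpext ptsc cpb hw_pstate ssbd vmmcall fsgsbase "
    "bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean "
    "flushbyasid decodeassists pausefilter pfthreshold overflow_recov"
)

flags = set(LSCPU_FLAGS.split())

# Linux flag name -> what it means for this discussion.
INTERESTING = {
    "avx": "AVX",
    "avx2": "AVX2",
    "fma": "FMA3 (Linux calls it 'fma')",
    "fma4": "FMA4",
    "xop": "XOP",
}


def report(flag_set):
    """Map each flag of interest to True/False depending on presence."""
    return {name: name in flag_set for name in INTERESTING}


for name, present in report(flags).items():
    print(f"{INTERESTING[name]:30s} {'yes' if present else 'no'}")
```

On this flag list, `avx`, `fma`, `fma4` and `xop` are present and `avx2` is absent, which fits a Steamroller part as described above.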
[QUOTE=cbug;506525]
After those 2 runs I tried using FMA4 using undoc.txt parameter CpuSupportsFMA4=1. But that does not seem to work. It still says AVX FFT.[/QUOTE] Only CpuSupportsFMA3 will affect FFT selection. |
Haha, I believe the documentation of AMD CPUs is really bad.
[URL="https://forums.anandtech.com/threads/when-is-fma3-better-than-fma4.2282248/"]Here[/URL] someone says that AMD got FMA3 in its SSE5 instruction set. Kaveri does not have SSE5, nor FMA3 in the CPU flags. I set CpuSupportsFMA3=1 anyway and it seems to be working. It may not be "the great catch", but it seems to be faster than before. 1 worker, 1 thread
[code]
[Work thread Jan 21 11:18] Worker starting
[Work thread Jan 21 11:18] Setting affinity to run worker on CPU core #2
[Work thread Jan 21 11:18] Running Jacobi error check. Passed. Time: 28.326 sec.
[Work thread Jan 21 11:18] Resuming primality test of M51631579 using FMA3 FFT length 2688K, Pass1=896, Pass2=3K, clm=1
[Work thread Jan 21 11:18] Iteration: 31642369 / 51631579 [61.28%].
[Work thread Jan 21 11:22] Iteration: 31650000 / 51631579 [61.29%], ms/iter: 28.933, ETA: 6d 16:35
[Work thread Jan 21 11:27] Iteration: 31660000 / 51631579 [61.31%], ms/iter: 28.899, ETA: 6d 16:19
[/code] |
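To put the three runs in this thread side by side, here is a rough comparison of the ms/iter figures from the logs. It is only a sketch: each setting has just two samples from a single worker, so the numbers are indicative at best, and the dictionary labels are my own shorthand for the settings used.

```python
from statistics import mean

# ms/iter samples copied from the three logs in this thread (same exponent,
# same 2688K FFT length, 1 worker, 1 thread).
MS_PER_ITER = {
    "default (plain FFT)": [36.019, 36.063],
    "CPUArchitecture=5 (AVX FFT)": [32.380, 32.314],
    "CpuSupportsFMA3=1 (FMA3 FFT)": [28.933, 28.899],
}

BASELINE = "default (plain FFT)"


def speedups(samples, baseline):
    """Return {label: baseline_mean / label_mean}; >1 means faster than baseline."""
    base = mean(samples[baseline])
    return {label: base / mean(values) for label, values in samples.items()}


result = speedups(MS_PER_ITER, BASELINE)
for label, factor in result.items():
    avg = mean(MS_PER_ITER[label])
    saved = (1 - 1 / factor) * 100  # percent less time per iteration
    print(f"{label:30s} {avg:7.3f} ms/iter  x{factor:.3f}  ({saved:.1f}% less time)")
```

On these samples the AVX FFT needs roughly 10% less time per iteration than the default, and the FMA3 FFT roughly 20% less.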