|
|
#1 |
|
"Ed Hall"
Dec 2009
Adirondack Mtns
11×347 Posts |
Recently some members helped me solve my compilation trouble with the ggnfs package. Thank you to all who helped.
But now I have some new questions. Empirical data from my machines appears to show that lower L1_BITS values run faster than higher ones, although I haven't done really extensive research. I'm hoping experienced users already know the answers so I don't have to research heavily. To the point: on more than one machine, changing L1_BITS from 14 to 15 to 16 increased the time per relation, with a difference as large as 0.04107 secs/rel (14 bits) vs. 0.05295 secs/rel (16 bits). I have not studied whether the number of relations is affected in either direction. Is it possible that I just happen to be working with a range of composite sizes that is better handled by a smaller L1_BITS value, or is this due to my ancient hardware, or am I looking at something wrong? Thanks... |
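For scale, the two timings quoted above can be turned into a relative slowdown. This is only a small arithmetic sketch; the dictionary name is made up for illustration, and the 15-bit figure is left out because the post does not give it.

```python
# Timings quoted in the post, in seconds per relation.
# (No figure for 15 bits was given, so it is omitted here.)
secs_per_rel = {14: 0.04107, 16: 0.05295}

baseline = secs_per_rel[14]
for bits, t in sorted(secs_per_rel.items()):
    # Fractional extra time per relation compared with the 14-bit build.
    slowdown = t / baseline - 1.0
    print(f"L1_BITS={bits}: {t:.5f} s/rel, {slowdown:+.1%} vs 14 bits")
```

On these numbers the 16-bit build needs a bit under 30% more time per relation than the 14-bit build.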
|
|
|
|
|
#2 |
|
"Curtis"
Feb 2005
Riverside, CA
4,861 Posts |
My very elementary understanding is that this setting refers to the core's L1 cache size. Intel chips of all but the oldest vintage work best at 15 bits (that's 32k, right?), while some AMD chips work best at 14, though I don't recall which specific generation.
I assume this refers to data cache size, but I'm no programmer so that's just a guess. I have, at times, wondered if some machine workloads involving heavy thread loads might benefit from 14 for this setting, even if the architecture is compatible with 15 in theory. I'm pretty sure the software should find the same relations regardless of this setting, but I'd like confirmation of that. |
|
|
|
|
|
#3 |
|
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
3⁶×13 Posts |
L1_bits should be set such that the L1 cache size (of the CPU where you will run) = 2^L1_bits. That seems to be the paradigm of the code.
8 KB -> L1_bits=13
16 KB -> L1_bits=14 (old AMD chips and old Pentiums)
32 KB -> L1_bits=15 (most Intel chips)
64 KB -> L1_bits=16 (Phenoms, for example)
L1_bits=16 really makes for slightly faster sievers for the Phenoms, but the same binary runs slower (than that with L1_bits=15) on Xeons. There were some exceptions when I tried various binaries (quite a few years ago).
P.S. While I was typing, this already became a cross-post, but I'll just leave it here. |
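The rule of thumb in this post (L1 cache size = 2^L1_bits, with the size taken in bytes) can be written as a tiny helper. This is a sketch only: `l1_bits_for_cache` is a hypothetical name, not something from the ggnfs sources, and it gives the nominal value, which, as noted above, is not always the fastest one in practice.

```python
def l1_bits_for_cache(size_bytes: int) -> int:
    """Return b such that 2**b == size_bytes (hypothetical helper)."""
    # Reject anything that is not a positive power of two.
    if size_bytes <= 0 or size_bytes & (size_bytes - 1):
        raise ValueError("cache size must be a positive power of two")
    return size_bytes.bit_length() - 1

# Reproduce the size-to-bits mapping listed in the post.
for kib in (8, 16, 32, 64):
    print(f"{kib} KB -> L1_bits={l1_bits_for_cache(kib * 1024)}")
```

The helper refuses sizes that are not exact powers of two rather than rounding, since rounding either way would be a tuning decision best settled by timing the sievers.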
|
|
|
|
|
#4 |
|
"Ed Hall"
Dec 2009
Adirondack Mtns
7351₈ Posts |
Thanks! I'll study this a bit more, after I find out why my factmsieve.py mysteriously quit working this morning. I can't even run it manually. But, that is for a different thread, even though the future use of the L1_BITS value depends on it working.
|
|
|
|
|
|
#5 |
|
"Ed Hall"
Dec 2009
Adirondack Mtns
11×347 Posts |
The machine that I currently have running with L1_BITS=14 lists the following via lshw:
Code:
configuration: cores=2 enabledcores=2 threads=2
*-cache:0
description: L1 cache
physical id: 700
size: 32KiB
capacity: 32KiB
capabilities: internal write-back data
|
|
|
|
|
|
#6 |
|
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
16F8₁₆ Posts |
Per core. What is the CPU? I assume modern Intel?
It does raise a good point though. Hyperthreading gives a large speed improvement in sieving. We would possibly get an even better speed with L1_bits one lower with hyperthreading, as the L1 cache would be shared between the threads. It would be nice to find out why hyperthreading helps and fix the slowdown at some point.
Last fiddled with by henryzz on 2016-12-05 at 22:17 |
|
|
|
|
|
#7 | |
|
"Ed Hall"
Dec 2009
Adirondack Mtns
3817₁₀ Posts |
Quote:
Code:
*-cpu
description: CPU
product: Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz
vendor: Intel Corp.
physical id: 400
bus info: cpu@0
slot: Microprocessor
size: 1866MHz
width: 64 bits
clock: 1066MHz
capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx constant_tsc arch_perfmon pebs bts nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dtherm tpr_shadow
configuration: cores=2 enabledcores=2 threads=2
*-cache:0
description: L1 cache
physical id: 700
size: 32KiB
capacity: 32KiB
capabilities: internal write-back data
*-cache:1
description: L2 cache
physical id: 701
size: 2MiB
capacity: 2MiB
capabilities: internal varies unified
Code:
*-cpu
description: CPU
product: Intel(R) Core(TM)2 Duo CPU U7600 @ 1.20GHz
vendor: Intel Corp.
physical id: 4
bus info: cpu@0
version: Intel(R) Core(TM)2 Duo CPU U7600 @ 1.20GHz
slot: U10
size: 1200MHz
capacity: 1200MHz
width: 64 bits
clock: 133MHz
capabilities: fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx x86-64 constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dtherm tpr_shadow vnmi flexpriority cpufreq
*-cache:0
description: L1 cache
physical id: 5
slot: Internal L1 Cache
size: 64KiB
capacity: 64KiB
capabilities: burst internal write-back unified
*-cache:1
description: L2 cache
physical id: 6
slot: Internal L2 Cache
size: 2MiB
capacity: 2MiB
capabilities: burst external write-back unified
|
|
|
|
|
|
|
#8 |
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts |
Check the processor specifications online. Sometimes the L1 cache might really be two separate stores, one for data and one for instructions. Only the former is usable for user program data (as the name suggests).
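On Linux the data/instruction split can also be checked locally. The sketch below assumes the usual sysfs layout (`/sys/devices/system/cpu/cpu0/cache/index*/` with `level`, `type` and `size` files); the function names are made up for illustration, and the derived L1_BITS is only the nominal power-of-two value, not a tuned one.

```python
from pathlib import Path

def parse_size(text: str) -> int:
    """Convert a sysfs size string such as '32K' to bytes."""
    text = text.strip()
    units = {"K": 1024, "M": 1024 ** 2}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)

def l1_data_cache_bytes(cache_dir):
    """Return the size of the level-1 Data (or Unified) cache, or None."""
    for index in sorted(Path(cache_dir).glob("index*")):
        try:
            level = (index / "level").read_text().strip()
            kind = (index / "type").read_text().strip()
            size = (index / "size").read_text()
        except OSError:
            continue  # entry missing or unreadable; try the next one
        # Skip the instruction cache: only data space is usable by the siever.
        if level == "1" and kind in ("Data", "Unified"):
            return parse_size(size)
    return None

if __name__ == "__main__":
    # Assumed path; it may not exist on non-Linux systems.
    size = l1_data_cache_bytes("/sys/devices/system/cpu/cpu0/cache")
    if size is None:
        print("no L1 data cache entry found")
    else:
        print(f"L1 data cache: {size} bytes, nominal L1_BITS={size.bit_length() - 1}")
```

A tool that reports a single combined "L1 cache" figure may be adding the data and instruction caches together, which is why the per-type sysfs entries are worth a look.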
|
|
|
|
|
|
#9 | |||
|
"Ed Hall"
Dec 2009
Adirondack Mtns
EE9₁₆ Posts |
Quote:
Quote:
Just to note, the first CPU says: Quote:
Thanks much! That sheds some more light. |
|||
|
|
|