mersenneforum.org (https://www.mersenneforum.org/index.php)
-   NFS@Home (https://www.mersenneforum.org/forumdisplay.php?f=98)
-   -   BOINC NFS sieving - NFS@Home (https://www.mersenneforum.org/showthread.php?t=12388)

VictordeHolland 2014-09-15 14:32

GC_2_795
 
[B]GC_2_795[/B]
[code]
prp88 factor: 1229181211256783532456105424311951957003782336757922948778898041952727344588487561131449
prp122 factor: 29437568931055869616663350030336306362482066093333700749117926399399022787021904381727616483972300349043385650188711287347
[/code]
95.0 hours with a "target_density=120" 10.6M matrix on an i7 3770k with -t 4.

swellman 2014-09-15 20:47

GC_12_222 Factored
 
[code]
prp84 factor: 148902775642108616331795830640649037143571534869122755521883005939595673614910241763
prp113 factor: 12971143073736097580217855171017913040659435287994287740254244481861979067885275717801474448610862951431512611037
[/code]

VictordeHolland 2014-09-16 09:10

L1282 LA started
 
LA started on L1282, ETA October 1st.

Couldn't build a matrix at target density 120 (not enough relations), so settled for a 20.3M matrix with target density 100.

Edit: I just noticed NFS@Home increased the Q sieving limit for L1282, so I'll wait for the extra relations and build a new matrix.

wombatman 2014-09-16 13:06

GC_5_353 factors as: [CODE]Tue Sep 16 00:48:21 2014 prp57 factor: 485969537734693364126687271492949711480027245728307842357
Tue Sep 16 00:48:21 2014 prp190 factor: 2236662401843371642445552346818808997187163195483097065186825555807216271969665340930331577121304317764383946464160253639913952177421808512066356909004894530314082698485713276741002939780381[/CODE]

Took 145 hours on i7-2630QM with 7 threads.

pinhodecarlos 2014-09-16 13:25

[QUOTE=wombatman;383161]GC_5_353 factors as: [CODE]Tue Sep 16 00:48:21 2014 prp57 factor: 485969537734693364126687271492949711480027245728307842357
Tue Sep 16 00:48:21 2014 prp190 factor: 2236662401843371642445552346818808997187163195483097065186825555807216271969665340930331577121304317764383946464160253639913952177421808512066356909004894530314082698485713276741002939780381[/CODE]Took 145 hours on i7-2630QM with 7 threads.[/QUOTE]

On your laptop it is quicker to run on only 3 or 4 threads. Turn off HT. I think you just wasted 10-15 % of your total time.

wombatman 2014-09-16 14:26

[QUOTE=pinhodecarlos;383163]On your laptop it is quicker to run on only 3 or 4 threads. Turn off HT. I think you just wasted 10-15 % of your total time.[/QUOTE]

I've seen this mentioned before. Why does using fewer threads without hyperthreading help?

xilman 2014-09-16 15:14

[QUOTE=wombatman;383164]I've seen this mentioned before. Why does using fewer threads without hyperthreading help?[/QUOTE]

Think: is your process compute bound or memory bound? How well do the memory accesses fit in the caches?

pinhodecarlos 2014-09-16 15:35

I think the problem is in how the LA phase is coded: it does not benefit from HT. Also, with HT off the CPU will run cooler.

wombatman 2014-09-16 15:57

[QUOTE=xilman;383166]Think: is your process compute bound or memory bound? How well do the memory accesses fit in the caches?[/QUOTE]

Truthfully, I'm not sure. The 2630QM has a 6 MB L3 cache, which, from what I've read, only has a strong effect if the data is being read sequentially. My assumption (based on my poor understanding of the black-box workings) is that the LA step goes through the data in some kind of sequential order. So would using 7 threads essentially saturate the L3 cache, making it the limiting factor?

Am I in the ballpark?
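
One rough way to approach the cache question is to compare the size of a single vector used in the LA step with the L3 cache. The sketch below is only an estimate under a stated assumption that does not come from this thread: that the iterated vector holds one 64-bit word (8 bytes) per matrix row, as in a block Lanczos implementation with 64-bit blocks. The function name and the `bytes_per_row` default are illustrative, and the matrix sizes are the two mentioned earlier in the thread, not the one for GC_5_353.

```python
def vector_megabytes(matrix_rows: int, bytes_per_row: int = 8) -> float:
    """Size in MB (10**6 bytes) of one dense vector with `bytes_per_row` bytes per matrix row.

    bytes_per_row=8 is an assumption (one 64-bit word per row), not a documented figure.
    """
    return matrix_rows * bytes_per_row / 1e6


L3_CACHE_MB = 6.0  # L3 size of the i7-2630QM as quoted in the thread

for name, rows in [("10.6M matrix", 10_600_000), ("20.3M matrix", 20_300_000)]:
    size = vector_megabytes(rows)
    print(f"{name}: ~{size:.0f} MB per vector, ~{size / L3_CACHE_MB:.0f}x the L3 cache")
```

Under that assumption a single vector is already many times larger than the L3 cache, before counting the matrix itself, so most accesses would have to go to main memory.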

debrouxl 2014-09-16 18:27

Let's consider two other workloads for which I have benchmark data, before considering msieve:
* application A uses OpenMP for near-linear speedup on a loop with a huge number of iterations; each core chews through several tens of thousands of iterations per second. The whole dataset fits in the cache, no external memory accesses are performed after the initial load, and no floating-point operations are used.
The fastest computer (among the platforms I have access to) for that compute-bound workload uses an FX-8150 @ 3.6 GHz, which is a real 8-core system without HT, with a small L1 cache and only 4 FPUs, so it sucks at Prime95 LL testing, for instance. A couple of Core-i7 HT Xeons @ 3.2 and 3.3 GHz are nearly as fast. A quad-core Cortex-A9 @ 1.7 GHz is less than 5 times slower than the FX-8150, so it's in the same ballpark.

* application B is single-threaded. Computer 2 has a CPU clocked at more than twice the frequency of computer 1's, definitely more cache and possibly a slightly newer micro-architecture (Xeon E5-1xxx vs. first-generation mobile Core i7), and uses DDR3-1600 while computer 1 uses DDR3-1333.
The workload runs less than 50% faster on computer 2 than on computer 1, so the workload is rather memory-bound. Using DDR3-2400, or the upcoming DDR4, would presumably yield near-linear speedup, but using a latest-generation Xeon at ~4 GHz clock speed wouldn't help that much.
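
The application B comparison can be turned into a rough number with a simple two-part runtime model. This is an illustrative sketch, not a measurement: it assumes the runtime splits into a part that scales with clock speed and a part that does not, it ignores the DDR3-1333 vs. DDR3-1600 difference, and it plugs in the boundary values 2.0 (clock ratio) and 1.5 (speedup) where the post only says "more than twice" and "less than 50% faster". The function name is hypothetical.

```python
def non_scaling_fraction(clock_ratio: float, speedup: float) -> float:
    """Fraction m of the original runtime that did not speed up with the clock.

    Assumed model: 1/speedup = (1 - m)/clock_ratio + m, solved for m.
    """
    if clock_ratio <= 1.0:
        raise ValueError("clock_ratio must be greater than 1")
    return (1.0 / speedup - 1.0 / clock_ratio) / (1.0 - 1.0 / clock_ratio)


# Boundary values taken from the wording of the post (assumed, not measured).
m = non_scaling_fraction(clock_ratio=2.0, speedup=1.5)
print(f"roughly {m:.0%} or more of the runtime did not scale with clock speed")
```

With a larger clock ratio or a smaller speedup the fraction only grows, so in this model at least about a third of application B's runtime is spent waiting on something other than the core clock.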


Experience running msieve on an otherwise idle computer shows that:
* using 1 < N <= [real core count] threads gives an almost linear speedup;
* using [real core count] < N <= [hyperthread count] threads at best slightly decreases the total run time and at worst slightly increases it, so the efficiency (speedup / number of threads) drops steadily as threads are added.
So msieve is indeed a memory-bound workload, and you should be using only 3 or 4 threads on your i7, as hinted by Carlos :smile:
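
To check this on your own machine, time the same job at several thread counts and compute the speedup and per-thread efficiency. The sketch below only does the arithmetic; the function name is illustrative and the timings are made-up placeholders for a 4-core / 8-thread CPU, not results from this thread.

```python
def scaling_table(runtimes_hours: dict[int, float]) -> list[tuple[int, float, float]]:
    """Return (threads, speedup, efficiency) rows relative to the 1-thread run.

    speedup = T(1) / T(N); efficiency = speedup / N.
    """
    base = runtimes_hours[1]
    rows = []
    for threads in sorted(runtimes_hours):
        speedup = base / runtimes_hours[threads]
        rows.append((threads, speedup, speedup / threads))
    return rows


# Hypothetical timings in hours (placeholders, not measured).
example = {1: 400.0, 2: 210.0, 4: 115.0, 8: 120.0}

for threads, speedup, efficiency in scaling_table(example):
    print(f"{threads} threads: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

If the efficiency falls sharply once the thread count exceeds the number of real cores, the extra hyperthreads are not buying anything for that job.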

wombatman 2014-09-16 18:49

Thanks Lionel for the more detailed explanation. I guess this means I can cut back on the number of threads I use on my desktop as well, since I only have DDR3-2133 RAM there.

