MPI-and-threads post-SVN900
This sounds as if it's time to redo my MPI+threads runs.
What I got with SVN900 was the rather uninspiring table below:
[code]
1x8 grid, -t3             50757s
2x4 grid, -t2             49289s
2x4 grid, -t3             crash
4x2 grid, -t2             crash
4x2 grid, -t3             crash
8x1 grid, -t3             crash
3x8 grid, no threading    57624s
(Haswell -t4, svn 886     90103s)
(Haswell -t4, svn 900     ~50000s)
[/code]
It will take about a week to run 4 grids x 2 thread counts x 50 kiloseconds, so I should probably get started ASAP.
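As a quick check of the "about a week" figure, here is a minimal Python sketch of the arithmetic. It assumes every run takes roughly the 50 kiloseconds seen in the table and that runs go back to back:

```python
# Rough schedule for the planned MPI+threads benchmark sweep.
GRIDS = 4                 # 1x8, 2x4, 4x2, 8x1
THREAD_COUNTS = 2         # -t2 and -t3
SECONDS_PER_RUN = 50_000  # assumed ~50 ks per full run, from the table above

total_seconds = GRIDS * THREAD_COUNTS * SECONDS_PER_RUN
total_days = total_seconds / 86_400  # seconds per day

print(f"{total_seconds} s = {total_days:.1f} days")
```

That is about 4.6 days of continuous running, consistent with "about a week".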
Jason fixed the crash yesterday. When doing benchmarks, I don't let the run finish. About 10 minutes of running gives a stable ETA.
BTW, what's the size of this matrix?
Tom, with his SI units for time, reminds me of this (patently silly) anecdote:
[QUOTE]My daughter has been studying for the theory part of her driving test and tells me she is confident she will pass it. "Are you sure? It is not that easy," I told her. "Ask me any question Dad!" she replied, "OK then, what's the most speed you can do on a British road?" "Oh, that's a tricky one, but I think it's about two grams."[/QUOTE]
BTW, I was just trying out the Stampede cluster yesterday on the 42.1M matrix from 3,745+. Each node of this cluster has two 8-core Sandy Bridge CPUs. Using 1024 cores in an 8x16 MPI grid with 8 threads per process (one MPI process per CPU, with threads spreading the work across that CPU's cores), msieve SVN 923 can complete the linear algebra in 22.5 hours.
A 42.1M matrix in under a day! :smile:
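The process/thread layout above can be checked with a little arithmetic. A minimal Python sketch using only the figures in the post (two 8-core CPUs per node, an 8x16 grid, 8 threads per MPI process); the node count is derived, not quoted:

```python
# Core accounting for the Stampede run described above.
CORES_PER_CPU = 8          # 8-core Sandy Bridge
CPUS_PER_NODE = 2
grid_rows, grid_cols = 8, 16   # MPI process grid
threads_per_process = 8        # one thread per core of a CPU

mpi_processes = grid_rows * grid_cols
total_cores = mpi_processes * threads_per_process
cpus_used = total_cores // CORES_PER_CPU
nodes_used = cpus_used // CPUS_PER_NODE

print(mpi_processes, total_cores, cpus_used, nodes_used)
```

So 128 MPI processes x 8 threads fill 1024 cores, which works out to exactly one process per CPU across 64 nodes.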
Awesome!
gnfs-216-220 is in order then? 3,776+? 11,649L?
[QUOTE=Batalov;346684]Awesome!
gnfs-216-220 is in order then? 3,776+? 11,649L ?[/QUOTE]
11,649L is borderline SNFS? Here's the table from the XSEDE proposal I just submitted.
[TEX]
\begin{tabular}{ccclcc}
~ & Decimal & & & Matrix Size & SU's \\
Type & Digits & Bits & Target & $N$ & Required \\
\hline
GNFS & 197 & 652 & $7^{394}+1$ & 34M & 40,000 \\
GNFS & 216 & 716 & $3^{766}+1$ & 92M & 500,000 \\
GNFS & 221 & 733 & $11^{323}+1$ & 119M & 960,000 \\
\hline
\end{tabular}
[/TEX]
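The Decimal Digits and Bits columns give the same size in two bases. A quick Python sanity check (my own, not part of the proposal) that each row is self-consistent, using only the bounds that define a d-digit and a b-bit integer:

```python
import math

# (decimal digits, bits) pairs from the GNFS table above
ROWS = [(197, 652), (216, 716), (221, 733)]
LOG2_10 = math.log2(10)

def bits_consistent(digits: int, bits: int) -> bool:
    """A d-digit integer n satisfies 10**(d-1) <= n < 10**d, and a b-bit
    integer satisfies 2**(b-1) <= n < 2**b.  Both ranges overlap only if
    (d-1)*log2(10) < b < d*log2(10) + 1."""
    return (digits - 1) * LOG2_10 < bits < digits * LOG2_10 + 1

for digits, bits in ROWS:
    print(digits, bits, bits_consistent(digits, bits))
```

All three rows pass; for example, a 197-digit number must have between 652 and 655 bits.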
The Lomonosov cluster needed 63 hours to solve a 40M matrix with 900 MPI processes. Two years later, the solve time is nearly 3x shorter (63 h vs. 22.5 h), on a slightly larger matrix!
[QUOTE=frmky;346687]11,649L is borderline SNFS? Here's the table from the XSEDE proposal I just submitted.
[/QUOTE]
It is a borderline SNFS (but it is a quintic, which needs to be tested). I suspect that 11,323+ is a borderline SNFS too; let's put some polynomials on the table and see. 3,766+ is a solid rung on the ladder, and 3,718+ (c222) seems to be a more convincing GNFS candidate than 11,323+.
The matrix I'm testing on is from 5+4.353, simply because that's what I had most recently sieved when the Haswell turned up:
Thu Jul 4 08:11:15 2013 matrix is 5382199 x 5382378 (1644.8 MB) with weight 515425593 (95.76/col)
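The "(95.76/col)" figure in the log line is the total weight (number of nonzeros) divided by the number of columns. A minimal Python check using the numbers from the log:

```python
# Figures from the msieve log line above.
nrows, ncols = 5382199, 5382378
weight = 515425593  # total nonzero entries in the matrix

density = weight / ncols  # average nonzeros per column
print(f"{nrows} x {ncols}, {density:.2f} nonzeros/col")
```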
Haswell numbers on a small matrix
This is "how far did it get in twenty minutes" (iterations completed) on a 1197160 x 1197390 matrix for a C127; each iteration covers 63.22 dimensions on average.
[code]
threads                   1       2       3       4
svn 886                2584    3401    4720    5954
svn 923                4700    8433   11535   14138

scaling relative to 1 thread:
svn 886                1.00    1.32    1.83    2.30
svn 923                1.00    1.79    2.45    3.01

svn 923 : svn 886      1.82    2.48    2.44    2.37
[/code]
I have arranged an account on a friend's i7/3930 and will see what the times are like on that machine's faster memory subsystem tonight.
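The scaling rows follow directly from the raw counts. A minimal Python sketch that recomputes them, treating the raw figures as iterations completed in 20 minutes (an assumption based on the "dimensions (not iterations)" remark in the Opteron post):

```python
# Progress after 20 minutes (iterations) at 1-4 threads, from the table above.
counts = {
    "svn 886": [2584, 3401, 4720, 5954],
    "svn 923": [4700, 8433, 11535, 14138],
}

def scaling(row):
    """Speed at each thread count relative to the 1-thread run."""
    return [round(x / row[0], 2) for x in row]

for rev, row in counts.items():
    print(rev, scaling(row))

# Per-thread-count speedup of svn 923 over svn 886.
ratio = [round(new / old, 2)
         for new, old in zip(counts["svn 923"], counts["svn 886"])]
print("923:886", ratio)
```

Dividing by the 1-thread figure of the same revision isolates thread scaling, while the column-wise ratio isolates the code improvement at a fixed thread count.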
Opteron numbers on a medium matrix
This is the 5382199 x 5382378 matrix mentioned earlier; figures are the number of dimensions (not iterations) done in 3000 seconds, then the expected kiloseconds for the whole job. The machine is otherwise idle.
So at least 24 cores of Opteron manage to beat 4 cores of Haswell...
[code]
grid  threads  dimensions in 3000 s  ETA (ks)
1x8   -t2      299206                51.1
1x8   -t3      368336                41.6
2x4   -t2      349544                44.6
2x4   -t3      429151                36.0
4x2   -t2      344009                44.3
4x2   -t3      436101                35.8
8x1   -t2      292563                52.0
8x1   -t3      363180                42.9
[/code]
Trying -t4, -t5 and -t6 tonight.
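The ETA column can be roughly cross-checked with a naive linear extrapolation: if D dimensions are done in 3000 s, the whole job takes about 3000 x 5382199 / D seconds. A minimal Python sketch of that idea (my own arithmetic, not msieve's ETA logic); on these figures it lands roughly 3 to 6% above the reported ETAs, so treat it only as a sanity check:

```python
# Naive linear extrapolation of total run time from a short benchmark.
MATRIX_DIM = 5382199   # matrix size from the log line
SAMPLE_SECONDS = 3000  # length of each benchmark run

# (grid, threads): (dimensions done in 3000 s, reported ETA in ks)
runs = {
    ("1x8", 2): (299206, 51.1), ("1x8", 3): (368336, 41.6),
    ("2x4", 2): (349544, 44.6), ("2x4", 3): (429151, 36.0),
    ("4x2", 2): (344009, 44.3), ("4x2", 3): (436101, 35.8),
    ("8x1", 2): (292563, 52.0), ("8x1", 3): (363180, 42.9),
}

def naive_eta_ks(dims_done: int) -> float:
    """Total kiloseconds if the whole job ran at the sampled rate."""
    return MATRIX_DIM / dims_done * SAMPLE_SECONDS / 1000

# List configurations from fastest to slowest by reported ETA.
for (grid, t), (dims, reported) in sorted(runs.items(), key=lambda kv: kv[1][1]):
    print(f"{grid} -t{t}: naive {naive_eta_ks(dims):.1f} ks, reported {reported} ks")
```

Both estimates agree that 4x2 with -t3 is the fastest configuration here.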