[QUOTE=Dubslow;307663]
Is that a root norm or size norm? (I.e., can it be ignored for -ns?) [/QUOTE] It's a size norm; the root score contributes to it, but at heart it's a root-mean-square measure of the polynomial size over the projected sieving region. [QUOTE] ...unless of course each coefficient would take a lifetime to search :smile: I take it that without the time limit, for a problem this large (or even much smaller problems), Msieve could go on forever with one coefficient? And that the time limit per coeff is simply the only reasonable/practical way to do it?[/QUOTE] Pretty much correct; the CPU stage 1 is hashtable-driven and you have to parametrize things in terms of a special-Q, much like sieving. To keep the size of the hashtable small the special-Q have to be very large, leading to a complete search space with effectively millions of special-Q. Move to another leading coefficient and you have millions more. |
Suggestion: Instead of using a time limit per coefficient, why not a spq limit? Here's some output of a very slightly modified binary on debrouxl's C150 above:
[code]taskset 2 ./msieve -np1 -i num
deadline: 400 CPU-seconds per coefficient
coeff 12 specialq 1 - 218082 other 109442 - 262660 aprogs: 16794 entries, 23722 roots hashtable: 32768 entries, 0.50 MB, [U]38815 total spq, 183s elapsed[/U]
coeff 12 specialq 1 - 37861 other 262660 - 630384 aprogs: 34609 entries, 45593 roots hashtable: 32768 entries, 0.50 MB, 5590 total spq, 217s elapsed
coeff 24 specialq 1 - 301373 other 99780 - 239472 aprogs: 15294 entries, 20710 roots hashtable: 16384 entries, 0.25 MB, 58207 total spq, 253s elapsed
coeff 24 specialq 1 - 52321 other 239472 - 574732 aprogs: 32137 entries, 42561 roots hashtable: 32768 entries, 0.50 MB, 2710 total spq, 147s elapsed
coeff 36 specialq 1 - 364151 other 94529 - 226869 aprogs: 15170 entries, 23590 roots hashtable: 32768 entries, 0.50 MB, 66036 total spq, 310s elapsed
coeff 36 specialq 1 - 63220 other 226869 - 544485 aprogs: 31433 entries, 45693 roots hashtable: 32768 entries, 0.50 MB, 2040 total spq, 90s elapsed
coeff 48 specialq 1 - 416471 other 90972 - 218332 aprogs: 13874 entries, 17774 roots hashtable: 16384 entries, 0.25 MB, 71209 total spq, 203s elapsed
coeff 48 specialq 1 - 72303 other 218332 - 523996 aprogs: 29535 entries, 38255 roots hashtable: 32768 entries, 0.50 MB, 7050 total spq, 197s elapsed
coeff 60 specialq 1 - 462177 other 88305 - 211932 aprogs: 13514 entries, 17350 roots hashtable: 16384 entries, 0.25 MB, 79106 total spq, 220s elapsed
coeff 60 specialq 1 - 80239 other 211932 - 508636 aprogs: 28925 entries, 37797 roots hashtable: 32768 entries, 0.50 MB, 6580 total spq, 180s elapsed
coeff 72 specialq 1 - 503228 other 86184 - 206841 aprogs: 13251 entries, 17271 roots hashtable: 16384 entries, 0.25 MB, 86399 total spq, 240s elapsed
coeff 72 specialq 1 - 87365 other 206841 - 496418 aprogs: 28463 entries, 37719 roots hashtable: 32768 entries, 0.50 MB, 5940 total spq, 160s elapsed
coeff 84 specialq 1 - 540758 other 84431 - 202634 aprogs: 12946 entries, 16694 roots hashtable: 16384 entries, 0.25 MB, 78854 total spq, 211s elapsed
coeff 84 specialq 1 - 93881 other 202634 - 486321 aprogs: 27825 entries, 36225 roots hashtable: 32768 entries, 0.50 MB, 7160 total spq, 189s elapsed
coeff 108 specialq 1 - 608044 other 81649 - 195957 aprogs: 13059 entries, 19383 roots hashtable: 16384 entries, 0.25 MB, 91450 total spq, 400s elapsed
coeff 120 specialq 1 - 638684 other 80510 - 193224 aprogs: 12338 entries, 15614 roots hashtable: 16384 entries, 0.25 MB, 108723 total spq, 235s elapsed
coeff 120 specialq 1 - 110882 other 193224 - 463737 aprogs: 26702 entries, 34766 roots hashtable: 32768 entries, 0.50 MB, 9910 total spq, 165s elapsed
coeff 132 specialq 1 - 667740 other 79493 - 190783 aprogs: 12410 entries, 16818 roots hashtable: 16384 entries, 0.25 MB, 115189 total spq, 284s elapsed
coeff 132 specialq 1 - 115927 other 190783 - 457879 aprogs: 26936 entries, 37284 roots hashtable: 32768 entries, 0.50 MB, 5970 total spq, 116s elapsed
coeff 144 specialq 1 - 695414 other 78576 - 188582 aprogs: 12204 entries, 16308 roots hashtable: 16384 entries, 0.25 MB, 119597 total spq, 281s elapsed
coeff 144 specialq 1 - 120731 other 188582 - 452596 aprogs: 26429 entries, 35405 roots hashtable: 32768 entries, 0.50 MB, 6500 total spq, 119s elapsed
coeff 156 specialq 1 - 721882 other 77742 - 186580 aprogs: 12045 entries, 15877 roots hashtable: 16384 entries, 0.25 MB, 125052 total spq, 353s elapsed
coeff 156 specialq 1 - 125326 other 186580 - 447792 aprogs: 26107 entries, 34663 roots hashtable: 32768 entries, 0.50 MB, 1680 total spq, 47s elapsed
coeff 168 specialq 1 - 747272 other 76978 - 184747 aprogs: 12276 entries, 17708 roots hashtable: 16384 entries, 0.25 MB, 116185 total spq, 338s elapsed
coeff 168 specialq 1 - 129734 other 184747 - 443392 aprogs: 26171 entries, 36091 roots hashtable: 32768 entries, 0.50 MB, 2310 total spq, 62s elapsed
coeff 180 specialq 1 - 771726 other 76273 - 183055 aprogs: 11904 entries, 15924 roots hashtable: 16384 entries, 0.25 MB, 131862 total spq, 298s elapsed
coeff 180 specialq 1 - 133980 other 183055 - 439332 aprogs: 25921 entries, 35509 roots hashtable: 32768 entries, 0.50 MB, 5880 total spq, 102s elapsed
coeff 204 specialq 1 - 818158 other 75010 - 180024 aprogs: 11619 entries, 15227 roots hashtable: 16384 entries, 0.25 MB, 134647 total spq, 317s elapsed
coeff 204 specialq 1 - 142041 other 180024 - 432057 aprogs: 25375 entries, 33775 roots hashtable: 32768 entries, 0.50 MB, 3830 total spq, 83s elapsed
coeff 216 specialq 1 - 840268 other 74441 - 178658 aprogs: 11492 entries, 14896 roots hashtable: 16384 entries, 0.25 MB, 141761 total spq, 299s elapsed
coeff 216 specialq 1 - 145879 other 178658 - 428779 aprogs: 25175 entries, 33387 roots hashtable: 32768 entries, 0.50 MB, 6590 total spq, 101s elapsed
coeff 228 specialq 1 - 861745 other 73906 - 177374 aprogs: 11749 entries, 16845 roots hashtable: 16384 entries, 0.25 MB, 140809 total spq, 342s elapsed
coeff 228 specialq 1 - 149608 other 177374 - 425697 aprogs: 25523 entries, 36203 roots hashtable: 32768 entries, 0.50 MB, 3040 total spq, 58s elapsed
coeff 240 specialq 1 - 882602 other 73403 - 176167 aprogs: 11469 entries, 15613 roots hashtable: 16384 entries, 0.25 MB, 137970 total spq, 400s elapsed
coeff 252 specialq 1 - 902935 other 72927 - 175024 aprogs: 11317 entries, 14897 roots hashtable: 16384 entries, 0.25 MB, 132314 total spq, 282s elapsed
coeff 252 specialq 1 - 156759 other 175024 - 420057 aprogs: 24983 entries, 34167 roots hashtable: 32768 entries, 0.50 MB, 7180 total spq, 118s elapsed[/code]
Potential reasons:
1) Time-based seems rather arbitrary. Data-based is more aesthetically pleasing :razz:
2) If a user ^Z's a poly select prog, when it's resumed, chances are the deadline will be long past even though no work is done. A special-q count can't be faked out like that. (I've done this once with an aliquot.)
To extend this, there are other ways that CPU time can be stolen or lost, and especially in a BOINC project (kibibit *cough*), awarding credit based on a timer is a rather bad idea. Counting special-q is directly correlated with the actual CPU time spent. (P.S. If it's not clear, this is a suggestion for general Msieve use, though it was obviously motivated by Op. Kb. P.P.S. The "second pass" really is quite a bit slower than the first one; you might consider tossing it for degree==5 as well.) |
You joined the forum after the long discussion we had about how Msieve should behave in terms of limiting the effort spent for poly selection. Counting wall clock time means putting the machine on standby overnight will mess up the timing. Counting CPU time means you will never finish if the binary is at idle priority (like BOINC wants) and has to fight with e.g. a screensaver, or you have a laptop that throttles due to high temperature. Counting the special-Q will do too much work on a 3 year old computer, and won't do enough work on next year's computer.
Bottom line is that cobbling together a scheme that works for everyone is too tricky. It's going to get even worse if the GPU overhaul I'm working on accelerates stage 1 as much as I think it will. |
Pretty tricky, indeed...
On the one hand, I tend to like logging output as well, for its (potential) informational and educational purposes. That's why I added a verbose mode to remdups4, without interfering with the normal mode experts are used to. For instance, the above output shows that in this range, some special-Qs took much more time than others, so they may be more (or less?) interesting than the average. On the other hand, I appreciate that it might cause people to wonder why the number of special-q and the elapsed time are, at best, weakly correlated, and send you e-mails or post on MF about it. So [i]I[/i]'d be interested in seeing Dubslow's modifications integrated, but it's up to you :smile: Dubslow, maybe you could post the patch ? :smile: |
Okay, since there's at least a teensy bit of interest, I'll reply. To be explicitly clear, I'm only talking about the per_coeff deadline, not the overall deadline. For the overall deadline, I'm not even sure an spq approach would work, and for that at least, the status quo is better.
[QUOTE=jasonp;308507]You joined the forum after the long discussion we had about how Msieve should behave in terms of limiting the effort spent for poly selection.[/quote] Would you be willing to point me to that conversation? Even something like "in the feedback threads for 1.15-1.20 or thereabouts" would be very helpful. [QUOTE=jasonp;308507] Counting wall clock time means putting the machine on standby overnight will mess up the timing. Counting CPU time means you will never finish if the binary is at idle priority (like BOINC wants) and has to fight with e.g. a screensaver, or you have a laptop that throttles due to high temperature.[/quote] That's how GIMPS and NFS@Home work. They assign credit based on computations completed, not based on the time from when the program started to when it finished. If credit were awarded based on time, then it would be super easy to spoof the system by starting the workunit and then pausing it until the deadline was up without actually using any CPU time. As for the idle priority/screensaver, those arguments should apply equally well to GIMPS and NFS@Home, but that's how they work. They run at idle and still get the job done. [QUOTE=jasonp;308507] Counting the special-Q will do too much work on a 3 year old computer, and won't do enough work on next year's computer.[/quote] I would argue that this is what happens with wall-clock time, and not with counting spq. For a C120 I'm tackling, YAFU chose ~3 hours of poly select time. I spent about 10-11 hours sieving. Since poly select is supposed to be no more than 5% of the sieving time, it seems that the 3-hour figure was chosen on a slower CPU (which would make sense, since this is an i7-2600K). For faster and faster CPUs, that figure would get progressively worse. With some sort of spq count correlated to work done, the poly select time will go down just as fast as the sieve time goes down, so that they stay in roughly the right ratio for a variety of CPUs. 
(I know that this argument isn't for the per_coeff deadline; however, I believe the principle applies in the per_coeff case as well.) [QUOTE=jasonp;308507] Bottom line is that cobbling together a scheme that works for everyone is too tricky. It's going to get even worse if the GPU overhaul I'm working on accelerates stage 1 as much as I think it will.[/QUOTE] As Xyzzy said elsewhere, the impossible is only impossible until it's done. I think (at least for the per_coeff deadline) that an spq scheme would be more applicable to future processors than the wall-clock timing method. [QUOTE=debrouxl;308618]Pretty tricky, indeed... On the one hand, I tend to like logging output as well, for its (potential) informational and educational purposes. That's why I added a verbose mode to remdups4, without interfering with the normal mode experts are used to. For instance, the above output shows that in this range, some special-Qs took much more time than others, so they may be more (or less?) interesting than the average. On the other hand, I appreciate that it might cause people to wonder why the number of special-q and the elapsed time are, at best, weakly correlated, and send you e-mails or post on MF about it. So [i]I[/i]'d be interested in seeing Dubslow's modifications integrated, but it's up to you :smile: Dubslow, maybe you could post the patch ? :smile:[/QUOTE] Well, all I did was add one variable and the extra information to the print statement. I haven't actually put together a patch that uses my proposed spq deadline, though I would certainly be willing to if that's what you're asking for. 
[code]Index: gnfs/poly/stage1/stage1_sieve_cpu.c
===================================================================
--- gnfs/poly/stage1/stage1_sieve_cpu.c	(revision 743)
+++ gnfs/poly/stage1/stage1_sieve_cpu.c	(working copy)
@@ -443,6 +443,7 @@
 	uint32 i, j;
 	uint32 quit = 0;
+	uint32 tot_q = 0;
 	p_packed_t *tmp;
 	p_packed_var_t specialq_array;
 	p_packed_var_t hash_array;
@@ -533,8 +534,9 @@
 		if (num_q == 0)
 			break;
 #if 0
-		printf("special q: %u entries, %u roots\n", num_q,
-			specialq_array.num_roots);
+		printf("special q: %u entries, %u roots, %u total\n", num_q,
+			specialq_array.num_roots, tot_q);
+
 #endif
 		/* multiply the special-q together; this, and the
@@ -600,21 +602,25 @@
 			qptr = p_packed_next(qptr);
 			invtmp += num_p;
 		}
+
+		tot_q += num_q; // once in a while()...
 	}
 finished:
+	*elapsed = get_cpu_time() - cpu_start_time;
 #if 1
-	printf("hashtable: %u entries, %5.2lf MB\n",
+	printf("hashtable: %u entries, %5.2lf MB, %u total spq, %.0lfs elapsed\n",
 		(uint32)1 << hashtable_size_log2,
 		(double)sizeof(hash_entry_t) *
-			((uint32)1 << hashtable_size_log2) / 1048576);
+			((uint32)1 << hashtable_size_log2) / 1048576,
+		tot_q, *elapsed);
 #endif
 	free(invtable);
 	free(hashtable);
 	p_packed_free(&specialq_array);
 	p_packed_free(&hash_array);
 	mpz_clear(qprod);
-	*elapsed = get_cpu_time() - cpu_start_time;
+
 	return quit;
 }
Index: Makefile
===================================================================
--- Makefile	(revision 743)
+++ Makefile	(working copy)
@@ -16,7 +16,7 @@
 # get overridden by architecture-specific builds)
 CC = gcc -D_FILE_OFFSET_BITS=64
 WARN_FLAGS = -Wall -W
-OPT_FLAGS = -O3 -fomit-frame-pointer -march=k8 -DNDEBUG -D_LARGEFILE64_SOURCE
+OPT_FLAGS = -O3 -fomit-frame-pointer -march=native -DNDEBUG -D_LARGEFILE64_SOURCE
 # use := instead of = so we only run the following once
 SVN_VERSION := $(shell svnversion .)[/code]
There were also a couple of other print statements I commented out. |
I was interested only in the extended printf, thanks :smile:
|
What numbers can be used to rank the -np1 and -nps output? Obviously the -npr output is the full polys, ranked by increasing Murphy e combined_score :smile:
|
-np1 doesn't currently output any scores, hits either come in under the stage 1 bound or not. The latest SVN prints the size norm when a polynomial is output from -nps.
|
[QUOTE=jasonp;309320]-np1 doesn't currently output any scores, hits either come in under the stage 1 bound or not. The latest SVN prints the size norm when a polynomial is output from -nps.[/QUOTE]
I'm asking about the code, rather than the output. Do S1 hits not get any sort of score at all? As for -nps, in order to rank the hits, should they be sorted by projective_alpha or sizeopt_norm? (In either case, is a higher or lower score "better"?) |
Stage 1 doesn't score the polynomial at all, because the size optimization is what builds the complete polynomial for the first time, and the 'stage 1 score' is just the size of the third-to-highest algebraic polynomial coefficient. Even then, the skew for the resulting polynomial also figures into the score, so getting an accurate measure of the polynomial size is an optimization problem of its own; hence it's folded into the rest of the size optimization. This is in gnfs/poly/stage2/stage2.c, mostly in function pol_expand and its callers.
The size optimization score is an integral of the square of the algebraic polynomial over the sieving region, and smaller is better. The projective alpha doesn't really matter, it just influences the size score and its influence is already folded in when the size score is reported. |
[QUOTE=jasonp;309327]The projective alpha doesn't really matter, it just influences the size score and its influence is already folded in when the size score is reported.[/QUOTE]
Why print exp(alpha)*size_norm, rather than just plain size_norm, if the alpha is already folded into size_norm? Edit: config->heap[i] is just the i'th thread's best poly (where i is always 0 :razz:)? |