20190224, 03:47  #34  
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7·491 Posts 
Quote:
If it depends on the factoring program mfaktc or mfakto using only candidate prime factors in the algorithm, or a predictable number of candidate factors, of predictable values, there may be a problem. In gpu sieving there are 3 parameters which we users can adjust to tune the application for maximum throughput. Some have only a few values allowable, others have a wide integer range allowable. One of these is typically of order 10^{5}, and is how far to sieve candidate factors. (Another can range up to 2047 in modified code.) Some composite candidate factors get let through and used in the algorithm, because it's faster than perfectly sieving candidate factors for primality. If not using gpu sieving for candidate factors, there's another set of tune parameters to do approximately the same. See https://www.mersenneforum.org/showpo...23&postcount=6 mainly items 8, 9 and 26. The crux of it I think is the application does not know how many of the candidate factors employed were prime, in usual operation, or which to count as prime and which not to. Let's see how much I got wrong. Here's how I interpreted it. P is the exponent we seek a factor for and want to validate the TF attempt for, over the TF interval 2^{b} to 2^{(b+1)} cnt(*) is count of (whatever); number of primes (q's?) in TF interval 2^{b} to 2^{(b+1)}; (all primes in the range, or only primes of form 1 or 7 mod 8 and 2kp+1?) q is a candidate factor that may or may not itself be prime, having been generated from an efficient generator and sieved imperfectly in such applications as mfaktc, mfakto, or prime95/mprime t is chosen so that the expected number of H1's is between 1000 and 2000. If for example p=~90M and b=76, cnt =~.05 * 2^{76} ~ 2^{72} (and most of those will be sieved out or not even generated) N =~2^{72} /2 /90M =~ 2623536935. aim for about the middle, ev=1536=1.5*2^{10} =N*r; ev/N=r=~5.85E7. t=log2(1/r) =~21. round t to integer, t=21. check; ev=N r = N / 2^{t} = 2623536935 / 2^{21} ~ 1251. sigma = sqrt(2623536935 * 2^{21} * (1 2^{21})) =~ sqrt(1251) = 35.37. ev5 sigma =~ 1251 5 * 35.37 = 1251 176.85=~ 1074. while computing the TF, after each (2^{p} mod q) trial, compute H= residue mod 2^{t}; if H=1, add q to a list. When the TF is complete, if no factor is found, the report gets the list of H1 q values appended as a form of validation. That's of order 1250 entries, of roughly 23digit numbers plus a separator character each, an additional 30 KBytes for every NF report. Some size (17%?) could be saved by using hex representation rather than decimal. The computation of the H values itself seems quite quick compared to the base TF trial. Although it is still the evaluation of a multiword residue since in the example q is 77 bits, in 32bit or shorter expression per word, since the modulo is 2^{t}, for t not greater than usable word length, the high words can be ignored. Some of the top TF producers send in millions of attempt results annually each. 30KB each x 7M+ for one prolific contributor is another >210GB of traffic to and storage on the server annually; probably of order triple that for the entire project. Approx 160Kbps average additional sustained rate to the primenet server. And it's probably quite bursty. I think in lists of such size, there should be a CRC or maybe ECC added. So, what is the trivially quick server side verification? Pick a q at random from the list included and verify H=1? It would need to be able to reproduce the t value for the exponent and bit level, or receive it in the message. Last fiddled with by kriesel on 20190224 at 03:59 

20190224, 05:06  #35 
Serpentine Vermin Jar
Jul 2014
3×5×7×31 Posts 
Our mystery spammer was back under a different random girls name. Same or similar porn site link for the web url...
And that web link may lead to the actual motive of this person: forum spamming. If they're reading this thread... don't bother. You may notice that any weblinks to 3rd party sites that appear on mersenne.org have rel="nofollow" so it's not doing you a bit of good to try and billboard the recent results or anything. Take a look, it's true. And if you think somehow a search engine is going to see those and follow them anyway, you're wrong, but keep this up and we'll just remove those links from those pages entirely. By the way, the VPN you're using has a terrible reputation on all of their endpoints. But that's probably why they allow scum like you (or because they allow scum like you). No reputation to protect. At my last job, we ran into web spammers like this on a somewhat regular basis and I know the game. 
20190224, 05:28  #36 
Serpentine Vermin Jar
Jul 2014
3×5×7×31 Posts 
I realized I could just limit those links so they only show up on accounts over a certain age. Keep the bozos from submitting bogus results just so their useless porn links show up. Or not bother showing them at all for "no factor" results. That way we're not throwing the baby out with the bath water.

20190224, 06:35  #37 
Serpentine Vermin Jar
Jul 2014
6267_{8} Posts 
Idea #whatever is to use something like this, either on the account creation page (which I just updated, FYI) or maybe even on each hit to the manual result page:
https://www.stopforumspam.com/ Basically the server would call the API and get a reliability score for the IP address. I noticed that in each of these recent cases, the offending IP was listed as a TOR exit node with terrible reputation scores on that particular service. Like, 99.9% + confidence that anything coming from them is bad. I can do a little "offline" run of that and see how many, if any, of the manual results we got in the past couple days would have been affected. With any service like that, the danger is false positives. 
20190224, 07:00  #38 
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2^{3}·3·13·17 Posts 
If you decide to try and automate detection of bad actors then you will always catch innocent people with it. It just goes with the territory.
Is it really such a problem that catching people manually is not an option? Humans are always better at determining good from bad. Can't we keep the humans in the process? So what if we occasionally get extreme results in the DB? We can go back and erase them, no big deal. 
20190224, 07:31  #39  
Serpentine Vermin Jar
Jul 2014
110010110111_{2} Posts 
Quote:
I just ran a list of IP addresses that have submitted manual results lately through that "forum spam" API and only got one hit. And that one hit turned out to be this doofus back for more. No false positives, although it was a limited sample size, and the one it found was correctly flagged. I also just implemented a change to the "recent results" page so any account under 30 days old won't have their website appear as a link on their user name. The hope being when this person (and I use the term lightly) realized their porn site isn't showing up anywhere, they'll get bored and move on. 

20190224, 11:44  #40  
"Robert Gerbicz"
Oct 2005
Hungary
2457_{8} Posts 
Quote:
Quote:
Testing 12 thousand q primes per p is not an issue. If the value of t is deterministic then you don't need to send it, but maybe sending one byte per p,b is still not a big effort. Last fiddled with by R. Gerbicz on 20190224 at 11:46 

20190224, 14:41  #41  
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7×491 Posts 
Quote:
Sending k rather than q knocks off about 27 bits per q or 8 digits per q or a third for my example. Checking all the listed q's would constitute a very considerable increase in the server effort of verification of TF. Instead of per report, running only the occasional one factor found again, at cost ~log2(p) and frequency 1/b, we would also be running on the server, (b1)/b frequency * number ~1500 q's at cost ~log2(p). (1/b + (b1)/b 1500 )/ (1/b) = 112501 ~10^{5}fold ratio. Sampling a dozen or more pseudorandomly selected q's might be a good compromise between increased server loading and detections of discrepancies. I'm unpersuaded of the marginal utility of checking the last thousand or more if all the first dozen to hundred per report check out ok. (How many sigma do we really need? Five sigma, better than 1ppm is impressive. I'd be satisfied with around half that, on a log scale; 0.1% as a great improvement over the status quo. But it's probably George's call.) If just one or two of 1500 fail to check out, what then? Does the server record it as a verified NF report with n verification errors of 1500 values or of however many actually checked? I like the idea of sending also t. The impact on message size is small, and it is one more piece for the rare few people faking results to get wrong. I'd probably choose to express it as " t dd" or some such, around 5 characters, as easily human readable. I'll leave it to folks like Madpoo, Prime95, Scott Kurowski, and James Heinrich to decide what's no problem and what's maybe too much to add to their servers' loadings. Thanks once again for continuing to look for ways to refine the methods of the GIMPS project. This seems an exciting development. Last fiddled with by kriesel on 20190224 at 14:46 

20190224, 15:17  #42  
If I May
"Chris Halsall"
Sep 2002
Barbados
227D_{16} Posts 
Quote:
It is true that /most/ people who do TF'ing don't really care all that much about the credits. But, *some* do. Further, the "No Factor between 2^X and 2^Y" information is almost as valuable as a found factor. Not only to keep track of where the next bit (no joke intended) of TF'ing work should be done, but also P1'ing takes this information into consideration. Lastly, since there would be no "reward" to the participant to submit the "No factor found" information, some _might_ simply not submit such results, thus making others redo the work with the resultant lower "success rate". My 4 cents (BDS).... 

20190224, 15:42  #43  
2×3×199 Posts 
Quote:
+1 Also there would be no reward for the energy spent to find "No Factor between 2^X and 2^Y". 

20190224, 16:17  #44  
"Robert Gerbicz"
Oct 2005
Hungary
2457_{8} Posts 
Quote:
Make the decision per p,b and not per input file. 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Results  ET_  Operazione Doppi Mersennes  604  20200326 15:17 
Where have all the TF results gone?...  lycorn  PrimeNet  22  20171002 02:40 
PGS Results  danaj  Prime Gap Searches  0  20170814 18:35 
CPU Results last 24 hrs  Unregistered  Information & Answers  3  20100726 00:49 
0x results...  Mike  PrimeNet  11  20040523 12:55 