mersenneforum.org > Prime Search Projects > Twin Prime Search
Old 2009-08-18, 22:54   #265
geoff

I have updated the remove-tpfactors awk script; it can now accept multiple sieve files as input, and so it can be used to merge single-n NewPGen files into one multi-n file for tpsieve:
Code:
remove-tpfactors [FACTORFILE ...] INFILE [INFILE] > OUTFILE
where FACTORFILE is zero or more tpsieve factor files and INFILE is one or more sieve files (but not ABCD-format sieve files). I haven't tested this script on real data, but memory use should be proportional to the number of factors removed and independent of the size of the input and output sieve files.
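The merge step by itself can be sketched in plain shell: keep the first file's header line and append only the data lines of the rest. This is a minimal illustration of the idea, not the awk script itself; the file names and contents below are fabricated.
Code:
```shell
# Sketch: merge several single-n NewPGen sieve files into one multi-n
# file, keeping only the first file's header line. The sample header
# and data lines are made up for illustration.
merge_newpgen() {
  head -n 1 "$1"                  # header from the first file only
  for f in "$@"; do
    tail -n +2 "$f"               # data lines; skip each file's header
  done
}

# Tiny demonstration with fabricated data:
dir=$(mktemp -d)
printf '100:T:0:2:3\n1005 480000\n2037 480000\n' > "$dir/n480000.txt"
printf '100:T:0:2:3\n4011 480001\n' > "$dir/n480001.txt"
merge_newpgen "$dir/n480000.txt" "$dir/n480001.txt" > "$dir/merged.txt"
cat "$dir/merged.txt"
```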


Also, tpsieve doesn't double-check factors as they are found, but you can use the check-factors program to do that.
Old 2009-08-18, 23:42   #266
gribozavr

I use some shell, Perl and PARI/GP magic:

sort_factors.sh
Code:
#!/bin/bash

set -e

sort --ignore-leading-blanks --buffer-size=5% -n -t ' ' -k 1,1 "$@" | uniq
Usage:
./sort_factors.sh tpfactors1.txt tpfactors2.txt ... > factors.txt

This script expects tpsieve-format factor files as input. It sorts the factor lines by p and removes duplicate factors.
sort_numbers.sh
Code:
#!/bin/bash

set -e

sort --ignore-leading-blanks --buffer-size=5% -n -t ' ' -k 2,2 -k 1,1 "$@"
Usage:
./sort_numbers.sh num1.txt num2.txt > num.txt

This script expects newpgen-format number files as input. It sorts all files by n, then by k. It leaves multiple header lines in place if they differ between the source files; if that happens, just use tail -n +N to fix it.
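The tail -n +N fix mentioned above looks like this: if the sorted output ends up with, say, three leftover header lines at the top, start the output at line 3 so only the last header survives. The file name and contents are made up for illustration.
Code:
```shell
# Drop the first two of three leftover header lines by starting the
# output at line 3 (fabricated data; adjust N to your header count).
dir=$(mktemp -d)
printf 'HEADER1\nHEADER2\nHEADER3\n7 480000\n9 480000\n' > "$dir/num.txt"
tail -n +3 "$dir/num.txt" > "$dir/fixed.txt"
cat "$dir/fixed.txt"
```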

get_p_range.pl
Code:
#!/usr/bin/perl

use warnings;
use strict;

our $max_p_delta = 15000000;
our $prev_p = 0;
our $in_gap = 1;

while(<>)
{
  $_ =~ tr/\r//d;
  chomp($_);
  if(m/^(\d+) /)
  {
    if($prev_p)
    {
      if($in_gap)
      {
        printf("range start: %17s\n", $prev_p);
        $in_gap = 0;
      }
      if(($1 - $prev_p) >= $max_p_delta)
      {
        my $delta = $1 - $prev_p;
        printf("range end:   %17s\n", $prev_p);
        printf("delta:       %17s\n\n", $delta);
        $in_gap = 1;
      }
    }
    $prev_p = $1;
  }
  else
  {
    print STDERR "What's \"$_\"?\n";
  }
}

printf("range end:   %17s\n", $prev_p);
Usage:
./get_p_range.pl tpfactors.txt

This script expects sorted tpsieve-format factor files as input. It prints the ranges of p found in the file. If the gap between consecutive p's is greater than $max_p_delta, the script treats it as a gap between ranges, so you should tweak that variable (because as p gets larger, factors are found less frequently). This script is useful to make sure we haven't missed anything.
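The same gap scan can be sketched as an awk one-liner over sorted factor lines. The input below is fabricated and the threshold is just an example; only the boundary p's are printed.
Code:
```shell
# Sketch: print boundaries where consecutive sorted p values differ by
# more than maxdelta. Input lines look like "p | ..."; data is made up.
dir=$(mktemp -d)
printf '1000 | x\n1500 | x\n9000 | x\n9200 | x\n' > "$dir/f.txt"
awk -v maxdelta=5000 '
  { p = $1
    if (prev != "" && p - prev > maxdelta)
      printf "range end: %s\nrange start: %s\n", prev, p
    prev = p }
  END { if (prev != "") printf "last p: %s\n", prev }
' "$dir/f.txt"
```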

check_divisors.pl
Code:
#!/usr/bin/perl

use warnings;
use strict;

while(<>)
{
  $_ =~ tr/\r//d;
  chomp($_);
  if(m/^(\d+) \| (\d+)\*2\^(\d+)(-1|\+1)$/)
  {
    print("if((lift($2*Mod(2,$1)^$3$4)!=0),print(\"$_ bad\"))\n");
  }
  else
  {
    print STDERR "What's \"$_\"?\n";
  }
}
Usage:
./check_divisors.pl tpfactors.txt | gp

This script expects tpsieve-format factor files as input. It outputs a PARI/GP script that checks each factor.
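If PARI/GP isn't handy, the same divisibility check can be done from the shell with python3's three-argument pow(). This assumes python3 is on the PATH; the two factor lines below are toy examples, not real sieve output.
Code:
```shell
# Sketch: verify "p | k*2^n+/-1" lines via modular exponentiation
# (python3 assumed available; the data here is fabricated).
dir=$(mktemp -d)
printf '7 | 1*2^3-1\n5 | 1*2^2+1\n' > "$dir/f.txt"
while IFS=' |*^' read -r p k base nrest; do
  n=${nrest%[+-]1}        # exponent
  s=${nrest#"$n"}         # trailing "+1" or "-1"
  python3 -c "print('ok' if ($k*pow(2,$n,$p)+($s)) % $p == 0 else 'bad')"
done < "$dir/f.txt"
```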

gen_delete_file.sh
Code:
#!/bin/bash

set -e

cut -d '|' -f 2 "$@" | tr -d '\t\r ' | tr '*^+-' '    ' | cut -d ' ' -f 1,3 | sort --ignore-leading-blanks --buffer-size=5% -n -t ' ' -k 2,2 -k 1,1 | uniq
Usage:
./gen_delete_file.sh tpfactors.txt > delete.txt

This script expects tpsieve-format factor files as input. It outputs k,n pairs of factored numbers. Pairs are sorted by n, then by k (like number files).
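A tiny demonstration of the pipeline above on two fabricated factor lines, showing how "p | k*2^n+/-1" turns into sorted "k n" pairs:
Code:
```shell
# Run the gen_delete_file.sh pipeline on made-up input to show the
# transformation (the factor lines and values are fabricated).
dir=$(mktemp -d)
printf '13 | 5*2^10-1\n7 | 3*2^9+1\n' > "$dir/f.txt"
cut -d '|' -f 2 "$dir/f.txt" | tr -d '\t\r ' | tr '*^+-' '    ' \
  | cut -d ' ' -f 1,3 | sort -n -t ' ' -k 2,2 -k 1,1 | uniq
```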

remove_factors.pl
Code:
#!/usr/bin/perl

use warnings;
use strict;

die("usage: $0 numbers.txt factors.txt") if $#ARGV != 1;

open(NUMBERS, '<', $ARGV[0]) or die('can\'t open ' . $ARGV[0] . ': ' . $!);
open(FACTORS, '<', $ARGV[1]) or die('can\'t open ' . $ARGV[1] . ': ' . $!);

# leave header intact
print scalar <NUMBERS>;

my $factor_line = <FACTORS>;

my $number_line;
while($number_line = <NUMBERS>)
{
  # if there are no more factors, just print the numbers
  if(!defined($factor_line))
  {
    print $number_line;
    print <NUMBERS>;
    last;
  }

  my ($number_k, $number_n) = split(' ', $number_line);
  my ($factor_k, $factor_n) = split(' ', $factor_line);
  if(($number_n < $factor_n) ||
     (($number_n == $factor_n) && ($number_k < $factor_k)))
  {
    print $number_line;
  }
  elsif(($number_n == $factor_n) && ($number_k == $factor_k))
  {
    $factor_line = <FACTORS>;
  }
  elsif(($number_n > $factor_n) ||
        (($number_n == $factor_n) && ($number_k > $factor_k)))
  {
    # the numbers file has moved past this factor's (k,n) without a match
    die("factor \"$factor_line\" not found in numbers file");
  }
}

close(NUMBERS);
close(FACTORS);
Usage:
./remove_factors.pl numbers.txt delete.txt > new-numbers.txt

This script expects as input a sorted newpgen-format number file and a delete.txt (from gen_delete_file.sh). It outputs a new number file with the factored numbers removed. It works in one pass and uses a small, constant amount of memory, so it is fast even with huge files.
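For comparison, a hedged alternative sketch of the same removal in awk: load the delete list into an associative array, then stream the number file past it. Unlike the one-pass merge, this holds all deleted pairs in memory, but it does not require either file to be sorted. All data below is fabricated.
Code:
```shell
# Sketch: remove (k,n) pairs listed in delete.txt from numbers.txt,
# keeping the header line (fabricated data for illustration).
dir=$(mktemp -d)
printf 'HDR\n5 100\n7 100\n9 101\n' > "$dir/numbers.txt"
printf '7 100\n' > "$dir/delete.txt"
awk 'NR==FNR { del[$1" "$2]=1; next }   # first file: delete list
     FNR==1  { print; next }            # keep the header line
     !(($1" "$2) in del)                # keep unfactored pairs
' "$dir/delete.txt" "$dir/numbers.txt"
```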

In all the shell scripts above you can remove the "--buffer-size=5%" argument to sort if you have plenty of free RAM (you may get some speedup in that case).

Last fiddled with by gribozavr on 2009-08-18 at 23:47 Reason: last-minute fix
Old 2009-08-19, 02:20   #267
cipher

Quote:
Originally Posted by geoff View Post
I have updated the remove-tpfactors awk script; it can now accept multiple sieve files as input, and so it can be used to merge single-n NewPGen files into one multi-n file for tpsieve:
Code:
remove-tpfactors [FACTORFILE ...] INFILE [INFILE] > OUTFILE
where FACTORFILE is zero or more tpsieve factor files and INFILE is one or more sieve files (but not ABCD-format sieve files). I haven't tested this script on real data, but memory use should be proportional to the number of factors removed and independent of the size of the input and output sieve files.


Also, tpsieve doesn't double-check factors as they are found, but you can use the check-factors program to do that.
Is there any way we can have tpsieve use less RAM (for the new twin project)? I have a few machines with only 512 MB of RAM; even if it had to sacrifice 30% of performance, it would be a good trade-off.
Old 2009-08-19, 03:08   #268
axn

Quote:
Originally Posted by cipher View Post
Is there any way we can have tpsieve use less RAM (for the new twin project)? I have a few machines with only 512 MB of RAM; even if it had to sacrifice 30% of performance, it would be a good trade-off.
Actually, you can sieve with practically zero loss* of efficiency -- you just have to sieve in pieces. Split the file into n pieces (I guess in your case you could do 3 to 5 pieces) and sieve each of them separately. Voila. In fact, using the -n & -N options, you don't even have to split the file!** Just sieve -n 480000 -N 480999, then -n 481000 -N 481999, and so on for the same p-range.

* There will be some loss of efficiency.

** It is still advisable to split the file -- faster load up times.
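The windowed approach above can be sketched as a small shell loop that generates one tpsieve command per 1000-n window over the same p range. The n bounds and file name are illustrative; this only prints the command lines, it does not run tpsieve.
Code:
```shell
# Sketch: emit tpsieve invocations for successive 1000-n windows
# (illustrative bounds; nothing is executed, commands are just printed).
for n in $(seq 480000 1000 482000); do
  echo "tpsieve -i allc.txt -n $n -N $((n + 999)) -p 10e9 -P 100e9"
done
```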

Last fiddled with by axn on 2009-08-19 at 03:11
Old 2009-08-19, 13:32   #269
cipher

Quote:
Originally Posted by axn View Post
Actually, you can sieve with practically zero loss* of efficiency -- you just have to sieve in pieces. Split the file into n pieces (I guess in your case you could do 3 to 5 pieces) and sieve each of them separately. Voila. In fact, using the -n & -N options, you don't even have to split the file!** Just sieve -n 480000 -N 480999, then -n 481000 -N 481999, and so on for the same p-range.

* There will be some loss of efficiency.

** It is still advisable to split the file -- faster load up times.
Thanks axn. It is approx. 6% less efficient, but it works. Great.
Old 2009-08-19, 18:03   #270
Flatlander
tpsieve

Is it expected behaviour for the tpfactors.txt file to contain more than one entry for some k/n pairs?
e.g.

10000554877 | 998451*2^504623-1
23822842763 | 998451*2^504623+1
92136264541 | 998451*2^504623+1

(Windows, TPS 0.2.4 110809)

edit:
Maybe my fault. See below.

Last fiddled with by Flatlander on 2009-08-19 at 18:34
Old 2009-08-19, 18:14   #271
axn

Quote:
Originally Posted by Flatlander View Post
Is it expected behaviour for the tpfactors.txt file to contain more than one entry for some k/n pairs?
e.g.

10000554877 | 998451*2^504623-1
23822842763 | 998451*2^504623+1
92136264541 | 998451*2^504623+1

(Windows, TPS 0.2.4 110809)
Were these from a single sieve session or from multiple sieve sessions? We know that tpsieve doesn't modify the sieve file for found factors. I _believe_ it does remove them from the in-memory bitmap, in which case this shouldn't happen within the same sieve session. Geoff/Ken_g6?

Last fiddled with by axn on 2009-08-19 at 18:15
Old 2009-08-19, 18:23   #272
Flatlander

A single sieve: "tpsieve -i allc.txt -p 10e9 -P 100e9"

edit:
There are other double entries; not sure about more triple entries.

edit:
Yes, there are other triples.

edit:
Maybe I messed up with the sieving? I'd better run the range again. Sorry.

edit:
I don't see how I messed up. I stitched the files together from NewPGen, removed all the headers except the first, then took out the blank lines. That's how I ended up with allc.txt.

Last fiddled with by Flatlander on 2009-08-19 at 18:40
Old 2009-08-19, 19:13   #273
Ken_g6

Quote:
Originally Posted by axn View Post
Were these from a single sieve session or from multiple sieve sessions? We know that tpsieve doesn't modify the sieve file for found factors. I _believe_ it does remove them from the in-memory bitmap, in which case this shouldn't happen within the same sieve session. Geoff/Ken_g6?
Nope, it doesn't. I never thought it would be a problem. I can fix it if you want.

Edit: Though I should mention, this could lead to different sieve-file results depending on whether you interrupt the sieve, or on how you break it up. Maybe that's why Geoff didn't do it before?

Last fiddled with by Ken_g6 on 2009-08-19 at 19:29
Old 2009-08-19, 19:22   #274
axn

Quote:
Originally Posted by Ken_g6 View Post
Nope, it doesn't. I never thought it would be a problem. I can fix it if you want.
It really isn't a problem per se, more like an (extremely small) annoyance: you would want every factor you submit to count.

BTW, if the speed loss is less than 0.1% @ p = 10G, then I think it is worth it. Otherwise, not.

Last fiddled with by axn on 2009-08-19 at 19:23
Old 2009-08-20, 00:24   #275
gribozavr

(Just for the statistics) We even have a number with 7 known factors:

23884746253 | 6257667*2^483102+1
41690700607 | 6257667*2^483102+1
109867225619 | 6257667*2^483102+1
292122145397 | 6257667*2^483102+1
381821327087 | 6257667*2^483102+1
540756549527 | 6257667*2^483102+1
1505506019099 | 6257667*2^483102+1

Last fiddled with by gribozavr on 2009-08-20 at 00:25