mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Msieve

Reply
 
Thread Tools
Old 2004-11-29, 22:10   #78
Mystwalker
 
Mystwalker's Avatar
 
Jul 2004
Potsdam, Germany

3·277 Posts
Default

Quote:
Originally Posted by error404
Any ideas on how to remove the text from the ?.dat files?
Do the relations have a specific characteristic (like "relation" at the beginning)?
If yes, you can use grep to filter out this pattern.
Mystwalker is offline   Reply With Quote
Old 2004-11-29, 22:55   #79
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

1101110101012 Posts
Default

Quote:
Originally Posted by Mystwalker
Do the relations have a specific characteristic (like "relation" at the beginning)?
If yes, you can use grep to filter out this pattern.
A savefile has four kinds of information in it. The first line
contains the number being factored, in hex. You can get it
with

head -1 msieve.dat

All other lines start with either 'A' (polynomial A value),
'B' (polynomial B value) or 'R' (relation). So

grep "^[ABR]" msieve.dat

will pull out all the actual factoring-relevant stuff, which
can be redirected to another file.

The larger question, of course, is why in heaven's name there's
extraneous information in the savefile at all. Note that the
code should be ignoring all lines that are not the first and
do not start with 'A', 'B' or 'R'

jasonp
jasonp is offline   Reply With Quote
Old 2004-11-29, 22:58   #80
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

3,541 Posts
Default

Quote:
Originally Posted by error404
Jason,

Using Paul's suggestion and filtering the 5 data files creates a
248 MB file as opposed to concatenating them and creating a 593 MB
file.
Tom,

msieve.dat contains both polynomials and relations; each relation
comes after the polynomial that it requires, so sorting the file
would corrupt it.

jasonp
jasonp is offline   Reply With Quote
Old 2004-11-29, 23:48   #81
error404
 
error404's Avatar
 
Sep 2003

1010112 Posts
Default # grep "^[ABR]" msieve.dat

Jason,

I re-read Paul's letter and it clearly said to use "UNIX". Changing from
Slackware to FreeBSD, and
# sort whatever.dat | uniq > whereever.dat
does not have the advertisement at the tail.

OTOH,
# sort whatever.dat | uniq
under FreeBSD has a 250 MB file size limit which Slackware does not have.

After laboriously filtering and concatenating the files individually, I logged
on and read your most recent post about
# grep "^[ABR]" msieve.dat > whoever.dat
and the above became irrelevant.


Using
# grep "^[ABR]" msieve5.dat > msieve.dat
cleans out the lines of C code but apparently this corrupts the file
since qs085 rejects it and starts from scratch.

This being on my fastest system and the clip below shows the status
which I still hope will finish tomorrow.

I am going to leave it alone for now.

Tom

******************************************************************************/

Script started on Mon Nov 29 17:25:40 2004
gigabyte#
gigabyte# time ./qs085 < rsa110.txt

Msieve v. 0.85
random seeds: 0000125d 41abb016
input to factor: factoring 35794234179725868774991807832568455403003778024228226193532908190484670252364677411513516111204504060317568667
Mon Nov 29 17:26:14 2004
using multiplier of 3
Mon Nov 29 17:26:16 2004
using sieve block of 65536
using a sieve bound of 2016551 (74992 primes)
using large prime bound of 604965300
using double large prime bound of 6413791293514800

sieving in progress (press Ctrl-C to pause)
found 68127 relations (2343 full + 65784 partial), need 75120

******************************************************************************/
error404 is offline   Reply With Quote
Old 2004-11-30, 00:16   #82
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

3,541 Posts
Default

Quote:
Originally Posted by jasonp
msieve.dat contains both polynomials and relations; each relation
comes after the polynomial that it requires, so sorting the file
would corrupt it.
Just so everyone is clear, the intended way to use msieve
in distributed mode is as follows:

1. Start up separate msieve processes on separate machines
(or at least in separate directories on the same machine)

2. After a while, hit Ctrl-C for each running copy of msieve

3. Concatenate all the msieve.dat files into a single file; rename
it to msieve.dat (make sure it doesn't overwrite one of the other
savefiles)

4. Attempt to restart one msieve process using this concatenated
file. If it finds enough relations, then the factorization should finish.
If it doesn't find enough relations, it will start sieving by itself.
Hit Ctrl-C to stop it.

5. If there are not enough relations, or there were corrupted relations
that caused the process in step 4 to exit with an error, delete the
savefile from step 4 and then restart each sieving machine from its own
(unmodified) savefile. Go to step 2; repeat the cycle until the factorization
succeeds.

The overriding rule to remember is this: when a relation is found,
it goes into one savefile. When creating the combined savefile in
step 3, that relation *must only appear once*. If it appears more than
once, the run in step 4 will remove any duplicates.

Tom, is this what you did for your RSA110 run? I regret not making
clear that the process will not go any faster if you attempt to feed
relations back to sieving machines

There's still the problem of failing to read relations that Tom and SMH
both have run into; I can't reproduce the behavior here, and it doesn't
seem to happen all the time either.

jasonp
jasonp is offline   Reply With Quote
Old 2004-11-30, 00:22   #83
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

1101110101012 Posts
Default

Quote:
Originally Posted by error404
Using
# grep "^[ABR]" msieve5.dat > msieve.dat
cleans out the lines of C code but apparently this corrupts the file
since qs085 rejects it and starts from scratch.
There's actually one more step; the first line in the file must
be the number to be factored. The code uses this fact to detect
whether a new factorization is starting or an old one is restarting.
So the complete sequence is

head -1 msieve5.dat > msieve.dat
grep "^[ABR]" msieve5.dat >> msieve.dat

jasonp
jasonp is offline   Reply With Quote
Old 2004-11-30, 01:54   #84
error404
 
error404's Avatar
 
Sep 2003

43 Posts
Default Problems Solved ...

Jason,

Problem 1:
using
# head -1 msieve5.dat > msieve.dat
# grep "^[ABR]" msieve5.dat >> msieve.dat
worked correctly.

Problem 2:
"failed to read relation 264463"
is easily reproduced by exceeding 200+ MB for msieve.dat.
*** WAG ***
I will "GUESS" that the maximum number of polynomials and/or relations
has been exceeded???

Tom

/******************************************************************************/
p4-2682#
p4-2682# ls -l
|
|
-rwxr-xr-x 1 root wheel 143498132 Nov 29 19:32 msieve3.dat
-rwxr-xr-x 1 root wheel 131637295 Nov 29 19:31 msieve4.dat
|
|
p4-2682#
p4-2682# cat msieve3.dat msieve4.dat >> msieve.dat
p4-2682#
p4-2682# ls -l
|
|
-rwxr-xr-x 1 root wheel 275135427 Nov 29 19:36 msieve.dat
-rwxr-xr-x 1 root wheel 143498132 Nov 29 19:32 msieve3.dat
-rwxr-xr-x 1 root wheel 131637295 Nov 29 19:31 msieve4.dat
|
|
p4-2682#
p4-2682#
p4-2682#
p4-2682#
p4-2682# time ./qs085 < rsa110.txt

Msieve v. 0.85
random seeds: 0000119f 41abcf60
input to factor: factoring 35794234179725868774991807832568455403003778024228226193532908190484670252364677411513516111204504060317568667
Mon Nov 29 19:39:44 2004
using multiplier of 3
Mon Nov 29 19:39:47 2004
using sieve block of 65536
using a sieve bound of 2016551 (74992 primes)
using large prime bound of 604965300
using double large prime bound of 6413791293514800
restarting with 4408 full and 651412 partial relations

found 278171 relations (4408 full + 273763 partial), need 75120
begin with 651412 relations
reduce to 438521 relations in 5 passes
attempting to read 4408 full and 438521 partial relations
failed to read relation 264463
failed to read relation 264486
|
|
failed to read relation 437711
failed to read relation 437760
recovered 4394 full and 438458 partial relations
recovered 416247 polynomials
freed 1839 duplicate relations
freed 273209 duplicate relations
rebuilding graph...
attempting to build 621 cycles
found 621 cycles in 3 passes
distribution of cycle lengths:
length 2 : 457
length 3 : 126
length 4 : 29
length 5 : 7
length 6 : 1
length 7 : 1
largest cycle: 7 relations
Mon Nov 29 19:40:50 2004
error: not enough relations to find linear dependencies
54.867u 2.127s 1:06.01 86.3% 87+155610k 3412+0io 2pf+0w
p4-2682#

/******************************************************************************/
error404 is offline   Reply With Quote
Old 2004-11-30, 04:56   #85
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

3,541 Posts
Default

Version 0.86 has been released. Highlights include:

- several bugfixes with performance implications. Sieving for
factorizations up to at least 90 digits, and possibly more, is
about 10% faster.

- way more paranoia with reading from the savefile; I've also
added an up-front verification pass on all relations, so that
if a savefile is corrupted somehow the odds are higher that
sieving will happen anyway

- I'm blindly attempting to fix the problems multiple people have
reported with failure to read relations, usually after sieving for
a big factorization has finished. I'm guessing it has something to
do with buffered formatted output being written too fast to the
savefile, and so all savefile writes now use manual buffering with
an explicit flush. I have no idea if this will resolve the problems
folks have been seeing; the only way to find out is to perform more
big factorizations.

Good luck,
jasonp
jasonp is offline   Reply With Quote
Old 2004-11-30, 13:12   #86
error404
 
error404's Avatar
 
Sep 2003

43 Posts
Default failed to read relation 679827

Jason,

>- I'm blindly attempting to fix the problems multiple people have
>reported with failure to read relations, usually after sieving for
>a big factorization has finished. I'm guessing it has something to
>do with buffered formatted output being written too fast to the
>savefile, and so all savefile writes now use manual buffering with
>an explicit flush. I have no idea if this will resolve the problems
>folks have been seeing; the only way to find out is to perform more
>big factorizations.

Same problem. Combining the msieve.dat files from both of my AMD64
systems and "failed to read relation 679827".

Tom

/******************************************************************************/

root@k8-2200:/home/k5gj/QS08#
root@k8-2200:/home/k5gj/QS08# cat msieve5.dat >> msieve.dat
root@k8-2200:/home/k5gj/QS08#
root@k8-2200:/home/k5gj/QS08#
root@k8-2200:/home/k5gj/QS08# ls -l
|
|
-rw-r--r-- 1 root root 334460589 2004-11-30 06:41 msieve.dat
-rwxr--r-- 1 root root 160132381 2004-11-30 06:38 msieve5.dat
|
|
root@k8-2200:/home/k5gj/QS08#
root@k8-2200:/home/k5gj/QS08#
root@k8-2200:/home/k5gj/QS08#
root@k8-2200:/home/k5gj/QS08#
root@k8-2200:/home/k5gj/QS08# gcc34 -O3 -march=i686 *.c -o qs086 -lm
root@k8-2200:/home/k5gj/QS08#
root@k8-2200:/home/k5gj/QS08#
root@k8-2200:/home/k5gj/QS08# time ./qs086 < rsa110.txt

Msieve v. 0.86
random seeds: 000013f2 41ac6adb
input to factor: factoring 35794234179725868774991807832568455403003778024228226193532908190484670252364677411513516111204504060317568667
Tue Nov 30 06:43:07 2004
using multiplier of 3
Tue Nov 30 06:43:10 2004
using sieve block of 65536
using a sieve bound of 2016551 (74992 primes)
using large prime bound of 604965300
using double large prime bound of 6413791293514800
skipping 2314 corrupt relations
restarting with 5267 full and 787766 partial relations

found 381771 relations (5267 full + 376504 partial), need 75120
begin with 790061 relations
reduce to 629170 relations in 5 passes
attempting to read 5283 full and 629170 partial relations
failed to read relation 264463
failed to read relation 264486
|
|
failed to read relation 679827
failed to read relation 679833
recovered 5267 full and 629054 partial relations
recovered 596125 polynomials
freed 2492 duplicate relations
freed 375355 duplicate relations
attempting to build 752 cycles
found 752 cycles in 2 passes
distribution of cycle lengths:
length 2 : 530
length 3 : 150
length 4 : 48
length 5 : 19
length 6 : 1
length 7 : 3
length 8 : 1
largest cycle: 8 relations
Tue Nov 30 06:44:30 2004
error: not enough relations to find linear dependencies

real 1m22.599s
user 1m21.190s
sys 0m1.230s
root@k8-2200:/home/k5gj/QS08#
root@k8-2200:/home/k5gj/QS08#

/******************************************************************************/
error404 is offline   Reply With Quote
Old 2004-11-30, 13:58   #87
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

3,541 Posts
Default

Quote:
Originally Posted by error404
Same problem. Combining the msieve.dat files from both of my AMD64
systems and "failed to read relation 679827".
I don't think this version will fix savefiles that are already corrupted;
the most I'm hoping for is that it will not *produce* corrupted savefiles.

It's interesting that there are only 2300 corrupt relations going into
the postprocessing but that the filtering phase (which uses the same
code to read the savefile) fails on many more of them.

I'm running RSA110 on 3 CPUs here; maybe it will turn up something
(over the course of the week!)

jasonp
jasonp is offline   Reply With Quote
Old 2004-11-30, 23:22   #88
dave_dm
 
May 2004

1208 Posts
Default

Jason,

Looks good so far, I've tried a few numbers in the 80 digit range and it works fine.

The only problem I had was compiling it - gcc couldn't find aligned_malloc or aligned_free and they don't seem to be in any include file on my system. (FYI I run gcc 3.4.3, glibc 2.3.4-20041102). I changed these to normal malloc/frees and all was ok (apart from the nonalignment!) Have any other Linux users had this problem or had I just forgotten to link something in? If not, it's an easy task to write homemade versions of these functions.

I join with others in suggesting that you write a Makefile to go with the next release - else I'll write one myself

Dave
dave_dm is offline   Reply With Quote
Reply



Similar Threads
Thread Thread Starter Forum Replies Last Post
Utility of integer factorization. jwaltos Other Mathematical Topics 8 2015-05-22 12:20
File Splitting Utility Antonio Software 5 2013-04-18 14:22
Low-powered motherboard of adequate capability sought fivemack Hardware 1 2011-12-21 19:26
Implementing MPQS: SOS! smoking81 Factoring 10 2007-10-02 12:30
Prime Shuffle Utility HiddenWarrior Programming 6 2004-11-04 05:21

All times are UTC. The time now is 01:30.


Sat Jul 17 01:30:26 UTC 2021 up 49 days, 23:17, 1 user, load averages: 2.07, 1.34, 1.25

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.