mersenneforum.org  

Old 2016-06-10, 05:38   #89
cgy606
 

Quote:
Originally Posted by WraithX View Post
Recently I noticed that the ecm.py script would crash if it was given factorial or primorial input strings. This is because the python "eval" function can't handle these characters. So, instead of writing my own equation parser to figure out how many digits are in these input strings, I've just grabbed the output from ecm.exe to see how many digits it reports are in the input number. So,

Announcing ecm.py v0.35:
Code:
Fixed:
  - ecm.py no longer calculates number of digits on its own, it reads this information from the ecm executable.  This fixed a problem where the python "eval" function would crash when it encountered factorial or primorial characters.
    ie, you can now do: echo "140!+1" | python ecm.py 1e6 and it will work correctly, without crashing.
  - also fixed the output when the ecm binary is not found.  It will no longer print out the misleading "ECM_BIN_PATH", it will print out "ECM_PATH" to match the variable name in the python code.
Great, I have been factoring factorial +/- numbers.
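
For anyone curious what the v0.35 fix quoted above amounts to, here is a minimal sketch. It is not the actual ecm.py code: the function names are made up for illustration, and the regular expression assumes that the ecm binary prints a line of the form "Input number is ... (N digits)". It also shows why eval cannot be used on an input like "140!+1".

```python
import re

# Assumed shape of the line printed by the ecm binary:
#   Input number is <expression or number> (<N> digits)
DIGITS_RE = re.compile(r"^Input number is .*\((\d+) digits\)\s*$")


def digits_from_ecm_output(output):
    """Return the digit count reported in `output`, or None if no line matches."""
    for line in output.splitlines():
        match = DIGITS_RE.match(line.strip())
        if match:
            return int(match.group(1))
    return None


def eval_can_parse(expr):
    """Return True if Python's eval() accepts `expr` as an expression."""
    try:
        eval(expr, {"__builtins__": {}}, {})
        return True
    except SyntaxError:
        # '!' is not a Python operator, so "140!+1" ends up here.
        return False


sample = "Input number is 2050449353925555290706354283 (28 digits)"
print(digits_from_ecm_output(sample))
print(eval_can_parse("140!+1"))
```

Reading the count back from the binary means the script never has to understand the expression syntax itself.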
Old 2016-06-10, 06:06   #90
cgy606
 

Quote:
Originally Posted by WraithX View Post
I believe I can add this functionality. However, there are some questions about how to proceed. There are several cases that need to be considered:
1) If the factor is prime and the cofactor is prime, I think it's obvious we can stop.
2) If the factor is prime and the cofactor is composite
3) If the factor is composite and the cofactor is prime
4) If the factor is composite and the cofactor is composite

Should we start running the original number of curves on the new composite? Or only run the remaining number of curves on the composite? My guess from what you are asking is to "run the remaining number of curves" on the new composite. And then also, if two (or more) composites are found, should we run the remaining number of curves on each new composite, or run the original number of curves on each composite?

The second part of your question (keep running until the number is completely factored) I think is outside the scope of ecm.py. Much like it is outside the scope of ecm.exe. There is no logic (in ecm.py or ecm.exe) that can keep running curves, and keep choosing bounds to find potentially larger and larger factors that can adequately take into account: the amount of ram a system has available (or will have available in the future), and when the best time to switch over to different factoring methods, like gnfs/snfs, would be. That is more in the realm of yafu.

I can implement the first part of your question, but won't be implementing the second part. Also, since I'm now using the gpu capabilities of ecm, I need a way to start multiple ecm's to resume those saved stage1 residues. So, I'll be adding in the ability for ecm.py to recognize and work with the "-resume" option.

Also, one thing you should know about how paths work. In a previous post you wrote that you set:
ECM_PATH = './usr/local/bin'
That would not point to the /usr/local/bin directory, because you have a "." in front of it. The "." means start from the current working directory and look into these sub folders for what you want. So, if you ran the ecm.py script from the (made up) directory
/Users/cgy606/Documents/ecmpy/ and had set ECM_PATH = './usr/local/bin', it would look for the ecm executable in the directory:
/Users/cgy606/Documents/ecmpy/usr/local/bin/
Which probably doesn't exist, and so will definitely fail. You could have set:
ECM_PATH = '/usr/local/bin'
Without the leading "." and that would have worked because it is an absolute path, and not a relative path. Since you've got it working now, you don't need to change it, but I wanted to let you know about the difference between absolute paths and relative paths.
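
The difference between the two ECM_PATH settings described above can be checked with a few lines of Python. This is only an illustration: resolve_ecm_path is a hypothetical helper, not something in ecm.py, and it uses posixpath so that it behaves the same on any operating system.

```python
import posixpath


def resolve_ecm_path(ecm_path, cwd):
    """Return the directory that `ecm_path` refers to when the script runs from `cwd`."""
    # posixpath.join() discards `cwd` when `ecm_path` is absolute (starts with "/");
    # normpath() then removes the redundant "./" component.
    return posixpath.normpath(posixpath.join(cwd, ecm_path))


cwd = "/Users/cgy606/Documents/ecmpy"  # the made-up directory from the post

print(resolve_ecm_path("./usr/local/bin", cwd))  # relative: resolved under cwd
print(resolve_ecm_path("/usr/local/bin", cwd))   # absolute: cwd is ignored
print(posixpath.isabs("./usr/local/bin"), posixpath.isabs("/usr/local/bin"))
```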

So, some food for thought on how to proceed. Clearly we need not be concerned about case '1', as in principle that is already implemented: we check the factor found and the cofactor for primality, and if they both pass, kill the script and drink a beer.

Case '2' is the most fruitful of our efforts. The 'best' way to proceed, IMHO, is to kill all the running threads (assuming that the threads start and stop each curve at roughly the same time, the way that yafu works) and test the primality of each factor. If the larger one is composite (WLOG let us assume that the smaller factor is the one found), then the script determines how many curves have been completed at the current B1/B2 values, calculates how many curves remain to complete the original request, and reschedules the remaining curves across the threads being used. To illustrate:

Factoring C170 B1 = 3e6 B2 =### threads = 4 total curves remaining = 2352
factor found prp37 (curve 221 thread = 1 sigma = ###)
composite cofactor C134

Factoring C134 B1 = 3e6 B2 =### threads = 4 total curves = 1468

I think you get the idea...
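
The arithmetic behind that illustration can be sketched as follows. The numbers only add up if all 4 threads had each finished 221 curves when the factor appeared (4 x 221 = 884, and 2352 - 884 = 1468), so that is the assumption made here; the function names are hypothetical and not taken from ecm.py.

```python
def remaining_curves(total_curves, threads, curves_done_per_thread):
    """Curves still owed, assuming every thread has completed the same number."""
    done = threads * curves_done_per_thread
    return max(total_curves - done, 0)


def split_across_threads(curves, threads):
    """Spread `curves` as evenly as possible; the first few threads take the extras."""
    base, extra = divmod(curves, threads)
    return [base + 1 if i < extra else base for i in range(threads)]


left = remaining_curves(2352, 4, 221)
print(left)                           # curves to reschedule on the C134
print(split_across_threads(left, 4))  # per-thread share
```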

Case '3' is nothing more than a transpose of case '2'. We found a "small" composite factor and a large probable prime cofactor. In principle we could reschedule the curves like we did in case '2', but there is probably a better way to factor this number, which I will explain in case '4'.

For case '4' we find two composite cofactors. Let's assume for the sake of argument that one is larger (i.e. has more decimal digits) than the other. We could continue factoring that one in the same fashion as in case '2', but let's turn our attention to the smaller one. What does it mean when ecm finds two composite cofactors? Usually it means that small factors were not eliminated at the beginning (i.e. with some other method like trial division, rho, or P+1/P-1), and that B1 was selected so high that a single curve effectively found two factors and not one (we would like to claim this was intentional, but no one would believe that statement)!

Anyway, if ecm finds a composite factor of N digits, then the smallest prime factor of that composite can have at most ~N/2 digits. But how large is this composite cofactor of the original number expected to be? Well, the current ecm record is 82 digits (I think). For the sake of argument, let's be a little conservative and assume that somebody out there runs a curve at B1 = 25e9 (or something crazy like that) on a C300 and finds a C90 and a C210 (lucky!!!). Clearly, one of the factors of the C90 should have been found by about 5k curves at B1 = 11e6 (on average, of course). In principle we could run ecm on the C90 up to the B1 = 11e6 bounds, or we could let SIQS or some other factoring algorithm hack at it (perhaps even trial division, given that even smaller factors may not have been eliminated from the C300 to begin with).

The story I am trying to paint here is that if two composites are found, it basically means that a very large B1 bound was selected while small factors had not been eliminated. Given that the smaller cofactor is not large (less than 90 digits or so), we should focus on the larger cofactor, which is the more important one to continue factoring, and reschedule the remaining curves for that guy, analogous to case '2'.

I hope this makes sense...
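
The "~N/2 digits" bound used in case '4' comes from the fact that a composite always has a prime factor no larger than its square root. A small sketch of that arithmetic (the function name is made up for illustration):

```python
def max_digits_of_smallest_factor(composite_digits):
    """Upper bound on the digit count of the smallest prime factor of a composite.

    A composite n with d digits satisfies n < 10**d, and its smallest prime
    factor p satisfies p <= sqrt(n) < 10**(d / 2), so p has at most
    ceil(d / 2) digits.
    """
    return (composite_digits + 1) // 2


print(max_digits_of_smallest_factor(90))   # the C90 from the example above
print(max_digits_of_smallest_factor(134))  # the C134 from the case '2' illustration
```

This is only an upper bound; in the scenario described, the factors of the small composite are expected to be much smaller than that.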
Old 2016-06-18, 20:25   #91
WraithX
 

Announcing ecm.py v0.36:
Code:
New Feature:
 - Added the ability for ecm.py to perform the remaining number of requested curves on any composite factors found.
  (You can activate this by setting "find_one_factor_and_stop = 0", it is 1 by default)
I've added the ability for ecm.py to continue working on any composite factors found; it will perform the remaining number of requested curves on them. I've run quite a few tests locally and it seems to work well. However, if you do run into any problems, please let me know.
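
The decision this feature has to make, following the four cases discussed earlier in the thread, can be sketched roughly like this. It is an illustration only and not the code in ecm.py: composites_to_continue and is_probable_prime are hypothetical names, and the primality check is a plain Miller-Rabin test with fixed bases, so for very large inputs it is a probable-prime test. Only the find_one_factor_and_stop setting comes from the announcement.

```python
def is_probable_prime(n, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)):
    """Miller-Rabin test with fixed bases."""
    if n < 2:
        return False
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # base `a` proves n composite
    return True


def composites_to_continue(factor, cofactor, find_one_factor_and_stop=1):
    """Return the parts that still need curves; an empty list means stop."""
    if find_one_factor_and_stop:
        return []
    return [m for m in (factor, cofactor) if m > 1 and not is_probable_prime(m)]


print(composites_to_continue(7, 13, 0))   # case 1: both prime, nothing left
print(composites_to_continue(7, 15, 0))   # case 2: composite cofactor remains
print(composites_to_continue(15, 77, 0))  # case 4: both composite
```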
Attached Files
File Type: zip ecm-py_v0.36.zip (16.2 KB, 67 views)
Old 2016-07-10, 04:24   #92
WraithX
 

Announcing ecm.py v0.38:
Code:
New feature:
 - Added the ability for ecm.py to resume a GMP-ECM (compatible) save file, and
   it will evenly distribute the resume lines across several instances of GMP-ECM
Calling this can be as simple as:
ecm.py -resume resume.txt

Or you can use additional options, like:
Code:
ecm.py -threads 3 -out output.txt -maxmem 300 -pollfiles 60 -resume resume.txt
------------------------------------------------------------------------------
Which would spread the resume lines from resume.txt across 3 instances of gmp-ecm,
and give each one the command line option "-maxmem 100"  ( = 300/3)
and poll the output files every 60 seconds to look for factors, or see if a gmp-ecm instance has finished
and save all gmp-ecm output to the file output.txt
 * Like always, you can specify the "threads" and "pollfiles" options inside the script
Here is a description of this new feature, which can also be found in the script:
Code:
# If we are using the "-resume" feature of gmp-ecm, we will make some assumptions about the job...
# 1) This is designed to be a _simple_ way to speed up resuming ecm by running several resume jobs in parallel.
#      ie, we will not try to replicate all resume capabilities of gmp-ecm
# 2) If we find identical lines in our resume file, we will only resume one of them and skip the others
#      - If this happens, we will print out a notice to the user (if VERBOSE >= v_normal) so they know what is going on
# 3) We will use the B1 value in the resume file, and not resume with higher values of B1
# 4) We will let gmp-ecm determine which B2 value to use, which can be affected by "-maxmem" and "-k"
# 5) We will try to split up the resume work evenly between the threads.
#     - We will put total/num_threads resume lines into each file, and total%num_threads files will each get one extra line.
#      At the end of a job or when restarting a job, we will write any completed resume lines out to a "finished file"
#      This "finished file" will be used to help us keep track of work done, in case we are interrupted and need to (re)resume later
#      We will query the output files once every poll_file_delay seconds.
#    resume_job_<filename>_inp_t00.txt # input resume file for use by gmp-ecm in thread 0
#    resume_job_<filename>_inp_t01.txt # input resume file for use by gmp-ecm in thread 1
#    ...etc...
#    resume_job_<filename>_out_t00.txt # output file for resume job of gmp-ecm in thread 0
#    resume_job_<filename>_out_t01.txt # output file for resume job of gmp-ecm in thread 1
#    ...etc...
#    resume_job_<filename>_finished.txt # file where we write out each resume line that we have finished with gmp-ecm
#    where <filename> is based on the resume file name, but with any "." characters replaced by a dash.
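
Points 2) and 5) of that description, plus the file naming scheme, can be sketched as below. This is not the code from ecm.py. The helper names are invented, and two details are assumptions: that the threads receiving an extra line are the lowest-numbered ones, and that the file name tag is simply the resume file name with "." replaced by "-" (the description only says the tag is "based on" the name).

```python
def unique_lines(lines):
    """Drop blank and duplicate resume lines, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for line in lines:
        line = line.strip()
        if line and line not in seen:
            seen.add(line)
            unique.append(line)
    return unique


def split_resume_lines(lines, num_threads):
    """Give each thread total // num_threads lines; total % num_threads threads get one more."""
    base, extra = divmod(len(lines), num_threads)
    chunks, start = [], 0
    for t in range(num_threads):
        size = base + (1 if t < extra else 0)
        chunks.append(lines[start:start + size])
        start += size
    return chunks


def job_file_names(resume_file, num_threads):
    """Build the input, output and finished file names for a resume job."""
    tag = resume_file.replace(".", "-")
    inputs = ["resume_job_%s_inp_t%02d.txt" % (tag, t) for t in range(num_threads)]
    outputs = ["resume_job_%s_out_t%02d.txt" % (tag, t) for t in range(num_threads)]
    finished = "resume_job_%s_finished.txt" % tag
    return inputs, outputs, finished


lines = unique_lines(["line %d" % i for i in range(10)] + ["line 3"])
print([len(chunk) for chunk in split_resume_lines(lines, 3)])
print(job_file_names("resume.txt", 2))
```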
I know this skips over v0.37. I did create a version 0.37 with similar functionality, but it put each resume line into its own file (one at a time, not all at once), gave that input file to gmp-ecm to resume, and saved the output to another file. Once that resume line had finished processing, it would delete both the input and output files and then move on to the next resume line. So, if a resume file had 1000 lines to resume, the script would create and delete 1000 input files and 1000 output files. I didn't want to tax any filesystems by creating and deleting so many files, so I rewrote it as detailed above.
Attached Files
File Type: zip ecm-py_v0.38.zip (22.7 KB, 70 views)
Old 2016-07-10, 05:29   #93
wombatman

This is AWESOME. Thanks for making this update.
Old 2016-07-14, 12:45   #94
swellman
 

+1

Fantastic functionality. Love it!
Old 2016-08-03, 06:14   #95
UBR47K
 

Is there any way to specify the B1 value when using the "-resume" switch?
I'd like to use GMP-ECM for stage 2 with the stage 1 results from Prime95's results.txt.
Old 2016-08-05, 23:46   #96
cgy606
 

I tried running the script on a resume file produced from a gpu stage 1 run. I am getting the following error:


python ecm.py -threads 8 -resume gpu.save
-> ___________________________________________________________________
-> | Running ecm.py, a Python driver for distributing GMP-ECM work |
-> | on a single machine. It is copyright, 2011-2016, David Cleaver |
-> | and is a conversion of factmsieve.py that is Copyright, 2010, |
-> | Brian Gladman. Version 0.38 (Python 2.6 or later) 7th Jul 2016 |
-> |_________________________________________________________________|

-> Resuming work from resume file: gpu.save
-> Spreading the work across 8 thread(s)
->=============================================================================
-> Working on the number(s) in the resume file: gpu.save
-> Using up to 8 instances of GMP-ECM...
-> Found 1024 unique resume lines to work on.
-> Will start working on the 1024 resume lines.
Traceback (most recent call last):
  File "ecm.py", line 2393, in <module>
    parse_ecm_options(sys.argv, set_args = True, first = True)
  File "ecm.py", line 2235, in parse_ecm_options
    run_ecm_resume_job()
  File "ecm.py", line 1850, in run_ecm_resume_job
    threadList = [[i, '', 0, '', '', [], False] for i in xrange(intNumThreads)]
NameError: name 'xrange' is not defined

Any ideas about what is going wrong?
Old 2016-08-06, 00:19   #97
VBCurtis
 

Looks like you didn't give B1 or B2 parameters to ecm.py.

When I do stage 2 from a GPU'ed stage 1, I put on the command line the same B1 value I ran Stage 1 on (note you can put a higher one here, and it'll use the CPU to extend B1 before starting stage 2).
Old 2016-08-06, 00:28   #98
cgy606
 

Quote:
Originally Posted by VBCurtis View Post
Looks like you didn't give B1 or B2 parameters to ecm.py.

When I do stage 2 from a GPU'ed stage 1, I put on the command line the same B1 value I ran Stage 1 on (note you can put a higher one here, and it'll use the CPU to extend B1 before starting stage 2).

The command line that the ecm.py author posted didn't include a B1 or B2 value. I tried adding the B1 and B2 values at the end of the command line, with no effect:


python ecm.py -threads 8 -resume gpu.save 11e6 35133391030
-> ___________________________________________________________________
-> | Running ecm.py, a Python driver for distributing GMP-ECM work |
-> | on a single machine. It is copyright, 2011-2016, David Cleaver |
-> | and is a conversion of factmsieve.py that is Copyright, 2010, |
-> | Brian Gladman. Version 0.38 (Python 2.6 or later) 7th Jul 2016 |
-> |_________________________________________________________________|

-> Resuming work from resume file: gpu.save
-> Spreading the work across 8 thread(s)
->=============================================================================
-> Working on the number(s) in the resume file: gpu.save
-> Using up to 8 instances of GMP-ECM...
-> Found 1024 unique resume lines to work on.
-> Will start working on the 1024 resume lines.
Traceback (most recent call last):
  File "ecm.py", line 2393, in <module>
    parse_ecm_options(sys.argv, set_args = True, first = True)
  File "ecm.py", line 2235, in parse_ecm_options
    run_ecm_resume_job()
  File "ecm.py", line 1850, in run_ecm_resume_job
    threadList = [[i, '', 0, '', '', [], False] for i in xrange(intNumThreads)]
NameError: name 'xrange' is not defined

Last fiddled with by cgy606 on 2016-08-06 at 00:29
Old 2016-08-06, 01:18   #99
UBR47K
 

Try running with Python 2. That error happens when you run the script with Python 3, which no longer has xrange (it was renamed to range).
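
If switching interpreters is not an option, a common workaround for this particular NameError is a small compatibility shim near the top of the script. This is only a sketch of that general technique, not a patch from the ecm.py author, and the rest of the script may have other Python 3 incompatibilities that it does not address. The threadList line is the one from the traceback, and intNumThreads = 8 mirrors the "-threads 8" run.

```python
# Make the name xrange available under both Python 2 and Python 3.
try:
    xrange  # Python 2: xrange is a builtin, nothing to do
except NameError:
    xrange = range  # Python 3: range is already lazy, so alias it

intNumThreads = 8
threadList = [[i, '', 0, '', '', [], False] for i in xrange(intNumThreads)]
print(len(threadList))
```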

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.