Good to know, thanks. That invalid character message looks harmless (the .ini output appeared to be correct), but it's on my list to fix.
Also cool to see a crossover of 95+ digits on a win32 system. For comparison, Win64 systems are typically 100+ digits, while 64-bit Linux systems are more like 90-92 digits.
After 1-2 hours of intense use, yafu 1.25 (started with bchaffin's script for mass factoring on 8 threads) has started to fail with errors.
Example:
[CODE]# echo 'factor(422296712117246876775829906558964408942454424061098352293)'|../yafu/yafu-32k-linux64

factoring 422296712117246876775829906558964408942454424061098352293

using pretesting plan: normal

div: primes less than 10000
fmt: 1000000 iterations
rho: x^2 + 1, starting 1000 iterations on C57
rho: x^2 + 3, starting 1000 iterations on C57
rho: x^2 + 2, starting 1000 iterations on C57
pp1: starting B1 = 20K, B2 = gmp-ecm default on C57
pp1: starting B1 = 20K, B2 = gmp-ecm default on C57
pp1: starting B1 = 20K, B2 = gmp-ecm default on C57
pm1: starting B1 = 100K, B2 = gmp-ecm default on C57
Couldn't allocated shared memory segment in ECM[/CODE]
strace output:
[CODE]<...>
open("factor.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=8197, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b0edb5d7000
fstat(4, {st_mode=S_IFREG|0644, st_size=8197, ...}) = 0
lseek(4, 8197, SEEK_SET) = 8197
close(4) = 0
munmap(0x2b0edb5d7000, 4096) = 0
rt_sigaction(SIGINT, {0x445d00, [INT], SA_RESTORER|SA_RESTART, 0x33ede302d0}, {SIG_DFL, [INT], SA_RESTORER|SA_RESTART, 0x33ede302d0}, 8) = 0
shmget(IPC_PRIVATE, 4, 0600) = -1 ENOSPC (No space left on device)
write(1, "Couldn't allocated shared memory"..., 48Couldn't allocated shared memory segment in ECM
) = 48[/CODE]
There is enough free memory to run.
[CODE]# free
             total       used       free     shared    buffers     cached
Mem:       2054196     700512    1353684          0     319760     126104
-/+ buffers/cache:     254648    1799548
Swap:      2048276          0    2048276[/CODE]
What's wrong with it?
It seems that yafu doesn't release the shared memory after use. According to /proc/sysvipc/shm, memory segments are still held by yafu processes that have already finished their work correctly. I released the shared memory using the ipcrm tool and it works again.
[QUOTE=unconnected;261814]It seems that yafu doesn't release the shared memory after use. According to /proc/sysvipc/shm, memory blocks are reserved by yafu processes that already correctly finished the work. I've released shared memory using ipcrm tool and it works again.[/QUOTE]
Thanks for the report -- what OS are you running? I haven't had any problems on SUSE 11 or Ubuntu, but maybe the release of shared memory segments is not consistent across all Linux flavors. I'll see if I can figure out how to free them cleanly. Just to confirm -- you're not factoring lots of numbers within a single yafu process, right? If you're using my scripts then each factorization should be done with its own process, so the problem must be that the shared memory segments persist even after the process terminates.
[QUOTE=bchaffin;261848]Thanks for the report -- what OS are you running? I haven't had any problems on SUSE 11 or Ubuntu, but maybe the release of shared memory segments is not consistent across all Linux flavors. I'll see if I can figure out how to free them cleanly.[/quote]
It is CentOS 5.5 x86_64 with a 2.6.18 kernel. [QUOTE]Just to confirm -- you're not factoring lots of numbers within a single yafu process, right? If you're using my scripts then each factorization should be done with its own process, so the problem must be that the shared memory segments persist even after the process terminates.[/QUOTE] Yes, I run 8 instances of yafu, each using one CPU core.
[QUOTE=bchaffin;261848]Thanks for the report -- what OS are you running? I haven't had any problems on SUSE 11 or Ubuntu, but maybe the release of shared memory segments is not consistent across all Linux flavors. I'll see if I can figure out how to free them cleanly.
[/QUOTE] I don't know much about it, but just googling around I found this site: [url]http://www-personal.washtenaw.cc.mi.us/~chasselb/linux275/ClassNotes/ipc/shared_mem.htm[/url] with these instructions:
[QUOTE]First you should detach the memory from your address space (inverse of shmat) with a command like:
shmdt(shm_addr);
To get rid of the ipcs shared memory resource, issue the command:
shmctl(shmid, IPC_RMID, 0);[/QUOTE]
Within ecm_loop() I see the shmdt command issued on the shared memory blocks, but not shmctl...
[QUOTE=bsquared;261963]
Within ecm_loop() I see the shmdt command issued on the shared memory blocks, but not shmctl...[/QUOTE] Thanks Ben. I didn't see that mentioned in any of the shm docs I read before... I managed to accumulate a couple hundred shm segments on my system, and was wondering how I could possibly not have hit this before, and then: poof, they were all gone. So there must be some periodic cleanup process taking care of them for me.

Unconnected, try [URL="http://www.sendspace.com/file/aje3h6"]this[/URL] version and see if it fixes your problem. If it does, I'll commit the changes to sourceforge.
[QUOTE=bchaffin;262032]
Unconnected, try [URL="http://www.sendspace.com/file/aje3h6"]this[/URL] version and see if it fixes your problem. If it does, I'll commit the changes to sourceforge.[/QUOTE] It's working for me - the /proc/sysvipc/shm table is unchanged after doing several factorizations with ECM. Thanks Ben, go ahead and check in the changes when you get time. - ben.
I've just about finished a fairly major overhaul of NFS within yafu. One of the changes is parallelizing the poly search (using msieve still, of course). A question I have is: is it better to divide the time deadline by N, or multiply the space searched by N? In other words, would you rather finish the poly search N times faster over approximately the same coefficient range, or search N times the coefficient range in the same amount of time as a serial task would take?
I imagine there might even be a crossover of sorts between the two options... the point at which the savings from a faster poly search are smaller than the potential reduction in sieving time from being able to perhaps find a better poly. Any thoughts? p.s. It's unexpectedly entertaining to watch the sheer volume of messages emitted by 16 parallel poly search threads madly scrolling by on the screen. Fivemack's 48 core machine might cause the console window to melt :smile:
My opinion - true parallelization is doing the same task(s), only in parallel.
Ideally, this is N times faster than the single-threaded variant. When this can't be done (or is hard to do, or inefficient), an alternative is to launch N threads, each doing its own job on its own range.
Smaller problems don't have N times the range to be divided up, especially for e.g. degree 5 jobs near 110 digits, or degree 4 jobs below ~95 digits. Degree 4 jobs around 100-110 digits will definitely benefit from splitting the A4 range, and possibly from splitting a single A4. The other thing to note is that you have to get over 140 digits before a single A5 has enough blocks of work that you can assign them randomly to threads and expect not to duplicate any work. In a perfect world you could assign both the special-q range and the A4/5/6 range to the library.