mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Cunningham Tables (https://www.mersenneforum.org/forumdisplay.php?f=51)
-   -   Raman's plans and questions (https://www.mersenneforum.org/showthread.php?t=12338)

retina 2010-01-13 09:05

[QUOTE=Raman;201597]Pay? I hate that word. My family is facing serious financial problems, and is it feasible for buying some costly products from abroad? Products from USA are much more expensive than from the standards at India. Factoring numbers has been a good entertainment activity for me for the past two years. Instead of paying for it, and owning the product, I contribute some numbers for some project. Isn't that compensated? Who will get motivated to join up any distributed computing project, if everything is made for money? I will have more resources till June 2011, so I can expect more numbers to be done soon. Just tackling the crash in the square root phase and whether the processing of files is altered between two environments should be known.[/QUOTE]One would imagine that if you are having "serious financial problems" in your life, you would not have the luxury of [strike]wasting[/strike] spending time factoring numbers. Instead, one would imagine that you would be searching for ways to improve your financial status before doing other, less fruitful things. Unless, that is, you think that factorising numbers is the path to infinite riches :unsure:

Raman 2010-01-13 11:41

This is the job submission script used to run tasks on the compute cluster.
Unfortunately, we cannot see what is happening inside until the task has finished completely.

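[code]#!/bin/bash
#PBS -o logfile.log
#PBS -e errorfile.err
#PBS -l walltime=200:00:00
#PBS -l nodes=1:ppn=8
# Stage the job into a per-job scratch directory, run there, then move results back.
# Numeric part of the job ID (everything before the first dot)
tpdir=$(echo "$PBS_JOBID" | cut -f 1 -d .)
tempdir="/scratch/$PBS_O_LOGNAME/job$tpdir"
mkdir -p "$tempdir"
# Stop here if the scratch directory is unusable, so nothing runs in the wrong place
cd "$tempdir" || exit 1
cp -R "$PBS_O_WORKDIR"/* .
# -nc: run the NFS postprocessing phases (filtering, linear algebra, square root)
./msieve143 -v -t 8 -s 7_320P.dat -nc
# Move everything back to the submit directory and remove the empty scratch directory
mv ./* "$PBS_O_WORKDIR"/
rmdir "$tempdir"[/code]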
[B]$ qsub 7_320P.cmd[/B]

If I try to write everything within the working directory itself, the task does not run at all; it gets killed or terminated almost instantly. Why?
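
One possible way to see progress before the job ends is to send the program's output to a file of your own choosing, rather than relying only on the `#PBS -o` file, which the scheduler typically delivers when the job finishes. A minimal sketch, assuming a hypothetical helper named `run_logged` and a log path that the compute node is allowed to write to (which may not hold on every cluster):

```shell
#!/bin/bash
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with stdout and stderr appended to LOGFILE,
# so the file can be inspected with `tail -f` while the job is still running.
run_logged() {
    local logfile=$1
    shift
    "$@" >>"$logfile" 2>&1
}

# Example use inside the job script (the log location is an assumption):
# run_logged "$PBS_O_WORKDIR/progress.log" ./msieve143 -v -t 8 -s 7_320P.dat -nc
```

The helper returns the exit status of the command it ran, so the job script can still test whether the run succeeded.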

Here are the files, for your reference only. You can check whether or not they are valid. The relation files cannot be uploaded right now; they are 100 times bigger than the cycle files themselves. Uploading just the cycle files took up to 23 minutes per file.
[URL="http://www.sendspace.com/file/azp9dw"]10_339P.dat.cyc[/URL]
[URL="http://www.sendspace.com/file/f3makm"]10_339P.dat.dep[/URL]
[URL="http://www.sendspace.com/file/4ac5ke"]2_1778L.dat.cyc[/URL]
[URL="http://www.sendspace.com/file/8ji1fp"]7_320P.dat.cyc[/URL]
[URL="http://www.sendspace.com/file/77c531"]7_320P.dat.dep[/URL]

jasonp 2010-01-13 13:37

[QUOTE=Raman;201724]The dependency files are somehow being corrupted...
It is quite surprising. If the dependency files are corrupted then msieve should say that "Dependency file is corrupt" ("Unable to read dependency files" or "Dependency file is not valid"), right? How come is it possible that msieve reads the relations properly, but algebraic side is not a perfect square at all? How is it possible that wrong set of linearly independent relations are being picked up at the end of the linear algebra only? And then what is the reason behind the occurring of error "Algebraic side is not a square!"?[/QUOTE]
The square root has no way of determining what is wrong, it can only check five or six things that must be right. If the algebraic powers don't form a square, then either

- relations were added or deleted after the filtering or LA ran
- there was a bug in the filtering or LA
- the cycle file or dependency file is corrupt
- there was a bug reading relations, that somehow did not occur when the relations were read for the filtering or LA
- the cycle file or dependency file is from an old factorization or an old run for the current factorization
- the matrix changed after you ran the LA once

That's a lot of possibilities, and there is no way to determine which possibility is happening by the time the square root runs. The basic problem here is that if everything worked perfectly a dependency may still not find a factor, so when that happens you need to know that it was just bad luck and not a bug. If one bit is wrong somewhere, on average half the dependencies will be spoiled but the other half may still work. The checking in the square root is there because at best one can check a few conditions that must be true. That cannot diagnose what the problems, if any, actually are.

A correct dependency file will look like random garbage. A correct cycle file will look mostly like random garbage. Without the underlying relations, the most you could verify would be that both represent the same number of matrix columns, and the number of relations in all dependencies is even. What is the output of 'od -tx1 -A d msieve.dat.cyc | head', and what is the exact size of the dependency file? Did you run md5sum like we asked you to? Your previous post suggests that you did, but then why are you still wondering if something was corrupted? Did you delete both the old and the current copy of the dependency file? You should be prepared to get a little frustrated when moving to a new environment and then tackling data-intensive problems, it's difficult for everyone.
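
The checks asked for above (exact size, checksum, first bytes) can be sketched as a small shell function. This is only an illustration: `compare_copies` is a hypothetical helper name, and it assumes the usual `md5sum`, `wc` and `od` utilities are installed.

```shell
#!/bin/bash
# compare_copies FILE_A FILE_B
# Prints size and MD5 of both files; returns 0 only if both size and MD5 match,
# 1 if they differ, 2 if a file cannot be read.
compare_copies() {
    local a=$1 b=$2
    local size_a size_b sum_a sum_b
    size_a=$(wc -c <"$a") || return 2
    size_b=$(wc -c <"$b") || return 2
    sum_a=$(md5sum <"$a" | cut -d ' ' -f 1)
    sum_b=$(md5sum <"$b" | cut -d ' ' -f 1)
    printf '%s bytes  %s  %s\n' "$size_a" "$sum_a" "$a"
    printf '%s bytes  %s  %s\n' "$size_b" "$sum_b" "$b"
    [ "$size_a" -eq "$size_b" ] && [ "$sum_a" = "$sum_b" ]
}

# First bytes of a file in hex, as requested in the post above:
# od -tx1 -A d msieve.dat.cyc | head
```

A matching size with a different checksum means the two copies have different content. On its own that does not say which copy, if either, is the one the linear algebra actually produced.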

Raman 2010-01-13 20:33

When I am at my college, you are asleep. When you are active,
I do not have access to those resources. The problem is the
time difference; otherwise we could have had an interactive
conversation every hour or so.

Tomorrow is Pongal (a holiday), and the day after tomorrow I am
travelling to Rameswaram to view the annular solar eclipse
there. I will be back only on Saturday to monitor those
resources. I will not be active on Mersenne Forum for these
two days either, from Thursday evening to Saturday morning,
Indian Standard Time.

I have already uploaded the old files to sendspace, as
mentioned above. The dependency files average around 44 MB each.
Meanwhile, I have scheduled the entire post-processing for
2,1778L, 7,320+ and 10,339+ on the compute cluster concurrently, with
msieve 1.43 compiled in that environment, presuming that the
square root crash was specific to msieve version 1.44. These jobs
will not be disturbed by anybody. Post-processing for 10,339+
will take up to 3 days to complete, 7,320+ three days, and
2,1778L four days. After that, I will check the results to see
whether they completed properly. The old scheduled
post-processing job for 10,351+ has still not completed,
so let me see what exactly happens. :sleep:

Batalov 2010-01-13 22:40

Raman, the .dep and .cyc without the .dat file are useless, and the .dat is too big for practical purposes (and you know that). Therefore, the road to debugging and ultimately finishing is to carefully listen to Jason and try his suggestions [U]at your site[/U], one by one and without excessive emotions.

Everyone has been in your shoes (everyone has had problems, everyone has started from scratch at least once), so we can relate. Yes, it is unpleasant, but just endure, and one day you will remember it with a smile.

The beauty of this hobby is that it is not for passengers.

An analogy: you are not driving a car on a highway or to a grocery store; rather, you took a truck off-road into the desert. Now a couple of tires are low, you are stuck in a narrow passage, and the engine won't start. What do you do? First thing: don't panic (™). You have to have an off-road adventurer's attitude, not a passenger's. Yes, the tires can (and will) blow out, the carburetor or whatever may need to be taken apart and cleaned with rags, and you may be alone in that desert. There's no helicopter coming to take you. You have a voice on the other end of the radio who possibly knows, and is willing to tell you, how to fix the truck; just don't waste time complaining and yelling "what the hell is this". Get down to business and [U]listen[/U]. Ok? It's going to be all right if you do that.

Raman 2010-01-16 05:08

Number 1 of 4
 
[quote=Raman;201474][COLOR=Black]
[/COLOR][COLOR=Black]Why do you do all the latest modifications to msieve and then spoil up the previous code? I wish that I had written my own code to be devoid of these errors, being dependent upon others, but understanding the algorithm is too difficult, especially the notations given within the papers, so much optimizations needed... I don't have that patience for writing 1 man year of code at all...
[/COLOR][/quote]
Mea culpa. Sorry, it was MY FAULT.

[quote=henryzz]To be on the safe side in your situation i would compile your binary on the system that will be using it. It should then work perfectly. People haven't been finding too many bugs lately in msieve.[/quote]
You are absolutely correct!

10,339+
[code]Sat Jan 16 07:07:55 2010 prp76 factor: 5397511769683444928966129716536741688329069042441479613115913313907895924533
Sat Jan 16 07:07:55 2010 prp111 factor: 195536788296069646441612538004667920689480607938818585757708176337130353156036650166508246335832764566608042973[/code]Information 1 of 3
On October 1, 2009, some unknown student sent me this message, which I saw only last week. :lol:
[code]From kashyap@[color=red]moderated out[/color] Thu Oct 01 18:20:43 2009
Return-path: <kashyap@[color=red]moderated out[/color]>
Envelope-to: ramanv@[color=red]moderated out[/color]
Delivery-date: Thu, 01 Oct 2009 18:20:43 +0530
Received: from kashyap by[color=red]moderated out[/color]with local (Exim 4.63)
(envelope-from <kashyap@[color=red]moderated out[/color]>)
id 1MtL7L-0003Du-8a
for ramanv@[color=red]moderated out[/color]; Thu, 01 Oct 2009 18:20:43 +0530
To: ramanv@[color=red]moderated out[/color]
Subject: Wasting the resources of the comp
Message-Id: <E1MtL7L-0003Du-8a@[color=red]moderated out[/color]>
From: kashyap@[color=red]moderated out[/color]
Date: Thu, 01 Oct 2009 18:20:43 +0530
Status: RO

Please stop whatever processeng used stupidly.
Core2Duo being fried like an egg! Bah! [/code]

[code][cs09m038@leo0 ~]$ ls -l 10__339P/10_339P.dat.dep
-rw-r--r-- 1 cs09m038 mtech 40090368 Jan 11 18:13 10__339P/10_339P.dat.dep
[cs09m038@leo0 ~]$ ls -l number3/10_339P.dat.dep
-rw-r--r-- 1 cs09m038 mtech 40090368 Jan 16 00:51 number3/10_339P.dat.dep
[cs09m038@leo0 ~]$ md5sum 10__339P/10_339P.dat.dep
eeb2f3d84845a20ec0c8796b52f6627f 10__339P/10_339P.dat.dep
[cs09m038@leo0 ~]$ md5sum number3/10_339P.dat.dep
fab9aeb429dc4480b39da4a34a64632e number3/10_339P.dat.dep[/code]
:confused:
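
The listing above shows two dependency files of identical size (40090368 bytes) with different MD5 sums. To see how far apart two such files are, one option is `cmp -l`, which prints one line per differing byte. A rough sketch; the function names are hypothetical:

```shell
#!/bin/bash
# count_diff_bytes FILE_A FILE_B
# Hypothetical helper: prints how many byte positions differ between two files
# (cmp -l lists one line per differing byte).
count_diff_bytes() {
    cmp -l "$1" "$2" | wc -l
}

# first_diff_offset FILE_A FILE_B
# Prints the 1-based offset of the first differing byte, or nothing if equal.
first_diff_offset() {
    cmp -l "$1" "$2" | head -n 1 | awk '{print $1}'
}

# Example with the paths from the listing above:
# count_diff_bytes 10__339P/10_339P.dat.dep number3/10_339P.dat.dep
```

A handful of differing bytes would point towards damage in transit or on disk, while differences spread throughout the file would suggest the two files simply come from different runs.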

bdodson 2010-01-16 06:09

[QUOTE=Raman;202071]Mea culpa. Sorry, it was MY FAULT.
...
You are absolutely correct!
...
10,339+
[code]Sat Jan 16 07:07:55 2010 prp76 factor: 5397511769683444928966129716536741688329069042441479613115913313907895924533
Sat Jan 16 07:07:55 2010 prp111 factor: 195536788296069646441612538004667920689480607938818585757708176337130353156036650166508246335832764566608042973[/code][/QUOTE]

A very good start, 1-of-4. Glad to hear. -bd

Andi47 2010-01-16 06:18

[QUOTE=Raman;202071]
On October 1, 2009, some unknown student sent me this message, which I saw only last week. :lol:
[/QUOTE]

Email addresses should be removed from this post (or obscured) to avoid massive spam.

Batalov 2010-01-16 06:43

Raman, do take the unknown student's remark at face value, because it is better than [URL="http://mersenneforum.org/showthread.php?t=12815"]the alternative[/URL]. If you are not monitoring the temperatures of the CPUs that you are loading, you should. If the administration gave you permission to run the programs, they may revoke it in a moment (or worse!) as soon as just a few of the CPUs have actually burnt out. Even though you didn't put those computers together, and (if they do indeed overheat) the fault is not entirely yours, you may be the scapegoat in the end. Take the complaint seriously and investigate.

Try on the administration's shoes for a moment: nothing burnt out for a while and everything was fine; now there are complaints from users and a few computers have failed, and [I]voila[/I] - "you-know-who is running you-know-what. This must be the reason". See?
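
As a rough illustration of the temperature monitoring suggested above, here is a sketch that reads the Linux sysfs thermal zones. It assumes the kernel exposes `/sys/class/thermal/thermal_zone*/temp` in millidegrees Celsius, which is not the case on every machine (on some systems the `sensors` tool from lm-sensors is the place to look instead). The function name `print_cpu_temps` is hypothetical.

```shell
#!/bin/bash
# print_cpu_temps [SYSFS_THERMAL_DIR]
# Prints one line per thermal zone in degrees Celsius.
# Assumes the Linux sysfs layout thermal_zone*/temp with values in millidegrees.
print_cpu_temps() {
    local root=${1:-/sys/class/thermal}
    local zone milli
    for zone in "$root"/thermal_zone*; do
        # Skip zones that are missing or unreadable (also covers an unmatched glob)
        [ -r "$zone/temp" ] || continue
        milli=$(cat "$zone/temp")
        printf '%s: %d.%d C\n' "${zone##*/}" "$((milli / 1000))" "$((milli % 1000 / 100))"
    done
}

# Example: print the readings once a minute while a long job is running
# while sleep 60; do print_cpu_temps; done
```

The optional directory argument exists only so the function can be pointed at a test directory; in normal use it is called without arguments.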

Raman 2010-01-16 07:16

[quote=Andi47;202078]Email addresses should be removed from this post (or obscured) to avoid massive spam. [/quote]
Hello, that is not an e-mail address at all. Have you seen the terminal prompt format in Linux? It is my login ID and the computer's name.

[quote=Batalov;202081]Raman, do take the unknown student's remark at face value, because it is better than [URL="http://mersenneforum.org/showthread.php?t=12815"]the alternative[/URL]. If you are not monitoring the temperatures of the CPUs that you are loading, you should. If the administration gave you permission to run the programs, they may revoke it in a moment (or worse!) as soon as just a few of the CPUs have actually burnt out. Even though you didn't put those computers together, and (if they do indeed overheat) the fault is not entirely yours, you may be the scapegoat in the end. Take the complaint seriously and investigate.

Try on the administration's shoes for a moment: nothing burnt out for a while and everything was fine; now there are complaints from users and a few computers have failed, and [I]voila[/I] - "you-know-who is running you-know-what. This must be the reason". See?[/quote]

That was probably not a local student, but rather an outside student who was participating in a fest conducted in our department at that time. None of the other students or the administration know about my programs, or have any need to know about them. First of all, only students install and maintain Linux in our computer labs. Secondly, I use nohup to run the program and then log off; since it runs in the background and the computer does not slow down either, nobody knows about it at all. Heck, sitting at one computer, I can access the other computers through the ssh command. There is no need to log in to every machine directly, so no one has any chance of finding out that my program is running.

It was a long time ago, on October 1, and since then no one has said anything about it. The computers are even given rest in between. Don't worry about that. I will take care. Hopefully the Core 2 Duo does not produce as much heat as the Core 2 Quad or the Core i7 processors. I started making use of all those computers in our department only on September 22.

S485122 2010-01-16 08:14

[QUOTE=Raman;202083]Hello, that is not an e-mail address at all. Have you seen the terminal prompt format in Linux? It is my login ID and the computer's name.[/QUOTE]Raman,

Andi was referring to this: [QUOTE=Raman;202083]On October 1, 2009, some unknown student sent me this message, which I saw only last week.[code]
From mail "address in clear" Thu Oct 01 18:20:43 2009
Return-path: <"address in clear">
Envelope-to: "address in clear"
Delivery-date: Thu, 01 Oct 2009 18:20:43 +0530
Received: from "name and server name in clear" with local (Exim 4.63)
(envelope-from <"address in clear">)
id 1MtL7L-0003Du-8a
for "address in clear"; Thu, 01 Oct 2009 18:20:43 +0530
To: "address in clear"
Subject: Wasting the resources of the comp
Message-Id: <E1MtL7L-0003Du-8a@"server name in clear">
From: "address in clear"
Date: Thu, 01 Oct 2009 18:20:43 +0530
Status: RO

Please stop whatever processeng used stupidly.
Core2Duo being fried like an egg! Bah![/code][/QUOTE]Looking at the address, I would say it is someone whose mail is on the same server as yours.

In your place I would heed the suggestions of all those who warn you: anybody can see what you are doing by issuing the right command (ps -ef, if I remember well from my Unix SVr4 days...)

Jacob


All times are UTC.