mersenneforum.org  

2019-10-15, 06:59   #1
AwesomeMachine
 
Apr 2018
USA

7 Posts
Running fstrim on an SSD while mprime is running might cause errors in mprime

The system throwing the errors has a good track record. Now, after every progress output, I get this: Hardware errors have occurred during the test! 1 Jacobi error.


This started after I ran fstrim, while mprime was running, on the file system where mprime and its files reside. Because of wear-leveling algorithms, an SSD has no native way to tell which parts of the file system the operating system is no longer using, and the operating system has no native view of the drive's internal layout either.



The fstrim program tells the SSD which areas of a file system are no longer in use, so the SSD firmware knows it can reuse them. I suspect there is a flaw somewhere in the fstrim > kernel > file system > mprime > file system chain, such that fstrim marks parts of mprime's files as unused when they are in fact still in use.


Since the problem seems unique to mprime, it may be using some old kernel calls that fail under certain newer circumstances, or with less sophisticated file formats.


I am not really equipped to troubleshoot this possibility, but I would say it is probably better to close mprime before running fstrim.
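The advice to close mprime before trimming could be enforced with a small wrapper. This is a minimal Python sketch, assuming Linux's /proc layout, that the process is named `mprime`, and that `fstrim` is on the PATH; `running_process_names` and `safe_fstrim` are hypothetical helpers. A paused mprime is still a running process, so this check also catches the "paused but not exited" case.

```python
import os
import subprocess
import sys


def running_process_names(proc_dir: str = "/proc") -> set[str]:
    """Collect the short command names of all processes listed in /proc."""
    names = set()
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        try:
            with open(os.path.join(proc_dir, entry, "comm")) as f:
                names.add(f.read().strip())
        except OSError:
            continue  # the process exited while we were scanning
    return names


def safe_fstrim(mountpoint: str, proc_dir: str = "/proc") -> int:
    """Run `fstrim -v` only if no mprime process exists; return an exit status."""
    if "mprime" in running_process_names(proc_dir):
        print("mprime is still running; quit it from its menu first.", file=sys.stderr)
        return 1
    return subprocess.run(["fstrim", "-v", mountpoint]).returncode
```

Called as root, `safe_fstrim("/")` returns fstrim's exit status, or 1 without trimming anything if an mprime process is found. If fstrim is not installed, `subprocess.run` raises `FileNotFoundError`.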
2019-11-16, 00:44   #2
phillipsjk
 
Nov 2019

3·23 Posts

It is possible that the SSD has buggy firmware.


The fstrim command should only tell the SSD to TRIM unallocated space, unless there is a kernel bug.


A workaround may be to disable automatic TRIM (I think that is set in the mount options) and run it manually, monthly or so.
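To check whether automatic (continuous) TRIM is enabled through the mount options, one can look for the `discard` option in /proc/mounts. A minimal sketch, assuming the Linux /proc/mounts format (device, mount point, type, options, ...); `mounts_with_discard` is a hypothetical helper, and escaped characters in mount points (such as `\040` for a space) are left as-is.

```python
def mounts_with_discard(mounts_text: str) -> list[str]:
    """Return mount points whose options enable continuous discard (TRIM)."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        mountpoint, options = fields[1], fields[3].split(",")
        # ext4/XFS use plain "discard"; btrfs also accepts "discard=async|sync".
        # "nodiscard" is deliberately not matched.
        if any(o == "discard" or o.startswith("discard=") for o in options):
            result.append(mountpoint)
    return result
```

Passing it the contents of /proc/mounts lists the affected mount points. On many systemd-based distributions, periodic trimming is handled instead by the `fstrim.timer` unit shipped with util-linux, and `systemctl status fstrim.timer` shows whether it is enabled.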


If the error persists without running TRIM, you may actually have an unrelated hardware error (I would guess RAM).

Last fiddled with by phillipsjk on 2019-11-16 at 00:45 Reason: Grammar, spelling
2020-02-22, 00:20   #3
AwesomeMachine
 
Apr 2018
USA

7 Posts
Happened again!

The problem occurred again. I did not run fstrim at all between the two occurrences.
Code:
[Worker #1 Feb 21 18:38] Iteration: 37610000 / 101988773 [36.87%], ms/iter: 52.999, ETA: 39d 11:47
[Worker #1 Feb 21 18:38] Hardware errors have occurred during the test!
[Worker #1 Feb 21 18:38] 1 Gerbicz/double-check error.
[Worker #1 Feb 21 18:38] Confidence in final result is excellent.
[Worker #1 Feb 21 18:40] Gerbicz error check passed at iteration 37611256.
[Worker #3 Feb 21 18:40] M103931309 stage 1 is 32.05% complete. Time: 467.809 sec.
[Worker #4 Feb 21 18:41] Iteration: 9890000 / 103946203 [9.51%], ms/iter: 45.156, ETA: 49d 03:46
[Worker #2 Feb 21 18:45] Iteration: 35440000 / 101992529 [34.74%], ms/iter: 44.817, ETA: 34d 12:31
It only happens when I run fstrim. The system isn't configured for automatic TRIM, only manual trimming. It could be a drive firmware bug, but those generally aren't application-specific. This time I paused mprime but did not exit it completely.

If I remember, the next time I trim the file system in a few months, I'll exit mprime completely and see if that makes a difference. I predict it will!
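To see whether new errors line up with trims, it may help to pull the error-count lines out of mprime's output and compare their timestamps with when fstrim was run. This sketch assumes the line format visible in the excerpt above (`[Worker #N <date>] <count> <kind> error.`); other mprime versions or messages may be worded differently, and `hardware_errors` is a hypothetical helper.

```python
import re

# Matches e.g. "[Worker #1 Feb 21 18:38] 1 Gerbicz/double-check error."
ERROR_LINE = re.compile(r"^\[Worker #(\d+) ([^\]]+)\] (\d+) (.+ errors?)\.$")


def hardware_errors(log_text: str) -> list[tuple[int, str, int, str]]:
    """Return (worker, timestamp, count, kind) for each error-count line."""
    hits = []
    for line in log_text.splitlines():
        m = ERROR_LINE.match(line.strip())
        if m:
            hits.append((int(m.group(1)), m.group(2), int(m.group(3)), m.group(4)))
    return hits
```

Progress lines such as "Iteration: ..." and "Gerbicz error check passed ..." do not match, because the text after the bracket does not start with a count.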
2020-02-22, 04:57   #4
retina
Undefined
 
 
"The unspeakable one"
Jun 2006
My evil lair

2·23·137 Posts

The data in RAM is being corrupted, thus you get the error reported.

So if you are sure it is related to fstrim, there are a number of possible causes:

Buggy driver (already mentioned above).
Bad PSU dropping voltage when the drive is sucking more current during the trim.
Overheating of the system during trim.
etc.

But also be open to the idea that trim is just a coincidence. It could be a flaky RAM stick. Cosmic ray upsets. Alpha decay in the RAM packaging. Overzealous clocking of some part. etc.

Last fiddled with by retina on 2020-02-22 at 04:58
2021-10-07, 23:49   #5
AwesomeMachine
 
Apr 2018
USA

7 Posts
PSU doubtful cause

Well, I doubt it's the PSU, because this is a laptop, and mprime itself draws more power than the trim command does. The drive passes every test of its functionality. The problem only occurs with the combination of mprime and fstrim. And now the problem has mysteriously disappeared without even the most insignificant hardware change.


I doubt the RAM was being corrupted, because that has nothing to do with the issue; if it were the cause, the errors would occur in other scenarios too. Alpha particles were a problem for system memory in the 1970s, so they are probably not relevant today.


I surmise that, to avoid creating huge files up front, the program uses sparse files, and that fstrim doesn't handle sparse files well while they are open for reading and writing, as they would be when mprime is only paused.
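One way to test the sparse-file guess is to compare a file's apparent size with the space actually allocated to it (the same comparison as `du --apparent-size` versus plain `du`). A minimal sketch, assuming Linux, where `st_blocks` counts 512-byte units; `looks_sparse` and `is_sparse` are hypothetical helpers. A file on a compressing file system (for example btrfs or ZFS with compression enabled) can also show fewer allocated bytes than its size, so a positive result is a hint, not proof.

```python
import os


def looks_sparse(size: int, blocks: int) -> bool:
    """True if fewer bytes are allocated than the file's apparent size.

    `blocks` is st_blocks, which Linux reports in 512-byte units
    regardless of the file system's own block size.
    """
    return blocks * 512 < size


def is_sparse(path: str) -> bool:
    """Apply the allocated-vs-apparent size heuristic to a file on disk."""
    st = os.stat(path)
    return looks_sparse(st.st_size, st.st_blocks)
```

Running `is_sparse` over the files in mprime's working directory would show whether any of them actually contains holes.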



When mprime is quit using the menu item, it writes its data and closes its files. Then fstrim has no trouble determining the correct boundaries.



Or, since I'm guessing, I might be completely incorrect!


I want to thank the contributors to this thread for getting me thinking.

All times are UTC.




Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.