Skip to main content

Troubleshooting

Corrupt PoST Data

One of the realities of hard drives is that, once in a while, they fail, resulting in corrupt data. Data corruption can also occur while copying or moving data across systems. Hopefully you will never have to deal with this situation. However, if it does happen, you will most likely find out about it when a message like the following appears in your log:

2023-09-18T03:37:09.147-0400    INFO    abcde.nipostValidator   Found proof for nonce: 0, pow: 22517998136898104 with [96506732, 8522344264, 15809824782, 17090809665, 22898048404, 23339075780, 33564380939, 37517090594, 38587115133, 42519923294, 47489969603, 66776374486, 85751204605, 97571767596, 98954096617, 103466964513, 105410396815, 108719413993, 110747747703, 114712568823, 122975084397, 128938645449, 135064014126, 137417405566, 137527071596, 142672661935, 145176618414, 149504039139, 150155464780, 156195137948, 162912656308, 188599527912, 192691123424, 196101601537, 196889191957, 202704285017, 204635877880] indices     {"node_id": "abcde", "module": "nipostValidator", "module": "post::prove", "file": "src/prove.rs", "line": 323}
2023-09-18T03:37:09.353-0400 INFO abcde.post proving: generated proof {"node_id": "abcde", "module": "post"} 2023-09-18T03:37:09.353-0400 INFO abcde.atxBuilder created the initial post {"node_id": "abcde", "module": "atxBuilder"}
2023-09-18T03:37:09.353-0400 INFO abcde.atxBuilder verifying the initial post {"node_id": "abcde", "module": "atxBuilder", "post": {"nonce": 0, "indices": "6c93c00500d229fe7ee0806ce53a043dc4ea0f947dd454053182c75bb1f07c097d887cc5f0227da6f8fb8897a2987932dcf6e9b05873ba303efd8a2bf7134bc1eead95fefaa1708558725c608f02f38a58baa60b54765748919cdd8f9bd56a6ddfe2a15cf2e55581e7d21427f7f9d1dcfa7f6c934005e06b03fd4de85a36d31c8e638a3c8b4c90f6f522672a7d1749bb9de55ea2a7a1a5afe03449dd6c403e246a5b612a78dd66b577c8bcf8b13fa52f"}, "metadata": {"Challenge": "0000000000000000000000000000000000000000000000000000000000000000", "LabelsPerUnit": 4294967296}, "name": "atxBuilder"}
2023-09-18T03:37:09.410-0400 ERROR abcde.nipostValidator Proof is invalid: MSB value for index: 137527071596 doesn't satisfy difficulty: 207 > 0 (label: [215, 101, 80, 15, 36, 236, 60, 243, 203, 157, 178, 129, 73, 177, 132, 65]) {"node_id": "abcde", "module": "nipostValidator", "module": "post::post_impl", "file": "ffi/src/post_impl.rs", "line": 242} 2023-09-18T03:37:09.413-0400 FATAL abcde.atxBuilder initial POST proof is invalid. Probably the initialized POST data is corrupted. Please verify the data with postcli and regenerate the corrupted files. {"node_id": "abcde", "module": "atxBuilder", "errmsg": "verify PoST: invalid proof", "name": "atxBuilder"}

This message indicates that, despite the presence of a complete identity, the smeshing node was unable to generate a PoST for a particular epoch due to corruption of the PoST data. The best way to verify this is to run postcli in verify mode:

> postcli -datadir /Volumes/post/7c8cef2b -fromFile 531 -verify -fraction 0.01
2023/09/18 13:51:59 cli: verifying identity.key
2023/09/18 13:51:59 cli: identity.key is valid
2023/09/18 13:51:59 cli: verifying POS data
2023-09-18T13:51:59.504-0400 INFO verifying POS data in /Volumes/post/7c8cef2b {"module": "post::pos_verification", "file": "src/pos_verification.rs", "line": 34}
2023-09-18T13:51:59.504-0400 INFO verifying POS files 531 -> 927 {"module": "post::pos_verification", "file": "src/pos_verification.rs", "line": 39}
2023-09-18T13:51:59.504-0400 INFO verifying file /Volumes/post/7c8cef2b/postdata_531.bin {"module": "post::pos_verification", "file": "src/pos_verification.rs", "line": 43}
2023-09-18T13:51:59.504-0400 INFO verifying 26843 labels {"module": "post::pos_verification", "file": "src/pos_verification.rs", "line": 66}
2023-09-18T13:52:02.348-0400 INFO POS data is invalid: invalid label in file 531 at offset 126368 {"module": "post::initialization", "file": "ffi/src/initialization.rs", "line": 242}
2023/09/18 13:52:02 cli: invalid POS

You can do this for an entire identity or only for a subset of files (using -fromFile and -toFile. See the README for more information). If nothing else is using the drive (e.g., if the node is shut down and the drive is not being used for any other purpose), then running postcli verify with -fraction 0.01 should be quite quick. You can run the command with a larger fraction value for a more thorough check.

info

It is possible for multiple files to be corrupt. postcli verify will quit itself after detecting a single corrupt file, and you can restart it with a higher -fromFile to continue the process right after the corrupt file. Serious smeshers may wish to run such a verification process periodically to detect corruption issues before they lead to failures in proof generation and lost rewards.

Once corrupt data is detected, the only option is to delete and regenerate the affected files. If the files are deleted and the node is restarted, it will automatically restart the PoST initialization process to fill in the missing data. This can also be done manually using postcli.

Additional Troubleshooting

timesync: peers are not time synced

Please make sure that your system clock is synced with the internet. Please refer to time synchronization instructions for your operating system.

If you're 100% certain that your system time is correct, you can disable the time sync check by setting the following config:

{
"time": {
"peersync": {
"disable": true
}
}
}

Node Uses Too Much Memory

Please add a "pprof-server": true to the config at the main level or add --pprof-server to the command line. Restart the node and then visit http://127.0.0.1:6060/debug/pprof/profile?seconds=30 and http://127.0.0.1:6060/debug/pprof/heap in your default browser and download the files. Please share then these files on Discord or in a GitHub issue. Advanced users can use go tool pprof http://localhost:6060/debug/pprof/heap to see what is using the memory.