Lack of redundancy in the boot sequence

Hello there,

In the standard layout for booting from NAND, the bootloader is stored four times, presumably so that the system can recover if one of the copies becomes corrupted.

That’s very nice, but after that the FPGA RBFs, the Linux kernel and its device tree are all stored in raw NAND with no way to recover if any of these files becomes corrupted: no error correction, no multiple copies. One bit flip and your device becomes unusable.
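To make the concern concrete: without an integrity check stored next to an image, a loader cannot tell a flipped bit from valid data. The Python sketch below models this, assuming a hypothetical format in which a CRC-32 is stored alongside each image (not necessarily how the standard layout stores anything). It only detects corruption; it does not correct it.

```python
from __future__ import annotations

import zlib


def store(image: bytes) -> tuple[bytes, int]:
    """Return the image together with its CRC-32 (hypothetical storage format)."""
    return image, zlib.crc32(image)


def is_intact(image: bytes, stored_crc: int) -> bool:
    """Recompute the CRC-32 of the image and compare it with the stored value."""
    return zlib.crc32(image) == stored_crc


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate a NAND bit flip by inverting a single bit of the data."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


image, crc = store(b"\x00\x01fake kernel image" * 64)
corrupted = flip_bit(image, 1234)

print(is_intact(image, crc))      # True
print(is_intact(corrupted, crc))  # False: the single flipped bit is detected
```

Detection on its own does not make the device bootable again. It only tells the loader that it should fall back to another copy, which requires that such a copy exists.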

This becomes even more problematic if you want to be able to update the product, since every erase/write cycle makes it more likely that a block goes bad. Furthermore, if an update is interrupted, you could end up with a corrupted file in NAND, which would make the device unbootable and require a full reflash.
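One commonly used way to survive an interrupted update (not something the standard layout does) is an A/B scheme: the new image is written to the inactive slot, and the pointer to the active slot is switched only after the new image has been written and verified. The Python sketch below is a toy model of that logic. The `Flash` and `Slot` classes, the CRC-based validity check and the `interrupt_after` parameter that simulates a power loss are all hypothetical.

```python
from __future__ import annotations

import zlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Slot:
    data: bytes = b""
    crc: Optional[int] = None  # None means "not committed / invalid"


@dataclass
class Flash:
    """Toy model of two image slots plus a pointer to the active one."""
    slots: list[Slot] = field(default_factory=lambda: [Slot(), Slot()])
    active: int = 0

    def valid(self, idx: int) -> bool:
        slot = self.slots[idx]
        return slot.crc is not None and zlib.crc32(slot.data) == slot.crc


def update(flash: Flash, new_image: bytes, interrupt_after: Optional[int] = None) -> None:
    """Write new_image into the inactive slot, then switch the active pointer.

    interrupt_after simulates a power loss after that many bytes were written.
    """
    target = 1 - flash.active
    slot = flash.slots[target]
    slot.crc = None  # invalidate the target slot before touching its data
    if interrupt_after is not None:
        slot.data = new_image[:interrupt_after]
        return  # power lost: the active pointer is never switched
    slot.data = new_image
    slot.crc = zlib.crc32(new_image)
    if flash.valid(target):
        flash.active = target  # commit step, performed last


def boot_image(flash: Flash) -> bytes:
    """Return the active slot's image, or the other slot's if the active one is invalid."""
    for idx in (flash.active, 1 - flash.active):
        if flash.valid(idx):
            return flash.slots[idx].data
    raise RuntimeError("no valid image")


flash = Flash()
update(flash, b"v1")
update(flash, b"v2", interrupt_after=1)  # simulated power loss mid-update
print(boot_image(flash))  # b'v1': the old image is still intact
update(flash, b"v2")
print(boot_image(flash))  # b'v2'
```

The price is twice the storage for each image, in exchange for never overwriting the only known-good copy.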

Am I wrong to be worried about this? This appears to be the standard way of doing it, but it seems unnecessarily risky to me.

To make the boot process more robust, I was thinking about storing the “periph” RBF at two locations in NAND for redundancy (patching U-Boot to try the backup if the first copy fails). Once it’s loaded and the RAM is available, U-Boot could fetch the other files (“core” RBF, Linux image, device tree…) from a UBI partition instead of raw NAND, which would offer some protection against bit flips (UBI handles bad blocks and moves data off eraseblocks that show correctable bit flips) as well as wear leveling for updates.
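The fallback described above (try the primary copy of the “periph” RBF, then the backup) could be modelled as follows. This is a minimal Python sketch of the decision logic only, not U-Boot code; the actual implementation would be C code inside U-Boot. The offsets in `PERIPH_RBF_COPIES`, the `read_copy` callback and the assumption that each copy carries a stored CRC-32 are all hypothetical.

```python
from __future__ import annotations

import zlib
from typing import Callable, Optional, Sequence

# Hypothetical NAND offsets of the primary and backup copies of the periph RBF.
PERIPH_RBF_COPIES = (0x00800000, 0x00C00000)


def load_first_valid(read_copy: Callable[[int], Optional[tuple[bytes, int]]],
                     offsets: Sequence[int]) -> tuple[int, bytes]:
    """Try each copy in order and return (offset, image) for the first intact one.

    read_copy(offset) returns (image, stored_crc), or None if the read itself
    fails (for example, an uncorrectable ECC error).
    """
    for offset in offsets:
        result = read_copy(offset)
        if result is None:
            continue  # read failed: try the next copy
        image, stored_crc = result
        if zlib.crc32(image) == stored_crc:
            return offset, image
    raise RuntimeError("all copies of the image are corrupted")


# Simulated flash contents: the primary copy has one corrupted byte.
good = b"periph rbf" * 100
flash = {
    0x00800000: (b"X" + good[1:], zlib.crc32(good)),
    0x00C00000: (good, zlib.crc32(good)),
}
offset, image = load_first_valid(flash.get, PERIPH_RBF_COPIES)
print(hex(offset))  # 0xc00000: the backup copy was used
```

Treating a failed read the same way as a CRC mismatch means that an uncorrectable error on the primary copy also triggers the fallback.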

Any thoughts on that?