[solved] NVMe / PCIe storage on Arria10

Hi all,
I have been working on a system with an Arria10 SoCFPGA with an M.2 SSD connected to the hard PCIe controller. This design is based on https://rocketboards.org/foswiki/Projects/A10AVCVPCIeRootPortWithMSILTS

As far as I can see this is all implemented as in the example.
We are running a vanilla v4.15 kernel.
The PCIe controller is detected correctly and the output of lspci is:
00:00.0 PCI bridge: Altera Corporation Device e000 (rev 01)
01:00.0 Non-Volatile memory controller: Intel Corporation Device f1a5 (rev 03)

However, during startup the NVMe driver times out while probing the drive.

Has anyone had any experience with NVMe or other PCIe devices on an Arria10?

Kind regards,
Rienk de Jong

The relevant kernel logs:
[ 0.100998] OF: PCI: host bridge /sopc@0/pcie@0xd0000000 ranges:
[ 0.101034] OF: PCI:   MEM 0xc0000000..0xdfffffff -> 0x00000000
[ 0.101227] altera-pcie d0000000.pcie: PCI host bridge to bus 0000:00
[ 0.101243] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 0.101257] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xdfffffff] (bus address [0x00000000-0x1fffffff])
[ 0.103448] PCI: bus0: Fast back to back transfers disabled
[ 0.106569] PCI: bus1: Fast back to back transfers disabled
[ 0.106844] pci 0000:00:00.0: BAR 8: assigned [mem 0xc0000000-0xc00fffff]
[ 0.106861] pci 0000:01:00.0: BAR 0: assigned [mem 0xc0000000-0xc0003fff 64bit]
[ 0.107027] pci 0000:00:00.0: PCI bridge to [bus 01]
[ 0.107084] pci 0000:00:00.0: bridge window [mem 0xc0000000-0xc00fffff]
[ 61.290164] nvme nvme0: I/O 3 QID 0 timeout, disable controller
[ 61.450409] nvme nvme0: Identify Controller failed (-4)
[ 61.455625] nvme nvme0: Removing after probe failure status: -5

Hi All,

We have fixed our issue.

The problem was not related to the kernel drivers or settings of the PCIe IP itself.

The example on the rocketboards page is correct (as I expected).

Our problem was in the mismatch between the memory assigned to the kernel and the memory mapped to the PCIe core.

The board we have has a total of 4 GB of RAM for the HPS, which is effectively 3 GB, because the HPS-to-FPGA bridges and other I/O claim the upper 1 GB of the available address space.

The Avalon master port of the PCIe core must be connected to the memory through an address expander, and the window this provides into the RAM can be set in power-of-two increments: 1 GB, 2 GB or 4 GB.

In our case the boot-loader was configured to give the Linux kernel 3 GB of memory, while the PCIe core only had a 2 GB window into the RAM.

My guess is that the kernel allocated DMA buffers for PCIe above the 2 GB window, so the PCIe core could never reach them.
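As a rough illustration of the mismatch (this assumes the HPS SDRAM starts at physical address 0x0, which is typical but not confirmed in the original post):

```shell
#!/bin/sh
# With 3 GB handed to Linux, a DMA buffer can land anywhere in that
# range, while the 2 GB address expander window only reaches the
# lower 2 GB - leaving a region the PCIe core cannot see.
KERNEL_MEM=$((3 * 1024 * 1024 * 1024))   # 3 GB given to Linux
WINDOW=$((2 * 1024 * 1024 * 1024))       # 2 GB expander window

UNREACHABLE=$((KERNEL_MEM - WINDOW))
echo "$((UNREACHABLE / 1024 / 1024)) MiB of kernel memory lies outside the window"
```

A buffer allocated in that top 1 GB would explain the admin-queue timeout: the controller's reads into host memory simply never arrive.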

The solution for our system was to limit the kernel's memory to 2 GB in the boot-loader configuration.
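For reference, one way to apply that limit from U-Boot is the standard `mem=` kernel parameter (a sketch only; the exact boot script and environment layout of your board are assumptions, and shrinking the memory node in the device tree would work equally well):

```
=> setenv bootargs "${bootargs} mem=2G"
=> saveenv
```

With `mem=2G` on the command line, Linux only uses the lower 2 GB, so every buffer it allocates falls inside the PCIe core's address expander window.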

So for us the issue is solved. I hope these posts can help someone else who is trying to get PCIe working on their own custom Arria 10 HPS design.




What M.2 SSD were you using? And what file system, ext3 or ext4?

We have been trying to get a similar setup working with a Samsung and a Kingston drive. We initially had issues with the Samsung drive, which we believe were down to compatibility problems (there are various reports of this for some Samsung SSDs).

The Kingston, however, seemed to work OK; unlike the Samsung it did not generate NVMe or ext driver errors. But when we did more extensive testing, writing multiple random data files to the drive and then reading them back to verify the checksums, we saw significant data errors yet no kernel errors from the FS or NVMe drivers. We got to a point where we could write random files, checksum them, copy the files, checksum those, and the results were different. Sometimes just checksumming the same file again produced a different result.
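For anyone wanting to try the same kind of check, the write/copy/verify procedure described above can be sketched as a small shell script (the mount point, file count and sizes here are placeholders, not our exact test parameters):

```shell
#!/bin/sh
# Data-integrity smoke test: write random files, record checksums,
# copy one file, then re-verify everything.
# MOUNTPOINT is a placeholder - point it at the NVMe filesystem.
MOUNTPOINT=${MOUNTPOINT:-$(mktemp -d)}
DIR="$MOUNTPOINT/nvme-check"
mkdir -p "$DIR"

for i in 1 2 3; do
    dd if=/dev/urandom of="$DIR/f$i" bs=1M count=16 2>/dev/null
done
sync

( cd "$DIR" && md5sum f1 f2 f3 > sums.md5 )
cp "$DIR/f1" "$DIR/f1.copy"
sync

# Drop the page cache so the re-read comes from the device
# (needs root; harmless to skip otherwise).
( echo 3 > /proc/sys/vm/drop_caches ) 2>/dev/null || true

( cd "$DIR" && md5sum -c sums.md5 )
cmp "$DIR/f1" "$DIR/f1.copy" && echo "copy matches original"
```

On healthy hardware every `md5sum -c` line reports OK and the copy compares identical; any mismatch on a repeat run points at corruption below the filesystem, which is what we were seeing.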

We started off with kernel 4.19 and have since moved to 5.4.44; the fundamental data corruption issue persists, although some other driver-level errors did seem to go away.