[solved] NVMe / PCIe storage on Arria10

Hi all,
I have been working on a system with an Arria10 SoCFPGA with an M.2 SSD connected to the hard PCIe controller. This design is based on https://rocketboards.org/foswiki/Projects/A10AVCVPCIeRootPortWithMSILTS

As far as I can see this is all implemented as in the example.
We are running a vanilla v4.15 kernel.
The PCIe controller is detected correctly and the output of lspci is:
00:00.0 PCI bridge: Altera Corporation Device e000 (rev 01)
01:00.0 Non-Volatile memory controller: Intel Corporation Device f1a5 (rev 03)

However, during startup the NVMe driver times out while probing the drive.

Has anyone had any experience with NVMe or other PCIe devices on an Arria10?

Kind regards,
Rienk de Jong

The relevant kernel logs:
[ 0.100998] OF: PCI: host bridge /sopc@0/pcie@0xd0000000 ranges:
[ 0.101034] OF: PCI:   MEM 0xc0000000..0xdfffffff -> 0x00000000
[ 0.101227] altera-pcie d0000000.pcie: PCI host bridge to bus 0000:00
[ 0.101243] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 0.101257] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xdfffffff] (bus address [0x00000000-0x1fffffff])
[ 0.103448] PCI: bus0: Fast back to back transfers disabled
[ 0.106569] PCI: bus1: Fast back to back transfers disabled
[ 0.106844] pci 0000:00:00.0: BAR 8: assigned [mem 0xc0000000-0xc00fffff]
[ 0.106861] pci 0000:01:00.0: BAR 0: assigned [mem 0xc0000000-0xc0003fff 64bit]
[ 0.107027] pci 0000:00:00.0: PCI bridge to [bus 01]
[ 0.107084] pci 0000:00:00.0: bridge window [mem 0xc0000000-0xc00fffff]
[ 61.290164] nvme nvme0: I/O 3 QID 0 timeout, disable controller
[ 61.450409] nvme nvme0: Identify Controller failed (-4)
[ 61.455625] nvme nvme0: Removing after probe failure status: -5

Hi All,

We have fixed our issue.

The problem was not related to the kernel drivers or settings of the PCIe IP itself.

The example on the rocketboards page is correct (as I expected).

Our problem was in the mismatch between the memory assigned to the kernel and the memory mapped to the PCIe core.

The board we have has a total of 4 GB of RAM for the HPS, which is effectively 3 GB, because the HPS-to-FPGA bridges and other I/O claim the upper 1 GB of the available address space.

The Avalon master port of the PCIe core must be connected to the memory through an address expander, and the window this provides into the RAM can be set in power-of-two increments: 1 GB, 2 GB or 4 GB.

In our case the boot-loader was configured to give the Linux kernel 3 GB of memory, while the PCIe core only had a 2 GB window into the RAM.

My guess is that the kernel allocated DMA buffers for PCIe above the 2 GB window, so the PCIe core could never reach them.
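As a rough illustration of the mismatch (this assumes the HPS SDRAM starts at physical address 0x0, which is typical but not confirmed in the original post):

```shell
#!/bin/sh
# With 3 GB handed to Linux, a DMA buffer can land anywhere in that
# range, while the 2 GB address expander window only reaches the
# lower 2 GB - leaving a region the PCIe core cannot see.
KERNEL_MEM=$((3 * 1024 * 1024 * 1024))   # 3 GB given to Linux
WINDOW=$((2 * 1024 * 1024 * 1024))       # 2 GB expander window

UNREACHABLE=$((KERNEL_MEM - WINDOW))
echo "$((UNREACHABLE / 1024 / 1024)) MiB of kernel memory lies outside the window"
```

A buffer allocated in that top 1 GB would explain the admin-queue timeout: the controller's reads into host memory simply never arrive.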

The solution for our system was to limit the kernel's memory to 2 GB in the boot-loader configuration.
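For reference, one way to apply that limit from U-Boot is the standard `mem=` kernel parameter (a sketch only; the exact boot script and environment layout of your board are assumptions, and shrinking the memory node in the device tree would work equally well):

```
=> setenv bootargs "${bootargs} mem=2G"
=> saveenv
```

With `mem=2G` on the command line, Linux only uses the lower 2 GB, so every buffer it allocates falls inside the PCIe core's address expander window.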

So for us the issue is solved. I hope these posts can help someone else who is trying to get PCIe working on their own custom Arria 10 HPS design.




What M.2 SSD were you using? And what file system, ext3 or ext4?

We have been trying to get a similar setup working with a Samsung and a Kingston drive. We initially had issues with the Samsung drive, which we believe were down to compatibility problems (there are various reports of this for some Samsung SSDs).

The Kingston, however, seemed to work OK; unlike the Samsung it did not generate NVMe or ext driver errors. But when we did more extensive testing, writing multiple random data files to the drive and then reading them back to verify the checksums, we saw significant data errors yet no kernel errors from the FS or NVMe drivers. We got to a point where we could write random files, checksum them, copy the files, checksum those, and the results were different. Sometimes just checksumming the same file again produced a different result.
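For anyone wanting to try the same kind of check, the write/copy/verify procedure described above can be sketched as a small shell script (the mount point, file count and sizes here are placeholders, not our exact test parameters):

```shell
#!/bin/sh
# Data-integrity smoke test: write random files, record checksums,
# copy one file, then re-verify everything.
# MOUNTPOINT is a placeholder - point it at the NVMe filesystem.
MOUNTPOINT=${MOUNTPOINT:-$(mktemp -d)}
DIR="$MOUNTPOINT/nvme-check"
mkdir -p "$DIR"

for i in 1 2 3; do
    dd if=/dev/urandom of="$DIR/f$i" bs=1M count=16 2>/dev/null
done
sync

( cd "$DIR" && md5sum f1 f2 f3 > sums.md5 )
cp "$DIR/f1" "$DIR/f1.copy"
sync

# Drop the page cache so the re-read comes from the device
# (needs root; harmless to skip otherwise).
( echo 3 > /proc/sys/vm/drop_caches ) 2>/dev/null || true

( cd "$DIR" && md5sum -c sums.md5 )
cmp "$DIR/f1" "$DIR/f1.copy" && echo "copy matches original"
```

On healthy hardware every `md5sum -c` line reports OK and the copy compares identical; any mismatch on a repeat run points at corruption below the filesystem, which is what we were seeing.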

We started off with kernel 4.19 and have since moved to 5.4.44; the fundamental data corruption issue persists, although some other driver-level errors did seem to go away.