FPGA only partially accessible in u-boot

I’m working on a u-boot PCIe driver for Cyclone V. I started with the 10 series driver provided by Intel and massaged it to work with Cyclone V.

It is working to a certain extent:

=> pci long
PCI Autoconfig: pci class = 1540
PCI Autoconfig: Found P2P bridge, device 0
PCI Autoconfig: BAR 0, I/O, size=0x4, PCI: Failed autoconfig bar 10

PCI Autoconfig: pci class = 264
PCI AutoConfig: falling through to default class
PCI Autoconfig: BAR 0, Mem, size=0x4000, 
Scanning PCI devices on bus 0

Found PCI device 00.00.00:
  vendor ID =                   0x1172
  device ID =                   0xe000
  command register ID =         0x0006
  status register =             0x0010
  revision ID =                 0x01
  class code =                  0x06 (Bridge device)
  sub class code =              0x04
  programming interface =       0x00
  cache line =                  0x08
  latency time =                0x00
  header type =                 0x01
  BIST =                        0x00
  base address 0 =              0xffffffff
  base address 1 =              0x00000000
  primary bus number =          0x00
  secondary bus number =        0x01
  subordinate bus number =      0x01
  secondary latency timer =     0x00
  IO base =                     0x00
  IO limit =                    0x00
  secondary status =            0x0000
  memory base =                 0x0010
  memory limit =                0x0010
  prefetch memory base =        0x0000
  prefetch memory limit =       0x0000
  prefetch memory base upper =  0x00000000
  prefetch memory limit upper = 0x00000000
  IO base upper 16 bits =       0x0000
  IO limit upper 16 bits =      0x0000
  expansion ROM base address =  0x00000000
  interrupt line =              0x00
  interrupt pin =               0x01
  bridge control =              0x0000
=> pci header 1.0.0
  vendor ID =                   0x15b7
  device ID =                   0x5003
  command register ID =         0x0006
  status register =             0x0010
  revision ID =                 0x01
  class code =                  0x01 (Mass storage controller)
  sub class code =              0x08
  programming interface =       0x02
  cache line =                  0x08
  latency time =                0x00
  header type =                 0x00
  BIST =                        0x00
  base address 0 =              0x00100004
  base address 1 =              0x00000000
  base address 2 =              0x00000000
  base address 3 =              0x00000000
  base address 4 =              0x00000000
  base address 5 =              0x00000000
  cardBus CIS pointer =         0x00000000
  sub system vendor ID =        0x15b7
  sub system ID =               0x5003
  expansion ROM base address =  0x00000000
  interrupt line =              0xff
  interrupt pin =               0x01
  min Grant =                   0x00
  max Latency =                 0x00

However, when the NVMe driver tries to bring up the NVMe block device, the CPU hangs trying to access BAR0 through the Txs port. The system acts as if the h2f interface isn’t enabled, but that doesn’t make sense: both Txs and Cra are mapped on the h2f interface, and Cra is working.

Here’s what it looks like. Yes, I have some debugging messages added to figure out what is happening:

=> nvme scan
Entering nvme_probe, name is: nvme#0 
Entering INIT_LIST_HEAD
Entering readl:
trying to readl from 0xd010001c

It hangs at this point and the watchdog fires a few minutes later. That BAR0 address seems high, though: I think it should be in the first megabyte, not the second. It doesn’t matter, because if I try md.l 0xd000001c in u-boot, the CPU hangs the same way.
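The 0xd010001c address can be reconstructed from the header dump above. BAR0 reads 0x00100004: bits 2:1 = 10b mark a 64-bit memory BAR (base address 1 holds the upper half, zero here), and the address bits place the device at 0x00100000, inside the bridge window given by memory base/limit 0x0010 (0x00100000-0x001fffff), hence the second megabyte. Offset 0x1c in the NVMe register map is CSTS (Controller Status), plausibly the first register the driver reads. A worked sketch, assuming u-boot simply adds the 0xd0000000 H2F window base to the BAR's bus address with no translation; the helper names are illustrative:

```c
#include <stdint.h>

#define NVME_REG_CSTS 0x1cu /* NVMe Controller Status register offset */

/* Address bits of a 32-bit memory BAR; bits 3:0 are flag bits. */
uint32_t bar_base(uint32_t bar)
{
	return bar & ~0xfu;
}

/* Bits 2:1 of a memory BAR encode its type; 10b means 64-bit. */
int bar_is_64bit(uint32_t bar)
{
	return ((bar >> 1) & 0x3u) == 0x2u;
}

/*
 * CPU address of an NVMe register, ASSUMING the H2F window base is simply
 * added to the PCI bus address held in the BAR (no address translation).
 */
uint32_t nvme_reg_cpu_addr(uint32_t window_base, uint32_t bar, uint32_t reg)
{
	return window_base + bar_base(bar) + reg;
}
```

With window_base = 0xd0000000, bar = 0x00100004 and reg = NVME_REG_CSTS this yields 0xd010001c, the address in the debug output.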

I can read fine from 0xd0200000 using md.l, and the PCIe driver can access the Cra and identify PCIe buses 0 and 1, but the Txs interface is not working.

I’d also like to warn people about u-boot: device tree support is very rudimentary in the socfpga branch (I’m not sure about mainline). For the kernel, the dts defines the h2f and h2f_lw buses and puts the PCIe devices under them. That does not work with u-boot. The only way I’ve been able to get anything to work is to specify the full addresses of Txs and Cra in u-boot’s dts file.

I have run the u-boot command ‘bridge enable 0xffffffff’, which enabled the Cra interface.

I have double-checked that I’m producing the BSP info correctly and importing it into u-boot. The exact same hardware works well in Linux, and the memory at 0xd0000000 is accessible with devmem2 there.

Any ideas will be gladly tried.

Hi @BrianM
Glad you got your NVMe SSD working from Linux in this thread and have made progress on porting it to u-boot. This is very interesting: we had hoped to access our PCI-E NVMe disk from u-boot too, but due to the lack of Arria 10 PCI-E support in u-boot we abandoned that idea and went for a Linux bootstrap approach instead.

We did revisit this recently through an Intel support case, as our options were either PCI-E/NVMe support in u-boot or a bootstrap Linux able to kexec into a full Linux from the SSD. Unfortunately neither option is supported, and either would require additional development or porting. Our main concern with pushing PCI-E (and therefore NVMe and Linux file system) support into u-boot was how much larger it would make the image. We have a much tighter flash budget than you do (2MB IIRC from the other thread), so we can’t afford much u-boot bloat. We would also need some additional driver development in u-boot to access custom FPGA fabric cores as part of the boot process.

Out of interest, how much larger is u-boot now that you have PCI-E, NVMe and a file system driver built in? If the Cyclone V PCI-E root port is the same as the Arria 10 one, I would be most interested in your port, as it may solve our issue, assuming u-boot is not too big.

Not entirely sure why your NVMe driver is not able to access the disk at its BAR0. I assume your H2F bridge is at 0xd0000000, hence the 0xd010001c address with 0x00100000 as the BAR? You said you were able to access other regions within the H2F, which suggests it’s not a bridge issue. I know that on the Arria 10 the bridge command only enables or disables bridges that are already marked with init-val = <1>; in the u-boot device tree, so it’s more of an enable mask than an absolute enable. Not sure if that’s the same for the Cyclone V.

Does the PCI-E root port need to set anything up based on the BAR size it reads back? If it does and that went wrong, I suppose that might cause the read failure. The fact that it works from Linux perhaps suggests a setup step that is missing in the u-boot environment but present in Linux.

Thanks for your reply David.

I’m working with the latest socfpga version of u-boot. With the PCIe driver I cobbled together and NVMe support, u-boot is 742KiB, and I’m fairly confident it can be stripped down. It’s not that much bigger than without it. As you said, I have a 2MB QSPI flash I’m trying to get to work.

My solution, assuming I can get u-boot to work with the NVMe, is to compress the FPGA image. Compressing my full 6.7 MiB FPGA image produced a 1.3 MiB file, just a bit too big for the SPI flash. I could install a bigger SPI flash, but I had another idea: I created a Platform Designer version of the FPGA with identical pinouts but none of my RTL. Its rbf compressed with bzip2 to 340 KiB, which will fit in the second half of a 2MB QSPI part without issue. I checked, and u-boot can decompress bzip2. I haven’t enabled it yet, but I can disable other features to make it fit if necessary.

I thought about booting Linux directly, but the software guys don’t like the idea of having Linux, its dtb and the FPGA image in the same device as u-boot, because field upgrades would become much riskier. They would prefer to touch the QSPI only in production if at all possible. It’s better all around for the kernel, dtb and FPGA image to be on the bootable device. With a simple, unchanging FPGA image and a static u-boot on the QSPI, the QSPI never needs to change, and the final FPGA image, kernel and dtb can be updated on the NVMe with little issue.

Since I’m having so much trouble with the NVMe, my boss has suggested I add a backup to the next rev of the board. As well as a BGA NVMe made by Swissbit, I’m going to add a BGA eMMC (available from several vendors) and wire it so the SoC can boot directly from eMMC. Then I’ll take over the bus with an FPGA module that runs at a higher data rate, since the eMMC on Cyclone V is so slow. There will have to be some trickery to get that to work, since the eMMC signals have to connect to two places on the SoC. Switching from the internal hardware to the FPGA version will be a bit of a trick, but I think it should be possible to turn the eMMC hard block pins into GPIO inputs and take over the signals from the FPGA on other pins.

As for the NVMe driver deadlocking when accessing the Txs bus: I haven’t figured that out yet. I have deliberately changed how the Txs and Cra are connected, tried several different configurations, and arrived at the smallest memory footprint. In the latest incarnation I put Cra at 0xC0000000, lied to u-boot by saying it was 2MiB large, and put Txs at 0xC0100000, to see whether some u-boot programming was somehow nullifying the BSP. I verified that this works in Linux (with the proper Cra size), but when the NVMe driver accesses the PCIe port through the standard PCI driver, the CPU hangs as soon as it hits the Txs bus (0xC010001C). It’s as if the device is not present, even though I know it should be there because the same FPGA image works in Linux.

My PCIe driver writes the configuration registers exactly the way the Linux driver does, which as far as I can tell is the only difference between the 10 series and the 5 series. The HIP interface is not supported on the 5 series, so the root port configuration is written through the Cra. It is possible that some incompatibility there is leaving the root port misconfigured, but I have no idea how to debug it short of using a PCIe analyzer. I figure that since the NVMe’s config registers are accessible, the programming should be correct, but I know very little about PCIe, so that might be a bad assumption on my part. Perhaps I should look into that.

I’m going to write a script for Linux that dumps all the configuration registers for the FPGA, DRAM, etc., and then compare the values in u-boot using md.l. I’ve already done that for the hps2fpga and DRAM interfaces and they matched perfectly; perhaps some other difference explains why the Txs is not accessible. I’ll probably also look at the Txs implementation RTL to see if I can figure out what it’s supposed to do. This seems like a waste of time, because I believe the BSP sets up all these registers and they are never touched once the DRAM is initialized, so I’ll probably do it last.
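A minimal sketch of the Linux side of that comparison, assuming /dev/mem is available (as it is for devmem2), that the base address is page aligned, and that the md.l output from u-boot is pasted into a second array by hand. read_phys_regs() and first_mismatch() are illustrative names:

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read n 32-bit registers starting at physical address phys via /dev/mem
 * (Linux, root required, phys page aligned). Returns 0 on success.
 */
int read_phys_regs(off_t phys, uint32_t *out, size_t n)
{
	int fd = open("/dev/mem", O_RDONLY | O_SYNC);
	if (fd < 0)
		return -1;

	size_t len = n * sizeof(uint32_t);
	void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, phys);
	close(fd); /* the mapping stays valid after the descriptor is closed */
	if (map == MAP_FAILED)
		return -1;

	/* volatile: every element is a real 32-bit bus read, like md.l */
	const volatile uint32_t *regs = map;
	for (size_t i = 0; i < n; i++)
		out[i] = regs[i];

	munmap(map, len);
	return 0;
}

/* Index of the first differing word between two snapshots, or n if equal. */
size_t first_mismatch(const uint32_t *a, const uint32_t *b, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (a[i] != b[i])
			break;
	return i;
}
```

Dumping the same base address and word count that md.l is given keeps the two snapshots aligned word for word.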

This is taking way too much time, at some point I’ll run out of time and have to toss the idea. Then I’ll devote myself to the eMMC solution. I have to turn the board to test that so I’ll be in a rush to validate the DDR interface and my hardware.

Do you want to try my driver? u-boot does not support interrupts, so the MSI code doesn’t have to be translated. I started with the 10 series u-boot driver and hacked it backwards, comparing it to the Linux driver, which supports both the 5 and 10 series. That is how I got it to recognize the root port and the NVMe target.

The only significant change was how the config is written for the root port. For the 10 series, you can write directly through the HIP interface into memory mapped registers. For the 5 series all configuration access happens through the TLP interface.
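To make the Cra path concrete, here is a standalone sketch that packs the same three header dwords the driver’s TLP_CFGRD_DW0, TLP_CFG_DW1 and TLP_CFG_DW2 macros build for a configuration read, using Type 0 for the root port’s own bus and Type 1 for buses behind it, as the driver does. build_cfg_read_hdr() is an illustrative name, not driver code:

```c
#include <stdint.h>

#define CFGRD0 0x04     /* Configuration Read Type 0 (same as TLP_FMTTYPE_CFGRD0) */
#define CFGRD1 0x05     /* Configuration Read Type 1 (same as TLP_FMTTYPE_CFGRD1) */
#define PAYLOAD_DW 0x01 /* length field: one dword */

/*
 * Build the three header dwords of a configuration read request.
 * root_bus is the root port's bus number, used as the requester ID
 * (with devfn 0, like RP_DEVFN in the driver).
 */
void build_cfg_read_hdr(uint32_t hdr[3], uint8_t root_bus, uint8_t bus,
			uint8_t dev, uint8_t fn, uint16_t offset,
			uint8_t tag, uint8_t byte_en)
{
	uint8_t fmttype = (bus > root_bus) ? CFGRD1 : CFGRD0;
	uint16_t req_id = (uint16_t)(root_bus << 8);

	hdr[0] = ((uint32_t)fmttype << 24) | PAYLOAD_DW;
	hdr[1] = ((uint32_t)req_id << 16) | ((uint32_t)tag << 8) | byte_en;
	/* bus[31:24] dev[23:19] fn[18:16] register[11:2] */
	hdr[2] = ((uint32_t)bus << 24) | ((uint32_t)(dev & 0x1f) << 19) |
		 ((uint32_t)(fn & 0x7) << 16) | (offset & 0xffcu);
}
```

For example, reading the NVMe’s BAR0 (bus 1, device 0, function 0, offset 0x10, tag 0x1d, all byte enables) yields 0x05000001, 0x00001d0f and 0x01000010, which are the values to look for when tracing writes to the Cra TX registers.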

I could have hacked the existing 10 series driver to support the 5 series, but opted instead to create a simple 5 series driver; that way, if Intel makes any changes to the 10 series driver, they won’t break my modifications. It’s not like the 5 series driver is going to need updating once it works.

Here’s the driver, if you want to try it out. You have to add it to the Makefile and Kconfig; look at the Intel 10 series driver entries for examples.

// SPDX-License-Identifier: GPL-2.0
/*
 * Altera FPGA PCIe host controller driver
 *
 * Portions Copyright (C) 2013-2018 Intel Corporation. All rights reserved
 * Portions Copyright Altera Corporation (C) 2013-2015. All rights reserved
 *
 * Portions written by: Ley Foon Tan <lftan@altera.com>
 *
 */

//#define DEBUG
//#undef CONFIG_LOGLEVEL
//#define CONFIG_LOGLEVEL 8

#include <common.h>
#include <dm.h>
#include <pci.h>
#include <asm/io.h>
#include <dm/device_compat.h>
#include <linux/bitops.h>
#include <linux/delay.h>

#define RP_TX_REG0			0x2000
#define RP_TX_REG1			0x2004
#define RP_TX_CNTRL			0x2008
#define RP_TX_SOP			BIT(0)
#define RP_TX_EOP			BIT(1)
#define RP_RXCPL_STATUS			0x2010
#define RP_RXCPL_SOP			BIT(0)
#define RP_RXCPL_EOP			BIT(1)
#define RP_RXCPL_REG0			0x2014
#define RP_RXCPL_REG1			0x2018
#define P2A_INT_STATUS			0x3060
#define P2A_INT_STS_ALL			0xf
#define P2A_INT_ENABLE			0x3070
#define RP_LTSSM			0x3c64
#define RP_LTSSM_MASK			0x1f
#define LTSSM_L0			0xf

/* TLP configuration type 0 and 1 */
#define TLP_FMTTYPE_CFGRD0		0x04	/* Configuration Read Type 0 */
#define TLP_FMTTYPE_CFGWR0		0x44	/* Configuration Write Type 0 */
#define TLP_FMTTYPE_CFGRD1		0x05	/* Configuration Read Type 1 */
#define TLP_FMTTYPE_CFGWR1		0x45	/* Configuration Write Type 1 */
#define TLP_PAYLOAD_SIZE		0x01
#define TLP_READ_TAG			0x1d
#define TLP_WRITE_TAG			0x10
#define RP_DEVFN			0

#define TLP_REQ_ID(bus, devfn)		(((bus) << 8) | (devfn))

#define TLP_CFGRD_DW0(pcie, bus)					\
	((((bus > pcie->first_busno) ? TLP_FMTTYPE_CFGRD1		\
				     : TLP_FMTTYPE_CFGRD0) << 24) |	\
				       TLP_PAYLOAD_SIZE)

#define TLP_CFGWR_DW0(pcie, bus)					\
	((((bus > pcie->first_busno) ? TLP_FMTTYPE_CFGWR1		\
				     : TLP_FMTTYPE_CFGWR0) << 24) |	\
				       TLP_PAYLOAD_SIZE)

#define TLP_CFG_DW1(pcie, tag, be)					\
	(((TLP_REQ_ID(pcie->first_busno,  RP_DEVFN)) << 16) | (tag << 8) | (be))
#define TLP_CFG_DW2(bus, dev, fn, offset)				\
	(((bus) << 24) | ((dev) << 19) | ((fn) << 16) | (offset))

#define TLP_COMP_STATUS(s)		(((s) >> 13) & 7)
#define TLP_BYTE_COUNT(s)		(((s) >> 0) & 0xfff)
#define TLP_HDR_SIZE			3
#define TLP_LOOP			20000
#define DWORD_MASK			3

#define IS_ROOT_PORT(pcie, bdf)				\
		((PCI_BUS(bdf) == pcie->first_busno) ? true : false)

/**
 * struct altera_fpga_pcie - Altera FPGA PCIe controller state
 * @bus: Pointer to the PCI controller device
 * @cra_base: The base address of CRA register space
 * @first_busno: Bus number of the root port. This driver supports
 *               only a single PCIe controller.
 */
struct altera_fpga_pcie {
	struct udevice *bus;
	void __iomem *cra_base;
	int first_busno;
};

static u8 pcie_get_byte_en(uint offset, enum pci_size_t size)
{
	switch (size) {
	case PCI_SIZE_8:
		return 1 << (offset & 3);
	case PCI_SIZE_16:
		return 3 << (offset & 3);
	default:
		return 0xf;
	}
}

/**
 * Altera FPGA PCIe port uses BAR0 of RC's configuration space as the
 * translation from PCI bus to native BUS. Entire DDR region is mapped
 * into PCIe space using these registers, so it can be reached by DMA from
 * EP devices.
 * The BAR0 of bridge should be hidden during enumeration to avoid the
 * sizing and resource allocation by PCIe core.
 */
static bool altera_fpga_pcie_hide_rc_bar(struct altera_fpga_pcie *pcie,
					pci_dev_t bdf, int offset)
{
	if (IS_ROOT_PORT(pcie, bdf) && PCI_DEV(bdf) == 0 &&
	    PCI_FUNC(bdf) == 0 && offset == PCI_BASE_ADDRESS_0)
		return true;

	return false;
}

static inline void cra_writel(struct altera_fpga_pcie *pcie, const u32 value,
			      const u32 reg)
{
	writel(value, pcie->cra_base + reg);
}

static inline u32 cra_readl(struct altera_fpga_pcie *pcie, const u32 reg)
{
	return readl(pcie->cra_base + reg);
}

static bool altera_fpga_pcie_link_up(struct altera_fpga_pcie *pcie)
{
	return !!((cra_readl(pcie, RP_LTSSM) & RP_LTSSM_MASK) == LTSSM_L0);
}

static bool altera_pcie_valid_device(struct altera_fpga_pcie *pcie,
				       pci_dev_t bdf)
{
	if (!IS_ROOT_PORT(pcie, bdf) && !altera_fpga_pcie_link_up(pcie)) {
		return false;
	}

	if (IS_ROOT_PORT(pcie, bdf) && PCI_DEV(bdf) > 0)
		return false;

	if ((PCI_BUS(bdf) == pcie->first_busno + 1) && PCI_DEV(bdf) > 0)
		return false;

	return true;
}

static void tlp_write_tx(struct altera_fpga_pcie *pcie, u32 reg0, u32 reg1, u32 ctrl)
{
	cra_writel(pcie, reg0, RP_TX_REG0);
	cra_writel(pcie, reg1, RP_TX_REG1);
	cra_writel(pcie, ctrl, RP_TX_CNTRL);
}

static int tlp_read_packet(struct altera_fpga_pcie *pcie, u32 *value)
{
	int i;
	bool sop = false;
	u32 ctrl;
	u32 reg0, reg1;
	u32 comp_status = 1;

	/*
	 * Minimum 2 loops to read TLP headers and 1 loop to read data
	 * payload.
	 */

	for (i = 0; i < TLP_LOOP; i++) {
		ctrl = cra_readl(pcie, RP_RXCPL_STATUS);
		if ((ctrl & RP_RXCPL_SOP) || (ctrl & RP_RXCPL_EOP) || sop) {
			reg0 = cra_readl(pcie, RP_RXCPL_REG0);
			reg1 = cra_readl(pcie, RP_RXCPL_REG1);

			if (ctrl & RP_RXCPL_SOP) {
				sop = true;
				comp_status = TLP_COMP_STATUS(reg1);
			}

			if (ctrl & RP_RXCPL_EOP) {
				if (comp_status)
					return -ENODEV;

				if (value)
					*value = reg0;

				return 0;
			}
		}
		udelay(5);
	}

	/* struct altera_fpga_pcie has no 'dev' member; use the controller */
	dev_err(pcie->bus, "read TLP packet timed out\n");
	return -ENODEV;
}

/*
 * The Cyclone V TX registers take the TLP as 64-bit beats (REG0/REG1).
 * Mirroring the Linux pcie-altera driver: when the target config address
 * is QWORD aligned, the data dword starts a new beat instead of sharing
 * the beat that carries the third header dword.
 */
static void tlp_write_packet(struct altera_fpga_pcie *pcie, u32 *headers,
			     u32 data, bool align)
{
	tlp_write_tx(pcie, headers[0], headers[1], RP_TX_SOP);

	if (align) {
		tlp_write_tx(pcie, headers[2], 0, 0);
		tlp_write_tx(pcie, data, 0, RP_TX_EOP);
	} else {
		tlp_write_tx(pcie, headers[2], data, RP_TX_EOP);
	}
}

static int tlp_cfg_dword_read(struct altera_fpga_pcie *pcie, pci_dev_t bdf,
			      int offset, u8 byte_en, u32 *value)
{
	u32 headers[TLP_HDR_SIZE];
	u8 busno = PCI_BUS(bdf);

	dev_dbg(pcie->bus, "entering tlp_cfg_dword_read, busno = %d\n", busno);

	headers[0] = TLP_CFGRD_DW0(pcie, busno);
	headers[1] = TLP_CFG_DW1(pcie, TLP_READ_TAG, byte_en);
	headers[2] = TLP_CFG_DW2(busno, PCI_DEV(bdf), PCI_FUNC(bdf), offset);

	/* Reads carry no payload, so they never need the aligned layout */
	tlp_write_packet(pcie, headers, 0, false);

	return tlp_read_packet(pcie, value);
}

static int tlp_cfg_dword_write(struct altera_fpga_pcie *pcie, pci_dev_t bdf,
			       int offset, u8 byte_en, u32 value)
{
	u32 headers[TLP_HDR_SIZE];
	u8 busno = PCI_BUS(bdf);

	dev_dbg(pcie->bus, "entering tlp_cfg_dword_write, busno = %d\n", busno);

	headers[0] = TLP_CFGWR_DW0(pcie, busno);
	headers[1] = TLP_CFG_DW1(pcie, TLP_WRITE_TAG, byte_en);
	headers[2] = TLP_CFG_DW2(busno, PCI_DEV(bdf), PCI_FUNC(bdf), offset);

	/* Check alignment to QWORD, as the Linux driver does */
	tlp_write_packet(pcie, headers, value, (offset & 0x7) == 0);

	return tlp_read_packet(pcie, NULL);
}

static int _pcie_altera_fpga_read_config(struct altera_fpga_pcie *pcie,
					pci_dev_t bdf, uint offset,
					ulong *valuep, enum pci_size_t size)
{
	int ret;
	u32 data;
	u8 byte_en;

	byte_en = pcie_get_byte_en(offset, size);

	ret = tlp_cfg_dword_read(pcie, bdf, (offset & ~DWORD_MASK),
				 byte_en, &data);
	if (ret)
		return ret;

	dev_dbg(pcie->bus, "(addr,size,val)=(0x%04x, %d, 0x%08x)\n",
		offset, size, data);
	*valuep = pci_conv_32_to_size(data, offset, size);

	return 0;
}

static int pcie_altera_fpga_read_config(const struct udevice *bus, pci_dev_t bdf,
				       uint offset, ulong *valuep,
				       enum pci_size_t size)
{
	struct altera_fpga_pcie *pcie = dev_get_priv(bus);

	dev_dbg(pcie->bus, "PCIE CFG read:  (b.d.f)=(%02d.%02d.%02d)\n",
		PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf));

	if (altera_fpga_pcie_hide_rc_bar(pcie, bdf, offset)) {
		*valuep = (u32)pci_get_ff(size);
		return 0;
	}

	if (!altera_pcie_valid_device(pcie, bdf)) {
		*valuep = (u32)pci_get_ff(size);
		return 0;
	}

	return _pcie_altera_fpga_read_config(pcie, bdf, offset, valuep, size);
}

static int _pcie_altera_fpga_write_config(struct altera_fpga_pcie *pcie,
					 pci_dev_t bdf, uint offset,
					 ulong value, enum pci_size_t size)
{
	u32 data;
	u8 byte_en;

	dev_dbg(pcie->bus, "PCIE CFG write: (b.d.f)=(%02d.%02d.%02d)\n",
		PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf));
	dev_dbg(pcie->bus, "Write(addr,size,val)=(0x%04x, %d, 0x%08lx)\n",
		offset, size, value);

	byte_en = pcie_get_byte_en(offset, size);
	data = pci_conv_size_to_32(0, value, offset, size);

	return tlp_cfg_dword_write(pcie, bdf, offset & ~DWORD_MASK,
				   byte_en, data);
}

static int pcie_altera_fpga_write_config(struct udevice *bus, pci_dev_t bdf,
					uint offset, ulong value,
					enum pci_size_t size)
{
	struct altera_fpga_pcie *pcie = dev_get_priv(bus);

	if (altera_fpga_pcie_hide_rc_bar(pcie, bdf, offset))
		return 0;

	if (!altera_pcie_valid_device(pcie, bdf))
		return 0;

	return _pcie_altera_fpga_write_config(pcie, bdf, offset, value,
					  size);
}

static int pcie_altera_fpga_probe(struct udevice *dev)
{
	struct altera_fpga_pcie *pcie = dev_get_priv(dev);

	pcie->bus = pci_get_controller(dev);
	pcie->first_busno = dev->seq;

	/* clear all interrupts */
	cra_writel(pcie, P2A_INT_STS_ALL, P2A_INT_STATUS);
	/* disable all interrupts */
	cra_writel(pcie, 0, P2A_INT_ENABLE);

	return 0;
}

static int pcie_altera_fpga_ofdata_to_platdata(struct udevice *dev)
{
	struct altera_fpga_pcie *pcie = dev_get_priv(dev);
	struct fdt_resource reg_res;
	int node = dev_of_offset(dev);
	int ret;

	DECLARE_GLOBAL_DATA_PTR;

	ret = fdt_get_named_resource(gd->fdt_blob, node, "reg", "reg-names",
				     "Cra", &reg_res);
	if (ret) {
		dev_err(dev, "resource \"Cra\" not found\n");
		return ret;
	}

	dev_dbg(dev, "Mapping Cra to 0x%08lx, size = 0x%08lx\n",
		(ulong)reg_res.start, (ulong)fdt_resource_size(&reg_res));

	pcie->cra_base = map_physmem(reg_res.start,
				     fdt_resource_size(&reg_res),
				     MAP_NOCACHE);

	return 0;
}

static const struct dm_pci_ops pcie_altera_fpga_ops = {
	.read_config	= pcie_altera_fpga_read_config,
	.write_config	= pcie_altera_fpga_write_config,
};

static const struct udevice_id pcie_altera_fpga_ids[] = {
	{ .compatible = "altr,pcie-root-port-1.0" },
	{},
};

U_BOOT_DRIVER(pcie_altera_fpga) = {
	.name			= "pcie_altera_fpga",
	.id			= UCLASS_PCI,
	.of_match		= pcie_altera_fpga_ids,
	.ops			= &pcie_altera_fpga_ops,
	.ofdata_to_platdata	= pcie_altera_fpga_ofdata_to_platdata,
	.probe			= pcie_altera_fpga_probe,
	.priv_auto_alloc_size	= sizeof(struct altera_fpga_pcie),
};

Here is a u-boot dts example. I put this in the soc section of socfpga.dtsi. Please note: u-boot does not grok hierarchical bus structures at all. You can include the hps2fpga and lwhps2fpga interfaces, but u-boot will not combine their base addresses with the addresses of the nodes under them.

			pcie: pcie@C0000000 {
				compatible = "altr,pcie-root-port-1.0";
				reg = <0xC0100000 0x00100000>,
				      <0xC0000000 0x00200000>; // Attempted cheat.
//				      <0xC0000000 0x00004000>;
				reg-names = "Txs", "Cra";
				interrupt-parent = <&intc>;
				interrupts = <0 40 4>; // 40 is the interrupt, add 32 to relate to the GIC document
//				interrupt-controller;  // 4 is IRQ_TYPE_LEVEL_HIGH, 1 is IRQ_TYPE_EDGE_RISING
				#address-cells = <2>;
				#size-cells = <1>;
				#interrupt-cells = <1>;
				#clock-cells = <0>;
				clocks = <&clk_100>;
				device_type = "pci";
				bus-range  = <0x00 0xff>;
				ranges = <0x82000000 0x00000000 0xC0100000 0x00100000>;
				msi-parent = <&msi>;
				interrupt-map-mask = <0 0 0 7>;
				interrupt-map = <0 0 0 1 &pcie 1>,
						<0 0 0 2 &pcie 2>,
						<0 0 0 3 &pcie 3>,
						<0 0 0 4 &pcie 4>;
//				status = "disabled"; // Disabled by default
			}; //end pcie@C0000000

			msi: msi@C0200080 {
				compatible = "altr,msi-1.0";
				reg = <0xC0200080 0x00000010>,
				      <0xC0200000 0x00000080>;
				reg-names = "csr", "vector_slave";
				interrupt-parent = <&intc>;
				interrupts = <0 41 4>;
				clocks = <&pll_65>;
				msi-controller = <1>;
				num-vectors = <32>;
//				status = "disabled"; // Disabled by default
			}; //end msi@C0200080

I think I found a problem. I’m not positive it’s “the” problem, but I think the PCI auto config is failing to initialize the root port properly. I added some more debugging information and fixed a few of the existing debug statements and can see that the root port memory is not being defined correctly:

=> pci
PCI Autoconfig: Bus Memory region: [c0000000-c00fffff],
                Physical Memory [c0000000-c00fffffx]
PCI Autoconfig: pci class = 1540
PCI Autoconfig: Found P2P bridge, device 0
PCI Auto: bars_num = 2, enum_only = 0
PCI Autoconfig: BAR 0, I/O, size=0x4, No resource
PCI: Failed autoconfig bar 0

BAR 1 not implemented.
PCI Autoconfig: pci class = 264
PCI AutoConfig: falling through to default class
PCI Auto: bars_num = 6, enum_only = 0
PCI Autoconfig: BAR 0, Mem, size=0x4000, address=0xc0000000 bus_lower=0xc0004000
PCI Auto: 32 bit PCI configured!

BAR 1 not implemented.
BAR 1 not implemented.
BAR 1 not implemented.
BAR 1 not implemented.
Scanning PCI devices on bus 0
BusDevFun  VendorId   DeviceId   Device Class       Sub-Class
_____________________________________________________________
00.00.00   0x1172     0xe000     Bridge device           0x04

I think the root port’s BAR0 initialization is failing, and the “No Resource” message indicates that. The memory BAR is getting set up, though, and both devices are being recognized. The class codes are printed in decimal: 1540 is 0x0604 (PCI-to-PCI bridge) and 264 is 0x0108 (mass storage controller, Non-Volatile Memory subclass). pci_auto.c has no special case for 0x0108, so it takes the default path through the case statement. What I can’t figure out is why the 1540 device’s BAR0 comes out as a 4-byte I/O BAR while the 264 device’s BAR gets the correct size.
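A likely explanation for the 4-byte I/O BAR: the driver’s altera_fpga_pcie_hide_rc_bar() makes every read of the root port’s BAR0 return all ones, including the read-back that autoconfig does after writing all ones to size the BAR. Decoding 0xffffffff in the standard way gives exactly what the log shows: bit 0 set means I/O space, and the lowest writable address bit is bit 2, so the size is 4. A sketch of that standard decoding (not u-boot’s actual pci_auto code; the names are illustrative):

```c
#include <stdint.h>

/* Decoded result of the read-back that follows writing all ones to a BAR. */
struct bar_probe {
	int is_io;      /* bit 0 set: I/O space BAR */
	uint32_t size;  /* size in bytes (32-bit view), 0 if not implemented */
};

struct bar_probe decode_bar_sizing(uint32_t readback)
{
	struct bar_probe p;
	uint32_t bits;

	p.is_io = (int)(readback & 0x1u);
	/* Strip the flag bits: 2 for I/O BARs, 4 for memory BARs */
	bits = readback & (p.is_io ? ~0x3u : ~0xfu);
	/* The lowest address bit that reads back as 1 gives the size */
	p.size = bits ? (~bits + 1u) : 0u;
	return p;
}
```

For comparison, a 16 KiB 64-bit memory BAR such as the NVMe’s would read back 0xffffc004 and decode as memory with size 0x4000. If this reading is right, the failed I/O BAR on the root port is a side effect of hiding BAR0 rather than the cause of the Txs hang.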

I’m still looking into it.

There’s definitely a difference in how linux sets up the device vs u-boot:

Linux:

root@cyclone5:~# lspci -x
00:00.0 PCI bridge: Altera Corporation Device e000 (rev 01)
00: 72 11 00 e0 46 01 10 00 01 00 04 06 10 00 01 00
10: ff ff ff ff 00 00 00 00 00 01 01 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 00 01 03 00

01:00.0 Non-Volatile memory controller: Sandisk Corp WD Black 2018/PC SN520 NVMe SSD (rev 01)
00: b7 15 03 50 46 05 10 00 01 02 08 01 10 00 00 00
10: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 b7 15 03 50
30: 00 00 00 00 80 00 00 00 00 00 00 00 36 01 00 00

U-boot:

=> pci display.b 0.0.0
00000000: 72 11 00 e0 06 00 10 00 01 00 04 06 08 00 01 00
00000010: ff 00 00 00 00 00 00 00 00 01 01 00 00 00 00 00
00000020: 00 c0 00 c0 00 00 00 00 00 00 00 00 00 00 00 00
00000030: 00 00 00 00 50 00 00 00 00 00 00 00 00 01 00 00
=> pci display.b 1.0.0
00000000: b7 15 03 50 06 00 10 00 01 02 08 01 08 00 00 00
00000010: 04 00 00 c0 00 00 00 00 00 00 00 00 00 00 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 b7 15 03 50
00000030: 00 00 00 00 80 00 00 00 00 00 00 00 00 01 00 00
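Decoding the bytes that differ makes the mismatch concrete. The bridge’s Memory Base/Limit registers (offsets 0x20 and 0x22; bits 15:4 hold address bits 31:20) give a window of 0x00000000-0x000fffff under Linux but 0xc0000000-0xc00fffff under u-boot, and the NVMe’s BAR0 (offset 0x10) holds 0x00000004 under Linux but 0xc0000004 under u-boot. In other words, Linux assigns PCI bus addresses starting at 0, while u-boot uses the CPU-side address as the bus address. A small decoder for those fields, with illustrative helper names:

```c
#include <stdint.h>

/* Little-endian helpers for bytes copied from a config-space dump. */
uint16_t cfg_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t cfg_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Type 1 header Memory Base/Limit (offsets 0x20/0x22): bits 15:4 hold
 * address bits 31:20; the limit's low 20 bits are implied all ones.
 */
uint32_t bridge_mem_base(uint16_t reg)
{
	return (uint32_t)(reg & 0xfff0u) << 16;
}

uint32_t bridge_mem_limit(uint16_t reg)
{
	return ((uint32_t)(reg & 0xfff0u) << 16) | 0xfffffu;
}
```

Feeding it the bytes at offset 0x20 of each bridge dump and offset 0x10 of each NVMe dump reproduces the windows and BAR values listed above.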

Just a little update.

I’ve been learning a lot about the ranges device tree entry.

For example, for the Txs of the PCIe controller in u-boot, the following ranges entry is probably correct:

ranges = <0x83000000 0x0 0x00000000 0x0 0xC0000000 0x0 0x00100000>;
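Decoded per the device tree PCI bus binding, that entry is a flags cell (phys.hi), a two-cell PCI address, a two-cell CPU address and a two-cell size, so it maps a 1 MiB window at PCI address 0 onto CPU address 0xC0000000. The space code in bits 25:24 of the flags distinguishes I/O (01b), 32-bit memory (10b) and 64-bit memory (11b), so 0x83000000 declares 64-bit memory while the 0x82000000 used elsewhere in this thread declares 32-bit memory. A sketch, assuming the parent node has #address-cells = <2>; struct pci_range and the helpers are illustrative names:

```c
#include <stdint.h>

/*
 * One PCI "ranges" entry: child #address-cells = <3>, parent
 * #address-cells = <2>, child #size-cells = <2> (7 cells total).
 */
struct pci_range {
	uint32_t flags;     /* phys.hi: npt000ss bbbbbbbb dddddfff rrrrrrrr */
	uint64_t pci_addr;  /* phys.mid:phys.lo */
	uint64_t cpu_addr;  /* parent (CPU-side) address */
	uint64_t size;
};

struct pci_range parse_pci_range(const uint32_t c[7])
{
	struct pci_range r;

	r.flags    = c[0];
	r.pci_addr = ((uint64_t)c[1] << 32) | c[2];
	r.cpu_addr = ((uint64_t)c[3] << 32) | c[4];
	r.size     = ((uint64_t)c[5] << 32) | c[6];
	return r;
}

/* Space code, bits 25:24: 1 = I/O, 2 = 32-bit memory, 3 = 64-bit memory. */
unsigned space_code(uint32_t flags)
{
	return (flags >> 24) & 0x3u;
}

/* The 'p' bit (bit 30) marks the region as prefetchable. */
int is_prefetchable(uint32_t flags)
{
	return (int)((flags >> 30) & 0x1u);
}
```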

There is a fundamental problem with u-boot: it does not grok a PCI local address that differs from the CPU address, and a lot of coding would be needed to make it work.

I have hacked the pci-auto-common code to program the root port correctly. That wasn’t too bad, but now the nvme code assumes that the address it reads from the BAR is in CPU space, when in this case it clearly is not.
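The missing step can be sketched as a translation from the PCI bus address in a BAR to the CPU address the driver must dereference, assuming the Txs window from the ranges entry above (PCI 0x00000000 mapped to CPU 0xC0000000, 1 MiB). struct window and bus_to_cpu() are hypothetical names. u-boot’s PCI uclass does keep bus and physical addresses separately in its regions (see dm_pci_bus_to_phys()), so it may be worth checking why they come out equal in the autoconfig log above.

```c
#include <stdint.h>

/* PCI bus addresses [bus_start, bus_start + size) appear to the CPU
 * at [cpu_start, cpu_start + size). */
struct window {
	uint64_t bus_start;
	uint64_t cpu_start;
	uint64_t size;
};

/*
 * Translate a PCI bus address (e.g. a BAR value with its flag bits
 * stripped) into a CPU address. Returns 0 on success, -1 if the address
 * is not covered by the window.
 */
int bus_to_cpu(const struct window *w, uint64_t bus, uint64_t *cpu)
{
	if (bus < w->bus_start || bus - w->bus_start >= w->size)
		return -1;
	*cpu = w->cpu_start + (bus - w->bus_start);
	return 0;
}
```

Under this window, a BAR holding bus address 0x00000000 (as Linux programs it) is reached at 0xC0000000, while a BAR programmed with 0xC0000000 (as in the u-boot dump) lies outside the window and has no CPU mapping at all.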

The big issue is simply that u-boot does not grok how simple-bus works. If it understood the hardware under the simple-bus and how the upper address bits from the simple-bus need to be ORed in, all my problems would go away. Getting it to understand that looks like a very large undertaking, and even hacking it to work in my case is a fair bit of work. I did just that, and the self-checking code rejected my correct address because it was out of bounds.

This is frustrating. But I’ll keep pounding away at it.

Hi @BrianM

You seem to be making some progress, although I get the feeling it’s two steps forward and one back (or maybe one forward and two back). As I read through some of the earlier updates I thought I might have had a solution to your problem, but then you seemed to make some progress in identifying a possible issue with the config, u-boot compared to Linux, so I don’t think my idea has any legs.

Anyway, I was looking at your PCI-E ranges and could not figure them out; possibly I don’t have the context of the other DTS entries. The PCI-E ranges entry in my Linux DTS has 7 fields, with #address-cells and #size-cells set to 3 and 2 respectively. I believe this decodes as the PCI-E address in the first three fields, the processor address in the next two (#address-cells being 2 in the parent bridge) and the range size in the last two. The first PCI-E field is actually a set of flags describing the address range in the next two fields (see reference here).

For Linux DTS files it seems to use 64-bit addresses for the processor even for 32-bit CPUs, #address-cells = <2>. However all my u-boot DTS entries only define 32-bit address fields, #address-cells = <1> for the soc and various busses up to the bridge. Therefore should your u-boot PCI-E DTS ranges entry be only 6 fields and not 7?

This may be a red herring, as I’m also learning about device tree entry formats (ranges as well as others), so I may well have got this wrong, but it might be worth a check. It could, however, be as you suspect and u-boot just does not handle the simple-bus properly. I seem to remember having to debug through the u-boot code for some issue, can’t remember what now, and stepping through the bus range handling code.
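The cell-count argument, written out: a ranges entry is the child’s #address-cells, plus the parent’s #address-cells, plus the child’s #size-cells. With a 3-cell PCI child address and 2 size cells, that is 7 cells under a parent with #address-cells = <2> and 6 under one with #address-cells = <1>. A small sketch (hypothetical helper names) that also shows how a parser checking the property length would flag the mismatch:

```c
#include <stddef.h>

/* Cells in one "ranges" entry. */
size_t ranges_entry_cells(size_t child_addr_cells, size_t parent_addr_cells,
			  size_t child_size_cells)
{
	return child_addr_cells + parent_addr_cells + child_size_cells;
}

/*
 * Number of entries in a ranges property of prop_len bytes, or -1 if the
 * length is not a whole number of entries (a sign the cell counts are wrong).
 */
long ranges_entry_count(size_t prop_len, size_t cells_per_entry)
{
	size_t entry_bytes = cells_per_entry * 4; /* each cell is 32 bits */

	if (entry_bytes == 0 || prop_len % entry_bytes != 0)
		return -1;
	return (long)(prop_len / entry_bytes);
}
```

A single 7-cell entry is 28 bytes; read with a 6-cell layout it leaves a 4-byte remainder, so it cannot be a whole number of entries.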

From a previous message you said your u-boot with the PCI-E driver and NVMe was about 742KiB; that’s unfortunately way over my u-boot budget, which is around half that or less, as it’s stored in BlockRAMs in the fabric. I think my u-boot is currently around 280-290KiB and the FW guys would ideally like me to reduce that.

Happy debugging.

Cheers

The proper DTS entries for linux look like this:

		bridges: bridge@c0000000 {
			compatible = "altr,bridge-20.1", "simple-bus";
			reg = <0xc0000000 0x20000000>,
			      <0xff200000 0x00200000>;
			reg-names = "axi_h2f", "axi_h2f_lw";
			clocks = <&pcie &pll_65 &pcie &pll_65>;
			clock-names = "h2f_axi_clock", "h2f_lw_axi_clock", "f2h_sdram0_clock", "f2h_sdram1_clock";
			#address-cells = <2>;
			#size-cells = <1>;
			ranges = <0x00000000 0x00200080 0xc0200080 0x00000010>,
				 <0x00000000 0x00200000 0xc0200000 0x00000080>,
				 <0x00000000 0x00000000 0xc0000000 0x00100000>,
				 <0x00000000 0x00100000 0xc0100000 0x00004000>,
				 <0x00000001 0x00000000 0xff200000 0x00100000>;

			pcie: pcie@000100000 { // The @000100000 signifies nothing!
				compatible = "altr,pcie-root-port-20.1", "altr,pcie-root-port-1.0";
				reg = <0x00000000 0x00000000 0x00100000>,
				      <0x00000000 0x00100000 0x00004000>;
				reg-names = "Txs", "Cra";
				interrupt-parent = <&intc>;
				interrupts = <0 40 4>; // 40 is the interrupt, add 32 to relate to the GIC document
				interrupt-controller;  // 4 is IRQ_TYPE_LEVEL_HIGH, 1 is IRQ_TYPE_EDGE_RISING
				#address-cells = <3>;
				#size-cells = <2>;
				#interrupt-cells = <1>;
				#clock-cells = <0>;
				clocks = <&clk_100>;
				device_type = "pci";
				bus-range = <0x00 0xff>;
				ranges = <0x82000000 0x0 0x00000000 0x0 0x00000000 0x0 0x00100000>;
				msi-parent = <&msi>;
				interrupt-map-mask = <0 0 0 7>;
				interrupt-map = <0 0 0 1 &pcie 1>,
						<0 0 0 2 &pcie 2>,
						<0 0 0 3 &pcie 3>,
						<0 0 0 4 &pcie 4>;
				status = "disabled"; // Disabled by default
			}; //end pcie@000100000

			msi: msi@000200080 {
				compatible = "altr,msi-1.0";
				reg = <0x00000000 0x00200080 0x00000010>,
				      <0x00000000 0x00200000 0x00000080>;
				reg-names = "csr", "vector_slave";
				interrupt-parent = <&intc>;
				interrupts = <0 41 4>;
				clocks = <&pll_65>;
				msi-controller = <1>;
				num-vectors = <32>;
				status = "disabled"; // Disabled by default
			}; //end msi@000200080

		}; //end bridge@c0000000 (hps_0_bridges)

u-boot does not seem capable of processing the dtb generated by this. There is an advanced ranges capability in u-boot, see options: SPL_OF_TRANSLATE, and OF_TRANSLATE. The latter is already on, but when I turn on the former, it hangs on boot up, even though I can get it to fit in SPL. It therefore seems to me that this may not work unless the device tree is processed twice: once simply, for SPL, and once for the main code. I’ll look for that code today. If I can force OF_TRANSLATE once u-boot is up and running, then maybe the device tree can function properly. I’m still trying to understand how the OF data is stored and used. I have noticed that much of the OF code comes from older Linux versions (if memory serves, 4.1.9); it might also need updating to deal with the hierarchical nature of Cyclone V. I’ve put in a help request on the u-boot mailing list, but I don’t think it got past the moderator.
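To make the "hierarchical" part concrete: before any PCI-specific handling, the bridge node's ranges already remap a 2-cell child address (window selector, offset) to a CPU address. The sketch below is a minimal standalone illustration of one such translation step, not u-boot's OF_TRANSLATE code. The struct and function names are hypothetical, the table values are copied from the bridge@c0000000 node above, and it assumes the bridge's parent uses one address cell.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of the bridge's "ranges" property: a 2-cell child address
 * (window selector, offset), a 1-cell CPU address and a 1-cell size. */
struct bridge_range {
    uint32_t child_hi;  /* 0 = axi_h2f window, 1 = axi_h2f_lw window */
    uint32_t child_lo;  /* offset inside that window */
    uint32_t cpu_addr;  /* HPS physical address */
    uint32_t size;
};

/* Values copied from the bridge@c0000000 node above. */
static const struct bridge_range bridge_ranges[] = {
    { 0, 0x00200080, 0xc0200080, 0x00000010 },
    { 0, 0x00200000, 0xc0200000, 0x00000080 },
    { 0, 0x00000000, 0xc0000000, 0x00100000 },
    { 0, 0x00100000, 0xc0100000, 0x00004000 },
    { 1, 0x00000000, 0xff200000, 0x00100000 },
};

/* Translate a child address <hi lo> into a CPU address by finding the
 * ranges entry that covers it. Returns 0 on success, -1 if unmapped. */
static int bridge_translate(uint32_t hi, uint32_t lo, uint32_t *cpu)
{
    assert(cpu != NULL);
    for (size_t i = 0; i < sizeof(bridge_ranges) / sizeof(bridge_ranges[0]); i++) {
        const struct bridge_range *r = &bridge_ranges[i];
        if (r->child_hi == hi && lo >= r->child_lo && lo - r->child_lo < r->size) {
            *cpu = r->cpu_addr + (lo - r->child_lo);
            return 0;
        }
    }
    return -1;
}
```

For example, the pcie node's "Cra" reg <0x00000000 0x00100000 0x00004000> resolves to 0xc0100000. A full translator has to repeat this step at every level up to the root, which is roughly the job OF_TRANSLATE takes on.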

I’ve come this far, so I’m unwilling to give up now. For the last day or so I have been trying to hack u-boot into using the PCIe port, but I haven’t been able to trick it yet. The way u-boot callocs memory based on the ranges property has gotten in my way every time.

As for you needing to put u-boot in the FPGA SRAM, you only have the FPGA program flash on your board, right? They are so expensive that I put a QSPI device on there just in case. I’m glad I did.

I did an experiment on a Cyclone V Helio development board to get it to boot from SRAM and was able to fit 1024KB without any of my code. I think it has a bigger FPGA than I have, though. On my board, I think I could only get to around 400KB with my code; without it, probably much more. I gave up on that approach when I found out how much Altera and Micron charge for the BOOTSPI parts (note: Micron is much cheaper).

Arria V is bigger than Cyclone V, right? I haven’t looked. I only know that Arria costs more which is why we chose Cyclone V.

Are you trying to stuff all of your logic into the FPGA along with the u-boot code? You don’t have to do that. You could instantiate the HPS, PCI, and a block of boot ROM, and then, once u-boot is up and running, reload the FPGA from your boot device.

There is another approach I thought of. You could write your own 64KB SPL bare-metal boot program using bits and bobs of u-boot and a simple loop that writes all your config registers, bringing up exactly what you need to get your boot device initialized. You should be able to fit all of that in 64KB.
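A minimal sketch of that simple loop, assuming a hypothetical table of (address, mask, value) entries applied as read-modify-writes. The struct and function names are made up for illustration; a real table would hold the HPS register addresses and values your board actually needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One register write in a hypothetical boot-time init table. */
struct reg_init {
    uintptr_t addr;   /* register address */
    uint32_t  mask;   /* bits to modify */
    uint32_t  value;  /* new value for those bits */
};

/* Apply every entry as a read-modify-write through a volatile pointer.
 * On hardware 'addr' is an MMIO address; off target it can point at
 * ordinary memory so the table logic can be checked first. */
static void apply_reg_table(const struct reg_init *tbl, size_t n)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        volatile uint32_t *reg = (volatile uint32_t *)tbl[i].addr;
        uint32_t v = *reg;
        v = (v & ~tbl[i].mask) | (tbl[i].value & tbl[i].mask);
        *reg = v;
    }
}
```

Keeping the bring-up as data rather than code keeps the SPL small: adding a register costs one table entry, not another function.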

Hi @BrianM

u-boot does not seem capable of processing the dtb generated by this.

So are you using the Linux DTS file in u-boot? For the Arria 10, the u-boot DTS and the Linux DTS are completely separate beasts and, as I said, subtly different in things like the number of address cells. u-boot may be able to handle Linux DTS files with larger address-cell ranges, but I’m not sure. If it could, I don’t see why they would make the two DTS files so different for Arria 10 (u-boot’s is generated by the bsp-create-settings script and Linux’s by the sopcinfo script). This may explain why your u-boot does not understand the Linux DTB.

There is an advanced ranges capability in u-boot, see options: SPL_OF_TRANSLATE, and OF_TRANSLATE.

Now that you mention OF_TRANSLATE, I do remember an issue I had with it way back at the start. I can’t for the life of me remember what the issue was, but I think it had something to do with ranges. My u-boot build currently has both SPL_OF_TRANSLATE and OF_TRANSLATE turned off.

As for you needing to put u-boot in the FPGA SRAM, you only have the FPGA program flash on your board, right?

No, we are putting u-boot in FPGA fabric BlockRAMs, not SRAM, i.e. within the bitstream itself. We do have an EPCQ flash which holds the bitstream, at least for now.

Cheers

I just realized a few days ago that it’s worse than this. There is a linux dts, and there is a u-boot dts, and there is a u-boot SPL dts. They are all different.

The u-boot dts is named like this: socfpga_cyclone5_[board name].dts

I found that one right away.

The SPL dts is named like this: socfpga_cyclone5_[board name]-u-boot.dtsi

One of the big problems I had was that when u-boot doesn’t find the -u-boot.dtsi file, it simply does not create one for SPL, and SPL fails to initialize. It bricks the board. There should be a warning about not finding that file, given how important it is.

Right now I’m comparing the “OF” routines from Linux and u-boot, and u-boot is missing all the special handling needed for PCI. I think I’m going to have to add it.

Hi @BrianM

I just realized a few days ago that it’s worse than this. There is a linux dts, and there is a u-boot dts, and there is a u-boot SPL dts. They are all different.

I forgot to mention the SPL DTS, although that should be generated from the u-boot main DTS via a script whose name escapes me. In the Arria 10 build process, the u-boot main DTS contains u-boot,dm-pre-reloc entries which, IIRC, are picked up by the SPL DTS parsing script and used to mark those entries for retention in the SPL DTS. All other entries are stripped out, if memory serves. I’m therefore surprised that the SPL DTS was not generated as part of this build process, unless it’s different for Cyclone 5.
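As a toy model of that filtering (not the real tool, which in mainline u-boot is, as far as I know, fdtgrep run during the SPL build), the sketch below keeps a node for SPL if it carries the tag itself or is an ancestor of a tagged node, since a tagged node cannot exist in the tree without its parents. The node list and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified device-tree node: full path plus whether it carries
 * the u-boot,dm-pre-reloc property. */
struct dt_node {
    const char *path;
    bool pre_reloc;
};

/* True if 'anc' is a strict ancestor of 'path' ("/" is the root). */
static bool is_ancestor(const char *anc, const char *path)
{
    size_t n = strlen(anc);
    if (strcmp(anc, "/") == 0)
        return strcmp(path, "/") != 0;
    return strncmp(anc, path, n) == 0 && path[n] == '/';
}

/* A node survives into the SPL tree if it is tagged, or if some tagged
 * node lies below it (otherwise that node would have no parent). */
static bool keep_for_spl(const struct dt_node *nodes, size_t n, size_t idx)
{
    assert(idx < n);
    if (nodes[idx].pre_reloc)
        return true;
    for (size_t i = 0; i < n; i++)
        if (nodes[i].pre_reloc && is_ancestor(nodes[idx].path, nodes[i].path))
            return true;
    return false;
}
```

With a tagged /soc/serial0, the nodes /, /soc and /soc/serial0 are kept while an untagged sibling such as /soc/i2c is dropped, which is why a missing tag (or a missing -u-boot.dtsi that supplies it) can silently leave SPL without a device it needs.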

I picked up on that! I wonder why it doesn’t work for my board? I’ll have to figure that out soon.

I have made some progress. I looked at the dtb parsing code and found a comment saying that PCI addresses are always three cells. Who knew?

So I adjusted the PCIe node description and got a lot farther:

	pcie: pcie@000100000 {
		compatible = "altr,pcie-root-port-1.0";
		reg = <0xC0000000 0x00100000>,
		      <0xC0100000 0x00004000>;
		reg-names = "Txs", "Cra";
		interrupt-parent = <&intc>;
		interrupts = <0 40 4>; // 40 is the interrupt, add 32 to relate to the GIC document
		interrupt-controller;  // 4 is IRQ_TYPE_LEVEL_HIGH, 1 is IRQ_TYPE_EDGE_RISING
		#address-cells = <3>;
		#size-cells = <1>;
		#interrupt-cells = <1>;
		#clock-cells = <0>;
		clocks = <&clk_100>;
		device_type = "pci";
		bus-range  = <0x00 0xff>;
		ranges = <0x82000000 0x0 0x00000000 0xc0000000 0x00100000>;
		//       <32BITNOPF> <PCI  ADDRESS> <HOSTADDR> <MEM  SIZE>
		msi-parent = <&msi>;
		interrupt-map-mask = <0 0 0 7>;
		interrupt-map = <0 0 0 1 &pcie 1>,
				<0 0 0 2 &pcie 2>,
				<0 0 0 3 &pcie 3>,
				<0 0 0 4 &pcie 4>;
	}; //end pcie@000100000
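For reference, the three PCI address cells are not just a wide address: per the IEEE 1275 PCI bus binding, the first cell (phys.hi) encodes flags and the address space, and the other two hold the 64-bit PCI address. The sketch below, with hypothetical names, decodes the 5-cell ranges entry above and maps a PCI bus address to a CPU address. It assumes the node's parent uses one address cell and that #size-cells is 1, as in the reworked node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of phys.hi: npt000ss bbbbbbbb dddddfff rrrrrrrr */
struct pci_phys_hi {
    unsigned non_relocatable; /* n, bit 31 */
    unsigned prefetchable;    /* p, bit 30 */
    unsigned aliased;         /* t, bit 29 */
    unsigned space;           /* ss, bits 25:24: 0 config, 1 I/O, 2 32-bit mem, 3 64-bit mem */
    unsigned bus;             /* bits 23:16 */
    unsigned dev;             /* bits 15:11 */
    unsigned fn;              /* bits 10:8 */
    unsigned reg;             /* bits 7:0 */
};

static struct pci_phys_hi decode_phys_hi(uint32_t v)
{
    struct pci_phys_hi h;
    h.non_relocatable = (v >> 31) & 1u;
    h.prefetchable    = (v >> 30) & 1u;
    h.aliased         = (v >> 29) & 1u;
    h.space           = (v >> 24) & 3u;
    h.bus             = (v >> 16) & 0xffu;
    h.dev             = (v >> 11) & 0x1fu;
    h.fn              = (v >> 8) & 7u;
    h.reg             = v & 0xffu;
    return h;
}

/* One 5-cell ranges entry: <phys.hi  pci.mid pci.lo  cpu_addr  size> */
struct pci_range {
    uint32_t phys_hi;
    uint64_t pci_addr;
    uint32_t cpu_addr;
    uint32_t size;
};

static struct pci_range parse_pci_range(const uint32_t cells[5])
{
    struct pci_range r;
    r.phys_hi  = cells[0];
    r.pci_addr = ((uint64_t)cells[1] << 32) | cells[2];
    r.cpu_addr = cells[3];
    r.size     = cells[4];
    return r;
}

/* Map a PCI bus address to the CPU address that reaches it through this
 * window. Returns 0 on success, -1 if it falls outside the window. */
static int pci_bus_to_cpu(const struct pci_range *r, uint64_t bus, uint32_t *cpu)
{
    assert(r != NULL && cpu != NULL);
    if (bus < r->pci_addr || bus - r->pci_addr >= r->size)
        return -1;
    *cpu = r->cpu_addr + (uint32_t)(bus - r->pci_addr);
    return 0;
}
```

0x82000000 decodes as a non-relocatable, non-prefetchable 32-bit memory space, matching the 32BITNOPF comment, and with this entry a bus address of 0x10000 lands at CPU address 0xc0010000: the two address spaces differ by the window base.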

Now I’m fighting an ARMv7 alignment issue.

Almost there.

=> nvme info
Device 0: Vendor: 0x15b7 Rev: 20120006 Prod: 2022GF476308        
            Type: Hard Disk
            Capacity: 244198.3 MB = 238.4 GB (500118192 x 512)
=> nvme detail
Blk device 0: Optional Admin Command Support:
        Namespace Management/Attachment: no
        Firmware Commit/Image download: yes
        Format NVM: yes
        Security Send/Receive: yes
Blk device 0: Optional NVM Command Support:
        Reservation: yes
        Save/Select field in the Set/Get features: yes
        Write Zeroes: yes
        Dataset Management: yes
        Write Uncorrectable: yes
Blk device 0: Format NVM Attributes:
        Support Cryptographic Erase: No
        Support erase a particular namespace: Yes
        Support format a particular namespace: Yes
Blk device 0: LBA Format Support:
        LBA Foramt 0 Support: (current)
                Metadata Size: 0
                LBA Data Size: 512
                Relative Performance: Good
Blk device 0: End-to-End DataProtect Capabilities:
        As last eight bytes: No
        As first eight bytes: No
        Support Type3: No
        Support Type2: No
        Support Type1: No
Blk device 0: Metadata capabilities:
        As part of a separate buffer: No
        As part of an extended data LBA: No
## Error: "" not defined
data abort
pc : [<3ff931e8>]          lr : [<3ff8efeb>]
reloc pc : [<01013228>]    lr : [<0100f02b>]
sp : 3bf6ab40  ip : 00000002     fp : 00000003
r10: 00000000  r9 : 3bf6fec0     r8 : 3ffe2038
r7 : 00000000  r6 : 7ffc4038     r5 : 3bfc6180  r4 : 3ffe1de0
r3 : 3ffe2020  r2 : 3ffe2018     r1 : 3ffe2018  r0 : 3ffe2028
Flags: nzcv  IRQs off  FIQs off  Mode SVC_32 (T)
Code: f1a0 0308 185e 68a5 (6877) 42ae 
Resetting CPU ...

resetting ...

For those who are interested, I have found two significant problems with the existing PCI and NVMe drivers in u-boot:

  1. The PCI drivers are not aware that the PCI address and the CPU address can be different, and do not know how to deal with them when they are. I hacked the DTS and the code to force it to work.

  2. The NVMe driver has to clear and invalidate the cache a lot to talk to the NVMe device successfully. The problem is that the cache clear and cache invalidate code on ARMv7 is not very robust. In the clear case, if the start address passed to the cache_clear call is not on a 32-byte boundary, the cache clear adds one sector to the start value and fails to actually clear the needed address space. In the invalidate case, a completely different set of code is called: if the start and end addresses are not on a 64-byte boundary, the invalidate does not run, and if the start and end are properly masked but end up equal, the cache is not invalidated either. I added a lot of code to nvme.c to make sure these rules are followed so that all the cache clears and invalidates work. It seems to me that the cache clear behavior is a bug in u-boot. And the fact that there are two clearing mechanisms, one using a 32-byte cache line length and one using a 64-byte cache line length, is just … wrong. I bet it’s not like that in Linux.
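A minimal sketch of the kind of rounding needed before calling those cache routines, assuming the caller passes the line size the target routine checks against (32 or 64 bytes, per the description above). The helper name is hypothetical and is not part of u-boot. Rounding the end up rather than down also avoids the case where a small range collapses to start == end and nothing gets invalidated.

```c
#include <assert.h>
#include <stdint.h>

/* Expand [start, end) outward to whole cache lines so that a clean or
 * invalidate routine that rejects misaligned ranges will accept it.
 * 'line' must be a power of two. */
static void cache_align_range(uintptr_t *start, uintptr_t *end, uintptr_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    assert(*start <= *end);
    *start &= ~(line - 1);                      /* round start down */
    *end = (*end + line - 1) & ~(line - 1);     /* round end up */
}
```

Rounding outward means neighbouring bytes on the same line are cleaned or invalidated too, so buffers shared with the device are safest when allocated cache-line aligned and padded to a whole number of lines (for example with memalign()).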

I learned a lot about PCI and NVMe. Now I have to get it fully working and then figure out the right way to fix it all.

The u-boot community has not been helpful up to this point, so I’m going to make the changes, clean them up, and then submit a patch and see if they respond. If they don’t, I’ll submit them to the socfpga maintainer and see how that goes.

Either way, when I get it working, I’ll post a patchset here for you guys.

It does appear to be working. I think the nvme detail command crashes for reasons outside the nvme driver itself.

=> run loadfpga
7007204 bytes read in 357 ms (18.7 MiB/s)
=> pci
Scanning PCI devices on bus 0
BusDevFun  VendorId   DeviceId   Device Class       Sub-Class
_____________________________________________________________
00.00.00   0x1172     0xe000     Bridge device           0x04
=> nvme scan
GOT HERE! !!!!!!! EVIL HACK WILL WORK!
pci-uclass: pci_bus_addr = 0xc0010000
=> nvme info
Device 0: Vendor: 0x15b7 Rev: 20120006 Prod: 2022GF476308        
            Type: Hard Disk
            Capacity: 244198.3 MB = 238.4 GB (500118192 x 512)
=> ext4ls nvme 0:1
<DIR>       4096 .
<DIR>       4096 ..
<DIR>      16384 lost+found
              51 test.txt
       485251955 463MB.bin
       485251955 463MB_copy.bin
         1637856 nvme
             289 check_nvme_admin_queue.sh
             551 id-ns_a.txt
             551 id-ns_b.txt
         2224514 nvme_full_wd_debug.log

I have determined that the 2021.04 version of u-boot fixes at least one of the issues listed above; however, there is no socfpga fork of that version prepared, so I can’t seem to enable NVMe bus mastering properly when I hack in the Altera PCIe driver.

I’m using the hacked version of 2020.10 to attempt to boot from QSPI only.

I am able to LZMA-decompress the simplified FPGA image from QSPI, program the FPGA, bring up PCIe and NVMe, and access the SSD; however, when I reprogram the FPGA with the full image from the SSD, the SSD crashes.

I suspect this happens because the FPGA is reloaded. I think I can fix this by forcing u-boot to forget the PCI and NVMe drivers and reload them. I have to figure out how that is possible.

I might also need to reset the FPGA externally to force the PCI to reconfigure, but I suspect that is not necessary.

I did get the kernel to boot with the stripped-down FPGA image, but it hung at the random number generator line of the log [random: fast init done]. I think that’s because there is hardware missing.

I figured out why 2021.04 doesn’t work on my board: the Linux kernel expects ECC to be initialized, and 2021.04 has no idea what ECC is on the socfpga. I tried to hack it in myself, but all I succeeded in doing was bricking u-boot. Back to 2020.10 for me, then.