Altera PCIe driver issue with SSD devices

We have a problem with SSD devices connected to the Altera PCIe root port. The SSD is detected and can even be mounted. The read/write performance is not as good as expected for an SSD, but it works. After starting multiple concurrent read/write jobs, however, the device becomes unavailable, and only a reboot or power cycle brings it back.

We can reproduce the problem on a Cyclone V development board using the reference design and kernel from here. We have tested it with a SATA/AHCI drive (Plextor M6e) and an NVMe drive (Samsung 950 Pro).

For the Plextor, the error message is:

ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x6 frozen
ata1.00: failed command: WRITE FPDMA QUEUED
ata1.00: cmd 61/00:00:00:87:1a/01:00:06:00:00/40 tag 0 ncq 131072 out
         res 40/00:00:e1:01:80/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }

Resetting does not work:

ata1: hard resetting link
ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: limiting SATA link speed to 3.0 Gbps
ata1: hard resetting link
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1.00: disabled
ata1.00: device reported invalid CHS sector 0
ata1: hard resetting link
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
ata1: EH complete
sd 0:0:0:0: [sda] tag#16 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
sd 0:0:0:0: [sda] tag#16 CDB: opcode=0x2a 2a 00 06 1a 81 c0 00 05 40 00

For the NVMe drive, the error is reported as a controller fatal status (CFS), but a controller reset does not bring the device back:

nvme 0000:01:00.0: Cancelling I/O 718 QID 1
nvme 0000:01:00.0: Cancelling I/O 719 QID 1
------------[ cut here ]------------
WARNING: CPU: 1 PID: 614 at lib/percpu-refcount.c:324 percpu_ref_reinit+0xb8/0xc4()
Modules linked in: nvme gpio_altera altera_sysid altera_rpdma(O)
CPU: 1 PID: 614 Comm: kworker/u4:2 Tainted: G        W  O    4.1.0-00203-g6de99ee-dirty #1
Hardware name: Altera SOCFPGA
Workqueue: nvme nvme_reset_workfn [nvme]
[<c0018898>] (unwind_backtrace) from [<c00139e8>] (show_stack+0x20/0x24)
[<c00139e8>] (show_stack) from [<c057fcb4>] (dump_stack+0x90/0xa0)
[<c057fcb4>] (dump_stack) from [<c0027260>] (warn_slowpath_common+0x94/0xc4)
[<c0027260>] (warn_slowpath_common) from [<c002734c>] (warn_slowpath_null+0x2c/0x34)
[<c002734c>] (warn_slowpath_null) from [<c02e2998>] (percpu_ref_reinit+0xb8/0xc4)
[<c02e2998>] (percpu_ref_reinit) from [<c02c0154>] (blk_mq_unfreeze_queue+0x64/0xac)
[<c02c0154>] (blk_mq_unfreeze_queue) from [<bf020fa0>] (nvme_dev_resume+0x74/0xf0 [nvme])
[<bf020fa0>] (nvme_dev_resume [nvme]) from [<bf0210bc>] (nvme_reset_failed_dev+0x30/0x130 [nvme])
[<bf0210bc>] (nvme_reset_failed_dev [nvme]) from [<bf01d0cc>] (nvme_reset_workfn+0x1c/0x20 [nvme])
[<bf01d0cc>] (nvme_reset_workfn [nvme]) from [<c003da48>] (process_one_work+0x15c/0x3dc)
[<c003da48>] (process_one_work) from [<c003dd1c>] (worker_thread+0x54/0x4f4)
[<c003dd1c>] (worker_thread) from [<c0043470>] (kthread+0xec/0x104)
[<c0043470>] (kthread) from [<c000faa8>] (ret_from_fork+0x14/0x2c)
---[ end trace 52c30efa12417f12 ]---
nvme 0000:01:00.0: Failed status: 3, reset controller
nvme 0000:01:00.0: Cancelling I/O 225 QID 2
...

As mentioned, after a reboot both SSDs are functional again. Could these problems come from the PCIe driver, and what could go wrong in the PCIe communication that stops the SSDs from working?

Here are the steps to reproduce the issue on the CV development board (we have rev. E), using the example setup from http://releases.rocketboards.org/release/2016.05/pcierd/hw/cv_soc_devkit_pcie.tar.gz

The file hashes from the first partition are

root@cyclone5:/# md5sum /mnt/*
7db850e073bbf65a2b0fe110899e0587 /mnt/soc_system.rbf
fae10dbd1480e3f4aa7b44a1d94e0fd6 /mnt/socfpga.dtb
4dbe49de919395517c0b566e13602bbb /mnt/u-boot.scr
30dc5fe698f5e22a61cb636670a3afdf  /mnt/zImage

The kernel is

root@cyclone5:~# uname -a
Linux cyclone5 4.1.0-00203-g6de99ee-dirty #1 SMP Sun May 1 10:48:28 MYT 2016 armv7l GNU/Linux

After mounting the SSD (formatted as ext2 with mkfs.ext2), we can run some simple performance tests:

root@cyclone5:~# mount /dev/nvme0n1p1 /mnt/
root@cyclone5:~# dd if=/dev/zero of=/mnt/test bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes (512.0MB) copied, 3.572857 seconds, 143.3MB/s

root@cyclone5:~# dd if=/dev/zero of=/mnt/test bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 7.573389 seconds, 135.2MB/s

root@cyclone5:~# dd if=/dev/zero of=/mnt/test bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (10.0GB) copied, 79.426496 seconds, 128.9MB/s

Sometimes even the last write (10 GB) cannot finish; but if it does, starting a few concurrent read/write jobs on this big file crashes the communication with the SSD:

root@cyclone5:~# cp /mnt/test /mnt/test_1 &
[1] 2465
root@cyclone5:~# cp /mnt/test /mnt/test_2 &
[2] 2470
root@cyclone5:~# cp /mnt/test /mnt/test_3 &
[3] 2477
root@cyclone5:~# cp /mnt/test /mnt/test_4 &
[4] 2482

After a while, the error messages described above appear in the kernel log (view them with dmesg).
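
For convenience, the steps above can be wrapped into a small script (a sketch only; the device name, mount point, and file sizes are the ones from the setup described above):

#!/bin/sh
# Reproduce the stall: write one large file, then start four concurrent copies of it.
mount /dev/nvme0n1p1 /mnt/
dd if=/dev/zero of=/mnt/test bs=1M count=10240
for i in 1 2 3 4; do
    cp /mnt/test /mnt/test_$i &
done
wait
# Check the kernel log for the ata/nvme timeout and reset messages.
dmesg | tail -n 50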

If someone has a similar setup at hand, it would be a great help to check these steps. I would be glad to hear about your observations and which SSD you used.

Hi Stefan,

I'm seeing the same on the Altera 4.1 LTSI kernel, both on the CV Dev Kit and on our own hardware, using a Samsung SM951 NVMe.

The problem seems to come and go; I haven't been able to pin down what causes it yet. Reads always seem solid; the problem appears only on writes.

Hi Stefan, huntero,

We are seeing similar stability issues when accessing SSD devices over PCIe; sometimes NVMe access works fine, sometimes not. Have you made any progress with this?

Many thanks,

Kenny

Hello. Sorry, my English is very bad.

We have Quartus Prime 18.0 Standard Edition, a Cyclone V SoC, and a Samsung NVMe M.2 970 EVO Plus.
We ran into a similar problem, investigated it, and found a solution.

PCI devices are assigned physical addresses:

lspci -vv | grep "Memory behind"
Memory behind bridge: 00000000-000fffff [size=1M]

This range overlaps with SDRAM. The PCI Hard IP does not pass requests from the PCI device through to SDRAM for addresses in the range allocated to PCI devices (00000000-000fffff).
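
The overlap can be checked directly by comparing the bridge window with the system RAM ranges the kernel reports (a simple check using only standard tools):

lspci -vv | grep "Memory behind"
grep "System RAM" /proc/iomem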

On a read request, errors appear:

lspci -vv | grep DevSta
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-

On a write request, errors appear:

lspci -vv | grep DevSta
DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-

In the DTS, we assigned the PCI addresses outside of SDRAM (our system has 1 GB of SDRAM).

Before:

ranges = <
0x82000000 0x00000000 0x00000000 0xe0000000 0x00000000 0x10000000
>;

After:

ranges = <
0x82000000 0x00000000 0x50000000 0xe0000000 0x00000000 0x10000000
>;

lspci -vv | grep "Memory behind"
Memory behind bridge: 50000000-500fffff [size=1M]

Now the PCI device can read from and write to all of SDRAM. But now requests from the host cannot be handled by the PCI device, because the Avalon-MM for PCIe does not know that the high byte of the PCI addresses is 0x50.

After generating the files in Qsys, you need to edit line 240 of altpciexpav_stif_tx.v.

Before:

  .TxAddress_i(TxAddress_i), // word address

After:

  .TxAddress_i(TxAddress_i + 32'h50000000), // word address

After these changes, the host can access the PCI device, and the PCI device can access all SDRAM addresses (1 GB in our system). We no longer observe any problems with the SSD.

Hi All,

Replying here, because there are only a few useful pages on this subject that come up on Google. :)
We also had this problem and solved it.

Setup: Cyclone V SoC with Linux -> PCIe root port (1x) -> SATA controller (ASMedia 1061) -> SSD.

With the basic PCIe setup done, we got /dev/sda and could format and mount the drive. However, reading was very unstable: md5sum of the same large (600 MB) file gave different results each time (some parts of the file were replaced by zeroes).

Turning on AER (Advanced Error Reporting) in both the PCIe core and the Linux kernel showed that the PCIe root port complained (Unsupported Request) about TLPs that are writes to SDRAM by the SATA controller with destination addresses below 1 MB:

[ 34.217398] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 34.229108] pcieport 0000:00:00.0: device [1172:e000] error status/mask=00100000/00400000
[ 34.237441] pcieport 0000:00:00.0: [20] Unsupported Request (First)
[ 34.244204] pcieport 0000:00:00.0: TLP Header: 40000020 010000ff 0003d000 00000000
[ 34.252048] pcieport 0000:00:00.0: AER: Device recovery failed
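
For reference, the kernel side of AER is enabled with the standard PCIe port service options (a sketch; option names as in the mainline Kconfig, while the PCIe-core side is configured in the hard IP parameters):

CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y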

This matches asivkov's finding very well (thank you very much, Andrei!): the PCIe root does not let devices behind it (the SATA controller) write into the configuration space.

As we could see, all problematic TLP destinations (0003d000 above) were below 1 MB, i.e. hitting the default configuration space.

In our setup, PCIe.RPMaster is connected directly to one of the HPS SDRAM interfaces. The PCIe RX port simply "forwards" everything to the Avalon-MM output port without any translation, so PCIe addresses == SDRAM addresses.

The configuration space (1 MB) was located at the beginning of the PCI address space (0x0) by default and overlapped with SDRAM:

Memory behind bridge: 00000000-000fffff [size=1M]

Whenever Linux (the AHCI driver) issued a DMA command with an SDRAM buffer address under 1 MB, it failed.

In theory, there is a "dma-ranges" device tree property that should resolve this conflict, but it does not seem to be respected by the Altera root port driver (?). So we moved the configuration address to a non-conflicting area, as asivkov suggested.
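
For reference, such a dma-ranges entry would look roughly like this (a sketch only, not something we verified; the flag cell and exact layout depend on the node's #address-cells/#size-cells, and mapping the full 1 GB of SDRAM starting at 0x0 is an assumption):

dma-ranges = <0x02000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x40000000>;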

For some reason, moving it above 1 GB did not work for us (the system hangs), so we made a "hole" in the Linux RAM map at 64-96 MB, using the "reserved-memory" mechanism in the device tree:

ranges = <0x82000000 0x00000000 0x04000000 0x00000000 0x04000000 0x00000000 0x02000000>;

Now it works.
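
The reserved-memory part is nothing special; a minimal sketch of such a node (the node name is arbitrary, and the offset/size follow the 64-96 MB hole mentioned above):

reserved-memory {
    #address-cells = <1>;
    #size-cells = <1>;
    ranges;

    pcie_hole: pcie-hole@4000000 {
        reg = <0x04000000 0x02000000>; /* 64 MB offset, 32 MB size */
        no-map;
    };
};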

Some more notes:

  1. We switched the PCIe core to 64-bit addresses. 32-bit addressing involves some odd "Avalon-MM address translation" which does not appear to be set up by the Altera root port driver in any way. I don't understand how it is supposed to work without driver support.
  2. Instead of hacking altpciexpav_stif_tx.v, we inserted an Address Span Extender on the H2F -> PCIe.Txs path, which maps 32-bit -> 64-bit addresses and adds the necessary offset (this is the key part!).
    There is also a clock-crossing bridge placed before the extender, so the whole chain is

H2F -> clock-crossing bridge -> address extender -> PCIe.Txs

  3. In the "ranges" property in the DTS:
    ranges = <0x82000000 0x00000000 0x04000000 0x00000000 0x00000000 0x00000000 0x02000000>;
  • the first number (0x8200_0000) means "non-prefetchable memory on PCI bus 0" (see https://elinux.org/Device_Tree_Usage#PCI_Address_Translation).
  • the second and third numbers are the PCI address (0x0 0x04000000, i.e. 64 MB); this is the offset you move the PCI configuration space to. It must match the offset in the span extender (!).
  • the fourth and fifth numbers are the H2F address (the address of the clock-crossing bridge in our case) on the H2F bus. It can be chosen arbitrarily.
  • the last two numbers are the size of the configuration space (32 MB in our example; it cannot be less than 1 MB).

If you use "reserved-memory", the offset and size of the configuration space must match your reserved region.

  4. Final memory layout:

cat /proc/iomem

00000000-03ffffff : System RAM
  00008000-00afffff : Kernel code
  00c00000-00cd0003 : Kernel data
08000000-3fffffff : System RAM
c0000000-c0ffffff : /sopc@0/bridge@0xc0000000/pcie@0x040000000
  c0000000-c00fffff : PCI Bus 0000:01
    c0000000-c00001ff : 0000:01:00.0
      c0000000-c00001ff : ahci
ff200000-ff20000f : vector_slave

lspci -vvv | grep emo

    Memory behind bridge: 04000000-040fffff [size=1M]
    Prefetchable memory behind bridge: 00000000-000fffff [size=1M]
    Region 5: Memory at c0000000 (32-bit, non-prefetchable) [size=512]

Hope this helps someone,
Anatoly

See this post. I have added a link to a Cyclone V Qsys and dtsi PCIe design that works for me.