We followed the Intel Agilex® 7 FPGA I-Series PCIe Root Port Reference Design documentation on RocketBoards.org.
We designed a board with an Agilex I-Series FPGA that runs Linux on the HPS, and we have brought up a PCIe 5.0 x4 NVMe host using the R-tile. FIO sequential read/write tests run on the HPS under Linux, but sequential read throughput is only ~900 MB/s, far below the expected ~15 GB/s. On an NVMe read, the SSD writes the data into HPS memory, so read throughput is limited by the FPGA-to-HPS (F2H) write path.

After some debugging, we found that the HPS is slow to assert BVALID. As the Signal Tap capture below shows, BVALID asserts many cycles after WLAST. The write data channel transfers 512 bytes in 8 beats (axi_clk is 250 MHz), and WLAST/WVALID/WREADY look correct. However, BVALID does not assert until 304 ns (76 axi_clk cycles at 250 MHz) after WLAST. This wastes a large share of the F2H write bandwidth.
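To put these numbers in perspective, here is a back-of-envelope throughput model built only from the figures above (512-byte bursts, 8 beats, 76 cycles from WLAST to BVALID, 250 MHz). It assumes, without confirmation from the HPS documentation, that the master can keep at most `outstanding` writes waiting for their B response, and it ignores AW-channel and BREADY delays. The function names are hypothetical; this is a sketch, not a measurement.

```python
"""Back-of-envelope F2H write throughput model (sketch, not a measurement).

Assumptions (not from the original capture): at most `outstanding` writes
may await their B response at once, AW-channel and BREADY delays are
ignored, and the W channel carries one 64-byte beat per cycle.
"""
import math

CLK_HZ = 250e6        # axi_clk frequency
BURST_BYTES = 512     # bytes per write burst
BEATS = 8             # W-channel beats per burst (64 bytes per beat)
B_DELAY_CYCLES = 76   # cycles from WLAST to BVALID (304 ns at 250 MHz)


def peak_bytes_per_s() -> float:
    """W-channel limit: one full beat on every clock cycle."""
    return BURST_BYTES / BEATS * CLK_HZ


def throughput_bytes_per_s(outstanding: int) -> float:
    """Throughput when at most `outstanding` writes await a B response."""
    round_trip = BEATS + B_DELAY_CYCLES  # first W beat to BVALID, in cycles
    latency_bound = outstanding * BURST_BYTES / round_trip * CLK_HZ
    return min(latency_bound, peak_bytes_per_s())


def min_outstanding_for_peak() -> int:
    """Smallest number of in-flight writes that hides the B-channel delay."""
    return math.ceil((BEATS + B_DELAY_CYCLES) / BEATS)


if __name__ == "__main__":
    for n in (1, 4, min_outstanding_for_peak()):
        print(f"{n:2d} outstanding: {throughput_bytes_per_s(n) / 1e9:.2f} GB/s")
```

Under these assumptions, a master that waits for each BVALID before starting the next burst is capped near 1.5 GB/s, while roughly 11 writes in flight would hide the 76-cycle response delay entirely. Whether the F2H path actually permits that many outstanding writes is the open question.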
Can anyone advise why BVALID is so slow? Can BVALID be ignored?