BVALID of Agilex HPS is too slow when R-tile BAM writes to CCU through f2h_bridge

We followed the document "Intel Agilex® 7 FPGA I-Series PCIe Root Port Reference Design" (Documentation, RocketBoards.org).

We designed a board with an Agilex I-series FPGA to run Linux on the HPS and managed to enable a PCIe 5.0 x4 NVMe host with the R-tile. We can run sequential read/write with FIO on HPS/Linux, but sequential read speed is only ~900 MB/s, far slower than the expected 15 GB/s. After some debugging, we found that BVALID from the HPS is too slow. As can be seen in the Signal Tap capture below, BVALID asserts many cycles after WLAST. The write data channel transfers 512 bytes in an 8-beat burst (the clock is 250 MHz), and WLAST/WVALID/WREADY seem to be correct. But BVALID does not assert until 304 ns (76 axi_clk cycles at 250 MHz) after WLAST. This results in a huge waste of f2h write bandwidth.
Can someone advise why BVALID is so slow? Can BVALID be ignored?


In Signal Tap, data is written only when WREADY and WVALID are both high. Because BVALID asserts so late, no data is written in most clock cycles, which results in the low sequential read speed (sequential reads arrive as memory write TLPs).
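To put rough numbers on this, here is a back-of-the-envelope sketch using only the figures quoted above (250 MHz clock, 512 bytes per 8-beat burst, 76 cycles from WLAST to BVALID). It assumes the master waits for BVALID before starting the next burst, i.e. only one write is outstanding at a time; that is an assumption about the BAM/bridge behavior, not something confirmed by the capture. The function names are made up for illustration.

```python
import math

# Figures taken from the post; everything else is an assumption.
AXI_CLK_HZ = 250e6           # axi_clk frequency
BURST_BYTES = 512            # bytes per write burst
BEATS_PER_BURST = 8          # data beats per burst (64 bytes per beat)
BVALID_DELAY_CYCLES = 76     # cycles from WLAST to BVALID


def peak_bandwidth():
    """Bytes/s if a data beat is transferred on every clock cycle."""
    return AXI_CLK_HZ * BURST_BYTES / BEATS_PER_BURST


def serialized_bandwidth():
    """Bytes/s if each burst waits for the previous BVALID (assumed model)."""
    cycles_per_burst = BEATS_PER_BURST + BVALID_DELAY_CYCLES
    return BURST_BYTES * AXI_CLK_HZ / cycles_per_burst


def outstanding_needed():
    """Bursts that must be in flight to keep the data channel busy."""
    cycles_per_burst = BEATS_PER_BURST + BVALID_DELAY_CYCLES
    return math.ceil(cycles_per_burst / BEATS_PER_BURST)


if __name__ == "__main__":
    print(f"peak:        {peak_bandwidth() / 1e9:.2f} GB/s")
    print(f"serialized:  {serialized_bandwidth() / 1e9:.2f} GB/s")
    print(f"outstanding: {outstanding_needed()} bursts")
```

Under this model the data channel peaks at 16 GB/s but drops to roughly 1.5 GB/s when every burst waits for its response, which is the same order of magnitude as the ~900 MB/s we measure; the remaining gap presumably comes from other overheads not modeled here. It also suggests that about 11 bursts would need to be in flight to hide a 76-cycle response latency.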

We updated the Quartus design to ignore BVALID from the HPS. Sequential read speed rose from ~900 MB/s to 3.5 GB/s, but there are still idle cycles with no data transactions.