Arria 10 HPS I2C (designware) issue?

Hello,
I have a problem with I2C on the Arria 10 HPS. After some time the I2C communication stops, and the most curious thing is that it stops with the bus in the free condition, i.e. SDA and SCL are both high. The problem occurs with only one of the slaves on a large bus. That slave is a Microchip PIC microcontroller, but before blaming the PIC I want to be sure that the Arria I2C controller works correctly. To me it is not normal for the bus to be free while the master makes no attempt to access the slaves.
The CPUs are not blocked and the system keeps running.
Info: Linux 4.9.76.
Not always, but sometimes, messages like this are printed on the terminal:


[ 360.815901] INFO: task kworker/0:2:914 blocked for more than 120 seconds.
[ 360.822663] Not tainted 4.9.76-rt61-ltsi-rt-csc5-AUTOINC+413d2d0a9b #1
[ 360.829680] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 360.837476] kworker/0:2 D 0 914 2 0x00000000
[ 360.842968] Workqueue: events set_brightness_delayed
[ 360.847942] [<8079b778>] (__schedule) from [<8079ba74>] (schedule+0x60/0xfc)
[ 360.854972] [<8079ba74>] (schedule) from [<8079ceb0>] (__rt_mutex_slowlock+0x88/0x15c)
[ 360.862862] [<8079ceb0>] (__rt_mutex_slowlock) from [<8079d4ac>] (rt_mutex_slowlock_locked+0xd8/0x260)
[ 360.872135] [<8079d4ac>] (rt_mutex_slowlock_locked) from [<8079d69c>] (rt_mutex_slowlock.constprop.8+0x68/0xa0)
[ 360.882185] [<8079d69c>] (rt_mutex_slowlock.constprop.8) from [<8079d9c8>] (rt_mutex_lock_state+0x9c/0xc4)
[ 360.891803] [<8079d9c8>] (rt_mutex_lock_state) from [<8079da0c>] (rt_mutex_lock+0x1c/0x20)
[ 360.900037] [<8079da0c>] (rt_mutex_lock) from [<8079f764>] (_mutex_lock+0x18/0x1c)
[ 360.907588] [<8079f764>] (_mutex_lock) from [<80540ae4>] (ccuss_smbus_request+0x20/0x24)
[ 360.915675] [<80540ae4>] (ccuss_smbus_request) from [<804a30d8>] (__ccuss_gpio_set+0x38/0x84)
[ 360.924184] [<804a30d8>] (__ccuss_gpio_set) from [<804a3374>] (ccuss_gpio_set+0x50/0x5c)
[ 360.932259] [<804a3374>] (ccuss_gpio_set) from [<8049cfe4>] (_gpiod_set_raw_value+0x70/0x160)
[ 360.940762] [<8049cfe4>] (_gpiod_set_raw_value) from [<8049e380>] (gpiod_set_value_cansleep+0x58/0xa0)
[ 360.950045] [<8049e380>] (gpiod_set_value_cansleep) from [<8062c340>] (gpio_led_set+0x64/0x68)
[ 360.958632] [<8062c340>] (gpio_led_set) from [<8062c3ac>] (gpio_led_set_blocking+0x18/0x20)
[ 360.966960] [<8062c3ac>] (gpio_led_set_blocking) from [<8062aecc>] (set_brightness_delayed+0x80/0xc4)
[ 360.976153] [<8062aecc>] (set_brightness_delayed) from [<8013d444>] (process_one_work+0x1f8/0x570)
[ 360.985083] [<8013d444>] (process_one_work) from [<8013e118>] (worker_thread+0x68/0x614)
[ 360.993149] [<8013e118>] (worker_thread) from [<80143718>] (kthread+0x118/0x120)
[ 361.000525] [<80143718>] (kthread) from [<80108218>] (ret_from_fork+0x14/0x3c)
[ 361.007744] INFO: task cat:29294 blocked for more than 120 seconds.
[ 361.014108] Not tainted 4.9.76-rt61-ltsi-rt-csc5-AUTOINC+413d2d0a9b #1
[ 361.021131] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 361.028928] cat D 0 29294 1982 0x00000000
[ 361.034433] [<8079b778>] (__schedule) from [<8079ba74>] (schedule+0x60/0xfc)
[ 361.041465] [<8079ba74>] (schedule) from [<8079de7c>] (schedule_timeout+0x90/0x34c)
[ 361.049098] [<8079de7c>] (schedule_timeout) from [<8018b3e4>] (msleep+0x3c/0x48)
[ 361.056478] [<8018b3e4>] (msleep) from [<80540be8>] (ccuss_smbus_read_word_data+0x3c/0x4c)
[ 361.064727] [<80540be8>] (ccuss_smbus_read_word_data) from [<8064a848>] (ccuss_iio_read_raw+0x40/0x7c)
[ 361.074003] [<8064a848>] (ccuss_iio_read_raw) from [<80646e70>] (iio_read_channel_info+0x98/0x9c)
[ 361.082854] [<80646e70>] (iio_read_channel_info) from [<80518750>] (dev_attr_show+0x2c/0x58)
[ 361.091266] [<80518750>] (dev_attr_show) from [<802c19cc>] (sysfs_kf_seq_show+0x98/0x100)
[ 361.099420] [<802c19cc>] (sysfs_kf_seq_show) from [<802c02b8>] (kernfs_seq_show+0x34/0x38)
[ 361.107665] [<802c02b8>] (kernfs_seq_show) from [<802767b8>] (seq_read+0xbc/0x4ec)
[ 361.115855] [<802767b8>] (seq_read) from [<802c1118>] (kernfs_fop_read+0x148/0x1dc)
[ 361.123791] [<802c1118>] (kernfs_fop_read) from [<80250150>] (do_readv_writev+0x310/0x3e0)
[ 361.132035] [<80250150>] (do_readv_writev) from [<80250270>] (vfs_readv+0x50/0x68)
[ 361.139580] [<80250270>] (vfs_readv) from [<80283034>] (default_file_splice_read+0x180/0x294)
[ 361.148077] [<80283034>] (default_file_splice_read) from [<80282b84>] (do_splice_to+0x8c/0xa0)
[ 361.156658] [<80282b84>] (do_splice_to) from [<80282c4c>] (splice_direct_to_actor+0xb4/0x25c)
[ 361.165154] [<80282c4c>] (splice_direct_to_actor) from [<80282e8c>] (do_splice_direct+0x98/0xc0)
[ 361.173909] [<80282e8c>] (do_splice_direct) from [<80250754>] (do_sendfile+0x1b4/0x334)
[ 361.181887] [<80250754>] (do_sendfile) from [<802512f0>] (SyS_sendfile64+0x11c/0x148)
[ 361.189694] [<802512f0>] (SyS_sendfile64) from [<80108140>] (ret_fast_syscall+0x0/0x50)

Latest update: after experimenting with the delay value in i2c_dw_wait_bus_not_busy() in i2c-designware-core.c, I got a much longer run (3-4 hours) without a hang.
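
For readers unfamiliar with this kind of wait loop, here is a minimal user-space sketch of a bounded "wait until the bus is not busy" poll. It is not the driver's code: the fake bus, the function names and the numbers used with it are assumptions made for illustration only. It shows why changing the per-poll delay changes the total time the master is willing to wait (roughly the poll limit times the delay) before giving up with a timeout.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the controller: reports "busy" for a fixed
 * number of polls and records how long the caller would have slept. */
struct fake_bus {
    int busy_polls_left;     /* polls that will still report "busy" */
    unsigned long slept_us;  /* total simulated delay in microseconds */
};

/* Simulated status read: true while the bus still looks active. */
static bool fake_bus_busy(struct fake_bus *bus)
{
    if (bus->busy_polls_left > 0) {
        bus->busy_polls_left--;
        return true;
    }
    return false;
}

/* Simulated delay: only accumulates the time instead of really sleeping. */
static void fake_delay(struct fake_bus *bus, unsigned int us)
{
    bus->slept_us += us;
}

/* Poll until the bus is idle. Gives up with -ETIMEDOUT once max_polls
 * delays have been spent, so the wait is bounded by about
 * max_polls * poll_delay_us microseconds. */
static int wait_bus_not_busy(struct fake_bus *bus, int max_polls,
                             unsigned int poll_delay_us)
{
    while (fake_bus_busy(bus)) {
        if (max_polls <= 0)
            return -ETIMEDOUT;
        max_polls--;
        fake_delay(bus, poll_delay_us);
    }
    return 0;
}
```

With a poll limit of 20 and a 1000 us delay (illustrative values), a bus that stays busy for 3 polls is accepted after 3 ms of simulated waiting, while a bus that stays busy longer than 20 polls is reported as timed out after 20 ms. A longer delay widens that window, which may hide a slow slave rather than fix the underlying cause.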

Any suggestions?
Cordially
Georgi

I also had problems with the embedded library I2C routines and took a simpler path. I was using an A10_SoC_devKit board and switched to a few short routines that perform the basic I2C functions through the ioctl support in Linux. You can see what I did in the post on this message board titled 'Useful I2C from the HPS on the A10 SoC Devkit board'. If you have a test that consistently reproduces the failure with the embedded library, you can try it with the simple approach.
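
As a rough idea of what an ioctl-based routine can look like, here is a minimal sketch of a register read through the Linux i2c-dev interface using the I2C_RDWR ioctl. It is not the code from the post mentioned above; the function names, the device path and the slave address passed by the caller are assumptions for illustration.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/i2c.h>
#include <linux/i2c-dev.h>

/* Fill a two-message transfer: write one register-address byte, then read
 * len bytes back after a repeated START. */
static void build_reg_read(struct i2c_msg msgs[2], uint16_t addr,
                           uint8_t *reg, uint8_t *buf, uint16_t len)
{
    msgs[0].addr  = addr;
    msgs[0].flags = 0;          /* write */
    msgs[0].len   = 1;
    msgs[0].buf   = reg;

    msgs[1].addr  = addr;
    msgs[1].flags = I2C_M_RD;   /* read */
    msgs[1].len   = len;
    msgs[1].buf   = buf;
}

/* Read len bytes starting at register reg of the slave at addr.
 * dev_path is an i2c-dev node such as /dev/i2c-0 (bus number assumed).
 * Returns 0 on success, -1 on any failure. */
static int i2c_reg_read(const char *dev_path, uint16_t addr, uint8_t reg,
                        uint8_t *buf, uint16_t len)
{
    struct i2c_msg msgs[2];
    struct i2c_rdwr_ioctl_data xfer = { .msgs = msgs, .nmsgs = 2 };
    int fd, ret;

    build_reg_read(msgs, addr, &reg, buf, len);

    fd = open(dev_path, O_RDWR);
    if (fd < 0)
        return -1;

    /* On success I2C_RDWR returns the number of messages transferred. */
    ret = ioctl(fd, I2C_RDWR, &xfer);
    close(fd);

    return ret == 2 ? 0 : -1;
}
```

A call such as i2c_reg_read("/dev/i2c-0", 0x48, 0x10, buf, 2) would read two bytes from register 0x10 of a device at address 0x48 (all three values are placeholders). The transfer is still carried out by whichever kernel bus driver owns that adapter, so this only changes the caller, not the controller code underneath.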

Thanks, Jambie, but I don't think your example will help. ioctl is a wrapper around a system call from userland, and in the end it leads to the same DesignWare routines that the A10 HPS uses. My problem is located deep inside the kernel. A (proprietary) MFD driver calls the DesignWare code, the scheduler is invoked at the critical places, and everything looks correct, yet it does not work. I have some more ideas to try, but I will do that when I have more time. In the meantime I just want to see whether anybody has seen similar problems. The only things I found on the web were about implementing I2C bus recovery when a slave holds SCL low for too long or forever, but that is not my case either.