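Hello Dialog,
We are seeing some strange behavior in some of our systems that contain the DA14581 chip. In our systems the DA14581 is the slave device, and a different vendor supplies the master chip.
On a small number of these systems we observe random pauses in the empty PDU transmissions from the DA14581. A BLE packet sniffer trace is attached. The random dropouts are visible in the trace as dips in throughput. Each pause consistently lasts 1.1 - 1.2 seconds. No application-layer traffic was being exchanged at the time of the trace - these are all idle packets.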
Our systems should be built and configured identically. The application running at the time of the trace was downloaded via the IDE (NOT running out of OTP). This behavior is only exhibited in a small minority of such systems. Slave latency is set to 0 for all (confirmed multiple times!).
Any thoughts as to what might be behind this behavior? It is causing connections to drop in these cases.
Thanks,
Jim

Hi jameshiebert,
So this behaviour shows up randomly on devices that share the same build and FW? Is that correct? I mean, is the device that exhibits this behaviour unique, that is, have you run this test in a similar environment on multiple devices, and only one device (for example) shows this behaviour across consecutive tests? Also, what about the masters connected? Is this behaviour correlated with a specific master, or will it occur regardless of the master of the link?
Do you have the actual sniffer trace from when the incident occurs, and have you verified that this side effect is due to events missed by the 581, i.e. on the sniffer log you see packets from the master side and no packets from the slave side? You have attached a graph of the throughput, but this doesn't help much in finding a root cause. What would also help is a current capture taken when the issue occurs, to actually see through the power consumption that the 581 misses connection events. It would also help to mention the SDK you are using and the configuration of the device: is the application running entirely on the 581, or are you using the HCI interface? I assume that the app is running entirely on the 581. And what about your low power clock, are you using the RCX or the XTAL?
Thanks, MT_dialog
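Hi Dialog,
So this behaviour shows up randomly on devices that share the same build and FW? Is that correct? [Yes. A small minority (a small percentage) of devices with exactly the same HW and FW configuration exhibit the dropout behavior.]
I mean, is the device that exhibits this behaviour unique, that is, have you run this test in a similar environment on multiple devices, and only one device (for example) shows this behaviour across consecutive tests? [Correct. In one particular environment there is a lab with a dozen or so identical test stations, all with DUTs containing a slave 581 chip, and only one DA14581 exhibits the behavior. We swapped the board for a different board (different 581) and the problem went away. It also happens in other environments. The behavior follows the 581 (or at least the board it is soldered to).]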
Also what about the masters connected ? Is this behaviour correlated with a specific master or this will occur regardless the master of the link ? [The vast majority of our master devices are identical custom built boxes. There are a few outliers - commercial smart phones. I'll run an experiment with a known "failing" DA14581 and a smartphone.]
Do you have the actual sniffer trace to share when the incident occurs? [I attached a trace with 2 dropouts. We use a Teledyne LeCroy ComProbe Protocol Analysis System. Can you read those files?]
...and have you verified that this side effect is due to missed events from the side of the 581, so on the sniffer log you see packets from the master side and no packet from the slave side ? [This is also correct. The master continues to transmit idle packets with no response from the DA14581. These intervals occur somewhat randomly but always last for approximately 1.1 - 1.2 seconds.]
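You have attached a graph of the throughput, but this doesn't help much in finding a root cause. What would also help is a current capture taken when the issue occurs, to actually see through the power consumption that the 581 misses connection events. [This will be tricky with our DUT configuration...]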
Also it would help mentioning the SDK that you are using and the configuration of the device, is the application running entirely on the 581 or you are using the HCI interface, i assume that the app is running on the 581 entirely. [The FW running on the 581 is based on the 5.0.4 SDK, but it has been heavily modified. Most of the arch*.c files are untouched however. HCI is not used. In our system there is an app running on the 581 as well as another app running on a different IC. The two chips communicate via SPI with a custom protocol, not HCI.]
And what about your low power clock, are you using the RCX or the XTAL? [RCX]
Some additional info. The pauses seem to disappear with longer Connection Intervals. For instance, no more than 1 slave idle packet in a row was dropped in over 7 hours with a 47.5ms CI. I am running further tests with shorter intervals.
The observations in my previous message were made with a programmed CI of 11.25 ms.
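Hi jameshiebert,
The fact that you are able to re-establish communication after the period of inactivity from the slave means that you should be fine on the clock side, and that the device is not resetting when the issue occurs. I don't see anything strange overall, such as a previous transaction that might have gone wrong and affected the slave, so the only things I can think of that might cause this are:
As a result, the device misses the event. A current trace would help to identify this, because if the device is kept busy running code we should see a flat line for some additional time, and we should see the wake-up with no radio activity (since if the device has woken up too late to serve the event, it will not start the radio transaction).
What would also help is to run a simple example, for instance the Proximity Reporter from the SDK, and check whether this can be replicated on your HW. Also, is there any activity on the device at the time the issue occurs? You mentioned that it is connected to an external device and communicates over SPI, so is there any operation on the bus during that period, or is there a pattern that could associate the issue with the SPI transactions? Perhaps there is some correlation between the issue and the communication between the two devices.
Thanks, MT_dialog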
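Hi Dialog,
Sorry for the delay. We have been gathering some data about the problem that might help you help us.
Here is a summary of what we know:
• The pauses only occur in a subset (5%) of devices. All devices run the same application SW.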
• When the pauses do occur:
○ They occur randomly
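○ The duration of the pauses depends on the SCA of the master (see the attached CI.jpg).
○ The pauses only occur when the connection interval is a multiple of 11.25 ms (22.5 ms, 45 ms, ...). If any other CI is negotiated, there are no dropouts (see ci.jpg in the attached zip).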
○ A current trace shows consistent timing but no current spikes (representing RX and TX activity) when the pauses occur (see ComparePause_NoPause.png in attached zip). The upper trace shows CE's without pauses (two tall spikes representing idle TX and RX activity are seen), and the lower shows CE's with pauses. The timing looks consistent.
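For anyone trying to reproduce the interval dependence described above, a small check like the following may help when sweeping connection intervals. It is only a sketch: it relies on BLE connection intervals being expressed in units of 1.25 ms, so 11.25 ms is 9 units, and the function and macro names are made up for illustration rather than taken from the SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* BLE connection intervals are expressed in units of 1.25 ms. */
#define CI_UNIT_US        1250u
/* 11.25 ms expressed in 1.25 ms units (11250 / 1250 = 9). */
#define SUSPECT_CI_UNITS  9u

/* Convert a connection interval in microseconds to 1.25 ms units.
 * Hypothetical helper name, not an SDK function. */
uint16_t ci_us_to_units(uint32_t ci_us)
{
    assert(ci_us % CI_UNIT_US == 0);   /* must be a whole number of units */
    return (uint16_t)(ci_us / CI_UNIT_US);
}

/* True if the interval is a non-zero multiple of 11.25 ms
 * (11.25 ms, 22.5 ms, 45 ms, ...), the values reported to show pauses. */
bool ci_is_suspect(uint16_t ci_units)
{
    return ci_units != 0 && (ci_units % SUSPECT_CI_UNITS) == 0;
}
```

With these helpers, 11.25 ms, 22.5 ms and 45 ms are flagged, while the 47.5 ms interval (38 units) that ran for over 7 hours without a pause is not.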
It may be something our application SW is doing (or not doing), but it is not obvious to us. Thanks for any help you can provide!
"The wake up sequence takes more than expected (clock settling or code has been added in wake-up sequence) and the device reaches the programming of the event too late (there is an assertion in the SLP handler in the power_up() function, is that assertion still exists on your code ?)"
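Regarding your question above about the assertion, is this the code in question? I changed the ASSERT_WARNING to an ASSERT_ERROR so that this condition causes a very obvious reset.
The following ASSERT_ERROR does not fire when the pauses occur.
/*
 * Check whether BLE_SLP_IRQ has already been asserted. In that case, we were delayed in periph_init().
 * Increase the LP_ISR_TIME_XTAL32_CYCLES and LP_ISR_TIME_USEC values to give more execution time
 * to periph_init().
 */
if (GetBits32(BLE_INTSTAT_REG, SLPINTSTAT))
    ASSERT_ERROR(0);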
Hi jamesHiebert,
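From the trace you attached it seems that the device wakes up properly, but apparently the timing of the Rx/Tx event is wrong, most probably delayed. That is the only explanation I can come up with for some wake-ups having no RF activity. If the timing were wrong, though, the assertion I mentioned should hit; can you also check whether the second assertion, in lld_sleep_compensate_func_patched(), occurs? Also, is there any activity on the SPI device at the time the issue occurs? Perhaps what you are experiencing is some kind of race condition, since I cannot justify why the device would behave like that only when using multiples of the 11.25 ms connection interval.
Thanks, MT_dialog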
Hello again Dialog,
We are still working on this issue. Some new learnings, perhaps these will help you help us?
The pauses do not occur with sleep disabled.
No asserts are seen when the dropouts happen, including the ones in lld_sleep_compensate_func_patched()
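I modified the existing rwble.c "last event" logging to record events into a large circular buffer, and then read the buffer out when a dropout occurs.
When everything is fine, I see BLE_EVT_LP, BLE_EVT_SLP, BLE_EVT_CSCNT, BLE_EVT_RX, BLE_EVT_TX, BLE_EVT_END repeating.
When a dropout occurs, I see only BLE_EVT_LP, BLE_EVT_SLP, BLE_EVT_CSCNT repeating.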
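For readers who want to try the same kind of logging, here is a minimal sketch of a circular event log plus a scan for wake-ups that never reached an RX. It is not the code used in the thread: the event codes, buffer size and function names are placeholders chosen for this example, and the SDK's own BLE_EVT_* values may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder event codes for this sketch; not the SDK's BLE_EVT_* values. */
enum log_evt { LOG_EVT_LP, LOG_EVT_SLP, LOG_EVT_CSCNT,
               LOG_EVT_RX, LOG_EVT_TX, LOG_EVT_END };

#define EVT_LOG_LEN 256u            /* power of two so the index can be masked */

static uint8_t  evt_log[EVT_LOG_LEN];
static uint32_t evt_log_count;      /* total number of events ever recorded */

/* Record one event, overwriting the oldest entry once the buffer is full. */
void evt_log_put(enum log_evt e)
{
    evt_log[evt_log_count & (EVT_LOG_LEN - 1u)] = (uint8_t)e;
    evt_log_count++;
}

/* Number of valid entries currently held in the buffer. */
uint32_t evt_log_size(void)
{
    return evt_log_count < EVT_LOG_LEN ? evt_log_count : EVT_LOG_LEN;
}

/* Read entry i, where index 0 is the oldest retained event. */
enum log_evt evt_log_get(uint32_t i)
{
    assert(i < evt_log_size());
    uint32_t oldest = evt_log_count - evt_log_size();
    return (enum log_evt)evt_log[(oldest + i) & (EVT_LOG_LEN - 1u)];
}

/* Count CSCNT events that were followed by another CSCNT with no RX in
 * between. The most recent CSCNT is not counted, since its RX may still
 * be pending when the log is read. */
uint32_t evt_log_count_silent_wakeups(void)
{
    uint32_t silent = 0;
    bool pending = false;           /* saw CSCNT, still waiting for RX */
    for (uint32_t i = 0; i < evt_log_size(); i++) {
        enum log_evt e = evt_log_get(i);
        if (e == LOG_EVT_CSCNT) {
            if (pending)
                silent++;
            pending = true;
        } else if (e == LOG_EVT_RX) {
            pending = false;
        }
    }
    return silent;
}
```

Storing one byte per event keeps the RAM cost low, which matters when the buffer has to reach back far enough to cover the delay between a pause and the moment it is noticed on the sniffer.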
I will check for SPI activity, but there should not be any...
Also note that setting USE_POWER_OPTIMIZATIONS to 0 with extended sleep enabled seems to eliminate the pauses/dropouts. We do not use deep sleep.
Hi jameshiebert,
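The fact that you are seeing only BLE_EVT_LP, BLE_EVT_SLP and BLE_EVT_CSCNT when the issue occurs just confirms that the device wakes up properly but with no BLE activity.
As far as I can tell, and as mentioned in previous posts, the only reason I can think of is a delay before the RX/TX activity that prevents the device from serving the event on time. Since you report that you cannot see any assertion from the debugging structures the SDK already has, the delay must be between BLE_EVT_SLP and BLE_EVT_CSCNT. From the capture you provided I cannot track the CSCNT event, unless the CSCNT part is executed in the last bump that I see in the problematic trace (in which case losing the event is reasonable). It appears that the time between the SLP event and CSCNT is significantly larger than in the proper trace. After the SLP event it is mainly SDK functions that get the chance to run, though; are you sure that you are not disabling interrupts at some point in your application code? Can you take some timing measurements following the instructions below?
In ble_cscnt_handler(), try to log the following values when the issue occurs: evt->time (the time at which the next event should be scheduled) and the current time count. To do that:
For testing purposes I've used two arrays of length 16.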
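/* Pick (without removing) the next programmed event. */
struct lld_evt_tag *evt = (struct lld_evt_tag *)co_list_pick(&lld_evt_env.evt_prog);
time1_log[time_idx] = evt->time;            /* time the next event is scheduled for */
time2_log[time_idx] = lld_evt_time_get();   /* current time count */
time_idx++;
time_idx &= 0xF;                            /* wrap around the 16-entry logs */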
Normally the values collected in time1_log should be at least 2 slots ahead of the current time value (lld_evt_time_get(), stored in time2_log). If they are not, something (most probably at application level) is stalling the programming of the BLE event, and the BLE event never occurs because it is programmed too late.
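The "at least 2 slots ahead" check can be applied offline to the two logs. The sketch below is one way to do it under stated assumptions: it assumes the time counter counts 625 us slots and wraps at 27 bits, which should be verified against the SDK headers, and the function names are hypothetical helpers, not SDK functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: the base time counter wraps at 27 bits; check the SDK headers. */
#define TIME_MASK      0x07FFFFFFu
/* Minimum lead time, in slots, quoted in the post above. */
#define MIN_LEAD_SLOTS 2u

/* Slots from 'now' until 'target', accounting for counter wrap-around. */
uint32_t slots_until(uint32_t target, uint32_t now)
{
    return (target - now) & TIME_MASK;
}

/* True if the event time is at least MIN_LEAD_SLOTS ahead of 'now'.
 * A difference in the upper half of the counter range is treated as
 * "already in the past". */
bool event_programmed_in_time(uint32_t evt_time, uint32_t now)
{
    uint32_t lead = slots_until(evt_time, now);
    return lead >= MIN_LEAD_SLOTS && lead <= (TIME_MASK >> 1);
}

/* Count log entries whose lead time was too short.
 * time1_log holds evt->time samples, time2_log the matching current times. */
uint32_t count_late_entries(const uint32_t *time1_log,
                            const uint32_t *time2_log, uint32_t n)
{
    assert(time1_log != NULL && time2_log != NULL);
    uint32_t late = 0;
    for (uint32_t i = 0; i < n; i++) {
        if (!event_programmed_in_time(time1_log[i], time2_log[i]))
            late++;
    }
    return late;
}
```

A non-zero result from count_late_entries() for entries logged around a pause would support the theory that the event is being programmed too late.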
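Thanks, MT_dialog
Hello Dialog,
Sorry for the slow response. I am working on collecting the data you asked for, but the lag between the actual pause and the moment it is noticed on the sniffer makes this challenging. I need enough buffer space to record the timing data that far back, and memory is running low! I will keep working on it.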
In the mean time, I've discovered the following: if I disable SPI during a connection the dropouts cease.
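As you know, the peripherals must be re-initialized every time the BLE wakes up from extended sleep. In our system there is an app running on the 581 as well as another app running on a different IC. The two chips communicate via SPI with a custom protocol. I put logic in our 581 application to stop re-enabling SPI after extended sleep. Specifically, when the following line of code is no longer run, I see the pauses stop immediately.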
SetBits16(SPI_CTRL_REG,SPI_ON, 1);
I have also tried masking the SPI interrupt and stopping other GPIO interrupts between the BLE and other chip in the system instead (see below) but in these cases the pauses continue. Above is the only peripheral re-initialization step that skipping seems to stop the pauses.
• Not related to enabling the SPI interrupt
• Not related to signalling the other chip via GPIO.
• Not related to SPI internal clock divider setting (CLK_PER_REG changed from /2 to /4)
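• Not related to the SPI interrupt priority (increased from 20 to 3)
• Not related to enabling CLK_PER_REG
• Not related to the configuration of the SPI GPIO pins (input/output, function, etc.)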
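The elimination process listed above can be organised as a set of run-time switches, so that one re-initialization step at a time is skipped after wake-up. The following is a hypothetical sketch of that idea only: the step names, the mask variables and the functions are invented for this example, and the hardware writes are replaced by a bookkeeping variable so the logic can be exercised off-target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical step flags for this sketch; these are not SDK identifiers. */
enum reinit_step {
    STEP_SPI_PINS   = 1u << 0,   /* GPIO pin configuration for SPI */
    STEP_SPI_CLOCK  = 1u << 1,   /* peripheral clock enable / divider */
    STEP_SPI_IRQ    = 1u << 2,   /* SPI interrupt enable and priority */
    STEP_SPI_ENABLE = 1u << 3    /* stands for setting SPI_ON in SPI_CTRL_REG */
};

static uint32_t skip_mask;       /* steps to skip on subsequent wake-ups */
static uint32_t done_mask;       /* steps performed by the last re-init */

/* Select which steps to skip; pass 0 to restore normal behaviour. */
void reinit_skip(uint32_t steps)
{
    skip_mask = steps;
}

/* Perform one step unless it is currently being skipped. */
static void run_step(enum reinit_step step)
{
    if (skip_mask & (uint32_t)step)
        return;
    /* On target, the real register writes for this step would go here. */
    done_mask |= (uint32_t)step;
}

/* Stand-in for the peripheral re-initialization after extended sleep.
 * Returns a mask of the steps that actually ran. */
uint32_t periph_reinit_after_sleep(void)
{
    done_mask = 0;
    run_step(STEP_SPI_PINS);
    run_step(STEP_SPI_CLOCK);
    run_step(STEP_SPI_IRQ);
    run_step(STEP_SPI_ENABLE);
    return done_mask;
}
```

Calling reinit_skip(STEP_SPI_ENABLE) during a connection would mirror the experiment in which skipping only the SPI_ON write made the pauses stop.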
Hello Dialog,
Please find attached a write up with the timing data collected as suggested in your Thu, 2018-07-12 10:36 post.
Hi Jim,
I will take this issue offline. For others following this thread, we will not be able to post our results here.
/MHv