stuck in a loop in tcp_input.c or tcp_output.c in lwIP v1.4.1

This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

TM4C129 Launchpad lwIP Buffer issue with high TCP/IP network traffic

I am using a TM4C129 Connected Launchpad for TCP/IP communications back to a Windows host PC. The MCU is a TM4C129ENCPDT, rev A02. I have UDP and TCP/IP communications operating for start up static IP address assignment on a private network and basic data transfer works ok, so I think I have the basic peripheral setup correct. If I run at a low data rate of e.g. 2 Hz, sending a 255 byte data packet (max) over TCP/IP from the TM4C129 to the host PC, everything works well. I get the data that I expect at the rate that I expect it. 

If I increase the data transmission rate to 10 Hz, I can run for a short and apparently random period of time (5 minutes at most). After that, TCP/IP communications with the TM4C129 stop. When I debug the TM4C129 code, I find that I am stuck in a loop in tcp_in.c or tcp_out.c in lwIP v1.4.1 (the rest of the TI code is from TivaWare 2.1.1.71). My only way out is to reset the board.

More specifically, I get stuck either here:

tcp_out.c:

err_t
tcp_output(struct tcp_pcb *pcb)
{
  // some missing code

  /* useg should point to last segment on unacked queue */
  useg = pcb->unacked;
  if (useg != NULL) {
    for (; useg->next != NULL; useg = useg->next);    // <= I get stuck here, useg->next is never NULL
  }

  // some more missing code
}

Or I get stuck here:

tcp_in.c:

static void
tcp_receive(struct tcp_pcb *pcb)
{
  // some missing code

  /* Remove segment from the unacknowledged list if the incoming
     ACK acknowledges them. */
  while (pcb->unacked != NULL &&
         TCP_SEQ_LEQ(ntohl(pcb->unacked->tcphdr->seqno) +
                     TCP_TCPLEN(pcb->unacked), ackno)) {
    // do lwIP work      // I get stuck in this while loop, pcb->unacked != NULL is always true
  }

  // some more missing code
}

So 

(a) I realize that this code is in lwIP

(b) I have tried changing some of the lwIP options in lwipopts.h. Increasing the number of buffers, or increasing their size, delays the issue but does not solve it.

(c) I need to run at 10Hz to satisfy requirements in my Windows app, running at 2Hz is not an option for me

(d) It seems to me that the buffer list is being corrupted somehow: when this event occurs, the unacked segment list never ends in a NULL next pointer.
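
One way to test the suspicion in (d) is to check the unacked queue for a loop before walking it. The following is only a minimal sketch, not lwIP code: `struct seg` is a hypothetical stand-in for lwIP's `struct tcp_seg` (only the `next` link matters here), and `seg_list_has_cycle` is an illustrative helper name. It uses Floyd's tortoise-and-hare check, which terminates even when the list has been corrupted into a cycle.

```c
#include <stddef.h>

/* Hypothetical stand-in for lwIP's struct tcp_seg; only the link is needed. */
struct seg {
    struct seg *next;
};

/* Returns 1 if the list starting at head loops back on itself, 0 if it
 * ends in NULL. Floyd's algorithm: advance one pointer by one node and
 * another by two; they can only meet if the list contains a cycle. */
static int seg_list_has_cycle(const struct seg *head)
{
    const struct seg *slow = head;
    const struct seg *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) {
            return 1;
        }
    }
    return 0;
}
```

In a debug build, a check along these lines (adapted to the real segment type) could be run on pcb->unacked just before the for loop in tcp_output, so that a corrupted list hits a breakpoint instead of hanging.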

I am pretty sure that the issue is on the TM4C129 side of the code in terms of buffer handling. I think that the Windows app is ok.

I was wondering if anyone else has seen similar events and, if so, whether you can suggest how to proceed. Is it a case of following the logic through lwIP and finding where the unacked queue is being adjusted (presumably incorrectly)?

Simon

11 Replies

  • Amit Ashara
    Hello Simon

    Are you allocating buffers and then freeing them after use? Try increasing the heap size.

    Regards
    Amit

  • In reply to Amit Ashara:

    Hi,

    Increasing the heap size (doubling it to 8kB) had a marginal impact. I was able to run for about a minute longer. So I think I still have an underlying issue. As a lot of the TCP/IP activity takes place in an interrupt context (at least, that is what the docs imply), I thought that there were restrictions on malloc'ing (i.e. malloc isn't a good idea inside an interrupt handler because of re-entrancy concerns: http://processors.wiki.ti.com/index.php/Reentrant). Let me check the buffer allocation / freeing.  

  • Amit Ashara

    In reply to Simon Bird:

    Hello Simon

    If increasing the heap allowed it to run a little longer, then I believe it is a memory leak. Too many buffers may be being allocated and not freed. lwIP has its own malloc. Can you let us know which version of TivaWare you are using? There was a memory leak issue in lwIP in 2.1.0.12573, which was fixed from 2.1.1.71 onwards.
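
    To look for a leak of this kind, lwIP's built-in statistics can be switched on in lwipopts.h. The fragment below is a sketch that assumes the stock lwIP 1.4.1 option names from opt.h; how the TivaWare port handles the resulting debug output is not covered in this thread.

```c
/* lwipopts.h fragment: enable lwIP's internal usage counters. */
#define LWIP_STATS          1   /* master switch for the stats module */
#define MEM_STATS           1   /* heap (mem_malloc) usage counters */
#define MEMP_STATS          1   /* per-pool (pbuf, tcp_seg, ...) counters */
#define LWIP_STATS_DISPLAY  1   /* compile in stats_display() */
```

    With these set, the used, max and err fields in lwip_stats.mem and lwip_stats.memp[] can be inspected in the debugger. A used count that only ever climbs would point to buffers not being freed.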

    Regards
    Amit

  • In reply to Amit Ashara:

    Hi,
    I am using 2.1.1.71 (lwIP 1.4.1). 
    Thanks,
    Simon
  • Amit Ashara

    In reply to Simon Bird:

    Hello Simon

    The issue was fixed in the release you mention. So it seems that somewhere in your code a buffer is allocated but not freed after use.

    Regards
    Amit

  • In reply to Amit Ashara:

    Hi,

    I think that I may have found the issue. I had a second call to tcp_tmr() in the lwIPHostTimerHandler by mistake. 
    Removing that seems to resolve the issue.

    Thanks,

    Simon 

  • In reply to Simon Bird:

    Sorry to say that this issue has resurfaced as I ramp up network traffic.

    I keep seeing either the issue reported above or the socket closing, but I have yet to work out why either of these is happening.

  • In reply to Simon Bird:

    I am also facing the same issue. Found any leads?
  • In reply to Omkar Lad:

    I think the main issue is that you can't call any part of the lwIP API from different contexts or from different threads. All calls to the lwIP API need to be made from an interrupt context. I had a number of calls in my main loop, and when I removed them I was able to run at the rate I was looking for. I still see some intermittent timeout issues, but I think those are now more on the host PC side. So I would suggest making sure that any calls to lwIP (beyond a simple initialization call before any network traffic begins) are made in an interrupt context. I drove them from SysTickIntHandler, which calls lwIPTimer, which then (eventually) drives the polling / callback functions for lwIP, and this seemed to work for me.
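
    The pattern described here, where the main loop never touches lwIP and only hands data over to the interrupt context, can be sketched with a small single-producer/single-consumer ring buffer. This is an illustration under assumptions, not code from this thread: tx_queue_put, tx_queue_peek, tx_queue_pop, struct tx_msg and the sizes are hypothetical names and values. It assumes a single-core MCU, with put called only from the main loop and peek/pop called only from the lwIP interrupt context.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TXQ_SLOTS    8u      /* hypothetical depth; must be a power of two */
#define TXQ_MSG_MAX  255u    /* largest packet mentioned in the thread */

struct tx_msg {
    uint16_t len;
    uint8_t  data[TXQ_MSG_MAX];
};

static struct tx_msg     txq[TXQ_SLOTS];
static volatile uint32_t txq_head;   /* written only by the main loop */
static volatile uint32_t txq_tail;   /* written only by the interrupt side */

/* Main-loop side: copy a message into the queue. Returns 0 on success,
 * -1 if the message is too long or the queue is full. No lwIP calls here. */
int tx_queue_put(const void *buf, size_t len)
{
    uint32_t head = txq_head;

    if (len > TXQ_MSG_MAX || (head - txq_tail) >= TXQ_SLOTS) {
        return -1;
    }
    memcpy(txq[head % TXQ_SLOTS].data, buf, len);
    txq[head % TXQ_SLOTS].len = (uint16_t)len;
    txq_head = head + 1u;    /* advance the index after filling the slot */
    return 0;
}

/* Interrupt side: oldest queued message, or NULL if the queue is empty. */
const struct tx_msg *tx_queue_peek(void)
{
    if (txq_tail == txq_head) {
        return NULL;
    }
    return &txq[txq_tail % TXQ_SLOTS];
}

/* Interrupt side: discard the oldest message once lwIP has accepted it. */
void tx_queue_pop(void)
{
    if (txq_tail != txq_head) {
        txq_tail = txq_tail + 1u;
    }
}
```

    The interrupt-context callback would then peek a message, pass it to tcp_write() (with TCP_WRITE_FLAG_COPY, since the slot is reused), and pop it only if the call succeeded. A production version would probably also want a compiler barrier, or a brief interrupt disable, around the index updates.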

  • In reply to Simon Bird:

    The reason you can't call lwIP from different contexts is that it is not re-entrant.


Reposted from blog.csdn.net/zwl1584671413/article/details/80008163