Analysis of the Linux dev_queue_xmit transmit function (reposted)

When the upper layer has a packet ready to send, it is handled by the following function:

int dev_queue_xmit(struct sk_buff *skb)
{
    struct net_device *dev = skb->dev;
    struct netdev_queue *txq;
    struct Qdisc *q;
    int rc = -ENOMEM;

    /* GSO will handle the following emulations directly. */
    if (netif_needs_gso(dev, skb))
        goto gso;

    // First check whether the skb is fragmented. If it has a frag_list and the
    // NIC cannot handle fragment lists (no NETIF_F_FRAGLIST), all fragments must
    // be combined into a single linear buffer.
    // __skb_linearize is really __pskb_pull_tail(skb, skb->data_len), which works
    // almost the same way as pskb_may_pull: pskb_may_pull checks whether the main
    // buffer of the skb has room to pull len bytes; if not, it reallocates the
    // main buffer and copies the data from the skb's frags into it. Here len is
    // skb->data_len, so all data ends up in the main buffer and the skb is linear.
    if (skb_shinfo(skb)->frag_list &&
        !(dev->features & NETIF_F_FRAGLIST) &&
        __skb_linearize(skb))
        goto out_kfree_skb;

    /* Fragmented skb is linearized if device does not support SG,
     * or if at least one of fragments is in highmem and device
     * does not support DMA from it.
     */
    // If the skb was already linearized above, nr_frags is 0 and this is skipped.
    // Note the difference between frags and frag_list: frags keeps extra data in
    // separately allocated pages while there is still only one sk_buff, whereas
    // frag_list chains several sk_buffs together.
    if (skb_shinfo(skb)->nr_frags &&
        (!(dev->features & NETIF_F_SG) || illegal_highdma(dev, skb)) &&
        __skb_linearize(skb))
        goto out_kfree_skb;

    /* If packet is not checksummed and device does not support
     * checksumming for this protocol, complete checksumming here.
     */
    if (skb->ip_summed == CHECKSUM_PARTIAL) {
        skb_set_transport_header(skb, skb->csum_start -
                                      skb_headroom(skb));
        if (!dev_can_checksum(dev, skb) && skb_checksum_help(skb))
            goto out_kfree_skb;
    }

gso:
    /* Disable soft irqs for various locks below. Also
     * stops preemption for RCU.
     */
    rcu_read_lock_bh();

    // Select a transmit queue: if the device provides a select_queue callback it
    // is used, otherwise the kernel picks the queue. Most drivers do not set up
    // multiple queues; alloc_etherdev allocates the net_device with the queue
    // count set to 1, i.e. a single queue.
    txq = dev_pick_tx(dev, skb);
    // Take the qdisc from the device's netdev_queue structure.
    q = rcu_dereference(txq->qdisc);

#ifdef CONFIG_NET_CLS_ACT
    skb->tc_verd = SET_TC_AT(skb->tc_verd,AT_EGRESS);
#endif
    // As noted, most drivers have only one queue, but having a queue does not
    // mean the device uses it. If the qdisc has an enqueue function, the device
    // uses the queue; otherwise the packet is handled by the code further down.
    // As for where enqueue is set: dev_open -> dev_activate calls
    // qdisc_create_dflt to install a default qdisc; how ordinary drivers set up
    // their queue is not clear to me.
    // Note that the skb passed in is not sent directly: it is enqueued first and
    // then the queue is run. Which packet actually goes out is decided by the
    // enqueue and dequeue functions; this is the device's queuing discipline.
    if (q->enqueue) {
        spinlock_t *root_lock = qdisc_lock(q);

        spin_lock(root_lock);

        if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
            kfree_skb(skb);
            rc = NET_XMIT_DROP;
        } else {
            // Put the skb on the device's transmit queue, then call
            // qdisc_run to send it.
            rc = qdisc_enqueue_root(skb, q);
            qdisc_run(q); // see below
        }
        spin_unlock(root_lock);

        goto out;
    }

    // The code below handles devices that do not use a transmit queue; note
    // the kernel comment that follows.
    /* The device has no queue. Common case for software devices:
       loopback, all the sorts of tunnels...

       Really, it is unlikely that netif_tx_lock protection is necessary
       here.  (f.e. loopback and IP tunnels are clean ignoring statistics
       counters.)
       However, it is possible, that they rely on protection
       made by us here.

       Check this and shot the lock. It is not prone from deadlocks.
       Either shot noqueue qdisc, it is even simpler 8)
     */
    // Check that the device is up; below, whether the queue is running must also
    // be checked. Starting and stopping the queue is decided by the driver (see
    // ULNI, Chinese edition, p. 251).
    // As the kernel comment says, the typical device without an output queue is
    // the loopback device. All we need to do is call the driver's
    // hard_start_xmit directly; if sending fails, the packet is simply dropped,
    // because there is no queue to keep it.
    if (dev->flags & IFF_UP) {
        int cpu = smp_processor_id(); /* ok because BHs are off */

        if (txq->xmit_lock_owner != cpu) {
            HARD_TX_LOCK(dev, txq, cpu);

            if (!netif_tx_queue_stopped(txq)) {
                rc = 0;
                // For the loopback device, hard_start_xmit is loopback_xmit,
                // which ends by calling netif_rx directly, so the packet that
                // was sent is received right back.
                // dev_hard_start_xmit is analyzed below; a return value of 0
                // means success, and the skb now belongs to the driver.
                if (!dev_hard_start_xmit(skb, dev, txq)) {
                    HARD_TX_UNLOCK(dev, txq);
                    goto out;
                }
            }
            HARD_TX_UNLOCK(dev, txq);
            if (net_ratelimit())
                printk(KERN_CRIT "Virtual device %s asks to "
                       "queue packet!\n", dev->name);
        } else {
            /* Recursion is detected! It is possible,
             * unfortunately */
            if (net_ratelimit())
                printk(KERN_CRIT "Dead loop on virtual device "
                       "%s, fix it urgently!\n", dev->name);
        }
    }

    rc = -ENETDOWN;
    rcu_read_unlock_bh();

out_kfree_skb:
    kfree_skb(skb);
    return rc;
out:
    rcu_read_unlock_bh();
    return rc;
}
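The comments in dev_queue_xmit describe what __skb_linearize achieves: data scattered over fragments is copied into the main buffer so that the skb becomes linear. The following is a minimal userspace sketch of that idea only. `struct toy_skb`, `struct toy_frag` and `toy_linearize` are hypothetical names; the real sk_buff (headroom, page references, frag_list chains, reference counts) is far more involved.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_FRAGS 4

/* Hypothetical stand-in for one paged fragment (skb_shinfo(skb)->frags[i]). */
struct toy_frag {
    const unsigned char *data;
    size_t size;
};

/* Hypothetical, heavily simplified skb: a linear head plus paged fragments. */
struct toy_skb {
    unsigned char *head;     /* linear ("main") buffer, heap-allocated */
    size_t head_len;         /* bytes used in the linear buffer */
    struct toy_frag frags[TOY_MAX_FRAGS];
    int nr_frags;
    size_t data_len;         /* bytes held in fragments, like skb->data_len */
};

/* Copy every fragment into a new linear buffer, i.e. "pull data_len bytes
 * into the main buffer". Returns 0 on success, -1 if allocation fails
 * (the skb is then left unchanged). */
static int toy_linearize(struct toy_skb *skb)
{
    size_t total, off;
    unsigned char *buf;
    int i;

    if (skb->nr_frags == 0)
        return 0;            /* already linear: nothing to do */

    total = skb->head_len + skb->data_len;
    buf = malloc(total);
    if (!buf)
        return -1;

    memcpy(buf, skb->head, skb->head_len);
    off = skb->head_len;
    for (i = 0; i < skb->nr_frags; i++) {
        memcpy(buf + off, skb->frags[i].data, skb->frags[i].size);
        off += skb->frags[i].size;
    }
    assert(off == total);    /* data_len must match the fragment sizes */

    free(skb->head);
    skb->head = buf;
    skb->head_len = total;
    skb->nr_frags = 0;
    skb->data_len = 0;
    return 0;
}
```

After a successful call the skb has no fragments left, so a second call returns immediately; this matches the remark that the nr_frags check is skipped once the frag_list case has already linearized the skb.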

As dev_queue_xmit shows, when the device uses a transmit queue, packets are taken off the queue and sent in a loop (qdisc_run); when the device has no queue, only this one packet is sent, and if that fails it is dropped immediately.
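A minimal userspace model of the two paths, assuming a hypothetical `struct toy_dev` in which `has_queue` stands for `q->enqueue != NULL`, and a hypothetical `toy_hard_start_xmit` that accepts only `budget` more packets. On the queue path a packet the driver refuses stays queued for a later run; on the no-queue path it is dropped. Return codes (0 / -1) are illustrative, not the kernel's NET_XMIT_* or errno values.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_QLEN 8

/* Hypothetical device: has_queue stands for "q->enqueue != NULL". */
struct toy_dev {
    bool has_queue;
    bool up;                 /* IFF_UP */
    int budget;              /* packets the "driver" will still accept */
    int queue[TOY_QLEN];     /* FIFO ring; head == tail means empty */
    int head, tail;
    int sent, dropped;
};

/* Hypothetical driver hook: 0 = accepted, 1 = busy. */
static int toy_hard_start_xmit(struct toy_dev *dev, int pkt)
{
    (void)pkt;
    if (dev->budget == 0)
        return 1;
    dev->budget--;
    dev->sent++;
    return 0;
}

static int toy_queue_xmit(struct toy_dev *dev, int pkt)
{
    if (dev->has_queue) {
        /* Queue path: enqueue first, then drain in a loop. */
        if (dev->tail - dev->head == TOY_QLEN) {
            dev->dropped++;          /* queue full */
            return -1;
        }
        dev->queue[dev->tail++ % TOY_QLEN] = pkt;
        while (dev->head != dev->tail) {
            if (toy_hard_start_xmit(dev, dev->queue[dev->head % TOY_QLEN]))
                break;               /* refused: stays queued for later */
            dev->head++;
        }
        assert(dev->tail - dev->head <= TOY_QLEN);
        return 0;
    }

    /* No-queue path: one direct attempt; nowhere to keep a failure. */
    if (dev->up && toy_hard_start_xmit(dev, pkt) == 0)
        return 0;
    dev->dropped++;
    return -1;
}
```

With `budget = 1`, a second packet sent through a queued device stays in the ring, while the same packet sent through a queueless device is counted as dropped.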

int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
                        struct netdev_queue *txq)
{
    if (likely(!skb->next)) {
        // Every outgoing packet is also passed to the ptype_all handlers.
        // Creating a packet socket with protocol ETH_P_ALL registers an entry
        // in ptype_all, so such a socket sees both sent and received packets.
        // The other entries do not seem to matter here (to be investigated).
        if (!list_empty(&ptype_all))
            dev_queue_xmit_nit(skb, dev);

        if (netif_needs_gso(dev, skb)) {
            if (unlikely(dev_gso_segment(skb)))
                goto out_kfree_skb;
            if (skb->next)
                goto gso;
        }

        // The transmit callback provided by the driver.
        return dev->hard_start_xmit(skb, dev);
    }

gso:
    // Send the GSO segments chained on skb->next one at a time. If the driver
    // refuses one, link it back at the head of the list and return, so the
    // remaining segments can be retried later.
    do {
        struct sk_buff *nskb = skb->next;
        int rc;

        skb->next = nskb->next;
        nskb->next = NULL;
        rc = dev->hard_start_xmit(nskb, dev);
        if (unlikely(rc)) {
            nskb->next = skb->next;
            skb->next = nskb;
            return rc;
        }
        if (unlikely(netif_tx_queue_stopped(txq) && skb->next))
            return NETDEV_TX_BUSY;
    } while (skb->next);

    skb->destructor = DEV_GSO_CB(skb)->destructor;

out_kfree_skb:
    kfree_skb(skb);
    return 0;
}
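The do/while loop at the end of dev_hard_start_xmit sends the segments chained on skb->next one at a time and, on failure, links the unsent segment back in front of the rest so that a later retry resumes from it. Below is a userspace sketch of just that list handling, with hypothetical `struct toy_seg` / `struct toy_drv` types and a hypothetical driver that sets `stopped` when it runs out of credits (standing in for a full TX ring).

```c
#include <assert.h>
#include <stddef.h>

#define SEG_TX_OK   0
#define SEG_TX_BUSY 1

/* Hypothetical segment node; 'next' plays the role of skb->next. */
struct toy_seg {
    int id;
    struct toy_seg *next;
};

/* Hypothetical driver with a fixed number of free slots. */
struct toy_drv {
    int credits;
    int last_sent;
    int stopped;             /* stands in for netif_tx_queue_stopped() */
};

static int toy_seg_xmit(struct toy_drv *drv, struct toy_seg *seg)
{
    if (drv->credits == 0)
        return SEG_TX_BUSY;
    drv->credits--;
    drv->last_sent = seg->id;
    if (drv->credits == 0)
        drv->stopped = 1;    /* ring full: driver stops the queue */
    return SEG_TX_OK;
}

/* Mirrors the segment loop; 'owner' is the original skb whose ->next list
 * holds the segments not yet sent. */
static int toy_send_segments(struct toy_seg *owner, struct toy_drv *drv)
{
    assert(owner->next != NULL);     /* caller guarantees >= 1 segment */
    do {
        struct toy_seg *nseg = owner->next;
        int rc;

        owner->next = nseg->next;    /* unlink the first segment */
        nseg->next = NULL;
        rc = toy_seg_xmit(drv, nseg);
        if (rc != SEG_TX_OK) {
            nseg->next = owner->next;    /* relink it at the head */
            owner->next = nseg;
            return rc;
        }
        if (drv->stopped && owner->next)
            return SEG_TX_BUSY;      /* stopped with segments left */
    } while (owner->next);
    return SEG_TX_OK;
}
```

Because unsent segments stay on `owner->next`, calling the function again after the driver frees slots continues exactly where the previous attempt stopped.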

qdisc_run and __qdisc_run are very simple: qdisc_run only checks whether the queue is already running (and marks it as running if it is not), and __qdisc_run does the actual sending loop.

static inline void qdisc_run(struct Qdisc *q)
{
    if (!test_and_set_bit(__QDISC_STATE_RUNNING, &q->state))
        __qdisc_run(q);
}

void __qdisc_run(struct Qdisc *q)
{
    unsigned long start_time = jiffies;
    // The real work is done inside qdisc_restart.
    while (qdisc_restart(q)) {
        /*
         * Postpone processing if
         * 1. another process needs the CPU;
         * 2. we've been doing it for too long.
         */
        if (need_resched() || jiffies != start_time) {
            // Leave the loop, but raise the TX softirq first.
            __netif_schedule(q);
            break;
        }
    }
    clear_bit(__QDISC_STATE_RUNNING, &q->state);
}
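The test_and_set_bit guard in qdisc_run ensures that only one context runs a given qdisc at a time; any other caller simply leaves its packet on the queue. Here is a sketch of the same pattern in userspace with C11 `atomic_flag`, under two assumptions: the queue is reduced to a counter, and a fixed packet budget replaces the `need_resched()`/jiffies checks. All `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical qdisc model: 'running' plays the role of __QDISC_STATE_RUNNING
 * and the queue itself is reduced to a packet counter. */
struct toy_qdisc {
    atomic_flag running;
    int qlen;                /* packets waiting in the queue */
    int sent;                /* packets handed to the "driver" */
    bool softirq_raised;     /* stands in for __netif_schedule() */
};

static void toy_qdisc_init(struct toy_qdisc *q, int qlen)
{
    atomic_flag_clear(&q->running);
    q->qlen = qlen;
    q->sent = 0;
    q->softirq_raised = false;
}

/* Stand-in for qdisc_restart(): send one packet, return what is left. */
static int toy_send_one(struct toy_qdisc *q)
{
    if (q->qlen == 0)
        return 0;
    q->qlen--;
    q->sent++;
    return q->qlen;
}

/* Like __qdisc_run(), except that a fixed packet budget (an assumption)
 * replaces the need_resched()/jiffies test. */
static void toy_qdisc_run_body(struct toy_qdisc *q, int budget)
{
    assert(budget > 0);
    while (toy_send_one(q)) {
        if (--budget == 0) {
            q->softirq_raised = true;    /* let the softirq finish the job */
            break;
        }
    }
    atomic_flag_clear(&q->running);
}

/* Like qdisc_run(): only the caller that sets the flag does the work.
 * Returns true if this caller ran the queue. */
static bool toy_qdisc_run(struct toy_qdisc *q, int budget)
{
    if (atomic_flag_test_and_set(&q->running))
        return false;        /* another context is already running it */
    toy_qdisc_run_body(q, budget);
    return true;
}
```

A caller that finds the flag already set returns at once; its packet is already queued and will be sent by whoever holds the flag, or later by the softirq.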

__qdisc_run then calls qdisc_restart in a loop to send the data. qdisc_restart is the function that really sends packets: it takes one frame off the queue and tries to send it; if sending fails, the frame is usually put back on the queue.
Its return value is the remaining queue length if sending succeeded, and 0 if sending failed (note that 0 is also returned when sending succeeded and the remaining queue length is 0). It also returns 0 whenever the transmit queue has been stopped.

static inline int qdisc_restart(struct Qdisc *q)
{
    struct netdev_queue *txq;
    int ret = NETDEV_TX_BUSY;
    struct net_device *dev;
    spinlock_t *root_lock;
    struct sk_buff *skb;

    /* Dequeue packet */
    if (unlikely((skb = dequeue_skb(q)) == NULL))
        return 0;

    root_lock = qdisc_lock(q);

    /* And release qdisc */
    spin_unlock(root_lock);

    dev = qdisc_dev(q);
    txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));

    HARD_TX_LOCK(dev, txq, smp_processor_id());
    if (!netif_tx_queue_stopped(txq) &&
        !netif_tx_queue_frozen(txq))
        ret = dev_hard_start_xmit(skb, dev, txq);
    HARD_TX_UNLOCK(dev, txq);

    spin_lock(root_lock);

    switch (ret) {
    case NETDEV_TX_OK:
        /* Driver sent out skb successfully */
        ret = qdisc_qlen(q);
        break;

    case NETDEV_TX_LOCKED:
        /* Driver try lock failed */
        // Another CPU probably holds this lock, i.e. a collision.
        // handle_dev_cpu_collision (not shown) is simple and covers two cases:
        // if the CPU holding the lock is the current CPU, the packet is dropped
        // and a warning is printed; otherwise the packet is put back on the
        // queue.
        ret = handle_dev_cpu_collision(skb, txq, q);
        break;

    default:
        /* Driver returned NETDEV_TX_BUSY - requeue skb */
        // NETDEV_TX_BUSY is returned when the transmit queue has been stopped
        // while there is still data to send; it means the device is busy, and
        // in this case the packet is requeued.
        if (unlikely (ret != NETDEV_TX_BUSY && net_ratelimit()))
            printk(KERN_WARNING "BUG %s code %d qlen %d\n",
                   dev->name, ret, q->q.qlen);

        ret = dev_requeue_skb(skb, q);
        break;
    }

    if (ret && (netif_tx_queue_stopped(txq) ||
                netif_tx_queue_frozen(txq)))
        ret = 0;

    return ret;
}
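To make the return-value rules concrete, here is a hypothetical userspace model of one qdisc_restart step. `struct toy_q`, `toy_requeue` and `toy_qdisc_restart` are invented names; the driver's answer is passed in as a parameter instead of coming from real hardware, and `lock_is_ours` models the "CPU holding the lock is the current CPU" case of handle_dev_cpu_collision described in the comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical driver answers, named after NETDEV_TX_OK/BUSY/LOCKED. */
enum toy_tx { TOY_TX_OK, TOY_TX_BUSY, TOY_TX_LOCKED };

/* Hypothetical queue: only counters and a "stopped" flag. */
struct toy_q {
    int qlen;
    int requeued;
    int dropped;
    bool stopped;            /* stands in for netif_tx_queue_stopped() */
};

/* Like dev_requeue_skb(): put the packet back and report 0. */
static int toy_requeue(struct toy_q *q)
{
    q->qlen++;
    q->requeued++;
    return 0;
}

/* One qdisc_restart()-style step. 'drv_ret' is the (hypothetical) driver's
 * answer; 'lock_is_ours' models xmit_lock_owner == the current CPU.
 * Returns the remaining queue length to keep looping, or 0 to stop. */
static int toy_qdisc_restart(struct toy_q *q, enum toy_tx drv_ret,
                             bool lock_is_ours)
{
    int ret;

    assert(q->qlen >= 0);
    if (q->qlen == 0)
        return 0;            /* dequeue found nothing */
    q->qlen--;               /* dequeue one packet */

    switch (drv_ret) {
    case TOY_TX_OK:
        ret = q->qlen;       /* sent: report what is left */
        break;
    case TOY_TX_LOCKED:
        if (lock_is_ours) {  /* recursion: drop it, move on */
            q->dropped++;
            ret = q->qlen;
        } else {             /* another CPU holds the lock */
            ret = toy_requeue(q);
        }
        break;
    default:                 /* TOY_TX_BUSY */
        ret = toy_requeue(q);
        break;
    }

    if (ret && q->stopped)
        ret = 0;             /* queue stopped: stop looping */
    return ret;
}
```

Note how 0 is ambiguous, as the text says: it can mean "queue empty after a successful send", "packet requeued" or "queue stopped"; in every case __qdisc_run stops looping.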

At this point, the path from dev_queue_xmit down to the driver layer is complete.

Since dev_queue_xmit already exists, why is a softirq also needed for sending?
dev_queue_xmit does some processing on the skb (such as merging it into a single linear buffer and computing the checksum). Once that is done the skb could be sent directly, but dev_queue_xmit first enqueues it (skbs are normally enqueued in this function) and calls qdisc_run to try to send it. That attempt may well fail, in which case the skb is put back on the queue, the softirq is scheduled, and dev_queue_xmit simply returns.
The softirq only sends packets from the queue and frees skbs that have already been sent; it no longer has to linearize skbs or compute checksums. In addition, while the queue is stopped, dev_queue_xmit can still enqueue packets but cannot send them; once the queue is woken up, the softirq sends the packets that piled up while it was stopped. In short, dev_queue_xmit does the final processing of the skb and the first transmit attempt, and the softirq sends the packets that the first attempt failed to send or did not get to.
(The transmit softirq has one more job: freeing packets that have already been sent. In some cases transmit completion is signaled by a hardware interrupt, and to keep hardware interrupt handling fast the kernel provides a way to defer freeing the skb to the softirq.
The driver just calls dev_kfree_skb_irq, which adds the skb to the completion_queue of softnet_data and raises the transmit softirq;
net_tx_action then frees every skb on completion_queue in softirq context.)
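The deferred-free path can be modeled in userspace as below. `struct toy_buf`, `struct toy_softnet`, `toy_kfree_irq` and `toy_tx_action` are hypothetical stand-ins for sk_buff, softnet_data, dev_kfree_skb_irq and net_tx_action. The sketch assumes, as in the kernel, that a buffer is only queued once its last reference (`users`) is dropped; the real code is per-CPU and disables local interrupts around the list manipulation, which this single-threaded sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical buffer with a reference count, like skb->users. */
struct toy_buf {
    int users;
    struct toy_buf *next;
};

/* Hypothetical per-CPU state, standing in for softnet_data. */
struct toy_softnet {
    struct toy_buf *completion_queue;
    bool tx_softirq_pending;     /* stands in for raising NET_TX_SOFTIRQ */
};

/* "Interrupt context": drop a reference; on the last one, do not free
 * here, just push the buffer onto the completion list and raise the softirq. */
static void toy_kfree_irq(struct toy_softnet *sd, struct toy_buf *b)
{
    if (--b->users == 0) {
        b->next = sd->completion_queue;
        sd->completion_queue = b;
        sd->tx_softirq_pending = true;
    }
}

/* "Softirq context": detach the whole list, then free every entry.
 * Returns how many buffers were freed. */
static int toy_tx_action(struct toy_softnet *sd)
{
    struct toy_buf *clist = sd->completion_queue;
    int freed = 0;

    sd->completion_queue = NULL;     /* the kernel does this with IRQs off */
    sd->tx_softirq_pending = false;
    while (clist) {
        struct toy_buf *b = clist;
        clist = clist->next;
        assert(b->users == 0);       /* nobody may still hold a reference */
        free(b);
        freed++;
    }
    return freed;
}
```

The interrupt-side work is only two pointer assignments, which is the point of the design: the cost of actually freeing memory moves out of the hardware interrupt handler.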

This article comes from a CSDN blog; if you reproduce it, please credit: **blog.csdn**/peimichael/archive/2009/10/19/4699609.aspx
Tags: amp, len, linux, queue, fragments, nr, buff, pskb, gso, emulations
Category: CPP
Date: 2010-12-14

