TCP Notes

A few notes on TCP.

TCP/IP

TCP/IP is a suite of protocols that governs how data is transmitted over the internet. Two key protocols in this suite are IP (Internet Protocol) and TCP (Transmission Control Protocol).

IP

IP is a connectionless protocol that handles the addressing and routing of data packets between devices on a network. It sends each packet independently without establishing a connection, meaning it does not guarantee delivery or packet order, focusing only on directing packets to the correct destination using IP addresses.

TCP

While IP is responsible for routing packets across networks, TCP ensures that these packets are delivered correctly and in the right order. It establishes a connection between the sender and receiver, manages retransmissions of lost packets, and organizes the data into a continuous stream, ensuring that communication remains reliable.

POSIX APIs

/*
   client                        server            related

1. socket();                  1. socket();         fcntl()
2. bind(); // optional        2. bind();           epoll
3. connect();                 3. listen();         epoll_create()
4. send();                    4. accept();         epoll_ctl()
5. recv();                    5. recv();           epoll_wait()
6. close();                   6. send();           ...
                              7. close();
*/

// creates a new socket
int socket(int domain, int type, int protocol);

// binds a socket to a specific local IP address and port
// if omitted on the client side, the operating system will
// automatically assign an available local IP address and an
// ephemeral port for the connection
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

// initiates a connection to another socket
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

// marks a socket as a passive socket to accept incoming connection requests
int listen(int sockfd, int backlog);

// accepts an incoming connection on a listening socket
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

// sends data over a connected socket
ssize_t send(int sockfd, const void *buf, size_t len, int flags);

// receives data from a connected socket
ssize_t recv(int sockfd, void *buf, size_t len, int flags);

// closes a socket
int close(int sockfd);

// changes the behavior of a file descriptor
// for example, setting it to non-blocking mode, controlling file locking,
// or modifying file access settings
int fcntl(int fd, int cmd, ... /* arg */);
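
The call order in the table above can be tried in a single process over the loopback interface. The following is only a minimal sketch under some assumptions: `loopback_echo` is a hypothetical helper name, the server binds to port 0 so the kernel picks a free port, and the message is short enough to fit in one small buffer.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: runs the server and client steps in one process over
// loopback. The client sends msg, the server echoes it back, and the echoed
// bytes are stored in reply. Returns the number of bytes echoed, or -1.
ssize_t loopback_echo(const char *msg, char *reply, size_t reply_len) {
    char buf[256];
    size_t len = strlen(msg);
    ssize_t result = -1;
    int listen_fd = -1, client_fd = -1, conn_fd = -1;
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof(addr);

    if (len == 0 || len > sizeof(buf) || len >= reply_len) return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);  // port 0: the kernel picks a free port

    // server: socket(), bind(), listen()
    listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd == -1) goto out;
    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) goto out;
    if (listen(listen_fd, 8) == -1) goto out;
    // learn which port the kernel assigned
    if (getsockname(listen_fd, (struct sockaddr *)&addr, &addr_len) == -1) goto out;

    // client: socket(), connect(); no bind(), so the source port is ephemeral
    client_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (client_fd == -1) goto out;
    if (connect(client_fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) goto out;

    // server: accept() takes the established connection
    conn_fd = accept(listen_fd, NULL, NULL);
    if (conn_fd == -1) goto out;

    // client send() -> server recv() -> server send() -> client recv()
    if (send(client_fd, msg, len, 0) != (ssize_t)len) goto out;
    if (recv(conn_fd, buf, len, MSG_WAITALL) != (ssize_t)len) goto out;
    if (send(conn_fd, buf, len, 0) != (ssize_t)len) goto out;
    if (recv(client_fd, reply, len, MSG_WAITALL) != (ssize_t)len) goto out;
    reply[len] = '\0';
    result = (ssize_t)len;

out:
    if (conn_fd != -1) close(conn_fd);
    if (client_fd != -1) close(client_fd);
    if (listen_fd != -1) close(listen_fd);
    return result;
}
```

Calling `loopback_echo("hello", reply, sizeof(reply))` with a 64-byte `reply` buffer should fill it with the echoed string. A real server would normally run in its own process and loop over accept().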

TCB

TCB, or Transmission Control Block, is a data structure used by TCP to store information about a specific connection. It keeps track of the state of the TCP connection and maintains various details necessary for managing the reliable, ordered delivery of data. The TCB includes buffers like wmem for outgoing data and rmem for incoming data.

When socket() is called, a file descriptor is assigned to the created socket, and a TCB is allocated underneath (in kernel space), along with its components such as wmem and rmem.

When bind() is called, the IP address and port information are assigned to the TCB.

When send() is called, data is copied into the wmem buffer.

When recv() is called, data is copied from the rmem buffer.

send() and recv() are similar to write() and read(), as they handle copying data to and from buffers, but they are not directly responsible for actual network data transmission.
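
The per-socket buffers can be observed from user space. The sketch below assumes that the standard socket options SO_SNDBUF and SO_RCVBUF report the sizes of the send and receive buffers that these notes call wmem and rmem; `tcp_buffer_sizes` is a hypothetical helper name.

```c
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: creates a TCP socket and reads the sizes (in bytes)
// of its kernel send and receive buffers. Returns 0 on success, -1 on error.
int tcp_buffer_sizes(int *snd_bytes, int *rcv_bytes) {
    socklen_t len = sizeof(int);
    int rc;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) return -1;

    // SO_SNDBUF: buffer that send() copies outgoing data into
    rc = getsockopt(fd, SOL_SOCKET, SO_SNDBUF, snd_bytes, &len);
    if (rc == 0) {
        len = sizeof(int);
        // SO_RCVBUF: buffer that recv() copies incoming data out of
        rc = getsockopt(fd, SOL_SOCKET, SO_RCVBUF, rcv_bytes, &len);
    }
    close(fd);
    return rc;
}
```

The reported values depend on the operating system and its configuration, so no particular numbers should be expected.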

TCP Connection State Diagram

IETF. (1981). RFC 793: Transmission Control Protocol. Retrieved from https://www.ietf.org/rfc/rfc793.txt, p. 22.

[Page 22]                                                               


September 1981                                                          
                                           Transmission Control Protocol
                                                Functional Specification




                              +---------+ ---------\      active OPEN  
                              |  CLOSED |            \    -----------  
                              +---------+<---------\   \   create TCB  
                                |     ^              \   \  snd SYN    
                   passive OPEN |     |   CLOSE        \   \           
                   ------------ |     | ----------       \   \         
                    create TCB  |     | delete TCB         \   \       
                                V     |                      \   \     
                              +---------+            CLOSE    |    \   
                              |  LISTEN |          ---------- |     |  
                              +---------+          delete TCB |     |  
                   rcv SYN      |     |     SEND              |     |  
                  -----------   |     |    -------            |     V  
 +---------+      snd SYN,ACK  /       \   snd SYN          +---------+
 |         |<-----------------           ------------------>|         |
 |   SYN   |                    rcv SYN                     |   SYN   |
 |   RCVD  |<-----------------------------------------------|   SENT  |
 |         |                    snd ACK                     |         |
 |         |------------------           -------------------|         |
 +---------+   rcv ACK of SYN  \       /  rcv SYN,ACK       +---------+
   |           --------------   |     |   -----------                  
   |                  x         |     |     snd ACK                    
   |                            V     V                                
   |  CLOSE                   +---------+                              
   | -------                  |  ESTAB  |                              
   | snd FIN                  +---------+                              
   |                   CLOSE    |     |    rcv FIN                     
   V                  -------   |     |    -------                     
 +---------+          snd FIN  /       \   snd ACK          +---------+
 |  FIN    |<-----------------           ------------------>|  CLOSE  |
 | WAIT-1  |------------------                              |   WAIT  |
 +---------+          rcv FIN  \                            +---------+
   | rcv ACK of FIN   -------   |                            CLOSE  |  
   | --------------   snd ACK   |                           ------- |  
   V        x                   V                           snd FIN V  
 +---------+                  +---------+                   +---------+
 |FINWAIT-2|                  | CLOSING |                   | LAST-ACK|
 +---------+                  +---------+                   +---------+
   |                rcv ACK of FIN |                 rcv ACK of FIN |  
   |  rcv FIN       -------------- |    Timeout=2MSL -------------- |  
   |  -------              x       V    ------------        x       V  
    \ snd ACK                 +---------+delete TCB         +---------+
     ------------------------>|TIME WAIT|------------------>| CLOSED  |
                              +---------+                   +---------+

                      TCP Connection State Diagram
                               Figure 6.
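
The diagram can be read as a transition table. The sketch below encodes only the arrows drawn in Figure 6. The enum and function names are hypothetical, and unlisted combinations of state and event simply leave the state unchanged, which a real TCP stack would not do.

```c
// States and events from the RFC 793 diagram (names are hypothetical).
enum tcp_state {
    ST_CLOSED, ST_LISTEN, ST_SYN_SENT, ST_SYN_RCVD, ST_ESTABLISHED,
    ST_FIN_WAIT_1, ST_FIN_WAIT_2, ST_CLOSE_WAIT, ST_CLOSING,
    ST_LAST_ACK, ST_TIME_WAIT
};

enum tcp_event {
    EV_PASSIVE_OPEN, EV_ACTIVE_OPEN, EV_SEND, EV_CLOSE,
    EV_RCV_SYN, EV_RCV_SYN_ACK,
    EV_RCV_ACK,       // ACK of our SYN or of our FIN, depending on state
    EV_RCV_FIN, EV_TIMEOUT_2MSL
};

// Returns the state reached from s on event e; unknown pairs keep s.
enum tcp_state tcp_next_state(enum tcp_state s, enum tcp_event e) {
    switch (s) {
    case ST_CLOSED:
        if (e == EV_PASSIVE_OPEN) return ST_LISTEN;
        if (e == EV_ACTIVE_OPEN) return ST_SYN_SENT;
        break;
    case ST_LISTEN:
        if (e == EV_RCV_SYN) return ST_SYN_RCVD;
        if (e == EV_SEND) return ST_SYN_SENT;
        if (e == EV_CLOSE) return ST_CLOSED;
        break;
    case ST_SYN_SENT:
        if (e == EV_RCV_SYN) return ST_SYN_RCVD;      // simultaneous open
        if (e == EV_RCV_SYN_ACK) return ST_ESTABLISHED;
        if (e == EV_CLOSE) return ST_CLOSED;
        break;
    case ST_SYN_RCVD:
        if (e == EV_RCV_ACK) return ST_ESTABLISHED;
        if (e == EV_CLOSE) return ST_FIN_WAIT_1;
        break;
    case ST_ESTABLISHED:
        if (e == EV_CLOSE) return ST_FIN_WAIT_1;
        if (e == EV_RCV_FIN) return ST_CLOSE_WAIT;
        break;
    case ST_FIN_WAIT_1:
        if (e == EV_RCV_ACK) return ST_FIN_WAIT_2;
        if (e == EV_RCV_FIN) return ST_CLOSING;       // simultaneous close
        break;
    case ST_FIN_WAIT_2:
        if (e == EV_RCV_FIN) return ST_TIME_WAIT;
        break;
    case ST_CLOSE_WAIT:
        if (e == EV_CLOSE) return ST_LAST_ACK;
        break;
    case ST_CLOSING:
        if (e == EV_RCV_ACK) return ST_TIME_WAIT;
        break;
    case ST_LAST_ACK:
        if (e == EV_RCV_ACK) return ST_CLOSED;
        break;
    case ST_TIME_WAIT:
        if (e == EV_TIMEOUT_2MSL) return ST_CLOSED;
        break;
    }
    return s;
}
```

For example, a client that opens, uses, and closes a connection walks CLOSED, SYN SENT, ESTAB, FIN WAIT-1, FINWAIT-2, TIME WAIT, and back to CLOSED.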

SYN queue and accept queue

The SYN queue is used by the server to hold half-open connections during the initial handshake phase of a TCP connection. When a client sends a SYN packet to initiate a connection, the server responds with a SYN-ACK and places an entry for the pending connection in the SYN queue, where it stays until the client's final ACK arrives.

Once the handshake is finished and the connection is established, the connection is moved to the accept queue, which holds fully established connections that are ready for communication.

When the server calls the accept() system call, it retrieves a connection from the accept queue.

TCP Header Format

IETF. (1981). RFC 793: Transmission Control Protocol. Retrieved from https://www.ietf.org/rfc/rfc793.txt, p. 14.

                      3.  FUNCTIONAL SPECIFICATION

3.1.  Header Format

  TCP segments are sent as internet datagrams.  The Internet Protocol
  header carries several information fields, including the source and
  destination host addresses [2].  A TCP header follows the internet
  header, supplying information specific to the TCP protocol.  This
  division allows for the existence of host level protocols other than
  TCP.

  TCP Header Format


    0                   1                   2                   3   
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Acknowledgment Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |           |U|A|P|R|S|F|                               |
   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   |       |           |G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |         Urgent Pointer        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                            TCP Header Format

          Note that one tick mark represents one bit position.

                               Figure 3.

  Source Port:  16 bits

    The source port number.

  Destination Port:  16 bits

    The destination port number.
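
To make the layout concrete, the fixed 20-byte part of the header in Figure 3 can be decoded from a raw byte buffer. This is a minimal sketch: the struct, the flag constants, and the function names are hypothetical, options are not parsed, and all multi-byte fields are read in network byte order (big-endian).

```c
#include <stddef.h>
#include <stdint.h>

// Flag bits as laid out in Figure 3 (constant names are hypothetical).
enum {
    TCPF_FIN = 0x01, TCPF_SYN = 0x02, TCPF_RST = 0x04,
    TCPF_PSH = 0x08, TCPF_ACK = 0x10, TCPF_URG = 0x20
};

struct tcp_header_fields {
    uint16_t src_port, dst_port;
    uint32_t seq, ack;
    uint8_t data_offset;  // header length in 32-bit words
    uint8_t flags;        // URG, ACK, PSH, RST, SYN, FIN
    uint16_t window, checksum, urgent;
};

static uint16_t read_be16(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

// Decodes the fixed header fields from buf. Returns 0 on success, or -1 if
// the buffer is too short or the data offset is inconsistent with it.
int parse_tcp_header(const uint8_t *buf, size_t len,
                     struct tcp_header_fields *out) {
    if (len < 20) return -1;  // the header without options is 20 bytes
    out->src_port = read_be16(buf);
    out->dst_port = read_be16(buf + 2);
    out->seq = read_be32(buf + 4);
    out->ack = read_be32(buf + 8);
    out->data_offset = (uint8_t)(buf[12] >> 4);
    out->flags = (uint8_t)(buf[13] & 0x3F);
    out->window = read_be16(buf + 14);
    out->checksum = read_be16(buf + 16);
    out->urgent = read_be16(buf + 18);
    if (out->data_offset < 5 || (size_t)out->data_offset * 4 > len) return -1;
    return 0;
}
```

Given the first bytes of a SYN segment, `flags` would equal `TCPF_SYN` and `data_offset` would be 5 when no options are present.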

Three-way Handshake

  1. SYN: The client initiates the connection by sending a SYN packet (the SYN bit set to 1) with a random initial sequence number.
  2. SYN-ACK: The server responds with a SYN-ACK packet, acknowledging the client's SYN and sending its own sequence number.
  3. ACK: The client sends an ACK packet to confirm receipt of the server's SYN-ACK, completing the handshake.

TCP three-way Handshake

The 5-tuple

The connections (the small rectangular boxes above), whether the semi-established ones in the SYN queue or the established ones in the accept queue, are identified by a tuple of 5 elements (source_ip, source_port, dest_ip, dest_port, protocol).
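
As an illustration, the tuple can be written as a plain struct with an equality check. This is only a sketch: the type and function names are hypothetical, addresses are assumed to be IPv4, and the protocol field uses the IANA protocol number (6 for TCP).

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical IPv4-only representation of the 5-tuple.
struct five_tuple {
    uint32_t src_ip;
    uint16_t src_port;
    uint32_t dst_ip;
    uint16_t dst_port;
    uint8_t protocol;  // 6 = TCP
};

// Two packets belong to the same connection only if all five fields match.
bool five_tuple_equal(const struct five_tuple *a, const struct five_tuple *b) {
    return a->src_ip == b->src_ip && a->src_port == b->src_port &&
           a->dst_ip == b->dst_ip && a->dst_port == b->dst_port &&
           a->protocol == b->protocol;
}
```

This is why one client host can hold many connections to the same server port at once: each connection uses a different source port, so the tuples differ.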

DDoS SYN Flood

A DDoS SYN flood attack targets the SYN queue of a server by sending a massive number of SYN packets to initiate TCP connections, often with spoofed IP addresses (meaning the source IP address is forged to be from a random or unreachable IP address). The server responds with SYN-ACK packets, but because the attacker doesn't complete the 3-way handshake with the final ACK packet, the server’s SYN queue fills up with half-open connections. This consumes server resources and prevents legitimate connections, leading to a Denial of Service (DoS). The attack exploits the server's inability to distinguish between legitimate and malicious SYN requests, causing service disruption.

In some contexts, a DDoS attack is referred to as a CC (Challenge Collapsar) attack, although that term usually denotes an application-layer flood rather than a SYN flood.

DDoS SYN Flood

ET or LT for accept

The accept() system call retrieves a connection from the accept queue, which holds fully established connections.

For accept(), Level-Triggered (LT) mode notifies the application as long as there are pending connections in the accept queue, allowing repeated notifications until all pending connections are handled.

In contrast, Edge-Triggered (ET) mode notifies the application only once when new connections arrive, so the application must accept all pending connections in a loop. ET requires a non-blocking listening socket: once the accept queue is empty, accept() returns -1 immediately with errno set to EAGAIN or EWOULDBLOCK instead of blocking.

// set a socket to non-blocking mode
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) {
        perror("fcntl F_GETFL");
        return -1;
    }
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        perror("fcntl F_SETFL O_NONBLOCK");
        return -1;
    }
    return 0;
}

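// accept in ET mode with a non-blocking listen_fd:
// keep accepting until the accept queue is drained
while (1) {
    int conn_fd = accept(listen_fd, NULL, NULL);
    if (conn_fd == -1) {
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            // no more connections to accept
            break;
        }
        if (errno == EINTR || errno == ECONNABORTED) {
            // transient condition, try the next pending connection
            continue;
        }
        // unexpected error: report it and stop, conn_fd is not usable
        perror("accept");
        break;
    }

    // set the accepted socket to non-blocking
    if (set_nonblocking(conn_fd) == -1) {
        close(conn_fd);
        continue;
    }
    // connection accepted
    printf("accepted connection, fd %d\n", conn_fd);
}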

LT is easier to use, while ET is suited for high-performance applications with careful design.
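
The following sketch shows how a listening socket might be registered for ET notifications with epoll, which is Linux-specific. All three function names are hypothetical helpers. The accepted connections are closed immediately because the sketch only counts them. Dropping EPOLLET from the event mask would give LT behavior instead.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: non-blocking listener on 127.0.0.1 with a
// kernel-chosen port, returned through *port in host byte order.
int make_nonblocking_listener(uint16_t *port) {
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof(addr);
    int flags;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, 16) == -1 ||
        getsockname(fd, (struct sockaddr *)&addr, &addr_len) == -1) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

// Registers listen_fd for edge-triggered readiness. Returns the epoll fd.
int watch_listener_et(int listen_fd) {
    struct epoll_event ev;
    int epfd = epoll_create1(0);
    if (epfd == -1) return -1;
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN | EPOLLET;  // remove EPOLLET for level-triggered
    ev.data.fd = listen_fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev) == -1) {
        close(epfd);
        return -1;
    }
    return epfd;
}

// Waits for one notification, then drains the accept queue.
// Returns the number of connections accepted, 0 on timeout, -1 on error.
int wait_and_drain(int epfd, int listen_fd, int timeout_ms) {
    struct epoll_event ev;
    int accepted = 0;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    if (n <= 0) return n;
    for (;;) {
        int conn_fd = accept(listen_fd, NULL, NULL);
        if (conn_fd == -1) {
            if (errno == EAGAIN || errno == EWOULDBLOCK) break;  // drained
            if (errno == EINTR || errno == ECONNABORTED) continue;
            return -1;
        }
        close(conn_fd);  // a real server would hand conn_fd to a handler
        accepted++;
    }
    return accepted;
}
```

If several clients connect before `wait_and_drain` is called, a single notification is enough for the loop to accept all of them, which is the behavior ET mode relies on.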

backlog in listen()

The meaning of the backlog parameter in the listen() system call has changed over time. In early implementations, it primarily defined the size of the SYN queue, which holds half-open connections. Modern systems distinguish between the SYN queue and the accept queue, and backlog can influence both.

On Linux, for example, backlog has limited the accept queue since kernel 2.2 and is silently capped at net.core.somaxconn, while net.ipv4.tcp_max_syn_backlog bounds the SYN queue. The backlog value is therefore best treated as a hint: the actual queue sizes depend on system-specific settings and kernel versions, and can vary significantly.

MTU

MTU, or the Maximum Transmission Unit, defines the largest packet size that can be transmitted over a network without fragmentation. TCP uses the Path MTU Discovery mechanism to determine the optimal packet size to improve efficiency and avoid fragmentation-related overhead.
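
As a worked example of the arithmetic, the largest TCP payload that fits in one unfragmented packet is the MTU minus the IP and TCP header sizes. The function name below is hypothetical, and the header sizes are passed in because they vary (20 bytes each for IPv4 and TCP without options, 40 bytes for the IPv6 base header).

```c
// Hypothetical helper: largest TCP payload (in bytes) that fits in one
// packet of the given MTU. Returns 0 if the headers alone exceed the MTU.
int tcp_payload_for_mtu(int mtu, int ip_header_len, int tcp_header_len) {
    int payload = mtu - ip_header_len - tcp_header_len;
    return payload > 0 ? payload : 0;
}
```

With the common Ethernet MTU of 1500 bytes, `tcp_payload_for_mtu(1500, 20, 20)` gives 1460 bytes for IPv4 and `tcp_payload_for_mtu(1500, 40, 20)` gives 1440 bytes for IPv6.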

Sliding Window

The sliding window is a broader concept in TCP that refers to how the sender and receiver manage and track data flow during communication. It determines how much data can be sent at any given time before needing an acknowledgment. The sliding window mechanism allows for efficient flow control by enabling the sender to transmit multiple packets before waiting for an acknowledgment, but within the limits of the window size.

Congestion Window

The congestion window (cwnd) is the part of the sliding window mechanism that deals with congestion control. It limits how much unacknowledged data the sender may have in flight based on the network's estimated capacity, and it is adjusted dynamically to prevent congestion. At any moment the sender is bound by the smaller of cwnd and the window advertised by the receiver.
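
How the two windows combine can be shown in a few lines. In this sketch, `rwnd` stands for the receiver's advertised window (the Window field of the TCP header), `bytes_in_flight` is the amount of data sent but not yet acknowledged, and the function name is hypothetical.

```c
#include <stdint.h>

// Hypothetical helper: how many more bytes the sender may transmit now.
// The sender is limited by the smaller of the congestion window and the
// receiver's advertised window, minus what is already in flight.
uint32_t usable_window(uint32_t cwnd, uint32_t rwnd, uint32_t bytes_in_flight) {
    uint32_t limit = cwnd < rwnd ? cwnd : rwnd;
    return bytes_in_flight >= limit ? 0 : limit - bytes_in_flight;
}
```

For instance, with cwnd = 10000, rwnd = 4000, and 1000 bytes in flight, the receiver is the bottleneck and 3000 more bytes may be sent.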

Congestion Control

Congestion control in TCP contains different phases that manage the growth and reduction of the congestion window (cwnd) to ensure efficient and stable data transmission. These phases help avoid network congestion, minimize packet loss, and optimize throughput. The primary phases are:

  1. Slow Start: Rapidly increases the congestion window (cwnd) exponentially until the slow-start threshold (ssthresh) is reached. (Quickly explores the available bandwidth.)

  2. Congestion Avoidance: Grows the cwnd linearly after the ssthresh is reached, allowing for more controlled data transmission. (Gradually increases the sending rate in a controlled manner to avoid congestion once the network's capacity is better understood.)

  3. Fast Retransmit/Fast Recovery: Quickly recovers from packet loss by retransmitting lost packets. After detecting packet loss (via three duplicate ACKs), the cwnd is halved (multiplicative decrease), and then it grows linearly during recovery. (Efficiently recover from packet loss.)

  4. Timeout Retransmission: When the retransmission timer expires before a packet is acknowledged, ssthresh is set to roughly half of the data in flight and the cwnd is reset to one segment to sharply reduce network traffic. TCP then enters slow start again until the ssthresh is reached. (Recovers from severe network congestion or loss when duplicate ACKs are not received.)
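
The phases above can be sketched as a small simulation that counts in whole segments and advances one round trip at a time. This is a simplified model, not a real TCP implementation: the names are hypothetical, slow start is capped at ssthresh, fast recovery is reduced to a single halving step, and on a timeout the sketch halves ssthresh and resets cwnd to one segment, following the behavior described in RFC 5681.

```c
#include <stdint.h>

// Hypothetical simplified congestion state, counted in segments.
struct cc_state {
    uint32_t cwnd;
    uint32_t ssthresh;
};

static uint32_t half_at_least_two(uint32_t v) {
    return v / 2 < 2 ? 2 : v / 2;
}

// One round trip in which every segment in the window was acknowledged.
void cc_on_rtt_acked(struct cc_state *s) {
    if (s->cwnd < s->ssthresh) {
        // slow start: double per round trip, capped at ssthresh
        s->cwnd *= 2;
        if (s->cwnd > s->ssthresh) s->cwnd = s->ssthresh;
    } else {
        // congestion avoidance: one extra segment per round trip
        s->cwnd += 1;
    }
}

// Loss detected through three duplicate ACKs (fast retransmit/recovery).
void cc_on_triple_dup_ack(struct cc_state *s) {
    s->ssthresh = half_at_least_two(s->cwnd);
    s->cwnd = s->ssthresh;  // multiplicative decrease, then linear growth
}

// Loss detected through a retransmission timeout.
void cc_on_timeout(struct cc_state *s) {
    s->ssthresh = half_at_least_two(s->cwnd);
    s->cwnd = 1;            // restart from slow start
}
```

Starting from cwnd = 1 and ssthresh = 8, successive loss-free round trips give 2, 4, 8, 9, 10. A triple duplicate ACK at that point drops both values to 5, while a timeout would instead drop cwnd to 1.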

The 4-way handshake

The 4-way handshake is the process TCP uses to gracefully terminate a connection between a client and a server. It can be initiated by the client, the server, or both at the same time.

The 4-way handshake initiated by client to terminate a connection

  1. Initiation: Either the client or server sends a FIN to signal that it has no more data to send.
  2. Acknowledgment: The receiving side sends an ACK to acknowledge the FIN.
  3. Second FIN: The side that hasn’t yet sent a FIN sends its own FIN once it finishes transmitting its remaining data.
  4. Final ACK: The receiving side sends a final ACK to confirm the second FIN, completing the connection termination.
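
From user space, the four steps can be observed through shutdown() and end-of-file on recv(). The sketch below is a single-process loopback demo under some assumptions: `loopback_pair` and `half_close_demo` are hypothetical helpers, the client closes its sending direction first, and the ACKs in steps 2 and 4 are sent by the kernel rather than by the program.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: connected client/server sockets over loopback.
static int loopback_pair(int *client_fd, int *server_fd) {
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof(addr);
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    *client_fd = *server_fd = -1;
    if (listen_fd == -1) return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        listen(listen_fd, 1) == 0 &&
        getsockname(listen_fd, (struct sockaddr *)&addr, &addr_len) == 0) {
        *client_fd = socket(AF_INET, SOCK_STREAM, 0);
        if (*client_fd != -1 &&
            connect(*client_fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            *server_fd = accept(listen_fd, NULL, NULL);
    }
    close(listen_fd);
    if (*server_fd == -1) {
        if (*client_fd != -1) close(*client_fd);
        *client_fd = -1;
        return -1;
    }
    return 0;
}

// Returns the number of bytes the server read before end-of-file, or -1.
// The server's reply is stored in reply as a NUL-terminated string.
ssize_t half_close_demo(char *reply, size_t reply_len) {
    int client_fd, server_fd;
    char buf[64];
    ssize_t n, seen = 0, got = 0;

    if (reply_len < 5 || loopback_pair(&client_fd, &server_fd) == -1) return -1;

    // Step 1: the client sends its last data, then its FIN
    if (send(client_fd, "ping", 4, 0) != 4) goto fail;
    if (shutdown(client_fd, SHUT_WR) == -1) goto fail;

    // Step 2: the server reads until recv() returns 0, which signals the FIN
    while ((n = recv(server_fd, buf, sizeof(buf), 0)) > 0) seen += n;
    if (n < 0) goto fail;

    // The server's sending direction is still open after the client's FIN
    if (send(server_fd, "pong", 4, 0) != 4) goto fail;

    // Step 3: closing the server socket sends the second FIN
    close(server_fd);
    server_fd = -1;

    // Step 4: the client reads the reply; its kernel ACKs the server's FIN
    while (got < 4 &&
           (n = recv(client_fd, reply + got, (size_t)(4 - got), 0)) > 0)
        got += n;
    if (n < 0) goto fail;
    reply[got] = '\0';
    close(client_fd);
    return seen;

fail:
    if (server_fd != -1) close(server_fd);
    close(client_fd);
    return -1;
}
```

The point of the demo is that the server can still send "pong" after it has seen the client's FIN: each direction of the connection is closed independently.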

If both sides initiate the termination, it is also handled gracefully (see State Diagram).

Termination initiated by both sides

Communicate without a TCP server

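Two TCP sockets can communicate without a central TCP server by connecting directly to each other. Neither side calls listen(); instead, each side binds to a known port and calls connect() toward the other at about the same time, which corresponds to the SYN SENT to SYN RCVD path in the state diagram (a simultaneous open). This approach is similar to peer-to-peer (P2P) communication, where devices (peers) exchange data directly with each other, without the need for a centralized server to manage the connection.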

TCP P2P