TCP/IP Knowledge Overview

This article uses plain prose to walk through the process, as a way to organize what I know and find the gaps. Please forgive any inaccurate descriptions.

Q: Describe in detail what happens from the moment you type an address into the browser and press Enter until the response comes back.

Premise

Complex programs need to be layered.

Overview

This article follows the OSI seven-layer model, focusing on what happens at OSI layer 2, the Data Link Layer (hereafter the MAC layer); layer 3, the Network Layer (hereafter the IP layer); layer 4, the Transport Layer; and the Application Layer. After the Application Layer, I discuss CDN principles and common architectures, then briefly cover data center architecture.

OSI Seven Layers vs TCP/IP Four Layers

The OSI seven-layer model is a reference model for open systems interconnection, while the TCP/IP protocol suite is a set of communication protocols that actually implements network interconnection. From bottom to top, the seven OSI layers are: Physical Layer, Data Link Layer, Network Layer, Transport Layer, Session Layer, Presentation Layer, Application Layer. Mnemonic, reading from the top (Application) down: All People Seem To Need Data Processing.

The four TCP/IP layers are: Network Interface (Link) Layer, Internet Layer, Transport Layer, Application Layer. The TCP/IP Network Interface Layer corresponds to the OSI Physical and Data Link Layers, while the TCP/IP Application Layer corresponds to the OSI Session, Presentation, and Application Layers.

Data Link Layer

Every network interface has a hardware address, commonly called a MAC address, which is intended to be globally unique. MAC addresses are used to find devices within a broadcast domain. Although MAC addresses are unique, they are flat and carry no information about where a device is located, so they cannot be used for routing; the IP protocol handles communication beyond the broadcast domain.

Layer 2 devices (switches) learn and cache the mapping between MAC addresses and switch ports, called the MAC address table, for devices within the same broadcast domain. The mapping between IP addresses and MAC addresses is cached by hosts and routers in the ARP table (ARP cache); this is different from the routing table, which maps destination networks to next hops. When the IP address is known but the MAC address is not, the device broadcasts an ARP request to query the MAC address.
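The two caches described above can be sketched as toy Python classes. This is a simplified model under stated assumptions, not how any real switch or OS implements them: `LearningSwitch` and `Host` are hypothetical names, MAC addresses are plain strings, and ARP is simulated by scanning a list of hosts instead of broadcasting a real frame.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Hypothetical layer-2 switch: learns MAC -> port, floods unknown destinations."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port (the MAC address table)

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out on."""
        self.mac_table[src_mac] = in_port  # learn (or refresh) where the sender lives
        if dst_mac != BROADCAST and dst_mac in self.mac_table:
            out_port = self.mac_table[dst_mac]
            return [] if out_port == in_port else [out_port]
        # Broadcast or not-yet-learned destination: flood to every other port.
        return [p for p in self.ports if p != in_port]


class Host:
    """Hypothetical host with an ARP cache (IP -> MAC)."""

    def __init__(self, ip, mac):
        self.ip, self.mac = ip, mac
        self.arp_cache = {}

    def resolve(self, target_ip, segment):
        """Return the MAC for target_ip, doing a simulated ARP lookup on a cache miss."""
        if target_ip in self.arp_cache:
            return self.arp_cache[target_ip]
        # Simulated ARP request "who has target_ip?", answered only by its owner.
        for host in segment:
            if host.ip == target_ip:
                self.arp_cache[target_ip] = host.mac  # cache the ARP reply
                return host.mac
        return None  # nobody answered
```

The first frame between two hosts is flooded; once the switch has seen both senders, it forwards only to the learned port.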

Internet/Network Layer

What’s entered in the browser is usually a domain name, which is resolved to an IP through DNS.

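The host first checks whether the destination IP is on its own subnet by comparing it with its own address and subnet mask (CIDR prefix). If it is, the host delivers the packet directly. If not, it sends the packet to its default gateway, and routers forward it hop by hop, using routing tables built by routing protocols, until it reaches the destination network and device. When an intermediate routing device (layer 3) receives a frame, it first checks whether the destination MAC address matches its own, then checks the destination IP address to decide whether the packet is for itself or needs forwarding.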
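The same-subnet decision can be sketched with Python's standard `ipaddress` module. The function name `next_hop` and the sample addresses are illustrative assumptions; a real host consults its full routing table rather than a single default gateway.

```python
import ipaddress


def next_hop(src_ip: str, prefix_len: int, dst_ip: str, gateway: str) -> str:
    """Return the IP the host should ARP for: the destination itself or the gateway."""
    local_net = ipaddress.ip_interface(f"{src_ip}/{prefix_len}").network
    if ipaddress.ip_address(dst_ip) in local_net:
        return dst_ip  # same subnet: deliver directly
    return gateway  # different subnet: hand the packet to the default gateway
```

For a host at 192.168.1.10/24, a packet to 192.168.1.20 goes directly, while a packet to 93.184.216.34 goes to the gateway.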

Routing Protocols

Routing is divided into dynamic and static routing. The routing protocols here refer to dynamic routing protocols.

Distance Vector Routing Protocol

Based on the Bellman-Ford algorithm: each router sends part or all of its routing table to its adjacent routers.

RIP:
1. Uses hop count as the metric.
2. The maximum hop count is 15; a metric of 16 means unreachable.
3. Periodically (every 30 seconds by default) sends its entire routing table to its neighbors.
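To make the Bellman-Ford idea and RIP's hop limit concrete, here is a minimal Python sketch. It assumes synchronous rounds, unit link costs, and no split horizon, timers, or route poisoning (real RIP has all of those); `distance_vector` and `INFINITY` are illustrative names.

```python
INFINITY = 16  # RIP: a metric of 16 means "unreachable", so 15 hops is the maximum


def distance_vector(nodes, links):
    """Run synchronous distance-vector rounds until no routing table changes.

    nodes: list of router names.
    links: {(a, b): cost} for undirected links (cost 1 gives RIP-style hop counts).
    Returns {router: {destination: metric}}.
    """
    neighbors = {n: {} for n in nodes}
    for (a, b), cost in links.items():
        neighbors[a][b] = cost
        neighbors[b][a] = cost
    # At the start each router only knows a route to itself.
    tables = {n: {d: (0 if d == n else INFINITY) for d in nodes} for n in nodes}
    changed = True
    while changed:
        changed = False
        # Every router advertises the table it held at the start of this round.
        advertised = {n: dict(t) for n, t in tables.items()}
        for n in nodes:
            for nb, cost in neighbors[n].items():
                for dest, metric in advertised[nb].items():
                    candidate = min(INFINITY, cost + metric)  # Bellman-Ford relaxation
                    if candidate < tables[n][dest]:
                        tables[n][dest] = candidate
                        changed = True
    return tables
```

On a chain of 17 routers, the two ends are 16 hops apart, so each considers the other unreachable: this is exactly the 15-hop limit.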

BGP: BGP is a path vector protocol and can be seen as an advanced distance vector routing protocol. With BGP, the network is divided into multiple autonomous systems. iBGP distributes routing information within an autonomous system, while eBGP exchanges routes between autonomous systems.

Autonomous System (AS): all IP networks and routers under the administration of one (or more) entities with a common routing policy. IANA allocates ASNs (Autonomous System Numbers) through the regional Internet registries, which lets BGP run between ISPs on the Internet.

ASN: originally a 16-bit number, now extended to 32 bits. A 32-bit ASN can be written as <high 16 bits in decimal>.<low 16 bits in decimal>.
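The dotted ASN notation is plain bit arithmetic. The sketch below follows the "asdot" form from RFC 5396 (values below 65536 stay plain decimal); the function names are illustrative.

```python
def to_asdot(asn: int) -> str:
    """Format a 32-bit ASN in asdot notation: <high 16 bits>.<low 16 bits>."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError("ASN must fit in 32 bits")
    if asn < 65536:
        return str(asn)  # 16-bit ASNs keep their plain decimal form
    return f"{asn >> 16}.{asn & 0xFFFF}"


def from_asdot(text: str) -> int:
    """Parse asdot (or asdot+) notation back to the 32-bit integer value."""
    if "." in text:
        high, low = (int(part) for part in text.split("."))
        if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
            raise ValueError("each half must fit in 16 bits")
        return (high << 16) | low
    return int(text)
```

For example, ASN 65546 is 1 × 65536 + 10, so it is written as "1.10".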

Autonomous System classification:
1. Multihomed AS: an AS connected to more than one other AS. It does not allow traffic from one AS to pass through it to reach another AS.
2. Stub AS: an AS connected to only one other AS.
3. Transit AS: an AS that provides connectivity between separate networks by carrying traffic between other ASes. This is essentially what an ISP is.

BGP usage conditions:
1. Routers that can store large routing tables.
2. Multiple connections.
3. Enough bandwidth to transmit the required data, including routing tables.

Link State Routing Protocol (Shortest Path First)

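Based on Dijkstra's algorithm: each router floods link state information to all routers in the same area, so every router builds the same link-state database and computes its own shortest paths from it.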
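The shortest-path calculation each router runs can be sketched with Dijkstra's algorithm over a link-state database. The dictionary layout (`{router: {neighbor: cost}}`) and the name `shortest_path_first` are assumptions for illustration; real OSPF builds this database from LSAs.

```python
import heapq


def shortest_path_first(lsdb, root):
    """Dijkstra over a link-state database.

    lsdb: {router: {neighbor: cost}}.
    Returns {router: (total cost from root, first hop from root)}.
    """
    dist = {root: 0}
    first_hop = {root: None}
    visited = set()
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry; a shorter path was already settled
        visited.add(node)
        for nb, link_cost in lsdb.get(node, {}).items():
            new_cost = cost + link_cost
            if nb not in dist or new_cost < dist[nb]:
                dist[nb] = new_cost
                # Leaving the root, the first hop is the neighbor itself;
                # further out, it is inherited from the path to `node`.
                first_hop[nb] = nb if node == root else first_hop[node]
                heapq.heappush(heap, (new_cost, nb))
    return {n: (dist[n], first_hop[n]) for n in visited}
```

The first hop is what ends up in the routing table: the router only needs to know which neighbor to forward to.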

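OSPF:
1. Sends link state updates via multicast, and sends triggered updates when link state changes instead of periodically sending the whole routing table, which uses bandwidth more efficiently.
2. Has no maximum hop count limit; uses cost as the metric, which by default is usually derived from interface bandwidth.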
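The cost metric can be illustrated with the formula many implementations use by default: interface cost = reference bandwidth / interface bandwidth, with a floor of 1, and a route's cost is the sum of the outgoing interface costs along the path. The 100 Mbps reference value below is an assumption (a common vendor default that is often reconfigured), and `ospf_cost` / `path_cost` are hypothetical helper names.

```python
DEFAULT_REFERENCE_BPS = 100_000_000  # assumption: common vendor default of 100 Mbps


def ospf_cost(interface_bps: int, reference_bps: int = DEFAULT_REFERENCE_BPS) -> int:
    """Interface cost = reference bandwidth / interface bandwidth, never below 1."""
    return max(1, reference_bps // interface_bps)


def path_cost(link_bandwidths, reference_bps: int = DEFAULT_REFERENCE_BPS) -> int:
    """A route's cost is the sum of the outgoing interface costs along the path."""
    return sum(ospf_cost(bw, reference_bps) for bw in link_bandwidths)
```

With the 100 Mbps reference, 100 Mbps and 1 Gbps links both get cost 1, which is why the reference bandwidth is often raised on faster networks.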

Differences and Relationship Between IGP and iBGP

IGPs, such as OSPF and RIP, run within an autonomous system and are mainly responsible for route discovery and calculation. iBGP also runs within an autonomous system. The differences are:
1. iBGP leaves route discovery entirely to the IGP and focuses on route control itself.
2. IGPs handle very large routing tables poorly, while iBGP can carry them and organize them hierarchically.
3. If BGP routing information were handed directly to an IGP, the route attributes would be lost, creating a risk of routing loops. iBGP carries these route attributes intact, which is part of the route control described in point 1.

Gateway

Type one: routing (forwarding) gateway. At each hop, the source MAC address is rewritten to the outgoing interface's MAC and the destination MAC to the next hop device's MAC, while the source and destination IP addresses stay unchanged until the packet reaches the destination device. This works when IP addresses do not conflict; when IP address ranges overlap, a NAT gateway is needed.

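Type two: NAT gateway. When the request leaves the gateway, both the source IP and the source MAC become the gateway's own, while the destination IP stays unchanged, and the packet continues to the next hop. The gateway records this mapping, so when the response arrives it can translate the destination IP and MAC back to those of the original internal host.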
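The NAT mapping can be sketched as a small translation table. This toy assumes the common NAPT variant, where the gateway also rewrites the source port so many private hosts can share one public IP, and it ignores MAC rewriting (which happens at every hop anyway). `NatGateway`, the port range, and all addresses are hypothetical.

```python
import itertools


class NatGateway:
    """Toy source NAT (NAPT) keyed on (private IP, private port)."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # next free public port
        self.outbound = {}  # (private_ip, private_port) -> public_port
        self.inbound = {}   # public_port -> (private_ip, private_port)

    def send(self, src_ip, src_port, dst_ip, dst_port):
        """Translate an outgoing packet; the destination is left untouched."""
        key = (src_ip, src_port)
        if key not in self.outbound:
            port = next(self._ports)
            self.outbound[key] = port
            self.inbound[port] = key
        return (self.public_ip, self.outbound[key], dst_ip, dst_port)

    def receive(self, src_ip, src_port, dst_ip, dst_port):
        """Map a reply back to the internal host (KeyError means no mapping: drop it)."""
        private_ip, private_port = self.inbound[dst_port]
        return (src_ip, src_port, private_ip, private_port)
```

A reply addressed to the gateway's public IP and port 40000 is rewritten back to the internal host that opened that mapping.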

DNS

DNS is a distributed database that translates domain names into IP addresses. Before sending a query, the client checks its local caches (browser cache, OS cache, hosts file). If there is no answer, it queries the local DNS server (the resolver configured by the ISP or on the router). If the local DNS server has no cached record, it queries a root name server, which returns the addresses of the top-level domain (TLD) servers; a TLD server in turn returns the address of the authoritative name server, which finally returns the IP address. The client's query to the local DNS server is recursive, while the local DNS server's queries up the hierarchy are iterative.
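The iterative walk done by the local DNS server can be simulated with a hard-coded hierarchy. Everything in `ZONES` is made up for illustration (192.0.2.0/24 is a documentation range), and real resolvers also cache referrals, handle TTLs, and query over the network.

```python
# Hypothetical zone data: server name -> {name suffix: (kind, value)}.
ZONES = {
    "root": {"com.": ("referral", "com-tld-server")},
    "com-tld-server": {"example.com.": ("referral", "example-authoritative")},
    "example-authoritative": {"www.example.com.": ("answer", "192.0.2.10")},
}


def query(server, name):
    """One non-recursive query: reply with the entry for the longest matching suffix."""
    zone = ZONES[server]
    matches = [s for s in zone if name == s or name.endswith("." + s)]
    if not matches:
        return ("nxdomain", None)
    return zone[max(matches, key=len)]


def resolve(name, cache):
    """Serve a client's recursive query by iterating root -> TLD -> authoritative."""
    if name in cache:
        return cache[name], ["cache"]
    server, asked = "root", []
    while True:
        asked.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            cache[name] = value  # later lookups are answered locally
            return value, asked
        if kind == "nxdomain":
            raise LookupError(f"{name} does not exist")
        server = value  # referral: ask the next server down the hierarchy
```

The first lookup touches three servers; a repeated lookup is answered from the cache.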

Transport Layer

The Protocol field in the IPv4 header (Next Header in IPv6) identifies the transport layer protocol; the most common are TCP (protocol number 6) and UDP (protocol number 17).
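Reading that field is a one-byte lookup: in IPv4 the Protocol field sits at byte offset 9 of the header. The sketch below builds a sample header with `struct` and reads it back; the helper names and addresses are illustrative, and the header checksum is left at 0 for brevity.

```python
import socket
import struct

PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}  # IANA-assigned protocol numbers


def transport_protocol(packet: bytes) -> str:
    """Return the protocol named by an IPv4 header's Protocol field (byte 9)."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    number = packet[9]
    return PROTOCOL_NAMES.get(number, f"other ({number})")


def sample_ipv4_header(protocol: int) -> bytes:
    """Build a minimal 20-byte IPv4 header (checksum not computed)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,          # version 4, header length 5 x 32-bit words
        0,             # DSCP/ECN
        20,            # total length (header only)
        0,             # identification
        0,             # flags / fragment offset
        64,            # TTL
        protocol,      # Protocol field: 6 = TCP, 17 = UDP
        0,             # header checksum (left at 0 here)
        socket.inet_aton("192.0.2.1"),
        socket.inet_aton("192.0.2.2"),
    )
```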

UDP

UDP is a connectionless protocol. It inherits most of IP's characteristics: datagram-based, stateless, unordered, and without congestion control. Simply put, it is a stateless transport protocol. A UDP header contains only the source port, destination port, length, and checksum; the two port numbers identify which applications a datagram belongs to. UDP suits simple environments, intranets, and scenarios where packet loss is acceptable. The application layer can also maintain its own state on top of UDP to make the transfer reliable.
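The whole UDP header is just four 16-bit fields, which `struct` can pack and unpack directly. This sketch leaves the checksum at 0 ("not computed"), which IPv4 permits but IPv6 does not; the function names are illustrative.

```python
import struct


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Pack the 8-byte UDP header (source port, destination port, length, checksum)."""
    length = 8 + len(payload)  # the length field covers header + payload
    return struct.pack("!HHHH", src_port, dst_port, length, 0) + payload


def parse_udp_datagram(datagram: bytes):
    """Return (source port, destination port, payload)."""
    src_port, dst_port, length, _checksum = struct.unpack("!HHHH", datagram[:8])
    return src_port, dst_port, datagram[8:length]
```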

TCP

TCP is a connection-oriented protocol. Here, a connection means state kept at both ends that moves through a series of transitions. By maintaining a complex state machine, TCP makes delivery ordered and reliable and adds congestion control, among other features. Because the underlying IP protocol is connectionless and unordered, TCP relies heavily on sequence numbers, acknowledgments, retransmission, and congestion control algorithms to provide these features.

Three-way handshake: the requester sends a SYN to establish the connection (SYN_SENT). The receiver, which is listening (LISTEN), replies with SYN + ACK (SYN_RCVD). When the requester receives the SYN + ACK, it enters ESTABLISHED and acknowledges the peer's SYN with an ACK. When the peer receives that ACK, it also enters ESTABLISHED. At this point each side has completed one send-and-receive cycle, both are in ESTABLISHED, and the connection is established.
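The handshake can be modeled as a small transition table where every segment one side emits becomes the other side's next event. This is a toy model assuming no loss, no simultaneous open, and no timers; `HANDSHAKE` and `simulate_handshake` are illustrative names. In a real exchange the SYN carries seq=x, the SYN+ACK carries seq=y and ack=x+1, and the final ACK carries ack=y+1.

```python
# (current state, event) -> (next state, segment sent to the peer, or None)
HANDSHAKE = {
    ("CLOSED", "connect"): ("SYN_SENT", "SYN"),             # client: active open, seq=x
    ("LISTEN", "recv SYN"): ("SYN_RCVD", "SYN+ACK"),        # server: seq=y, ack=x+1
    ("SYN_SENT", "recv SYN+ACK"): ("ESTABLISHED", "ACK"),   # client: ack=y+1
    ("SYN_RCVD", "recv ACK"): ("ESTABLISHED", None),        # server: nothing more to send
}


def simulate_handshake():
    states = {"client": "CLOSED", "server": "LISTEN"}
    peer = {"client": "server", "server": "client"}
    actor, event = "client", "connect"
    trace = []
    while True:
        states[actor], segment = HANDSHAKE[(states[actor], event)]
        trace.append((actor, segment, states[actor]))
        if segment is None:
            return states, trace
        # Deliver the segment to the other side as its next event.
        actor, event = peer[actor], "recv " + segment
```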

Four-way handshake (termination): the requester sends a FIN (seq=k) to terminate the connection and enters FIN_WAIT_1. The receiver replies with an ACK (ack=k+1) and enters CLOSE_WAIT; when the initiator receives this ACK it enters FIN_WAIT_2. When the receiver's upper-layer logic finishes, it sends its own FIN + ACK (ack=k+1) and enters LAST_ACK. The initiator replies with an ACK, enters TIME_WAIT, waits two MSL (Maximum Segment Lifetime) periods, and then closes the connection. The receiver closes as soon as it receives that ACK. TIME_WAIT exists in case the last ACK is lost: the receiver then retransmits its FIN + ACK, and the initiator must still be around to acknowledge it. Waiting 2MSL also lets delayed segments from the old connection expire before the same address and port pair is reused.
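The sequence and acknowledgment numbers in the teardown follow from one rule: a FIN consumes one sequence number. This toy trace assumes a normal close with no data in flight; `k` is the active closer's FIN sequence number and `w` (an added label) is the passive closer's. The 2MSL timer is not simulated, only noted.

```python
def four_way_teardown(k: int, w: int):
    """Return final states and a trace of (direction, flags, fields, active, passive)."""
    active, passive = "ESTABLISHED", "ESTABLISHED"
    trace = []

    active = "FIN_WAIT_1"  # active side's application calls close()
    trace.append(("active->passive", "FIN", {"seq": k}, active, passive))

    passive = "CLOSE_WAIT"  # FIN acknowledged; passive side may still send data
    trace.append(("passive->active", "ACK", {"ack": k + 1}, active, passive))
    active = "FIN_WAIT_2"  # active side's FIN is acknowledged

    passive = "LAST_ACK"  # passive side's application finishes and calls close()
    trace.append(("passive->active", "FIN+ACK", {"seq": w, "ack": k + 1}, active, passive))

    active = "TIME_WAIT"  # stays 2*MSL so a lost final ACK can be re-sent
    trace.append(("active->passive", "ACK", {"seq": k + 1, "ack": w + 1}, active, passive))
    passive = "CLOSED"  # final ACK received

    active = "CLOSED"  # after the 2*MSL timer expires
    return active, passive, trace
```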

Application Layer

HTTP

Web pages are generally transferred using HTTP, a protocol built on top of TCP. In HTTP/1.x, both request and response messages are plaintext. A request message consists of a request line, request headers, and a request body; the request line is made up of the request method, the request URL, and the HTTP version. A response message consists of a status line, response headers, and a response body; the status line is made up of the HTTP version, the status code, and the reason phrase.
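Because HTTP/1.1 messages are plain text, their structure can be shown by assembling one by hand. This sketch builds a request (request line, headers, blank line, body, separated by CRLF) and splits a status line into its three parts; `build_request` and `parse_status_line` are illustrative names, and real clients handle many more header rules.

```python
def build_request(method, host, path, body=b"", headers=None):
    """Assemble an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"  # an empty line ends the headers
    return head.encode("ascii") + body


def parse_status_line(response: bytes):
    """Split e.g. 'HTTP/1.1 404 Not Found' into (version, status code, reason phrase)."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason
```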

Common request methods: GET/POST/PUT/DELETE/OPTIONS
Common response codes: 200 OK/201 Created/301 Moved Permanently/302 Found/403 Forbidden/404 Not Found/405 Method Not Allowed/500 Internal Server Error/502 Bad Gateway/503 Service Unavailable/504 Gateway Timeout

Keepalive

TCP has a keepalive concept, and so does HTTP; HTTP keep-alive (persistent connections) is enabled by default since HTTP/1.1. HTTP keep-alive lets a client send multiple requests over the same TCP connection, while TCP keepalive sends periodic probes (heartbeats) on an idle connection to check that the peer is still alive. The two are not directly related.
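TCP keepalive is switched on per socket. This sketch uses the portable `SO_KEEPALIVE` option; the `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` tuning options are platform-specific (available on Linux), so they are set only when this Python build exposes them. The timing values are arbitrary examples, not recommendations.

```python
import socket


def enable_tcp_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on keepalive probes: start after `idle` s of silence, probe every
    `interval` s, and give up after `count` unanswered probes (where supported)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", interval), ("TCP_KEEPCNT", count))
    for option_name, value in tuning:
        if hasattr(socket, option_name):  # platform-specific options
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, option_name), value)
```

This affects only TCP-level liveness checks; whether HTTP requests reuse the connection is decided separately by the HTTP client and server.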

HTTPS

Since HTTP request and response messages are plaintext, HTTP is not suitable for security-sensitive scenarios. This is why HTTPS was introduced: it runs HTTP over SSL/TLS, using asymmetric encryption to exchange keys and then switching to symmetric encryption once the key exchange is complete. The handshake (as described here, TLS 1.2 with RSA key exchange) is:

The client sends a Client Hello, mainly to negotiate the protocol version, cipher suite, and compression method, and to send a client random c1 and the SNI (Server Name Indication) information.
The server replies with a Server Hello, telling the client the chosen protocol version, cipher suite, and compression method, plus a server random s1. Both sides keep c1 and s1 for later use.
The server also sends its certificate for the client to verify.
Finally, the server sends Server Hello Done to indicate that its hello messages are complete.
The client verifies that the certificate is legitimate (it hashes the certificate contents and compares the result with the CA signature decrypted using the CA's public key), then generates a random pre-master secret.
The client sends the pre-master secret to the server, encrypted with the server's public key from the certificate (Client Key Exchange).
The client sends Change Cipher Spec, announcing that from now on it switches from asymmetric to symmetric encryption.
The client sends an encrypted handshake message (Finished), encrypted with the session key derived from c1 + s1 + pre-master secret and containing a hash of all previous handshake messages.
The server also sends Change Cipher Spec.
The server likewise sends its encrypted handshake message (Finished). At this point the TLS handshake is complete on both sides, and encrypted data transfer begins.
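How c1, s1, and the pre-master secret become shared keys can be sketched with the TLS 1.2 PRF from RFC 5246, assuming a cipher suite whose PRF hash is SHA-256 (some suites use SHA-384). Both sides run the same computation, so they derive the same 48-byte master secret without ever sending it; the Finished message's 12-byte verify_data is computed the same way over a hash of the handshake messages. This covers only key derivation, not record encryption, and the function names are illustrative.

```python
import hashlib
import hmac


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_SHA256 from the TLS 1.2 PRF: A(0) = seed, A(i) = HMAC(secret, A(i-1)),
    output = HMAC(secret, A(1) + seed) + HMAC(secret, A(2) + seed) + ..."""
    out = b""
    a = seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def master_secret(pre_master: bytes, client_random: bytes, server_random: bytes) -> bytes:
    """master_secret = PRF(pre_master_secret, "master secret", c1 + s1)[0..47]."""
    return p_sha256(pre_master, b"master secret" + client_random + server_random, 48)


def verify_data(master: bytes, label: bytes, handshake_messages: bytes) -> bytes:
    """Finished payload: PRF(master_secret, label, SHA-256(handshake messages))[0..11].
    label is b"client finished" or b"server finished"."""
    digest = hashlib.sha256(handshake_messages).digest()
    return p_sha256(master, label + digest, 12)
```

In RSA key exchange the pre-master secret is 48 bytes, and each random is 32 bytes; a different c1 or s1 yields a completely different master secret, which is what makes every session's keys unique.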