tags:
  - OS

6.5 Inter-Process Communications

Lesson 1 IPC

1.1 IPC Motivation

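Humans need to communicate and exchange ideas; communication and cooperation built the world we live in. In primitive villages, people could only cooperate face to face, so communication was limited to fellow villagers. The invention of the telephone broke the barrier of physical distance, made real-time communication across regions possible, and greatly expanded the reach of cooperation.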

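The same holds for processes. Before networks existed, inter-process communication was confined to a single host. With the build-out of network infrastructure we have entered the era of fiber to the home (FTTH), and IPC is no longer limited to one machine: over a network, processes in different geographic locations can communicate with one another, coordinate, and together complete complex tasks.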

1.2 IPC Types

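There are two classes of IPC: communication between processes on the same host, and communication between processes on different hosts. The core idea of both is simple: a sender process sends information, and a receiver process receives it.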

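Whichever kind of IPC you use, you must deal with synchronization and mutual exclusion while sending and receiving, and decide whether the exchange is synchronous or asynchronous. We do not cover these topics in depth here; they are discussed in the later chapter on synchronization and mutual exclusion.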

1.2.1 IPC on the Same Machine

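For IPC on the same host, the core idea is to create a shared storage area in kernel space or user space: one process W writes into it, and another process R reads from it. This model is essentially an instance of the producer-consumer problem and needs synchronization to handle an empty buffer, a full buffer, and race conditions. The shared storage can be implemented as a kernel buffer (pipe) or as a region of user-space memory (shared memory, shm).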

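Another IPC mechanism keeps the data in a file on disk: memory-mapped files. Multiple processes map the same file into their address spaces and communicate through that shared mapping, and the data is persisted to disk.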

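Signals are another form of IPC. Unlike the mechanisms above, a signal carries essentially no message payload beyond the signal number itself, which is why it is called a lightweight IPC mechanism. Signals are mainly used to notify a process that some event has occurred, such as termination, an interrupt, or a user-defined event.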

1.2.2 IPC on Different Machines

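If the two processes are not on the same host, we need the network: the process uses the Socket API provided by the operating system to send messages to a specific host. To an application, a socket can be thought of as a "postal company" that delivers data: we hand the data to the "postal company" (the Socket API), and it takes care of everything else.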

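With sockets, processes on different hosts can establish connections and exchange data. This approach is widely used in network applications, distributed systems, and the client-server model.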

1.3 Send-Receive Protocol

1.3.1 Formats

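IPC always takes place in some format, just as human communication does. If the speaker (sender) uses a language (format) that the listener cannot understand (there is no corresponding parser function), the exchange is meaningless. In IPC, although all information travels through the computer as 0s and 1s, communication works as long as a format is specified and both sides agree on it (use the same format).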

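Common formats include JSON and XML. Following the conventions a format defines, the sender packs the information (serialization) and the receiver parses it (deserialization). You can also use existing libraries for these formats.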

1.3.2 Order

Once information is being sent and received, the order of sending and receiving needs some constraints.

1.3.2.1 Synchronous

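Synchronous communication requires the sender and receiver to coordinate at a specific point in time. For example, the sender must wait until the receiver is ready before sending data. The advantage is reliable delivery; the cost is potentially longer waiting times.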

1.3.2.2 Asynchronous

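Asynchronous communication lets the sender and receiver work independently without coordinating. After sending, the sender can continue with other tasks, and the receiver processes the data when it is ready. This improves concurrency and efficiency, but extra mechanisms are needed to ensure data consistency and integrity.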

Lesson 2 Sockets for Network Communication

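Other options exist for communication between different hosts, but we mainly use the Socket API. Although sockets let processes exchange data over a network, they are usually not counted as part of IPC in the narrow sense.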

2.1 Sockets

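The name of the Socket API was inspired by the power socket 🔌. When a device is plugged into a power socket, the socket lets it connect and draw current; when the device is unplugged, the exchange of current ends. In the same way, the Socket API lets different processes exchange data, and it hides the details of the underlying transport layer from the application.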

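As we know, the transport layer offers two communication paradigms: datagrams and connection-oriented streams. TCP is a connection-oriented transport protocol that provides reliable transfer to the applications above it: it guarantees that data arrives complete and in order. Common application-layer protocols that run over TCP include FTP, SMTP, and HTTP.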

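UDP is a connectionless protocol that provides unreliable transfer. It sets up no connection and guarantees neither delivery nor ordering; in exchange, its overhead is lower and it is faster, so UDP is simpler and more lightweight than TCP. UDP is common in voice/video calls and games.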

2.2 Sockets as Files

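In Linux and other Unix-like systems, a socket is also treated as a file, so you can reuse some of the standard file operations (read, write) on a socket. Sockets do, however, add some semantics of their own. This abstraction simplifies the programming interface. You create a socket with the socket system call.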

The prototype of socket() is:

#include <sys/socket.h>

int socket(int domain, int type, int protocol);
/*
Parameters:
	1. domain: address format (protocol family)
		- AF_INET: IPv4
		- AF_INET6: IPv6
		- AF_UNIX or AF_LOCAL: Unix domain sockets
		- AF_PACKET: Low-level packet interface
		- AF_NETLINK: Kernel/user-space communication
		- et cetera...
	2. type: what kind of data transfer
		- SOCK_DGRAM: Datagram socket (e.g. UDP)
		- SOCK_STREAM: Stream socket (e.g. TCP)
		- SOCK_RAW: Raw socket (direct access to the network layer, e.g. with IPPROTO_RAW or IPPROTO_ICMP)
		- SOCK_SEQPACKET: Sequenced, reliable, connection-based datagrams
	3. protocol: how data is transported; 0 lets the kernel pick the default for domain and type
		- IPPROTO_TCP: Used with SOCK_STREAM for TCP
		- IPPROTO_UDP: Used with SOCK_DGRAM for UDP
		- IPPROTO_ICMP: Used with SOCK_RAW for sending/receiving ICMP packets
		- IPPROTO_SCTP: Used with SOCK_STREAM or SOCK_SEQPACKET for SCTP
Return value:
	- A new socket file descriptor on success.
	- -1 on failure, with errno set.
*/
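Because a socket descriptor behaves like a file descriptor, plain read() and write() work on it. The sketch below shows this under one assumption made to keep it self-contained: instead of a network connection it uses socketpair(), which returns two already connected AF_UNIX sockets inside one process. The helper name socket_as_file_demo is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write msg into one end of a connected socket pair with write(), read it
 * back from the other end with read(), and store it null-terminated in out.
 * Returns the number of bytes read, or -1 on error. */
int socket_as_file_demo(const char *msg, char *out, size_t outlen) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    ssize_t w = write(sv[0], msg, strlen(msg));   /* same call as for a file */
    if (w == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    ssize_t r = read(sv[1], out, outlen - 1);      /* same call as for a file */
    close(sv[0]);
    close(sv[1]);
    if (r == -1)
        return -1;
    out[r] = '\0';
    return (int)r;
}
```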

2.2.1 socket() Parameters

The socket() system call takes three parameters: domain, type, and protocol.

2.2.1.1 `domain`

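This parameter defines the address format and the scope of communication, and thereby selects the protocol family underlying the socket. Common values are AF_INET, AF_INET6, and AF_UNIX/AF_LOCAL; the prefix AF_ stands for Address Family. AF_INET and AF_INET6 are used for network communication, over IPv4 and IPv6 respectively. Their address structures differ as well: struct sockaddr_in and struct sockaddr_in6.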

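AF_UNIX (or AF_LOCAL) is used for IPC on the local machine. Because no network communication is involved, it avoids the overhead of the network protocol stack encapsulating and decapsulating packets.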

2.2.1.2 `type`

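This parameter defines the socket type, which selects how data is transmitted and what kind of service is provided. We focus on SOCK_STREAM and SOCK_DGRAM. SOCK_STREAM makes the socket a reliable, ordered, bidirectional, connection-oriented byte stream (TCP); SOCK_DGRAM makes it an unreliable, connectionless datagram service (UDP).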

2.2.1.3 `protocol`

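This parameter explicitly specifies which transport-layer protocol to use. We focus on IPPROTO_TCP and IPPROTO_UDP. If you pass 0, the kernel infers the default protocol from the first two parameters.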

2.2.1.4 Illegal Combination

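If the combination is illegal or incompatible, socket() returns -1 and sets errno. On Linux the typical values are EPROTONOSUPPORT when the protocol does not fit the domain and type (for example SOCK_STREAM with IPPROTO_UDP), EAFNOSUPPORT for an unsupported address family, and EINVAL for an invalid type value.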
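To observe the failure path, the hypothetical helper below calls socket() with a given triple and returns errno (0 on success). Pairing SOCK_STREAM with IPPROTO_UDP is an incompatible combination; on Linux it is expected to fail with EPROTONOSUPPORT, but since the exact errno is platform dependent, the sketch only reports it.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to create a socket; return 0 if it worked, otherwise the errno value. */
int try_socket(int domain, int type, int protocol) {
    int fd = socket(domain, type, protocol);
    if (fd == -1) {
        int err = errno;   /* save errno before any other call can change it */
        fprintf(stderr, "socket(%d, %d, %d) failed: %s\n",
                domain, type, protocol, strerror(err));
        return err;
    }
    close(fd);
    return 0;
}
```

For example, try_socket(AF_INET, SOCK_STREAM, 0) should return 0, while try_socket(AF_INET, SOCK_STREAM, IPPROTO_UDP) returns a nonzero errno.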

2.2.2 Creating a Socket

socket() has several parameters, and by choosing different values we can create different kinds of sockets. Below we create a TCP socket and a UDP socket over IPv4.

2.2.2.1 TCP Sockets in IPv4

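#include <stdio.h>      // perror
#include <unistd.h>     // close
#include <sys/socket.h>
#include <netinet/in.h>

int main() {
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) {
        perror("TCP socket creation failed");
        return 1;
    }
    close(sockfd);      // release the descriptor when done
    return 0;
}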

2.2.2.2 UDP Sockets in IPv4

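#include <stdio.h>      // perror
#include <unistd.h>     // close
#include <sys/socket.h>
#include <netinet/in.h>

int main() {
    int sockfd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sockfd < 0) {
        perror("UDP socket creation failed");
        return 1;
    }
    close(sockfd);      // release the descriptor when done
    return 0;
}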

2.2.3 `setsockopt()`

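Once a socket has been created, we can configure and tune it through various options with setsockopt. This function lets you control and modify socket behavior such as timeouts, buffer sizes, address reuse, and enabling or disabling specific protocol features. Later on, setsockopt() is essential for detecting whether the peer is still sending.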

int setsockopt(int sockfd, int level, int optname, const void *optval, socklen_t optlen);
/* 
Parameters:
	1. sockfd: The file descriptor of the socket on which to set the option.
	2. level: The level at which the option is defined (e.g., SOL_SOCKET for socket-level options, IPPROTO_TCP for TCP options).
	3. optname: The option name to set.
	   - Under SOL_SOCKET level:
	     1. SO_RCVTIMEO: set receive timeout time.
	     2. SO_SNDTIMEO: set send timeout time.
	     3. SO_REUSEADDR: allow local address reuse.
	     4. SO_KEEPALIVE: keep connections alive by enabling periodic transmissions.
	     5. SO_RCVBUF: set the receive buffer size.
	     6. SO_SNDBUF: set the send buffer size.
	     7. SO_LINGER: linger on close if data is present.
	     8. SO_BROADCAST: permit sending of broadcast messages.
	     9. SO_ERROR: retrieve and clear the socket error status.
	     10. SO_OOBINLINE: leave received out-of-band data in the input stream.
	   - Under IPPROTO_TCP level:
	     1. TCP_NODELAY: disable Nagle's algorithm.
	     2. TCP_KEEPIDLE: set the idle time before keep-alive probes are sent.
	     3. TCP_KEEPINTVL: set the interval between keep-alive probes.
	     4. TCP_KEEPCNT: set the number of keep-alive probes to be sent.
	   - Under IPPROTO_IP level:
	     1. IP_TTL: set the IP time-to-live value.
	     2. IP_MULTICAST_TTL: set the multicast time-to-live value.
	     3. IP_MULTICAST_LOOP: control the loopback of multicast packets.
	     4. IP_ADD_MEMBERSHIP: join a multicast group.
	     5. IP_DROP_MEMBERSHIP: leave a multicast group.
	   - Under IPPROTO_IPV6 level:
	     1. IPV6_V6ONLY: restrict the socket to IPv6 communications only.
	     2. IPV6_MULTICAST_HOPS: set the multicast hop limit.
	     3. IPV6_MULTICAST_LOOP: control the loopback of multicast packets.
	     4. IPV6_JOIN_GROUP: join an IPv6 multicast group.
	     5. IPV6_LEAVE_GROUP: leave an IPv6 multicast group.
	4. optval: A pointer to the buffer containing the value for the option. This buffer contains the value to be set for the specified option.
	5. optlen: The size, in bytes, of the buffer pointed to by optval.

Return value:
	- On success: Returns 0.
	- On failure: Returns -1, and errno is set to indicate the error. You can use the perror function or strerror to print the error message.
*/
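A short sketch of the set/get pattern, assuming Linux or another POSIX system: it enables SO_REUSEADDR with setsockopt() and reads the option back with getsockopt(). Note that getsockopt() takes the length as a value-result pointer. The function name reuseaddr_roundtrip is an illustrative assumption.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable SO_REUSEADDR on a new TCP socket and read it back.
 * Returns the value reported by getsockopt() (nonzero = enabled), or -1. */
int reuseaddr_roundtrip(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) {
        close(fd);
        return -1;
    }

    int value = 0;
    socklen_t len = sizeof(value);   /* in: buffer size, out: bytes written */
    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) == -1) {
        close(fd);
        return -1;
    }
    close(fd);
    return value;
}
```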

2.2.4 Endianness and Network Byte Order

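We introduced big-endian and little-endian byte order in Endianness. To avoid the problems that arise when two computers use different byte orders, big-endian was adopted as network byte order (for historical reasons). This not only unifies how computers on a network communicate, it also makes it easier for engineers to read the fields of captured packets.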

2.2.4.1 Host and Network Byte Order Conversion

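The arpa/inet.h header provides library functions for converting byte order. Use them whenever you put multi-byte values on the wire, whatever your system's byte order: on big-endian hosts they do nothing, and on little-endian hosts they swap the bytes, so the same code stays correct everywhere: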

#include <arpa/inet.h>

// Host TO Network Long/Short
uint32_t htonl(uint32_t hostlong); 
uint16_t htons(uint16_t hostshort); 

// Network TO Host Long/Short
uint32_t ntohl(uint32_t netlong); 
uint16_t ntohs(uint16_t netshort);

2.2.4.2 Decimal Presentation and Network Byte Order Address Conversion

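In addition, arpa/inet.h provides functions that convert between textual addresses (dotted decimal for IPv4, colon-separated hexadecimal for IPv6) and binary addresses in network byte order. These conversions are very useful when setting or receiving network addresses.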

#include <arpa/inet.h>
// ASCII TO Network (IPv4 only)
int inet_aton(const char *cp, struct in_addr *inp);
/* 
Parameters:
	1. cp: IPv4 address in dotted-decimal form (as a string, e.g., "192.168.1.1")
	2. inp: Pointer to a struct in_addr where the function will store the address in network byte order

Return value: Returns nonzero if the address is valid, 0 if it is not.
*/

// Internet Presentation TO Network
int inet_pton(int af, const char *src, void *dst);
/* 
Parameters:
	1. af: Address family (AF_INET for IPv4, AF_INET6 for IPv6)
	2. src: IP address as text (e.g., "192.168.1.1" for IPv4 or "2001:db8::1" for IPv6)
	3. dst: Pointer to a buffer where the function will store the network address (usually a struct in_addr or struct in6_addr) 

Return value: Returns 1 on success, 0 if src is not a valid address for af, and -1 (errno = EAFNOSUPPORT) if af is not a supported address family.
*/

// Internet Network TO Presentation
const char *inet_ntop(int af, const void *src, char *dst, socklen_t size);
/* 
Parameters:
	1. af: Address family (AF_INET for IPv4, AF_INET6 for IPv6)
	2. src: Pointer to the network address structure (e.g., struct in_addr or struct in6_addr)
	3. dst: Pointer to a buffer where the function will store the IP address in text form (as a string)
	4. size: Size of the destination buffer (INET_ADDRSTRLEN for IPv4 or INET6_ADDRSTRLEN for IPv6 is enough)

Return value: 
	- Returns a pointer to the destination buffer `dst` on success. 
	- NULL on failure.
*/
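A round-trip sketch using the two presentation/network functions: the text is parsed with inet_pton() into a buffer large enough for either family, then formatted back with inet_ntop(). The helper name addr_roundtrip is hypothetical. For IPv6, inet_ntop() prints the canonical compressed form, so "2001:db8:0:0:0:0:0:1" comes back as "2001:db8::1".

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Parse a textual IPv4/IPv6 address into binary (network byte order) and
 * format it back into text in out.
 * Returns 1 on success, 0 for an invalid address string, -1 on error. */
int addr_roundtrip(int af, const char *text, char *out, socklen_t outlen) {
    unsigned char buf[sizeof(struct in6_addr)];  /* large enough for both families */
    memset(buf, 0, sizeof(buf));

    int rc = inet_pton(af, text, buf);
    if (rc != 1)
        return rc;                   /* 0: not a valid address, -1: bad family */

    if (inet_ntop(af, buf, out, outlen) == NULL)
        return -1;                   /* e.g. out is too small */
    return 1;
}
```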

2.2.5 Socket Addresses

2.2.5.1 IPv4 Address

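Just as you need the recipient's address to send a parcel, when we send packets over the network we need to build an address structure in an agreed-upon format. IPv4 addresses are represented by the sockaddr_in structure: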

#include <netinet/in.h>
#include <arpa/inet.h>

struct in_addr {
    uint32_t s_addr;         // 32-bit IPv4 address, network byte order
};

struct sockaddr_in {         // simplified: Linux also adds padding (sin_zero)
    sa_family_t sin_family;  // Address family: AF_INET
    in_port_t sin_port;      // Port number, network byte order
    struct in_addr sin_addr; // IP address
};

2.2.5.2 IPv6 Address

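IPv6 socket addresses are represented by struct sockaddr_in6 (from netinet/in.h), whose main fields are sin6_family, sin6_port, sin6_flowinfo, sin6_addr (a struct in6_addr holding the 128-bit address), and sin6_scope_id. In addition, getaddrinfo() returns its results as a list of struct addrinfo, a generic structure that can describe addresses of any family, IPv6 included: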

struct addrinfo {
    int              ai_flags;
    int              ai_family;
    int              ai_socktype;
    int              ai_protocol;
    socklen_t        ai_addrlen;
    struct sockaddr *ai_addr;
    char            *ai_canonname;
    struct addrinfo *ai_next;
};

2.2.5.3 Local Domain Address

In addition, the sockaddr_un structure represents Unix domain socket addresses:

#include <sys/un.h>

struct sockaddr_un {
    sa_family_t sun_family; // Address Family (AF_UNIX)
    char sun_path[108];     // Path name
};

2.2.5.4 Port Number: Just Like the House Number

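A process that communicates over the network is identified on its host by a port number, which serves as the entry point to that program. A port number is 16 bits, so each host has 65536 ports (0 to 65535) per transport protocol, and through them data is delivered to the correct process. Just as a mail carrier uses house numbers to deliver and collect mail, network communication uses port numbers to deliver data.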

2.2.5.5 String <---> IP Address

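When defining and initializing an IPv4/IPv6 socket address structure, we need to convert a human-readable string into the binary IP address format that routers and other network devices can process and route.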

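We discussed a few library functions for converting between IP addresses and strings above. inet_aton converts an IPv4 address from dotted-decimal text to binary form, while inet_pton is more general and supports both IPv4 and IPv6 addresses.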

2.2.5.6 Initializing the Network Address (IPv4)

Once the address is defined, we fill in each field of the sockaddr_in structure:

#include <stdio.h>
#include <string.h> // for memset
#include <netinet/in.h>
#include <arpa/inet.h>

int main() {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr)); // Clear the addr structure
    addr.sin_family = AF_INET;      // Use IPv4.
    addr.sin_port = htons(8080);    // Port number in network byte order.
    // Recommended: inet_pton() reports invalid input explicitly.
    // Older alternative: addr.sin_addr.s_addr = inet_addr("192.168.1.1");
    // A server accepting on all interfaces would use: addr.sin_addr.s_addr = htonl(INADDR_ANY);
    if (inet_pton(AF_INET, "192.168.1.1", &addr.sin_addr) != 1) {
        fprintf(stderr, "invalid IPv4 address\n");
        return 1;
    }
    printf("Address: %s, Port: %d\n", inet_ntoa(addr.sin_addr), ntohs(addr.sin_port));
    return 0;
}

2.2.6 Get the Address

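getaddrinfo() converts a host name (such as "example.com") or a service name (such as "http") into address information that can be used to create a socket. Clients typically use it to look up the server's network address and create a matching socket; once the client has the server's address, it can connect to the server.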

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

int getaddrinfo(const char *node,    // e.g. "www.example.com" or an IP address
				const char *service, // e.g. "http" or a port number such as "80"
				const struct addrinfo *hints,
				struct addrinfo **res);

/* 
Parameters:
	1. node: hostname or IP address
	2. service: service name or port number (as a string)
	3. hints: restricts the kind of addresses you want (unused fields must be zeroed)
	4. res: pointer to be updated with the result

Return value: 
	- Returns 0 on success and sets *res to a linked list of results; release it with freeaddrinfo(*res).
	- Returns a nonzero error code on failure; gai_strerror() converts it to a message.
*/

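getaddrinfo returns a linked list of addrinfo structures. Through the ai_addr field of an entry we get a pointer to its socket address, which for an IPv4 result can be cast to struct sockaddr_in: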

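struct addrinfo hints, *res;
memset(&hints, 0, sizeof(hints));   // zero every field first
hints.ai_family = AF_INET;          // ask for IPv4 results only
hints.ai_socktype = SOCK_STREAM;

if (getaddrinfo("example.com", "http", &hints, &res) == 0) {
    struct sockaddr_in *ipv4 = (struct sockaddr_in *)res->ai_addr;
    // ... use ipv4->sin_addr and ipv4->sin_port ...
    freeaddrinfo(res);
}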
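Putting the pieces together, here is a sketch that zeroes hints, requests IPv4 TCP results, reads the address and port out of the first result, and frees the list. It passes a numeric host so that no DNS lookup is needed; the helper name resolve_ipv4 is an assumption, and real code may want to try every entry in the ai_next list rather than only the first.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Resolve host/service to the first IPv4 TCP address, copy its text form
 * into ip, and return the port in host byte order (-1 on failure). */
int resolve_ipv4(const char *host, const char *service, char *ip, socklen_t iplen) {
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));       /* unused fields must be zero */
    hints.ai_family = AF_INET;              /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;        /* TCP */

    int rc = getaddrinfo(host, service, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return -1;
    }

    struct sockaddr_in *ipv4 = (struct sockaddr_in *)res->ai_addr;
    inet_ntop(AF_INET, &ipv4->sin_addr, ip, iplen);
    int port = ntohs(ipv4->sin_port);
    freeaddrinfo(res);                      /* the list is heap-allocated */
    return port;
}
```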

2.3 Client Workflow: Connect (TCP)

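On each host, processes are identified by port numbers. In network IPC, knowing a host's address (its IP address) and a port number is enough to communicate with that "remote" process. The client does far less work than the server: all it has to do is say hello (connect()) and then talk (communicate over the socket).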

2.3.1 `connect()`

Let's see how a client uses connect() to say hello to the server. Its prototype is:

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
/* Blocking the thread by default.
Parameters:
	1. sockfd: The file descriptor for the socket to be connected. This is the integer value returned by the socket() function.
	2. addr: A pointer to a struct sockaddr, which contains the address of the target host. This can be cast to a pointer of specific address types, like sockaddr_in for IPv4 or sockaddr_in6 for IPv6.
	3. addrlen: The size, in bytes, of the address structure pointed to by addr. Typically, this will be sizeof(struct sockaddr_in) or sizeof(struct sockaddr_in6).

Return value:
	- On success: Returns 0.
	- On failure: Returns -1, and errno is set to indicate the error. You can use the perror function or strerror to print the error message.
*/

2.3.2 Say Hello to Server

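Earlier we learned getaddrinfo, which gives us the server's network address; from the client's point of view, we now know what the server is called. With that address, the client calls connect to establish a connection to the server, laying the groundwork for the conversation that follows.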

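struct addrinfo hints;
struct addrinfo *res;
int sockfd;

memset(&hints, 0, sizeof(hints));
hints.ai_family = AF_INET;
hints.ai_socktype = SOCK_STREAM;

if (getaddrinfo("www.example.com", "80", &hints, &res) != 0) {
    fprintf(stderr, "getaddrinfo failed\n");
    return 1;
}
sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
if (sockfd == -1) {
    perror("socket");
    freeaddrinfo(res);
    return 1;
}

int status = connect(sockfd, res->ai_addr, res->ai_addrlen);
freeaddrinfo(res);   // the address list is no longer needed after connect()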

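In this example, a status of 0 means the connection was established (for TCP, the handshake with the server completed); -1 means the attempt failed, and errno tells you why.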

2.4 Server Workflow: Bind, Listen and Accept (TCP)

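In network communication, the server carries far more responsibility than the client. To say hello to a server, the client does not need to know its own name: the operating system automatically assigns it a temporary (ephemeral) port number. The server is different: because it must constantly listen for connection requests from clients, it must explicitly call bind() to bind its socket to a specific port number. (After all, if the server's name were random, clients would have no idea how to reach it.)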

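After binding, the server keeps listening for incoming connection requests so it can serve clients. When the server receives a client's connection request, it accepts it and creates a dedicated socket for communicating with that particular client.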

2.4.1 `bind()`

bind() assigns a specific local address and port number to a socket.

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
/* 
Parameters:
	1. sockfd: The file descriptor of the socket to be bound. This is the integer value returned by the socket() function.
	2. addr: A pointer to a struct sockaddr, which contains the address to bind to the socket. This can be cast to a pointer of specific address types, like sockaddr_in for IPv4 or sockaddr_in6 for IPv6.
	3. addrlen: The size, in bytes, of the address structure pointed to by addr. Typically, this will be sizeof(struct sockaddr_in) or sizeof(struct sockaddr_in6).

Return value:
	- On success: Returns 0.
	- On failure: Returns -1, and errno is set to indicate the error. You can use the perror function or strerror to print the error message.
*/

Example 1:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <string.h>   // memset
#include <stdio.h>    // perror

int main() {
    int sockfd;
    struct sockaddr_in server_addr;

    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("socket");
        return 1;
    }

    memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_port = htons(8080);
    server_addr.sin_addr.s_addr = htonl(INADDR_ANY); // all local interfaces

    if (bind(sockfd, (struct sockaddr *)&server_addr, sizeof(server_addr)) == -1) {
        perror("bind");
        close(sockfd);
        return 1;
    }

    // Other code...
    close(sockfd);
    return 0;
}

Why is bind() not necessary in a client process?
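One way to explore this question: the sketch below binds a socket to port 0, which asks the kernel to choose a free ephemeral port, and uses getsockname() to see which port was picked. A client that calls connect() on an unbound socket gets the same kind of automatic assignment. The helper name kernel_assigned_port is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to 127.0.0.1 with port 0 and return the port number
 * the kernel picked (host byte order), or -1 on error. */
int kernel_assigned_port(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                      /* 0 = "any free port" */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }

    socklen_t len = sizeof(addr);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) == -1) {
        close(fd);
        return -1;
    }
    close(fd);
    return ntohs(addr.sin_port);
}
```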

2.4.2 `listen()`

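listen() puts a socket into passive mode so that it can accept incoming connection requests. The listen() system call sets up a queue, sized by backlog, for pending connections: connections that have arrived but have not yet been taken by accept(). backlog does not limit the total number of clients a server can serve. When the queue is full, a new connection request may be refused (ECONNREFUSED) or, if the protocol supports retransmission, ignored so that a later retry succeeds.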

int listen(int sockfd, int backlog);
/* 
Parameters:
	1. sockfd: The file descriptor of the socket that will be put into a listening state. This is the integer value returned by the socket() function.
	2. backlog: The maximum length to which the queue of pending connections may grow. Typically a small positive integer.

Return value:
	- On success: Returns 0.
	- On failure: Returns -1, and errno is set to indicate the error. You can use the perror function or strerror to print the error message.
*/

Example 2:

// Assume the bind() function has been called

if (listen(sockfd, 10) == -1) {
    perror("listen");
    close(sockfd);
    return 1;
}

// Other code...
close(sockfd);
return 0;

2.4.3 `accept()`

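accept() accepts an incoming connection: it takes the first connection request off the listening socket's queue and creates a new socket for that connection. accept is a blocking call: if no connection is pending, the calling thread blocks until one arrives. A single-threaded server that handles each client to completion before calling accept() again therefore serves only one client at a time.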

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
/* 
Parameters:
	1. sockfd: The file descriptor of the listening socket. This is the integer value returned by the socket() function and used by the bind() and listen() functions.
	2. addr: A pointer to a struct sockaddr, which will be filled with the address of the connecting entity. Can be cast to a pointer of specific address types, like sockaddr_in for IPv4 or sockaddr_in6 for IPv6.
	3. addrlen: A pointer to a socklen_t, which on input contains the size of addr, and on output contains the size of the address returned.

Return value:
	- On success: Returns a new file descriptor for the accepted socket.
	- On failure: Returns -1, and errno is set to indicate the error. You can use the perror function or strerror to print the error message.
*/

Example 3:

// Assume bind() and listen() have been called on sockfd

struct sockaddr_in client_addr;             // filled in with the client's address
socklen_t sin_size = sizeof(client_addr);   // in: buffer size, out: actual address size
int new_fd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);
if (new_fd == -1) {
    perror("accept");
    close(sockfd);
    return 1;
}

printf("Connection accepted\n");

// Other code...
close(new_fd);
close(sockfd);
return 0;

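If you don't care who the client is, you can pass NULL for the last two arguments of accept(). The server then doesn't store the client's address information, which simplifies the code.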

int new_fd = accept(sockfd, NULL, NULL);

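In the example above, the new file descriptor new_fd returned by accept represents the connection to the client. You send and receive data through this socket without maintaining the client's address yourself, while the listening socket sockfd stays open for further connections.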
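To tie bind(), listen(), connect() and accept() together, here is a single-process sketch on the loopback interface (127.0.0.1), assuming loopback networking is available. It binds to port 0 so no fixed port is needed, reads the real port back with getsockname(), and relies on the kernel completing the handshake into the listen queue, which is why connect() returns before accept() runs. A real server would run the two sides in different processes; the function name tcp_loopback_demo is illustrative.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Server: socket -> bind -> listen -> accept; client: socket -> connect -> send.
 * Copies what the server received into out; returns its length or -1. */
int tcp_loopback_demo(const char *msg, char *out, size_t outlen) {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    int client = socket(AF_INET, SOCK_STREAM, 0);
    int conn = -1, result = -1;
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    ssize_t n;

    if (listener == -1 || client == -1)
        goto done;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                      /* let the kernel pick a port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* 127.0.0.1 */
    if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(listener, 1) == -1 ||
        getsockname(listener, (struct sockaddr *)&addr, &len) == -1)
        goto done;                                 /* addr now holds the real port */

    /* The kernel completes the handshake and queues the connection,
     * so connect() returns before accept() is called. */
    if (connect(client, (struct sockaddr *)&addr, sizeof(addr)) == -1)
        goto done;
    conn = accept(listener, NULL, NULL);           /* peer address not needed */
    if (conn == -1)
        goto done;

    if (send(client, msg, strlen(msg), 0) == -1)
        goto done;
    n = recv(conn, out, outlen - 1, 0);            /* read on the accepted socket */
    if (n >= 0) {
        out[n] = '\0';
        result = (int)n;
    }
done:
    if (conn != -1) close(conn);
    if (client != -1) close(client);
    if (listener != -1) close(listener);
    return result;
}
```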

2.5 Communication (TCP)

Once the connection is established and everything is ready, communication can begin. In TCP communication we mainly use send() and recv() to send and receive data.

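Because a socket can also be treated as a special file, you can use the lower-level read() and write() system calls as well. TCP is a byte stream with no message boundaries, so a single read() may return fewer bytes than the sender wrote; you generally read in a loop until you have as much data as you expect (or the peer closes the connection). Likewise, write() may send only part of the buffer (for example when the send buffer is full), so you need a loop to make sure all the data goes out. The same applies to recv() and send().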
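The loops described above can be sketched as two helpers; the names send_all and recv_all are common conventions, not standard library functions. send_all keeps calling send() until every byte is out, and recv_all keeps calling recv() until the requested number of bytes has arrived or the peer closes the connection. Both retry when a call is interrupted by a signal (EINTR).

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send all len bytes, looping over partial sends. 0 on success, -1 on error. */
int send_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n == -1) {
            if (errno == EINTR) continue;   /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                              /* partial send: advance and loop */
        len -= (size_t)n;
    }
    return 0;
}

/* Receive exactly len bytes unless the peer closes first.
 * Returns the number of bytes received (< len on early close), or -1. */
ssize_t recv_all(int fd, void *buf, size_t len) {
    char *p = buf;
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(fd, p + got, len - got, 0);
        if (n == 0) break;                   /* orderly shutdown by the peer */
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

With a connected pair of sockets, send_all(fd, "abcdef", 6) followed by recv_all(peer, buf, 6) moves exactly six bytes, however the stream happens to be split in between.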

2.5.1 `send()`

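send() sends data through a socket: it transmits the data in the given buffer to the peer connected to the socket. send() blocks by default: if the socket's send buffer is full, the call blocks until enough buffer space becomes available.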

ssize_t send(int sockfd, const void *buf, size_t len, int flags);
/* 
Parameters
	1. sockfd: Socket to send the data to
	2. buf: Bytes of data to be sent
	3. len: Size of the message
	4. flags: Options; 0 will suffice. Common flags are:
	    - MSG_CONFIRM: Tell the link layer that forward progress happened (Linux, UDP/raw sockets).
	    - MSG_DONTWAIT: Enable non-blocking operation.
	    - MSG_OOB: Send out-of-band data.
	    - MSG_NOSIGNAL: Do not generate SIGPIPE if the peer has closed the connection.
	    - MSG_MORE: Sender will send more data.

Return value: number of bytes sent (may be fewer than len), -1 on error
*/

2.5.2 `recv()`

recv() receives data from a socket and stores it in a buffer. Like send(), recv() blocks by default: if no data is available, it blocks the program until data arrives.

ssize_t recv(int sockfd, void *buf, size_t len, int flags);
/* 
Parameters
	1. sockfd: Where to receive data from
	2. buf: Where the data goes
	3. len: The maximum size of the buffer
	4. flags: Flags can also be 0 here. Common flags are:
	    - MSG_DONTWAIT: Enable non-blocking operation.
	    - MSG_OOB: Receive out-of-band data.
	    - MSG_PEEK: Peek at the incoming data without removing it from the queue.
	    - MSG_WAITALL: Wait for the full request (may still return less on a signal, an error, or disconnect).

Return value: 
	- Returns the number of bytes actually read into the buffer on success.
	- Returns 0 if the peer has closed the connection (orderly shutdown).
	- Returns -1 on error.
*/

2.5.3 Are You Still There?

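Knowing whether the peer is still sending is not easy, but there are mechanisms to detect whether it is still there. The idea is: if one side sends a message and receives no response within some period, it closes the connection. This prevents the program from blocking forever.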

Here we mainly use setsockopt to set socket attributes such as timeouts and Keep-Alive.

2.5.3.1 TCP Keep-Alive

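TCP itself supports a Keep-Alive mechanism for detecting whether an idle connection is still alive. The protocol sends probe segments and waits for responses to determine the state of the connection. Applications enable this feature with setsockopt.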

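#include <netinet/in.h>
#include <netinet/tcp.h>   // TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT (Linux)
#include <sys/socket.h>

int enable_tcp_keepalive(int sockfd) {
    // Enable keep-alive.
    int optval = 1;
    if (setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &optval, sizeof(optval)) == -1)
        return -1;

    int keepidle = 60;     // idle time before the first probe (seconds)
    int keepinterval = 10; // interval between probes (seconds)
    int keepcount = 3;     // maximum number of unanswered probes

    if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPIDLE, &keepidle, sizeof(keepidle)) == -1 ||
        setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPINTVL, &keepinterval, sizeof(keepinterval)) == -1 ||
        setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPCNT, &keepcount, sizeof(keepcount)) == -1)
        return -1;

    return 0;
}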

In the example above, we enable TCP's Keep-Alive mechanism and then set three more parameters:

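keepidle: if the connection has had no activity for keepidle seconds, start sending Keep-Alive probes.
keepinterval: the number of seconds between probes, until the peer responds or the maximum count is reached.
keepcount: the maximum number of probes sent without a response. If there is still no response after this many probes, the connection is considered dead.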
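With the values above, a dead peer is detected after roughly keepidle + keepcount * keepinterval = 60 + 3 * 10 = 90 seconds without any traffic. The sketch below reads the three settings back with getsockopt() and computes that bound. It assumes Linux, where TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT come from netinet/tcp.h; the function name keepalive_detection_seconds is hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Read the keep-alive timers of sockfd and return the approximate number of
 * seconds of silence before the kernel drops the connection:
 * idle + count * interval. Returns -1 on error. */
int keepalive_detection_seconds(int sockfd) {
    int idle = 0, intvl = 0, cnt = 0;
    socklen_t len = sizeof(int);
    if (getsockopt(sockfd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, &len) == -1)
        return -1;
    len = sizeof(int);                     /* reset the value-result length */
    if (getsockopt(sockfd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, &len) == -1)
        return -1;
    len = sizeof(int);
    if (getsockopt(sockfd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, &len) == -1)
        return -1;
    return idle + cnt * intvl;
}
```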

2.5.3.2 Timeouts

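By setting a receive timeout, recv() gives up when no data arrives within the specified time, and the program can then close the connection. This works well for bursty traffic, but it can misjudge a peer that is merely slow.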

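#include <stdio.h>
#include <sys/socket.h>
#include <sys/time.h>   // struct timeval

int set_recv_timeout(int sockfd, int seconds) {
    struct timeval timeout;
    timeout.tv_sec = seconds;
    timeout.tv_usec = 0;

    // Afterwards, a blocking recv() that waits longer than `seconds`
    // returns -1 with errno set to EAGAIN or EWOULDBLOCK.
    if (setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout)) < 0) {
        perror("setsockopt");
        return -1;
    }
    return 0;
}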
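A self-contained check of what the timeout does, assuming Linux or another POSIX system: on a socketpair() where nothing is ever sent, recv() returns -1 with errno set to EAGAIN or EWOULDBLOCK once the timeout expires, instead of blocking forever. The millisecond variant and the name recv_times_out are illustrative.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Set an ms-millisecond receive timeout on an idle socket and call recv().
 * Returns 1 if recv() timed out, 0 if it returned for another reason,
 * -1 on setup error. */
int recv_times_out(long ms) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    struct timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    if (setsockopt(sv[1], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    char buf[16];
    ssize_t n = recv(sv[1], buf, sizeof(buf), 0);   /* blocks at most ms */
    int timed_out = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
    close(sv[0]);
    close(sv[1]);
    return timed_out;
}
```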

2.5.3.3 Heartbeat Messages

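Heartbeats periodically send a heartbeat packet (a short message) to check whether the connection is still up. When the peer receives a heartbeat, it replies with an acknowledgment; if no acknowledgment arrives, the connection can be considered broken. Heartbeat detection requires cooperation from both client and server: the client must also handle heartbeat messages and reply to the ones the server sends.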

// server-side code (sockfd is a connected socket)
while (1) {
    const char *heartbeat = "HEARTBEAT";
    if (send(sockfd, heartbeat, strlen(heartbeat), 0) == -1) {
        perror("send");
        break;
    }

    char buffer[1024];
    ssize_t bytes_received = recv(sockfd, buffer, sizeof(buffer), 0);
    if (bytes_received < 0) {
        // EAGAIN/EWOULDBLOCK only occur if a timeout (SO_RCVTIMEO) is set.
        if (errno == EWOULDBLOCK || errno == EAGAIN) {
            printf("Receive timeout, no data received\n");
        } else {
            perror("recv");
            break;
        }
    } else if (bytes_received == 0) {
        printf("Peer closed the connection\n");
        break;
    } else {
        printf("Received heartbeat response: %.*s\n", (int)bytes_received, buffer);
    }
    sleep(5);
}
// client-side code
// ...

This example still needs work: it does not use a timeout, so if the peer never responds, recv() blocks indefinitely.
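One way to close that gap is to wait for the reply with poll() and a timeout. The sketch assumes a simple, hypothetical protocol in which the peer answers each "HEARTBEAT" with "ACK"; the helper name heartbeat_ok and the reply text are not part of any standard. A return of 0 means the peer stayed silent for timeout_ms milliseconds or closed the connection, and the caller can then close the socket.

```c
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send one heartbeat and wait up to timeout_ms for an "ACK" reply.
 * Returns 1 if the peer answered, 0 on timeout or closed connection,
 * -1 on error. */
int heartbeat_ok(int sockfd, int timeout_ms) {
    const char *heartbeat = "HEARTBEAT";
    if (send(sockfd, heartbeat, strlen(heartbeat), 0) == -1)
        return -1;

    struct pollfd pfd = { .fd = sockfd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);   /* 0 means the timer expired */
    if (ready == -1)
        return -1;
    if (ready == 0)
        return 0;

    char buf[16];
    ssize_t n = recv(sockfd, buf, sizeof(buf), 0);
    if (n <= 0)
        return n == 0 ? 0 : -1;              /* 0: peer closed the connection */
    return n >= 3 && memcmp(buf, "ACK", 3) == 0;
}
```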

2.5.4 Using Formats for Serialization

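Sending and receiving struct types: a struct must be serialized into a byte format both sides agree on before it is sent, and deserialized after it is received. XML and JSON are popular formats for information transfer.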
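Sending a struct's raw bytes is fragile: padding, field sizes and byte order can differ between compilers and machines. A minimal sketch of an explicit binary format, assuming a hypothetical struct message with a 16-bit type and a 32-bit value: each field is converted to network byte order and copied to a fixed offset, giving a 6-byte encoding both ends can agree on. JSON or XML solve the same problem in text form, usually with a library parser.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message type used for illustration. */
struct message {
    uint16_t type;
    uint32_t value;
};

#define MESSAGE_WIRE_SIZE 6   /* 2 bytes type + 4 bytes value, no padding */

/* Serialize m into out (at least MESSAGE_WIRE_SIZE bytes), big-endian. */
void message_pack(const struct message *m, unsigned char *out) {
    uint16_t t = htons(m->type);
    uint32_t v = htonl(m->value);
    memcpy(out, &t, sizeof(t));
    memcpy(out + 2, &v, sizeof(v));
}

/* Deserialize MESSAGE_WIRE_SIZE bytes from in into m. */
void message_unpack(const unsigned char *in, struct message *m) {
    uint16_t t;
    uint32_t v;
    memcpy(&t, in, sizeof(t));
    memcpy(&v, in + 2, sizeof(v));
    m->type = ntohs(t);
    m->value = ntohl(v);
}
```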

2.6 Datagrams

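Unlike TCP, UDP is "rude and ill-mannered": it skips the greeting (there is no connection setup). UDP also does not guarantee that datagrams arrive at all, so it is called a connectionless, lightweight protocol.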

2.6.1 sendto()

sendto sends a datagram on a connectionless socket (such as UDP).

ssize_t sendto(int sockfd, const void *msg, size_t len, int flags, const struct sockaddr *dest_addr, socklen_t addrlen);
/* 
Parameters:
	1. sockfd: The socket file descriptor.
	2. msg: A pointer to the buffer containing the message to be sent.
	3. len: The length of the message in bytes.
	4. flags: Options to modify the behavior of the function.
	5. dest_addr: A pointer to the struct sockaddr containing the destination address.
	6. addrlen: The size of the destination address structure.

Return value:
	- On success: Returns the number of bytes sent.
	- On failure: Returns -1, and errno is set to indicate the error. You can use the perror function or strerror to print the error message.
*/

2.6.2 recvfrom()

recvfrom receives a datagram on a connectionless socket (such as UDP).

ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *src_addr, socklen_t *addrlen);
/* 
Parameters:
	1. sockfd: The socket file descriptor.
	2. buf: A pointer to the buffer where the received message will be stored.
	3. len: The length of the buffer.
	4. flags: Options to modify the behavior of the function.
	5. src_addr: A pointer to the struct sockaddr where the source address will be stored. Can be NULL.
	6. addrlen: A pointer to a socklen_t; on input it holds the size of the src_addr buffer, on output the actual size of the source address. Can be NULL if src_addr is NULL.

Return value:
	- On success: Returns the number of bytes received.
	- On failure: Returns -1, and errno is set to indicate the error. You can use the perror function or strerror to print the error message.
*/
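A single-process sketch of the datagram workflow over 127.0.0.1, assuming loopback networking is available: the receiver binds to port 0 and reads its port back with getsockname(), the sender names the destination in every sendto() call, and recvfrom() returns both the payload and the sender's address. Note that no connect(), listen() or accept() is involved. The function name udp_loopback_demo is illustrative.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one datagram between two UDP sockets on the loopback interface.
 * Copies the received payload into out and returns its length, or -1. */
int udp_loopback_demo(const char *msg, char *out, size_t outlen) {
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    int result = -1;
    struct sockaddr_in addr, from;
    socklen_t len = sizeof(addr);
    socklen_t fromlen = sizeof(from);
    ssize_t n;

    if (rx == -1 || tx == -1)
        goto done;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                        /* kernel picks a port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        getsockname(rx, (struct sockaddr *)&addr, &len) == -1)
        goto done;

    /* Each sendto() names the destination explicitly. */
    if (sendto(tx, msg, strlen(msg), 0, (struct sockaddr *)&addr, sizeof(addr)) == -1)
        goto done;

    /* recvfrom() also reports the sender (tx got an automatic port). */
    n = recvfrom(rx, out, outlen - 1, 0, (struct sockaddr *)&from, &fromlen);
    if (n >= 0) {
        out[n] = '\0';
        result = (int)n;
    }
done:
    if (tx != -1) close(tx);
    if (rx != -1) close(rx);
    return result;
}
```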

2.7 cURL (libcurl)

This part is discussed further in 14. Asynchronous IO.

2.7.1 Webservices

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <curl/curl.h>

int main(void) {
    CURL *curl;
    CURLcode res;

    curl_global_init(CURL_GLOBAL_DEFAULT);
    curl = curl_easy_init();
    if(curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "http://example.com");
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
        res = curl_easy_perform(curl);
        if(res != CURLE_OK) {
            fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(res));
        }
        curl_easy_cleanup(curl);
    }
    curl_global_cleanup();
    return 0;
}

2.7.2 Setting Up Callbacks

2.7.2.1 Read Callback

size_t read_callback(char *buffer, size_t size, size_t nitems, void *userdata);

// buffer: the area where you put the data to send
// size * nitems: the maximum number of bytes you may copy into buffer
// userdata: the pointer set with CURLOPT_READDATA
// return value: the number of bytes actually copied into buffer; 0 signals end of data

2.7.2.2 Write Callback

size_t write_callback(char *ptr, size_t size, size_t nmemb, void *userdata);

// ptr: points to the data received (not null-terminated)
// size: always 1
// nmemb: the number of bytes of data in ptr
// userdata: the pointer set with CURLOPT_WRITEDATA, passed directly to this function
// return value: number of bytes processed; anything other than size * nmemb signals an error

2.7.2.3 Registration of Callback

// curl is the handle returned by curl_easy_init(); my_read_data and
// my_write_data stand for your own data structures
curl_easy_setopt(curl, CURLOPT_READFUNCTION, read_callback);
curl_easy_setopt(curl, CURLOPT_READDATA, &my_read_data);   // read_callback's userdata

curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &my_write_data); // write_callback's userdata
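As an example of the write-callback contract, here is a sketch of a callback that appends each received chunk to a growable, null-terminated buffer. The struct response type and the name write_to_memory are hypothetical; the function does not depend on libcurl itself, so it can be exercised by calling it directly. In a real program it would be registered with curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_to_memory) and curl_easy_setopt(curl, CURLOPT_WRITEDATA, &resp).

```c
#include <stdlib.h>
#include <string.h>

/* Growable buffer handed to the callback through CURLOPT_WRITEDATA. */
struct response {
    char *data;
    size_t len;
};

/* Write-callback signature: append the chunk and keep the buffer
 * null-terminated. Returning anything other than size * nmemb tells
 * libcurl that an error occurred. */
size_t write_to_memory(char *ptr, size_t size, size_t nmemb, void *userdata) {
    struct response *resp = userdata;
    size_t chunk = size * nmemb;
    char *grown = realloc(resp->data, resp->len + chunk + 1);
    if (grown == NULL)
        return 0;                      /* out of memory: signal an error */
    memcpy(grown + resp->len, ptr, chunk);
    resp->data = grown;
    resp->len += chunk;
    resp->data[resp->len] = '\0';
    return chunk;
}
```

Start with struct response resp = { NULL, 0 }; after curl_easy_perform() returns, resp.data holds the whole body and resp.len its size, and free(resp.data) releases it.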

Lesson 3 Pipes and Shared Memory

3.1 UNIX Pipes

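When processes communicate through a pipe, the operating system sets aside a buffer in kernel space for the shared data. A pipe is unidirectional (like a water pipe): data flows in only one direction, from the write end to the read end, so full-duplex communication requires two pipes. There are two pipe mechanisms: pipe() and mkfifo().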


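A pipe is typically implemented as a circular buffer. On Linux, the pipe buffer is usually between 4 KB and 64 KB (64 KB by default on modern kernels). Pipes carry a byte stream: the writer's data is appended to the buffer as a sequence of bytes, the reader takes bytes out in the same order, and message boundaries are not preserved. The pipe system calls provide synchronization by blocking: when the buffer is full, a write blocks until data is read, and when it is empty, a read blocks until data is written. This holds for both named and anonymous pipes.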

3.1.1 Ordinary Pipes(Anonymous Pipes)

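On Unix-like systems, the pipe() system call creates an anonymous pipe for IPC. An anonymous pipe can only be used between related processes (for example, a parent and the child it forks, which inherits the pipe's file descriptors). It is created with: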

#include <unistd.h>

int pipe(int pipefd[2]);
// pipefd[0] is the read-end
// pipefd[1] is the write-end

3.1.1.1 Creating an Anonymous Pipe

Below is an example of creating an anonymous pipe:

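#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

const char* str = "hello world";
char rdBuf[32];

int main(int argc, char const *argv[])
{
	int pipefd[2];
	if (pipe(pipefd) == -1) {
		perror("pipe create fail");
		return -1;
	}
	pid_t pid = fork();
	if (pid == -1) {
		perror("fork fail");
		return -1;
	}
	if (pid == 0) {
		// Child: reader. Close the unused write end first.
		close(pipefd[1]);
		printf("this is child process\n");
		memset(rdBuf, 0, sizeof(rdBuf));
		// Leave room for the terminating '\0'.
		if (read(pipefd[0], rdBuf, sizeof(rdBuf) - 1) == -1) {
			perror("read fail");
			return -1;
		}
		printf("I have read the character string: %s\n", rdBuf);
		close(pipefd[0]);
	}
	else {
		// Parent: writer. Close the unused read end first.
		close(pipefd[0]);
		printf("this is parent process\n");
		sleep(2);
		if (write(pipefd[1], str, strlen(str)) == -1) {
			perror("write fail");
			return -1;
		}
		close(pipefd[1]);
		waitpid(pid, NULL, 0);   // reap the child
	}
	return 0;
}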

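The parameter pipefd is an array of two ints that receives the pipe's read and write file descriptors.

pipefd[0] is the read end: read(pipefd[0], buffer, sizeof(buffer)) reads data from the pipe into buffer.
pipefd[1] is the write end: write(pipefd[1], message, strlen(message) + 1) writes the data in message into the pipe.
On success pipe returns 0; on failure it returns -1 and sets errno.

The pipe exists as long as any file descriptor referring to either end is still open (in any process); only after all descriptors for both ends have been closed is the pipe released.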
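The lifetime rule above has a practical consequence: a reader sees end of file (read() returns 0) only after every write end has been closed, including the copy the reader itself inherited. The sketch below, with the hypothetical helper pipe_eof_demo, forks a child that writes and exits while the parent reads in a loop until EOF. Without the parent's close(pipefd[1]), that loop would block forever.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child writes msg into the pipe and exits; parent reads until EOF.
 * Returns the number of bytes the parent read, or -1 on error. */
int pipe_eof_demo(const char *msg, char *out, size_t outlen) {
    int pipefd[2];
    if (pipe(pipefd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (pid == 0) {                       /* child: writer */
        close(pipefd[0]);
        ssize_t w = write(pipefd[1], msg, strlen(msg));
        close(pipefd[1]);
        _exit(w == -1 ? 1 : 0);
    }

    close(pipefd[1]);                     /* parent: required to ever see EOF */
    size_t total = 0;
    ssize_t n;
    while (total < outlen - 1 &&
           (n = read(pipefd[0], out + total, outlen - 1 - total)) > 0)
        total += (size_t)n;               /* read() returns 0 at EOF */
    out[total] = '\0';
    close(pipefd[0]);
    waitpid(pid, NULL, 0);
    return (int)total;
}
```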

3.1.2 FIFO (Named Pipe)

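A pipe is a special file type, but at bottom it is still a file. Opening a disk file with open places a file descriptor (fd) in the process's file descriptor table. With anonymous pipes we hardly notice the "file": the descriptors exist, but they refer to a buffer in kernel memory, no disk I/O takes place, and so no corresponding file is visible.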

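A FIFO (named pipe) is different: creating it also creates a FIFO file in the file system. The file has a name, and any process that knows the name can open it to communicate with the other end. Named pipes therefore allow data to be transferred between unrelated processes. (The FIFO file serves as the entry point to the pipe.)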

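A named pipe persists after creation until it is explicitly deleted, even if no process is using it (the FIFO file persists; the data passing through it does not). Once created, it is used like a file, with open, read, write, and other system calls. Because a named pipe does not support seeking and follows first-in-first-out order, it is also called a FIFO special file.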

3.1.2.1 Creating a Named Pipe

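// Writer process (start it first: it creates the FIFO)
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

const char* pipe_name = "tmp";
const char* str = "hello world";

int main() {
	if (mkfifo(pipe_name, 0644) == -1) {
		perror("mkfifo func error");
		return -1;
	}
	// open() blocks until a reader opens the FIFO.
	int fifo_Writefd = open(pipe_name, O_WRONLY);
	if (fifo_Writefd == -1) {
		perror("open fifofd error");
		return -1;
	}
	if (write(fifo_Writefd, str, strlen(str)) == -1) {
		perror("write func error");
		return -1;
	}
	close(fifo_Writefd);
	return 0;
}

// Reader process
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <iostream>
#include <unistd.h>

const char* pipe_name = "tmp";
char readBuf[32];

int main() {
	memset(readBuf, 0, sizeof(readBuf));
	// open() blocks until a writer opens the FIFO.
	int fifo_Readfd = open(pipe_name, O_RDONLY);
	if (fifo_Readfd == -1) {
		perror("open fifofd error");
		return -1;
	}
	if (read(fifo_Readfd, readBuf, sizeof(readBuf) - 1) == -1) {
		perror("read func error");
		return -1;
	}
	std::cout << readBuf << std::endl;
	close(fifo_Readfd);
	// Remove the FIFO file from the file system.
	if (unlink(pipe_name) == -1) {
		perror("unlink func error");
		return -1;
	}
	return 0;
}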

3.1.3 Pipe and Disk I/O

3.1.3.1 Anonymous Pipes Have Nothing To Do with Disk

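As mentioned in the named pipe subsection above, when we create a pipe's file descriptors with the pipe() system call, no disk is involved. Although we use file operations, they all act on kernel memory and never touch an actual file.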

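So an anonymous pipe is a way for related processes to communicate. It exists only temporarily in memory: once all related processes have terminated or closed the pipe's file descriptors (both read and write ends), it is destroyed automatically. With an anonymous pipe there is no file to delete, so unlink is not needed.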

3.1.3.2 Named Pipes Use Disk I/O

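For communication between unrelated processes, we create a named pipe (FIFO) with mkfifo (the function or the shell command); a FIFO is a special file with a name. Once process A has created the FIFO, process B can reach it through its path and file name. By opening the FIFO file, B is connected to the kernel buffer that serves as the pipe. This is the basic principle by which a FIFO enables communication between unrelated processes.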

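When process A first creates the FIFO file with mkfifo, an inode record is created in the file system (the first file-system access). A later open() of the pipe reads the inode to obtain the file's metadata (the second access). Since the kernel has by then cached the inode, process B can usually open the FIFO and obtain a suitable fd without reading the disk again.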

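Note that "using disk I/O" here does not mean that the data passing through the pipe is first written to disk; that would be far too slow. It means that a named pipe requires a FIFO file to be created in the file system (on disk), and creating that file involves I/O.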

3.2 Shared Memory

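The shared-memory IPC mechanism sets aside a region of memory to be shared. Unlike a pipe, a shared memory region can usually be much larger (a pipe buffer is typically 4 KB to 64 KB), and the shared memory segment (shm) is mapped into the user address space of each participating process, so data does not have to be copied through the kernel.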

3.2.1 shm in POSIX

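The POSIX standard emphasizes uniformity and portability, so POSIX shm is manipulated through the file-system interface (file descriptors), which also simplifies implementation and use. It is usually combined with mmap(). The commonly used calls are: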

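shm_open: creates or opens a shared memory object and returns a file descriptor shm_fd. This descriptor is used for the subsequent shared-memory operations, just like an ordinary file.
ftruncate: sets the size of the shared memory object so it has enough space to hold the data.
mmap: maps the shared memory object into the process's address space and returns the start address of the mapping (*ptr), so you can work with the shared memory like ordinary memory.
memcpy or strcpy: once the shared memory is mapped, you can operate on it with standard memory functions (such as memcpy or strcpy), exactly as with ordinary memory.
close(shm_fd): once you no longer need the descriptor, close it with close, much like closing a file. The mapping stays valid after the descriptor is closed; munmap removes it.
shm_unlink: finally, delete the shared memory object with shm_unlink, much like deleting a file, to release a shared memory object that is no longer needed.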

3.2.1.1 `shm_open()`

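shm_open creates or opens a shared memory object and returns a file descriptor for the subsequent operations. On Linux, creating the object makes a corresponding entry appear in the file system (usually under /dev/shm).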

#include <sys/mman.h>
#include <fcntl.h>
#include <sys/stat.h>

int shm_open(const char *name, int oflag, mode_t mode);
/* 
Parameters:
	1. name: The name of the shared memory object (portable names have the form "/somename"; with glibc older than 2.34, link with -lrt).
	2. oflag: The open flags (e.g., O_CREAT, O_RDWR).
	3. mode: The permission mode of the shared memory object (e.g., 0666).

Return value: Returns a file descriptor on success, -1 on failure and sets errno appropriately.
*/

3.2.1.2 `ftruncate()`

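ftruncate sets the size of the shared memory object so it has enough room for the data. Because a memory page is usually 4 KB, the size is usually chosen as a multiple of 4096 (mappings are made in whole pages in any case).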

int ftruncate(int fd, off_t length);
/* 
Parameters:
	1. fd: The file descriptor of the shared memory object.
	2. length: The size to set for the shared memory object in bytes.

Return value: Returns 0 on success, -1 on failure and sets errno appropriately.
*/

3.2.1.3 `close()`

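close closes the shared memory object's file descriptor, just like closing a file. An existing mmap() mapping stays valid after the descriptor is closed.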

int close(int fd);
/* 
Parameters:
	1. fd: The file descriptor to close.

Return value: Returns 0 on success, -1 on failure and sets errno appropriately.
*/

3.2.1.4 `shm_unlink()`

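shm_unlink removes a shared memory object, much like deleting a file, releasing shared memory that is no longer needed. If shm_unlink is never called, a POSIX shared memory object stays resident in system memory until the system reboots or the object is explicitly removed.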

int shm_unlink(const char *name);
/* 
Parameters:
	1. name: The name of the shared memory object.

Return value: Returns 0 on success, -1 on failure and sets errno appropriately.
*/

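If processes still have the shared memory object open or mapped when shm_unlink is called, the name is removed immediately, but the memory itself is freed only after all of them have closed their descriptors and unmapped the region.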

3.2.1.5 Manipulating a shm Object

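The following code briefly demonstrates creating (opening), sizing, closing, and deleting a POSIX shm object. Note that the writer process is responsible for setting the shm size: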

#include <iostream>
#include <cstdio>    // perror
#include <cstdlib>   // exit, EXIT_FAILURE
#include <cstring>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
// The headers above are shared by both programs below.

// Write process
const char* shm_name = "my_shm";
const char* str = "hello world";

int main(int argc, char const *argv[])
{
	int shm_fd = shm_open(shm_name, O_CREAT | O_RDWR, 0666);
	if (shm_fd == -1) {
		perror("shm_open");
		exit(EXIT_FAILURE);
	}

	if (ftruncate(shm_fd, 4096) == -1) {
		perror("ftruncate");
		close(shm_fd);
		exit(EXIT_FAILURE);
	}

	void *ptr = mmap(0, 4096, PROT_WRITE, MAP_SHARED, shm_fd, 0);
	if (ptr == MAP_FAILED) {
		perror("mmap");
		close(shm_fd);
		exit(EXIT_FAILURE);
	}

	memcpy(ptr, str, strlen(str) + 1);  // include the terminating '\0'
	munmap(ptr, 4096);
	close(shm_fd);

	std::cout << "Character string's been sent" << std::endl;
	return 0;
}

// Read process (run after the writer)
const char* shm_name = "my_shm";

int main(int argc, char const *argv[])
{
	int rdshm_fd = shm_open(shm_name, O_RDONLY, 0666);
	if (rdshm_fd == -1) {
		perror("shm_open");
		exit(EXIT_FAILURE);
	}

	void *ptr = mmap(0, 4096, PROT_READ, MAP_SHARED, rdshm_fd, 0);
	if (ptr == MAP_FAILED) {
		perror("mmap");
		close(rdshm_fd);
		exit(EXIT_FAILURE);
	}
	std::cout << "Read from shared memory: " << (char*)ptr << std::endl;

	munmap(ptr, 4096);
	close(rdshm_fd);
	shm_unlink(shm_name);   // remove the object once it is no longer needed
	return 0;
}

3.2.1.6 Shared Memory Implementation Mechanism | POSIX

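Since in POSIX "everything is a file" and shm_open returns a file descriptor, we should be able to find a corresponding file somewhere, right? Indeed we can. After the line int shm_fd = shm_open(shm_name, O_CREAT | O_RDWR, 0666); has run, the object appears under /dev/shm. Note that on Linux /dev/shm is a tmpfs mount, so this "file" lives in RAM, not on disk. Listing the directory after running the writer: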

du@du-virtual-machine:~/Desktop/OS$ ls /dev/shm
my_shm

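In other words, before any data is exchanged, shm_open opens the file object my_shm and returns the descriptor shm_fd. With this descriptor, a process can map the shared memory segment and read or write it (the fd effectively serves as a handle to the segment). When the writer modifies this memory, the update is made directly in the shared pages, so every process mapping the same object sees it. Combined with semaphores, reads and writes can be synchronized over multiple rounds.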

3.2.2 shm in System V

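System V uses an IPC mechanism different from the POSIX standard. Instead of managing segments through file descriptors and path names, System V shm identifies a segment by an identifier (shmid). The segment is kept in memory and has no entry in the file system, which suits fast communication between processes. It is often described as more efficient than the file-based POSIX interface, although once a segment is attached both flavors are accessed the same way, through ordinary memory loads and stores.

System V shm mainly involves four system calls: shmget, shmat, shmdt and shmctl. Their meanings are:

shmget gets a new shared memory segment (or an existing one) and returns its unique identifier (shmid) for later operations.
shmat attaches the segment to the process's address space.
shmdt detaches the segment from the process's address space.
shmctl controls and manipulates the segment; it is most often used to delete it.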

3.2.2.1 shm Get

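System V shared memory is managed through identifiers (shmid). shmget returns the identifier for a key, which is usually generated with ftok. To create a new segment or get a reference to an existing one, we use the shmget system call, whose prototype is: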

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int shmget(key_t key, size_t size, int shmflg);
/* 
Parameters:
	1. key: A unique identifier for the shared memory segment. This can either be the result of an ftok() call or the constant IPC_PRIVATE.
	2. size: Indicates the size of the shared memory segment in bytes.
	3. shmflg: Access permissions (UNIX standards, e.g., 0600). Optional flags include:
	   - IPC_CREAT: Create a new segment if it does not exist.
	   - IPC_EXCL: Fail if the segment already exists.

Return value: Returns the shared memory segment identifier (shmid), a non-negative integer, on success; -1 on failure and sets errno appropriately.
*/

To obtain a key that identifies a shared memory segment, we use the ftok function, whose prototype is:

#include <sys/types.h>
#include <sys/ipc.h>

key_t ftok(const char *pathname, int proj_id);
/* 
Parameters:
	1. pathname: A pointer to the path of an existing and accessible file.
	2. proj_id: A project identifier. This is usually a single character.

Return value: Returns a key of type key_t, which can be used to identify a shared memory segment, message queue, or semaphore; (key_t)-1 on failure and sets errno appropriately.
*/

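ftok requires the file to exist. If there is no suitable file, we can create an empty one (for example with the open system call and O_CREAT) just to generate the key.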

3.2.2.2 shm Attachment

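Once the segment exists, we can attach it to our process's address space. After that, the process can operate on the memory through an ordinary pointer and thereby communicate with other processes. The prototype is: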

void* shmat(int shmid, const void* shmaddr, int shmflg);
/* 
Parameters:
	1. shmid: ID of the shared memory segment.
	2. shmaddr: Address at which to attach the shared memory segment (normally NULL, which lets the system choose the address).
	3. shmflg: Flags for the operation (e.g., SHM_RDONLY to attach in read-only mode).

Return value: Pointer to the attached segment on success; (void *)-1 (not NULL) on failure, with errno set appropriately.
*/

3.2.2.3 shm Detachment

When we are done with the segment, we call shmdt to detach it from the process's address space. This system call is very simple and takes only one argument:

int shmdt(const void* shmaddr);
/* 
Parameters:
	1. shmaddr: The address returned by the attach call.

Return value: 0 for success and -1 for error.
*/

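Simple as it is, don't overlook it. The operating system detaches all segments automatically when a process terminates, but a long-running process that never calls shmdt keeps the segment attached: its attach count never drops to zero, so the segment stays in memory even after it has been marked for deletion.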

3.2.2.4 shm Control

We use shmctl to delete a shared memory segment. The function can do much more than deletion, but here we only look at removal. Its prototype is:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);
/* 
Parameters:
	1. shmid: Shared memory segment ID.
	2. cmd: Command to perform on the shared memory segment. For deletion, use IPC_RMID(ReMove ID).
	3. buf: Pointer to a struct shmid_ds, used for control commands that require or return data, not needed for IPC_RMID(in this case, just set to NULL).

Return value: Returns 0 on success, -1 on failure and sets errno appropriately.
*/

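System V shm manages its lifetime with a reference count (the attach count, shm_nattch). Attaching with shmat increments it, and detaching with shmdt decrements it. If shmctl with IPC_RMID is called while the count is nonzero, the segment is not destroyed immediately; the kernel marks it as "to be destroyed" and frees it once the count reaches zero.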

3.3 Message Queue

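Besides pipes and shared memory, message queues are another common IPC mechanism. A message queue resembles a pipe: both are kernel objects and (by default) process messages in first-in, first-out order. They differ in what they carry, however. A pipe transfers an unstructured byte stream, whereas a message queue transfers discrete messages, each consisting of a message type and a message body.

Moreover, a message queue can filter by message type and handle priorities, which a pipe cannot. A receiver can selectively read messages of a particular type, which gives finer-grained control, whereas a pipe can only be read in the order the data arrived, with no way to skip ahead or give priority to specific data.

Because messages carry type information and the queue stores them as an intermediary, message queues decouple the communicating processes: each can run independently instead of frequently waiting on, or synchronizing directly with, the other. This is why message queues are common in the producer-consumer model: after sending data to the queue, a producer can immediately go on producing new data without waiting for the consumer to process it (until the queue fills up, at which point sends block by default).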

3.3.1 Message Queue in POSIX

#include <mqueue.h>

mqd_t mq_open(const char *name, int oflag, ...);
/* 
Parameters:
	1. name: Name of the message queue.
	2. oflag: Flags for the operation (e.g., O_CREAT to create the queue if it doesn't exist).
	3. ...: Optional mode and attributes (used when creating the queue).

Return value: Message queue descriptor (mqd_t) on success, (mqd_t)-1 on failure and sets errno appropriately.
*/

int mq_send(mqd_t mqdes, const char *msg_ptr, size_t msg_len, unsigned int msg_prio);
/* 
Parameters:
	1. mqdes: Message queue descriptor.
	2. msg_ptr: Pointer to the message to be sent.
	3. msg_len: Size of the message in bytes.
	4. msg_prio: Priority of the message.

Return value: 0 on success, -1 on failure and sets errno appropriately.
*/

ssize_t mq_receive(mqd_t mqdes, char *msg_ptr, size_t msg_len, unsigned int *msg_prio);
/* 
Parameters:
	1. mqdes: Message queue descriptor.
	2. msg_ptr: Pointer to the buffer where the received message will be stored.
	3. msg_len: Size of the receive buffer in bytes; must be at least the queue's mq_msgsize attribute, otherwise the call fails with EMSGSIZE.
	4. msg_prio: Pointer to store the message priority (optional).

Return value: Number of bytes received on success, -1 on failure and sets errno appropriately.
*/

int mq_close(mqd_t mqdes);
/* 
Parameters:
	1. mqdes: Message queue descriptor.

Return value: 0 on success, -1 on failure and sets errno appropriately.
*/

int mq_unlink(const char *name);
/* 
Parameters:
	1. name: Name of the message queue.

Return value: 0 on success, -1 on failure and sets errno appropriately.
*/

3.3.2 Message Queue in System V

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

int msgget(key_t key, int msgflg);
/* 
Parameters:
	1. key: Unique key to identify the message queue.
	2. msgflg: Flags for the operation (e.g., IPC_CREAT to create the queue if it doesn't exist).

Return value: Message queue identifier (msgid) on success, -1 on failure and sets errno appropriately.
*/

int msgsnd(int msqid, const void *msgp, size_t msgsz, int msgflg);
/* 
Parameters:
	1. msqid: Message queue identifier.
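	2. msgp: Pointer to the message to be sent: a caller-defined struct whose first member is a long mtype (> 0), followed by the message body.
	3. msgsz: Size of the message body in bytes (not counting the mtype field).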
	4. msgflg: Message flags to alter default behavior (e.g., IPC_NOWAIT).

Return value: 0 on success, -1 on failure and sets errno appropriately.
*/

ssize_t msgrcv(int msqid, void *msgp, size_t msgsz, long msgtyp, int msgflg);
/* 
Parameters:
	1. msqid: Message queue identifier.
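	2. msgp: Pointer to a buffer with the same layout (a long mtype followed by the body) where the received message will be stored.
	3. msgsz: Maximum size of the message body in bytes.
	4. msgtyp: Which message to receive: 0 takes the first message in the queue, a positive value takes the first message of exactly that type, and a negative value takes the first message with the lowest type that is less than or equal to its absolute value.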
	5. msgflg: Message flags to alter default behavior (e.g., IPC_NOWAIT).

Return value: Number of bytes copied into the message body on success, -1 on failure and sets errno appropriately.
*/

int msgctl(int msqid, int cmd, struct msqid_ds *buf);
/* 
Parameters:
	1. msqid: Message queue identifier.
	2. cmd: Command to perform on the message queue (e.g., IPC_RMID to remove the queue).
	3. buf: Pointer to a struct msqid_ds, used for control commands that require or return data.

Return value: 0 on success, -1 on failure and sets errno appropriately.
*/

3.4 IPC Comparisons

3.4.1 Shared Memory in Different Standards

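Comparing the two standards: POSIX shared memory uses shm_open to create a shared memory file in the file system and then manages the segment through that file descriptor. System V shared memory, by contrast, is managed through an identifier (shmid) and does not rely on the file system for lookup.

What they have in common is that both typically derive the identity of a segment from file-system information, though in different ways: POSIX directly (the object is a name under /dev/shm), System V indirectly (the key is usually derived from a path via ftok). And in both cases the actual data is stored in RAM and mapped into the user address space of every attached process.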

3.4.2 Shared Memory vs. Pipes

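Compared with POSIX shm, pipes have a more specific use case. A pipe is first-in, first-out and its buffer is maintained by the kernel, which blocks readers on an empty pipe and writers on a full one, so pipes need no explicit synchronization. Their capacity is limited, though: on Linux the default pipe capacity is 64 KiB, and writes of up to PIPE_BUF (4096 bytes) are atomic. Shared memory, by contrast, needs synchronization mechanisms such as semaphores or mutexes to prevent data races.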

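Another difference is that a pipe is a streaming tool: it carries a byte stream with no message boundaries and no notion of types, so structured data such as a struct must be written as raw bytes (or serialized) and reassembled by the reader. Shared memory offers more flexibility and higher performance, since complex data structures such as structs can be stored and accessed in place.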

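We also need to consider the cost of switching between user mode and kernel mode. Going through the kernel keeps the transfer safe, but every read and write on a pipe is a system call that copies the data into and out of a kernel buffer, so pipes are slower than shared memory, where processes access the same pages directly once the mapping is set up.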

3.4.3 Pipes vs. Message Queue

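Pipes and message queues are quite similar: both deliver data in FIFO order. Message queues are more flexible, though, because the kernel stores the messages as separate entries in a linked list. This enables features that pipes lack: each message carries a type identifier that can express its category or priority, so a receiver can pick messages by type and priority (a convenience of the linked-list structure).

Synchronization with pipes is simple: the producer writes, the consumer reads, and the kernel manages blocking automatically, but there is no explicit control beyond that. Message queues offer more control through msgsnd and msgrcv: sends and receives block by default, can be made non-blocking with IPC_NOWAIT, and a receiver can choose which message type to wait for.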

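Lesson 4 Memory Mapped Files

We usually don't think of memory-mapped files (the mmap mechanism) as an IPC method, but they can indeed pass information between processes. Let's look at how.

As the name suggests, memory-mapped files are all about files. Their main purpose is to map a file into a process's virtual address space, so the process can access the file's contents directly through memory addresses, without read/write. Accessing the memory is equivalent to accessing the file (the operating system handles page faults and write-back automatically). First, let's review how a file is opened.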

4.1 How Files are Opened

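See 13. File Systems for details; this is only an overview.

The operating system maintains an open file table (OFT) whose entries are called open file descriptions (OFDs). When you open a file with open("example.txt", O_RDONLY), the file's contents are not read; only its table entry is set up in memory. That entry holds the file's metadata and control information, which tells the system how to locate the file's contents on disk.

When a process opens a file, the system first searches the open file table for a matching entry. If one is found, the entry does not have to be loaded from disk again. Even though the contents may not be in memory yet, the file is said to be open at this point. (Normally the contents are loaded from disk only when they are actually needed; this is the "lazy approach".)

Entries for read-only files raise no concurrency issues between programs. As soon as a process writes to the file, however, access to it must be synchronized to guarantee mutual exclusion.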

4.2 Memory Mapping Files Implementation:`mmap()`

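Once the file is open, we can use the mmap() system call to map the file into the process's virtual address space. Because I/O is expensive, the file contents are still not loaded at this point; they are loaded on the first access to the mapped addresses (a page fault triggers the loading).

For example, after open() has provided the file's metadata, mmap can map the first 4 KB of the file (bytes 0 to 4095) into the process's virtual address space. At this point only a mapping from virtual addresses to the file exists. When the program first reads or writes this range, a page fault occurs: the system allocates a 4 KB physical page, loads the file contents from disk into it, and maps the virtual addresses to that physical page.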

4.2.1 `mmap()`

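So a memory mapping maps a file into the process's address space, establishing a one-to-one correspondence between a region of the file on disk and a range of virtual addresses in the process. These mappings live in the memory-mapping area of the virtual address space (used for both file-backed and anonymous mappings), which lies between the heap and the stack.

The prototype of mmap is: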

#include <sys/mman.h>

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
/* 
Parameters:
	1. addr: Starting address for the new mapping. Typically set to NULL to let the kernel choose the address.
	2. length: Length of the mapping in bytes. Must be greater than 0; the kernel rounds it up to a whole number of pages.
	3. prot: Desired memory protection of the mapping. This can be a combination of the following:
	   - PROT_READ: Pages can be read.
	   - PROT_WRITE: Pages can be written.
	   - PROT_EXEC: Pages can be executed.
	   - PROT_NONE: Pages cannot be accessed.
	4. flags: Flags that determine the nature of the mapping. Common flags include:
       - MAP_SHARED: Write updates to the mapping are visible to other processes mapping the same region, and also reflected in the underlying file.
       - MAP_PRIVATE: Changes to the mapping are private to the process and not visible to other processes. Changes are not reflected in the underlying file (copy-on-write).
       - MAP_ANONYMOUS: Mapping is not backed by any file; the fd parameter is ignored (should be -1).
       - MAP_FIXED: Forces the mapping to use exactly the address specified in addr. Be cautious as it may overwrite existing mappings.
       - MAP_FIXED_NOREPLACE: Similar to MAP_FIXED, but fails with EEXIST instead of replacing an existing mapping at the specified address.
       - MAP_POPULATE: Populates page tables for the mapping immediately instead of waiting for lazy access.
       - MAP_NORESERVE: Prevents reserving swap space for the mapping. If physical memory runs out, the process may be terminated.
       - MAP_LOCKED: Locks the mapping in memory, preventing it from being swapped out.
       - MAP_HUGETLB: Uses huge pages for the mapping to reduce TLB (Translation Lookaside Buffer) overhead.
       - MAP_UNINITIALIZED: Allocates uninitialized memory. Unsafe and supported only on specific architectures.
	5. fd: File descriptor of the file to be mapped. Ignored if MAP_ANONYMOUS is set.
	6. offset: Offset in the file where the mapping starts. Must be a multiple of the page size; typically 0.

Return value: Returns a pointer to the mapped area on success, or MAP_FAILED on failure. The errno variable is set to indicate the error.
*/

4.2.2 Protection Access Modification: `mprotect()`

To change the protection of a mapped region, use the mprotect system call. Its prototype is:

#include <sys/mman.h>

int mprotect(void* address, size_t length, int prot);
/* 
Parameters:
	1. address: Starting address of the memory region to be protected. Must be page-aligned.
	2. length: Length of the memory region in bytes.
	3. prot: Desired protection of the memory region. It can be a combination of the following:
	   - PROT_READ: Pages can be read.
	   - PROT_WRITE: Pages can be written.
	   - PROT_EXEC: Pages can be executed.
	   - PROT_NONE: Pages cannot be accessed.
   
Return value: Returns 0 on success, -1 on failure and sets errno to indicate the error.
*/

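4.2.3 Mapping Consistency Guarantee : `msync()`

msync synchronizes a memory-mapped region with its underlying storage. With msync we can make sure that changes made in memory are written back to the mapped file, keeping the data consistent. It is typically called after writing to a mapped region to ensure the data has been persisted.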

#include <sys/mman.h>

int msync(void* address, size_t length, int flags);
/* 
Parameters:
	1. address: Starting address of the memory region to be synchronized.
	2. length: Length of the memory region to be synchronized.
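	3. flags: Flags that determine the synchronization behavior. Exactly one of MS_SYNC and MS_ASYNC must be given, optionally combined with MS_INVALIDATE:
	   - MS_SYNC: Request a write-back and wait for it to complete (blocking).
	   - MS_ASYNC: Schedule a write-back and return immediately.
	   - MS_INVALIDATE: Invalidate other mappings of the same file so that they see the freshly written data.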

Return value: Returns 0 on success, -1 on failure and sets errno to indicate the error.
*/

4.2.4 Memory Un-Mapping : `munmap()`

munmap removes a mapping, releasing a memory region previously mapped with mmap back to the operating system. Its prototype is:

int munmap(void *addr, size_t length);
/* 
Parameters:
	1. addr: Starting address of the memory region to be unmapped. This should be the address returned by a previous call to mmap.
	2. length: Length of the memory region to be unmapped. Usually the same length that was passed to mmap, although unmapping only part of a mapping is also allowed.

Return value: Returns 0 on success, -1 on failure and sets errno to indicate the error.
*/

4.3 IPC with `mmap()`

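Now that we know what memory-mapped files are, you can probably guess that if two processes map the same file into their virtual address spaces and add basic synchronization, they can communicate. That certainly works, but it's dull; we'd rather not deal with files at all.

If you read carefully, you may have noticed some of mmap()'s flags: MAP_PRIVATE, MAP_SHARED and MAP_ANONYMOUS. What do they mean? Let's go through them one by one.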

4.3.1 `MAP_PRIVATE` and `MAP_SHARED`

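MAP_PRIVATE creates a mapping that is private to the process: writes trigger copy-on-write, so they are visible neither to other processes nor in the underlying file.

MAP_SHARED lets multiple processes share the same physical pages: writes are visible to every process mapping the same region and, for a file-backed mapping, are carried through to the file.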

4.3.2 Anonymous Mapping

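With MAP_ANONYMOUS, the virtual memory is not associated with any file. On first access, the system allocates zero-filled physical memory and maps the virtual addresses to it.

You may know that when malloc requests heap memory, requests below a threshold (128 KB by default in glibc) are served by growing the heap with the brk()/sbrk() system call, while larger requests are served by mmap(). You might wonder why heap allocation has anything to do with memory-mapped files when no file is involved. Now you can see why: malloc uses anonymous mappings.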

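In fact, when a program starts, it is the kernel's program loader (during execve), not malloc, that sets up its segments: code and initialized data are private mappings of the executable file, while zero-initialized regions such as .bss, which exist for the whole lifetime of the program, are roughly equivalent to:

void *bss = mmap(bss_start, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
// bss_start is an illustrative name for the fixed address chosen by the loader; size is rounded up to whole pages.

Likewise, when you malloc a large block, the underlying mmap call is roughly equivalent to:

void *block = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
// size is rounded up to whole pages (multiples of 4 KB on most systems).

Combined with MAP_SHARED, an anonymous mapping gives us IPC similar to shm shared memory. Since there is no name or file to open, however, such a mapping can only be shared between a parent and its children (or other related processes) that inherit it through fork(). Keep in mind that mappings are made in units of the system page size (usually 4 KB, sometimes 16 KB or more).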

4.4 Easy Peasy Example

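Below, the parent writes "Hello, kid" into the "shared memory" and the child reads it. We don't set up a real synchronization mechanism; the child simply sleeps for one second first. Note that fork() duplicates the parent's resources, including this mapping, so each process holds its own reference: the parent unmaps the region explicitly, while the child's mapping is released automatically when it exits.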

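#define _DEFAULT_SOURCE   // exposes MAP_ANONYMOUS under strict -std=c99/c11
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include <sys/wait.h>

#define MY_SHM_SIZE 4096

int main(void){
    // 1. create an anonymous shared mapping (not backed by any file)
    char *my_shm = mmap(NULL, MY_SHM_SIZE,
                    PROT_READ | PROT_WRITE,
                    MAP_ANONYMOUS | MAP_SHARED,
                    -1, 0);

    if(my_shm == MAP_FAILED){
        perror("mmap failed");
        exit(EXIT_FAILURE);
    }

    // 2. fork: the child inherits the same shared mapping
    pid_t pid = fork();
    if(pid == -1){
        perror("fork failed");
        munmap(my_shm, MY_SHM_SIZE);
        exit(EXIT_FAILURE);
    }

    if(pid == 0){
        // child: crude "synchronization", give the parent time to write
        sleep(1);
        printf("Child received: %s\n", my_shm);
        exit(EXIT_SUCCESS);   // the child's mapping is released on exit
    }

    // 3. parent: write after fork, so the child can only see the string
    //    through the shared mapping
    strcpy(my_shm, "Hello, kid");
    wait(NULL);

    // 4. unmapping
    if(munmap(my_shm, MY_SHM_SIZE) == -1){
        perror("munmap failed");
    }
    return 0;
}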

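Lesson 5 IPC for Process Control: Signals

At the end of the previous part, we saw how to use signals to deal with zombie processes: by registering a handler for SIGCHLD, we catch the signal sent when a child terminates and reap the child, so no zombie is left behind. Signals are a lightweight form of IPC, and in this lesson we look at them in detail.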

5.1 Signaling vs. Exception

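In the Interruption part, we learned how the kernel handles exceptions. When an exception occurs during execution, the kernel takes over and runs the corresponding exception handler. Because these handlers live in the kernel, the user program is completely unaware that the exception happened; a page fault is a typical example.

Signals emulate hardware interrupts at the software level. They give the kernel a way to tell a user process that some event has occurred, so the program can respond in its own way. Recall that when we used signals to handle zombies, the handler we defined lived in user space, whereas interrupt handlers run in kernel space. Signals are also triggered and handled differently from interrupts (for example, signal handlers may be nested).

Linux numbers its signals from 1 to 64. Signals 1 to 31 are the standard (non-real-time) signals, and SIGRTMIN to SIGRTMAX are real-time signals intended for real-time applications. The kernel's real-time range starts at 32, but glibc reserves 32 and 33 for internal use by its threading implementation, so kill -l reports SIGRTMIN as 34 and omits those two numbers: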

$ kill -l
 1) SIGHUP	 2) SIGINT	 3) SIGQUIT	 4) SIGILL	 5) SIGTRAP
 6) SIGABRT	 7) SIGBUS	 8) SIGFPE	 9) SIGKILL	10) SIGUSR1
11) SIGSEGV	12) SIGUSR2	13) SIGPIPE	14) SIGALRM	15) SIGTERM
16) SIGSTKFLT	17) SIGCHLD	18) SIGCONT	19) SIGSTOP	20) SIGTSTP
21) SIGTTIN	22) SIGTTOU	23) SIGURG	24) SIGXCPU	25) SIGXFSZ
26) SIGVTALRM	27) SIGPROF	28) SIGWINCH	29) SIGIO	30) SIGPWR
31) SIGSYS	34) SIGRTMIN	35) SIGRTMIN+1	36) SIGRTMIN+2	37) SIGRTMIN+3
38) SIGRTMIN+4	39) SIGRTMIN+5	40) SIGRTMIN+6	41) SIGRTMIN+7	42) SIGRTMIN+8
43) SIGRTMIN+9	44) SIGRTMIN+10	45) SIGRTMIN+11	46) SIGRTMIN+12	47) SIGRTMIN+13
48) SIGRTMIN+14	49) SIGRTMIN+15	50) SIGRTMAX-14	51) SIGRTMAX-13	52) SIGRTMAX-12
53) SIGRTMAX-11	54) SIGRTMAX-10	55) SIGRTMAX-9	56) SIGRTMAX-8	57) SIGRTMAX-7
58) SIGRTMAX-6	59) SIGRTMAX-5	60) SIGRTMAX-4	61) SIGRTMAX-3	62) SIGRTMAX-2
63) SIGRTMAX-1	64) SIGRTMAX

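At the end of the previous part, we also briefly looked at the default actions of signals. For most signals, however, a process can choose to ignore the signal or install its own handler instead.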

5.2 Signal Control

5.2.1 Kernel Data Structure (`include/linux/sched/signal.h`)

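Having studied processes, we know that each process corresponds to a task_struct, which contains several signal-related data structures. Signal management involves the signal_struct and sighand_struct structures, the signal set type sigset_t, and the sigpending structure, which records currently pending signals.

struct task_struct{
// ...
	// Pointer to a signal_struct, which holds the signal state shared by the whole process (thread group).
	struct signal_struct *signal; 
	// Pointer to a sighand_struct, which holds the signal handlers (dispositions).
	struct sighand_struct *sighand; 
	// The signals currently blocked by this thread; sigset_t is a bitmask (64 bits on common platforms).
	sigset_t blocked;
	// The signals pending for this particular thread.
	struct sigpending pending;
// ...
};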


5.2.1.1 `struct signal_struct`

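In task_struct we saw the per-thread pending set, pending. signal_struct additionally holds shared_pending, the pending signals shared by the whole thread group, i.e. all threads of the process. A signal sent to the process as a whole (for example with kill()) is queued on shared_pending, whereas a signal directed at a specific thread (for example with tgkill()) is queued on that thread's own pending set.

A simplified view of signal_struct (the real kernel definition has many more fields) is: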

struct signal_struct {
    atomic_t sigcnt;
    atomic_t live;
    int nr_threads;
    struct list_head thread_head;
    struct sigpending shared_pending;
    struct sigpending group_exit_pending;
    int group_exit_code;
    unsigned int flags;
    struct rcu_head rcu;
};
/* 
Parameters:
	1. sigcnt: Reference count of this signal_struct.
	2. live: Number of live threads in the thread group.
	3. nr_threads: Number of threads in the signal group.
	4. thread_head: List head for the threads in the signal group.
	5. shared_pending: Shared pending signals for the signal group.
	6. group_exit_pending: Pending signals for group exit.
	7. group_exit_code: Exit code for the group exit.
	8. flags: Flags for the signal group.
	9. rcu: RCU head for the signal group.
*/

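5.2.1.2 `sigset_t`

sigset_t is a bitmask that represents a set of signals, one bit per signal. In the Linux kernel it is a small array of unsigned long words, which on 64-bit platforms is a single 64-bit word, enough for all 64 signal numbers. When a signal becomes pending, its bit in the pending set is set to 1. (The user-space sigset_t in glibc is larger, 1024 bits, so portable code manipulates it only through functions such as sigemptyset() and sigaddset().)

The lightweight nature of signals shows in exactly this: signal sets are managed and manipulated with simple bit operations.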

5.2.1.3 `sighand_struct`

sighand_struct holds the signal handlers. Among its fields, the one we care about is struct k_sigaction action[_NSIG]; (_NSIG is 64 on Linux). As the array size suggests, there is one entry per Linux signal, and each entry records how the corresponding signal is handled (its action).

The definition of k_sigaction is:

struct k_sigaction {
    struct sigaction sa;
#ifdef __ARCH_HAS_KA_RESTORER
    __sigrestore_t ka_restorer;
#endif
};
/* 
Parameters:
	1. sa: The kernel's copy of the disposition installed by sigaction(): handler, sa_mask and sa_flags.
	2. ka_restorer: Present only on architectures that define __ARCH_HAS_KA_RESTORER; address of the restorer (trampoline) used to return from the handler.
*/

The embedded struct sigaction describes a signal handler: the handler function, the signal mask and control flags. Here is its (user-space) definition:

struct sigaction {
    void (*sa_handler)(int);
    void (*sa_sigaction)(int, siginfo_t *, void *);
    sigset_t sa_mask;
    int sa_flags;
    void (*sa_restorer)(void);
};
/* 
Parameters:
	1. sa_handler: Signal handling function pointer, which is called when the signal is received.
	2. sa_sigaction: Alternative signal handling function pointer, used when the SA_SIGINFO flag is set (it may share storage with sa_handler, so set only one of them).
	3. sa_mask: Signal mask, used to block certain signals during the execution of the signal handler.
	4. sa_flags: Signal handling flags, used to control the behavior of signal handling.
	5. sa_restorer: Restorer function pointer, used to restore the execution environment after the signal handler returns.
*/

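sa_handler points to the signal handling function, which is called when the signal is received. It can also be set to the predefined constants SIG_DFL (default handling) or SIG_IGN (ignore the signal).

5.2.2 Signal Blocking and Signal Pending

Signals can be blocked. A blocked signal is not delivered to the process's handler immediately; it is kept until it is unblocked and only then handled. Blocking lets a process keep certain signals from interfering, so that it is not interrupted during critical operations. A signal generated while it is blocked is said to be pending. Each thread has its own pending signal set, in which each signal is either 0 or 1 (1 meaning pending and awaiting handling); therefore, no matter how many times a standard signal is generated while pending, it is handled only once (real-time signals, by contrast, are queued). The set of signals pending for a thread is the union of the signals pending for that thread and those pending for the process as a whole. A child created by fork(2) starts with an empty pending signal set.

A process changes its signal mask with sigprocmask(2). Its behavior in a multithreaded process is unspecified, so multithreaded programs use pthread_sigmask(3), which changes only the calling thread's mask. A child created by fork inherits its parent's signal mask, and the mask is preserved across execve(2).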

5.3 Signal Handling

If we do nothing, each signal triggers its default action, which is one of the following five:

Default Action	Explanation
Term	to terminate the process
Ign	to ignore the signal
Core	to terminate the process and core dump
Stop	to stop the process
Cont	to continue the process if it is currently stopped.

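A process can change how a signal is handled with signal(2) or sigaction(2). When a signal is delivered, for most signals the process can choose to handle it the default way, ignore it, or handle it with a custom function (SIGKILL and SIGSTOP can be neither caught nor ignored). Dispositions are per-process, which means all threads of a process handle the same signal in the same way. After fork(), the child inherits the parent's dispositions; during execve(2), signals with a custom handler are reset to their default action, while ignored signals stay ignored.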

5.4 Signal Sending and Receiving

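There are many types of signals, each representing a different event. A signal can be sent by the kernel, by another process, or by the process itself to notify it that something has happened. When one process signals another, the kernel delivers the signal to the target, acting as an intermediary.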

5.4.1 Sending a Signal

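Send a signal, raise a flag. A signal is very small and generally carries no data, which is why signals are called a lightweight IPC mechanism. When a signal is sent, the kernel adds it to the target's pending signal set and delivers it at an appropriate moment. The signal may be blocked, in which case it is not handled for the time being. If it is not blocked, the target handles it at the next suitable point with the handler installed in advance.

Depending on the system call used, a signal is sent to a process group (killpg()), a process (kill(), sigqueue(), pidfd_send_signal()) or a thread (raise(), pthread_kill(), tgkill()). A signal sent to a thread is handled by that thread once it does not block the signal. For a signal sent to a process, the kernel picks an arbitrary thread that is not blocking that signal to run the handler.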

5.4.2 Waiting for a Signal to be Caught

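Two system calls, pause() and sigsuspend(), suspend a thread until a signal is caught.

pause() suspends execution until any signal is caught (i.e. its handler has run); the thread then continues.

sigsuspend() also suspends the thread, but it first temporarily replaces the signal mask and resumes only when a signal not blocked by that temporary mask is caught; the original mask is restored when it returns. This makes sigsuspend() more flexible, because you control which signals are blocked and which can be handled while waiting, and the mask swap and the wait happen atomically.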

5.4.3 Receiving a Signal

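5.4.3.1 Execution of Signal Handler

When a thread switches from kernel mode back to user mode (for example on return from a system call, or when it is scheduled to run), the kernel checks for pending signals that are not blocked. If the process has registered a handler for such a signal, the following happens on that transition:

The signal's bit in the pending set is cleared (the signal is removed from the set).
If the handler was installed with sigaction and the SA_ONSTACK flag, and an alternate signal stack has been set up with sigaltstack(2), the kernel switches to that stack (by default, the handler's frame is built on the thread's normal stack).
As with interrupts, context information is saved into a frame on the stack before the handler runs, so that execution can resume after the signal has been handled.
The kernel then builds a stack frame for the handler and sets the PC to the first instruction of the handler function.
The kernel returns control to user space and the handler runs. When it returns, control passes to the signal trampoline.
The signal trampoline calls sigreturn(2), which restores the saved context so the thread resumes where it was interrupted.

From the kernel's point of view, executing handler code is exactly like executing any other user-space code. The kernel records no special state indicating that the thread is currently inside a signal handler; all the necessary state is kept in user-space registers and on the user-space stack. The depth of nested handler invocations is therefore limited only by the user-space stack (and by sensible software design).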

5.5 Signal Handling in Practice

#include <signal.h>

void (*signal(int sig, void (*handler)(int)))(int);
/* 
Parameters:
	1. sig: The signal number to be handled, e.g. SIGINT, SIGTERM or SIGUSR1. SIGKILL and SIGSTOP cannot be caught or ignored; attempting to do so fails with EINVAL.
	2. handler: A pointer to the signal handling function. This function takes a single argument of type int (the signal number) and returns void.
		- SIG_IGN: Ignore the signal.
		- SIG_DFL: Treat signal as its default manner.
		- Your own handler.

Return value:
	- On success: Returns the previous signal handler.
	- On failure: Returns SIG_ERR and sets errno appropriately.
*/

The Null Signal

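The null signal (signal number 0) is a special case in Unix-like systems. Real signal numbers start at 1, so 0 corresponds to no actual signal: it has no bit in sigset_t and no default action. Calling kill() with signal 0 performs the usual error checks without sending anything, so its only use is testing whether a process exists and whether we have permission to signal it.

// requires <signal.h> and <errno.h>
if (kill(pid, 0) == 0) {
    // the process exists and we have permission to send it a signal
} else if (errno == EPERM) {
    // the process exists, but we lack permission to signal it
} else {
    // errno == ESRCH: no such process
}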