CS162_HW2: Shell

2024-04-08 18:47:40

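Assignment requirements:

Implement the cd and pwd built-in commands
Program Execution
Path resolution
Input/output redirection

Optional:

Pipes
Signal handling and terminal control
Foreground/background switching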

Get Started

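The assignment provides skeleton code for the shell: a tokenizer (tokenizer.c), the shell's initialization code, and a minimal exit built-in command.

The tokenizer.c interface: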

/* A struct that represents a list of words. */
struct tokens;

/* Turn a string into a list of words. */
struct tokens* tokenize(const char* line);

/* How many words are there? */
size_t tokens_get_length(struct tokens* tokens);

/* Get me the Nth word (zero-indexed) */
char* tokens_get_token(struct tokens* tokens, size_t n);

/* Free the memory */
void tokens_destroy(struct tokens* tokens);

The code for this part will be provided later.

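cd, pwd

Implement the shell built-ins cd and pwd.
To implement them, we first need to be clear about how the two commands behave.

Use man to get the details: man cd, man pwd

cd has two forms: with an argument and without one.
With an argument, cd changes to the given path; only one argument is allowed.
Without an argument, cd changes to the path stored in the HOME environment variable; if that variable is not defined, it does nothing.

pwd simply prints the current working directory.

The skeleton code already contains the help and exit built-ins as examples. Following the same pattern, we add the corresponding entries to cmd_table and then implement the cmd_cd and cmd_pwd functions.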

/* Built-in command struct and lookup table */
typedef struct fun_desc {
	cmd_fun_t *fun;
	char *cmd;
	char *doc;
} fun_desc_t;

fun_desc_t cmd_table[] = {
	{ cmd_help, "?", "show this help menu" },
	{ cmd_exit, "exit", "exit the command shell" },
	{ cmd_cd, "cd", "change the working directory" },
	{ cmd_pwd, "pwd", "print name of current/working directory" }
};
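
The shell matches the first word of each command line against this table. Below is a minimal sketch of such a lookup. To keep the block standalone it uses stub commands and hypothetical names (demo_table, demo_lookup); the real shell works on cmd_table and passes a struct tokens pointer to each built-in.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: the real built-ins take a struct tokens pointer */
typedef int demo_fun_t(void);

static int demo_cd(void)  { return 0; }
static int demo_pwd(void) { return 0; }

typedef struct demo_desc {
	demo_fun_t *fun;
	const char *cmd;
	const char *doc;
} demo_desc_t;

static demo_desc_t demo_table[] = {
	{ demo_cd, "cd", "change the working directory" },
	{ demo_pwd, "pwd", "print name of current/working directory" },
};

/* Return the index of cmd in demo_table, or -1 if it is not a built-in */
int demo_lookup(const char *cmd)
{
	size_t n = sizeof(demo_table) / sizeof(demo_table[0]);

	if (cmd == NULL)
		return -1;
	for (size_t i = 0; i < n; i++) {
		if (strcmp(demo_table[i].cmd, cmd) == 0)
			return (int)i;
	}
	return -1;
}
```

When the lookup returns -1, the word is not a built-in and the shell can treat it as a program to run.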

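When cd is called without a path argument, it needs the value of the HOME environment variable.
The library function getenv does this. It is declared in stdlib.h.

The library function chdir() changes the working directory of the current process. It is declared in unistd.h.

cmd_cd is as follows: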

/* change working directory */
int cmd_cd(unused struct tokens *tokens)
{
	char *dst = NULL;
	int res = -1;

	switch (tokens_get_length(tokens)) {
	case 1:	/* no directory operand is given, if HOME is given, cd $HOME */
		dst = getenv("HOME");
		break;
	case 2:
		dst = tokens_get_token(tokens, 1);
		break;
	default:
		shell_msg("too many arguments\n");
	}
	if (dst == NULL)
		return -1;
	res = chdir(dst);
	if (res == -1)
		shell_msg("No such file or directory\n");
	return res;
}

shell_msg() here is a function-like macro, defined as

#define shell_msg(FORMAT, ...) \
do {\
	if (shell_is_interactive) { \
		fprintf(stdout, FORMAT, ##__VA_ARGS__);\
	} \
} while(0)

Its purpose is to print warnings and error messages only when the shell was started interactively.

pwd is even simpler and can be implemented directly with the library function getcwd.

/* get current full path */
int cmd_pwd(unused struct tokens *tokens)
{
	char *path = getcwd(NULL, 0);
	if (path == NULL) {
		shell_msg("%s\n", strerror(errno));
		return -1;
	}
	printf("%s\n", path);
	free(path);
	return 0;
}

strerror returns the error message string that corresponds to errno.
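
To try chdir and getcwd together outside the shell, here is a small sketch. The helper names change_dir and current_dir are hypothetical. They follow the same logic as the built-ins (fall back to HOME when no operand is given) but report errors only through their return values.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: change directory to dst, or to $HOME if dst is NULL.
 * Returns 0 on success and -1 on failure (including an unset HOME). */
int change_dir(const char *dst)
{
	if (dst == NULL)
		dst = getenv("HOME");	/* getenv is declared in stdlib.h */
	if (dst == NULL)
		return -1;
	return chdir(dst);		/* chdir is declared in unistd.h */
}

/* Hypothetical helper: return a malloc'ed string holding the current
 * directory, or NULL on error. Letting getcwd allocate the buffer via
 * (NULL, 0) is an extension supported by glibc. The caller frees it. */
char *current_dir(void)
{
	return getcwd(NULL, 0);
}
```

Calling change_dir("/") and then current_dir() should yield "/", which is a quick way to check both functions at once.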

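Program Execution

The shell runs a program as follows:

Call fork to create a copy of the current process
The child process calls one of the exec family of system calls, which replaces the child with the program to run
The shell waits for the child to finish, or lets it run in the background (if & was given)

Because we do not resolve executable paths yet, for now only a full file path can be entered.

Constraint from the assignment: execvp may not be used.

The exec family includes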

exec      execle    execv     execveat  execvpe   
execl     execlp    execve    execvp

and others. The difference between execv and execvp is that execv does not search PATH for the executable, whereas execvp does.

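The work breaks down into the following steps:

Parse the arguments
Call fork
The child calls execv
The parent waits for the child to finish

Parsing arguments
We create the following struct: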

struct ch_process {
	int tokens_len;
	int next_token;
	char **args;
};

Then we parse the arguments with the function void parse_args(struct ch_process *ch, struct tokens *tokens);
It is defined as follows:

void parse_args(struct ch_process *ch, struct tokens *tokens)
{
	char *token;
	while (ch->next_token < ch->tokens_len) {
		token = tokens_get_token(tokens, ch->next_token);
		ch->args[ch->next_token++] = token;
	}
	ch->args[ch->next_token] = NULL;
}

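fork returns twice: it returns 0 in the child and the child's process ID in the parent (or -1 in the parent if it fails).

The whole function is as follows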

/* start a child process to execute program */
int run_program(struct tokens *tokens)
{
	int tokens_len = tokens_get_length(tokens);
	if (tokens_len == 0)	/* no input */
		return 0;	/* nothing to run; do not exit the shell */

	char *args[tokens_len + 1];
	struct ch_process child = { 0 };
	child.tokens_len = tokens_len;
	child.next_token = 0;
	child.args = args;

	parse_args(&child, tokens);

	pid_t chpid = fork();
	if (chpid < 0) {	/* fork error */
		shell_msg("fork : %s\n", strerror(errno));
		return -1;
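	} else if (chpid == 0) {
		execv(args[0], args);	/* returns only on failure */
		shell_msg("execv: %s\n", strerror(errno));
		_exit(127);	/* do not fall through into the parent's code */
	}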
	if (wait(NULL) == -1) {	/* wait until child process done */
		shell_msg("wait: %s\n", strerror(errno));
		return -1;
	}

	return 0;
}
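
run_program waits for the child but discards its exit status. As a sketch of how the same fork/execv/wait sequence could also report that status, the hypothetical helper below uses waitpid and the WIFEXITED/WEXITSTATUS macros. The exit code 127 for a failed execv is an assumption borrowed from common shell convention, not something the assignment prescribes.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with execv and wait for it.
 * Returns the child's exit code, or -1 if fork/waitpid fails or the
 * child was terminated by a signal. */
int spawn_and_wait(char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;		/* fork failed, no child exists */
	if (pid == 0) {
		execv(argv[0], argv);	/* returns only on failure */
		_exit(127);		/* assumed "could not run" exit code */
	}
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, passing { "/bin/sh", "-c", "exit 3", NULL } should make the helper return 3.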

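Path Resolution

When actually using a shell, we do not have to type the full path of an executable every time.
This shell handles that as follows:
First, it tries the name as given, relative to the current directory, and checks whether it is an executable file there;
if it is not found, it searches the directories listed in the PATH environment variable.
PATH holds many directories in a single string, separated by :
For example, on my machine, echo $PATH gives: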

$ echo $PATH
/home/yingmanwumen/.cargo/bin:/home/yingmanwumen/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

So we write the function char *get_fullpath(char *name) to do this work.

First, check whether the path the user entered is already usable as given:

strcpy(path, name);
if (access(path, X_OK) == 0)
	return path;

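The access system call checks whether the calling process has a given permission on a file or directory; for example, X_OK checks whether it is executable.

Next, use getenv to get $PATH.
Then walk through each directory in it and check whether the executable exists there.

The complete code is as follows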

/* Returns a malloc'ed path that the caller must free, or NULL if not found */
char *get_fullpath(char *name)
{
	char *val = getenv("PATH");
	size_t i, j, len;
	char *path = malloc(BUFSIZ);
	if (path == NULL)
		return NULL;
	/* if name is already usable as given */
	snprintf(path, BUFSIZ, "%s", name);
	if (access(path, X_OK) == 0)
		return path;
	/* enumerate $PATH and search each directory; PATH may be unset */
	len = (val != NULL) ? strlen(val) : 0;
	i = 0;
	while (i < len) {
		j = i;
		while (j < len && val[j] != ':')
			j++;
		/* build "<dir>/<name>"; snprintf truncates instead of overflowing */
		snprintf(path, BUFSIZ, "%.*s/%s", (int)(j - i), val + i, name);
		if (access(path, X_OK) == 0)
			return path;
		i = j + 1;
	}
	free(path);
	return NULL;
}
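
For experimenting with the search on its own, it can help to pass the directory list in as a parameter instead of reading PATH inside the function. The sketch below is one way to do that. The name search_dirs and its signature are hypothetical, it writes into a caller-supplied buffer instead of calling malloc, and it skips empty entries, which is a simplification.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: search the colon-separated list `dirs` for an
 * executable called `name`. On success the full path is written to `out`
 * and 0 is returned; otherwise -1 is returned. */
int search_dirs(const char *dirs, const char *name, char *out, size_t outsz)
{
	const char *p = dirs;

	while (p != NULL && *p != '\0') {
		size_t k = strcspn(p, ":");	/* length of this directory */

		if (k > 0) {			/* empty entries are skipped here */
			int n = snprintf(out, outsz, "%.*s/%s", (int)k, p, name);
			if (n > 0 && (size_t)n < outsz && access(out, X_OK) == 0)
				return 0;
		}
		p += k;
		if (*p == ':')
			p++;			/* step over the separator */
	}
	return -1;
}
```

A call such as search_dirs(getenv("PATH"), "ls", buf, sizeof buf) then behaves like the PATH half of get_fullpath, and a fixed string like "/no/such/dir:/bin" makes the behaviour easy to check.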

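Input/Output Redirection

< redirects standard input and > redirects standard output. >> also redirects standard output, but appends to the file instead of overwriting it.

The assignment is rather vague about what happens when several <, > or >> appear.
From my own testing, I arrived at the following rules:

When there are several <, only the last one takes effect.
When there are several > or >>, likewise only the last one takes effect, but the earlier files are still opened.

For example, in ls -l >> a > b > c < d < e, the shell first opens files a, b and c, creating them if they do not exist. It then opens d, finds another redirection to e after it, closes d and opens e. So, provided d exists, the output of ls -l ends up in c, the previous contents of b are wiped, and the data in a is unaffected.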
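
The "last one wins, but every file is still opened" rule for output can be sketched in isolation. The helper name open_outputs and its parameters are hypothetical. It opens each target in order with the same flags the rule implies and keeps only the final descriptor.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open n output targets in order. Every file is
 * opened (so it is created if missing, and truncated unless append[i]
 * is nonzero), but only the last descriptor is kept.
 * Returns that descriptor, or -1 on failure. */
int open_outputs(const char *const paths[], const int append[], int n)
{
	int fd = -1;

	for (int i = 0; i < n; i++) {
		int flags = O_WRONLY | O_CREAT | (append[i] ? O_APPEND : O_TRUNC);

		if (fd != -1)
			close(fd);	/* an earlier target loses to a later one */
		fd = open(paths[i], flags, 0664);
		if (fd == -1)
			return -1;
	}
	return fd;
}
```

With targets a (append), b and c (both truncate), writing to the returned descriptor should fill c, leave a as it was, and leave b empty, which matches the ls -l example.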

We extend the earlier struct ch_process to:

struct ch_process {
	int tokens_len;
	int next_token;
	char **args;
	int in_fd;
	int out_fd;
	int out_attr;
};

Then we write the function void parse_redirection(struct ch_process *ch, struct tokens *tokens);
Note that it must only run after the arguments have been parsed.

We also change the argument-parsing function:

void parse_args(struct ch_process *ch, struct tokens *tokens)
{
	char *token;
	int finish = 0;
	while (ch->next_token < ch->tokens_len && !finish) {
		token = tokens_get_token(tokens, ch->next_token);
		/* stop at the first token that starts with < or > */
		finish = (token[0] == '<' || token[0] == '>');
		/* If not finished, !finish is 1: store the token and advance next_token.
		Otherwise store NULL and leave next_token pointing at the first <, > or >>. */
		/* Multiplying by !finish is hard to read, but it avoids an if branch */
		ch->args[ch->next_token] = (char *)((!finish) * (uintptr_t)(void *)(token));
		ch->next_token += !finish;
	}
	ch->args[ch->next_token] = NULL;
}

The core of void parse_redirection(struct ch_process *ch, struct tokens *tokens) is as follows:

switch(arrow[0]) {
		case '<':
			/* redirect standard input.
			If there are multiple '<' in command line, such as `prog < foo1 < foo2`,
			the last one would be active */
			if (access(path, R_OK) == 0) {
				if (ch->in_fd != 0) {
					close(ch->in_fd);
				}
				ch->in_fd = open(path, O_RDONLY);
			} else {
				shell_msg("%s does not exist or is not readable\n", path);
				return;
			}
			break;
		case '>':
			/* The only difference between > and >> is that >> uses O_APPEND while > uses O_TRUNC */
			attr = O_WRONLY | O_CREAT;
			if (arrow[1] == '>') {
				attr |= O_APPEND;
			} else {
				attr |= O_TRUNC;
			}
			ch->out_attr = attr;
			if (ch->out_fd != 1) {
				close(ch->out_fd);
			}
			ch->out_fd = open(path, attr, 0664);	/* -rw-rw-r-- */
		}

arrow holds the token that is <, > or >>, and path holds the file name that follows arrow:

/* next_token starts at the first <, > or >>;
for example, in `program > foo`, arrow = > and path = foo */
arrow = tokens_get_token(tokens, ch->next_token++);
if (ch->next_token >= ch->tokens_len) {
	/* next_token is out of range, no filename next to < or > or >> */
	shell_msg("No file next to '%s'\n", arrow);
	return;
}
path = tokens_get_token(tokens, ch->next_token++);

run_program also needs some changes, such as adding the redirection code before execv:

/* redirect */
if (child.in_fd != 0) {
	dup2(child.in_fd, 0);
}
if (child.out_fd != 1) {
	dup2(child.out_fd, 1);
}

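Because the file descriptors of a forked process are copied from the parent and refer to the same open files, the child can use them directly.

This uses a new system call, dup2, which has a counterpart called dup.
Its job is to duplicate a file descriptor.
For example, dup2(old, new) makes new refer to the same open file as old (old and new are both file descriptors).
If new already refers to an open file, dup2 first closes it and then makes the copy.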
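
As a self-contained illustration of dup2, the sketch below runs a program with its standard output sent to a file. The helper name run_with_stdout is hypothetical, and unlike the shell described here it opens the file in the child, not in the parent, purely to keep the example short. The exit codes 126 and 127 are assumptions borrowed from common shell convention.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with its standard output sent to
 * `file`. Returns the child's exit code, or -1 on error. */
int run_with_stdout(const char *file, char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		int fd = open(file, O_WRONLY | O_CREAT | O_TRUNC, 0664);

		if (fd == -1)
			_exit(126);
		if (dup2(fd, STDOUT_FILENO) == -1)	/* fd 1 now refers to file */
			_exit(126);
		close(fd);	/* the duplicate on fd 1 stays open */
		execv(argv[0], argv);
		_exit(127);	/* reached only if execv fails */
	}
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Closing fd after dup2 is safe because descriptor 1 keeps the file open, and it avoids leaking an extra descriptor into the new program.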

Resources

Not posted yet; too lazy for now.
