SECCOMP AND PTRACE

Introduction

In this post, I will walk through seccomp and ptrace step by step. The ptrace manual page [6] is strongly recommended as a companion for this topic. The post is divided into six sections: (1) Introduction of syscall, (2) Introduction of seccomp, (3) Introduction of ptrace, (4) Advanced ptrace, (5) Seccomp Filter and seccomp-tools, and (6) Seccomp escape with ptrace.
I am a newbie in seccomp escape myself, so I include plenty of sample code along the way.

Introduction of syscall

Syscalls are a familiar concept for shellcode writers. The man page in [8] describes the syscall calling conventions on different platforms in detail. In this post, we focus only on code for the x86-64 (x64) platform.

Take a look at the code below.

//code1.c
//gcc code1.c -o code1
#include<stdio.h>
#include<unistd.h>
#include<sys/syscall.h>

int main()
{
	char *argv[]={"/bin/cat", "flag", NULL};
	char *env[]={NULL};
	char cmd[20] = "/bin/cat";
	syscall(59, cmd, argv, env);
	return 0;
}

Running it gives the following result.

$ ./code1 
DANGOKYO{THIS_IS_A_ESCAPE}

59 is the syscall number of sys_execve on x86-64. The call replaces the process with /bin/cat, which prints the contents of the file flag to the screen.
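The hard-coded number can be avoided: <sys/syscall.h> defines a SYS_* constant for each syscall, and SYS_execve expands to 59 on x86-64. The sketch below is not part of the original samples, and `raw_getpid` is a hypothetical helper name chosen for illustration. It issues a harmless syscall through the same syscall() wrapper so that the result can be compared with the libc function.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: call getpid through the generic syscall() wrapper,
 * using the symbolic constant instead of a hard-coded number. */
long raw_getpid(void)
{
	return syscall(SYS_getpid);
}
```

raw_getpid() should return the same value as getpid(). In the same way, syscall(SYS_execve, cmd, argv, env) could replace syscall(59, cmd, argv, env) in code1.c.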

Introduction of seccomp

Seccomp is a security feature of the Linux kernel that restricts which system calls a process may make. To put it simply, seccomp provides a customizable mechanism to conditionally forbid system calls.

Seccomp with full security (strict mode)

//code2.c
//gcc code2.c -o code2
#include<stdio.h>
#include<unistd.h>
#include<sys/syscall.h>
#include<sys/prctl.h>
#include<linux/seccomp.h> 

int main()
{
	char *argv[]={"/bin/cat", "flag", NULL};
	char *env[]={NULL};
	char cmd[20] = "/bin/cat";
	prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
	syscall(59, cmd, argv, env);
	return 0;
}

The final result: execve is not permitted in strict mode, so the process is killed.

$ ./code2
Killed
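
Strict mode permits only read, write, _exit and sigreturn. Any other syscall terminates the process with SIGKILL, which is why the shell prints Killed. The sketch below makes this visible from a parent process. It is not part of the original samples: the function name `run_in_strict_mode` and its return convention are my own, and it assumes the kernel lets the process enter strict mode, which some sandboxes do not.

```c
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper. Forks a child that enters strict mode.
 * If call_forbidden is non-zero, the child then issues getpid, which
 * strict mode does not allow.
 * Returns 0 if the child exited normally, 1 if it was killed by SIGKILL,
 * and -1 if strict mode could not be enabled or anything else went wrong. */
int run_in_strict_mode(int call_forbidden)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0)
			_exit(2);            /* strict mode unavailable */
		if (call_forbidden)
			syscall(SYS_getpid); /* not on the allow list: SIGKILL */
		/* exit_group is forbidden too, so use the plain exit syscall. */
		syscall(SYS_exit, 0);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
		return 1;
	if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
		return 0;
	return -1;
}
```

Where strict mode is available, run_in_strict_mode(0) should report a clean exit and run_in_strict_mode(1) a kill. Note that even a normal return from main would be fatal in strict mode, because libc ends the process with exit_group.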

Now, let’s see what customizable security looks like.

//code3.c
//gcc code3.c -o code3 -lseccomp
#include<stdio.h>
#include<unistd.h>
#include<sys/syscall.h>
#include<sys/prctl.h>
#include<linux/seccomp.h> 
#include<seccomp.h>

int main()
{
	char *argv[]={"/bin/cat", "flag", NULL};
	char *env[]={NULL};
	char cmd[20] = "/bin/cat";

	scmp_filter_ctx ctx;
	ctx = seccomp_init(SCMP_ACT_ALLOW); // default action: Allow
	seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(write), 0);
	seccomp_load(ctx);

	syscall(59, cmd, argv, env);
	return 0;
}

In the sample code, I use seccomp_rule_add to customize the security policy: sys_write is blacklisted. The execve itself is still allowed, but cat is killed as soon as it tries to write the flag. The final result is given below.

$ ./code3
Bad system call

More importantly, the seccomp policy is also inherited by child processes.

//code4.c
//gcc code4.c -o code4 -lseccomp
#include<stdio.h>
#include<unistd.h>
#include<sys/syscall.h>
#include<sys/prctl.h>
#include<sys/types.h>
#include<sys/wait.h>
#include<linux/seccomp.h> 
#include<seccomp.h>

int main()
{
	char *argv[]={"/bin/cat", "flag", NULL};
	char *env[]={NULL};
	char cmd[20] = "/bin/cat";
	pid_t pid;
	int rv;
	scmp_filter_ctx ctx;
	ctx = seccomp_init(SCMP_ACT_ALLOW); // default action: Allow
	seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(write), 0);
	seccomp_load(ctx);

	pid = fork();
	if(pid==0){
		syscall(59, cmd, argv, env);
	}
	else
	{
		waitpid(pid, &rv, 0);
	}
	return 0;
}

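The code above does not output anything: the child inherits the filter, so cat is killed when it calls write, and the parent only waits for it and returns.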
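
The silence hides what happened to the child, because the parent never inspects the status filled in by waitpid. With SCMP_ACT_KILL the child is terminated by a signal instead of exiting normally. The helper below is a sketch and not part of code4.c; `describe_status` is a hypothetical name. It turns a wait status into a readable line using the standard W* macros.

```c
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: format a wait status into buf (of size n). */
void describe_status(int status, char *buf, size_t n)
{
	if (WIFEXITED(status))
		snprintf(buf, n, "exited with code %d", WEXITSTATUS(status));
	else if (WIFSIGNALED(status))
		snprintf(buf, n, "killed by signal %d", WTERMSIG(status));
	else if (WIFSTOPPED(status))
		snprintf(buf, n, "stopped by signal %d", WSTOPSIG(status));
	else
		snprintf(buf, n, "unknown status 0x%x", (unsigned)status);
}
```

For the child in code4.c this would presumably report signal 31, which is SIGSYS on x86-64. Keep in mind that the parent in code4.c runs under the same filter, so it cannot print the line with printf without being killed itself; returning the signal number as the exit code is one way around that.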

Introduction of ptrace

From my perspective, ptrace provides a customizable debugger that lets the programmer inspect and modify the registers and memory of another process at run time.

Let me use the following code to demonstrate the usage of ptrace.

//code5.c
//gcc code5.c -o code5
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <sys/reg.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
	pid_t  pid;
	int rv;
	long orig_rax;
	char *argv[]={"/bin/cat", "flag", NULL};
	char *env[]={NULL};
	char cmd[20] = "/bin/cat";
	long value;
	int insyscall = 0;
	struct user_regs_struct regs;
	pid = fork();
	if(pid == 0)
	{
		ptrace(PTRACE_TRACEME, 0, NULL, NULL);
		syscall(59,  cmd ,argv, env);
		exit(0);
	}
	else
	{
		while(1)
		{
			wait(&rv);
			if(WIFEXITED(rv)){
				break;
			}
			orig_rax = ptrace(PTRACE_PEEKUSER, pid, 8 * ORIG_RAX, NULL);
			
			if(orig_rax == 1)
			{
				if(insyscall == 0)
				{
					printf("Syscall number: %ld\n", orig_rax);
					ptrace(PTRACE_GETREGS, pid, NULL, &regs);
					printf("write called with 0x%lx, 0x%lx, 0x%lx\n", regs.rdi, regs.rsi, regs.rdx);
					insyscall = 1;
				}
				else
				{
					long rax = ptrace(PTRACE_PEEKUSER, pid, 8 * RAX, NULL);
					printf("Write returned with %ld\n", rax);
					insyscall = 0;
				}

			}
			ptrace(PTRACE_SYSCALL, pid, NULL, NULL);
		}
		printf("The child process exits\n");
	}
	return 0;
}

In the sample code, we use four different ptrace requests.
ptrace(PTRACE_TRACEME, 0, NULL, NULL): Called by the child to ask to be traced by its parent.

ptrace(PTRACE_PEEKUSER, pid, 8 * ORIG_RAX, NULL): Read one word from the tracee's USER area, here the saved syscall number.

ptrace(PTRACE_GETREGS, pid, NULL, &regs): Copy the tracee's general-purpose registers into regs.

ptrace(PTRACE_SYSCALL, pid, NULL, NULL): Resume the tracee and stop it at the next entry to or exit from a system call.

I stop the tracee when it invokes sys_write and retrieve the values in the argument registers.

$ ./code5 
Syscall number: 1
write called with 0x1, 0x7f4338b8c000, 0x1b
DANGOKYO{THIS_IS_A_ESCAPE}
Write returned with 27
The child process exits
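
The offset `8 * ORIG_RAX` passed to PTRACE_PEEKUSER is a byte offset into the tracee's USER area, which on x86-64 begins with a struct user_regs_struct. ORIG_RAX and RAX from <sys/reg.h> are indices of 8-byte slots in that structure. The sketch below computes the same offsets with offsetof, which may read more clearly. It is x86-64 only, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <sys/reg.h>
#include <sys/user.h>

/* Byte offset of the saved syscall number inside the USER area. */
size_t orig_rax_offset(void)
{
	return offsetof(struct user, regs)
	     + offsetof(struct user_regs_struct, orig_rax);
}

/* Byte offset of rax, which holds the return value at syscall exit. */
size_t rax_offset(void)
{
	return offsetof(struct user, regs)
	     + offsetof(struct user_regs_struct, rax);
}
```

The tracer reads orig_rax rather than rax to identify the syscall because rax is overwritten with the return value by the time the exit stop is reached, while orig_rax keeps the syscall number.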

Advanced ptrace

This part is based on [4], but I removed some unnecessary code to keep the sample clean.
In this section, I will reverse the output of sys_write.

Based on the sample code in the last section, I use the following code to reverse the output:

//code6.c
//gcc code6.c -o code6
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <sys/reg.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/prctl.h>     /* prctl */
#include <linux/seccomp.h> /* seccomp's constants */
#include <stdlib.h>
#include <unistd.h>

#define longsize 8

char buffer[100];

// Reverse str in place. The "- 2" leaves the trailing newline at the end.
void reverse(char *str)
{
	int i, j;
	char temp;
	for(i = 0, j = (int)strlen(str) - 2; i < j; ++i, --j) {
		temp = str[i];
		str[i] = str[j];
		str[j] = temp;
	}
}

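int getdata(pid_t pid, char *buf, long addr, long length)
{
	int i;
	union {
		long val;
		char chars[longsize];
	} value;
	// PTRACE_PEEKDATA returns one 8-byte word per call, so the last read
	// may fetch a few bytes past length. buf must have room for them.
	for(i=0; i < length/longsize + 1; i++)
	{
		value.val = ptrace(PTRACE_PEEKDATA, pid, (void*)(addr + i*longsize), NULL);
		memcpy(buf + i*longsize, value.chars, longsize);
	}
	buf[length] = '\0';	// terminate right after the length bytes that belong to the write
	return 0;
}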

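int putdata(pid_t pid, char *buf, long addr, long length)
{
	long i;
	union {
		long val;
		char chars[longsize];
	}value;
	// Copy from the buf argument (the original read the global buffer).
	// Whole words are written directly.
	for(i=0; i + longsize <= length; i += longsize)
	{
		memcpy(value.chars, buf + i, longsize);
		ptrace(PTRACE_POKEDATA, pid, (void*)(addr + i), value.val);
	}
	// Partial last word: read it, patch only our bytes, write it back.
	if(i < length)
	{
		value.val = ptrace(PTRACE_PEEKDATA, pid, (void*)(addr + i), NULL);
		memcpy(value.chars, buf + i, length - i);
		ptrace(PTRACE_POKEDATA, pid, (void*)(addr + i), value.val);
	}
	return 0;
}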

int main()
{
	pid_t  pid;
	int rv;
	long orig_rax;
	char *argv[]={"/bin/cat", "flag", NULL};
	char *env[]={NULL};
	char cmd[20] = "/bin/cat";
	long length;
	long addr;
	int insyscall = 0;
	struct user_regs_struct regs;
	pid = fork();
	if(pid == 0)
	{
		ptrace(PTRACE_TRACEME, 0, NULL, NULL);
		syscall(59,  cmd ,argv, env);
		exit(0);
	}
	else
	{
		while(1)
		{
			wait(&rv);
			if(WIFEXITED(rv)){
				break;
			}
			orig_rax = ptrace(PTRACE_PEEKUSER, pid, 8 * ORIG_RAX, NULL);
			
			if(orig_rax == 1)
			{
				if(insyscall == 0)
				{
					printf("Syscall number: %ld\n", orig_rax);
					ptrace(PTRACE_GETREGS, pid, NULL, &regs);
					printf("write called with 0x%lx, 0x%lx, 0x%lx\n", regs.rdi, regs.rsi, regs.rdx);
					addr = regs.rsi;
					length = regs.rdx;
					getdata(pid, buffer, addr, length);
					reverse(buffer);
					putdata(pid, buffer, addr, length);
					insyscall = 1;
				}
				else
				{
					long rax = ptrace(PTRACE_PEEKUSER, pid, 8 * RAX, NULL);
					printf("Write returned with %ld\n", rax);
					insyscall = 0;
				}

			}
			ptrace(PTRACE_SYSCALL, pid, NULL, NULL);
		}
		printf("The child process exits\n");
	}
	return 0;
}

To output the reversed string, we use two more ptrace requests:
ptrace(PTRACE_PEEKDATA, pid, addr, NULL): Read one word at addr from the memory space of the tracee.

ptrace(PTRACE_POKEDATA, pid, addr, value): Write the word value to addr in the memory space of the tracee.

The final result is given below:

$ ./code6
Syscall number: 1
write called with 0x1, 0x7f20cd3b4000, 0x1b
}EPACSE_A_SI_SIHT{OYKOGNAD
Write returned with 27
The child process exits
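
reverse() in code6.c relies on strlen, but the buffer handed to write is described by a length, not by a terminating NUL. A length-based variant is sketched below. `reverse_line` is a hypothetical name, and leaving a trailing newline in place mirrors what the `strlen(str) - 2` bound does in the original.

```c
#include <stddef.h>

/* Reverse the first len bytes of buf in place. If the last byte is a
 * newline it stays at the end, so the output still ends the line. */
void reverse_line(char *buf, size_t len)
{
	size_t i, j;

	if (len > 0 && buf[len - 1] == '\n')
		len--;          /* keep the trailing newline where it is */
	if (len < 2)
		return;         /* nothing to swap */
	for (i = 0, j = len - 1; i < j; ++i, --j) {
		char tmp = buf[i];
		buf[i] = buf[j];
		buf[j] = tmp;
	}
}
```

In the tracer, this could be called as reverse_line(buffer, length) between getdata and putdata, with length taken from regs.rdx.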

Seccomp Filter and seccomp-tools

In Section 2 of this post, I gave a simple example of seccomp_rule_add. However, seccomp_rule_add [9] is more powerful than that example suggests: it can attach a conditional filter to one specific syscall. In particular, the programmer can choose under which argument values the syscall should be killed.

First, we need to install seccomp-tools [7], a helper program that dumps the seccomp filter installed by a program.

//code7.c
//gcc code7.c -o code7 -lseccomp
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <sys/reg.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/prctl.h>     /* prctl */
#include <linux/seccomp.h> /* seccomp's constants */
#include <seccomp.h>
#include <stdlib.h>
#include <unistd.h>

#define longsize 8

char buffer[100];


int main()
{
	pid_t  pid;
	int rv;
	long orig_rax;
	char *argv[]={"/bin/cat", "flag", NULL};
	char *env[]={NULL};
	char cmd[20] = "/bin/cat";
	long length;
	long addr;
	int insyscall = 0;
	struct user_regs_struct regs;

	scmp_filter_ctx ctx;
	ctx = seccomp_init(SCMP_ACT_ALLOW); // default action: allow

	seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(write), 1, SCMP_A2(SCMP_CMP_EQ, 60));

	seccomp_load(ctx);
	prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); // unused prctl arguments must be 0
	pid = fork();
	if(pid == 0)
	{
		ptrace(PTRACE_TRACEME, 0, NULL, NULL);
		syscall(59,  cmd ,argv, env);
	}
	else
	{
		while(1)
		{
			wait(&rv);
			if(WIFEXITED(rv)){
				break;
			}
			orig_rax = ptrace(PTRACE_PEEKUSER, pid, 8 * ORIG_RAX, NULL);
			
			if(orig_rax == 1)
			{
				if(insyscall == 0)
				{
					printf("Syscall number: %ld\n", orig_rax);
					ptrace(PTRACE_GETREGS, pid, NULL, &regs);
					printf("Write called with 0x%lx, 0x%lx, 0x%lx\n", regs.rdi, regs.rsi, regs.rdx);
					addr = regs.rsi;
					length = regs.rdx;
					insyscall = 1;
				}
				else
				{
					long rax = ptrace(PTRACE_PEEKUSER, pid, 8 * RAX, NULL);
					printf("\nWrite returned with %ld\n", rax);
					insyscall = 0;
				}

			}
			ptrace(PTRACE_SYSCALL, pid, NULL, NULL);
		}
		printf("The child process exits\n");
	}
	return 0;
}

Let’s use seccomp-tools to inspect the filter:

$ seccomp-tools dump ./code7
 line  CODE  JT   JF      K
=================================
 0000: 0x20 0x00 0x00 0x00000004  A = arch
 0001: 0x15 0x00 0x08 0xc000003e  if (A != ARCH_X86_64) goto 0010
 0002: 0x20 0x00 0x00 0x00000000  A = sys_number
 0003: 0x35 0x06 0x00 0x40000000  if (A >= 0x40000000) goto 0010
 0004: 0x15 0x00 0x04 0x00000001  if (A != write) goto 0009
 0005: 0x20 0x00 0x00 0x00000024  A = args[2] >> 32
 0006: 0x15 0x00 0x02 0x00000000  if (A != 0x0) goto 0009
 0007: 0x20 0x00 0x00 0x00000020  A = args[2]
 0008: 0x15 0x01 0x00 0x0000003c  if (A == 0x3c) goto 0010
 0009: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0010: 0x06 0x00 0x00 0x00000000  return KILL

From the output of seccomp-tools, we can see that a sys_write call is killed only if its third argument is 0x3c (60); every other write is allowed.
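
Read top to bottom, the dump is a small decision procedure over the architecture, the syscall number and the third argument. The same logic written as ordinary C is sketched below, purely as a reading aid. It is not how the kernel evaluates BPF, and the names `filter_verdict`, `VERDICT_ALLOW` and `VERDICT_KILL` are made up for this illustration. The constants are copied from the K column of the dump.

```c
#include <stdint.h>

enum verdict { VERDICT_ALLOW, VERDICT_KILL };

#define ARCH_X86_64   0xc000003eu  /* K on line 0001 */
#define X32_SYSCALL   0x40000000u  /* K on line 0003 */
#define NR_WRITE      1u           /* K on line 0004 */
#define BLOCKED_COUNT 0x3cu        /* K on line 0008 */

/* Mirror of the dumped filter; comments give the dump line numbers. */
enum verdict filter_verdict(uint32_t arch, uint32_t nr, uint64_t arg2)
{
	if (arch != ARCH_X86_64)              /* 0000-0001 */
		return VERDICT_KILL;
	if (nr >= X32_SYSCALL)                /* 0002-0003 */
		return VERDICT_KILL;
	if (nr != NR_WRITE)                   /* 0004 */
		return VERDICT_ALLOW;
	if ((uint32_t)(arg2 >> 32) != 0)      /* 0005-0006: high 32 bits */
		return VERDICT_ALLOW;
	if ((uint32_t)arg2 == BLOCKED_COUNT)  /* 0007-0008: low 32 bits */
		return VERDICT_KILL;
	return VERDICT_ALLOW;                 /* 0009 */
}
```

The 64-bit argument is compared in two halves because the filter program works on 32-bit words. The first two checks also show that the filter kills any call made under a different architecture or with an x32 syscall number, regardless of the write rule.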

Now let's see what happens if we set the constraint on the third argument to 0x1b (27), which is exactly the length of the flag output.

//code8.c
//gcc code8.c -o code8 -lseccomp
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <sys/reg.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/prctl.h>     /* prctl */
#include <linux/seccomp.h> /* seccomp's constants */
#include <seccomp.h>
#include <stdlib.h>
#include <unistd.h>

#define longsize 8

char buffer[100];


int main()
{
	pid_t  pid;
	int rv;
	long orig_rax;
	char *argv[]={"/bin/cat", "flag", NULL};
	char *env[]={NULL};
	char cmd[20] = "/bin/cat";
	long length;
	long addr;
	int insyscall = 0;
	struct user_regs_struct regs;

	scmp_filter_ctx ctx;
	ctx = seccomp_init(SCMP_ACT_ALLOW); // default action: allow

	seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(write), 1, SCMP_A2(SCMP_CMP_EQ, 27));

	seccomp_load(ctx);
	prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); // unused prctl arguments must be 0
	pid = fork();
	if(pid == 0)
	{
		ptrace(PTRACE_TRACEME, 0, NULL, NULL);
		syscall(59,  cmd ,argv, env);
	}
	else
	{
		while(1)
		{
			wait(&rv);
			if(WIFEXITED(rv)){
				break;
			}
			orig_rax = ptrace(PTRACE_PEEKUSER, pid, 8 * ORIG_RAX, NULL);
			
			if(orig_rax == 1)
			{
				if(insyscall == 0)
				{
					printf("Syscall number: %ld\n", orig_rax);
					ptrace(PTRACE_GETREGS, pid, NULL, &regs);
					printf("Write called with 0x%lx, 0x%lx, 0x%lx\n", regs.rdi, regs.rsi, regs.rdx);
					addr = regs.rsi;
					length = regs.rdx;
					insyscall = 1;
				}
				else
				{
					long rax = ptrace(PTRACE_PEEKUSER, pid, 8 * RAX, NULL);
					printf("\nWrite returned with %ld\n", rax);
					insyscall = 0;
				}

			}
			ptrace(PTRACE_SYSCALL, pid, NULL, NULL);
		}
		printf("The child process exits\n");
	}
	return 0;
}

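This time the flag is not printed. The tracer reports the write at its entry stop, and then the filter kills cat. Because the tracer's loop only checks WIFEXITED, it apparently never notices that the child was killed by a signal and has to be interrupted with Ctrl-C.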

$ ./code8
Syscall number: 1
Write called with 0x1, 0x7fd637207000, 0x1b
^C
$

# And corresponding seccomp filter
$ seccomp-tools dump ./code8
 line  CODE  JT   JF      K
=================================
 0000: 0x20 0x00 0x00 0x00000004  A = arch
 0001: 0x15 0x00 0x08 0xc000003e  if (A != ARCH_X86_64) goto 0010
 0002: 0x20 0x00 0x00 0x00000000  A = sys_number
 0003: 0x35 0x06 0x00 0x40000000  if (A >= 0x40000000) goto 0010
 0004: 0x15 0x00 0x04 0x00000001  if (A != write) goto 0009
 0005: 0x20 0x00 0x00 0x00000024  A = args[2] >> 32
 0006: 0x15 0x00 0x02 0x00000000  if (A != 0x0) goto 0009
 0007: 0x20 0x00 0x00 0x00000020  A = args[2]
 0008: 0x15 0x01 0x00 0x0000001b  if (A == 0x1b) goto 0010
 0009: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0010: 0x06 0x00 0x00 0x00000000  return KILL

Seccomp Escape

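After introducing all the necessary background above, we can now put the pieces together to achieve a seccomp escape. The idea is straightforward: at the syscall-entry stop, the tracer modifies the argument register with ptrace so that the call no longer matches the filter rule. This relies on the filter being evaluated after the tracer has had its chance to change the registers, which is the order the code8 run above suggests, since the tracer saw the write before the tracee was killed.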

//escape.c
//gcc escape.c -o escape -lseccomp
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <sys/reg.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/prctl.h>  
#include <linux/seccomp.h>
#include <seccomp.h>
#include <stdlib.h>
#include <unistd.h>

#define longsize 8

int main()
{
	pid_t  pid;
	int rv;
	long orig_rax;
	char *argv[]={"/bin/cat", "flag", NULL};
	char *env[]={NULL};
	char cmd[20] = "/bin/cat";
	long length;
	long addr;
	int insyscall = 0;
	struct user_regs_struct regs;

	scmp_filter_ctx ctx;
	ctx = seccomp_init(SCMP_ACT_ALLOW); // default action: allow

	seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(write), 1, SCMP_A2(SCMP_CMP_EQ, 27));

	seccomp_load(ctx);
	prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); // unused prctl arguments must be 0
	pid = fork();
	if(pid == 0)
	{
		ptrace(PTRACE_TRACEME, 0, NULL, NULL);
		syscall(59,  cmd ,argv, env);
	}
	else
	{
		while(1)
		{
			wait(&rv);
			if(WIFEXITED(rv)){
				break;
			}
			orig_rax = ptrace(PTRACE_PEEKUSER, pid, 8 * ORIG_RAX, NULL);
			
			if(orig_rax == 1)
			{
				if(insyscall == 0)
				{
					printf("Syscall number: %ld\n", orig_rax);
					ptrace(PTRACE_GETREGS, pid, NULL, &regs);
					printf("Write called with 0x%lx, 0x%lx, 0x%lx\n", regs.rdi, regs.rsi, regs.rdx);
					addr = regs.rsi;
					length = regs.rdx;
					if(regs.rdx == 27)
					{
						regs.rdx = 26;
					}
					rv = ptrace(PTRACE_SETREGS, pid, NULL, &regs);
					insyscall = 1;
				}
				else
				{
					long rax = ptrace(PTRACE_PEEKUSER, pid, 8 * RAX, NULL);
					printf("\nWrite returned with %ld\n", rax);
					insyscall = 0;
				}

			}
			ptrace(PTRACE_SYSCALL, pid, NULL, NULL);
		}
		printf("The child process exits\n");
	}
	return 0;
}

The seccomp-tools dump and the final result:

//seccomp-tools dump
$ seccomp-tools dump ./escape
 line  CODE  JT   JF      K
=================================
 0000: 0x20 0x00 0x00 0x00000004  A = arch
 0001: 0x15 0x00 0x08 0xc000003e  if (A != ARCH_X86_64) goto 0010
 0002: 0x20 0x00 0x00 0x00000000  A = sys_number
 0003: 0x35 0x06 0x00 0x40000000  if (A >= 0x40000000) goto 0010
 0004: 0x15 0x00 0x04 0x00000001  if (A != write) goto 0009
 0005: 0x20 0x00 0x00 0x00000024  A = args[2] >> 32
 0006: 0x15 0x00 0x02 0x00000000  if (A != 0x0) goto 0009
 0007: 0x20 0x00 0x00 0x00000020  A = args[2]
 0008: 0x15 0x01 0x00 0x0000001b  if (A == 0x1b) goto 0010
 0009: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0010: 0x06 0x00 0x00 0x00000000  return KILL

//final result
$ ./escape
Syscall number: 1
Write called with 0x1, 0x7fa308699000, 0x1b
DANGOKYO{THIS_IS_A_ESCAPE}
Write returned with 26
Syscall number: 1
Write called with 0x1, 0x7fa30869901a, 0x1


Write returned with 1
The child process exits
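
The second write in the output, for a single byte at an address 0x1a past the first buffer, is what a retry loop would do after a short write: the kernel reported 26 bytes written, so the remaining newline is sent in a follow-up call. Assuming cat handles short writes this way (the trace is consistent with that, but I have not checked its source), the usual idiom looks like the sketch below. `write_all` is a hypothetical name.

```c
#include <errno.h>
#include <unistd.h>

/* Keep calling write until all len bytes are out or an error occurs.
 * Returns 0 on success, -1 on error (errno is set by write). */
int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;   /* interrupted before writing: retry */
			return -1;
		}
		p += n;                 /* skip what was already written */
		len -= (size_t)n;
	}
	return 0;
}
```

The filter sees each call separately. Neither 26 nor 1 equals 27, so both calls pass, and the complete flag still reaches the screen. The data is unchanged; only the way it is split across syscalls differs.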

Conclusion

In this post, I went on a journey from syscalls to seccomp escape and gave a lot of sample code for newbies like me. For more details, please read through the reference list below.

Reference

[1] https://gist.github.com/thejh/8346f47e359adecd1d53
[2] https://blog.yadutaf.fr/2014/05/29/introduction-to-seccomp-bpf-linux-syscall-filter/
[3] https://www.alfonsobeato.net/c/filter-and-modify-system-calls-with-seccomp-and-ptrace/
[4] https://www.linuxjournal.com/article/6100
[5] http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/
[6] http://man7.org/linux/man-pages/man2/ptrace.2.html
[7] https://github.com/david942j/seccomp-tools/blob/master/README.md
[8] http://man7.org/linux/man-pages/man2/syscall.2.html
[9] http://man7.org/linux/man-pages/man3/seccomp_rule_add.3.html
