Introduction
In this post, I walk through seccomp and ptrace step by step. The ptrace manual page [6] is highly recommended reading for this topic. The post is divided into six sections: (1) Introduction of syscall, (2) Introduction of seccomp, (3) Introduction of ptrace, (4) Advanced ptrace, (5) Seccomp Filter and seccomp-tools, and (6) Seccomp escape with ptrace.
I include plenty of sample code in the post, since I am a newbie in seccomp escape myself.
Introduction of syscall
Syscalls are a familiar concept for shellcode writers. The manual page in [8] gives a detailed description of how syscalls are invoked on different platforms. In this post, we only focus on code for the x64 platform.
Take a look at the code below.
//code1.c
//gcc code1.c -o code1
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

int main()
{
    char *argv[] = {"/bin/cat", "flag", NULL};
    char *env[] = {NULL};
    char cmd[20] = "/bin/cat";

    syscall(59, cmd, argv, env);
    return 0;
}
Running it gives the following result.
$ ./code1
DANGOKYO{THIS_IS_A_ESCAPE}
59 is the syscall number of sys_execve on x64 [5]. The call replaces the program with /bin/cat, which prints the content of the file flag to the screen.
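The magic number 59 can also be written symbolically. The following is a minimal sketch, assuming Linux with glibc; the helper names raw_getpid and execve_number are hypothetical and only serve as illustration. It uses the SYS_* constants from <sys/syscall.h> so the code does not depend on a hard-coded number.

```c
#include <sys/syscall.h> /* SYS_* syscall numbers for the build target */
#include <unistd.h>      /* syscall(), getpid() */

/* Hypothetical helper: ask the kernel for our PID through the raw
 * syscall interface, naming the call with SYS_getpid. */
static long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* Hypothetical helper: the number behind SYS_execve on this build.
 * It is 59 on x86-64 and differs on other architectures. */
static long execve_number(void)
{
    return SYS_execve;
}
```

With these constants, the call in code1.c could equally be written as syscall(SYS_execve, cmd, argv, env).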
Introduction of seccomp
Seccomp is a security feature of the Linux kernel that restricts the system calls a process is allowed to make. Put simply, seccomp provides a customizable way to conditionally forbid system calls.
Seccomp in strict mode, the most restrictive setting, allows only read, write, _exit and sigreturn:
//code2.c
//gcc code2.c -o code2 -lseccomp
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>

int main()
{
    char *argv[] = {"/bin/cat", "flag", NULL};
    char *env[] = {NULL};
    char cmd[20] = "/bin/cat";

    prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
    syscall(59, cmd, argv, env);
    return 0;
}
The final result: the process is killed as soon as it calls execve.
$ ./code2
Killed
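To see which signal is behind the "Killed" message, the strict-mode call can be moved into a child process whose wait status is inspected by the parent. This is a minimal sketch, assuming Linux with glibc; the function name strict_mode_kill_signal is hypothetical, and getpid is just an arbitrary syscall that strict mode does not allow.

```c
#include <linux/seccomp.h> /* SECCOMP_MODE_STRICT */
#include <signal.h>        /* SIGKILL */
#include <sys/prctl.h>     /* prctl(), PR_SET_SECCOMP */
#include <sys/syscall.h>   /* SYS_getpid */
#include <sys/types.h>
#include <sys/wait.h>      /* waitpid(), WIFSIGNALED, WTERMSIG */
#include <unistd.h>        /* fork(), syscall(), _exit() */

/* Hypothetical helper: returns the signal that terminated the child,
 * or -1 if the child was not killed by a signal. */
static int strict_mode_kill_signal(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: enter strict mode, then make a forbidden syscall. */
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(2); /* seccomp not available */
        syscall(SYS_getpid); /* not on the strict-mode allow list */
        _exit(3);            /* not reached if the kernel kills us */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

On a kernel with seccomp enabled, the returned value is expected to be SIGKILL, which the shell reports as "Killed".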
Now, let's see what a customized policy looks like.
//code3.c
//gcc code3.c -o code3 -lseccomp
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <seccomp.h>

int main()
{
    char *argv[] = {"/bin/cat", "flag", NULL};
    char *env[] = {NULL};
    char cmd[20] = "/bin/cat";
    scmp_filter_ctx ctx;

    ctx = seccomp_init(SCMP_ACT_ALLOW); // default action: allow
    seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(write), 0);
    seccomp_load(ctx);
    syscall(59, cmd, argv, env);
    return 0;
}
In the sample code, I use seccomp_rule_add [9] to customize the security policy by blacklisting sys_write. The filter stays in place across execve, so cat is killed when it tries to write the flag. The final result is given below.
$ ./code3
Bad system call
More importantly, the seccomp policy is also inherited by child processes.
//code4.c
//gcc code4.c -o code4 -lseccomp
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <linux/seccomp.h>
#include <seccomp.h>

int main()
{
    char *argv[] = {"/bin/cat", "flag", NULL};
    char *env[] = {NULL};
    char cmd[20] = "/bin/cat";
    pid_t pid;
    int rv;
    scmp_filter_ctx ctx;

    ctx = seccomp_init(SCMP_ACT_ALLOW); // default action: allow
    seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(write), 0);
    seccomp_load(ctx);
    pid = fork();
    if (pid == 0) {
        syscall(59, cmd, argv, env);
    } else {
        waitpid(pid, &rv, 0);
    }
    return 0;
}
The code above does not output anything: the child is killed when cat calls write, and the parent only waits for it without reporting how it ended.
Introduction of ptrace
From my perspective, ptrace provides the building blocks of a customizable debugger: it lets the programmer inspect and modify the registers and memory of another process at run-time.
Let me use the following code to demonstrate the usage of ptrace:
//code5.c
//gcc code5.c -o code5
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <sys/reg.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <stdlib.h>

int main()
{
    pid_t pid;
    int rv;
    long orig_rax;
    char *argv[] = {"/bin/cat", "flag", NULL};
    char *env[] = {NULL};
    char cmd[20] = "/bin/cat";
    int insyscall = 0;
    struct user_regs_struct regs;

    pid = fork();
    if (pid == 0) {
        // child: ask to be traced by the parent, then become /bin/cat
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        syscall(59, cmd, argv, env);
        exit(0);
    } else {
        while (1) {
            wait(&rv);
            if (WIFEXITED(rv)) {
                break;
            }
            // orig_rax holds the syscall number (1 is sys_write on x64)
            orig_rax = ptrace(PTRACE_PEEKUSER, pid, 8 * ORIG_RAX, NULL);
            if (orig_rax == 1) {
                if (insyscall == 0) {
                    // syscall-entry stop: print the arguments
                    printf("Syscall number: %ld\n", orig_rax);
                    ptrace(PTRACE_GETREGS, pid, NULL, &regs);
                    printf("write called with 0x%llx, 0x%llx, 0x%llx\n",
                           regs.rdi, regs.rsi, regs.rdx);
                    insyscall = 1;
                } else {
                    // syscall-exit stop: rax holds the return value
                    long rax = ptrace(PTRACE_PEEKUSER, pid, 8 * RAX, NULL);
                    printf("Write returned with %ld\n", rax);
                    insyscall = 0;
                }
            }
            // resume the tracee until its next syscall entry or exit
            ptrace(PTRACE_SYSCALL, pid, NULL, NULL);
        }
        printf("The child process exits\n");
    }
    return 0;
}
In the sample code, we use four different ptrace requests.
ptrace(PTRACE_TRACEME, 0, NULL, NULL): Called by the child to indicate that it is to be traced by its parent.
ptrace(PTRACE_PEEKUSER, pid, 8 * ORIG_RAX, NULL): Read one word from the tracee's USER area, here the saved syscall number orig_rax.
ptrace(PTRACE_GETREGS, pid, NULL, &regs): Copy the tracee's general-purpose registers into regs.
ptrace(PTRACE_SYSCALL, pid, NULL, NULL): Resume the tracee and stop it again at the next entry to or exit from a system call.
I stop the tracee when it invokes sys_write and retrieve the values in the argument registers.
$ ./code5
Syscall number: 1
write called with 0x1, 0x7f4338b8c000, 0x1b
DANGOKYO{THIS_IS_A_ESCAPE}
Write returned with 27
The child process exits
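The loop in code5.c tracks entry and exit with a flag and only leaves the loop on a normal exit. The following is a minimal sketch of a slightly stricter loop, assuming x86-64 Linux with glibc; the function name count_getpid_stops is hypothetical. It sets PTRACE_O_TRACESYSGOOD so that syscall stops are reported as SIGTRAP | 0x80 and can be told apart from other stops, and it also ends the loop when the child is killed by a signal.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: trace a child that calls getpid once and count
 * how many syscall stops are reported for that call. */
static int count_getpid_stops(void)
{
    int status, hits = 0;
    struct user_regs_struct regs;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);      /* let the parent set its options first */
        syscall(SYS_getpid); /* the call we want to observe */
        _exit(0);
    }
    /* Wait for the SIGSTOP, then ask for distinguishable syscall stops. */
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);

    for (;;) {
        if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) < 0)
            break;
        if (waitpid(pid, &status, 0) < 0)
            break;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break; /* the child is gone, normally or by a signal */
        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            /* A genuine syscall stop: check which syscall it belongs to. */
            if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == 0 &&
                regs.orig_rax == SYS_getpid)
                hits++;
        }
    }
    return hits;
}
```

One call produces two stops, one at entry and one at exit, which is why code5.c needs the insyscall flag to tell them apart.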
Advanced ptrace
This part is based on [4], but I removed some unnecessary code to keep the example clean.
In this section, I will reverse the output of sys_write.
Building on the sample code in the last section, I use the following code to reverse the output:
//code6.c
//gcc code6.c -o code6
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <sys/reg.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/prctl.h>     /* prctl */
#include <linux/seccomp.h> /* seccomp's constants */
#include <stdlib.h>

#define longsize 8

char buffer[100];

// Reverse str in place, leaving the last character (the newline) where it is.
void reverse(char *str)
{
    int i, j;
    char temp;

    for (i = 0, j = strlen(str) - 2; i <= j; ++i, --j) {
        temp = str[i];
        str[i] = str[j];
        str[j] = temp;
    }
}

// Copy length bytes from the tracee at addr into buf, one word at a time.
int getdata(pid_t pid, char *buf, long addr, long length)
{
    int i;
    union {
        long val;
        char chars[longsize];
    } value;

    for (i = 0; i < length / longsize + 1; i++) {
        value.val = ptrace(PTRACE_PEEKDATA, pid, (void *)(addr + i * longsize), NULL);
        memcpy(buf + i * longsize, value.chars, longsize);
    }
    buf[length] = '\0'; // terminate right after the data
    return 0;
}

// Copy buf back into the tracee at addr, one word at a time.
// The final word is written in full, including the bytes after length.
int putdata(pid_t pid, char *buf, long addr, long length)
{
    int i;
    union {
        long val;
        char chars[longsize];
    } value;

    for (i = 0; i < length / longsize + 1; i++) {
        memcpy(value.chars, buf + i * longsize, longsize);
        ptrace(PTRACE_POKEDATA, pid, (void *)(addr + i * longsize), value.val);
    }
    return 0;
}

int main()
{
    pid_t pid;
    int rv;
    long orig_rax;
    char *argv[] = {"/bin/cat", "flag", NULL};
    char *env[] = {NULL};
    char cmd[20] = "/bin/cat";
    long length;
    long addr;
    int insyscall = 0;
    struct user_regs_struct regs;

    pid = fork();
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        syscall(59, cmd, argv, env);
        exit(0);
    } else {
        while (1) {
            wait(&rv);
            if (WIFEXITED(rv)) {
                break;
            }
            orig_rax = ptrace(PTRACE_PEEKUSER, pid, 8 * ORIG_RAX, NULL);
            if (orig_rax == 1) {
                if (insyscall == 0) {
                    printf("Syscall number: %ld\n", orig_rax);
                    ptrace(PTRACE_GETREGS, pid, NULL, &regs);
                    printf("write called with 0x%llx, 0x%llx, 0x%llx\n",
                           regs.rdi, regs.rsi, regs.rdx);
                    addr = regs.rsi;
                    length = regs.rdx;
                    // only touch writes that fit into buffer
                    if (length > 0 && length + longsize < (long)sizeof(buffer)) {
                        getdata(pid, buffer, addr, length);
                        reverse(buffer);
                        putdata(pid, buffer, addr, length);
                    }
                    insyscall = 1;
                } else {
                    long rax = ptrace(PTRACE_PEEKUSER, pid, 8 * RAX, NULL);
                    printf("Write returned with %ld\n", rax);
                    insyscall = 0;
                }
            }
            ptrace(PTRACE_SYSCALL, pid, NULL, NULL);
        }
        printf("The child process exits\n");
    }
    return 0;
}
To output the reversed string, we use two more ptrace requests:
ptrace(PTRACE_PEEKDATA, pid, addr, NULL): Read one word at addr from the memory space of the tracee and return it.
ptrace(PTRACE_POKEDATA, pid, addr, value): Write the word value to addr in the memory space of the tracee.
The final result is given below:
$ ./code6
Syscall number: 1
write called with 0x1, 0x7f20cd3b4000, 0x1b
}EPACSE_A_SI_SIHT{OYKOGNAD
Write returned with 27
The child process exits
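One detail that code6.c glosses over is that PTRACE_PEEKDATA returns the word itself, so a return value of -1 can be either data or an error. The following is a minimal sketch of the usual errno check, assuming x86-64 Linux with glibc; the names marker, peek_word and poke_then_peek are hypothetical. It relies on the forked child having the variable marker at the same address as the parent.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical test variable: same address in parent and forked child. */
static long marker = 0x1111;

/* Read one word from the tracee. Returns 0 and stores the word in *out,
 * or -1 on error. errno is cleared first because -1 may be valid data. */
static int peek_word(pid_t pid, void *addr, long *out)
{
    long val;

    errno = 0;
    val = ptrace(PTRACE_PEEKDATA, pid, addr, NULL);
    if (val == -1 && errno != 0)
        return -1;
    *out = val;
    return 0;
}

/* Overwrite marker inside a stopped child and read it back from there.
 * Returns the word seen in the child, or -1 on failure. */
static long poke_then_peek(long new_value)
{
    int status;
    long seen = -1;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP); /* stop so the parent can access our memory */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    if (ptrace(PTRACE_POKEDATA, pid, &marker, (void *)new_value) == 0)
        peek_word(pid, &marker, &seen);
    kill(pid, SIGKILL); /* the child is no longer needed */
    waitpid(pid, &status, 0);
    return seen;
}
```

The write only affects the child's copy of the memory; the parent's marker keeps its original value.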
Seccomp Filter and seccomp-tools
In Section 2 of this post, I gave a simple example of seccomp_rule_add. However, seccomp_rule_add is more powerful than that example suggests: it can attach conditions on the arguments of one specific syscall. In particular, the programmer can choose under what condition the syscall should be killed.
First, we need to install seccomp-tools [7], a helper program that dumps the seccomp filter installed by a program.
//code7.c
//gcc code7.c -o code7 -lseccomp
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <sys/reg.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/prctl.h>     /* prctl */
#include <linux/seccomp.h> /* seccomp's constants */
#include <seccomp.h>
#include <stdlib.h>

#define longsize 8

char buffer[100];

int main()
{
    pid_t pid;
    int rv;
    long orig_rax;
    char *argv[] = {"/bin/cat", "flag", NULL};
    char *env[] = {NULL};
    char cmd[20] = "/bin/cat";
    long length;
    long addr;
    int insyscall = 0;
    struct user_regs_struct regs;
    scmp_filter_ctx ctx;

    ctx = seccomp_init(SCMP_ACT_ALLOW); // default action: allow
    // kill on write only if the third argument (the length) equals 60
    seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(write), 1,
                     SCMP_A2(SCMP_CMP_EQ, 60));
    seccomp_load(ctx);
    prctl(PR_SET_NO_NEW_PRIVS, 1);
    pid = fork();
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        syscall(59, cmd, argv, env);
    } else {
        while (1) {
            wait(&rv);
            if (WIFEXITED(rv)) {
                break;
            }
            orig_rax = ptrace(PTRACE_PEEKUSER, pid, 8 * ORIG_RAX, NULL);
            if (orig_rax == 1) {
                if (insyscall == 0) {
                    printf("Syscall number: %ld\n", orig_rax);
                    ptrace(PTRACE_GETREGS, pid, NULL, &regs);
                    printf("Write called with 0x%llx, 0x%llx, 0x%llx\n",
                           regs.rdi, regs.rsi, regs.rdx);
                    addr = regs.rsi;
                    length = regs.rdx;
                    insyscall = 1;
                } else {
                    long rax = ptrace(PTRACE_PEEKUSER, pid, 8 * RAX, NULL);
                    printf("\nWrite returned with %ld\n", rax);
                    insyscall = 0;
                }
            }
            ptrace(PTRACE_SYSCALL, pid, NULL, NULL);
        }
        printf("The child process exits\n");
    }
    return 0;
}
Let's use seccomp-tools to observe the filter:
$ seccomp-tools dump ./code7
 line  CODE  JT   JF      K
=================================
 0000: 0x20 0x00 0x00 0x00000004  A = arch
 0001: 0x15 0x00 0x08 0xc000003e  if (A != ARCH_X86_64) goto 0010
 0002: 0x20 0x00 0x00 0x00000000  A = sys_number
 0003: 0x35 0x06 0x00 0x40000000  if (A >= 0x40000000) goto 0010
 0004: 0x15 0x00 0x04 0x00000001  if (A != write) goto 0009
 0005: 0x20 0x00 0x00 0x00000024  A = args[2] >> 32
 0006: 0x15 0x00 0x02 0x00000000  if (A != 0x0) goto 0009
 0007: 0x20 0x00 0x00 0x00000020  A = args[2]
 0008: 0x15 0x01 0x00 0x0000003c  if (A == 0x3c) goto 0010
 0009: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0010: 0x06 0x00 0x00 0x00000000  return KILL
From the output of seccomp-tools, we can see that a sys_write call kills the process only if its third argument is 0x3c (60). All other syscalls, and writes with any other length, are allowed.
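The dump can be read as a classic BPF program over struct seccomp_data. The following is a minimal sketch, assuming x86-64 Linux, that writes the same eleven instructions by hand and installs them with prctl instead of libseccomp; the names write60_filter and write_len_outcome are hypothetical, and this is my transcription of the dump, not the code libseccomp itself generates.

```c
#include <fcntl.h>
#include <linux/audit.h>   /* AUDIT_ARCH_X86_64 */
#include <linux/filter.h>  /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h> /* struct seccomp_data, SECCOMP_RET_* */
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define FIELD(f) offsetof(struct seccomp_data, f)

/* Hand-written equivalent of the dump; the numbers match its line column. */
static struct sock_filter write60_filter[] = {
    /* 0000 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, FIELD(arch)),
    /* 0001 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 0, 8),
    /* 0002 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, FIELD(nr)),
    /* 0003 */ BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, 0x40000000, 6, 0),
    /* 0004 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 0, 4),
    /* 0005 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, FIELD(args[2]) + 4),
    /* 0006 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 0, 2),
    /* 0007 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, FIELD(args[2])),
    /* 0008 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0x3c, 1, 0),
    /* 0009 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    /* 0010 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
};

/* Hypothetical helper: a child installs the filter and writes len bytes
 * to /dev/null. Returns the terminating signal, 0 on a clean exit, or
 * -1 if the setup failed. */
static int write_len_outcome(size_t len)
{
    static char buf[128];
    struct sock_fprog prog = {
        .len = sizeof(write60_filter) / sizeof(write60_filter[0]),
        .filter = write60_filter,
    };
    int status;
    pid_t pid;

    if (len > sizeof(buf))
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open("/dev/null", O_WRONLY);
        /* An unprivileged process must set no_new_privs before loading. */
        if (fd < 0 || prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
            _exit(2);
        if (write(fd, buf, len) < 0)
            _exit(3);
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Under these assumptions, a 60-byte write ends the child with SIGSYS ("Bad system call"), while a write of any other length goes through. The two loads at 0005 and 0007 exist because BPF works on 32-bit words, so the 64-bit argument is compared in two halves.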
Now we can observe what happens if we constrain the third argument to 0x1b (27) instead, which is exactly the length of the write that prints the flag.
//code8.c
//gcc code8.c -o code8 -lseccomp
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <sys/reg.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/prctl.h>     /* prctl */
#include <linux/seccomp.h> /* seccomp's constants */
#include <seccomp.h>
#include <stdlib.h>

#define longsize 8

char buffer[100];

int main()
{
    pid_t pid;
    int rv;
    long orig_rax;
    char *argv[] = {"/bin/cat", "flag", NULL};
    char *env[] = {NULL};
    char cmd[20] = "/bin/cat";
    long length;
    long addr;
    int insyscall = 0;
    struct user_regs_struct regs;
    scmp_filter_ctx ctx;

    ctx = seccomp_init(SCMP_ACT_ALLOW); // default action: allow
    // kill on write only if the third argument (the length) equals 27
    seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(write), 1,
                     SCMP_A2(SCMP_CMP_EQ, 27));
    seccomp_load(ctx);
    prctl(PR_SET_NO_NEW_PRIVS, 1);
    pid = fork();
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        syscall(59, cmd, argv, env);
    } else {
        while (1) {
            wait(&rv);
            if (WIFEXITED(rv)) {
                break;
            }
            orig_rax = ptrace(PTRACE_PEEKUSER, pid, 8 * ORIG_RAX, NULL);
            if (orig_rax == 1) {
                if (insyscall == 0) {
                    printf("Syscall number: %ld\n", orig_rax);
                    ptrace(PTRACE_GETREGS, pid, NULL, &regs);
                    printf("Write called with 0x%llx, 0x%llx, 0x%llx\n",
                           regs.rdi, regs.rsi, regs.rdx);
                    addr = regs.rsi;
                    length = regs.rdx;
                    insyscall = 1;
                } else {
                    long rax = ptrace(PTRACE_PEEKUSER, pid, 8 * RAX, NULL);
                    printf("\nWrite returned with %ld\n", rax);
                    insyscall = 0;
                }
            }
            ptrace(PTRACE_SYSCALL, pid, NULL, NULL);
        }
        printf("The child process exits\n");
    }
    return 0;
}
This time the flag is not printed: cat is killed by the filter when it calls write with a length of 27. The tracer has to be interrupted with Ctrl-C, most likely because its loop only checks WIFEXITED and a child killed by a signal never reports a normal exit.
$ ./code8
Syscall number: 1
Write called with 0x1, 0x7fd637207000, 0x1b
^C
$ # And corresponding seccomp filter
$ seccomp-tools dump ./code8
 line  CODE  JT   JF      K
=================================
 0000: 0x20 0x00 0x00 0x00000004  A = arch
 0001: 0x15 0x00 0x08 0xc000003e  if (A != ARCH_X86_64) goto 0010
 0002: 0x20 0x00 0x00 0x00000000  A = sys_number
 0003: 0x35 0x06 0x00 0x40000000  if (A >= 0x40000000) goto 0010
 0004: 0x15 0x00 0x04 0x00000001  if (A != write) goto 0009
 0005: 0x20 0x00 0x00 0x00000024  A = args[2] >> 32
 0006: 0x15 0x00 0x02 0x00000000  if (A != 0x0) goto 0009
 0007: 0x20 0x00 0x00 0x00000020  A = args[2]
 0008: 0x15 0x01 0x00 0x0000001b  if (A == 0x1b) goto 0010
 0009: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0010: 0x06 0x00 0x00 0x00000000  return KILL
Seccomp Escape
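After introducing all the necessary background above, we can now put the pieces together to achieve a seccomp escape. The idea is straightforward: at the syscall-entry stop, the tracer modifies the argument register with ptrace so that the condition in the filter no longer matches. As far as I can tell, this relies on the kernel evaluating the filter after the tracer has had its chance to change the registers, which is the order used since Linux 4.8; on older kernels the filter runs before the ptrace stop and would still see the original value.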
//escape.c
//gcc escape.c -o escape -lseccomp
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <sys/reg.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <seccomp.h>
#include <stdlib.h>

#define longsize 8

int main()
{
    pid_t pid;
    int rv;
    long orig_rax;
    char *argv[] = {"/bin/cat", "flag", NULL};
    char *env[] = {NULL};
    char cmd[20] = "/bin/cat";
    long length;
    long addr;
    int insyscall = 0;
    struct user_regs_struct regs;
    scmp_filter_ctx ctx;

    ctx = seccomp_init(SCMP_ACT_ALLOW); // default action: allow
    seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(write), 1,
                     SCMP_A2(SCMP_CMP_EQ, 27));
    seccomp_load(ctx);
    prctl(PR_SET_NO_NEW_PRIVS, 1);
    pid = fork();
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        syscall(59, cmd, argv, env);
    } else {
        while (1) {
            wait(&rv);
            if (WIFEXITED(rv)) {
                break;
            }
            orig_rax = ptrace(PTRACE_PEEKUSER, pid, 8 * ORIG_RAX, NULL);
            if (orig_rax == 1) {
                if (insyscall == 0) {
                    printf("Syscall number: %ld\n", orig_rax);
                    ptrace(PTRACE_GETREGS, pid, NULL, &regs);
                    printf("Write called with 0x%llx, 0x%llx, 0x%llx\n",
                           regs.rdi, regs.rsi, regs.rdx);
                    addr = regs.rsi;
                    length = regs.rdx;
                    // shorten the forbidden 27-byte write to 26 bytes
                    if (regs.rdx == 27) {
                        regs.rdx = 26;
                    }
                    rv = ptrace(PTRACE_SETREGS, pid, NULL, &regs);
                    insyscall = 1;
                } else {
                    long rax = ptrace(PTRACE_PEEKUSER, pid, 8 * RAX, NULL);
                    printf("\nWrite returned with %ld\n", rax);
                    insyscall = 0;
                }
            }
            ptrace(PTRACE_SYSCALL, pid, NULL, NULL);
        }
        printf("The child process exits\n");
    }
    return 0;
}
The seccomp-tools dump and the final result are shown below. The tracer shortens the 27-byte write to 26 bytes, so the filter condition does not match; cat then sees a short write and issues a second, 1-byte write for the remaining newline.
//seccomp-tools dump
$ seccomp-tools dump ./escape
 line  CODE  JT   JF      K
=================================
 0000: 0x20 0x00 0x00 0x00000004  A = arch
 0001: 0x15 0x00 0x08 0xc000003e  if (A != ARCH_X86_64) goto 0010
 0002: 0x20 0x00 0x00 0x00000000  A = sys_number
 0003: 0x35 0x06 0x00 0x40000000  if (A >= 0x40000000) goto 0010
 0004: 0x15 0x00 0x04 0x00000001  if (A != write) goto 0009
 0005: 0x20 0x00 0x00 0x00000024  A = args[2] >> 32
 0006: 0x15 0x00 0x02 0x00000000  if (A != 0x0) goto 0009
 0007: 0x20 0x00 0x00 0x00000020  A = args[2]
 0008: 0x15 0x01 0x00 0x0000001b  if (A == 0x1b) goto 0010
 0009: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0010: 0x06 0x00 0x00 0x00000000  return KILL
//final result
$ ./escape
Syscall number: 1
Write called with 0x1, 0x7fa308699000, 0x1b
DANGOKYO{THIS_IS_A_ESCAPE}
Write returned with 26
Syscall number: 1
Write called with 0x1, 0x7fa30869901a, 0x1
Write returned with 1
The child process exits
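The register rewrite at the heart of escape.c can be tried in isolation, without any filter. The following is a minimal sketch, assuming x86-64 Linux with glibc; the function name traced_write_result and its parameters asked and forced are hypothetical. The child writes asked bytes to /dev/null and reports the return value of write through its exit status, while the tracer changes rdx to forced at the syscall-entry stop.

```c
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns what write() returned inside the child,
 * or -1 on failure. Requires forced < asked <= 64. */
static int traced_write_result(unsigned long asked, unsigned long forced)
{
    static char buf[64];
    struct user_regs_struct regs;
    int status;
    pid_t pid;

    if (asked > sizeof(buf) || forced >= asked)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open("/dev/null", O_WRONLY);
        if (fd < 0)
            _exit(255);
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP); /* let the tracer attach its options */
        _exit((int)write(fd, buf, asked));
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);

    for (;;) {
        if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) < 0)
            return -1;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WIFSIGNALED(status))
            return -1;
        if (WSTOPSIG(status) != (SIGTRAP | 0x80))
            continue; /* not a syscall stop */
        if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0)
            continue;
        /* Only the entry stop still has rdx == asked; after the rewrite
         * the exit stop shows rdx == forced and is left alone. */
        if (regs.orig_rax == SYS_write && regs.rdx == asked) {
            regs.rdx = forced;
            ptrace(PTRACE_SETREGS, pid, NULL, &regs);
        }
    }
}
```

Calling traced_write_result(27, 26) mirrors the case above: the child asks for 27 bytes but the kernel performs a 26-byte write, which matches the "Write returned with 26" line in the output of escape.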
Conclusion
In this post, I walked through the journey from syscall to seccomp escape, with plenty of sample code for newbies like me. For more details, please read through the reference list below.
Reference
[1] https://gist.github.com/thejh/8346f47e359adecd1d53
[2] https://blog.yadutaf.fr/2014/05/29/introduction-to-seccomp-bpf-linux-syscall-filter/
[3] https://www.alfonsobeato.net/c/filter-and-modify-system-calls-with-seccomp-and-ptrace/
[4] https://www.linuxjournal.com/article/6100
[5] http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/
[6] http://man7.org/linux/man-pages/man2/ptrace.2.html
[7] https://github.com/david942j/seccomp-tools/blob/master/README.md
[8] http://man7.org/linux/man-pages/man2/syscall.2.html
[9] http://man7.org/linux/man-pages/man3/seccomp_rule_add.3.html