@mrdomino

mrdomino/a.md Secret

Last active December 19, 2023 20:23
cosmopolitan argv / exec

Cosmopolitan pathfinding and argv handling

Context

https://github.com/jart/cosmopolitan/wiki/FAQ#my-program-isnt-behaving-the-way-i-expect

Goals

  1. Have cosmopolitan binaries be able to use argv[0] normally, e.g. for a shell to determine whether it is a login shell by checking *argv[0] == '-', or for a busybox-type binary to have the command to run passed as argv[0], without necessarily having to have a link to it with that name.
  2. Have binaries be generally able to locate the file they were launched as, in order to read /zip.
  3. Support reasonable set-id cases, namely when the binary is assimilated, or when the platform implements secure set-id shebangs via /dev/fd (which OpenBSD, NetBSD, and macOS can do, according to this).
  4. Try not to introduce regressions in terms of either program runtime or binary size, especially in the loader.
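To make goal 1 concrete, here is a minimal sketch of the two argv[0] conventions it mentions. The function names `is_login_shell` and `applet_name` are hypothetical and exist only for illustration; they are not part of cosmopolitan.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True when argv[0] starts with '-', the convention used to mark a
   login shell (e.g. "-bash"). */
bool is_login_shell(const char *argv0) {
  assert(argv0 != NULL);
  return argv0[0] == '-';
}

/* Returns the name a busybox-type binary would dispatch on: argv[0]
   with any leading '-' skipped, then reduced to its last path
   component. The result points into argv0; nothing is allocated. */
const char *applet_name(const char *argv0) {
  assert(argv0 != NULL);
  if (argv0[0] == '-') ++argv0;
  const char *slash = strrchr(argv0, '/');
  return slash ? slash + 1 : argv0;
}
```

Both checks only work if argv[0] reaches the program as the caller set it, which is why a clobbered argv[0] breaks them.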

Non-goals

  • We don't care to sanitize the path that the loader is given. We needn't even prepend getcwd. If we are invoked set-id, it is the kernel's job to have done this. Otherwise, it is the user's prerogative to structure the paths as they please.

Obstacles

  • __sys_execve (and therefore execve, whenever __sys_execve succeeds on the first try) clobbers argv[0]. As far as I can tell, the only time argv[0] is preserved across exec is when __sys_execve fails and the ape fallback code is called.
  • The loader's broken realpath logic prevents FreeBSD and NetBSD from receiving the program executable name.
  • GetProgramExecutableName needs to match what the loader does.

Commentary on proposals

The four proposals that follow are mostly independent of each other and can each be taken or left on their own (although the first should probably be taken.) Proposal three is the most speculative.

My preference would be: do 1 (sanity) immediately. If we're going to do 2a (set-id), then do that alongside it since they go hand-in-hand. Probably just forget about 2b. Do 3 (cosmo_execve) some time later. Do not do 4, as assimilation is just better whenever anything like it is needed.

Proposals

1. Immediate damage repair / sanity

  1. Roll back all realpath and getcwd code in the loaders, and pass the (possibly relative) resolved program path as x2/%rdx on all platforms.
  2. Rework GetProgramExecutableName so it does the same thing for each of the platform-generic options, and add __program_executable_name as the top priority among those options (the platform-specific methods are still preferred over all of them). Specifically, for each of __program_executable_name, argv[0], and _, in that order (with COSMOPOLITAN_PROGRAM_EXECUTABLE added last if compatibility is wanted with loaders that were never officially minted):
    1. Try to sys_faccessat it from AT_FDCWD.
    2. If that succeeds, prepend getcwd if it is relative and use it.
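The lookup loop in step 2 could look roughly like the following. This is a sketch under assumptions, not the actual GetProgramExecutableName: it uses the POSIX `faccessat` in place of cosmopolitan's `sys_faccessat`, the helper name `pick_executable_name` is hypothetical, and the caller is assumed to pass the candidates already in priority order.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>   /* AT_FDCWD */
#include <limits.h>  /* PATH_MAX */
#include <stdio.h>
#include <string.h>
#include <unistd.h>  /* faccessat, getcwd, F_OK */

/* Hypothetical helper. Walks cands[0..n) in order and writes the first
   one that exists into out, prefixed with the current directory when it
   is relative. Returns 0 on success, or -1 with out set to "" when no
   candidate is usable. */
int pick_executable_name(const char *const *cands, size_t n,
                         char *out, size_t outsz) {
  assert(out != NULL && outsz > 0);
  for (size_t i = 0; i < n; ++i) {
    const char *p = cands[i];
    if (!p || !*p) continue;                            /* unset */
    if (faccessat(AT_FDCWD, p, F_OK, 0) != 0) continue; /* missing */
    if (p[0] == '/') {                                  /* absolute */
      if (strlen(p) >= outsz) continue;
      strcpy(out, p);
      return 0;
    }
    char cwd[PATH_MAX];
    if (!getcwd(cwd, sizeof cwd)) continue;
    /* Avoid a doubled slash when the current directory is "/". */
    const char *base = strcmp(cwd, "/") == 0 ? "" : cwd;
    int len = snprintf(out, outsz, "%s/%s", base, p);
    if (len < 0 || (size_t)len >= outsz) continue;
    return 0;
  }
  out[0] = '\0';
  return -1;
}
```

Note that the path is only prefixed, never normalized, in keeping with the non-goal of not sanitizing what the loader was given.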

2a. Set-id security

Nothing further is needed on the loader side for this.

As a proposed modification to GetProgramExecutableName: if issetugid and __program_executable_name looks secure (i.e. it is /dev/fd/[n] on a platform that implements that), use it as the top choice. If issetugid and __program_executable_name is set to something that does not look secure, use the empty string instead. The reason is that the platform-specific methods all return the name of the loader rather than the name of the binary, and all the other methods are vulnerable to TOCTOU between the loader and the binary (as well as between the kernel and the loader).
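A rough sketch of that decision follows. The names `looks_like_dev_fd` and `setid_executable_name` are hypothetical, the issetugid result is passed in as a flag so the logic can be read in isolation, and the check for whether the platform actually implements /dev/fd is left out. Treating an unset name as "fall through to the ordinary lookup" (the assimilated case) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True for "/dev/fd/N" where N is one or more decimal digits. */
bool looks_like_dev_fd(const char *path) {
  assert(path != NULL);
  if (strncmp(path, "/dev/fd/", 8) != 0) return false;
  const char *p = path + 8;
  if (*p == '\0') return false;  /* no descriptor number at all */
  for (; *p; ++p)
    if (*p < '0' || *p > '9') return false;
  return true;
}

/* Returns the name to use when running set-id, or NULL to fall through
   to the ordinary lookup. loader_name stands in for
   __program_executable_name as handed over by the loader. */
const char *setid_executable_name(bool is_setugid, const char *loader_name) {
  if (!is_setugid || !loader_name) return NULL;
  if (looks_like_dev_fd(loader_name)) return loader_name;
  return "";  /* set, but not trustworthy: report no name */
}
```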

2b. Dubious further set-id defensiveness

As a questionable further proposed modification, some time very early, if issetugid and __program_executable_name is non-null and set to something that does not look secure, then halt, melt, and catch fire. Decided not to do this. Assuring the sanity of the particular set-id interpreter script setup being used is not our job.

3. cosmo_execve extension

Implement a cosmo_execve that is like sys_execve except it checks for an ape binary before the kernel call. (So pull the ape-specific code out into __ape_execve, and then sys_execve is __sys_execve(); __ape_execve(); whereas cosmo_execve is __ape_execve(); __sys_execve();.) In places where argv[0] correctness trumps performance concerns (e.g. bash and zsh in superconfigure), patch the program to use cosmo_execve.
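The difference between the two orderings can be shown with stand-in stubs. Nothing below calls the real `__sys_execve` or `__ape_execve`: the stubs only record which path was tried, a return of 0 stands for "the process would have been replaced", and falling back on ENOEXEC is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static char g_trace[8];        /* order in which the stubs ran */
static int g_ntrace;
static bool g_is_ape;          /* pretend: the target is an ape binary */
static bool g_kernel_accepts;  /* pretend: the kernel can run it directly */

static void trace(char c) {
  assert(g_ntrace < (int)sizeof g_trace - 1);
  g_trace[g_ntrace++] = c;
  g_trace[g_ntrace] = '\0';
}

void reset_trace(void) { g_ntrace = 0; g_trace[0] = '\0'; }

/* Stand-in for __sys_execve: the plain kernel call. */
static int stub_sys_execve(const char *prog) {
  (void)prog;
  trace('s');
  if (!g_kernel_accepts) { errno = ENOEXEC; return -1; }
  return 0;
}

/* Stand-in for __ape_execve: the ape-specific launch path. */
static int stub_ape_execve(const char *prog) {
  (void)prog;
  trace('a');
  if (!g_is_ape) { errno = ENOEXEC; return -1; }
  return 0;
}

/* Current order: kernel first, ape path only as a fallback. */
int sketch_sys_execve(const char *prog) {
  int rc = stub_sys_execve(prog);
  if (rc == -1 && errno == ENOEXEC) rc = stub_ape_execve(prog);
  return rc;
}

/* Proposed order: ape path first, kernel as the fallback. */
int sketch_cosmo_execve(const char *prog) {
  int rc = stub_ape_execve(prog);
  if (rc == -1 && errno == ENOEXEC) rc = stub_sys_execve(prog);
  return rc;
}
```

With an ape target that the kernel also accepts, the first ordering never reaches the ape path (the case where argv[0] gets clobbered), while the second always does, at the cost of an extra check for every non-ape target.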

3b. Alternatives to cosmo_execve

Another interesting option that would get the usage benefits without the generalized performance hit would be to make ape binaries always fail if they are passed through __sys_execve from a running ape binary; this is what happens on my Apple Silicon machine, and it actually leads to very nice behavior in that argv[0] is not clobbered by exec. As a discussion-starter, one way to do this would be to set e.g. "APE_YOUDIENOW=SYS_EXECVE" in the initial __sys_execve call, but that particular approach has too many problems to be reasonable. The failure would have to happen in __sys_execve itself, before control transitions to the child process, so it's difficult to imagine how this could be done without breaking other things.

4. $prog.ape in the SYSV loader

Just to bring this up one more time: while assimilation ought to be the preferred option for using cosmo binaries as login shells, it would be nice to be able to keep a single binary with a heavily customized /zip on e.g. an NFS volume and use that as a shell everywhere, and the only way I can imagine to do this is the $prog.ape hack. But I admit I may just not be used to the possibility of assimilation. In any event, I lightly suggest using some of the space savings from reverting RealPath to do this.
