@benjamingr
Last active November 13, 2023 08:40

Workers PR FAQ

This is a FAQ list of all the questions asked in the worker pull request.

The content here is mostly material written by Anna (@addaleax) and other people on the PR, copied here by me (@benjamingr). Big props to @AyushG3112 for a lot of these questions.

Corrections, improvements and suggestions from anyone are welcome.

Q: What is included in the PR?

A: This PR adds threading support to Node.js. This includes standardized shared memory and locks (mutexes).

Q: Worker? Thread?

A: At the moment a worker uses a single thread and each thread is a worker. We might enable an n:m model in the future rather than the current 1:1 model. The original PR makes it possible for a userland library to do this.

Q: Why do this? Why does the PR recommend it only for CPU intensive tasks?

A: The primary goal is offloading CPU-heavy work onto a different thread, not doing typical Node.js things. Generally, all libuv-based APIs are available, but what we really should avoid is encouraging users to spawn a number of threads and letting each do synchronous I/O work – that’s kind of antithetical to the whole idea of Node.js, I think. 😄

Node.js is arguably popular to begin with because "you rock when you don't block": you don't have to deal with threads for I/O (threads are hard), and instead you get async APIs that call back into your code.

Q: How do a worker's platform, V8 Isolate, Environment, and libuv loop relate to those of the main thread? Are they 1:1 or 1:many?

A: In the PR:

  • Platforms are process-global
  • Each thread corresponds 1:1 to a tuple of Isolate, IsolateData, Environment, and uv_loop_t

In general, it is conceivable to have multiple Environments per Isolate, or multiple Environments per uv_loop_t, but we don't implement that at this point.

Q: Execution Environment: Are the parent's environment (environ **) cloned into and available to workers?

A: Assuming you’re talking about environment variables here: They are inherently per-process (at least on UNIX systems), so Workers will have a read-only copy in order not to interfere with the main thread’s usage of them.
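As a rough illustration of the "copy" part, the sketch below sets a variable in the parent and reads it back inside a worker. It assumes the `worker_threads` module name that later shipped in Node.js (the FAQ's `worker` was a working name), passes the worker source as a string via `{ eval: true }` only to stay self-contained, and uses a made-up variable name `FAQ_DEMO_VAR`.

```javascript
const { Worker } = require('worker_threads');

// Sets `name` in the parent, then asks a worker what it sees for that name.
function readEnvInWorker(name, value) {
  process.env[name] = value; // set before the worker is created
  return new Promise((resolve, reject) => {
    const worker = new Worker(
      `const { parentPort, workerData } = require('worker_threads');
       parentPort.postMessage(process.env[workerData.name]);`,
      { eval: true, workerData: { name } }
    );
    worker.on('message', resolve);
    worker.on('error', reject);
  });
}

// FAQ_DEMO_VAR is a hypothetical variable used only for this example.
readEnvInWorker('FAQ_DEMO_VAR', 'from-parent').then((seen) => {
  console.log('worker sees:', seen);
});
```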

Q: Doesn't threading create new race conditions with *Sync APIs?

A: It is possible to create race conditions between threads with *Sync APIs, but that is no different from creating them with asynchronous APIs at the moment.

Q: What sort of objects can be shared between threads and how?

A: The communication between threads largely builds on the MessageChannel Web API. Transferring ArrayBuffers and sharing memory through SharedArrayBuffers is supported.
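The difference between transferring and sharing can be sketched with a single `MessageChannel`. This assumes the `worker_threads` module name that later shipped in Node.js; both ports stay on one thread here purely to keep the example short, and `sendBuffers` is a name invented for the illustration.

```javascript
const { MessageChannel } = require('worker_threads');

function sendBuffers() {
  const { port1, port2 } = new MessageChannel();
  const transferred = new ArrayBuffer(8);
  const shared = new SharedArrayBuffer(4);

  const received = new Promise((resolve) => {
    port2.on('message', (msg) => {
      // Write through the receiver's view of the shared memory.
      new Int32Array(msg.shared)[0] = 42;
      port2.close(); // closing one end closes the channel so the process can exit
      resolve(msg);
    });
  });

  // `transferred` is listed in the transfer list, so it moves to the receiver.
  // `shared` is a SharedArrayBuffer, so both sides keep access to it.
  port1.postMessage({ transferred, shared }, [transferred]);
  return { transferred, shared, received };
}

const demo = sendBuffers();
console.log('sender byteLength after transfer:', demo.transferred.byteLength);
demo.received.then(() => {
  console.log('sender sees shared value:', new Int32Array(demo.shared)[0]);
});
```

After the call, the sender's `ArrayBuffer` is detached, while a write made by the receiver into the `SharedArrayBuffer` is visible to the sender.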

Q: Do we have a syncing primitive like a mutex/semaphore available?

A: This does differ from a multi-process situation! :) This PR has support for SharedArrayBuffers, so we can use Atomics.wait() and Atomics.wake() (since renamed Atomics.notify() in the language specification) to implement real mutexes and other synchronization primitives in pure JS.
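A minimal sketch of such a mutex, assuming a one-word lock stored in a `SharedArrayBuffer` that every participating thread receives. It uses `Atomics.notify()`, the current name of `Atomics.wake()`. The `Mutex` class and its method names are invented for this illustration and are not part of the PR.

```javascript
const UNLOCKED = 0;
const LOCKED = 1;

class Mutex {
  // `sab` is a SharedArrayBuffer; the lock word is the 32-bit slot at `index`.
  constructor(sab, index = 0) {
    this.state = new Int32Array(sab);
    this.index = index;
  }

  // Attempts to take the lock without blocking; returns true on success.
  tryLock() {
    return Atomics.compareExchange(this.state, this.index, UNLOCKED, LOCKED) === UNLOCKED;
  }

  // Blocks the calling thread until the lock is acquired.
  lock() {
    while (!this.tryLock()) {
      // Sleeps while the word is still LOCKED; returns at once if it changed.
      Atomics.wait(this.state, this.index, LOCKED);
    }
  }

  unlock() {
    if (Atomics.compareExchange(this.state, this.index, LOCKED, UNLOCKED) !== LOCKED) {
      throw new Error('unlock() called on a mutex that is not locked');
    }
    Atomics.notify(this.state, this.index, 1); // wake one waiting thread
  }
}

const mutex = new Mutex(new SharedArrayBuffer(4));
mutex.lock();
console.log(mutex.tryLock()); // false: the lock is already held
mutex.unlock();
```

Each thread would construct its own `Mutex` over the same `SharedArrayBuffer`; the sketch makes no attempt at fairness or ownership tracking.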

Q: Can I synchronously block with this PR? Does it provide "real" mutexes?

A: Yes, that’s what Atomics.wait actually does. Not everybody is a fan of that, but it should enable these kinds of use cases and can be used with this implementation. :)

Q: Are globals shared across threads?

A: They are freshly set up for Workers. These are JS objects, so they live on a per-Isolate heap, which means that they cannot be shared with the current design of V8, and modifying one global object won’t affect others.
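A small sketch of what "freshly set up" means in practice, assuming the `worker_threads` module name that later shipped in Node.js. The global `faqCounter` is a hypothetical name, and the worker source is passed with `{ eval: true }` only to keep the example in one piece.

```javascript
const { Worker } = require('worker_threads');

globalThis.faqCounter = 1; // hypothetical global, defined on this thread only

// Asks a new worker for `typeof globalThis[name]` as seen from its own heap.
function typeofGlobalInWorker(name) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(
      `const { parentPort, workerData } = require('worker_threads');
       parentPort.postMessage(typeof globalThis[workerData.name]);`,
      { eval: true, workerData: { name } }
    );
    worker.on('message', resolve);
    worker.on('error', reject);
  });
}

typeofGlobalInWorker('faqCounter').then((type) => {
  console.log('in worker:', type, '| in parent:', typeof globalThis.faqCounter);
});
```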

Q: Would it be possible to share a server socket between threads, similar to what cluster does?

A: Yes, although it might be good to get explicit libuv support for transferring handles between event loops first (as opposed to piggybacking on IPC mechanisms like we do for child processes). It’s not hard to do, but definitely out of scope for the initial PR.

Q: What happens when two threads try listening to the same port?

A: Same thing that happens when two processes start listening to the same port – we get an exception.
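A hedged sketch of that situation: the parent listens on an ephemeral port and a worker then tries to listen on the same one. It assumes the `worker_threads` module name that later shipped in Node.js and a platform where the second bind is rejected; in Node.js the failure surfaces as an `'error'` event on the server, which would become an uncaught exception if nobody listened for it. `listenTwice` is a name made up for the example.

```javascript
const net = require('net');
const { Worker } = require('worker_threads');

// Resolves with what the worker observed: an error code or 'listening'.
function listenTwice() {
  return new Promise((resolve, reject) => {
    const server = net.createServer();
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      const worker = new Worker(
        `const { parentPort, workerData } = require('worker_threads');
         const net = require('net');
         const s = net.createServer();
         s.on('error', (err) => parentPort.postMessage(err.code));
         s.listen(workerData.port, '127.0.0.1', () => {
           s.close();
           parentPort.postMessage('listening');
         });`,
        { eval: true, workerData: { port } }
      );
      worker.on('message', (result) => {
        server.close(); // release the port so the process can exit
        resolve(result);
      });
      worker.on('error', reject);
    });
  });
}

listenTwice().then((result) => console.log('worker result:', result));
```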

Q: What is the module going to be called?

A: Suggestions welcome. The working name has been worker, but not everyone likes it, and the owner of that package is not interested in collaborating with Node.js on it or giving us the name. There is ongoing discussion.

Q: Where are the code and discussions?

A: Here are the code changes.

Q: Can I check if a thread is the main thread?

A: You can use require('worker').isMainThread.
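For illustration, the flag can be read on both sides. The sketch assumes the module name `worker_threads`, under which the feature later shipped (the `worker` name above was the working name at the time), and assumes the script itself is started on the main thread.

```javascript
const { Worker, isMainThread } = require('worker_threads');

// Starts a worker that reports its own value of isMainThread.
function isMainThreadInWorker() {
  return new Promise((resolve, reject) => {
    const worker = new Worker(
      `const { parentPort, isMainThread } = require('worker_threads');
       parentPort.postMessage(isMainThread);`,
      { eval: true }
    );
    worker.on('message', resolve);
    worker.on('error', reject);
  });
}

console.log('on this thread:', isMainThread);
isMainThreadInWorker().then((value) => console.log('inside a worker:', value));
```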

Q: Why can't I pass relative paths or functions to workers?

A: At the moment workers require an absolute path. You can use __filename and __dirname to build one from a path relative to the current module. This is not a hard limitation, but a choice made to limit the scope of the PR.

Other options will be evaluated later.

Q: How do uncaught exceptions work in workers?

A: Currently it only stops the worker thread and emits an error event on the Worker object in the main thread. That does stop the main thread if it’s unhandled, though.

(See test/parallel/test-worker-uncaught-exception.js for a test for this behaviour – if an exception in the worker does stop its parent, even with an event listener, that’s a bug, so please let me know!) :)
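A rough sketch of handling that `'error'` event in the parent, assuming the `worker_threads` module name that later shipped in Node.js. The worker source is passed with `{ eval: true }` to keep the example self-contained, and `runFailingWorker` is an invented name.

```javascript
const { Worker } = require('worker_threads');

// Runs a worker that throws, and collects what the parent observes.
function runFailingWorker() {
  return new Promise((resolve) => {
    const worker = new Worker(`throw new Error('boom');`, { eval: true });
    const seen = {};
    // With this listener attached, the parent keeps running.
    worker.on('error', (err) => { seen.message = err.message; });
    worker.on('exit', (exitCode) => resolve({ ...seen, exitCode }));
  });
}

runFailingWorker().then((result) => {
  console.log('parent is still alive; worker reported:', result);
});
```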

Q: Do native addons work? Cluster? child_processes?

A: Native addons do not work yet, but support is planned for the future. Launching child processes and clusters from workers is possible.

Q: Messaging: Is it between worker and main? or between workers too? is it a broadcast channel, or peer-to-peer?

A: The MessageChannel API on which this builds follows a 1:1 model, so no broadcasting is possible. You start out with a channel between the parent and the child thread, but since you can create new channels and transfer them along existing channels, you can set up worker-to-worker message passing if that’s what you want.
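Worker-to-worker messaging along those lines might look like the sketch below: the parent creates a channel and transfers one port to each worker. It assumes the `worker_threads` module name that later shipped in Node.js; the worker sources and the `relay` function are invented for the example.

```javascript
const { Worker, MessageChannel } = require('worker_threads');

// Worker A: waits for a port from the parent, then sends a greeting over it.
const senderSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', ({ port }) => {
    port.postMessage('hello from sender');
  });`;

// Worker B: waits for a port, then forwards whatever arrives to the parent.
const receiverSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', ({ port }) => {
    port.on('message', (text) => parentPort.postMessage(text));
  });`;

function relay() {
  return new Promise((resolve, reject) => {
    const sender = new Worker(senderSource, { eval: true });
    const receiver = new Worker(receiverSource, { eval: true });
    const { port1, port2 } = new MessageChannel();

    // Each port is transferred, so the parent no longer sits in the middle.
    sender.postMessage({ port: port1 }, [port1]);
    receiver.postMessage({ port: port2 }, [port2]);

    receiver.on('message', (text) => {
      sender.terminate();
      receiver.terminate();
      resolve(text);
    });
    sender.on('error', reject);
    receiver.on('error', reject);
  });
}

relay().then((text) => console.log('receiver got:', text));
```

The parent only brokers the connection; once the ports are handed over, the two workers talk to each other directly.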

Q: Can workers listen to signals?

A: No, that’s not by accident, it’s explicitly disabled, because that’s per-process information. We can modify this if you think it’s a good idea, though.

Q: Does the node inspector work? Can I use the Chrome devtools to debug workers?

A: Not initially, but it's a todo and a high priority. Help is appreciated.

Q: Resources: What native attributes of the worker thread are exposed at the moment that can be tuned?

A: At the moment: Nothing. :) addaleax/node@9a72555 has some ideas for limiting the heap size, but it doesn’t seem to me like V8’s API allows good error handling at this point, so I didn’t include that in this PR. This is something I’d eventually want to have, though.

The usable stack size is currently a process-global option for V8, so I don’t think there’s any point in limiting it at this point.

Q: How does this work with async_hooks?

A: Async hooks work like they do right now, with the difference that due to the additional built-in objects some async IDs will be different (e.g. execution ID of the main script). If we accept that users shouldn’t rely on the exact values anyway, that’s not an issue.

See the last commit in this PR re: tests – sadly, a number of async_hooks tests have to be skipped in Workers right now, because of this over-reliance on details, but in general async_hooks tests are run just like all others.

Q: What does process.memoryUsage() report? I expect it reports per worker or are the heaps accumulated?

A: rss is per-process, the other properties are per-Isolate/per-Worker. I’ve added a note on that in the docs. :)

Q: Since AsyncHooks are independent in each worker, is there anything available to correlate communication between workers? Or is it up to the user to transfer some metadata between workers and create AsyncResources and use them as needed?

A: Yes, each worker has an independent set of AsyncHooks. At this point, it would be necessary to communicate such information manually (e.g. through another MessageChannel).

We could implement a builtin utility for that, but it will definitely be less powerful because it has to work asynchronously, so it can only read basic information from the Worker side rather than being able to inspect objects etc.

Q: Are there any plans to prevent someone from creating, e.g., HTTP servers in several workers?

A: No, and I personally don’t believe that it’s a good idea to artificially restrict users from doing something when there’s no technical reason to do so.

We should warn users about making I/O code synchronous and offloading it to Workers, though – that will most likely not help anyone.

Q: What is the difference between this and child_process/cluster?

A: Workers are conceptually very similar to child_process and cluster. Some of the key differences are:

  • Communication between Workers is different: Unlike child_process IPC, we don’t use JSON, but rather do the same thing that postMessage() does in browsers.
    • This isn’t necessarily faster, although it can be and there might be more room for optimization. (Keep in mind how long JSON has been around and how much work has therefore been put into making it fast.)
    • The serialized data doesn’t actually need to leave the process, so overall there’s less overhead in communication involved.
    • Memory in the form of typed arrays can be transferred or shared between Workers and/or the main thread, which enables really fast communication for specific use cases.
    • Handles, like network sockets, can not be transferred or shared (yet).
  • There are some limitations on the usable API within workers: parts of it (e.g. process.chdir()) affect per-process state, native addons cannot be loaded yet, etc.
  • Each worker has its own event loop, but some resources are shared between workers (e.g. the libuv thread pool for file system work).
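To make the first difference concrete, the sketch below sends the same object through `JSON` and through a `MessageChannel`, which uses the same structured-clone style of serialization as `postMessage()` in browsers. It assumes the `worker_threads` module name that later shipped in Node.js; `roundTrip` is a name invented for the example, and both ports stay on one thread for brevity.

```javascript
const { MessageChannel } = require('worker_threads');

// Posts `value` through a channel and resolves with the copy that arrives.
function roundTrip(value) {
  const { port1, port2 } = new MessageChannel();
  return new Promise((resolve) => {
    port2.on('message', (copy) => {
      port2.close(); // lets the process exit once the demo is done
      resolve(copy);
    });
    port1.postMessage(value);
  });
}

const original = { when: new Date(0), tags: new Map([['a', 1]]) };

// JSON flattens the Date to a string and loses the Map's entries.
const viaJson = JSON.parse(JSON.stringify(original));
console.log(typeof viaJson.when, Object.keys(viaJson.tags).length);

// The message channel preserves both types.
roundTrip(original).then((copy) => {
  console.log(copy.when instanceof Date, copy.tags instanceof Map);
});
```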

Xotabu4 commented Feb 16, 2019

Super nice! Thanks!
