This is a FAQ collecting the questions asked in the worker pull request.
The content here is mostly material written by Anna (@addaleax) and other people on the PR, copied here by me (@benjamingr). Big props to @AyushG3112 for a lot of these questions.
Corrections, improvements and suggestions from anyone are welcome.
A: This PR adds threading support to Node.js. This includes standardized shared memory and locks (mutexes).
A: At the moment a worker uses a single thread and each thread is a worker. We might enable an n:m model in the future rather than the 1:1 model. The original PR creates the capability for a userland library to do this.
A: The primary goal is offloading CPU-heavy work onto a different thread, not doing typical Node.js things. Generally, all libuv-based APIs are available, but what we really should avoid is encouraging users to spawn a number of threads and letting each do synchronous I/O work – that’s kind of antithetical to the whole idea of Node.js, I think. 😄
Node.js is arguably popular in the first place because "you rock when you don't block": you don't have to deal with threads for I/O (threads are hard) and instead get async APIs that call back into your code.
Q: What is the association of a worker's platform, V8 Isolate, V8 Environment, and libuv with those of the main thread? Are they 1:1 or 1:many?
A: In the PR:
- Platforms are process-global
- Each thread corresponds 1:1 to a tuple of Isolate, IsolateData, Environment, and uv_loop_t
In general, it is conceivable to have multiple Environments per Isolate, or multiple Environments per uv_loop_t, but we don't implement that at this point.
Q: Execution environment: Is the parent's environment (environ **) cloned into and available to workers?
A: Assuming you’re talking about environment variables here: They are inherently per-process (at least on UNIX systems), so Workers will have a read-only copy in order not to interfere with the main thread’s usage of them.
A: It is possible to create race conditions between threads with *Sync APIs, but that is no different from creating them with asynchronous APIs at the moment.
A: The communication between threads largely builds on the MessageChannel Web API. Transferring ArrayBuffers and sharing memory through SharedArrayBuffers is supported.
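
As a rough illustration of transferring an ArrayBuffer over a MessageChannel, here is a minimal sketch. It assumes the module is reachable as worker_threads (the name under which the feature later shipped; the PR's working name was worker) and keeps both ports in one thread so the effect of the transfer is easy to see. receiveMessageOnPort() is used only to read the message synchronously for the demonstration.

```javascript
// Assumption: module name 'worker_threads' (the PR's working name was 'worker').
const { MessageChannel, receiveMessageOnPort } = require('worker_threads');

const { port1, port2 } = new MessageChannel();

// A 1 KiB buffer filled with a recognizable value.
const buffer = new ArrayBuffer(1024);
new Uint8Array(buffer).fill(7);

// Listing the buffer in the transfer list moves it instead of copying it.
port1.postMessage({ payload: buffer }, [buffer]);

// After the transfer the sender's handle is detached, so its length is 0.
const senderLengthAfterTransfer = buffer.byteLength;

// receiveMessageOnPort() reads one queued message synchronously; real code
// would normally listen with port2.on('message', ...) instead.
const received = receiveMessageOnPort(port2).message;
const receivedLength = received.payload.byteLength;

port1.close();
```

A SharedArrayBuffer, by contrast, would stay usable on both sides after being posted, since it is shared rather than moved.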
A: This does differ from a multi-process situation! :) This PR has support for SharedArrayBuffers, so we can use Atomics.wait() and Atomics.wake() to implement real mutexes and other synchronization primitives in pure JS.
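
A minimal sketch of such a mutex, under a few assumptions: the class name Mutex and its methods are hypothetical, the lock state lives in one Int32 slot of a SharedArrayBuffer that each thread would receive via postMessage(), and Atomics.notify() is used because Atomics.wake() was later renamed to that in the language standard. The usage at the bottom runs in a single thread, so it never actually blocks.

```javascript
// Lock states stored in one Int32 slot of shared memory.
const UNLOCKED = 0;
const LOCKED = 1;

// Hypothetical helper class, not part of the PR.
class Mutex {
  // `sab` is a SharedArrayBuffer that every participating thread receives.
  constructor(sab = new SharedArrayBuffer(4)) {
    this.sab = sab;
    this.state = new Int32Array(sab);
  }

  // Attempts to take the lock without blocking; returns true on success.
  tryLock() {
    return Atomics.compareExchange(this.state, 0, UNLOCKED, LOCKED) === UNLOCKED;
  }

  // Blocks the calling thread until the lock is acquired.
  lock() {
    while (!this.tryLock()) {
      // Sleeps only while the slot still reads LOCKED.
      Atomics.wait(this.state, 0, LOCKED);
    }
  }

  // Releases the lock and wakes one waiting thread, if any.
  unlock() {
    Atomics.store(this.state, 0, UNLOCKED);
    Atomics.notify(this.state, 0, 1);
  }
}

// Uncontended usage in a single thread.
const mutex = new Mutex();
mutex.lock();
const secondAttempt = mutex.tryLock(); // lock is already held
mutex.unlock();
const thirdAttempt = mutex.tryLock(); // lock is free again
mutex.unlock();
```

This is the simplest two-state design; it may wake a waiter that then loses the race for the lock, which is harmless but not optimal.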
A: Yes, that’s what Atomics.wait actually does. Not everybody is a fan of that, but it should enable these kinds of use cases and can be used with this implementation. :)
A: They are freshly set up for Workers. These are JS objects, so they live on a per-Isolate heap, which means that they cannot be shared with the current design of V8, and modifying one global object won’t affect others.
A: Yes, although it might be good to get explicit libuv support for transferring handles between event loops first (as opposed to piggybacking on IPC mechanisms like we do for child processes). It’s not hard to do, but definitely out of scope for the initial PR.
A: Same thing that happens when two processes start listening to the same port – we get an exception.
A: Suggestions welcome. The working name has been worker, but not everyone likes it, and the owner of that package is not interested in collaborating with Node.js on it or giving us the name. There is ongoing discussion.
A: You can use require('worker').isMainThread.
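
For illustration, a tiny sketch of branching on that flag. It assumes the module name worker_threads, under which the feature later shipped; describeThread is a hypothetical helper.

```javascript
// Assumption: module name 'worker_threads' (the PR's working name was 'worker').
const { isMainThread } = require('worker_threads');

// Hypothetical helper that names the kind of thread the code runs on.
function describeThread() {
  return isMainThread ? 'main thread' : 'worker thread';
}

console.log(`Running on the ${describeThread()}`);
```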
A: At the moment workers require an absolute path; you can use __filename and __dirname to turn a relative path into an absolute one. This is not a hard limitation, but a choice made to limit the scope of the PR.
Other options will be evaluated later.
A: Currently it only stops the worker thread and emits an error event on the Worker object in the main thread. That does stop the main thread if it’s unhandled, though.
(See test/parallel/test-worker-uncaught-exception.js for a test for this behaviour – if an exception in the worker does stop its parent, even with an event listener, that’s a bug, so please let me know!) :)
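
The behaviour described above can be observed with a small sketch. It assumes the module name worker_threads and its eval option (which runs a source string instead of a file, and is not something the PR text above mentions); captureWorkerError is a hypothetical helper.

```javascript
// Assumption: module name 'worker_threads' (the PR's working name was 'worker').
const { Worker } = require('worker_threads');

// Hypothetical helper: runs `source` in a worker and resolves with the error
// that the worker's uncaught exception produces on the parent side.
function captureWorkerError(source) {
  return new Promise((resolve, reject) => {
    // `eval: true` treats the first argument as code instead of a path.
    const worker = new Worker(source, { eval: true });
    worker.once('error', resolve);
    // Only reached first if the worker ends without throwing.
    worker.once('exit', (code) => {
      reject(new Error(`worker exited with code ${code} without an error`));
    });
  });
}

// Because an 'error' listener is attached, the main thread keeps running.
captureWorkerError("throw new Error('boom')").then((err) => {
  console.log(`worker failed: ${err.message}`);
});
```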
A: Not yet, but it is planned for the future. Launching child processes and clusters from workers is possible.
Q: Messaging: Is it between worker and main? or between workers too? is it a broadcast channel, or peer-to-peer?
A: The MessageChannel API on top of which this is building follows a 1:1 model, so no broadcasting is possible.
You start out with a channel between the parent and the child thread, but since you can create new channels and transfer them along existing channels, you can set up worker-to-worker message passing if that’s what you want.
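
A sketch of that worker-to-worker setup, assuming the module name worker_threads and its eval option so that the example fits in one file. The role names, the message shape and connectSiblings are hypothetical. The parent creates a fresh channel and transfers one port to each worker over the channel it already has with them.

```javascript
// Assumption: module name 'worker_threads' (the PR's working name was 'worker').
const { Worker, MessageChannel } = require('worker_threads');

// Worker source: waits for a port from the parent, then either sends a
// greeting over it or forwards what arrives on it back to the parent.
const workerSource = `
  const { parentPort } = require('worker_threads');
  parentPort.once('message', ({ port, role }) => {
    if (role === 'sender') {
      port.postMessage('hello from sibling');
    } else {
      port.once('message', (text) => {
        parentPort.postMessage(text);
        port.close();
      });
    }
  });
`;

// Hypothetical helper: resolves with the text the receiver got from the sender.
function connectSiblings() {
  return new Promise((resolve, reject) => {
    const sender = new Worker(workerSource, { eval: true });
    const receiver = new Worker(workerSource, { eval: true });
    const { port1, port2 } = new MessageChannel();

    receiver.once('message', resolve);
    sender.once('error', reject);
    receiver.once('error', reject);

    // Each port is moved to its worker through the transfer list.
    receiver.postMessage({ port: port2, role: 'receiver' }, [port2]);
    sender.postMessage({ port: port1, role: 'sender' }, [port1]);
  });
}

connectSiblings().then((text) => {
  console.log(`receiver got: ${text}`);
});
```

After the transfer the parent no longer takes part in the exchange; the two workers talk over their own channel.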
A: No, that’s not by accident, it’s explicitly disabled, because that’s per-process information. We can modify this if you think it’s a good idea, though.
A: Not initially, but it's a todo and a high priority. Help appreciated.
Q: Resources: What native attributes of the worker thread is exposed at the moment, that can be tuned?
A: At the moment: Nothing. :) addaleax/node@9a72555 has some ideas for limiting the heap size, but it doesn’t seem to me like V8’s API allows good error handling at this point, so I didn’t include that in this PR. This is something I’d eventually want to have, though.
The usable stack size is currently a process-global option for V8, so I don’t think there’s any point in limiting it at this point.
A: Async hooks work like they do right now, with the difference that due to the additional built-in objects some async IDs will be different (e.g. execution ID of the main script). If we accept that users shouldn’t rely on the exact values anyway, that’s not an issue.
See the last commit in this PR re: tests – sadly, a number of async_hooks tests have to be skipped in Workers right now, because of this over-reliance on details, but in general async_hooks tests are run just like all others.
Q: What does process.memoryUsage() report? I expect it reports per worker or are the heaps accumulated?
A: rss is per-process, the other properties are per-Isolate/per-Worker. I’ve added a note on that in the docs. :)
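
To compare the numbers side by side, a sketch along these lines could be used. It assumes the module name worker_threads and its eval option; workerMemoryUsage is a hypothetical helper. No expected values are shown because they vary from run to run.

```javascript
// Assumption: module name 'worker_threads' (the PR's working name was 'worker').
const { Worker } = require('worker_threads');

// Hypothetical helper: asks a short-lived worker for its process.memoryUsage().
function workerMemoryUsage() {
  return new Promise((resolve, reject) => {
    const worker = new Worker(
      "require('worker_threads').parentPort.postMessage(process.memoryUsage())",
      { eval: true }
    );
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

workerMemoryUsage().then((fromWorker) => {
  const fromMain = process.memoryUsage();
  // rss describes the whole process, so both threads report similar values.
  console.log('rss:', fromMain.rss, fromWorker.rss);
  // Heap figures describe each thread's own Isolate and usually differ.
  console.log('heapUsed:', fromMain.heapUsed, fromWorker.heapUsed);
});
```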
Q: Since AsyncHooks are independent in each worker, is there anything available to correlate communication between workers? Or is it up to the user to transfer some metadata between workers and create AsyncResources and use them as needed?
A: Yes, each worker has an independent set of AsyncHooks. At this point, it would be necessary to communicate such information manually (e.g. through another MessageChannel).
We could implement a builtin utility for that, but it will definitely be less powerful because it has to work asynchronously, so it can only read basic information from the Worker side rather than being able to inspect objects etc.
A: No, and I personally don’t believe that it’s a good idea to artificially restrict users from doing something when there’s no technical reason to do so.
We should warn users about making I/O code synchronous and offloading it to Workers, though – that will most likely not help anyone.
A: Workers are conceptually very similar to child_process and cluster.
Some of the key differences are:
- Communication between Workers is different: Unlike child_process IPC, we don’t use JSON, but rather do the same thing that postMessage() does in browsers.
  - This isn’t necessarily faster, although it can be and there might be more room for optimization. (Keep in mind how long JSON has been around and how much work has therefore been put into making it fast.)
  - The serialized data doesn’t actually need to leave the process, so overall there’s less overhead in communication involved.
  - Memory in the form of typed arrays can be transferred or shared between Workers and/or the main thread, which enables really fast communication for specific use cases.
  - Handles, like network sockets, cannot be transferred or shared (yet).
- There are some limitations on the usable API within workers, since parts of it (e.g. process.chdir()) affect per-process state, loading native addons, etc.
- Each worker has its own event loop, but some of the resources are shared between workers (e.g. the libuv thread pool for file system work).