On Aug 8, 2014 the following group gathered in Vancouver, BC to discuss the state and future of node.js tracing. The goal of this group was to summarize their current approaches to tracing, their respective goals in tracing, and which points are particularly painful today in node.js versions 0.10/0.11.
Many users of node have both performance and business questions about their applications. Several companies wish to provide insight into node applications; however, node.js makes several important questions particularly difficult to answer without a great deal of dynamic monkey-patching, source-rewriting, or native (C++) modules.
The current methods can be error-prone: for example, monkey-patching is not necessarily stable across module updates. Low-level tracing also has a performance impact. In addition, not all environments can host native modules easily.
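To make the fragility concrete, here is a minimal sketch of timing a callback-style method by monkey-patching it. Everything in it is an assumption for illustration: `fakeDb` stands in for a third-party module, and `patchWithTiming` and `timings` are hypothetical names, not part of any real tracer.

```javascript
// Hypothetical stand-in for a third-party module; not a real package.
const fakeDb = {
  query(sql, callback) {
    callback(null, 'rows for ' + sql);
  }
};

const timings = [];

// Replace obj[name] with a wrapper that measures the time until the trailing
// callback fires. Returns false when the target is not a function, which is
// what happens if a module update renames or removes the method.
function patchWithTiming(obj, name) {
  const original = obj[name];
  if (typeof original !== 'function') return false;
  obj[name] = function (...args) {
    const start = Date.now();
    const callback = args[args.length - 1];
    if (typeof callback === 'function') {
      args[args.length - 1] = function (...cbArgs) {
        timings.push({ name, ms: Date.now() - start });
        return callback.apply(this, cbArgs);
      };
    }
    return original.apply(this, args);
  };
  return true;
}

const patched = patchWithTiming(fakeDb, 'query');   // true
const missing = patchWithTiming(fakeDb, 'execute'); // false: no such method

let result;
fakeDb.query('SELECT 1', (err, rows) => { result = rows; });
```

The `missing` case is the instability described above: the patch depends on a name the module author is free to change, and here the only signal is a return value that the tracer must remember to check.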
- A user schedules a callback with `setTimeout`. If an error occurs in the callback, the user would like to know details about how the callback was scheduled. This is similar to the problem solved by long stack traces.
- A user has an http server, and an analytics module would like to transparently measure the request/response latency of each connection.
- An incoming http connection kicks off a series of background tasks. The http response is complete, but we would like to know when all the spawned background tasks have also completed.
- We want to measure the latency of a web request that traverses a redis connection pool.
- An event emitter emits an error. We want to capture and record the error without altering the behavior of the application.
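The `setTimeout` use case can be sketched as follows. This is only an illustration of the long-stack-trace idea, not how any existing module implements it; `withSchedulingStack`, `tracedSetTimeout`, and the `scheduledAt` property are hypothetical names.

```javascript
// Capture a stack trace when the callback is scheduled, and attach it to any
// error the callback throws later.
function withSchedulingStack(callback) {
  const scheduledAt = new Error('scheduled here').stack;
  return function (...args) {
    try {
      return callback.apply(this, args);
    } catch (err) {
      if (err && typeof err === 'object') {
        err.scheduledAt = scheduledAt; // hypothetical property name
      }
      throw err;
    }
  };
}

// Hypothetical drop-in for setTimeout that records the scheduling site.
function tracedSetTimeout(callback, delay, ...args) {
  return setTimeout(withSchedulingStack(callback), delay, ...args);
}

// Synchronous demonstration: wrap in one function, invoke somewhere else.
function scheduleWork() {
  return withSchedulingStack(function work() {
    throw new Error('boom');
  });
}

let caught;
try {
  scheduleWork()();
} catch (err) {
  caught = err;
}
// caught.stack shows where the error was thrown;
// caught.scheduledAt shows that scheduleWork() set the callback up.
```

A tracer that wants this for every timer has to replace the global `setTimeout` with something like `tracedSetTimeout`, which is the kind of monkey-patching the group would like to avoid.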
- Track events related to I/O resources.
  The async nature of node makes observing cause-and-effect relationships difficult over time. For example, a web service may kick off a number of external requests and background tasks due to an incoming request. There is a real need to associate those actions together, and to know when those actions have all completed.
- Loosely coupled, but structurable data.
  Many tracing modules attempt to build a structured representation of the application, where async callbacks are associated with their initiating contexts. These boundaries, however, can be fluid and are often not well-defined. Each tracing module needs the freedom to structure data and draw boundaries as it sees fit.
- Dynamic capture of arbitrary metadata.
  During any set of events, we wish to gather arbitrary metadata about the current program. The type of data is highly dependent on the goals of the module doing the tracing, and may include request-specific information such as SQL queries and POST parameters, or aggregate information like CPU level or memory usage.
- User-facing API.
  We would like both JavaScript and C++ modules to be able to emit events into a unified API.
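As a rough sketch of how the requirements above might fit together on the JavaScript side, the following associates callbacks with an initiating context, lets the tracer attach arbitrary metadata, and reports when all work spawned by that context has finished. The API shape (`tracer`, `createContext`, `bind`, the `scheduled` and `complete` events) is entirely hypothetical, and the sketch assumes each bound callback is invoked exactly once.

```javascript
const { EventEmitter } = require('events');

// Hypothetical unified sink that tracing modules would subscribe to.
const tracer = new EventEmitter();

function createContext(name) {
  const ctx = { name, pending: 0, metadata: {} };

  // Tie a callback to this context and count it as outstanding work.
  ctx.bind = function (callback) {
    ctx.pending += 1;
    tracer.emit('scheduled', ctx);
    return function (...args) {
      try {
        return callback.apply(this, args);
      } finally {
        ctx.pending -= 1;
        if (ctx.pending === 0) tracer.emit('complete', ctx);
      }
    };
  };

  return ctx;
}

// Usage: one incoming request spawns two background tasks.
const completed = [];
tracer.on('complete', (ctx) => completed.push(ctx.name));

const request = createContext('GET /users');
request.metadata.sql = 'SELECT * FROM users'; // arbitrary, tracer-defined

const onQuery = request.bind(() => {});
const onCacheWrite = request.bind(() => {});

onQuery();      // one task still pending, nothing emitted yet
onCacheWrite(); // last task finishes, 'complete' fires for the request
```

The context object deliberately imposes no structure beyond a name and a free-form `metadata` bag, leaving each tracing module to decide where its own boundaries lie.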
- Monkey-patching slows execution
- Async back-traces must be exposed through monkey-patching
- Multiplexing async activity (e.g. a redis connection pipeline) breaks continuations
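The last point can be illustrated with a toy model. It is a sketch under stated assumptions, not the behavior of any real redis client: `createPipeline` is a hypothetical client that queues commands from all callers and dispatches every reply from a single handler, and `currentContext` is a naive stand-in for whatever "active context" a tracer maintains.

```javascript
let currentContext = null; // naive global "active context"

function runInContext(name, fn) {
  const previous = currentContext;
  currentContext = name;
  try {
    return fn();
  } finally {
    currentContext = previous;
  }
}

// Hypothetical pipelined client. When bindCallbacks is true, each callback
// is re-entered in the context that was active when it was enqueued.
function createPipeline(bindCallbacks) {
  const queue = [];
  return {
    send(command, callback) {
      const owner = currentContext;
      queue.push(bindCallbacks
        ? (reply) => runInContext(owner, () => callback(reply))
        : callback);
    },
    // Simulates the shared socket delivering all replies in one batch.
    flush() {
      runInContext('socket-data-handler', () => {
        while (queue.length > 0) queue.shift()('OK');
      });
    }
  };
}

function observe(bindCallbacks) {
  const seen = [];
  const client = createPipeline(bindCallbacks);
  runInContext('request-A', () => {
    client.send('GET a', () => seen.push(currentContext));
  });
  runInContext('request-B', () => {
    client.send('GET b', () => seen.push(currentContext));
  });
  client.flush();
  return seen;
}

const unbound = observe(false); // ['socket-data-handler', 'socket-data-handler']
const bound = observe(true);    // ['request-A', 'request-B']
```

Without binding, both replies appear to belong to the shared socket handler, so the link back to the originating request is lost. Binding at enqueue time restores it, but only if the multiplexing module cooperates or is patched.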
In terms of strawman proposals & prototypes, I made http://npm.im/transaction-tracer, a proof-of-concept for a mechanism that could be used.
Here's an unpublished example of how this could be used to address the redis connection pipelining issue: redis/node-redis@f531bbc