Skip to content

Instantly share code, notes, and snippets.

@bterlson
Last active March 26, 2020 14:44
Show Gist options
  • Save bterlson/da8f02b95b484cd4f8d9 to your computer and use it in GitHub Desktop.
Save bterlson/da8f02b95b484cd4f8d9 to your computer and use it in GitHub Desktop.

Pattern Matching

This is a strawman proposal for adding pattern matching to ECMAScript. Pattern matching is useful for matching a value to some structure in a similar way to destructuring. The primary difference between destructuring and pattern matching are the use cases involved - destructuring is useful for binding pieces out of larger structures whereas pattern matching is useful for mapping a value's structure to data or a set of behaviors. In practice this means that destructuring tends to allow many shapes of data and will do its best to bind something out of it, whereas pattern matching will tend to be more conservative.

Additionally, the power of pattern matching is increased substantially when values are allowed to participate in the pattern matching semantics as a matcher as well as a matchee. This proposal includes the notion of a pattern matching protocol - a symbol method that can be implemented by objects that enables developers to use those values in pattern matching. A common scenario where this is important is matching a string against a regexp pattern.

At this point, any syntax in this proposal should be considered highly provisional - there is plenty of room for bikeshedding here. Pattern matching can be useful in a number of situations including catch guards and case statements. This proposal includes affordances for both, in addition to an operator form that can be broadly useful.

Basic Syntax & Semantics

A MatchClause is composed of three parts - a MatchExpression (which can be any expression), a MatchAssignmentPattern (which is some sigil followed by a AssignmentPattern, and a MatchIfClause which is an if token followed by any expression. Either a MatchExpression or a MatchAssignmentPattern must be provided. The following are examples of valid MatchClauses:

  • somePerson -> { id, props: { name } } if name === ""
  • -> [ first, second ]
  • x if x > 1

Note the usage of the token -> to separate the MatchExpression from the MatchAssignmentPattern. You could imagine a number of alternative syntaxes here including wraping the MatchExpression in parens. Feel free to comment with your ideas.

The high level semantics of the MatchClause as applied to some test value are as follows:

  • If a MatchExpression is provided, it is evaluated. The Symbol.matches method is invoked on the result passing the test value as a parameter. The resulting value is passed on to the rest of the MatchClause. If the MatchExpression is omitted, it is as if some value were provided whose Symbol.matches method simply returns its first argument unchanged (i.e. the identify matcher function).
  • If the MatchAssignmentPattern is provided, it is used to both assert a certain structure about the test value and to bind pieces out of it. For example, a MatchAssignmentPattern like [x, y] will ensure the test value (or the value returned by the MatchExpression if provided) is an array of length 2 and bind x and y out of it.
  • The MatchIfClause is executed in the context of any bindings created by the MatchAssignmentPattern and can be any expression that must evaluate to true for the pattern to match.

Switch statements may now contain either CaseClauses or SwitchMatchClauses exclusively (no mixing of the two). The SwitchMatchClause begins with match (acceptable as the region immediately inside the switch block is syntactically free) and is followed by a MatchClause. The other key difference from a CaseClause is that the SwitchMatchClause is braced (as it introduces lexical (let) bindings in a similar fashion to function parameters) and does not fall through by default (although you may use continue to do so).

The following are some motivating examples of the SwitchMatchClause in action:

switch (value) {
  match -> { x, y } {
    // matches objects with only x and y properties. x and y are bound in this block.
  }
  
  match -> [x, y] {
    // matches arrays or array-like objects with a .length property === 2.
  }
  
  match -> { x, y, ... } {
    // matches objects with at least an x and y enumerable property
  }
  
  match -> [ x, y, ... rest ] {
    // matches objects wtih a .length >= 2.
  }
  
  match SomeClass {
    // SomeClass[Symbol.matches](value) is invoked, and if truthy, this block will be executed.
    // It may be desirable to install Function[Symbol.matches](value) that returns `value instanceof this;`
    // in order to make this pattern available across all classes.
  }
    
  match /(\d+)-(\w+)/ -> [, id, name] if name === "Brian" {
    // matches if /abc/[Symbol.matches](value) returns an array that matches the pattern and the name
    // part of the regexp is equal to "Brian". id and name are bound in this block.
  }
  
  match /(\d+)-(\w+)/ -> [, id, name] {
    // matches if /abc/[Symbol.matches](value) returns an array that matches the pattern. id and name are bound
    // in this block.
  }
  
  match /(\d+)-(\d+)/ {
    // matches if /abc/[Symbol.matches](value) is truthy (which will do an obvious thing of returning the RegExp exec matcher).
  }
}

Additionally, the MatchClause can be used as a catch guard with nearly identical semantics:

try {
  ...
} catch(e match if isInteresting(e)) {
  ...
} catch(e match SyntaxError) {
  ...
}

Additionally, an operator would be very useful and would enable usage of these pattern matching semantics in a wide array of use-cases. Unfortunately, the ideal operator =~ is syntactically ambiguous due to the ~ operator alrady being a thing, and anyway the current syntax is somewhat unpalatable when used with a MatchAssignmentPattern, so suggestions are welcome. In any case, proceeding with the strawman =~ for illustrative purposes:

if (str =~ /foo/) { // /foo/[Symbol.matches(str); }

if (instance =~ SomeClass) { SomeClass[Symbol.matches](instance) }

if (obj =~ -> { x, y } ) { pointCache[x][y] = obj }

The =~ operator is also useful to enable pattern matching on deeply nested structures. For example:

switch (node) {
    match Leaf -> { parent } if parent =~ Container {
      ...
    }
} 

Other Considerations

instanceof could be defined in terms of the new matching protocol. Specifically, Function[Symbol.matches] could implement the current instanceof semantics and instanceof could simply delegate to this protocol. This would provide an easy way of overridding instanceof which has been discussed in the past.

@bterlson
Copy link
Author

It's useful to be able to bind the "rest" of the items in a pattern

Agreed, intended to be possible. Updated the examples.

It's useful to be able to bind sub-patterns

Agreed, but I think it can be a separate proposal.

Having an expression form of pattern matching is super useful

That is interesting. It can work if match is a contextual keyword inside switch-match blocks. I think this should be a separate proposal though, since we'd likely be talking about an expression form of switch-case too. I'd consider making match a contextual keyword in this proposal in order to be forward (side?) compatible, although it makes me sad because match is an obvious variable name when doing pattern matching :-P

I'm a bit weary of having two predicate positions in a MatchClause ( MatchExpression + MatchIfClause ).

This is a valid concern. I think the MatchExpression is usually going to be very terse so from a readability standpoint it seems ok to me. It's also a good question whether usage of MatchExpression or MatchAssignmentPattern should carry the extra syntactic burden (eg. `-> or whatever). Someone suggested MatchExpression could be wrapped in parens, for example.

Anyway, I agree with you that much of this comes down to how dominant the two use cases are. I think regexp is quite common and there are many library uses as well (eg. instanceof, set/range membership). OOPy code is extremely common so this use case should not be undersold. A look at usage in other languages could be helpful in judging this.

@jeffmo
Copy link

jeffmo commented Nov 21, 2015

OOPy code is extremely common so this use case should not be undersold

This is true, but I just wanted to clarify that instaceof-reliant code is likely a big subset of all OOPy code, so OOPy code being popular isn't alone an indication that chained instanceof checks would be pervasive.

That is interesting. It can work if match is a contextual keyword inside switch-match blocks

I'm pretty sure I'm missing something, but can you spell out why match needs to be a keyword to make this work?

@bterlson
Copy link
Author

I'm pretty sure I'm missing something, but can you spell out why match needs to be a keyword to make this work?

@jeffmo Maybe not, but it seems like you run into difficulty with something like:

foo(switch (x) {
  match SyntaxError: 1;
    match: { };
})

Where we don't know if this is a labeled block or a default match clause. Similar problems when MatchExpression is wrapped in parens - it could be a call to match. Arrays might look like property access. You can work around all this but it seems difficult or impossible on the surface. Haven't thought too deeply yet though.

@jeffmo
Copy link

jeffmo commented Nov 21, 2015

Because switch is a keyword, doesn't that kick off the leader for the grammar during parsing?

EDIT: Derp, I misinterpreted the problem. Updated resopnse:

I'd think, for an expression form of switch, the thing after the : would be a LHSExpr -- and thus it couldn't be a labeled block (which is a statement).

So for your example, I think that is unambiguously a "match instanceof SyntaxError maps to 1. match fallthrough maps to empty object".

@wycats
Copy link

wycats commented Nov 21, 2015

@jeffmo in my mind, one of the most important use-cases for match clauses is cleaning up the if statement soup that is currently used to implement overloads.

From that perspective, a small handful of common MatchExpressions go a long way:

  1. regular expressions (Symbol.matches succeeds if the regex matches, and produces the capture)
  2. numeric ranges (succeeds if the test value is a number and in the range)
  3. Sets (succeeds if the test value is a member of the Set)
  4. Classes (succeeds if the test value is an instance of the class)
  5. Value types, including non-literal strings (succeeds if the test value is identical to the value type)
  6. Predicates, like isArray (makes it possible to express Array.isArray concisely)

In most overload scenarios, there is a mix of nominal checking (either via instanceof or via a check like Array.isArray), structural matching that can be easily expressed using patterns, and more dynamic predicates ((typeof obj.length === 'number') && (obj.length - 1 in obj)).

One of the main reasons these dynamic predicates come up so much is that many duck-typing protocols involve type assertions, and because JavaScript does not have static types, JavaScript's pattern language is usually not expressive enough to describe those kinds of type assertions.

MatchExpression gives us the ability to create predicate abstractions (simple ones like isArray or more involved ones like isArrayLike that can be easily reused when matching.

I don't consider this particularly related to "OOPy" patterns, but rather to a desire to be able to abstract away predicates that are not strictly about nominal types (in that sense, you might consider them more related to schools of thought focused on data structures rather than inheritance hierarchies).

@jeffmo
Copy link

jeffmo commented Nov 21, 2015

I don't think either kind of predicate here is about typing anymore than the other. All I'm concerned with is that we have two forms eating syntax into each other (and arguably both are useful enough to be common). If they didn't intrude on each other at all (i.e. the need for the -> token) I'd feel much less concern.

If we could come up with an operator that works in place of the hypothetical =~, this would give us a level of conciseness and abstraction that still fits into only the more general predicate form:

switch(obj) {
  match if obj =~ Cat { ... }
  match if obj =~ Dog { ... }
  match if obj =~ Animal { ... }
}

It's not quite as concise, but it still gives the benefits of abstracting the predicate + coupling the abstraction with the type -- and it lets us avoid the need for two different predicate spaces.

Ohaithere <>, ><, <==>, =%, and friends!

@weswigham
Copy link

@jeffmo:
Your suggestion duplicates the obj argument between the switch and the match - seems like needless repetition. And how is that even any different from an if ... else if chain at that point?

I think the beauty of the trailing if clause in this proposal is in its ability to test the things you destructure from the destructuring match before deciding to accept a given match block.

  1. Having an expression form of pattern matching is super useful

It would be possible to expect do expressions to exist and claim that match is an expression - consuming construct only.

@jeffmo
Copy link

jeffmo commented Nov 25, 2015

Your suggestion duplicates the obj argument between the switch and the match

That's true if you want to match the toplevel value, but it's often just as useful to match on bindings named within the pattern. My whole point is that having two predicate positions is unnecessary and burdensome on the case where you only need one of the two (which is most likely to be the common case). If this proposal were to replace if ... else if with a more concise form without compromising other more direct use cases, that would be convenient -- but I don't see it doing so thus far and I don't know if that should really be a toplevel goal...

It would be possible to expect do expressions to exist and claim that match is an expression - consuming construct only.

Yes, do expressions would assist here on the statement/expression front. However, until do expressions have a more active champion, it might not be a great idea to take them on as a dependency here. Definitely worth trying to drum up support here, though.

@adrianmcli
Copy link

The last comment for this proposal is 2015. Are there any updates on this?

@sergeysova
Copy link

I think switch is bad syntax for pattern-matching. See in rust

Example for JS:

match (value) {
  { x, y } -> {
    // Match object with properties
  }

  [x, y, ...rest] -> {
    // match array
  }
  
  ClassName -> {
    // ClassName[Symbol.matches](value) === true
  }

  2 -> {
    // value === 2
  }

  2, 3, 4 -> {
    // value === 2 || value === 3 || value === 4
  }

  Number, Symbol -> {
    //  Number[Symbol.matches](value) === true || Symbol[Symbol.matches](value) === true
  }

  () -> {
    // default
  }
}

@sergeysova
Copy link

@phpnode
Copy link

phpnode commented Jan 31, 2017

In case anyone reading this is interested, here's a pattern matching syntax based on types which works today (via a babel-plugin), it's very inspired by Rust:

import t from 'flow-runtime';

const makeUpperCase = t.pattern(
  (input: string) => input.toUpperCase(),
  (input: number) => input
);

console.log(makeUpperCase("foo bar"));
console.log(makeUpperCase(123));
console.log(makeUpperCase(false)); // throws

See https://codemix.github.io/flow-runtime/#/docs/pattern-matching

@bali182
Copy link

bali182 commented May 3, 2017

Love the proposal, great idea! One thing that's bugging me: is it required to introduce a new kind of arrow? Wouldn't => do just fine?

Also, I greatly support @lestad s argument: switch is already in javascript and a switch statement is a sideefect. We would most likely want pattern matching to evaluate to whatever the matching branch returns. So a new keyword would be fairly reasonable in my opinion.

@rauschma
Copy link

rauschma commented Jun 3, 2017

This is awesome!

A few thoughts:

  • I’d also prefer match () {} (versus switch () {} plus prefix for each rule).
  • Bikeshedding: could the match operator =~ be a keyword (e.g., match)?
  • How would you match primitive values? One could introduce pseudo-classes for them: PrimitiveString, PrimitiveNumber, etc.; maybe even PrimitiveNull and PrimitiveUndefined. NonPrimitive would also be useful, considering that not all objects are instances of Object.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment