@laughinghan
Last active August 30, 2024 01:30
A table of every HTML element and some properties relevant to formatting HTML into plaintext
> A table of every HTML element and some properties relevant to formatting
> HTML into plaintext.
>
> Key:
> transparent
> elements that are noted in the HTML spec as having a "transparent content model",
> and aren't [replaced elements]
> https://developer.mozilla.org/en-US/docs/Web/CSS/Replaced_element
> (just <a>, <del>, <ins>, and <map>)
> trans-space
> elements that have a transparent content model and *are* replaced elements; we
> want to plaintext-ify these as inline containers, but with spaces around their contents
> block
> noted in the HTML spec for breaking <p> elements, and non-void (everything except <hr>)
> https://html.spec.whatwg.org/multipage/syntax.html#optional-tags:the-p-element
> block-void
> breaks <p> elements and *is* void (only <hr>)
> block*
> not noted in the HTML spec for breaking <p> elements, but for our purposes
> we want to break lines when we encounter it. Primarily table rows (<tr>s),
> table row groupings (<thead> <tbody> <tfoot>), and list items (<li>s),
> because we want each table row or list item to plaintext-ify to its own line.
> (Table row groupings should only ever be immediately between a <table> and <tr>
> but just in case I guess.) <dialog> also falls under this but that appears to be
> a spec bug, because the observed behavior in browsers is that it does break <p>s
> https://github.com/whatwg/html/issues/10590
> block-void*
> <br>, which is considered inline-level (aka phrasing content category) by the
> browser when parsing HTML and does not break paragraphs, but for our purposes
> we do want to break lines when we encounter it
> inline
> noted in the HTML spec as being in the "phrasing content category" and
> not noted for having a transparent content model; and neither void nor
> a replaced element. For our purposes, ends up treated the same as transparent
> inline-space
> same as inline, but *is* a replaced element (and non-void). Ends up treated
> the same as trans-space
> inline-void
> same as inline, but void: either it has an empty content model or, for the
> purposes of plaintext-ifying, we want to ignore its contents (<template>,
> <slot>, <math>, <svg>)
> void
> not in the flow content category in the HTML spec, but void so we want to
> ignore its contents regardless
Note: all "flow content" elements that are not phrasing content and do not have a transparent content model, except for <dialog>, break paragraphs.
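
The key above can be read as a small decision procedure. What follows is a minimal Python sketch of that reading, not code from the original table: `ElementFacts`, `categorize`, and the `force_break` flag are hypothetical names introduced here, and the facts passed in (whether an element is replaced, void, and so on) still have to be looked up in the HTML spec or decided by hand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ElementFacts:
    """Hypothetical record of the properties the key refers to."""
    breaks_p: bool = False     # listed in the spec as closing an open <p>
    phrasing: bool = False     # in the "phrasing content" category
    transparent: bool = False  # has a "transparent content model"
    replaced: bool = False     # is a replaced element
    void: bool = False         # empty content model, or contents we ignore
    force_break: bool = False  # our own override (the starred categories)


def categorize(f: ElementFacts) -> Optional[str]:
    """Map an element's facts to one of the category names in the key."""
    if f.breaks_p:
        return "block-void" if f.void else "block"
    if f.force_break:
        return "block-void*" if f.void else "block*"
    if f.transparent:
        return "trans-space" if f.replaced else "transparent"
    if f.phrasing:
        if f.void:
            return "inline-void"
        return "inline-space" if f.replaced else "inline"
    # Not flow content: only covered by the key when it is void.
    return "void" if f.void else None


# <br> is phrasing content and void, but we force a line break for it.
br = ElementFacts(phrasing=True, void=True, force_break=True)
print(categorize(br))  # block-void*
```

The order of the checks matters in this sketch: line-breaking is tested first so that <br> lands in block-void* rather than inline-void, and void is tested before replaced so that <img> lands in inline-void.
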
transparent <a>
inline <abbr>
block <address>
inline-void <area>, if it is a descendant of a <map> element
block <article>
block <aside>
trans-space <audio>
inline <b>
void <base>
inline <bdi>
inline <bdo>
block <blockquote>
block-void* <br>
inline-space <button>
trans-space <canvas>
inline <cite>
inline <code>
void <col>
void <colgroup>
inline <data>
inline <datalist>
transparent <del>
block <details>
inline <dfn>
block* <dialog>
block <div>
block <dl>
inline <em>
inline-void <embed>
block <fieldset>
block <figcaption>
block <figure>
block <footer>
block <form>
block <h1>-<h6>
block <header>
block <hgroup>
block-void <hr>
inline <i>
inline-void <iframe>
inline-void <img>
inline-void <input>
transparent <ins>
inline <kbd>
inline <label>
block* <li>
inline-void <link>, if the itemprop attribute is present
block <main>
transparent <map>
inline <mark>
inline-void <math>
block <menu>
inline-void <meta>, if the itemprop attribute is present
inline-space <meter>
block <nav>
inline-void <noscript>
trans-space <object>
block <ol>
inline <output>
void <optgroup>
void <option>
block <p>
inline-void <picture>
block <pre>
inline-space <progress>
inline-space <q>
inline-space <ruby>
inline <s>
inline <samp>
inline-void <script>
block <search>
block <section>
inline-void <select>
void <source>
inline-void <slot>
inline <small>
inline <span>
inline <strong>
void <style>
inline <sub>
inline <sup>
inline-void <svg>
block <table>
block* <thead>
block* <tbody>
block* <tfoot>
block* <tr>
inline-space <th>
inline-space <td>
inline-void <template>
inline-void <textarea>
inline <time>
void <title>
void <track>
inline <u>
block <ul>
inline <var>
trans-space <video>
inline-void <wbr>
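
As a rough illustration of how the table might drive an HTML-to-plaintext formatter, here is a minimal Python sketch built on the standard library's `html.parser`. It is an assumption-laden example rather than the author's implementation: `CATEGORY` copies only a handful of rows from the table, unknown tags are treated as inline, and `PlainText`, `to_plaintext`, and the whitespace cleanup at the end are hypothetical choices made for this example.

```python
from html.parser import HTMLParser

# A small subset of the table above: tag -> category.
CATEGORY = {
    "a": "transparent", "b": "inline", "span": "inline",
    "p": "block", "div": "block", "ul": "block", "table": "block",
    "li": "block*", "tr": "block*",
    "hr": "block-void", "br": "block-void*",
    "td": "inline-space", "th": "inline-space", "video": "trans-space",
    "img": "inline-void", "script": "inline-void", "svg": "inline-void",
    "style": "void",
}
BREAKS = {"block", "block*", "block-void", "block-void*"}
SPACED = {"inline-space", "trans-space"}
IGNORED = {"inline-void", "void"}
# Tags that never get an end tag in HTML, so there is nothing to skip over.
HTML_VOID = {"br", "hr", "img"}


class PlainText(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_tag = None   # tag whose contents are being ignored
        self.skip_depth = 0    # nesting depth of that same tag

    def handle_starttag(self, tag, attrs):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.skip_depth += 1
            return
        cat = CATEGORY.get(tag, "inline")
        if cat in BREAKS:
            self.out.append("\n")
        elif cat in SPACED:
            self.out.append(" ")
        elif cat in IGNORED and tag not in HTML_VOID:
            self.skip_tag, self.skip_depth = tag, 1

    def handle_endtag(self, tag):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.skip_depth -= 1
                if self.skip_depth == 0:
                    self.skip_tag = None
            return
        cat = CATEGORY.get(tag, "inline")
        if cat in ("block", "block*"):
            self.out.append("\n")
        elif cat in SPACED:
            self.out.append(" ")

    def handle_data(self, data):
        if not self.skip_tag:
            self.out.append(data)


def to_plaintext(html: str) -> str:
    parser = PlainText()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace on each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.out).split("\n"))
    return "\n".join(line for line in lines if line)


print(to_plaintext("<ul><li>one</li><li>two<br>three</li></ul>"))
```

With this subset, each <li> and each <tr> ends up on its own line, <td> cells on a row are separated by spaces, and the contents of <script> and <svg> are dropped. Because the category names are the only thing the parser consults, extending the sketch to the full table should only require adding rows to `CATEGORY` and void tags to `HTML_VOID`.
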