Last active
January 1, 2020 20:36
SpeechSynthesis *to* a MediaStreamTrack or: How to execute arbitrary shell commands using inotify-tools and DevTools Snippets
The requirement described at Support SpeechSynthesis *to* a MediaStreamTrack (https://github.com/WICG/speech-api/issues/69)
is possible at Firefox and Nightly because those browsers expose `"Monitor of <device>"` as a device that can be selected
when `navigator.mediaDevices.getUserMedia()` is executed. That provides a means to capture audio being output to speakers or
headphones without also capturing microphone input.

That output is also possible at Chrome/Chromium by following the procedure described at This is again recording from microphone,
not from audiooutput device (https://github.com/guest271314/SpeechSynthesisRecorder/issues/14#issuecomment-527020198).

After filing the issues Support capturing audio output from sound card (https://github.com/w3c/mediacapture-main/issues/629) and
Clarify getUserMedia({audio:{deviceId:{exact:<audiooutput_device>}}}) in this specification mandates capability
to capture of audio output device - not exclusively microphone input device (https://github.com/w3c/mediacapture-main/issues/650)
at the Media Capture and Streams specification (aka getUserMedia) (https://github.com/w3c/mediacapture-main), in order to make what
is already possible clear in the specification and to have Chrome/Chromium authors explicitly expose the device `"Monitor of <device>"`
at *nix, this gist revisits the subject matter anew, with a more expansive mandate than only capturing speech synthesis as
a `MediaStream` or, specifically, as a `MediaStreamTrack`: the requirement here is to execute arbitrary shell commands,
with the capability to get the output of those commands, if any, within the browser, at *nix.
Prior issues describing the concept built upon hereafter:
- <script type="shell"> to execute arbitrary shell commands, and import stdout or result written to local file as a JavaScript module (https://github.com/whatwg/html/issues/3443)
- Add execute() to FileSystemDirectoryHandle (https://github.com/WICG/native-file-system/issues/97)

Procedure
1. Install `inotify-tools` (https://github.com/rvoicilas/inotify-tools).
2. Launch Chrome/Chromium with the necessary flags, setting `--use-file-for-fake-audio-capture` to the `wav` file
that we will be able to get as a `MediaStream` in 6. and 7. below.
```
chromium-browser --allow-file-access-from-files --autoplay-policy=no-user-gesture-required --use-fake-device-for-media-stream --use-fake-ui-for-media-stream --use-file-for-fake-audio-capture=$HOME/localscripts/output.wav%noloop --user-data-dir=$HOME/test
```
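Chromium reads the file passed to `--use-file-for-fake-audio-capture` at launch, so it may help to pre-create an empty `output.wav` before the first synthesis has run. A minimal sketch; the helper itself and the 16-bit mono 22050 Hz PCM format are assumptions, not part of the original procedure:

```shell
#!/bin/sh
# Hypothetical helper: write an empty (zero-sample) PCM WAV so the
# fake-audio-capture flag has a valid file to read at browser launch.
out="$HOME/localscripts/output.wav"
mkdir -p "$(dirname "$out")"
{
  printf 'RIFF'               # RIFF chunk id
  printf '\044\000\000\000'   # chunk size: 36 (header) + 0 data bytes
  printf 'WAVEfmt '           # WAVE id + "fmt " subchunk id
  printf '\020\000\000\000'   # fmt subchunk size: 16
  printf '\001\000\001\000'   # audio format 1 (PCM), 1 channel
  printf '\042\126\000\000'   # sample rate 22050
  printf '\104\254\000\000'   # byte rate 44100 (22050 * 2)
  printf '\002\000\020\000'   # block align 2, 16 bits per sample
  printf 'data'               # data subchunk id
  printf '\000\000\000\000'   # data size 0
} > "$out"
```

The file is overwritten by `espeak-ng` once the pipeline runs; this only avoids a missing-file condition at startup.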
3. Create a local directory (e.g. `localscripts`) where the file to be monitored for the `close` event is saved.
Open DevTools at Chrome/Chromium, select `Sources`, select `Snippets`, select `New snippet`, name the snippet `run`,
right-click on the `run` snippet, select `Save as...`, then save the file in the `localscripts` directory.
4. Create a shell script to be executed when the `close` event of the file `run` occurs, again in Snippets at DevTools.
Follow the procedure in 3., save the script in a directory in `PATH`, e.g. `$HOME/bin`; here the file is named `waiting.sh`.
```
#!/bin/sh
while inotifywait -e close $HOME/localscripts/run; do
  $HOME/bin/input.sh
done
```
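`inotifywait -e close` blocks until the watched file is closed, then the loop body runs and the watch is re-established. On systems without inotify-tools, the same effect can be approximated by polling the file's modification time; a rough sketch, where the function names and the one-second interval are assumptions and `stat -c %Y` is GNU coreutils:

```shell
#!/bin/sh
# file_changed FILE STATEFILE: returns 0 when FILE's mtime differs from
# the value recorded in STATEFILE (updating STATEFILE), 1 otherwise.
file_changed() {
  f="$1"; state="$2"
  now=$(stat -c %Y "$f")
  prev=""
  [ -f "$state" ] && prev=$(cat "$state")
  if [ "$now" != "$prev" ]; then
    printf '%s\n' "$now" > "$state"
    return 0
  fi
  return 1
}

# Polling loop standing in for `while inotifywait -e close ...; do ...; done`:
# while :; do
#   file_changed "$HOME/localscripts/run" /tmp/run.mtime && "$HOME/bin/input.sh"
#   sleep 1
# done
```

Unlike `inotifywait`, polling fires on any mtime change, not specifically on close, so a write still in progress could trigger the handler early.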
5. Create the shell script to be executed (again, in Snippets at DevTools, following 3.) and save it as `input.sh`. Make the script
executable with `chmod +x` and place it in `PATH` or the `localscripts` directory.
```
#!/bin/sh
espeak-ng -m -f $HOME/localscripts/input.txt -w $HOME/localscripts/output.wav
```
In this case we read text input from `input.txt` and write the resulting `wav` file to `output.wav`.
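With `waiting.sh` running, the file-watch half of the pipeline can be smoke-tested from a terminal, bypassing DevTools entirely: write `input.txt`, then open and close `run` so the `close` event fires and `input.sh` is executed. A sketch; the sample sentence is an assumption:

```shell
#!/bin/sh
# Write some text for espeak-ng to synthesize, then open and close the
# watched file so inotifywait's close event fires and input.sh runs.
mkdir -p "$HOME/localscripts"
printf 'Hello from the shell.\n' > "$HOME/localscripts/input.txt"
: > "$HOME/localscripts/run"
```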
6. To meet the requirement "Support SpeechSynthesis *to* a MediaStreamTrack" at Chrome/Chromium, with the browser launched using the
flags set in 2., we get input to the `MediaStream` from `output.wav` using JavaScript. Again, we follow 3. to create
and name the file `stream`.
```
async function speak() {
  // One issue with speech synthesis directly to a MediaStream or MediaStreamTrack
  // is that there is no way to determine when the output has really ended, as no
  // ended, mute, or unmute events are fired, and the input, due to the -m flag
  // passed to espeak-ng, can contain SSML <break> elements, e.g., <break time="2500ms">,
  // which can produce a false positive if silence detection is used to check whether
  // the expected output has completed; therefore store the MediaStreamTrack as a
  // global variable and execute stop() when speak() is called again.
  if (!globalThis.track) {
    globalThis.track = null;
  } else {
    console.log(globalThis.track);
    globalThis.track.stop();
  }
  const stream = await navigator.mediaDevices.getUserMedia({audio: true});
  globalThis.track = stream.getTracks()[0];
  // Sound is not output to speakers or headphones
  // (https://github.com/cypress-io/cypress/issues/5592#issuecomment-569972506),
  // so route the stream through an AudioContext to hear it.
  const ac = new AudioContext();
  const source = ac.createMediaStreamSource(stream);
  source.connect(ac.destination);
}
```
7. Following 3., we create a JavaScript file to execute the code defined in `stream` and name the file `speak`.
```
speak();
```
8. Following 3., we write input text in a snippet named `input` and save the file as `input.txt` in `localscripts`, e.g.
```
Do stuff.
Do other stuff!
Now, let's try this.
<p>A paragraph.</p>
<s>a sentence</s>
123<break time="2500ms">
456<break time="2500ms">
789
```
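espeak-ng interprets the markup itself via the `-m` flag; for a rough preview of the plain text the sample above yields, the tags can be stripped. A hypothetical helper, not part of the procedure, and it does not honor `<break>` pauses the way espeak-ng does:

```shell
#!/bin/sh
# Print input.txt with SSML-style tags removed (rough text preview only).
if [ -f "$HOME/localscripts/input.txt" ]; then
  sed 's/<[^>]*>//g' "$HOME/localscripts/input.txt"
fi
```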
9. Execute `waiting.sh` (4.).
```
$ ~/bin/waiting.sh
```
10. Right-click `stream` and select `Run` to define `speak` globally.
11. Right-click `run` and select `Save as...` to save the file in `localscripts`, which will cause `inotifywait`
to report the `close` event
```
Setting up watches.
Watches established.
/home/user/localscripts/run CLOSE_WRITE,CLOSE
```
12. Right-click `speak` and select `Run`.
Author
guest271314
commented
Jan 1, 2020