The Rabbit R1 uses a few custom APIs to talk to The Cloud™. Almost nothing happens on-device, and all the AI magic happens on servers.
Consequently, you don't really need the physical device.
In lieu of an authentication scheme, Rabbit's servers attempt to verify device authenticity by checking the TLS client's JA3 fingerprint, presumably enforced by AWS WAF.
If your TLS client doesn't match an expected fingerprint, you'll get HTTP 403 errors. This fingerprint works:
771,4865-4866-4867-49195-49196-52393-49199-49200-52392-49171-49172-156-157-47-53,0-23-65281-10-11-35-16-5-13-51-45-43-21,29-23-24,0
This is a common fingerprint for Android devices and is not exclusive to the R1.
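To make the fingerprint less opaque, here is a minimal Python sketch that splits the JA3 string into its five standard fields (TLS version, cipher suites, extensions, elliptic curves, point formats) and computes the MD5 digest that tools usually display as the "JA3 hash". The function names are made up for illustration. Nothing here talks to Rabbit's servers.

```python
import hashlib

# The JA3 string quoted above, split across two literals for readability.
JA3 = (
    "771,4865-4866-4867-49195-49196-52393-49199-49200-52392-49171-49172-156-157-47-53,"
    "0-23-65281-10-11-35-16-5-13-51-45-43-21,29-23-24,0"
)


def parse_ja3(ja3: str) -> dict:
    """Split a JA3 string into its five comma-separated fields."""
    version, ciphers, extensions, curves, point_formats = ja3.split(",")

    def to_ints(field: str) -> list:
        # Each field is a dash-separated list of decimal IDs (possibly empty).
        return [int(x) for x in field.split("-")] if field else []

    return {
        "tls_version": int(version),
        "ciphers": to_ints(ciphers),
        "extensions": to_ints(extensions),
        "curves": to_ints(curves),
        "point_formats": to_ints(point_formats),
    }


def ja3_hash(ja3: str) -> str:
    """The JA3 hash is the MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


fields = parse_ja3(JA3)
print(hex(fields["tls_version"]))  # 0x303, i.e. TLS 1.2 in the ClientHello
print(len(fields["ciphers"]), "cipher suites")
print(ja3_hash(JA3))
```

This is handy for diffing your own client's fingerprint field by field when the server keeps answering 403.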
I use utls to replicate this fingerprint like so (drop this into ja3proxy)
Visit https://rabbit.tech/activate in a web browser and set up an account. Follow the registration process, and you should end up with a QR code.
Decode the QR. I like to use zxing. You should get a URL that looks like:
https://hole.rabbit.tech/apis/linkDevice?userId=auth0%7Crandomhex&linkingPasscode=randomhex
The URL must first be modified to append a deviceId parameter, set to a 15-digit decimal number. It's supposed to be an IMEI, but you can use any value (alternatively, generate a more realistic value with this).
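As a rough sketch of this step, the Python below generates a 15-digit value with a valid Luhn check digit (the checksum real IMEIs use) and appends it to the decoded link URL as deviceId. The helper names are hypothetical, and the random 14-digit body does not correspond to any real device model.

```python
import random
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def luhn_check_digit(body: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(body)):
        d = int(ch)
        if i % 2 == 0:  # every second digit, starting from the rightmost
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def random_imei(rng=random) -> str:
    """Return 14 random digits plus a Luhn check digit (15 digits total)."""
    body = "".join(str(rng.randrange(10)) for _ in range(14))
    return body + str(luhn_check_digit(body))


def add_device_id(link_url: str, device_id: str) -> str:
    """Append a deviceId query parameter to the decoded QR URL."""
    parts = urlsplit(link_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("deviceId", device_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


url = "https://hole.rabbit.tech/apis/linkDevice?userId=auth0%7Crandomhex&linkingPasscode=randomhex"
print(add_device_id(url, random_imei()))
```

The GET request itself still has to go out through a client with the right TLS fingerprint, so it is left out here.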
Make an HTTP GET request to the modified URL, and on success you'll get a JSON response that looks like this:
{"actualUserId":"auth0|randomhex","userId":"randomhex","accountKey":"randomhex","userName":"blah"}
Keep these values safe, particularly accountKey; you'll need them later.
The account is now activated, and your browser session should have access to the "rabbithole", where you can let them skim your creds for 3rd party service integrations over VNC, and other such features.
The main API is a JSON-based RPC-like mechanism running over a websocket, at wss://r1-api.rabbit.tech/session
The API is clearly based on the GAMA NPC "Quantum Engine AI" integration thing, which you can find partial docs for here (paste it into https://studio.asyncapi.com/), but this is more of a curiosity than useful documentation.
You'll need to set a couple of HTTP headers before it'll work: App-Version and OS-Version. Valid values for these fields change with each update, so I won't list them here, but maybe someone will be nice and leave currently-working values in the comments. (It sounds like OS-Version is the more important of the two; App-Version may not matter.)
In newer updates (v0.8.99+), a timestamp string in the format rabbit_OS_v0.8.99_20240606175556,YYYYMMDDHHMMSSmmm,xx (where mmm is milliseconds and xx is a random 2-digit even integer) is encrypted with the following RSA-3072 public key:
-----BEGIN RSA PUBLIC KEY-----
MIIBigKCAYEAqLNRPcujKw1elkNJc+10o37YVbb7OjYa4Cv2pG2BzfSV3Ev7LMva
A2w0PAy25DhQU2NI7RU2a51OvTz0DsXM69oakuN0oSrKa9Eit2GPnX89H702MXGX
iRDZWEufAx67AaxK9d80Bajh2Abn06Bwaz9Z4D8vMxUOGsYkVKMW0LrmnW4984XI
UqT3+lOiEijBamodU/mORTeuxc5cdan00fq8qTOYuGFuKlPJSI3EExFHP3ONHD6z
44+PxXmhw532uAiNnT74yKXBoVYU19b8AAWLiSKyjf1eeus7dTobPKcpMemlJgxH
tVHtaSgnUugQ0a3XvmTVQpSeytPw8bL+/3c5KXfjGxPchoEZi7d71wv/AufDiSXr
gaew1KfJZBsr8Somr03b8xsHRJruPT61iPceh9bTWscwnK3WmDpAxnjdPQiflt/m
KkPEETtKGx0X5kUImHnr1jhUdYKmEOHfwkXBKVc66hpn85WGJ7MPVyixIOpzScAY
nKjVsP4ma6iFAgMBAAE=
-----END RSA PUBLIC KEY-----
(NB: this key changed in v0.8.107.)
The encryption uses RSA_PKCS1_OAEP_PADDING mode (MGF1, SHA1). The resulting value is base64-encoded and stored in the Device-Health header. It's unclear how this measures the health of a device, but it's a feature nonetheless.
I haven't thoroughly tested this yet. At present, the API doesn't seem to mind whether Device-Health is correct, or specified at all.
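For reference, the Device-Health value could be assembled roughly like this in Python. This is an untested sketch under several assumptions: the timestamp is taken in UTC (the time zone isn't documented), SHA-1 is used for both the OAEP hash and MGF1, and the encryption step relies on the third-party `cryptography` package. The function names are invented for illustration.

```python
import base64
import random
from datetime import datetime, timezone


def device_health_plaintext(os_build: str, now=None, rng=random) -> str:
    """Build '<os build>,YYYYMMDDHHMMSSmmm,xx' as described above."""
    now = now or datetime.now(timezone.utc)  # assumption: UTC
    stamp = now.strftime("%Y%m%d%H%M%S") + f"{now.microsecond // 1000:03d}"
    nonce = rng.randrange(10, 100, 2)  # random 2-digit even integer
    return f"{os_build},{stamp},{nonce}"


def device_health_header(plaintext: str, public_key_pem: bytes) -> str:
    """Encrypt with RSA-OAEP (MGF1, SHA-1) and base64-encode the result."""
    # Imported lazily so the plaintext helper works without the dependency.
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    key = serialization.load_pem_public_key(public_key_pem)
    ciphertext = key.encrypt(
        plaintext.encode("ascii"),
        padding.OAEP(
            mgf=padding.MGF1(algorithm=hashes.SHA1()),
            algorithm=hashes.SHA1(),  # assumption: same hash as MGF1
            label=None,
        ),
    )
    return base64.b64encode(ciphertext).decode("ascii")


print(device_health_plaintext("rabbit_OS_v0.8.99_20240606175556"))
```

Pass the PEM block above (as bytes) to `device_health_header` to get the header value. Since the server doesn't currently seem to validate it, there's no easy way to confirm the output is what the device really sends.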
To authenticate, send a JSON blob that looks like this:
{
  "global": {
    "initialize": {
      "deviceId": IMEI,
      "evaluate": false,
      "greet": true,
      "language": "en",
      "listening": true,
      "location": {
        "latitude": 0.0,
        "longitude": 0.0
      },
      "mimeType": "wav",
      "timeZone": "GMT",
      "token": "rabbit-account-key+" + ACCOUNT_KEY
    }
  }
}
deviceId is whatever you used during activation, and ACCOUNT_KEY is the value of accountKey from the activation response message. Use your imagination for the other fields (I haven't figured out precisely what "listening", "greet" or "evaluate" do yet).
This should be the first thing you send after initiating the websocket connection.
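A small Python helper for building that first message might look like the sketch below. It only produces the JSON text; sending it over the websocket (with the right headers and TLS fingerprint) is up to whatever client you use. Passing deviceId as a string is an assumption, since the template above doesn't show whether it's quoted, and the function name is hypothetical.

```python
import json


def build_initialize_message(device_id: str, account_key: str,
                             language: str = "en", time_zone: str = "GMT") -> str:
    """Serialize the 'initialize' message sent right after connecting."""
    message = {
        "global": {
            "initialize": {
                "deviceId": device_id,  # assumption: sent as a string
                "evaluate": False,
                "greet": True,
                "language": language,
                "listening": True,
                "location": {"latitude": 0.0, "longitude": 0.0},
                "mimeType": "wav",
                "timeZone": time_zone,
                # The token is the literal prefix plus the accountKey value.
                "token": "rabbit-account-key+" + account_key,
            }
        }
    }
    return json.dumps(message)


print(build_initialize_message("490154203237518", "randomhex"))
```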
To send text input, send a JSON message that looks like this:
{
  "kernel": {
    "userText": {
      "text": INPUT
    }
  }
}
Text-based responses look like this:
{"kernel": {"assistantResponse": OUTPUT}}
Example output:
{
  "kernel": {
    "assistantResponseDevice": {
      "text": {
        "language": "en",
        "chars": [" ","H","e","l","l","o",","," ","h","o","w"," ","c","a","n"," ","I"," ","a","s","s","i","s","t"," ","y","o","u"," ","t","o","d","a","y","?"],
        "char_start_times_ms": [0,...],
        "char_durations_ms": [0,...]
      },
      "audio": BASE64_WAV,
      "canned": false
    }
  }
}
NOTE: The text field is actually a stringified JSON object; I'm showing it as plain JSON above for clarity.
I wonder what the canned field indicates?
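Because the text field arrives as a JSON string inside JSON, decoding takes two passes. A minimal Python sketch (the function name is made up) that pulls the spoken text back out of an assistantResponseDevice message:

```python
import json


def extract_device_text(raw_message: str) -> str:
    """Return the response text from an assistantResponseDevice message."""
    payload = json.loads(raw_message)
    device = payload["kernel"]["assistantResponseDevice"]
    text = device["text"]
    if isinstance(text, str):
        # The field is stringified JSON, so decode it a second time.
        text = json.loads(text)
    # The text is delivered as a list of single characters.
    return "".join(text["chars"])


# Hand-built sample that mimics the shape shown above.
sample = json.dumps({
    "kernel": {
        "assistantResponseDevice": {
            "text": json.dumps({"language": "en", "chars": list(" Hello")}),
            "audio": "",
            "canned": False,
        }
    }
})
print(repr(extract_device_text(sample)))
```

The audio field would be base64-decoded separately. The per-character timing arrays are ignored here.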
To signal push-to-talk (PTT) state, send a JSON message like this:
{
  "kernel": {
    "voiceActivity": {
      "imageBase64": "",
      "state": STATE
    }
  }
}
Where STATE is one of: inactive, pttButtonPressed, pttButtonReleased.
Set PTT state to pressed, then send 0.1 second chunks of uncompressed WAV as bytes directly down the websocket, then set PTT state to released. It looks like it uses 16kHz stereo, 16-bit samples.
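The sequence above can be sketched in Python as a generator that yields the frames in order: the "pressed" message, the audio in 0.1-second chunks, then the "released" message. At 16 kHz, stereo, 16-bit, one chunk works out to 1600 samples x 2 channels x 2 bytes = 6400 bytes. Whether the stream should include a WAV header or be raw PCM isn't clear from observation, so this just slices whatever bytes it is given. The names are hypothetical.

```python
import json

SAMPLE_RATE = 16000   # Hz, as observed
CHANNELS = 2          # stereo
SAMPLE_WIDTH = 2      # bytes per 16-bit sample
# 0.1 s of audio: 1600 samples * 2 channels * 2 bytes = 6400 bytes
CHUNK_BYTES = (SAMPLE_RATE // 10) * CHANNELS * SAMPLE_WIDTH

VALID_STATES = ("inactive", "pttButtonPressed", "pttButtonReleased")


def ptt_message(state: str, image_base64: str = "") -> str:
    """Serialize a voiceActivity message for the given PTT state."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown PTT state: {state}")
    return json.dumps(
        {"kernel": {"voiceActivity": {"imageBase64": image_base64, "state": state}}}
    )


def voice_frames(audio: bytes):
    """Yield websocket frames: text (str) for PTT, binary (bytes) for audio."""
    yield ptt_message("pttButtonPressed")
    for offset in range(0, len(audio), CHUNK_BYTES):
        yield audio[offset:offset + CHUNK_BYTES]
    yield ptt_message("pttButtonReleased")


frames = list(voice_frames(bytes(CHUNK_BYTES * 2 + 100)))
print(len(frames), "frames")
```

Each `str` frame goes out as a websocket text message and each `bytes` frame as a binary message.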
Send a base64-data-URI-encoded JPEG file, nominally 1080x720px at 100% quality (although other resolutions, qualities and formats presumably work too), in the imageBase64 field of a pttButtonReleased PTT message, sent along with a voice input as described above.
{
  "kernel": {
    "voiceActivity": {
      "imageBase64": "data:image/jpeg;base64,blahblahasdfasdfasdf===",
      "state": "pttButtonReleased"
    }
  }
}
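Building that final message from raw JPEG bytes is straightforward. A short Python sketch, with made-up function names:

```python
import base64
import json


def jpeg_data_uri(jpeg_bytes: bytes) -> str:
    """Encode JPEG bytes as a base64 data URI."""
    return "data:image/jpeg;base64," + base64.b64encode(jpeg_bytes).decode("ascii")


def ptt_release_with_image(jpeg_bytes: bytes) -> str:
    """Serialize the pttButtonReleased message carrying an image."""
    return json.dumps({
        "kernel": {
            "voiceActivity": {
                "imageBase64": jpeg_data_uri(jpeg_bytes),
                "state": "pttButtonReleased",
            }
        }
    })


# The first three bytes of any JPEG file, used here as a stand-in image.
print(jpeg_data_uri(b"\xff\xd8\xff"))  # data:image/jpeg;base64,/9j/
```

In practice you'd read the bytes from a file with `open(path, "rb").read()` and send the result in place of the plain "released" message at the end of a voice input.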