# Vosk voice commands

> Add offline fixed-phrase voice commands that mirror Rokid touchpad actions.

Use Vosk for offline, fixed-phrase commands that mirror the Rokid touchpad actions: `select`, `back`, `next`, `previous`.

## Build inputs

```kotlin
// app/build.gradle.kts
defaultConfig {
    ndk {
        abiFilters += listOf("arm64-v8a", "x86_64")
    }
}

dependencies {
    implementation("com.alphacephei:vosk-android:0.3.75@aar")
    implementation("net.java.dev.jna:jna:5.18.1@aar")
}
```

Keep the Vosk and JNA dependencies as inline strings: Gradle version catalogs lose the `@aar` qualifier and can pull duplicate JNA classes.

Add `android.permission.RECORD_AUDIO` to the manifest and request it at runtime before opening `AudioRecord`.

Bundle a model at `app/src/main/assets/model-en-us/`. Recommended default: `vosk-model-small-en-us-0.15`.

```sh
ASSET_DIR=app/src/main/assets
curl -L -o /tmp/vosk-model-small-en-us-0.15.zip \
  https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip -q /tmp/vosk-model-small-en-us-0.15.zip -d /tmp
rm -rf "$ASSET_DIR/model-en-us"
mkdir -p "$ASSET_DIR"
mv /tmp/vosk-model-small-en-us-0.15 "$ASSET_DIR/model-en-us"
printf 'en-us-small-0.15-v1\n' > "$ASSET_DIR/model-en-us/uuid"
```

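Load the bundled model through Vosk's Android storage helper. `StorageService.unpack` copies the asset directory into app storage and compares the `uuid` file to decide whether an existing copy is still current, so change the `uuid` value whenever you swap the model: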

```kotlin
StorageService.unpack(
    context.applicationContext,
    "model-en-us", // asset directory that holds the bundled model
    "model",       // target directory in app-specific storage
    { model -> /* create Recognizer */ },
    { exception -> /* report init failure */ }
)
```

Check `context.assets.list("model-en-us")` before unpacking so missing models report a useful runtime error.
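The pre-flight check described above might look like the following sketch. `ModelCheck`, `checkBundledModel`, and the injected `listAssets` lambda are hypothetical names, not part of Vosk; the lambda stands in for `context.assets.list` so the logic can be exercised off-device. Treating the `uuid` file as required is an assumption based on the setup script above, which writes one.

```kotlin
import java.io.IOException

// Hypothetical result type for the pre-flight check; not part of the Vosk API.
sealed interface ModelCheck {
    object Present : ModelCheck
    data class Missing(val reason: String) : ModelCheck
}

// `listAssets` stands in for `context.assets.list(dir)`, which returns the
// entries under an asset directory and may throw IOException.
fun checkBundledModel(
    assetDir: String,
    listAssets: (String) -> Array<String>?
): ModelCheck {
    val entries = try {
        listAssets(assetDir)
    } catch (e: IOException) {
        return ModelCheck.Missing("Could not list assets/$assetDir: ${e.message}")
    }
    if (entries.isNullOrEmpty()) {
        return ModelCheck.Missing("No model files under assets/$assetDir")
    }
    // Assumption: the model directory must carry the uuid file that the
    // setup script writes, so its absence is reported as a missing model.
    if ("uuid" !in entries) {
        return ModelCheck.Missing("assets/$assetDir has no uuid file")
    }
    return ModelCheck.Present
}
```

On device, this could be called as `checkBundledModel("model-en-us") { dir -> context.assets.list(dir) }`, with the `Missing.reason` text surfaced to the user before `StorageService.unpack` is attempted.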

## Recognizer

```kotlin
private const val SAMPLE_RATE_HZ = 16_000

val commands = linkedSetOf("select", "back", "next", "previous")
val grammarJson = JSONArray().apply {
    commands.forEach { put(it) }
    put("[unk]")
}.toString()

val recognizer = Recognizer(model, SAMPLE_RATE_HZ.toFloat(), grammarJson).apply {
    setWords(false)
    setPartialWords(false)
    setEndpointerDelays(5.0f, 0.25f, 3.0f)
}
```

Normalize configured commands and recognized text with `trim().lowercase(Locale.US)`. Keep `[unk]` in the grammar so out-of-grammar speech does not force a command.

The endpointer delays above bias recognition toward short commands: wait up to 5 s for speech to start, finalize 0.25 s after trailing silence, and cap each utterance at 3 s.
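The normalization rule for configured commands and recognized text can be sketched as below. The helper names (`normalizeCommand`, `normalizedCommands`, `matchCommand`) are illustrative, not from Vosk; the point is that both sides pass through the same function before comparison.

```kotlin
import java.util.Locale

// Hypothetical helper: one normalization shared by configured commands and
// recognized text so the two can be compared directly.
fun normalizeCommand(raw: String): String = raw.trim().lowercase(Locale.US)

// Builds the command set once, dropping blanks and duplicates that differ
// only by case or whitespace. toSet() keeps the configured order.
fun normalizedCommands(configured: List<String>): Set<String> =
    configured.map(::normalizeCommand).filter { it.isNotEmpty() }.toSet()

// Returns the matched command, or null for "[unk]", empty text, or anything
// else outside the set.
fun matchCommand(recognized: String, commands: Set<String>): String? =
    normalizeCommand(recognized).takeIf { it in commands }
```

Because `[unk]` is in the grammar but not in the command set, out-of-grammar speech falls through to `null` instead of triggering an action.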

## Audio loop

Feed the recognizer 16 kHz mono PCM16 from a worker thread. Use sample counts, not byte counts, when passing a `ShortArray` to `acceptWaveForm`.

```kotlin
val minBufferBytes = AudioRecord.getMinBufferSize(
    SAMPLE_RATE_HZ,
    AudioFormat.CHANNEL_IN_MONO,
    AudioFormat.ENCODING_PCM_16BIT
)
require(minBufferBytes > 0)

val record = AudioRecord(
    MediaRecorder.AudioSource.MIC,
    SAMPLE_RATE_HZ,
    AudioFormat.CHANNEL_IN_MONO,
    AudioFormat.ENCODING_PCM_16BIT,
    // Buffer size is in bytes: at least 200 ms of PCM16 (2 bytes per sample).
    maxOf(minBufferBytes, SAMPLE_RATE_HZ * 200 / 1000 * 2)
)

check(record.state == AudioRecord.STATE_INITIALIZED)
record.startRecording()
check(record.recordingState == AudioRecord.RECORDSTATE_RECORDING)

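// Raise the worker's priority so reads keep up with the microphone.
Process.setThreadPriority(Process.THREAD_PRIORITY_AUDIO)
// 50 ms of audio per read, sized in samples (800 at 16 kHz), not bytes.
val buffer = ShortArray(SAMPLE_RATE_HZ * 50 / 1000)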

while (!stopRequested) {
    val readCount = record.read(buffer, 0, buffer.size)
    if (readCount < 0) {
        reportAudioReadFailure(readCount)
        return
    }
    if (readCount == 0) continue

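    // acceptWaveForm takes a sample count and returns true once the endpointer
    // closes an utterance; until then only a partial hypothesis is available.
    if (recognizer.acceptWaveForm(buffer, readCount)) {
        publishPartial("")
        dispatchResult(recognizer.getResult())
    } else {
        publishPartial(partialText(recognizer.getPartialResult()))
    }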
}

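// As written, the loop exits only after stopRequested is set (or returns on a
// read error), so this flush runs only if you add another exit path, such as
// a session timeout that breaks out of the loop.
if (!stopRequested) {
    publishPartial("")
    dispatchResult(recognizer.getFinalResult())
}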
```
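The loop above mixes two units: `AudioRecord` sizes its internal buffer in bytes, while the `ShortArray` and `acceptWaveForm` work in samples. A small sketch of the arithmetic, using hypothetical helper names and assuming mono PCM16:

```kotlin
// Number of samples covering `millis` of mono audio. This is the unit for a
// ShortArray size and for the length passed to acceptWaveForm.
fun samplesFor(millis: Int, sampleRateHz: Int = 16_000): Int =
    sampleRateHz * millis / 1000

// Number of bytes for the same duration. This is the unit AudioRecord uses
// for its buffer size; PCM16 stores each sample in 2 bytes.
fun bytesFor(millis: Int, sampleRateHz: Int = 16_000, bytesPerSample: Int = 2): Int =
    samplesFor(millis, sampleRateHz) * bytesPerSample
```

At 16 kHz, the 50 ms read buffer holds 800 samples (1,600 bytes), and the 200 ms floor for the recorder buffer is 6,400 bytes.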

Parse Vosk JSON with `JSONObject`: final results use `"text"` and partial results use `"partial"`.

```kotlin
fun resultText(resultJson: String) = JSONObject(resultJson)
    .optString("text", "")
    .trim()
    .lowercase(Locale.US)

fun partialText(partialJson: String) = JSONObject(partialJson)
    .optString("partial", "")
    .trim()
    .lowercase(Locale.US)

val text = resultText(resultJson)
if (text in commands) {
    onCommand(text)
}
```

Callbacks from the recognition thread must hop to the main thread before touching Android views.
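One way to make that thread hop explicit is to route every callback through an `Executor`. The wrapper below is a sketch with hypothetical names (`MainThreadCallbacks`, `dispatchCommand`); on Android the executor could be `ContextCompat.getMainExecutor(context)`, while tests can pass a fake.

```kotlin
import java.util.concurrent.Executor

// Hypothetical wrapper: forwards callbacks from the recognition thread onto
// whatever executor the caller supplies (the main-thread executor on device).
class MainThreadCallbacks(
    private val mainExecutor: Executor,
    private val onCommand: (String) -> Unit,
    private val onPartial: (String) -> Unit
) {
    // Safe to call from the worker thread; onCommand runs on mainExecutor.
    fun dispatchCommand(command: String) = mainExecutor.execute { onCommand(command) }

    // Same hop for partial text, so UI updates never run on the worker.
    fun publishPartial(partial: String) = mainExecutor.execute { onPartial(partial) }
}
```

Keeping the executor injectable means the recognition code never references views directly, and the ordering of callbacks follows the executor's queue.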

## Lifecycle

* Start only after the model is unpacked and `RECORD_AUDIO` is granted.
* On stop, set the stop flag, stop `AudioRecord`, interrupt the worker and join it with a short timeout, release `AudioRecord`, clear partial UI state, and reset any audio meter to zero.
* On destroy, close `Recognizer` and `Model`.
* Call `recognizer.reset()` before each new listening session.
* Suppress duplicate final commands within a short window (about 400 ms), because endpointing can emit the same final result more than once.
* Surface actionable errors for missing model, unpack failure, missing permission, invalid buffer size, recorder init/start failure, negative reads, and runtime exceptions.
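The duplicate-suppression rule in the list above could be implemented along these lines. `DuplicateCommandFilter` is a hypothetical helper, and the injected clock is an assumption made for testability; on Android, `SystemClock.elapsedRealtime()` would be a reasonable monotonic source. The sketch assumes it is only called from the recognition thread.

```kotlin
// Hypothetical helper: drops a final command that repeats the previously
// accepted one within `windowMs`. A different command always passes.
class DuplicateCommandFilter(
    private val windowMs: Long = 400,
    private val nowMs: () -> Long = { System.nanoTime() / 1_000_000 }
) {
    private var lastCommand: String? = null
    private var lastAcceptedAtMs = 0L

    // Returns true when the command should be dispatched.
    fun shouldDispatch(command: String): Boolean {
        val now = nowMs()
        val isRepeat = command == lastCommand && now - lastAcceptedAtMs < windowMs
        if (isRepeat) return false
        // The window is measured from the last accepted command, so a
        // suppressed repeat does not extend it.
        lastCommand = command
        lastAcceptedAtMs = now
        return true
    }
}
```

A caller would wrap dispatch as `if (filter.shouldDispatch(text)) onCommand(text)`, so intentional repeats (saying "next" twice, half a second apart) still go through.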
