GlassKit is an open-source toolkit for building smart-glasses AI apps. Your coding agent can use the skill, docs, and runnable examples to build apps that understand what wearers see and hear, then guide them in real time. GlassKit currently targets Rokid Glasses. The long-term goal is a developer platform for building, hosting, and shipping smart-glasses apps across more devices.

Quickstart

Choose the fastest path: install the agent skill, copy the Rokid starter, or copy a complete example.

GitHub

Browse source, examples, issues, and demo videos in the repository.

Discord

Join the Discord for project discussion and help.

Demos

The GitHub README includes demo videos. These demos cover the core GlassKit building blocks for Rokid Glasses: camera and microphone capture, WebRTC streaming, a monochrome on-lens display, touchpad and offline voice controls, OpenAI Realtime, Overshoot, and object detection.

Drink-making coach

Proactive drink-making coach that watches ingredients, picks a recipe, and guides each step.

Sushi speedrun timer

Physical-task timer that uses RF-DETR to detect configured objects and advance HUD splits after confirmation.

IKEA assembly assistant

Voice-first assembly assistant that streams microphone and camera input to OpenAI Realtime.

Live scene reader

Scene reader that sends live camera context to Overshoot and displays inference text on the HUD.

Rokid feature demo

Device-feature reference app for touchpad navigation, offline voice commands, camera, microphone, audio, and reusable screen controllers.

Searchable life recording

Full-day smart-glasses recording demo that makes long first-person recordings browsable and searchable.

How apps work

A typical GlassKit app has four pieces:
  1. A Rokid Glasses Android app captures camera and microphone input, handles touchpad gestures, and renders a HUD.
  2. WebRTC carries live media between the glasses, your backend, and AI services.
  3. A backend coordinates session setup, workflow state, model calls, tool calls, and app-specific decisions.
  4. The wearer gets real-time feedback through display and audio.
The exact architecture varies by example. Some pieces can run offline, including local voice commands, device controls, and local vision or privacy processing.
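
As a rough illustration of the backend piece, the sketch below keeps workflow state for one session and turns incoming events into HUD updates. It is not GlassKit's API: `WorkflowSession`, `HudUpdate`, the `"confirm"` and `"back"` event names, and the drink steps are all hypothetical, chosen only to show how step state and wearer feedback could fit together.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HudUpdate:
    """Hypothetical message the backend sends for the glasses to render."""
    text: str
    speak: bool = False  # whether the glasses should also read the text aloud


@dataclass
class WorkflowSession:
    """Hypothetical per-wearer workflow state held by the backend."""
    steps: list[str]
    index: int = 0

    def current(self) -> HudUpdate:
        # Past the last step, report completion and ask for audio feedback.
        if self.index >= len(self.steps):
            return HudUpdate("Done", speak=True)
        return HudUpdate(f"{self.index + 1}/{len(self.steps)} {self.steps[self.index]}")

    def handle(self, event_type: str) -> HudUpdate:
        # Events could come from a touchpad gesture, a voice command,
        # or a model decision; unknown events leave the state unchanged.
        if event_type == "confirm" and self.index < len(self.steps):
            self.index += 1
        elif event_type == "back" and self.index > 0:
            self.index -= 1
        return self.current()


# Made-up steps and events, for illustration only.
session = WorkflowSession(steps=["Add ice", "Pour syrup", "Stir"])
updates = [session.handle(e) for e in ["confirm", "back", "confirm", "confirm", "confirm"]]
for update in updates:
    print(update.text)
```

In a real app the events would arrive over the session's transport rather than from a list, and the state would likely carry model and tool-call results as well. The point of the sketch is only that the backend owns the workflow state while the glasses render whatever update they receive.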

Requirements

Requirements vary by example. Each example README has the exact setup steps and environment variables.