Surface Studio
A unified projection mapping studio that combines quad warping, 13 interactive widget types, camera-based hand and face tracking, configurable TouchOSC-style actions, voice commands with Whisper STT, and LLM integration — all in a single self-contained HTML file.
▶ Quick Start
- Serve the `interactive-spaces/` folder over HTTP (e.g. `python -m http.server 8000`)
- Open the launcher page in your browser
- Click "Open Editor" to enter the full editing interface
- Click "+ Quad" in the toolbar to create your first surface
- Drag widgets from the palette on the left onto the canvas
- Switch to Preview mode to see warped output and drag corners
- Open a second window with "Open Projection" for fullscreen output
- Connect a camera in Camera mode to enable hand/face tracking
☰ Modes
Surface Studio uses hash-based routing with three modes:
- `#editor` — the full editing interface
- `#projection` — the fullscreen warped output window
- the launcher page (default, no hash)

⚙ Architecture
The entire app is a single index.html file (~2100 lines). No build step, no dependencies beyond CDN imports for MediaPipe and Whisper.
Data Flow
Key Technologies
- Homography — 4-point DLT algorithm for perspective transforms, converted to CSS `matrix3d()`
- MediaPipe — HandLandmarker (21 landmarks) + FaceLandmarker (468+ landmarks) via WebAssembly
- Whisper — Tiny English model via `@huggingface/transformers`, runs entirely in-browser
- Ollama — Local LLM via `localhost:11434` REST API with streaming responses
- BroadcastChannel — Real-time cross-window sync without WebSocket
▦ Editor Layout
Toolbar
The toolbar shows the project name (editable), a "+ Quad" button, and the canvas mode switcher (Edit / Preview / Camera).
Hierarchy Panel
The left panel shows a tree of all quads and their widgets. Click to select. Each quad shows a colored dot matching its border color. Widgets are indented with a type icon prefix.
Widget Palette
Below the hierarchy, a 3-column grid shows all 13 widget types. Click any item to add it to the center of the selected quad.
Canvas Area
The center panel has three views:
Inspector Panel
The right panel has three tabs:
Properties Tab
Shows editable fields for the selected item:
- Quad selected: Name, color, canvas dimensions, corner coordinates
- Widget selected: Type info, transform (X/Y/W/H), all type-specific properties (label, color, value, src, etc.)
Actions Tab
Configure TouchOSC-style triggers and actions for the selected widget. See the Action System section.
Camera Tab
Camera controls, calibration buttons, hand/face tracking stats, and settings toggles.
Status Bar
The bottom bar shows live status indicators:
- Voice — Listening status (green dot when active)
- Whisper — Model load state and readiness
- Ollama — Connection status and active model name
- Face — Face count and mouth state when camera is active
- Cal — Calibration accuracy in pixels
- Info — Quad and widget counts
▪ 13 Widget Types
| Widget | Triggers | Default size |
|---|---|---|
| Button | press, release | 160×60 |
| Slider | value-change | 200×50 |
| Knob | value-change | 80×80 |
| Toggle | toggle-on, toggle-off | 120×40 |
| Big Button | press, release | 240×240 |
| Readout | touch | 160×60 |
| Label | touch | 200×40 |
| Soundboard | press | 300×200 |
| Mic | record-start, record-stop | 100×100 |
| Image / Video / HTML / Web | touch | 400×200 |

▢ Quads & Projection Mapping
Quads are the projection surfaces. Each quad is a rectangular canvas that gets warped via 4-point homography to match a physical surface.
Working with Quads
- Add: Click "+ Quad" in the toolbar or use `Quads > Add Quad`
- Select: Click in the hierarchy panel or on the canvas
- Warp: Switch to Preview mode, drag the corner handles
- Move: In Preview mode, click and drag the quad body
- Layout: Use `Quads > Layout` to auto-arrange as grid, row, or column
- Delete: Select and press Del or use the inspector
Homography Math
Each quad's four corners define a perspective transform computed via Direct Linear Transform (DLT). The resulting 3×3 homography matrix is converted to a CSS matrix3d() for GPU-accelerated rendering.
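As an illustration of the math above, here is a minimal sketch (not the app's actual code) that builds the standard 8×8 DLT system from four point correspondences and solves it with Gaussian elimination:

```javascript
// Sketch: compute the 3×3 homography mapping four source corners to four
// destination corners via the 8×8 DLT linear system (h33 fixed to 1).
function computeHomography(src, dst) {
  const A = [], b = [];
  for (let i = 0; i < 4; i++) {
    const [x, y] = src[i], [u, v] = dst[i];
    // u = (h11·x + h12·y + h13) / (h31·x + h32·y + 1), linearized:
    A.push([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.push(u);
    A.push([0, 0, 0, x, y, 1, -v * x, -v * y]); b.push(v);
  }
  const h = solve(A, b); // [h11 .. h32]
  return [[h[0], h[1], h[2]], [h[3], h[4], h[5]], [h[6], h[7], 1]];
}

// Gaussian elimination with partial pivoting on the augmented matrix.
function solve(A, b) {
  const n = b.length, M = A.map((row, i) => [...row, b[i]]);
  for (let col = 0; col < n; col++) {
    let p = col;
    for (let r = col + 1; r < n; r++)
      if (Math.abs(M[r][col]) > Math.abs(M[p][col])) p = r;
    [M[col], M[p]] = [M[p], M[col]];
    for (let r = col + 1; r < n; r++) {
      const f = M[r][col] / M[col][col];
      for (let c = col; c <= n; c++) M[r][c] -= f * M[col][c];
    }
  }
  const x = new Array(n);
  for (let r = n - 1; r >= 0; r--) {
    let s = M[r][n];
    for (let c = r + 1; c < n; c++) s -= M[r][c] * x[c];
    x[r] = s / M[r][r];
  }
  return x;
}

// Apply H to a point with perspective divide.
function applyH(H, [x, y]) {
  const w = H[2][0] * x + H[2][1] * y + H[2][2];
  return [(H[0][0] * x + H[0][1] * y + H[0][2]) / w,
          (H[1][0] * x + H[1][1] * y + H[1][2]) / w];
}
```

For rendering, `matrix3d()` takes a 4×4 column-major matrix; the 3×3 homography can be embedded by inserting an identity z row and column.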
⚡ Action System (TouchOSC-style)
Every widget can have triggers that fire actions. This is the core interactivity system, inspired by TouchOSC's trigger/action model.
How It Works
- Select a widget and go to the Actions tab in the inspector
- Choose a trigger event from the dropdown (e.g., `press`) and click "+ Trigger"
- Under the trigger, click "+ Action" to add what happens
- Configure the action type, target widget, and parameters
- Optionally set a voice keyword on the trigger for voice activation
Trigger Events
| Widget Type | Available Triggers |
|---|---|
| Button, Big Button | press, release, face-detected, face-lost, mouth-open, mouth-close, blink |
| Slider, Knob | value-change |
| Toggle | toggle-on, toggle-off |
| Mic | record-start, record-stop |
| Readout, Label | touch, face-detected, face-lost, mouth-open, mouth-close, blink |
| Image, Video, HTML, Web | touch |
| Soundboard | press |
Action Types
Actions can set a target widget's property (e.g., set a readout's value to 100, change a label's text, update a slider's value), flip a toggle's on state, run editor commands such as `add button at 100 200`, query the LLM (`query-llm`), or broadcast a message to other windows (`send-broadcast`).

Example: Button Controls a Readout
- Add a Button and a Readout to a quad
- Select the Button, go to Actions tab
- Add trigger: `press`
- Add action: `Set Widget Prop`
- Target: the Readout widget
- Prop: `value`, Value: `100`
- Now pressing the button in projection mode sets the readout to "100"
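The example above might serialize to a structure like the following — a hypothetical shape for illustration; the field names are not the app's actual schema:

```json
{
  "widget": "button-1",
  "triggers": [
    {
      "event": "press",
      "voiceKeyword": null,
      "actions": [
        {
          "type": "set-widget-prop",
          "target": "readout-1",
          "prop": "value",
          "value": 100
        }
      ]
    }
  ]
}
```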
📎 Drag & Drop
Drop files or URLs directly onto the editor canvas:
- Image files (PNG, JPG, GIF, SVG) — Creates an Image widget with the file embedded as a data URI
- Video files (MP4, WebM) — Creates a Video widget with the file embedded
- URLs (dragged from browser address bar or links) — Creates an iframe widget with the URL as source
🌐 Live Web Embeds
The Web (iframe) widget embeds live web pages that render at full resolution within warped quads in projection mode.
Use Cases
- Embed dashboards, data visualizations, or monitoring tools
- Display GitHub Pages projects as interactive surfaces
- Show web apps, documentation, or any URL
- Combine multiple web embeds across different quads
Setting Up
- Add a Web widget from the palette
- In the inspector, set the `src` property to your URL
- The iframe renders live in projection mode with script/form support

Note: Some sites block embedding via `X-Frame-Options`. GitHub Pages, personal sites, and most web apps work fine. Sites like Google or Twitter will not load in iframes.

📷 Camera Setup
Surface Studio uses your webcam for two things: hand tracking (touch interaction on projected surfaces) and face tracking (expression-based triggers).
Starting the Camera
- Switch to Camera mode (3) — camera starts automatically
- Or use `Camera > Start Camera` from the menu
- Grant browser permission when prompted
- The camera runs in the background even when viewing Edit/Preview
✋ Hand Tracking
MediaPipe HandLandmarker detects up to 2 hands with 21 landmarks each, running on your GPU via WebAssembly.
How Touch Works
- The index fingertip (landmark 8) position determines the cursor location
- Touch detection uses two signals:
- Finger extended (tip above PIP joint)
- Z-depth < -0.05 (finger pressing toward camera)
- When both conditions are met, a touch event fires
- After calibration, camera coordinates are transformed to projector space via homography
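The two-signal test can be sketched as follows, assuming MediaPipe-style normalized landmarks (`{x, y, z}` objects with y growing downward; index 8 is the fingertip, index 6 the index-finger PIP joint):

```javascript
// Sketch of the two-signal touch test: finger extended AND pressing
// toward the camera. Not the app's actual code.
const Z_PRESS_THRESHOLD = -0.05;

function detectTouch(landmarks) {
  const tip = landmarks[8]; // index fingertip
  const pip = landmarks[6]; // index PIP joint
  const extended = tip.y < pip.y;              // tip above PIP in image space
  const pressing = tip.z < Z_PRESS_THRESHOLD;  // z-depth toward the camera
  return {
    cursor: { x: tip.x, y: tip.y }, // cursor follows the fingertip
    touching: extended && pressing, // both signals required
  };
}
```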
Landmark Map
Camera View Overlay
When enabled, the camera view draws:
- Green skeleton — Bone connections between landmarks
- Green dots — Each landmark position
- Pink dot — Index fingertip (landmark 8), larger than others
- Stats text — Hand count and face count in the top-left corner
😶 Face Tracking
MediaPipe FaceLandmarker detects up to 2 faces with 468+ landmarks each, enabling expression-based interactions.
What's Detected
- Face presence — fires face-detected and face-lost triggers on widgets
- Mouth open/close — fires mouth-open and mouth-close triggers
- Blinks — fires the blink trigger; total blink count is tracked

Face Overlay
The camera view draws face landmarks in cyan/aqua:
- Jawline contour — Full face silhouette
- Eyebrows — Left and right brow arcs
- Lips — Outer lip contour
- Nose bridge — Center vertical line
- Eye centers — Larger dots using iris landmarks
- Key points — Nose tip, eye corners, lip centers
Expression Triggers
Use face events as triggers in the Action System. Example use cases:
- Open mouth to trigger a sound effect
- Blink to cycle through slide content
- Face detection starts an ambient animation, face lost pauses it
- Head pose direction controls which quad is highlighted
🎯 Calibration
Calibration maps camera coordinates to projector coordinates so hand touches land on the right widgets.
4-Point Calibration Flow
- Open the projection window on your projector
- Point your camera at the projected surface
- Go to `Camera > Start Calibration` (or say "calibrate")
- The projection shows dot 1 (top-left) with a pulsing ring animation
- In the camera view, click where you see dot 1
- Dot 1 turns green (confirmed), dot 2 appears
- Repeat for all 4 corners: TL → TR → BR → BL
- System computes homography and reports reprojection error
- Status bar shows a persistent accuracy badge (e.g., "Cal: 2.1px")
Calibration Quality
- < 5px — Excellent. Touch will be very accurate.
- 5-10px — Good. Adequate for buttons and large widgets.
- > 10px — Warning shown. Consider redoing calibration. Make sure dots are clearly visible and click precisely.
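A pixel figure like "Cal: 2.1px" can be computed as a reprojection error between where the homography maps each clicked camera point and the known projector dot position. The app's exact formula isn't shown here; an RMS version is one reasonable sketch:

```javascript
// Illustrative RMS reprojection error in pixels. H is a 3×3 homography,
// cameraPts the clicked camera-space points, projectorPts the dot targets.
function reprojectionError(H, cameraPts, projectorPts) {
  let sum = 0;
  for (let i = 0; i < cameraPts.length; i++) {
    const [x, y] = cameraPts[i];
    const w = H[2][0] * x + H[2][1] * y + H[2][2];
    const u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w;
    const v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w;
    sum += (u - projectorPts[i][0]) ** 2 + (v - projectorPts[i][1]) ** 2;
  }
  return Math.sqrt(sum / cameraPts.length); // root-mean-square distance
}
```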
Persistence
Calibration data is saved to localStorage and restored on reload. Use Camera > Reset Calibration to clear it.
👆 Touch Interaction
After calibration, your hand becomes a touch controller for the projected surface.
Interaction Model
- Hover: Point your index finger at the surface — cursor follows
- Press: Push your finger toward the surface (z-depth decreases) — triggers touch-start
- Release: Pull finger back — triggers touch-end
Widget Interactions
| Widget | Touch Behavior |
|---|---|
| Button / Big Button | Press visual feedback + fires press trigger |
| Slider | Drag horizontally to change value |
| Knob | Drag up/down to change value |
| Toggle | Tap to switch on/off |
| Soundboard | Tap individual sound buttons |
| Mic | Tap to start/stop recording indicator |
Touch Cursor
The projection window shows a circular cursor that follows the tracked fingertip. It shrinks and fills when pressing, giving clear visual feedback.
🎤 Voice Commands
Press F5 to start voice recognition. Whisper STT processes 3-second audio chunks locally in your browser.
Built-in Commands
| Command | Example | Description |
|---|---|---|
| `add [type]` | "add button at 200 300 labeled Play" | Add a widget at position with optional label |
| `add quad` | "add quad named Display" | Create a new quad |
| `select [name]` | "select play button" | Select a widget or quad by name |
| `move [x] [y]` | "move to 500 400" | Move selected widget |
| `resize [w] [h]` | "resize 300 200" | Resize selected widget |
| `set color [color]` | "set color red" | Change widget color (named colors or hex) |
| `delete` | "delete" | Delete selected widget/quad |
| `duplicate` | "duplicate" | Clone selected widget |
| `layout [mode]` | "layout grid" | Auto-arrange quads (grid/row/column) |
| `save` | "save" | Save project |
| `calibrate` | "calibrate" | Start calibration flow |
| `start/stop camera` | "start camera" | Toggle webcam |
| `ask [question]` | "ask suggest a layout" | Query Ollama LLM |
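A command like "add button at 200 300 labeled Play" can be turned into structured data with a small pattern match. The sketch below is hypothetical — it is not the app's actual grammar, and it only covers the `add [type]` shape:

```javascript
// Hypothetical parser for the "add [type] [at x y] [labeled ...]" shape.
function parseAddCommand(text) {
  const m = text.match(/^add (\w+)(?: at (\d+) (\d+))?(?: labeled (.+))?$/i);
  if (!m) return null; // not an "add" command
  return {
    type: m[1].toLowerCase(),
    x: m[2] !== undefined ? Number(m[2]) : null,
    y: m[3] !== undefined ? Number(m[3]) : null,
    label: m[4] ?? null,
  };
}
```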
Whisper STT
Speech recognition uses `Xenova/whisper-tiny.en` running entirely in your browser via `@huggingface/transformers`.
- Model size: ~40MB, downloaded once and cached
- Processing: 3-second audio chunks at 16kHz
- Silence detection: Chunks below energy threshold 0.005 are discarded
- Language: English only (tiny.en model)
The status bar shows download progress during first load.
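The silence check described above can be sketched as a simple RMS energy test over a Float32 PCM chunk (illustrative, not the app's actual code):

```javascript
// Discard audio chunks whose RMS energy falls below the 0.005 threshold.
const SILENCE_RMS = 0.005;

function isSilent(samples) {
  let sum = 0;
  for (const s of samples) sum += s * s;
  const rms = Math.sqrt(sum / samples.length); // root-mean-square energy
  return rms < SILENCE_RMS;
}
```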
Ollama LLM Integration
Connect to a local Ollama instance for AI-powered assistance.
Setup
- Install Ollama and pull a model: `ollama pull llama3`
- Make sure Ollama is running on `localhost:11434`
- Surface Studio auto-detects available models (prefers llama3, then mistral)
- Status bar shows connection status and model name
Usage
- Menu: `Voice > LLM Prompt` (Ctrl+L)
- Voice: Say "ask" followed by your question
- Action: Use the `query-llm` action type on widget triggers
Responses stream into the command log in real-time.
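Ollama's `/api/generate` endpoint streams newline-delimited JSON objects, each carrying a `response` text fragment until a final object where `done` is true. A minimal consumer sketch (the parsing helper assumes each network chunk holds whole lines; a real client should buffer partial lines across chunks):

```javascript
// Extract the concatenated "response" text from an NDJSON chunk.
function extractStreamText(ndjsonChunk) {
  let text = '';
  for (const line of ndjsonChunk.split('\n')) {
    if (!line.trim()) continue;
    const obj = JSON.parse(line);
    if (obj.response) text += obj.response;
  }
  return text;
}

// Browser usage sketch (network call, not exercised here):
async function askOllama(prompt, onText) {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    body: JSON.stringify({ model: 'llama3', prompt }),
  });
  const reader = res.body.getReader();
  const dec = new TextDecoder();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    onText(extractStreamText(dec.decode(value, { stream: true })));
  }
}
```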
Figurate API Key
Surface Studio includes a built-in place to store your Figurate key for voice and image-description/analyze integrations.
- Open `Help > Add Figurate Key...`
- Paste your key in the prompt. Figurate keys start with `fg_`
- Click OK to save, or clear the field to remove the key

The key is stored in localStorage under `surface-studio-figurate-key`. It is not written into exported project JSON. Use this when you want Surface Studio or related integrations to call Figurate-backed voice, narration, image description, or analyze endpoints without re-entering the key each session.
Voice Keywords
Any trigger can have a `voiceKeyword` — a phrase that activates the trigger when spoken.
Example
- Add a Button widget labeled "Start Show"
- Go to Actions tab, add a `press` trigger
- Set voice keyword to `start show`
- Add actions under that trigger (e.g., set readout value, play sound)
- Now saying "start show" fires all those actions
⌨ Keyboard Shortcuts
| Key | Action |
|---|---|
| 1 | Edit mode |
| 2 | Preview mode |
| 3 | Camera mode |
| Ctrl+N | New project |
| Ctrl+O | Open / Import |
| Ctrl+S | Save |
| Ctrl+Z | Undo |
| Ctrl+Shift+Z | Redo |
| Ctrl+D | Duplicate widget |
| Ctrl+G | Toggle grid |
| Ctrl+L | LLM prompt |
| Ctrl+Shift+P | Screenshot |
| Del / Backspace | Delete selected |
| F5 | Toggle voice listening |
| H | Projection HUD: show / hide controls |
| M | Projection HUD: toggle minimal view |
| C | Projection HUD: toggle cursor |
| G | Projection HUD: toggle test grid |
| F | Projection HUD: toggle fullscreen |
| Escape | Deselect all |
💾 Save & Export
- Auto-save: Every change is debounced (500ms) and saved to localStorage
- Manual save: Ctrl+S or `File > Save`
- Export: Downloads the full project as a `.json` file
- Import: Load a previously exported `.json` file
- Undo/Redo: 40-level undo stack (within the current session)
- Calibration: Saved separately in localStorage, persists across sessions
📡 BroadcastChannel API
The editor and projection windows communicate via `BroadcastChannel('surface-studio')`.
Message Types
| Type | Direction | Description |
|---|---|---|
| `project-update` | Editor → Projection | Full project state + test grid flag |
| `widget-update` | Both directions | Single widget props update |
| `widget-activated` | Projection → Editor | Widget interaction event |
| `touch` | Editor → Projection | Hand tracking cursor position |
| `touch-start` | Editor → Projection | Finger press detected |
| `touch-end` | Editor → Projection | Finger released |
| `touch-visibility` | Editor → Projection | Shows or hides the projected touch cursor when tracking changes |
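As a sketch, an external same-origin page could listen on the `surface-studio` channel like this. Payload fields other than `type` are assumptions — inspect the real messages in your browser console before relying on them:

```javascript
// Map incoming message types to human-readable descriptions.
function describeStudioMessage(msg) {
  switch (msg.type) {
    case 'touch-start':      return 'finger press detected';
    case 'touch-end':        return 'finger released';
    case 'widget-activated': return 'a widget was interacted with';
    default:                 return `event: ${msg.type}`;
  }
}

// Browser-only wiring: subscribe to Surface Studio's channel.
if (typeof window !== 'undefined') {
  const bc = new BroadcastChannel('surface-studio');
  bc.onmessage = (e) => console.log(describeStudioMessage(e.data));
}
```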
Use the `send-broadcast` action type to send messages on any channel. External web apps can listen on the same channel to react to Surface Studio events.

⚠ Troubleshooting
Camera not starting
- Ensure you're serving over HTTP (not file://). Camera requires secure context.
- Check browser permissions — click the lock icon in the address bar
- Try a different browser (Chrome recommended for MediaPipe)
MediaPipe models not loading
- Models are fetched from CDN on first use. Check your internet connection.
- If behind a firewall, ensure `cdn.jsdelivr.net` and `storage.googleapis.com` are accessible
Whisper not transcribing
- First load downloads ~40MB. Watch the status bar for progress.
- Speak clearly and at normal volume
- Silence detection threshold is 0.005 RMS — very quiet speech may be discarded
Ollama not connecting
- Ensure Ollama is running: `ollama serve`
- Check CORS: Ollama 0.1.24+ allows localhost by default
- Verify models: `ollama list`
Projection window not updating
- Both windows must be on the same origin (same host:port)
- BroadcastChannel requires same-origin — no cross-port communication
- Try closing and reopening the projection window
Iframe not loading
- The target site may block iframes via `X-Frame-Options` or CSP
- GitHub Pages, personal sites, and most web apps work fine
- Test with a simple URL like `https://example.com` first
Calibration inaccurate
- Keep the camera steady and avoid parallax
- Click the exact center of each calibration dot
- Ensure the camera can see the full projected area
- Redo calibration if you move the camera or projector
Surface Studio is part of the Interactive Spaces toolkit. Built with vanilla HTML, CSS, and JavaScript. No build tools required.