Design notes for LiveTranscriber

LiveTranscriber is an iOS side project for local audio recording and live transcription. The app records audio, shows the transcript while recording, saves the audio and timed transcript together, and keeps the recording state visible through the Lock Screen and Dynamic Island.

Project: github.com/iamwilliamli/LiveTranscriber

Design goal

The main design goal is to make transcription feel like a native recording utility, not a cloud service wrapped in an app. I wanted something close to Voice Memos in interaction model, but with the transcript visible during recording and useful after the recording ends.

The app is designed around a few product constraints:

  • recording should start quickly and feel obvious
  • the transcript should be visible without taking over the whole app
  • saved recordings should become reusable files, not temporary sessions
  • audio and transcript data should stay local by default
  • system surfaces such as Live Activities should show only stable, useful state

This made the project less about adding many features and more about deciding what each screen is responsible for.

Product structure

The app has three main areas: recording, saved files, and settings.

The recording tab is the primary surface. It shows the current state, elapsed time, selected language, recording format, and live transcript. The bottom control area changes between a single start button and a compact pause/stop dock while recording.
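The switch between a single start button and a compact pause/stop dock can be expressed as a pure mapping from recording state to control layout. This is a minimal sketch, not the project's code: the names `RecordingState`, `ControlLayout`, and `controls(for:)` are hypothetical, and a SwiftUI view would simply switch on the returned layout.

```swift
// Hypothetical model of the recording tab's bottom control area.
// Names are illustrative and not taken from the LiveTranscriber source.

enum RecordingState: Equatable {
    case idle
    case recording
    case paused
}

enum ControlLayout: Equatable {
    /// A single large start button.
    case startButton
    /// A compact dock with a pause/resume toggle and a stop button.
    case dock(showsResume: Bool)
}

/// Maps the current recording state to the controls that should be visible.
func controls(for state: RecordingState) -> ControlLayout {
    switch state {
    case .idle:
        return .startButton
    case .recording:
        return .dock(showsResume: false)
    case .paused:
        return .dock(showsResume: true)
    }
}
```

Keeping this mapping outside the view makes it the single place that decides which controls exist, which fits the goal of recording feeling obvious.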

The recordings tab is a library. It supports playback, transcript seeking, importing audio, re-transcription, copying, sharing, searching, and on-device summary/tag generation. I used a native list because the behavior should feel closer to a system file list or the Voice Memos list than to a custom gallery.

The settings tab is intentionally narrow. It contains language, recording format, storage state, iCloud status, privacy information, and developer diagnostics. Settings should explain system boundaries, not become a second product.

Local-first boundary

The privacy model drives many design choices. LiveTranscriber does not use a developer-operated server, third-party analytics, ads, tracking, or custom upload flow. Audio files and transcripts stay in the app-private container by default.

When iCloud is enabled, the app uses an app-private iCloud container for audio and transcript files, and the user’s private CloudKit database for the SwiftData index. This keeps sync optional and user-owned.
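The "optional and user-owned" rule reduces to a small decision: write to the app-private iCloud container only when the user has enabled iCloud and the container actually resolves, and otherwise stay in the local container. The sketch below assumes that on device the container URL would come from `FileManager.default.url(forUbiquityContainerIdentifier:)`, which returns nil when iCloud is unavailable; it is passed in as a parameter so the logic stays testable. `StorageLocation`, `recordingsLocation`, and the `Recordings` folder names are hypothetical.

```swift
import Foundation

// Hypothetical storage decision. `ubiquityContainer` stands in for the result of
// FileManager.default.url(forUbiquityContainerIdentifier:), nil when iCloud is unavailable.
enum StorageLocation: Equatable {
    case local(URL)
    case iCloud(URL)

    var directory: URL {
        switch self {
        case .local(let url), .iCloud(let url):
            return url
        }
    }
}

func recordingsLocation(iCloudEnabled: Bool,
                        ubiquityContainer: URL?,
                        localDocuments: URL) -> StorageLocation {
    // Use iCloud only when the user opted in AND the container resolved.
    if iCloudEnabled, let container = ubiquityContainer {
        return .iCloud(container.appendingPathComponent("Documents/Recordings", isDirectory: true))
    }
    // Default: the app-private local container.
    return .local(localDocuments.appendingPathComponent("Recordings", isDirectory: true))
}
```

Falling back to local storage rather than failing means a missing iCloud account never blocks recording.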

This also affects the UI. The app shows storage state and iCloud upload status, but it does not ask the user to think about folders, servers, or accounts. The default mental model is simple: recordings belong to the app unless the user exports or shares them.

Recording screen

The recording screen is designed to answer four questions at a glance:

  • Is the app recording?
  • How long has it been recording?
  • What language and format are active?
  • What has been transcribed so far?
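Each of those four questions can map to one field of a small header model. This is a hypothetical sketch (`RecordingHeader` and its fields are illustrative); the only real logic is formatting elapsed time as `m:ss`, switching to `h:mm:ss` past an hour.

```swift
import Foundation

// Hypothetical snapshot of what the recording screen shows at a glance.
struct RecordingHeader {
    var isRecording: Bool          // Is the app recording?
    var elapsed: TimeInterval      // How long has it been recording?
    var languageCode: String       // Which language is active? e.g. "en-US"
    var formatName: String         // Which format is active? e.g. "M4A"
    var transcriptLineCount: Int   // How much has been transcribed so far?

    /// Elapsed time as "m:ss", or "h:mm:ss" once past an hour.
    var elapsedText: String {
        func pad(_ n: Int) -> String { n < 10 ? "0\(n)" : "\(n)" }
        let total = max(0, Int(elapsed))
        let hours = total / 3600
        let minutes = (total % 3600) / 60
        let seconds = total % 60
        if hours > 0 {
            return "\(hours):\(pad(minutes)):\(pad(seconds))"
        }
        return "\(minutes):\(pad(seconds))"
    }
}
```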

I avoided a decorative waveform on the main recording screen. A waveform looks active, but it did not add much decision-making value during live transcription. The transcript itself is the useful signal.

The stop flow also matters. Instead of saving immediately when the user taps stop, the app presents a save sheet. This gives the user a brief moment to name the recording, add tags, attach location metadata, or discard the draft.
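One way to model that save sheet is as a draft that becomes a saved recording only when the sheet resolves. The sketch below is hypothetical (`RecordingDraft`, `SaveSheetResult`, `SavedRecording`, and `resolve` are illustrative names), and the fallback name and tag clean-up rules are assumptions rather than documented app behavior.

```swift
import Foundation

// Hypothetical draft created when the user taps stop; nothing is persisted
// until the save sheet resolves.
struct RecordingDraft {
    var temporaryFileURL: URL
    var name: String
    var tags: [String] = []
    var location: (latitude: Double, longitude: Double)? = nil
}

enum SaveSheetResult {
    case save(RecordingDraft)
    case discard
}

struct SavedRecording: Equatable {
    var name: String
    var tags: [String]
    var hasLocation: Bool
}

/// Resolves the save sheet. Returns nil on discard; a real app would also
/// delete `temporaryFileURL` in that case.
func resolve(_ result: SaveSheetResult, defaultName: String) -> SavedRecording? {
    switch result {
    case .discard:
        return nil
    case .save(let draft):
        let trimmed = draft.name.trimmingCharacters(in: .whitespacesAndNewlines)
        // Normalize tags: trim, drop empties, de-duplicate while keeping order.
        var seen = Set<String>()
        let tags = draft.tags
            .map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
            .filter { !$0.isEmpty && seen.insert($0).inserted }
        return SavedRecording(name: trimmed.isEmpty ? defaultName : trimmed,
                              tags: tags,
                              hasLocation: draft.location != nil)
    }
}
```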

Saved recordings

A recording should not become a dead file after it is saved. The detail screen treats the transcript as an interface for the audio. Each transcript line has a timestamp, and tapping a line seeks playback to that moment.
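Treating the transcript as an interface for the audio needs two lookups: from a tapped line to a playback time, and, to keep the list in sync while audio plays, from a playback time back to a line. This is a hypothetical sketch (`TranscriptLine`, `seekTarget`, and `activeLineIndex` are illustrative) that assumes lines are sorted by start time, which allows a binary search.

```swift
import Foundation

// Hypothetical timed transcript line.
struct TranscriptLine {
    let start: TimeInterval   // seconds from the beginning of the audio
    let text: String
}

/// Tapping a line seeks playback to that line's start time.
func seekTarget(for tappedIndex: Int, in lines: [TranscriptLine]) -> TimeInterval? {
    lines.indices.contains(tappedIndex) ? lines[tappedIndex].start : nil
}

/// Index of the last line whose start is at or before `time` (binary search).
/// Assumes `lines` is sorted by `start`; returns nil if no line has started yet.
func activeLineIndex(at time: TimeInterval, in lines: [TranscriptLine]) -> Int? {
    var low = 0
    var high = lines.count - 1
    var result: Int? = nil
    while low <= high {
        let mid = (low + high) / 2
        if lines[mid].start <= time {
            result = mid      // candidate; a later line may also have started
            low = mid + 1
        } else {
            high = mid - 1
        }
    }
    return result
}
```

On device, the seek target could be applied by assigning it to an `AVAudioPlayer`'s `currentTime`; that wiring is omitted here.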

Search also goes beyond filenames. It can match recording names, languages, transcript previews, full transcript text, summaries, and tags. This makes the recordings tab closer to a small personal archive than a simple audio folder.
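A multi-field search like this can be a simple predicate over every searchable field. The sketch below is an assumption about matching rules, not the app's implementation: `RecordingRecord` and `matches` are hypothetical, each whitespace-separated term must appear in at least one field, and comparison is case- and diacritic-insensitive via Foundation's `range(of:options:)`.

```swift
import Foundation

// Hypothetical searchable record mirroring the fields listed above.
struct RecordingRecord {
    var name: String
    var language: String
    var transcriptPreview: String
    var transcript: String
    var summary: String?
    var tags: [String]
}

/// True if every whitespace-separated term in `query` appears in at least one field.
func matches(_ record: RecordingRecord, query: String) -> Bool {
    let terms = query.split(whereSeparator: { $0.isWhitespace }).map(String.init)
    guard !terms.isEmpty else { return true }   // an empty query shows everything
    let fields = [record.name, record.language, record.transcriptPreview,
                  record.transcript, record.summary ?? ""] + record.tags
    return terms.allSatisfy { term in
        fields.contains { field in
            field.range(of: term, options: [.caseInsensitive, .diacriticInsensitive]) != nil
        }
    }
}
```

Allowing terms to match different fields lets a query such as "sync work" find a recording named "Team sync" that carries the tag "work".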

Imported audio follows the same model as live recordings. The user picks an audio file, chooses a transcription language, and the app creates a recording item with progress state. This keeps imported files from feeling like a separate feature.
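Sharing one item type between live and imported recordings can be as simple as recording the source as data and tracking transcription with a single status enum. This is a hypothetical sketch: `RecordingItem`, `RecordingSource`, `TranscriptionStatus`, and `makeImportedItem` are illustrative names, and the progress clamping is an assumed convenience for a progress view.

```swift
// Hypothetical per-item transcription status shared by live and imported recordings.
enum TranscriptionStatus: Equatable {
    case pending
    case transcribing(progress: Double)   // expected in 0.0...1.0
    case completed(lineCount: Int)
    case failed(message: String)
}

enum RecordingSource: Equatable {
    case live
    case imported(originalFilename: String)
}

struct RecordingItem {
    let source: RecordingSource
    let languageCode: String
    var status: TranscriptionStatus

    /// Progress clamped to 0...1 for a progress view; nil after a failure.
    var progressFraction: Double? {
        switch status {
        case .pending: return 0
        case .transcribing(let p): return min(max(p, 0), 1)
        case .completed: return 1
        case .failed: return nil
        }
    }
}

/// An import creates the same kind of item a live recording does, starting as pending.
func makeImportedItem(filename: String, languageCode: String) -> RecordingItem {
    RecordingItem(source: .imported(originalFilename: filename),
                  languageCode: languageCode,
                  status: .pending)
}
```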

Live Activities

Live Activities are useful because recording often continues after the user leaves the app. The Lock Screen and Dynamic Island show elapsed time, recording state, language, line count, and the latest finalized transcript line.
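The "only stable, useful state" rule suggests building the activity payload from finalized transcript segments only, so partial results that are still being revised never flicker on the Lock Screen. In an app this struct would serve as the `ContentState` of an ActivityKit `ActivityAttributes` type (which must be `Codable` and `Hashable`); here it is a plain struct so it runs anywhere. `RecordingActivityState`, `LiveSegment`, `activityState`, and the 80-character cap are hypothetical, and storing a start date (so the system can render a timer) is an assumed design choice; how pauses adjust it is left out of this sketch.

```swift
import Foundation

// Hypothetical Live Activity payload; would be an ActivityAttributes.ContentState in the app.
struct RecordingActivityState: Codable, Hashable {
    var timerStart: Date       // while recording, a timer can be rendered from this date
    var isPaused: Bool
    var languageCode: String
    var lineCount: Int
    var latestFinalText: String
}

struct LiveSegment {
    let text: String
    let isFinal: Bool
}

/// Builds the activity state from finalized segments only.
func activityState(timerStart: Date, isPaused: Bool, languageCode: String,
                   segments: [LiveSegment], maxTextLength: Int = 80) -> RecordingActivityState {
    let finals = segments.filter { $0.isFinal }
    let latest = finals.last?.text ?? ""
    // Keep only the tail of a long line: ActivityKit limits the size of activity data.
    let clipped = latest.count > maxTextLength ? "…" + String(latest.suffix(maxTextLength)) : latest
    return RecordingActivityState(timerStart: timerStart, isPaused: isPaused,
                                  languageCode: languageCode, lineCount: finals.count,
                                  latestFinalText: clipped)
}
```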

Trade-offs

Some omissions are intentional.

I did not expose MP3 as a recording format because the native iOS recording path encodes WAV (linear PCM) and M4A (AAC) directly, while it offers no built-in MP3 encoder. I also did not add speaker diarization because the current Apple Speech pipeline does not provide a stable speaker separation API for this use case.
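One way to make the format omission structural is to let the settings model represent only the formats the native path writes, so a picker built from `allCases` cannot offer MP3. This is a hypothetical sketch: `RecordingFormat`, its display labels, and the choice of M4A as the fallback for an unrecognized stored value are assumptions.

```swift
// Hypothetical format setting: only WAV and M4A are representable.
enum RecordingFormat: String, CaseIterable {
    case wav
    case m4a

    var fileExtension: String { rawValue }

    /// Human-readable label for the settings screen.
    var displayName: String {
        switch self {
        case .wav: return "WAV (uncompressed PCM)"
        case .m4a: return "M4A (AAC)"
        }
    }

    /// Restores a saved preference, falling back to M4A for missing or unknown values.
    init(storedValue: String?) {
        self = storedValue.flatMap(RecordingFormat.init(rawValue:)) ?? .m4a
    }
}
```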

I also avoided making the app depend on cloud transcription. That limits some model choices, but it keeps the product boundary clear: this is a local-first utility, not a remote transcription client.

Chengqi (William) Li

My research interests include 3D perception, computer vision, and deep learning.