The New Era of Zero-Latency Voice: How Sensory is Revolutionizing Tiny STT with LiteRT and NPU Acceleration

15th Apr, 2026

4 min read

About The Author

Todd Mozer

Founder & CEO

A serial entrepreneur with an IPO, an acquisition, 50+ patents, and a lifetime in audio-tech innovation. Todd has deep experience licensing and working with the largest tech companies in the world, including Amazon, Apple, Google, Microsoft, Samsung, and many others.

1. The NPU-First Philosophy: Eliminating "CPU Fallback"
2. Small Models, Big Intelligence: The 2.7MB and 13MB Breakthroughs
3. Built on LiteRT: Universal Portability

For decades, the “Holy Grail” of speech recognition has been the ability to process natural language entirely on-device, without the privacy risks or latency of the cloud. The industry, however, has long been stuck in a compromise: either use large, power-hungry processors for high accuracy, or settle for “Command & Control” triggers that fail the moment a user deviates from a script.

Today, Sensory is breaking that compromise. By marrying our high-accuracy Speech-to-Text (STT) models with LiteRT Micro (formerly TensorFlow Lite Micro) and an “NPU-First” architectural philosophy, we are delivering a new class of “Tiny STT” with model sizes starting at 2.7MB.

1. The NPU-First Philosophy: Eliminating “CPU Fallback”

In the world of embedded AI, the Neural Processing Unit (NPU) is often treated as a “nice-to-have” accelerator. Traditional STT engines frequently “ping-pong” data between the CPU and NPU because the hardware does not support all of their neural network operators. Each handoff adds overhead, which increases latency and drains battery life.

Sensory’s approach is fundamentally different. Our STT models achieve 100% NPU operator mapping. This means:

Zero CPU Fallback: The entire tensor computation graph stays on the silicon designed for it.
Minimal Power Consumption: By keeping the application processor idle during inference, we drastically reduce the energy consumed per inference.
Deterministic Latency: Without the CPU managing complex data handoffs, transcription happens in real time, every time.
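
To make the “ping-pong” cost concrete, here is a minimal sketch that splits a linear operator sequence into contiguous NPU and CPU segments and counts the handoffs between them. The operator names, the NPU_SUPPORTED set, and the helper functions are hypothetical and chosen only for illustration; they do not describe Sensory's models or any specific NPU.

```python
from itertools import groupby

# Hypothetical operator names and NPU support set, for illustration only.
NPU_SUPPORTED = {"CONV_2D", "DEPTHWISE_CONV_2D", "FULLY_CONNECTED", "ADD", "RELU"}

def partition(graph_ops, supported=NPU_SUPPORTED):
    """Group a linear op sequence into contiguous (device, ops) segments."""
    tagged = [("NPU" if op in supported else "CPU", op) for op in graph_ops]
    return [(device, [op for _, op in run])
            for device, run in groupby(tagged, key=lambda t: t[0])]

def handoffs(graph_ops, supported=NPU_SUPPORTED):
    """Number of CPU<->NPU transitions during one inference pass."""
    return max(len(partition(graph_ops, supported)) - 1, 0)

# A graph with two unsupported ops versus one that maps entirely to the NPU.
mixed = ["CONV_2D", "RELU", "CUSTOM_NORM", "FULLY_CONNECTED", "CUSTOM_NORM", "ADD"]
fully_mapped = ["CONV_2D", "RELU", "FULLY_CONNECTED", "ADD"]

print(handoffs(mixed))         # 4
print(handoffs(fully_mapped))  # 0
```

In this toy example, two unsupported operators are enough to split the graph into five segments and force four transfers per inference, while the fully mapped graph runs as a single NPU segment.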

2. Small Models, Big Intelligence: The 2.7MB and 13MB Breakthroughs

We’ve optimized our STT engine into two distinct, high-performance profiles that redefine what “small” means for speech recognition:

The 2.7MB Domain-Specific Model: Perfect for “Command & Control” in automotive or industrial settings. It uses Domain Adaptation to maintain high accuracy in noisy environments while needing just 787.11 KiB of Peak SRAM.
The 13MB General-Purpose Model: A “Natural Language” powerhouse. It handles large vocabularies and diverse accents out-of-the-box, yet its working memory fits within standard 2MB SRAM/TCM limits, with a Peak SRAM of only 1.68 MB.
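
As a rough check of the figures above, the sketch below compares each model's Peak SRAM with a 2MB budget. It assumes that “MB” here means MiB (2**20 bytes), which the text does not state; the dictionary keys and the headroom helper are names invented for this example.

```python
KIB = 1024
MIB = 1024 * KIB

# Figures quoted in the text. Treating "MB" as MiB (2**20 bytes) is an assumption.
BUDGET_BYTES = 2 * MIB
peak_sram = {
    "domain_specific_2.7MB": 787.11 * KIB,
    "general_purpose_13MB": 1.68 * MIB,
}

def headroom(peak_bytes, budget_bytes=BUDGET_BYTES):
    """Return (fraction of budget used, bytes left over)."""
    return peak_bytes / budget_bytes, budget_bytes - peak_bytes

for name, peak in peak_sram.items():
    used, free = headroom(peak)
    print(f"{name}: {used:.1%} of budget used, {free / KIB:.0f} KiB free")
```

Under that assumption, the domain-specific model uses about 38% of the budget and the general-purpose model about 84%, leaving roughly 328 KiB for everything else that must live in the same memory.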

3. Built on LiteRT: Universal Portability

By adopting LiteRT as our core runtime layer, Sensory provides developers with a standardized, future-proof integration path. This allows our STT technology to deploy across the world’s most popular embedded platforms:

Silicon Partner / Category	Supported Platforms
Arm®	Cortex-M Series (M4, M7, M55) and Ethos™-U NPUs
Cadence®	Tensilica® HiFi 4, HiFi 5, and HiFi iQ DSPs
Espressif	ESP32 Series
NXP	i.MX RT Crossover MCUs
Development Boards	Arduino Nano 33 BLE Sense, Sony Spresense

Technical FAQ: Understanding On-Device STT & LiteRT Optimization

Q: What is LiteRT, and why is it used for Speech-to-Text?

A: LiteRT is the successor to TensorFlow Lite, and LiteRT Micro (also called LiteRT for Microcontrollers) is the successor to TensorFlow Lite Micro. It is a high-performance runtime designed specifically for executing machine learning models on microcontrollers and other resource-constrained devices. By using it, Sensory ensures that our STT models can run on devices with minimal memory, without needing an OS or dynamic memory allocation.
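
The “no dynamic memory allocation” point can be pictured as a single buffer that is reserved once and then handed out in slices. The sketch below illustrates that idea with a simple bump allocator; the TensorArena class, its 16-byte alignment, and the buffer sizes are assumptions made for this example, and it is not the runtime's actual C++ API.

```python
class TensorArena:
    """Fixed-size buffer with bump allocation; nothing is ever freed or resized."""

    def __init__(self, size_bytes, alignment=16):
        self._buffer = bytearray(size_bytes)  # reserved once, up front
        self._alignment = alignment
        self._offset = 0

    def allocate(self, nbytes):
        """Reserve nbytes and return a view into the arena, or raise if it does not fit."""
        # Round the current offset up to the next multiple of the alignment.
        start = -(-self._offset // self._alignment) * self._alignment
        end = start + nbytes
        if end > len(self._buffer):
            raise MemoryError(f"arena too small: need {end}, have {len(self._buffer)}")
        self._offset = end
        return memoryview(self._buffer)[start:end]

    @property
    def used_bytes(self):
        return self._offset

arena = TensorArena(size_bytes=1024)
input_tensor = arena.allocate(100)   # occupies bytes 0..99
hidden_state = arena.allocate(200)   # starts at byte 112, the next multiple of 16
print(arena.used_bytes)              # 312
```

Because the arena size is fixed up front, the worst-case memory use is known before the device ships, which is what makes a figure like Peak SRAM meaningful.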

Q: How does Sensory achieve 100% NPU operator mapping?

A: Most neural networks use a variety of mathematical operations (kernels). If a hardware NPU doesn’t support a specific operation, the system “falls back” to the CPU to finish the calculation. Sensory meticulously designs its STT architectures to use only the specific operators supported by edge NPUs like the Arm Ethos-U, ensuring the CPU never has to intervene during active transcription.
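
One way to picture this design constraint is as a check that runs before a model architecture is accepted: list every operator the model uses and flag any that the target NPU cannot execute. The sketch below is illustrative only. The operator names and the NPU_OPERATORS set are hypothetical, and a real list would have to come from the NPU vendor's documentation.

```python
# Hypothetical operator names; a real list would come from the NPU vendor's documentation.
NPU_OPERATORS = frozenset(
    {"CONV_2D", "DEPTHWISE_CONV_2D", "FULLY_CONNECTED", "ADD", "MUL", "SOFTMAX"}
)

def unsupported_operators(model_ops, npu_ops=NPU_OPERATORS):
    """Return the sorted operators that would force a CPU fallback."""
    return sorted(set(model_ops) - npu_ops)

def npu_coverage(model_ops, npu_ops=NPU_OPERATORS):
    """Fraction of operator instances in the model that the NPU can execute."""
    if not model_ops:
        return 1.0
    return sum(op in npu_ops for op in model_ops) / len(model_ops)

candidate = ["CONV_2D", "MUL", "LAYER_NORM", "FULLY_CONNECTED", "SOFTMAX"]
print(unsupported_operators(candidate))  # ['LAYER_NORM']
print(npu_coverage(candidate))           # 0.8

# Replacing the unsupported operator with supported ones brings coverage to 100%.
redesigned = ["CONV_2D", "MUL", "ADD", "FULLY_CONNECTED", "SOFTMAX"]
print(npu_coverage(redesigned))          # 1.0
```

In this picture, “100% NPU operator mapping” corresponds to an empty unsupported list and a coverage of 1.0.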

Q: Can these models run on standard Arduino or ESP32 boards?

A: Yes. Because the engine is compatible with LiteRT for Microcontrollers, it can be deployed as a standard C++ library. It has been tested on popular platforms, including the Arduino Nano 33 BLE Sense and Espressif ESP32.

Q: What is the benefit of “Domain Adaptation” in the 2.7MB model?

A: Domain adaptation allows a small model to achieve the accuracy of a much larger one by focusing its “intelligence” on a specific set of vocabulary or environmental conditions (like car cabin usage and noise). This makes it possible to have highly reliable voice control on hardware that traditionally could only handle simple keyword spotting.

Q: How does on-device STT improve user privacy?

A: Because the model runs entirely on the local hardware, no voice data or audio recordings are ever transmitted to a cloud server. This eliminates the risk of interception in transit and ensures that the device can operate in “comms-denied” environments without losing functionality.
