
Sensory Announces the World’s Smallest and Most Powerful On-Device Speech-to-Text Engine

15th Apr, 2026
By Todd Mozer

Now Supporting TensorFlow Lite Micro and NPU Architectures

SANTA CLARA, CA – April 15, 2026 – Sensory Inc., a pioneer in on-device AI for over 30 years, today announced a breakthrough in embedded speech recognition with the launch of its latest Speech-to-Text (STT) engine. Optimized for TensorFlow Lite Micro (TFLM) and advanced Neural Processing Units (NPUs), including the Arm® Ethos™-U55, the new engine delivers unparalleled accuracy and performance in an ultra-compact footprint.

Global Reach: 37 Languages on the Edge

Breaking down the barriers of localized AI, Sensory’s STT engine supports 37 languages, enabling manufacturers to deploy truly global products with a single, ultra-efficient architecture. Supported languages include:

Afrikaans, Arabic, Belarusian, Bengali, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Farsi, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Malay, Mandarin, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swahili, Swedish, Thai, Turkish, Ukrainian, and Vietnamese.

Maximum Power, Minimum Footprint

Sensory’s STT engine is built to run entirely on specialized Neural Processing Units (NPUs), eliminating the performance-draining data transfers that occur when a model is split between a CPU and an NPU. Because the entire tensor computation graph is offloaded to the hardware accelerator, power consumption and latency can drop significantly. With the CPU left idle during inference, device manufacturers can extend battery life in portable devices or reserve processing cycles for complex system tasks and UI management.
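The cost of splitting a model across processors can be illustrated with a toy count of CPU/NPU handoffs. This is a hypothetical sketch, not Sensory’s implementation: it assumes the model is a linear sequence of operators, each tagged with the unit that runs it, and counts every change of unit between consecutive operators as one intermediate-tensor transfer. The `count_handoffs` function and the example placements are illustrative assumptions, and transfers of the input features in and the transcript out are not counted.

```python
from typing import Sequence


def count_handoffs(placement: Sequence[str]) -> int:
    """Count CPU<->NPU boundaries in a linear operator sequence (toy model).

    Each time consecutive operators run on different units, the tensor
    between them must be handed from one processor to the other.
    """
    return sum(1 for prev, cur in zip(placement, placement[1:]) if prev != cur)


# Hypothetical 6-operator graph where two unsupported ops fall back to the CPU.
partial = ["NPU", "NPU", "CPU", "NPU", "CPU", "NPU"]
# Fully mapped graph: every operator runs on the NPU.
full = ["NPU"] * 6

print(count_handoffs(partial))  # 4
print(count_handoffs(full))     # 0
```

In this simplified model, a graph with 100% NPU mapping has no internal handoffs at all, which is the property the paragraph above describes.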

The engine is available in two optimized configurations:

  • 2.7 MB Domain-Specific Model: Optimized for large-vocabulary “Command & Control” tasks, this model uses domain adaptation to maintain high accuracy in specific environments, such as automotive cabins. It has a peak SRAM usage of 787.11 KiB and requires 892.9 million MACs per inference.
  • 13 MB General-Purpose Model: A versatile model designed to handle natural language and large vocabularies without per-domain tuning. With a peak SRAM footprint of 1.68 MB, it fits within a standard 2 MB SRAM limit and requires 4.37 billion MACs per inference.
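The SRAM figures above mix units: the domain-specific peak is quoted in KiB, while the general-purpose peak and the 2 MB limit are quoted in MB without saying whether that means 10^6 or 2^20 bytes. The quick budget check below assumes a hypothetical part with exactly 2 MB of SRAM available to the model; the `fits` helper and the constants are illustrative. Under either reading of “MB”, the 1.68 MB peak fits, with roughly 320,000 to 335,500 bytes of headroom.

```python
KIB = 1024
MIB = 1024 * 1024


def fits(peak_bytes: float, budget_bytes: float) -> tuple[bool, float]:
    """Return (fits, headroom_bytes) for a peak SRAM footprint against a budget."""
    headroom = budget_bytes - peak_bytes
    return headroom >= 0, headroom


# Figures from the release; the budget assumes a hypothetical 2 MB SRAM part.
domain_peak = 787.11 * KIB          # quoted explicitly in KiB
general_peak_binary = 1.68 * MIB    # "1.68 MB" read as MiB
general_peak_decimal = 1.68e6       # "1.68 MB" read as 10^6 bytes
budget_binary = 2 * MIB
budget_decimal = 2e6

for label, peak, budget in [
    ("domain-specific", domain_peak, budget_binary),
    ("general-purpose (binary MB)", general_peak_binary, budget_binary),
    ("general-purpose (decimal MB)", general_peak_decimal, budget_decimal),
]:
    ok, headroom = fits(peak, budget)
    print(f"{label}: fits={ok}, headroom={headroom:,.0f} bytes")
```

Any application buffers, audio front-end state, or RTOS stacks sharing the same SRAM would come out of that headroom.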


Technical Specifications & Performance

Feature | Domain-Specific Model | General-Purpose Model
Ideal Use Case | Targeted commands, large vocabulary | Natural language, unlimited vocabulary
Model Size | 2.7 MB | 13 MB
Peak SRAM Usage | 787.11 KiB | 1.68 MB
Compute Requirements | 892.9 million MACs/inference | 4.37 billion MACs/inference
Acceleration | 100% NPU mapping | 100% NPU mapping
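MAC counts translate into a rough lower bound on compute time once an NPU configuration is chosen. The Arm Ethos-U55 is configurable from 32 to 256 MACs per cycle; the 400 MHz clock used here is a hypothetical value, and `min_compute_ms` is an illustrative helper, not part of any SDK. The estimate assumes every cycle performs useful MAC work, so measured latency will be higher. Because the release does not state how much audio one inference covers, the result should not be read as a real-time factor.

```python
def min_compute_ms(macs: float, macs_per_cycle: int, clock_hz: float,
                   utilization: float = 1.0) -> float:
    """Lower-bound compute time for one inference, in milliseconds.

    Ignores memory stalls and any work outside the MAC array, so real
    latency on hardware will be higher than this figure.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return macs / (macs_per_cycle * clock_hz * utilization) * 1e3


CLOCK_HZ = 400e6  # hypothetical NPU clock; check the target SoC datasheet

for name, macs in [("domain-specific", 892.9e6), ("general-purpose", 4.37e9)]:
    for mpc in (128, 256):  # two of the Ethos-U55 MAC configurations
        ms = min_compute_ms(macs, mpc, CLOCK_HZ)
        print(f"{name}: {mpc} MACs/cycle -> {ms:.1f} ms")
```

Lowering `utilization` is a simple way to model stalls, for example 0.5 doubles every estimate.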


Universal Compatibility: LiteRT Micro & High-Performance Silicon

Sensory’s STT engine is engineered for rapid portability across a broad ecosystem. Using LiteRT Micro (formerly TensorFlow Lite Micro) as its runtime layer, Sensory provides seamless integration with:

  • Arm® Ethos™ NPU Family: Native support for Ethos-U55, U65, and U85.
  • Cadence® Tensilica® HiFi DSPs: Full compatibility with the HiFi 4, HiFi 5, and HiFi iQ series.
  • Edge Platforms: Optimized for Arm Cortex-M (M4, M7, M55) and popular boards like Arduino Nano 33 BLE Sense, ESP32, and Sony Spresense.

Privacy and Performance

“Our STT engine demonstrates that natural language interfaces can be powerful without relying on the cloud,” said Todd Mozer, Chairman and CEO of Sensory. By processing 100% of voice data on-device, Sensory helps developers protect user privacy, reduce latency, and maintain reliable operation in environments with limited or no connectivity.

Why Embedded STT Matters

“Our latest STT engine proves that you don’t need a cloud connection or even a big embedded model for powerful, natural language interfaces,” Mozer added. Sensory’s on-device approach offers several critical advantages over cloud solutions:

  • Privacy & Security: Voice data never leaves the device, ensuring total user privacy.
  • Low Latency: Results arrive without a network round trip and do not depend on internet connectivity or bandwidth.
  • Lower Power/Lower Heat: Model efficiency and NPU offload substantially reduce power consumption.
  • Cost Efficiency: Eliminates ongoing cloud processing fees and reduces data transmission costs.
  • Reliability: Guaranteed performance even in “comms-denied” environments or areas with poor cellular service.

About Sensory Inc.

Sensory Inc. creates a safer and superior on-device user experience through vision and voice technologies. Sensory’s technologies are widely deployed in consumer electronics applications, including mobile phones, automotive, wearables, and smart home devices.

Media Contact: Amanda Defelice, Head of Marketing, Sensory Inc.
