Sensory SoundID | On-Device Sound Detection & Classification AI

What Is Sensory SoundID?

Sensory SoundID detects, classifies, and interprets environmental sounds, from alarms and baby cries to mechanical alerts and sirens, with precise on-device accuracy. It runs on a wide range of embedded hardware and is trained on diverse real-world datasets to maintain reliability in noise-heavy environments. SoundID enables smarter products that can “hear” what’s happening around them and respond in real time, all without sending audio to the cloud. It powers solutions across automotive emergency vehicle detection, smart home safety, accessibility tools, industrial IoT, and more.

Core Capabilities

Designed for real-world sound detection across noisy, unpredictable environments.

Low Power, High Accuracy

Multi-stage detection with deep revalidation. Combines fast, low-power first-stage detection with an optional neural second stage for pinpoint accuracy without heavy compute cost.

Noise-Robust Performance

Built for real-world audio. Uses Sensory’s proprietary noise mitigation pipeline to perform reliably in cars, homes, factories, or public spaces.

Extensive Sound Library

16 ready-to-use sounds. Includes alarms, sirens, knocking, glass break, baby cry, coughing, snoring, doorbells, and more, organized into simple sound packs.

User-Enrolled Sounds

Add custom sound events. Users can enroll new sounds with just ~8 seconds of target audio, ideal for unique or device-specific triggers.
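
As a rough illustration of the enrollment guideline, the sketch below checks whether a set of recorded clips meets it. It assumes the threshold of about 8 seconds stated above and the alternative of 4 or more repetitions given in this page's FAQ, and it treats each clip as one repetition. The function and constant names are hypothetical and are not part of any Sensory SDK.

```python
from typing import Sequence

# Thresholds taken from the page's guidance; the names are illustrative only.
MIN_TOTAL_SECONDS = 8.0
MIN_REPETITIONS = 4


def enough_for_enrollment(clip_durations_s: Sequence[float]) -> bool:
    """Return True if the clips meet either enrollment guideline.

    Assumption: each recorded clip counts as one repetition of the sound.
    """
    total_seconds = sum(clip_durations_s)
    repetitions = len(clip_durations_s)
    return total_seconds >= MIN_TOTAL_SECONDS or repetitions >= MIN_REPETITIONS


if __name__ == "__main__":
    print(enough_for_enrollment([2.5, 3.0, 3.0]))  # 8.5 s total, 3 clips
    print(enough_for_enrollment([1.0, 1.0]))       # 2.0 s total, 2 clips
```

A real product would likely also check audio quality and consistency between repetitions; this sketch covers only the duration and count guidance.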

Multi-Platform Flexibility

Runs almost anywhere. Supports Linux, Android, Windows, embedded ARM cores, and low-power chips, with no cloud dependency.

The Sensory SoundID Verification Flow

Designed to detect key sound events instantly while running fully on-device.

Step-by-Step Process

Step 1: Continuous Listening
The device monitors audio input locally without storing or sending it anywhere.
Step 2: First-Stage Detection
A lightweight model detects potential sound events quickly and efficiently.
Step 3: Neural Revalidation (Optional)
A deeper neural layer confirms the event for improved accuracy and reduced false alarms.
Step 4: Classification
The sound is classified into one of the built-in categories or matched against enrolled custom sounds.
Step 5: Trigger & Response
The device takes action: alerting the user, adjusting its behavior, or communicating with other systems.
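
The five steps above can be sketched as a simple processing loop. This is a minimal illustration under stated assumptions, not Sensory's implementation: the first stage is stood in for by a plain energy threshold, and `classify`, `revalidate`, and `on_event` are hypothetical callbacks supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Sequence

Frame = Sequence[float]  # one block of audio samples (hypothetical format)


@dataclass
class Detection:
    label: str
    score: float


def energy(frame: Frame) -> float:
    """Mean squared amplitude; a stand-in for a lightweight first-stage score."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def run_pipeline(
    frames: Iterable[Frame],
    first_stage_threshold: float,
    classify: Callable[[Frame], Optional[Detection]],
    revalidate: Optional[Callable[[Frame], bool]] = None,
    on_event: Callable[[Detection], None] = print,
) -> List[Detection]:
    """Process frames locally and return the detections that were triggered."""
    events: List[Detection] = []
    for frame in frames:                             # Step 1: continuous listening
        if energy(frame) < first_stage_threshold:    # Step 2: first-stage detection
            continue
        if revalidate is not None and not revalidate(frame):
            continue                                 # Step 3: optional revalidation
        detection = classify(frame)                  # Step 4: classification
        if detection is not None:
            on_event(detection)                      # Step 5: trigger and response
            events.append(detection)
    return events
```

The cheap first-stage check runs on every frame, while the costlier revalidation and classification run only on frames that pass it, which is the general idea behind a multi-stage design that keeps power use low.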

Good to Know

Because everything runs locally, SoundID remains fast, private, and reliable, even offline or with poor connectivity.

[Figure: Sound ID Accuracy – Sensory Small & Medium]

Trusted by Global Innovators

See how leading brands use Sensory’s on-device AI to deliver faster, safer, and more intuitive user experiences at scale.

Andrew Doyle

VP for Frontline Workers

“Sensory’s technology has exceeded our high standards for accuracy, speed, and efficiency. By enabling hands-free control of key functions through voice commands, we’re boosting productivity and streamlining workflows for retail staff. This allows our frontline workers to focus on what matters most – delivering exceptional customer service.”

Ephrem Chemaly

General Manager & VP of the Automotive Business Unit

“By combining MediaTek’s expertise in generative AI technology with Sensory’s strengths in on-device voice AI, the collective efforts of our companies enable significant strides in providing next-level entertainment and security in vehicles powered by MediaTek Dimensity Auto.”

Cynthia Lee

Lead Product Manager

“Zoom is passionate about making collaboration easier, but we always put our customer’s privacy and security front and center. Sensory’s technology checked all the boxes for us: accurate, fast and private…”

Michael Anderson

CEO

“Sensory’s TrulyHandsfree technology is a key component in making the Piqo not just compact and powerful, but also incredibly user-friendly. This partnership enhances our ability to provide unmatched value and safety to our customers.”

Sascha Prueter

Chief Product Officer

“The smartest TV ever deserves the smartest approach to privacy. With Telly’s use of Sensory’s on-device speech-to-text and voice technologies, we are able to bring extremely fast, low-latency voice commands to the living room.”

Works Across Industries

Trained for real-world environments across vehicles, homes, hospitals, manufacturing floors and more.

Automotive & Transportation

Hands-free control and sound awareness for safer, smarter drives.

Consumer Electronics & Smart Home

Voice and sound AI that powers smarter homes, stores, and screens.

Mobile, PCs & Personal Devices

Fast, private, and reliable voice AI on the go.

Healthcare & Medical Devices

Accessible, hands-free AI built for precision and privacy.

Retail & POS Systems

Embedded voice AI for faster, touch-free, and more personal in-store experiences.

Frequently Asked Questions

Everything You Need to Know

What sounds does SoundID support?

It supports 16 pre-trained sounds, including alarms, sirens, baby cry, coughing, glass break, door knocks, and more. Custom sounds can also be enrolled.

Does SoundID send audio to the cloud?

No. All detection and classification happen fully on-device, keeping audio private.

Does SoundID work in noisy environments?

Yes. SoundID uses Sensory’s noise-robust signal processing and performs well in cars, factories, busy homes, and outdoor environments.

How long does it take to enroll a custom sound?

About 8 seconds of the target sound, or 4 or more repetitions.

Which platforms does SoundID support?

Linux, Android, Windows, embedded ARM, and low-power chips. It can run on both OS-based systems and bare-metal environments.
