TrulyNatural

Webinar Recap: Advanced Siren Detection – A Technical Deep Dive

2nd Mar, 2025

5 min read

About The Author

Todd Mozer

Founder & CEO

A serial entrepreneur with an IPO, an acquisition, 50+ patents, and a lifetime in audio-tech innovation. Todd has deep experience licensing and working with the largest tech companies in the world, including Amazon, Apple, Google, Microsoft, Samsung, and many others.

Sensory recently hosted a webinar that dove deep into the technology behind advanced siren detection systems and Sensory’s Emergency Vehicle Detection technology. Led by Sensory’s Andi Hagen, Director of Machine Learning, and Jeff Rogers, VP of Sales and Marketing, the webinar offered a technical, yet accessible, look at this technology.

Missed the live webinar? No problem! We’re recapping the key takeaways below. You can also watch the full webinar or download the slides here!

Deep Dive into EVD Systems

Andi kicked things off by highlighting the two pillars of an effective EVD system: technology and data. He stressed the importance of a diverse dataset to account for variations in siren sounds across different regions, environments, and distances. Sensory’s EVD solution is built on a foundation of extensive self-collected and web-scraped data.

Andi then gave an overview of Sensory’s unique two-tiered system: an optional DSP solution and an AP-level solution. The DSP component acts as a pre-filter, significantly reducing the audio workload on the head unit by filtering out approximately 99% of audio when no siren is detected.

The AP-level solution then takes a closer look, employing a two-stage process: a statistical first stage for quickly analyzing audio, followed by a deep net revalidation model for final verification of a siren event. Andi explained that the key to evaluating EVD accuracy lies in two critical metrics:

False Reject Rate (FRR): The rate at which the system misses genuine siren events.
False Alarm Rate (FAR): The rate at which the system incorrectly identifies a siren when there isn’t one.
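
As a minimal sketch, the two metrics can be computed from labeled evaluation results roughly as follows. The function name, its inputs, and the choice to normalize FAR per hour of siren-free audio are assumptions made for illustration; the webinar did not describe Sensory's evaluation tooling, and the numbers in the example are made up.

```python
def evd_metrics(siren_detected, false_alarm_count, non_siren_hours):
    """Return (FRR as a fraction, FAR in false alarms per hour).

    siren_detected: one bool per genuine siren event (True = detected).
    false_alarm_count: detections fired on audio that contains no siren.
    non_siren_hours: hours of siren-free audio in the test set.
    """
    if not siren_detected or non_siren_hours <= 0:
        raise ValueError("need at least one siren event and a positive duration")
    misses = sum(1 for hit in siren_detected if not hit)
    frr = misses / len(siren_detected)
    far_per_hour = false_alarm_count / non_siren_hours
    return frr, far_per_hour


# Hypothetical test set: 200 siren events with 5 missed,
# and 3 false alarms in 36 hours of siren-free audio.
hits = [True] * 195 + [False] * 5
frr, far_per_hour = evd_metrics(hits, false_alarm_count=3, non_siren_hours=36.0)
print(f"FRR = {frr:.1%}, FAR = {far_per_hour * 24:.1f} per 24 h")
```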

The goal is to find the sweet spot on the ROC curve, balancing these two metrics. Sensory is targeting an impressive one false alarm in 24 hours of driving time – which, as Andi pointed out, translates to potentially weeks of real-world driving without a single false alert.

Andi then presented performance data for Sensory’s models, comparing them to open-source alternatives like YAMNet and PaSST. He emphasized that Sensory’s advantage comes from its laser focus on siren detection and the sheer volume of its self-collected dataset. He also reminded the audience that real-world implementation goes beyond raw accuracy, highlighting the importance of low latency, precise timing, and seamless deployment across various automotive chipsets.

The Sensory Advantage

So, what makes Sensory’s EVD models stand out?

Speed: Reacts to sirens within a few hundred milliseconds.
Optimized for footprint & latency: Our tech maintains a sliding acoustic history of 1.5 seconds, ensuring quick processing without sacrificing accuracy.
Platform versatility: Seamlessly integrates with a wide range of DSP and AP-level platforms, supporting various programming languages for easy development.
Accuracy: Our EVD models achieve an incredibly low FRR of just 1.6% on a 1.5-second window, ensuring that real siren events are detected with high reliability.
Real-world robustness: Comprehensive noise training across a vast library of sounds (road noise, engine hum, chatting passengers, and even music) ensures consistent performance across driving environments.
Cost-effective adaptability: Works seamlessly with in-cabin microphones, leveraging existing hardware to reduce cost and simplify integration.
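
To make the "sliding acoustic history" idea concrete, here is a minimal sketch of a fixed-length audio buffer that always holds the most recent 1.5 seconds. The class name, the 16 kHz sample rate, and the 10 ms frame size are assumptions for illustration; the webinar only stated the 1.5-second history length.

```python
from collections import deque


class SlidingAudioHistory:
    """Keeps only the most recent `history_seconds` of audio samples."""

    def __init__(self, sample_rate_hz=16000, history_seconds=1.5):
        self.capacity = int(sample_rate_hz * history_seconds)
        self._samples = deque(maxlen=self.capacity)

    def push(self, frame):
        # Once the deque is full, extending it drops the oldest samples.
        self._samples.extend(frame)

    def is_full(self):
        return len(self._samples) == self.capacity

    def window(self):
        return list(self._samples)


history = SlidingAudioHistory()
for i in range(200):  # 200 frames of 10 ms = 2 seconds of audio
    frame = [float(i)] * 160  # placeholder samples tagged with the frame index
    history.push(frame)
    if history.is_full():
        current = history.window()  # a detector would score this window here

print(len(current), current[0], current[-1])
```

Because memory is bounded by the window length, the footprint stays constant no matter how long the system runs.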

Q&A

The Q&A session offered valuable insights into key considerations for EVD implementation. Here are some of the highlights:

Q: What is the biggest advantage of having multiple stages in your EVD system?

A: The primary advantage lies in efficiency. The DSP filters out most of the sound, reducing the workload for the AP-level solution. At the AP level, a statistical first stage further reduces the workload for deep net revalidation, which is more computationally costly. Without these stages, the neural network would have to be constantly engaged, leading to higher costs. The statistical first stage filters out obvious non-siren sounds at a cheap cost, allowing us to focus on the tougher cases.
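
The efficiency argument can be sketched as a short-circuiting cascade that counts how often each stage actually runs. Everything here is a toy stand-in: the stage functions are simple thresholds on a made-up "siren-likeness" number, whereas the real stages are a DSP pre-filter, a statistical model, and a deep net, none of which were specified in detail.

```python
def run_cascade(windows, dsp_gate, statistical_stage, deep_net):
    """Run each audio window through the stages, stopping at the first rejection."""
    calls = {"dsp": 0, "statistical": 0, "deep_net": 0}
    detections = []
    for index, window in enumerate(windows):
        calls["dsp"] += 1
        if not dsp_gate(window):
            continue  # most audio stops here, so later stages stay idle
        calls["statistical"] += 1
        if not statistical_stage(window):
            continue  # obvious non-sirens are rejected cheaply
        calls["deep_net"] += 1
        if deep_net(window):
            detections.append(index)
    return detections, calls


# Toy data: 1,000 windows, each reduced to one hypothetical score in [0, 1].
windows = [0.1] * 990 + [0.6] * 7 + [0.8] * 2 + [0.95]
detections, calls = run_cascade(
    windows,
    dsp_gate=lambda w: w >= 0.5,           # stand-in for the DSP pre-filter
    statistical_stage=lambda w: w >= 0.7,  # stand-in for the statistical first stage
    deep_net=lambda w: w >= 0.9,           # stand-in for deep net revalidation
)
print(detections, calls)
```

In this toy run the first gate passes 1% of windows, loosely mirroring the roughly 99% filtering figure mentioned for the DSP stage, and the most expensive stage runs on only 3 of 1,000 windows.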

Q: Have you tested the EVD system at negative SNR (Signal-to-Noise Ratio)?

A: Yes, we have tested at negative SNR levels. The false reject rate degrades slightly as the noise becomes stronger. However, our models are trained to handle these scenarios, including cases where the noise is louder than the siren.
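
For readers unfamiliar with negative SNR, the sketch below shows a common way to build a test mixture at a chosen SNR by scaling the noise relative to the siren. It is a generic illustration, not Sensory's test procedure; the function names are hypothetical, and the 1 kHz tone and uniform random noise are simple stand-ins for real siren and cabin recordings.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(siren, noise, snr_db):
    """Scale `noise` so the siren-to-noise ratio equals `snr_db`, then add them.

    Returns (mixture, scaled_noise). A negative snr_db makes the noise louder
    than the siren.
    """
    if len(siren) != len(noise):
        raise ValueError("siren and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise is silent and cannot be scaled")
    # SNR_dB = 20 * log10(rms_siren / rms_noise), solved for the noise level.
    target_noise_rms = rms(siren) / (10 ** (snr_db / 20.0))
    gain = target_noise_rms / noise_rms
    scaled_noise = [n * gain for n in noise]
    mixture = [s + n for s, n in zip(siren, scaled_noise)]
    return mixture, scaled_noise


sample_rate = 16000  # assumed sample rate
rng = random.Random(0)
siren = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(sample_rate)]

mixture, scaled_noise = mix_at_snr(siren, noise, snr_db=-6.0)
measured_snr_db = 20 * math.log10(rms(siren) / rms(scaled_noise))
print(f"measured SNR: {measured_snr_db:.2f} dB")
```

At -6 dB the noise RMS is about twice the siren RMS, which is the kind of condition where the siren is partly masked.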

Q: Is the embedded DSP stage always necessary?

A: No, it’s not always necessary. The embedded stage is beneficial if you are sensitive to power consumption on the head unit. It reduces MIPS on the head unit by filtering out most non-siren sounds. However, if MIPS isn’t a concern, you can run without the DSP.

Q: What is the nature of the noise that you train the EVD system under?

A: We train under a wide variety of noises, including typical road noise, engine noise, human speech, and music. For music, we account for situations where music is playing from a smartphone or other sources.

Q: What is the goal in terms of accuracy that you are working towards?

A: Our goal is human-like performance. The system should detect a siren whenever a person with excellent hearing can hear it.

Q: What are the advantages of in-cabin microphones?

A: The primary advantage is cost. Modern cars already have internal microphones that can be used for siren detection without additional hardware.

Get Started with Sensory EVD

Sensory’s EVD technology is actively being designed in by OEMs today. Want to learn how Sensory’s EVD can give your vehicles a crucial safety advantage? Download our one-page overview, or get in touch with an expert today to discuss your specific needs and requirements.
