Sensory recently hosted a webinar that dove deep into the technology behind advanced siren detection systems and Sensory’s Emergency Vehicle Detection technology. Led by Sensory’s Andi Hagen, Director of Machine Learning, and Jeff Rogers, VP of Sales and Marketing, the webinar offered a technical, yet accessible, look at this technology.
Missed the live webinar? No problem! We’re recapping the key takeaways below. You can also watch the full webinar or download the slides here!
Andi kicked things off by highlighting the two pillars of an effective EVD system: technology and data. He stressed the importance of a diverse dataset to account for variations in siren sounds across different regions, environments, and distances. Sensory’s EVD solution is built on a foundation of extensive self-collected and web-scraped data.
Andi then gave an overview of Sensory’s unique two-tiered system: an optional DSP solution and an AP-level solution. The DSP component acts as a pre-filter, significantly reducing the audio workload on the head unit by filtering out approximately 99% of audio when no siren is detected.
The AP-level solution then takes a closer look, employing a two-stage process: a statistical first stage for quickly analyzing audio, followed by a deep net revalidation model for final verification of a siren event. Andi explained that the key to evaluating EVD accuracy lies in two critical metrics: the false reject rate (sirens that go undetected) and the false alarm rate (alerts raised when no siren is present).
The goal is to find the sweet spot on the ROC curve, balancing these two metrics. Sensory is targeting an impressive one false alarm in 24 hours of driving time – which, as Andi pointed out, translates to potentially weeks of real-world driving without a single false alert.
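To make the trade-off concrete, here is a minimal Python sketch of choosing an operating point under a false alarm budget. It is not Sensory's evaluation code: the function names, the scores, and the 48 hours of background audio are all made up for illustration, and the one-hour-per-day driving figure in the last line is an assumption.

```python
from dataclasses import dataclass


@dataclass
class OperatingPoint:
    threshold: float
    false_rejects: int           # siren clips that scored below the threshold
    false_alarms_per_24h: float  # background triggers, normalized to 24 hours


def sweep(siren_scores, background_scores, background_hours, thresholds):
    """Score every candidate threshold; a clip triggers when score >= threshold."""
    points = []
    for t in thresholds:
        misses = sum(1 for s in siren_scores if s < t)
        alarms = sum(1 for s in background_scores if s >= t)
        points.append(OperatingPoint(t, misses, alarms * 24.0 / background_hours))
    return points


def pick_operating_point(points, max_false_alarms_per_24h=1.0):
    """Fewest missed sirens among thresholds that stay within the false alarm budget."""
    within_budget = [p for p in points
                     if p.false_alarms_per_24h <= max_false_alarms_per_24h]
    if not within_budget:
        return None
    return min(within_budget,
               key=lambda p: (p.false_rejects, p.false_alarms_per_24h))


def calendar_days_per_false_alarm(false_alarms_per_24h, driving_hours_per_day):
    """Convert a rate per 24 driving hours into calendar days between false alarms."""
    return 24.0 / false_alarms_per_24h / driving_hours_per_day


# Made-up detector scores, for illustration only.
siren_scores = [0.95, 0.90, 0.80, 0.60, 0.40]
background_scores = [0.10, 0.20, 0.30, 0.55, 0.70, 0.85]
points = sweep(siren_scores, background_scores, background_hours=48.0,
               thresholds=[0.5, 0.75, 0.9])
best = pick_operating_point(points)
print(best)
print(calendar_days_per_false_alarm(1.0, driving_hours_per_day=1.0))
```

With these toy numbers, the lowest threshold misses the fewest sirens but exceeds the budget, so the sketch settles on the middle threshold. The last line shows where "weeks" comes from: for a driver who spends about an hour a day on the road (an assumed figure), one false alarm per 24 driving hours works out to 24 calendar days.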
Andi then presented performance data for Sensory’s models, comparing them to open-source alternatives like YamNet and PaSST. He emphasized that Sensory’s advantage comes from its laser focus on siren detection and the sheer volume of its self-collected dataset. Andi also noted that real-world implementation goes beyond raw accuracy, highlighting the importance of low latency, precise timing, and seamless deployment across various automotive chipsets.
So, what makes Sensory’s EVD models stand out?
The Q&A Session offered valuable insights into key considerations for EVD implementation. Here are some of the highlights:
A: The primary advantage lies in efficiency. The DSP filters out most of the sound, reducing the workload for the AP-level solution. At the AP level, a statistical first stage further reduces the workload for deep net revalidation, which is more computationally costly. Without these stages, the neural network would have to run constantly, leading to higher costs. The statistical first stage cheaply filters out obvious non-siren sounds, allowing us to focus on the tougher cases.
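The efficiency argument can be sketched in a few lines of Python. This is only an illustration of the cascade idea, not Sensory's implementation: the three stage functions below (an energy gate, a zero-crossing statistic, and a peak check standing in for the deep net) are hypothetical placeholders, as are all of the thresholds.

```python
def energy(frame):
    """Mean squared amplitude of a frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    changes = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return changes / (len(frame) - 1)


# Hypothetical stand-ins for the three stages, cheapest first.
def dsp_prefilter(frame):
    return energy(frame) > 0.01


def statistical_first_stage(frame):
    return 0.05 <= zero_crossing_rate(frame) <= 0.5


def deep_net_revalidation(frame):
    # Placeholder for an expensive neural network call.
    return max(abs(x) for x in frame) > 0.5


class Cascade:
    """Run stages in order of cost and stop at the first rejection."""

    def __init__(self, stages):
        self.stages = stages  # list of (name, function) pairs
        self.calls = {name: 0 for name, _ in stages}

    def detect(self, frame):
        for name, stage in self.stages:
            self.calls[name] += 1
            if not stage(frame):
                return False
        return True


cascade = Cascade([
    ("prefilter", dsp_prefilter),
    ("first_stage", statistical_first_stage),
    ("revalidate", deep_net_revalidation),
])

silence = [0.0] * 16
hiss = [0.3, -0.3] * 8                       # flips sign on every sample
tone = ([0.8] * 4 + [-0.8] * 4) * 2          # slower, louder oscillation
results = [cascade.detect(f) for f in (silence, hiss, tone)]
print(results, cascade.calls)
```

Of the three toy frames, only one reaches the expensive final stage; the call counters make the savings visible. In a real system the pass rates of each stage would be measured on recorded audio rather than assumed.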
A: Yes, we have tested at negative SNR levels. The false reject rate degrades slightly as the noise becomes stronger. However, our models are trained to handle these scenarios, including cases where the noise is louder than the siren.
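For readers less familiar with the term, a negative SNR simply means the noise carries more power than the siren. The short Python sketch below shows one common way to build such a test mixture by scaling the noise to hit a target SNR in decibels. The helper names and the synthetic signals are assumptions for illustration; this is not a description of Sensory's test setup.

```python
import math


def power(samples):
    """Mean squared amplitude."""
    return sum(v * v for v in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal) / power(noise))


def mix_at_snr(siren, noise, target_snr_db):
    """Scale the noise so the siren sits at target_snr_db, then add the two."""
    if len(siren) != len(noise):
        raise ValueError("siren and noise must have the same length")
    wanted_noise_power = power(siren) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(wanted_noise_power / power(noise))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(siren, scaled_noise)]
    return mixture, scaled_noise


# Synthetic stand-ins: a pure tone for the siren, a deterministic jitter for noise.
siren = [math.sin(2.0 * math.pi * k / 8.0) for k in range(64)]
noise = [((k * 7919) % 101) / 50.0 - 1.0 for k in range(64)]

mixture, scaled_noise = mix_at_snr(siren, noise, target_snr_db=-5.0)
print(round(snr_db(siren, scaled_noise), 6))
```

At a target of -5 dB the scaled noise ends up with roughly three times the power of the siren, which is the kind of condition the answer above refers to.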
A: No, it’s not always necessary. The embedded DSP stage is beneficial if you are sensitive to power consumption on the head unit. It reduces MIPS on the head unit by filtering out most non-siren sounds. However, if MIPS isn’t a concern, you can run without the DSP.
A: We train under a wide variety of noises, including typical road noise, engine noise, human speech, and music. For music, we account for situations where music is playing from a smartphone or other sources.
A: Our goal is human-like performance. The system should detect a siren whenever a person with excellent hearing could hear it.
A: The primary advantage is cost. Modern cars already have internal microphones that can be used for siren detection without adding hardware.
Sensory’s EVD technology is actively being designed in by OEMs today. Want to learn how Sensory’s EVD can give your vehicles a crucial safety advantage? Download our one-page overview, or get in touch with an expert today to discuss your specific needs and requirements.