Your speakers are a USB keyboard now: inside the Katana BadUSB attack

What happened

A security researcher publishing on blog.nns.ee under the handle Katana detailed a BadUSB variant that needs no attacker-side cable, no rogue cleaner with a thumb drive, and no insider plugging in a malicious dongle. The attack starts with sound: a crafted audio waveform, played through the victim's own speakers, is used to deliver a firmware update to a USB audio device that then re-enumerates itself as a Human Interface Device and starts typing.

The target class is broad. Plenty of consumer and prosumer USB audio gear (gaming headsets, conference speakerphones, capture interfaces, even some webcams with onboard mics) ships with DSPs that accept firmware updates over the same USB connection that carries the audio stream. Vendors typically use a manufacturer-defined control transfer or a vendor-specific endpoint guarded by little more than a magic number. The Katana write-up shows that for at least one widely shipped chipset family, that guard can be coaxed open via in-band audio: the DSP's mic-path firmware listens for a near-ultrasonic trigger pattern, then enters a state where the USB descriptors the device presents to the host can be rewritten without a replug.

Once the descriptor flips, the operating system does what it has always done: trust the device. Windows, macOS, and Linux all auto-bind a new HID keyboard the moment one appears. From the user's perspective, nothing visibly changes. A small status LED might blink. Meanwhile the device emits keystrokes — `Win+R`, a PowerShell one-liner, a `curl | sh` — exactly the BadUSB playbook from 2014, delivered over a channel nobody hardens.
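From the host's side, the "flip" shows up as a change in the interface classes a device reports. As a rough illustration of how to see what a Linux host currently trusts, the sketch below reads the `bInterfaceClass` files that the kernel exposes under `/sys/bus/usb/devices` and lists devices that present both an audio and a HID interface. The function names are illustrative, not from the write-up, and the sysfs path is passed in as a parameter so the logic can be exercised against a test directory.

```python
from pathlib import Path

AUDIO, HID = 0x01, 0x03  # USB-IF base class codes

def interface_classes(sysfs_root="/sys/bus/usb/devices"):
    """Map each USB device (e.g. '1-2') to the set of interface classes it exposes."""
    classes = {}
    for entry in Path(sysfs_root).iterdir():
        # Interface directories are named '<device>:<config>.<interface>'.
        device, sep, _ = entry.name.partition(":")
        class_file = entry / "bInterfaceClass"
        if not sep or not class_file.is_file():
            continue  # device or root-hub entry, not an interface
        # The file holds the class code as two hex digits, e.g. '03'.
        classes.setdefault(device, set()).add(int(class_file.read_text().strip(), 16))
    return classes

def audio_devices_with_hid(classes):
    """Return the devices that present both an audio and a HID interface."""
    return sorted(dev for dev, seen in classes.items() if {AUDIO, HID} <= seen)
```

Calling `audio_devices_with_hid(interface_classes())` gives an inventory, not a verdict: many headsets legitimately expose a HID interface for volume and mute buttons, so a hit only marks a device worth a closer look.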

Why it matters

BadUSB has been a known-unsolved problem for more than a decade. The original Nohl/Lell talk at Black Hat 2014 made the point that USB's trust model is descriptor-based and descriptors are mutable, and the industry's response was roughly: don't plug in random USB sticks. Katana is interesting because it removes the 'plugging in' part of the threat model. The malicious peripheral is already on the desk, sold by a brand the user recognizes, and the trigger arrives through a webpage that autoplays audio, a Zoom call with a hostile participant, or a YouTube ad.

The air-gap implications are worse than they look. Security teams that ban USB drives still issue conference-room speakerphones, USB headsets for remote workers, and Yeti-class podcast mics for the marketing team. Every one of those devices is a candidate carrier if its DSP firmware update path is reachable from the audio stream — which is exactly the path the manufacturer designed in to enable in-field updates. The researcher notes that some affected devices accept the update without any user prompt, which means a single compromised ad network or a single rogue meeting participant can reach a fleet.

Community reaction on Hacker News (523 points within hours) split predictably. The first camp pointed out that this is, mechanically, just BadUSB with a novel delivery channel — the HID re-enumeration trick is old. The second camp pointed out that 'just BadUSB' is doing a lot of work in that sentence: the entire defensive posture against BadUSB assumes physical access, and Katana breaks that assumption cleanly. The interesting wrinkle is that USB-C audio adapters and DisplayPort-USB hubs sit in the same threat surface, because they too expose vendor-specific control endpoints that the OS happily forwards.

There's also a supply-chain story worth naming. Most of the DSP firmware in question is licensed from a handful of silicon vendors — Cirrus, Realtek, C-Media, a couple of Chinese fabless designers — and integrated by ODMs into devices sold under dozens of brands. A fix at the chipset level would propagate widely, but only on new shipments. Existing devices in the field are unlikely to receive a hardened firmware unless the brand has a real product-security function, which most peripheral vendors do not. Expect a long tail.

What this means for your stack

If you run a fleet, the short-term actions are unglamorous but real. On Linux, the closest thing to a real mitigation is USBGuard with a default-deny policy that allowlists known vendor/product IDs and explicitly forbids HID-class interfaces on devices that initially enumerated as audio-class. Windows shops can approximate this with the Device Installation Restrictions GPO and a curated `AllowDeviceIDs` list, plus disabling 'Let Windows install drivers automatically' for HID-compliant keyboards. macOS is in the weakest position here because the new-keyboard assistant is hard to disable cleanly outside MDM profiles that lock down `USBDeviceConfiguration`.
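The USBGuard approach can be sketched as a small rule generator. This is a minimal sketch, not a vetted policy: the rule syntax follows the `with-interface` examples in the USBGuard rule-language documentation and should be checked against `usbguard-rules.conf(5)` for your version, the vendor/product ID `1234:5678` used below is a placeholder, and the default-deny part is assumed to come from `ImplicitPolicyTarget=block` in `usbguard-daemon.conf`.

```python
def usbguard_rules(allowed_audio_ids):
    """Build USBGuard rule lines for audio devices that must never present HID.

    allowed_audio_ids: iterable of 'vvvv:pppp' vendor:product ID strings.
    """
    rules = []
    for dev_id in allowed_audio_ids:
        # Rules are evaluated top to bottom, so the HID (class 03) rejection comes first.
        rules.append(f"reject id {dev_id} with-interface one-of {{ 03:*:* }}")
        # Authorize the device only when all of its interfaces are audio class (01).
        rules.append(f"allow id {dev_id} with-interface equals {{ 01:*:* }}")
    return rules

if __name__ == "__main__":
    # Placeholder ID; replace with the IDs of devices actually issued to the fleet.
    print("\n".join(usbguard_rules(["1234:5678"])))
```

The trade-off is deliberate strictness: a headset that uses a HID interface for its volume buttons would be rejected by these rules, so a real policy needs per-model exceptions or a decision to give up those buttons.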

Procurement matters more than usual. Ask peripheral vendors whether their devices accept firmware updates over the audio control endpoint, whether those updates require a signed payload, and whether the device can re-enumerate as a different USB class without a physical replug. A device that can silently change its USB class is, definitionally, a BadUSB vector — the only question is who gets to trigger it. For high-assurance environments, the answer is to standardize on dumb peripherals: USB audio devices that are audio-class only, with no vendor-specific control interface and no in-field firmware update path. Such devices exist; they cost more; that is the trade.
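Part of that procurement check can be done from a device's own descriptors. The sketch below, with illustrative function names, takes the (class, subclass) pairs from a device's interface descriptors and flags anything beyond plain audio, using the USB-IF class codes for HID, Device Firmware Upgrade, and vendor-specific interfaces. It assumes the pairs have already been collected, for example from `lsusb -v` output.

```python
# USB-IF base class codes relevant to the procurement questions above.
AUDIO = 0x01
HID = 0x03
APPLICATION_SPECIFIC = 0xFE  # subclass 0x01 is Device Firmware Upgrade (DFU)
VENDOR_SPECIFIC = 0xFF

def audit_interfaces(interfaces):
    """Return findings for a list of (class, subclass) interface tuples."""
    findings = []
    for cls, sub in interfaces:
        if cls == AUDIO:
            continue  # expected on an audio device
        if cls == HID:
            findings.append("HID interface present")
        elif cls == APPLICATION_SPECIFIC and sub == 0x01:
            findings.append("DFU firmware-update interface present")
        elif cls == VENDOR_SPECIFIC:
            findings.append("vendor-specific interface present")
        else:
            findings.append(f"unexpected interface class 0x{cls:02x}")
    return findings

def is_dumb_audio(interfaces):
    """True only if the device declares audio interfaces and nothing else."""
    return bool(interfaces) and not audit_interfaces(interfaces)
```

A clean result is necessary but not sufficient. Vendor-defined control requests on the default endpoint need no declared interface, so the vendor's written answers still matter.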

For individual developers, the practical advice is narrower: be suspicious of headset and speakerphone vendors who advertise 'firmware updates' as a feature, mute your speakers when watching untrusted media on a machine that holds production credentials, and consider that the laptop in your hotel room is now reachable from the lobby TV.

Looking ahead

The long-arc problem is that USB's descriptor-based trust model is doing 2026 work with 1996 assumptions, and every new peripheral class — audio, video, storage, biometric — inherits the same composability that makes BadUSB possible. USB4 and Thunderbolt added IOMMU-backed DMA protection but did nothing about HID. The next plausible step is OS-level policy that treats class transitions as a privileged event requiring user confirmation, the same way browsers eventually treated autoplay. Until then, Katana is a reminder that the threat model for a 'speaker' has quietly expanded to include 'keyboard that takes voice commands from your enemies.'
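What such a policy might look like can be sketched in a few lines. This is a hypothetical model, not an existing OS mechanism: the class name, the return values, and the choice to key devices by port path (since a re-enumerating device could also change its vendor and product IDs) are all assumptions made for illustration.

```python
class ClassTransitionGate:
    """Remember each device's interface classes and flag later additions."""

    def __init__(self):
        self._seen = {}  # device key (e.g. port path) -> frozenset of class codes

    def check(self, device_key, classes):
        """Return 'allow', or 'confirm' when a known device gains new classes."""
        current = frozenset(classes)
        previous = self._seen.get(device_key)
        if previous is None:
            # First sighting: record the baseline. A stricter policy could prompt here too.
            self._seen[device_key] = current
            return "allow"
        if current - previous:
            return "confirm"  # a class appeared that was not in the baseline
        return "allow"

    def approve(self, device_key, classes):
        """Record the user's confirmation by updating the baseline."""
        self._seen[device_key] = frozenset(classes)
```

In this model a speaker first seen with only the audio class (`0x01`) is allowed, and the same device later reporting audio plus HID (`0x03`) returns `"confirm"` until `approve` is called. Dropping a class never prompts, because the risk the article describes comes from gaining capabilities, not losing them.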

// read from source

Hacking your PC using your speaker without ever touching it
