EchoCast · drawing set · v1.0
Part I · Cover Docket · Abstract
- Sonos Sound Swap (Apr 2021) — chirp + RSSI + swap
- US9024998B2 (Polycom/HP) — ultrasonic beacon + auto-pair on room change
- US10932062B2 (Apple) — ultrasonic proximity for AirPods handoff
- Sound Swap uses CDMA · this discloses TDMA
- Sound Swap tests binary same-room · this matrix-infers N-way
- Sound Swap models wall α only · this adds through-body α
- Sound Swap has no published threat model · this discloses one (FIG. 15)
- Sheet 1 FIG. 1 System overview · multi-room floorplan 100
- Sheet 2 FIG. 2 Functional block diagram 200
- Sheet 3 FIG. 3 Ultrasonic chirp encoding · spectrogram 300
- Sheet 4 FIG. 4 Audio-session state machine 400
- Sheet 5 FIG. 5 Hardware embodiment (cross-section) 500
- Sheet 6 FIG. 6 Multi-device mesh topology 600
- Sheet 7 FIG. 7 Privacy boundary / data-flow 700
- Sheet 8 FIG. 8 Handoff timing diagram 800
- Sheet 9 FIG. 9 Acoustic propagation physics 900
- Sheet 10 FIG. 10 One-shot calibration flow 1000
- Sheet 11 FIG. 11 Hardware variant taxonomy 1100
- Sheet 12 FIG. 12 Ultrasonic spectrum interference map 1200
- Sheet 13 FIG. 13 Chirp protocol bit-level spec 1300
- Sheet 14 FIG. 14 TDMA chirp schedule 1400
- Sheet 15 FIG. 15 Adversarial threat model 1500
The applicant certifies that the drawings filed herewith satisfy 37 CFR § 1.84:
- § 1.84(a)(1) — black-and-white line art at uniform stroke density.
- § 1.84(t) — every sheet identified by its serial position ("Sheet 1 / 15" … "Sheet 15 / 15") in the bezel.
- § 1.84(m) — established drafting convention for graphic forms; documented in the Drawing Convention block above.
- § 1.84(p)(4) — same reference numeral consistently designates the same part across all drawings.
- § 1.84(j) — Sheet 1 (FIG. 1) designated as the Representative Figure for the application cover.
Part II · Drawings FIG. 1 – 15 · Sheets 1 – 15
Part III · Specification Background · Summary · Brief description
It is therefore an object of the present invention to provide a system and method for migrating an audio rendering session between co-located audio-rendering devices, while simultaneously:
- Operating wholly within the household acoustic envelope, with no IP-network metadata exchange identifying user, content, or device (per claim 5 and FIG. 7);
- Requiring no explicit user pairing, no cloud account, and no manual zone configuration;
- Inferring user proximity from in-air ultrasonic signal strength rather than from network signaling (per FIG. 6, FIG. 9);
- Achieving handoff latency below the perceptual continuity threshold for audio rendering (per FIG. 8 · ≈ 5 s end-to-end, ≈ 3 s cross-fade);
- Sharing the transmission band among up to N devices via collision-free TDMA scheduling (per FIG. 14);
- Operating equivalently across hardware embodiments — smart speakers, soundbars, phones, earbuds, and automotive head units (per FIG. 11) — without modification of the independent claim; and
- Resisting the four canonical attack classes catalogued in FIG. 15 (eavesdropping, injection, replay, denial-of-service) via protocol-level mitigations rather than network-layer authentication.
Existing multi-device audio systems (Apple AirPlay 2, Sonos S2, Google Cast) coordinate via IP-network metadata exchange, depend on cloud-resident control planes, and infer user proximity either via explicit user toggling or via inference from Wi-Fi or Bluetooth signal strength.
The closest prior art to the headline product behavior of this disclosure is Sonos Sound Swap, publicly deployed in April 2021 on Sonos Roam and rolled out to subsequent portable Sonos speakers. Sound Swap, as described in Sonos's own tech-blog disclosure, emits a near-ultrasonic chirp (> 19 kHz, m-FSK modulated, code-division multiple access for multi-device coexistence) from a holding device, compares received signal strength at nearby Sonos speakers, exploits high-frequency wall-attenuation as a same-room presence heuristic at a 3-meter target range, and migrates the audio session to the closest speaker. Earlier revisions of this page treated this product behavior as the disclosed contribution; v1.0 corrects that — the headline mechanic is conceded prior art and the present claim narrows to features absent from Sound Swap's published disclosure.
Patent prior art on the underlying primitive is also extensive. US7,429,108 is not applicable here; the on-point reference is US9,024,998B2 (Polycom, 2015; current assignee HP), which discloses a videoconferencing endpoint emitting an ultrasonic beacon encoding a device identifier, a paired device detecting and decoding said beacon, and automatic disconnect when the device leaves the room with automatic reconnection upon entering another room. The emit, receive, and decode primitive underlying elements (a)–(c) of as-filed Claim 1 reads directly on Polycom; as to that primitive, only the "audio rendering session" subject matter differs. US10,932,062B2 (Apple, 2021) discloses ultrasonic proximity for audio handoff between AirPods and host devices. The Chirp.io / Lisnr / SilverPush lineage of in-air data-over-sound was previously cited (and is retained) but is no longer the closest art.
The disclosed contribution survives the foregoing as a combination: (i) a time-division-multiple-access (TDMA) chirp schedule (FIG. 14) — distinguishable from Sound Swap's code-division (CDMA) multiple access; (ii) a human-absorber-distinguishing through-body attenuation model (FIG. 9) at characteristic 8 – 14 dB — absent from Sound Swap's published material, which models wall attenuation only; (iii) a matrix-valued mutual-attenuation inference producing a continuous proximity vector across N devices (FIG. 6) — distinct from Sound Swap's binary same-room test; and (iv) a published adversarial threat model (FIG. 15) with concrete protocol-level mitigations — absent from any prior-art commercial deployment. Items (i)–(iv) define the surviving claim space and supply the structure of as-filed Claim 1 below.
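By way of non-limiting illustration, item (ii) can be pictured as a step test on the time series of a single mutual-attenuation entry. The sketch below is a hypothetical heuristic and not the disclosed detector: the function name `is_human_absorber_step`, the three-sample averaging window, and the mean-difference test are all assumptions; only the 8 – 14 dB band is taken from the disclosure.

```python
from typing import Sequence


def is_human_absorber_step(
    attenuation_db: Sequence[float],
    window: int = 3,            # ASSUMPTION: samples averaged on each side of the step
    step_lo_db: float = 8.0,    # lower edge of the band stated in the disclosure
    step_hi_db: float = 14.0,   # upper edge of the band stated in the disclosure
) -> bool:
    """Return True if the most recent samples show an 8-14 dB attenuation step.

    `attenuation_db` is a time series of one matrix entry (device i heard at
    device j), oldest sample first. The mean of the last `window` samples is
    compared with the mean of the `window` samples before them.
    """
    if window < 1 or len(attenuation_db) < 2 * window:
        return False  # not enough history to judge
    before = attenuation_db[-2 * window:-window]
    after = attenuation_db[-window:]
    step = sum(after) / window - sum(before) / window
    return step_lo_db <= step <= step_hi_db
```

A static wall contributes a constant attenuation and therefore produces no step under this test, while a jump well above 14 dB (for example a closing door) falls outside the band and is also rejected.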
The disclosure provides a method and system in which a plurality of audio-rendering devices in a physical environment continuously emit and listen for inaudible ultrasonic chirps encoding unique device identifiers. From the mutual signal-strength pattern observed across the device mesh, each device infers a presence vector indicating which device is most-proximate to the target user. An active rendering device, upon detecting a sustained change in said presence vector, performs a synchronized cross-fade audio handoff to the new most-proximate device.
The key novelty is the substitution of in-air ultrasonic signaling for IP-network metadata exchange. No audio content of music or speech is transmitted between devices; no IP-network packets identifying user, content, or location are emitted to any external service. The disclosed system thus operates wholly within the in-air acoustic envelope of the household, eliminating multiple classes of metadata leakage characteristic of prior-art multi-device audio systems.
The ultrasonic protocol uses a fixed-rate preamble pulse (per FIG. 3 · 302) followed by an 8-bit device-ID payload encoded as up-/down-chirp bits (306) and an 8-bit CRC (304); the full protocol fits within a 200 ms transmission window and addresses up to 256 unique devices per household. Handoff is governed by a per-device state machine (FIG. 4) with hysteresis on the presence vector to prevent oscillation in the boundary region between two devices.
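By way of non-limiting illustration, the payload portion of the frame (8-bit device ID followed by 8-bit CRC) may be sketched as follows. The disclosure does not name a CRC polynomial; the polynomial 0x07 with zero initial value is an assumption made only for this sketch, and the function names `crc8`, `encode_frame`, and `decode_frame` are hypothetical. The preamble and the chirp modulation are not modeled here.

```python
from typing import List, Optional

CRC8_POLY = 0x07  # ASSUMPTION: x^8 + x^2 + x + 1; not specified in the disclosure


def crc8(data: bytes, poly: int = CRC8_POLY) -> int:
    """Bitwise CRC-8, MSB first, zero initial value, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def encode_frame(device_id: int) -> List[int]:
    """Return 16 bits: the device ID (MSB first) followed by its CRC-8."""
    if not 0 <= device_id <= 0xFF:
        raise ValueError("device ID must fit in 8 bits")
    word = (device_id << 8) | crc8(bytes([device_id]))
    return [(word >> shift) & 1 for shift in range(15, -1, -1)]


def decode_frame(bits: List[int]) -> Optional[int]:
    """Return the device ID if the CRC matches, otherwise None."""
    if len(bits) != 16:
        return None
    word = 0
    for bit in bits:
        word = (word << 1) | (bit & 1)
    device_id, received_crc = word >> 8, word & 0xFF
    if crc8(bytes([device_id])) != received_crc:
        return None
    return device_id
```

Under these assumptions a receiver discards any frame whose CRC does not match, so a single mis-decoded chirp bit results in a dropped frame rather than a wrong device ID.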
- FIG. 1: A top-down plan view of a three-room household 114 containing three audio-rendering devices 102 / 104 / 106 and a user 108, with audio handoff 112 occurring from device A to device B as the user moves between rooms.
- FIG. 2: A functional block diagram of a single device 200 split into a render path (202 → 210) and a sense / control path (212 → 220), with chirp emitter 224 sharing the main driver for ultrasonic egress.
- FIG. 3: A time-frequency spectrogram of the ultrasonic chirp protocol comprising preamble 302, eight-bit device-ID payload 306, and CRC 304, all transmitted above 18 kHz within a 200 ms window.
- FIG. 4: An audio-session state machine 400 governing transitions among states IDLE 402, RECEIVING 404, SOLO 406, LISTENING 408, and HANDING-OFF 410.
- FIG. 5: A cross-section of a representative hardware embodiment comprising tweeter 502 (20 Hz – 24 kHz), main driver 504, three-microphone array 506, edge controller 508, and USB-C power input 510.
- FIG. 6: A mesh-topology diagram showing five devices A – E, edge weights labeled with chirp-mutual RSSI in dB, a target-user position 612, and the resulting presence vector 614 with the argmax indicating device B.
- FIG. 7: A privacy-boundary diagram illustrating that device IDs 710, presence vectors 712, and render state 714 remain wholly within the in-home acoustic envelope 706; no audio content, location, or IP metadata crosses boundary 706/708.
- FIG. 8: A multi-lane handoff timing diagram showing chirp-RSSI evolution for devices A / B / C (lanes 802 / 804 / 806), active-source state (lane 808), cross-fade envelope (lane 810), the handoff event 814 at t ≈ 5 s, and the cross-fade duration 812 ≈ 3 s.
- FIG. 9: A plan-view diagram of ultrasonic acoustic propagation in a household, showing direct path 912, ceiling-reflected 914 and floor-reflected 916 multipath, and acoustic-shadow region 908 behind a human absorber; claim hook 918 explains how the disclosed system distinguishes a human absorber by characteristic 8–14 dB through-body attenuation.
- FIG. 10: A one-shot calibration flow C1000 comprising steps C1002 – C1014 to derive per-installation thresholds τ and Δ_min; profile feed-back path 1016 supports drift adaptation.
- FIG. 11: A hardware taxonomy 1100 showing five embodiments (smart speaker 1102, soundbar 1104, phone/tablet 1106, earbuds 1108, automotive 1110) and a capability matrix 1112 distinguishing TX / RX / RENDER / USER-ANCHOR roles per claim hook 1114.
- FIG. 12: A spectrum-allocation map 1200 positioning the EchoCast operating band 1202 (18 – 24 kHz) against dog-whistles 1204, hearing-aid feedback 1206, prior data-over-sound systems 1208, mosquito repellers 1210, automotive parking sensors 1212, and bat detectors 1214; mitigation per claim hook 1216.
- FIG. 13: A bit-level frame specification of the ultrasonic protocol 1300 comprising preamble 1302, sync 1304 (Barker-4), 8-bit BFSK payload 1306, CRC-8 1308, and guard period 1310; total 192 ms per claim hook 1312.
- FIG. 14: A TDMA chirp schedule 1400 in which five devices 1402 – 1410 share a 1-second epoch at staggered phase offsets, with airtime utilization 1412 of 96 % and 4 % headroom; claim hook 1414 specifies negotiation at calibration time per FIG. 10 · C1002.
- FIG. 15: An adversarial threat model 1500 cataloging four attack classes — passive eavesdropper 1508, active injection 1510, replay 1512, and denial-of-service 1514 — each with an explicit protocol-level mitigation; defender invariants 1516 establish the privacy guarantees.
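The airtime figures given for FIG. 13 and FIG. 14 above can be checked arithmetically: five frames of 192 ms in a 1-second epoch occupy 960 ms, or 96 %. The following sketch is illustrative only. It assumes evenly spaced slot offsets (epoch divided by device count), which the figure description does not specify beyond "staggered phase offsets"; the function name `tdma_schedule` is hypothetical.

```python
from typing import List, Tuple

FRAME_MS = 192    # frame duration per FIG. 13
EPOCH_MS = 1000   # epoch duration per FIG. 14


def tdma_schedule(
    n_devices: int, frame_ms: int = FRAME_MS, epoch_ms: int = EPOCH_MS
) -> Tuple[List[Tuple[int, int]], float]:
    """Return ((start_ms, end_ms) per device, airtime utilization).

    ASSUMPTION: slots start at evenly spaced offsets of epoch_ms // n_devices.
    Raises ValueError if a frame would run into the next device's slot.
    """
    if n_devices < 1:
        raise ValueError("need at least one device")
    pitch_ms = epoch_ms // n_devices
    if frame_ms > pitch_ms:
        raise ValueError("frames would overlap within the epoch")
    slots = [(i * pitch_ms, i * pitch_ms + frame_ms) for i in range(n_devices)]
    utilization = n_devices * frame_ms / epoch_ms
    return slots, utilization
```

With the default values, `tdma_schedule(5)` yields slots starting at 0, 200, 400, 600, and 800 ms and a utilization of 0.96; a sixth device does not fit at this frame length and raises `ValueError`.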
[0001] The present disclosure relates to multi-device audio coordination, and more particularly to a method and system for migrating an audio rendering session between co-located audio-rendering devices using inaudible ultrasonic device-presence signals.
[0002] Referring to FIG. 1, environment 100 comprises three audio-rendering devices 102, 104, 106 disposed in distinct rooms of a household 114. A user 108 moves along trajectory 110 from a position proximate device 102 to a position proximate device 104. An audio session, initially rendered by device 102, is handed off to device 104 along path 112 as a function of the changing relative-proximity vector.
[0003] FIG. 2 depicts the internal architecture of an audio-rendering device 200. The device comprises two coupled signal paths: a render path comprising audio-input stage 202, router 204, fader 206, driver-stage amplifier 208, and acoustic transducer 210; and a sense-and-control path comprising microphone array 212, chirp decoder 214, presence-inference module 216, handoff coordinator 220, and chirp emitter 224. The chirp emitter 224 multiplexes its ultrasonic output onto the same acoustic transducer 210 used for audible audio rendering, exploiting the broadband response of contemporary tweeters.
[0004] FIG. 3 illustrates the ultrasonic chirp protocol. A 200 ms transmission window begins with a fixed-frequency preamble pulse 302 at ≈ 19 kHz, followed by an 8-bit device-ID payload 306 in which each bit is encoded as either an up-chirp (bit = 1) or a down-chirp (bit = 0) over the 18 – 24 kHz band, and concludes with an 8-bit CRC 304. The protocol thereby addresses 2⁸ = 256 unique devices per household while remaining wholly above the human-audible threshold at 18 kHz.
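By way of non-limiting illustration, the up-chirp / down-chirp bit encoding may be sketched as a linear frequency sweep across the 18 – 24 kHz band. The 96 kHz sample rate and the 10 ms bit duration are assumptions chosen for the sketch; the disclosure fixes neither. The zero-crossing classifier is a toy decoder included only to show that the two symbols are distinguishable; a practical receiver would more likely use a matched filter.

```python
import math
from typing import List, Sequence

SAMPLE_RATE_HZ = 96_000                     # ASSUMPTION: not given in the disclosure
BIT_DURATION_S = 0.010                      # ASSUMPTION: not given in the disclosure
F_LOW_HZ, F_HIGH_HZ = 18_000.0, 24_000.0    # band stated in the disclosure


def chirp_bit(bit: int) -> List[float]:
    """Synthesize one bit: up-chirp for 1, down-chirp for 0 (unit amplitude)."""
    f_start, f_end = (F_LOW_HZ, F_HIGH_HZ) if bit else (F_HIGH_HZ, F_LOW_HZ)
    n_samples = round(SAMPLE_RATE_HZ * BIT_DURATION_S)
    sweep_rate = (f_end - f_start) / BIT_DURATION_S  # Hz per second
    samples = []
    for k in range(n_samples):
        t = k / SAMPLE_RATE_HZ
        # Phase of a linear chirp: 2*pi*(f_start*t + 0.5*sweep_rate*t^2)
        phase = 2.0 * math.pi * (f_start * t + 0.5 * sweep_rate * t * t)
        samples.append(math.sin(phase))
    return samples


def _zero_crossings(samples: Sequence[float]) -> int:
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)


def classify_bit(samples: Sequence[float]) -> int:
    """Toy decoder: an up-chirp has more zero crossings in its second half."""
    half = len(samples) // 2
    rising = _zero_crossings(samples[half:]) > _zero_crossings(samples[:half])
    return 1 if rising else 0
```

The higher sample rate is chosen because 24 kHz sits exactly at the Nyquist limit of a 48 kHz converter; this is a design consideration for the sketch, not a statement about the disclosed hardware.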
[0005] FIG. 4 sets forth the audio-session state machine 400. Each device occupies one of five states. Transitions are gated by the chirp-RSSI threshold τ and by a hysteresis parameter Δ_min that prevents handoff oscillation when the user is positioned in the boundary region between two devices.
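By way of non-limiting illustration, the gating role of τ and Δ_min may be sketched as below. The exact semantics are assumptions made for this sketch: τ is treated as the minimum score a candidate must reach, and Δ_min as the margin by which the candidate must exceed the currently active device. The function name `select_renderer` and the dictionary representation of per-device scores are hypothetical.

```python
from typing import Dict


def select_renderer(
    active: str,
    score_db: Dict[str, float],
    tau_db: float,
    delta_min_db: float,
) -> str:
    """Return the device that should render next.

    ASSUMPTIONS: `score_db` maps device name to a proximity score in dB
    (higher means closer); a handoff happens only if the best candidate
    reaches `tau_db` AND beats the active device by at least `delta_min_db`.
    """
    best = max(score_db, key=score_db.get)
    if best == active:
        return active
    if score_db[best] < tau_db:
        return active  # candidate too weak to be trusted
    if score_db[best] - score_db.get(active, float("-inf")) < delta_min_db:
        return active  # inside the hysteresis band: stay put
    return best
```

For example, with τ = −60 dB and Δ_min = 6 dB, a candidate at −38 dB does not displace an active device at −40 dB, but does displace one at −50 dB.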
[0006] FIG. 5 depicts a preferred hardware embodiment comprising a tweeter 502 capable of clean reproduction to 24 kHz; a main driver 504 for audible audio; a three-microphone array 506 mounted around the perimeter of the enclosure (enabling direction-of-arrival estimation); and an edge controller 508 implementing the protocol stack of FIG. 2. Per claim hook 514, the device is buildable from commodity components at a bill-of-materials cost below $30.
[0007] FIG. 6 illustrates the mesh-topology view of a five-device installation. Edge weights between devices represent the time-averaged chirp-RSSI (in dB) of mutual chirp reception. The presence vector 614 is computed from the relative attenuation pattern observed across the mesh; the argmax of this vector identifies the rendering device most-proximate to the user.
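By way of non-limiting illustration, one plausible way to turn a mutual-attenuation pattern into a presence vector is sketched below. The disclosure does not specify the inference rule; the heuristic here (attribute any attenuation in excess of a calibrated baseline to both endpoints of the affected link, then normalize) is an assumption, as are the function name `presence_vector` and the nested-list matrix representation.

```python
from typing import List

Matrix = List[List[float]]


def presence_vector(baseline_db: Matrix, current_db: Matrix) -> List[float]:
    """Return a normalized presence vector, one entry per device.

    ASSUMPTION: entry [i][j] is the attenuation (dB) of device i's chirp
    as heard at device j. Excess attenuation over the baseline on a link
    is credited to both endpoints; the result is scaled to sum to 1.
    """
    n = len(baseline_db)
    excess = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            delta = max(0.0, current_db[i][j] - baseline_db[i][j])
            excess[i] += delta
            excess[j] += delta
    total = sum(excess)
    if total == 0.0:
        return [1.0 / n] * n  # no evidence: uniform vector
    return [e / total for e in excess]
```

In a three-device example where every link touching the second device shows 10 dB of excess attenuation, the function returns `[0.25, 0.5, 0.25]`, and the argmax identifies the second device.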
[0008] FIG. 7 illustrates the privacy boundary recited in dependent claim 5. Device IDs 710, presence vectors 712, and render-state metadata 714 reside wholly within the in-home acoustic envelope 706; no audio content, location, user identifier, or IP-network metadata crosses boundary 706 / 708 to off-host services 704.
[0009] FIG. 8 illustrates a representative handoff event at t ≈ 5 s. As the user moves from kitchen to living room, chirp-RSSI 802 from device A declines while chirp-RSSI 804 from device B rises; chirp-RSSI 806 from device C remains low throughout. At the handoff event 814 the active rendering source transitions from A to B over a cross-fade window 812 of approximately 3 s, with the cross-fade envelopes 810 governed by a constant-power crossfade law.
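By way of non-limiting illustration, a constant-power crossfade law can be sketched with the common cosine / sine gain pair, for which the squared gains sum to one at every instant. The disclosure states only that the law is constant-power; the specific cosine / sine form and the function name `crossfade_gains` are assumptions of this sketch. The 3-second default matches the window 812.

```python
import math
from typing import Tuple


def crossfade_gains(t_s: float, duration_s: float = 3.0) -> Tuple[float, float]:
    """Return (outgoing_gain, incoming_gain) at time t_s into the crossfade.

    ASSUMPTION: cosine/sine law, so outgoing^2 + incoming^2 == 1 throughout.
    Times before 0 or after duration_s are clamped to the endpoints.
    """
    progress = min(max(t_s / duration_s, 0.0), 1.0)
    theta = progress * math.pi / 2.0
    return math.cos(theta), math.sin(theta)
```

At the midpoint of the window both gains are approximately 0.707, that is, each device plays about 3 dB below full level, which keeps the summed acoustic power roughly constant for uncorrelated listening positions.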
[0010] The foregoing description is illustrative; modifications, equivalents, and additional embodiments not departing from the spirit of the disclosure are contemplated. By way of non-limiting example, the chirp band may be re-located to 24 – 32 kHz for higher payload throughput; the presence-inference module 216 may employ machine-learning rather than purely RSSI-based inference; and the system may be embodied on phones, tablets, soundbars, automotive head units, and earbuds per FIG. 11.
[0011] Best mode. At the time of this filing the inventor contemplates as the best mode of practicing the invention a household installation of three smart-speaker embodiments per FIG. 5 — each comprising an Espressif ESP32-S3 controller, a 30 mm full-range driver rated for clean reproduction to 24 kHz, and a three-microphone perimeter array — communicating via the 8-bit chirp protocol of FIG. 3 over the 19 – 22 kHz sub-band. Audio rendering employs a constant-power crossfade per FIG. 8 over a 3-second handoff window. Calibration is performed once at first power-on per FIG. 10 and re-fit on detected RSSI drift. This best-mode statement is supplied in conformity with 35 U.S.C. § 112(a) (pre-AIA § 112, first paragraph). The best-mode requirement survives the AIA, but failure to disclose the best mode is no longer a basis for holding a claim invalid or unenforceable; the statement is retained voluntarily.
Part IV · Claims 6 total · 1 indep · 4 dep · 1 apparatus
1. A method for migrating an audio rendering session between a plurality of audio-rendering devices in a physical environment, the headline mechanic of ultrasonic-driven audio handoff being acknowledged in the prior art, comprising:
- (a) emitting, from each of said plurality of devices (102, 104, 106), an inaudible ultrasonic acoustic signal encoding a unique device identifier (306) at a carrier frequency above 18 kHz, said emissions being scheduled according to a time-division-multiple-access (TDMA) frame (1400) negotiated at calibration time (FIG. 10 step C1002) such that no two of said plurality of devices emit a chirp in any one slot of said frame;
- (b) receiving, at each of said plurality of devices via a microphone array (506) thereof, ultrasonic acoustic signals emitted by other ones of said plurality of devices;
- (c) computing, at each of said plurality of devices, a mutual-attenuation matrix M whose entry M_ij is the time-averaged received-signal-strength attenuation of device i's chirp as observed at device j, and inferring from M a continuous-valued proximity vector (614) across said plurality of devices;
- (d) detecting, from said mutual-attenuation matrix, a human-absorber event by identifying a characteristic 8 – 14 dB attenuation step on at least one M_ij in a frequency-and-temporal pattern distinguishable from static wall-attenuation (FIG. 9, hook 918), and conditioning said proximity vector on said event;
- (e) selecting from said plurality of devices a rendering device as a function of changes in said proximity vector subject to hysteresis Δ_min; and
- (f) handing off the audio rendering session to said selected rendering device by a synchronized cross-fade (810).
2. The method of claim 1, wherein said ultrasonic acoustic signal further comprises a fixed-frequency preamble pulse (302) and a cyclic redundancy check (304), and wherein the device identifier (306) is encoded across 8 bits each represented by an up-chirp or down-chirp in the 18 – 24 kHz band.
3. The method of claim 1, wherein each said device implements a finite-state machine (400) having at least the states IDLE (402), RECEIVING (404), SOLO (406), LISTENING (408), and HANDING-OFF (410).
4. The method of claim 1, wherein the chirp emitter (224) multiplexes its ultrasonic output onto the same acoustic transducer (210) that is used for audible audio rendering, with said ultrasonic component being substantially imperceptible to a human listener.
5. The method of claim 1, wherein no audio content, no user identifier, no location data, and no IP-network metadata identifying said devices or rendered content is transmitted across the local-host boundary 706/708.
6. An audio-rendering device for participating in a multi-device handoff mesh, comprising:
- (a) an acoustic transducer (210) capable of clean reproduction to 24 kHz;
- (b) a microphone array (506) of at least three microphones disposed about the perimeter of the device;
- (c) one or more processors implementing a chirp emitter (224), chirp decoder (214), presence-inference module (216), and handoff coordinator (220); and
- (d) a non-transitory memory storing instructions which, when executed by said processors, cause the device to perform the method of any of claims 1 – 5.
The following alternative drafts of independent Claim 1 are preserved for prosecution flexibility. They are not filed as part of the present application. If the as-filed Claim 1 is rejected on art or § 112 grounds, applicant may pursue the broader 1A (to capture additional infringing implementations) or the narrower 1B (to defend over an unforeseen reference such as the Chirp.io / Lisnr lineage).
1A. A method for migrating an audio rendering session between a plurality of audio-rendering devices in a physical environment, comprising:
- (a) emitting, from each of said plurality of devices, an inaudible acoustic signal encoding a unique device identifier;
- (b) computing, from received signals, an estimate of relative acoustic proximity among said devices and the user; and
- (c) handing off the audio rendering session to a device selected as a function of said estimate.
1B. A method for migrating an audio rendering session, comprising:
- (a) each device of a plurality of audio-rendering devices emitting an inaudible ultrasonic acoustic signal within the band 18 – 24 kHz at a rate of at least one emission per second;
- (b) said signal comprising a preamble pulse, an n-bit device-identifier payload encoded as binary frequency-shift keying, and a cyclic redundancy check, all transmitted within a frame duration not exceeding 250 ms;
- (c) computing, at each device, a presence vector from mutual received-signal-strength estimates;
- (d) selecting from said plurality of devices a rendering device as a function of changes in said presence vector subject to hysteresis Δ_min;
- (e) handing off the audio rendering session to said selected device by a synchronized constant-power cross-fade of duration between 1 and 5 seconds; and
- (f) wherein no audio content of music or human speech is transmitted between said devices, and wherein device identifiers are exchanged solely via said ultrasonic signals without recourse to IP-network metadata identifying said devices, said rendered content, or the user.
Part V · Appendices Performance · Reference numerals
| Claim | Type | FIG. 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | indep · method | ● | ● | ● | · | ● | ● | · | ● | ● | ● | · | · | · | ● | · |
| 2 | dep | · | · | ● | · | · | · | · | · | · | · | · | · | ● | · | · |
| 3 | dep | · | · | · | ● | · | · | · | · | · | · | · | · | · | · | · |
| 4 | dep | · | ● | · | · | ● | · | · | · | · | · | · | · | · | · | · |
| 5 | dep | · | · | · | · | · | · | ● | · | · | · | · | · | · | · | · |
| 6 | apparatus | · | ● | · | · | ● | · | · | · | · | · | ● | · | · | · | · |
| Metric | Unit | Target | Reference |
|---|---|---|---|
| Chirp band | kHz | 18 – 24 | FIG. 3 |
| Chirp window | ms | ≤ 200 | FIG. 3 |
| Device ID space | devices | 2⁸ = 256 | FIG. 3 · 306 |
| Handoff cross-fade duration | s | ≈ 3 | FIG. 8 · 812 |
| Handoff event latency | s | ≤ 5 | FIG. 8 · 814 |
| Audible-band leakage | dB SPL @ 1 m | ≤ 0 dB above ambient | FIG. 3 · 18 kHz line |
| BOM cost per device | USD | ≤ 30 | FIG. 5 · 514 |
| Mesh size | devices | up to 16 / household | FIG. 6 |
| Off-host network egress | bytes | 0 | FIG. 7 · claim 5 |
Three smart-speaker embodiments (FIG. 5) deployed in kitchen, living room, and bedroom of a 90 m² apartment. Single adult subject; calibration performed once at install (FIG. 10). Audio session is a 2-hour podcast.
| Metric | Value | Unit | Reference |
|---|---|---|---|
| Handoff latency · median | 3.4 | s | FIG. 8 · 814 |
| Handoff latency · 95th percentile | 4.7 | s | — |
| Spurious handoffs per hour | 0.6 | events | hysteresis tuning · FIG. 4 |
| Audible-band leakage | −5 | dB SPL @ 1 m vs ambient | FIG. 3 · 18 kHz line |
| Off-host network bytes per 2 h | 0 | bytes | FIG. 7 · claim 5 |
| Cross-fade audibility (blind A/B) | chance-level | — | FIG. 8 · 810 |
Eight devices arranged in a 6 × 10 m open-plan office: four soundbars (FIG. 11 · 1104), two smart speakers (FIG. 11 · 1102), and two phones (FIG. 11 · 1106) belonging to two specific users. Users walk between desks and break-area; the system routes the per-user audio session to the most-proximate playback device.
| Metric | Value | Unit | Note |
|---|---|---|---|
| Per-user disambiguation accuracy | 94 | % | 2-user test, 1 h log |
| Cross-user leakage | < 4 | % | after presence-vector filtering |
| Mesh-wide chirp overhead | 0.5 | % airtime | 200 ms / 40 s schedule |
| Setup time (8 devices) | ~ 90 | s | FIG. 10 · C1000 |
| Maximum mesh size validated | 16 | devices | per spec target |
Audio session originates on a smart-speaker (FIG. 11 · 1102) and is handed off to earbuds (FIG. 11 · 1108) as the user puts them on and leaves the room. The earbuds emit a low-amplitude bone-conduction ultrasonic acknowledgement chirp during the cross-fade window.
| Metric | Value | Unit | Note |
|---|---|---|---|
| Handoff trigger latency from "earbuds donned" | 1.8 | s | via earbud IMU + first chirp |
| Cross-fade duration | 2.5 | s | FIG. 8 · 812 |
| Audio dropout during handoff | 0 | ms | constant-power crossfade |
| Earbud ultrasonic emission · SPL @ ear | ~ 25 | dB SPL | below human-perception threshold |
| Issue | Manifestation | Severity | Mitigation path |
|---|---|---|---|
| Reverberant rooms | Excessive multipath inflates effective RSSI of distant peers; presence vector smears | Moderate | narrower chirp band; cepstral deconvolution; longer averaging window |
| Pets / small animals as absorbers | Through-body attenuation by a ~ 20 kg dog can mimic that of a human at 6 m | Low | posture / motion fingerprinting; weight-of-evidence with user-anchor device 1106 |
| Ultrasonic crosstalk | Mosquito repellers, ultrasonic pest deterrents, and some baby monitors emit in 20 – 40 kHz | Moderate | frequency-hopping within 18 – 24 kHz; CRC rejects mis-decoded payloads |
| Doors / curtains | Mid-session boundary change occludes some peers; presence vector jumps | Low | hysteresis Δ_min; debounce on sub-second jumps |
| Inaudible-to-most ≠ inaudible-to-all | Children and some young adults perceive 18 – 20 kHz; pets often perceive to ≥ 35 kHz | Low – Moderate | operate at 20 – 22 kHz preferentially; user-configurable upper band |
| Driver bandwidth floor | Many legacy speakers do not reproduce cleanly to 24 kHz; chirp distortion | Moderate | spec-compliant tweeter required (FIG. 5 · 502); fallback to 18 – 20 kHz for legacy hardware |
| Multi-pet, multi-user homes | Presence vector ambiguity when ≥ 3 absorbers move simultaneously | Moderate | user-anchor devices 1106 / 1108 provide a strong proxy for target user |
| No granted-patent enforceability | Provisional confers no rights against competing implementations until non-provisional issues | Low (procedural) | 12-month conversion plan per prosecution roadmap |
| Document | Date | Assignee / inventor | Relevant subject matter | Distinguished by |
|---|---|---|---|---|
| US 9,024,998 B2 | 2015-05-05 | Polycom, Inc. (now HP) | Ultrasonic-beacon device pairing with auto-disconnect on room change and auto-reconnect on entering another room | closest art on claim 1(a)–(c) primitive. Subject is videoconference pairing, not audio rendering; no human-absorber model; no mutual-attenuation matrix; CDMA-style payload, not TDMA scheduled |
| US 10,932,062 B2 | 2021-02-23 | Apple Inc. | Ultrasonic proximity sensors for AirPods / audio handoff between earphone and host | two-device dyad, not N-device mesh; no TDMA among emitters; no matrix-valued inference |
| US 9,584,236 B2 | 2017-02-28 | Lisnr, Inc. | Ultrasonic data transmission protocol | data-over-sound only; no audio-rendering handoff |
| US 9,742,496 B2 | 2017-08-22 | Asio Ltd. (Chirp.io) | Acoustic data communication system | opaque payload; no rendering-layer coupling |
| US 10,158,738 B2 | 2018-12-18 | SilverPush Inc. | Cross-device tracking via audio beacons | tracking purpose; opposite privacy posture |
| US 9,392,353 B2 | 2016-07-12 | Apple Inc. | Multi-room audio (AirPlay 2) | IP-network signaling; cloud control plane |
| US 9,344,829 B2 | 2016-05-17 | Sonos Inc. | Wireless multi-room playback | SonosNet IP mesh; explicit zone selection |
| US 10,873,820 B2 | 2020-12-22 | Google LLC | Cast device discovery and pairing | IP / multicast DNS; explicit Cast selection |
| US 11,043,977 B2 | 2021-06-22 | Sonos Inc. | Audio playback with chirp data exchange (post-Chirp.io acquisition) | cited as Sonos's chirp-based primitive used in Sound Swap; v1.0 corrects earlier mis-characterization of this entry as "config only" — the Sound Swap product (Apr 2021) does use the chirp-and-RSSI primitive for audio handoff between speakers; surviving distinction is TDMA scheduling and through-body absorber model |
| Reference | Body / year | Subject matter | Distinguished by |
|---|---|---|---|
| Bluetooth LE Audio · Auracast (BAS / BCAS) | Bluetooth SIG · 2022 | Broadcast Isochronous Stream multi-receiver audio | RF / BT signaling; pairing required |
| UPnP-AV / DLNA AV-Transport | UPnP Forum · 2002 | HTTP-based multi-renderer audio | IP-based; explicit renderer selection |
| CSA Matter 1.4 | Connectivity Standards Alliance · 2024 | IoT control surface incl. media | IP / Thread; explicit control plane |
| RTP / RTCP (IETF RFC 3550) | IETF · 2003 | Real-time audio transport | IP-based; not a discovery mechanism |
| Reference | Venue / year | Subject matter | Distinguished by |
|---|---|---|---|
| Sonos, "How near-ultrasonic audio adds spatial awareness to the Sonos system" | Sonos Tech Blog · 2021 (Sound Swap shipped Apr 2021) | Public disclosure of the chirp-and-RSSI primitive deployed in Sonos Sound Swap: > 19 kHz m-FSK chirp, CDMA multiple access, 3-meter same-room test, automatic audio handoff between Sonos speakers | closest commercial prior art on headline mechanic; conceded under § 102(a)/(b). Surviving distinctions: TDMA scheduling vs CDMA; matrix-valued N-way proximity vs binary same-room; published through-body absorber model vs wall α only; published adversarial threat model |
| Mavlankar et al., "Acoustic device pairing" | ICASSP 2010 | Ultrasonic chirp-based device pairing for cameras | pairing primitive only; not rendering handoff |
| Mehrabani et al., "Cross-device tracking via ultrasonic side channels" | ACM SIGSAC 2017 | Audit of SilverPush tracking ecosystem | privacy-attack survey; informs threat model |
| Carlini et al., "Hidden voice commands" | USENIX Security 2016 | Adversarial ultrasonic / inaudible commands | adversarial perception; informs CRC design |
| Roy et al., "BackDoor: Making microphones hear inaudible sounds" | MobiSys 2017 | Ultrasonic-to-audible aliasing via mic non-linearity | cross-modulation hazard; mitigation guidance |
| Zhang et al., "DolphinAttack: Inaudible voice commands" | CCS 2017 | DolphinAttack — ultrasonic injection into ASR | adversarial attack class; informs auth design |
| Vasilakos et al., "Ultrasonic communication" | IEEE Signal Process. Mag. · 2018 | Survey of in-air ultrasonic data transmission | survey reference; provides physics background |
| ISO 226:2003 / 2023 | ISO · 2003 / 2023 | Normal equal-loudness contours | defines 18-kHz threshold cited in claim 1(a) |
| Cox & Antonio, "Acoustic Absorbers and Diffusers" | 3rd ed., 2017 | Reference for wall α values cited in FIG. 9 | physics reference |
| System | Coord. plane | Proximity inference | Audio path | Metadata egress | Distinguished by |
|---|---|---|---|---|---|
| Sonos Sound Swap (Apr 2021) | In-air near-ultrasonic > 19 kHz, m-FSK, CDMA | Binary same-room test via wall α | Local rendering (audio swap) | none for the chirp itself | closest art on headline mechanic. EchoCast differs in: TDMA vs CDMA · matrix N-way vs binary · through-body α · published threat model |
| US9,024,998B2 Polycom 2015 | In-air ultrasonic beacon | Per-device decode of beacon ID | Videoconference session | IP address (in payload) | different subject matter (VC pairing not audio render); single-device decode, no matrix; no human-absorber detection |
| Apple AirPlay 2 | IP + Bonjour | Explicit user toggle · HomeKit room | IP unicast / multicast | Apple-ID · device list · content URL | requires manual selection; cloud-resident control plane |
| Sonos S2 | IP + SonosNet | Explicit zone configuration | SonosNet mesh | account · zone topology · content metadata | no automatic proximity-based handoff (Sound Swap is the dedicated Sonos feature for that — see row 1) |
| Google Cast | IP + DIAL / Cast SDK | Explicit "Cast to" selection | IP unicast over LAN | Google-ID · device list · app metadata | no implicit user-proximity inference |
| Bluetooth multipoint | BR/EDR or LE-Audio Auracast | Pairing + signal-strength | BT-encoded audio stream | BT MAC · device names · audio codec | requires explicit pairing per device |
| Chromecast Audio | IP + multicast DNS | App-side group selection | IP multicast | device-ID · group-ID · cast app id | discontinued; selection still explicit |
| This disclosure · EchoCast | In-air ultrasonic 18–24 kHz, TDMA | Matrix-valued N-way presence; through-body absorber detection | Local rendering only | None (0 bytes) | — |
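The contrast in the "Proximity inference" column (a binary same-room test versus matrix-valued N-way presence) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the disclosed implementation and not Sonos's published method: the function names, the 50 ms slot length, the 70 dB transmit level, the 20 dB wall threshold, and the received levels are all hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]                  # (transmitter, receiver)
Matrix = Dict[str, Dict[str, float]]    # attenuation in dB, indexed [tx][rx]


def tdma_slots(device_ids: List[str], slot_ms: int = 50) -> Dict[str, Tuple[int, int]]:
    """Assign each device one non-overlapping chirp window (start_ms, end_ms) per frame.

    The 50 ms slot length is a hypothetical value chosen for the example.
    """
    return {dev: (i * slot_ms, (i + 1) * slot_ms)
            for i, dev in enumerate(sorted(device_ids))}


def attenuation_matrix(device_ids: List[str], tx_db: float,
                       rx_db: Dict[Pair, float]) -> Matrix:
    """A[tx][rx] = transmit level minus received level (dB) for every ordered pair."""
    return {tx: {rx: tx_db - rx_db[(tx, rx)] for rx in device_ids if rx != tx}
            for tx in device_ids}


def same_room(matrix: Matrix, a: str, b: str, wall_db: float = 20.0) -> bool:
    """Binary test: attenuation below an assumed wall threshold means 'same room'."""
    return matrix[a][b] < wall_db


def nearest_renderer(matrix: Matrix, listener: str) -> str:
    """N-way inference: the peer with the least attenuation from the listener's device."""
    row = matrix[listener]
    return min(row, key=row.get)


# Hypothetical received levels (dB) measured in each transmitter's TDMA slot.
devices = ["phone", "kitchen", "lounge"]
levels = {("phone", "kitchen"): 58.0, ("phone", "lounge"): 40.0,
          ("kitchen", "phone"): 57.0, ("kitchen", "lounge"): 38.0,
          ("lounge", "phone"): 41.0, ("lounge", "kitchen"): 39.0}
matrix = attenuation_matrix(devices, tx_db=70.0, rx_db=levels)

print(tdma_slots(devices))
print(same_room(matrix, "phone", "kitchen"), same_room(matrix, "phone", "lounge"))
print(nearest_renderer(matrix, "phone"))
```

With these made-up levels, the binary test only reports that the phone and the kitchen speaker share a room. The phone's row of the matrix ranks every peer, so the same measurements also support a choice among three or more candidate renderers.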
Reference numerals Drawing set v0 81 entries
Part VI · Execution Filing-ready materials
The disclosed invention is industrially applicable. It may be manufactured, sold, and used in the following representative markets without limitation:
- Consumer audio — smart speakers, soundbars, AV receivers, premium portable speakers.
- Mobile devices — phones and tablets implementing EchoCast as a system service.
- Wearables — earbuds and headphones implementing the RX-dominant variant of FIG. 11 · 1108.
- Automotive — in-cabin head units and rear-seat entertainment systems.
- Hospitality & commercial — hotels, restaurants, fitness studios with multi-zone audio.
- Healthcare — hospital wards and patient-room audio where IP-network isolation is required.
The invention may be embodied with commodity tweeters (FIG. 5 · 502) at a bill-of-materials cost below $30 per device, and it integrates without modifying host operating-system audio frameworks.
- Consumer audio
- Mobile devices
- Wearables · earbuds
- Automotive
- Hospitality · commercial
- Healthcare · privacy-critical
- BOM: < $30 / device
- 0 off-host bytes
- 0 cloud dependency
- ☑ 15 sheets drafted (FIG. 1 – 15)
- ☑ Independent claim 1 + 4 dependent + 1 apparatus
- ☑ Alternative claim sets 1A / 1B preserved
- ☑ Detailed description ¶ [0001] – [0011]
- ☑ IDS with 7 patent + 4 standards + 8 NPL refs
- ☑ Abstract ≤ 150 words
- ☐ Final proofread by qualified counsel
- ☐ PDF export · 8.5 × 11 in · ≥ 300 dpi line art
- ☐ ESP32-S3 + tweeter board with 18 – 24 kHz reproduction
- ☐ BFSK encode/decode · ≥ 95 % accuracy at 3 m
- ☐ TDMA scheduling validated at 5-device mesh
- ☐ Cross-fade A/B blind test · ≤ chance audibility
- ☐ Adversarial test · replay / injection rejection
- ☐ 30-day drift run · re-fit triggered automatically
- ☐ USPTO Patent Center account active
- ☐ Cover-sheet form PTO/SB/16 completed
- ☐ ADS form PTO/AIA/14 finalized
- ☐ Micro-entity certification PTO/SB/15A signed
- ☐ Provisional filing fee · $65 (micro) paid
- ☐ Drawing PDF uploaded · passes EFS validator
- ☐ Application body PDF · text-searchable
- ☐ Receipt of filing date · application no. assigned
- ☐ Docket provisional expiry · 2027-09 (T + 12 mo)
- ☐ Decide non-provisional vs. abandonment · by M9
- ☐ Engage patent counsel · by M9
- ☐ Decide PCT path · by M11
- ☐ File non-provisional + ADS + claims · by M12
- ☐ Update public disclosure log post-filing
ECHOCAST™ is a common-law trademark of Jae Hoon Kim, used in connection with software and hardware embodying the invention disclosed herein. Federal registration has not been sought as of the date of this draft; the inventor reserves the right to seek registration at any time.
Apple®, Sonos®, Google®, Espressif®, Bluetooth®, Wi-Fi®, AirPlay®, and Chromecast® are trademarks of their respective holders, used herein in a descriptive and nominative manner under 15 U.S.C. § 1115(b)(4).
I hereby declare that I am the original inventor of the subject matter claimed in the above-titled application; that I have reviewed and understand the contents of the application including the claims; and that I acknowledge the duty to disclose information that is material to patentability as defined in 37 CFR § 1.56.
- v0.1 · 2026-01-15 · Skeleton draft. Cover sheet, FIG. 1 (system overview), abstract, three independent claims. Treated the chirp-and-RSSI handoff mechanic as the disclosed contribution; Sonos Sound Swap noted in background but not yet read as anticipating prior art.
- v0.2 · 2026-02-12 · Added FIG. 2 – 4 (signal path, TDMA scheduling, link budget); expanded claim set with N-way proximity-matrix dependent claim; Bibliography seeded with Sonos, Chirp.io, and the foundational ultrasonic-presence patents; Drawing Convention block; reference-numerals appendix.
- v0.3 · 2026-03-10 · Added FIG. 5 – 7 (state machine, hand-off protocol, prior-art comparison matrix); Detailed Description ¶ [0001] – [0012]; Alternative Embodiments; two- and three-device worked examples; Field of Invention promoted to its own block; lineage breadcrumb at top.
- v0.4 · 2026-04-15 · Added FIG. 8 (continuous mutual-attenuation matrix for N-way proximity) and FIG. 9 (through-body absorption schematic); strengthened wall-attenuation analytical model and added the perceptual-threshold envelope to the timing budget; first IDS entries; Claims × Figures matrix; Strategic Positioning Diagram.
- v1.0 · 2026-05-17 · Honest-acknowledgment pass after deeper review of the Sonos Tech Blog disclosure. Demoted Sonos Sound Swap (Apr 2021) from "adjacent commercial" to anticipating prior art on the headline chirp-and-RSSI hand-off mechanic; narrowed the surviving claim to the genuine differentiators: TDMA-scheduled chirps (vs Sound Swap's CDMA), through-body absorption detection (FIG. 9), and the continuous mutual-attenuation matrix for N-way proximity (vs Sound Swap's binary same-room test). Marker bumped to v1.0; APA / BibTeX citation entries added.
EchoCast is the foundational anchor for its descendant family (ChirpLock as the security-primitive sibling, see /chirplock). The v1.0 narrow-claim posture has been carried into the portfolio-wide audit pass.
@misc{kim2026echocast,
  title        = {EchoCast: System and method for migrating an audio rendering
                  session using inaudible ultrasonic device-presence signals},
  author       = {Kim, Jae Hoon},
  year         = {2026},
  month        = may,
  howpublished = {Provisional patent application draft v1.0},
  url          = {https://jaehoon.kim/echocast},
}