Field Notes  ·  Engineering  ·  13 min read

Latency Budget in the Detect-Track-Engage Loop

From first radar return to kinetic round dispatch, every millisecond has to be justified. How we allocate the latency budget across sensor processing, AI inference, fire control, and mechanics.

By Eli Doran

Abstract timeline visualization showing millisecond-level processing stages

The detect-track-engage (DTE) loop latency is the single most constrained parameter in kinetic counter-drone system design. Everything else — sensor sensitivity, classification accuracy, interceptor kinematics — is conditional on having enough time in the loop to do it. If the total loop time exceeds the available reaction window for a given threat geometry, the system can't engage effectively regardless of how good each individual component is.

The available reaction window is not a system parameter you set; it's determined by the threat. A fixed-wing drone approaching at 25 m/s toward a 300m standoff perimeter, first detected at 600m, gives you about 12 seconds from the moment it enters detection range to the moment it crosses the protected boundary: 300m of closure at 25 m/s. A swarm of multi-rotor drones diverging from a pop-up launch at 800m gives you less. The latency budget isn't fixed; it's whatever the threat leaves you.
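The reaction-window arithmetic can be sketched in a few lines. This is a minimal illustration, assuming a constant closing speed on a straight inbound path and a 600m first-detection range (the lower end of the detection-range requirement quoted later in this article); the function name `reaction_window_s` is hypothetical.

```python
def reaction_window_s(detection_range_m: float,
                      standoff_m: float,
                      closing_speed_mps: float) -> float:
    """Seconds between first detection and the threat reaching the standoff perimeter.

    Assumes a constant closing speed along a straight inbound path.
    """
    if closing_speed_mps <= 0:
        raise ValueError("closing speed must be positive")
    # Distance the threat still has to cover once it is detected.
    closure_m = max(detection_range_m - standoff_m, 0.0)
    return closure_m / closing_speed_mps


# Fixed-wing example from the text: 25 m/s toward a 300 m perimeter,
# with detection assumed at 600 m.
window_s = reaction_window_s(600.0, 300.0, 25.0)
print(f"reaction window: {window_s:.1f} s")
```

Detecting later or facing a faster threat shrinks the window linearly, which is why the budget has to be argued against the threat geometry rather than against a fixed number.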

Breaking Down the DTE Loop

The DTE loop has five sequential stages, each with its own latency contribution. Understanding where time is spent is a prerequisite to knowing where to optimize.

Stage 1: Sensor Data Acquisition and Preprocessing (Target: <50ms)

Raw sensor data — radar I/Q samples, EO pixel frames, IR frames — must be acquired, digitized, and preprocessed before any target processing can occur. For a pulsed Doppler radar operating at 1ms pulse repetition interval (PRI) with a 64-pulse coherent integration window, one complete range-Doppler map takes 64ms before any CFAR detection can run. That integration time is not negotiable — it's the physics of coherent signal processing — but the PRF and integration window can be tuned to trade sensitivity for latency.

EO cameras operating at 30 fps add up to 33ms of frame acquisition time. High-frame-rate cameras at 120-240 fps reduce this to 4-8ms but increase the data rate and processing load proportionally. IR sensors in the MWIR band (3-5 μm) and LWIR band (8-14 μm) typically operate at 30-60 Hz due to detector readout constraints.

In our architecture, we run the radar at 500μs PRI with a 32-pulse integration window, giving a 16ms integration time per range-Doppler map at the cost of reduced sensitivity against very-low-RCS targets. We accept the sensitivity trade because the detection-range requirement for our engagement geometry is 600-800m, where a Group 1 UAS at 0.005 m² RCS still produces adequate SNR at 16ms integration.
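The integration-time trade above is simple enough to check numerically. The sketch below assumes ideal coherent integration, where SNR gain scales as 10·log10(N) for N pulses of equal energy; real gain is usually somewhat lower, and the function names are illustrative rather than part of any actual codebase.

```python
import math


def integration_time_ms(pri_us: float, n_pulses: int) -> float:
    """Time to collect one coherent processing interval, in milliseconds."""
    return pri_us * n_pulses / 1000.0


def coherent_gain_db(n_pulses: int) -> float:
    """Ideal coherent integration gain in dB (assumes equal-energy pulses)."""
    return 10.0 * math.log10(n_pulses)


baseline_ms = integration_time_ms(1000.0, 64)  # 1 ms PRI, 64 pulses
tuned_ms = integration_time_ms(500.0, 32)      # 500 us PRI, 32 pulses
# Sensitivity given up by halving the pulse count, under the ideal model.
gain_loss_db = coherent_gain_db(64) - coherent_gain_db(32)

print(f"baseline: {baseline_ms:.0f} ms, tuned: {tuned_ms:.0f} ms, "
      f"ideal gain given up: {gain_loss_db:.1f} dB")
```

Under this idealized model, moving from 64 to 32 pulses costs about 3 dB of integration gain in exchange for a map that is ready 48ms sooner.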

Stage 2: Target Detection and Track Initiation (Target: <30ms)

CFAR (Constant False Alarm Rate) detection processing runs on the range-Doppler map to identify candidate targets above the detection threshold while maintaining a bounded false alarm rate. For an X-band radar with a 1km × 1km coverage cell containing potential clutter from tree lines, structures, and birds, CFAR cell averaging needs to be tuned carefully — too narrow a reference window misses adjacent clutter gradients; too wide averages across clutter cells and raises the detection threshold.
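For readers unfamiliar with cell averaging, here is a minimal one-dimensional CA-CFAR sketch in plain Python. It is a textbook form, not the detector described in this article: it assumes square-law detected samples with exponentially distributed noise power, and the window sizes and false alarm probability are placeholder values.

```python
def ca_cfar(power, n_train=8, n_guard=2, pfa=1e-4):
    """Cell-averaging CFAR over a 1-D list of square-law power samples.

    n_train and n_guard are counted per side of the cell under test.
    Returns the indices of cells declared as detections.
    """
    n_total = 2 * n_train
    # Threshold multiplier for exponentially distributed noise power.
    alpha = n_total * (pfa ** (-1.0 / n_total) - 1.0)
    half = n_train + n_guard
    detections = []
    # Edge cells without a full reference window are skipped.
    for i in range(half, len(power) - half):
        leading = power[i - half:i - n_guard]
        lagging = power[i + n_guard + 1:i + half + 1]
        noise_estimate = (sum(leading) + sum(lagging)) / n_total
        if power[i] > alpha * noise_estimate:
            detections.append(i)
    return detections


# Flat noise floor with one strong return at cell 20.
profile = [1.0] * 40
profile[20] = 50.0
hits = ca_cfar(profile)
print(hits)
```

The `n_train` parameter is the reference window the paragraph above refers to: a small value reacts to local clutter but gives a noisy estimate, while a large value smooths the estimate at the risk of averaging in unrelated clutter.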

Track initiation, the process of converting a detection hit into a confirmed track with estimated position, velocity, and covariance, requires multiple detections before the track is declared valid. The standard approach is M-of-N initiation: declare a track valid if M detections occur in N consecutive radar scans. Tighter M-of-N thresholds (e.g., 3-of-4) reduce false tracks but add latency: confirmation takes at least M scan periods and, when detections are missed, up to N.

We use a 2-of-3 initiation criterion, trading some false track rate for 1-scan-period shorter initiation latency. The false tracks that slip through are handled by the EO/IR confirmation step, not by tightening the radar initiation criterion.
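The M-of-N rule can be expressed as a sliding window over per-scan hit flags. The sketch below is illustrative only; the class and function names are hypothetical, and a real tracker would also gate detections against the predicted track position before counting them as hits.

```python
from collections import deque


class MofNInitiator:
    """Confirms a tentative track once M of the last N scans produced a hit."""

    def __init__(self, m: int, n: int):
        if not 1 <= m <= n:
            raise ValueError("require 1 <= m <= n")
        self.m = m
        self.hits = deque(maxlen=n)  # keeps only the most recent N scans

    def update(self, detected: bool) -> bool:
        self.hits.append(bool(detected))
        return sum(self.hits) >= self.m


def scans_to_confirm(m, n, hit_sequence):
    """Return the 1-based scan number at which the track confirms, or None."""
    initiator = MofNInitiator(m, n)
    for scan, hit in enumerate(hit_sequence, start=1):
        if initiator.update(hit):
            return scan
    return None


print(scans_to_confirm(2, 3, [True, True]))               # best case for 2-of-3
print(scans_to_confirm(2, 3, [True, False, True]))        # one missed scan
print(scans_to_confirm(3, 4, [True, False, True, True]))  # 3-of-4 with one miss
```

With one missed detection, 2-of-3 confirms on the third scan where 3-of-4 needs the fourth, which is the one-scan saving described above.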

Stage 3: Sensor Fusion and Classification (Target: <80ms)

Once a radar track exists, the EO/IR sensor is cued to the track position. The cue handoff introduces a slew-and-settle delay: the EO/IR gimbal must slew to the track azimuth and elevation and achieve pointing stability before the image data is usable. For a fast-slew gimbal with 300°/s slew rate and a 0.1° pointing error requirement, settling a 45° slew takes approximately 200ms including overshoot damping — this is the single largest contributor to classification latency in most practical architectures.

We've addressed this by maintaining a predictive pointing mode: the gimbal continuously follows the radar track prediction ahead of actual slew requests, so that when the track initiation fires a cue command, the gimbal is already within 5-10° of the target bearing and the settle time drops to under 50ms.
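A rough check of the slew figures, assuming constant-rate motion with no acceleration limit (a simplification; real gimbals ramp up and down). Under that assumption the 45° slew is about 150ms of pure motion, which implies the remainder of the quoted 200ms is overshoot damping. The helper name is hypothetical.

```python
def slew_time_ms(angle_deg: float, rate_deg_per_s: float) -> float:
    """Constant-rate slew time in ms; ignores acceleration limits and settling."""
    return abs(angle_deg) * 1000.0 / rate_deg_per_s


full_slew_ms = slew_time_ms(45.0, 300.0)      # cold cue across 45 degrees
residual_max_ms = slew_time_ms(10.0, 300.0)   # predictive pointing, 10 degrees off
residual_min_ms = slew_time_ms(5.0, 300.0)    # predictive pointing, 5 degrees off

print(f"45 deg: {full_slew_ms:.0f} ms, "
      f"10 deg: {residual_max_ms:.0f} ms, 5 deg: {residual_min_ms:.0f} ms")
```

A 5-10° residual leaves roughly 17-33ms of motion, which is consistent with the sub-50ms settle time reported for predictive pointing once some damping allowance is added.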

Classification inference — running the target signature through an AI classifier to determine threat type and confidence — runs on a GPU-accelerated inference engine. For a convolutional neural network operating on a 64×64 pixel EO crop and a 16×16 IR crop, inference time on an NVIDIA Jetson AGX Orin is approximately 8-15ms per frame. The challenge is not single-frame inference latency but classification confidence accumulation: a single frame classification at 73% confidence doesn't trigger engagement; a 3-frame running-average at 82% does. That confidence accumulation adds approximately 3 frame periods (100ms at 30fps) to the classification stage.
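The confidence accumulation step can be sketched as a fixed-length running average. This is an illustration under stated assumptions, not the production logic: the 0.82 threshold and 3-frame window are taken from the figures above, the per-frame scores are made up for the example, and the class name is hypothetical.

```python
from collections import deque


class ConfidenceAccumulator:
    """Running average of per-frame classifier confidence over a fixed window."""

    def __init__(self, window: int = 3, threshold: float = 0.82):
        self.scores = deque(maxlen=window)
        self.threshold = threshold

    def update(self, score: float) -> bool:
        """Add one frame's confidence; True once the full-window average meets the threshold."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough frames accumulated yet
        return sum(self.scores) / len(self.scores) >= self.threshold


def accumulation_latency_ms(window: int, fps: float) -> float:
    """Frame periods needed to fill the window, in milliseconds."""
    return window * 1000.0 / fps


acc = ConfidenceAccumulator()
threshold_met = [acc.update(s) for s in (0.73, 0.86, 0.90)]  # illustrative scores
print(threshold_met, accumulation_latency_ms(3, 30.0))
```

The latency helper makes the lever explicit: the accumulation cost falls if the window shrinks or the frame rate rises, independent of single-frame inference time.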

Stage 4: Engagement Authorization (Target: <200ms)

Engagement authorization is the step where automated threat assessment produces an engagement recommendation and presents it for human authorization or executes under pre-authorized rules of engagement. This is not a purely technical step — it involves the human decision cycle, and human decision latency is the least predictable variable in the system.

In ARES-1's current architecture, three engagement modes exist: full-manual (operator authorizes each engagement, typical decision latency 1-3 seconds), supervised-autonomous (operator can veto within a 500ms window, otherwise system auto-engages when classification confidence exceeds threshold), and automated (system engages without operator action under pre-authorized ROE parameters). Automated mode requires pre-authorization by command authority and is intended only for scenarios where the reaction window is too short for human-in-the-loop decision.

The 200ms target for this stage assumes supervised-autonomous mode where no veto is issued. The 500ms veto window is a deliberate design choice: 500ms is sufficient for an attentive operator to issue a hold based on visual confirmation, but is short enough that it doesn't add unacceptable latency to the engagement timeline for a fast-approaching threat.

Stage 5: Fire Control and Mechanical Execution (Target: <80ms)

Fire control computations — lead angle calculation using proportional navigation, muzzle velocity correction for air density and crosswind, and intercept trajectory prediction — are computationally light relative to the classification step. On the dedicated fire control compute node, these calculations complete in under 2ms. The mechanically limiting factor is launcher traverse rate and elevation adjustment to align with the computed intercept bearing.

A direct-drive brushless launcher with 200°/s traverse rate and 90°/s elevation rate can complete a 30° bearing correction and 15° elevation correction in approximately 150-170ms: the traverse takes 150ms and the elevation move, the slower axis in this case, takes about 167ms. Against a fast-crossing target, the computed intercept point shifts during this traverse, requiring a final fire-control update just before discharge. The fire-control loop runs at 100Hz, so the final update is at worst 10ms stale.
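The timing in this paragraph reduces to two small calculations, sketched here under a constant-rate assumption with both axes moving concurrently (acceleration and settling are ignored). The function names are hypothetical. With these inputs the azimuth move takes 150ms and the elevation move about 167ms, so the slower axis sets the alignment time.

```python
def axis_time_ms(angle_deg: float, rate_deg_per_s: float) -> float:
    """Constant-rate move time for one axis, in milliseconds."""
    return abs(angle_deg) * 1000.0 / rate_deg_per_s


def alignment_time_ms(az_deg, el_deg, az_rate, el_rate) -> float:
    """Both axes move at once, so the slower axis determines the total."""
    return max(axis_time_ms(az_deg, az_rate), axis_time_ms(el_deg, el_rate))


def max_staleness_ms(loop_rate_hz: float) -> float:
    """Worst-case age of the most recent update from a fixed-rate loop."""
    return 1000.0 / loop_rate_hz


align_ms = alignment_time_ms(30.0, 15.0, 200.0, 90.0)
stale_ms = max_staleness_ms(100.0)
print(f"alignment: {align_ms:.0f} ms, worst-case staleness: {stale_ms:.0f} ms")
```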

Projectile discharge to departure completes in under 5ms for a gun-type kinetic system.

Total Loop Budget and Where It Goes

| Stage | Target (ms) | Worst case (ms) |
|---|---|---|
| Sensor acquisition & preprocessing | 50 | 80 |
| Detection & track initiation | 30 | 60 |
| Sensor fusion & classification | 80 | 200 |
| Engagement authorization | 200 | 500 |
| Fire control & mechanical | 80 | 200 |
| Total | 440 | 1040 |

440ms in the target case, 1,040ms worst case. Against a 12-second reaction window, either is viable. Against a 3-second reaction window — a fast threat at close range — worst-case latency creates a real risk of missing the engagement window.
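The totals and the remaining margin can be reproduced directly from the stage figures. The sketch below uses only the numbers from the budget above; the `margin_ms` helper is a hypothetical name, and it does not model anything that happens after dispatch, which the remaining margin still has to cover.

```python
# (target_ms, worst_case_ms) per stage, as listed in the budget above.
BUDGET_MS = {
    "Sensor acquisition & preprocessing": (50, 80),
    "Detection & track initiation": (30, 60),
    "Sensor fusion & classification": (80, 200),
    "Engagement authorization": (200, 500),
    "Fire control & mechanical": (80, 200),
}


def totals(budget):
    """Sum the target and worst-case columns of a sequential budget."""
    target = sum(t for t, _ in budget.values())
    worst = sum(w for _, w in budget.values())
    return target, worst


def margin_ms(reaction_window_s: float, loop_ms: float) -> float:
    """Time left in the reaction window after the loop completes."""
    return reaction_window_s * 1000.0 - loop_ms


target_ms, worst_ms = totals(BUDGET_MS)
for window_s in (12.0, 3.0):
    print(f"{window_s:>4.0f} s window: "
          f"{margin_ms(window_s, target_ms):.0f} ms left (target), "
          f"{margin_ms(window_s, worst_ms):.0f} ms left (worst case)")
```

In the 3-second case the worst-case loop consumes roughly a third of the window, which is the risk the paragraph above describes.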

Where We're Spending Engineering Effort

The classification stage is our current primary latency reduction focus. The gimbal slew-and-settle problem is mechanical; we've largely mitigated it in software through predictive pointing, but there's residual settle time that we're working to reduce through gimbal control algorithm refinement. The confidence accumulation latency is a machine learning architecture question: can we achieve 82%+ confidence in fewer frames through better feature extraction, without increasing false-positive rate?

We're also investing in look-ahead detection: using radar track prediction to estimate where a target will be in 2-3 seconds and pre-cueing the EO/IR sensor and fire control to that predicted position before the classification stage completes. This overlaps stages that the budget above treats as strictly sequential, effectively buying back 100-150ms of available reaction time by running them in parallel rather than in series.
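One way to reason about overlapped stages is a small critical-path calculation, where each stage starts as soon as everything it depends on has finished. The sketch below uses the target figures from the budget for the sequential case. The overlapped case is purely illustrative: the split of the final stage into a 70ms pre-positioning step and a 10ms final step is a hypothetical assumption, not a figure from this article, so the saving it prints should not be read as the 100-150ms quoted above.

```python
def finish_times(stages):
    """Finish time of each stage, given {name: (duration_ms, [dependencies])}.

    Stages must be listed so that dependencies appear before dependents.
    """
    done = {}
    for name, (duration, deps) in stages.items():
        start = max((done[d] for d in deps), default=0)
        done[name] = start + duration
    return done


sequential = {
    "acquire": (50, []),
    "detect": (30, ["acquire"]),
    "classify": (80, ["detect"]),
    "authorize": (200, ["classify"]),
    "fire": (80, ["authorize"]),
}

# Hypothetical split: pre-positioning starts as soon as a track exists and
# runs alongside classification and authorization.
overlapped = {
    "acquire": (50, []),
    "detect": (30, ["acquire"]),
    "classify": (80, ["detect"]),
    "authorize": (200, ["classify"]),
    "pre_position": (70, ["detect"]),
    "final_step": (10, ["authorize", "pre_position"]),
}

seq_total = finish_times(sequential)["fire"]
par_total = finish_times(overlapped)["final_step"]
print(seq_total, par_total, seq_total - par_total)
```

The useful property of this framing is that only stages on the longest dependency chain contribute to the total, so moving work off that chain shortens the loop even when no individual stage gets faster.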

The engagement authorization stage is the one place where latency reduction has a hard floor set by operator decision requirements, not by engineering. We're not trying to remove the human from the loop for standard threat scenarios — we're trying to ensure that when the human needs to make a decision, the system has given them the best possible information in the shortest possible time. That's a display and UX design problem as much as a latency problem, and it's a part of ARES-1 development we don't spend enough time talking about publicly.
