Field Notes · AI / Sensing · 11 min read

Why We Fuse EO/IR and Radar: The Confidence Problem in Threat Classification

Single-sensor classification of low-observable drones carries false-positive risk that no military operator can accept. How multi-modal fusion changes the confidence calculus.

By Eli Doran

Multi-spectrum sensor visualization showing overlaid radar and infrared detection layers

The false-positive problem in counter-drone classification is not academic. A kinetic engagement initiated on a misidentified target (a large bird, a hobbyist racing drone during a nearby event, a medical supply drone operating legally) has consequences that range from embarrassing to catastrophic depending on the context. For a military operator accountable under rules of engagement (ROE), a false kinetic engagement is a potential ROE violation. For a private facility operator, it's civil and potentially criminal liability.

Single-sensor classification cannot solve this problem adequately at the threat sizes and ranges that matter. Radar alone can detect small targets, but its classification information is essentially limited to RCS magnitude and Doppler velocity — insufficient to distinguish a DJI Phantom at 0.007 m² from a large crow at 0.006 m². EO cameras alone classify by shape and behavior, but fail in low-light, fog, smoke, and precipitation. IR sensors see thermal signatures at night but lose discrimination in midday heat when atmospheric and ground thermal gradients compete with the target signature.

Multi-modal fusion doesn't just add up the individual sensor capabilities. When properly implemented, it produces classification confidence that none of the sensors could achieve alone — because the sensors fail in non-overlapping conditions, and their outputs carry complementary information about different physical properties of the target.

What Each Sensor Contributes

X-Band Radar

An X-band radar operating at 9-10 GHz provides range, bearing, elevation, and radial velocity for detected targets. The frequency choice matters: X-band offers better sensitivity to small RCS targets than S-band (2-4 GHz) due to the shorter wavelength's interaction with small physical features, while L-band would require a much larger aperture to achieve comparable angular resolution.
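The aperture trade-off can be made concrete with the common rule of thumb that 3 dB beamwidth in degrees is roughly 70 times wavelength over aperture width. The sketch below is illustrative only: the factor of 70, the 2° beamwidth, and the specific band center frequencies are assumptions, and the real factor depends on the antenna's illumination taper.

```python
C = 299_792_458.0  # speed of light, m/s


def aperture_for_beamwidth(freq_hz: float, beamwidth_deg: float,
                           k_deg: float = 70.0) -> float:
    """Approximate aperture width (m) needed for a given 3 dB beamwidth.

    Uses the rule of thumb beamwidth_deg ~= k_deg * wavelength / aperture.
    k_deg = 70 is an assumed textbook value, not a measured one.
    """
    wavelength_m = C / freq_hz
    return k_deg * wavelength_m / beamwidth_deg


# Hypothetical band center frequencies, all sized for the same 2 degree beam.
for name, freq_hz in [("L-band, 1.3 GHz", 1.3e9),
                      ("S-band, 3.0 GHz", 3.0e9),
                      ("X-band, 9.5 GHz", 9.5e9)]:
    width_m = aperture_for_beamwidth(freq_hz, beamwidth_deg=2.0)
    print(f"{name}: about {width_m:.2f} m of aperture")
```

Because the required aperture scales inversely with frequency, the L-band antenna in this sketch comes out several times wider than the X-band one for the same beam.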

Beyond the basic track state vector, micro-Doppler analysis of X-band returns provides additional discriminating information. A multi-rotor drone produces a characteristic micro-Doppler signature from its rotating propellers, typically in the 100-1000 Hz range depending on RPM. That signature is distinct from both fixed-wing aircraft (which produce a much cleaner Doppler return) and birds (whose wing-beat micro-Doppler is periodic but spectrally narrower and amplitude-modulated differently than propeller rotation). Micro-Doppler analysis requires a coherent radar with a dwell long enough, and a pulse repetition frequency high enough, to resolve the Doppler spread of the propellers, which is a design constraint on the waveform.
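To get a feel for the numbers behind that waveform constraint, the sketch below computes two quantities for a single rotor: the blade flash rate (how often a blade passes a given orientation) and the peak Doppler shift from the blade tip. Reading the 100-1000 Hz figure as a blade flash rate is an assumption here, and the rotor radius, blade count, and RPM are hypothetical values rather than measurements from any specific airframe.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def blade_flash_hz(rpm: float, n_blades: int) -> float:
    """Rate at which blades pass a fixed orientation, in Hz."""
    return n_blades * rpm / 60.0


def max_tip_doppler_hz(rpm: float, blade_radius_m: float,
                       radar_freq_hz: float) -> float:
    """Peak Doppler shift from a blade tip moving directly along the line of sight."""
    tip_speed_mps = 2.0 * math.pi * blade_radius_m * rpm / 60.0
    wavelength_m = C / radar_freq_hz
    return 2.0 * tip_speed_mps / wavelength_m


# Hypothetical rotor: 5-inch two-blade propeller at 6000 RPM, 9.5 GHz radar.
flash = blade_flash_hz(rpm=6000, n_blades=2)
tip = max_tip_doppler_hz(rpm=6000, blade_radius_m=0.0635, radar_freq_hz=9.5e9)
print(f"blade flash rate: {flash:.0f} Hz")
print(f"peak tip Doppler: {tip:.0f} Hz")
```

The tip Doppler sets a floor on the pulse repetition frequency needed to avoid aliasing the propeller spread, while the dwell length sets how finely the flash harmonics can be resolved.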

Radar has two critical weaknesses for classification. First, at the RCS levels of Group 1 UAS (0.001–0.01 m²), the signal quality is often insufficient for high-resolution range profiling or inverse synthetic aperture imaging — the tools that work well for larger aircraft don't apply at this target class. Second, radar classification is inherently ambiguous about target type: a 0.008 m² return could be a threat drone or a bird, and radar alone can't reliably resolve that ambiguity without additional sensor modalities.

Electro-Optical (EO) Camera

An EO camera in the visible spectrum provides the most intuitive classification feature: shape. A multi-rotor drone photographed at 400m in adequate lighting has a recognizable geometric outline — symmetric rotor arms, central body mass — that distinguishes it from the irregular silhouette of a bird in flight. Modern convolutional neural networks trained on annotated drone imagery can achieve high classification accuracy on clear-day EO imagery at these ranges when the target subtends at least 10-15 pixels on the detector.

The "at least 10-15 pixels" qualifier is the limiting condition. At 600m against a 25cm wingspan Group 1 UAS, a focal length of 500mm on a 1/2.3" sensor yields approximately 8-10 pixels of target width, which is marginal for reliable CNN classification. Achieving reliable EO classification at 600-800m requires either a longer focal length (reducing field of view and increasing pointing precision requirements) or a larger sensor (increasing cost and form factor).
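The pixels-on-target budget follows from simple pinhole geometry: the image of the target on the focal plane is focal length times target size over range, and dividing by pixel pitch gives pixels. The sketch below is a minimal version of that calculation. The effective pixel pitch is not stated in the text, so the 3.0 µm value used here is a placeholder, as are the function names.

```python
def pixels_on_target(target_size_m: float, range_m: float,
                     focal_length_mm: float, pixel_pitch_um: float) -> float:
    """Target width in pixels under a thin-lens (pinhole) approximation."""
    image_size_um = (focal_length_mm * 1e3) * target_size_m / range_m
    return image_size_um / pixel_pitch_um


def focal_length_for_pixels(n_pixels: float, target_size_m: float,
                            range_m: float, pixel_pitch_um: float) -> float:
    """Focal length (mm) needed to put n_pixels across the target."""
    return n_pixels * pixel_pitch_um * range_m / target_size_m / 1e3


# Placeholder values: 25 cm target, 800 m range, 3.0 um effective pixel pitch.
needed_mm = focal_length_for_pixels(n_pixels=15, target_size_m=0.25,
                                    range_m=800.0, pixel_pitch_um=3.0)
print(f"focal length for 15 px at 800 m: {needed_mm:.0f} mm")
```

The result is very sensitive to the effective pixel pitch, including any binning or downsampling applied before the classifier, so the useful exercise is to plug in a specific sensor's values rather than rely on a single figure.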

EO also fails in low-light conditions below about 0.1 lux — which includes overcast nighttime — and degrades in fog, smoke, or heavy precipitation. These are precisely the conditions under which threat drones may be preferentially operated, since they degrade the defender's visual identification capability.

Infrared (IR) Camera

A MWIR (3-5 μm) or LWIR (8-14 μm) camera detects thermal emission from the target. Drone motors and ESCs generate significant heat during flight — a multi-rotor with 2205-size motors in flight will have motor casing temperatures in the 60-90°C range, providing a strong thermal signal against a cooler sky background. LWIR is the more common choice for counter-drone applications because LWIR detector arrays are less expensive than MWIR and provide adequate detection range for Group 1-2 UAS.
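One way to see why LWIR suits this temperature range is Wien's displacement law, which gives the wavelength of peak blackbody emission for a given temperature. The sketch below treats the motor casing as an ideal blackbody, which is an assumption: real casings have emissivity below 1, and detected contrast also depends on the atmosphere and the background.

```python
WIEN_B_UM_K = 2897.8  # Wien displacement constant, micrometer-kelvin


def peak_emission_um(temp_c: float) -> float:
    """Wavelength (micrometers) of peak blackbody emission at temp_c."""
    return WIEN_B_UM_K / (temp_c + 273.15)


# Motor casing temperature range quoted in the text.
for temp_c in (60.0, 90.0):
    print(f"{temp_c:.0f} C casing peaks near {peak_emission_um(temp_c):.1f} um")
```

Across 60-90°C the emission peak sits at or near the short-wavelength edge of the 8-14 μm band, so an LWIR detector is looking close to where such a source radiates most strongly.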

The key advantage of IR over EO is day/night capability with better tolerance of poor visibility. IR performance degrades in heavy rain (water absorption in the MWIR band) and in warm, humid conditions where the target-background contrast narrows, but it does not fail as completely in darkness as EO does. A threat drone operating at 0300 in overcast conditions is essentially invisible to an EO camera but thermally visible to a well-calibrated LWIR sensor.

IR classification is less shape-definitive than EO: the thermal image of a small drone at 400m is a bright spot with modest structural detail, not the sharp geometric silhouette available in EO daylight. But thermal behavior — pulse patterns, thermal distribution across the body, temperature delta from background — provides classification information that complements EO shape analysis.

The Fusion Architecture

Sensor fusion for threat classification is not a committee vote. It's a probabilistic estimation problem where the goal is to compute a posterior probability distribution over threat classes given all available sensor data. The standard mathematical framework is Bayesian fusion: an extended Kalman filter (EKF) or, for more strongly nonlinear observation models, an unscented Kalman filter (UKF) handles state estimation, and a separate classifier handles the discrete threat class variable.

In our implementation, the fusion architecture has two layers. The first layer is state estimation: the EKF maintains a track state vector (position, velocity, acceleration in 3D) that fuses radar range-bearing-elevation measurements with EO and IR image-plane position measurements to produce a better track than radar alone. The radar provides range information that EO and IR cameras (passive imagers) cannot contribute; the EO/IR sensors provide angular precision at short range that exceeds the radar's angular resolution.
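The reason a precise angular measurement improves the track can be shown with a scalar Kalman measurement update, which is a deliberately reduced, one-axis stand-in for what a full EKF does across the whole state vector. This is not the implementation described above; the bearing values and standard deviations are hypothetical.

```python
def fuse_scalar(x_prior: float, var_prior: float,
                z: float, var_meas: float) -> tuple[float, float]:
    """One scalar Kalman measurement update.

    Returns the updated estimate and its variance. The gain weights the
    new measurement by how uncertain the prior is relative to it.
    """
    gain = var_prior / (var_prior + var_meas)
    x_post = x_prior + gain * (z - x_prior)
    var_post = (1.0 - gain) * var_prior
    return x_post, var_post


# Hypothetical bearings in degrees: radar-derived prior with sigma = 0.5 deg,
# then an EO image-plane measurement with sigma = 0.05 deg.
bearing, variance = fuse_scalar(x_prior=10.00, var_prior=0.5 ** 2,
                                z=10.20, var_meas=0.05 ** 2)
print(f"fused bearing: {bearing:.3f} deg, sigma: {variance ** 0.5:.3f} deg")
```

The fused variance is always smaller than either input variance, and the estimate moves most of the way toward the more precise sensor, which matches the role the EO/IR angular measurements play at short range.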

The second layer is classification fusion: the radar micro-Doppler classifier, EO CNN classifier, and IR thermal pattern classifier each produce per-class probability vectors. These are combined using a Dempster-Shafer belief fusion approach rather than a simple Bayesian product, because the classifiers are not statistically independent — a challenging target in fog will produce correlated low-confidence outputs from both EO and IR sensors, and treating them as independent would artificially inflate the combined confidence. Dempster-Shafer allows explicit representation of uncertainty (the "don't know" mass function) that gets assigned when sensors disagree or produce low-confidence outputs.
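A minimal sketch of Dempster's rule of combination shows how the "don't know" mass behaves. The two-class frame, the reliability factors, and the discounting step (which moves part of each classifier's probability onto the full set of classes) are all illustrative assumptions, not a description of the production classifiers.

```python
from itertools import product

FRAME = frozenset({"drone", "bird"})  # hypothetical two-class frame


def discount(probs: dict, reliability: float) -> dict:
    """Convert class probabilities to masses, moving (1 - reliability)
    of the total onto the whole frame as explicit 'unknown' mass."""
    masses = {frozenset([c]): reliability * p for c, p in probs.items()}
    masses[FRAME] = masses.get(FRAME, 0.0) + (1.0 - reliability)
    return masses


def combine(m1: dict, m2: dict) -> dict:
    """Dempster's rule: multiply masses, keep non-empty intersections,
    and renormalize by the non-conflicting mass."""
    fused, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            fused[inter] = fused.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in fused.items()}


# Hypothetical classifier outputs and reliabilities.
radar = discount({"drone": 0.6, "bird": 0.4}, reliability=0.7)
eo = discount({"drone": 0.9, "bird": 0.1}, reliability=0.9)
fused = combine(radar, eo)
for subset, mass in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(sorted(subset), round(mass, 4))
```

Lowering a sensor's reliability in degraded conditions shifts more of its mass to the full frame, so a low-confidence sensor pulls the fused result toward "unknown" instead of casting a confident vote.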

The output is a combined threat class distribution with explicit confidence bounds. Our engagement authorization requires the fused posterior probability for the "threat drone" class to exceed 0.82, with no individual sensor's contrary classification having posterior probability above 0.30. That second condition blocks engagement in the case where two sensors agree and one strongly disagrees. In our validation testing, that pattern was a more reliable signal of a misclassification scenario than the primary confidence threshold alone.
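The two-part rule can be written as a short gate function. The sketch below uses the 0.82 and 0.30 thresholds from the text, but the data layout, the class label names, and the reading of "contrary classification" as any non-threat class are assumptions made for illustration.

```python
THREAT_THRESHOLD = 0.82    # fused "threat drone" probability must exceed this
CONTRARY_THRESHOLD = 0.30  # no sensor may back a non-threat class above this
THREAT_CLASS = "threat_drone"  # hypothetical label name


def engagement_authorized(fused_threat_prob: float, per_sensor: dict) -> bool:
    """Apply both conditions: fused confidence and per-sensor dissent.

    per_sensor maps a sensor name to that sensor's class probabilities.
    """
    if fused_threat_prob <= THREAT_THRESHOLD:
        return False
    for probs in per_sensor.values():
        contrary = max((p for c, p in probs.items() if c != THREAT_CLASS),
                       default=0.0)
        if contrary > CONTRARY_THRESHOLD:
            return False
    return True


# Hypothetical case: radar and EO agree, IR strongly favors "bird".
sensors = {
    "radar": {"threat_drone": 0.85, "bird": 0.15},
    "eo": {"threat_drone": 0.95, "bird": 0.05},
    "ir": {"threat_drone": 0.40, "bird": 0.60},
}
print(engagement_authorized(0.88, sensors))
```

In this example the fused probability clears 0.82, but the IR classifier's 0.60 for "bird" trips the dissent check, so the gate returns False.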

The Calibration Problem

Sensor fusion sounds cleaner on paper than it is in the field. The practical challenge is that fusion depends on accurate cross-sensor calibration: the radar's boresight needs to be co-registered with the EO/IR camera's boresight to within the system's angular resolution. At 500m range with a radar of 0.1° angular resolution, a 0.2° boresight misalignment causes a cross-range position error of about 1.7m, enough that the EO/IR camera pointed at the radar cue may not have the target in its field of view.
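The position error follows directly from the geometry: cross-range error is range times the tangent of the misalignment angle. The sketch below reproduces the 500m, 0.2° case and adds a simple field-of-view check. The 0.3° camera field of view is a hypothetical value, and the check ignores the radar's own measurement error.

```python
import math


def cross_range_error_m(range_m: float, misalignment_deg: float) -> float:
    """Lateral position error caused by an angular boresight offset."""
    return range_m * math.tan(math.radians(misalignment_deg))


def cue_lands_in_fov(misalignment_deg: float, camera_fov_deg: float) -> bool:
    """True if a target at the radar cue still falls inside the camera's
    field of view, assuming the camera is centered on the cue."""
    return abs(misalignment_deg) < camera_fov_deg / 2.0


error_m = cross_range_error_m(range_m=500.0, misalignment_deg=0.2)
print(f"cross-range error: {error_m:.2f} m")
print("target in view:", cue_lands_in_fov(0.2, camera_fov_deg=0.3))
```

With a narrow long-focal-length camera, a misalignment larger than half the field of view puts the target outside the frame entirely, which is why the calibration tolerance tightens as the optics get longer.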

Maintaining this calibration across temperature excursions (a system mounted outdoors goes through diurnal thermal cycles of 40-60°C in Alabama summer conditions), vibration, and any mechanical disturbance to the sensor mount requires either very rigid mechanical design, auto-calibration routines using known cooperative target signatures, or both. We do both: the sensor mount is designed to MIL-STD-810H shock and vibration requirements, and we run an auto-calibration cycle on startup using a coded retroreflector target at a fixed position.

We're not saying the fusion architecture we've described is the only viable approach — other EKF formulations and classifier fusion methods can achieve similar results. The specific numbers we cite (0.82 confidence threshold, 0.30 contrary threshold, the calibration procedure) are parameters we've tuned through testing, and they're subject to revision as we accumulate more operational data. The underlying principle — that multi-modal fusion is required for acceptable false-positive performance at military-grade confidence levels — is, we believe, not negotiable for a kinetic system.

Where Single-Sensor Systems Fail

We've spent time reviewing field reports from deployments of single-sensor counter-drone detection systems, primarily radar-only systems at early-generation fixed-site security installations. The consistent failure mode is not detection failure; it's false positives that degrade operator trust until the system is effectively ignored. A radar-only system at a coastal facility can generate hundreds of bird alerts per hour during migration season. Operators who spend three weeks responding to alerts and finding only birds stop responding. When the actual threat drone arrives, alarm fatigue means it gets ignored too.

Multi-modal fusion with EO and IR confirmation cuts that false alarm rate by an order of magnitude in our testing environments. The specific reduction factor depends heavily on the local environment — a coastal site with high bird traffic needs all three modalities; an arid inland site with low clutter can get further with radar-plus-one. The architecture should be matched to the deployment environment, which means site survey and false alarm rate modeling before system sizing, not after.

The confidence problem in threat classification is fundamentally about getting the false alarm rate low enough that operators stay vigilant and keep trusting the system, while getting the true positive rate high enough that actual threats don't slip through. Multi-modal sensor fusion is the engineering tool that makes that balance achievable at the range and target sizes the counter-drone problem demands.
