Camera Image Ingestion — Overview

This document is the single entry point for how images get from a physical sensor into the central ROS 2 host in the PhotogrammetricWAAM system.

There are exactly three hardware pipelines and two roles each camera can serve. Everything else in this folder (and in PhotogrammetricWAAM-Edge/ and ros2_ws/launch/) is one concrete instance of that 3 × 2 matrix.


| Hardware | Edge host | Wire | Edge software | Role (1) HQ stills | Role (2) low-latency MJPG |
|---|---|---|---|---|---|
| IMX708 (Pi Cam 3) | Raspberry Pi (CSI) | WiFi/ETH | simple_picamera2_streamer/app.py: picamera2 + cv2.imencode → HTTP /stream /jpg /set | ✅ (planned) | ✅ (current default, 8 Hz) |
| OV2640 / OV5640 | XIAO ESP32-S3 Sense (DVP) | WiFi/ETH | CameraWebServer_for_esp-arduino_3.0.x.ino: esp_camera → MJPG on :81/stream | ✅ (planned) | ✅ (current default) |
| DSLR (Canon / Nikon / Sony) | Raspberry Pi (USB) | WiFi/ETH | mqtt__gphoto2_delegate.py: gphoto2 capture-and-download | ✅ (only role) | ✗ (not supported) |

The ROS 2 client side (the kernel host) is uniform across all three pipelines: it runs image_publisher_node either against an MJPG stream URL (role 2, IMX708 and ESP32-S3) or against the on-disk JPG/RAW that the gphoto2 delegate writes (role 1, DSLR and on-demand IMX708/ESP32-S3). See ros2_ws/launch/image_publisher_client/README.md.


Every photogrammetry-grade camera in this system serves one of two roles at any given moment. The IMX708 and the ESP32-S3-attached OV2640/OV5640 are capable of either role; the DSLR is permanently locked to role (1).

Role (1) — Highest-fidelity still producer

Goal: Best possible JPG (or RAW) of one moment in time, on demand or at a slow cadence. Latency does not matter.

  • Maximum sensor resolution (e.g. IMX708 4608×2592, OV5640 2592×1944, DSLR ≥24 MP).
  • Highest JPG quality (low quantisation) — or RAW where available.
  • Capture is triggered, not free-running. One trigger → one (or one stack of) frames written to durable storage with a session ID.
  • Consumed by the photogrammetry / SfM pipeline downstream — not by RViz/Foxglove.
  • Control plane: MQTT (recipient-based topics — see mqtt__gphoto2_delegate.spec.md for the canonical request/response shape).

Role (2): low-latency MJPG stream

Goal: Real-time monitoring on the ROS 2 graph (visible in rqt_image_view, Foxglove, RViz). Per-frame fidelity is sacrificed for timeliness.

  • Downscaled resolution and/or stronger JPG compression (e.g. IMX708 at 2304×1296 @ 8 Hz, ESP32-S3 OV5640 typically HD/SVGA).
  • Continuous MJPG over plain HTTP (multipart/x-mixed-replace).
  • Encoded once at the edge, decoded once on the ROS 2 client by image_publisher_node, republished as sensor_msgs/Image on a per-camera namespace (/cam0/image_raw, /xiao_143/image_raw, …).
  • Control plane: HTTP (/set on RPi today; open work for ESP32-S3 — see TODOs below).
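As an illustration of what a continuous multipart/x-mixed-replace MJPG stream looks like to a client, here is a minimal sketch that pulls complete JPG frames out of an arbitrary sequence of byte chunks by scanning for the JPEG start-of-image and end-of-image markers. The function name `iter_jpeg_frames` is hypothetical, and this is not how image_publisher_node is implemented; it only shows the framing idea. Marker scanning is naive: it would mis-split a JPG that embeds a thumbnail, so a production parser would more likely rely on the part headers.

```python
from typing import Iterable, Iterator

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def iter_jpeg_frames(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each complete JPG found in a stream of byte chunks.

    Multipart boundaries and part headers between frames are skipped,
    because they never contain the SOI marker.
    """
    buf = b""
    for chunk in chunks:
        buf += chunk
        while True:
            start = buf.find(SOI)
            if start < 0:
                # No frame start yet: keep one byte in case the SOI
                # marker is split across two chunks.
                buf = buf[-1:]
                break
            end = buf.find(EOI, start + 2)
            if end < 0:
                # Frame started but not finished: drop the headers
                # before it and wait for more data.
                buf = buf[start:]
                break
            yield buf[start:end + 2]
            buf = buf[end + 2:]
```

In practice the chunks would come from an HTTP response body read in fixed-size pieces; the generator itself does no I/O, so it can be exercised with synthetic bytes.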
               role-(1)                     role-(2)
           "snapshot mode"              "viewfinder mode"
      ┌────────────────────────┐   ┌─────────────────────────┐
any   │ highest quality JPG    │   │ smallest-possible JPG   │
cam   │ on demand, slow rate   │   │ fast as possible,       │
      │ → durable storage      │   │ free-running            │
      │ → SfM / photogrammetry │   │ → ROS 2 image topic     │
      │ → RViz/Foxglove via    │   │ → live monitoring       │
      │   image_publisher of   │   │   (rqt_image_view)      │
      │   a *file* path        │   │                         │
      └────────────────────────┘   └─────────────────────────┘
                  ▲                             ▲
                  │ MQTT request/response       │ HTTP GET /stream
                  │ (gphoto2-style topics)      │ (multipart MJPG)

1. IMX708 (Pi Cam 3) on Raspberry Pi (CSI)

[ IMX708 sensor ]──CSI──▶[ Raspberry Pi ]──HTTP MJPG──▶[ ROS 2 host ]
                         libcamera/picamera2           image_publisher_node
                         + cv2.imencode JPG            → /camN/image_raw
                         app.py @ :8000

Edge: ros2_ws/edge/simple_picamera2_streamer/app.py. A single Python process that owns the camera, runs a capture thread at the configured FrameDurationLimits, and serves three endpoints:

| Endpoint | Method | Purpose |
|---|---|---|
| /stream | GET | multipart/x-mixed-replace MJPG, frame-rate-locked to the capture loop (currently 8 Hz) |
| /jpg | GET | One latest JPG frame (single-shot) |
| /set | GET | Set ExposureTime, AnalogueGain, or LensPosition (puts AF into manual when LensPosition is given) |
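To make the three endpoints concrete, the following sketch builds the URLs a client would request. The helper `streamer_url` and the host name `pi-cam.local` are hypothetical; the port, the endpoint paths and the three control names come from the table above. The exact value formats that /set accepts are an assumption here.

```python
from urllib.parse import urlencode

# Control names listed for /set in the endpoint table.
ALLOWED_CONTROLS = {"ExposureTime", "AnalogueGain", "LensPosition"}
ENDPOINTS = {"stream", "jpg", "set"}


def streamer_url(host: str, endpoint: str, port: int = 8000, **controls) -> str:
    """Build a URL for the IMX708 streamer (hypothetical helper)."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    unknown = set(controls) - ALLOWED_CONTROLS
    if unknown:
        raise ValueError(f"unsupported controls: {sorted(unknown)}")
    if controls and endpoint != "set":
        raise ValueError("controls are only accepted by /set")
    url = f"http://{host}:{port}/{endpoint}"
    if controls:
        # Controls travel as ordinary query parameters on a GET request.
        url += "?" + urlencode(controls)
    return url
```

For example, `streamer_url("pi-cam.local", "set", ExposureTime=10000, LensPosition=2.5)` yields a URL that can be fetched with any HTTP client, while the `/stream` URL is what image_publisher_node would be pointed at.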

Role today: running role (2) only — see TODO todo-imx708-fb-roles for the mode-switch work.

Client: see Both tmuxp variants below.

2. OV2640 / OV5640 on XIAO ESP32-S3 Sense (DVP)

[ OV2640 / OV5640 ]──DVP──▶[ XIAO ESP32-S3 ]──HTTP MJPG──▶[ ROS 2 host ]
                           esp_camera + httpd             image_publisher_node
                           CameraWebServer_for_*.ino      → /xiao_NNN/image_raw
                           stream @ :81/stream
                           OTA @ :8080/update
                           telemetry → MQTT broker

Edge: PhotogrammetricWAAM-Edge/photogrammetricWAAM_xiao_eyes_ov2640_ov5640/CameraWebServer_for_esp-arduino_3.0.x/.

Custom Arduino-ESP32 (3.0.x) firmware derived from Espressif’s CameraWebServer. Each board is statically configured by a single #define DEVICE_ID 1xx which also drives:

  • Static IP 172.31.1.<DEVICE_ID>
  • MQTT topics esp32s3/<DEVICE_ID>/{log,temp,rssi}
  • MQTT client id esp32s3-<DEVICE_ID>

The MJPG stream is served on port 81 (the canonical Espressif port — not the same as the IMX708 streamer’s :8000). OTA flashing lives on :8080/update.
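Because everything about a board follows from its DEVICE_ID, the host side can derive the same addresses without a lookup table. The sketch below mirrors the conventions listed above; the class name `XiaoEndpoints` is hypothetical, and the mapping from DEVICE_ID to the ROS namespace `/xiao_NNN` is an assumption based on the namespace examples in this document.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class XiaoEndpoints:
    """Addresses derived from a XIAO ESP32-S3 board's DEVICE_ID (1xx)."""

    device_id: int

    def __post_init__(self) -> None:
        # The firmware uses three-digit IDs of the form 1xx.
        if not 100 <= self.device_id <= 199:
            raise ValueError("DEVICE_ID is expected to be in the 1xx range")

    @property
    def ip(self) -> str:
        return f"172.31.1.{self.device_id}"

    @property
    def stream_url(self) -> str:
        return f"http://{self.ip}:81/stream"

    @property
    def ota_url(self) -> str:
        return f"http://{self.ip}:8080/update"

    @property
    def mqtt_client_id(self) -> str:
        return f"esp32s3-{self.device_id}"

    @property
    def telemetry_topics(self) -> Dict[str, str]:
        return {k: f"esp32s3/{self.device_id}/{k}" for k in ("log", "temp", "rssi")}

    @property
    def ros_image_topic(self) -> str:
        # Assumption: the NNN in /xiao_NNN/image_raw is the DEVICE_ID.
        return f"/xiao_{self.device_id}/image_raw"
```

Usage: `XiaoEndpoints(143).stream_url` gives the URL to hand to image_publisher_node for board 143.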

Currently the firmware is hard-configured for role (1)-leaning settings (FRAMESIZE_5MP, set_quality(s, 6), set_aec_value(s, 800), awb=OFF, fb_count=1, CAMERA_GRAB_LATEST) — see TODO todo-esp32s3-fb-roles for the dynamic role-switching work.

Client: see Both tmuxp variants below.

3. DSLR (Canon / Nikon / Sony) on Raspberry Pi (USB)

[ Canon/Nikon/Sony ]──USB──▶[ Raspberry Pi ]──gphoto2 capture──▶[ shared FS ]
                            mqtt__gphoto2_delegate.py   ─sync─▶ ROS 2 host
                            MQTT request/response               image_publisher_node
                                                                → /dslr_NN/image_raw

Edge: PhotogrammetricWAAM-Blender-UI/02__STILLS/_EDGE_CAMERA_DAEMON/.../mqtt__gphoto2_delegate.py.

A Python service that subscribes to {hostname}/gphoto2 (or ALL/gphoto2), shells out to gphoto2 --set-config … --capture-image-and-download …, writes the result into a session-ID’d directory, and publishes a structured {hostname}/gphoto2/response along with a photogrammetry/sync/available notification for the file-sync layer.
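The topic layout and the shell-out can be sketched as follows. This is not the delegate's actual code: the function names, the session directory and the file-name pattern are hypothetical, and the canonical request/response payload lives in mqtt__gphoto2_delegate.spec.md. Only the topic names and the gphoto2 options `--set-config`, `--capture-image-and-download` and `--filename` are taken as given; config keys such as `iso` vary by camera model.

```python
from typing import List, Mapping


def request_topics(hostname: str) -> List[str]:
    """Topics a delegate on `hostname` would subscribe to."""
    return [f"{hostname}/gphoto2", "ALL/gphoto2"]


def response_topic(hostname: str) -> str:
    """Topic on which the delegate publishes its structured response."""
    return f"{hostname}/gphoto2/response"


def gphoto2_argv(session_dir: str, config: Mapping[str, str]) -> List[str]:
    """Build the gphoto2 command line for one triggered capture.

    Each config entry becomes one --set-config name=value pair, applied
    before the capture. The file-name pattern is an illustrative choice:
    a timestamp plus %C, which gphoto2 expands to the file suffix.
    """
    argv = ["gphoto2"]
    for name, value in config.items():
        argv += ["--set-config", f"{name}={value}"]
    argv += [
        "--capture-image-and-download",
        "--filename",
        f"{session_dir}/%Y%m%d-%H%M%S.%C",
    ]
    return argv
```

The resulting list is suitable for `subprocess.run(argv, check=True)` on the Pi that hosts the DSLR; building it as a list rather than a shell string avoids quoting problems in config values.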

Role today: role (1) only. DSLRs do not stream MJPG in this stack. (gphoto2 --capture-movie exists but is intentionally out of scope — the DSLR is the fidelity reference.)

SSH operator view: INBOX/TMUXP_VIEWS/DSLR.tmuxp.yml opens parallel SSH sessions to the DSLR-hosting Pis (id2-rpi4.local, pi3m50.local).

Batch coordination across many DSLRs + Pi cams is the job of batch_request_delegate.py — one MQTT batch request fans out to N services and aggregates N responses into a single batch response.
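The fan-out/aggregate idea can be illustrated with a small in-memory collector. This is a sketch under assumptions, not the logic of batch_request_delegate.py: the class name, the `batch_id` field and the shape of the aggregated result are all hypothetical.

```python
from typing import Dict, Iterable, Optional


class BatchAggregator:
    """Collect one response per expected host, then emit a single result."""

    def __init__(self, batch_id: str, expected_hosts: Iterable[str]) -> None:
        self.batch_id = batch_id
        self.pending = set(expected_hosts)
        self.responses: Dict[str, dict] = {}

    def add(self, host: str, response: dict) -> Optional[dict]:
        """Record a response; return the batch result once all have arrived."""
        if host not in self.pending:
            return None  # unexpected host or duplicate response: ignore
        self.pending.discard(host)
        self.responses[host] = response
        if self.pending:
            return None  # still waiting for other services
        return {"batch_id": self.batch_id, "responses": self.responses}
```

A real implementation would also need a timeout so that one silent camera host cannot block the batch response indefinitely.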


Edge ↔ ROS 2 contract — the two halves

| Pipeline | Server | Listens on | Output |
|---|---|---|---|
| IMX708 | simple_picamera2_streamer/app.py | TCP :8000 (HTTP) | MJPG /stream, JPG /jpg, control /set |
| ESP32-S3 | CameraWebServer_for_esp-arduino_3.0.x.ino | TCP :81 (httpd) + :8080 OTA | MJPG /stream, MQTT telemetry |
| DSLR | mqtt__gphoto2_delegate.py | MQTT {host}/gphoto2 | JPG/RAW file + MQTT response |

The ROS 2 host runs image_publisher_node (one per camera URL or per file path), which:

  1. Decodes the MJPG / JPG into an OpenCV Mat.
  2. Publishes sensor_msgs/Image on <__ns>/image_raw (and camera_info if provided).
  3. Republishes at the rate set by publish_rate.

Critical empirical finding (see simple_picamera2_streamer/README.md): publish_rate MUST match the edge capture rate exactly, otherwise OpenCV internally buffers MJPG frames and rqt_image_view shows stale frames from seconds in the past. With the IMX708 streamer at 8 Hz, the client must be launched with publish_rate:=8. — not 7.9, not 10.
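A back-of-the-envelope model shows why even a small rate mismatch matters. The sketch assumes a simple unbounded queue between the edge and the client, which is an assumption about the buffering and not a description of OpenCV internals. It covers only the case where the client publishes more slowly than the edge captures; it does not explain the warning against a faster rate such as 10.

```python
def staleness_seconds(edge_hz: float, publish_hz: float, elapsed_s: float) -> float:
    """Estimated age of the displayed frame after `elapsed_s` seconds.

    Assumes frames queue up without bound whenever the client consumes
    fewer frames per second than the edge produces.
    """
    backlog_frames = max(0.0, (edge_hz - publish_hz) * elapsed_s)
    # Each queued frame represents 1/edge_hz seconds of capture time.
    return backlog_frames / edge_hz
```

Under this model, an 8 Hz edge with a 7.9 Hz client accumulates about 6 frames per minute, so the view falls roughly 0.75 s further behind for every minute of runtime, while matching rates keep the backlog at zero.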

Two tmuxp launch styles exist for this client side, depending on where you’re running from:

There is also a parameterised Python launch file, xiao_sense_esp32s3_eyes.py, for the ESP32-S3 fleet, and esp32s3_eth.tmuxp.yml for the wired ESP32-S3 + Lepton thermal mix.

See ros2_ws/launch/image_publisher_client/README.md for the full namespace map and per-host IP allocation.


Open implementation work (tracked, not yet implemented)

These are intentional gaps — documented here so the architecture page is the source of truth, then mirrored in the project todo list.

todo-esp32s3-fb-roles — Runtime role switching on the ESP32-S3 XIAO

The OV2640/OV5640 firmware is currently hard-pinned to one operating point. Add a runtime mode-switch (over MQTT or HTTP) that reconfigures the camera without a reflash:

| Setting | Role (1) HQ stills | Role (2) low-latency MJPG |
|---|---|---|
| config.fb_count | 1 (max single-frame size in PSRAM) | 2 (pipeline encoder, hide latency) |
| config.frame_size | FRAMESIZE_5MP (2592×1944) | FRAMESIZE_HD or _SVGA |
| set_quality() | low number = high quality (≈ 4–6) | higher number (≈ 12–20) |
| config.grab_mode | CAMERA_GRAB_WHEN_EMPTY | CAMERA_GRAB_LATEST |
| set_exposure_ctrl | manual, locked AEC value | auto |
| set_whitebal | manual, locked WB | auto OK |

Rationale: with PSRAM at a premium, fb_count=1 lets a 5MP JPG actually fit; fb_count=2 hides JPG-encode latency for streaming.
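One way to keep the two operating points in a single place is to encode the table as data on the host and send the chosen preset to the board. The sketch below is purely illustrative: the topic `esp32s3/<DEVICE_ID>/mode`, the JSON field names and the concrete quality values (picked from the ranges in the table) are hypothetical, and no such message is implemented in the firmware today.

```python
import json
from typing import Tuple

# Presets transcribed from the role table; quality values are examples
# chosen from the stated ranges, not measured optima.
ROLE_PRESETS = {
    "stills": {
        "fb_count": 1,
        "frame_size": "FRAMESIZE_5MP",
        "quality": 6,
        "grab_mode": "CAMERA_GRAB_WHEN_EMPTY",
        "auto_exposure": False,
        "auto_whitebal": False,
    },
    "stream": {
        "fb_count": 2,
        "frame_size": "FRAMESIZE_HD",
        "quality": 12,
        "grab_mode": "CAMERA_GRAB_LATEST",
        "auto_exposure": True,
        "auto_whitebal": True,
    },
}


def mode_switch_message(device_id: int, role: str) -> Tuple[str, str]:
    """Return a (topic, JSON payload) pair for a hypothetical mode switch."""
    if role not in ROLE_PRESETS:
        raise ValueError(f"unknown role: {role!r}")
    topic = f"esp32s3/{device_id}/mode"  # hypothetical control topic
    payload = json.dumps({"role": role, **ROLE_PRESETS[role]}, sort_keys=True)
    return topic, payload
```

On the firmware side, fb_count and grab_mode are part of the camera configuration passed at driver initialisation, so applying a preset would likely require re-initialising the camera driver and not only calling the sensor setters.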

todo-imx708-fb-roles — Mode switching in simple_picamera2_streamer/app.py

Today app.py is built around picam2.create_video_configuration(main={"size": (2304,1296)}, buffer_count=4) — fixed at role (2). Add an endpoint (e.g. GET /mode?role=stills / …?role=stream) that:

  • For role (1): picam2.switch_mode_and_capture_file(...) against a still configuration at full sensor resolution, optionally RAW+JPEG, then revert.
  • For role (2): keep the current free-running 8 Hz video path.

This makes one IMX708 host serve both the SfM batch capture and the live viewfinder without contention.
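A minimal sketch of the proposed dispatcher, assuming the `/mode?role=…` query format suggested above. The function `handle_mode_request` is hypothetical and is not part of app.py; it takes the camera object as a parameter so it can be tried against a stand-in without hardware. It relies on two picamera2 methods, `create_still_configuration()` and `switch_mode_and_capture_file()`, the latter of which switches to the given configuration, captures to a file and then returns to the previous mode.

```python
from urllib.parse import parse_qs


def handle_mode_request(picam2, query: str, out_path: str) -> str:
    """Dispatch a hypothetical GET /mode?role=... request.

    `picam2` is expected to behave like picamera2.Picamera2; `query` is
    the raw query string, e.g. "role=stills".
    """
    role = parse_qs(query).get("role", ["stream"])[0]
    if role == "stream":
        # Role (2): the free-running video path simply keeps going.
        return "stream"
    if role == "stills":
        # Role (1): capture one frame with a still configuration, after
        # which the previous (streaming) mode is restored by picamera2.
        still_config = picam2.create_still_configuration()
        picam2.switch_mode_and_capture_file(still_config, out_path)
        return "stills"
    raise ValueError(f"unknown role: {role!r}")
```

Whether the default still configuration matches the desired full-resolution and RAW settings is something to verify on the target; a real endpoint would also need to coordinate with the capture thread, since the stream pauses while the mode switch is in progress.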

todo-mqtt-bridge — Unify the control plane

Right now control is heterogeneous:

  • IMX708 is controlled by HTTP GET /set?ExposureTime=….
  • ESP32-S3 has only MQTT telemetry (log/temp/rssi) — control is via the Espressif web UI on :81/.
  • DSLR is fully MQTT (request/response, recipient-based).

Decide whether the RPi streamer and the ESP32-S3 firmware should adopt the same recipient-based MQTT contract as the gphoto2 delegate. If yes, the batch_request_delegate already aggregates across services and would Just Work.