Skip to main content

Data formats & tensor semantics

This page summarizes the format tokens used by InputOptions::format (and OutputTensorOptions::format) and how they map to Tensor layout, shape, and plane semantics.

Quick mapping table (raw video)

TokenMedia typeImageSpecLayout / shapePlanes
RGBvideo/x-rawRGBHWC, shape [H,W,3]dense (no planes)
BGRvideo/x-rawBGRHWC, shape [H,W,3]dense (no planes)
GRAY8video/x-rawGRAY8HW, shape [H,W]dense (no planes)
NV12video/x-rawNV12HW, shape [H,W]composite planes: Y then UV
I420video/x-rawI420HW, shape [H,W]composite planes: Y, U, V

Notes:

  • GRAY is normalized to GRAY8.
  • NV12/I420 require even width/height.
  • For packed formats (RGB/BGR/GRAY8), depth = channels and is validated against shape when present.

Tensor media type (application/vnd.simaai.tensor)

If InputOptions::media_type is set to application/vnd.simaai.tensor, the format token is interpreted as a dtype (e.g., FP32, BF16, INT8) or a known tessellated format (e.g., MLA). In this case:

  • Layout must be explicit: HWC, CHW, or HW (no Planar).
  • Shape rules:
    • HWC => [H,W,C]
    • CHW => [C,H,W]
    • HW => [H,W] (depth inferred as 1)
  • format is validated against the dtype in Tensor::dtype.

OutputTensorOptions limitations (important)

Session::add_output_tensor() currently:

  • Supports only UInt8 output tensors.
  • Forces SystemMemory via a capsfilter.
  • Does not transform layout (e.g., layout=CHW is metadata only).

If you need other dtypes or layout transforms, insert explicit nodes or do a post‑processing step in your code.

Mapping examples

RGB (dense):

simaai::neat::Tensor t = /* RGB tensor */;
auto map = t.map_read();
const uint8_t* bytes = static_cast<const uint8_t*>(map.data);

NV12 (composite planes):

simaai::neat::Tensor t = /* NV12 tensor */;
auto nv12 = t.map_nv12_read();
if (nv12) {
const uint8_t* y = nv12->view.y;
const uint8_t* uv = nv12->view.uv;
}

I420 (composite planes):

simaai::neat::Tensor t = /* I420 tensor */;
auto i420 = t.map_i420_read();
if (i420) {
const uint8_t* y = i420->view.y;
const uint8_t* u = i420->view.u;
const uint8_t* v = i420->view.v;
}

Depth vs channels

  • For packed video: depth == channels (RGB/BGR = 3, GRAY8 = 1).
  • For tensor media: depth is derived from the chosen layout and shape.

Sample payload tags

Sample::payload_tag is the preferred label for downstream consumers. It supersedes the deprecated Sample::format field.

See also