What is it and why?

The VSS taxonomy defines which data "entities" (Signals and Attributes) we can deal with. It is used in the protocol(s) defined by the W3C Automotive Working Group, as well as in other initiatives inside and outside the vehicle.

In addition to VSS itself, we need to define the data-exchange formats for measured values of those Signals. This starts with defining terms, but quickly develops into defining one or several variants of the actual message content format, whether in JSON or another encoding.

Relationship to Protocols

Data formats sometimes overlap with protocol definitions because some protocols (but not all) define the data format as part of their specification. VISS / W3C Gen2 is an example of a protocol definition that covers both the protocol interactions between client and server and a data exchange format that fits the VSS model. Ultimately, any chosen protocol (or stack of protocols) must at some point define the transferred data formats, otherwise no understandable exchange is possible. This page is intended to support the development of such a definition.

Looking at a wider set of protocols, it is clear that some work remains.

The data exchange protocols we discuss fall into different categories, each requiring some more work on defining value exchange data formats:

  1. A protocol does not (yet) cover all variations of data exchange.
    1. When this page was written, the W3C VISS protocol (v2) supported subscription to updates and on-demand fetching of the current value of one or several specified signals in one go. In its latest form it has a significant number of query parameters and filters, and supports fetching a series of historical recorded values (i.e. a TimeSeries according to the definitions below). VISS v2 now specifies features that mirror most of the message types listed here, with the exception of Snapshots.
  2. A protocol defines only a "transport"
    1. We often discuss protocols that define some behavior of data transfer, such as pub/sub semantics, but that are designed to be generic and therefore support any type of information. This means they do not (and cannot) define the format of the content of the data container (payload). Such transport protocols are set up to transfer any arbitrary sequence of bytes. This makes those technologies widely applicable, but selecting them is not enough without also defining the payload format. Examples of such protocols are MQTT and WAMP, but the principle extends to many generic protocols and frameworks.
  3. A protocol defines transport, query semantics, and even a few expectations for the exchanged data format, but is still generic and requires additional definitions to become unambiguous for a particular case.
    1. Example:  GraphQL is a generic technology that clarifies a bit more about expected data semantics and formats but it still requires a schema to be defined to indicate the exact underlying data model, what types of queries can be made using the GraphQL language, and other details such as the datatypes that are expected to be returned.   A schema must be defined for GraphQL, and for other similar situations, and that schema might also be derived from this generic analysis.
    2. Example 2:  To consume and process data in Apache Spark, Kafka, and presumably many other generic data-handling frameworks, we also need to define schemas that describe the format and content of the transferred data, in a similar fashion.  These frameworks might also match category 2/3.
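
As an illustration of category 2, the sketch below shows why a byte-oriented transport is not enough on its own: sender and receiver still have to agree on how a record becomes bytes. UTF-8 encoded JSON is assumed here purely as an example of such an agreement, and the function names `encode_payload` and `decode_payload` are hypothetical; they are not part of MQTT, WAMP, or any other protocol mentioned on this page.

```python
import json

def encode_payload(record: dict) -> bytes:
    """Serialize a record into the opaque bytes a generic transport would carry."""
    # Assumption: both sides have agreed on compact, UTF-8 encoded JSON.
    return json.dumps(record, separators=(",", ":")).encode("utf-8")

def decode_payload(payload: bytes) -> dict:
    """Reverse of encode_payload; only works if the sender used the same format."""
    return json.loads(payload.decode("utf-8"))

record = {"signal": "vehicle.Chassis.Axle2.WheelCount", "value": "2"}
payload = encode_payload(record)      # what the transport sees: just bytes
restored = decode_payload(payload)    # what the receiver reconstructs
```

The transport never inspects `payload`; everything that makes it meaningful lives in the two functions, which is the part this page sets out to define.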


Related references



Definitions

(proposal, open for discussion)

Signal:

Request:

Job:  


Warning: This page is currently concerned with the payload, and not a full protocol definition, so no further definition of Request or Job is made here.

Observation:

Data Package:

=   A delivery of data sent at a particular time.

(think of it as the whole Message that is in response to a Request)

This will likely need to include some metadata regarding the request:

Additional Metadata

Following input given in the W3C data TF:



Record:


Q: Why not just use a single record type (a superset of all functionality)?

A: The reason is to optimize performance and bandwidth; in other words, do not transfer what is not needed for a given case. For example, if a timestamp is not needed, we should support transferring data without one. Hence, this proposes a simple class hierarchy of sub-types of Record.


Record Subtypes:

Record types which specify the signal name inside:

Record types which specify the geospatial position in addition to the time value:

(N.B.  In this proposal, Geospatial records always include a time stamp, because it seems to be the overwhelmingly dominant usage, but variations without time stamp would of course be possible)
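
One possible way to picture the proposed hierarchy is the sketch below, written with Python dataclasses. The class names are taken from the JSON examples further down this page; the exact parent/child arrangement and the use of plain strings for all fields are assumptions made for illustration, not a definition.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Record:
    value: str                      # smallest case: just a value

@dataclass
class TimestampedRecord(Record):
    ts: str                         # adds a time stamp

@dataclass
class GeospatialRecord(TimestampedRecord):
    pos: str                        # adds a position on top of the time stamp

@dataclass
class SpecifiedRecord(Record):
    signal: str                     # names the signal inside the record

@dataclass
class SpecifiedTimestampedRecord(SpecifiedRecord):
    ts: str                         # names the signal and adds a time stamp

def to_json(record: Record) -> str:
    """Emit only the fields that the chosen record type carries."""
    return json.dumps(asdict(record))

plain = Record(value="100.54")
stamped = SpecifiedTimestampedRecord(
    value="false",
    signal="vehicle.Body.ExteriorMirrors.Heating.Status",
    ts="2020-01-10T02:59:43.491751",
)
```

Choosing the smallest type that fits a use case keeps unused fields, such as the time stamp, out of the transferred message.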

DerivedRecord and StatisticsRecord


Overview



TimeStamp

NOTE:  The exact format of time stamps (and any other data representation) may differ when these concepts are translated to different protocols or languages, as long as the original meaning as required by the VSS specification remains.

1. Text Format

One option is to use a string. In that case the recommendation is the ISO 8601 standard format, with fractional seconds (e.g. microseconds) and always in the UTC (Zulu) time zone.
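
A minimal sketch of producing and parsing such a string, using only the Python standard library. The helper names `format_ts` and `parse_ts` are hypothetical, and microsecond precision is assumed as in the examples on this page.

```python
from datetime import datetime, timezone

def format_ts(moment: datetime) -> str:
    """Render a timezone-aware datetime as ISO 8601 UTC with microseconds and 'Z'."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.%f") + "Z"

def parse_ts(text: str) -> datetime:
    """Parse the string form back into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)

moment = datetime(2020, 1, 10, 2, 59, 43, 491751, tzinfo=timezone.utc)
text = format_ts(moment)   # "2020-01-10T02:59:43.491751Z"
```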

Real, "Wall clock time"

Relative to a previously predefined time stamp reference:

2. Binary Format

For some purposes more efficient binary encodings should be considered, such as an integer of appropriate size, usually containing fractions of seconds relative to a known starting point.
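
As a sketch of the integer approach, the example below stores a time stamp as a signed 64-bit count of microseconds. The Unix epoch as the "known starting point", the microsecond resolution, and the little-endian byte order are all assumptions chosen for illustration; this page does not fix any of them.

```python
import struct
from datetime import datetime, timedelta, timezone

# Assumed reference point; any agreed starting point would work the same way.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def encode_ts(moment: datetime) -> bytes:
    """Pack a timezone-aware datetime as microseconds since EPOCH (8 bytes)."""
    micros = (moment - EPOCH) // timedelta(microseconds=1)
    return struct.pack("<q", micros)

def decode_ts(data: bytes) -> datetime:
    """Unpack the 8-byte integer back into a UTC datetime."""
    (micros,) = struct.unpack("<q", data)
    return EPOCH + timedelta(microseconds=micros)

moment = datetime(2020, 1, 10, 2, 59, 43, 491751, tzinfo=timezone.utc)
encoded = encode_ts(moment)
```

Under these assumptions the time stamp takes 8 bytes, compared with 27 characters for the text form with microseconds and a trailing Z.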



Bundle

TimeSeries

Snapshot:

Note that values in a Snapshot need a record type that specifies the signal, i.e. (a subtype of) SpecifiedRecord, since different signals are included in the same message.
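
The constraint can be expressed as a simple check when assembling a message. This is only a sketch: the function name `build_snapshot` is hypothetical, and records are represented as plain dictionaries with the key names used in the JSON examples below.

```python
def build_snapshot(records: list[dict]) -> dict:
    """Collect records for different signals into one Snapshot-like message."""
    for index, record in enumerate(records):
        # Without a signal name the receiver cannot tell the values apart.
        if "signal" not in record:
            raise ValueError(f"record {index} does not name its signal")
    return {"values": list(records)}

snapshot = build_snapshot([
    {"signal": "vehicle.body.cabin.temperature", "value": "22.0"},
    {"signal": "vehicle.drivetrain.engine.rpm.average", "value": "3200"},
])
```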


Stream:





Examples, using JSON


(Plain) Record:

{
   "value" : "100.54"
}

TimestampedRecord (ts is Zulu time, ISO 8601 format with microseconds):

{
   "ts" : "2020-01-10T02:59:43.491751Z",
   "value" : "42"
}

GeospatialRecord (ts is Zulu time, ISO 8601 format with microseconds):

{
   "pos" : "[format tbd]",
   "ts" : "2020-01-10T02:59:43.491751Z",
   "value" : "42"
}

SpecifiedRecord:

{
   "signal" : "vehicle.Chassis.Axle2.WheelCount",
   "value" : "2"
}

SpecifiedTimestampedRecord:

{
   "signal" : "vehicle.Body.ExteriorMirrors.Heating.Status",
   "ts" : "2020-01-10T02:59:43.491751",
   "value" : "false"
}

TimeSeries ("count" might be redundant information and is optional):

{
    "signal" : "vehicle.body.cabin.temperature",
    "count" : "132",
    "values" : [
        {
            "ts" : "2020-01-10T02:59:43.491751",
            "value" : "42.5"
        },
        {
            "ts" : "2020-01-10T02:59:43.491751",
            "value" : "43.0"
        },
        ... 130 more records
    ]
}
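
A receiver of such a message might process it roughly as sketched below. The sketch assumes that "values" is encoded as a JSON array and that values arrive as strings, as in the examples on this page; the function name `parse_time_series` is hypothetical, and the sample payload is shortened to two records.

```python
import json

def parse_time_series(text: str) -> list[tuple[str, float]]:
    """Return (timestamp, value) pairs from a TimeSeries-style JSON message."""
    message = json.loads(text)
    samples = [(item["ts"], float(item["value"])) for item in message["values"]]
    # "count" is optional; when present it should agree with the records sent.
    if "count" in message and int(message["count"]) != len(samples):
        raise ValueError("count does not match the number of records")
    return samples

payload = """
{
    "signal" : "vehicle.body.cabin.temperature",
    "count" : "2",
    "values" : [
        { "ts" : "2020-01-10T02:59:43.491751", "value" : "42.5" },
        { "ts" : "2020-01-10T02:59:43.491751", "value" : "43.0" }
    ]
}
"""
samples = parse_time_series(payload)
```

Because the signal name appears once at the top level, the individual records do not need to repeat it.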


Snapshot:

{
    "timeperiod" : {
        "start" : "2020-01-10T02:00:00Z",
        "end" : "2020-01-11T01:59:59Z"
    },
    "values" : [
        {
            "signal" : "vehicle.body.cabin.temperature",
            "value" : "22.0",
            "ts" : "2020-01-10T02:59:43.491751"
        },
        {
            "signal" : "vehicle.drivetrain.engine.rpm.average",
            "value" : "3200",
            "ts" : "2020-01-10T02:59:44.100403"
        }
    ]
}

todo: Examples of Derived/Statistics Records

Other example representations


Above we used a simple JSON encoding for the data (as an example). A more space-efficient binary format is also possible, and here we can reuse existing technologies (AVRO, Protobuf, Thrift, CBOR, ...)

Note that AVRO schemas are also written in JSON. Unlike the examples above, the following is therefore not an example of the data content: JSON is used here to describe how the data will be structured.
The data content itself is stored/transferred by AVRO implementations according to the schema, using an efficient binary encoding for the values.

AVRO-schema example

{
  "type" : "record",
  "name" : "SpecifiedTimeStampedRecord",
  "fields" : [
    { "name" : "signal_identifier", "type" : "string" },
    { "name" : "ts", "type" : "long" },
    { "name" : "value", "type" : "Value" }
  ]
}

... where Value is a union of all the possible VSS types.
This is a little convoluted because values can be any plain data type, or an Array of such datatypes:

{
  "type" : "record",
  "name" : "Value",
  "fields" : [
    { "name" : "item", "type" : [
      "int", "long", "float", "double", "string", "boolean",
        { "type" : "array",
          "items" : [ "int", "long", "float", "double", "string", "boolean" ]
        }
      ]
    }
  ]
}

More AVRO encoding here: vss-tools (serializations branch)