
...

  1. Motivation: Why use a hypervisor (HV)

Today, we see the advent of multicore systems-on-chip (SoCs), originally designed for the mass market of consumer electronics, entering the critical infrastructure of cars and trucks. Buzzwords like "cell phone on wheels" have been coined. However, this is only the beginning: in the near future we will see the advent of central compute platforms, i.e., massive multicore SoCs taking over not only fundamental functions in the vehicle but also its control. This is a game changer for the SW stacks we use in our cars, including the underlying operating systems.

SUMMARY: More cores in SoCs → changes the SW stacks

Traditional automotive multicore SoCs put a clear focus on low power, low temperature, deterministic timing and high reliability, to mention only the major characteristics. In contrast, the aforementioned consumer-electronics SoC designs sacrifice reliability, low power and low thermal dissipation for high compute bandwidth.

SUMMARY: Optimize for (average) performance, not automotive requirements

Feedback: There is more diversity than this chapter suggests. Some may choose more consumer-like processors, but there are also modern multicore processors that are automotive grade.
Some vendors create massive compute power but with high power needs (heat); others may be better at keeping power low.

Moreover, deterministic timing for guaranteeing low service latencies even in worst-case scenarios is traded for service strategies which optimize the average and not the worst case. An example is as follows: in a manycore system with private L1 and a shared L2 cache, SW executing on different cores will mutually evict each other's cache entries.

SUMMARY: Optimizing for average performance, not real-time. Example: shared caches


Opinions on the high-level purpose of the paper.

...We need to explain why virtualization is actually needed. (It is still not fully accepted as necessary by all.)
 → Certain concrete security/safety issues that can be shown clearly and that a HV can address.
 → System flexibility is another very important point.

→ Idea: There could of course be multiple whitepapers if we want to concentrate on a certain area and avoid others.

Interaction between general-purpose and dedicated cores is poorly understood.


Such mutual cache evictions significantly add to execution times: data and instructions must be re-fetched from main memory, where modern on-core prefetchers try to lower the waiting times from the core's perspective. Cache eviction is, however, not the only source of trouble. Each time an item is fetched from main memory, execution on the core is suspended until the memory fetch has been served. The resulting waiting time depends on the number of pending memory access requests from all the other cores and on the complex memory access patterns implemented by modern DRAM controllers. A common pattern prioritizes hits into the open row buffer instead of serving memory read requests in a standard first-in-first-out manner. With a strategy for increasing the open-row-buffer hit rate, memory read requests are allowed to overtake each other inside the DRAM controller. Whilst this lowers the average service time, as reads from an open memory bank row can be served faster, it may add to the waiting time of some other read.

SUMMARY: Details on the cache contention problem.
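
To make this concrete, here is a minimal sketch, assuming a Linux system (the buffer size, core IDs and repetition count are illustrative, not from this draft): two threads pinned to different cores each walk a buffer larger than a typical private L1, so both streams compete for the shared L2 and mutually evict each other's lines.

    /* Hedged sketch: provoke shared-L2 contention between two cores. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdlib.h>

    #define BUF_SIZE (1u << 20)   /* 1 MiB: larger than a typical private L1 */
    #define STRIDE   64           /* one access per 64-byte cache line       */

    static void *walker(void *arg)
    {
        volatile unsigned char *buf = arg;
        unsigned long sum = 0;
        for (int rep = 0; rep < 10000; rep++)
            for (unsigned i = 0; i < BUF_SIZE; i += STRIDE)
                sum += buf[i];    /* L1 misses refill from L2, evicting the
                                     other core's lines                      */
        return (void *)sum;
    }

    int main(void)
    {
        pthread_t t[2];
        for (int c = 0; c < 2; c++) {
            pthread_create(&t[c], NULL, walker, calloc(1, BUF_SIZE));
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(c, &set);
            pthread_setaffinity_np(t[c], sizeof(set), &set);  /* pin to core c */
        }
        for (int c = 0; c < 2; c++)
            pthread_join(t[c], NULL);
        return 0;
    }

Timing one thread with and without the second one running makes the slowdown caused by mutual eviction directly visible.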

Obviously, there is no free lunch: the increase in compute bandwidth comes along with a significant increase in the complexity of the behavior of SW executing on such modern multicore SoCs, together with higher energy consumption and higher thermal dissipation, as these SoCs run at much higher frequencies. This clearly points to the question: what can we do with these processors, as they are not only more powerful but also much more costly?

SUMMARY: What can we do with these new processors?

There are two main promises made:

  1. The use of multicore and higher compute power allows one to integrate multiple functions, i.e., independent SW stacks, into a single electronic control unit (ECU), thereby reducing the number of onboard ECUs and their cabling. Commonly, the different SW stacks come along with their own operating systems, ranging from tiny-footprint operating systems like FreeRTOS up to monolithic giants like Android, as used in modern head units.
    SUMMARY: Increased consolidation


  2. Deployment of new applications which only become possible by having available not only multiple compute cores but also so-called hardware accelerators. Examples of such accelerators are image processing units and graphics processing units, today directly synthesized onto the SoC. Taken to an extreme, one sees chip designs which are in fact dominated by the accelerator support rather than by the number of general-purpose cores put onto the chip. Commonly, HW vendors provide support via Linux drivers rather than making HW drivers available as part of an AUTOSAR MCAL.

    SUMMARY: Hardware enables new compute-intensive functions


With both cases, one can spot the following commonalities:

  1. SW executes in parallel and may try to access explicitly shared resources like accelerators and network interfaces at the same time.

  2. Contention on the use of a processor’s infrastructure like memory buses and memory-mapped registers can easily result in unwanted side effects, e.g., process data arrives too late for the aforementioned reasons, or data even becomes inconsistent due to reader/writer races (see the sketch after this list).

  3. New and legacy applications to be put together on the same chip come along with their own operating systems, be the latter highly specialized or of a general-purpose nature.
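
To illustrate item 2, a minimal sketch of the reader/writer race and its fix, assuming C11 atomics are available (the counter and function names are illustrative): a 64-bit value written by one core can otherwise be observed half-updated by a reader on another core, e.g. on a 32-bit machine where the update takes two stores.

    #include <stdatomic.h>
    #include <stdint.h>

    /* Shared between two cores, e.g. placed in a shared-memory region.
       Without _Atomic, a reader could see a torn (half-written) value. */
    static _Atomic uint64_t shared_counter;

    /* Writer core */
    void producer_tick(void)
    {
        atomic_fetch_add_explicit(&shared_counter, 1, memory_order_release);
    }

    /* Reader core: the acquire load never observes a torn update. */
    uint64_t consumer_sample(void)
    {
        return atomic_load_explicit(&shared_counter, memory_order_acquire);
    }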

Successfully deploying different SW stacks and their operating systems on a single SoC calls either for hand-crafted mechanisms and protocols for sharing resources among the different operating systems, or for putting a supervisory SW layer in place. It is up to this extra SW layer to orchestrate the use of the processor like any other operating system. Simply putting everything on top of a single operating system is infeasible, as it would require porting all of the legacy applications to the target operating system, as far as that is possible at all, or re-implementing hardware drivers for the target platform and the chosen operating system.

      SUMMARY: The challenges with these new systems include shared use of single-use features, bus contention, etc.


Still, there is a difference between standard operating systems and such a supervisory SW layer. The supervisory SW layer, commonly denoted as a hypervisor, requires execution rights at a higher privilege level than the operating systems running on top of it, just as any operating system executes at a higher privilege level than its userspace applications. This is needed to execute privileged instructions, i.e., instructions which change the state of the processor, or to restore the context of a shared resource whenever the latter is handed over to a different user, resp. application. On Arm, for example, the hypervisor runs at EL2 while guest operating systems run at EL1 and their applications at EL0.

SUMMARY: A hypervisor is required as the solution (?)


  1. Use of legacy systems with minor modifications,

    1. address what kind of modifications we expect,



  1. What does the HW (vendor) do to support platform virtualization

Also address the problem of open-source firmware and driver (MCAL) qualification when running virtualized drivers (see also section 3). An HV helps with this by

  • isolating critical devices from non-critical ones, which allows one to build systems with mixed criticality w.r.t. safety relevance,

  • ensuring that a qualified driver comes with the driver host or even from the HV provider.


SUMMARY: Hardware support for virtualization is included in modern processors

Kai, Adam & Bernhard

This is directed towards the HV vendor to avoid the problems we have seen in the past.

Content: All modern processors, including Arm Cortex-A and Intel x86, support the virtualization of operating systems by providing adequate functionality for a virtual view of the system and by letting system software, the hypervisor, have full control of guest operating systems. A microkernel can offer support for this functionality. As already described, the microkernel will only offer the necessary functionality, and all other support for running VMs shall be implemented at user level. To support the virtualization extensions of the CPU, the microkernel provides the functionality to create VM containers and to context-switch those between other VMs and normal programs on the microkernel. The virtual platform that is required to run a guest operating system is provided by a user-level virtual machine monitor (VMM). A common design pattern is to use one VMM per VM, using the isolation features of the microkernel to protect VMs from each other.

SUMMARY: This paragraph speaks a lot about microkernel / HV / software layer also, and only a small part about actual Hardware features?
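
As a sketch of the one-VMM-per-VM pattern described above, under invented names (every vmm_/kernel_ identifier below is hypothetical; real microkernels such as seL4 or L4Re expose their own, different interfaces):

    /* Hypothetical sketch of the one-VMM-per-VM pattern; all names are
       invented for illustration and match no real microkernel API.     */
    typedef struct vm vm_t;                       /* kernel VM container   */
    typedef enum { EVT_MMIO, EVT_IRQ, EVT_HALTED } evt_kind_t;
    typedef struct { evt_kind_t kind; /* ... trap details ... */ } evt_t;

    /* Assumed to be provided by the microkernel and a VMM library. */
    void kernel_vm_run(vm_t *vm, evt_t *evt);     /* run guest to next exit */
    void vmm_load_guest_image(vm_t *vm);
    void vmm_setup_virtual_devices(vm_t *vm);
    void vmm_emulate_mmio(vm_t *vm, evt_t *evt);
    void vmm_inject_irq(vm_t *vm, evt_t *evt);

    /* Each VMM is an ordinary, isolated user-level task owning one VM;
       the microkernel's isolation protects the VMs from each other.    */
    void vmm_main(vm_t *vm)
    {
        vmm_load_guest_image(vm);
        vmm_setup_virtual_devices(vm);
        for (;;) {
            evt_t evt;
            kernel_vm_run(vm, &evt);              /* context switch into VM */
            switch (evt.kind) {
            case EVT_MMIO:   vmm_emulate_mmio(vm, &evt); break;
            case EVT_IRQ:    vmm_inject_irq(vm, &evt);   break;
            case EVT_HALTED: return;              /* guest shut down        */
            }
        }
    }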

Dmitry mentions that i.MX 8 has special features that simplify device sharing/assignment to VMs, e.g. USB, which could be interesting case-study information.
Details pending (make sure to check what is public information first).


  1. Surveillance, Isolation (Timing and Spatial) and all that

To establish well-defined behavior of SW at platform level, several design paradigms can be followed, where each prioritizes different aspects, e.g., fault detection versus information hiding, or high performance versus good worst-case timing behavior. At the bottom line, it appears that one of the fundamental principles of establishing safe and secure execution environments is isolation and surveillance.



Sharing of HW in the presence of parallel system executions

Isolation properties in the presence of parallel system executions:

Spatial isolation, hiding of secrets

Temporal isolation, implicit and explicit shared resources…

Also include use of special-purpose guest OSs for isolating a specific functionality, i.e., building safety and security islands

Kai & Adam

  1. Inter-core communication

Matti, Dimitri


Ideas:

Split into two main tracks.   There are cores with direct communication and those that have inter-core links.
...comes down to what can be communicated in an atomic manner.
The size of the mailbox is one atomic unit.  Other links are serial, so the size of the atomic unit is essentially one bit only.
Reference MCU-style hypervisors.

Shared memory is also a communication method...
 – Discuss (lack of) atomicity, buffer sealing etc.
 – Discuss cache coherency and other complications...
     Atomic operations, e.g. Arm read-and-exchange, check-and-set etc.   These are only guaranteed on some memory types; on other areas, including some involving caches, the behavior can be undefined or even undocumented.  Caches need to be coherent (hold the same value).
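
A minimal sketch of the safe variant, assuming C11 atomics over a properly mapped shared region (the Arm memory-type caveat from the notes above is repeated in the comment):

    #include <stdatomic.h>

    /* Spinlock in a region shared between cores. On Arm, the underlying
       exclusive/atomic accesses are only architecturally defined if the
       region is mapped with a suitable memory type (e.g. Normal,
       shareable, cacheable) on ALL cores; on Device or mismatched
       mappings the behavior can be undefined or undocumented.          */
    static atomic_flag shared_lock = ATOMIC_FLAG_INIT;

    void lock(void)
    {
        /* test-and-set; acquire ordering makes protected data visible */
        while (atomic_flag_test_and_set_explicit(&shared_lock,
                                                 memory_order_acquire))
            ;   /* spin */
    }

    void unlock(void)
    {
        atomic_flag_clear_explicit(&shared_lock, memory_order_release);
    }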

Inter-processor interrupts (IPIs) are used to trigger the reading of a data item.

Hardware mailboxes are usually one word or a few words.  Once the value is written and the interrupt triggered, it is locked down and not changeable (by the original writer) – this avoids "time of check, time of use" type bugs and vulnerabilities.
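
As an illustration of the mailbox pattern, a sketch with an entirely hypothetical register layout (real addresses, offsets and trigger semantics must come from the SoC reference manual):

    #include <stdint.h>

    /* Hypothetical mailbox register block; offsets/addresses are made up. */
    #define MBOX_BASE 0x40001000u
    #define MBOX_DATA (*(volatile uint32_t *)(MBOX_BASE + 0x0))
    #define MBOX_SEND (*(volatile uint32_t *)(MBOX_BASE + 0x4))

    /* Send one word to the peer core. Typical mailbox HW latches the
       value when MBOX_SEND is written and raises an interrupt on the
       peer; the sender can no longer change it, which is what rules
       out the time-of-check/time-of-use problems mentioned above.     */
    void mbox_send(uint32_t word)
    {
        MBOX_DATA = word;   /* payload: one atomic unit               */
        MBOX_SEND = 1u;     /* latch value + trigger the peer's IPI   */
    }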

How do different processor architectures provide features for this? Arm, x86... MIPS...?

New interconnects may guarantee some of the coherency requirements: CCIX, OpenCAPI (...is dead...), NVLink, also some things in PCI Express 4.0? CXL (Intel, based on PCIe)

Open Asymmetric Multiprocessing (OpenAMP) – messaging standards are built on top of this...  Often the implementation uses the hardware capabilities for mailboxes/links etc.
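
For a flavor of the messaging layer, a fragment using the open-amp library's RPMsg API (rpmsg_send() is the library call; the surrounding function is illustrative, and endpoint/platform bring-up – remoteproc, shared memory, mailbox wiring – is SoC-specific and omitted):

    #include <openamp/rpmsg.h>

    /* Send a payload to the remote core over an already-created
       RPMsg endpoint. rpmsg_send() copies the data into a shared-
       memory vring buffer and kicks the peer, typically via the
       hardware mailbox/IPI mechanisms discussed above.             */
    int notify_remote(struct rpmsg_endpoint *ept)
    {
        static const char msg[] = "hello from core 0";
        return rpmsg_send(ept, msg, sizeof(msg));
    }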

Cache locking?




  1. Sharing Devices -- Virtio

...