Digressions – July 2019


I used to have less than 512 KB of memory and a 1.6 MHz clock speed; all these gigahertz of clock rate and gigabytes of memory make lazy programmers. We've just made systems that wouldn't work. We made these horrible systems without knowing how they work and why they work. And when they fail, we don't know why they fail and where they fail.

So says Joe Armstrong in one of his talks. We don't need extremes; we need something reasonable that keeps everything up and running. If we don't know our errors, we don't know how to recover from them or contain them. Changing and building software without knowing what we are changing is the worst thing we can do to ourselves.


Extreme ideas are drawn on paper, not debunked in your notes app.


A distributed system that works across domains should take the following security precautions:

  • It has to protect itself against malicious attacks from the newly assembled interconnected domains.
  • New domains have to protect themselves against malicious attacks from the distributed system.

Always administer your clusters separately; otherwise a single mistake on your part can administratively block an interconnected cluster and make it fail. Everything should be administratively scalable.


Scalability principles

Replication

  • Replicate important resources.
  • Distribute the replicas.
  • Use loose consistency.

Distribution

  • Distribute across multiple servers.
  • Distribute evenly.
  • Exploit locality.
  • Bypass upper levels of hierarchies.

Caching

  • Cache frequently accessed data.
  • Consider access patterns when caching.
  • Cache timeout.
  • Cache at multiple levels.
  • Look first locally.
  • The more extensively something is shared, the less frequently it should be changed.
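
A few of the caching points above (cache timeout, cache at multiple levels, look first locally) can be sketched in a handful of lines. This is only an illustration under my own assumptions, not anything from Neumann: the names `TTLCache`, `lookup`, `local`, `shared` and `fetch` are hypothetical, and a real cache would also need eviction and invalidation.

```python
import time


class TTLCache:
    """Tiny cache whose entries expire after `ttl` seconds (hypothetical helper)."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock          # injectable so the timeout can be tested
        self._entries = {}          # key -> (value, expiry time)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # entry timed out, drop it
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)


def lookup(key, local, shared, fetch):
    """Look first locally, then in the shared cache, then at the origin.

    Assumption: None is never a valid cached value.
    Returns (value, level) so the caller can see where the hit came from.
    """
    value = local.get(key)
    if value is not None:
        return value, "local"
    value = shared.get(key)
    if value is not None:
        local.put(key, value)       # refill the closer level
        return value, "shared"
    value = fetch(key)              # slowest path: go to the origin
    shared.put(key, value)
    local.put(key, value)
    return value, "origin"
```

Giving the local cache a shorter timeout than the shared one is one way to read the last bullet: the widely shared copy changes rarely, while the local copy is refreshed often.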

General

  • Shed load, but not too much.
  • Avoid global broadcast.
  • Support multiple access mechanisms.
  • Keep the user in mind.

– Neumann [1994]


Myriad ILP

In order to overlap the execution of operations from successive iterations of our kernel loop, we applied modulo scheduling [7]. This allows a new iteration of the loop to begin before the previous one completes.

Using VAU multiplication operations and SAU horizontal summations, we can perform all the 64 required multiplications and 48 of the 64 required additions using 16 instructions. The remaining 16 additions can be performed using VAU (vertical) additions, accumulating the current result to the C buffer.

The VAU is thus the critical resource, being busy for at least 20 cycles in any instruction schedule, performing the multiplications and the C accumulation.

– Ionica, M. and Gregg, D. (2015). The Movidius Myriad Architecture's Potential for Scientific Computing. IEEE Micro, 35(1), pp.6-14.
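
The "at least 20 cycles" figure can be reproduced with a little arithmetic. The sketch below is my own reading of the quote, not the authors' calculation: I infer the number of vector lanes from "64 multiplications in 16 instructions", and I assume each VAU instruction keeps the unit busy for one cycle.

```python
import math

# Figures taken from the quoted passage.
MULTIPLICATIONS = 64
MUL_INSTRUCTIONS = 16
REMAINING_ADDITIONS = 16

# Assumption: lane count inferred from 64 multiplications in 16 instructions.
lanes = MULTIPLICATIONS // MUL_INSTRUCTIONS

# Vertical (lane-wise) additions needed for the C accumulation.
add_instructions = math.ceil(REMAINING_ADDITIONS / lanes)

# Assumption: one cycle of VAU occupancy per VAU instruction.
vau_busy_cycles = MUL_INSTRUCTIONS + add_instructions

print(lanes, add_instructions, vau_busy_cycles)
```

Under those assumptions the 16 multiply instructions plus the accumulation adds give the lower bound the paper mentions, which is why the VAU limits how tightly iterations can be overlapped.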


Kernel Recognition unrolls the innermost loop an “appropriate” number of times and searches for a repeating pattern. This repeating pattern then becomes the “loop body.”

Modulo Scheduling selects a (minimum) schedule for one loop iteration such that, when the schedule is repeated, no constraints are violated. Length of this schedule is called the initiation interval, II.

– Sweany, Software Pipelining by Modulo Scheduling
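
To make the initiation interval a bit more concrete, here is a minimal sketch of the resource side of modulo scheduling. It is a simplification of my own, not taken from Sweany: it ignores dependence constraints entirely, assumes every operation occupies its resource for exactly one cycle, and the names `res_mii`, `fits`, `ops` and `units` are hypothetical.

```python
import math
from collections import Counter


def res_mii(ops, units):
    """Resource-constrained lower bound on the initiation interval.

    ops:   list of (operation name, resource name) for one loop iteration
    units: dict mapping resource name -> number of identical units
    """
    uses = Counter(resource for _, resource in ops)
    return max(math.ceil(n / units[r]) for r, n in uses.items())


def fits(schedule, ops, units, ii):
    """Check a one-iteration schedule against a modulo reservation table.

    schedule maps operation name -> start cycle. Repeating the schedule
    every `ii` cycles is conflict-free only if no resource is claimed
    more often than it has units in any slot (start cycle mod ii).
    """
    table = Counter()
    for name, resource in ops:
        slot = schedule[name] % ii
        table[(resource, slot)] += 1
        if table[(resource, slot)] > units[resource]:
            return False
    return True


# Hypothetical four-operation loop body on one memory port and one ALU.
ops = [("load", "mem"), ("mul", "alu"), ("add", "alu"), ("store", "mem")]
units = {"mem": 1, "alu": 1}
schedule = {"load": 0, "mul": 1, "add": 2, "store": 3}

print(res_mii(ops, units))            # lower bound from resources alone
print(fits(schedule, ops, units, 2))  # can a new iteration start every 2 cycles?
print(fits(schedule, ops, units, 1))  # every cycle?
```

A real modulo scheduler would also compute a recurrence-constrained bound from loop-carried dependences and search for a schedule starting at the larger of the two bounds.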