
System Design Interview and beyond Note11 - Protect clients from server

Protect clients from servers

Synchronous and asynchronous clients

Admission control system

Blocking I/O and non-blocking I/O clients

  1. Admission control system: a system with load shedding and rate limiting -> sends errors back to the client -> HTTP 429 (Too Many Requests) and 503 (Service Unavailable)

  2. Clients should avoid immediate retries -> retrying with exponential backoff and jitter is better (see the retry sketch after this list)

    1. Delays give servers time to scale out
    2. Jitter distributes requests more evenly
  3. What if servers are not elastic and the problem is not transient?

  4. Blocking / non-blocking clients (sync and async)

    1. Blocking:
      1. When a thread wants to send a request to the server, it starts a new connection or reuses an existing one from the pool of persistent connections and executes the I/O operation ->
      2. The thread is blocked waiting for the I/O to complete
      3. New thread per connection
    2. Non-blocking:
      1. Application threads don't use connections directly; they put requests on a request queue
      2. A single I/O thread dequeues requests from the queue and establishes a connection or obtains one from the pool
      3. A single thread serves multiple requests
    3. (figure: blocking vs. non-blocking client threading model)
  5. Pros

    1. Sync: simplicity, easier to write, test, and debug client-side applications; latencies are lower for a small number of concurrent requests
    2. Async: higher throughput for a large number of concurrent requests -> more efficient at handling traffic spikes; more resilient to server outages and degraded server performance -> piling up requests in the client's queue is much cheaper than piling up threads
  6. The key difference: how they handle concurrent requests (threads)

  7. How many concurrent requests can a client generate? An example:

    (figure: example calculation of concurrent requests per client)

  8. Nowadays, the industry is shifting toward non-blocking I/O, away from blocking servers and blocking clients

  9. No matter which client model is used, it can suffer from this issue: when the incoming request rate exceeds the rate at which requests are processed and sent out, requests accumulate and exhaust resources

  10. One way to resolve this is to stop sending requests for a while to give the server time to recover -> circuit breaker pattern
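Below is a minimal sketch of the retry-with-exponential-backoff-and-jitter approach from item 2; sendRequest(), MAX_ATTEMPTS, and the delay constants are illustrative assumptions, not a real API:

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch of retry with exponential backoff and "full" jitter.
public class RetryWithBackoff {
  private static final int MAX_ATTEMPTS = 5;
  private static final long BASE_DELAY_MS = 100;
  private static final long MAX_DELAY_MS = 10_000;

  public static String callWithRetry() throws InterruptedException {
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
      try {
        return sendRequest();                                       // hypothetical remote call
      } catch (RuntimeException retryable) {                        // e.g., mapped from HTTP 429/503
        long cap = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * (1L << attempt));
        long sleepMs = ThreadLocalRandom.current().nextLong(cap);   // jitter: anywhere in [0, cap)
        Thread.sleep(sleepMs);                                      // back off before the next attempt
      }
    }
    throw new RuntimeException("all retry attempts failed");
  }

  private static String sendRequest() {
    throw new RuntimeException("simulated 503");                    // stand-in for a real HTTP call
  }
}
```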

Circuit breaker

Circuit breaker finite-state machine

Important considerations about the circuit breaker pattern

  1. When the client receives failed responses from the server, it counts them; when the count reaches a limit, it stops calling the server for a while;

  2. There are open-source libraries for this -> Resilience4j (Java) or Polly (.NET) -> you specify three things

    1. Exceptions to monitor
    2. Count of exceptions (threshold)
    3. How long to stop sending requests (timespan)
  3. A state machine that explains the state transitions:

    (figure: circuit breaker state machine: closed, open, half-open)

    If, while in the half-open state, some other (non-monitored) exception occurs, the breaker remains half-open

  4. Things to consider

    1. Use a timer or health checks to determine when to transition to the half-open state (e.g., a health-check endpoint)
    2. The circuit breaker instance has to be thread-safe -> instances are shared -> use locks
    3. Decide what to do with rejected requests -> buffer, failover, fallback, backpressure, cancel, ... (see the hand-rolled sketch after this list)
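Below is a minimal, hand-rolled sketch of such a state machine (this is not the Resilience4j or Polly API; class and parameter names are illustrative). For simplicity it holds the lock for the whole call, which a real implementation would avoid:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Hand-rolled circuit breaker sketch: the threshold, the open duration, and the
// monitored exception type correspond to the three things listed above.
public class SimpleCircuitBreaker {
  private enum State { CLOSED, OPEN, HALF_OPEN }

  private final int failureThreshold;
  private final Duration openDuration;

  private State state = State.CLOSED;
  private int failureCount = 0;
  private Instant openedAt = Instant.MIN;

  public SimpleCircuitBreaker(int failureThreshold, Duration openDuration) {
    this.failureThreshold = failureThreshold;
    this.openDuration = openDuration;
  }

  // synchronized keeps the shared state consistent when the instance is shared across threads
  public synchronized <T> T call(Supplier<T> action) {
    if (state == State.OPEN) {
      if (Instant.now().isAfter(openedAt.plus(openDuration))) {
        state = State.HALF_OPEN;                 // timer expired: allow a trial request
      } else {
        throw new IllegalStateException("circuit is open, request rejected");
      }
    }
    try {
      T result = action.get();
      state = State.CLOSED;                      // success closes the circuit
      failureCount = 0;
      return result;
    } catch (RuntimeException monitored) {       // the exception type we monitor
      failureCount++;
      if (state == State.HALF_OPEN || failureCount >= failureThreshold) {
        state = State.OPEN;                      // trip (or re-trip) the breaker
        openedAt = Instant.now();
      }
      throw monitored;
    }
  }
}
```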

Fail-fast design principle

Problems with slow services (chain reactions, cascading failures) and ways to solve them

  1. Bad service

    1. Doesn't work (low availability)
    2. Can't withstand faults (not fault-tolerant)
    3. Can't quickly recover from failures (not resilient)
    4. Doesn't always return accurate results (not reliable)
    5. Loses data from time to time (not durable)
    6. Can't scale quickly (not scalable)
    7. Returns unpredictable results (supports only a weak consistency model)
    8. Hard to maintain (doesn't follow operational excellence guidelines)
    9. Poorly tested (low unit/functional/integration/performance test coverage)
    10. Not secure (violates CIA triad rules)
    11. ...
  2. It's better to fail fast than to fail slow. Fail immediately and visibly -> applies to both distributed systems and OOD

  3. e.g.

    1. Object initialization: throw an exception if the object cannot be initialized completely:

      1. ```java
         public class SomeClass {
           private final String username;  // immutable object

           public SomeClass(@NotNull String userName) {
             this.username = userName;
           }
           // ...
         }
         ```
    2. Preconditions: implement preconditions for input parameters in a function:

      1. ```java
         // Preconditions is Guava's com.google.common.base.Preconditions
         public static double sqrt(double value) {
           Preconditions.checkArgument(value >= 0.0);
           // ...
         }
         ```
    3. Configuration validation: read properties from the configuration file, fail fast, and don't silently fall back to default values:

      1. ```java
         public int maxQueueSize() {
           String property = Config.getProperty("maxQueueSize");
           if (property == null) {
             throw new IllegalStateException("...");
           }
           // ...
         }
         ```
    4. Request validation: return an exception to the client; don't substitute a default value and continue handling the request:

      1. ```java
         private void validateRequestParameter(String param) {
           if (param == null) {
             throw new IllegalArgumentException("...");
           }
           // ...
         }
         ```
  4. Slow services kill themselves along with their servers, e.g.

    1. (figure: notification service: producers, queue, sender threads, consumers)

    2. When a single consumer starts to slow down, all the sender threads are impacted; the notification queue is consumed more slowly and fills up with messages, so producers slow down and can no longer push messages to the queue, nor are messages pulled from it (backpressure).

      1. Cascading failure: the failure of one component causes other, different kinds of components in the system to fail

        (figure: cascading failure)

      2. Chain reaction: the server identifies the slow consumer and stops sending messages to it (e.g., via a circuit breaker) -> this puts more load on the other consumers and slows them down too (failure spreads among components of the same type)

        (figure: chain reaction)

      3. Against cascading failures, clients should protect themselves -> turn slow failures into fast failures by identifying and isolating bad dependencies.

        • Timeouts (see the timeout sketch after this list)
        • Circuit breaker
        • Health checks
        • Bulkhead
        • Against chain reactions, servers need to protect themselves:
          • Load shedding
          • Auto scaling (quickly add redundant capacity)
          • Monitoring (quickly identify and replace failed servers)
          • Chaos engineering
          • Bulkhead
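As a small illustration of the timeouts point, here is a sketch using Java's built-in java.net.http.HttpClient; the URL and the timeout values are placeholders, not recommendations:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Fail fast instead of waiting indefinitely on a slow dependency.
public class FastFailingClient {
  private static final HttpClient CLIENT = HttpClient.newBuilder()
      .connectTimeout(Duration.ofSeconds(1))     // bound the time to establish a connection
      .build();

  public static String fetch() throws Exception {
    HttpRequest request = HttpRequest.newBuilder(URI.create("https://inventory.example.com/items"))
        .timeout(Duration.ofSeconds(2))          // bound the time to receive a response
        .GET()
        .build();
    // Throws java.net.http.HttpTimeoutException when the dependency is too slow,
    // turning a slow failure into a fast, visible one.
    HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
    return response.body();
  }
}
```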

Bulkhead

How to implement

  1. Partition resources into isolated groups of limited size, to isolate the impact of failed parts from the healthy parts.

  2. In the example above, instead of a single thread pool we can have a separate thread pool per consumer, each with a limited number of threads. When one consumer and its pool slow down, the others are not impacted.

  3. More examples -> this pattern is used in a service that has many dependencies

    1. We have an order service that depends on an inventory service and a payment service
    2. We have three options
      • Limit the number of connections to each dependency - easy to implement, but we have to trust that clients won't misuse the service
        • Set up a separate connection pool for each dependency
        • Specify the size of each pool
      • Limit the number of concurrent requests to each dependency by counting requests (see the semaphore sketch after this list)
        • Count how many simultaneous requests are in flight to each dependency
        • When the limit is reached, further requests to that dependency are immediately rejected
      • Limit the number of concurrent requests by creating per-dependency thread pools - highest isolation, but expensive and complex
        • Create a separate fixed-size thread pool for each dependency

    (figure: order service with per-dependency bulkheads for the inventory and payment services)

  4. Finding the right group limits can be hard -> these limits need to be revised from time to time -> use load tests
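A sketch of the request-counting option (the second option above), using one Semaphore per dependency; class names and limits are illustrative:

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Bulkhead by counting concurrent requests: a Semaphore per dependency caps
// in-flight calls; once the cap is reached, further requests are rejected immediately.
public class DependencyBulkhead {
  private final Semaphore permits;

  public DependencyBulkhead(int maxConcurrentRequests) {
    this.permits = new Semaphore(maxConcurrentRequests);
  }

  public <T> T execute(Supplier<T> call) {
    if (!permits.tryAcquire()) {                 // non-blocking: reject instead of queueing
      throw new IllegalStateException("bulkhead full, request rejected");
    }
    try {
      return call.get();
    } finally {
      permits.release();
    }
  }
}

// Usage: one bulkhead per dependency, so a slow payment service cannot consume
// the capacity reserved for calls to the inventory service.
// DependencyBulkhead inventoryBulkhead = new DependencyBulkhead(20);
// DependencyBulkhead paymentBulkhead   = new DependencyBulkhead(10);
```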

Shuffle sharding

How to implement

  1. A bad client (one that makes servers suffer)
    1. Creates a flood of requests (way more than a typical client)
    2. Sends expensive requests (CPU-intensive, large payloads, heavy responses)
    3. Generates poisonous requests (that trigger security bugs)
  2. Load shedding and rate limiting are not enough -> the whole system should act as one organism, with every part making an effort to protect itself
  3. Example: we have 8 servers, divided into 4 shards, each serving certain clients -> difference from bulkhead: bulkhead is about availability, sharding is about scalability
  4. Blast radius = number of clients / number of shards
  5. With shuffle sharding, two clients that share a server in one shard do not share servers in the other shards -> this reduces the impact of a bad client -> for example, with plain sharding, if client 1 takes its shard down, client 5 on the same shard is also down; with shuffle sharding, when one server goes down, clients 4 and 7 are impacted on that server but can still be served by servers 6 and 8
  6. (figure: shuffle sharding example)

  7. Note

    1. It's impossible to fully isolate every client from every other client -> but the chance of overlap decreases, and fewer servers are impacted
    2. Clients need to know how to handle server failures -> time out and retry failed requests
    3. Shuffle sharding needs an intelligent routing component
    4. Clients can be assigned to shuffle shards in a stateless or stateful manner (see the sketch below)
      1. Stateless: when assigning a new client, we don't look at the current assignments
      2. Stateful: keep a store of shard and client assignments, and choose a shard that minimizes overlap with existing clients
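A sketch of the stateless assignment: the shard is derived only from the client id, without looking at other clients' assignments. The server names, shard size, and the use of hashCode() as a shuffle seed are simplifying assumptions:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Stateless shuffle-shard assignment: the same client id always maps to the same
// pseudo-random subset of servers; different clients rarely share the whole subset.
public class ShuffleSharding {
  public static List<String> shardFor(String clientId, List<String> servers, int shardSize) {
    List<String> copy = new ArrayList<>(servers);
    Collections.shuffle(copy, new Random(clientId.hashCode()));  // deterministic per client
    return copy.subList(0, shardSize);
  }

  public static void main(String[] args) {
    List<String> servers = List.of("s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8");
    System.out.println(shardFor("client-1", servers, 2));  // this client's 2-server shard
    System.out.println(shardFor("client-7", servers, 2));  // may overlap on one server, rarely on both
  }
}
```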