
System Design Interview and beyond Note7 - Efficient Communication

Push vs Pull

Pros and cons of push and pull model

  1. Push: broker pushes messages to the consumer (broker in charge)
    1. Pro: optimized for latency: the broker can send a message out immediately;
    2. Pro: client (consumer) logic is lightweight -> it only receives messages and doesn't track message status;
    3. Pro: load balancing among competing consumers;
    4. Con: flow control -> the broker must send faster or slower depending on each consumer's consumption speed;
    5. Con: because of this flow control, the broker logic is more complex.
  2. Pull: consumer pulls data from the broker (consumer in charge)
    1. Con: polling in a tight loop: the consumer needs to check frequently;
    2. Con: client logic is more complex;
    3. Pro: optimized for throughput -> the consumer controls its own consumption speed;
    4. Pro: better suited for message replay (see the sketch after this list);
      1. The push model sends a message out and may then delete it; if we don't delete it and want to keep it for a while, the challenge is tracking the next message position for every consumer;
      2. In the pull model, each consumer tracks the last message it processed and sends that position to the broker when it requests (pulls) the next messages; if a bug is found and fixed 24 hours later, consumers can rewind their position 24 hours back and replay all messages one more time.
    5. The pull-based model is better suited for replay, though push can support it too.
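A minimal sketch of the pull-side bookkeeping in Python, assuming a hypothetical `broker.fetch(topic, offset, max_messages)` API: the consumer owns its position and can rewind it to replay old messages.

```python
class PullConsumer:
    """Sketch of a pull-model consumer that owns its position (offset).

    `broker.fetch(topic, offset, max_messages)` is a hypothetical API,
    standing in for whatever fetch-by-offset call the broker exposes.
    """

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0  # next message position, tracked by the consumer

    def poll(self, max_messages=100):
        messages = self.broker.fetch(self.topic, self.offset, max_messages)
        self.offset += len(messages)  # advance only after a successful fetch
        return messages

    def rewind(self, offsets_back):
        # Replay: move the position back and re-consume old messages,
        # e.g. after fixing a consumer-side bug.
        self.offset = max(0, self.offset - offsets_back)
```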

Host Discovery

How to design a DNS-like system

How DNS works

Anycast network routing method.

  1. Consumer and broker need to find each other.

  2. Let's start with a single-machine setup:

    1. Broker has a host name and ip,

    2. Consumer starts up and looks up the broker's IP address by hostname in DNS


  3. DNS: like a phone book; it maintains a huge list of hostnames and translates them to IP addresses

  4. One server cannot hold all this data -> data is chunked (partitioned) -> to accelerate lookups -> B-tree indexes in storage.

    1. How to chunk (split) -> by range or by domain (com, org, info, ...)
  5. But if every request went to the root server and did a level-by-level lookup query, the root would be a bottleneck and a single point of failure

    1. Single point of failure -> redundancy -> each node is actually a set of servers.
    2. Performance -> use caches


  6. Actual workflow

    1. Consumer looks for the IP of the broker;
    2. Consumer checks the local cache in the OS; if found, return the IP, else
    3. Not found -> talk to the ISP's DNS resolver
    4. The ISP resolver checks its cache; if found, return the IP, else
    5. Not found -> talk to the root servers -> TLD DNS server -> ... -> final level;
    6. The authoritative DNS server returns the IP and a TTL to the ISP DNS resolver
    7. The resolver then returns the answer to the consumer (a lookup sketch follows at the end of this section)
  7. Basic knowledge

    • Hundreds of root name servers are spread all over the world
    • Root servers are divided into 13 sets;
    • 12 organizations operate the 13 sets of servers;
    • Each set of root servers has its own IP, and every ISP DNS resolver knows the list of the 13 root server addresses
    • Anycast routing is used to distribute requests across root servers based on load and proximity ("distance")
    • Caching is used in different places to off-load the root servers (browser, OS, ISP resolvers)
  8. Anycast: a network routing method

    1. Typically, a machine has one IP address;

    2. Anycast allows multiple hosts to share the same IP address within a network

    3. Routers in the network select a single host from the group based on a least-expensive routing strategy.


  9. Pros

    1. Anycast provides high availability -> if a single machine goes down, others still serve the IP
    2. Anycast -> load balancing
    3. Widely used in DNS and CDN
    4. Helps mitigate DDoS -> requests are spread among servers
  10. DNS records types

    | Name | Type | Value |
    | --- | --- | --- |
    | foo.example.com | A | 192.0.1.18 |
    | bar.example.com | CNAME | foo.example.com |
    | example.com | TXT | "some text" |
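For illustration, here is what the lookup looks like from the consumer's side in Python; the hostname is a placeholder. Under the hood, the standard resolver walks exactly the chain above (OS cache -> ISP resolver -> root/TLD/authoritative servers on a miss).

```python
import socket

# Resolve a broker's hostname the way a consumer would: the OS checks its
# local cache / resolver configuration first, then asks the configured DNS
# resolver, which walks root -> TLD -> authoritative servers on a miss.
host = "foo.example.com"  # hypothetical broker hostname
try:
    ip = socket.gethostbyname(host)
    print(f"{host} -> {ip}")
except socket.gaierror as err:
    print(f"resolution failed: {err}")
```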

Service Discovery

Server-side and client-side discovery patterns

Service registry and its applications

  1. In microservices, when all instances of service A want to identify all instances (host and port) of service B, how do they find them?

  2. Server-side discovery: use a reverse proxy or load balancer that knows all the instances of service B; service A sends requests to the load balancer, which forwards each request to an available service B instance.

  3. Client-side discovery: use a service registry; service B reports all its instance info to the registry, and instances of service A fetch that info from the registry, then send requests to service B directly;

  4. Although DNS resembles server-side discovery and could be used, it's not the best fit:

    1. In big clusters, servers are added and removed quickly, and some may be unavailable;
    2. DNS, with its propagation delays and caching, doesn't suit big, dynamic clusters
  5. Service registry in detail (a minimal sketch follows at the end of this section):

    1. Several registry servers talk to each other (HTTP/TCP)
    2. Each service instance sends heartbeats to the registry
      1. If heartbeats stop, the instance is unregistered


    3. Service A may cache the registry data, so that even if the service registry is unavailable, service A can still send requests to B
  6. Implementation

    1. In-application (library)
      1. Pro: implemented in the same language as the application
      2. Con: requires a new client library for every language
    2. Daemon (a separate sidecar process)
      1. Pro: works with any programming language
      2. Pro: less error prone, since the daemon has its own memory and faults are isolated
      3. Con: harder to implement
      4. Pro: when several applications run on one machine, only one registry client is needed (not one per application; its information can be shared)

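A minimal in-memory sketch of such a registry, assuming a made-up 30-second heartbeat timeout: instances register, send periodic heartbeats, and are silently unregistered once heartbeats stop.

```python
import time

HEARTBEAT_TIMEOUT = 30  # seconds without a heartbeat before unregistering

class ServiceRegistry:
    """Toy registry: instances register, heartbeat, and expire."""

    def __init__(self):
        self.instances = {}  # (service, host, port) -> last heartbeat time

    def register(self, service, host, port):
        self.instances[(service, host, port)] = time.monotonic()

    def heartbeat(self, service, host, port):
        # A heartbeat simply refreshes the instance's timestamp.
        self.instances[(service, host, port)] = time.monotonic()

    def lookup(self, service):
        self._expire()
        return [(h, p) for (s, h, p) in self.instances if s == service]

    def _expire(self):
        # Drop every instance whose heartbeats have stopped.
        now = time.monotonic()
        self.instances = {
            key: ts for key, ts in self.instances.items()
            if now - ts < HEARTBEAT_TIMEOUT
        }
```

A client of service A would call `lookup("B")` and, per the note above, may cache the result in case the registry becomes unavailable.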

Peer Discovery

Peer discovery options

Membership and failure detection problems

Seed nodes

How gossip protocol works and its applications

  1. How do service A and service B find the service registry? And how do the registry servers find each other?

  2. Option 1: enumerate all the service registry info in a list and share that list with the other registries and service instances. -> Apache ZooKeeper uses this.


  3. In the option above, the info list is deployed to a machine or uploaded to a shared database -> updating it requires editing the list and redeploying or re-uploading it

  4. DNS TXT records -> Netflix Eureka adopts this approach.

  5. Gossip protocol -> as long as nodes are connected, info given to one node propagates through the whole network.

  6. Use case for peer discovery

    1. A newly added machine can quickly identify the other machines in the cluster
    2. Machines in the cluster quickly identify a failed machine;
    3. Every machine knows its peers
    4. -> membership recognition and failure detection (a sketch follows at the end of this section)
  7. Gossip algo

    1. Server A is in the cluster
    2. Server B joins the cluster; how do they learn about each other?
    3. Server A is a seed node, known to all the others;
    4. Node B needs to somehow learn node A's info -> from a static file containing the hostnames of the seed nodes, or from DNS
    5. Node A and node B exchange info with each other;
    6. Node C joins and follows the same pattern: node C and node A exchange info -> node C learns about B -> B and C can start gossiping with each other
  8. Cassandra uses this gossip-based approach.

  9. How does the cluster know that a node is down? -> heartbeat counter -> an integer sent along with gossip messages -> each node maintains a list of the other nodes' heartbeat counters and timestamps (last update) -> nodes check the list periodically, and if a counter hasn't increased for a while -> mark the node dead -> stop gossiping with the dead node -> but still occasionally talk to it to see if it's back

  10. Gossip protocol properties

    1. Peer-to-peer
    2. Decentralized (no coordinator)
    3. Fault tolerant (a few nodes going down doesn't affect the cluster)
    4. Lightweight, fast, and efficient
  11. Gossip application

    1. Databases (Riak, Cassandra, Dynamo)
    2. Service discovery systems (Consul)
    3. Object storage (S3 spreads info across the system)
    4. Distributed caches (Redis shares info between nodes)
    5. Distributed rate limiting
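A rough sketch of the heartbeat-counter mechanism described above; the table layout and the 10-second threshold are assumptions for illustration, not any specific system's protocol.

```python
import time

FAIL_AFTER = 10  # seconds without counter growth before marking a peer dead

class GossipNode:
    """Toy gossip membership: each node increments its own heartbeat
    counter and merges peer tables when a gossip message arrives."""

    def __init__(self, name, seeds):
        self.name = name
        self.heartbeat = 0
        # peer -> (heartbeat counter, local time we last saw it increase)
        self.peers = {seed: (0, time.monotonic()) for seed in seeds}

    def tick(self):
        self.heartbeat += 1  # incremented periodically, sent with gossip

    def merge(self, remote_table):
        # Called on an incoming gossip message: keep the freshest counter.
        now = time.monotonic()
        for peer, counter in remote_table.items():
            local = self.peers.get(peer)
            if local is None or counter > local[0]:
                self.peers[peer] = (counter, now)

    def dead_peers(self):
        # Peers whose counter hasn't grown recently are considered down;
        # we stop gossiping with them but may still probe occasionally.
        now = time.monotonic()
        return [p for p, (_, seen) in self.peers.items()
                if now - seen > FAIL_AFTER]
```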

How to choose a network protocol

When and how to choose between TCP, UDP and HTTP network protocols

  1. We know how nodes find each other -> next, how do they talk to each other? -> network protocols

  2. TCP:

    1. Reliable and ordered

    2. Connection oriented -> a connection is established before packets are transmitted

      1. Three-way handshake


      2. Gossiping follows a similar handshake-like pattern

      3. Clients may establish multiple connections to servers; TCP connections consume resources (mainly memory) -> modern servers can handle hundreds of thousands of simultaneous TCP connections (e.g., for IoT); but handling many requests is not the same as handling many connections.

        1. Thread per connection is one (poorly scaling) way to handle them
    3. Fast and reliable -> use when speed matters and we need to avoid losing data.

  3. HTTP: defines how network messages are framed by the client and reassembled by the server

    1. HTTP/1.0: one connection per request
    2. HTTP/1.1: persistent connections: multiple requests over one connection;
    3. Raw TCP is not enough for security purposes: in a private network it's fine, but on the public internet arbitrary TCP traffic (direct TCP/UDP requests) is usually not allowed -> need HTTP
    4. A bit slower, but reliable
    5. Good for public APIs
  4. UDP: no connection, no handshake (see the sketch below)

    1. Fast but not reliable -> use when data loss is tolerable
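A small sketch contrasting the two with Python's standard socket API; the UDP peer address is a placeholder from the documentation range.

```python
import socket

# TCP: connection-oriented, reliable, ordered byte stream; the handshake
# happens inside create_connection() before any payload is sent.
with socket.create_connection(("example.com", 80), timeout=5) as tcp:
    tcp.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
    reply = tcp.recv(4096)  # delivery and ordering are handled by TCP

# UDP: connectionless fire-and-forget datagram; it may be lost or
# reordered, which is acceptable for e.g. heartbeats or cache traffic.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.sendto(b"heartbeat", ("192.0.2.10", 9999))  # placeholder peer address
udp.close()
```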

Network protocols in real-life systems

  1. What network protocol should we use to publish messages to the messaging system and to retrieve messages?

    Depends on the user: internal, external, or both

    • Public: HTTP -> AWS SQS, SNS, Kinesis;
    • Internal: TCP -> provides maximum throughput -> Kafka -> uses a custom binary protocol over TCP
      • Producers and consumers initiate socket connections to the broker machines and maintain these persistent connections;
      • We can still access a Kafka cluster from the internet by:
        • Opening TCP ports to the world, or
        • Using proxy tools
  2. A microservice called by clients and other services -> which protocol to choose?

    1. HTTP for both internal and public callers -> RESTful;
    2. If performance matters and it's private -> TCP/UDP -> Thrift supports this.
  3. Database

    1. If it's a public cloud database service -> HTTP -> DynamoDB
    2. More often -> the database is not publicly accessible -> reliable and fast -> TCP -> Oracle, Cassandra
  4. Database replication

    1. Reliability and performance matter -> TCP
    2. CouchDB uses HTTP (which is fine for it)
  5. Gossip membership -> heartbeats / distributed cache problems / rate limiting (servers share their counts with each other) / leader election -> which protocol?

    1. Membership problem: TCP or UDP (gossip runs many rounds, so lost messages are tolerated)
    2. Distributed cache: TCP or UDP (some cache misses are fine)
    3. Rate limiting: both are fine (TCP is slower but more accurate; UDP is fast but may under-report the request count)
    4. Leader election: both are fine; consider cluster size: for a small cluster both work, but for a big one TCP causes more overhead due to connection establishment.

Videos over HTTP

Problems of media streaming

Adaptive streaming

  1. It depends: web browser / mobile / ...

  2. Mobile native applications -> all three options below are available -> supported by libraries

  3. Web browsers -> all three options are available -> supported by libraries

    1. WebRTC: a collection of protocols and interfaces that enable P2P communication. UDP by default; high quality
    2. WebSocket: TCP
    3. Adaptive streaming: HTTP
  4. Older model: the player downloads the whole video as a single file over HTTP

  5. Better user experience requires switching video quality / switching language -> which this model can't do

  6. Also, network conditions are not considered; and to support multiple languages or video qualities, we would need to upload a full video file for every resolution-and-language combination -> mitigated by splitting video files into chunks.

  7. In the single-file model, we would also need to re-download the whole file each time the user changes the language or video resolution option.

  8. Chunks -> each is a 2-10s segment.

  9. The media player in the browser (client) uses a bitrate adaption algorithm to select the highest bitrate its connection allows; when the network degrades, it selects lower-quality segments; when the network recovers, higher-quality segments are downloaded again (see the sketch after this list).


  10. Same idea applies to the audio selection

  11. Metadata file (index/manifest): lets clients know what video/audio qualities are available -> downloaded first


  12. Live streaming uses the same idea

    1. An uploader uploads a video / pushes a live stream -> the files are persisted in storage
    2. At the same time, jobs are triggered to generate segments for each quality / language
    3. These segment files are stored in storage, and the index/manifest is stored in a separate database;
    4. When a viewer opens the browser and requests a video, the web server sends a player app to the browser;
    5. The player then downloads the manifest file and plays the best available quality (segment files are retrieved from storage)
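To make the manifest and bitrate-adaption ideas concrete, here is a minimal Python sketch. The rendition list mimics what a master manifest would advertise; all numbers are illustrative, not from any real service.

```python
# Renditions a master manifest might advertise (bitrate kbit/s, resolution).
# Illustrative values only.
RENDITIONS = [
    (400, "426x240"),
    (1200, "640x360"),
    (2500, "1280x720"),
    (5000, "1920x1080"),
]

def pick_rendition(measured_kbps, safety=0.8):
    """Return the highest-bitrate rendition that fits within the measured
    throughput, with a safety margin so playback doesn't stall."""
    budget = measured_kbps * safety
    usable = [r for r in RENDITIONS if r[0] <= budget]
    return usable[-1] if usable else RENDITIONS[0]

# The player re-measures throughput after each 2-10 s segment download and
# re-picks, which is how quality adapts down and back up with the network.
print(pick_rendition(3000))   # -> (1200, '640x360')
print(pick_rendition(12000))  # -> (5000, '1920x1080')
```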

CDN

How to use CDN

How CDN works

Point of presence(POP)

Benefits of CDN

  1. Where and how to store content? -> two things to consider:

    1. Read latency for user requests from India, China, or Europe (if the storage is in the US);
    2. Scalability: server capacity and network bandwidth for many requests
  2. Bring content closer to viewers via replication and caching -> this solution is a CDN.

  3. CDN: a distributed network of servers placed across the globe with the purpose of delivering web content to users as fast as possible.

  4. How to use CDN:

    1. An image is uploaded to an application server;

    2. The server registers the image with the CDN;

    3. The CDN returns a new URL to the service;

    4. When a user requests the image, the new URL is served from the closest CDN server;

    5. If that CDN server doesn't have the resource, it makes a request to the origin server -> like a read-through cache mechanism -> this whole scheme is called a pull CDN (a sketch follows at the end of this section).

    6. Push CDN: content is pushed to the CDN servers every time it is published to the origin server.

    7. A TTL needs to be configured (HTTP headers can help control it)


    8. POP (point of presence): a set of geographically located servers; each POP has two types of servers:

      1. Cache servers: cache the content
      2. Request routing servers: know all cache servers in the current POP
    9. The request routing server's address is registered with the ISP; the ISP's DNS resolver routes to the closest POP -> how? -> anycast -> the browser then makes a call to a cache server in that POP;

    10. When a POP is busy or unavailable -> it may forward requests to other POPs;

    11. Video delivery -> update the manifest to point at CDN URLs

    12. Benefits

      • Performance (low latency due to shorter distance);
      • Scalability (horizontally scalable POPs);
      • Reliability (due to data redundancy, failover to a different server or POP, load balancing);
      • Anycast-based load balancing helps mitigate DDoS.
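A toy read-through sketch of the pull-CDN behavior described above; `ORIGIN` stands in for the origin server, and the 300-second TTL is an arbitrary example value.

```python
import time

ORIGIN = {}  # stands in for the origin server's content store
TTL = 300    # seconds a cached object stays fresh (example value)

class PopCache:
    """Pull-CDN sketch: a POP cache that fills itself from the origin
    on a miss (read-through) and respects a TTL."""

    def __init__(self):
        self.store = {}  # url -> (content, fetched_at)

    def get(self, url):
        entry = self.store.get(url)
        if entry and time.monotonic() - entry[1] < TTL:
            return entry[0]  # cache hit: served from the edge
        content = ORIGIN[url]  # cache miss: fetch from the origin server
        self.store[url] = (content, time.monotonic())
        return content
```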

Push and pull technologies

Short polling

Long polling

WebSocket

Server-sent events

  1. Short polling: the client makes calls to the server, and the server responds with whatever data it has;

    1. A response may contain none, one, or several messages
    2. Simple to understand and easy to implement
    3. Hard to choose the poll interval: on the one hand we want data pulled as fast as possible; on the other hand, frequent polling consumes too many resources (mostly on the broker side: authentication, authorization, connection management, rate limiting, TLS termination, request validation, ...)
  2. Long polling: the client makes a call to the broker; if the broker has messages it responds immediately, but if not, it holds the request open -> two things can happen: a new message arrives at the broker and is sent to the consumer, or no messages arrive and the broker sends an empty response after a waiting time

    1. This waiting time can be configured in a static file, or
    2. Passed from the broker
  3. After a response is returned, the consumer immediately initiates a new request to the broker (see the sketch at the end of this section);

  4. How can data be pushed from the server?

    1. TCP is bidirectional, but HTTP doesn't allow the server to use the connection at its own discretion;
    2. Raw TCP sends a bidirectional byte stream (not developer friendly)
  5. Long polling is more suitable for messaging systems

    1. Long polling reduces the number of empty responses;
  6. Short polling fits when the consumer wants an immediate response -> e.g., a consumer uses a single thread to poll multiple queues -> with long polling, that thread may sit waiting on a queue with no messages -> the other queues are blocked even if they have messages -> best practice -> one thread per queue

  7. WebSocket

    1. Message-based protocol
    2. Works over HTTP ports 80 and 443 (the secure port)
    3. Reuses the TCP connection initially established for an HTTP request
    4. Supported by all major browsers and web servers
  8. Everything starts from the client, with an additional header field, Upgrade: websocket; if the server supports it, it responds with the same field; once the handshake completes, the initial TCP connection is used to exchange WebSocket messages

    1. Fast
    2. Bi-directional
    3. Natively supported by browsers
    4. An alternative to raw TCP when TCP is not allowed for security reasons.
  9. Server push: sometimes we don't need bidirectional communication -> e.g., the server pushes notifications to the client (server-sent events, SSE)

  10. Workflow (SSE)

    1. The client sends a request (e.g., for mail)
    2. The server responds with JS code that establishes a persistent HTTP connection (Connection: keep-alive);
    3. The server checks for new mail and responds to the client each time new mail arrives;
    4. If the connection drops, the client tries to reconnect
    5. Message data is text; messages can carry JSON/XML payloads
  11. Pros and cons

    1. Pro: simplicity (easy to implement on both client and server); Server-Sent Events are sent over plain HTTP (no custom protocol needed)
    2. Con: unidirectional (server -> client); limited to text data (UTF-8, no binary); a single browser can keep only a limited number of such connections.
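A minimal long-polling consumer loop, as a sketch: the broker endpoint and its `wait` query parameter are hypothetical stand-ins for whatever API the broker exposes.

```python
import json
import urllib.request

# Hypothetical broker endpoint: it holds the request open for up to `wait`
# seconds and responds early as soon as messages arrive.
URL = "http://broker.internal/v1/queues/orders/messages?wait=20"

def consume_forever(handle):
    while True:
        try:
            with urllib.request.urlopen(URL, timeout=25) as resp:
                messages = json.load(resp)  # empty list if the wait expired
        except OSError:
            continue  # timeout or network hiccup: start the next long poll
        for msg in messages:
            handle(msg)
        # Loop immediately: a new long poll is issued after every response.
```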

Push and pull technologies in real-life systems

What tech would you choose for various system designs

  1. Warm up

    | Tech | Push/Pull | Short/long polling | Notes |
    | --- | --- | --- | --- |
    | AWS SQS | Pull | Both | Normally uses long polling to eliminate empty responses |
    | Apache Kafka | Pull | Both | Consumers can fetch often (short-polling-like) or wait until more messages arrive (long-polling-like); to accept HTTP requests, put a proxy in front that talks to Kafka over TCP |
    | AWS Kinesis | Pull (HTTP) | Short-polling-like | Receive a batch of messages, process them, then call for the next batch; if there are no messages, the client sleeps for one second |
    | RabbitMQ | Push | - | WebSocket |
    | Apache ActiveMQ | Push | - | WebSocket |
  2. Choosing in real cases

    | App | Possible solutions | Reasons |
    | --- | --- | --- |
    | Real-time chat | WebSocket; long polling with HTTP; server push with HTTP | Bidirectional and reliable; long polling also works and supports older browsers; server push over an HTTP response is easy to implement |
    | News feed | WebSocket; long polling; short polling; server push | |
    | Stock prices | WebSocket; server-sent events | High-frequency messages with small payloads; long/short polling fits poorly because the message headers are big compared to the message content itself |
    | Browser game | WebSocket | |
    | Collaborative editor | WebSocket; long polling with HTTP; server-sent events | |

    All of these can work, but some fit better than others. For example, we can push notifications to clients using WebSocket, but if clients never send messages back, SSE is sufficient.

  3. Always consider pros and cons; try to simplify things and avoid over-engineering.

Large-scale push architectures

C10K and C10M problems

Examples of large-scale push architectures

The most noticeable problems of handling long-lived connections at large scale

  1. From the server's perspective, handling thousands of concurrent connections may be an issue -> C10K: how many concurrent connections a single server can handle -> solved

  2. C10M: handle 10M concurrent connections on a single machine -> also solved

  3. Handling millions of concurrent requests is not the same as handling millions of concurrent connections.

    1. Handling millions of concurrent requests is about processing speed (and remains out of reach for a single machine);
    2. Handling millions of concurrent connections is about scheduling connections efficiently.
  4. Mail.Ru architecture

    1. The first version used short polling -> 60% of requests returned empty responses
      1. The browser made an HTTP request every few seconds;
      2. The server called the storage frequently;
    2. Then WebSocket
      1. The browser establishes a connection to the server;
      2. When there are new messages, they are pushed from storage to a queue;
      3. When the server gets the email event, it sends a message to the user's browser;
  5. Netflix (streaming service)

    1. Everything starts from the API gateway (Zuul), which performs authentication and authorization, logging and monitoring, request routing and load balancing, rate limiting and load shedding, etc., and supports push messaging (WebSocket and server-sent events)
    2. The user opens the Netflix app; a persistent connection is established to a Zuul server;
    3. After successful authentication, the server registers the user in a push registry (Cassandra, DynamoDB, Redis), which stores which user is connected to which Zuul server -> so that messages can be pushed to the right client
    4. Message producers (backend services) send messages to a message queue service (Kafka); each message carries the ID of the user it should be delivered to
    5. A message processor consumes from Kafka and looks up the target user in the push registry;
    6. If found, the user's app is still connected to an API gateway (Zuul) server, so the message processor connects directly to that Zuul server and passes the message to it
    7. The server pushes the message to the user;
    8. If not found (the user is no longer connected), the message is dropped.
  6. A single Zuul server can handle tens of thousands of connections; a cluster of such servers can handle millions.

  7. Push registry -> use a highly available, low-latency store

    1. A write happens once, when the app connects to a Zuul server; a read happens for every message (many more reads than writes)
  8. Kafka is highly scalable

  9. Message processors: many instances run in parallel -> auto scaling -> Netflix runs in the cloud -> cost savings

  10. Handling millions of long-lived connections is a challenge:

    1. Blocking I/O: thread-per-connection scales poorly -> non-blocking I/O is better -> Zuul is based on Netty (non-blocking); see the sketch below
    2. Server restarts -> we can force clients to reconnect to a different server, or migrate connections without reconnecting clients -> avoid the thundering herd problem (many clients trying to establish new connections at the same time) -> shut down slowly, not all at once
    3. Server failures: many small servers are better than a few big ones -> clients will try to connect to other servers, and if one big server goes down, a huge number of clients try to reconnect elsewhere at the same time -> hard on the system
    4. Older load balancer versions cut WebSocket connections: connections are dropped after some period of inactivity; newer versions support WebSocket proxying natively. Another option is to run a TCP load balancer at layer 4 instead of an HTTP load balancer at layer 7
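A minimal non-blocking server sketch using Python's asyncio (conceptually similar to what Netty enables for Zuul): one cheap coroutine per connection instead of one OS thread, which is what makes holding huge numbers of mostly idle connections feasible.

```python
import asyncio

async def handle(reader, writer):
    # One lightweight coroutine per connection instead of one OS thread;
    # idle connections cost almost nothing while they wait for data.
    while data := await reader.readline():
        writer.write(data)   # echo the line back
        await writer.drain()
    writer.close()           # client closed its side: release resources

async def main():
    server = await asyncio.start_server(handle, "0.0.0.0", 8888)
    async with server:
        await server.serve_forever()

asyncio.run(main())
```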