
System Design Interview and beyond Note9 - Deliver Data quickly

How to deliver data quickly

Batching

Pros and cons of batching

How to handle batch requests

  1. Batching: group multiple messages into a single request
  2. Pros:
    1. Higher throughput: fewer requests, less per-request overhead, fewer connections
    2. Lower cost (especially in the cloud, where services often charge per request)
  3. Cons:
    1. Added complexity in both sender and receiver
      1. Sender: must buffer messages and flush the batch based on elapsed time or size, which adds latency and can be hard to implement and tune
      2. Receiver: processes messages one by one, but what if one message fails? Roll back the whole batch and have the sender resend it for reprocessing? If not, how does the receiver tell the sender which messages failed so they can be resent?
  4. How the server handles batch requests
    1. Treat the entire request as a single atomic unit: the request succeeds only when all nested operations complete successfully
    2. Treat each nested operation independently and report failures per operation (common in practice)
  5. Batch request format
    1. A set of N requests batched together: individual requests are combined, each with its own header and body
    2. A list of N resources batched together: the messages are combined into one request, and the response lists which succeeded and which failed
  6. Handling failed requests
    1. Retry the entire batch: ideal if every operation is idempotent, since re-running the successful ones does no harm and the failed ones get another chance
    2. Retry each failed operation individually
    3. Send another batch request containing only the failed operations
      1. Options 2 and 3 require additional effort from the client but do not require idempotency
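  7. SQS batch API
    1. The consumer pulls up to 10 messages in one request
    2. After processing, the consumer sends a batch delete request, and SQS returns an acknowledgment with a per-message result
    3. The consumer needs to check the response: successfully deleted messages are marked as processed, failed ones are not
    4. When the visibility timeout expires, those messages become visible again and are processed by this or another consumer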
  8. Kafka heavily relies on batching
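
The sender-side buffering described above (flush on size or on elapsed time) can be sketched roughly as follows. This is a minimal illustration, not any particular library's API: the `Batcher` class, its `max_size` and `max_wait_s` parameters, and the `tick` method are hypothetical names chosen for this note.

```python
import time
from typing import Callable, List


class Batcher:
    """Hypothetical sender-side buffer: flushes when the batch is full or too old."""

    def __init__(self, send: Callable[[List[str]], None], max_size: int = 3,
                 max_wait_s: float = 0.5,
                 clock: Callable[[], float] = time.monotonic):
        self.send = send              # callback that ships one batch
        self.max_size = max_size      # size-based flush threshold
        self.max_wait_s = max_wait_s  # time-based flush threshold
        self.clock = clock            # injectable clock, handy for testing
        self.buffer: List[str] = []
        self.first_at = 0.0           # arrival time of the oldest buffered message

    def add(self, message: str) -> None:
        if not self.buffer:
            self.first_at = self.clock()
        self.buffer.append(message)
        if len(self.buffer) >= self.max_size:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes a partial batch that has waited too long."""
        if self.buffer and self.clock() - self.first_at >= self.max_wait_s:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.send(batch)


sent: List[List[str]] = []
batcher = Batcher(send=sent.append, max_size=3, max_wait_s=0.5)
for m in ["a", "b", "c", "d"]:
    batcher.add(m)
batcher.flush()  # drain the partial batch on shutdown
print(sent)  # [['a', 'b', 'c'], ['d']]
```

The two thresholds are the tuning knobs mentioned in the cons: a larger `max_size` improves throughput, while a smaller `max_wait_s` bounds the extra latency a message can pick up while sitting in the buffer.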

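As a rough sketch of the "treat each nested operation independently" approach, combined with "send another batch containing only the failed operations", consider the following. The function names (`handle_batch`, `send_with_retries`) and the `flaky` processor are made up for illustration; a real client would also add backoff between attempts.

```python
from typing import Callable, Dict, List, Tuple


def handle_batch(items: Dict[str, int],
                 process: Callable[[int], None]) -> Tuple[List[str], Dict[str, str]]:
    """Server side: process each entry on its own; one failure does not roll back the rest."""
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    for item_id, payload in items.items():
        try:
            process(payload)
            succeeded.append(item_id)
        except Exception as exc:
            failed[item_id] = str(exc)  # report the failure for this entry only
    return succeeded, failed


def send_with_retries(items: Dict[str, int], process: Callable[[int], None],
                      max_attempts: int = 3) -> Tuple[List[str], List[str]]:
    """Client side: resend a new batch that contains only the failed entries."""
    pending = dict(items)
    done: List[str] = []
    for _ in range(max_attempts):
        if not pending:
            break
        succeeded, failed = handle_batch(pending, process)
        done.extend(succeeded)
        pending = {item_id: pending[item_id] for item_id in failed}
    return done, sorted(pending)


seen = set()

def flaky(payload: int) -> None:
    """Hypothetical processor: payload 2 fails once, negative payloads always fail."""
    if payload == 2 and payload not in seen:
        seen.add(payload)
        raise ValueError("transient error")
    if payload < 0:
        raise ValueError("invalid payload")


done, gave_up = send_with_retries({"m1": 1, "m2": 2, "m3": -1}, flaky)
print(done)     # ['m1', 'm2']
print(gave_up)  # ['m3']
```

Because successful entries are never resent, the operations do not have to be idempotent; the price is the extra bookkeeping on the client.
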
Compression

Pros and cons

Compression algorithms and their trade-offs

  1. Fewer bits need to be transferred after compression

  2. Pros:

    1. Faster transmission over the network (lower bandwidth usage), since there is less data to transfer
    2. Less data to store -> more effective storage capacity
    3. Lower cost (some services charge by data volume)
  3. Applications:

    1. Servers compress HTTP responses for faster transfer (the browser decompresses them)
    2. Databases: RocksDB compresses its SSTables
    3. Messaging systems
  4. The larger the data, the more effective compression tends to be

  5. Compression and decompression consume computational resources, but the cost is usually acceptable

  6. Types:

    1. Lossy: permanently discards some data to shrink the payload; used for multimedia such as audio and video (streaming)
    2. Lossless: stores data without losing any information; used for HTTP requests and responses
  7. Compression algorithm trade-offs

    1. Compression speed: important for write-heavy applications
    2. Decompression speed: important for read-heavy applications
    3. Compression ratio: important when storing data on disk
  8. Algorithms

    | Algorithm | Compression speed | Decompression speed | Compression ratio | Notes |
    |---|---|---|---|---|
    | Deflate (gzip) | B | B | A | Standard for HTTP compression |
    | Snappy | A- | A | B | Created by Google and used extensively in Google projects such as Bigtable and MapReduce; many NoSQL databases support it |
    | Zstandard | A- | A- | A+ | Created by Facebook; widely used in file systems and databases |
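
To make the ratio and size points above concrete, here is a small sketch using Python's standard `zlib` module (a Deflate implementation). The sample payload and the `ratio` helper are invented for illustration, and the exact numbers will vary with the data and the zlib version.

```python
import zlib


def ratio(data: bytes, level: int = 6) -> float:
    """Compression ratio: original size divided by compressed size."""
    return len(data) / len(zlib.compress(data, level))


small = b'{"user_id": 42, "event": "click"}'
large = small * 1000  # a batch of similar messages compresses far better

# Lossless: decompressing returns exactly the original bytes.
assert zlib.decompress(zlib.compress(large)) == large

print(f"small payload ratio: {ratio(small):.2f}x")
print(f"large payload ratio: {ratio(large):.2f}x")

# Level 1 favors speed, level 9 favors ratio; measure on your own data.
for level in (1, 9):
    print(f"level {level}: {len(zlib.compress(large, level))} bytes")
```

A tiny payload can even grow slightly once the format's header and checksum are added, which is one reason compression pairs well with batching: the combined payload gives the algorithm more redundancy to exploit.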