System Design Interview and beyond Note9 - Deliver Data quickly

Published on March 23, 2024

How to deliver data quickly

Batching

Pros and cons of batching

How to handle batch requests

Batching: group multiple messages into a single request
Pros:
1. Increases throughput: fewer requests, less overhead, fewer connections
2. Decreases cost (especially in the cloud)
Cons:
1. Adds complexity to both sender and receiver
  1. Sender: must buffer messages and send them once a time or size limit is reached; this can be hard to implement and configure
  2. Receiver: processes messages one by one, but what if one message fails? Roll back the whole batch and have the sender resend it for reprocessing? If not, how do we tell the sender which messages failed so it can resend them?
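The sender-side buffering described above can be sketched roughly as follows. This is a minimal illustration, not a production design: the `Batcher` class, its `max_size` and `max_wait_s` parameters, and the injected `clock` are all hypothetical names chosen for this note, and a real sender would also need a background timer and thread safety.

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Buffers messages and flushes them as one batch by size or by age."""

    def __init__(self, send: Callable[[List[str]], None],
                 max_size: int = 3, max_wait_s: float = 0.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._max_size = max_size
        self._max_wait_s = max_wait_s
        self._clock = clock
        self._buffer: List[str] = []
        self._first_at: Optional[float] = None  # arrival time of oldest buffered message

    def add(self, message: str) -> None:
        if not self._buffer:
            self._first_at = self._clock()
        self._buffer.append(message)
        if len(self._buffer) >= self._max_size:  # size-based flush
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes if the oldest message has waited too long."""
        if self._buffer and self._clock() - self._first_at >= self._max_wait_s:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._first_at = None
            self._send(batch)


sent: List[List[str]] = []
now = [0.0]  # fake clock so the example is deterministic
batcher = Batcher(sent.append, max_size=3, max_wait_s=0.5, clock=lambda: now[0])
for m in ["a", "b", "c", "d"]:
    batcher.add(m)   # "a", "b", "c" go out together once the size limit is hit
now[0] = 1.0
batcher.tick()       # "d" goes out because it waited longer than max_wait_s
```

The two limits pull in opposite directions: a larger `max_size` improves throughput, while a smaller `max_wait_s` bounds the extra latency each message can pick up in the buffer.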
How does the server handle batch requests
1. Treat the entire request as a single atomic unit -> the request succeeds only when all nested operations complete successfully
2. Treat each nested operation independently and report failures back for each individual operation (common in practice)
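A rough sketch of the two server-side strategies, using an in-memory dict as a stand-in for the real data store. The function names (`put`, `handle_atomic`, `handle_independent`) and the operation shape (`id`, `key`, `value`) are assumptions made for illustration only.

```python
from typing import Callable, Dict, List


def put(store: dict, op: dict) -> None:
    """Hypothetical single operation: rejects a missing value."""
    if op["value"] is None:
        raise ValueError("value is required")
    store[op["key"]] = op["value"]


def handle_atomic(ops: List[dict], store: dict) -> bool:
    """All-or-nothing: stage changes on a copy, commit only if every op succeeds."""
    staged = dict(store)
    try:
        for op in ops:
            put(staged, op)
    except ValueError:
        return False  # nothing was written to the real store
    store.clear()
    store.update(staged)
    return True


def handle_independent(ops: List[dict],
                       apply: Callable[[dict], None]) -> Dict[str, list]:
    """Apply each op on its own and report a per-operation result."""
    result: Dict[str, list] = {"successful": [], "failed": []}
    for op in ops:
        try:
            apply(op)
            result["successful"].append(op["id"])
        except Exception as exc:
            result["failed"].append({"id": op["id"], "error": str(exc)})
    return result


ops = [
    {"id": "1", "key": "a", "value": 1},
    {"id": "2", "key": "b", "value": None},  # this one fails
    {"id": "3", "key": "c", "value": 3},
]

atomic_store: dict = {}
atomic_ok = handle_atomic(ops, atomic_store)

independent_store: dict = {}
report = handle_independent(ops, lambda op: put(independent_store, op))
```

With the atomic handler the single bad operation leaves the store empty, while the independent handler writes the two good operations and tells the client exactly which one to fix or retry.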
Batch request format
1. A set of N requests batched together: individual requests are combined, and each keeps its own header and body
2. A list of N resources batched together: messages are combined into one request, and the response returns a list of successful and failed items
Failed requests
1. Retry the entire batch: works well if each request is idempotent, since repeating successful requests does no harm and failed ones get another chance
2. Retry each failed request individually
3. Send another batch request containing only the failed individual operations
4. Options 2 and 3 require additional effort from the client but do not require idempotency
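The "re-batch only the failures" option can be sketched as below. It assumes a hypothetical `send_batch` callable that returns the ids of the items the server reported as failed; the flaky server is simulated so the example runs on its own.

```python
from typing import Callable, Dict, List


def send_with_retry(items: Dict[str, str],
                    send_batch: Callable[[Dict[str, str]], List[str]],
                    max_attempts: int = 3) -> List[str]:
    """Send a batch, then re-batch only the failed items.

    Returns the ids that still failed after max_attempts.
    """
    pending = dict(items)
    for _ in range(max_attempts):
        if not pending:
            break
        failed_ids = send_batch(pending)
        pending = {item_id: pending[item_id] for item_id in failed_ids}
    return list(pending)


calls: List[List[str]] = []  # records which ids each attempt carried


def flaky_send(batch: Dict[str, str]) -> List[str]:
    """Simulated server: item "b" fails on the first attempt only."""
    calls.append(sorted(batch))
    return ["b"] if len(calls) == 1 and "b" in batch else []


leftover = send_with_retry({"a": "x", "b": "y", "c": "z"}, flaky_send)
```

Because the second attempt carries only `"b"`, the already successful items are never resent, which is why this approach does not depend on idempotency.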
SQS batch API
1. Consumer pulls up to 10 messages in one request
2. Consumer sends a batch deletion request (the ack) back to SQS
3. Consumer needs to check the response: successfully deleted messages are marked as processed, failed ones are not
4. When the visibility timeout expires, the messages that were not deleted become visible again and will be processed by this or another consumer
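The consumer flow above might look like this in outline. The `receive_message` and `delete_message_batch` calls and their `Messages`, `Successful`, and `Failed` response fields mirror the boto3 SQS client, but `consume_once`, `FakeSqs`, and the queue URL are hypothetical stand-ins so the sketch runs without AWS access.

```python
from typing import Callable, List


def consume_once(sqs, queue_url: str, process: Callable[[str], None]) -> List[str]:
    """Pull up to 10 messages, process them, and batch-delete the successes.

    Returns the MessageIds that were not deleted; SQS makes those visible
    again once the visibility timeout expires.
    """
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
    entries, not_deleted = [], []
    for msg in resp.get("Messages", []):
        try:
            process(msg["Body"])
            entries.append({"Id": msg["MessageId"],
                            "ReceiptHandle": msg["ReceiptHandle"]})
        except Exception:
            not_deleted.append(msg["MessageId"])  # processing failed: do not ack
    if entries:
        result = sqs.delete_message_batch(QueueUrl=queue_url, Entries=entries)
        # The batch call can partially fail, so the response must be checked.
        not_deleted += [f["Id"] for f in result.get("Failed", [])]
    return not_deleted


class FakeSqs:
    """In-memory stand-in for the SQS client (assumption, for the demo only)."""

    def receive_message(self, QueueUrl, MaxNumberOfMessages):
        return {"Messages": [
            {"MessageId": "m1", "ReceiptHandle": "rh-1", "Body": "ok"},
            {"MessageId": "m2", "ReceiptHandle": "rh-2", "Body": "bad"},
            {"MessageId": "m3", "ReceiptHandle": "rh-3", "Body": "ok"},
        ]}

    def delete_message_batch(self, QueueUrl, Entries):
        ok = [{"Id": e["Id"]} for e in Entries if e["ReceiptHandle"] != "rh-3"]
        failed = [{"Id": e["Id"], "SenderFault": False, "Code": "InternalError"}
                  for e in Entries if e["ReceiptHandle"] == "rh-3"]
        return {"Successful": ok, "Failed": failed}


def process(body: str) -> None:
    if body == "bad":
        raise ValueError("cannot process message")


pending_ids = consume_once(FakeSqs(), "https://example.invalid/queue", process)
```

Here `m2` is left on the queue because processing failed and `m3` because its deletion failed, so both will be redelivered after the visibility timeout.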
Kafka heavily relies on batching

Compression

Pros and cons

Compression algorithms and their trade-offs

Fewer bits need to be transferred after compression
Pros:
1. Higher effective throughput when transmitting messages over the network, since there is less data to transfer
2. Less data to store -> more effective storage capacity
3. Decreased cost (some services charge by the amount of data)
Applications:
1. Servers compress HTTP data for faster transfer (the browser decompresses it)
2. Databases: RocksDB compresses its SSTables
3. Messaging systems
The bigger the payload, the more effective compression is
Compression and decompression consume computational resources, but that is usually an acceptable price
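To see why larger payloads compress better, here is a small sketch using Python's standard `zlib` module (the Deflate family). The payload shapes are made up for illustration; exact ratios depend on the data.

```python
import json
import zlib


def compression_ratio(payload: bytes) -> float:
    """Original size divided by compressed size; above 1.0 means it shrank."""
    return len(payload) / len(zlib.compress(payload))


# One tiny message versus a batch of 1000 similar messages.
single = json.dumps({"id": 1}).encode("utf-8")
batch = json.dumps(
    [{"id": i, "status": "ok"} for i in range(1000)]
).encode("utf-8")

small_ratio = compression_ratio(single)
large_ratio = compression_ratio(batch)

print(f"single message: {len(single)} bytes, ratio {small_ratio:.2f}")
print(f"batch of 1000:  {len(batch)} bytes, ratio {large_ratio:.2f}")
```

The tiny message actually grows, because the format's fixed header and checksum outweigh any savings, while the batch shrinks substantially thanks to its repeated structure. This is one reason batching and compression are often used together.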
Types:
1. Lossy: permanently discards some data to make the result smaller; used for multimedia data such as audio and video (streaming)
2. Lossless: the original data can be restored exactly; used for HTTP requests and responses
Compression algorithm trade-offs
1. Compression speed: important for write-heavy applications
2. Decompression speed: important for read-heavy applications
3. Compression ratio: important when storing data on disk
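The speed versus ratio trade-off can be observed even within a single algorithm. The sketch below uses `zlib` compression levels 1 (fastest) and 9 (best ratio) on synthetic log lines; the data is invented for the example and timings will vary by machine.

```python
import time
import zlib
from typing import Tuple

# Synthetic, moderately repetitive log data (illustrative only).
data = "".join(
    f"2024-03-23 user={i % 97} action=click item={(i * 7919) % 1009}\n"
    for i in range(5000)
).encode("utf-8")


def measure(level: int) -> Tuple[int, float]:
    """Return (compressed size in bytes, seconds spent compressing)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    assert zlib.decompress(compressed) == data  # lossless round trip
    return len(compressed), elapsed


fast_size, fast_time = measure(1)
best_size, best_time = measure(9)

print(f"original: {len(data)} bytes")
print(f"level 1:  {fast_size} bytes in {fast_time * 1000:.2f} ms")
print(f"level 9:  {best_size} bytes in {best_time * 1000:.2f} ms")
```

A write-heavy service would lean toward the faster setting, while data that is written once and kept on disk for a long time benefits more from the higher ratio.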

Algorithms

Algorithm	Compression speed	Decompression speed	Compression ratio
Deflate (gzip): standard for HTTP compression	B	B	A
Snappy: created by Google, used extensively in Google projects such as Bigtable and MapReduce; many NoSQL databases support it	A-	A	B
Zstandard: created by Facebook, widely used in file systems and databases	A-	A-	A+