1. Understand the problem and establish design scope
Ask questions and clarifications
-
An example?
You have a url
https://www.systeminterview.com/q=chatsystem&c=loggedin&v=v3&l=long
,and your service create a short url:https://tinyurl.com/y7keocwj
, when you click the url, it redirect you to the original url. -
What is the traffic volume?-> 100 million per day
-
How long is the shorten url? -> as short as possible
-
What characters are allowed in the shorten url? -> a-z, A-z, 0-9
-
Are we allowed to update and delete the short urls? -> no
Basic use case:
- URL shortening: given the long url, return the short one
- URL redirecting: click the short url, redirect to the original url
- Availability, scalability, fault tolerance
Estimation
Write per day: 100 million requests per day
Write per second: 10*8 / 24/3600~= 1160
Assume read/write is 10:1, we got read per sec is 11600
Assume this service will run for 10 year, the record number would be 100 million*365*10=365 billion records
Assume average Len of url us 100 byte, storage over 10 years = 365 billion*100Byte = 36.5 billon KB = 36.5 million MB = 36500 GB = 36.5 TB
2. High level design
API Endpoint
- Use restful API
POST api/v1/shorten
, parameter would be{url: <longURL>}
, return short urlGET api/v1/<shortURL>
, redirect to corresponding long url.
URL redirecting
Figure 1 shows what happens when you enter a tinyurl onto the browser. Once the server receives a tinyurl request, it changes the short URL to the long URL with 301 redirect.
The detailed communication between clients and servers is shown in Figure 2.
301 redirect - the requests permanently moved to the long url
The browser caches the long url in the first response (301), and later will request the long url directly
302 redirect -the requests temporarily moved to the long url
Each time the browser send a request, it will go to the short url first and them get 301, and then request the long url.
Use 301 redirect makes sense for there would be too many unnecessary requests to the tiny url server, however, if analytics is important, 302 redirect is a better choice as it can track click rate and source of the click more easily.
URL shortening
Hash table is a good choice for storing such thing: {shortUrl, longUrl}. And for the function we could implement a hash function.
3. Deep dive
Data Model
We could store the data in a hash table in memory, but the resource is limited and it's hard to scale.
We could store the data in a relational database
Hash function
Hash value length
We are going to fun this service for 10 year, 365 billion records generated. We have a-z,A-Z, 0-9 62 characters available in total. To support 365 billion records, 65^n >= 36 billion. After calculation it's 7.
Hash function and collision
-
Some well known hash functions: CRC32, MD5, or SHA-1.
Hash function Hash value (Hexadecimal) CRC32 5cb54054 MD5 5a62509a84df9ee03fe1230b9df8b84e SHA-1 0eeae7916c06853901d9ccbefbfcaf4de57ed85b -
No of them are suitable. CRC has 8 digits which is greater than 7. Remember, we need to keep the short url as short as possible.
-
If we use CRC32 8 digits and remove the last digits, that would be 7 but we got more chances of collision.
-
We could append some predefined strings to the url and rehash until we got a non-repeated key.
-
It's expensive to check if a key exists in database, so we can use bloom filter which is a effective technique to check is an element is a member of a set,
-
Base62 conversion
-
Base conversion is also commonly used for shorten url.
-
For unitizing this base conversion technique, we need to introduce unique ID for each records.
-
This is how it works:
convert 1115710 to base 62 representation (1115710 represents 11157 in a base 10 system).
- From its name, base 62 is a way of using 62 characters for encoding. The mappings are: 0-0, ..., 9-9, 10-a, 11-b, ..., 35-z, 36-A, ..., 61-Z, where ‘a’ stands for 10, ‘Z’ stands for 61, etc.
- 1115710 = 2 x 622 + 55 x 621 + 59 x 620 = [2, 55, 59] -> [2, T, X] in base 62 representation. Figure 6 shows the conversation process.
- Thus, the short URL is
https://tinyurl.com/2TX
Comparison
Hash + collision resolution | Base 62 conversion |
---|---|
Fixed length | Not fixed, goes up with ID |
No need of UUID | Depends on unique ID(I think it's because of the conversion is number things and original long url contains characters) |
Collision exists and needs to be handled | No collision because of unique ID |
No order | Can be guessed next feasible url if ID increate be 1 for a new entry-> security |
URL shorten deep dive
URL redirecting deep dive
(My original design)
4. Wrap up
If there are extra time:
- Rate limiter for the service
- Server Scaling
- DB scaling
- Analitics
- Availability, consistency and reliability.