System Design I
Recently, I read a book about System Design called
System Design Interview: An Insider’s Guide (Second Edition) by Alex Xu.
Here’s my take on it.
System Design Interviews are among the most difficult questions to tackle, as the interviewee is expected to design an ideal architecture for a software system!
It is essential to understand all the basic and necessary set-up requirements before we start designing.
Do you know what happens when you type a URL into the address bar?
This is where the Domain Name System (DNS) comes into the picture. DNS is usually a paid service provided by third parties rather than something hosted on our own servers.
1. Users access websites through domain names, such as api.mysite.com.
2. An Internet Protocol (IP) address is returned to the browser or mobile app. In the example, IP address 15.125.23.214 is returned.
3. Once the IP address is obtained, Hypertext Transfer Protocol (HTTP) requests are sent directly to your web server.
4. The web server returns HTML pages or JSON responses for rendering.
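As a rough illustration of steps 1–4, here is a small Python sketch that uses only the standard library. api.mysite.com and 15.125.23.214 are the article's example values, so substitute a domain that actually resolves before running it.

```python
import socket
import urllib.request

# Steps 1-2: resolve the domain name to an IP address via DNS.
domain = "api.mysite.com"                  # the article's example domain (hypothetical)
ip_address = socket.gethostbyname(domain)  # e.g. "15.125.23.214" in the example
print(f"{domain} resolved to {ip_address}")

# Steps 3-4: send an HTTP request to the web server and read the response,
# which could be an HTML page or a JSON payload.
with urllib.request.urlopen(f"http://{domain}/") as response:
    body = response.read().decode("utf-8")
print(body[:200])  # first 200 characters of the HTML/JSON response
```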
How do you choose a database, though?
We can choose between two options.
Relational Databases (RDBMS) - Relational databases represent and store data in tables and rows. You can perform join operations using SQL across different database tables. The most popular ones are MySQL, Oracle Database, PostgreSQL, etc.
Non-Relational Databases (NoSQL) - These databases are grouped into four categories: key-value stores, graph stores, column stores, and document stores. Join operations are generally not supported in non-relational databases. Popular ones are CouchDB, Neo4j, Cassandra, HBase, Amazon DynamoDB, etc.
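To make the difference concrete, here is a minimal sketch that uses Python's built-in sqlite3 for the relational side and a plain dict as a stand-in for a key-value store; the table, column, and key names are made up for illustration.

```python
import sqlite3

# Relational model: data lives in tables and can be combined with SQL joins.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
conn.execute("INSERT INTO users VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders VALUES (10, 1, 42.5)")
rows = conn.execute(
    "SELECT u.name, o.total FROM users u JOIN orders o ON o.user_id = u.id"
).fetchall()
print(rows)  # [('Alice', 42.5)]

# Key-value model: no joins; you fetch a whole record by its key.
# (A plain dict stands in here for a store like DynamoDB or Cassandra.)
kv_store = {"user:1": {"name": "Alice", "orders": [{"id": 10, "total": 42.5}]}}
print(kv_store["user:1"])
```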
When your website gets more traffic, you need to adjust your infrastructure to handle the additional workload. There are two ways to increase the scalability of a website.
Horizontal vs. Vertical Scaling
Vertical scaling, referred to as “scale up”, means the process of adding more power (CPU, RAM, etc.) to your servers.
Horizontal scaling, referred to as “scale out”, allows you to scale by adding more servers into your pool of resources.
When traffic is low, vertical scaling is a great option, and the simplicity of vertical scaling is its main advantage. Unfortunately, it comes with serious limitations.
• Vertical scaling has a hard limit. It is impossible to add unlimited CPU and memory to a single server.
• Vertical scaling does not provide failover or redundancy. If the single server goes down, the website/app goes down with it completely.
Horizontal scaling is almost always more desirable than vertical scaling for this reason: instead of hitting the ceiling of a single machine, you can keep adding servers to the pool as demand grows.
Load Balancer
A load balancer evenly distributes incoming traffic among the web servers defined in a load-balanced set. This enhances scalability and availability: no single server has to absorb the entire workload, and the site keeps serving requests even if individual servers are overwhelmed or go offline.
- If server 1 goes offline, all the traffic will be routed to server 2. This prevents the website from going offline. We will also add a new healthy web server to the server pool to balance the load.
- If the website traffic grows rapidly, and two servers are not enough to handle the traffic, the load balancer can handle this problem gracefully. You only need to add more servers to the web server pool, and the load balancer automatically starts to send requests to them.
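Here is a minimal Python sketch of the round-robin idea described above. The server addresses are hypothetical, and a real load balancer (NGINX, HAProxy, a cloud load balancer) does this at the network layer rather than in application code.

```python
import itertools

class LoadBalancer:
    """Toy round-robin load balancer over a pool of web servers."""

    def __init__(self, servers):
        self.servers = list(servers)      # e.g. ["10.0.0.1", "10.0.0.2"] (made-up IPs)
        self.healthy = set(self.servers)  # servers currently passing health checks
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        """A server failed its health check: stop routing traffic to it."""
        self.healthy.discard(server)

    def add_server(self, server):
        """Scale out: add a new server to the pool and include it in the rotation."""
        self.servers.append(server)
        self.healthy.add(server)
        self._cycle = itertools.cycle(self.servers)

    def route(self, request):
        """Pick the next healthy server in round-robin order."""
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return f"forwarding {request!r} to {server}"
        raise RuntimeError("no healthy servers available")

lb = LoadBalancer(["10.0.0.1", "10.0.0.2"])
lb.mark_down("10.0.0.1")            # server 1 goes offline...
print(lb.route("GET /index.html"))  # ...all traffic is routed to 10.0.0.2
lb.add_server("10.0.0.3")           # traffic grows: add another server to the pool
```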
Cache
A cache is a temporary storage area that stores the result of expensive responses or frequently accessed data in memory so that subsequent requests are served more quickly. The cache tier is a temporary data store layer, much faster than the database. The primary purpose of a cache is to increase data retrieval performance by reducing the need to access the underlying, slower storage layer.
After receiving a request, a web server first checks if the cache has the available response. If it has, it sends data back to the client. If not, it queries the database, stores the response in cache, and sends it back to the client. This caching strategy is called a read-through cache.
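A minimal sketch of that strategy, with a plain dict standing in for the cache tier (e.g. Memcached or Redis) and a sleep standing in for the slower database call:

```python
import time

cache = {}  # in-memory cache tier (stand-in for Memcached/Redis)

def slow_database_query(key):
    time.sleep(0.5)                   # pretend this is an expensive database call
    return f"value-for-{key}"

def get(key):
    """Serve from the cache if possible; otherwise query the database and cache the result."""
    if key in cache:                  # cache hit: fast path
        return cache[key]
    value = slow_database_query(key)  # cache miss: go to the database
    cache[key] = value                # store the response for subsequent requests
    return value

print(get("user:42"))  # slow: misses the cache and queries the "database"
print(get("user:42"))  # fast: served straight from the cache
```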
Content Delivery Network
A CDN works like a cache, but for static content. It is a network of geographically dispersed servers used to deliver static content: CDN servers cache static assets such as images, videos, CSS, and JavaScript files.
Some CDNs also support dynamic content caching, which enables the caching of HTML pages based on request path, query strings, cookies, and request headers.
The CDN workflow is straightforward: when a user requests a static asset, the nearest CDN edge server returns it from its cache if it has a copy; on a cache miss, the edge server fetches the asset from the origin server, stores it with a time-to-live (TTL), and serves it. Subsequent requests within the TTL are served directly from the edge.
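Below is a tiny Python sketch of that edge-caching behaviour; the asset path, TTL value, and the dicts standing in for the origin server and the edge cache are all made up for illustration.

```python
import time

ORIGIN = {"/logo.png": b"<binary image bytes>"}  # stand-in for the origin web server
TTL_SECONDS = 60                                 # how long an edge server keeps a copy

edge_cache = {}  # path -> (content, expiry timestamp) at one edge location

def cdn_get(path):
    """Serve a static asset from the edge cache; fetch from the origin on miss or expiry."""
    entry = edge_cache.get(path)
    if entry and entry[1] > time.time():          # cached and not yet expired
        return entry[0]
    content = ORIGIN[path]                        # cache miss: go back to the origin
    edge_cache[path] = (content, time.time() + TTL_SECONDS)
    return content

cdn_get("/logo.png")  # first request: fetched from the origin and cached at the edge
cdn_get("/logo.png")  # requests within the TTL: served directly from the edge cache
```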
Shared Storage or NoSQL
When we horizontally scale our web tier, we need to move state (for instance, user session data) out of the web tier. A good practice is to store session data in persistent storage, which can be a relational database or NoSQL. Each web server in the cluster can then access the state data from the database. This is called a stateless web tier.
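A minimal sketch of the idea, using Python's built-in sqlite3 as an in-process stand-in for the shared session store (in a real deployment every web server would connect to the same external relational database or NoSQL store, such as Redis or DynamoDB):

```python
import json
import sqlite3
import uuid

# Shared persistent store for session data (stand-in: an in-memory SQLite database).
store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE sessions (session_id TEXT PRIMARY KEY, data TEXT)")

def save_session(data):
    """Any web server in the cluster can create a session in the shared store."""
    session_id = str(uuid.uuid4())
    store.execute("INSERT INTO sessions VALUES (?, ?)", (session_id, json.dumps(data)))
    return session_id

def load_session(session_id):
    """Any other web server can read it back: no state lives on the web servers themselves."""
    row = store.execute(
        "SELECT data FROM sessions WHERE session_id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None

# Server A handles the login request and writes the session...
sid = save_session({"user_id": 1, "cart": ["book"]})
# ...server B handles the next request and reads the same session data.
print(load_session(sid))
```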
Data Center
Finally, this is what an ideal setup with multiple data centers looks like.
In the event of any significant data center outage, we direct all traffic to a healthy data center.
Several technical challenges must be resolved to achieve a multi-data center setup:
• Traffic redirection: GeoDNS can be used to direct traffic to the nearest data center depending on where a user is located (see the sketch after this list).
• Data synchronization: Users from different regions could use different local databases or caches. In failover cases, traffic might be routed to a data center where data is unavailable.
• Test and deployment: It is important to test your website/application from different locations. Automated deployment tools play a major role in keeping services and data consistent across data centers.
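To make the traffic-redirection and failover points concrete, here is a tiny Python sketch; the region names, data-center IDs, and health status are all hypothetical.

```python
# Hypothetical mapping of user regions to their nearest data center.
NEAREST_DC = {"us-east": "dc-us-east", "us-west": "dc-us-west", "eu": "dc-eu"}
HEALTHY_DCS = {"dc-us-east", "dc-eu"}  # suppose dc-us-west is currently down

def route_user(region):
    """GeoDNS-style routing: prefer the nearest data center, fail over if it is down."""
    preferred = NEAREST_DC.get(region, "dc-us-east")
    if preferred in HEALTHY_DCS:
        return preferred
    # Failover: pick any healthy data center. Note the data-synchronization caveat
    # above: the user's data must already be replicated there.
    return next(iter(HEALTHY_DCS))

print(route_user("eu"))       # -> dc-eu (nearest data center is healthy)
print(route_user("us-west"))  # -> fails over to a healthy data center
```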
Thanks for reading this article. In the near future, I will be posting more on System Design, so stay tuned! ❤
Connect with Me!
Feel free to get in touch with me or email me at vidhik2002@gmail.com!