Replication Explained - Designing Distributed Systems (Part 2)
For Part 1, see "Designing Distributed File Storage Systems - Explained with Google File System (GFS)".

Overview:

Replication in distributed systems means keeping multiple copies of data at different locations to guard against unpredictable hardware failures, so that the system as a whole stays available. Replication rests on one basic assumption: machines fail independently of each other (uncorrelated failures). If the replicas tend to fail together, replication will not help in any way.

Whether to replicate at all, and how many replicas to keep, depends on the use case and on how much it would cost you to lose the data and compute power at a given point in time.

Intuitively, replication is achieved with two servers, a primary and a replica, that are kept in sync so that if the primary server fails at any point in time, the replica server has everything it needs t