Building modern software systems is a challenging job nowadays. A clear understanding of these systems, and knowledge of the relevant techniques and architectures, helps you cope with these challenges.
Your system can either scale smoothly, as Zoom did during Covid-19, or fail under load, as Signal did on 15 January 2021.
At a minimum, you need a clear understanding of three elements (reliability, scalability, and maintainability): what they mean, and which techniques, architectures, and algorithms are used to achieve them.
A system should keep performing the correct functions at the desired level of performance, even when things go wrong. The most common causes of things going wrong are hardware faults, software errors, and human errors.
The impact of hardware faults can be reduced by adding redundancy to individual hardware components, so that one failing component does not take the whole system down. As long as we can restore a backup onto a new machine quickly, the downtime is not fatal.
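The effect of redundancy can be illustrated with a little arithmetic. The sketch below assumes that components fail independently of each other, which is a simplification (real failures are often correlated), and the function name and the 5% figure are purely illustrative.

```python
def combined_failure_probability(p_single: float, replicas: int) -> float:
    """Probability that every replica fails in the same period.

    Assumes failures are independent, so the individual
    probabilities simply multiply.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return p_single ** replicas


if __name__ == "__main__":
    # Hypothetical 5% chance that a single component fails in a given period.
    for n in (1, 2, 3):
        print(n, "replica(s):", combined_failure_probability(0.05, n))
```

Under this assumption, each extra replica multiplies the chance of a total outage by the single-component failure probability, which is why even one spare component helps so much.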
Software faults tend to cause more failures than hardware faults. They can be reduced by thinking carefully about the assumptions and interactions in the system, and by thorough testing and monitoring of the system in production.
Human errors can be reduced by introducing good practices and good design, setting up detailed monitoring of the systems, and adopting rollback strategies that allow quick recovery.
Reliability is not only important for rockets and jets. Issues in business applications can cause huge losses in revenue and reputation.
Growth can cause issues if you haven't designed and developed your systems to cope with increased load. Performance shouldn't degrade even when load increases. Load can be described by parameters such as requests per second, read/write ratio, and number of active users.
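As a rough illustration, the load figures mentioned above can be computed from a list of request records. This is a minimal sketch: the `Request` record and the `describe_load` helper are hypothetical names made up for this example, not part of any particular framework.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request record used only for this illustration."""
    timestamp: float  # seconds since the start of the observation window
    user_id: str
    is_write: bool


def describe_load(requests: list[Request], window_seconds: float) -> dict:
    """Summarise load as requests per second, read/write ratio and active users."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    reads = sum(1 for r in requests if not r.is_write)
    writes = len(requests) - reads
    return {
        "requests_per_second": len(requests) / window_seconds,
        # With no writes at all the ratio is undefined; report infinity.
        "read_write_ratio": reads / writes if writes else float("inf"),
        "active_users": len({r.user_id for r in requests}),
    }


if __name__ == "__main__":
    sample = [
        Request(0.0, "alice", False),
        Request(0.5, "alice", False),
        Request(1.0, "bob", True),
        Request(1.5, "carol", False),
    ]
    print(describe_load(sample, window_seconds=2.0))
```

Tracking these numbers over time shows which dimension of load is actually growing, which in turn suggests where the system needs to scale.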
Two approaches can be taken to cope with load:
- Scaling up (vertical scaling): move to a more powerful machine.
- Scaling out (horizontal scaling): distribute the load across multiple machines.
Prefer horizontal scaling whenever possible.
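To make scaling out concrete, here is a minimal sketch of one simple way to distribute requests across machines: round-robin. The `RoundRobinBalancer` class and the machine names are assumptions made for illustration. Real load balancers also handle health checks, uneven request costs, and machines joining or leaving.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next machine in a fixed rotation."""

    def __init__(self, machines: list[str]) -> None:
        if not machines:
            raise ValueError("at least one machine is required")
        self._rotation = cycle(list(machines))

    def route(self, request_id: int) -> str:
        # The request itself is ignored here; only the rotation decides.
        return next(self._rotation)


if __name__ == "__main__":
    # Hypothetical machine names.
    balancer = RoundRobinBalancer(["node-a", "node-b", "node-c"])
    assignments = Counter(balancer.route(i) for i in range(9))
    print(assignments)
```

With this scheme, adding capacity means adding another entry to the machine list rather than replacing a machine with a bigger one.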
Maintainability is the ability of the system to support changes. A system must be easy to understand, repair, and enhance. The cost of maintaining a system is typically higher than the cost of its initial development.
- The system should be simple and easy for new engineers to understand and enhance.
- It should be easy for operations teams to keep the system running smoothly.