Book Notes: Designing Data-Intensive Applications
Category: System Design
Tag:
By: David
On: Sat 20 February 2021
A good book on system design. To be updated.
Notes
- Part 1. Basics
- 1 Core ideas
- three main concerns of software systems
- Reliability: The system should continue to work correctly, even when faults occur
- Aim for fault tolerance, since the probability of a fault can never be reduced to zero
- Deliberately induce faults to test that the system really tolerates them
- Scalability: The system's ability to cope with increased load
- The system can grow and scale with ease
- e.g. Twitter's home timeline: a hybrid approach that fans tweets out on write for ordinary users, but merges celebrities' tweets in at read time
- Scaling
- vertical scaling: making the machine more powerful
- horizontal scaling: distributing the load across multiple machines
- Maintainability: Different people can work on the system productively
- Design principles
- Operability: make it easy for the ops team to keep it running
- Simplicity: keep it as simple as possible and avoid needless complexity, so that new engineers can onboard easily
- Evolvability: make it easy to change the system in the future
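
The hybrid approach from the Twitter example above can be sketched roughly as follows. This is a minimal in-memory illustration, not how Twitter implements it: the `Timelines` class, its method names, and the `CELEBRITY_THRESHOLD` value are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical cutoff for the example; a real system would use a far larger number.
CELEBRITY_THRESHOLD = 3


class Timelines:
    def __init__(self):
        self.followers = defaultdict(set)    # author -> users following them
        self.following = defaultdict(set)    # user -> authors they follow
        self.posts = defaultdict(list)       # author -> their own tweets
        self.home_cache = defaultdict(list)  # user -> precomputed home timeline
        self.clock = 0                       # logical timestamp for ordering

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author, text):
        self.clock += 1
        tweet = (self.clock, author, text)
        self.posts[author].append(tweet)
        # Fan-out on write, but only for ordinary accounts.
        # Simplification: an account that crosses the threshold later would
        # have its earlier tweets counted twice; that case is ignored here.
        if not self.is_celebrity(author):
            for user in self.followers[author]:
                self.home_cache[user].append(tweet)

    def home_timeline(self, user):
        # Start from the precomputed cache, then merge celebrity tweets at read time.
        merged = list(self.home_cache[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                merged.extend(self.posts[author])
        return sorted(merged, reverse=True)  # newest first


t = Timelines()
for fan in ["a", "b", "c"]:
    t.follow(fan, "star")
t.follow("a", "bob")
t.post("bob", "hi")        # written into a's cache immediately
t.post("star", "hello")    # stored once, merged in when a timeline is read
```

The trade-off is that ordinary tweets cost more at write time but make reads cheap, while celebrity tweets avoid a huge write burst at the cost of a small merge on every read.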
- 2 DB: Data models and query
- data models: relational, document, and graph
- Object-relational mismatch (impedance mismatch): the database is relational while application code is object-oriented, so a translation layer is needed between the two
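
To make the object-relational mismatch concrete, here is a small sketch using Python's built-in `sqlite3` and `json` modules. The table layout, the `load_profile` helper, and the sample data are made up for illustration. The same profile is spread over two tables in the relational model, but is a single nested structure in application code or in a document model.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE positions (user_id INTEGER, title TEXT, company TEXT);
    INSERT INTO users VALUES (1, 'Ada');
    INSERT INTO positions VALUES (1, 'Engineer', 'Acme');
    INSERT INTO positions VALUES (1, 'Analyst', 'Initech');
""")


def load_profile(conn, user_id):
    """Reassemble one application-level object from rows in two tables."""
    (name,) = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT title, company FROM positions WHERE user_id = ? ORDER BY rowid",
        (user_id,),
    ).fetchall()
    return {
        "id": user_id,
        "name": name,
        "positions": [{"title": t, "company": c} for t, c in rows],
    }


profile = load_profile(conn, 1)

# In a document model the whole profile is stored and fetched as one unit.
document = json.dumps(profile)
```

The `load_profile` function is the translation layer: the application has to query several tables and stitch the results together, whereas the JSON document already has the shape the code wants.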
- 3 DB: Data storage
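- storage engines: log-structured and page-oriented
- adding indexes speeds up reads, but slows down writes (every index must be updated on each write)
- SSTable (Sorted String Table): key-value pairs stored sorted by key; the building block of the LSM-tree
- B-tree: self-balancing tree that keeps data sorted in fixed-size pages
- Column-oriented storage: stores the values of each column together, which suits analytic queries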
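
The SSTable and LSM-tree idea above can be sketched in a few lines. This is a toy, in-memory model under simplifying assumptions (no files, no write-ahead log, no deletes); `TinyLSM` and `memtable_limit` are names invented for the example.

```python
import bisect


class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}            # recent writes, kept in memory
        self.sstables = []            # flushed segments, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as a segment sorted by key (an "SSTable").
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Check segments from newest to oldest; binary search works because
        # each segment is sorted by key.
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def compact(self):
        # Merge all segments, letting newer values overwrite older ones.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("b", 1)
db.put("a", 2)   # memtable is full, flushed as the first segment
db.put("b", 3)
db.put("c", 4)   # flushed as a second segment holding the newer "b"
```

Writes only touch the in-memory table and append new segments, which is why log-structured engines tend to favor write throughput, while reads may have to look through several segments until compaction merges them.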
- 4 Encoding
- requirements, application code, and data formats change over time, so old and new versions must stay compatible
- backward compatibility: newer code can read data that was written by older code
- forward compatibility: older code can read data that was written by newer code
- Formats (need encoding and decoding between the two)
- in memory: objects and data structures (decoding: parsing, deserialization, unmarshalling)
- on disk/over network: encoded (encoding: serialization, marshalling)
- encodings
- JSON, XML, CSV
- problems
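- ambiguity around the encoding of numbers: XML and CSV cannot tell a number from a string of digits; JSON does not distinguish integers from floats and does not specify a precision
- no support for binary strings (raw byte sequences); the usual workaround is Base64, which increases the size by about a third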
- Optional schema support for XML and JSON, none for CSV
- binary encoding: more compact
- binary encoding libraries: Protocol Buffers, Apache Thrift
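
The number-precision and binary-string problems listed above are easy to reproduce with Python's standard library. The values below are arbitrary examples chosen for the demonstration.

```python
import base64
import json

# Integers above 2**53 cannot be represented exactly as IEEE 754 doubles.
big = 2**53 + 1

# Python's json module keeps arbitrary-precision integers, so this round-trips.
round_tripped = json.loads(json.dumps(big))

# A parser that stores every number as a double (as JavaScript does)
# would silently end up with a different value.
as_double = int(float(big))

# JSON has no type for raw bytes, so serializing them fails outright.
raw = bytes(range(6))
try:
    json.dumps({"blob": raw})
    binary_supported = True
except TypeError:
    binary_supported = False

# The usual workaround: Base64-encode the bytes into text (larger than the input).
encoded = base64.b64encode(raw).decode("ascii")
payload = json.dumps({"blob": encoded})
```

Here 6 raw bytes become 8 Base64 characters, which is the size overhead that binary formats such as Protocol Buffers or Thrift avoid.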
- schema evolution
- adding a field:
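- backward: the new field can't be required; it must be optional or have a default value, because old data will not contain it
- forward: adding a new field is fine as long as it gets a new tag number; old code simply ignores tags it does not recognize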
- removing a field:
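- forward: can only remove a field that is optional, because old code would reject new data that lacks a required field
- backward: can never use the same tag number again, because old data may still contain the removed field under that tag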
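- renaming a field: treat it like removing and adding at the same time and do it with caution; with tag-based encodings the name itself can change freely as long as the tag number stays the same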
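
The schema evolution rules above can be illustrated with a toy tag-based encoding. This is not the Protocol Buffers or Thrift wire format, only a sketch of the idea that encoded data carries tag numbers rather than field names; the schemas and the `encode`/`decode` helpers are hypothetical.

```python
# Each schema maps tag number -> (field name, default value).
SCHEMA_V1 = {1: ("name", None), 2: ("email", "")}
# v2 adds an optional field with a default, under a NEW tag number.
SCHEMA_V2 = {1: ("name", None), 2: ("email", ""), 3: ("age", 0)}
# v3 removes the optional "email" field; tag 2 is retired and never reused.
SCHEMA_V3 = {1: ("name", None), 3: ("age", 0)}


def encode(record, schema):
    """Encode a dict as a list of (tag, value) pairs."""
    tag_of = {name: tag for tag, (name, _default) in schema.items()}
    return [(tag_of[field], value) for field, value in record.items()]


def decode(pairs, schema):
    """Decode (tag, value) pairs, filling defaults and skipping unknown tags."""
    record = {name: default for name, default in schema.values()}
    for tag, value in pairs:
        if tag in schema:
            record[schema[tag][0]] = value
        # Tags this schema does not know are ignored (forward compatibility).
    return record


old_data = encode({"name": "Ada", "email": "a@x"}, SCHEMA_V1)
new_data = encode({"name": "Bob", "email": "b@x", "age": 30}, SCHEMA_V2)

# Backward compatibility: new code reads old data, the missing field gets its default.
read_by_new = decode(old_data, SCHEMA_V2)

# Forward compatibility: old code reads new data, the unknown tag 3 is skipped.
read_by_old = decode(new_data, SCHEMA_V1)

# After a removal, old data that still carries tag 2 is simply ignored.
read_after_removal = decode(old_data, SCHEMA_V3)
```

If tag 2 were later reused for a different field, `decode` would misread the old `email` values as that new field, which is why a removed tag number must stay retired.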
- Part 2. Distributed
- 5 Replication
- 6 Partitioning/Sharding
- 7 Transactions
- 8 Failovers
- 9 Consistency
- Part 3. Processing Data
- 10 Batch
- 11 Stream
- 12 Future