nsasilent.blogg.se

The book Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann was recommended to me by a colleague. The author has worked at companies such as LinkedIn, where he built large distributed systems to handle data, so I guess he knows what he’s talking about (he’s also a researcher at Cambridge University). It’s quite a big book (around 545 pages), but I enjoyed it. It’s quite technical, and I wouldn’t recommend it to anyone who doesn’t have a basic grasp of databases (relational or NoSQL). There’s some jargon in there, and although Martin makes a great effort to explain concepts thoroughly, some (basic) concepts are just left as-is. I learned a lot about the challenges of distributed systems (scalability, transactions, consistency etc.), but for a book whose title starts with “designing”, it doesn’t actually talk about designing that much. At the start of the book, there’s a use case of how a distributed system could support a platform like Twitter, where some users have millions of followers. If they post a message, you suddenly need to update millions of timelines. The author then gives an example of how this could be solved. I expected more design patterns like this, but unfortunately that’s not the case (in the final chapter there are some general design principles, but that’s it). Most of the book is spent on explaining concepts and how they impact large-scale distributed data systems.

Part 1 of the book explains basic concepts such as databases, data models, query languages and storage. Part 2 dives deeper into distributed concepts such as transactions (which are obviously harder across multiple systems), replication, partitioning, consistency and consensus. Part 3 talks about batch processing and stream processing, and the future of data systems. All very interesting if you want to learn more about the concepts behind distributed systems, but again, I would have liked more design patterns. All of the information is still useful though. If you start a large data project, it’s good to know what leaderless replication is, for example. If you need to select a certain database product, it’s helpful to know what some features actually do.

Another “problem” with the book is that it’s already 5 years old. That doesn’t seem very old, but in today’s world of constantly changing technology, it’s already ancient. Most concepts are of course still true today, but you can see the book focuses more on the early big data stuff (Hadoop, HDFS) and much less on more modern systems like Spark (especially Databricks) and Snowflake. At the end of each chapter there’s a huge list of references, and many of them are almost a decade old. There aren’t many references from 2016 or 2017.
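To make the Twitter timeline example mentioned in this review a bit more concrete, here is a minimal Python sketch of two general ways to build a timeline: copying each new post into every follower's timeline at write time, or assembling the timeline from the followed accounts at read time. This is my own toy illustration, not the book's example. The class `TimelineStore` and all of its method names are hypothetical, and everything lives in memory rather than in a real distributed system.

```python
from collections import defaultdict


class TimelineStore:
    """Toy in-memory model of a timeline service (all names are hypothetical)."""

    def __init__(self):
        self.followers = defaultdict(set)    # author -> users who follow them
        self.following = defaultdict(set)    # user -> authors they follow
        self.posts = defaultdict(list)       # author -> [(seq, author, text)]
        self.timelines = defaultdict(list)   # user -> precomputed timeline entries
        self._seq = 0                        # global counter used to order posts

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def post(self, author, text, fan_out=True):
        """Store a post; optionally copy it into every follower's timeline."""
        self._seq += 1
        entry = (self._seq, author, text)
        self.posts[author].append(entry)
        if fan_out:
            # Write-time fan-out: one write per follower. This is the
            # expensive part when an author has millions of followers.
            for user in self.followers[author]:
                self.timelines[user].append(entry)
        return entry

    def read_precomputed(self, user):
        """Cheap read: the timeline was already assembled at write time."""
        return sorted(self.timelines[user], reverse=True)

    def read_on_demand(self, user):
        """Read-time assembly: gather posts from every followed author."""
        entries = [e for author in self.following[user] for e in self.posts[author]]
        return sorted(entries, reverse=True)


store = TimelineStore()
store.follow("bob", "alice")
store.follow("carol", "alice")
store.post("alice", "hello")
print(store.read_precomputed("bob"))
print(store.read_on_demand("bob"))
```

Both reads return the same entries here. The difference is where the work happens: the first approach makes posting expensive for accounts with many followers, while the second makes every timeline read do the gathering instead.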