
Design Decisions for Scaling Your High Traffic Feeds

Summary

This article discusses the design decisions involved in scaling high-traffic feeds. It covers the basics of how feed systems work, describes the history of the feed system at Fashiolista, and explains five design decisions to consider when building your own solution: denormalized versus normalized data, selective fanout based on the producer, selective fanout based on the consumer, priorities, and Redis versus Cassandra. It also discusses using an open source package such as Feedly.

Q&As

What is the purpose of feeds in large startups?
The purpose of feeds in large startups is to show activities by the people you follow.

What tools were used in the three major redesigns of Fashiolista's feed system?
The tools used in the three major redesigns of Fashiolista's feed system were PostgreSQL, Redis, and Cassandra.

What are the design decisions to consider when building a feed system?
The design decisions to consider when building a feed system are denormalized vs. normalized data, selective fanout based on the producer, selective fanout based on the consumer, priorities, and Redis vs. Cassandra.
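The first decision concerns what a feed stores. In a normalized design, a feed holds only activity ids, and those ids are resolved against a shared activity store when the feed is read. In a denormalized design, every follower's feed holds its own full copy of the activity. The sketch below illustrates both approaches with plain in-memory dicts, assuming that. The function names and the activity fields (`id`, `actor`, `verb`, `object`) are hypothetical and are not taken from the article or from Feedly.

```python
import json
from typing import Dict, List

# Hypothetical activity shape used only for illustration.
Activity = Dict[str, str]


def fanout_normalized(table: Dict[str, Activity], feeds: Dict[str, List[str]],
                      followers: List[str], activity: Activity) -> None:
    """Normalized: store one canonical copy of the activity; feeds hold only its id."""
    table[activity["id"]] = activity
    for user in followers:
        feeds.setdefault(user, []).append(activity["id"])


def read_normalized(table: Dict[str, Activity], feeds: Dict[str, List[str]],
                    user: str) -> List[Activity]:
    # Every id must be resolved against the activity table at read time.
    return [table[activity_id] for activity_id in feeds.get(user, [])]


def fanout_denormalized(feeds: Dict[str, List[str]], followers: List[str],
                        activity: Activity) -> None:
    """Denormalized: every follower's feed gets its own serialized copy."""
    blob = json.dumps(activity, sort_keys=True)
    for user in followers:
        feeds.setdefault(user, []).append(blob)


def read_denormalized(feeds: Dict[str, List[str]], user: str) -> List[Activity]:
    # No extra lookup: the feed already contains everything needed to render it.
    return [json.loads(blob) for blob in feeds.get(user, [])]
```

Both readers return the same activities. The difference lies in where the cost falls. The normalized version stores less per feed but needs an extra lookup on every read. The denormalized version reads in one step, but it multiplies storage by the number of followers, and it makes later edits to an activity harder to propagate.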

What strategies can be used to reduce the load caused by high-profile users?
Strategies that can be used to reduce the load caused by high-profile users are selectively disabling fanout, fanning out only to active users, and assigning different priorities to fanout tasks.
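The first two strategies can be combined into a single write path. Selective fanout based on the producer skips write-time fanout for accounts with very large audiences. Selective fanout based on the consumer writes only to the feeds of active users. The sketch below shows both, assuming in-memory state. The `FeedState` and `publish` names and the 10,000-follower threshold are hypothetical, and the sketch leaves out two complementary steps: merging high-profile activities in at read time, and rebuilding an inactive user's feed when that user returns.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FeedState:
    followers: Dict[str, Set[str]] = field(default_factory=dict)  # producer -> followers
    active_users: Set[str] = field(default_factory=set)
    feeds: Dict[str, List[str]] = field(default_factory=dict)  # user -> activity ids


def publish(state: FeedState, producer: str, activity_id: str,
            high_profile_threshold: int = 10_000) -> List[str]:
    """Fan an activity out selectively and return the users whose feeds were written."""
    followers = state.followers.get(producer, set())
    # Selective fanout based on the producer: skip write-time fanout for very large
    # audiences (their activities would have to be pulled in at read time instead).
    if len(followers) > high_profile_threshold:
        return []
    # Selective fanout based on the consumer: only write to the feeds of active users.
    targets = sorted(user for user in followers if user in state.active_users)
    for user in targets:
        state.feeds.setdefault(user, []).append(activity_id)
    return targets
```

For example, a producer with three followers, two of them active, causes two feed writes instead of three. With the threshold lowered below the follower count, the producer causes no writes at all.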

What are the advantages and disadvantages of using Redis and Cassandra for feed systems?
The advantages of using Redis for feed systems are that it is easy to set up and maintain and has low memory usage. The disadvantages are that all data must be stored in RAM and that Redis has no built-in support for sharding. The advantages of using Cassandra are that it offers plenty of storage space and is supported by Datastax. The disadvantages are that it is quite hard to use if you normalize your data and that the Cassandra Python ecosystem is still changing rapidly.
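Without built-in sharding, feeds have to be spread across several Redis instances by the client. One common way to decide which instance owns a feed is consistent hashing, which also appears under Technical terms. The sketch below only maps feed keys to node names and does not talk to Redis. It assumes MD5-based ring positions and 100 virtual nodes per server, and the node names are hypothetical. This is an illustration of the technique, not the article's implementation.

```python
import bisect
import hashlib
from typing import Dict, List


def ring_position(key: str) -> int:
    # Stable 64-bit position derived from MD5 (used for distribution, not security).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: List[str], replicas: int = 100) -> None:
        self.replicas = replicas  # virtual nodes per server even out the key distribution
        self._owner: Dict[int, str] = {}
        self._positions: List[int] = []  # kept sorted for binary search
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            pos = ring_position(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node: str) -> None:
        for i in range(self.replicas):
            pos = ring_position(f"{node}#{i}")
            del self._owner[pos]
            self._positions.remove(pos)

    def get_node(self, key: str) -> str:
        if not self._positions:
            raise ValueError("the ring has no nodes")
        # The key belongs to the first virtual node clockwise from its position,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_right(self._positions, ring_position(key)) % len(self._positions)
        return self._owner[self._positions[idx]]
```

The property that matters is that removing a server moves only the keys that server owned. Every other feed key keeps its node, so only a fraction of the data has to be relocated.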

AI Comments

👍 This is an incredibly informative and helpful article. I appreciate all the detail and research that went into it.

👎 This article is too long and could have been written in fewer words.

AI Discussion

Me: It's about the design decisions involved in scaling high-traffic feeds. It covers the basics of how these feed systems work, the history of Fashiolista's feed system, and some design decisions to consider when building your own solution. It also discusses the trade-offs between denormalized and normalized data, selective fanout based on the producer or the consumer, and using priorities to reduce the impact of high-profile users. Finally, it covers the pros and cons of Redis vs. Cassandra as the storage backend.

Friend: Interesting. So what are the implications of this article?

Me: Well, it shows that it's important to understand the basics of how feed systems work before trying to scale them. It also explains the design decisions to consider when building a feed system, such as denormalizing vs. normalizing data, selective fanout based on the producer or the consumer, and using priorities to reduce the impact of high-profile users. Finally, it offers insight into the pros and cons of Redis vs. Cassandra as the storage backend.

Action items

Technical terms

Consistent hashing algorithm
A consistent hashing algorithm is a type of hash algorithm used to assign data to nodes in a distributed system. It is designed to minimize the amount of data that must be moved when a node is added to or removed from the system.
Cloud Computing
Cloud computing is a type of computing that relies on shared computing resources rather than on local servers or personal devices to run applications. It is a model for enabling ubiquitous, on-demand access to a shared pool of configurable computing resources.
Fanout
Fanout is the process of pushing an activity to all of a user's followers. Feed systems use it so that an activity appears in the feed of everyone who follows the user who performed it.
Denormalize
Denormalization is the process of combining data from multiple tables into a single table. It is used to improve query performance by reducing the number of joins that need to be performed.
Normalized
Normalization is the process of breaking down a table into smaller tables and establishing relationships between them. It is used to reduce data redundancy and improve data integrity.
Priorities
Priorities are used to assign different levels of importance to tasks. In feed systems, they can be used to prioritize fanouts to active users over fanouts to inactive users.
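Priorities can be illustrated with a small priority queue in which fanout tasks for active users run before tasks for inactive users. The sketch below assumes this. It uses an in-process heap as a stand-in for a real task queue, so the `FanoutQueue` class and the two priority levels are hypothetical. A production system would more likely route tasks to separate broker queues with their own workers.

```python
import heapq
import itertools
from typing import List, Tuple

HIGH, LOW = 0, 1  # lower number = processed sooner


class FanoutQueue:
    """In-process stand-in for a task queue that runs fanout tasks by priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, str]] = []
        self._seq = itertools.count()  # tie-breaker: FIFO order within a priority level

    def enqueue(self, user: str, activity_id: str, user_is_active: bool) -> None:
        priority = HIGH if user_is_active else LOW
        heapq.heappush(self._heap, (priority, next(self._seq), user, activity_id))

    def drain(self) -> List[Tuple[str, str]]:
        """Pop every task in priority order and return (user, activity_id) pairs."""
        processed = []
        while self._heap:
            _, _, user, activity_id = heapq.heappop(self._heap)
            processed.append((user, activity_id))
        return processed
```

Even when inactive users' tasks are enqueued first, the active users' feeds are updated first, and insertion order is preserved within each level. Under heavy load, this means delays hit the users who are least likely to notice them.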

Similar articles

0.9999988 Design Decisions for Scaling Your High Traffic Feeds

0.84388226 Blue Sky: Can Twitter be owned by its users?

0.8347242 How a startup loses its spark

0.83242255 Intro to Kubernetes – Containers at Scale

0.8316026 A New Kind of Startup is Coming
