Case Studies

News Feeds
Stats Server
Web Crawler
Amazon Product Page

News Feed

Define feed

Organize

aggregate (categorize)
dedup (remove duplicates)
sort (order)
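The three organizing steps can be sketched roughly as follows. This is a minimal illustration, not a prescribed design: the `Story` fields (`story_id`, `topic`, `timestamp`) and the choice to aggregate by topic are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    story_id: int   # hypothetical unique id of the underlying news item
    author: str
    topic: str      # hypothetical grouping key used for aggregation
    timestamp: int  # seconds since epoch


def organize(stories):
    """Dedup by story_id, aggregate by topic, then sort newest-first."""
    # dedup: keep the first copy of each story_id
    unique = {}
    for story in stories:
        unique.setdefault(story.story_id, story)

    # aggregate: group the remaining stories by topic
    groups = {}
    for story in unique.values():
        groups.setdefault(story.topic, []).append(story)

    # sort: newest first inside each group, groups ordered by their newest story
    for items in groups.values():
        items.sort(key=lambda s: s.timestamp, reverse=True)
    return sorted(groups.values(), key=lambda items: items[0].timestamp, reverse=True)
```

A real feed would likely rank by more than recency; the timestamp ordering here only stands in for whatever ranking is chosen.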

Level 1.0

Database Schema:

User
Friendship
News

GetNewsfeed:

merge news
Newsfeed vs News

Why bad?

100+ friends

Query 1 --> get the friends list

Query 2 -->

SELECT * FROM news
WHERE timestamp > xxx
AND sourceid IN (friend list)
LIMIT 1000

IN is slow

Either a sequential scan or 100+ index queries
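To make the two-query flow concrete, here is a small runnable sketch using Python's built-in `sqlite3`. The column names (`user_id`, `friend_id`, `source_id`, `content`), the index, and the `ORDER BY` clause are assumptions added for the example; the notes only name the three tables and the shape of the query.

```python
import sqlite3


def build_db():
    """Create an in-memory database with the three Level 1.0 tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE friendship (user_id INTEGER, friend_id INTEGER);
        CREATE TABLE news (id INTEGER PRIMARY KEY, source_id INTEGER,
                           content TEXT, timestamp INTEGER);
        CREATE INDEX idx_news_source_time ON news (source_id, timestamp);
    """)
    return db


def get_newsfeed(db, user_id, since, limit=1000):
    # Query 1: fetch the friend list
    friends = [row[0] for row in db.execute(
        "SELECT friend_id FROM friendship WHERE user_id = ?", (user_id,))]
    if not friends:
        return []

    # Query 2: fetch recent news whose source is any of those friends.
    # One placeholder per friend, so the IN list grows with the friend count.
    placeholders = ",".join("?" * len(friends))
    sql = (
        "SELECT id, source_id, content, timestamp FROM news "
        f"WHERE timestamp > ? AND source_id IN ({placeholders}) "
        "ORDER BY timestamp DESC LIMIT ?"
    )
    return db.execute(sql, (since, *friends, limit)).fetchall()
```

With 100+ friends the `IN` list has 100+ entries, which is where the cost described above comes from.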

Level 2.0

Pull vs Push

Pull: get news from each friend and merge it together. (The news feed is generated when the user requests it.)

Push: the news feed is generated when the news is posted. (A separate table stores each user's news feed, which may duplicate news.)

Push:

1 query to select the latest 1000 news feed entries.
100+ insert queries (async)

Disadvantage: news delay.
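The push model can be sketched in memory as below. Everything here is illustrative: the class name, the `pending` deque standing in for an async job queue, and the per-user `deque` standing in for the news feed table are all assumptions, and posts are assumed to be processed in time order.

```python
from collections import defaultdict, deque

FEED_LIMIT = 1000  # "latest 1000" from the notes


class PushFeedStore:
    def __init__(self):
        self.followers = defaultdict(set)  # author -> follower ids
        # user -> precomputed feed, newest first, capped at FEED_LIMIT
        self.feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))
        self.pending = deque()             # stand-in for an async job queue

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, news_id, timestamp):
        # Writing only enqueues the fan-out job and returns immediately.
        self.pending.append((author, news_id, timestamp))

    def run_fanout(self):
        """Worker step: one feed insert per follower. Returns the insert count."""
        inserted = 0
        while self.pending:
            author, news_id, timestamp = self.pending.popleft()
            for follower in self.followers[author]:
                self.feeds[follower].appendleft((timestamp, news_id, author))
                inserted += 1
        return inserted

    def get_newsfeed(self, user):
        # Reading is a single lookup of the precomputed feed.
        return list(self.feeds[user])
```

Until `run_fanout` has processed a post, followers do not see it, which is the news delay noted above.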

Level 3.0

Popular star (Justin Bieber)

Followers: 13M+

An async push may take over 30 minutes (13M+ insertions; the delay is too long)

Push+Pull

For a popular star, don't push news to followers.

For every news feed request, merge the non-popular users' news feed (push) with the popular users' news feed (pull).
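A rough sketch of the push+pull hybrid follows. The `POPULAR_THRESHOLD` value, the class and attribute names, and the dedup by `news_id` are assumptions for illustration; the notes do not say how "popular" is decided.

```python
import heapq

POPULAR_THRESHOLD = 3  # hypothetical cutoff, kept tiny so the example is easy to follow


class HybridFeed:
    def __init__(self):
        self.followers = {}  # author -> set of followers
        self.following = {}  # user -> set of authors
        self.pushed = {}     # user -> list of (timestamp, news_id) pushed at write time
        self.posts = {}      # author -> list of (timestamp, news_id), the author's own posts

    def follow(self, user, author):
        self.followers.setdefault(author, set()).add(user)
        self.following.setdefault(user, set()).add(author)

    def is_popular(self, author):
        return len(self.followers.get(author, ())) >= POPULAR_THRESHOLD

    def post(self, author, news_id, timestamp):
        self.posts.setdefault(author, []).append((timestamp, news_id))
        if not self.is_popular(author):
            # push: fan out only for non-popular authors
            for follower in self.followers.get(author, ()):
                self.pushed.setdefault(follower, []).append((timestamp, news_id))

    def get_newsfeed(self, user, limit=1000):
        # pull: read popular authors' posts at request time
        sources = [self.pushed.get(user, [])]
        sources += [self.posts.get(a, [])
                    for a in self.following.get(user, ()) if self.is_popular(a)]
        merged = heapq.merge(*(sorted(s, reverse=True) for s in sources), reverse=True)

        feed, seen = [], set()
        for timestamp, news_id in merged:
            if news_id in seen:  # an author may have been pushed before becoming popular
                continue
            seen.add(news_id)
            feed.append((timestamp, news_id))
            if len(feed) == limit:
                break
        return feed
```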

Level 4.0

Push disadvantage

Realtime
Storage (duplicate)
Edit

Go back to PULL:

Cache each user's latest (14 days) news
Broadcast multiple requests to multiple servers (sharded by userId).
Merge & sort the news feed
Cache the news feed for this user with a timestamp
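The four pull steps above might look like this in a single process. The shard count, the modulo sharding function, and the in-memory dictionaries standing in for cache servers are assumptions for the sketch; posts are assumed to arrive in time order.

```python
import heapq
from itertools import islice

NUM_SHARDS = 4                   # hypothetical number of cache servers
WINDOW_SECONDS = 14 * 24 * 3600  # keep the latest 14 days of news


def shard_for(user_id):
    """Shard by userId (simple modulo, assumed for illustration)."""
    return user_id % NUM_SHARDS


class PullFeedService:
    def __init__(self):
        # each shard maps user_id -> list of (timestamp, news_id), newest first
        self.shards = [dict() for _ in range(NUM_SHARDS)]
        self.feed_cache = {}  # user_id -> (built_at, feed)

    def post(self, user_id, news_id, timestamp):
        timeline = self.shards[shard_for(user_id)].setdefault(user_id, [])
        timeline.insert(0, (timestamp, news_id))

    def get_newsfeed(self, user_id, friend_ids, now, limit=1000):
        cutoff = now - WINDOW_SECONDS
        # Group friends by shard so each server receives one request.
        by_shard = {}
        for friend_id in friend_ids:
            by_shard.setdefault(shard_for(friend_id), []).append(friend_id)

        timelines = []
        for shard_id, ids in by_shard.items():  # a real system would issue these in parallel
            shard = self.shards[shard_id]
            for friend_id in ids:
                timelines.append([p for p in shard.get(friend_id, []) if p[0] >= cutoff])

        # Merge and sort: k-way merge of timelines that are already newest-first.
        feed = list(islice(heapq.merge(*timelines, reverse=True), limit))
        # Cache the result for this user together with the build timestamp.
        self.feed_cache[user_id] = (now, feed)
        return feed
```

A later request could plausibly use the cached timestamp to fetch only newer items; the sketch leaves that out.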

Click Stats Server

How are click stats stored

A poor candidate will suggest writing back to a data store on every click.

A good candidate will suggest some form of aggregation tier that accepts clickstream data, aggregates it, and periodically writes it back to a persistent data store.

A great candidate will suggest a low-latency messaging system to buffer the click data and transfer it to the aggregation tier.

If stats are needed daily, storing the data in HDFS and running map/reduce jobs to compute them is a reasonable approach.

If stats are needed in near real time, the aggregation logic should compute them.

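PS: How do we count mouse clicks and the page regions they land on? An average programmer writes every click (log entry) straight to the database tier. A better programmer adds a middle tier between the front end and the database that aggregates the click stream and flushes it to the database at a fixed interval (every 1 or 10 minutes), which greatly reduces the load on the back end. An excellent programmer combines both approaches: when the data volume is large and freshness matters little, the aggregation tier can flush once a day using distributed batch processing; when freshness matters, the flush interval should be shortened accordingly.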
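The buffer, aggregate, and periodic write-back flow can be sketched as follows. The `deque` stands in for the messaging system and a plain dict stands in for the persistent store; both, along with the method names and the `(url, region)` key, are assumptions for the example.

```python
from collections import Counter, deque


class ClickAggregator:
    def __init__(self, store, flush_interval=60):
        self.queue = deque()     # stand-in for a low-latency messaging system
        self.counts = Counter()  # in-memory aggregates since the last flush
        self.store = store       # stand-in for the persistent store (dict-like)
        self.flush_interval = flush_interval  # seconds between write-backs
        self.last_flush = 0

    def record_click(self, url, region):
        # Front end: only buffer the event; never touch the data store here.
        self.queue.append((url, region))

    def tick(self, now):
        """Aggregation tier: drain the buffer, flush if the interval has elapsed."""
        while self.queue:
            self.counts[self.queue.popleft()] += 1
        if now - self.last_flush >= self.flush_interval:
            self.flush(now)

    def flush(self, now):
        # One write per distinct (url, region) key instead of one per click.
        for key, n in self.counts.items():
            self.store[key] = self.store.get(key, 0) + n
        self.counts.clear()
        self.last_flush = now
```

Setting `flush_interval` to a day approximates the batch case, while a short interval approximates the near-real-time case.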

Cache Requirement

When a request comes in, look it up in the cache; on a hit, return the response from the cache and do not pass the request to the system.
If the request is not found in the cache, pass it on to the system.
Since the cache can only store the last n requests, insert the (n+1)th request into the cache and delete one of the older requests.
Design a cache in which all operations (lookup, delete, and insert) run in O(1).

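PS: How to design the cache (related to LRU design):

Cache the responses to some requests in this layer; if an incoming request has a cached response here, there is no need to send it to the back-end system.
If no cached response is found in this layer, send the request to the back end.
Cache in a fixed-length queue that holds the most recent n requests: new entries enter at the head and old ones leave from the tail.
Keep the lookup (matching) operation in this layer at O(1).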
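One common way to meet these requirements is an LRU cache built on `collections.OrderedDict`, which pairs a hash map with a doubly linked list so lookup, insert, and eviction are all O(1). The sketch below is one possible design, not the only one; the `handle` helper and the use of `None` as the miss marker (which assumes real responses are never `None`) are assumptions for the example.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.entries = OrderedDict()  # request -> response, least recently used first

    def get(self, request):
        """Return the cached response, or None on a miss."""
        if request not in self.entries:
            return None
        self.entries.move_to_end(request)  # mark as most recently used
        return self.entries[request]

    def put(self, request, response):
        if request in self.entries:
            self.entries.move_to_end(request)
        self.entries[request] = response
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


def handle(cache, request, backend):
    """Serve from the cache on a hit; otherwise call the back end and cache the result."""
    response = cache.get(request)
    if response is None:
        response = backend(request)
        cache.put(request, response)
    return response
```

A plain fixed-length queue, as described above, would evict in insertion order; the LRU variant additionally moves an entry back to the "recent" end whenever it is read.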

Web Crawler

Crawler

Amazon Product Page

The product page includes information such as

product information
user information
recommended products (what other customers buy after viewing this item, recommendations for you similar to this product, etc.)

(GeekBand) System Design and Practice: Case Analysis