My thoughts as an enterprise Java developer.

Saturday, June 18, 2022

Netflix: Active-Active for Multi-Regional Resiliency

 

https://netflixtechblog.com/active-active-for-multi-regional-resiliency-c47719f6685b

All services are available in every region, so no cross-region requests are needed

Users are directed to a region by location instead of by latency, to maintain control over where traffic goes
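
To make the location-based routing concrete for myself, here is a minimal Java sketch. It is not how Netflix implements it: the `GeoRouter` class, the country table, and the region names are all hypothetical, and the fallback to another healthy region is my own assumption about how failover could plug in.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: the class name, country table and region names are illustrative only.
public class GeoRouter {
    private final Map<String, String> regionByCountry;
    private final String defaultRegion;

    public GeoRouter(Map<String, String> regionByCountry, String defaultRegion) {
        this.regionByCountry = Map.copyOf(regionByCountry);
        this.defaultRegion = defaultRegion;
    }

    // Chooses a region from a static location table. Measured latency plays no part,
    // so a user lands in the same region until an operator changes the table.
    public String route(String countryCode, Set<String> healthyRegions) {
        String preferred = regionByCountry.getOrDefault(countryCode, defaultRegion);
        if (healthyRegions.contains(preferred)) {
            return preferred;
        }
        // Assumed failover rule: pick any healthy region in a deterministic order.
        return healthyRegions.stream()
                .sorted()
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("no healthy region"));
    }
}
```

The point of the static table is predictability: traffic only moves when the table or the set of healthy regions changes, not when network conditions fluctuate.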

Service that directs requests: Zuul

Fail a request with an error at the edge when downstream services won’t be able to handle it. The limit is dynamic and changes with scaling.
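
A rough Java sketch of that idea, as I understand it: reject at the front door once the in-flight count reaches what the downstream fleet can take, and let that limit grow or shrink with the instance count. `EdgeLoadShedder`, `perInstanceCapacity` and `onScale` are names I made up for illustration; this is not Zuul's API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of edge load shedding; not taken from Zuul.
public class EdgeLoadShedder {
    private final int perInstanceCapacity;          // assumed concurrent requests one instance can serve
    private final AtomicInteger instances;          // current downstream instance count
    private final AtomicInteger inFlight = new AtomicInteger();

    public EdgeLoadShedder(int perInstanceCapacity, int initialInstances) {
        this.perInstanceCapacity = perInstanceCapacity;
        this.instances = new AtomicInteger(initialInstances);
    }

    // Called when the downstream service scales up or down.
    public void onScale(int newInstanceCount) {
        instances.set(newInstanceCount);
    }

    // The admission limit follows the instance count, so it is dynamic with scaling.
    public int limit() {
        return perInstanceCapacity * instances.get();
    }

    // Returns true if the request is admitted; the caller must call release() when it finishes.
    // Returns false if the request should be failed immediately at the edge.
    public boolean tryAcquire() {
        while (true) {
            int current = inFlight.get();
            if (current >= limit()) {
                return false;
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    public void release() {
        inFlight.decrementAndGet();
    }
}
```

An edge filter would call `tryAcquire()` before proxying and return an error response when it gets `false`, so the overloaded service never sees the request.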

Test abilities of new services with production load

A write needs to tell the caches in every region to invalidate. Can a new request arrive before the data is synced?
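
To think through that question I wrote a toy Java model. Everything in it is an assumption for illustration: two regions, one shared `store` map standing in for a data store that is already in sync, and a queue standing in for the invalidation message that has not yet arrived in the other region. Real cross-region replication is asynchronous too, which this sketch ignores.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical toy model of cross-region cache invalidation.
public class RegionalCacheDemo {
    public static class Region {
        public final Map<String, String> cache = new HashMap<>();
        public final Queue<String> pendingInvalidations = new ArrayDeque<>();

        // Simulates the invalidation messages finally arriving in this region.
        public void applyInvalidations() {
            String key;
            while ((key = pendingInvalidations.poll()) != null) {
                cache.remove(key);
            }
        }
    }

    // Stand-in for the data store; assumed to be consistent across regions already.
    public final Map<String, String> store = new HashMap<>();
    public final Region east = new Region();
    public final Region west = new Region();
    private final List<Region> regions = List.of(east, west);

    // Read-through: serve from the regional cache, load from the store on a miss.
    public String read(Region region, String key) {
        return region.cache.computeIfAbsent(key, store::get);
    }

    // Write: update the store, invalidate locally at once, queue invalidations elsewhere.
    public void write(Region origin, String key, String value) {
        store.put(key, value);
        for (Region region : regions) {
            if (region == origin) {
                region.cache.remove(key);
            } else {
                region.pendingInvalidations.add(key);
            }
        }
    }
}
```

In this model the answer is yes: a read in `west` between `write(east, ...)` and `west.applyInvalidations()` still returns the old cached value. How long that window stays open depends on how fast the invalidation travels.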

Tools to deploy the same code to multiple regions

Multiple levels of chaos: service, availability zone, region, region connection. Some levels will cause total loss of service

Automatic failover

Even though this article is 9 years old, this kind of active-active setup is still rare.

Does the type of data Netflix changes make this easier for them? If a customer gets stale data, it probably doesn't have a large impact.