How hard can it be

The document discusses some of the challenges involved in building a real-time chat platform and outlines an architecture using microservices. It notes that while building such a platform may seem simple, there are many complexities under the hood including user authentication, moderation tools, image processing, and preventing DDOS attacks. The document then outlines an example architecture using frontend servers, backend servers, Redis for data storage, and services like chat, cheering, and feeds that communicate via RabbitMQ. It also discusses design principles like auto-scaling, fallback servers, and routing messages between services.

• Authentication
• Slowmode
• Moderator
• Admins
• Subscribers
• Timeout
• Ban
• Limits per user
• Notify messages
• Imagelog
• Unlimited Rooms
• IP-Ban
• Raffle
• Voting
• Whisper
• Blacklist
• Block user
• Posting Images
• DDOS Protection
• Limits per channel
• Chatlog
• System messages
Not that easy!
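Items like Slowmode and Limits per user look trivial in isolation; the difficulty is doing them for many busy channels at once. Below is a minimal sketch of a slow-mode check, assuming a single process with in-memory state (the class and method names are hypothetical). With several stateless backend servers, the last-post timestamps would have to live in shared storage such as the Redis cluster instead of a Python dict.

```python
import time
from typing import Dict, Optional, Tuple


class SlowMode:
    """Slow mode: each user may post at most once every `interval` seconds per channel."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        # (channel, user) -> time of the user's last accepted message in that channel.
        # Hypothetical in-process state; shared storage would be needed across servers.
        self._last_post: Dict[Tuple[str, str], float] = {}

    def allow(self, channel: str, user: str, now: Optional[float] = None) -> bool:
        """Return True (and remember the post) if the user is outside the cooldown."""
        now = time.monotonic() if now is None else now
        key = (channel, user)
        last = self._last_post.get(key)
        if last is not None and now - last < self.interval:
            return False  # still cooling down; rejected attempts do not reset the timer
        self._last_post[key] = now
        return True
```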

Get Image → Check Image → Adult Check → Save on CDN
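The image pipeline can be sketched as follows, under stated assumptions: the server, not the viewers, has already downloaded the image (so a poster never sees viewer IPs and the original host never gets 50k hits); the size cap and the CDN base URL are hypothetical; and the adult check is a placeholder callback standing in for whatever classifier is used (the notes mention testing AWS machine learning for this). Only GIF, PNG and JPEG are recognized, by their leading magic bytes. A real download step should also stop reading once the cap is exceeded.

```python
import hashlib
from typing import Callable, Dict

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # hypothetical cap; rejects 100 MB GIFs up front

# Leading "magic" bytes of the formats accepted in this sketch.
MAGIC = {
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpg",
}


class ImageRejected(Exception):
    pass


def detect_format(data: bytes) -> str:
    for prefix, fmt in MAGIC.items():
        if data.startswith(prefix):
            return fmt
    raise ImageRejected("not a supported image format")


def ingest_image(
    data: bytes,
    is_adult: Callable[[bytes], bool],
    cdn: Dict[str, bytes],
    cdn_base: str = "https://cdn.example.com/chat/",  # hypothetical CDN location
) -> str:
    """Check an image the server already fetched, store it, return the URL to post in chat."""
    if len(data) > MAX_IMAGE_BYTES:
        raise ImageRejected("image too large")
    fmt = detect_format(data)
    if is_adult(data):  # placeholder for the real adult-content classifier
        raise ImageRejected("adult content")
    # Content-addressed name: the same image posted twice is stored once.
    name = hashlib.sha256(data).hexdigest() + "." + fmt
    cdn.setdefault(name, data)  # dict stands in for the CDN upload
    return cdn_base + name
```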

• Click limits
• Spamming
• Load balancing
• Limits for frontend
• Authentication
• Extendable
Not that easy!
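One way to tackle click limits and spamming for cheering is to count clicks on the server and broadcast one aggregated update per channel per window instead of one message per click. The sketch below is a hypothetical illustration, not the deck's implementation; the per-user limit and the window length (set by whoever calls flush(), for example the cron service) are assumptions.

```python
from collections import Counter
from typing import Dict


class CheerAggregator:
    """Collects cheer clicks and turns them into one update per channel per flush."""

    def __init__(self, max_clicks_per_user: int) -> None:
        self.max_clicks_per_user = max_clicks_per_user  # hypothetical click limit
        self._counts: Counter = Counter()    # (channel, team) -> clicks in this window
        self._per_user: Counter = Counter()  # (channel, user) -> clicks in this window

    def click(self, channel: str, user: str, team: str) -> bool:
        """Count a click unless the user has hit the per-window click limit."""
        if self._per_user[(channel, user)] >= self.max_clicks_per_user:
            return False
        self._per_user[(channel, user)] += 1
        self._counts[(channel, team)] += 1
        return True

    def flush(self) -> Dict[str, Dict[str, int]]:
        """Return {channel: {team: clicks}} for this window and start a new window."""
        batch: Dict[str, Dict[str, int]] = {}
        for (channel, team), n in self._counts.items():
            batch.setdefault(channel, {})[team] = n
        self._counts.clear()
        self._per_user.clear()
        return batch
```

Each flush produces a single message per channel, so the number of broadcasts no longer grows with the number of clicks.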

• Bots
• Statistics
• Main KPI for streams
• Advertisement
Not that easy!
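The speaker notes say viewcount updates go out in real time or every 10 seconds, depending on how big the stream is. A minimal sketch of that decision and of the resulting fan-out; the cut-over of 1,000 viewers is a hypothetical value, since the deck does not give one.

```python
REALTIME_MAX_VIEWERS = 1_000  # hypothetical cut-over; the deck does not give a number
BATCH_INTERVAL_S = 10.0       # "every 10 seconds" for big streams (from the notes)


def viewcount_interval(viewers: int) -> float:
    """Seconds between viewcount broadcasts for a channel; 0.0 means push every change."""
    return 0.0 if viewers <= REALTIME_MAX_VIEWERS else BATCH_INTERVAL_S


def messages_per_second(viewers: int, changes_per_second: float) -> float:
    """Outgoing WebSocket messages per second for one channel's viewcount updates."""
    interval = viewcount_interval(viewers)
    if interval == 0.0:
        updates = changes_per_second  # every change is pushed to every viewer
    else:
        updates = min(changes_per_second, 1.0 / interval)  # at most one push per interval
    return viewers * updates
```

For a 100k-viewer channel whose count changes 50 times per second, pushing every change would mean 5,000,000 messages per second; a 10-second interval brings that down to 10,000.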

Figure (1st version architecture): UI; WebSocket load balancing; Frontend Servers (auto scaling) with a Long Polling Fallback Server; Backend Servers (auto scaling) with the permission and security model (Admin, Mods, ...); Redis Cluster for data storage; Smashcast REST-API.
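The permission and security model of the first version (Admin, Mods, ...) can be pictured as an ordered role hierarchy. The roles, the action names and the rule that moderators can only act on lower-ranked users are all assumptions for illustration; the deck does not describe the actual rules.

```python
from enum import IntEnum


class Role(IntEnum):
    """Ordered roles: a higher value includes every right of the lower ones (assumed)."""
    VIEWER = 0
    SUBSCRIBER = 1
    MODERATOR = 2
    ADMIN = 3


# Hypothetical minimum role per chat action.
REQUIRED_ROLE = {
    "post": Role.VIEWER,
    "timeout": Role.MODERATOR,
    "ban": Role.MODERATOR,
    "set_slowmode": Role.MODERATOR,
    "ip_ban": Role.ADMIN,
}


def can(actor: Role, action: str, target: Role = Role.VIEWER) -> bool:
    """Allow an action if the actor has the required role and outranks the target.

    Unknown actions are denied by default.
    """
    required = REQUIRED_ROLE.get(action)
    if required is None or actor < required:
        return False
    # Moderation actions only work on lower roles (a mod cannot ban an admin).
    if required >= Role.MODERATOR:
        return actor > target
    return True
```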

Servers
• Small, cheap machines
• Frontend handles the connections, no logic
• Backend is stateless and can be restarted/upgraded at any time
• When a frontend breaks, it affects only a few users
• Socket.io for handling WebSockets
• Up- & downscale as needed
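A minimal sketch of the frontend/backend split described above, assuming a plain dict as a stand-in for the Redis cluster and hypothetical message names (joinChannel, loginMsg, chatMsg). The frontend only forwards; the backend keeps no state of its own, so swapping in a fresh backend (a restart or an upgrade) loses nothing. A real Redis client would use its own commands (for example a set per channel) rather than mutating a list in place.

```python
from typing import Callable, Dict, List, MutableMapping

# Stand-in for the Redis cluster: channel member lists keyed by "members:<channel>".
Store = MutableMapping[str, List[str]]
Message = Dict[str, str]


def backend_handle(store: Store, msg: Message) -> Dict[str, object]:
    """Stateless chat backend: all state is read from and written to `store`."""
    key = "members:" + msg["channel"]
    if msg["method"] == "joinChannel":
        members = store.setdefault(key, [])
        if msg["user"] not in members:
            members.append(msg["user"])
        return {"method": "loginMsg", "channel": msg["channel"], "ok": True}
    if msg["method"] == "chatMsg":
        # Reply lists the recipients so the caller can fan the message out.
        return {"method": "chatMsg", "channel": msg["channel"], "user": msg["user"],
                "text": msg["text"], "recipients": list(store.get(key, []))}
    return {"method": "error", "reason": "unknown method"}


class Frontend:
    """'Dumb' frontend: owns the client sockets and forwards messages, no chat logic."""

    def __init__(self, backend: Callable[[Message], Dict[str, object]]) -> None:
        self.backend = backend

    def on_socket_message(self, msg: Message) -> Dict[str, object]:
        return self.backend(msg)
```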

Let's do this again for all other services!

RabbitMQ
• Easy to use
• Good performance
• Easy to cluster
• Great web interface for monitoring
• Proven stability

Figure (server structure, 2nd version): Load balancing; Frontend Servers (auto scaling); RabbitMQ Cluster; Chat, Cheering, Feed and Viewcount Services (auto scaling).
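In the second version the frontend servers publish to a fromFrontend exchange, with routing keys such as chat.joinchannel.karlus (service, command, channel). The sketch below implements the AMQP topic-exchange matching rules in plain Python ('*' matches exactly one word, '#' matches zero or more words) to show how such keys reach per-service queues. The binding patterns such as chat.# are an assumption; the notes only say that keys starting with "chat." end up in the chat queue.

```python
from typing import Dict, List


def topic_matches(pattern: str, key: str) -> bool:
    """AMQP 0-9-1 topic matching: '*' = exactly one word, '#' = zero or more words."""
    return _match(pattern.split("."), key.split("."))


def _match(p: List[str], k: List[str]) -> bool:
    if not p:
        return not k
    head, rest = p[0], p[1:]
    if head == "#":
        # '#' may swallow 0, 1, 2, ... words of the key.
        return any(_match(rest, k[i:]) for i in range(len(k) + 1))
    if not k:
        return False
    return (head == "*" or head == k[0]) and _match(rest, k[1:])


# Hypothetical bindings on the fromFrontend exchange: one queue per service.
FROM_FRONTEND_BINDINGS: Dict[str, str] = {
    "chat.#": "chat",
    "cheering.#": "cheering",
    "feed.#": "feed",
    "viewcount.#": "viewcount",
}


def route(routing_key: str) -> List[str]:
    """Queues a message published to fromFrontend with this key would reach."""
    return [q for pattern, q in FROM_FRONTEND_BINDINGS.items()
            if topic_matches(pattern, routing_key)]
```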

Figure (what's happening): JoinChannel (UI to frontend server) → JoinChannel (frontend server to service) → OK → Send & Receive Data
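The JoinChannel round trip can be simulated in memory. This is not a RabbitMQ client: MiniBroker is a hypothetical stand-in that shows three behaviours from the notes, namely round-robin delivery to the chat backend servers, the login reply going only to the frontend that asked, and a chat message going to every frontend queue. Passing the frontend's own queue name along as reply_to is an assumption made for this sketch, and the command name chatmsg is hypothetical.

```python
from collections import deque
from itertools import cycle
from typing import Deque, Dict, List


class MiniBroker:
    """In-memory stand-in for the RabbitMQ layout in the notes (not a real client).

    - keys "chat.<command>.<channel>" go to the chat queue, whose consumers
      (chat backend servers) get the messages round robin
    - every frontend server has its own queue; a reply goes either to one of
      them (login reply) or to all of them (chat message)
    """

    def __init__(self, frontends: List[str], chat_workers: List[str]) -> None:
        self.frontend_queues: Dict[str, Deque[dict]] = {f: deque() for f in frontends}
        self._workers = cycle(chat_workers)
        self.handled_by: List[str] = []  # which chat worker processed each message

    def publish_from_frontend(self, frontend: str, routing_key: str, body: dict) -> None:
        """Frontend -> fromFrontend exchange -> chat queue -> next chat worker."""
        service, command, channel = routing_key.split(".", 2)
        if service != "chat":
            return  # other service queues are left out of this sketch
        self.handled_by.append(next(self._workers))  # round-robin delivery
        # Assumption: the frontend attaches its own queue name so the reply can find it.
        self._chat_backend({**body, "reply_to": frontend}, command, channel)

    def _chat_backend(self, msg: dict, command: str, channel: str) -> None:
        if command == "joinchannel":
            # Only the user's own frontend needs the login reply.
            reply = {"method": "loginMsg", "channel": channel, "user": msg["user"]}
            self.frontend_queues[msg["reply_to"]].append(reply)
        elif command == "chatmsg":
            # Viewers of the channel may sit on any frontend: broadcast to all queues.
            reply = {"method": "chatMsg", "channel": channel,
                     "user": msg["user"], "text": msg["text"]}
            for queue in self.frontend_queues.values():
                queue.append(reply)
```

With a real AMQP 0-9-1 client, sending straight to one named queue is typically done by publishing to the default exchange (the empty exchange name), which delivers to the queue whose name equals the routing key.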

Examples
• Some services don't need a login
• There is always a need to schedule things
• Services need to send information to and get information from the API
• What happens when a frontend server dies?
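For the last question, the notes say a cleanup service logs viewers out of other services when a frontend server goes down. A minimal sketch, assuming frontends send periodic heartbeats and the cleanup service tracks which users logged in through each frontend; the timeout, the session bookkeeping and the logout routing key format are hypothetical.

```python
from typing import Dict, List, Tuple


class CleanupService:
    """Logs users out of other services when their frontend server stops heartbeating."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s  # hypothetical value; not given in the deck
        self.last_heartbeat: Dict[str, float] = {}  # frontend id -> last heartbeat time
        # frontend id -> [(service, user, channel)] for every login made through it
        self.sessions: Dict[str, List[Tuple[str, str, str]]] = {}

    def heartbeat(self, frontend: str, now: float) -> None:
        self.last_heartbeat[frontend] = now

    def register(self, frontend: str, service: str, user: str, channel: str) -> None:
        self.sessions.setdefault(frontend, []).append((service, user, channel))

    def sweep(self, now: float) -> List[dict]:
        """Forget silent frontends and return one logout message per orphaned session."""
        dead = [f for f, t in self.last_heartbeat.items() if now - t > self.timeout_s]
        logouts: List[dict] = []
        for frontend in dead:
            del self.last_heartbeat[frontend]
            for service, user, channel in self.sessions.pop(frontend, []):
                logouts.append({"routing_key": f"{service}.logout.{channel}", "user": user})
        return logouts
```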

Figure (server structure with generic services): Load balancing; Frontend Servers (auto scaling); RabbitMQ Cluster; Chat, Cheering, Feed and Viewcount Services (auto scaling); generic Login, Cron, API and Cleanup Services.

Cheering flow (in order):
• User clicks cheering icon
• Routed to cheering service
• Cheering service updates Redis, etc.
• Stores msg in cron service to get sent back in the future
• Cron service sends back to cheering service
• Cheering service collects data and sends it
• Routed to channel
• User gets update
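The cron step in this flow (store a message now, have it sent back later) can be sketched as a priority queue ordered by due time. This is an in-memory illustration under the assumption that a periodic tick drives it; the class, method and message field names are hypothetical, and a real service would also need to persist pending messages.

```python
import heapq
import itertools
from typing import List, Tuple


class CronService:
    """Holds messages and releases them to their sender once they are due."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str, dict]] = []
        self._seq = itertools.count()  # tie-breaker so dicts are never compared

    def schedule(self, due: float, reply_to: str, msg: dict) -> None:
        heapq.heappush(self._heap, (due, next(self._seq), reply_to, msg))

    def tick(self, now: float) -> List[Tuple[str, dict]]:
        """Pop every due message; returns (destination service, message) pairs."""
        out = []
        while self._heap and self._heap[0][0] <= now:
            _, _, reply_to, msg = heapq.heappop(self._heap)
            out.append((reply_to, msg))
        return out
```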

Future
• Split chat into smaller services
• A lot of new services
• Open up infrastructure for 3rd party services
• ???

1. How hard can it be?

6. REALTIME

9. 10 ≠ 100000

10. Images

12. 100 MB GIFs...

13. Logfiles!

15. 50k in a channel...

16. DDOS!

17. Get Image → Check Image → Save on CDN

25. 1st Version

30. 2nd Version

33. Server Structure

35. What's Happening

49. Generic Services

54. Microservice Library

55. Demo

56. Is it Working?

58. Future

60. Thank you! max@smashcast.tv

61. We are hiring!

65. "Self" DDOS

67. Communication Flow

Editor's Notes

#2: So, how hard can it be?
#3: That's me in 1980/81 with my first computer. Does anyone know the computer? I studied arts, lived in New York and Berlin, and have founded startups and crashed startups.
#4: Smashcast was Hitbox until April, but Hitbox was bought by Azubu, a competitor, and now we are Smashcast.
#5: What is Smashcast? This is the front page.
#6: And that's a stream page: you see the live stream, the chat, etc.
#7: Real time is important. When something happens on the stream, viewers want to react as fast as possible. That's why we have built a real-time infrastructure based on WebSockets. Now a few explanations of the elements on the site that use this infrastructure.
#8: The chat can do everything a chat needs to do, including posting images, GIFs, selfies, etc.
#9: Sounds easy, but it isn't when you want to scale it!
#10: And there is a big difference between a small WhatsApp group and a huge real-time chat!
#11: Take, for example, images in chat. Sounds easy! Look at the person in the top left corner.
#12: Just link to the source and you are done! We had this for two years... until we realized: there is a problem!
#13: For example: someone posts a 100+ MB GIF, then all viewers start to download it, and when their internet is slow (don't forget, there is a 3 Mbit stream running next to the chat), the stream will lag, giving a bad user experience! But there are bigger problems with images in the chat!
#14: Imagine you hate one of the smaller streamers (say 5-20 viewers) on Smashcast. You set up a small server with a nice GIF on it, and when the streamer is live you post this GIF in his chat. Now you have the IP address of all his viewers and the IP of the streamer in your log files! So the next step is
#15: googling DDoS and bringing down the streamer you hate! So easy! But there is even a third problem!
#16: Imagine a stream has 50k viewers and someone posts a GIF. The server where the GIF is hosted must be strong, because it will now get 50k hits at the same time!
#17: It's like a DDoS...
#18: So we need to get the image, check it and save it in the CDN
#19: And we are testing AWS machine learning for porn detection
#20: Back to the real-time features. Cheering is another one on the site. It allows people to cheer for a team or a stream.
#21: Lots of things to do! And that's just the beginning! Again, here you quickly run into problems with too many messages being sent to the users, etc.
#22: Another feature on the site is the feed below the stream. All viewers get updates via the WebSocket.
#23: Last but not least the viewcounter.
#24: That's this number here. Maybe the most important thing on the site.
#25: This number is the main KPI for all streams: the bigger, the better, so a lot of people try to influence it. Updates are sent out to all viewers in real time or every 10 seconds, depending on how big the stream is.
#26: So, let's explain how we started the real-time system a few years ago: the famous first version.
#27: We went with Node.js and Redis. Redis because it is a great product for storing data that you need fast: when we have a lot of users, the Redis servers handle thousands of requests per second without any problems, and AWS offers a very good managed version of Redis. Node.js because of its fast I/O; nowadays I would maybe move to Go.
#28: So we went with a typical frontend/backend setup: the frontend handles the WebSocket connections and is quite dumb, and the backend has all the chat logic. The fallback server is for the less than 1% of users who can't use WebSockets, because some providers block them and some older Android versions don't support them.
#29: We use single-core AWS machines. This first version worked fine, but the problem is:
#30: We had a similar infrastructure for the viewcounter, and for cheering and the feed we would have had to build a similar infrastructure too. So we needed a different approach.
#31: Sounds easy, right? It has existed for 30 years.
#32: That's when we decided to use RabbitMQ. We could have used Kafka too, but I have quite some experience with RabbitMQ and it fits perfectly. Is anyone using it too?
#33: As I said, it is easy to use, easy to maintain, and the best thing is the web interface; I will show it to you later.
#34: So, what does the new server structure look like? This is Mike Pence, Vice President of the USA, visiting NASA...
#35: We still have the frontend servers (with fallback, of course), and in between the RabbitMQ cluster, which distributes the messages to the services and then back to the frontend.
#36: So how does this work now? How is a command from the frontend sent to the backend, processed, and sent back to the frontend?
#37: First, you need to log in to every service. OK, this flow is not that complicated, but wait for it!
#38: This is a login message the user interface sends to the frontend server it is connected to. In this case the user wants to log in (joinchannel) to the chat service for the channel "karlus". The frontend server then sends this message to the RabbitMQ cluster.
#39: In RabbitMQ there are two exchanges defined: fromFrontend and toFrontend. The frontend servers are connected to both: they listen on one and send messages to the other. So this message is sent to the fromFrontend exchange, because it comes from a frontend server.
#40: Here you can see this. The frontend server sets the routing key, and the routing key is chat.joinchannel.karlus because it is a joinchannel command for the chat service for the channel karlus.
#41: The fromFrontend exchange now routes all messages whose routing key starts with "chat." to the chat queue.
#42: The chat backend servers (in this case two servers) are connected to this chat queue, and RabbitMQ distributes the messages round robin to the chat backend servers. This is how the backend servers get their messages. They have no clue that the message is coming from a frontend server; it could come from other services, etc.
#43: The chat backend servers then process the message (in this case do the login, etc.) and send a message back to RabbitMQ, to the "toFrontend" exchange.
#44: Each frontend server has its own queue that is connected directly to the "toFrontend" exchange. With this setup the backend can send a message directly to one frontend server (for example the loginmsg, because it is only for one user) or to all frontend servers (for example a chat msg).
#45: After processing the message, the backend server sends the message back to RabbitMQ. For a normal chat message it looks like this; again, the routing key is the service, command and channel.
#46: The loginmsg is a bit different: it is sent directly to the frontend server that sent the original message, because the other frontend servers don't need to see it. So there is no routing key; only the queue it is sent to is specified.
#47: And this is how it looks in the user interface. Green are the messages from the user interface, white the messages from the frontend server.
#48: Here you see the login msg we just saw
#49: And here the message coming back from the chat backend server.
#50: When we built this system, we realized that we need some generic services that can be used by other services.
#51: For example, some services don't need a login. A cronjob is needed, for example, for the cheering service to send status updates back to the viewers, or for the viewcount server. The connection to and from the API is based on a service, so other services can send a message to this service, which then interacts with the REST API. The cleanup service is there to log viewers out of other services when a frontend server goes down.
#52: So we added these services to the same RabbitMQ cluster.
#53: Some services are really simple, this is the main function for the login service for example.
#54: This can lead to quite complicated flows.
#55: To handle this, we have our own library that we use to connect to and work with RabbitMQ.
#56: https://ec2-52-3-222-66.compute-1.amazonaws.com/#/ https://rabbitmq/#/
#57: Well, and at the end, is the chat system working? Does it scale?
#58: Well, I don't have a screenshot of our latest record, which was close to 200k, but this one shows you a channel with 100k people. All 154k connections were handled by 16 frontend servers and 8 backend servers, costing us around $20 for the evening.
#59: Well, and at the end, is the chat system working? Does it scale?
#60: As I said, it is easy to use, easy to maintain, and the best thing is the web interface; I will show it to you later.
#61: Just one more thing:
#62: Just one more thing:
#63: Just one more thing:
#64: It was during the biggest event we had had at that time: 60k people on one stream, and suddenly all of them saw this.
#65: And we did this!
#66: I know, this sounds stupid, but I will give you two examples. Imagine you have a stream with 100k viewers. Every time a new viewer comes to this stream, he or she gets the info from our server about how to get the stream. Now imagine the streamer has a problem; let's say his computer crashes and the stream drops, meaning it goes black or gets stuck. What do 100k people do?
#67: This. And let's hope that your API can handle it! And they won't stop until they have a stream again!
#68: Sounds easy, right? It has existed for 30 years.
