Biggest problems when developing Serverless Functions
Functions do not always succeed. They crash. Failed operations do not get retried; they simply disappear.
How to retry cloud functions on failure: https://cloud.google.com/functions/docs/bestpractices/retries
One function often contains several different kinds of small operations: a database operation, a Google Drive operation, a message queue operation, an HTTP call to an external system. If one of the operations fails, then on function retry most of the operations happen twice.
On an HTTP request, perform all validations, send a single message with the POST body into the message queue, and return 200, all extremely quickly.
A Pub/Sub subscriber receives the message and sends out several single-operation messages via various topics.
Each message gets processed by its own Pub/Sub subscriber.
If one operation / message / event fails, only that one gets retried.
Successful operations get removed from the message queue (by sending an ACK).
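The fan-out step above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the `publish(topic, payload)` function is an assumed stand-in for a real Pub/Sub topic publisher, and the topic names are invented for the example.

```javascript
// Sketch of the fan-out step: each small operation becomes its own message
// on its own topic, so a failure in one does not force the others to run
// twice on retry. `publish` is an injected, hypothetical wrapper around a
// real Pub/Sub client.
async function fanOutOperations(order, publish) {
  const operations = [
    { topic: 'db-write', payload: { orderId: order.id } },
    { topic: 'drive-upload', payload: { orderId: order.id } },
    { topic: 'external-http-call', payload: { orderId: order.id } },
  ];
  for (const op of operations) {
    await publish(op.topic, op.payload);
  }
  return operations.length;
}
```

Because `publish` is injected, the fan-out logic is also trivially testable without any cloud services.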
About message queues: https://medium.com/@mvaldiearsanur/publish-and-receive-google-pub-sub-message-in-node-js-a34504db2844
Sometimes sending out several independent operations is not enough, because the process is complex (a “workflow”).
AWS Step Functions: https://aws.amazon.com/step-functions/
Azure’s offering, Logic Apps: https://www.techtarget.com/searchapparchitecture/tip/How-Azure-Logic-Apps-works-and-when-to-choose-it
An HTTP function gets called twice because of a timed-out connection. The payment happens twice, which is really bad.
Idempotence: https://developer.mozilla.org/en-US/docs/Glossary/Idempotent
How to make it happen: https://www.javacodegeeks.com/2021/06/making-post-and-patch-requests-idempotent.html
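The core of the idempotency technique from the links above can be sketched in a few lines: cache the first result per idempotency key and replay it for retries. The names here are illustrative, and a real service would keep the cache in Redis or a database rather than an in-memory Map.

```javascript
// Minimal idempotency sketch: the first request with a given key performs
// the payment; any retry with the same key replays the stored result
// instead of charging again. In production the Map would be Redis or a
// database table keyed by the Idempotency-Key header.
const results = new Map();

function chargeOnce(idempotencyKey, amount, charge) {
  if (results.has(idempotencyKey)) {
    // Duplicate request (e.g. client retry after a timed-out connection):
    // return the stored result, do not charge again.
    return results.get(idempotencyKey);
  }
  const result = charge(amount); // performs the actual payment
  results.set(idempotencyKey, result);
  return result;
}
```

The client generates the key once per logical payment and reuses it on every retry, so a timed-out connection can no longer cause a double charge.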
When a function sends out 5 events via different Pub/Sub topics, it might fail partway through. On retry the function sends out the events again, producing many duplicate events. Two identical SMS messages could be sent to the client. Two payments could be made.
Event deduplication means ensuring that each event is processed exactly once, even when it arrives in several copies, for whatever reason.
An easy way is to use AWS SNS FIFO topics: https://docs.aws.amazon.com/sns/latest/dg/sns-fifo-topics.html
When the five-minute time window for the duplicates check is not enough: https://dev.to/napicella/deduplicating-messages-exactly-once-processing-4o2
I once created a similar solution for Google Cloud Functions by using Firestore. It could equally be Redis. It was capable of ensuring deduplication over a week, a month, or a year. But it costs dearly.
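The shape of such a deduplicator is simple, whatever the backing store. Here is a sketch with a configurable window; the in-memory Map is a stand-in for Firestore or Redis (where the window would be a TTL on the stored key).

```javascript
// Event deduplication sketch: remember when each event ID was first
// processed, and skip any copy that arrives again within the window.
// In production, `seen` would be Firestore or Redis with a TTL.
function makeDeduplicator(windowMs) {
  const seen = new Map(); // eventId -> timestamp of first processing
  return function shouldProcess(eventId, now = Date.now()) {
    const first = seen.get(eventId);
    if (first !== undefined && now - first < windowMs) {
      return false; // duplicate within the window: skip it
    }
    seen.set(eventId, now);
    return true;
  };
}
```

The longer the window, the more storage (and money) it takes, which is exactly the trade-off described above.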
The business process is very complex, or the transaction is very complex.
In complex cases, BPMN solutions like Camunda could be used.
Advanced distributed transactions algorithms could be used: https://en.wikipedia.org/wiki/Two-phase_commit_protocol
Transactions monitors could be used: https://en.wikipedia.org/wiki/Transaction_processing_system
Functions do not validate incoming data. What comes in goes directly into the database. An insider hacker (your colleague) can inspect the code and see the opportunity.
Joi validator: https://joi.dev/
NestJS’s Java-projects-style validation: https://docs.nestjs.com/techniques/validation
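Joi and the NestJS pipes express such rules declaratively; as a dependency-free sketch, the kind of checks they perform looks like this (field names and rules are invented for the example):

```javascript
// Hand-rolled sketch of the checks a Joi schema or NestJS validation pipe
// would express. The point: run these BEFORE anything touches the database,
// and reject the request if the list of errors is non-empty.
function validateUser(body) {
  const errors = [];
  if (typeof body.email !== 'string' || !/^[^@\s]+@[^@\s]+$/.test(body.email)) {
    errors.push('email must be a valid address');
  }
  if (!Number.isInteger(body.age) || body.age < 0 || body.age > 150) {
    errors.push('age must be an integer between 0 and 150');
  }
  return errors;
}
```

In practice, prefer the declarative libraries linked above; the hand-rolled version is only here to make the idea concrete.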
Functions do not have a graceful shutdown. They waste the database connection pool until their connections time out on the server side. Other client connections can be wasted as well.
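A graceful-shutdown hook is a few lines in Node.js. This is a sketch under the assumption that `pool` is any client with an async `end()` method (a `pg` Pool, for example); the `proc` parameter is injected only so the handler can be tested without killing the test runner.

```javascript
// Graceful shutdown sketch: when the platform sends SIGTERM before
// recycling the instance, release the connection pool instead of letting
// the connections linger until the database times them out.
function registerGracefulShutdown(pool, proc = process) {
  proc.on('SIGTERM', async () => {
    await pool.end(); // return connections to the database cleanly
  });
}
```

The same hook is the place to close any other long-lived clients (Redis, message queues) the function holds.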
The programmer writes just unit tests to get the code coverage, or does not even do that. When the function is deployed in the cloud, nothing works, especially in problem situations. AWS quality is a perfect example of this situation.
Integration / e2e tests must be used:
Subscribe to SNS topics, WebSockets, whatever.
Call the webhook.
After a timeout, check the database.
When Pub/Sub messages are received, verify the situation.
Verify what you got from the WebSockets.
This tests the whole process.
Single-function tests must use the database and other cloud services. That allows you to work with real data and real cloud services. Think of it as a Postman test in a written, repeatable format!
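The e2e loop described above can be sketched as one function. To keep the sketch self-contained, the real HTTP call and database read are injected parameters (`callWebhook`, `readOrderFromDb` are assumed names); in a real test they would hit the deployed webhook and the actual database.

```javascript
// Sketch of the e2e check: trigger the process through the public entry
// point, give the asynchronous pipeline time to run, then verify the end
// state in the database. All dependencies are injected for illustration.
async function runE2eCheck({ callWebhook, readOrderFromDb, waitMs }) {
  await callWebhook({ orderId: 'o1' });            // 1. trigger the process
  await new Promise((r) => setTimeout(r, waitMs)); // 2. wait for the pipeline
  const row = await readOrderFromDb('o1');         // 3. verify the end state
  if (!row || row.status !== 'PROCESSED') {
    throw new Error('order o1 was not processed end to end');
  }
  return row;
}
```

A polling loop with a deadline is usually better than a fixed sleep, but the fixed timeout keeps the sketch short.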
Edge cases and error cases do not get handled in the code.
A good craftsmanship philosophy must be learned. Take pride in what you do and how you do it. No excuses for laziness.
Endless cascades of functions. One event triggers other events, which trigger even more events. One night in the cloud can lead to a 100K EUR bill.
Google recommends checking the timestamp of when the event was created and, past a certain age, not proceeding with what is in the function. Another method is to put data properties into the database document to stop the flow.
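The timestamp check is a one-liner worth showing. This sketch assumes the event carries an ISO-8601 `timestamp` field (as Cloud Functions events do); `maxAgeMs` is a parameter you choose, and `now` is injected only for testability.

```javascript
// Cascade guard sketch: refuse to process events older than maxAgeMs,
// so a retry storm or an event loop cannot cascade forever.
function shouldProcessEvent(event, maxAgeMs, now = Date.now()) {
  const age = now - Date.parse(event.timestamp);
  return age >= 0 && age <= maxAgeMs;
}
```

Call it at the top of every event-triggered function and return early when it says no; the event is then acknowledged and the cascade dies out.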
AWS Lambda functions just die after a few retries. You have to route failed events to a dead-letter queue so you do not lose your processes.
No logging, no debug messages, no error messages. When an error happens, nobody can figure out why.
AWS CloudWatch: https://aws.amazon.com/cloudwatch/
Google Cloud Logging: https://cloud.google.com/logging?hl=en
No special error reports. When an error happens that could affect a business outcome, nobody gets to know about it. Certainly not in a timely manner.
You can send an email from the function code to the programmer, or the boss. You can include important data that would allow you to fix business problems very quickly.
The function starts too slowly. The client gets a response after 10 seconds.
Starting a new function instance requires starting a Docker container, booting up the OS and services, and connecting to the database:
Make the cold start super short by using Google Distroless containers and the Go programming language. No Java, Ruby, Python, or JavaScript. No big Docker images.
Use cloud configuration to always keep some function instances started, even if it costs more.
Functions can be configured to have a faster CPU and more memory; you can even add several processor cores.
Comment: In my opinion a serverless solution shouldn’t mean using stateless components just to keep it simple. A Business Process Analyst should document the details of the workflow and acceptance criteria by going through both functional and non-functional requirements.
Comment: The more you learn, the more complex your backends become. Are simple backends even possible?