[Vatech.io] Enterprise Error Alert and Backup System with Make.com. The problems we faced during 5 years of working with Make and their solutions

:handshake: 1. Introduction

Our team has been professionally developing automations for Enterprise clients for 5 years. During this period, we have created more than 1000 scenarios in 30 Make.com accounts. We strive to keep business process uptime at 99.999% and to ensure high performance, so we need to respond promptly to any error that stops a scenario. We identified two main problems that make it harder to keep business process uptime at the highest level.
This post is specifically about using Make API to build a monitoring system for the status of scenarios.
Let’s dive into the problems first.

:smiling_imp: 2. The issues we faced when working with Make.com:

1. Make.com’s error alerts do not provide full control over the status of scenarios out of the box.
The standard tools offered by Make.com for tracking errors are email alerts about scenario stops, as well as internal alerts within the Make.com account. Before the launch of Make API, we used an email parser and received error notifications from it. However, this method proved outdated and inconvenient: emails arrive constantly, the information in them is delayed, and parsing each email introduces its own errors and reduces stability. After the launch of Make API, we finally had the opportunity to do this without email. This completely changes the logic of working with errors and allows everything to be done natively.

2. Deleting scenarios or accounts due to human error.
The second problem, which may seem less noticeable at first glance but is extremely critical, is the deletion of scenarios or even of the client’s entire Make account.
It may seem funny, but several times we have encountered situations where employees on the client’s side, for one reason or another, deleted the most important scenarios in the Make account! It’s even worse when the client deletes their entire Make account. Strange as it may seem, Make.com so far does not offer any way to recover a deleted scenario or retrieve scenarios from a deleted account. After such cases, the client is left with nothing: the scenarios cannot be restored, and the account is gone for good. This leads to colossal financial and time losses, as all the work done on developing and configuring the systems is destroyed in an instant!

:metal: 3. How the monitoring and backup system works.

Fortunately, our system using Make API allows you to recover a deleted scenario or even an entire account from a backup, which is created daily. Let’s have a look at how it works.
In this part, I want to list the stack of tools on which the monitoring system runs:

1. The core of the system is Make scenarios. We use several scenarios that poll the client’s scenarios directly through Make API, check their status, and save their blueprints. Connecting a client to the monitoring system is as simple as possible. We have a system account that needs to be added as an admin in the client’s Make.com account. After that, we get full access to all scenarios and start collecting them, filling in the dashboard, and monitoring them. Additional scenarios power the Slack bot and manage its messages and buttons. I can write a separate big post about our internal Slack system because it includes dozens of features that are not worth listing here, but they are important for the development team’s internal work. Let me know in the comments if this is interesting!

2. Dashboard and database in Airtable. Here we store every active scenario from the connected Make.com accounts. In Airtable, we store parameters such as:

  1. Scenario URL
  2. Scenario name
  3. Date of the last check
  4. Current status of the scenario
  5. Team member responsible for this scenario
  6. Link to the backup folder on Google Drive, where the latest version of the scenario is stored
  7. Client data, linking scenarios to our projects from CRM.
  8. Necessary system IDs for Slack, Make, Google Drive.

3. Google Drive. This is where a folder is created for each client’s account. For every active scenario, a JSON blueprint file of the current version of the scenario is saved. These files are used as a backup: if a scenario is deleted, we quickly restore the current version and put it back into operation.
4. Slack channel Errors. The monitoring system polls all scenarios every 5 minutes, and when a scenario’s status becomes Turned OFF, we receive a detailed message in the channel.

The error message contains:

  • Name and link to the scenario
  • Name of the responsible person
  • Details about the last error for quick understanding of the problem and the time it occurred
  • Buttons for quickly restarting the scenario. Often, it’s just necessary to rerun it. There’s no need to access Make. You can also set up auto-restart of the scenario in the Airtable dashboard.
  • Buttons to disable notifications for the current issue for 3-24 hours. This is useful when an error occurs due to external services and it does not require our direct involvement, or the scenario is turned off because it’s currently being worked on.
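To make the alert flow above more concrete, here is a minimal sketch in Python of the decision made on every poll. It is only an illustration under assumptions, not our production scenarios: `ScenarioRecord`, `check_scenario`, `mute` and all field names are hypothetical, and fetching the status and last error through Make API as well as posting to Slack are left out. The sketch also assumes that an outage is reported once, and that a snoozed outage is reported again if the scenario is still stopped when the snooze ends.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ScenarioRecord:
    """One dashboard row; field names are illustrative, not the real Airtable schema."""
    scenario_id: int
    name: str
    url: str
    owner: str
    alerted: bool = False                     # has the current outage been reported?
    muted_until: Optional[datetime] = None    # set by the "disable notifications" button
    auto_restart: bool = False                # per-scenario switch in the dashboard


def check_scenario(record, is_active_now, last_error, now):
    """Decide whether a poll result should produce a Slack alert.

    Returns None when nothing should be posted, otherwise a dict with the
    message text and a flag telling the caller to attempt an auto-restart.
    """
    if is_active_now:
        record.alerted = False                # outage is over, allow future alerts
        return None
    if record.muted_until is not None and now < record.muted_until:
        return None                           # notifications are snoozed
    if record.alerted:
        return None                           # this outage was already reported

    record.alerted = True
    text = (
        f"Scenario stopped: <{record.url}|{record.name}>\n"
        f"Responsible: {record.owner}\n"
        f"Last error: {last_error}\n"
        f"Detected at: {now:%Y-%m-%d %H:%M}"
    )
    return {"text": text, "auto_restart": record.auto_restart}


def mute(record, hours, now):
    """Snooze alerts for 3 to 24 hours, mirroring the buttons described above."""
    if not 3 <= hours <= 24:
        raise ValueError("mute window must be between 3 and 24 hours")
    record.muted_until = now + timedelta(hours=hours)
    record.alerted = False                    # report again if still stopped afterwards
```

In this sketch the caller would run `check_scenario` for every dashboard row on each 5-minute poll, post the returned text to the Errors channel, and trigger a restart only when `auto_restart` is set.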

Plans to enhance this system include:

  • Incomplete executions monitoring. Make does not seem to offer a logical solution to the problem I described in another topic, but I see ways to solve it on our own. Let me know if you’re interested in learning what we achieved! I’ll post an update about it in this topic.
  • Adding more information to notifications and extending the logic overall. The system is constantly being improved; for example, the Auto restart function was introduced quite recently. Plans include adding more internal features to simplify developers’ work.

:+1: 4. The advantages of the monitoring and backup system:

For our clients, the value of the system lies in:

  • Little or no downtime of business processes. Currently, clients do not notice technical failures in scenarios. Now, when they write to us “Look, something broke here”, we are already dealing with the problem, and most often it has already been solved. Clients see the email notification about an error or scenario stop, but by the time they reach out, the scenario is already running again. Thus, clients can rely on us and be confident that any error automatically triggers our work process and requires little or no involvement of client resources.
  • Some clients like to figure things out themselves and be aware of everything happening with their system 24/7. For them, we additionally duplicate notifications in their Slack. This gives them additional control and transparency in the operation of their business.
  • People sometimes make mistakes, and a curious employee with access to Make can delete a scenario or even the entire account! Our system has saved businesses in such cases several times. For example, we were able to quickly detect the accidental deletion of a complex critical scenario and restore it. It was the core of the client’s system. The client’s losses from a day of backend downtime would have amounted to $5000-10000, not including the cost of redeveloping the scenario. Restoring and setting up the backup took only 30 minutes instead of 2-3 days!
  • In short, our clients feel calm and secure. They are confident in the system’s stability and trust us.
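The backup and restore path mentioned above can be sketched roughly as follows. This is a simplified Python illustration under assumptions, not our actual implementation: a local folder stands in for Google Drive, the function names are hypothetical, and downloading a blueprint from Make or importing it back into an account is left out.

```python
import json
from pathlib import Path


def save_blueprint(root, account, scenario_id, blueprint):
    """Write the current blueprint to <root>/<account>/<scenario_id>.json, replacing the old copy."""
    folder = Path(root) / account
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{scenario_id}.json"
    path.write_text(json.dumps(blueprint, indent=2), encoding="utf-8")
    return path


def load_blueprint(root, account, scenario_id):
    """Read a stored blueprint back, or return None if the scenario was never backed up."""
    path = Path(root) / account / f"{scenario_id}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def missing_scenarios(root, account, live_ids):
    """List backed-up scenario IDs that no longer exist in the live account."""
    folder = Path(root) / account
    if not folder.is_dir():
        return []
    backed_up = {int(p.stem) for p in folder.glob("*.json")}
    return sorted(backed_up - set(live_ids))
```

Comparing the backed-up IDs with the IDs currently present in the account is one simple way to notice a deletion; the stored JSON file is then the starting point for the restore.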

For the development team, the benefits are as follows:

  • Immediate notification of problems allows us to focus on important tasks here and now. This eliminates chaos and allows the error to be corrected at an early stage. We do not need to sift through hundreds of logs to look for the cause of the error. Every problem solved here and now simplifies the team’s work. Proper assignment of responsibility directs tickets to the scenario creator rather than to a random employee.
  • The system keeps multiple heavily loaded scenarios running even when they rely on unreliable APIs, for example by automatically recovering from server errors in HubSpot. On HubSpot’s side, this bug occurs 3-5 times a day on average. The system resumes operation within 1 minute, saving 1 hour of developer work.
  • The ability to build more complex systems and create backup routes. If we are confident that we have up-to-date information, we can use this knowledge in future tasks. Development becomes simpler and more pleasant because the developer knows that the error system has their back.
  • In the case of mass problems on Make’s side or with 3rd party services, we are the first to understand what happened because we see a similar pattern of errors across several accounts. This also helps us make decisions quickly. For example, this happened when Monday.com modules were incorrectly updated, and scenarios using Monday.com started to produce errors en masse.
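As an illustration of the last point, spotting the same failure across several accounts can be as simple as grouping recent errors by app. The Python sketch below is an assumption about how this could be done, not a description of our scenarios: the function name, the `(account, app)` input shape and the threshold of 3 accounts are all hypothetical.

```python
from collections import defaultdict


def find_mass_incidents(errors, min_accounts=3):
    """Report apps that are failing in several accounts at once.

    `errors` is an iterable of (account, app) pairs collected in one time window.
    Returns a dict mapping each affected app to the sorted list of accounts.
    """
    accounts_by_app = defaultdict(set)
    for account, app in errors:
        accounts_by_app[app].add(account)     # repeated errors in one account count once
    return {
        app: sorted(accounts)
        for app, accounts in accounts_by_app.items()
        if len(accounts) >= min_accounts
    }
```

Counting distinct accounts rather than raw errors keeps one noisy scenario from looking like a platform-wide incident.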

:rocket: Conclusion

In conclusion, I want to thank the entire Make community for their support and help! I feel that I haven’t described how the system works in full detail, but the post turned out huge anyway. I will be happy to answer any questions you may have after reading it.

By the way, we can create the same system for you and your clients, dear Makers. I know that for advanced specialists, assembling such a system is within their capabilities. However, if you want a ready-made turnkey solution, please contact us. We will be glad to provide a system tailored to your and your clients’ needs.

To contact us, visit our vatech.io website or reply here.

Thanks for reading, and good luck with Make.com!


Key words:
Automation solutions Make.com / Error alert system for scenarios
Scenario backup and recovery / Custom API integrations
Monitoring and management / Error handling solutions
Comprehensive backup services / Enterprise Workflow optimization
Advanced automation techniques / Streamlined error resolution

:fire::fire::fire:
Very solid job! Thanks