Azure North Europe stack - Pay-as-you-go projects - Billing API problems

2022-09-22 17:50 UTC - We are investigating an issue with our Billing API on the Azure North Europe stack (https://connection.north-europe.azure.keboola.com/). Projects that are not Pay As You Go should not be affected.

Next update in 30 minutes.

2022-09-22 18:20 UTC - The issue was caused by heavy load on the underlying database during the parallel migration of some projects to the new Queue. Services are now returning to normal operation. Some orchestrations may run later than scheduled. We are very sorry for any inconvenience.

Update on Scheduled Maintenance of AWS EU Stack on 2022-09-24

We’re sorry to let you know that the previously announced maintenance on September 24th will have a greater impact on platform operations than previously thought. Due to unforeseen circumstances, we’re unable to keep to the previously announced scope of the maintenance, and we need to turn off the platform completely for about 90 minutes.

The AWS EU maintenance will start at 9:30 AM CET, and we expect it to finish by 11 AM CET. 

During the maintenance, you will not be able to access your data and projects. All network connections will be terminated with the "HTTP 503 – down for maintenance" status message.
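If you call the platform from your own scripts, the following minimal sketch shows one way to wait out a 503 response instead of failing immediately. It is only an illustration: the `call_with_retry` helper, its parameters, and the `request` callable are hypothetical names, not part of any Keboola client, and the sketch assumes `request` returns a `(status_code, body)` pair.

```python
import time
from typing import Callable, Tuple


def call_with_retry(
    request: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `request` until it returns a status other than 503
    or until `max_attempts` calls have been made.

    `request` is a hypothetical callable supplied by the caller; it is
    assumed to return a (status_code, body) tuple.
    """
    status, body = request()
    for attempt in range(1, max_attempts):
        if status != 503:
            break
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = request()
    return status, body
```

For a maintenance window of about 90 minutes, delays of a few seconds are far too short; a base delay of several minutes, or simply re-scheduling the run, is likely the more practical choice.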

All running tasks will be monitored by us and restarted in case of any interruption. Orchestrations, flows, and running jobs will generally be delayed, but not interrupted. However, feel free to re-schedule your Saturday orchestrations to avoid this maintenance window.

We apologize for any inconvenience and thank you for your understanding.

Delayed orchestrations on Azure North Europe stack

2022-09-07 11:35 UTC - We are investigating delayed orchestrations on the Azure North Europe Keboola Connection stack (https://connection.north-europe.azure.keboola.com). Next update in 30 minutes.

2022-09-07 12:25 UTC - Our investigation shows that the problem started at 10:00 UTC. It is caused by an outage of our Billing API. Orchestrations that should have been scheduled after this time are delayed by minutes or tens of minutes. This applies to standard projects.

Orchestrations of Pay As You Go projects are not scheduled at the moment.

Next update in 60 minutes.

2022-09-07 14:00 UTC - We implemented a hotfix and the situation is returning to normal. Orchestrations should be scheduled properly now. The problems were caused by an outage of Azure CosmosDB (more information at https://status.azure.com/en-gb/status).

We are monitoring the situation. Next update in two hours.

2022-09-07 17:10 UTC - All orchestrations are being executed without delay. We apologize for any inconvenience.


Azure North Europe stack - Pay-as-you-go projects - Billing API problems

2022-09-07 10:55 UTC - We are investigating problems with our Billing API on the Azure North Europe stack (https://connection.north-europe.azure.keboola.com/). They are caused by degraded performance of one of the Azure services. Projects that are not Pay As You Go should not be affected.

Next update in 30 minutes.

2022-09-07 12:10 UTC - Problems with one of the Azure services persist and our Billing API is still unavailable. We are waiting for service recovery by Azure Support.

All Pay As You Go projects are affected:

  • Your credit balance temporarily shows zero
  • Orchestration jobs are not scheduled at the moment
  • You cannot manually execute any component

Next update in 60 minutes.

2022-09-07 13:15 UTC - Our investigation shows that the problem started at 10:00 UTC. The Azure service is still not working properly, but we have implemented a fix to work around the situation until Azure Support resolves the root cause.

We continue to monitor the situation closely. At the moment, the Billing API service is available, and jobs and orchestrations are running.

Next update in 120 minutes.

2022-09-07 17:10 UTC - All Keboola Connection services are running normally. The incident is resolved. The problems were caused by an outage of Azure CosmosDB (more information at https://status.azure.com/en-gb/status).

We apologize for any inconvenience.

Jobs Failing on Internal Errors in Azure North Europe stack

2022-08-30 9:10 UTC - We are investigating some jobs that have been failing with an internal error since 6:30 UTC. Next update in 30 minutes.

2022-08-30 9:20 UTC - We identified the root cause as DNS resolution errors in Azure VMs; see Azure status for details: https://status.azure.com/en-us/status. We are restarting the running nodes, which should solve the problem. Next update in 30 minutes.

2022-08-30 9:40 UTC - The Kubernetes nodes that run containers with jobs were restarted, and no errors have appeared in the logs since then.

Failing Synchronous Actions

Synchronous actions, a mechanism behind some UI features such as testing credentials or listing available databases on a remote server, were affected by a bug and were not working properly from 12:50 UTC until 13:25 UTC, when the revert of the defective release was finished. The functionality has been back to normal since then.

Jobs Failing on Internal Errors in Azure North Europe stack

2022-08-20 14:57 UTC - We are investigating some jobs failing with an internal error. Next update in 30 minutes.

2022-08-20 15:30 UTC - We see that the internal error is caused by one of the internal components failing to call the Azure API; however, we do not yet know the root cause. We restarted one of the instances and see no errors for now. We continue to monitor the issue.

Scheduled Maintenance of AWS EU Stack 2022-09-24

On 24th September 2022 between 10:00 CET and 10:30 CET, https://connection.eu-central-1.keboola.com will undergo planned maintenance during which one of our internal databases will be upgraded. As a result, any loads or unloads to storage will be paused during that time for up to 30 minutes. Any currently running jobs will continue to run. However, they may be delayed by up to 30 minutes as well.

Delayed processing of jobs in AWS US stack

2022-08-15 10:40 UTC - We are investigating component job delays in connection.keboola.com. Next update when new information is available or in an hour.

2022-08-15 11:14 UTC - We have identified the root cause of the problem and we are working on a solution. Next update when new information is available or in an hour.

2022-08-15 14:04 UTC - We've added another worker to help with the workload and processing times have returned to normal.

2022-08-15 15:32 UTC - We are investigating a recurrence of the issue that causes jobs to be stuck in the Created state. Next update when new information is available or in an hour.

2022-08-15 15:59 UTC - The recurrence was resolved. We continue to monitor the situation closely, but at the moment job runtimes should be back to normal.