Stuck jobs in AWS US and EU stacks

14:08 UTC: We are investigating newly created jobs that are stuck and do not start processing in the AWS US and EU stacks. Next update in 15 minutes.

14:23 UTC: We found the root cause and rolled back to the previous working version. Newly created jobs should now process as expected. We are still working on moving previously created jobs (created within the last hour) to the processing state. Next update in 15 minutes.

14:50 UTC: We have released the stuck jobs in the US stack and continue to release the stuck jobs in the EU stack. Next update in 30 minutes.

15:15 UTC: All stuck jobs in EU have been released. Everything is operational now and running as expected.

Newly created transformations failing on unrecognized option

10:10 UTC: We are investigating newly created transformations (created in the last 24 hours) that fail when run with the error `Unrecognized option "blocks" under "container"`. Next update in 15 minutes.

10:30 UTC [resolved]: We have identified the root cause and reverted the latest UI changes. From now on, newly created transformations will run without an error. However, transformations that were already created are still affected and need to be updated manually. To do so, open the raw config editor by appending "/raw" to the URL of the transformation detail page, then remove the topmost "blocks" array from the configuration and save it.
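For anyone who has several affected transformations and prefers to prepare the fixed configuration outside the editor, the manual step above can be sketched in Python. This is only an illustration: the function name `strip_top_level_blocks` and the sample configuration are hypothetical, and the real configuration will contain other keys. The sketch removes only the top-level "blocks" key and leaves any nested "blocks" untouched.

```python
import json


def strip_top_level_blocks(raw_config: str) -> str:
    """Return the configuration JSON without its top-level "blocks" key.

    Nested "blocks" entries (for example under another key) are kept as-is.
    """
    config = json.loads(raw_config)
    # pop() with a default does nothing if the key is already absent
    config.pop("blocks", None)
    return json.dumps(config, indent=2)


# Hypothetical sample; the shape of a real configuration is an assumption here.
sample = '{"blocks": [], "parameters": {"blocks": [{"name": "Block 1"}]}}'
cleaned = strip_top_level_blocks(sample)
print(cleaned)
```

The output can then be pasted back into the raw config editor and saved.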

Azure North Europe stack - jobs API higher error rate

07:36 UTC: We are experiencing a higher error rate on the Jobs API and are investigating it. Next update in 15 minutes.

07:52 UTC [resolved]: We identified the root cause: a project migration running in the background put a higher load on our metadata database. This led to slower API responses and, in some cases, timeouts of API requests. However, this did not affect running jobs or the starting of new ones. Everything is running as expected and fully operational now.

Scheduled Maintenance of AWS EU Stack starting now

As announced previously, we’re starting a planned maintenance on the AWS EU stack. One of our internal databases will be upgraded.

Orchestrations, flows, and running jobs will be generally delayed, but not interrupted. 

We'll update this post when the maintenance is finished. 

UPDATE 09:06 UTC: The maintenance is taking longer than expected. Next update in 30 minutes or when new information is available.

UPDATE 09:33 UTC: The maintenance has successfully finished and the stack is now fully operational.

UPDATE 10:30 UTC: Unfortunately, the database needs another immediate upgrade. We'll be turning maintenance mode back on shortly. Next update in 30 minutes.

We're sorry for any inconvenience and thanks for your understanding. 

UPDATE 11:00 UTC: We are starting a second round of maintenance on the AWS EU stack. The stack should be back up and running within 90 minutes.

UPDATE 12:33 UTC: The maintenance has successfully finished and the stack is now fully operational.

Azure North Europe stack - Pay-as-you-go projects - Billing api problems

2022-09-22 17:50 UTC - We are investigating an issue with our Billing Api on the Azure North Europe stack (https://connection.north-europe.azure.keboola.com/). Non Pay As You Go projects should not be affected.

Next update in 30 minutes.

2022-09-22 18:20 UTC - The issue was caused by heavy load on the underlying database during a parallel migration of some projects to the new Queue. Services are now returning to normal operation. Some orchestrations may have their schedule delayed. We are very sorry for any inconvenience.

Update on Scheduled Maintenance of AWS EU Stack on 2022-09-24

We’re sorry to let you know that the previously announced maintenance on September 24th will have more of an impact on platform operations than previously thought. Due to unforeseen circumstances, we’re unable to keep to the previously announced scope of the maintenance, and we need to turn off the platform completely for about 90 minutes.

The AWS EU maintenance will start at 9:30 AM CET, and we expect it to finish by 11 AM CET. 

During the maintenance, you will not be able to access your data and projects. All network connections will be terminated with the "HTTP 503 – down for maintenance" status message.
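If you have scripts that call the platform during the maintenance window, they may want to treat a 503 response as temporary and retry later instead of failing right away. A minimal sketch of that idea follows. The helper `call_with_maintenance_retry`, its parameters, and the simulated responses are hypothetical and not part of any Keboola client; `send_request` stands in for whatever function performs your real request and returns the HTTP status code.

```python
import time
from typing import Callable

MAINTENANCE_STATUS = 503  # status code named in the announcement


def call_with_maintenance_retry(
    send_request: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send_request until it returns something other than 503.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... seconds between
    attempts and returns the last status code seen.
    """
    last_status = MAINTENANCE_STATUS
    for attempt in range(max_attempts):
        last_status = send_request()
        if last_status != MAINTENANCE_STATUS:
            return last_status
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return last_status


# Simulated responses instead of real network calls (assumption for the demo).
responses = iter([503, 503, 200])
delays = []
result = call_with_maintenance_retry(lambda: next(responses), sleep=delays.append)
print(result, delays)  # prints: 200 [1.0, 2.0]
```

Since the platform is expected to be down for about 90 minutes, a real script would need a much larger `base_delay` or `max_attempts` than the defaults shown here.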

All running tasks will be monitored by us and restarted in case of any interruption. Orchestrations, flows, and running jobs will be generally delayed, but not interrupted. However, feel free to re-schedule your Saturday orchestrations to avoid this maintenance window.

We apologize for any inconvenience and thank you for your understanding.

Delayed orchestrations on Azure North Europe stack

2022-09-07 11:35 UTC - We are investigating delayed orchestrations on the Azure North Europe Keboola Connection stack (https://connection.north-europe.azure.keboola.com). Next update in 30 minutes.

2022-09-07 12:25 UTC - Our investigation shows that the problem started at 10:00 UTC. It is caused by an outage of our Billing Api. Orchestrations that should have been scheduled after this time are delayed by minutes or tens of minutes. This applies to standard projects.

Orchestrations of Pay As You Go projects are not scheduled at the moment.

Next update in 60 minutes.

2022-09-07 14:00 UTC - We implemented a hotfix and the situation is returning to normal. Orchestrations should be scheduled properly now. The problems were caused by an outage of Azure CosmosDB (more information at https://status.azure.com/en-gb/status).

We are monitoring the situation. Next update in two hours.

2022-09-07 17:10 UTC - All orchestrations are executed without delay. We apologize for any inconvenience.


Azure North Europe stack - Pay-as-you-go projects - Billing api problems

2022-09-07 10:55 UTC - We are investigating problems with our Billing Api on the Azure North Europe stack (https://connection.north-europe.azure.keboola.com/). They are caused by degraded performance of one of the Azure services. Non Pay As You Go projects should not be affected.

Next update in 30 minutes.

2022-09-07 12:10 UTC - Problems with one of the Azure services still persist and our Billing Api is still unavailable. We are waiting for service recovery by Azure Support.

All Pay As You Go projects are affected:

  • Your credit balance is temporarily shown as zero
  • Orchestration jobs are not scheduled at the moment
  • You cannot manually execute any component

Next update in 60 minutes.

2022-09-07 13:15 UTC - Our investigation shows that the problem started at 10:00 UTC. The Azure service is still not working properly, but we implemented a fix to work around the situation until Azure Support resolves the root cause.

We keep monitoring the situation closely. At the moment, the Billing Api service is available, and jobs and orchestrations are running.

Next update in 120 minutes.

2022-09-07 17:10 UTC - All Keboola Connection services are running normally. The incident is resolved. The problems were caused by an outage of Azure CosmosDB (more information at https://status.azure.com/en-gb/status).

We apologize for any inconvenience.

Jobs Failing on Internal Errors in Azure North Europe stack

2022-08-30 9:10 UTC - We are investigating some jobs that have been failing with internal errors since 6:30 UTC. Next update in 30 minutes.

2022-08-30 9:20 UTC - We identified the root cause as DNS resolution errors in Azure VMs. See Azure status for details: https://status.azure.com/en-us/status. We are restarting the running nodes, which should solve the problem. Next update in 30 minutes.

2022-08-30 9:40 UTC - The Kubernetes nodes that run the containers with jobs were restarted, and no errors have appeared in the logs since then.