Problem on US stack

2022-11-01 21:01 UTC - We are investigating a problem on the US stack that can cause tasks to get stuck. Next update in 30 minutes or when new information is available.

2022-11-01 21:30 UTC - We have not yet been able to find the cause of the problem. We are continuing to investigate and will let you know when we have more information.

2022-11-02 06:06 UTC - We are still investigating issues with synchronous actions (a feature that allows, e.g., testing the credentials of a Database Data Source) and with starting and stopping Python/R workspaces. Job processing is not affected at the moment.

2022-11-02 11:42 UTC - We're working with AWS support engineers to find the root cause. They have acknowledged the issue and engaged their team to resolve it.

2022-11-02 13:32 UTC - The AWS team was able to resolve the issue on their side. Everything is running as expected and the platform is now fully operational. We are sorry for the inconvenience.


Stuck jobs processing

2022-10-31 11:47 UTC - We are investigating stuck job processing on multiple stacks. Next update in 30 minutes or when new information is available.

2022-10-31 12:14 UTC - The stuck jobs were caused by an earlier release. We rolled back the release, but that did not unblock the currently stuck jobs. We are working on a fix, which should be released within 90 minutes.

2022-10-31 14:23 UTC - We're fixing the stuck jobs stack by stack. All stacks should be fixed within 20 minutes.

2022-10-31 15:11 UTC - All jobs have been fixed and the platform has been stabilized. Everything is running as expected and fully operational now. We are sorry for the inconvenience.


Job failures

Today at 10:26 UTC, with the latest release of Job Runner, we introduced a bug that caused all component jobs to end with an application error.

We have now reverted to the previous version.

We are very sorry for any inconvenience this might have caused.

Stuck jobs in AWS US and EU stacks

14:08 UTC: We are investigating newly created jobs that are stuck waiting to start processing in the AWS US and EU stacks. Next update in 15 minutes.

14:23 UTC: We found the root cause and rolled back to the previous working version; newly created jobs should now process as expected. We are still working to push previously created jobs (created within the last hour) to the processing state. Next update in 15 minutes.

14:50 UTC: We have released the stuck jobs in the US stack and continue to release the stuck jobs in the EU stack. Next update in 30 minutes.

15:15 UTC: All stuck jobs in the EU stack have been released. Everything is now operational and running as expected.

Newly created transformation runs failing on an unrecognized option

10:10 UTC: We are investigating an issue where newly created transformations (created in the last 24 hours) fail on run with the error `Unrecognized option "blocks" under "container"`. Next update in 15 minutes.

10:30 UTC [resolved]: We have identified the root cause and reverted the latest UI changes. From now on, newly created transformations will run without an error. However, transformations created earlier are still affected and need to be updated manually: on the transformation detail page, open the raw config editing page by appending "/raw" to the transformation detail page URL, then remove the topmost "blocks" array from the configuration and save it.
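
As a minimal sketch of that manual step, the snippet below removes the top-level "blocks" key from a configuration before it is pasted back into the raw editor. The configuration shape shown here is an assumption for illustration only; the essential part is dropping the topmost "blocks" array while leaving the rest of the configuration untouched.

```python
import json

# Hypothetical raw transformation configuration, roughly the shape an
# affected transformation might have. The exact contents are assumed for
# illustration; only the extra top-level "blocks" array matters here.
raw_config = """
{
  "blocks": [],
  "parameters": {
    "blocks": [
      {
        "name": "Block 1",
        "codes": [{"name": "Code", "script": ["SELECT 1;"]}]
      }
    ]
  }
}
"""

config = json.loads(raw_config)

# Remove only the topmost "blocks" array; any nested "blocks" under
# "parameters" stays as-is.
config.pop("blocks", None)

# Paste the resulting JSON back into the raw config editor and save it.
print(json.dumps(config, indent=2))
```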

Azure North Europe stack - Jobs API higher error rate

07:36 UTC: We are experiencing a higher error rate on the Jobs API and are investigating. Next update in 15 minutes.

07:52 UTC [resolved]: We identified the root cause: a project migration running in the background caused a higher load on our metadata database. This led to slower API responses and, in some cases, API request timeouts. This did not affect any running jobs or the starting of new ones. Everything is running as expected and fully operational now.

Azure North Europe stack - Pay-as-you-go projects - Billing API problems

2022-09-22 17:50 UTC - We are investigating an issue with our Billing API on the Azure North Europe stack (https://connection.north-europe.azure.keboola.com/). Non-Pay-As-You-Go projects should not be affected.

Next update in 30 minutes.

2022-09-22 18:20 UTC - The issue was caused by heavy load on the underlying database during the parallel migration of some projects to the new Queue. Services are now returning to normal operation. It is possible that some orchestrations will have their schedules delayed. We are very sorry for any inconvenience.

Jobs Failing on Internal Errors in Azure North Europe stack

2022-08-30 9:10 UTC - We are investigating some jobs failing with internal errors since 6:30 UTC. Next update in 30 minutes.

2022-08-30 9:20 UTC - We identified the root cause as DNS resolution errors in Azure VMs; see the Azure status page for details: https://status.azure.com/en-us/status. We are restarting the running nodes, which should solve the problem. Next update in 30 minutes.

2022-08-30 9:40 UTC - The Kubernetes nodes running job containers have been restarted, and no errors have appeared in the logs since then.

Failing Synchronous Actions

Synchronous actions, the mechanism behind some UI features such as testing credentials or listing available databases on a remote server, were affected by a bug and were not working properly from 12:50 UTC until 13:25 UTC, when the revert of the defective release was completed. The functionality has been back to normal since then.