Orchestrator stuck on application error during job creation on north-europe.azure.keboola.com

2024-10-14 16:30 UTC We are observing a small number of cases where job creation fails with the error message “Decryption failed: Deciphering failed.” As a result, orchestrations may become stuck in the terminating state. If you experience this issue, please contact our support team.

We are actively investigating the situation and will provide an update later this evening.

2024-10-14 21:40 UTC We have successfully identified the affected orchestrations and deployed a fix that automatically terminates them. We now consider this incident resolved. We sincerely apologize once again for the inconvenience caused.

Errors in the AWS EU Stack

We are experiencing problems on our AWS EU stack (https://connection.eu-central-1.keboola.com/). We are deeply sorry for the inconvenience this may cause. In the user interface, you may see error alerts or experience slower job processing. Next update in 30 minutes.

Sep 26 08:34 UTC: We identified and fixed an overload on one of our Kubernetes nodes. All systems are now running normally. We’ve implemented measures to prevent recurrence.

Thank you for your patience.

Slowdown of Queue Jobs on AWS EU Stack

We noticed a slowdown in our AWS EU stack’s job queue from 12:00 to 14:00 UTC, due to a temporary service performance issue. We sincerely apologize for any inconvenience this may have caused.

Our team has resolved the problem and we are taking steps to prevent future occurrences. Thank you for your understanding and patience.

Error when installing Python packages in transformations

7:20 UTC: We are investigating an error that occurs when installing Python packages in transformations.
You may see errors such as: Job "XXXXXXX" ended with a user error "Failed to install package: io".
Next update in 15 min.

UPDATE 7:38 UTC: We have rolled back to the previous version and all operations are back to normal. We're sorry for this inconvenience.



List of tables in buckets may not work correctly

Today at 15:30 UTC we noticed a problem with listing tables in Storage. Tables fail to display for users newly added to a project. Related issues may also occur, such as being unable to load data into a workspace. All of these issues affect new users only. We are seeing this problem across all stacks and regions.

We are working on a fix, next update in 30 min.

UPDATE 16:10 We have identified the root cause and are working on a fix. Next update in 2 hours.

UPDATE 19:00 The incident is now resolved, and tables are displayed correctly in Storage for all users.

We apologize for the inconvenience.

Limited service disruption for AWS US

A limited service disruption on AWS EU stack will start at 15:00 UTC today, as announced earlier. Storage jobs, Queue v1, and Orchestration (in projects with Queue v1) processing will stop and new jobs will be delayed until the upgrade is completed. All running jobs will be cancelled, but will resume after the upgrade.

All APIs and other unaffected services, such as Workspaces and Queue v2 jobs, will remain operational, though their operations may be delayed due to the Storage job delays. We will provide an update when the service disruption starts and ends.

We apologize for any inconvenience caused and thank you for your understanding.

Update 15:00 UTC: The limited service disruption has begun.

Update 15:35 UTC: The service disruption has been resolved and the stack is now fully operational. 

Thank you for your patience.

Investigating higher latency across all stacks

As of 29 November 13:45 UTC, we are investigating higher latency for some requests across all stacks.
  • It might lead to errors in the UI
  • Job processing is not affected
We'll be doing a rollback to the previous version. Next update in 30 min.

UPDATE 2023-11-29 13:20 UTC - All operations are back to normal and everything is fully working.