Increased job wait times in AWS US and EU stacks

We're investigating increased job wait times in the AWS US stack (connection.keboola.com) and the AWS EU stack (connection.eu-central-1.keboola.com). Next update in 15 minutes or when new information is available.


UPDATE 12:55 UTC: We have identified the problem and rolled back to the previous version of our service.

UPDATE 13:05 UTC: All services are now operating normally.

Increased error rate in AWS US stack

We're investigating an increased error rate in the AWS US stack (connection.keboola.com). Next update in 15 minutes or when new information is available.

UPDATE 04:40 UTC: We have identified and replaced a number of corrupted nodes with healthy ones, and operations are now back to normal. We apologize for the inconvenience caused.

UPDATE 05:40 UTC: This issue appears to be ongoing, and a new symptom has been identified: jobs are taking longer to start than usual, or are getting stuck in a waiting state. The next update will be in 30 minutes.

UPDATE 06:25 UTC: We're still investigating the issue. Next update in 30 minutes.

UPDATE 07:15 UTC: We have found the root cause and we're fixing it. 

UPDATE 08:44 UTC: The root cause was fixed and all operations are back to normal.


Broken output mapping on legacy queue

2023-01-13 22:45 UTC We have identified an issue with the legacy queue system. Specifically, during a Snowflake transformation, the incremental output mapping could ignore filters configured in the "Delete Rows" process, resulting in all rows in the target table being deleted.

The problem began with a release that took place today at 9:30 UTC. At 22:15 UTC we rolled back to a previous version, which has resolved the issue for the time being.

We are still investigating the root cause of the problem and apologize for any inconvenience this may have caused.

2023-01-18 8:35 UTC We found the root cause of the problem and deployed a fixed version.

Invalid credits balance on PayAsYouGo projects

12:10 UTC On PayAsYouGo projects on the https://connection.north-europe.azure.keboola.com/ stack, an incorrect credit balance may be displayed. The situation will be fixed shortly.

12:15 UTC The credits are now reported correctly again. If you attempted to run a job during the incident timeframe, it failed erroneously with the message "You do not have credits to run a job". Please restart such jobs. We sincerely apologize for the trouble.


Failed jobs on eu-central-1 stack (AWS EU)

2023-01-09 07:45 UTC - We have identified an issue on one of the servers running Queue jobs on the EU Central 1 (AWS EU) stack. Numerous jobs are stuck in a terminating state and we are currently investigating the cause of the issue.

2023-01-09 08:05 UTC - We have unblocked the stuck jobs, which were unexpectedly terminated. We are investigating the root cause of the node failure.


Some jobs running two hours longer in AWS EU

In very rare circumstances, some jobs (fewer than 10 per day) may be delayed by almost exactly two hours in the AWS EU stack (connection.eu-central-1.keboola.com). During this period, the job will be stuck doing nothing for two full hours, and unfortunately terminating the job will not help.

We are currently trying our best to debug and fix an underlying network connectivity issue. If you have any questions or concerns, please reach out to our support.

We are sorry for this inconvenience and will provide an update on this post once we know more or have an ETA for the fix.

Update 2023/01/23 8:40 - We have implemented a fix, and there have been no occurrences of this issue in the past 12 hours. We're continuing to monitor the situation thoroughly.


Job failures on eu-central-1 stack (AWS EU)

2022-12-30 08:15 UTC - We are investigating occasional job failures that started on December 29, 2022 at 11:00 PM UTC. We will provide an update with new information when it becomes available.

2022-12-30 09:12 UTC - The error rate is lower, but there are still some occurrences of errors. We are investigating the root cause and will provide an update with new information when it becomes available.

2022-12-30 10:38 UTC - We have identified and fixed the problem, which was caused by rate limiting on the container registry. The last error occurred at 10:08 AM UTC. We are monitoring all systems closely.

2022-12-30 11:23 UTC - We don't see any new occurrences of errors. The platform is fully operational and the incident is resolved.

Failed jobs on eu-central-1 stack (AWS EU)

We have discovered a problem on one of the servers running Queue jobs on the eu-central-1 stack (AWS EU). Jobs were terminated unexpectedly between 08:20 UTC and 09:20 UTC. The problem has been resolved and all jobs should now be running normally again. We are still looking for the root cause to prevent it from happening again in the future. We apologize for any inconvenience this may have caused.

Failed jobs on eu-central-1 stack (AWS EU)

We have discovered a problem on one of the servers running Queue jobs on the eu-central-1 stack (AWS EU). Jobs were terminated unexpectedly from 12:00 AM CET. We are investigating the cause of the problem.

Update 12:50 PM CET - The problem has been resolved and all jobs should now be running normally again. We are still looking for the root cause to prevent it from happening again in the future.

We apologize for any inconvenience this may have caused.