Orchestration jobs end after a second while child jobs are still running

2022-05-05 13:10 CET - Since 09:10 CET, a parent job (orchestration or orchestration phase) may end with success even though its child jobs are still running. The child jobs will finish their work normally, but nested or chained orchestrations may appear out of sync.

The issue has been fixed for Azure stacks since 12:50 CET; the fix for AWS stacks is under way.

UPDATE 2022-05-05 14:33 CET - We discovered that one manifestation of the issue may be all phases starting at once.

UPDATE 2022-05-05 23:30 CET - The issue has been fixed since 12:50 CET.

AWS Stacks Maintenance Announcement 2022-05-28

Maintenance of Keboola Connection AWS stacks will take place on Saturday, May 28th, 2022 and should take less than three hours. The following stacks will be affected:

During the maintenance, you won’t be able to access your data and projects. All network connections will be terminated with the "HTTP 503 - down for maintenance" status message.

We will monitor all running tasks and restart any that are interrupted. Orchestrations (flows) and running transformations will generally be delayed, but not interrupted. However, feel free to reschedule your Saturday orchestrations to avoid this maintenance window.
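
For scripts that call the stacks during the window, the 503 response can be treated as a signal to wait and try again. The following is only a minimal sketch under stated assumptions: `call_with_retry` and its `request` argument are hypothetical names, not part of any Keboola client, and the number of attempts and the delay are illustrative values.

```python
import time

# Status code announced for the maintenance window.
MAINTENANCE_STATUS = 503


def call_with_retry(request, attempts=5, delay_seconds=60.0, sleep=time.sleep):
    """Call `request` until it returns something other than HTTP 503.

    `request` is a hypothetical zero-argument callable that returns a
    (status_code, body) tuple. The last response is returned if every
    attempt hits the maintenance status.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        status, body = request()
        if status != MAINTENANCE_STATUS:
            return status, body
        # Wait before the next attempt, but not after the final one.
        if attempt < attempts:
            sleep(delay_seconds)
    return status, body
```

Passing `sleep` as a parameter keeps the waiting behaviour replaceable, which makes the function easy to exercise without real delays.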

Stuck jobs in AWS us-east-1 [resolved]

2022-04-15 10:30 UTC - We are investigating stuck jobs on the new queue. Some jobs will probably end with an internal error. Next update in 15 minutes.

2022-04-15 10:50 UTC - The situation is resolved; however, 6 jobs were terminated by an internal error and were not restarted automatically. We apologize for the inconvenience; please feel free to restart your jobs manually.

Affected jobs:

  • job-839828541
  • job-839828546
  • job-839830974
  • job-839831721
  • job-839831773
  • job-839831782

Longer job runtime on the new queue in AWS eu-central-1

2022-04-11 15:04 UTC - We are investigating transient delays in job processing. They manifest as a two-hour gap without any activity in job events. The delays occur randomly across projects and configurations; most occurrences are around 04:00 UTC. Only jobs running on the new queue are affected. Next update in three hours or when new information is available.

2022-04-11 16:54 UTC - We have increased the minimum number of nodes, which might help prevent the issue from recurring. Meanwhile, we are investigating the root causes of the timeouts. We are also working on decreasing the timeouts from two hours to a much lower value to prevent unnecessary increases in job runtime in case of networking issues. Next update when new information is available.

2022-04-14 12:54 UTC - We have reduced the timeouts from two hours to two minutes. This will prevent a job from getting stuck for such a long time when a connection issue occurs. We are still investigating the underlying networking problem. Next update when new information is available.
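
To check whether one of your own jobs showed the gap described in this incident, you can compare consecutive event timestamps. This is a rough sketch only: it assumes you have already collected the event times of a job as Python `datetime` objects, and `find_gaps` is a hypothetical helper, not a Keboola API.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, threshold=timedelta(hours=2)):
    """Return (previous, next) pairs of consecutive events that are at
    least `threshold` apart. Input order does not matter."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier >= threshold
    ]


# Illustrative timestamps, not taken from a real job.
events = [
    datetime(2022, 4, 11, 3, 50),
    datetime(2022, 4, 11, 3, 55),
    datetime(2022, 4, 11, 6, 0),
    datetime(2022, 4, 11, 6, 1),
]
gaps = find_gaps(events)
```

With the sample data, `gaps` holds a single pair, from 03:55 to 06:00.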

Stuck Orchestrations [resolved]

2022-04-05 5:28 UTC - We are investigating stuck orchestration jobs on the new queue. Next update in 15 minutes.

2022-04-05 5:50 UTC - We can see that the problem occurs in AWS regions; however, we haven't found the root cause yet and are continuing the investigation. Next update in 15 minutes.

2022-04-05 6:10 UTC - We rolled back the previously deployed version of an internal queue component, and this seems to have unblocked the stuck orchestration jobs. We don't see any stuck orchestrations for now. We continue to monitor the situation and investigate the root cause.

2022-04-05 9:10 UTC - We identified the root cause and are now preparing a fix. However, since the previous quick fix, we are no longer seeing stuck orchestrations.

2022-04-05 11:40 UTC - We deployed the fix and everything is now operational. The root cause was misconfigured network access for an internal queue component.

Delayed processing of jobs in AWS EU stack

2022-04-01 07:36 UTC - We are experiencing a higher than usual number of jobs in the waiting state. We are investigating the issue. Next update in one hour or when new information is available.

2022-04-01 08:36 UTC - The backlog is cleared; the delays were caused by increased traffic.

Scheduled Maintenance (Azure North Europe Region)

Scheduled maintenance of the Azure North Europe Keboola Connection stack (https://connection.north-europe.azure.keboola.com) will take place on Thursday, Mar 31st, 2022 at 05:00 PM UTC and should take less than one hour.

It should not affect running jobs in your projects; these will be queued during the short maintenance. Orchestrations and running transformations will generally be delayed, but not interrupted.

During the maintenance, you won't be able to access your data and projects. All network connections will be terminated with the "HTTP 503 - down for maintenance" status message.

We'll update this status as the maintenance progresses.

Update Mar 31st, 16:57 UTC - All changes are done; in the end, no maintenance was necessary.

Some Python Transformations Failing on Date Parsing Error

2022-03-16 12:45 UTC
We have discovered some failing Python transformations that throw an exception with the message "bad escape \d at position". This error was caused by a breaking change in the underlying regular expression library.

If your transformation is failing in this way, you can fix it by specifying "regex==2022.03.02" in the packages input.
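
For readers who want to recognise this class of error, the sketch below reproduces a message of the same form. It uses Python's standard `re` module rather than the third-party `regex` package, and it assumes the failure comes from an unknown escape such as `\d` in a replacement template; the function names are illustrative and not taken from any affected transformation.

```python
import re


def bad_escape_message():
    """Return the error text raised when a replacement template
    contains an unknown escape such as \\d."""
    try:
        # \d is valid in a pattern but not in a replacement template.
        re.sub(r"(\d{4})-(\d{2})", r"\d", "2022-03")
    except re.error as exc:
        return str(exc)
    return None


def reformat_date(text):
    """Rewrite YYYY-MM-DD as DD.MM.YYYY using numbered group references,
    which are valid in a replacement template."""
    return re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3.\2.\1", text)


message = bad_escape_message()
converted = reformat_date("2022-03-16")
```

Here `message` starts with "bad escape \d" and `converted` is "16.03.2022". Pinning the package version as described above remains the fix given in this post.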

2022-03-18 15:00 UTC - The regex package is not part of our base images, so we will take no further action. Affected users have pinned a working version of the package in their dependencies.

High error rate from Sklik API

We are seeing a high rate of error responses from the Sklik API.

Since 2022-03-15 00:00 UTC we have been experiencing job failures due to an outage of the Sklik API. We are going to monitor the situation and keep you posted.

2022-03-15 09:30 UTC - The last error from the Sklik API was at 08:34 UTC; for now, the Sklik API appears to be operating normally.