Stuck storage jobs in Azure North Europe Stack

Today, 16 June, since 03:03 UTC, storage import and export jobs have been getting stuck. This is caused by a Snowflake incident (https://status.snowflake.com/) in the Azure West Europe region, where the Snowflake warehouse for the Azure North Europe stack is located.

We are monitoring the Snowflake incident and will keep you updated here.

UPDATE 6:15 UTC - The Snowflake incident is still ongoing. Its last update, at 05:28 UTC, reads: "We've identified an issue with a third-party service provider, and we're coordinating with the provider to develop and implement a fix to restore service. We'll provide another update within 60 minutes." The issue is most likely caused by a problem in Azure, which has reported an incident in the West Europe region; see https://azure.status.microsoft/en-us/status.

UPDATE 7:00 UTC - We see progress: storage import/export jobs are being processed again. However, the Snowflake incident is still open, and we continue to monitor it.

UPDATE 8:00 UTC [resolved] - Snowflake has resolved the incident, stating: "We've coordinated with our third-party service provider to implement the fix for this issue, and we've monitored the environment to confirm that service was restored. If you experience additional issues or have questions, please open a support case via Snowflake Community." We no longer see any stuck jobs, so we consider the issue resolved on our side as well.

Degraded AWS US/EU Stack (connection.keboola.com, connection.eu-central-1.keboola.com)

2023-06-13 19:40 UTC The components.keboola.com service is degraded due to an incident in the AWS US-EAST-1 region (https://health.aws.amazon.com/health/status). We are monitoring the situation.

2023-06-13 20:15 UTC The AWS incident is also affecting our OAuth authorization service in the AWS US stack (connection.keboola.com). All components relying on OAuth authorization may be affected and may fail randomly.

2023-06-13 20:20 UTC We are investigating slower job processing in the AWS US stack (connection.keboola.com).

2023-06-13 20:40 UTC The incident in AWS US-EAST-1 is causing jobs to get stuck on both the AWS US (connection.keboola.com) and AWS EU (connection.eu-central-1.keboola.com) stacks. This affects component jobs as well as services that run jobs as part of their workflow, such as workspace creation.

2023-06-13 20:55 UTC AWS reports the incident as resolved. All services are running normally, although some jobs may still take longer to process because of the large number of jobs waiting in the queue. We are monitoring the situation.

Stuck jobs and failures in AWS EU stack

2023-06-06 02:08 UTC We experienced an incident on connection.eu-central-1.keboola.com. Some jobs ended with an error due to an underlying node failure. We're still investigating the root cause.

Update 2023-06-06 02:38 UTC The incident has been resolved. During the incident, a small number of jobs on the connection.eu-central-1.keboola.com stack either timed out or ended with the error message "Component terminated. Possibly due to out of memory error".

We are continuing to monitor the situation closely to prevent any recurrence.

Platform Update: Transition to Datadog for Platform Logs Monitoring - Vendors Only

Beginning June 1st, 2023, we are transitioning our platform log monitoring system from Papertrail to Datadog. This is a platform-level change; it does not affect the user experience or functionality for regular users.

For third-party Keboola component vendors, this change modifies how you receive application error notifications:

  1. Email Notifications Only: Notifications will now be sent exclusively via email. Webhook support may be considered in the future.

  2. Notification Email Address: Vendors previously notified via Papertrail or a generic webhook will now receive notifications at the email address specified in their vendor profile. Vendors who were already receiving notifications via email will continue to receive them at the same email address.

  3. New Sender Email Address: All notifications will come from alert@dtdg.eu.

If you have any questions or concerns regarding this change, please contact us at support@keboola.com.

Slowdown of job processing on Azure North Europe stack

Since 13:40 UTC, we're seeing jobs start with delays on https://connection.north-europe.azure.keboola.com/. We're investigating the situation. Next update in 30 minutes.

14:14 UTC - All systems are now operating normally.

If your project ran out of credits and you have automatic top-up enabled, the top-up would have failed between approximately 13:40 and 14:10 UTC. Restarting the job will now trigger the automatic top-up correctly.

We apologize for any inconvenience caused.

Orchestrations not starting on legacy job queue

2023-04-26 11:00 UTC - We have discovered a problem with orchestrations not starting on the legacy queue. We are currently investigating possible causes.

2023-04-26 11:30 UTC - The problem was caused by a release earlier today; as a result, no orchestrations on the legacy queue have run since 08:10 UTC. We have rolled back the release, and orchestrations should be functioning properly again as of 11:30 UTC. We apologize for any inconvenience caused.

Limited service disruption for AWS US

A limited service disruption on the AWS US stack will start at 09:00 a.m. UTC today, as announced earlier. Storage job processing will stop, and new jobs will be delayed until the upgrade is completed. All running jobs will be cancelled but will resume after the upgrade.

All APIs and other unaffected services, such as Workspaces and Jobs, will remain operational, though their operations may be delayed due to the Storage job delays. We will provide an update when the service disruption starts and ends. 

We apologize for any inconvenience caused and thank you for your understanding.

Update 08:50 a.m. UTC: The limited service disruption has begun.

Update 09:20 a.m. UTC: The service disruption has been resolved and the stack is now fully operational. 

Thank you for your patience.

Limited service disruption for AWS EU

A limited service disruption on the AWS EU stack will start at 07:00 a.m. UTC today, as announced earlier. Storage job processing will stop, and new jobs will be delayed until the upgrade is completed. All running jobs will be cancelled but will resume after the upgrade.

All APIs and other unaffected services, such as Workspaces and Jobs, will remain operational, though their operations may be delayed due to the Storage job delays. We will provide an update when the service disruption starts and ends. 

We apologize for any inconvenience caused and thank you for your understanding.

Update 06:55 a.m. UTC: The limited service disruption has begun.

Update 07:45 a.m. UTC: The service disruption has been resolved and the stack is now fully operational. 

Thank you for your patience.

Python workspace can't be created from a transformation

2023-04-13 07:40 UTC We are investigating failing workspace creation on the EU stack (connection.eu-central-1.keboola.com). When you try to create a workspace from a Python transformation, the attempt fails with the error "Loading data to workspace failed: Client error:...". More information within the hour.

2023-04-13 08:40 UTC We are still investigating the root cause. The issue occurs only when creating a new workspace from a Python transformation with an empty input mapping on connection.eu-central-1.keboola.com. More information within the hour.

2023-04-13 09:15 UTC The service disruption has been resolved and the stack is now fully operational.

We are very sorry for the inconvenience.