High application error rate on connection.us-east4.gcp.keboola.com stack

2024-11-20 08:00 UTC: Since 04:00 UTC, we have been experiencing a high error rate on connection.us-east4.gcp.keboola.com affecting Snowflake operations, including Snowflake transformations, storage import/export jobs, and other processes. This results in intermittent application errors.

2024-11-20 09:40 UTC: We're still waiting for confirmation from Snowflake.

2024-11-20 12:00 UTC: We're still waiting for confirmation from Snowflake.

2024-11-20 14:00 UTC: We're still waiting for confirmation from Snowflake.

2024-11-20 15:00 UTC: We haven't seen an error since 11:20 UTC, and Snowflake has confirmed a fix on their side. All operations are back to normal. This is the last update.

We're sorry for the inconvenience, and thank you for your patience.

Planned partial maintenance on Saturday, November 16, 2024 for AWS multi-tenant stacks

The announced partial maintenance of connection.keboola.com and connection.eu-central-1.keboola.com will start in one hour at 07:00 UTC. We will update this post with the progress of the partial maintenance for each stack.

Update 07:00 UTC: The scheduled partial maintenance for connection.keboola.com has begun.

Update 07:15 UTC: The maintenance of connection.keboola.com has been completed, and all services have been scaled back up. The platform is fully operational, and jobs are now being processed as usual. All delayed jobs will be processed shortly. Thank you for your patience. Maintenance of connection.eu-central-1.keboola.com will start at 08:00 UTC.

Update 08:00 UTC: The scheduled partial maintenance for connection.eu-central-1.keboola.com has begun.

Update 08:15 UTC: The maintenance of connection.eu-central-1.keboola.com has been completed, and all services have been scaled back up. The platform is fully operational, and jobs are now being processed as usual. All delayed jobs will be processed shortly. Thank you for your patience.

Job Delays on Azure North Europe stack

13.11.2024 2:16 UTC - Due to an incident on Snowflake, jobs on the Azure North Europe stack (https://connection.north-europe.azure.keboola.com/) are executing very slowly. So far, there do not appear to be any errors. The symptoms include jobs running longer than usual, workspaces not starting, and table previews not loading data.

The cause of the issue on the Snowflake side is not yet known. Details are available at https://status.snowflake.com/incidents/kjd4lpptzmkh.

Update 2:48 UTC - Snowflake incident is still in progress.

Update 3:10 UTC - Snowflake incident is still in progress.

Update 4:35 UTC - The cause of the incident has been identified as an outage of a third-party provider.

Update 5:59 UTC - The incident has been identified as an Azure connectivity issue and is published at https://azure.status.microsoft/en-us/status. There is no ETA available.

Update 7:33 UTC - While the Azure incident is still not resolved, the Snowflake performance is now back to normal. The processing of jobs and other operations of Keboola platform is also back to normal. We will continue to monitor the situation closely.

Update 8:16 UTC - There are still delays in processing jobs and finishing flows. We're working on a fix. 

Update 9:28 UTC - All jobs should now be processing normally. However, the Azure incident is still not resolved, so errors may reappear.

Update 10:50 UTC - All operations are back to normal. 

We are sorry for this inconvenience, and thank you for your patience.

Flow failures

On November 12, between 8:00 and 19:00 UTC, flows failed with the error "Cannot process orchestration configuration: Unrecognized option 'isFake'". This was caused by a recent UI update; we have reverted to the previous working version, and flows now work as expected. The error affected only flows created or edited within the above timeframe. We have fixed the root cause and repaired most of the affected flow configurations.

However, if you still experience the problem (because the unrecognized option was stored in the flow's configuration), you may need to remove the option manually. To do so, open the flow configuration in debug mode (on the flow detail page, click the three dots -> debug mode) and remove any "isFake": true/false property from the flow configuration. If you are unsure about the process, please contact our support.
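If a configuration contains many occurrences of the property, it may be easier to clean the JSON copied from debug mode with a short script and paste the result back. The Python sketch below is only an illustration under stated assumptions: the `strip_is_fake` helper and the `phases`/`tasks` layout of the sample configuration are hypothetical, and the real flow configuration may be structured differently. The helper removes every "isFake" key at any nesting depth and leaves everything else unchanged.

```python
import json
from typing import Any


def strip_is_fake(node: Any) -> Any:
    """Return a copy of a JSON-like value with every "isFake" key removed (hypothetical helper)."""
    if isinstance(node, dict):
        # Drop the "isFake" key and recurse into the remaining values.
        return {key: strip_is_fake(value) for key, value in node.items() if key != "isFake"}
    if isinstance(node, list):
        # Recurse into each list item (e.g. phases or tasks).
        return [strip_is_fake(item) for item in node]
    # Scalars (strings, numbers, booleans, null) are returned unchanged.
    return node


# Hypothetical flow configuration; the real structure may differ.
config_json = """
{
  "phases": [{"id": 1, "name": "Step 1", "isFake": false}],
  "tasks": [{"id": 10, "phase": 1, "isFake": true, "task": {"mode": "run"}}]
}
"""

cleaned = strip_is_fake(json.loads(config_json))
print(json.dumps(cleaned, indent=2))
```

Returning a new value instead of modifying the parsed JSON in place keeps the original configuration available for comparison before anything is pasted back into debug mode.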


Job failures on Azure North Europe stack

12.11.2024 10:30 UTC - Since approximately 10:00 UTC, there has been an issue running jobs on the Azure North Europe stack https://connection.north-europe.azure.keboola.com/. Jobs cannot be run (including scheduled ones). We have identified the root cause and are working on a fix.

Other stacks are not affected.

Update 10:57 UTC - The root cause has been resolved, and new jobs are starting properly. However, some flows may be stuck in the processing or terminating state and unable to finish. We're working on a fix for this.

Update 11:27 UTC - The fix will be deployed within the next 30 minutes.

Update 12:10 UTC - We are sorry for the delay; the fix will be released any minute now.

Update 12:21 UTC - All stuck jobs are now processed and all flows work as expected. 

We are sorry for this inconvenience and thank you for your patience.

Notifications failure on GCP stacks

On November 8th, 2024, between 8:17 AM and 8:28 AM UTC, we experienced a brief outage in our notification service on all GCP stacks due to a deployment misconfiguration. The issue was identified and promptly resolved, and the service has been fully operational since 8:28 AM UTC. During this period, no notifications were sent.

We apologize for any inconvenience this may have caused and appreciate your understanding.

Planned partial maintenance on Saturday, November 2, 2024 for all GCP stacks

The announced partial maintenance has just started. The platform has been scaled down and is no longer accepting new jobs. We expect a brief downtime in about 30 minutes: around 8:00 UTC for GCP europe-west3 and around 8:30 UTC for GCP us-east4. We will update this post with the progress of the partial maintenance.

2024-11-02 8:00 UTC: The announced partial maintenance of the europe-west3 stack is ongoing, and we are expecting downtime to begin any minute now. We will continue to update this post as the maintenance progresses.

2024-11-02 8:13 UTC: The maintenance of the europe-west3 stack has been completed, and all services have been scaled back up. The platform is fully operational, and jobs are now being processed as usual. All delayed jobs will be processed shortly. Thank you for your patience.

2024-11-02 8:30 UTC: The announced partial maintenance of the us-east4 stack is ongoing, and we are expecting downtime to begin any minute now. We will continue to update this post as the maintenance progresses.

2024-11-02 8:39 UTC: The maintenance of the us-east4 stack has been completed, and all services have been scaled back up. The platform is fully operational, and jobs are now being processed as usual. All delayed jobs will be processed shortly.

🎉 This concludes GCP maintenance! Thank you for your patience.

Errors on the GCP EU stack

We are experiencing problems on our GCP EU stack (https://connection.europe-west3.gcp.keboola.com/). We are deeply sorry for the inconvenience this may cause. In the user interface, you may see error alerts or slower job processing. Next update in 30 minutes.

Oct 28th 21:11 UTC: We're still investigating scheduling issues within our underlying infrastructure. Next update in 30 minutes.

Oct 28th 21:18 UTC: The issue has been resolved, and job listing should now work as expected. Thank you for your patience, and sorry for any inconvenience.

Planned partial maintenance on Saturday, October 26, 2024 for all Azure stacks

The announced partial maintenance has just started. The platform has been scaled down and is no longer accepting new jobs. We expect a brief downtime in 30 minutes, around 12:00 UTC. We will update this post with the progress of the partial maintenance.

Update 12:00 UTC: The announced partial maintenance is ongoing, and we are expecting downtime to begin any minute now. We will continue to update this post as the maintenance progresses.

Update 12:22 UTC: The maintenance has been completed, and all services have been scaled back up. The platform is fully operational, and jobs are now being processed as usual. All delayed jobs will be processed shortly. Thank you for your patience.