Increased error rate in all stacks

UTC 9:40: We're seeing reports of an increased number of application errors in all stacks. Table exports seem to be the most affected.

We're investigating the issue. Next update in 15 minutes.

UTC 9:55: The issue was caused by a temporary internal inconsistency during the deployment of one of our services. Approximately 30 jobs failed across all stacks. The issue is now resolved. We apologize for the inconvenience.


Slow job starts in AWS EU stack

July 10th, 12:20 UTC: We are investigating slow job starts in the AWS EU stack, which we have been experiencing since midnight CET on July 5th.

July 10th 13:30 UTC: We have implemented certain measures that we believe could mitigate the issue; however, we have not yet identified the root cause. We will continue to closely monitor the situation and conduct further investigation. The next update will be provided tomorrow (July 11th) or as soon as new information becomes available.

July 11th 11:33 UTC: We are still experiencing intermittent slow job starts during peak times, and our investigation is ongoing. The next update will be provided as soon as new information becomes available.

July 13th 10:46 UTC: At 09:45 UTC, we deployed multiple optimizations to address and reduce job start delays. We will continue to closely monitor the situation, and we will provide the next update as soon as new information becomes available.

July 14th 06:34 UTC: Significant improvements have been achieved since the previous deployment, restoring performance to pre-July 5th levels. We continue to monitor the situation closely to maintain stability. Thank you for your patience and support.

July 17th 06:44 UTC: Performance is back to pre-July 5th levels, and the issue is now resolved. We apologize for any inconvenience caused.

Telemetry - Data issue in kbc_usage_metrics_values table

After the last telemetry update, incremental processing of the kbc_usage_metrics_values table might have caused higher credit usage to be shown for some projects and usage breakdowns.

The data have been processed in full to ensure that wrong records are fixed or removed.
The Telemetry Data Extractor has been switched to a forced full load until Monday, 17th July, so that the data are also fixed in projects that use this component with incremental load.

Project UI not loading

2023-07-04 11:15 UTC We are investigating problems with the UI loading on all stacks.

2023-07-04 11:35 UTC The project UI is now working. The root cause was a bug in the UI deployment.

We apologize for any inconvenience caused.

Jobs list failures

2023-06-30 12:20 UTC We are investigating problems with listing jobs on all stacks. The error manifests as the following message: Invalid configuration for path "job.branchType": BranchType must be one of dev, default.

Next update in 30 minutes.

2023-06-30 12:45 UTC [resolved] We have re-deployed the last functional version and the problem is now solved.

We apologize for any inconvenience caused.

Storage job failures in the AWS EU stack

We are observing an increased number of failed storage jobs with the error message "Cannot import data from Storage API" on connection.eu-central-1.keboola.com. The main cause has been identified and resolved, and all systems should now be running smoothly. We will continue to monitor the situation, and the next update will be provided in 30 minutes.

We apologize for any inconvenience caused.

UPDATE 7:20 UTC [resolved] All systems are functioning normally, and the incident has been resolved and closed.

New Outbound IP Addresses for Keboola Connection: Last Call

This is a reminder that the deadline to update your whitelist for the new outbound IP addresses is approaching. It is crucial to act before June 30, 2023, to avoid any disruption to your connectivity.

If you are still seeing the following alert in your projects, then you have not yet migrated to the new IP addresses:

Please note that if you have not manually updated your whitelist by the deadline, Keboola will perform the switch globally. This means that your projects will be automatically switched to the new IP addresses after June 30, 2023.

If you have not yet migrated, please follow the actions required as described in the New Outbound IP Addresses announcement.

Stuck storage jobs in Azure North Europe Stack

Today, 16th of June, since 3:03 UTC, jobs have been getting stuck on data import and export. This is due to a Snowflake incident in the Azure West Europe region (https://status.snowflake.com/), where the warehouse of the Azure North Europe stack is located.

We are monitoring the Snowflake incident and will keep you updated here.

UPDATE 6:15 UTC - The Snowflake incident is still ongoing, with the last update at 05:28 UTC: "We've identified an issue with a third-party service provider, and we're coordinating with the provider to develop and implement a fix to restore service. We'll provide another update within 60 minutes." The issue is most likely due to a problem in Azure, which has reported an incident in the West Europe region; see https://azure.status.microsoft/en-us/status.

UPDATE 7:00 UTC - We see progress: storage data import/export jobs are being processed again. However, the Snowflake incident is still open, and we continue to monitor it.

UPDATE 8:00 UTC [resolved] - Snowflake has resolved the incident, stating: "We've coordinated with our third-party service provider to implement the fix for this issue, and we've monitored the environment to confirm that service was restored. If you experience additional issues or have questions, please open a support case via Snowflake Community." We no longer see any stuck jobs, so we consider the issue resolved on our side as well.

Degraded AWS US/EU Stack (connection.keboola.com, connection.eu-central-1.keboola.com)

2023-06-13 19:40 UTC The service components.keboola.com is degraded due to an incident in the AWS US-EAST-1 Region (https://health.aws.amazon.com/health/status). We are monitoring the situation.

2023-06-13 20:15 UTC The AWS incident is also affecting our OAuth authorization service in the AWS US stack (connection.keboola.com). All components relying on OAuth authorization could be affected and may fail randomly.

2023-06-13 20:20 UTC We are investigating slower job processing in the AWS US stack (connection.keboola.com).

2023-06-13 20:40 UTC The incident in AWS US-EAST-1 is causing jobs to get stuck on both the AWS US (connection.keboola.com) and AWS EU (connection.eu-central-1.keboola.com) stacks. This includes component jobs and services that run jobs as part of their workflow, such as workspace creation.

2023-06-13 20:55 UTC AWS is reporting the incident as resolved. All services are running normally, but some jobs may still take longer to process due to the large number of jobs waiting in the queue. We are monitoring the situation.