Increased rate of API Errors

On 2021-08-15, between 22:40 and 11:10 CET, we experienced an increased rate of Storage API errors. We identified the problem and resolved it. As a consequence, some jobs may have failed to start. We're very sorry for this inconvenience.

Delayed processing of jobs in AWS US stacks

2021-08-02 07:40 UTC - We are investigating component job delays in connection.keboola.com. Next update when new information becomes available, or within an hour.

UPDATE 2021-08-02 08:13 UTC - We have identified the root cause of the overload and added more capacity. The backlog is cleared and new jobs are processed immediately. Jobs started before the incident might still be running longer than usual. We will continue to monitor the situation and keep you posted.

UPDATE 2021-08-02 10:14 UTC - The incident is resolved. All operations are back to normal. We're sorry for this inconvenience.



Corrupted telemetry data

We are currently investigating an issue regarding corrupted data obtained via our Telemetry Data component (keboola.ex-telemetry-data).

We have most probably identified the issue and we're working on a fix.

We are very sorry for any inconvenience this might have caused you.

Next update at 15:00 UTC.

Update: We have modified the component so that it now loads data using full loads only. To ensure that you have the correct telemetry data, all you need to do is run the extractor (or wait for your pipeline to run it). We will re-implement the incremental fetching in the following months.


Delayed processing of jobs in Azure North Europe stack

Since 2021-07-22 06:00 UTC we have been seeing a higher-than-usual number of jobs in the waiting state. We are monitoring the situation and will keep you posted.

UPDATE 2021-07-22 07:40 UTC We replaced an unhealthy instance with a new one and the issue has been resolved. We're sorry for this inconvenience. We are continuing the analysis.

Job delays and unsuccessful job terminations in all Azure stacks

Since 2021-07-20 17:00 UTC some job processing may be delayed and job termination requests may be unsuccessful in all Azure stacks. The total number of affected jobs and requests is very small.

This bug was introduced by a network settings change. The change has been reverted and the revert is currently being deployed to all Azure stacks. If you experience any of the mentioned symptoms, please get in touch with our support so we can mitigate the issue faster.

We're very sorry for this inconvenience. 

Delayed processing of jobs in Azure North Europe stack

Since 2021-07-21 07:00 UTC we have been seeing a higher-than-usual number of jobs in the waiting state. We are monitoring the situation and will keep you posted.

UPDATE 2021-07-21 08:00 UTC We replaced an unhealthy instance with a new one and the issue has been resolved. We're sorry for this inconvenience.

Elasticsearch incident in AWS EU Central region

We are experiencing timeouts during communication with Elasticsearch running in the AWS EU Central region. It affects the ability of Connection to create Storage events. 

We are investigating the problem and will let you know about its progress in an hour.

UPDATE 2021-07-13 13:30 UTC: We confirmed that the problems are caused by an incident with EC2 instances in AWS (see their status page for more details). We are monitoring the situation. Next update in an hour.

UPDATE 2021-07-13 14:30 UTC: AWS is working on resolving the connectivity issues, and our monitoring shows the situation calming down as well, so the problems should be resolved soon. Next update in an hour.

UPDATE 2021-07-13 15:30 UTC: AWS resolved the connectivity issues and we are replacing all affected instances preventively to eliminate other potential problems. 

Snowflake operations failing

We are experiencing trouble loading data into Snowflake, probably related to null values in columns set as primary keys. The issue causes some jobs to fail with the error "NULL result in a non-nullable column" or similar. It affects multiple Snowflake components (transformations, writers).

We're investigating the causes for this issue. Next update in one hour.

UPDATE 8:40 UTC: We are still investigating the problem. We managed to reproduce it in a test and are taking further steps to find the root cause. Next update in one hour.

UPDATE 9:00 UTC: The root cause of the problem is that Snowflake did not previously enforce the NOT NULL constraint on columns set as primary keys, but started to enforce it in the latest release. See this announcement for further details: https://docs.snowflake.com/en/release-notes/2021-06-bcr.html#create-table-command-change-to-enforcement-of-primary-keys-created-by-command
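For illustration, here is a minimal sketch of the new behavior; the table and column names are purely hypothetical. With the change enabled, a column declared as a primary key is treated as NOT NULL, so loading a NULL value into it fails:

-- Illustrative names only; run against a test schema
CREATE TABLE pk_demo (id VARCHAR PRIMARY KEY, val VARCHAR);
-- With the new enforcement, the primary key column rejects NULLs,
-- failing with "NULL result in a non-nullable column" or similar:
INSERT INTO pk_demo (id, val) VALUES (NULL, 'example');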

You can fix the problem immediately by removing the null values from columns set as primary keys. We are asking Snowflake support whether it is possible to restore the previous behavior for selected tables or databases for customers who cannot fix it easily right now (as it would, for example, break some incremental loading settings), so please reach out to us at support@keboola.com if you are in such a situation.
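As a rough sketch of such a cleanup (the table and column names are hypothetical and need to be adjusted to your schema), you can first check for NULLs in a primary key column and then either delete the offending rows or backfill them with a placeholder value:

-- Illustrative names; replace my_table / my_pk_column with your own
SELECT COUNT(*) FROM my_table WHERE my_pk_column IS NULL;
-- Option A: remove the offending rows
DELETE FROM my_table WHERE my_pk_column IS NULL;
-- Option B: backfill NULLs with a placeholder value instead
UPDATE my_table SET my_pk_column = '' WHERE my_pk_column IS NULL;

Keep in mind that deleting or rewriting rows may affect incremental loading settings, so review which option fits your pipeline before running it.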

UPDATE 10:00 UTC: We were able to roll back this change with Snowflake support and postpone it for four weeks, giving you time to prepare your data. If you are running your own Snowflake warehouse, you will need to disable this change yourself.

UPDATE 11:02 UTC: We've verified in multiple jobs that the issue has been resolved on all our Snowflake accounts.
Unfortunately, there were cases where the issue was not mitigated. The reason is that the change was released worldwide to all Snowflake clients. So if you're using your own Snowflake account in Keboola (either as a backend or in the Snowflake writer), you need to roll back the release bundle on your Snowflake account as well.
Run the following query on your Snowflake instance:

SELECT SYSTEM$DISABLE_BEHAVIOR_CHANGE_BUNDLE('2021_05');

You will need the ACCOUNTADMIN role to do that. This will temporarily roll back the problematic behavior. We will either provide a fix at the platform level if possible, or publish a migration guide to mitigate this once the change is re-enabled in approximately four weeks.
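If you want to confirm the state of the bundle on your account, Snowflake also provides a status function; as a sketch, the following query (run with the same privileges, and assuming the 2021_05 bundle is available on your account) should report whether the bundle is enabled or disabled:

-- Check the current state of the behavior change bundle
SELECT SYSTEM$BEHAVIOR_CHANGE_BUNDLE_STATUS('2021_05');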

Increased error rate on Snowflake backend in all stacks

Since 2021-06-30 18:00 UTC we have been experiencing an increased error rate on the Snowflake backend in all stacks due to an incident at Snowflake. We are monitoring the situation and will keep you posted.

Update 19:07 UTC
Snowflake has identified the root cause and is continuing to work on mitigating the problem.

Update 20:07 UTC
Snowflake reports that its infrastructure is up and running.

Update 2021-07-02 07:30 UTC
Snowflake has published a report about the incident.