Delayed Storage Events in AWS EU stack

2022-02-02 17:55 UTC We are experiencing delayed indexing of Events in our Storage. We are investigating the issue.

2022-02-02 19:23 UTC The Storage Events delays were caused by increased load on our APIs. All delayed events should now be processed and available in the events feed. We are monitoring the situation to make sure it does not happen again.

Corrupted telemetry data

Dec 9 2021, 11:38 UTC We are currently investigating an issue regarding corrupted data obtained via our Telemetry Data component (keboola.ex-telemetry-data).

Next update in 60 minutes.

Dec 9 2021, 13:07 UTC We have identified the issue in our telemetry data and fixed it. The issue could cause a job with no existing configuration not to be assigned to its actual project's telemetry data.

We have modified the component so that it now loads data using full loads only. To ensure that you have the correct telemetry data, all you need to do is run the extractor (or wait for your pipeline to run it). We will re-implement the incremental fetching in the following months.

We are very sorry for any inconvenience caused. 

Failing Output Mappings into Storage

2021-08-28 10:25 UTC - We have noticed an increased rate of failed jobs when importing data into Storage. They are failing with the user error "Some columns are missing in the csv file."

2021-08-28 10:50 UTC - We identified the problem and rolled back to the previous version of our Storage.

2021-08-28 11:50 UTC - The failed jobs were related to minor Storage changes released on 2021-08-27 between 8 and 9 AM UTC.

Table imports started after that release could finish with the user error "Some columns are missing in the csv file." when column names in the source file ended with non-alphanumeric characters.
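To check whether a source file has the kind of header described above, a minimal sketch along the following lines may help. It is not the validation used by Storage. The function name and the sample data are hypothetical, and it assumes a comma-delimited file with the header on the first row. Note that Python's str.isalnum() also treats non-ASCII letters and digits as alphanumeric, which may differ from the rule Storage applied.

```python
import csv
import io


def columns_ending_non_alnum(csv_text: str) -> list[str]:
    """Return header column names whose last character is not alphanumeric.

    Hypothetical helper for illustration only; it does not reproduce
    the actual Storage import logic.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    return [name for name in header if name and not name[-1].isalnum()]


# Made-up sample: "price_" and "total%" end with non-alphanumeric characters.
sample = "id,price_,name,total%\n1,9.5,abc,10\n"
flagged = columns_ending_non_alnum(sample)
print(flagged)
```

An empty result means no column name in the header ends with a non-alphanumeric character under this assumed rule.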

We're sorry for this inconvenience.

GoodData Writer failures in EU region

There has been a problem with the GoodData API since about 08:00 UTC, which is causing some model updates and data loads to fail.

We are investigating the problem with GoodData support and will keep you updated.

Update 14:20 UTC - The GoodData Technical Support team is still investigating this issue.

Our GoodData writer component has not changed since January, so we are waiting for a clarification of the root cause from their support team.

Update 15:10 UTC - This issue is related to GoodData's release today. They are preparing a hotfix for it and expect it to be deployed in a few hours.

Update June 5, 08:20 UTC - The problem has been resolved. GoodData deployed a hotfix for their API last night. Since June 4 21:00 UTC we have not seen any new errors from the GoodData Writer.

Degraded Snowflake Performance (EU region) - March 2, 2020

We're experiencing degraded Snowflake performance affecting all operations in the EU region.

This problem can affect our users in the following ways:

  • Job execution time may be longer than usual.
  • Jobs are randomly failing with the error message "Result not found".

This is similar to the issue we observed last week, which was caused by the Snowflake Cloud Service Layer in the EU region being overloaded.

We have reported it to Snowflake support and will post an update as soon as we have one.

UPDATE Mar 3, 10:43 CET - The performance of Snowflake operations has now improved. We're still working with Snowflake support on mitigating this issue permanently.

UPDATE Mar 4, 19:50 CET - We were hit again today by degraded performance of Snowflake's Cloud Service Layer. Between 3:30 PM and 5 PM CET we observed a large increase in the compilation time of Snowflake queries. After 5 PM it returned to normal. We have updated Snowflake with our latest observations and are hopeful for a quick resolution. We will post any further information here within the next 12 hours.

UPDATE Mar 5, 06:35 CET - We have again hit Snowflake performance degradation in the last six hours. Snowflake is working on mitigating the issue at the highest severity. We will provide an update in three hours.

UPDATE Mar 5, 09:49 CET - Performance degradation peaks still occur. We are in touch with Snowflake about a resolution. We will provide an update in three hours.

UPDATE Mar 5, 13:43 CET - There has been no change in performance over the past few hours, and the degradation persists. We will provide an update in three hours.

UPDATE Mar 5, 16:18 CET - We've been working with Snowflake support on mitigating the performance degradation. From our monitoring, it seems that the Keboola platform in the EU region has been stable since around 14:00. We will provide the next update in three hours.

UPDATE Mar 5, 16:43 CET - The issue is a highest-priority case with Snowflake support. We're collaborating closely on finding a permanent solution to the performance degradation. Status update from Snowflake:

Snowflake acknowledges the performance degradation issue reported by Keboola Czech SRO since February 24, 2020 and tracks it under the case #00097987 as a critical incident.

Snowflake Support together with Snowflake Engineering are continuously collaborating and conducting an in-depth investigation to identify the root cause and provide a viable solution to you on priority.

At this time, additional resources have been allocated in the Snowflake Services Layer which should show improved performance values.

We will provide the next update in three hours.

UPDATE Mar 5, 20:55 CET - The performance is now stabilised and jobs are running as expected. We are monitoring the situation and will provide the next update in 12 hours.

UPDATE Mar 6, 09:23 CET - The performance worsened slightly during the nightly job peak. We are monitoring the situation and will provide the next update in 3 hours.

UPDATE Mar 6, 12:55 CET - The Snowflake Engineering team made some optimizations to the Cloud Service Layer for our account. Since 7:40 AM CET there have been no "Result not found" errors, and job performance should return to normal.

We will continue monitoring the situation and will provide another update in 6 hours.

UPDATE Mar 6, 21:45 CET - All operations are back to normal. Thanks for your patience and understanding.

GoodData writer failures (US region)

We encountered a small number of failed jobs in the US region between 4:46 AM and 9:17 AM UTC.

Affected jobs finished with an Application Error caused by an invalid SSL connection to GoodData. We're investigating the issue.

UPDATE Nov 24, 1:42 PM (UTC) - Only projects with custom GoodData domains were affected by this issue. The problem was with validation of a renewed SSL certificate. The fix is being deployed right now.