Snowflake Incident Investigation

We were notified by Snowflake support about a problem that could potentially affect Keboola Connection projects with a Snowflake backend.

The issue happened between 9:19pm UTC on August 8th and 6:45pm UTC on August 9th.
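If you want to check your own query history against this window, a minimal sketch could look like the following. The function name `ran_in_affected_window` is hypothetical, the year 2017 is assumed from the dated update in this post, and the fixed UTC-7 offset assumes Pacific Daylight Time as used in the Snowflake alert.

```python
from datetime import datetime, timedelta, timezone

# Affected window as stated in this post, in UTC (year 2017 is an assumption).
WINDOW_START = datetime(2017, 8, 8, 21, 19, tzinfo=timezone.utc)
WINDOW_END = datetime(2017, 8, 9, 18, 45, tzinfo=timezone.utc)


def ran_in_affected_window(query_start: datetime) -> bool:
    """Return True if a timezone-aware query start time falls inside the window."""
    if query_start.tzinfo is None:
        raise ValueError("query_start must be timezone-aware")
    # Aware datetimes compare correctly across different time zones.
    return WINDOW_START <= query_start <= WINDOW_END


# Usage: a query started at 3:00pm Pacific Daylight Time (UTC-7) on August 8th.
pacific_daylight = timezone(timedelta(hours=-7))
print(ran_in_affected_window(datetime(2017, 8, 8, 15, 0, tzinfo=pacific_daylight)))
```

A match only means the query ran during the affected period; whether its results were actually wrong still has to be confirmed with support.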

The full alert from Snowflake is below. We are investigating with them whether any KBC projects were affected.

We will keep you informed of any progress.

UPDATE 2017-08-12: We have received a list of projects that might be affected by this issue. In each of these projects, one or more queries that might have returned wrong results were executed. We have notified these projects in KBC. If you need any assistance, please contact support@keboola.com.


Dear Snowflake Customer,


We have identified a problem that could produce incorrect results in some queries that perform memory intensive join operations. It is unlikely that this issue impacts you; however, we wanted to make sure you were aware of the problem. The issue was isolated to the US-West region between 2:19pm Pacific time on Tuesday, August 8th and 11:45am Pacific time on Wednesday, August 9th. Queries performed in other regions or outside that timeframe were not impacted.


Because of the nature of the issue, it is difficult for us to determine if any of your queries experienced the problem. However, we are able to perform further analysis on a query-by-query basis. If you have a question about whether a specific query was impacted, please submit a support request to Snowflake support via the Support Portal or via email to support@snowflake.net .


We are very sorry about this issue and will do everything possible to help you resolve any problems this may have caused you.


Component errors

We have been observing failures of components hosted on quay.io since 08:25 AM CET. Python Transformations, Facebook Extractor and Google Sheets Writer are among the affected components.

Quay.io has not reported an outage yet. To avoid this issue in the future, we are moving all components to an AWS repository provided by Keboola.

We will keep you informed about the progress.

UPDATE 9:09 AM CET, RESOLVED: The last error was seen at 08:53 AM CET. Quay.io has confirmed and fixed the issue.

End of life announcement for synchronous table exports

The deprecated synchronous table exports will be shut down on August 21st, 2017.

This API call was replaced by Asynchronous table export for full table unloads and Data Preview for small samples of table data.

All Keboola Connection components use the new methods. As there are only a few projects currently using the deprecated method directly, these projects will be notified individually. For all other projects there is no action required.

Week in Review - July 31, 2017

New Features

  • RStudio and Jupyter Sandboxes now support (and enforce) HTTPS connections.

Updated Components

Qlik and Looker writers were updated:

  • Improved and unified UI

  • Optional provisioned Redshift database

Fixes

  • Fixed freezing of columns in Google Sheets Writer


Job failures

There were job failures between 4:30 AM and 5:30 AM CET. The failures were caused by low disk space on one of the worker servers.

We are sorry for this inconvenience and we're taking steps to mitigate this problem in the future.

Deprecated Facebook Ads API v2.8 and below

Facebook is deprecating all Marketing API versions prior to v2.9 on Wednesday, July 26, 2017, and recommends upgrading to v2.9 immediately. Facebook Ads Extractor uses v2.8 by default for all newly created configurations, but after July 26 all newly created configurations will have the default api-version set to v2.9.

The migration should only require changing the api-version parameter in all existing configurations of our Facebook Ads Extractor. However, since there may be some breaking changes, we strongly recommend changing the api-version in your configurations manually, reviewing any possible changes and taking the appropriate actions. For more details on the new version, please read Marketing API Changes in v2.9 in the Facebook API changelog. Of those changes, we'd like to highlight the deprecation of some date_preset values, which are replaced with new ones, e.g. last_3_days is replaced by last_3d.
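As an illustration only, the two changes described above (bumping api-version and renaming deprecated date_preset values) could be sketched like this. The configuration layout with a `queries` list is hypothetical and is not the extractor's real schema, and the rename table contains only the one example given in this post; the Facebook changelog remains the authoritative list.

```python
import copy

# Only the rename mentioned in this post; extend it from the Facebook changelog.
DATE_PRESET_RENAMES = {"last_3_days": "last_3d"}


def migrate_config(config: dict, target_version: str = "v2.9") -> dict:
    """Return a copy of config with api-version bumped and known presets renamed."""
    migrated = copy.deepcopy(config)  # leave the original configuration untouched
    migrated["api-version"] = target_version
    # "queries" and "date_preset" placement are assumed for illustration.
    for query in migrated.get("queries", []):
        preset = query.get("date_preset")
        if preset in DATE_PRESET_RENAMES:
            query["date_preset"] = DATE_PRESET_RENAMES[preset]
    return migrated


# Usage with a made-up configuration.
old = {"api-version": "v2.8", "queries": [{"date_preset": "last_3_days"}]}
print(migrate_config(old))
```

Because the function returns a copy, the old and new configurations can be compared side by side before anything is saved, which fits the recommendation to review changes manually.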

In other Facebook news, they have announced version v2.10 of Facebook API and Facebook Marketing API.

Storage Jobs Stuck in Processing State

We have noticed an increased number of Storage jobs stuck in the processing state, in rare cases causing a complete halt and queuing of all new Storage jobs in a project.

We have identified the root cause. It seems that, due to the Snowflake incident last night, some database transactions remained open and blocked queries from subsequent jobs. We have terminated all orphaned transactions, and the jobs have started processing and restarting.

Please note that queued jobs are processed in random order.

We are sorry for this inconvenience.

Sudden Job Failures on July 6th, 2017

On July 6th, 2017, between 9:16am and 9:18am CET, AWS forced an update and restart of one of our internal databases. As a result, all running component jobs failed with an Application error. The error may show up significantly later than the restart. We recommend reviewing your orchestration jobs and taking action if needed. We are sorry for this inconvenience and we're taking steps to mitigate this problem in the future.

Job queue overload (resolved)

Since 3:50am CET we have been experiencing an overload of our server queues. We are still investigating the issue and will keep you informed about the progress.

UPDATE 5:45am CET: We have found the probable root cause: Snowflake queries are being queued unusually. Furthermore, we are unable to increase the power or the number of clusters of our Snowflake warehouses, and we are waiting for findings from Snowflake Support.

UPDATE 6:10am CET from Snowflake support: Engineering has started mitigation steps to address this issue, we will provide another update shortly.

UPDATE 7:00am CET: The Snowflake query queues have emptied and everything appears to be back to normal.

UPDATE RESOLVED 9:00am CET: We confirm the issue has been resolved. The cause was the unexpected queueing of Snowflake queries. Snowflake confirmed this and rolled back their latest release. However, as a consequence, our waiting jobs reached the backoff threshold and timed out, which resulted in orchestration failures. We recommend reviewing your project orchestration jobs and taking action if needed.
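To illustrate how a waiting job can run out of retries during a long upstream delay, here is a generic sketch of exponential backoff with a total time limit. It does not describe our actual retry policy: the function `backoff_delays` and all numbers are hypothetical.

```python
def backoff_delays(base_seconds: float, factor: float, max_total_seconds: float) -> list:
    """Return the successive retry delays that fit within max_total_seconds."""
    delays = []
    delay = base_seconds
    total = 0.0
    # Keep retrying while the next wait still fits within the overall limit.
    while total + delay <= max_total_seconds:
        delays.append(delay)
        total += delay
        delay *= factor  # each wait is longer than the previous one
    return delays


# Usage: start at 1 second, double each time, give up after 10 seconds in total.
print(backoff_delays(1, 2, 10))
```

Once the delays are exhausted, the job gives up and fails, so an upstream queue that stays blocked for longer than the limit leads to failures even after the upstream service recovers.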