Incident with Snowflake in the US Region

We are currently investigating an increased error rate from Snowflake in the US region, starting at approximately 10:00 PM CEST.

We will update here as soon as we know more.

UPDATE 11:05 PM CEST: We are handling the issue with Snowflake support. So far, all Snowflake operations in the US region seem to be failing. Next update at 11:30 PM or sooner if there is any new information or the situation changes.

UPDATE 11:30 PM CEST: Snowflake rolled back the release they made today, and everything has returned to normal operation.

UPDATE 12:00 AM CEST: We're very sorry for this inconvenience. The errors started at 12:58 PDT (19:58 UTC) and lasted until 14:24 PDT (21:24 UTC). All new Snowflake connections in the US region (including those from your DB clients) were failing during this period.

Unfortunately, you will need to restart any jobs or orchestrations that failed during this period.

The EU region was not affected by this issue.

Snowflake Slowdown in EU

A scaling script that ran at 12:00 AM CEST failed to scale up the Snowflake warehouse in the EU region. All storage and transformation jobs in the EU were affected by this issue and ran significantly slower than usual.

To help process the queued load, we scaled up the warehouse at 9:45 AM CEST and will keep it scaled up until the backlog is processed.

We're sorry for this inconvenience, and we'll be implementing safeguards to prevent this from happening again.
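
For illustration, here is a minimal Python sketch of the kind of safeguard we have in mind: a scale-up step that verifies the resize actually took effect instead of failing silently. It uses the snowflake-connector-python package; the account, credentials, and warehouse name are placeholders, and our actual scaling script may differ.

```python
import snowflake.connector

def scale_warehouse(conn, warehouse: str, size: str) -> None:
    """Resize a warehouse and verify the change took effect.

    Raises instead of failing silently, so monitoring/alerting
    can pick up an unsuccessful scale-up.
    """
    cur = conn.cursor()
    # Identifiers are fixed and trusted in this sketch.
    cur.execute(f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = '{size}'")

    # SHOW WAREHOUSES reports the current size; look the column up by
    # name rather than by position to stay robust against column changes.
    cur.execute(f"SHOW WAREHOUSES LIKE '{warehouse}'")
    columns = [col[0].lower() for col in cur.description]
    row = cur.fetchone()

    def normalize(s: str) -> str:
        # Normalize "X-Large" vs "XLARGE" style differences before comparing.
        return s.replace("-", "").upper()

    if row is None or normalize(row[columns.index("size")]) != normalize(size):
        raise RuntimeError(f"Scale-up of {warehouse} to {size} did not apply")

# Placeholder connection parameters; real values belong in a secrets store.
conn = snowflake.connector.connect(
    account="example-account", user="scaling-bot", password="..."
)
scale_warehouse(conn, "KEBOOLA_EU_PROD", "XLARGE")  # hypothetical warehouse name
```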

Degraded Snowflake Performance (EU region) - April 8, 2020

We are investigating decreased performance of Snowflake in the EU region, which unfortunately recurred after previous resolutions. We are in touch with Snowflake support. Job performance and sandbox loading times may be affected. Next update at 12:30 UTC.

Update 12:30 UTC: We are handling the performance issue with Snowflake support; we've offset the slowdown by scaling up the cluster. We'll have more information in about an hour. We caught the issue early on, so we hope it will have minimal impact on jobs, apart from a small slowdown. So far, we've seen 3 job failures because of this across the whole EU region. We'll post another update at 14:30 UTC or sooner if there is any new information or the situation changes.

Update 14:30 UTC: We are still working with Snowflake on resolving the issue. The situation is currently stable, and we have not seen any job failures since the last update. Our main goal is to mitigate the issue before the midnight job surge. Next update at 18:30 UTC or sooner if there is any new information or the situation changes.

Update 18:15 UTC: We are still working on mitigating the slowdown. We've seen only 3 related job failures since the last update, so we still consider the situation stable. We expect the issue to be resolved within the next hour.

Update 19:30 UTC: We're monitoring the situation; performance has improved and is close to previous values. We should have fresh aggregated monitoring data in approximately 15 minutes, and we expect it to show a complete recovery to standard performance.

Update 20:01 UTC: The issue has been resolved. 

Errors in Generic Extractor Post-Mortem

Summary

On April 4, 2020 at 10:07 UTC, we deployed a version of Generic Extractor that contained a bug. Some Generic Extractor jobs failed with the following error:

CSV file "XXX" file name is not a valid table identifier, either set output mapping for "XXX" or make sure that the file name is a valid Storage table identifier. 

Generic Extractor was reverted to its previous version at 14:08 UTC. The error affected 10% of all Generic Extractor jobs running during the four-hour period. We sincerely apologize for the trouble this may have caused you.

What Happened?

We changed the output generation rules so that tables are always generated, even if empty. Table names are normally generated using the outputBucket setting. However, they can also be generated using undocumented alternative settings via the ID or name properties. Unfortunately, the new code did not take the alternative settings into account and failed to generate correct table names.
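
To illustrate, here is a hedged Python sketch of the naming logic. The outputBucket, id, and name properties come from this post, but the helper itself, the fallback order, and the bucket naming convention are illustrative assumptions, not the extractor's actual code.

```python
def table_identifier(config: dict, csv_name: str) -> str:
    """Derive a Storage table identifier for an extracted CSV file.

    Illustrative only; the real Generic Extractor logic differs in detail.
    """
    table = csv_name[:-4] if csv_name.endswith(".csv") else csv_name

    # Documented path: the outputBucket setting names the target bucket.
    bucket = config.get("outputBucket")
    if bucket is None:
        # Undocumented alternatives: fall back to the id or name property.
        # The buggy release skipped this fallback, so configurations using
        # these settings produced invalid table identifiers.
        alt = config.get("id") or config.get("name")
        if alt is None:
            raise ValueError(
                f'CSV file "{csv_name}" file name is not a valid table identifier'
            )
        bucket = f"in.c-ex-generic-{alt}"  # assumed bucket naming convention
    return f"{bucket}.{table}"

# Works with the documented setting...
assert table_identifier({"outputBucket": "in.c-main"}, "users.csv") == "in.c-main.users"
# ...and with the undocumented fallback the new code originally missed.
assert table_identifier({"name": "my-api"}, "users.csv") == "in.c-ex-generic-my-api.users"
```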

What Are We Doing About This?

We have extended the tests to cover the undocumented settings, though we recommend you stick with the documented ones.

Errors in Generic Extractor jobs

Today we released a version of Generic Extractor that contained a bug. It caused certain specific configurations to fail with the error:

CSV file XXX file name is not a valid table identifier, either set output mapping for XXX or make sure that the file name is a valid Storage table identifier. 

We have reverted the release. We sincerely apologize for the error. We will publish a post-mortem next week.

Orchestrations API increased error rate in EU

We are seeing problems causing errors in Orchestrations API responses in the EU region. We are investigating and will post more details here within an hour.

UPDATE Apr 2 11:32 CEST - The errors have stopped occurring. We are monitoring the situation and investigating the root cause.

UPDATE Apr 2 12:05 CEST - We've found that the API servers were flooded with unexpected bursts of requests. We've upgraded the infrastructure and will look for a way to prevent this situation in the future.
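
As a sketch of one common safeguard against request bursts (not necessarily the mitigation we will choose), a token-bucket rate limiter caps how fast a single client can hit an API while still allowing short bursts. Everything below, including the limits, is illustrative:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: allows bursts of up to `capacity`
    requests, refilled at a steady `rate` of tokens per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should reject the request

# Example: sustain 100 requests/s with bursts of up to 500.
limiter = TokenBucket(rate=100, capacity=500)
if not limiter.allow():
    pass  # respond with HTTP 429 Too Many Requests
```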

Week in Review - March 31st, 2020

UI Improvements

  • Action buttons are now directly accessible when hovering over list items in transformations and components that use generic input or output mappings.
  • We added a new modal to improve the orchestration set-up experience. You can now more easily schedule orchestrations on an hourly, daily, or weekly basis. There's still an option to set up a custom schedule.
  • When you want to edit tables or credentials in your database writers, you no longer have to click the “Edit” button; you can directly edit the values and push the “Save” button.
  • We added a new modal for database writers that support provisioned credentials (Redshift, Snowflake). You can now directly create provisioned credentials.

Minor Improvements

  • The Julia transformation and sandbox have been updated to Julia 1.4.

Transformation failures

We're currently experiencing transformation failures, and we are investigating the problem. Next update in one hour.

UPDATE March 31, 6:17 AM UTC: We've identified the issue and deployed a rollback. Transformations started after 6:11 AM UTC should run without any issues. We're monitoring to ensure transformations are running as expected. Next update in one hour.

UPDATE March 31, 7:30 AM UTC: The rollback is complete, and no other issues were reported within the last hour. We are going to investigate the root cause and publish a post-mortem soon.

Transformation errors

Since March 26, 4:00 PM UTC, we have been experiencing failures when starting transformations in the US and EU regions with the error: Storage API bucket 'configuration_id' with configuration not found.

The error was caused by an incorrect configuration.

We're investigating the issue and will update this post with our findings.

We apologize for the inconvenience.

UPDATE March 26, 4:35 PM UTC: The problem has been fixed.

Degraded Snowflake Performance (US region) - March 24, 2020

Since March 24, 8:15 AM UTC, we have been seeing decreased performance of Snowflake in the US region. This may degrade job performance and sandbox loading in the US region. We are investigating the cause. Next update in one hour.

UPDATE Mar 24, 10:10 UTC - Performance should be back to normal; we're closely monitoring the situation.