Table primary key consistency incident

We have identified a bug in the primary key implementation in Storage which could lead to improper data deduplication. Only a very limited number of tables are affected by this bug: 7 tables across all KBC projects in all stacks. We'll be contacting the owners of the affected projects soon to help fix the affected tables.

Root Cause

The deduplication stopped working when a column used in a compound primary key was deleted.

During this operation, the whole primary key was silently dropped in the Snowflake backend, and this change was not propagated to our Storage metadata, which still listed the primary key (minus the deleted column). In Snowflake, a command such as ALTER TABLE ... DROP COLUMN ... immediately drops the whole primary key when the dropped column is part of a compound primary key. The deduplication process reads primary key information from the Snowflake DESCRIBE TABLE ... command, which reports no primary key for the affected tables, while our metadata still incorrectly shows that a primary key is set.
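The mismatch described above can be illustrated with a minimal sketch. It assumes the primary keys from both sources have already been loaded into plain dictionaries; the function name, the table names, and the data are hypothetical and do not represent the actual Storage implementation.

```python
from typing import Dict, List, Tuple

PrimaryKey = Tuple[str, ...]


def find_inconsistent_tables(
    metadata_keys: Dict[str, PrimaryKey],
    backend_keys: Dict[str, PrimaryKey],
) -> List[str]:
    """Return names of tables whose metadata key differs from the backend key."""
    inconsistent = []
    for table, meta_pk in metadata_keys.items():
        # A table missing from the backend mapping is treated as having no key.
        backend_pk = backend_keys.get(table, ())
        # Only column membership is compared here, not column order.
        if set(meta_pk) != set(backend_pk):
            inconsistent.append(table)
    return sorted(inconsistent)


# Hypothetical data: "orders" had a compound key (id, region, day). After the
# "day" column was deleted, the metadata kept (id, region) while the backend
# reports no primary key at all.
metadata_keys = {"orders": ("id", "region"), "customers": ("id",)}
backend_keys = {"orders": (), "customers": ("id",)}

print(find_inconsistent_tables(metadata_keys, backend_keys))  # ['orders']
```

A table flagged this way would be one where deduplication is skipped even though the metadata claims a primary key exists.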

We are implementing a fix that will store and retrieve primary keys from a single source.

Operating with Primary Key Columns

  • Deleting columns that are part of a table's primary key is no longer supported in Storage.
  • To delete a primary key column, please drop the primary key first.
  • To change the primary key of a table, you will need to first remove the primary key and then set it again.
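The order of operations in the rules above can be sketched as a small planning helper. This is only an illustration: the function name and the step descriptions are hypothetical and are not Storage API calls, and setting a new key afterwards is optional and left to the caller.

```python
from typing import List, Sequence


def plan_column_delete(
    primary_key: Sequence[str],
    column: str,
    new_key: Sequence[str] = (),
) -> List[str]:
    """Return the ordered steps needed to delete `column` from a table."""
    if column in new_key:
        raise ValueError("the new primary key cannot contain the deleted column")

    steps = []
    if column in primary_key:
        # A primary key column cannot be deleted directly,
        # so the primary key has to be dropped first.
        steps.append("drop primary key")
    steps.append(f"delete column {column}")
    if column in primary_key and new_key:
        # Optionally set a key again once the column is gone.
        steps.append("set primary key (" + ", ".join(new_key) + ")")
    return steps


# Hypothetical table with a compound key (id, region, day).
for step in plan_column_delete(["id", "region", "day"], "day", new_key=["id", "region"]):
    print(step)
```

Note that a key made of the remaining columns may no longer identify rows uniquely, so the new key should be chosen with the table's data in mind.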

Week in Review -- October 08, 2018

Updated Components

  • The MS SQL Server extractor has received a driver update. Advanced queries now behave slightly differently, so we recommend switching to simple configurations based on table and column selection. Where that is not possible, please see the documentation for further help.
  • Snowflake writer - minor bugfixes and enhancements
    • primary key is added when creating a table
    • nonexistent schema/warehouse validation
    • warehouse detected correctly for users with different login name and user name
  • Database extractors - column datatypes are now shown in UI

Minor Improvements

    • You can now change the sharing type of a shared bucket in the storage console.

    Stalled jobs in EU region

    Around 1:30am CEST one of the job worker instances stopped processing its assigned jobs. This could have led to jobs being stuck in the processing state for a long time without any activity.

    At 11:15am CEST the worker instance was terminated and all unfinished jobs started processing on other instances.

    We're sorry for this inconvenience.

    Failed and delayed jobs in EU region

    The database storing locks was restarted at 03:49 UTC, which caused the job failures. Some jobs were also queued after this failure.

    The backlog of all jobs was cleared at 06:15 UTC. The system is fully operational now. We're working on infrastructure changes which should prevent similar issues.

    Scheduled US and EU Maintenance

    There will be a maintenance period on Saturday, October 6th, 2018, starting at 8:00am CEST. It should take less than 5 hours.

    We will be upgrading component job indices and metadata databases.

    All projects in the US and EU regions will be inaccessible during the maintenance.

    Degraded Snowflake Performance (US region)

    Since September 25 we have been experiencing degraded Snowflake performance affecting all Snowflake operations.

    We're sorry for this inconvenience. We're working with Snowflake to fix this issue.

    Update, October 8

    The Snowflake engineering team has discovered and fixed the issue (we are still waiting for an official statement from Snowflake). We're seeing operation times going back to normal.