We have identified a bug in the primary key implementation in Storage that could lead to improper data deduplication. Only a very limited number of tables are affected by this bug: 7 tables across all KBC projects in all stacks. We will contact the owners of the affected projects soon to help fix the affected tables.
Root Cause
The deduplication stopped working when a column used in a compound primary key was deleted.
During this operation, the whole primary key was dropped in the Snowflake backend without our noticing, and the change was not propagated correctly to our Storage metadata, which still contained the primary key (minus the deleted column). In Snowflake, commands such as ALTER TABLE ... DROP COLUMN ... immediately drop the whole primary key when the dropped column is part of a compound primary key. The deduplication process retrieves primary key information from the DESCRIBE TABLE ... Snowflake command, which shows no primary key for the affected tables, while our metadata still incorrectly shows that a primary key is set.
We are implementing a fix that will store and retrieve primary keys from a single source.
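The mismatch described in this section can be illustrated with a small toy model. This is only a sketch: the `BackendTable` and `MetadataRecord` classes and the `primary_key_in_sync` helper are hypothetical names invented for illustration and do not reflect the actual Storage or Snowflake internals. The model simply encodes the behaviour stated above: the backend loses the whole key, while the metadata loses only the deleted column.

```python
# Toy model only: all names here are hypothetical, not Storage or Snowflake APIs.


class BackendTable:
    """Stands in for the backend table and mimics the behaviour described above."""

    def __init__(self, columns, primary_key):
        self.columns = list(columns)
        self.primary_key = list(primary_key)

    def drop_column(self, name):
        self.columns.remove(name)
        if name in self.primary_key:
            # The whole primary key is dropped, not just the deleted column.
            self.primary_key = []


class MetadataRecord:
    """Stands in for the separately stored primary key metadata."""

    def __init__(self, primary_key):
        self.primary_key = list(primary_key)

    def drop_column(self, name):
        # Faulty bookkeeping: only the deleted column is removed from the key.
        self.primary_key = [c for c in self.primary_key if c != name]


def primary_key_in_sync(backend, metadata):
    """Return True when both sources report the same primary key."""
    return backend.primary_key == metadata.primary_key


backend = BackendTable(["id", "region", "value"], ["id", "region"])
metadata = MetadataRecord(["id", "region"])

# Delete one column of the compound primary key in both places.
backend.drop_column("region")
metadata.drop_column("region")

print(backend.primary_key)                      # []
print(metadata.primary_key)                     # ['id']
print(primary_key_in_sync(backend, metadata))   # False
```

In this model, deduplication that trusts the backend sees no key at all, while anything reading the metadata still sees a key on `id`. Keeping the key in a single source removes the possibility of this kind of drift.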
Operating with Primary Key Columns
- Deleting columns which are part of the table primary key is no longer supported in Storage.
- To delete a primary key column, please drop the primary key first.
- To change the primary key of a table, you will need to first remove the primary key and then set it again.
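The order of operations in the list above can be sketched as follows. This is a minimal in-memory illustration, not the Storage API: the `Table` class, its method names, and the `PrimaryKeyColumnError` exception are all assumptions made up for this example.

```python
# Hypothetical in-memory model of the rules above; not the Storage API.


class PrimaryKeyColumnError(Exception):
    """Raised when a column that is part of the primary key is deleted."""


class Table:
    def __init__(self, columns, primary_key=()):
        self.columns = list(columns)
        self.primary_key = list(primary_key)

    def delete_column(self, name):
        # Rule 1: deleting a primary key column is not supported.
        if name in self.primary_key:
            raise PrimaryKeyColumnError(
                f"Column {name!r} is part of the primary key; drop the key first."
            )
        self.columns.remove(name)

    def remove_primary_key(self):
        self.primary_key = []

    def set_primary_key(self, columns):
        missing = [c for c in columns if c not in self.columns]
        if missing:
            raise ValueError(f"Unknown columns: {missing}")
        self.primary_key = list(columns)


table = Table(["id", "region", "value"], primary_key=["id", "region"])

# A direct delete of a key column is rejected.
try:
    table.delete_column("region")
    rejected = False
except PrimaryKeyColumnError:
    rejected = True

# Supported sequence: drop the key, delete the column, set the new key.
table.remove_primary_key()
table.delete_column("region")
table.set_primary_key(["id"])

print(rejected)           # True
print(table.columns)      # ['id', 'value']
print(table.primary_key)  # ['id']
```

Rejecting the delete up front means the key can only change through an explicit remove-and-set step, so the stored key never has to be adjusted as a side effect of a column deletion.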