Snowflake operations failing

We are experiencing trouble loading data to Snowflake, probably related to null values in columns set as primary keys (PK). The issue causes some jobs to fail with the error "NULL result in a non-nullable column" or similar. It affects several Snowflake components (transformations, writer).

We're investigating the causes for this issue. Next update in one hour.

UPDATE 8:40 UTC: We are still investigating the problem. We managed to reproduce the problem in a test and are performing other steps to find the root cause. Next update in one hour.

UPDATE 9:00 UTC: The root cause is that Snowflake did not previously enforce the NOT NULL constraint on columns set as primary keys, but started to enforce it in the latest release. See this announcement for further details: https://docs.snowflake.com/en/release-notes/2021-06-bcr.html#create-table-command-change-to-enforcement-of-primary-keys-created-by-command

You can fix the problem immediately by removing the null values from columns set as primary keys. We are asking Snowflake support whether the previous behavior can be restored for selected tables or databases, for customers who cannot easily fix it now (for example, because it would break some incremental loading settings). If you are in such a situation, please reach out to us at support@keboola.com.
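To locate the affected rows before removing them, a check along the following lines may help. This is only a minimal sketch: the helper name `build_null_pk_query` and the table and column names are hypothetical, and the generated query still has to be run against your own Snowflake tables.

```python
def build_null_pk_query(table, pk_columns):
    """Build a query that counts rows where any primary key column is NULL."""
    if not pk_columns:
        raise ValueError("at least one primary key column is required")

    def quote(identifier):
        # Quoted identifiers are case-sensitive in Snowflake, so pass the
        # names exactly as they are stored. Embedded quotes are doubled.
        return '"' + identifier.replace('"', '""') + '"'

    condition = " OR ".join(f"{quote(column)} IS NULL" for column in pk_columns)
    return f"SELECT COUNT(*) AS null_pk_rows FROM {quote(table)} WHERE {condition}"


# Hypothetical table and columns, for illustration only.
query = build_null_pk_query("orders", ["id", "customer_id"])
print(query)
```

A non-zero count means the table still contains rows that the new enforcement would reject.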

UPDATE 10:00 UTC: We were able to roll back this change with Snowflake support and postpone it for four weeks, giving you time to prepare your data. If you are running your own Snowflake warehouse you will need to disable this change on your own.

UPDATE 11:02 UTC: We've verified in multiple jobs that the issue has been resolved on all our Snowflake accounts.
Unfortunately, there were cases where the issue was not mitigated. The reason is that the change has been released worldwide to all Snowflake clients. So if you're using your own Snowflake account in Keboola (either as a backend or in the Snowflake writer), you need to roll back the release bundle on your Snowflake account as well.
You need to run the following query on your Snowflake instance:

SELECT SYSTEM$DISABLE_BEHAVIOR_CHANGE_BUNDLE('2021_05');

You will need the ACCOUNTADMIN role to do that. This will temporarily roll back the problematic behavior. We will either provide a fix at the platform level if possible, or publish a migration guide to mitigate this before the change is re-enabled in approximately 4 weeks.

Increased error rate on Snowflake backend in all stacks

Since 2021-06-30 18:00 UTC we have been experiencing increased errors on the Snowflake backend in all stacks due to an incident in Snowflake. We are monitoring the situation and will keep you posted.

Update 19:07 UTC
Snowflake identified the root cause and they continue working on mitigating the problem.

Update 20:07 UTC
Snowflake claims that infrastructure is up and running. 

Update 2021-07-02 07:30 UTC
Snowflake has published a report about the incident.

New Python/R Workspaces Data Loading Issue In AWS

Today, June 14, 2021, we released a number of new features on our AWS stacks.

Unfortunately, we have noticed that jobs fail when loading larger datasets to the new Python or R workspaces.

We're investigating the root cause and will update here when we know more or the issue gets resolved. 


UPDATE 17:59 UTC - We've deployed a fix that should resolve the loading issues. We continue to monitor the situation.

Slow processing of transformation jobs in AWS EU stack

Since 2021-06-11 4:05 UTC we have been seeing slow processing of transformation jobs in the AWS EU stack. We're working on a fix. Next update in 30 minutes.

UPDATE 7:02 UTC We increased the capacity of the infrastructure and the situation has improved. We continue to monitor the situation.

UPDATE 7:30 UTC  The situation is stable. All operations are back to normal. We're sorry for this inconvenience.

Python transformation update

The current environment for transformations is fairly outdated (due to compatibility concerns) and needs an update. New transformation environments are now available with an update plan. You may choose to leave your environment in auto update mode or lock your environment to either the current or the future version.

Action needed:
If you want to prevent the new transformations from being updated automatically later this month, you must select the version you wish to use. This update is planned for June 22, 2021.

The following options are available:

  • Leave your setting as is: “1.4.1 - Python 3.8.5 + Pandas 0.25.3 (updates automatically)”. The transformations will update automatically to Python 3.9.5 + Pandas 1.2.4 on June 22, 2021. We recommend choosing this option if you are not relying on Pandas, as it is very likely that your transformations will work without any changes.
  • Lock your transformation environment to the current version by selecting “0.3.3 - Python 3.8.5 + Pandas 0.25.3 (locked)”. The update on June 22 will not affect the transformation.
  • Lock your transformation to the latest future version: “Python 3.9.5 + Pandas 1.2.4 (locked)”. The update on June 22 will not affect the transformation.
  • If you do not want to jump versions, you can choose to lock your transformation to an intermediate version: “Python 3.8.10 + Pandas 1.1.5 (locked)”. The update on June 22 will not affect the transformation.

The Sandbox and Workspace environments will keep running the current version until the switch date on June 22, 2021, at which point they will switch to the latest version (we're skipping the intermediate Python 3.8.10 + Pandas 1.1.5). If you want to continue to use the current (outdated) version of Pandas in the Sandbox/Workspace after this date, you'll have to install it manually.
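To confirm which versions a transformation or workspace is actually running before and after the switch, a check like the following may help. It is a minimal sketch using only the standard library; the function name `environment_versions` is ours and not part of any platform API.

```python
import sys
from importlib import metadata


def environment_versions():
    """Return the running Python version and the installed pandas version.

    The pandas entry is None when pandas is not installed.
    """
    python_version = ".".join(str(part) for part in sys.version_info[:3])
    try:
        pandas_version = metadata.version("pandas")
    except metadata.PackageNotFoundError:
        pandas_version = None
    return {"python": python_version, "pandas": pandas_version}


print(environment_versions())
```

Running it once before June 22 and once after makes it easy to see whether an environment was updated or stayed locked.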

In the future, we will also be offering at least two versions of the Python transformations: one that updates automatically (the default and current behavior) and one that does not auto-update. That way you can always choose whether your transformation environment is auto-updating or not.

Increase In Configuration Errors

2021-05-10 13:25 CET

Since this morning's release of the job runner, we have noticed a slight increase in job failures for some components.
We are investigating the root cause of the issue and will update when more information becomes available.

2021-05-10 14:35 CET

All systems are back to fully operational status. We are continuing to monitor for further instances of this error, and we are working on a plan of preventative measures to reduce the impact of this type of incident in the future.

Failing Facebook Ads extractor

Since 2021-04-28 01:00 UTC we have been experiencing Facebook Ads extractor failures with the error "Please reduce the amount of data you're asking for, then retry your request".

It is caused by a bug in the Facebook API that has been reported and is currently being investigated by a Facebook backend team.

Link to the Facebook bug ticket: https://developers.facebook.com/support/bugs/503443564145524/

We continue to watch the Facebook bug ticket and will update here once we know more.

What can be done now

If you have access to the Facebook bug report, you can raise its importance/severity by leaving a comment there.

One workaround that might help is to retrieve data with the smallest possible window, that is, by adding the .date_preset(yesterday) parameter to the query, e.g.:

insights.action_attribution_windows(28d_click).action_breakdowns(action_type).level(adset).date_preset(yesterday).time_increment(1)

Post-mortem: MSSQL extractor errors

This is a post-mortem of the MSSQL extractor errors incident.

We found the root cause: PHP's sorting functions are not guaranteed to be stable. This is fixed in PHP 8.0 (https://wiki.php.net/rfc/stable_sorting), but we used 7.4 in the extractor (which is also still supported).

We have learned that in older versions of PHP, a sort function can randomly swap elements with the same value if there are more than 16 values. Because the error did not occur with a lower number of items, our tests did not catch it.

We've fixed the bug and added tests for sorting more than 16 items.
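The idea behind such a test can be sketched as follows. This is an illustration only, not the extractor's actual fix or test suite: the extractor is written in PHP, while this sketch is in Python, whose built-in sort is already stable. The helper name `stable_sort_by` and the sample rows are hypothetical. The technique shown is breaking ties by original position, which keeps equal elements in input order regardless of the underlying sort algorithm.

```python
def stable_sort_by(items, key):
    """Sort items by key, breaking ties by original position.

    The explicit index makes the result independent of whether the
    underlying sort algorithm is stable.
    """
    decorated = [(key(item), index, item) for index, item in enumerate(items)]
    decorated.sort(key=lambda entry: (entry[0], entry[1]))
    return [item for _, _, item in decorated]


# 20 rows (more than 16) with only two distinct sort keys, so ties are plentiful.
rows = [{"group": i % 2, "position": i} for i in range(20)]
result = stable_sort_by(rows, key=lambda row: row["group"])

# Within each group, rows must keep their original relative order.
for group in (0, 1):
    positions = [row["position"] for row in result if row["group"] == group]
    assert positions == sorted(positions)
```

The important part is the input size: a test with fewer than 17 items would not have exercised the code path where the reordering occurred.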

Increased API error rate in Azure North Europe

Since 2021-04-20 18:30 UTC we are experiencing increased error rate on all APIs in Azure North Europe. Our engineering team is working to identify the root cause. Next update in 1 hour.

UPDATE 08:45 UTC: We have restarted a faulty container and the situation seems to be stabilised. Next update in 1 hour. 

UPDATE 09:00 UTC: The increased error rate might have caused delays in job processing. 

UPDATE 09:50 UTC: The container does not show any further symptoms of the failure, all operations are back to normal.