Failing Jobs

Since 2018-12-19 19:13:00 UTC, we have been experiencing a higher rate of application errors in all regions due to an outage in AWS ECR.

We're investigating the issue and will update this post once there is more information.

We apologize for the inconvenience.

Update 19:30 UTC

This outage also affects the Developer Portal.

Update 20:05 UTC

As of 19:51 UTC, the issue has been resolved by Amazon.

Stalled jobs in EU region

Around 1:30am CEST, one of the job worker instances stopped processing its assigned jobs. This could have led to jobs being stuck in the processing state for a long time without any activity.

At 11:15am CEST the worker instance was terminated and all unfinished jobs started processing on other instances.

We're sorry for this inconvenience.

Scheduled US and EU Maintenance

A maintenance period is scheduled for Saturday, October 6th, 2018, starting at 8:00am CEST. It should take less than 5 hours.

We will be upgrading component job indices and metadata databases.

All projects in the US and EU regions will be inaccessible during the maintenance.

Degraded Snowflake Performance (US region)

Since September 25, we have been experiencing degraded Snowflake performance affecting all Snowflake operations.

We're sorry for this inconvenience. We're working with Snowflake to fix this issue.

Update, October 8

The Snowflake Engineering team has discovered and fixed the issue (we are waiting for an official statement from Snowflake). We're seeing operation times going back to normal.

Increased Error Rate in MySQL Extractor

We have encountered an increased error rate in MySQL Extractor after a new version release on Friday, Sept 14, 16:00 UTC.

These errors happened only when the job had a connection issue and had to reconnect to the MySQL server.

We have rolled back to the previous version and we're working on a fix.

We're sorry for this inconvenience.

Update Sunday, Sept 16, 2018, 08:15 UTC 

The MySQL Extractor has been updated to resolve this issue. 

Snowflake Outage (US Region)

We have encountered an increased number of Snowflake connection errors between 04:11 UTC and 04:21 UTC. This may have caused failed storage and component jobs.

Furthermore, Snowflake announced possible SQL query failures between 02:24 UTC and 03:30 UTC.

We're sorry for this inconvenience. All systems are operational now.

Investigating incident (US region)

We're currently investigating an incident in the US region that may have caused component job and orchestration failures today, Tuesday, Sep 11, between 02:30 UTC and 06:30 UTC.

Update 08:30 UTC

The DB storing job locks (which prevent a job from running multiple times on multiple workers) was restarted at 02:33 UTC. All connections were terminated and all component jobs and transformations running at that time were disconnected. This could lead to either or both of the following situations:

  • Failure at any time later during the job execution
  • A new parallel job execution

Any jobs started after the DB restart were not affected by this issue.
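To illustrate how a restart of the lock DB can produce both situations, here is a minimal Python sketch. It assumes the locks are tied to the connection that acquired them, which this post does not state. The `LockStore` class, the `simulate_restart` function and the `job-1` identifier are hypothetical names for this illustration and are not our actual implementation.

```python
class ConnectionLost(Exception):
    """Raised when a worker uses a session that the lock store no longer knows."""


class LockStore:
    """Toy in-memory stand-in for a DB holding connection-scoped job locks."""

    def __init__(self):
        self._next_session = 0
        self._sessions = set()  # ids of live connections
        self._locks = {}        # job_id -> id of the session holding the lock

    def connect(self):
        self._next_session += 1
        self._sessions.add(self._next_session)
        return self._next_session

    def acquire(self, session, job_id):
        """Return True if the lock was taken, False if another session holds it."""
        self._check(session)
        if job_id in self._locks:
            return False
        self._locks[job_id] = session
        return True

    def release(self, session, job_id):
        self._check(session)
        if self._locks.get(job_id) == session:
            del self._locks[job_id]

    def restart(self):
        # Assumption: a restart terminates every connection and drops its locks.
        self._sessions.clear()
        self._locks.clear()

    def _check(self, session):
        if session not in self._sessions:
            raise ConnectionLost(f"session {session} was terminated")


def simulate_restart():
    store = LockStore()
    worker_a = store.connect()
    got_a = store.acquire(worker_a, "job-1")  # worker A starts the job

    store.restart()                           # the lock DB restarts mid-job

    worker_b = store.connect()
    got_b = store.acquire(worker_b, "job-1")  # lock is free again: parallel run

    try:
        store.release(worker_a, "job-1")      # worker A touches its dead session
        a_failed = False
    except ConnectionLost:
        a_failed = True                       # failure later during execution
    return got_a, got_b, a_failed
```

In this sketch, the second worker acquires the lock for a job that is still running, and the first worker only fails when it next uses its terminated session. A job that connects after the restart is unaffected.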

We apologize for this inconvenience. We're planning an infrastructure change to prevent such a large impact in similar situations.

If you have any further questions, please use the support button in your project.