Investigating incident (US region)

We're currently investigating an incident in the US region that may have caused component job and orchestration failures today, Tuesday, Sep 11, between 02:30 UTC and 06:30 UTC.

Update 08:30 UTC

The database that stores job locks (which prevent a job from running on multiple workers at the same time) was restarted at 02:33 UTC. All connections were terminated, and all component jobs and transformations running at that time were disconnected. This could lead to either or both of the following situations:

  • A failure at any later point during the job execution
  • A second, parallel execution of the same job

Any jobs started after the DB restart were not affected by this issue.
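
For readers who want to see how a restart can produce both outcomes, here is a minimal sketch. It assumes the locks are tied to the connection that acquired them and are therefore dropped when that connection is terminated; this notice does not describe the actual lock implementation. The `LockStore` class, its methods, and the job and worker names are hypothetical and exist only for illustration.

```python
class LockStore:
    """Hypothetical in-memory stand-in for a database holding job locks."""

    def __init__(self):
        # Maps job id -> id of the worker connection holding the lock.
        self._locks = {}

    def acquire(self, job_id, conn_id):
        """Take the lock for job_id; return False if another holder exists."""
        if job_id in self._locks:
            return False
        self._locks[job_id] = conn_id
        return True

    def holds(self, job_id, conn_id):
        """Return True if conn_id still holds the lock for job_id."""
        return self._locks.get(job_id) == conn_id

    def restart(self):
        # Assumption: a restart terminates every connection, and locks
        # are released together with the connection that took them.
        self._locks.clear()


store = LockStore()

# Normal operation: the second worker is refused while the first runs.
first = store.acquire("job-1", "worker-a")
blocked = store.acquire("job-1", "worker-b")

# The restart drops the lock while worker-a is still running the job.
store.restart()

# Another worker can now start the same job in parallel.
second = store.acquire("job-1", "worker-b")

# worker-a no longer holds its lock, so it may fail later in the job.
still_held = store.holds("job-1", "worker-a")
```

In this sketch, `second` succeeding corresponds to the parallel execution, and `still_held` being false corresponds to the original job failing at a later point.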

We apologize for the inconvenience. We're planning an infrastructure change to reduce the impact of similar situations in the future.

If you have any further questions, please use the support button in your project.