On Sat, Feb 25, at 04:05 AM UTC+1, one of our AWS RDS servers was restarted. This led to a loss of connection between our lock server and job processes. You may see the following consequences:
- Orchestrations in which all tasks finished successfully, but the orchestration itself failed
- Failed tasks within the orchestrator; the task execution finished successfully, but the job status was not saved correctly
- Jobs scheduled to execute again while they were still running; in transformation jobs in particular, this led to simultaneous execution in the same database/workspace
We will resume all orchestrations that have failed tasks due to this outage (starting at the task following the failure).
We're sorry for this inconvenience. We're working on mitigating this bug so that it does not recur.