Randomly Failing Orchestrations

We're experiencing an AWS-wide DNS problem. Around minute 01 of each hour, DNS stops responding, which can cause failed connection attempts between components in Keboola Connection and make running orchestrations fail.

The issue is being reported across AWS:

https://forums.aws.amazon.com/thread.jspa?threadID=161837&tstart=0

If your orchestration fails with an application error around minute 01 of an hour (the failure can propagate to the job a bit later, e.g. at XX:02), you're most likely affected by this issue. Simply re-running the orchestration will solve the problem.

Please bear with us while we work with AWS Support to resolve this issue. Here are some examples of failed orchestrations that might help you identify it.


Redshift Transformations: Input Mapping Enhancements

The following options have been turned off for VIEWs (input mappings from Redshift Storage to a Redshift Transformation created as a VIEW):

  • Data Types (will inherit Storage data types in the future)
  • Sort Key
  • Dist Key

These options are still available for TABLE input mappings and for MySQL Storage and Transformations.

On the other hand, we have turned on the Days filter for all input mappings (it was previously turned off for all Redshift transformations).
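To illustrate why Sort Key, Dist Key and data type overrides only make sense for TABLE input mappings, here is a rough SQL sketch (all table and column names are made up for this example): a TABLE mapping creates a physical copy inside the transformation schema, so the copy can declare its own column types, DISTKEY and SORTKEY, while a VIEW mapping only references the Storage table and inherits its data types and physical layout.

    -- All names below are made up for this sketch.
    -- A Storage table as it might exist on the shared Redshift cluster.
    CREATE TABLE storage_orders (
        id          BIGINT,
        customer_id BIGINT,
        created_at  TIMESTAMP
    );

    -- TABLE input mapping: the data is physically copied, so the copy
    -- can declare its own column types, DISTKEY and SORTKEY.
    CREATE TABLE in_orders (
        id          BIGINT,
        customer_id BIGINT,
        created_at  TIMESTAMP
    )
    DISTKEY (customer_id)
    SORTKEY (created_at);

    INSERT INTO in_orders
    SELECT id, customer_id, created_at FROM storage_orders;

    -- VIEW input mapping: only a reference to the Storage table, so it
    -- inherits the underlying data types and physical layout; there is
    -- no separate object to attach a DISTKEY or SORTKEY to.
    CREATE VIEW in_orders_view AS
    SELECT id, customer_id, created_at FROM storage_orders;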

Partial server outage

One of the Storage API backend servers was unavailable between 6:30am and 6:50am PST. This might have caused some orchestration failures, and some UIs might not have been responding (e.g. the Orchestration UI, or stuck jobs in the Writer UI). Sorry for any inconvenience.

GoodData Maintenance Window

GoodData will be performing hardware maintenance on Saturday, September 13th, around 1pm CET (4am PST). The expected length of the maintenance window is about 5 hours. Some of our clients' GoodData projects will be inaccessible during that time. We apologize in advance for any inconvenience; the maintenance is necessary to prevent errors in the near future. We will be monitoring all data load orchestrations and will restart them as needed.

Redshift Reboot

Due to peak load we had to reboot the main (shared) Redshift cluster. All transformations and sandboxes have been stopped and purged, and data stored on this cluster is temporarily unavailable. The cluster will be back soon so you can resume your operations.

We apologize for any inconvenience and kindly remind you that you can have your own Redshift cluster. For more information, get in touch with support@keboola.com.

Server outage

Part of our frontend infrastructure became inaccessible. This might have caused some orchestration failures, and some UIs might not have been responding (e.g. the Orchestration UI, or stuck jobs in the Writer UI). We're putting the server back online and will resume all failed orchestrations. Sorry for any inconvenience.

Update 3:30pm PST: The problem still persists.

Update 5:10pm PST: Problem resolved. All failed orchestrations were resumed. 

Update 11:30pm PST: We're experiencing network connectivity issues. Some orchestrations might end up with CURL 52, 36 or 35 errors. Feel free to retry while we work on getting this resolved.

Update 10:00am PST: All problems resolved, everything running OK now. 

Update 5:00pm PST: We heard back from AWS Support: "Yesterday, we experienced an issue on our end that impacted some traffic going to any public IP address, where the source interface's MTU was above 1500. Since your instance most likely has an MTU of 9001 (Commonly default for large instances), it may have been impacted by this."