Docker updates

Quay Integration

(Read on if you build components for KBC; feel free to go on with your life if you don't.)

Our technically brave customers and partners might have noticed an outage in the DockerHub automated build system that affected our Docker-based components on the 16th and 17th of November. We were not happy with Docker's (lack of) communication regarding this outage. To prevent similar occurrences in the future (and to provide an alternative), we have implemented integration with Quay as well. Quay provides the same functionality as DockerHub, with a nicer UI on top of it. If a future DockerHub outage affects you in any way, you can switch your images to Quay and we can hot-swap the component configuration.

Support for image tags

On November 9th, DockerHub changed the default tag of the latest automated image build from latest to master (the branch name). When adding a Docker image as a component in Keboola Connection, you can now choose which tag will be pulled from the repository.
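To make the idea of a tag a bit more concrete, here is a minimal sketch of how an image reference could be split into a repository and a tag, with a fallback when no tag is given. The helper name split_image_reference and its default_tag parameter are made up for this example; this is not how Keboola Connection or Docker itself parses references, and digest references (image@sha256:...) are deliberately ignored.

```python
def split_image_reference(reference, default_tag="latest"):
    """Split 'repository[:tag]' into (repository, tag).

    Hypothetical helper for illustration only.
    """
    repository, separator, tag = reference.rpartition(":")
    # A ':' that is followed by a '/' belongs to a registry host:port,
    # not to a tag, so the whole reference is treated as untagged.
    if not separator or "/" in tag:
        return reference, default_tag
    return repository, tag


print(split_image_reference("keboola/docker-generic-extractor:latest"))
print(split_image_reference("keboola/docker-generic-extractor", default_tag="master"))
```

The second call shows why an explicit tag matters: if the build only publishes master, a reference that silently falls back to latest would point at an image that is no longer updated.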

Curious about Docker integration in Keboola Connection? You can build your own (data) apps and run them on our infrastructure. Read more here.


Configuration encryption

To better protect passwords and other values that require stronger protection, KBC now lets you encrypt certain values in stored configurations. All attributes prefixed with a hash sign (#) are automatically encrypted on save. The key is derived from the component and the project in use, and there is no way to decrypt the value through any UI or API. The original value is available only internally and only to the app during its runtime.

What does that mean? When you save your password as an encrypted attribute, even you cannot decrypt it. It is available only to the application and in the project where it was encrypted, and it cannot be transferred to any other app or project. Your passwords are safe and cannot be retrieved even by a user with admin rights to your KBC project.

We hope this makes you feel safer! :-)

Note to developers and tech partners: The encryption is completely transparent. You only need to do two simple things:

  1. tell us that your component uses encryption
  2. prefix all encrypted attributes with # (e.g. password => #password)

The infrastructure takes care of the rest. Your application will "see" the decrypted value.
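As a rough illustration of the naming convention, the sketch below walks a configuration and lists the attributes that carry the # prefix. It performs no encryption at all, since that happens inside the infrastructure. The function name encrypted_attribute_paths, the example keys, and the assumption that prefixed attributes may appear at any nesting level are ours for this example only.

```python
def encrypted_attribute_paths(config, prefix=""):
    """Return dotted paths of all attributes whose key starts with '#'."""
    paths = []
    if isinstance(config, dict):
        for key, value in config.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(key, str) and key.startswith("#"):
                paths.append(path)
            # Keep descending so nested '#' attributes are found as well.
            paths.extend(encrypted_attribute_paths(value, path))
    elif isinstance(config, list):
        for index, item in enumerate(config):
            paths.extend(encrypted_attribute_paths(item, f"{prefix}[{index}]"))
    return paths


# Hypothetical configuration; the keys are made up for this example.
example_config = {
    "db": {"host": "example.com", "user": "reader", "#password": "secret"},
    "#api_token": "abc123",
}
print(encrypted_attribute_paths(example_config))
```

A check like this can be handy before saving a configuration, to confirm that every sensitive attribute really has the prefix.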

Stopped Docker jobs

Due to a spike in AWS Spot Instance prices, our Docker workers were shut down around 12am UTC. This affects all jobs running on Docker components. We're working on fixing this issue and hope to resume all operations shortly. Thanks for your patience.

Update 04:30am UTC: All operations are back to normal and all jobs should have resumed their execution. There was a minor failure with the Docker image for Generic Extractor; some of its jobs failed with this error:

User error: Container 'keboola/docker-generic-extractor:latest' failed: no such file or directory Error response from daemon: Cannot start container 08763383d5370bcdd6e1479da00ae369fe5d845c33485df5337239cc7bdd9c90: [8] System error: no such file or directory

This issue is now fixed. If you encountered this error, please restart the job.

Thanks for bearing with us and we're sorry for the inconvenience. 

Internal logging system failure

Our internal logging system failed on some nodes in our infrastructure. This could have led to one of the following:

  • stalled jobs
  • failing jobs without any message
  • untraceable application errors

The outage lasted from 6:20am to 3:00pm CEST (4:20am-1:00pm UTC / 9:20pm-6:00am PDT).

We're sorry for any inconvenience. All systems are fully functional now, and you can restart any affected jobs.

Redshift writer

Redshift writer is the newest addition to the growing list of writers. You can now write your data to any Redshift warehouse or, if your project includes Redshift, we can provision a separate database for you. That's useful when you need to perform read-only operations in an external application, such as chart.io.

Redshift query limits

We have introduced query limits for Redshift clusters to prevent deadlocks and keep the clusters in good shape:

  • 5 queries in parallel (further queries wait in a queue for up to 60 minutes)
  • 60 minutes execution time per query

If a query runs longer than 60 minutes, it will be terminated with the user error "Query cancelled on user's request" and/or "An exception occurred while executing".

These limits will take effect during the next maintenance window of each cluster (after it reboots).

If you need to change the limits, contact us at support@keboola.com.

EDIT September 22nd: Due to many requests for more concurrent queries, we've increased the limit from 2 to 5 concurrent queries on each cluster.


Blocking SELECT queries in transformations

A new version of the Transformation API was released today with a new feature: we're blocking all SELECT queries.

These queries do not perform any real operation on your data (unless accompanied by CREATE TABLE or CREATE VIEW) and cause our servers to load the result into memory. If your transformation contains a pure SELECT query, it will fail with a Query not valid error message.

The fix is easy: delete or comment out the SELECT query. It won't affect your transformation.
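If you have many transformations to go through, a small script can help you spot the offending queries. The sketch below is a naive check and not the validation the Transformation API actually performs: it strips SQL comments, splits the script on semicolons, and reports statements that start with SELECT. The function name find_pure_selects and the sample table names are made up, and semicolons or comment markers inside string literals are not handled.

```python
import re


def find_pure_selects(script):
    """Return the statements of a SQL script that begin with SELECT."""
    # Drop /* ... */ block comments and -- line comments first.
    without_comments = re.sub(r"/\*.*?\*/", " ", script, flags=re.DOTALL)
    without_comments = re.sub(r"--[^\n]*", " ", without_comments)
    # Naive split: assumes ';' only ever terminates a statement.
    statements = [part.strip() for part in without_comments.split(";")]
    return [s for s in statements if re.match(r"select\b", s, flags=re.IGNORECASE)]


# Hypothetical transformation script for illustration.
script = """
CREATE TABLE out_orders AS SELECT * FROM orders WHERE total > 0;
SELECT COUNT(*) FROM out_orders;
-- SELECT * FROM orders;
"""
print(find_pure_selects(script))
```

Only the second statement is reported here: the first is wrapped in CREATE TABLE and the third is already commented out.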

Thanks for your understanding. 

Loading binary files in R transformations

If you want to load data in your R transformation that can't be stored in a table, you can now use the Saved Files feature. It allows you to specify multiple tags, and for each tag the engine will download the latest file stored in File uploads with this tag. If no file is found, the transformation will fail. The files are then stored in /data/in/user/{tag} and the manifest files in /data/in/user/{tag}.manifest.

This comes in extremely handy when you externally pre-generate binary data for a transformation (model, bucketing criteria). You just upload the file to File uploads and assign it a tag, which you then use in a transformation.
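To show the resulting layout, here is a small sketch that maps each tag to the file and manifest paths described above. It is written in Python purely to illustrate the path convention; inside an R transformation you would read the same paths from R. The function name saved_file_paths and the tag model are hypothetical.

```python
from pathlib import PurePosixPath


def saved_file_paths(tags, base_dir="/data/in/user"):
    """Map each tag to its (file, manifest) paths under base_dir."""
    base = PurePosixPath(base_dir)
    return {
        tag: (str(base / tag), str(base / f"{tag}.manifest"))
        for tag in tags
    }


# 'model' is a made-up tag for this example.
for tag, (data_path, manifest_path) in saved_file_paths(["model"]).items():
    print(tag, data_path, manifest_path)
```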