Rare encryption errors affecting job execution across all Azure Stacks

We are currently seeing rare encryption errors across all Azure stacks that cause jobs to fail immediately. They affect approximately 1 in 50,000 jobs.

Affected jobs have either a very short runtime (approximately 1 second) or no recorded runtime. These jobs typically lack events other than the error message itself. The specific error messages observed include:

  • Internal error
  • Decryption failed: Deciphering failed.
  • Value "***" is not an encrypted value. 

These errors manifest in two distinct ways: transient and permanent.

Transient errors occur during the scheduling or initial starting phase of a job. Restarting the affected job resolves these transient issues.

Permanent errors occur during the saving of configurations or state, notably involving OAuth configurations when storing new refresh tokens after a job successfully completes. Subsequent job runs retrieve this improperly encrypted value, causing the job to fail with an application error. Unlike transient errors, restarting does not resolve permanent errors. To correct permanent errors, the configuration itself must be updated to remove or correct the incorrectly encrypted value.
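As a rough illustration of the distinction above, the following sketch shows how a failed job could be triaged on the client side: it checks whether the failure matches the symptoms listed in this post and suggests a restart only when the failure has not already survived one. The `FailedJob` record, its field names, the action labels, and the 2-second runtime threshold are hypothetical and are not part of any Keboola API.

```python
from dataclasses import dataclass
from typing import Optional

# Error message fragments quoted in this post.
KNOWN_MESSAGES = (
    "Internal error",
    "Decryption failed: Deciphering failed.",
    "is not an encrypted value",
)

# Assumption: "approximately 1 second" is treated as "at most 2 seconds".
MAX_RUNTIME_SECONDS = 2.0


@dataclass
class FailedJob:
    """Hypothetical record of a failed job; not a real Keboola API object."""
    error_message: str
    runtime_seconds: Optional[float]  # None when no runtime was recorded
    failed_after_restart: bool = False


def matches_symptoms(job: FailedJob) -> bool:
    """True when the job has a very short or missing runtime and a known message."""
    short = job.runtime_seconds is None or job.runtime_seconds <= MAX_RUNTIME_SECONDS
    known = any(fragment in job.error_message for fragment in KNOWN_MESSAGES)
    return short and known


def suggest_action(job: FailedJob) -> str:
    """Return a hypothetical action label for a failed job."""
    if not matches_symptoms(job):
        return "unrelated"
    if job.failed_after_restart:
        # A restart did not help, so the stored configuration may be affected.
        return "check-configuration"
    return "restart"
```

A job that still fails with the same symptoms after a restart is the case where the configuration itself may need to be corrected, so the sketch does not suggest another restart for it.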

We are proactively monitoring all occurrences of these errors. In the case of permanent failures, we are directly contacting affected customers to resolve the issue promptly.

We apologize for any inconvenience this may cause. Please do not hesitate to reach out to our support team if you have further questions or require assistance.

UPDATE 2025-04-23 

We have identified the root cause of these issues and will be carefully deploying a fix in the coming days. We continue to proactively monitor all occurrences of these errors. In the case of permanent failures, we are directly contacting affected customers to resolve the issue promptly.

Missing Log Events

Due to a bug introduced in the latest deployment, all Job Queue jobs started between 2025-04-15 10:30 UTC and 12:00 UTC are missing approximately 1/3 of their log events. The issue has been identified and resolved.

Only AWS and Azure stacks were affected; GCP was not impacted.

Audit logs were not affected and remain fully intact.
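To check whether a particular job falls into the affected group, the window and the affected clouds stated above can be encoded directly. This is a minimal sketch: the function name and the `cloud` labels are hypothetical, and treating both ends of the window as inclusive is an assumption.

```python
from datetime import datetime, timezone

# Window and affected clouds as stated in this post.
WINDOW_START = datetime(2025, 4, 15, 10, 30, tzinfo=timezone.utc)
WINDOW_END = datetime(2025, 4, 15, 12, 0, tzinfo=timezone.utc)
AFFECTED_CLOUDS = {"aws", "azure"}


def may_have_missing_events(started_at: datetime, cloud: str) -> bool:
    """Return True if a job started in the affected window on an affected cloud.

    Assumption: both ends of the window are inclusive.
    """
    if started_at.tzinfo is None:
        raise ValueError("started_at must be timezone-aware")
    in_window = WINDOW_START <= started_at.astimezone(timezone.utc) <= WINDOW_END
    return in_window and cloud.lower() in AFFECTED_CLOUDS
```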

We sincerely apologize for the disruption this has caused.

Planned partial maintenance on Saturday, April 12, 2025 for all Azure stacks

The announced partial maintenance has just started. The platform has been scaled down and is no longer accepting new jobs. We expect a brief downtime in 30 minutes, around 12:00 UTC. We will update this post with the progress of the partial maintenance.

Update 12:00 UTC: The announced partial maintenance is ongoing, and we are expecting downtime to begin any minute now. We will continue to update this post as the maintenance progresses.

Update 12:10 UTC: The maintenance has been completed, and all services have been scaled back up. The platform is fully operational, and jobs are now being processed as usual. All delayed jobs will be processed shortly. Thank you for your patience.

Authorization issue with kds-team.ex-onedrive component on AWS and Azure Stacks

2025-04-06 07:21 UTC - We are facing a problem with the authorization of the `kds-team.ex-onedrive` component.

It is currently not possible to authorize a new component or reauthorize an existing one; the attempt ends with an authorization error message. The problem affects our AWS and Azure stacks. GCP stacks do not appear to be affected.

Update 2025-04-06 09:00 UTC - We have determined the cause of the problem and are working on a fix.

Update 2025-04-06 09:30 UTC - We have solved the problem and the component authorization should be functional again on all stacks. 

We apologize for the inconvenience.

Scheduled Partial Maintenance of all Azure stacks – April 12, 2025

We would like to inform you about the planned maintenance of all Keboola stacks hosted on Azure.

During the database upgrades, there will be a short service outage on all Azure stacks, including all single-tenant stacks and the Azure North Europe multi-tenant stack (connection.north-europe.azure.keboola.com). This will take place on Saturday, April 12, 2025 between 11:30 and 12:30 UTC.

Effects of the Maintenance

During the above period, services will be scaled down and the processing of jobs may be delayed. At around 12:00 UTC, the service will be unavailable for up to 10 minutes, and APIs may respond with a 500 error code. After that, all services will scale up and start processing all jobs. No running jobs, data apps, or workspaces will be affected. Delayed scheduled flows and queued jobs will resume after the maintenance is completed.

Detailed Schedule

  • 11:30–12:00 UTC: processing of new jobs stops.
  • 12:00–12:15 UTC: service enhancement period.
  • 12:15 UTC: processing of jobs resumes.
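Clients that call the APIs during the window described above may want to retry instead of failing on the first 500 response. The following is a minimal sketch of retrying with capped exponential backoff using only the Python standard library. The function names, the delays, and the attempt count are illustrative assumptions, not Keboola recommendations; with the defaults, the waits add up to more than the 10 minutes mentioned above.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str) -> int:
    """Return the HTTP status code of a GET request to `url`.

    Network failures that produce no HTTP response are not handled here
    and will propagate to the caller.
    """
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def wait_until_available(
    url: str,
    fetch: Callable[[str], int] = fetch_status,
    max_attempts: int = 10,
    base_delay: float = 10.0,
    max_delay: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until it stops returning 5xx codes or attempts run out."""
    for attempt in range(max_attempts):
        if fetch(url) < 500:
            return True
        if attempt < max_attempts - 1:
            # Exponential backoff, capped at max_delay seconds.
            sleep(min(base_delay * 2 ** attempt, max_delay))
    return False
```

The `fetch` and `sleep` parameters are injectable so the retry logic can be exercised without real network calls or real waiting.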

Azure North Europe stack - Billing API problems and job delays

We are investigating a brief outage of our Billing API on the Azure North Europe stack (https://connection.north-europe.azure.keboola.com/) between 23:03 and 23:19 UTC. Non-Pay-As-You-Go projects were not affected.

Update 23:35 UTC: The service disruption also caused job delays across all projects in this stack during the affected period. All operations have since returned to normal. We apologize for the inconvenience.


Planned partial maintenance on Saturday, March 15, 2025 for AWS multi-tenant stacks

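The announced partial maintenance of connection.keboola.com and connection.eu-central-1.keboola.com will start in one hour. Maintenance of connection.keboola.com begins at 07:00 UTC and maintenance of connection.eu-central-1.keboola.com at 08:00 UTC. We will update this post with the progress of the partial maintenance for each stack.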

Update 07:00 UTC: The scheduled partial maintenance for connection.keboola.com has begun.

Update 07:21 UTC: The maintenance of connection.keboola.com has been completed, and all services have been scaled back up. The platform is fully operational, and jobs are now being processed as usual. All delayed jobs will be processed shortly. Thank you for your patience. Maintenance of connection.eu-central-1.keboola.com will start at 08:00 UTC.

Update 08:00 UTC: The scheduled partial maintenance for connection.eu-central-1.keboola.com has begun.

Update 08:25 UTC: The maintenance of connection.eu-central-1.keboola.com has been completed, and all services have been scaled back up. The platform is fully operational, and jobs are now being processed as usual. All delayed jobs will be processed shortly. Thank you for your patience.

Degraded services GCP EU west3 stack

2025-03-13 23:46 CET – We are investigating a service degradation on the https://connection.europe-west3.gcp.keboola.com/ stack.

2025-03-14 00:01 CET – Services on the https://connection.europe-west3.gcp.keboola.com/ stack have resumed normal operations. Only 50% of component events at the INFO level are displayed.

2025-03-14 22:09 CET – To avoid a database bottleneck during tonight's peak hours, we have implemented event sampling. Only 50% of component events at the INFO level will be displayed. Storage events and component events of a higher level are not affected by this change.
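For readers unfamiliar with event sampling, the rule described above can be pictured roughly as follows. This is only an illustrative sketch, not the platform's actual implementation: the function name, the `kind` and `level` labels, and the use of random sampling are assumptions. The post does not say how levels below INFO are treated, so the sketch leaves them untouched.

```python
import random
from typing import Optional

SAMPLE_RATE = 0.5  # from the post: only 50% of INFO-level component events


def should_store_event(kind: str, level: str, rng: Optional[random.Random] = None) -> bool:
    """Decide whether an event is kept under the sampling rule.

    Only component events at the info level are sampled; storage events
    and component events at any other level are always kept.
    """
    if kind == "component" and level.lower() == "info":
        draw = (rng or random).random()
        return draw < SAMPLE_RATE
    return True
```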

Degraded services GCP EU west3 stack

2025-03-12 06:00 UTC - We are investigating a possible service degradation on the https://connection.europe-west3.gcp.keboola.com/ stack.

2025-03-12 07:15 UTC - We are still working on the issue. The stack is operational, but jobs are being delayed.

2025-03-12 08:15 UTC - All services are now fully operational; however, you may experience job delays due to limited worker capacity. We have identified a database bottleneck and are working to restore full capacity.

2025-03-12 23:40 UTC - To avoid a database bottleneck during tonight's peak hours, we have implemented event sampling. Only 50% of component events at the INFO level will be displayed. Storage events and component events of a higher level are not affected by this change.

2025-03-13 09:15 UTC - Sampling of component events has been removed, and all events created from now on are displayed.