Keboola Status

Azure Stacks: Workspace, Data Apps and Jobs Startup Failures

14:00 UTC: All Azure stacks are currently experiencing issues starting Python Workspaces, Data Apps and Jobs. The root cause appears to be a failure to pull the required Docker images.

When starting a workspace, data app, or job, you may experience a long startup that ends with an internal error. We apologize for the disruption.

UPDATE 14:52 UTC: We’ve applied a fix and all systems are now operating normally. We apologize for the disruption.

Degraded performance of Azure Storage accounts in all Azure West Europe stacks

Since May 13, 20:00 UTC, we have been seeing intermittent delays when performing service management operations on Azure Storage accounts hosted in the West Europe region. Storage availability and data-processing workflows remain fully operational; however, you may notice job delays.

Azure’s latest update (not publicly available):

Current Status: Our monitoring shows that our mitigation strategy has worked and less than 5 % of the traffic is impacted at this stage; most customer impact should be mitigated at this stage. We continue to monitor our infrastructure and expect to see the delays decrease over the next few hours.

We will update this post as new information becomes available. If you have any questions or concerns, please reach out to our support team.

Update May 16, 08:00 UTC: Azure's latest update (not publicly available):

Service restored, and customer impact mitigated.

Our own monitoring confirms that the issue is resolved.

We are sorry for this inconvenience.

Failing Legacy Transformation Jobs

We are currently experiencing job failures related to the keboola.legacy-transformation component.

Our team is actively working on reverting to the previous stable release to resolve the issue.

We sincerely apologize for any inconvenience this may have caused and appreciate your patience.

Update 10:56 UTC: The issue has been fixed by reverting to the previous release.

UI Rendering Issue in Transformations (fixed)

We’ve identified an issue in the Keboola UI where the list of tables in the transformation may render incorrectly after a table has been deleted from the list. This can result in visually corrupted table rows (e.g., mixed or misaligned entries).

This is a UI-only issue; data processing is not affected. Reloading the page restores the table view correctly.

We’ve successfully reproduced the problem and are working on a fix.

Update 13:50 UTC: The issue has been fixed.

Scheduled Partial Maintenance of all GCP stacks – May 24, 2025

We would like to inform you about the planned maintenance of all Keboola stacks hosted on GCP.

During the database upgrades there will be a short service outage on all GCP stacks, including all single-tenant stacks and GCP US and EU multi-tenant stacks (connection.us-east4.gcp.keboola.com, connection.europe-west3.gcp.keboola.com). This will take place on Saturday, May 24, 2025 between 05:30 and 06:30 UTC.

Effects of the Maintenance

During the above period, services will be scaled down and the processing of jobs may be delayed. For a very brief period (at around 06:00 UTC), the service will be unavailable for up to 10 minutes and APIs may respond with HTTP 500 errors. After that, all services will scale up and start processing all jobs. No running jobs, data apps, or workspaces will be affected. Delayed scheduled flows and queued jobs will resume after the maintenance is completed.

Detailed Schedule

05:30–06:00 UTC: processing of new jobs stops.
06:00–06:15 UTC: database upgrade; the service may be unavailable for up to 10 minutes.
06:15 UTC: processing of jobs resumes.

Missing telemetry data and billing data across all stacks [resolved]

We are currently investigating an issue involving missing telemetry data that appears to affect all Azure stacks. The issue began on April 25, 2025, at approximately 08:30 UTC.

We will continue to update this article with additional information as our investigation progresses.

UPDATE 13:00 UTC

Additionally, since April 23, 2025, at approximately 08:00 UTC, we have identified missing billing data for Workspaces, DWH Direct Query, Data Streams, and Data Apps across all stacks (not limited to Azure).

All missing telemetry and billing data will be backfilled within the next few hours.

UPDATE April 29, 2025, 05:38 UTC

The incident has been resolved. All missing telemetry and billing data across all stacks have now been successfully backfilled. No further impact is expected, and all systems are operating normally.

We apologize for the inconvenience during the incident.

Schedules set in the UI orchestrator/flow configuration result in wrong crontab expressions

We have identified an issue where configuring schedules via the UI orchestrator/flow leads to an off-by-one-day error. For example, a schedule set to run on Sunday was incorrectly saved as Saturday. See the included media for how to check a schedule for the inconsistent day.

Incident Timeline:

Start: April 25, 2025 11:42 AM UTC
End: April 26, 2025 10:10 AM UTC

Impact:
Schedules created through the UI during this timeframe were translated into incorrect crontab expressions, causing misaligned execution days.

Action Required:
If you configured any schedules via the UI during this timeframe, please review and correct them manually to ensure they are aligned with the intended execution days.
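To spot an affected schedule, it can help to decode the day-of-week field of its crontab expression and compare it with the day you intended. The snippet below is a minimal sketch, assuming the schedule is shown as a standard five-field crontab expression (minute, hour, day of month, month, day of week) with numeric days only; it does not handle day names such as SUN, step syntax such as */2, or time zones. The function name cron_weekdays is hypothetical and not part of Keboola.

```python
# Hypothetical helper: decodes the day-of-week field of a standard
# five-field crontab expression (numeric values, lists and ranges only).
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def _parse_day(token: str) -> int:
    """Parse one numeric day-of-week value; standard cron accepts 0-7."""
    value = int(token)
    if not 0 <= value <= 7:
        raise ValueError(f"day-of-week out of range: {token!r}")
    return value


def cron_weekdays(expression: str) -> list[str]:
    """Return the weekday names matched by the 5th (day-of-week) field."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {expression!r}")
    dow = fields[4]
    if dow == "*":
        return list(DAY_NAMES)
    days: set[int] = set()
    for part in dow.split(","):
        if "-" in part:
            start_token, end_token = part.split("-", 1)
            days.update(range(_parse_day(start_token), _parse_day(end_token) + 1))
        else:
            days.add(_parse_day(part))
    # Both 0 and 7 mean Sunday, so normalize with % 7 before mapping to names.
    return [DAY_NAMES[d] for d in sorted({d % 7 for d in days})]


if __name__ == "__main__":
    # A flow intended to run on Sundays at 06:00 should end in "0" (or "7").
    for expr in ["0 6 * * 0", "0 6 * * 6", "0 6 * * 1-5"]:
        print(expr, "->", cron_weekdays(expr))
```

For a schedule you meant to run on Sunday, an expression whose last field decodes to Saturday would point to the off-by-one-day error described above.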

We apologize for the inconvenience.

Rare encryption errors affecting job execution across all Azure Stacks

We are currently experiencing very rare encryption errors across all Azure stacks, resulting in immediate job failures. These errors occur very infrequently, at a rate of approximately 1 in 50,000 jobs.

Affected jobs have either a very short runtime (approximately 1 second) or no recorded runtime. These jobs typically lack events other than the error message itself. The specific error messages observed include:

Internal error
Decryption failed: Deciphering failed.
Value "***" is not an encrypted value.

These errors manifest in two distinct ways: transient and permanent.

Transient errors occur during the scheduling or initial starting phase of a job. Restarting the affected job resolves these transient issues.

Permanent errors occur during the saving of configurations or state, notably involving OAuth configurations when storing new refresh tokens after a job successfully completes. Subsequent job runs retrieve this improperly encrypted value, causing the job to fail with an application error. Unlike transient errors, restarting does not resolve permanent errors. To correct permanent errors, the configuration itself must be updated to remove or correct the incorrectly encrypted value.

We are proactively monitoring all occurrences of these errors. In the case of permanent failures, we are directly contacting affected customers to resolve the issue promptly.

We apologize for any inconvenience this may cause. Please do not hesitate to reach out to our support team if you have further questions or require assistance.

UPDATE 2025-04-23

We have identified the root cause of these issues and will be carefully deploying a fix in the coming days. We continue to proactively monitor all occurrences of these errors. In the case of permanent failures, we are directly contacting affected customers to resolve the issue promptly.

Missing Log Events

Due to a bug introduced in the latest deployment, all Job Queue jobs started between 2025-04-15 10:30 UTC and 12:00 UTC are missing approximately 1/3 of their log events. The issue has been identified and resolved.

Only AWS and Azure stacks were affected; GCP was not impacted.

Audit logs were not affected and remain fully intact.

We sincerely apologize for the disruption this has caused.

Planned partial maintenance on Saturday, April 12, 2025 for all Azure stacks

The announced partial maintenance has just started. The platform has been scaled down and is no longer accepting new jobs. We expect a brief period of downtime in about 30 minutes, around 12:00 UTC. We will update this post with the progress of the partial maintenance.

Update 12:00 UTC: The announced partial maintenance is ongoing, and we are expecting downtime to begin any minute now. We will continue to update this post as the maintenance progresses.

Update 12:10 UTC: The maintenance has been completed, and all services have been scaled back up. The platform is fully operational, and jobs are now being processed as usual. All delayed jobs will be processed shortly. Thank you for your patience.