GoodData writer internal errors

A new release of the GoodData writer, published on September 24, 2019, at 3:17 pm UTC, contained a bug affecting certain jobs in all regions. The bug caused the affected jobs to finish with an internal error.

We rolled it back to a previous version on September 25 at 5:57 am UTC, and all jobs are working now. We apologize for the inconvenience.

Jupyter sandbox not starting

We are experiencing a problem with starting Jupyter sandboxes. The problem began at 12:58 UTC.

We will provide an update when we have more information.

UPDATE 13:15 UTC - The problem is fixed. All services are running.

Week in Review -- September 23, 2019

New Features

Julia sandboxes and transformations

We are happy to introduce a new Julia sandbox and Julia transformations. Both are in Beta preview for now; to try either of them, submit a support ticket, and we will enable them for you. You can find more information about the Julia transformations in our documentation.

User roles

User roles are now available in all projects. We have added a new guest role that has only limited access to projects' resources. Learn more about roles in our documentation.

New Components

Brightloom Extractor (previously known as Eatsa) — downloads all your transactions and item details from the Brightloom POS system.


Job errors in the EU region [post-mortem]

Summary

On Sunday, September 15th at 01:32 UTC, orchestrator and other component jobs started failing in the EU region. In the following hours, our worker servers weren't able to handle the workload, and the job backlog started to increase. We manually resolved the incident, and the platform was in full operation with a clean backlog at 08:26 UTC.

What happened?

One of the MySQL instances was automatically restarted and patched on September 15th at 01:32 UTC.
The instance is required for the lock mechanism for job processing, and it also stores information about queues for the worker servers. The 2-minute downtime of the database instance caused the jobs that were running at that moment to fail. Additionally, the running workers weren't able to fetch the information about the queues, and some of them gave up restarting and stopped. With only half of the processing capacity left, the remaining workers could not keep up with the workload.

Once we discovered the incident, we replaced all our worker servers and added more capacity to clean up the backlog faster.

What are we doing about this?

We have implemented notifications about upcoming instance patches and are going to perform updates during scheduled and announced maintenance windows.

We are also working on a completely new job processing and scheduling mechanism that will prevent similar issues from occurring down the road. We sincerely apologize for the inconvenience caused.


Week in Review -- September 16, 2019

New Features, Improvements and Minor Fixes


New Components

Job errors in EU region

We are investigating job failures in the EU region that started at 1:32 UTC.

We will provide an update when we have more information.

UPDATE 06:06 UTC - We have identified the issue and fixed the cause. The backlog is being processed now.

UPDATE 07:54 UTC - There is still a backlog of orchestration jobs. We have increased the processing capacity. The backlog should be cleared within half an hour.

UPDATE 08:26 UTC - The backlog was cleared. All services are running.

We apologize for the inconvenience. We'll share more details in a post-mortem.

Week in Review -- September 9, 2019

New Features, Improvements and Minor Fixes

  • When checking events in the job detail, the Load More button tries to load up to 1,000 events at once.
  • You can search in component configurations when adding a new task to an orchestration phase.
  • Refreshing your token no longer breaks access to your project.
  • The Remove Empty Files and Folders processor has a new option to remove files that contain only whitespace characters: remove_files_with_whitespace
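
To illustrate what "files with whitespace characters only" can mean in practice, here is a minimal Python sketch. It is not the processor's actual implementation: the function names is_blank_file and remove_blank_files are hypothetical, and the sketch assumes that "whitespace" means ASCII spaces, tabs, and line breaks.

```python
from pathlib import Path
from typing import List


def is_blank_file(path: Path) -> bool:
    """Return True if the file is empty or holds only ASCII whitespace."""
    # bytes.strip() with no argument removes ASCII whitespace
    # (spaces, tabs, newlines, carriage returns, and similar).
    return path.read_bytes().strip() == b""


def remove_blank_files(root: Path) -> List[Path]:
    """Delete blank files under root and return the removed paths (sorted)."""
    removed = []
    # sorted() materializes the listing before any file is deleted.
    for path in sorted(root.rglob("*")):
        if path.is_file() and is_blank_file(path):
            path.unlink()
            removed.append(path)
    return removed
```

Under these assumptions, a file that contains only spaces and line breaks is treated the same way as a zero-byte file, while a file with any other content is left in place.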


New Components