Week in Review — February 19, 2020

New Components

  • Mailgun v2 — Mailgun is an email automation service. It offers a complete cloud-based email service for sending, receiving and tracking email sent through your websites and applications.
  • Drive CX — Downloads Drive CX data about locations, purchase details, employees, and surveys.
  • AzureML Model Deployment — Allows you to deploy your trained model to Azure Container Instances and query it via an API.
  • MS SharePoint Lists writer — SharePoint empowers teamwork with dynamic and productive team sites. A list in SharePoint is a collection of data that gives you and your co-workers a flexible way to organize information.

New Features

  • Orchestrator supports setting the timezone over the API (support in Keboola Connection's UI will follow soon).
  • MongoDB extractor supports the SRV protocol for specifying a connection (the so-called DNS seed list).
  • Snowflake Data Warehouse Manager now creates schemas with "MANAGED ACCESS" and grants "ON FUTURE" to all objects of the schema. This means that recreating tables (drop and load) doesn't require re-granting for all roles/users previously granted.
  • ThoughtSpot writer now checks if a database and a schema exist.
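To illustrate the Snowflake Data Warehouse Manager item above, the sketch below builds the kind of SQL statements that managed access and future grants involve. The schema name, the role name, and the choice of the SELECT privilege are assumptions made for the example; the statements the component actually issues may differ.

```python
def managed_schema_statements(schema: str, role: str) -> list[str]:
    """Build illustrative Snowflake SQL for a managed-access schema.

    `schema` and `role` are hypothetical placeholders, and SELECT on tables
    is an assumed privilege chosen only for this example.
    """
    return [
        # In a managed access schema, grants are managed centrally by the schema owner.
        f'CREATE SCHEMA IF NOT EXISTS "{schema}" WITH MANAGED ACCESS;',
        # A regular grant covers the tables that already exist...
        f'GRANT SELECT ON ALL TABLES IN SCHEMA "{schema}" TO ROLE "{role}";',
        # ...and a future grant covers tables created later (e.g. after drop and load).
        f'GRANT SELECT ON FUTURE TABLES IN SCHEMA "{schema}" TO ROLE "{role}";',
    ]


for statement in managed_schema_statements("WORKSPACE", "REPORTING"):
    print(statement)
```

With the future grant in place, a table that is dropped and loaded again is covered without any additional GRANT statement.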

UI Improvements

  • When creating a new Input Mapping in a Snowflake transformation, the load type Clone is used if the source table is cloneable and bigger than 100 megabytes.
  • JSON editor now shows all fields, not only the required ones.
  • Transformation detail page warns about unsaved queries/scripts if you attempt to run a transformation or leave/close a page.

  • Orchestration detail page supports the load of older orchestration jobs.

  • In Transformation, you can see the basic Input Mapping settings without opening the detailed modal.

  • Create buttons (table, column) in the Storage explorer were moved to the top right corner.

  • Transformation Output Mapping modal is now more compact.

  • Organization selector shows the Maintainer name ("KBC Internal" in the example below).

  

Minor Improvements

  • Python environment is 3.8.1 and includes Java and H2O.
  • Jupyter notebooks now save notebook files to Keboola Storage — both on a manual and auto save.
  • Input mappings that use CSV files (e.g., Python/R transformation components) now have a global limit of 100 GB per table.
  • GoodData extractor uses more verbose logging.
  • Most emails from Keboola Connection have a new design.

Broken Loads from 2020-01-28 to 2020-01-29 [post-mortem]

Summary

On 2020-01-28 09:00 UTC, we deployed a version of Keboola Connection containing a bug. As a result, loads from transformations to Storage were missing our internal _timestamp value. This issue was hard to detect and persisted until 2020-01-29 08:00 UTC. A backfill was applied, and by 2020-01-31 16:30 UTC all missing _timestamp fields were set to 2020-01-29 00:00:00 UTC. Because the affected tables did not have _timestamp set, jobs that used them for incremental loading had no reference for the newest data.
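The effect can be sketched with a simplified model of incremental loading. The `changed_since` filter and the row layout below are assumptions made for illustration, not the actual Storage implementation. Only the backfill value comes from the incident itself.

```python
from datetime import datetime, timezone

# Value used by the backfill, as stated in the summary.
BACKFILL_VALUE = datetime(2020, 1, 29, 0, 0, 0, tzinfo=timezone.utc)


def changed_since(rows, since):
    """Return rows whose _timestamp is set and newer than `since`."""
    return [r for r in rows if r["_timestamp"] is not None and r["_timestamp"] > since]


def backfill(rows, value=BACKFILL_VALUE):
    """Return copies of the rows with a missing _timestamp replaced by `value`."""
    return [{**r, "_timestamp": value} if r["_timestamp"] is None else r for r in rows]


rows = [
    {"id": 1, "_timestamp": datetime(2020, 1, 27, 12, 0, tzinfo=timezone.utc)},
    {"id": 2, "_timestamp": None},  # loaded while the bug was deployed
]
last_run = datetime(2020, 1, 28, 0, 0, tzinfo=timezone.utc)

print([r["id"] for r in changed_since(rows, last_run)])            # []
print([r["id"] for r in changed_since(backfill(rows), last_run)])  # [2]
```

Before the backfill, row 2 is invisible to the incremental filter. After it, the row is picked up as if it had been loaded at 2020-01-29 00:00:00 UTC.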

What Happened?

There was an error in our upgrade of the library responsible for loads. An incorrect parameter setting resulted in timestamps not being set during load. Such a scenario was not covered by our tests, and it was not caught during our peer review process. As soon as the problem was identified, we deployed the previous functioning version of Keboola Connection, which took about 15 minutes. Because the issue affected some customers' data, the backfill was carefully discussed and tested. Unfortunately, we were also affected by an issue in the third-party build system we use, which prevented us from performing the backfill of the missing timestamps on January 30. Finally, between 2020-01-31 09:30 UTC and 2020-01-31 16:30 UTC, all impacted projects were backfilled.

Timetable

  •  2020-01-28 09:00 Version containing a bug deployed
  •  2020-01-29 08:00 Rollback
  •  2020-01-29 Investigation of issue, impact assessment
  •  2020-01-30 Testing of backfill
  •  2020-01-31 09:30 Start data backfill
  •  2020-01-31 16:30 Data backfill done

What Are We Doing About It?

We're extending our software tests to cover more scenarios, including a test for _timestamp presence on all types of load. We're also working on improving our public incident response so that we post more frequent updates.

Original status of issue: https://status.keboola.com/investigating-problems-with-incremental-lods

If your data was affected by this issue, our backfill is not sufficient for your specific case, and you are not yet in contact with our support, feel free to get in touch. Our professional services team will provide all necessary help.

Week in Review — January 30, 2020

New Features and UI Improvements

  • There is a new UI picker that allows you to easily switch between projects and their organizations.

  • The detail of a transformation bucket was tweaked. The button for adding transformations is easier to reach, phases and backends are better arranged, and the design of the list of transformations was unified with similar lists in other components.

  • Sidebars across the whole application were unified and polished.

  • Editing descriptions across the whole application was polished too.

  • Input mappings in a transformation detail can be reordered using drag & drop.

Updated Components

  • There is a new template for getting projects and tasks in detail within Asana Extractor.
  • Typeform Extractor was updated to download data from the Responses API instead of the Data API.

New Components

  • Microsoft Dynamics 365 Extractor and Writer by our KDS team allows downloading and sending data to this CRM tool.
  • Microsoft SharePoint Lists Extractor by our KDS team allows downloading collections of data from SharePoint lists.
  • Azure Blob Storage Extractor and Writer by our KDS team allows reading and writing data to Azure Storage similarly to our S3 Writer.
  • OneDrive Writer by Jakub Bartel uploads files from Storage to a OneDrive or SharePoint account.

Investigating problems with incremental loads

We are investigating problems in Storage which could lead to inappropriate processing of incremental loading and related functionality. We will keep you updated.

UPDATE Jan 29 9:45 CET - The problem is that incremental loads from transformations are not setting the _timestamp column. We have identified the release that was causing the trouble and finished reverting it. Unfortunately, table rows already created with null _timestamp values still won't be processed by incremental loading. You can fix this manually by running a full load. We are now running tests to isolate the root cause and preparing a global correction of its side effects.

UPDATE Jan 29 14:00 CET - The _timestamp values have been written correctly for all newly added table rows since this morning's release revert. The problem now persists only for rows added earlier (approximately during yesterday and last night), which still contain empty timestamp values and so won't be processed, e.g., by writers configured for incremental loads. (You can work around this temporarily by setting them to full load.) We are now working on backfilling the missing timestamp values to fix the incremental functionality for the affected data.

UPDATE Jan 29 18:00 CET - The backfill of missing timestamps is being prepared and should be ready tomorrow.

UPDATE Jan 30 15:00 CET - The backfill mechanism is released. Now we are performing last checks before applying it to all affected data. 

UPDATE Jan 30 16:30 CET - Unfortunately, we have to postpone the backfill until tomorrow. An incident in Travis CI has been slowing down our work the whole day and is still not resolved. We decided not to risk running the backfill hastily during the night, so that we can support it properly if anything goes wrong.

UPDATE Jan 31 10:30 CET - All preparations are done and preliminary test finished. We are just starting with the backfill.

UPDATE Jan 31 12:00 CET - The backfill is running and ca. 10 % of affected data is fixed.

UPDATE Jan 31 14:00 CET - The backfill is running and ca. 20 % of affected data is fixed.

UPDATE Jan 31 15:30 CET - The backfill is still running and ca. 25 % of affected data in US region and ca. 40 % in EU region are fixed.

UPDATE Jan 31 17:00 CET - The backfill is still running and ca. 70 % of affected data in US region and ca. 100 % in EU region are fixed.

UPDATE Jan 31 17:30 CET - The backfill is finally finished. All affected data are fixed and have the _timestamp column set to "2020-01-29 00:00:00". You should therefore update your settings if you have incremental loads set to a value lower than 3 days.
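The 3-day figure follows from simple date arithmetic. The sketch below assumes that an incremental load configured as "changed in the last N days" compares each row's _timestamp against the current time minus N days. That model, and the `in_window` helper, are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Timestamp written by the backfill (UTC, per the post-mortem above).
backfilled_ts = datetime(2020, 1, 29, 0, 0, tzinfo=timezone.utc)
# 2020-01-31 17:30 CET is 16:30 UTC (CET is UTC+1 in January).
backfill_done = datetime(2020, 1, 31, 16, 30, tzinfo=timezone.utc)


def in_window(ts, now, window_days):
    """True if `ts` falls within the last `window_days` days before `now`."""
    return ts >= now - timedelta(days=window_days)


print(backfill_done - backfilled_ts)                # 2 days, 16:30:00
print(in_window(backfilled_ts, backfill_done, 2))   # False
print(in_window(backfilled_ts, backfill_done, 3))   # True
```

At the moment the backfill finished, the backfilled rows already looked more than two and a half days old, so a window shorter than 3 days would skip them.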

SQL Validation Outage

The SQLdep API suddenly stopped working, and because of that, transformation validation and SQL analysis are not working properly. (You will probably get false negatives for all validation requests.)

We are investigating the issue and will let you know once we know more.

UPDATE: We've resolved the issue and the integration with the service is working again. The cause was a change of the SQLdep API address, which we had unfortunately missed.

Week in Review — January 24, 2020

Updated Components

Tableau TDE writer

  • The writer will now retry when a request fails. This should greatly improve reliability, as a network glitch or temporary unavailability of the remote server won't fail the job. Instead, the writer will wait a little and retry the request.
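A minimal sketch of this wait-and-retry pattern is shown below. The helper name, the number of attempts, the delay, and the use of `ConnectionError` as the retryable failure are all assumptions; the writer's actual retry policy is not described here.

```python
import time


def call_with_retry(request, attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Call `request()` and retry on ConnectionError, waiting between tries.

    All parameters are hypothetical defaults chosen for the example.
    """
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of retries, so the failure propagates
            sleep(delay_seconds * attempt)  # wait a little longer each time


# Usage: a request that fails twice with a network glitch, then succeeds.
calls = {"n": 0}


def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary glitch")
    return "ok"


print(call_with_retry(flaky_request, delay_seconds=0.01))  # ok
```

Only the final failed attempt is allowed to fail the job; earlier failures are absorbed by the wait and retry.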

New Features

  • Every new Keboola Connection user will now get Guide mode, regardless of whether they were invited by a current user or registered themselves. It will guide them hands-on through the basics of using Keboola Connection.

Minor Improvements

  • In Transformation detail, Backend and Phase can be changed from the sidebar

Server errors in US region

There was an increased number of server errors in the US region, which resulted in issues in the Keboola Connection UI. No jobs were affected.

Jan 22 15:50 UTC - increased amount of server errors

Jan 22 16:08 UTC - the issue is mostly resolved

Jan 22 16:19 UTC - the issue is completely resolved

Jan 22 18:28 UTC - the post was amended; the original version mistakenly stated that the EU region was affected, while the issue was actually in the US region.



Week in Review — January 20, 2020

New Features and Updates

Incremental Fetching

The following extractors now support Incremental Fetching:

  • Snowflake Extractor
  • Oracle Extractor
  • Redshift Extractor
  • Firebird Extractor
  • Cloudera Impala Extractor 

Job Performance

When a job is finished, we show you how long it took. If a job's duration differs from the average by 10% or more (longer or shorter), we'll show you the difference.
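One possible reading of this rule is sketched below. The function name, the wording of the note, and the treatment of the 10% threshold as applying in both directions are assumptions, not a description of the actual UI logic.

```python
def duration_note(duration_s: float, average_s: float, threshold: float = 0.10):
    """Return a short note when a job deviates from the average, else None.

    The 10% threshold comes from the text above; everything else is illustrative.
    """
    if average_s <= 0:
        return None  # no meaningful average to compare against
    change = (duration_s - average_s) / average_s
    if abs(change) < threshold:
        return None  # close enough to the average, nothing to show
    direction = "longer" if change > 0 else "shorter"
    return f"{abs(change):.0%} {direction} than average"


print(duration_note(720, 600))  # 20% longer than average
print(duration_note(610, 600))  # None
print(duration_note(300, 600))  # 50% shorter than average
```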

Transformation Buckets

The list of Transformation Buckets is no longer collapsible and looks like other lists of configurations in the application.

Bucket details in Data Catalog

In a source project (a project which shares a bucket and "owns" data), you are able to see the bucket detail.

  • You can see how the bucket is shared (to organization members, projects members, selected projects, etc.).
  • You can edit the description of the bucket — it will be available for those users who want to link the bucket to their project.
  • You can see a list of tables and bucket properties (size, row count, etc.).
  • Also, if someone uses the bucket, you can see which project it is (Used by section).

In a destination project (where you can see and use a bucket):

  • You can see the description of the bucket and decide if you want to use it.
  • You can see a list of tables and bucket properties (size, row count, etc.).
  • If you decide to use the bucket, you will also see the link to Storage.

Show tables searched by metadata in Writers

In case you use our new feature Scaffolds, related writers (e.g., the Snowflake database writer) will show you all input tables that are searched by metadata.

Environment updates

  • R updated to 3.6.2
  • Python updated to 3.7.5


New Components


VPN connection errors

We are investigating VPN connection issues for extractors and writers in both regions. 

UPDATE Jan 01, 10:03 AM (UTC) - Our VPN provider is investigating the issue

UPDATE Jan 01, 11:54 AM (UTC) - Our VPN provider has found the root cause and fixed the issue. All VPN connections should be working again.