Week in Review -- April 30, 2018

Core

  • Improved the generated descriptions of configuration changes
  • Added configuration version to job results of Docker-based components (it is not yet available for legacy components like transformation and gooddata-writer)
  • Refreshed Manage API docs with working examples
  • Fixed loading of large tables for RStudio and Jupyter sandboxes
  • Fixed random CSV Import upload errors in EU region

Components

  • Improved "show details" experience for input and output mappings
  • Writers now show columns that do not exist in Storage
  • Increased query timeout for all Keboola Provisioned Snowflake writers from 15 seconds to 15 minutes
  • Added support for unconventional column names to MySQL extractor
  • Removed static state from MongoDB extractor

Processors

  • Added support for the Snappy format to processor-decompress
  • Added processor filter-files
  • Added support for sanitization of invalid UTF-8 in processor-iconv

Developers

A new Debug API call is available (it replaces the very rarely used sandbox, dry-run and input-data calls). It creates a snapshot of the data directory used for running the component and stores it in your KBC project. To learn more, feel free to go through the API Docs or through the tutorial. In short, the API call:

  • uses the same calling convention as the Run API,
  • filters encrypted values from the data directory,
  • works with all components (previously only those without encryption were supported),
  • works with Processors,
  • works with Configuration Rows,
  • also works with broken components and configurations (even if the run fails, you'll still get a snapshot of the data directory).
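
As a rough illustration of the first point, the sketch below builds (but does not send) a Debug request shaped like a Run request. The base URL, the `X-StorageApi-Token` header, the `config` body field, and the `run`/`debug` path segments are assumptions about the request shape, and the component ID, configuration ID and token are placeholders; the API Docs remain the authoritative reference.

```python
import json
import urllib.request

# Assumption: base URL of the Docker Runner API; verify it against the API Docs.
RUNNER_URL = "https://docker-runner.keboola.com/docker"


def build_job_request(component_id, config_id, token, action="debug"):
    """Build a POST request for the Run or Debug call (it is not sent here)."""
    # Assumed request shape: both actions take the same body and headers,
    # and only the last path segment ("run" vs. "debug") differs.
    body = json.dumps({"config": config_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{RUNNER_URL}/{component_id}/{action}",
        data=body,
        headers={
            "X-StorageApi-Token": token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder component ID, configuration ID and token.
request = build_job_request("keboola.ex-db-mysql", "123456", "your-token")
print(request.get_method(), request.full_url)
```

Passing `action="run"` yields the matching Run request, which is the sense in which the two calls share a calling convention in this sketch.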

Python transformations

Pip version 10, released recently, removes the pip.main method (more reading). The recommended way to install packages from within Python is:

import subprocess
import sys

# Run pip as a module of the current interpreter instead of calling pip.main().
subprocess.call([sys.executable, '-m', 'pip', 'install', '--disable-pip-version-check', 'PACKAGE_NAME'])

Currently, 70 transformations use the removed pip methods. If your projects use them, we will contact you with a list of the affected transformations. This breaking change in pip is currently blocking us from upgrading Python to 3.6.5, where pip 10 is used by default.
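
If you would like to check your own scripts in the meantime, a minimal sketch along the following lines may help. It is not the check we use; the function name is made up for this example, and it only catches direct `pip.main(...)` calls (not aliased imports such as `from pip import main`).

```python
import ast


def uses_removed_pip_main(source):
    """Return True if the script calls pip.main(...), which pip 10 removed."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Look for a call whose target is the attribute `main` of the name `pip`.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "main"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "pip"
        ):
            return True
    return False


old_style = "import pip\npip.main(['install', 'PACKAGE_NAME'])\n"
new_style = (
    "import subprocess\nimport sys\n"
    "subprocess.call([sys.executable, '-m', 'pip', 'install', 'PACKAGE_NAME'])\n"
)
print(uses_removed_pip_main(old_style))  # True
print(uses_removed_pip_main(new_style))  # False
```

Parsing the script with `ast` rather than searching the text avoids false positives from comments or strings that merely mention `pip.main`.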

Orchestration Notification Updates Resulted in Deleted Tasks

There was an update to the orchestrator this week that had an unfortunate side effect. If you updated an orchestration's notifications, the orchestration's tasks were deleted.

Thankfully, the orchestrations are versioned, so if this happened to you, we will restore the tasks from the last version.
If you have any concerns about this, please contact us at support@keboola.com.

For what it's worth, updating notifications will no longer delete orchestration tasks. Please accept our humble apologies if you were affected.

Introducing Guide Mode

We are happy to announce the immediate availability of Guide Mode. In Guide Mode, the Keboola Connection user interface switches to an interactive tutorial that guides you through the basics of using Keboola Connection.

Guide Mode is designed for new users and works best on empty projects. Therefore, when you invite a new person to Keboola Connection, they will receive a special link in their invitation email.

The link leads to the try.keboola.com page. By following the link, they will receive a 15-day demo project with Guide Mode activated.

Guide Mode is the very first step in creating a replacement for the old Academy. We are gradually going to fill it with more advanced content, but in the meantime, try it out and let us know what you think.


New version of AdWords Extractor

We have just released a new version of AdWords Extractor. It works with AdWords API v201802 (see the Release notes).

The previous version of the extractor is deprecated. You can use our migration tool to migrate your AWQL queries; however, you have to reauthorize the extractor and give it access to your AdWords data again. The previous version uses AdWords API v201710, which will be switched off on 11 July 2018.

Week in Review -- March 19, 2018

New Components

Asana Extractor

We’re happy to welcome the Asana Extractor to our family. It can extract your projects and tasks from Asana, an application designed to help teams track their work. This component was developed by Leo Chan.

Thoughtspot Writer

We're likewise delighted to announce a new writer for Thoughtspot that is now available for public use. Thoughtspot is a "search and AI-driven analytics platform".

DynamoDB Extractor

We also released a beta version of the DynamoDB extractor. It does not have any UI yet, and has to be configured via JSON. If you are feeling adventurous, please give it a try and let us know how it goes.

Marketing Miner Extractor

Last but by no means least, we have a new extractor for Marketing Miner that allows you to fetch your project rank tracking data from Marketing Miner.

New Features

  • The project API Tokens section now shows when a token was refreshed.

Minor Improvements

  • We've modified the storage job polling to reduce component job run times. The greatest speedups will be observable in small to medium-sized data loads.
  • Artificial limits were removed from CSV file import. Previously, the upload had to finish within 10 minutes. Now the limit is left to your web browser. Please note that large files should still be uploaded through the API.

  • Further improvements to Output mapping. The destination bucket is now prefilled from the transformation name.

Fixes

  • The MSSQL extractor was updated to correctly handle databases with case-sensitive collations.

  • The Email Attachments extractor now supports incremental loading and addresses in angle brackets, e.g. `Joe <email@example.com>`

  • Developer portal vendors can now approve requests to join via the request email.



New Email Attachments Extractor

There’s a new version of the Email Attachments extractor (previously known as the Pigeon extractor), which you can use from the Extractors tab in Keboola Connection. It imports CSV files into Storage when you send them as attachments to a generated email address.

The email address for sending CSV attachments is generated automatically, and the new extractor has a fresh and simpler UI.

The old version is deprecated and will be discontinued on April 6. Please migrate to the new version in the upcoming weeks. There is no automatic migration script because you need to generate new email addresses, but the switch should be very easy.

Farewell to Custom Science

Yes, we are going to deprecate the Custom Science application. We introduced it more than two years ago as an alternative to components. Unlike components, it was easy to implement and use. However, we've made a lot of progress in simplifying component development.

The latest additions are a simplified component creation workflow, a component generator tool, and rewritten developer documentation. See a 10-minute video (or this one for Gitlab) on how to create a Hello World component. All of this means that creating a component is much easier than it was two years ago and is definitely worth the effort.

At the same time, Custom Science (CS) is causing more and more problems, specifically:

  • We have no trace of what code was actually executed. That means when something breaks, we don't know if the code was changed in the meantime or not. When something was successful, we don't know for sure which version it was. We can't run a configuration with a previous version of the code.
  • There is a direct dependency on the git repository, and while Github and Bitbucket outages are neither common nor long, they accounted for dozens of failed jobs last year.
  • Risk of loss: If you lose access to the git repository, the jobs immediately fail. There is nothing we can do about it. No grace period. No way back. This can easily happen when people change positions or leave their company.
  • Dependency: Typically, there is only one person who can fix a broken CS. If an issue arises, we don't know who the person is and can't contact them. Even if we do know the person, they might not respond. In the meantime, we have no workaround available (such as reverting to the last working state).
  • Poor security: If the repository is private, we need credentials to it. These should be dedicated robot credentials, but most people use their own. Plus, it's your code repository, so why should you give us credentials to it?
  • Poor performance: CS can easily spend 1-2 minutes warming up. If it installs packages, it takes even longer, because they are installed on every run.

We are fully aware that there are some disadvantages to converting every CS into a component. Specifically:

  • It takes several minutes before the updated code is deployed in KBC.
  • The initial setup takes several minutes of your work.

The first issue is not going to change any time soon (we will work on shortening the delay, but there will always be some delay). We tried to minimize the second issue: you can follow our migration guide, watch a 10-minute video of the migration (done manually and using our tool), or see the new Component development tutorial.

Overall, CS is great for experimenting. The problem is that we are unable to draw the line between experimenting and production use, and CS in production usually causes countless problems. We are aware that creating components is not ideal for ad hoc work, and we're going to improve that too before the final demise of Custom Science, which is set for October 1, 2018.

Facebook and Instagram Extractor Failures

Some configurations of the Facebook and Instagram extractors are failing during import to Storage.

We are working on a fix and we'll update this status when the issue is resolved.


UPDATE 09:56 AM UTC - The issue was resolved. All Facebook and Instagram extractor configurations should be working again.

Time Travel Restore

Snowflake has a wonderful feature called Time Travel. It allows you to recreate a table as it existed at a point in the past. We're happy to announce initial support for this great feature in Keboola Connection.

To begin with, every project with a Snowflake backend has been set to retain data history for 7 days. That means that you can restore a table to how it existed at any point within the last week. It is possible to increase the data history retention period, so if you're interested in doing that, please let us know by using the support button in your project.


We've added this restoration method to the snapshots tab in the storage console.


Restoring a table is very simple: just use the calendar to pick the date and time, give the new table a name, and choose which bucket to put it in.
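
For the curious, the underlying Snowflake mechanism can be sketched as a `CREATE TABLE ... CLONE ... AT (TIMESTAMP => ...)` statement. The helper below only builds such a statement and checks the restore point against the 7-day retention mentioned above. It is an illustration of Snowflake's syntax, not a description of how Keboola Connection performs the restore; the function name and the table identifiers are placeholders.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)  # default data history retention stated above


def time_travel_clone_sql(source, target, restore_at, now=None):
    """Return a Snowflake statement cloning `source` as it was at `restore_at`.

    Both datetimes must be timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    if restore_at > now:
        raise ValueError("restore point is in the future")
    if now - restore_at > RETENTION:
        raise ValueError("restore point is outside the retention period")
    # Identifiers are interpolated as-is; quote or validate them in real use.
    return (
        f"CREATE TABLE {target} CLONE {source} "
        f"AT (TIMESTAMP => '{restore_at.isoformat()}'::timestamp_tz)"
    )


# Placeholder identifiers and a fixed "now" so the example is reproducible.
now = datetime(2018, 3, 12, 9, 0, tzinfo=timezone.utc)
sql = time_travel_clone_sql(
    '"in.c-main"."orders"',
    '"in.c-main"."orders_restored"',
    now - timedelta(days=2),
    now=now,
)
print(sql)
```

A restore point older than the retention period raises a `ValueError` here, mirroring the fact that history beyond the retention window is not available.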


We plan to extend this feature so that time travel replicas can be used as an input option for transformations, and to create a "Storage Trash".

Happy travelling!