New S3 Extractor

This one took us a while, but we believe it's worth it. We carefully gathered feedback and made the most commonly used features accessible through a new streamlined UI. And there's even more under the hood.

The original AWS S3 extractor was renamed to Simple AWS S3. It stays fully supported and is not being deprecated. There's no need to migrate your configurations.

There are several major differences between the original and the new extractor. The new AWS S3 extractor

  • can download multiple files/tables using a single set of credentials.
  • fully supports incremental loads.
  • is more flexible.

The UI of the new extractor supports many features, but the extractor is not limited by its UI: it is the first component that openly supports processors. Opening the JSON editor (aka Power User Mode) opens the configuration up to endless possibilities. The extractor itself does only one simple job: it downloads a set of files from S3. All other tasks (decompression, CSV fixing, setting up the manifest file, etc.) are delegated to processors. You can order and configure the processors so that they handle the files as required, and you can even develop your own processor if something is missing. We're fully aware that this concept is not easy to grasp, but it's intended for advanced users. Not an advanced user? Use the UI.
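To illustrate the idea, here is a minimal Python sketch. It is not how the extractor or any real processor is implemented. Downloaded files pass through an ordered list of processors, and each processor takes the current set of files and returns a new one. The processor names `decompress_gzip` and `add_manifests`, the in-memory file representation, and the `in.c-s3.*` destination are all hypothetical.

```python
import gzip
import json
from typing import Callable, Dict, List

# Hypothetical in-memory "file set": file name -> raw bytes.
Files = Dict[str, bytes]
Processor = Callable[[Files], Files]


def decompress_gzip(files: Files) -> Files:
    """Hypothetical processor: gunzip every *.gz file, pass the rest through."""
    out: Files = {}
    for name, data in files.items():
        if name.endswith(".gz"):
            out[name[: -len(".gz")]] = gzip.decompress(data)
        else:
            out[name] = data
    return out


def add_manifests(files: Files) -> Files:
    """Hypothetical processor: attach a JSON manifest to every *.csv file."""
    out = dict(files)
    for name in files:
        if name.endswith(".csv"):
            # The destination bucket name is made up for this example.
            manifest = {"destination": "in.c-s3." + name[: -len(".csv")]}
            out[name + ".manifest"] = json.dumps(manifest).encode()
    return out


def run_pipeline(downloaded: Files, processors: List[Processor]) -> Files:
    """Apply the processors to the downloaded files in the configured order."""
    files = downloaded
    for processor in processors:
        files = processor(files)
    return files


if __name__ == "__main__":
    raw = {"orders.csv.gz": gzip.compress(b"id,amount\n1,10\n")}
    result = run_pipeline(raw, [decompress_gzip, add_manifests])
    print(sorted(result))  # ['orders.csv', 'orders.csv.manifest']
```

Swapping the order to `[add_manifests, decompress_gzip]` produces no manifest, because the file is still named `orders.csv.gz` when the manifest step runs. This is why the order of processors matters.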

The list of available processors will be maintained in the list of components in the Developer Portal. A full description of the extractor is available in our documentation.

This brings us one step closer to replacing the legacy Restbox. The HTTP extractor will follow shortly.

Week in Review -- March 5, 2018

Improvements

  • Schema in Snowflake Extractor is no longer a required connection parameter. If it is not set, the table selector allows you to select tables from the whole database.
  • Snowflake extractor now supports importing these semi-structured data types: `VARIANT`, `OBJECT`, and `ARRAY`.
  • Updated two-factor authentication in Keboola Developer Portal. SMS authentication is now deprecated. All new users will have to use either the Google Authenticator or the Duo Mobile app.
  • MySQL extractor now has an option to enable compression of data sent over the network.
  • Enhanced Output Mapping selector.
  • R in Sandbox and Transformations has been updated to 3.4.3, and the Tidyverse package is now installed by default.

Bug fixes

  • Data Takeout randomly failed when backing up your data to S3.
  • The task editor in Orchestrator produced errors when an orchestration had dozens of tasks configured.
  • In the Twitter extractor template, if a user mentioned your account, the details of that user's account weren't downloaded. Edit and save an existing configuration to remedy this issue.

New Email Attachments Extractor

There’s a new version of the Email Attachments extractor (previously known as the Pigeon extractor), available from the Extractors tab in Keboola Connection. It imports CSV files into Storage: you send the files as attachments to a generated email address.

The email address for sending CSV attachments is generated automatically, and the new extractor has a fresh, simpler UI.

The old version is deprecated and will be discontinued on April 6. Please migrate to the new version in the upcoming weeks. There is no automatic migration script because new email addresses have to be generated, but the switch should be very easy.

Farewell to Custom Science

Yes, we are going to deprecate the Custom Science application. We introduced it more than two years ago as an alternative to components. Unlike components, it was easy to implement and use. However, we've made a lot of progress in simplifying component development.

The latest additions are a simplified component creation workflow, a component generator tool, and rewritten developer documentation. See a 10-minute video (or this one for GitLab) on how to create a Hello World component. All of this means that creating a component is much easier than it was two years ago and is definitely worth the effort.

At the same time, Custom Science (CS) is causing more and more problems, specifically:

  • We have no trace of what code was actually executed. That means when something breaks, we don't know if the code was changed in the meantime or not. When something was successful, we don't know for sure which version it was. We can't run a configuration with a previous version of the code.
  • There is a direct dependency on the git repository, and while GitHub and Bitbucket outages are neither common nor long, they accounted for dozens of failed jobs last year.
  • Risk of loss: If you lose access to the git repository, the jobs immediately fail. There is nothing we can do about it. No grace period. No way back. This can easily happen when people change positions or leave their company.
  • Dependency: Typically, there is only one person who can fix a broken CS. If an issue arises, we don't know who that person is and can't contact them. Even if we do know the person, they might not respond. In the meantime, we have no workaround (e.g., reverting to the last working state).
  • Poor security: If the repository is private, we need credentials to it. These should be dedicated robot credentials, but most people use their own. Plus, it's your code repository, so why should you give us credentials to it?
  • Poor performance: CS can easily spend 1–2 minutes warming up, and even longer if it installs packages, because they are installed on every run.

We are fully aware that converting every CS into a component has some disadvantages. Specifically:

  • It takes several minutes before the updated code is deployed in KBC.
  • The initial setup takes several minutes of your work.

The first issue is not going to change any time soon (we will work on shortening the delay, but there will always be some delay). We have tried to minimize the second issue: you can follow our migration guide, watch a 10-minute video of a migration (done both manually and using our tool), or see the new Component development tutorial.

Overall, CS is great for experimenting. The problem is that we are unable to draw the line between experimenting and production use, and CS in production usually causes countless problems. We are aware that creating components is not ideal for ad hoc work, and we're going to improve that too before Custom Science is finally shut down on October 1, 2018.

Facebook and Instagram Extractor Failures

Some of the configurations of Facebook and Instagram extractors are failing during import to Storage. 

We are working on a fix and we'll update this status when the issue is resolved.


UPDATE 09:56 AM UTC: The issue has been resolved. All Facebook and Instagram extractor configurations should be working again.

Week in Review -- February 19, 2018

New components

Bug fixes and smaller improvements

  • Bug fix in Currency extractor: exchange rates for the Danish Krone (DKK) and the Icelandic Krona (ISK) had not been updated for some time because of a bug in its configuration.
  • Snowflake extractor now offers views in the tables list too.

Time Travel Restore

Snowflake has a wonderful feature called Time Travel. It allows you to recreate a table as it existed at a point in the past. We're happy to announce initial support for this great feature in Keboola Connection.

To begin with, every project with a Snowflake backend has been set to retain data history for 7 days. That means you can restore a table to how it existed at any point within the last week. It is possible to increase the data history retention period, so if you're interested in doing that, please let us know using the support button in your project.
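As a small worked example of the retention window: a restore point is usable only if it lies between "now minus the retention period" and "now". The Python sketch below assumes the default 7-day period described above; `can_restore` is a hypothetical helper for illustration, not part of Keboola Connection or Snowflake, and Snowflake's exact handling of the window boundaries may differ.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the default 7-day data history retention described above.
RETENTION = timedelta(days=7)


def can_restore(point_in_time: datetime, now: datetime,
                retention: timedelta = RETENTION) -> bool:
    """Hypothetical check: is point_in_time inside the window [now - retention, now]?"""
    return now - retention <= point_in_time <= now


if __name__ == "__main__":
    now = datetime(2018, 3, 5, 12, 0, tzinfo=timezone.utc)
    print(can_restore(now - timedelta(days=3), now))  # True
    print(can_restore(now - timedelta(days=8), now))  # False
```

Raising the retention period (for example, by passing `retention=timedelta(days=30)`) simply widens the window; the check itself stays the same.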


We've added this restoration method to the Snapshots tab in the Storage console.

Restoring a table is very simple: use the calendar to pick the date and time, give the new table a name, and choose which bucket to put it in.


We plan on extending the use of this feature to be able to use time travel replicas as an input option for transformations and to create a "Storage Trash".  

Happy travelling!

Week in Review -- February 12, 2018

New Components

  • Google Trends extractor: this component, developed by Leo Chan (cleojanten@hotmail.com), allows you to extract search trends for given keywords in a specified region.

Deprecations 

Indexed columns

With the deprecation and removal of the MySQL backend, indexed columns no longer serve any purpose, so we have deprecated them. You can now search and filter by any column without marking it as indexed.

The following attributes will be removed from manifest files by the end of March 2018:

  • indexed_columns – with the deprecation of the MySQL backend, there is no need to define indexes.
  • rows_count and data_size_bytes – these values are not (and never were) in sync with the input table data and are useless.
  • attributes – table attributes are replaced by table metadata.
  • is_alias – this is something that has nothing to do with the exported data.
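If your own component code reads manifest files, it should stop relying on these attributes. A minimal Python sketch, assuming they appear as top-level keys of a manifest parsed into a dictionary; `strip_deprecated` is a hypothetical helper and the sample manifest values are made up for illustration.

```python
import json

# The attributes listed above as being removed from manifest files.
DEPRECATED_KEYS = frozenset(
    {"indexed_columns", "rows_count", "data_size_bytes", "attributes", "is_alias"}
)


def strip_deprecated(manifest: dict) -> dict:
    """Return a copy of the manifest without the deprecated attributes."""
    return {key: value for key, value in manifest.items() if key not in DEPRECATED_KEYS}


if __name__ == "__main__":
    # Hypothetical manifest content for illustration only.
    old = json.loads(
        '{"id": "in.c-main.orders", "rows_count": 42, '
        '"is_alias": false, "columns": ["id", "amount"]}'
    )
    print(strip_deprecated(old))  # {'id': 'in.c-main.orders', 'columns': ['id', 'amount']}
```

The function returns a new dictionary rather than deleting keys in place, so the original manifest data is left untouched.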

Fixes

  • The Developer Portal is now available under a new URL: components.keboola.com (instead of apps.keboola.com). The main reason is that we used the word "application" with two meanings, which was confusing. For example, there were applications of type Extractor but also applications of type Application. From now on, everything is a Component. Components are of four types: Extractors (loading data from somewhere), Writers (writing data somewhere), Applications (manipulating data), and Processors (data processing helpers).

Week in Review -- January 30, 2018

Plantyst Extractor

If you collect data from production machines in Plantyst, you can use the new extractor made by BizzTreat and start doing complex data analysis.

Stories.BI writer

You can automatically push data to Stories.bi and get automatic insights instead of crunching business data by hand.


Updated Components

  • Sklik extractor has a new variable, accountID.
  • YouTube extractor has a new version based on Generic Extractor. The old extractor will be deprecated on March 1, 2018.
  • Snowflake extractor is now a bit faster and has better error handling
  • Geneea NLP App is now available in EU region
  • BingAds extractor is now available in EU region
  • Facebook extractor can now fetch Page Reviews using the new Page Tokens.
  • Twitter extractor is now available in EU region
  • Snowflake and Redshift writers: fixed a possible column mismatch.


Minor Improvements

  • Quick search in the component list has been improved and is now more accurate.
  • A component name can finally be submitted by pressing ENTER.