New Email Attachments Extractor

There’s a new version of the Email Attachments extractor (previously known as the Pigeon extractor), available from the Extractors tab in Keboola Connection. It imports CSV files into Storage: you send them as attachments to a generated email address.

The email address for sending CSV attachments is generated automatically, and the new extractor has a fresh, simpler UI.

The old version is deprecated and will be discontinued on April 6. Please migrate to the new version in the upcoming weeks. There is no automatic migration script because you need to generate new email addresses, but the switch should be very easy.

Farewell to Custom Science

Yes, we are going to deprecate the Custom Science application. We introduced it more than two years ago as an alternative to components. Unlike components, it was easy to implement and use. However, we've made a lot of progress in simplifying component development.

The latest additions are a simplified component creation workflow, a component generator tool, and rewritten developer documentation. See a 10-minute video (or this one for GitLab) on how to create a Hello World component. All of this means that creating a component is much easier than it was two years ago and is definitely worth the effort.

At the same time, Custom Science (CS) is causing more and more problems, specifically:

  • We have no trace of what code was actually executed. That means when something breaks, we don't know if the code was changed in the meantime or not. When something was successful, we don't know for sure which version it was. We can't run a configuration with a previous version of the code.
  • There is a direct dependency on the git repository, and while GitHub and Bitbucket outages are neither common nor long, they did account for dozens of failed jobs last year.
  • Risk of loss: If you lose access to the git repository, the jobs immediately fail. There is nothing we can do about it. No grace period. No way back. This can easily happen when people change positions or leave their company.
  • Dependency: Typically, there is only one person who can fix a broken CS. If an issue arises, we don't know who that person is and can't contact them. Even if we do know the person, they might not respond. In the meantime, we have no workaround available (e.g. reverting to the last working state).
  • Poor security: If the repository is private, we need credentials to it. These should be dedicated robot credentials, but most people use their own. Plus, it's your code repository, so why should you give us credentials to it?
  • Poor performance: CS can easily spend 1-2 minutes on the warm-up. If it is installing packages, it takes even longer because they are installed on every run.

We are fully aware that there are some disadvantages of converting every CS into a component. Specifically:

  • It takes several minutes before the updated code is deployed in KBC.
  • The initial setup takes several minutes of your work.

The first issue is not going to change any time soon (we will work on shortening the delay, but there will always be some delay). We tried to minimize the second issue – you can follow our migration guide, or see a 10 minute video of migration (done manually and using our tool) or see the new Component development tutorial.

Overall, CS is great for experimenting. The problem is that we are unable to draw the line between experimenting and production use, and CS in production usually causes countless problems. We are aware that creating components is not ideal for ad hoc stuff, and we're going to improve that too before the final demise of Custom Science, which will be on October 1, 2018.

Facebook and Instagram extractor failures

Some configurations of the Facebook and Instagram extractors are failing during import to Storage.

We are working on a fix and we'll update this status when the issue is resolved.


UPDATE 09:56 AM UTC - The issue was resolved. All Facebook and Instagram extractor configurations should be working again.

Time Travel Restore

Snowflake has a wonderful feature that they call Time Travel.  It allows you to replicate your table from its state in the past.  We're happy to announce initial support for this great feature in Keboola Connection. 

To begin with, every project with a Snowflake backend has been set to retain data history for 7 days. That means that you can restore a table to how it existed at any point within the last week. It is possible to increase the data history retention period, so if you're interested in doing that, please let us know by using the support button in your project.


We've added this restoration method to the snapshots tab in the storage console:


Restoring a table is very simple: just use the calendar to pick the date and time, give the new table a name, and choose which bucket to put it in.
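For the curious, here is a rough sketch of what such a restore could look like at the Snowflake level. It is only an illustration: we are assuming a plain `CREATE TABLE ... CLONE ... AT (TIMESTAMP => ...)` statement, and the helper names (`quote_ident`, `build_restore_sql`) and the table names are made up for this example. It does not describe how Keboola Connection actually performs the restore.

```python
from datetime import datetime, timedelta, timezone

# Retention period mentioned above; projects with a longer retention
# would use a different value here.
RETENTION_DAYS = 7


def quote_ident(name):
    """Quote a Snowflake identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_restore_sql(source, target, restore_at, now=None):
    """Build a Time Travel clone statement (hypothetical helper).

    restore_at must be timezone-aware and must fall inside the
    retention window, otherwise a ValueError is raised.
    """
    if restore_at.tzinfo is None:
        raise ValueError("restore_at must be timezone-aware")
    now = now or datetime.now(timezone.utc)
    if restore_at > now:
        raise ValueError("cannot restore to a point in the future")
    if now - restore_at > timedelta(days=RETENTION_DAYS):
        raise ValueError("requested time is older than the retention period")
    # Normalize to UTC and render as an ISO 8601 timestamp.
    ts = restore_at.astimezone(timezone.utc).replace(microsecond=0).isoformat()
    return (
        f"CREATE TABLE {quote_ident(target)} CLONE {quote_ident(source)} "
        f"AT (TIMESTAMP => '{ts}'::timestamp_tz);"
    )


if __name__ == "__main__":
    two_days_ago = datetime.now(timezone.utc) - timedelta(days=2)
    print(build_restore_sql("orders", "orders_restored", two_days_ago))
```

The check against the retention period mirrors the rule above: a point in time older than the retained history cannot be restored.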


We plan on extending the use of this feature to be able to use time travel replicas as an input option for transformations and to create a "Storage Trash".  

Happy travelling!

Week in Review -- February 12, 2018

New Components

  • Google Trends extractor: this component, developed by Leo Chan (cleojanten@hotmail.com), allows you to extract search trends for given keywords in a specified region.

Deprecations 

Indexed columns

With the deprecation/removal of the MySQL backend, we deprecated indexed columns because there is no more use for them. You can search/filter through any column now without the need to mark it as indexed.

The following attributes will be removed from manifest files by the end of March 2018:

  • indexed_columns – with the deprecation of the MySQL backend, there is no need to define indexes.
  • rows_count and data_size_bytes – these values are not (and never were) in sync with the input table data and are useless.
  • attributes – table attributes are replaced by table metadata.
  • is_alias – this is something that has nothing to do with the exported data.
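If your component reads table manifests, it is worth checking that it does not rely on any of these attributes. The sketch below is a minimal illustration, assuming the manifest is a JSON object with the attribute names listed above as top-level keys. The helper names and the sample manifest content are made up for this example.

```python
import json

# Attributes listed above as scheduled for removal from manifest files.
DEPRECATED_KEYS = (
    "indexed_columns",
    "rows_count",
    "data_size_bytes",
    "attributes",
    "is_alias",
)


def find_deprecated(manifest):
    """Return the deprecated attributes still present in a manifest dict."""
    return [key for key in DEPRECATED_KEYS if key in manifest]


def strip_deprecated(manifest):
    """Return a copy of the manifest without the deprecated attributes."""
    return {k: v for k, v in manifest.items() if k not in DEPRECATED_KEYS}


if __name__ == "__main__":
    # Illustrative manifest content only, not a real export.
    raw = json.dumps({
        "id": "in.c-main.orders",
        "primary_key": ["id"],
        "indexed_columns": ["id"],
        "rows_count": 42,
        "is_alias": False,
    })
    manifest = json.loads(raw)
    print(find_deprecated(manifest))
    print(strip_deprecated(manifest))
```

Running your code against the stripped copy is a quick way to see whether anything breaks once the attributes are gone.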

Fixes

  • The Developer portal is now available under a new URL: components.keboola.com (instead of apps.keboola.com). The main reason is that we used the word application in two meanings, and that was confusing. For example, there were applications of type Extractor but also applications of type Application. From now on, everything is a Component. Components are of four types: Extractors (loading data from somewhere), Writers (writing data somewhere), Applications (manipulating data), and Processors (data processing helpers).

New UI section for API Tokens

We are glad to introduce a new UI for Storage API Tokens that can now be found under the Users & Settings section. We will be removing the old one found under the Storage section. The new UI covers the same functionality as the old one.

As a security measure, the token itself is now shown only once, right after its creation. The only way to get an existing token from the UI is to send it via email (a temporary link to the token is sent) or to refresh it and get a new token string. On the backend, the token can still be seen in the response from the tokens list API call, but it will be removed from that response in the near future.


Failed Jobs

Today (January 8) between 20:05 and 21:56 UTC, a number of jobs failed with an internal or encryption error. This was caused by a bug affecting OAuth configurations. We have reverted the internal release. We do apologize for this enormous mess up.


Year 2017 in Review

Since most of us are enjoying the winter holiday, the usual Week in Review would be half-empty. It's perhaps time to review the year with the benefit of hindsight. A lot of stuff happened last year which is easily lost in the daily commotion, so let's have a little recap.

The big things

  • Keboola Connection in EU region — Although this might not look exactly revolutionary, it required an immense amount of work to make sure our backend is no longer tied to a specific region. This of course opens doors to other regions, which are going to be much easier to add.
  • Developer Portal — This allows anyone to create and deploy applications in Keboola Connection. Hosting of Docker images, automated deployment and testing integration — all inclusive.
  • Big files support and other performance improvements — We all have big data, don't we? There were a number of limitations in several places which didn't allow processing of files larger than 2GB or 5GB. Though there are new limits now (around 100GB), it's much harder to hit them. A lot of other performance improvements were achieved, among those the recent up to 80% speedup in workspace load is worth bragging about.
  • Component Dockerization — This is a ton of completely invisible work which has no immediate impact on you. Once we are done with this, we'll be able to rework job processing stuff and offer super flexible projects (ever wanted long running jobs? or jobs requiring XXX GBs of RAM?). There is still a lot to do, but we're not dawdling.
  • Shared buckets — Although they are not perfect, they hugely simplify sharing things in larger organizations.
  • Processors — Quite a hidden nerdy feature which is slowly making it into production. This, together with the Developer Portal, opens the door for simplified implementations (especially in extractors) of those "Oh, it's perfect, I just need to convert this one little thing ..." requests.
  • Trash of deleted configurations — A lifesaver for many.
  • RStudio and Jupyter Sandboxes — Do they count as a big thing too?
  • New database extractors — The database extractors were greatly simplified.  You no longer need to remember your database schema, and can make your configuration in just a couple of clicks.
  • UI/UX improvements — No shocking things happened (yet?), but we certainly put much more focus on this area throughout the year. There are various new features (markdown descriptions, merging table slices, finer input mapping granularity). But we are also putting a lot of effort into unifying the look and feel of different Keboola Connection parts and smoothing the flow. Hopefully this makes Keboola Connection more pleasurable to work with.

Security improvements

We're keeping an eye on security all the time, and constantly improving. To name a few things:

  • Project Access Approval
  • New Google Sign-in
  • CSP (Content Security Policy) used in entire Keboola Connection
  • Better secured Keboola Connection cookies and sessions
  • Display of all active account sessions

New components

There were literally a ton of new and updated components. Ok, maybe not a ton, but definitely a lot. To name a few:

  • Facebook Extractor, Facebook Ads Extractor, Azor Extractor, Snowflake DB Extractor, BigQuery Extractor, Papertrail Extractor, Dropbox extractor, Pipedrive extractor, CJAffiliate extractor, Dark Sky Extractor
  • GeoIP application, What3words Application, Data Health Application
  • SAS Writer, Google BigQuery Writer, GoogleSheets Writer, Snowflake Database Writer, Qlik writer, Looker writer, Salesforce Analytics Cloud Writer
  • Dozens of other connections thanks to Generic Extractor.

What's next?

  • We are working on replacing the Storage Console as the most ancient part of Keboola Connection UI. This is big, so we'll replace it in parts, but it's already in the works.
  • New GoodData Writer is being made. We want to get rid of its "special" behavior and make it a standard writer. Some of the not-exactly-writery features will go into separate tools.
  • New Pigeon "extractor" is in progress. Again, this is an old component with quirky behavior so it has to be replaced with something reliable.
  • Also a new S3 extractor is waiting behind the door. This will be in many ways a conceptually different extractor taking advantage of processors and having configurations organized in a different way.
  • We're working on making it easier to learn Keboola Connection. It's not quite finished yet, but we definitely want to make life easier for new users.
  • Removing MySQL backend. It's been there a long time and it's sad to see it go. Actually, no it isn't. We'll throw a party when it's gone for good.
  • Database writers are going to get a larger update.

And beside that?

Life is a bitch, so we can't make promises, but:

  • A better-managed wish list is something both you and we wish for.
  • RStudio and Jupyter Sandboxes have many improvements in the queue.
  • We have quite a few (almost revolutionary) ideas about Input mapping.
  • Shared credentials (need three configurations connecting to the same database server?) would make a lot of situations easier to handle.
  • Transformations are going to get simplified (no phases, no dependencies, just transformations).
  • We'd like to support even bigger data (no big is too big), so thanks everyone for pushing the limits.
  • Orchestrator needs quite a few updates, and so does Generic Extractor.

We hope we meet those and many other goals, but that's all for now.

We wish you all a very happy and successful New Year.

Job Failures Saturday, November 18, 2017

We have been experiencing temporary technical difficulties today around 3:00 AM CET and 4:00 PM CET.

Some component jobs may have failed as a result.  We have identified and fixed the issue. All systems have returned to normal operations and all jobs are now being processed normally.