Year 2017 in Review

Since most of us are enjoying the winter holiday, the usual Week in Review would be half-empty. It's a good time to look back at the year with the benefit of hindsight. A lot happened in 2017 that is easily lost in the daily commotion, so here is a little recap.

The big things

  • Keboola Connection in EU region — Although this might not look exactly revolutionary, it required an immense amount of work to untie our backend from a specific region. This, of course, opens the door to other regions, which will be much easier to add.
  • Developer Portal — This allows anyone to create and deploy applications in Keboola Connection. Hosting of Docker images, automated deployment and testing integration — all inclusive.
  • Big files support and other performance improvements — We all have big data, don't we? Several places had limitations that prevented processing files larger than 2GB or 5GB. There are still limits (around 100GB), but they are much harder to hit. We made many other performance improvements as well; the recent speedup of up to 80% in workspace loading is worth bragging about.
  • Component Dockerization — This is a ton of completely invisible work with no immediate impact on you. Once it is done, we'll be able to rework job processing and offer super flexible projects (ever wanted long-running jobs? Or jobs requiring XXX GBs of RAM?). There is still a lot to do, but we're not dawdling.
  • Shared buckets — Although they are not perfect, they hugely simplify sharing things in larger organizations.
  • Processors — A rather hidden, nerdy feature that is slowly making its way into production. Together with the Developer Portal, it opens the door to simpler implementations (especially in extractors) of those "Oh, it's perfect, I just need to convert this one little thing..." requests.
  • Trash of deleted configurations — A lifesaver for many.
  • RStudio and Jupyter Sandboxes — Do they count as a big thing too?
  • New database extractors — The database extractors were greatly simplified. You no longer need to remember your database schema, and you can create a configuration in just a couple of clicks.
  • UI/UX improvements — Nothing shocking happened (yet?), but we certainly put much more focus on this area throughout the year. There are various new features (markdown descriptions, merging table slices, finer input mapping granularity). We are also putting a lot of effort into unifying the look and feel of the different parts of Keboola Connection and smoothing the workflow. Hopefully this makes Keboola Connection more pleasant to work with.

Security improvements

We keep an eye on security all the time and are constantly improving. To name a few things:

  • Project Access Approval
  • New Google Sign-in
  • CSP (Content Security Policy) used across all of Keboola Connection
  • Better secured Keboola Connection cookies and sessions
  • Display of all active account sessions

New components

There were literally a ton of new and updated components. Ok, maybe not a ton, but definitely a lot. To name a few:

  • Facebook Extractor, Facebook Ads Extractor, Azor Extractor, Snowflake DB Extractor, BigQuery Extractor, Papertrail Extractor, Dropbox Extractor, Pipedrive Extractor, CJAffiliate Extractor, Dark Sky Extractor
  • GeoIP Application, What3words Application, Data Health Application
  • SAS Writer, Google BigQuery Writer, Google Sheets Writer, Snowflake Database Writer, Qlik Writer, Looker Writer, Salesforce Analytics Cloud Writer
  • Dozens of other connections thanks to Generic Extractor.

What's next?

  • We are working on replacing the Storage Console, the most ancient part of the Keboola Connection UI. This is big, so we'll replace it in parts, but it's already in the works.
  • A new GoodData Writer is being built. We want to get rid of its "special" behavior and make it a standard writer. Some of the not-exactly-writery features will move into separate tools.
  • A new Pigeon "extractor" is in progress. Again, this is an old component with quirky behavior, so it has to be replaced with something reliable.
  • A new S3 extractor is also waiting in the wings. In many ways it will be a conceptually different extractor, taking advantage of processors and organizing configurations differently.
  • We're working on making it easier to learn Keboola Connection. It's not quite finished yet, but we definitely want to make life easier for new users.
  • Removing the MySQL backend. It's been there a long time and it's sad to see it go. Actually, no it isn't. We'll throw a party when it's gone for good.
  • Database writers are going to get a larger update.

And besides that?

Life is a bitch, so we can't make promises, but:

  • A better-managed wish list is something both you and we wish for.
  • RStudio and Jupyter Sandboxes have many improvements in the queue.
  • We have quite a few (almost revolutionary) ideas about input mapping.
  • Shared credentials (need three configurations connecting to the same database server?) would make a lot of situations easier to handle.
  • Transformations are going to get simplified (no phases, no dependencies, just transformations).
  • We'd like to support even bigger data (no big is too big), so thanks everyone for pushing the limits.
  • Orchestrator needs quite a few updates, and so does Generic Extractor.

We hope to meet these and many other goals, but that's all for now.

We wish you all a very happy and successful New Year.

Slower job processing

We're experiencing slower processing of Docker component jobs; many jobs are stalled in the waiting state. We're looking for the root cause and hope to be back to normal soon.

UPDATE 9:40 AM CET: All operations are back to normal, the stalled jobs were caused by a misbehaving Redshift cluster. We're going to investigate the root cause. 

We're very sorry for this inconvenience.


Week in Review -- November 30, 2017

Database extractors

Following the recent announcement of the new database extractors, we are bringing further improvements to them.

  • By default, a new bucket is created for each extractor configuration. Previously, extractor configurations shared one bucket, which led to collisions.
  • You can reload the list of tables fetched from the database. This is useful when you are tuning credential permissions or have just added tables to the database.
  • The primary key is now validated against the table created in Storage. A warning is issued if the configured key differs from the key defined on the table.
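The primary key check can be pictured with a short Python sketch. The helper name `validate_primary_key` and the rule that column order is ignored are assumptions made for illustration; this is not the extractor's actual code.

```python
from typing import Optional, Sequence


def validate_primary_key(configured: Sequence[str], table_key: Sequence[str]) -> Optional[str]:
    """Return a warning if the configured key differs from the table's key.

    Hypothetical helper: column order is ignored (an assumption), so a
    composite key ["id", "date"] matches ["date", "id"].
    """
    if set(configured) == set(table_key):
        return None  # keys match, nothing to report
    return (
        f"Configured primary key {list(configured)} differs from "
        f"the key defined on the table {list(table_key)}"
    )
```

For example, `validate_primary_key(["id"], ["id", "date"])` returns a warning message, while matching keys return `None`, so the extraction continues either way and only the mismatch is reported.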

Generic extractor

A new ignoreErrors option has been introduced. It lets you force Generic Extractor to ignore certain extraction errors. Read more in the documentation.
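Conceptually, ignoring certain errors means that a response with a listed error status is skipped instead of failing the whole extraction. The minimal Python sketch below illustrates only that idea; the function `collect_records`, the `(status, records)` response shape and the use of HTTP status codes as the ignore list are assumptions, and the real option format is described in the Generic Extractor documentation.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical response model: (HTTP status code, list of records).
Response = Tuple[int, List[dict]]


class ExtractionError(Exception):
    """Raised when a response has an error status that is not ignored."""


def collect_records(responses: Iterable[Response], ignore_errors: Set[int]) -> List[dict]:
    """Gather records, skipping error responses whose status is in ignore_errors."""
    records: List[dict] = []
    for status, body in responses:
        if status >= 400:
            if status in ignore_errors:
                continue  # listed status: skip this response, keep extracting
            raise ExtractionError(f"Request failed with HTTP {status}")
        records.extend(body)
    return records
```

With `ignore_errors={404}`, a 404 response in the middle of a run is simply skipped; with an empty set, the same response stops the extraction with an error.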

Fixes

  • Google Drive Writer in append mode was, under some circumstances, overwriting rows instead of appending new ones
  • Google BigQuery Extractor was ignoring files exported to multiple slices
  • Tableau Writer is now available in the EU region


New version of AdWords Extractor

We have just released a new version of AdWords Extractor. It works with AdWords API v201710 (see the Release notes) and you no longer need your own developer token to use it.

The previous version of the extractor is deprecated. You can use our migration tool to migrate your AWQL queries; however, you have to reauthorize the extractor and give it access to your AdWords data again. The previous version uses AdWords API v201705, which will be switched off on 28 March 2018.

Week in Review -- November 21, 2017

New Features

  • There is a new extractor for the what3words service (https://what3words.com/)

Improvements

  • The database parameter is now optional in the MySQL Extractor configuration
  • Columns can be renamed when loading data into a Storage Workspace
  • Configuration errors from DarkSky Extractor are properly marked as user errors
  • Unified incremental load settings for all database writers

Job Failures Tuesday, November 21, 2017

We experienced temporary technical difficulties today between 9:15 AM and 9:20 AM CET.

Some component jobs may have failed as a result. We're investigating the issue and will post an update when the root cause is found.

UPDATE 11:35 AM CET: Jobs storage was temporarily unavailable for about two minutes. Job scheduling wasn't affected, and running jobs waited until the storage came back up, so they weren't affected either. Unfortunately, a few orchestrations failed; we'll take steps to prevent this in the future.


Job Failures Saturday, November 18, 2017

We experienced temporary technical difficulties today around 3:00 AM CET and 4:00 PM CET.

Some component jobs may have failed as a result. We have identified and fixed the issue. All systems have returned to normal operations and all jobs are now being processed normally.

Google Analytics Extractor - Custom Authorization

Recently we have been hitting the quota limits of the Google Analytics API with our extractor. As these limits are shared between all users of the extractor, we decided to add a new authorization option: Custom Authorization.

With Custom Authorization, you can provide your own API keys for the Google Analytics Extractor, so you will have your own quota limits. This way you can monitor these limits as needed, and other users won't be affected.
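Why your own API keys help can be shown with a toy model in Python: every set of credentials gets its own request budget, so all users of the extractor's default credentials drain one common budget. The `QuotaTracker` class, the key names and the limit of 3 requests are made up for illustration and do not reflect actual Google Analytics quota values.

```python
from collections import defaultdict


class QuotaTracker:
    """Toy model: each API key (credential) has its own request budget."""

    def __init__(self, limit_per_key: int) -> None:
        self.limit_per_key = limit_per_key
        self.used = defaultdict(int)  # requests made so far, per API key

    def request(self, api_key: str) -> bool:
        """Record a request; return False if the key's quota is exhausted."""
        if self.used[api_key] >= self.limit_per_key:
            return False
        self.used[api_key] += 1
        return True


quota = QuotaTracker(limit_per_key=3)
# Requests from several users sharing the default key drain one budget...
shared = [quota.request("shared-default-key") for _ in range(4)]
# ...while a user with Custom Authorization still has a full budget.
custom = quota.request("my-own-key")
```

Here the fourth request on the shared key is refused (`shared` ends with `False`), while the request made with the hypothetical custom key succeeds.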

Please check out our step-by-step guide on how to set up Custom Authorization in the Google Analytics Extractor.

In case of any problems or questions, please contact us at support@keboola.com.

Developer Portal

As part of our effort to make Keboola Connection an open platform, we would like to announce the availability of the Keboola Developer Portal: https://apps.keboola.com/. Some of you may already know that it is possible to modify applications in KBC through the Developer Portal API. Now we are adding a web application on top of the API so that you can get things done more easily.

The Keboola Developer Portal is a completely separate application from Keboola Connection. It is region-less because it has no access to customer projects. The Developer Portal provides an authoritative list of KBC applications for each region, and each Keboola Connection stack reads this list regularly. As usual at Keboola, the Developer Portal follows an API-first approach, so everything the application does can be done programmatically too.

What does the Developer Portal do?

  • It allows users who are members of a vendor to create and modify applications in KBC without involving Keboola Tech support.
  • It provides service accounts for automating deployment and testing.
  • It provides Docker AWS ECR repositories for applications (thus avoiding Quay and Dockerhub issues).
  • It provides an authoritative list of publicly available approved applications.

Anyone can sign up for the Developer Portal. However, until you join a vendor or create a new one, you will not be able to do anything. Both creating and joining a vendor must be approved: joining a vendor by the vendor administrator, and creating a vendor by Keboola Tech. Once you are a member of a vendor, you can modify existing applications and add new ones to that vendor. There's no need to fill out a checklist and go through our support. When you create an application, it becomes immediately available for you to use. Using the Developer Portal, you can also set up automated deployment for your application.

Keboola Tech still has some things to do:

  • approve new vendors
  • change some controlled properties of an application (e.g. memory limit)
  • approve a new application before it can become public
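The approval rules described above can be summarized as a small lookup table. This Python sketch only restates the rules from this post; the action names and the `approver_for` helper are hypothetical and are not part of the Developer Portal API.

```python
from typing import Optional

# Who must approve (or perform) each action, as described in this post.
# None means no approval is needed. Action names are hypothetical labels.
APPROVERS = {
    "join_vendor": "vendor administrator",
    "create_vendor": "Keboola Tech",
    "create_application": None,  # immediately available to the vendor
    "modify_application": None,
    "change_controlled_property": "Keboola Tech",  # e.g. memory limit
    "publish_application": "Keboola Tech",
}


def approver_for(action: str) -> Optional[str]:
    """Return who approves or performs `action`, or None if nobody needs to."""
    if action not in APPROVERS:
        raise KeyError(f"Unknown action: {action}")
    return APPROVERS[action]
```

The table makes the design visible: everyday work inside a vendor needs no approval, and Keboola Tech is involved only at the boundaries (new vendors, controlled properties, public listing).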

Next steps:

  • We still have a long list of things to do on the Developer Portal, so we consider it beta quality at the moment.
  • We are planning a developer meet-up in the Czech Republic (probably in January), where we'd like to hear about your experience developing applications for KBC.
  • If you have previously used the Developer Portal API, we encourage you to try the application. If you are having trouble joining your vendor, please contact support (there may be some friction depending on how your account was created).
  • If you have created complex applications using Custom Science, go ahead and check the Developer Portal, it may now be more convenient to turn them into full applications.
  • If you have never created an application for Keboola Connection, we highly encourage you to try it; it's much easier now. In fact, a component can be created from scratch in an hour.


Deprecating MySQL Storage and Transformations

Support for MySQL in Keboola Connection is coming to its end. Here's what will happen.

Effective immediately

  • New projects and projects without existing MySQL transformations will not be able to create new MySQL transformations.

MySQL Storage Backend (supported until January 2018)

  • The default storage backend for all projects is immediately switched from MySQL to Snowflake.
  • All MySQL buckets in all projects will be migrated to Snowflake in January 2018. This will not affect any operations; only a short maintenance window on each project will be required.
  • No changes in the project are required.
  • You can request an earlier migration at support@keboola.com.

MySQL Transformations (supported until April 2018)

  • Your existing MySQL transformations will need to be migrated to Snowflake by the end of April 2018.
  • If you need any help migrating your MySQL transformations, contact support@keboola.com.

These steps will allow us to deprecate a piece of legacy infrastructure and focus on state-of-the-art technologies. The Snowflake storage backend and transformations have significant performance and scaling benefits, so your projects will run faster than on MySQL at no extra charge.