Week in Review -- October 9th, 2017

The longstanding issue with slow SHOW DATABASES query executions on Snowflake has finally been resolved. We will now gradually enforce Snowflake query execution limits where applicable by contract. Before turning the limits back on, we will check the query history and notify you if you may be affected.
(Query times in milliseconds)

New Features

Minor Improvements

We continue with cleaning up the KBC UI. So far, we have fixed many little annoying issues:

  • List of configurations now shows when the configuration was created (with exact date on hover).
  • Cleaned up Job list (removed unused stuff).
  • Backend in Transformations is now preselected to default project backend.
  • CSV importer now supports .TSV and .GZ files too.
  • "Test Credentials" button and result polished.
  • Removed 'sticky' buttons when editing configurations.

Fixes

  • Fixed files input mapping not showing in certain configurations.
  • Fixed a couple of errors with the Save button when editing transformation queries.
  • Fixed bug in Redshift writer with mixed case table names.

Developers

  • The encryption API is simplified to a single endpoint. If you manually encrypt values for KBC, please follow the docs. We will be contacting you if you are using the deprecated API calls.


New Generic Extractor Documentation

Long neglected, long awaited, it's here!

We are done with a major overhaul of our Generic Extractor documentation. It is now completely moved to developers.keboola.com. The new documentation contains over 100 runnable examples. We have also created a tutorial which guides you through how to configure the Generic Extractor for a new API.

We also have a configuration map with links to the documentation of all parts of the configuration.

In case you haven't heard of the Generic Extractor, it is our universal REST API client. It reads data (in JSON format) from an API according to how you configure it, then converts the results to CSV files and imports them into KBC Storage. This way you can extract data from new APIs in tens of minutes. You can use the Generic Extractor to create a new extractor or perform ad-hoc extractions from an API. 
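To make the JSON-to-CSV step more concrete, here is a minimal Python sketch of the general idea. It is not the Generic Extractor's actual code: the sample response, the `flatten` and `json_to_csv` helpers and the underscore-joined column names are assumptions made up for illustration, and the HTTP request and the import into KBC Storage are left out.

```python
import csv
import io
import json

# Hypothetical API response; a real API would return this over HTTP.
RESPONSE_BODY = (
    '[{"id": 1, "name": "Alice", "address": {"city": "Prague"}},'
    ' {"id": 2, "name": "Bob", "address": {"city": "Brno"}}]'
)


def flatten(record, prefix=""):
    """Flatten nested objects into one level, joining keys with '_'."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def json_to_csv(body):
    """Convert a JSON array of objects into CSV text."""
    rows = [flatten(record) for record in json.loads(body)]
    # Collect columns in first-seen order so records with different keys still fit.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return output.getvalue()


if __name__ == "__main__":
    print(json_to_csv(RESPONSE_BODY))
```

In the Generic Extractor itself you describe this behavior in the configuration instead of writing code.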

If you are interested, give the tutorial a try.

Week in Review -- March 6, 2017

Minor Improvements

  • We have upgraded the Google Sign-in to the latest, most secure version.

Fixes

  • Dropping an empty primary key in Storage no longer causes an error.

  • Creating a Storage table with a missing S3 file now returns a proper error message.

  • Supplying a primary key as an array to a Storage table is now handled correctly.


AWS S3 Outage Resolved

Amazon services are almost restored. Our infrastructure has recovered too, and everything should be running now, though waiting times are expected to be longer than usual.

We're sorry for this inconvenience and thanks a lot for your understanding and patience.

Call for testers: RStudio and Jupyter Sandboxes

We have been working on RStudio and Jupyter Notebook sandboxes (these are the missing sandboxes for R and Python transformations) for some time and we now feel that it is time to move to the next phase. 

We can now enable these on a per-user basis for early adopters. If you are interested in playing with these Sandboxes please contact us at support@keboola.com (don't forget to tell us your KBC email).

Limitations:

  • Sandboxes are available only as plain sandboxes (not in a transformation bucket yet)
  • HTTPS is not yet supported.
  • Sandbox disk space is limited to 10GB.
  • Memory is limited to 8GB.
  • The UI allows loading only tables to the Sandbox. Loading input files and transformation script is supported only by the API.
  • Sandboxes will be deleted after 5 days.
  • Adding data to existing sandboxes is not supported.

Otherwise, it's like RStudio connected to your KBC project:

The CSV file locations are the same as in transformations (/data/in/tables/). In the meantime, we will work on:

  • HTTPS Support.
  • Creating better UI and integration with transformations.
  • Allowing tables and files to be added to a running sandbox.
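As a rough illustration of working with the input tables in a Jupyter sandbox, the following Python sketch lists and reads the CSV files in the directory mentioned above. Only the `/data/in/tables/` path comes from this post; the `load_tables` helper and the assumption that the files end in `.csv` are ours, so adjust them to what you actually find in your sandbox.

```python
import csv
from pathlib import Path


def load_tables(tables_dir="/data/in/tables"):
    """Return {file name: list of row dicts} for every CSV file in tables_dir."""
    directory = Path(tables_dir)
    if not directory.is_dir():
        # Nothing to load, e.g. when running outside of a sandbox.
        return {}
    tables = {}
    for path in sorted(directory.glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            tables[path.name] = list(csv.DictReader(handle))
    return tables


if __name__ == "__main__":
    for name, rows in load_tables().items():
        print(name, len(rows), "rows")
```

All values are read as strings, so convert numeric columns yourself before doing any arithmetic on them.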


Week in review -- December 27, 2016

Hello everyone,

Last week we updated the R environment for both custom science and transformations to version 3.3.2.

Apart from that, not much happened last week because our office was seized:


If everything goes as planned, this should be our last post of the year. So we wish you a Happy New Year!

Job failures


There were job failures between 4 AM and 9 AM UTC caused by one of our servers dying a horribly slow and painful death.

We're sorry for the inconvenience; we have restarted the affected orchestrations.


New Currency Extractor

We have created a new currency exchange rate extractor. This component allows you to extract currency exchange rates as published by the European Central Bank (ECB). Its configuration is dead simple: all you need to do is select the source currency (USD and EUR are currently supported).


Migration

This extractor provides the same data in the same format as the old extractor (which had to be configured through support). We encourage you to switch to the new one - set it up as any other extractor. We will keep running the old extractor till January 2017. Just let us know once you don't need it, so that we can deactivate it and stop magically pushing data to your project.

As in the old currency extractor, the resulting data contains gaps for bank holidays (including weekends). If you would like to fill the gaps with the last known value, you can use a little SQL script we hacked together.
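If you prefer to fill the gaps outside of SQL, the same idea can be sketched in Python. This is not the SQL script mentioned above, only a small forward-fill illustration; the `fill_gaps` helper and the sample rates are made up for the example.

```python
from datetime import date, timedelta


def fill_gaps(rates):
    """Fill missing days in {date: rate} with the last known rate."""
    if not rates:
        return {}
    filled = {}
    day, last_day = min(rates), max(rates)
    last_known = None
    while day <= last_day:
        if day in rates:
            last_known = rates[day]
        filled[day] = last_known
        day += timedelta(days=1)
    return filled


if __name__ == "__main__":
    # Made-up sample values: Friday and Monday are published, the weekend is missing.
    sample = {date(2016, 12, 2): 1.05, date(2016, 12, 5): 1.07}
    for day, rate in fill_gaps(sample).items():
        print(day.isoformat(), rate)
```

The sketch only fills gaps between the first and the last published day; it does not extend the series past the last known rate.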

Weeks in Review -- October 19, 2016

Even though the weekly status has been a little weak recently, it does not mean that we're dawdling. We are working on some quite big internal things, which take much more than a week. For example (in no particular order):

  • Developer Portal, which will enable 3rd party developers to manage their applications.
  • Keboola Connection in EU Region.
  • OpenRefine transformations.
  • Validation and simplification of input and output mapping.
  • Internal changes to transformations and sandboxes using so-called Workspaces.
  • RStudio and Jupyter Sandboxes.
  • Collecting more statistics about running jobs.
  • New Redshift, Oracle and MSSQL Writers and DynamoDB, Facebook and S3 Extractors.
  • Support of LuckyGuess applications (e.g. Anomaly Detection) on Snowflake.
  • Shared buckets between projects (to replace copying data via Restbox).
  • Resolving problem with ever growing tables on Redshift.
  • Plus we fixed the annoying problem that yelled "Table already exists" when loading data into sandbox.

All of the above are in various states of completeness (except the last item, which is complete). So stay tuned for more announcements when these features are finished.


Custom Science

In the fall of last year we introduced Custom Science applications at the Keboola Meetup; the feature was exactly one day old at the time. Since then we have added a lot of improvements (support for private repositories, encryption, error handling and documentation) and fixed many bugs. We have now also improved the UI, so this post is a summary of what is currently possible.

Custom Science is the easiest way ever to integrate arbitrary code (in R and Python and now also in PHP) into your KBC projects. To create a CS application, you just need to provide a Git repository which contains the code to transform your data. We will take care of the rest.

Sounds similar to transformations? Good, because it really is, but with some awesome twists. With custom science:

  • you can quickly integrate a 3rd party application or API
  • the developer of the application does not need to have access to your project 
  • you do not need to have access to the application code
  • you have much more freedom in the code organization

What is it good for?

The point of all this is that you can hire a 3rd party developer, point him to our documentation and let him do his work. When he is done, you can integrate his application with a few clicks. We see this as the ultimate connection between you and hardcore Ph.D. data scientists who have the algorithms you're longing for (but do not have the infrastructure to run them in a usable form). We care about protecting you: those developers do not need access to your project. If you are really concerned, then you can also isolate the application completely from any network. We also care about protecting the developers: if they share a git repository with you, they can use a private repository without sharing the password with you. We don't really want to interrupt anyone's business and we try to stay out of the way as much as possible, so what the application does is entirely up to the agreement between you and the 3rd party.

You might also consider using custom applications for your own transformation code. We have long been aware that some of your transformations are really complicated. You can now take a complex transformation, split it into several files or classes (or whatever you want), and run tests on it. Again, your code can be stored in private git repositories and be protected if you want. This way, you can also share transformation code between multiple KBC projects.

Differences to transformations

  • To create transformation code, you need access to a KBC project; to create Custom Science code, you don't.
  • Transformation code is accessible to anyone in the project; Custom Science code can be hidden.
  • Transformation code must be stacked into a single script; Custom Science code can be organized freely.
  • Transformations are tied to a project; Custom Science code is in a separate repository and can be shared between projects.
  • Transformations are versioned as changes in the configuration in the KBC project; Custom Science is versioned using tags in a git repository.
  • Transformations should have input and output; Custom Science does not need to, so it can take the role of extractors or writers.
  • Transformations have no parameters; Custom Science can be parametrized.
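To give a feeling for what "parametrized" can mean in practice, here is a minimal Python sketch of code that reads its settings from a JSON configuration file. The file layout, the `parameters` key, the `multiplier` setting and both helper functions are assumptions made up for this illustration; please check the Custom Science documentation for how parameters are actually passed to your application.

```python
import json
from pathlib import Path


def load_parameters(config_path):
    """Read the 'parameters' object from a JSON configuration file.

    The file layout is an assumption for this sketch, not a documented contract.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return config.get("parameters", {})


def apply_multiplier(values, parameters):
    """Multiply every value by the hypothetical 'multiplier' parameter (default 1)."""
    multiplier = parameters.get("multiplier", 1)
    return [value * multiplier for value in values]
```

The same code can then run in several projects with different settings, without touching the repository.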

Q & A

Why is it called Custom Science?

Because it is designed to connect a 3rd party (data) scientist with you, so that he can provide you with Science customized to your needs.

Will I have to rewrite all Transformations to Custom Science?

Certainly not. Transformations are here to stay. Custom Science is another option, not a replacement.

Will Custom Science be limited to R, Python and PHP?

It depends on demand. If you require another language, let us know. So far we have received requests for R, Python 3.x, Python 2.x and PHP, so we have those.

What are the differences in the code between Transformations and Custom Science?

Almost none. There are minor differences in handling packages (they are installed automatically in R/Python applications and have to be installed manually in CS) and in handling file inputs.

I made a Custom Science Application, can I share it with other people?

Great, fill out the checklist and let us know that you want to register it as a KBC application. Once registered, it will be available like any other KBC component.

More reading