New Currency Extractor

We have created a new currency exchange rate extractor. This component lets you extract currency exchange rates as published by the European Central Bank (ECB). Its configuration is dead simple: all you need to do is select the source currency (USD and EUR are currently supported).


Migration

This extractor provides the same data in the same format as the old extractor (which had to be configured through support). We encourage you to switch to the new one - set it up like any other extractor. We will keep running the old extractor until January 2017. Just let us know once you no longer need it, so that we can deactivate it and stop magically pushing data to your project.

As in the old currency extractor, the resulting data contains gaps for bank holidays (including weekends). If you would like to fill the gaps with the last known value, you can use a little SQL script we hacked together.
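If you prefer to fill the gaps in your own code instead, the same forward-fill idea is easy to express in Python with pandas. This is a minimal sketch, not the official script; the column names ("date" and "rate") are assumptions about the extracted table:

```python
import pandas as pd

def fill_rate_gaps(rates: pd.DataFrame) -> pd.DataFrame:
    """Reindex to a daily calendar and carry the last known rate forward.

    Assumes a "date" column (parseable dates) and a "rate" column.
    """
    rates = rates.copy()
    rates["date"] = pd.to_datetime(rates["date"])
    rates = rates.set_index("date").sort_index()
    # Build a gap-free daily calendar covering the whole range,
    # then forward-fill weekends and bank holidays.
    daily = rates.reindex(
        pd.date_range(rates.index.min(), rates.index.max(), freq="D")
    )
    daily["rate"] = daily["rate"].ffill()
    return daily.reset_index().rename(columns={"index": "date"})
```

For example, a table containing only Friday and Monday rates comes back with Saturday and Sunday rows carrying Friday's value.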

Weeks in Review -- October 19, 2016

Even though the weekly status has been a little thin recently, that does not mean we're dawdling. We are working on some quite big internal things, which take much more than a week. For example (in no particular order):

  • Developer Portal, which will enable 3rd party developers to manage their applications.
  • Keboola Connection in the EU Region.
  • OpenRefine transformations.
  • Validation and simplification of input and output mapping.
  • Internal changes to transformations and sandboxes using so-called Workspaces.
  • RStudio and Jupyter Sandboxes.
  • Collecting more statistics about running jobs.
  • New Redshift, Oracle and MSSQL Writers and DynamoDB, Facebook and S3 Extractors.
  • Support of LuckyGuess applications (e.g. Anomaly Detection) on Snowflake.
  • Shared buckets between projects (to replace copying data via Restbox).
  • Resolving a problem with ever-growing tables on Redshift.
  • Plus we fixed the annoying problem that yelled "Table already exists" when loading data into a sandbox.

All of the above are in various states of completeness (except the last item, which is complete). So stay tuned for more announcements as these features are finished.


Custom Science

In the fall of last year we introduced Custom Science applications at the Keboola Meetup; they were exactly one day old at that time. Since then we have worked on a lot of improvements (support for private repositories, encryption, error handling and documentation) and many bug fixes. We have now also improved the UI, so this post is a summary of what is currently possible.

Custom Science is the easiest way ever to integrate arbitrary code (in R, Python, and now also PHP) into your KBC projects. To create a CS application, you just need to provide a Git repository containing the code to transform your data; we will take care of the rest.

Sounds similar to transformations? Good, because it really is - but with some awesome twists. With Custom Science:

  • you can quickly integrate a 3rd party application or API
  • the developer of the application does not need access to your project
  • you do not need access to the application code
  • you have much more freedom in how the code is organized

What is it good for?

The point of all this is that you can hire a 3rd party developer, point them to our documentation and let them do their work. When they are done, you can integrate their application with a few clicks. We see this as the ultimate connection between you and hardcore Ph.D. data scientists who have the algorithms you're yearning for (but who do not have the infrastructure to run them in a usable form). We care about protecting you: those developers do not need access to your project, and if you are really concerned, you can also isolate the application completely from any network. We also care about protecting the developers: if they share a Git repository with you, it can be a private repository, without sharing the password with you. We don't really want to interrupt anyone's business, and we try to stay out of the way as much as possible, so what the application does is entirely up to the agreement between you and the 3rd party.

You might also consider using custom applications for your own transformation code. We have long been aware that some of your transformations are really complicated. You can now take a complex transformation, split it into several files or classes (or whatever you want), and run tests on it. Again, your code can be stored in private Git repositories and be protected if you want. This way, you can also share transformation code between multiple KBC projects.

Differences to transformations

  • To create transformation code, you need access to a KBC project; to create Custom Science code, you don't.
  • Transformation code is accessible to anyone in the project; Custom Science code can be hidden.
  • Transformation code must be stacked into a single script; Custom Science code can be organized freely.
  • Transformations are tied to a project; Custom Science code lives in a separate repository and can be shared between projects.
  • Transformations are versioned as changes in the configuration in the KBC project; Custom Science is versioned using tags in a Git repository.
  • Transformations should have input and output; Custom Science does not need to, so it can take the role of an extractor or writer.
  • Transformations have no parameters; Custom Science can be parametrized.

Q & A

Why is it called Custom Science?

Because it is designed to connect a 3rd party (data) scientist with you, so that they can provide you with Science customized to your needs.

Will I have to rewrite all Transformations to Custom Science?

Certainly not. Transformations are here to stay. Custom Science is another option, not a replacement.

Will Custom Science be limited to R and Python and PHP?

It depends on demand. If you require another language, let us know. So far we have received requests for R, Python 3.x, Python 2.x and PHP, so we support those.

What are the differences in the code between Transformations and Custom Science?

Almost none. There are minor differences in handling packages (they are installed automatically in R/Python transformations but have to be installed manually in Custom Science) and in handling file inputs.

I made a Custom Science Application, can I share it with other people?

Great! Fill out the checklist and let us know that you want to register it as a KBC application. Once registered, it will be available like any other KBC component.


End of life of VPNs and Fixed IPs

Many current database extractor configurations are set up using fixed IPs and VPN connections. We think it's time to move to the 21st century and use SSH tunnels (yes, those were designed in the '90s too). Therefore, effective August 2016, we will no longer offer fixed IPs and VPN setups to new customers.

Why?

Because there are never-ending problems. IPs which are not supposed to change do change. VPNs are prone to outages. Servers get restarted unexpectedly. It is often very complicated to identify the source of connection problems, among many other issues. Overall, these setups are not up to our reliability and traceability standards.

Migration

We will support all of you - existing customers - until August 2017. This should give you enough time to migrate to SSH tunnels. We recommend that you (or your system administrator) read our guide to setting up an SSH tunnel. If you have any questions, do not hesitate to contact us.

Occasional Job failures

We are experiencing some issues with our infrastructure which cause occasional job failures. The jobs fail with the generic message "Internal error occurred ...". We have identified the issue and have already applied a fix.

We are currently monitoring the situation closely and have restarted the failed orchestrations. If you encounter the issue, restart the failed job; the failure is transient.


Job failures

Today between 12:13 and 13:43 we had an error in one of our components, which caused some jobs to fail with the message:

Running container exceeded the timeout of 3600 seconds.

even though the job did not in fact exceed the timeout. Failed orchestrations have been restarted. We are really sorry about this.

Week In Review - May 16, 2016

Last week was very rich in new features and stuff, so this week is a little lighter for a change.

New features

- If you have a limited project, we will now send you an email notification a week before the project expires.

- We have updated R in Transformations and Custom Science to R version 3.2.5 (April 2016).


Bug fixes

- Snapshots of Redshift tables with non-lowercase primary keys are now working correctly.

- Project Backup and Takeout now exports configuration rows (e.g. transformation queries) and works for large projects too.


Other posts this week

Redshift Transformation Input Mapping Update

Orchestrator table deletion announcement

Python Transformations

We have added Python support to our Transformation engine. Python is a handy and versatile programming language with a lot of useful libraries; the SciPy stack may be particularly interesting. All Python transformations run in our public Docker image with Python 3.5.1 and have an 8 GB memory limit.

The interface is very similar to the existing R transformations. You start by setting the input and output mapping in the UI. The tables from the input mapping will be created as CSV files in the in/tables directory, and result CSV files from the out/tables directory will be uploaded to your project Storage. All your Python code has to do is read the CSV files, do some magic and then write a CSV file.
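As an illustration, a transformation script following that contract might look like the sketch below. The file names (source.csv, result.csv) and the column names are hypothetical; only the in/tables and out/tables conventions come from the interface described above:

```python
import csv

def transform(src_path: str, dst_path: str) -> None:
    """Read the input CSV, add a derived column, write the output CSV."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        # Keep all input columns and append one derived column.
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + ["double"])
        writer.writeheader()
        for row in reader:
            # The "magic" - here just doubling an assumed "value" column.
            row["double"] = str(int(row["value"]) * 2)
            writer.writerow(row)

# In a transformation you would call it with the mapped paths:
# transform("in/tables/source.csv", "out/tables/result.csv")
```

You would map your Storage table to in/tables/source.csv in the input mapping, and map out/tables/result.csv back to Storage in the output mapping.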

If you need some packages from PyPI, you can list them in the packages section of the UI. By the way, the SciPy stack is installed by default.

If you are interested in writing Python transformations, we have an introductory article in the documentation with examples showing how to work with the input and output files.