Elasticsearch Failure - Components not working

Our Elasticsearch Syrup cluster is not responding. This cluster stores information about all components' jobs. Storage works fine, but the rest of the system has come to a halt. We're investigating this issue.

UPDATE 10:30pm PST / 7:30am CEST: The cluster is back online and all operations have been resumed or restarted. We're sorry for any inconvenience.

Storage API Client For R

Want to play with your KBC data in your local R environment?

Install the keboola-sapi-r-client and you can.
(The package is on GitHub, so it is installed via the devtools package.)

install.packages("devtools")
library(devtools)

We need to install a GitHub dependency for AWS request signature generation:

devtools::install_github("cloudyr/aws.signature")

Now we can install the Storage API client and load it into our R session:

devtools::install_github("keboola/sapi-r-client")
library(keboola.sapi.r.client)

Just like any other R package, once installed, it can be invoked in any future session with the library() command.

To instantiate the client, just give it a KBC token. We'll use the token for the currency exchange rates data for demonstration purposes.

client <- SapiClient$new('452-33945-de5bb7fecb818901f0834b2431564003296a4b05')

Now we can import data into our R session:

currencyData <- client$importTable('in.c-ex-currency.rates')
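
The imported table behaves like an ordinary R data frame (the column names below come from the currency rates table used in this example), so the usual inspection functions apply:

# quick look at what we've just imported
head(currencyData)
str(currencyData)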

Just for fun, let's make a simple plot of EUR vs USD using the ggplot2 library. If it's not installed, run install.packages("ggplot2") first.

# prepare our data
eurVsUsd <- currencyData[which(currencyData$toCurrency == "USD"),]
eurVsUsd$date <- as.Date(eurVsUsd$date)

# load the libraries needed to make our plot
library(ggplot2)
library(scales) # for prettier x-axis labeling

p <- ggplot(eurVsUsd, aes_string(x="date", y="rate")) + geom_point()
# add x-axis scaling and title
p <- p + scale_x_date(breaks="1 year", labels=date_format("%Y"))
p <- p + ggtitle("EUR vs USD")
print(p)
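
If you want to keep the chart, ggplot2's ggsave() will write it to disk; the filename below is just an example.

# save the plot as a PNG in your working directory
ggsave("eur_vs_usd.png", plot = p, width = 8, height = 5)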

The code for this sample is available in this gist.

The Storage API client gives you full read and write access to your KBC project from the comfort of your local R environment.

Imagine the possibilities!

* small print * This is a development tool in beta; use at your own risk!

MySQL Transformation input mapping size limit

We'll be introducing a limit on the size of tables imported into MySQL transformations.

Why? Processing large tables in MySQL is very inefficient and slow, and it also negatively affects other users in the shared MySQL environment. To ensure a smooth experience for everyone, we'll be pushing all large transformations to a faster backend (Redshift, and possibly others in the future).

This is in addition to the query time limit, which focuses on (accidentally) unoptimized queries.

There will be two limits. The lower soft limit will warn you that you're exceeding it but won't stop the transformation. The higher hard limit will stop the transformation immediately. The soft limit is just a warning that you're processing a larger amount of data; you only need to act if you're getting close to the hard limit.

What to do if you're exceeding the limit? There are a few easy ways to stay within these limits:

  • Incremental processing. Set up your pipeline to run incrementally so it doesn't process all data on every run. The limit measures only the transferred data, not the whole table size.
  • Move the transformation to Redshift, along with the relevant storage buckets. There are no such limits on Redshift, and it's just way faster.

The soft limit is already in place and is set at 2 GB (2147483648 bytes). You can find the warnings in your Event list by searching for "We recommend using Redshift for tables larger than 2147483648 bytes."

The hard limit will be introduced on July 1st and will be set at 5 GB (5368709120 bytes). On June 1st, before this policy comes into effect, we will notify all affected users and will try to help them find a feasible solution.
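
Both byte counts correspond to binary gigabytes; a quick check in R:

# 1 GB = 1024^3 bytes
2 * 1024^3  # 2147483648 bytes -- soft limit
5 * 1024^3  # 5368709120 bytes -- hard limit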

Infrastructure issues

We are investigating infrastructure issues affecting most extractors and writers. Thanks for your patience.

UPDATE 0:33am PST / 9:33am CEST: The issue has been resolved and we'll try to restart all failed orchestrations. If we miss anything, please feel free to restart it yourself. Sorry for any inconvenience.

GoodData Writer issues

We have fixed multiple errors in the handling of the GoodData maintenance that took place on Saturday, which could cause some Writer jobs to fail. All problems have been resolved and shouldn't appear again. We apologize for any inconvenience.

Docker bundle enhancements

We're excited to announce new features in Docker bundle. 

For those who don't know, Docker bundle is a component that allows anyone to run apps encapsulated in Docker inside Keboola Connection.

Streaming Logs

If your app writes to stdout or stderr, these logs are immediately forwarded to Storage API Events, so you can report important events in your app live.
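
In practice, that means your app doesn't need any special logging client. A minimal sketch in R (any language behaves the same way, since only the standard streams matter):

# written to stdout -- forwarded to Storage API Events while the job runs
cat("Processed input tables, starting export\n")
# written to stderr -- forwarded as well
message("Input file was empty, skipping")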

More about streamed logs in the documentation.

Incremental File Processing

In a scenario where you're processing an unknown number of files on a regular basis, incremental file processing comes in handy. Successfully processed files get tagged and are excluded from the next run.

More about incremental file processing in the documentation.

Development and troubleshooting API calls

We added sandbox, input and dry-run API calls to Docker bundle. They are similar to their counterparts in the Transformation API and allow you to:

  • prepare the data and serialized configuration file for your application before you start developing it, so you don't have to prepare the folder structure manually (sandbox)
  • see exactly what data comes into your application (input)
  • see both the data input and output of your app (dry-run)

The data is compressed in a ZIP file and stored in File Uploads in the given project.

More about these API calls in the documentation.

Want to know more, or interested in developing your own apps in KBC? Read more in the documentation or get in touch at support@keboola.com.