Storage API Client For R

Want to play with your KBC data in your local R environment?

Install the keboola-sapi-r-client and you can.
(The package is on GitHub, so it is installed via the devtools package.)

install.packages("devtools")
library(devtools)

First we need to install a GitHub dependency for AWS request signature generation:

devtools::install_github("cloudyr/aws.signature")

Now we can install the Storage API client and load it into our R session:

devtools::install_github("keboola/sapi-r-client")
library(keboola.sapi.r.client)

Just like any other R package, once installed, it can be invoked in any future session with the library() command.

To instantiate the client, just give it a KBC token.
We'll use the token for the currency exchange rates data for demonstration purposes.

client <- SapiClient$new('452-33945-de5bb7fecb818901f0834b2431564003296a4b05')

Now we can import data into our R session:

currencyData <- client$importTable('in.c-ex-currency.rates')

Just for fun, let's make a simple plot of EUR vs USD using the ggplot2 library
(if it isn't installed yet, run install.packages("ggplot2")).

# prepare our data
eurVsUsd <- currencyData[which(currencyData$toCurrency == "USD"),]
eurVsUsd$date <- as.Date(eurVsUsd$date)

# load the libraries needed to make our plot
library(ggplot2)
library(scales) # for prettier x-axis labeling

p <- ggplot(eurVsUsd, aes(x = date, y = rate)) + geom_point()
# add x-axis scaling and title
p <- p + scale_x_date(date_breaks = "1 year", labels = date_format("%Y"))
p <- p + ggtitle("EUR vs USD")
print(p)

The code for this sample is available in this gist.

The Storage API client gives you full read and write access to your KBC project, all from the comfort of your local R environment.
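For example, writing a data frame back to Storage might look roughly like the snippet below. This is only a sketch: the method name and arguments (assumed here to be saveTable() with a destination bucket and table name) may differ, so check the sapi-r-client documentation for the exact interface.

# write a filtered copy of the rates back to a bucket in your project
# (method name and arguments are assumptions -- verify against the package docs)
usdRates <- currencyData[which(currencyData$toCurrency == "USD"),]
client$saveTable(usdRates, 'in.c-demo', 'usd_rates')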

Imagine the possibilities!

* small print *  This is a development tool in Beta, use at your own risk!

SSL security improvement

Please review the entire post carefully to determine whether your use of the services will be affected.

As of 12:00 AM PDT on April 30, 2015, we will discontinue support for the RC4 cipher for securing connections to connection.keboola.com.

Requests made over RC4 will fail once we disable support for the cipher in Keboola Connection. To avoid interrupted access, you must update any client software (or ask your clients to update their software) that still uses the RC4 cipher to connect to our API services.

Security improvements

We're announcing a few security improvements:

  • All our client-facing servers use EV security certificates (what is EV?).
  • All our servers have disks encrypted using Amazon AWS KMS.
  • All our Elasticsearch clusters encrypt all events.
  • Amazon Redshift backends are encrypted by default. Existing customers can request to be moved to encrypted backends.
  • Storage API employs native Amazon S3 file encryption by default.
  • All our Multi-AZ RDS metadata servers have encrypted data by default.
  • New Amazon RDS servers are encrypted by default. Existing customers can request to be moved to encrypted backends.

Long story short: if you're connecting to Keboola Connection, the client-facing servers are protected by strong SSL encryption with the verified identity displayed in your browser's address bar, and all client data in Keboola Connection is encrypted by default.

Storage API news

Replace attributes

All attributes of a table or bucket can now be set with a single API call.
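As a rough illustration, here is how such a call could be made from R with the httr package. The endpoint path and body format are assumptions based on the Storage API documentation, so verify them there before relying on this; the X-StorageApi-Token header carries your KBC token.

library(httr)

# replace all attributes of a table in a single call
# (endpoint path and body format are assumptions -- see the Storage API docs)
response <- POST(
  "https://connection.keboola.com/v2/storage/tables/in.c-main.my-table/attributes",
  add_headers("X-StorageApi-Token" = "your-kbc-token"),
  body = list(
    "attributes[0][name]"  = "owner",
    "attributes[0][value]" = "jane.doe",
    "attributes[1][name]"  = "department",
    "attributes[1][value]" = "finance"
  ),
  encode = "form"
)
content(response)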

File Storage RunId

The RunId identifying a running job is now associated with all Storage API files generated by that job. Files can already be filtered by RunId: http://docs.keboola.apiary.io/#get-%2Fv2%2Fstorage%2Ffiles
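From R, you could query that endpoint with httr roughly like this. Treat it as a sketch: the query parameter name (runId) is an assumption taken from the linked documentation, so double-check it there.

library(httr)

# list Storage files created by a particular job run
# (query parameter name is an assumption -- see the linked API docs)
response <- GET(
  "https://connection.keboola.com/v2/storage/files",
  add_headers("X-StorageApi-Token" = "your-kbc-token"),
  query = list(runId = "123456789")
)
files <- content(response)
length(files)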

This is a building block for an upcoming aggregation API, which will be able to provide aggregated information about jobs, such as the amount of data transferred in and out, tables imported or created, and so on.

Table Aliases with your custom SQL

A few weeks ago, we silently launched the ability to create Storage API Aliases using your own SQL code. These Alias Tables with custom SQL can be created only on the Redshift backend.

Create a new Alias Table and define your own SQL code:

Why?

Alias Tables can help you structure your data. Think of them as "Transform on Demand" - everything happens on the fly (aka real-time). Say we have business transactions in a table called "data". Here is an example of how to define a "derived" table holding the weekly sum of all transactions that can't be joined to our Customer table (alarm, wrong data!! :-)
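An illustrative version of that SQL could look like the sketch below. The table and column names (data, customer, amount, customer_id) are placeholders invented for this example, not the actual schema.

-- weekly totals of transactions that have no matching customer record
SELECT
    EXTRACT(year FROM t."date") AS "year",
    EXTRACT(week FROM t."date") AS "week",
    SUM(t.amount)               AS total
FROM "data" t
LEFT JOIN customer c ON t.customer_id = c.id
WHERE c.id IS NULL
GROUP BY 1, 2
ORDER BY 1, 2;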

The raw result of this simple alias table:

"year","week","total"
"2014","1","1314788.27"
"2014","2","3719694.16"
"2014","3","3907852.92"
"2014","4","4013945.26"
"2014","5","3884234.84"

Thanks to the almost unlimited power of the Redshift cluster, you can also create much more complex examples. For instance, this one creates a denormalised table of transactions that occur during the night at weekends, in EUR, outside of the Czech Republic, and without one specific product code:
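Purely as an illustration, such a query might be shaped like the sketch below. All table names, column names and the product code ('XYZ') are placeholders, not the actual schema from this example.

-- weekend night-time EUR transactions outside the Czech Republic,
-- excluding one specific (placeholder) product code
SELECT t.*, c.name AS customer_name, c.country
FROM "data" t
JOIN customer c ON t.customer_id = c.id
WHERE t.currency = 'EUR'
  AND c.country <> 'CZ'
  AND t.product_code <> 'XYZ'
  AND EXTRACT(dow FROM t.created_at) IN (0, 6)   -- 0 = Sunday, 6 = Saturday in Redshift
  AND (EXTRACT(hour FROM t.created_at) >= 22 OR EXTRACT(hour FROM t.created_at) < 6);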