Failing GoodData Writer uploads

After the last GoodData maintenance, some uploads to GoodData using our writer started failing with an error.

Running the LoadData task again manually will succeed. If you're running multiple tasks in one job, you can disable the failing task to get all the other tasks through. We have discussed the error with GoodData support, and this is what they had to say:

It has already been confirmed by our R&D engineers that we introduced the issue during the R119 release. We have prepared a fix for the issue, and the internal discussion about its delivery has already been triggered. Unfortunately, the fix requires a short (up to 10 minutes) outage of the GoodData upload subsystems, so our responsible managers are trying to find the best window for its delivery.

I can also confirm that temporarily switching OFF gzip in the command would serve as a workaround, but I also understand the concerns you mention that keep you from using it.

I'll share further information with you as soon as I have it available.

Currently we cannot turn off gzipped transfers for all projects, as it would slow down all uploads. Let's wait to see whether GoodData can release the fix during the weekend; if not, we'll temporarily disable gzip for the affected projects early next week. Please bear with us, and feel free to drop us a line at support@keboola.com if you're concerned about your project.

UPDATE 4:08pm CEST: GoodData has announced maintenance for August 20, 2016. This bug should be resolved during that maintenance.

UPDATE Monday August 22 7:30am CEST: We are still experiencing this issue, though only rarely.

UPDATE Monday August 22 2:40pm CEST: We have deployed a workaround for the bug, so GoodData data loads should now work as expected.

We're sorry for this inconvenience.

Revamped Google Analytics Extractor

A new version of the Google Analytics Extractor is now available. The old one has been marked as deprecated.

Features

The Google Analytics Extractor now works with the newest Google Analytics API V4 and provides these key features:

  • Metrics expressions

    The API allows you to request not only built-in metrics but also combinations of metrics expressed as mathematical operations. For example, you can use the expression ga:goal1Completions/ga:sessions to request goal completions per session.

  • Multiple date ranges

    The API allows you to get data from two date ranges in a single request.

  • Multiple segments

    The API enables you to get data from multiple segments in a single request.

Read more in our documentation.
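
To give you a taste of what the new API makes possible, here is a rough Python sketch of a Reporting API V4 request that combines a metric expression, two date ranges and multiple segments. It uses the google-api-python-client library and assumes you already have OAuth credentials and a view ID; the extractor builds equivalent requests for you, so this is for illustration only.

# Illustration only: a Reporting API V4 request combining a metric
# expression, two date ranges and multiple segments.
# Assumes `credentials` (OAuth2) and the view ID are already available.
from googleapiclient.discovery import build

analytics = build("analyticsreporting", "v4", credentials=credentials)

response = analytics.reports().batchGet(body={
    "reportRequests": [{
        "viewId": "ga:123456789",  # placeholder view ID
        # two date ranges in a single request
        "dateRanges": [
            {"startDate": "2016-07-01", "endDate": "2016-07-31"},
            {"startDate": "2016-06-01", "endDate": "2016-06-30"},
        ],
        # a metric expression: goal completions per session
        "metrics": [{"expression": "ga:goal1Completions/ga:sessions",
                     "alias": "goal_completions_per_session"}],
        # the ga:segment dimension is required when requesting segments
        "dimensions": [{"name": "ga:segment"}],
        # two built-in segment IDs in a single request
        "segments": [{"segmentId": "gaid::-1"}, {"segmentId": "gaid::-3"}],
    }]
}).execute()

print(response["reports"][0]["data"]["totals"])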

We have also improved the UI of this extractor: you can now choose available metrics, dimensions and segments from dropdown selectors and immediately see the results of your query.

Migration

To help you migrate your configuration to the new version of this extractor, we have prepared a migration tool, which will be available very soon.
The old extractor is now deprecated, but it will remain functional until December 2016.

We are always glad to receive feedback from our users, so if you have any questions or ideas on how to improve this component, don't hesitate to contact us.

Week in Review -- August 15, 2016

Since our last update, here's what has happened in Keboola Connection:

Bugfixes, minor changes

  • Facebook Ads extractor was updated to API v2.7

IBM dashDB Writer

It's my pleasure to introduce another component to the Keboola Connection ecosystem: the IBM dashDB Writer.

IBM dashDB is a new-generation analytical data warehouse. It is built on the core of DB2, powered by in-memory and columnar storage, and enhanced with in-database analytical functions migrated from IBM Netezza. You will find more in the documentation.

The current version of the writer supports transferring objects from Keboola Storage into dashDB tables; at the moment, only full loads are supported. Even so, you should have no problem transferring larger files in a reasonable amount of time.

This writer is developed independently by Radek Tomášek with huge support from IBM. For more information on how to use the writer, please refer to the documentation.

Extended Versions Management

We've recently introduced configuration versions management in transformations: http://status.keboola.com/transformation-versions-management

Now we've added a diff feature which allows comparison of two adjacent versions, i.e., the differences between two configuration updates. For example, you can now track and compare changes to SQL queries in your transformations or changes to other parameters (input/output mapping).

Moreover, we have extended this feature to other components, such as all database extractors and all generic components, and more are coming soon.


Feel free to try it and let us know how you like it :)

New Storage API Importer

We have launched a new version of the Storage API Importer, which replaces the old one running at https://syrup.keboola.com/sapi-importer/run.

The Storage API Importer simplifies the whole process of importing a table into Storage. It allows you to make an HTTP POST request and import a file directly into a Storage table.

The HTTP request must contain the tableId and data form fields. Therefore, to upload the new-table.csv file into the new-table table in the in.c-main bucket (replacing its contents), call:

curl --request POST --header "X-StorageApi-Token:storage-token" --form "tableId=in.c-main.new-table" --form "data=@new-table.csv" "https://import.keboola.com/write-table"
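
If you call the Importer from code rather than from the command line, the same request might look like this in Python with the requests library (the token and file name are placeholders):

# A Python sketch of the same request, using the requests library.
# "storage-token" and new-table.csv are placeholders.
import requests

with open("new-table.csv", "rb") as csv_file:
    response = requests.post(
        "https://import.keboola.com/write-table",
        headers={"X-StorageApi-Token": "storage-token"},
        data={"tableId": "in.c-main.new-table"},
        files={"data": ("new-table.csv", csv_file)},
    )

response.raise_for_status()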

The new Importer runs as a standalone service, which gives us more control over scaling, stability and performance.

Read more details in the New Storage API Importer documentation.

Migration

The services are fully compatible; the only difference is in the hostname and path. That means all you need to do is replace https://syrup.keboola.com/sapi-importer/run with https://import.keboola.com/write-table in your scripts.

So the following script:

curl --request POST --header "X-StorageApi-Token:storage-token" --form "tableId=in.c-main.new-table" --form "data=@new-table.csv" "https://syrup.keboola.com/sapi-importer/run"

will become:

curl --request POST --header "X-StorageApi-Token:storage-token" --form "tableId=in.c-main.new-table" --form "data=@new-table.csv" "https://import.keboola.com/write-table"

We will support the old service until December 2016. All customers using the old service will be notified soon.

Not sure whether you are using the old Importer service?

You can check this with the following steps:
  • Open the Storage page in your project
  • Search for component:sapi-importer in the events
  • If there are no results, you are not using the old service. If you see events from this component, you are using the old service.

Custom Science

Last fall, we introduced Custom Science applications at the Keboola Meetup; the feature was exactly one day old at that time. Since then, we have worked on a lot of improvements (support for private repositories, encryption, error handling and documentation) and many bug fixes. We have now also improved the UI, so this post is a summary of what is currently possible.

Custom Science is the easiest way ever to integrate arbitrary code (in R and Python, and now also in PHP) into your KBC projects. To create a Custom Science application, you just need to provide a Git repository containing the code to transform your data; we will take care of the rest.

Sounds similar to transformations? Good, because it really is, but with some awesome twists. With Custom Science:

  • you can quickly integrate a 3rd party application or API
  • the developer of the application does not need to have access to your project 
  • you do not need to have access to the application code
  • you have much more freedom in the code organization

What is it good for?

The point of all this is that you can hire a 3rd party developer, point them to our documentation and let them do their work. When they are done, you can integrate their application with a few clicks. We see this as the ultimate connection between you and hardcore Ph.D. data scientists who have the algorithms you're longing for (but not the infrastructure to run them in a usable form). We care about protecting you, because those developers do not need access to your project; if you are really concerned, you can also isolate the application completely from any network. We also care about protecting the developers: if they share a git repository with you, it can be a private repository, without sharing the password with you. We don't really want to interrupt anyone's business and we try to stay out of the way as much as possible, so what the application does is entirely up to the agreement between you and the 3rd party.

You might also consider using custom applications for your own transformation code. We have long been aware that some of your transformations are really complicated. You can now take a complex transformation, split it into several files or classes (or whatever you want), and run tests on it. Again, your code can be stored in private git repositories and protected if you want. This way, you can also share transformation code between multiple KBC projects.
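
To give you an idea of how little is needed, here is a rough sketch of what a minimal Python Custom Science repository could contain: a single main.py that reads a table from the input mapping and writes a transformed table to the output mapping. The file names are placeholders defined by your input/output mapping, and the /data folder layout follows the convention used by our Docker-based applications; see the documentation for the exact interface.

# main.py -- a minimal Python Custom Science sketch.
# source.csv and result.csv are placeholders defined by the
# input/output mapping; /data/in and /data/out follow the Docker
# application convention described in the documentation.
import csv

with open("/data/in/tables/source.csv", encoding="utf-8") as src, \
        open("/data/out/tables/result.csv", "w", encoding="utf-8", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + ["row_length"])
    writer.writeheader()
    for row in reader:
        # example transformation: add a column with the total number of
        # characters in the row
        row["row_length"] = sum(len(value) for value in row.values())
        writer.writerow(row)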

Differences to transformations

  • To create transformation code, you need access to a KBC project; to create Custom Science code, you don't.
  • Transformation code is accessible to anyone in the project; Custom Science code can be hidden.
  • Transformation code must be packed into a single script; Custom Science code can be organized freely.
  • Transformations are tied to a project; Custom Science code lives in a separate repository and can be shared between projects.
  • Transformations are versioned as changes in the configuration in the KBC project; Custom Science is versioned using tags in a git repository.
  • Transformations should have an input and an output; Custom Science does not need to, so it can take on the role of an extractor or a writer.
  • Transformations have no parameters; Custom Science can be parametrized.

Q & A

Why is it called Custom Science?

Because it is designed to connect a 3rd party (data) scientist with you, so that he can provide you with Science customized to your needs.

Will I have to rewrite all Transformations to Custom Science?

Certainly not. Transformations are here to stay. Custom Science is another option, not a replacement.

Will Custom Science be limited to R, Python and PHP?

It depends on demand. If you require another language, let us know. So far, we have received requests for R, Python 3.x, Python 2.x and PHP, so we support those.

What are the differences in the code between Transformations and Custom Science?

Almost none; there are minor differences in handling packages (they are installed automatically in R/Python transformations and have to be installed manually in Custom Science) and in handling file inputs.

I made a Custom Science application. Can I share it with other people?

Great! Fill out the checklist and let us know that you want to register it as a KBC application. Once registered, it will be available like any other KBC component.

End of life of VPNs and Fixed IPs

Many current configurations of database extractors are set up using fixed IPs and VPN connections. We think it's time to move to the 21st century and use SSH tunnels (yes, those were designed in the 90s too). Therefore, effective August 2016, we won't be offering fixed IP and VPN setups to new customers.

Why?

Because these setups cause never-ending problems: IPs that are not supposed to change do change, VPNs are prone to outages, and servers get restarted unexpectedly. It is sometimes very complicated to identify the source of connection problems, among many other issues. Overall, these setups are not up to our reliability and traceability standards.

Migration

We will support all of you - existing customers - until August 2017. This should give you enough time to migrate to SSH tunnels. We recommend that you (or your system administrator) read our guide to setting up an SSH tunnel. If you have any questions, do not hesitate to contact us.
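
If you are not familiar with the concept, an SSH tunnel simply forwards a local port to your database through an SSH server you control. The following rough sketch uses the third-party Python sshtunnel package with placeholder host names and credentials; it only illustrates the idea, and the actual setup for our database extractors is described in the guide.

# Concept illustration only: forward a local port to a database
# through an SSH server, using the third-party sshtunnel package.
# Host names, user and key path are placeholders.
from sshtunnel import SSHTunnelForwarder

with SSHTunnelForwarder(
    ("ssh-bastion.example.com", 22),                   # SSH server you control
    ssh_username="tunnel-user",
    ssh_pkey="/home/tunnel-user/.ssh/id_rsa",          # key-based auth, no passwords
    remote_bind_address=("database.internal", 3306),   # your database
    local_bind_address=("127.0.0.1", 3306),
) as tunnel:
    # while the tunnel is open, anything connecting to 127.0.0.1:3306
    # reaches database.internal:3306 through the SSH server
    print("Tunnel is up on local port", tunnel.local_bind_port)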

Occasional Job failures

We have been experiencing some issues with our infrastructure that cause occasional job failures. The jobs fail with the generic message "Internal error occurred ...". We have identified the issue and have already applied a fix.

We are currently monitoring the situation closely and have restarted the failed orchestrations. If you encounter the issue, the solution is to restart the failed job, because the failure is transient.