New Generic Extractor Documentation

Long neglected, long awaited, it's here!

We have finished a major overhaul of our Generic Extractor documentation, which has now moved completely to developers.keboola.com. The new documentation contains over 100 runnable examples. We have also created a tutorial that guides you through configuring the Generic Extractor for a new API.

There is also a configuration map with links to the documentation of every part of the configuration.

In case you haven't heard of the Generic Extractor, it is our universal REST API client. It reads data (in JSON format) from an API according to your configuration, converts the results to CSV files, and imports them into KBC Storage. This way, you can extract data from a new API in tens of minutes. You can use the Generic Extractor to build a new extractor or to perform ad-hoc extractions from an API.
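To illustrate the JSON-to-CSV step described above, here is a minimal Python sketch. It is not the Generic Extractor's actual algorithm: the function names `flatten` and `records_to_csv` are hypothetical, and joining nested keys with an underscore and serializing lists as JSON strings are simplifying assumptions made only for this example.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested JSON objects into a single flat dict.

    Assumption for this sketch: nested keys are joined with "_", and
    lists are kept as JSON strings rather than split into extra tables.
    """
    row = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            row.update(flatten(value, name))
        elif isinstance(value, list):
            row[name] = json.dumps(value)
        else:
            row[name] = value
    return row


def records_to_csv(records):
    """Convert a list of JSON objects (e.g. one API response) to CSV text."""
    rows = [flatten(record) for record in records]
    # Build the union of all columns, keeping the order in which they appear.
    columns = []
    for row in rows:
        for col in row:
            if col not in columns:
                columns.append(col)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row become empty fields
    return out.getvalue()


if __name__ == "__main__":
    response = json.loads(
        '[{"id": 1, "user": {"name": "Ann", "city": "Prague"}},'
        ' {"id": 2, "user": {"name": "Bob"}, "tags": ["a", "b"]}]'
    )
    print(records_to_csv(response))
```

Running it prints a CSV with the header `id,user_name,user_city,tags`; fields absent from a record (Bob's city, Ann's tags) are left empty, which is one reasonable way to turn records with varying shapes into a single table.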

If you are interested, give the Tutorial a try.

Week in Review -- April 17, 2017


UI Improvements

  • Filter configurations in each section (transformations/extractors/writers/orchestrations/applications)

  • Move a configuration to trash without a confirmation dialog; if you delete a configuration by mistake, you can restore it right away.


Strict Primary Key Checking Announcement

No project should be affected in its daily operations and no action is required; this is a purely informational message about an upcoming change.

Starting May 3rd, we will gradually turn on strict primary key checking in all applications. This will cause any component writing to Storage to fail if the primary key on the table does not match the one set in the component's configuration.

Previously, the primary key was set when the table was created and was never checked afterwards.

We recently started monitoring primary key mismatches and trying to fix them automatically: the component executor attempted to modify the primary key on the table to match the configuration. In a few situations, the mismatch could not be fixed automatically:

  • New primary key contained columns not present in the table
  • New primary key could not be set as there were duplicate values in the new primary key columns
  • Several configurations in the project were changing the primary key back and forth
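The first two cases above can be sketched in Python as follows. This is a hypothetical illustration, not the component executor's code: `check_primary_key`, `PrimaryKeyMismatch`, and the `strict` flag are invented names, rows are assumed to be available as a list of dicts, and primary keys are compared as sets of column names (an assumption). The third case, several configurations flipping the key back and forth, spans multiple configurations and is not modeled here.

```python
from collections import Counter


class PrimaryKeyMismatch(Exception):
    """Raised in strict mode when the table key differs from the configured key."""


def check_primary_key(table_columns, table_pk, configured_pk, rows, strict=True):
    """Compare a table's primary key with the key in a component configuration.

    table_columns: column names of the Storage table
    table_pk:      primary key currently set on the table
    configured_pk: primary key set in the component's configuration
    rows:          table data as a list of dicts (for the duplicate check)

    Returns a list of reasons why the configured key cannot be applied
    (empty if the keys match or the key could be changed). With
    strict=True, any mismatch raises PrimaryKeyMismatch instead.
    """
    if set(table_pk) == set(configured_pk):
        return []  # keys already match
    if strict:
        raise PrimaryKeyMismatch(
            f"table key {list(table_pk)} does not match configured key {list(configured_pk)}"
        )
    problems = []
    # Case 1: the new key references columns the table does not have.
    missing = [c for c in configured_pk if c not in table_columns]
    if missing:
        problems.append(f"columns not present in table: {missing}")
    else:
        # Case 2: the new key columns contain duplicate value combinations.
        counts = Counter(tuple(row[c] for c in configured_pk) for row in rows)
        dupes = [key for key, n in counts.items() if n > 1]
        if dupes:
            problems.append(f"duplicate values in new key columns: {dupes}")
    return problems
```

With `strict=False` the function mirrors the earlier best-effort behavior of reporting why a fix is impossible; with `strict=True` any mismatch fails immediately, which is the behavior the announcement describes.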

We have contacted all projects affected by these issues and resolved them. We will continue to monitor this situation closely until we switch on the strict checking. 


Intermittent data load errors in GoodData Writer

Some data loads may end with an error message similar to this one: {"upload_status.json":"SLI CSV \"dataset.outcmaincalled_numbers.csv\" does not contain header line"}. It is caused by an error in GoodData WebDAV, and GoodData support is working on a fix. The error is sporadic: it appears in one or two client projects a day and usually vanishes after a few hours.

Transformations UI update

We have slightly updated some of the input fields in the Transformations UI. 

  • Packages, Stored Files and Requires are editable straight away and are saved immediately after each change.
  • Queries and Scripts are editable straight away too, but you save a change, or reset it to the previous state, with a button.

No more clicking to edit a field. If you delete or add something by accident, you can always revert to the previous version.

Let us know what you think!

Short Quay.io outage

On April 11 between 4:15pm and 4:31pm CEST, our connection to quay.io was down, and all Docker images stored there were inaccessible. This caused some components to fail to run. If you encountered an application error during that period, this outage was likely the culprit.

Unfortunately, status.quay.io does not mention this outage.

We are sorry for the inconvenience. To mitigate similar issues in the future, we will move all Docker images we use from public repositories (Docker Hub and Quay.io) to our private repository in AWS ECR.

Week in Review -- April 11, 2017

Improved table export from Storage console

  • works uniformly regardless of the backend
  • exports big tables up to 5 GB gzipped
  • exported file contains name of the table

UI Improvements

  • Fixed text overflow in table layouts
  • Improved empty states in transformations

Updated Storage API client for R

Please upgrade to the new version; see its GitHub repository.

Improved Facebook extractor

The Facebook extractor now correctly sets the required primary keys for insights data.

New Google Sheets Writer

Google Sheets Writer is now available. It is meant to replace a portion of the current Google Drive Writer.

We have decided to split the current Google Drive Writer into two separate components to simplify their usage:

  • Google Sheets Writer - as its name implies, is designed to upload tabular data from Storage to Google Sheets.
  • (New) Google Drive Writer - will handle uploading general files to Google Drive. It will be released very soon.



Features

  • Advanced Input Mapping
    • choose which columns will be uploaded
    • filter data by date or by column values

  • Upload into existing or new Spreadsheet

  • Update or append data in a Sheet
    • you can write into an existing Sheet within a Spreadsheet or create a new one


Migration

The existing Google Drive Writer will be deprecated soon, and a migration tool will help you transfer your existing configurations to the new Google Drive Writer or the Google Sheets Writer, respectively.

Don't hesitate to give us feedback or ask a question; write to support@keboola.com.

Week in Review -- April 4, 2017

New Component - Papertrail Extractor

We’re happy to welcome the Papertrail Extractor to the family. Papertrail manages billions of log messages for operations-savvy companies, and it has been our log management system of choice for years. If you log to Papertrail and realise that your log messages contain important information, feel free to incorporate this unstructured data into your data strategy. With our Papertrail Extractor, you can download all records matching your search query within the retention period. The extractor can also add new records incrementally on each run.

Discrete sessions from e-commerce, low-level transactions, developer stack traces or operational data - everything can fit in your Keboola project! 

We will cover this topic in a blog post next week. If you're interested in how we perform complex analytic deep-dives into our logs, follow our Medium account at https://500.keboola.com/

Minor Improvements

  • The S3 extractor now displays how many files were downloaded in each job; this is especially handy with wildcard rules

  • The MSSQL Writer now supports the BCP method, which you can activate in the table settings. It writes your data to the desired MSSQL database at supersonic speed, but note that it does not handle some unusual UTF-8 characters properly

  • Transformations with the Snowflake backend now support the FLOAT data type in input mapping, so there is no more hacking with the NUMBER data type