Transformations EA Preview

We'll be releasing an updated version of the Transformation API (and an accordingly modified UI) later this week. Changes will include:

All transformation and sandbox jobs are asynchronous.

Improved reliability, durability and scalability of both the UI and the API.

You can monitor all transformation jobs in the Jobs app; processing jobs will show detailed

Running a transformation or a whole bucket will no longer keep the modal window open.

Instead, it will provide a link to the job detail.

Sandbox jobs will keep the window with the details, though.

Custom credentials are deprecated.

We're providing you with enough power to process your data. If you want to run transformations on your own database servers, please contact support@keboola.com.

Running a disabled transformation is disabled.

It may sound weird, but it's true: until now, you were allowed to run a disabled transformation from the transformation detail in the UI.

If you want to separate a transformation, set it to a different phase or migrate it to another bucket.
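Because transformation and sandbox jobs are now asynchronous, starting a job gives you a reference to it (such as the link to the job detail) rather than the finished result. Below is a minimal Python sketch of waiting for such a job. It assumes a hypothetical `get_status` callable that returns the job's current status string, and the terminal status names (`success`, `error`, `cancelled`) are assumptions as well; it does not call any real Keboola endpoint.

```python
import time
from typing import Callable

# Assumed terminal states; the actual job statuses may be named differently.
TERMINAL_STATES = {"success", "error", "cancelled"}


def wait_for_job(get_status: Callable[[], str], poll_interval: float = 5.0,
                 timeout: float = 3600.0) -> str:
    """Poll an asynchronous job until it reaches a terminal state.

    get_status is a hypothetical callable returning the job's current status.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout}s")
        time.sleep(poll_interval)  # wait before asking again
```

For example, with a stub that reports `waiting`, then `processing`, then `success`, the function returns `"success"`.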

Testing period

If you want to make sure everything works fine in your project, you can try it out. The new UI, connected to the new version of the API, is available in the Applications app.

You can try running certain transformations, creating sandboxes, etc. If you want to try the new API within an orchestration, you need to manually change the component name in the configuration table (sys.c-orchestrator.*) from transformation to transformation-new.

In the Orchestrations UI, the task name will show Transformations (EA Preview).

When your testing is done, please reset the value back to transformation. 
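The manual switch described above amounts to changing one value per task and changing it back after testing. The sketch below models each orchestration task as a dict with a `component` key; that row layout is an assumption about sys.c-orchestrator.*, and only the two component names come from this announcement.

```python
# Component names taken from the announcement; the task layout is hypothetical.
OLD_COMPONENT = "transformation"
NEW_COMPONENT = "transformation-new"


def switch_component(tasks: list[dict], to_new: bool = True) -> list[dict]:
    """Return copies of task rows with the transformation component swapped."""
    src, dst = (OLD_COMPONENT, NEW_COMPONENT) if to_new else (NEW_COMPONENT, OLD_COMPONENT)
    switched = []
    for task in tasks:
        row = dict(task)  # copy, so the original rows are left untouched
        if row.get("component") == src:
            row["component"] = dst
        switched.append(row)
    return switched
```

Calling it with `to_new=False` performs the reset once testing is done; tasks for other components are left as they are.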

Stay tuned for the release date, and please report all bugs and concerns to support@keboola.com.


Transformation Events

As part of the ongoing Transformation API overhaul, we've changed transformation events. We tried to keep it simple, so there's one event for each:

  • Engine startup
  • Start of a phase
  • Database cleanup (happens at the beginning and end of a phase)
  • Input mapping
  • Input mapping that takes longer than 120s (configurable)
  • Transformation (as a whole, not a query in the transformation)
  • Query that takes longer than 120s (configurable)
  • Output mapping
  • Engine shutdown (success or error)
There are no more START or END events, and the engine produces fewer events. Further activity can be found in related Storage events or jobs (e.g. table imports and exports).
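The threshold rule for slow queries (and, likewise, slow input mappings) can be sketched as follows. This only illustrates the behavior described in the list, not the engine's code: the `EventLog` class and the message texts are hypothetical, and only the 120-second default comes from the announcement.

```python
from dataclasses import dataclass, field

# Default from the announcement; the text notes that it is configurable.
DEFAULT_SLOW_THRESHOLD_S = 120.0


@dataclass
class EventLog:
    """Hypothetical collector showing which activity produces an event."""
    threshold_s: float = DEFAULT_SLOW_THRESHOLD_S
    events: list[str] = field(default_factory=list)

    def query_finished(self, sql: str, duration_s: float) -> None:
        # Only queries slower than the threshold get an event of their own;
        # fast queries are covered by the single per-transformation event.
        if duration_s > self.threshold_s:
            self.events.append(f"Slow query ({duration_s:.0f}s): {sql}")

    def transformation_finished(self, name: str) -> None:
        self.events.append(f"Transformation {name} finished")
```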

Generic REST API Extractor

We've just developed a new extractor that allows you to export data from various APIs simply by setting their URL and a few other parameters in a configuration bucket.

What it can do:

  • Export data from any REST API
  • Authenticate using HTTP Basic authentication
  • Authenticate using a generated signature in a query string
  • Scroll through result pages using offset or page number parameters
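The two supported pagination styles differ only in which query parameter advances from one request to the next. A minimal sketch, assuming a hypothetical `fetch` function that performs one request with the given parameters and returns that page's records, and assuming that a page shorter than `limit` marks the end (real APIs may signal the end differently). The parameter names `offset`, `limit` and `page` are placeholders as well.

```python
from typing import Callable, Iterator


def paginate(fetch: Callable[[dict], list], mode: str = "offset",
             limit: int = 100) -> Iterator[dict]:
    """Yield records page by page until the API returns a short or empty page."""
    offset = 0
    page_number = 1
    while True:
        if mode == "offset":
            params = {"offset": offset, "limit": limit}
        elif mode == "page":
            params = {"page": page_number, "limit": limit}
        else:
            raise ValueError(f"unknown pagination mode: {mode}")
        records = fetch(params)
        yield from records
        if len(records) < limit:  # a short page means there is nothing more
            return
        offset += limit
        page_number += 1
```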

What it can't do:

  • Paginate using a value within the API's response (e.g. "next_page": "http://api.example.com/endpoint/nextSetOfResults")
    • This functionality will be enabled soon(TM)!
  • Authenticate using OAuth
    • This, however, won't be enabled anytime soon due to the nature of OAuth: the application has to be registered with the API, which doesn't allow use of the API without developer interaction.
  • Simple configuration!
    • This extractor is designed to be as universal as possible, and we're working on an easy-to-understand user interface that won't take away from the application's abilities, but won't require bleeding from the eyes, ears or nose to set up!

How does it work?

You create a bucket that defines which API the extractor should connect to and how, along with the API's nickname. Then you can create a table within that bucket, and its name can be used as a config parameter.
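To make the bucket/table split concrete, the sketch below models the bucket as an `ApiDefinition` and a table as an `EndpointConfig`, then builds the request URL. All class and field names are hypothetical, and the signing scheme (HMAC-SHA256 over the sorted query string plus a `ts` timestamp) is only an assumed example of query-string signing, not necessarily the one the extractor uses. The documentation linked below describes the actual configuration attributes.

```python
import hashlib
import hmac
import time
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class ApiDefinition:
    """Hypothetical stand-in for the configuration bucket: which API and how."""
    nickname: str
    base_url: str
    signature_secret: Optional[str] = None  # set to enable query-string signing


@dataclass
class EndpointConfig:
    """Hypothetical stand-in for one configuration table within the bucket."""
    name: str
    endpoint: str
    params: dict


def build_request_url(api: ApiDefinition, cfg: EndpointConfig,
                      timestamp: Optional[int] = None) -> str:
    params = dict(cfg.params)
    if api.signature_secret is not None:
        # Assumed scheme: sign the sorted query string, including a timestamp.
        params["ts"] = timestamp if timestamp is not None else int(time.time())
        payload = urlencode(sorted(params.items()))
        params["signature"] = hmac.new(api.signature_secret.encode(),
                                       payload.encode(), hashlib.sha256).hexdigest()
    return f"{api.base_url.rstrip('/')}/{cfg.endpoint.lstrip('/')}?{urlencode(params)}"
```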

Documentation:

https://developers.keboola.com/extend/generic-extractor/

See the attached images for an example (how to set up the Conductor API).

Feel free to contact support@keboola.com for help with configuring any API, for inquiries about whether an API is supported or could be supported, or with any issues you encounter while setting up and running the extractor!

New SalesForce.com Extractor (and Migration Guide)

We developed a new version of the SalesForce.com extractor and renamed the current version to SalesForce.com (Deprecated).

As some of the changes introduced in the new version are backwards incompatible, the two versions will run side by side for a period of time, and we kindly ask you to migrate your configurations.

Changes:

  • The extractor now runs asynchronously: fewer CURL errors, more scalability and durability, and monitoring via the Jobs app
  • Configuration is stored in sys.c-ex-salesforce (previously sys.c-SFDC)
  • Data is stored in in.c-ex-salesforce-config (previously in.c-config), where config is the configuration name
  • Some changes in the UI (mostly the menu on the right)
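The bucket renaming in the list above follows a simple pattern. A small sketch, using `SFDC01` as an example configuration name (the function name is illustrative):

```python
# Configuration buckets, as listed above.
OLD_CONFIG_BUCKET = "sys.c-SFDC"
NEW_CONFIG_BUCKET = "sys.c-ex-salesforce"


def data_buckets(config_name: str) -> tuple[str, str]:
    """Return (old, new) data bucket IDs for one configuration name."""
    return f"in.c-{config_name}", f"in.c-ex-salesforce-{config_name}"
```

For the configuration `SFDC01`, data moves from `in.c-SFDC01` to `in.c-ex-salesforce-SFDC01`.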

Migration Guide

Because the process involves OAuth authorization, we can't automate or test it. Follow this guide for each SalesForce.com extractor configuration in your project. Migration can be performed in 4 easy steps:

  1. Copy configuration
  2. Reauthorize extractor
  3. Run extraction
  4. Integrate 

1. Copy Configuration

Using the UI, create a new configuration in the SalesForce.com extractor and copy & paste all the queries and credentials from your SalesForce.com (Deprecated) configuration.

2. Reauthorize Extractor

As the extractor runs on a different worker, you need to get new OAuth tokens. Do it simply by clicking on Authorize SalesForce in the right menu.

3. Run extraction

You can now run all queries by clicking on Run all queries in the UI. You can monitor the progress in the Jobs application.

If you have any incremental queries in your configuration, you need to migrate the data extracted by these queries first. Repeat this for every incremental query:

  • Use the Storage application to make a snapshot of the two original incremental tables (e.g. in.c-SFDC01.User and in.c-SFDC01.User_deleted).
  • Use the Create new table from snapshot function to copy the tables to the new bucket (e.g. in.c-ex-salesforce-SFDC01.User and in.c-ex-salesforce-SFDC01.User_deleted). If the destination bucket does not exist, simply create it manually or run any non-incremental query.
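For configurations with several incremental queries, it can help to write out the list of copies before clicking through the Storage application. This sketch assumes each incremental query writes a main table plus a `<table>_deleted` table, as in the User example; it only builds the plan and does not call the Storage API.

```python
def incremental_copy_plan(config_name: str, query_tables: list[str]) -> list[tuple[str, str]]:
    """List (source, destination) table IDs to copy via snapshots."""
    old_bucket = f"in.c-{config_name}"
    new_bucket = f"in.c-ex-salesforce-{config_name}"
    plan = []
    for table in query_tables:
        # Each incremental query is assumed to produce both tables.
        for name in (table, f"{table}_deleted"):
            plan.append((f"{old_bucket}.{name}", f"{new_bucket}.{name}"))
    return plan
```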

4. Integrate

Once you have downloaded the initial set of data, you may need to alter some transformations and orchestrations to integrate the new extractor into the whole pipeline.

Orchestrations

Create a new orchestration task with the new SalesForce.com extractor using the same parameters, and then delete the task with the old SalesForce.com (Deprecated) extractor.

Transformations

There are two ways to migrate the transformations. You can change the input mappings from the old tables to the new ones (the structure and column names remain the same), or you can keep the old names: delete the old tables and create an alias for each deleted table (e.g. delete in.c-SFDC01.User and create an alias in.c-ex-salesforce-SFDC01.User -> in.c-SFDC01.User).
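The first option, repointing input mappings, can be sketched as a rewrite of each mapping's source table ID. The input mapping is modeled here as a list of dicts with `source` and `destination` keys, which is an assumption for illustration rather than the exact stored format; the destination is left alone because the table structure does not change.

```python
def rewrite_input_mapping(mapping: list[dict], config_names: set[str]) -> list[dict]:
    """Point input mappings at the new SalesForce.com extractor tables."""
    old_to_new = {f"in.c-{name}": f"in.c-ex-salesforce-{name}" for name in config_names}
    rewritten = []
    for entry in mapping:
        row = dict(entry)  # copy, so the original mapping is left untouched
        bucket, _, table = row["source"].rpartition(".")
        if bucket in old_to_new:
            row["source"] = f"{old_to_new[bucket]}.{table}"
        rewritten.append(row)
    return rewritten
```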

End Of Life Announcement

The SalesForce.com (Deprecated) extractor will be terminated on January 15th. If you have any trouble migrating your configuration, please contact support@keboola.com.

Pigeon Importer app

Attach your data (a CSV or gzipped CSV file) and send it to a given email address. The pigeon will check the inbox and import the received attachment into a Storage API table. The whole workflow can be configured via the Pigeon Importer UI app and then registered as a regular orchestration task.
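If you produce the data in a script, you can also build the email programmatically. The sketch below uses Python's standard `email` package to attach a CSV, gzipped by default; the recipient is a placeholder for the address configured in Pigeon Importer, and the subject, body text and filename are arbitrary choices.

```python
import gzip
from email.message import EmailMessage


def build_pigeon_email(csv_text: str, to_address: str, filename: str = "data.csv",
                       compress: bool = True) -> EmailMessage:
    """Build an email carrying a CSV (optionally gzipped) attachment."""
    msg = EmailMessage()
    msg["To"] = to_address
    msg["Subject"] = "Pigeon import"  # arbitrary subject
    msg.set_content("Data attached.")
    if compress:
        data = gzip.compress(csv_text.encode("utf-8"))
        msg.add_attachment(data, maintype="application", subtype="gzip",
                           filename=filename + ".gz")
    else:
        msg.add_attachment(csv_text, subtype="csv", filename=filename)
    return msg
```

The resulting message can be sent with `smtplib.SMTP.send_message` or handed to any other mail client.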


New App Annie Extractor

We have added a new App Annie (http://www.appannie.com/) extractor to our connectors portfolio. 

It is available in the Add Extractor menu in your project. The user interface is still under development, but you can set up your configuration manually; feel free to contact us at support@keboola.com should you encounter any trouble during the process.

Transformation Input Mapping: Views and Tables

We've just introduced an icon in the input mapping that shows whether the input mapping is created as a view or a table.

When running Redshift transformations and reading data from Redshift Storage (currently the recommended and fastest option), you can choose between creating a table or a view in the input mapping. What's the difference?

Views are lightning fast to create. The input mapping just aliases a table into your working schema within the cluster (including all filters), and that's it. You can then layer another view on top of it, and another... until you're done, and then set the final view as the source table for an output mapping. All the work is done when processing the output mapping. That is the catch: it is easier to reach the cluster's limits (memory, disk) with one large query (multiple nested views). And when the cluster runs out of memory, it also terminates all other queries running on the cluster at the same time.

So please be careful when using views. If you're not sure, feel free to reach out to support@keboola.com for more assistance.
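The difference between the two input mapping modes can be illustrated with the SQL each roughly corresponds to. The function below is a hypothetical illustration, not the statements Keboola actually issues: a view only stores the query, while `CREATE TABLE ... AS` runs the query immediately and stores the filtered rows.

```python
from typing import Optional


def input_mapping_sql(source: str, destination: str, as_view: bool,
                      where: Optional[str] = None) -> str:
    """Roughly the SQL a view or table input mapping corresponds to (illustrative)."""
    schema, table = source.rsplit(".", 1)  # e.g. "in.c-main.users"
    select = f'SELECT * FROM "{schema}"."{table}"'
    if where:
        select += f" WHERE {where}"
    # A view stores only the query; the filter runs whenever the view is read.
    # CREATE TABLE ... AS runs the query now and stores the resulting rows.
    kind = "VIEW" if as_view else "TABLE"
    return f'CREATE {kind} "{destination}" AS {select};'
```

Stacking several such views means the final output mapping query expands into all of their SELECTs at once, which is why it can hit the cluster's memory or disk limits.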