Transformation Overview

We have slightly changed the layout of the transformation detail (sections have moved to tabs) and added an Overview graph.

The graph shows the data flow around the current transformation, including all required transformations, input/output tables, and transformations from the same bucket that work with any of the same tables as the current transformation. This gives you a bird's-eye view of what you're working with.

The difference between this and Visual SQL (or SQLdep) is that Visual SQL goes deeper into the transformation and analyses the queries, whereas the Overview shows you the environment around the transformation.

GoodData LDM Visualizer

There's a direct link to the GoodData LDM Visualizer on the model page in the GoodData Writer. In a single click you can compare what is defined in Keboola Connection with what has already been uploaded to GoodData, and see how the model is interpreted.

Transformation Descriptions

We're now showing bucket and transformation descriptions in the UI. Currently there's no way to change a bucket description - it is defined when the bucket is created - but we're working on a way to make them editable. Transformation descriptions can be changed easily in the transformation detail.

Editing in Storage Console

Due to a possible significant inconsistency between the real and estimated number of rows (and table sizes) on the MySQL Storage backend, which could lead to data loss when editing the data sample, we have turned off data sample editing for all IN and OUT MySQL buckets. SYS buckets remain fully editable and will always show and edit all available rows.

Redshift works as expected - editing is allowed for all tables that contain fewer than 800 cells (excluding headers).
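
To put the limit in concrete terms: a 10-column table is editable with up to 79 data rows (790 cells), while at 80 rows it reaches 800 cells and editing is disabled.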

Transformations EA Preview

We'll be releasing an updated version of the Transformation API (and a correspondingly modified UI) later this week. The changes will include:

All transformation and sandbox jobs are asynchronous.

Improved reliability, durability and scalability of both the UI and the API.

You can monitor all transformation jobs in the Jobs app; jobs that are being processed will show detailed progress.

Running a transformation or a bucket will not keep the modal window open. It will provide you with a link to the job detail instead. Sandbox jobs will keep the window with the details open, though.

Custom credentials are deprecated.

We're providing you with enough power to process your data. If you want to run transformations on your own database servers, please contact support@keboola.com.

Running a disabled transformation is disabled.

It may sound weird, but it's true: you were previously allowed to run a disabled transformation from the transformation detail in the UI. If you want to run a transformation separately, set it to a different phase or migrate it to another bucket.

Testing Period

If you want to make sure everything works fine in your project, you can try it out now. The new UI is available in the Applications app and is connected to the new version of the API.

You can try running certain transformations, creating sandboxes, etc. If you want to try the new API within an orchestration, you need to manually change the component name in the configuration table (sys.c-orchestrator.*) from transformation to transformation-new, as sketched below.
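
For illustration, the relevant field of an orchestration task row might change like this (the column name shown here is an assumption - check the actual layout of your own sys.c-orchestrator.* table):

    component before:  transformation
    component after:   transformation-new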

In the Orchestrations UI, the task name will show Transformations (EA Preview).

When your testing is done, please reset the value back to transformation. 

Stay tuned for the release date, and please report all bugs and concerns to support@keboola.com.


Transformation Events

As part of the ongoing Transformation API overhaul, we've changed the transformation events. We tried to keep it simple, so there's one event for each of the following:

  • Engine startup
  • Start of a phase
  • Database cleanup (happens at the beginning and end of a phase)
  • Input mapping
  • Input mapping that takes longer than 120s (configurable)
  • Transformation (as a whole, not a query in the transformation)
  • Query that takes longer than 120s (configurable)
  • Output mapping
  • Engine shutdown (success or error)
There are no longer any START or END events, and the engine produces fewer events overall. Further activity can be found in the related Storage events or jobs (e.g. table imports and exports).
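
So, for a single-phase run, the event sequence would roughly be: engine startup, phase start, database cleanup, input mapping, transformation, output mapping, database cleanup, engine shutdown - with the long-running input mapping and query events appearing only when the 120s threshold is exceeded.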

New SalesForce.com Extractor (and Migration Guide)

We developed a new version of the SalesForce.com extractor and renamed the existing version to SalesForce.com (Deprecated).

As some of the changes introduced in the new version are backwards incompatible, the two versions will run side by side for a period of time, and we kindly ask you to migrate your configurations.

Changes:

  • The extractor now runs asynchronously - fewer cURL errors, more scalability and durability, and monitoring via the Jobs app
  • Configuration is stored in sys.c-ex-salesforce; it was previously in sys.c-SFDC
  • Data is stored in in.c-ex-salesforce-config; it was previously in in.c-config (where config is the configuration name; see the example below)
  • Some changes in the UI (mostly the menu on the right)
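
For example, a configuration named SFDC01 now stores its data in in.c-ex-salesforce-SFDC01, whereas the deprecated extractor used in.c-SFDC01.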

Migration Guide

As there is an OAuth authorization step in the process, we can't automate and test it for you. Follow this guide for each SalesForce.com extractor configuration in your project. The migration can be performed in 4 easy steps:

  1. Copy configuration
  2. Reauthorize extractor
  3. Run extraction
  4. Integrate 

1. Copy Configuration

Using the UI, create a new configuration in the SalesForce.com extractor and copy & paste all queries and credentials from your SalesForce.com (Deprecated) configuration.

2. Reauthorize Extractor

As the extractor runs on a different worker, you need to get new OAuth tokens. Do this simply by clicking Authorize SalesForce in the right menu.

3. Run Extraction

You can now run all queries by clicking on Run all queries in the UI. You can monitor the progress in the Jobs application.

If you have any incremental queries in your configuration, you need to migrate the data extracted by these queries first. Repeat the following for every incremental query:

  • Use the Storage application to make a snapshot of the two original incremental tables (e.g. in.c-SFDC01.User and in.c-SFDC01.User_deleted).
  • Use the Create new table from snapshot function to copy the tables to the new bucket (e.g. in.c-ex-salesforce-SFDC01.User and in.c-ex-salesforce-SFDC01.User_deleted). If the destination bucket does not exist, simply create it manually or run any non-incremental query.

4. Integrate

Once you have downloaded the initial set of data, you may need to alter some transformations and orchestrations to integrate the new extractor into the whole pipeline.

Orchestrations

Create a new orchestration task with the new SalesForce.com extractor using the same parameters, and then delete the task with the old SalesForce.com (Deprecated) extractor.

Transformations

There are two options for migrating the transformations. You can change the input mappings from the old tables to the new tables (the structure and column names remain the same), or you can keep the old names: simply delete the old tables and create an alias for each deleted table (e.g. delete in.c-SFDC01.User and create the alias in.c-ex-salesforce-SFDC01.User -> in.c-SFDC01.User).

End Of Life Announcement

The SalesForce.com (Deprecated) extractor will be terminated on January 15th. If you have any trouble migrating your configuration, please contact support@keboola.com.

Transformation Input Mapping: Views and Tables

We just introduced an icon in the input mapping to show whether the input mapping is created as a view or a table.

When running Redshift transformations and reading data from Redshift Storage (currently the recommended and fastest option), you can choose between creating a table or a view in the input mapping. What's the difference?

Views are lightning fast to create. The input mapping just aliases a table into your working schema within the cluster, and that's it (all filters included). You can then layer another view on that, and another... until you're done and can set the final view as the source table for an output mapping. All the work is then done when processing the output mapping. That is the catch - it is easier to reach the cluster's limits (memory, disk) with one large query (multiple nested views). And when the cluster runs out of memory, it will also terminate all other queries running on the cluster at the same time. A rough sketch of the difference follows.
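
To make the difference concrete, here is a minimal SQL sketch of what the two input mapping modes roughly amount to (the schema, table and column names are hypothetical, and the actual DDL generated by the input mapping may differ):

    -- "view" input mapping: created instantly, just an alias;
    -- the data is only read when a query actually uses the view
    CREATE VIEW "workspace"."account" AS
        SELECT * FROM "in.c-crm"."account"
        WHERE "active" = '1';  -- input mapping filters are baked in

    -- "table" input mapping: the data is physically copied
    -- into the working schema up front
    CREATE TABLE "workspace"."account" AS
        SELECT * FROM "in.c-crm"."account"
        WHERE "active" = '1';

Layering views keeps each step nearly free, but the final query over the nested views has to do all the work at once; copying tables pays the cost up front at every step instead.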

So please be careful when using views. If you're not sure, feel free to reach out to support@keboola.com for more assistance.