Generic REST API Extractor

We've just developed a new extractor that lets you export data from various APIs just by setting their URL and a few other parameters in a configuration bucket.

What it can do:

  • Export data from any REST API
  • Authenticate using HTTP Basic authentication
  • Authenticate using a generated signature in a query string
  • Scroll through result pages using offset or page number parameters
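
To make the two scrolling styles concrete, here is a minimal Python sketch of how request URLs differ between offset and page number pagination. It is only an illustration, not the extractor's code: the function name and the limit, offset and page parameter names are assumptions, and each API will use its own. A real client would stop when a page comes back empty; the page count is fixed here so the sketch runs offline.

```python
from urllib.parse import urlencode


def page_urls(base_url, method, page_size, pages, extra_params=None):
    """Yield request URLs for `pages` consecutive result pages.

    method is "offset" (offset=0, page_size, 2*page_size, ...) or
    "pagenum" (page=1, 2, 3, ...). Parameter names are hypothetical.
    """
    for i in range(pages):
        params = dict(extra_params or {})
        if method == "offset":
            params["limit"] = page_size
            params["offset"] = i * page_size
        elif method == "pagenum":
            params["page"] = i + 1
        else:
            raise ValueError(f"unknown pagination method: {method}")
        yield f"{base_url}?{urlencode(params)}"


# Three pages of 100 results each, scrolled by offset.
urls = list(page_urls("http://api.example.com/endpoint", "offset", 100, 3))
```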

What it can't do:

  • Paginate using a value within the API's response (e.g. "next_page": "http://api.example.com/endpoint/nextSetOfResults")
    • This functionality will be enabled soon(TM)!
  • Authenticate using OAuth
    • This, however, won't be enabled anytime soon due to the nature of OAuth: the application has to be registered with the API, so the API can't be used without developer interaction
  • Simple configuration!
    • This extractor is designed to be as universal as possible. We're working on an easy-to-understand user interface that won't take away from the application's abilities, but also won't require bleeding from your eyes, ears or nose to set up!

How does it work?

You create a bucket that defines which API the extractor should connect to and how, along with a nickname for the API. You can then create a table within that bucket, and its name can be used as a config parameter.

Documentation:

https://developers.keboola.com/extend/generic-extractor/

See attached images for an example (how to set up Conductor API)

Feel free to contact support@keboola.com for help configuring any API, to ask whether an API is or could be supported, or with any issues you encounter while setting up and running the extractor!

New SalesForce.com Extractor (and Migration Guide)

We developed a new version of the SalesForce.com extractor and renamed the current version to SalesForce.com (Deprecated).

As some of the changes introduced in the new version are backwards incompatible, the two versions will run side by side for a period of time, and we kindly ask you to migrate your configurations.

Changes:

  • Extractor now runs asynchronously: fewer cURL errors, more scalability and durability, monitoring via the Jobs app
  • Configuration is stored in sys.c-ex-salesforce (previously in sys.c-SFDC)
  • Data is stored in in.c-ex-salesforce-config (previously in in.c-config), where config is the configuration name
  • Some changes in the UI (mostly the menu on the right)

Migration Guide

As the process includes an OAuth authorization, we can't automate or test the migration ourselves. Follow this guide for each SalesForce.com extractor configuration in your project. Migration can be performed in 4 easy steps:

  1. Copy configuration
  2. Reauthorize extractor
  3. Run extraction
  4. Integrate 

1. Copy Configuration

Using the UI, create a new configuration in the SalesForce.com extractor and copy and paste all queries and credentials from your SalesForce.com (Deprecated) configuration.

2. Reauthorize Extractor

As the extractor runs on a different worker, you need to get new OAuth tokens. Do it simply by clicking on Authorize SalesForce in the right menu.

3. Run Extraction

You can now run all queries by clicking on Run all queries in the UI. You can monitor the progress in the Jobs application.

If your configuration contains any incremental queries, you need to migrate the data already extracted by these queries first. Repeat this for every incremental query:

  • Use the Storage application to make a snapshot of the two original incremental tables (e.g. in.c-SFDC01.User and in.c-SFDC01.User_deleted).
  • Use the Create new table from snapshot function to copy the tables to the new bucket (e.g. in.c-ex-salesforce-SFDC01.User and in.c-ex-salesforce-SFDC01.User_deleted). If the destination bucket does not exist, simply create it manually or run any non-incremental query.
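
If you have many incremental tables to copy, the renaming rule from the examples above can be written down as a small Python helper. This is just a sketch for working out the destination names; the function name is made up for illustration, and it does not call the Storage API or create any snapshots.

```python
def migrated_table_id(old_table_id):
    """Map an old table ID to its place in the new extractor's bucket.

    Example: in.c-SFDC01.User -> in.c-ex-salesforce-SFDC01.User
    """
    stage, bucket, table = old_table_id.split(".", 2)
    prefix = "c-"
    if stage != "in" or not bucket.startswith(prefix):
        raise ValueError(f"unexpected table ID: {old_table_id}")
    config = bucket[len(prefix):]  # the configuration name, e.g. SFDC01
    return f"in.c-ex-salesforce-{config}.{table}"


# Both tables of an incremental query have to be copied.
old_tables = ["in.c-SFDC01.User", "in.c-SFDC01.User_deleted"]
new_tables = [migrated_table_id(t) for t in old_tables]
```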

4. Integrate

Once you have downloaded the initial set of data you may need to alter some transformations and orchestrations to integrate the new extractor in the whole pipeline.

Orchestrations

Create a new orchestration task for the new SalesForce.com extractor with the same parameters, and then delete the task for the old SalesForce.com (Deprecated) extractor.

Transformations

There are two ways to migrate the transformations. You can change the input mappings from the old tables to the new tables (the structure and column names remain the same), or you can keep the old names: delete the old tables and create an alias for each deleted table (e.g. delete in.c-SFDC01.User and create the alias in.c-ex-salesforce-SFDC01.User -> in.c-SFDC01.User).

End Of Life Announcement

The SalesForce.com (Deprecated) extractor will be terminated on January 15th. If you have any trouble migrating your configuration, please contact support@keboola.com.

Pigeon Importer app

Attach your data (a CSV or gzipped CSV file) to an email and send it to a given address. The pigeon will check the inbox and import the received attachment into a Storage API table. The whole workflow can be configured via the Pigeon Importer UI app and then registered as a regular orchestration task.
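
To illustrate what accepting "a CSV or gzipped CSV file" involves, here is a short Python sketch that reads either kind of attachment into rows. It is not the Pigeon Importer's actual code: the function name is hypothetical, the attachment is assumed to be UTF-8, and gzip is detected by its two leading magic bytes.

```python
import csv
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_attachment_rows(payload: bytes):
    """Return the rows of a CSV attachment, decompressing it first if gzipped."""
    if payload[:2] == GZIP_MAGIC:
        payload = gzip.decompress(payload)
    text = payload.decode("utf-8")  # assumption: attachments are UTF-8
    return list(csv.reader(io.StringIO(text, newline="")))


plain = b"id,name\n1,Alice\n"
rows_plain = read_attachment_rows(plain)
rows_gzipped = read_attachment_rows(gzip.compress(plain))
```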


New App Annie Extractor

We have added a new App Annie (http://www.appannie.com/) extractor to our connectors portfolio. 

It is available in the Add Extractor menu in your project. The user interface is underway, but feel free to set up your configuration manually, and feel free to contact us at support@keboola.com should you encounter any trouble during the process.

Transformation Input Mapping: Views and Tables

We just introduced an icon in input mapping that shows whether the input mapping is created as a view or a table.

When running Redshift transformations and reading data from Redshift Storage (currently the recommended and fastest option), you can choose between creating a table or a view in the input mapping. What's the difference?

Views are lightning fast to create. Input mapping just aliases a table to your working schema within the cluster and that's it (including all filters). You can then layer another view on that and another... until you're done, and you can set the final view as a source table for an output mapping. All the work is then done when processing the output mapping. That is the catch: it is easier to reach the cluster's limits (memory, disk) with one large query (multiple nested views). And when the cluster runs out of memory, it will also terminate all other queries running on the cluster at the same time.
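
The deferred work can be seen in a tiny Python sketch. It uses SQLite from the standard library as a stand-in for Redshift, with made-up table and view names, so it only shows the general behaviour of nested views and not how Redshift or the input mapping is implemented. Creating the views processes no rows; the whole nested query runs only when the final view is materialised, which here plays the role of the output mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "paid"), (2, 25.0, "paid"), (3, 40.0, "cancelled")],
)

# Creating views only stores their definitions; no rows are processed yet.
conn.execute("CREATE VIEW paid_orders AS SELECT * FROM orders WHERE status = 'paid'")
conn.execute("CREATE VIEW paid_total AS SELECT SUM(amount) AS total FROM paid_orders")

# A row added after the views were created is still picked up later,
# because the views are evaluated only when they are queried.
conn.execute("INSERT INTO orders VALUES (4, 5.0, 'paid')")

# Materialising the final view runs all nested views as one query.
conn.execute("CREATE TABLE result AS SELECT * FROM paid_total")
total = conn.execute("SELECT total FROM result").fetchone()[0]
```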

So please be careful when using views. If you're not sure, feel free to reach out to support@keboola.com for more assistance. 




Email notifications

Orchestrator's email notifications were redesigned. 

If anything goes wrong, Orchestrator sends you a brief visual overview. All necessary details are accessible through the UI. We're not spamming you with long lists of logs anymore.

Mobile skin:

Desktop skin:

Orchestrator's Job details

...were redesigned, so debugging should go much more smoothly. This is the redesigned page with all job and task details:

Extractor failures

The Paymo, Facebook, Facebook Ads and Salesforce extractors were returning curl(60) errors in orchestrations between 3 PM and 11 PM PST on November 6th. The errors were caused by invalid SSL certificates.

To finish your tasks, just re-run your orchestrators. We're sorry for any inconvenience! 

Inaccessible Storage API files

Some files were not accessible between 7 PM and 10 PM PST on November 4. This caused loads into Storage API tables to fail, and with them the orchestrations.

Example of failed orchestration:


It was caused by a failed Elasticsearch cluster node. We are still investigating the cause of this issue. However, our whole infrastructure is working smoothly at this time. To finish your tasks, just re-run your orchestrators. We're sorry for any inconvenience!