Elasticsearch Failure - Components not working

Our Elasticsearch Syrup cluster is not responding. This cluster stores all information about all components' jobs. Storage is working fine, but the rest of the system has come to a halt. We're investigating the issue.

UPDATE 10:30pm PST / 7:30am CEST: The cluster is back online and all operations have been resumed or restarted. We're sorry for any inconvenience.

MySQL Transformation input mapping size limit

We'll be introducing a limit on the size of tables imported into a MySQL transformation.

Why? Processing large tables in MySQL is very inefficient and slow, and it also negatively affects other users in the shared MySQL environment. To ensure a smooth experience for everyone, we'll be pushing all large transformations to a faster backend (Redshift, and possibly others in the future).

This complements the query time limit, which focuses on (accidentally) unoptimized queries.

There will be two limits. A lower soft limit will warn you that you're processing larger amounts of data, but won't stop the transformation; you only need to take action if you're getting close to the higher hard limit, which will stop the transformation immediately.

What to do if you're exceeding the limit? There are a few easy ways to stay under these limits:

  • Incremental processing. Set up your pipeline as incremental and do not process all data on every run. The limit measures only the transferred data, not the whole table size. 
  • Move the transformation, and the relevant Storage buckets, to Redshift. There are no such limits on Redshift, and it's just way faster.

The soft limit is already in place and is set at 2GB (2147483648 bytes). You can find the warnings in your Event list by searching for "We recommend using Redshift for tables larger than 2147483648 bytes".

The hard limit will be introduced on July 1st and will be set at 5GB (5368709120 bytes). On June 1st we will notify all affected users before this policy comes into effect and will try to help them find a feasible solution.
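
To make these thresholds concrete, here is a minimal sketch in Python of how the two limits relate. The helper function is purely illustrative; the actual check happens inside the MySQL transformation backend:

    SOFT_LIMIT = 2 * 1024 ** 3   # 2147483648 bytes (2GB soft limit)
    HARD_LIMIT = 5 * 1024 ** 3   # 5368709120 bytes (5GB hard limit)

    def plan_import(transferred_bytes):
        """Mirror the documented limit behavior for one input mapping."""
        if transferred_bytes >= HARD_LIMIT:
            return "hard limit: transformation stops immediately"
        if transferred_bytes >= SOFT_LIMIT:
            return "soft limit: warning event, consider Redshift"
        return "ok"

    # A full load of a 6GB table trips the hard limit...
    print(plan_import(6 * 1024 ** 3))
    # ...while an incremental run transferring 300MB passes cleanly,
    # because only the transferred data counts, not the table size.
    print(plan_import(300 * 1024 ** 2))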

Docker bundle enhancements

We're excited to announce new features in the Docker bundle. 

For those who don't know, the Docker bundle is a component that allows anyone to run apps encapsulated in Docker images in Keboola Connection.

Streaming Logs

If your app writes to stdout or stderr, these logs are immediately forwarded to Storage API Events, so your app can report important events live.
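
For example, a minimal app (sketched in Python, purely for illustration) only needs to write to the standard streams and the Docker bundle takes care of the forwarding:

    import sys

    # Anything written to stdout becomes a live informational event...
    print("Processing started: 3 input tables found")
    # ...and stderr is forwarded as well, e.g. for warnings and errors.
    print("Table 'orders' is empty, skipping", file=sys.stderr)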

More about streamed logs in the documentation.

Incremental File Processing

In a scenario where you're processing an unknown number of files on a regular basis, incremental file processing comes in handy. Successfully processed files get tagged and are excluded from the next run. 
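
The gist of the mechanism, sketched in Python (the tag name and file structure here are illustrative, not the actual Docker bundle internals):

    PROCESSED_TAG = "my-app-processed"   # hypothetical tag name

    files = [
        {"name": "2015-05-01.csv", "tags": [PROCESSED_TAG]},  # done earlier
        {"name": "2015-05-02.csv", "tags": []},                # new file
    ]

    for f in files:
        if PROCESSED_TAG in f["tags"]:
            continue                         # already processed, excluded
        print("processing", f["name"])       # your app's work happens here
        f["tags"].append(PROCESSED_TAG)      # tagged, skipped on the next run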

More about incremental file processing in the documentation.

Development and troubleshooting API calls

We added sandbox, input and dry-run API calls to the Docker bundle. They are similar to their counterparts in the Transformation API and allow you to:

  • prepare data and serialized configuration file for your application before you start developing the app, so you don't have to prepare the folder structure manually (sandbox)
  • see exactly what data comes into your application (input)
  • see the data input and output of your app (dry-run)

The data is compressed in a ZIP file and stored in File Uploads in the given project.
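
A rough idea of how such a call might look from Python. The endpoint path and payload below are assumptions for illustration only; see the documentation for the exact API:

    import requests

    # Hypothetical sandbox-style call against the Docker bundle API.
    url = "https://syrup.keboola.com/docker/sandbox"   # assumed path
    token = "your-storage-api-token"

    config = {"configData": {
        "storage": {"input": {"tables": [{"source": "in.c-main.orders"}]}}
    }}

    resp = requests.post(url, json=config,
                         headers={"X-StorageApi-Token": token})
    resp.raise_for_status()
    # Asynchronous job; the prepared data ends up in File Uploads as a ZIP.
    print(resp.json())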

More about these API calls in the documentation.

Want to know more? Interested in developing your own apps in KBC? Read more in the documentation or get in touch with support@keboola.com.

End of support for long MySQL queries on June 1st

After June 1st, 2015 we'll be enforcing strict limits on MySQL queries. Any query running longer than 30 minutes (1800 seconds) will be terminated and the transformation will fail. You can already see the duration of all queries longer than 2 minutes (120 seconds) in the event log of any Transformation job, so you can take optimization steps in advance.

We're introducing this limit to prevent errors like forgotten indexes and to balance the load on the shared MySQL Transformation database.

If your queries take significantly longer than 30 minutes, please consider migrating your project to Redshift.
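
One way to hunt down a forgotten index before the limit bites, sketched with the PyMySQL driver (host, credentials and table names are placeholders): an EXPLAIN access type of ALL over a large row estimate usually means a full table scan.

    import pymysql

    conn = pymysql.connect(host="your-mysql-host", user="user",
                           password="secret", database="your_database")
    with conn.cursor(pymysql.cursors.DictCursor) as cur:
        cur.execute("EXPLAIN SELECT * FROM big_table WHERE customer_id = 42")
        for row in cur.fetchall():
            # type == 'ALL' with many rows suggests a missing index
            # on the WHERE/JOIN column.
            print(row["table"], row["type"], row["rows"])
    conn.close()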


AWS Connectivity Issues

We're experiencing connectivity issues from AWS to some parts of the outside world (aka the Internet). As our transformation sandbox server (ovh-tapi.keboola.com) is not in AWS, you may experience failures when creating sandboxes and credentials. If this is not resolved within AWS shortly, we'll be switching the sandbox server to the AWS network. 

We will keep this post updated with the current status.

UPDATE (2:00pm PST): Connectivity seems to work fine now.

Facebook Extractor: invalid account or token

We're changing the way the Facebook Extractor reacts to invalid accounts and tokens. Previously, all invalid accounts/tokens were automatically disabled (with an error event in Storage) and the extractor continued with the extraction. 

As this event was easy to miss and there was no other notification about an invalid token/account, we have switched to a stricter behavior: any invalid account/token will stop the execution of the whole job/orchestration with an explanatory message.

You then need to change the token or disable the account manually. 

For any questions or comments, do not hesitate to contact us at support@keboola.com.

Provisioning improvements: MySQL DB names and logging

We changed the naming conventions for MySQL provisioning. Instead of tapi_3800_sand and tapi_3800_tran, where 3800 is the token ID, the new database names are sand_232_3800 and tran_3800, where 232 is the project ID, making it easier to distinguish between projects in your sandbox userspace. Existing credentials keep their database names.

We also added a little more information to events.

AWS Connectivity Issues

AWS recently issued some information about a connectivity issue in the US-EAST-1 region, where the majority of our infrastructure is located. This may result in 500, 503 and 504 application errors within the infrastructure (our components) or when reaching out to other APIs (extractors). 

We're sorry for any inconvenience. We'll keep this post updated with the current status. You can also check the current status at http://status.aws.amazon.com/, row Amazon Elastic Compute Cloud (N. Virginia).

---

9:23 AM PST We are investigating possible Internet connectivity issues in the US-EAST-1 Region.

10:09 AM PST We are continuing to investigate Internet connectivity issues in the US-EAST-1 Region.

11:07 AM PST We are continuing to investigate Internet connectivity issues in the US-EAST-1 Region. This is impacting connectivity between some customer networks and the region. Connectivity within the US-EAST-1 Region is not impacted.

12:23 PM PST We continue to make progress in resolving an issue with an Internet provider outside of our network in the US-EAST-1 Region. Internet connectivity between some customer networks and the region may have been impacted by this issue. We have taken action to address the impact and are seeing recovery for many of the affected instances. Connectivity within the US-EAST-1 Region remains unaffected.

1:44 PM PST We continue to make progress in resolving the Internet connectivity issue between customer networks and affected instances. Connectivity within the US-EAST-1 Region remains unaffected.

2:21 PM PST We experienced an issue with an Internet provider outside of our network that impacted connectivity between some customer networks and the US-EAST-1 Region. Connectivity to instances and services within the region was not affected by the event. The issue has been mitigated, and impacted customers should no longer have problems connecting to instances in the US-EAST-1 Region.