Storage buckets and tables backed by MySQL now only estimate their row counts and data sizes (using data from the SHOW TABLE STATUS query). Counting rows manually causes a full table scan and performs poorly. Redshift data remains accurate.
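For illustration, the estimate on a MySQL backend comes from table metadata rather than from a scan; a minimal comparison (the table name is hypothetical):

    -- Fast: returns an estimated row count and data size from table metadata
    SHOW TABLE STATUS LIKE 'my_table';

    -- Slow: forces a full table scan on large InnoDB tables
    SELECT COUNT(*) FROM my_table;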
One of the Storage API backend servers was unavailable between 6:30am and 6:50am PST. This might have caused some orchestration failures, and some UIs might not have been responding (e.g. the Orchestration UI, or stuck jobs in the Writer UI). Sorry for any inconvenience.
GoodData will be performing maintenance on our hardware on Saturday, September 13th around 1pm CET (4am PST). The expected length of the maintenance window is about 5 hours. Some of our clients' GoodData projects will be inaccessible at that time. We apologize in advance for any inconvenience; the maintenance is necessary to prevent errors in the near future. We will be monitoring all data load orchestrations and will restart them as needed.
The Transformation UI now shows the size of the table and the backend type of the bucket (MySQL or Redshift) in the input and output mapping of a transformation.
Please note that it does not show the data transfer size, but the size of the respective tables in Storage.
Due to peak load we had to reboot the main (shared) Redshift cluster. All transformations and sandboxes are stopped and purged, and data stored on this cluster is temporarily unavailable. It will be back soon so you can resume your operations.
We apologize for any inconvenience and kindly remind you that you can have your own Redshift cluster. For more information, get in touch with support@keboola.com.
We switched the Salesforce Extractor from using Salesforce API v23.0 to API v31.0. Let us know if you encounter any problems.
Part of our frontend infrastructure became inaccessible. This might have caused some orchestration failures, and some UIs might not be responding (e.g. the Orchestration UI, stuck jobs in the Writer UI). We're putting the server back online and will resume all failed orchestrations. Sorry for any inconvenience.
Update 3:30pm PST: The problem still persists.
Update 5:10pm PST: Problem resolved. All failed orchestrations were resumed.
Update 11:30pm PST: We're experiencing network connectivity issues. Some orchestrations might end with CURL 52, 36, or 35 errors. Feel free to retry while we're trying to get this resolved.
Update 10:00am PST: All problems resolved, everything running OK now.
Update 5:00pm PST: We got back from AWS support: "Yesterday, we experienced an issue on our end that impacted some traffic going to any public IP address, where the source interface's MTU was above 1500. Since your instance most likely has an MTU of 9001 (Commonly default for large instances), it may have been impacted by this."
Here's a list of things that have changed in the last couple of weeks, in case you didn't notice them in the UI directly.
Support for custom transformation or sandbox credentials has been dropped from the UI; everything in the UI is now provisioned. You can still use your own credentials via the API.
When adding an input mapping in a Redshift transformation from a Redshift (Storage) backend, the engine chooses the fastest data transfer route. As the transformations and Storage share the same cluster, the table never leaves the database and is transferred directly within the cluster. You can additionally choose whether the input mapping is performed as a CREATE VIEW or a CREATE TABLE from the original table. A view is faster (there is no data transfer at all), but might cause memory issues if you start piling views on top of each other; a table is slower (the data is duplicated), but may perform better on a larger set of queries. A sketch of the two variants follows the notes below.
Also, the COPY Options pane has disappeared from input mappings within Redshift; it didn't make any sense there.
Note: Be careful with mismatched datatypes.
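Roughly, the two input mapping modes correspond to the following SQL; the schema and table names are hypothetical, and the statements the engine actually runs may differ:

    -- View: no data transfer at all, just a pointer to the source table
    CREATE VIEW "workspace"."in_orders" AS SELECT * FROM "storage"."orders";

    -- Table: the data is physically duplicated into the transformation schema
    CREATE TABLE "workspace"."in_orders" AS SELECT * FROM "storage"."orders";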
The Sandbox Credentials page now shows both MySQL and Redshift credentials with all their details.
You can now terminate both MySQL and Redshift processes (a SQL-console sketch follows the notes below).
Note: Terminating a database process is not immediate; it might take some time, as rollbacks run on the backend.
Note: Running a Redshift transformation uses both MySQL and Redshift credentials. Terminating any of them will end the transformation.
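If you'd rather terminate a process yourself from a SQL console, the commands look roughly like this (the process ID is hypothetical; this is a sketch of the standard MySQL and Redshift commands, not necessarily what the UI button executes):

    -- MySQL: list your processes, then kill one by its Id
    SHOW PROCESSLIST;
    KILL 12345;

    -- Redshift: look up the process id of a running query, then terminate its session
    SELECT pid, query FROM stv_recents WHERE status = 'Running';
    SELECT PG_TERMINATE_BACKEND(12345);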
For the last couple of months we've been working hard to deliver this exciting new feature. The day is very close, but we'd like to invite you to our release preview. Until the end of August we're running Redshift as a Beta or Release Preview, ironing out bugs and delivering the best possible user experience and performance boost.
To get started with Redshift, you need to create a Redshift bucket in your Storage. You can do this in the Storage API console, or contact support@keboola.com. Once you have at least one Redshift bucket in your Storage (you don't need to store any data in it), the provisioning, transformations and other features will unlock.
You can create transformations (and sandboxes) that use AWS Redshift as their backend. Your current MySQL transformations are incompatible, but the Redshift SQL syntax is very similar. The input mappings offer some new Redshift-specific options (SORTKEY, DISTKEY, COPY command options, datatypes); see the example below. To take full advantage of our next steps, we recommend that your Redshift transformations use data from Redshift Storage buckets; this will basically eliminate all input and output transfers. But you can use data from regular buckets as well.
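For illustration, SORTKEY and DISTKEY are plain Redshift DDL; a table like the ones these input mapping options produce might be declared as follows (all names are hypothetical):

    -- DISTKEY controls how rows are distributed across the cluster's slices,
    -- SORTKEY controls the on-disk sort order used to skip blocks during scans
    CREATE TABLE "in_events" (
        event_id   BIGINT,
        user_id    BIGINT,
        created_at TIMESTAMP
    )
    DISTKEY (user_id)
    SORTKEY (created_at);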
For a Redshift sandbox, Sequel Pro and the native Adminer app are unfortunately useless. You can use the free version of the JackDB web app, the Amazon-recommended SQL Workbench/J, DBeaver, or 0xDBE. Always be careful to use only the schema provided to you in the credentials.
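Because the sandbox lives on a shared cluster, it may help to pin your session to your schema right after connecting; a minimal sketch, assuming the schema in your credentials is named sandbox_123 (hypothetical):

    -- Resolve unqualified table names only within your sandbox schema
    SET search_path TO sandbox_123;

    -- Verify the current setting
    SHOW search_path;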
During this Release Preview period, usage is free without any guarantees; contact support@keboola.com for a production deployment during this period. Be careful with big data loads to the non-production cluster. As a rule of thumb, don't use it for anything >10G (all tables combined).
All newly created writers will use GoodData's Project Model API (also known as the LDM API) by default. Existing projects still use the CL tool but will be switched in the near future.
New writers also don't use date facts in datasets referencing date dimensions (this does not apply to time dimensions). Counting without date facts is covered by "Date Attribute Arithmetic". Older projects have reports built upon these facts and can't be switched automatically so far.