We'll be introducing a limit on the size of tables that are imported into a MySQL transformation.
Why? Processing large tables in MySQL is inefficient and slow, and it also negatively affects other users in the shared MySQL environment. To ensure a smooth experience for everyone, we'll be pushing all large transformations to a faster backend (Redshift, and possibly others in the future).
This is in addition to the query time limit, which targets (accidentally) unoptimized queries.
There will be two limits. A lower soft limit will warn you that you're processing a large amount of data, but it won't stop the transformation. A higher hard limit will stop the transformation immediately. You only need to take action if you're getting close to the hard limit.
What should you do if you're exceeding the limit? There are a few easy ways to stay within these limits:
- Use incremental processing. Set up your pipeline as incremental so that it doesn't process all data on every run. The limit measures only the transferred data, not the whole table size.
- Move the transformation, along with the relevant storage buckets, to Redshift. There are no such limits on Redshift, and it's much faster.
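To illustrate why incremental processing helps, here is a minimal Python sketch of the idea. It is not how the platform implements incremental loads; the row layout, the `updated_at` field, the sample dates, and the `rows_changed_since` helper are all hypothetical and exist only for illustration.

```python
from datetime import datetime

def rows_changed_since(rows, last_run):
    """Return only the rows modified after the previous run (hypothetical helper)."""
    return [row for row in rows if row["updated_at"] > last_run]

# Made-up sample table; a real table would be far larger.
table = [
    {"id": 1, "updated_at": datetime(2000, 5, 1)},
    {"id": 2, "updated_at": datetime(2000, 5, 20)},
    {"id": 3, "updated_at": datetime(2000, 5, 21)},
]

last_run = datetime(2000, 5, 15)
changed = rows_changed_since(table, last_run)

# Only the changed rows are transferred, so only they would count toward the limit.
print(len(changed), "of", len(table), "rows would be transferred")
```

Because only the changed rows are transferred, a table that is large overall can still stay well below the limit on each run.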
The soft limit is already in place and is set to 2GB (2147483648 bytes). You can find the warnings in your Event list by searching for "We recommend using Redshift for tables larger than 2147483648 bytes.".
The hard limit will be introduced on July 1st and will be set to 5GB (5368709120 bytes). On June 1st, before this policy takes effect, we will notify all affected users and will try to help them find a feasible solution.
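The relationship between the two limits can be sketched in a few lines of Python. This is an illustrative sketch rather than the actual check: the function name `check_import_size` and its return values are hypothetical, and treating a size exactly equal to a limit as still within it is an assumption based on the "larger than" wording of the warning.

```python
# Limit values taken from the announcement (binary gigabytes).
SOFT_LIMIT_BYTES = 2 * 1024**3  # 2147483648
HARD_LIMIT_BYTES = 5 * 1024**3  # 5368709120

def check_import_size(transferred_bytes):
    """Classify a transferred table size against the two limits (hypothetical helper)."""
    if transferred_bytes > HARD_LIMIT_BYTES:
        return "stop"  # hard limit: the transformation is stopped
    if transferred_bytes > SOFT_LIMIT_BYTES:
        return "warn"  # soft limit: a warning is logged, the run continues
    return "ok"

for size in (1_000_000_000, 3_000_000_000, 6_000_000_000):
    print(size, check_import_size(size))
```

A table between the two limits only triggers the warning, which leaves time to switch to incremental processing or Redshift before the hard limit applies.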