Fix MySQL database character set instruction #17603
The currently documented character set, `utf8mb4 COLLATE utf8mb4_unicode_ci;`, does not work on MySQL 8. When I run `airflow db init`, the following error occurs: `sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1071, 'Specified key was too long; max key length is 3072 bytes')`. Changing to this character set, `utf8 COLLATE utf8_general_ci;`, solved the problem.
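The 3072-byte limit in that error is easier to see with some arithmetic. MySQL sizes an index key from each column's maximum byte length, roughly characters × the character set's maximum bytes per character: 4 for `utf8mb4`, 3 for `utf8` (an alias for `utf8mb3` in MySQL 8). The sketch below uses a hypothetical composite index over four `VARCHAR(250)` columns. The column count and lengths are illustrative only and are not taken from Airflow's actual schema.

```python
# Maximum bytes per character for the character sets discussed in this PR.
# In MySQL 8, `utf8` is an alias for `utf8mb3`.
MAX_BYTES_PER_CHAR = {"utf8mb4": 4, "utf8mb3": 3, "utf8": 3}

# InnoDB's index key limit with the default 16 KB page size and DYNAMIC rows.
MAX_KEY_BYTES = 3072


def key_bytes(varchar_lengths, charset):
    """Worst-case key size: total VARCHAR length times bytes per character."""
    return sum(varchar_lengths) * MAX_BYTES_PER_CHAR[charset]


def key_fits(varchar_lengths, charset):
    """True if an index over these columns stays within MAX_KEY_BYTES."""
    return key_bytes(varchar_lengths, charset) <= MAX_KEY_BYTES


if __name__ == "__main__":
    # Hypothetical composite index over four VARCHAR(250) columns.
    columns = [250, 250, 250, 250]
    for charset in ("utf8mb4", "utf8mb3"):
        print(charset, key_bytes(columns, charset), key_fits(columns, charset))
```

Running it prints `utf8mb4 4000 False` followed by `utf8mb3 3000 True`. The same columns that overflow the limit under `utf8mb4` fit under a 3-byte character set, which matches why switching to `utf8` made `airflow db init` succeed.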
The PR is likely ready to be merged. No tests are needed, as no important environment files or Python files were modified by it. However, committers might decide that the full test matrix is needed and add the 'full tests needed' label. In that case, rebase it onto the latest main or amend the last commit of the PR, and push it with --force-with-lease.
Awesome work, congrats on your first merged pull request!
I’m very late here, but is this really the correct solution? Setting a MySQL db to …
Well, agreed, this has been a tactical fix only, @uranusjr. We actually HAVE a fix, but it is a purely manual configuration one, and it could likely be improved so that it is "applied automatically when needed". I think the dedicated collation just has to be "injected" into SQLAlchemy automatically whenever someone uses any variant of utf8mb4. It was left to the discretion of users to change it, but this is a) difficult to discover, and b) possibly something that can be set automatically.
Do you feel you know the intricacies of SQLAlchemy (including any migrations, detecting all the cases when it is needed, etc.) well enough to do it automatically :D ?
Alternatively, we could simply check at migration time and raise an exception: "you are using utf8mb4, please set this collation_for_ids_to ...".
I don’t know SQLAlchemy…
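A minimal sketch of that fail-fast alternative, assuming a hypothetical `check_id_collation()` that would run before migrations. The dialect, database character set, and configured ID collation are passed in as plain strings. Reading them from the server or from Airflow's configuration is deliberately left out, and the exception class and message are illustrative, not Airflow's.

```python
class CollationConfigError(Exception):
    """Hypothetical error: ID columns would exceed MySQL's index key limit."""


def check_id_collation(dialect, database_charset, collation_for_ids):
    """Fail early (before migrating) if MySQL uses utf8mb4 for ID columns.

    `collation_for_ids` is None when the user has not configured one.
    """
    if dialect != "mysql":
        return  # the key-length problem discussed here is MySQL-specific
    uses_4_byte_charset = database_charset.lower().startswith("utf8mb4")
    ids_still_4_byte = (
        not collation_for_ids or collation_for_ids.lower().startswith("utf8mb4")
    )
    if uses_4_byte_charset and ids_still_4_byte:
        raise CollationConfigError(
            "You are using utf8mb4; please set sql_engine_collation_for_ids "
            "to a 3-byte collation such as utf8mb3_general_ci."
        )
```

For example, `check_id_collation("mysql", "utf8mb4", None)` raises, while `check_id_collation("mysql", "utf8mb4", "utf8mb3_general_ci")` returns quietly. This keeps the decision with the user but makes the problem discoverable at the moment it matters.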
OK. I think I addressed it in #17729.
No SQLAlchemy internals needed ;). |
The index size is too big when utf8mb4 is used as the encoding for a MySQL database. We already had the `sql_engine_collation_for_ids` configuration option to allow the ID fields to use a different collation, but users had to set it up manually after a failure to create the database, which was not obvious, not discoverable, and rather clumsy. Since this is really only a problem with MySQL, the easy solution is to force this parameter to utf8mb3_general_ci for all MySQL databases. It has no real negative consequences, as all relevant IDs are ASCII anyway. Related: apache#17603
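A rough sketch of the idea in this commit message: whenever the SQLAlchemy connection URL points at MySQL, use `utf8mb3_general_ci` for ID columns regardless of what was configured. `effective_id_collation()` and `id_is_ascii()` are hypothetical helpers written with the standard library only. They are not the code merged in #17729. The second helper reflects the "IDs are ASCII anyway" argument: an ASCII character takes one byte in both `utf8mb3` and `utf8mb4`, so the narrower collation loses nothing for such IDs.

```python
from urllib.parse import urlsplit

MYSQL_ID_COLLATION = "utf8mb3_general_ci"


def effective_id_collation(sql_alchemy_conn, configured_collation=None):
    """Return the collation to use for ID columns.

    MySQL URLs (e.g. "mysql+mysqldb://...") always get a 3-byte collation;
    other backends keep whatever was configured (possibly None).
    """
    # The URL scheme may carry a driver suffix, e.g. "mysql+mysqldb".
    backend = urlsplit(sql_alchemy_conn).scheme.split("+", 1)[0]
    if backend == "mysql":
        return MYSQL_ID_COLLATION
    return configured_collation


def id_is_ascii(value):
    """ASCII IDs use one byte per character in utf8mb3 and utf8mb4 alike."""
    return value.isascii()
```

With this approach, `effective_id_collation("mysql+mysqldb://user:pass@localhost/airflow")` returns `"utf8mb3_general_ci"`, while a PostgreSQL or SQLite URL passes the configured value through unchanged. The MySQL check looks only at the URL scheme, so it needs no SQLAlchemy internals.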
Version 2.9.3: after replacing MySQL and restarting, the following error is reported: INFO - Filling up the DagBag from /dev/null