Skip to content

Fix MySQL database character set instruction #17603

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 13, 2021
Merged

Conversation

tdnguyen6
Copy link
Contributor

The currently instructed character set utf8mb4 COLLATE utf8mb4_unicode_ci; does not work on mysql 8.
When I do: airflow db init the following error occurs:
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1071, 'Specified key was too long; max key length is 3072 bytes')
Changing to this character set: utf8 COLLATE utf8_general_ci; solved the problem

The currently instructed character set `utf8mb4 COLLATE utf8mb4_unicode_ci;` does not work on mysql 8. 
When I do: `airflow db init` the following error occurs:
`
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1071, 'Specified key was too long; max key length is 3072 bytes')
`
Changing to this character set: `utf8 COLLATE utf8_general_ci;` solved the problem
@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Aug 13, 2021
@github-actions
Copy link

The PR is likely ready to be merged. No tests are needed as no important environment files, nor python files were modified by it. However, committers might decide that full test matrix is needed and add the 'full tests needed' label. Then you should rebase it to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk potiuk merged commit 88e3379 into apache:main Aug 13, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Aug 13, 2021

Awesome work, congrats on your first merged pull request!

@tdnguyen6 tdnguyen6 deleted the patch-2 branch August 13, 2021 16:05
@uranusjr
Copy link
Member

Changing to this character set: utf8 COLLATE utf8_general_ci; solved the problem

I’m very late here but is this really the correct solution? Setting a MySQL db to utf8 is very frawned upon since this name is an alias to utf8mb3, which does not work with less common CJK characters and most emojis. utf8mb4 is the much more preferred encoding. In fact, the utf8mb3 encoding and its utf8 alias are actually deprecated in MySQL 8. So instead of recommending people to use utf8mb3, we should instead fix whatever is causing issues in the database so utf8mb4 can be correctly applied.

@potiuk
Copy link
Member

potiuk commented Aug 18, 2021

Well agree this has been tactical only @uranusjr

And we actually HAVE a fix but it is purely manual configuration one and it likely could be improved in terms of "applied automatically when needed":

https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#sql-engine-collation-for-ids

I think the dedicated collation just has to be "injected" into the sqlalchemy automatically when someone uses any variant of utf8mb4, It was left to discretion of the users to change it but this is a) difficult to discover b) possibly can be set automatically

@potiuk
Copy link
Member

potiuk commented Aug 18, 2021

Do you feel you know intricacies of sqlachemy (including any migration, detection of all the cases when it is needed) etc. to do it automatcally :D ?

@potiuk
Copy link
Member

potiuk commented Aug 18, 2021

Alternatively we could simply check at migrate and throw an exception "you are using utf8mb4, please set this collation_for_ids_to ...".

@uranusjr
Copy link
Member

Do you feel you know intricacies of sqlachemy

I don’t know sqlalchemy…

@potiuk
Copy link
Member

potiuk commented Aug 19, 2021

OK. I think I addressed it it in #17729

@potiuk
Copy link
Member

potiuk commented Aug 19, 2021

No SQLAlchemy internals needed ;).

potiuk added a commit to potiuk/airflow that referenced this pull request Aug 22, 2021
The index size is too big in case utf8mb4 is used as encoding
for MySQL database. We already had `sql_engine_collation_for_ids`
configuration to allow the id fields to use different collation,
but the user had to set it up manually in case of a failure to
create a db and it was not obvious, not discoverable and rather
clumsy.

Since this is really only a problem with MySQL the easy solution
is to force this parameter to utf8mb3_general_ci for all mysql
databases. It has no negative consequences, really as all
relevant IDs are ASCII anyway.

Related: apache#17603
potiuk added a commit that referenced this pull request Aug 22, 2021
The index size is too big in case utf8mb4 is used as encoding
for MySQL database. We already had `sql_engine_collation_for_ids`
configuration to allow the id fields to use different collation,
but the user had to set it up manually in case of a failure to
create a db and it was not obvious, not discoverable and rather
clumsy.

Since this is really only a problem with MySQL the easy solution
is to force this parameter to utf8mb3_general_ci for all mysql
databases. It has no negative consequences, really as all
relevant IDs are ASCII anyway.

Related: #17603
@potiuk potiuk mentioned this pull request Jul 20, 2022
2 tasks
@algorixco
Copy link

Version 2.9.3: Restart after replacing mysql. Report the following error:

INFO - Filling up the DagBag from /dev/null
/Users/liuweile/env/airflow/lib/python3.10/site-packages/flask_limiter/extension.py:333 UserWarning: Using the in-memory storage for tracking rate limits as no storage was explicitly specified. This is not recommended for production use. See: https://flask-limiter.readthedocs.io#configuring-a-storage-backend for documentation about configuring the storage backend.
/Users/liuweile/env/airflow/lib/python3.10/site-packages/airflow/providers/fab/auth_manager/fab_auth_manager.py:498 FutureWarning: section/key [webserver/update_fab_perms] has been deprecated, you should use[fab/update_fab_perms] instead. Please update your conf.get* call to use the new name

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:documentation okay to merge It's ok to merge this PR as it does not require more tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants