Every so often you need to convert some text into a maximally compatible form for a URL, a file name, an object name in some piece of software, and so on. The compatibility requirement is simple: the text may contain only allowed characters. Usually that means a-z, 0-9 and "_" or "-". In other words, only lowercase Latin letters, digits and a separator (as one example).
Say we need to turn a blog post title into a slug to put into that post's URL. What is the best way to do it?
Django ships with a ready-made function for such cases, slugify.
But I never use it. Why? Because it is not enough!
Here is an example:
>>> from django.utils.text import slugify
>>> slugify('This is a Title')
'this-is-a-title'
So far so good.
>>> slugify('This is a "Title!"')
'this-is-a-title'
The special characters are gone, all good.
>>> slugify('Это заголовок статьи')
''
And here we hit a wall 😢. If the text is not written in Latin script, the letters are simply dropped. That can be fixed:
>>> slugify('Это заголовок статьи', allow_unicode=True)
'это-заголовок-статьи'
But now we break the requirement: the slug contains Cyrillic.
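Why do the letters vanish without allow_unicode? The sketch below approximates the ASCII-only path using only the standard library. The function name ascii_slugify is made up for illustration, and the steps are a simplified reading of what Django does, so check the Django source for your version before relying on the details.

```python
import re
import unicodedata


def ascii_slugify(value):
    """Approximation of slugify's ASCII-only path (hypothetical helper)."""
    # Decompose accented letters into a base letter plus combining marks.
    value = unicodedata.normalize('NFKD', value)
    # Drop every character that cannot be encoded as ASCII.
    value = value.encode('ascii', 'ignore').decode('ascii')
    # Remove anything that is not a word character, whitespace or hyphen.
    value = re.sub(r'[^\w\s-]', '', value.lower())
    # Collapse runs of whitespace and hyphens, then trim the edges.
    return re.sub(r'[-\s]+', '-', value).strip('-_')


print(repr(ascii_slugify('Crème brûlée')))          # 'creme-brulee'
print(repr(ascii_slugify('Это заголовок статьи')))  # ''
```

Accented Latin letters survive because they decompose into an ASCII base letter. Cyrillic letters have no ASCII base, so the encode step throws them away entirely.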
Since I often build sites for Russian-speaking users, this problem comes up a lot. So I don't use the standard function and always write my own.
I don't build on the original at all and write the function from scratch. So, step by step:
🔸1. Source text:
>>> text = 'Мой заголовок №10 😁!'
I deliberately picked a harder one, with special characters.
🔸2. Transliteration
Every character has to be transliterated into Latin. The unidecode library is a huge help here. Besides plain Cyrillic-to-Latin transliteration, it can convert special characters and CJK ideographs into text equivalents.
>>> from unidecode import unidecode
>>> unidecode("Ñ Σ ® µ ¶ ¼ 月 山")
'N S (r) u P 1/4 Yue Shan'
A really cool library, I recommend it 👍
In our case we get this conversion:
>>> text = unidecode(text)
>>> text
'Moi zagolovok No. 10 !'
A great transliteration. The emoji was simply removed, though I expected something like :). Oh well, those would be invalid characters anyway.
And our code already supports any language, be it Hindi or Korean.
🔸3. Character filter
Unidecode does not filter out disallowed characters. That is done in this step with a regex: every run of characters outside the allowed range is replaced with "_".
>>> import re
>>> text = re.sub(r'[^a-zA-Z0-9]+', '_', text)
>>> text
'Moi_zagolovok_No_10_'
The "+" in the pattern helps when several disallowed characters sit next to each other: the whole run is replaced with a single "_".
🔸4. Slugify
All that is left is to strip the extra characters from the edges and lowercase the result:
>>> text = text.strip('_').lower()
>>> text
'moi_zagolovok_no_10'
We get a great slug! 😎
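The effect of the "+" in the filter pattern is easy to check side by side. This small sketch runs both variants of the pattern on the transliterated string from the walk-through, using only the standard re module.

```python
import re

text = 'Moi zagolovok No. 10 !'

# Without "+", each disallowed character becomes its own "_".
single = re.sub(r'[^a-zA-Z0-9]', '_', text)
# With "+", a whole run of disallowed characters becomes one "_".
collapsed = re.sub(r'[^a-zA-Z0-9]+', '_', text)

print(single)     # Moi_zagolovok_No__10__
print(collapsed)  # Moi_zagolovok_No_10_
```

Without the quantifier the slug ends up with doubled separators wherever punctuation sits next to a space.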
🌎 The full code as a function.
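A minimal sketch that puts the transliteration, filter and slugify steps together. The name make_slug and the sep and transliterate parameters are assumptions made for illustration, not the author's listing. By default it uses unidecode (third-party, installed with pip install Unidecode); the transliterate parameter only exists so that another transliterator can be swapped in.

```python
import re


def make_slug(text, sep='_', transliterate=None):
    """Turn arbitrary text into a lowercase ASCII slug (a-z, 0-9, sep)."""
    if transliterate is None:
        # Third-party dependency, imported lazily so that callers can
        # pass their own transliterator instead.
        from unidecode import unidecode as transliterate
    # Transliteration: map every character to an ASCII equivalent.
    text = transliterate(text)
    # Character filter: each run of disallowed characters becomes one sep.
    text = re.sub(r'[^a-zA-Z0-9]+', sep, text)
    # Slugify: trim separators from the edges and lowercase.
    return text.strip(sep).lower()
```

With Unidecode installed, make_slug('Мой заголовок №10 😁!') should follow the same path as the walk-through above and end at 'moi_zagolovok_no_10'. Passing sep='-' gives URL-style slugs instead.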
______________
P.S. I would move the check that at least one allowed character is left in the string into a separate function.
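One possible reading of that P.S., as a sketch: a separate check plus a small wrapper that substitutes a default when nothing usable is left. The names has_valid_chars and slug_or_fallback, and the 'untitled' default, are hypothetical.

```python
import re

# At least one lowercase Latin letter or digit must be present.
_ALLOWED = re.compile(r'[a-z0-9]')


def has_valid_chars(slug):
    """Return True if the slug contains at least one allowed character."""
    return _ALLOWED.search(slug) is not None


def slug_or_fallback(slug, fallback='untitled'):
    """Return the slug, or the fallback if the slug is effectively empty."""
    return slug if has_valid_chars(slug) else fallback


print(slug_or_fallback('moi_zagolovok_no_10'))  # moi_zagolovok_no_10
print(slug_or_fallback(''))                     # untitled
```

This matters for titles made up only of emoji or punctuation, where every step of the pipeline succeeds but the result is an empty string.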
#libs#tricks#django