@dankogai
my$talk=qr{((?:ir)?reg(?:ular )?exp(?:ressions?)?)}i;
Table of Contents
• regexp? what is it?
• $supported_by ~~ @most_major_languages;
• but how (much)??
• Unicode support?
• assertions?
• modifiers?
• Irregular expressions
• qr{([A-Za-z_]\w*\s*(\(((?:(?>[^()]+)|(?2))*)\)))}
• use CPAN;
• Regexp::Assemble;
• Regexp::Common;
• (ir)?regular questions (?:from|by) the audience
regexp? what is it?
Mathematically speaking[*]
• The empty language Ø is a regular language.
• For each a ∈ Σ (a belongs to Σ), the singleton language {a} is a regular language.
• If A is a regular language, A* (Kleene star) is a regular language. Due to this, the empty string language {ε} is also regular.
• If A and B are regular languages, then A ∪ B (union) and A • B (concatenation) are regular languages.
• No other languages over Σ are regular.
regexp? what is it?
In our language
• 0 or more of… (quantifier)
• '' # empty string
• 'string' # any string
• '(?:string|文字列)' # any alternation of strings
• That's it! The rest is shorthand:
• ? # {0,1}
• + # {1,}
• [0-9] # (?:0|1|2|3|4|5|6|7|8|9)
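To see that the shorthand really adds nothing, the sketch below compares each sugared pattern with a version built only from concatenation, alternation and the star. It is written in JavaScript, the other language run in this talk; the sample inputs are assumptions chosen for illustration.

```javascript
// Each pair: [sugared pattern, the same language spelled out with
// concatenation, alternation and the star only].
const pairs = [
  [/^a?$/, /^(?:a|)$/],                       // ?     is {0,1}
  [/^a+$/, /^aa*$/],                          // +     is {1,}
  [/^[0-9]$/, /^(?:0|1|2|3|4|5|6|7|8|9)$/],   // [0-9] is an alternation
];

// Illustrative inputs, not taken from the slides.
const inputs = ['', 'a', 'aa', 'aaa', '7', '42', 'b'];

// Both members of every pair must accept exactly the same inputs.
const allAgree = pairs.every(([sugar, plain]) =>
  inputs.every((s) => sugar.test(s) === plain.test(s)));

console.log(allAgree);
```

Agreement on a handful of strings is a spot check, not a proof, but it shows the idea.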
regexp? what is it?
((?:ir)?reg(?:ular )?exp(?:ressions?)?)
Visualized by: regexper.com
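A quick way to read the diagram is to probe the pattern with a few spellings. This is a minimal sketch in JavaScript; the anchors `^…$` and the sample strings are assumptions added for illustration.

```javascript
// The talk's title pattern, anchored so only whole-string matches count.
const talk = /^((?:ir)?reg(?:ular )?exp(?:ressions?)?)$/i;

// Illustrative inputs, not taken from the slides.
const samples = ['regexp', 'Regular Expressions', 'irregular expression', 'regex'];

// Keep the spellings the pattern accepts.
const accepted = samples.filter((s) => talk.test(s));
console.log(accepted);
```

Note that `regex` is rejected: after `reg` the pattern insists on a literal `exp`.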
regexp? what is it?
(?:[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2})
Excerpt from: https://www.w3.org/International/questions/qa-forms-utf-8
Visualized by: regexper.com
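The pattern describes one well-formed UTF-8 character, byte by byte. A minimal sketch of using it as a validator in Node.js follows; `isUtf8` is a hypothetical helper name, and decoding the bytes as latin1 (so that one JavaScript character stands for one byte) is an assumption of this sketch, not part of the W3C recipe.

```javascript
// One UTF-8 encoded character, following the W3C pattern.
const utf8Char =
  '(?:[\\x00-\\x7F]|[\\xC2-\\xDF][\\x80-\\xBF]|\\xE0[\\xA0-\\xBF][\\x80-\\xBF]' +
  '|[\\xE1-\\xEC\\xEE\\xEF][\\x80-\\xBF]{2}|\\xED[\\x80-\\x9F][\\x80-\\xBF]' +
  '|\\xF0[\\x90-\\xBF][\\x80-\\xBF]{2}|[\\xF1-\\xF3][\\x80-\\xBF]{3}' +
  '|\\xF4[\\x80-\\x8F][\\x80-\\xBF]{2})';
const wellFormed = new RegExp(`^${utf8Char}*$`);

// Hypothetical helper: view the bytes as a latin1 string, then match.
const isUtf8 = (bytes) =>
  wellFormed.test(Buffer.from(bytes).toString('latin1'));

const ok = isUtf8(Buffer.from('文字列', 'utf8'));   // well-formed input
const overlong = isUtf8([0xC0, 0x80]);              // overlong encoding of U+0000
const surrogate = isUtf8([0xED, 0xA0, 0x80]);       // encoded surrogate U+D800

console.log(ok, overlong, surrogate);
```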
regexp? what is it?
(?:[+-]?)(?:0x[0-9a-fA-F]+(?:\.[0-9a-fA-F]+)?(?:[pP][+-]?[0-9]+)|(?:[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?|0(?:\.0+|(?:\.0+)?(?:[eE][+-]?[0-9]+))|(?:[Nn]a[Nn]|[Ii]nf(?:inity)?))
Excerpt from: https://github.com/dankogai/js-sion/blob/main/sion.ts
Visualized by: regexper.com
Irregular expressions
/^(11+?)\1+$/ # is this a regular expression?
$ seq 2 100 | perl -nlE 'say $_ if (1x$_) !~ /^(11+?)\1+$/'
2
3
5
7
…
79
83
89
97
Irregular expressions
/^(11+?)\1+$/ # is NOT EXACTLY a regular expression!
• The problem is \1
• It is the result of the preceding capture
• In other words, this expression is self-modifying.
• So it is not mathematically a regular expression
• Regexp ≠ Regular Expression
• Regexp ⊇ Regular Expression
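The same backreference trick carries over to JavaScript, whose RegExp also supports `\1`. A minimal sketch of the prime sieve; the variable names are illustrative.

```javascript
// Write n in unary, then try to split it into two or more equal
// groups of at least two 1s. If that succeeds, n is composite.
const composite = /^(11+?)\1+$/;

const primes = [];
for (let n = 2; n <= 100; n++) {
  if (!composite.test('1'.repeat(n))) primes.push(n);
}

console.log(primes.join(' '));
```

The capture `(11+?)` is a candidate divisor and `\1+` repeats it, which is exactly the part no finite automaton can do.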
Irregular expressions
qr{([A-Za-z_]\w*\s*(\(((?:(?>[^()]+)|(?2))*)\)))}
• Q: Can a regular expression match nested parentheses?
• A: No. But some regex engines allow you to do that.
Irregular expressions
qr{([A-Za-z_]\w*\s*(\(((?:(?>[^()]+)|(?2))*)\)))}
my $re = qr{(
  [A-Za-z_]\w*\s*
  (
    \(
    (
      (?:
        (?>[^()]+)
        |
        (?2)
      )*
    )
    \)
  )
)
}x;
Irregular expressions
qr{([A-Za-z_]\w*\s*(\(((?:(?>[^()]+)|(?2))*)\)))}
#!/usr/bin/env perl
use strict;
use warnings;
use feature ':all';
my $re = qr{([A-Za-z_]\w*\s*(\(((?:(?>[^()]+)|(?2))*)\)))};
my $str = '$result = a(b(c),d(e,f(g,g,g)))';
$str =~ $re;
say $1; # the whole call
say $2; # its parenthesized argument list
say $3; # the list without the outer parentheses
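JavaScript's RegExp has no recursion such as `(?2)`, so the same extraction needs help from ordinary code. The sketch below swaps the recursive group for an explicit depth counter; `firstCall` and its result field names are hypothetical, and it only approximates the Perl pattern.

```javascript
// Hypothetical helper: find the first `name(...)` call whose parentheses
// balance. The depth counter plays the role of the recursive (?2).
function firstCall(str) {
  const head = /[A-Za-z_]\w*\s*\(/g;   // identifier, optional space, "("
  let m;
  while ((m = head.exec(str)) !== null) {
    const open = m.index + m[0].length - 1;   // index of the "("
    let depth = 0;
    for (let i = open; i < str.length; i++) {
      if (str[i] === '(') {
        depth++;
      } else if (str[i] === ')' && --depth === 0) {
        return {
          call: str.slice(m.index, i + 1),   // like $1
          parens: str.slice(open, i + 1),    // like $2
          inner: str.slice(open + 1, i),     // like $3
        };
      }
    }
    // Never balanced from this head: retry one character further on.
    head.lastIndex = m.index + 1;
  }
  return null;
}

const r = firstCall('$result = a(b(c),d(e,f(g,g,g)))');
console.log(r.call);
console.log(r.parens);
console.log(r.inner);
```

The counter is the extra memory a finite automaton lacks, which is why plain regular expressions cannot do this.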
Unicode Support
What is a character?
• String is /.*/ but . =
  • [\x00-\xff] # legacy world of bytes
  • [\u0000-\uFFFF] # prematurely modern
  • [\u{0000}-\u{10FFFF}] # correctly modern
Unicode Support
What is a character?
• String is /.*/ but . =
  • [\x00-\xff] # Perl < 5.7
  • [\u0000-\uFFFF] # Java(Script)?, Python2, …
  • [\u{0000}-\u{10FFFF}] # Perl, Ruby, Python3, …
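The three definitions give three different lengths for the same string. A minimal sketch in Node.js, counting one rhinoceros emoji each way; the choice of sample character is illustrative.

```javascript
const s = '🦏';   // U+1F98F RHINOCEROS

const bytes = Buffer.byteLength(s, 'utf8');   // the world of bytes
const units = s.length;                       // UTF-16 code units
const points = [...s].length;                 // Unicode code points

console.log(bytes, units, points);
```

String iteration (the spread `[...s]`) walks code points, which is why it disagrees with `.length`.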
Unicode Support?
What will the following say?
$ perl -Mutf8 -MData::Dumper -E \
'my@m=("🦏🐪🐘🐍💎⚙" =~ /(.)/g); say Dumper([@m])'
$VAR1 = [
"\x{1f98f}",
"\x{1f42a}",
"\x{1f418}",
"\x{1f40d}",
"\x{1f48e}",
"\x{2699}"
];
Unicode Support?
What will the following say?
$ node -e \
'console.log("🦏🐪🐘🐍💎⚙".match(/(.)/g))'
[
'', '', '', '',
'', '', '', '',
'', '', '⚙'
]
# without /u, . matches one UTF-16 code unit, so each '' is a lone surrogate half
Unicode Support?
What will the following say?
$ node -e \
'console.log("🦏🐪🐘🐍💎⚙".match(/(.)/ug))'
[ '🦏', '🐪', '🐘', '🐍', '💎', '⚙' ]
Unicode Support?
What will the following say?
$ perl -Mutf8 -MData::Dumper -E \
'my@m=("🇯🇵🇺🇦" =~ /(.)/g); say Dumper([@m])'
$VAR1 = [
"\x{1f1ef}", # REGIONAL INDICATOR SYMBOL LETTER J
"\x{1f1f5}", # REGIONAL INDICATOR SYMBOL LETTER P
"\x{1f1fa}", # REGIONAL INDICATOR SYMBOL LETTER U
"\x{1f1e6}" # REGIONAL INDICATOR SYMBOL LETTER A
];
Unicode Support?
What will the following say?
$ perl -Mutf8 -MData::Dumper -E \
'my@m=("🇯🇵🇺🇦" =~ /(\X)/g); say Dumper([@m])'
$VAR1 = [
"\x{1f1ef}\x{1f1f5}",
"\x{1f1fa}\x{1f1e6}"
];
Unicode Support?
What will the following say?
$ node -e \
'console.log("🇯🇵🇺🇦".match(/(.)/ug))'
[ '🇯', '🇵', '🇺', '🇦' ]
Unicode Support?
What will the following say?
$ node -e \
'console.log("🇯🇵🇺🇦".match(/(\X)/ug))'
🙅 [ '🇯🇵','🇺🇦' ]
🙆 SyntaxError: Invalid regular expression: /(\X)/: Invalid escape
    at [eval]:1:24
    at Script.runInThisContext (node:vm:129:12)
    at Object.runInThisContext (node:vm:305:38)
    at node:internal/process/execution:75:19
    at [eval]-wrapper:6:22
    at evalScript (node:internal/process/execution:74:60)
    at node:internal/main/eval_string:27:3
🤦
Unicode Support?
Grapheme Cluster
• Defined in:
  • https://unicode.org/reports/tr29/
• \X is supported by:
  • 🐘 PHP
  • 🐪 Perl
  • 💎 Ruby
• Not yet supported by:
  • 🦏 JavaScript
  • 🐍 Python
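Although JavaScript's RegExp has no `\X`, grapheme clusters can still be obtained outside the regexp engine. The sketch below uses `Intl.Segmenter`, assuming a runtime that ships it (recent Node.js releases do); the locale `'en'` is an arbitrary choice for illustration.

```javascript
// Intl.Segmenter splits text along Unicode segmentation boundaries.
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });

// Collect the text of each grapheme cluster.
const clusters = Array.from(segmenter.segment('🇯🇵🇺🇦'), (x) => x.segment);

console.log(clusters);
console.log(clusters.length);
```

Each flag is a pair of regional indicator symbols, so the four code points come back as two clusters, matching what `\X` returns in Perl.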
use CPAN
Regexp::Common
$ perl -MRegexp::Common -E 'say $RE{net}{IPv6}'
(?:(?|(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-
fA-F]{1,4})|(?::(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?::(?:)(?:)(?:):
(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?::(?:)(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]
{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?::(?:)(?:)(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?::(?:)(?:)(?:)(?:)(?:)(?:):(?:
[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?::(?:)(?:)(?:)(?:)(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}))|(?::(?:)(?:)(?:)(?:)(?:)(?:)(?:)(?:):)|(?:(?:[0-9a-fA-F]{1,4}):(?:)(?:):(?:[0-9a-
fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-
fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?:(?:
[0-9a-fA-F]{1,4}):(?:)(?:)(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:)(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]
{1,4}):(?:)(?:)(?:)(?:)(?:)(?:)(?:):)|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]
{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-
F]{1,4}):(?:)(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-
fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:)(?:)(?:)(?:):)|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-
F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-
fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:)
(?:)(?:)(?:):)|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-
fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-
fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:)(?:):)|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:):
(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:):)|(?:(?:[0-9a-fA-F]
{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:):)))
use CPAN
Regexp::Assemble
$ egrep '^.{5}$' /usr/share/dict/words \
| perl -MRegexp::Assemble -nl \
-E 'BEGIN{$ra=Regexp::Assemble->new}' \
-E '$ra->add($_);' \
-E 'END{say $ra->re}'
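To give a feel for what such an assembler does, here is a minimal sketch in JavaScript that merges shared prefixes through a trie. It is not the Regexp::Assemble algorithm; `assemble`, `toPattern`, `escapeChar` and the word list are hypothetical names and data for illustration.

```javascript
const END = '';   // trie key meaning "a word may end here"

// Escape one character so it is matched literally.
const escapeChar = (ch) => ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Turn a trie node into a pattern: one alternative per child.
function toPattern(node) {
  const alternatives = [];
  let optional = false;
  for (const [ch, child] of node) {
    if (ch === END) { optional = true; continue; }
    alternatives.push(escapeChar(ch) + toPattern(child));
  }
  if (alternatives.length === 0) return '';
  const body = alternatives.length === 1 && !optional
    ? alternatives[0]
    : `(?:${alternatives.join('|')})`;
  return optional ? `${body}?` : body;
}

// Build a character trie from the words, then print it as a pattern.
function assemble(words) {
  const root = new Map();
  for (const word of words) {
    let node = root;
    for (const ch of word) {
      if (!node.has(ch)) node.set(ch, new Map());
      node = node.get(ch);
    }
    node.set(END, null);
  }
  return toPattern(root);
}

const words = ['hello', 'help', 'held', 'hells'];
const pattern = assemble(words);
console.log(pattern);
```

The result matches the same words as a plain `hello|help|held|hells`, but the engine tests the common prefix `hel` only once.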
Wrap↑
• (?:ir)?regular expressions
  • Regexp ≠ Regular Expression
  • Regexp ⊇ Regular Expression
• Definition of characters
  • [\x00-\xff]
  • [\u0000-\uFFFF]
  • [\u{0000}-\u{10FFFF}]
• (?:un)?availability of \X
• Using perl? use CPAN!
BTW
The Bible, now obsolete
• 3rd edition: August 8, 2006
• Too old, especially for JS
Thank you
🙇
Questions and answers
answer($_) foreach (/($questions)/sg);
