Oniguruma Regular Expressions Version 5.9.1    2007/09/05

syntax: ONIG_SYNTAX_RUBY (default)


1. Syntax elements

  \       escape (enables or disables the special meaning of a metacharacter)
  |       alternation
  (...)   group
  [...]   character class


2. Characters

  \t           horizontal tab (0x09)
  \v           vertical tab   (0x0B)
  \n           newline        (0x0A)
  \r           return         (0x0D)
  \b           backspace      (0x08)
  \f           form feed      (0x0C)
  \a           bell           (0x07)
  \e           escape         (0x1B)
  \nnn         octal char             encoded byte value (or part of one)
  \xHH         hexadecimal char       encoded byte value (or part of one)
  \x{7HHHHHHH} wide hexadecimal char  code point value
  \cx          control char           code point value
  \C-x         control char           code point value
  \M-x         meta (x|0x80)          code point value
  \M-\C-x      meta control char      code point value

  * \b is effective only inside a character class [...].


3. Character types

  .        any character (except newline)

  \w       word character

           Not Unicode:
             alphanumeric, "_" and multibyte characters

           Unicode:
             General_Category -- (Letter|Mark|Number|Connector_Punctuation)

  \W       non-word character

  \s       whitespace character

           Not Unicode:
             \t, \n, \v, \f, \r, \x20

           Unicode:
             0009, 000A, 000B, 000C, 000D, 0085(NEL),
             General_Category -- Line_Separator
                              -- Paragraph_Separator
                              -- Space_Separator

  \S       non-whitespace character

  \d       decimal digit character

           Unicode: General_Category -- Decimal_Number

  \D       non-decimal-digit character

  \h       hexadecimal digit character   [0-9a-fA-F]

  \H       non-hexadecimal-digit character


  Character Property

    * \p{property-name}
    * \p{^property-name}    (negative)
    * \P{property-name}     (negative)

    property-name:

     + works on all encodings
       Alnum, Alpha, Blank, Cntrl, Digit, Graph, Lower,
       Print, Punct, Space, Upper, XDigit, Word, ASCII,

     + works on EUC-JP, Shift_JIS
       Hiragana, Katakana

     + works on UTF8, UTF16, UTF32
       Any, Assigned, C, Cc, Cf, Cn, Co, Cs, L, Ll, Lm, Lo, Lt, Lu,
       M, Mc, Me, Mn, N, Nd, Nl, No, P, Pc, Pd, Pe, Pf, Pi, Po, Ps,
       S, Sc, Sk, Sm, So, Z, Zl, Zp, Zs,
       Arabic, Armenian, Bengali, Bopomofo, Braille, Buginese,
       Buhid, Canadian_Aboriginal, Cherokee, Common, Coptic,
       Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian,
       Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul,
       Hanunoo, Hebrew, Hiragana, Inherited, Kannada, Katakana,
       Kharoshthi, Khmer, Lao, Latin, Limbu, Linear_B, Malayalam,
       Mongolian, Myanmar, New_Tai_Lue, Ogham, Old_Italic, Old_Persian,
       Oriya, Osmanya, Runic, Shavian,
       Sinhala, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le,
       Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Yi


4. Quantifiers

  greedy

    ?       1 or 0 times
    *       0 or more times
    +       1 or more times
    {n,m}   at least n but not more than m times
    {n,}    at least n times
    {,n}    at least 0 but not more than n times ({0,n})
    {n}     n times

  reluctant

    ??      1 or 0 times
    *?      0 or more times
    +?      1 or more times
    {n,m}?  at least n but not more than m times
    {n,}?   at least n times
    {,n}?   at least 0 but not more than n times (== {0,n}?)

  possessive (greedy, and once the repetition has succeeded it does not
              backtrack to retry with fewer repetitions)

    ?+      1 or 0 times
    *+      0 or more times
    ++      1 or more times

    ({n,m}+, {n,}+, {n}+ are possessive operators in ONIG_SYNTAX_JAVA only)

    ex. /a*+/ === /(?>a*)/


5. Anchors

  ^       beginning of the line
  $       end of the line
  \b      word boundary
  \B      non-word boundary
  \A      beginning of the string
  \Z      end of the string, or just before a newline at the end of the string
  \z      end of the string
  \G      matching start position


6. Character class

  ^...    negation (lowest precedence operator)
  x-y     range (from x to y)
  [...]   set (character class inside a character class)
  ..&&..  intersection (the next lowest precedence after ^)

    ex. [a-w&&[^c-g]z] ==> ([a-w] and ([^c-g] or z)) ==> [abh-w]

  * To use '[', '-' or ']' as an ordinary character inside a character
    class, escape it with '\'.


  POSIX bracket ([:xxxxx:], negation [:^xxxxx:])

    Not Unicode:

      alnum    alphanumeric
      alpha    alphabetic
      ascii    0 - 127
      blank    \t, \x20
      cntrl
      digit    0-9
      graph    includes all multibyte characters
      lower
      print    includes all multibyte characters
      punct
      space    \t, \n, \v, \f, \r, \x20
      upper
      xdigit   0-9, a-f, A-F
      word     alphanumeric, "_" and multibyte characters

    Unicode:

      alnum    Letter | Mark | Decimal_Number
      alpha    Letter | Mark
      ascii    0000 - 007F
      blank    Space_Separator | 0009
      cntrl    Control | Format | Unassigned | Private_Use | Surrogate
      digit    Decimal_Number
      graph    [[:^space:]] && ^Control && ^Unassigned && ^Surrogate
      lower    Lowercase_Letter
      print    [[:graph:]] | [[:space:]]
      punct    Connector_Punctuation | Dash_Punctuation | Close_Punctuation |
               Final_Punctuation | Initial_Punctuation | Other_Punctuation |
               Open_Punctuation
      space    Space_Separator | Line_Separator | Paragraph_Separator |
               0009 | 000A | 000B | 000C | 000D | 0085
      upper    Uppercase_Letter
      xdigit   0030 - 0039 | 0041 - 0046 | 0061 - 0066
               (0-9, a-f, A-F)
      word     Letter | Mark | Decimal_Number | Connector_Punctuation


7. Extended groups

  (?#...)
                     comment

  (?imx-imx)         isolated option
                       i: ignore case
                       m: multi-line
                       x: extended form

  (?imx-imx:subexp)  option for subexp

  (subexp)           captured group
  (?:subexp)         non-captured group

  (?=subexp)         look-ahead
  (?!subexp)         negative look-ahead
  (?<=subexp)        look-behind
  (?<!subexp)        negative look-behind

                     The subexp of a look-behind must have a fixed
                     character length. Only the alternatives at the top
                     level may differ in length.
                     ex. (?<=a|bc) is allowed. (?<=aaa(?:b|cd)) is not allowed.

                     In a negative look-behind, captured groups are not
                     allowed, but non-captured groups are.

  (?>subexp)         atomic group
                     Once the whole subexp has been passed, no backtracking
                     into the subexp is performed.

  (?<name>subexp), (?'name'subexp)
                     named captured group

                     Assigns (defines) a name for the group.
                     (The name must consist of word characters.)

                     A number is assigned as well as the name, just as for
                     a captured group. While numbered reference is not
                     forbidden (see 10. Captured group), the group can also
                     be referred to by number instead of by name.

                     The same name may be given to more than one group.
                     In that case a back reference using the name is
                     possible, but a subexp call is not.


8. Back reference

  \n          back reference by group number (n >= 1)
  \k<n>       back reference by group number (n >= 1)
  \k'n'       back reference by group number (n >= 1)
  \k<-n>      back reference by relative group number (n >= 1)
  \k'-n'      back reference by relative group number (n >= 1)
  \k<name>    back reference by group name
  \k'name'    back reference by group name

  In a back reference by name, if the name is defined by more than one
  group, the group with the larger number is referred to first.
  (If it did not match, a group with a smaller number is referred to.)

  * Back reference by group number is forbidden if a named captured group
    is defined and ONIG_OPTION_CAPTURE_GROUP is not set.
    (See 10. Captured group)


  back reference with nest level

    level: 0, 1, 2, ...

    \k<n+level>     (n >= 1)
    \k<n-level>     (n >= 1)
    \k'n+level'     (n >= 1)
    \k'n-level'     (n >= 1)

    \k<name+level>
    \k<name-level>
    \k'name+level'
    \k'name-level'

    Specifies a subexp call nest level relative to the position of the
    back reference, and refers to the value captured at that level.

    ex 1.

      /\A(?<a>|.|(?:(?<b>.)\g<a>\k<b+0>))\z/.match("reer")

    ex 2.

      r = Regexp.compile(<<'__REGEXP__'.strip, Regexp::EXTENDED)
      (?<element> \g<stag> \g<content>* \g<etag> ){0}
      (?<stag> < \g<name> \s* > ){0}
      (?<name> [a-zA-Z_:]+ ){0}
      (?<content> [^<&]+ (\g<element> | [^<&]+)* ){0}
      (?<etag> </ \k<name+1> >){0}
      \g<element>
      __REGEXP__

      p r.match('<foo>f<bar>bbb</bar>f</foo>').captures


9.
Subexp calls ("Tanaka Akira special")

  \g<name>    call by group name
  \g'name'    call by group name
  \g<n>       call by group number (n >= 1)
  \g'n'       call by group number (n >= 1)
  \g<-n>      call by relative group number (n >= 1)
  \g'-n'      call by relative group number (n >= 1)

  * A recursive call at the left-most position is forbidden.

    ex. (?<name>a|\g<name>b)   => error
        (?<name>a|b\g<name>c)  => OK

  * Call by group number is forbidden if a named captured group is defined
    and ONIG_OPTION_CAPTURE_GROUP is not set.
    (See 10. Captured group)

  * If the option status of the called group differs from that of the
    calling position, the option status of the called group is effective.

    ex. (?-i:\g<name>)(?i:(?<name>a)){0} matches "A".


10. Captured group

  The behaviour of the captured group (...) changes depending on the
  following conditions. (Named captured groups are not affected.)

  case 1. /.../     (named captured group is not used, no option)

     (...) is treated as a captured group.

  case 2. /.../g    (named captured group is not used, 'g' option)

     (...) is treated as a non-captured group.

  case 3. /..(?<name>..)../   (named captured group is used, no option)

     (...) is treated as a non-captured group.
     Numbered back reference and call are not allowed.

  case 4. /..(?<name>..)../G  (named captured group is used, 'G' option)

     (...) is treated as a captured group.
     Numbered back reference and call are allowed.

  where
    g: ONIG_OPTION_DONT_CAPTURE_GROUP
    G: ONIG_OPTION_CAPTURE_GROUP

  (The 'g' and 'G' options were discussed on the ruby-dev ML.)

  The reasoning behind this behaviour is that situations which really
  require named and unnamed captures to be used together are likely to
  be rare.


-----------------------------
Appendix 1. Syntax-dependent options

   + ONIG_SYNTAX_RUBY
     (?m): the dot (.) matches newline

   + ONIG_SYNTAX_PERL and ONIG_SYNTAX_JAVA
     (?s): the dot (.) matches newline
     (?m): ^ matches just after a newline, $ matches just before a newline


Appendix 2. Original extensions

   + hexadecimal digit char, non-hexadecimal-digit char   \h, \H
   + named captured group           (?<name>...), (?'name'...)
   + back reference by group name   \k<name>
   + subexp call                    \g<name>, \g<group-num>


Appendix 3. Features missing compared with Perl 5.8.0

   + \N{name}
   + \l,\u,\L,\U, \X, \C
   + (?{code})
   + (??{code})
   + (?(condition)yes-pat|no-pat)

   * \Q...\E
     However, this is effective in ONIG_SYNTAX_PERL and ONIG_SYNTAX_JAVA.


Appendix 4.
Differences from the Japanized GNU regex (version 0.12) of Ruby 1.8

   + added character property (\p{property}, \P{Property})
   + added hexadecimal digit char type (\h, \H)
   + added look-behind
   + added possessive quantifiers (?+, *+, ++)
   + added operators inside a character class ([...], &&)
     ('[' must be escaped to be used as an ordinary character inside a
      character class)
   + added named captured group and subexp call
   + When a multibyte character encoding is specified, a sequence of octal
     or hexadecimal escapes inside a character class is interpreted as a
     single character expressed in the multibyte code.
     (ex. [\xa1\xa2], [\xa1\xa7-\xa4\xa1])
   + A range between a single-byte character and a multibyte character is
     allowed inside a character class.
     ex. /[a-<<any multibyte character>>]/
   + The effective range of an isolated option extends to the end of the
     group that contains the isolated option.
     ex. (?:(?i)a|b) is interpreted as (?:(?i:a|b)), not (?:(?i:a)|b)
   + An isolated option is not transparent to the expression before it.
     ex. /a(?i)*/ is a syntax error
   + An incomplete interval quantifier is allowed as an ordinary string.
     ex. /{/, /({)/, /a{2,3/
   + added negative POSIX bracket [:^xxxx:]
   + added POSIX bracket [:ascii:]
   + Repetition of a look-ahead is not allowed.
     ex. /(?=a)*/, /(?!b){5}/
   + The ignore-case option is effective even for a character specified
     by a numeric value.
     ex. /\x61/i =~ "A"
   + In an interval quantifier, the minimum count may be omitted (meaning 0).
     /a{,n}/ == /a{0,n}/
     Omitting both the minimum and the maximum is not allowed. (/a{,}/)
   + /a{n}?/ is not a reluctant operator.
     /a{n}?/ == /(?:a{n})?/
   + Invalid back references are checked and reported as errors.
     /\1/, /(a)\2/
   + Inside an unbounded repetition, a zero-length match interrupts the
     repetition. When deciding whether to interrupt, changes in the
     capture state of captured groups are also taken into account.
     /(?:()|())*\1\2/ =~ ""
     /(?:\1a|())*/ =~ "a"


Appendix 5. Features implemented but not enabled by default

   + capture history reference (?@...) and (?@<name>...)

     ex. /(?@a)*/.match("aaa") ==> [<0-1>, <1-2>, <2-3>]

     See sample/listcap.c for usage.

     It is not enabled because it is unclear how useful it is.


Appendix 6. Problems

   + Encoding byte values are not checked for validity.

     ex: UTF-8

     * A byte that is invalid as a leading byte is treated as one character.
       /./u =~ "\xa3"

     * Incomplete byte sequences are not checked.
       /\w+/u =~ "a\xf3\x8ec"

     Checking for this would be possible, but it is not done because it
     would slow matching down.

     The behaviour when such a byte sequence is given as the subject
     string is not guaranteed.

End
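
The closing note on unchecked encoding byte values leaves validation to the caller. A minimal sketch of one way to do that from Ruby follows. It assumes Ruby 1.9 or later, where String#valid_encoding? is available; checked_match is a hypothetical helper written for this illustration and is not part of Oniguruma or Ruby.

```ruby
# Hypothetical helper (not part of Oniguruma or Ruby): refuse to run a match
# when the subject string is not well-formed in its own encoding.
def checked_match(pattern, subject)
  # String#valid_encoding? is core Ruby (1.9+); assumed available here.
  return nil unless subject.valid_encoding?
  pattern.match(subject)
end

broken = "a\xf3\x8ec"  # incomplete UTF-8 sequence, as in the /\w+/u example
clean  = "abc"

p broken.valid_encoding?           # false
p checked_match(/\w+/, broken)     # nil: the string never reaches the matcher
p checked_match(/\w+/, clean)[0]   # "abc"
```

Validating once at the boundary where data enters the program keeps the cost out of the matching loop, which is the same trade-off the note gives for not checking inside the matcher.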