negative look behind with unknown number of spaces

Posted by 掩饰不了的爱 on 2021-12-01 (542 words, 799 views, 3 replies)

Using C# regex, I'm trying to match things in quotes that aren't also inside brackets, while ignoring any whitespace between the bracket and the quote:

"blah" - match
("blah") - no match
( "blah") - no match
(  "blah") - no match

I've got (unescaped):

"(?<=[^(]\s")(.*?)"

which works for the first three cases, but I can't work out how to handle more than one space between the opening bracket and the quote. Putting a + after the \s gives the same result, and using a * makes both of the last two match. Any ideas?
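One way to see why the * version lets the last two cases through is to run the lookbehind's body, [^(]\s*", as an ordinary pattern. This is a minimal sketch in Python's re rather than C#, used purely for illustration, and the helper name where_body_matches is hypothetical. Because [^(] happily matches a space, the body matches starting just after the "(", so the bracket itself is never examined:

```python
import re

# Body of the asker's lookbehind, [^(]\s*", compiled as an ordinary pattern.
# Wherever this matches, a lookbehind (?<=[^(]\s*") placed right after that
# quote would succeed.
LOOKBEHIND_BODY = re.compile(r'[^(]\s*"')


def where_body_matches(text):
    """Return (matched text, start index) of the first match, or None."""
    m = LOOKBEHIND_BODY.search(text)
    return None if m is None else (m.group(), m.start())


for sample in ['( "blah")', '(  "blah")']:
    # [^(] matches the first space, so the "(" at index 0 is never examined.
    print(repr(sample), where_body_matches(sample))
```

Both samples report a match starting at index 1, i.e. on a space rather than on the bracket.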

Comments (3)

活雷疯, 2022-06-07, reply #3

Lookbehinds need a fixed width in many regex flavors, but you might be able to get there with the expression below. This assumes the brackets are not nested.

/\G                # from the spot of the last match
  (?:               # GROUP OF: 
     [^("]*           # anything but open-paren and double quote.
     [(]              # an open-paren
     [^)]*            # anything but closing-paren
     [)]              # a closing-paren
  )*                # any number of times 
  [^("]*            # anything but open-paren and double quote (so a "(" can't be skipped)
  "([^"]*)"         # quote, sequence of anything except quote, then ending quote
/x
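Python's re has no \G, but anchoring each attempt with Pattern.match(text, pos) at the end of the previous match has the same effect, and re.VERBOSE plays the role of /x. A minimal sketch under those assumptions (the function name quoted_outside_parens is hypothetical). Note that the final [^"]* is tightened here to [^("]*: with plain [^"]*, the pattern can skip over a "(" and ("blah") would match.

```python
import re

# Answer 3's pattern without the leading \G, which Python's re lacks.
# re.VERBOSE stands in for /x. The last [^("]* also excludes "(" so that
# a bracketed quote cannot be reached by skipping over its "(".
QUOTED_OUTSIDE_PARENS = re.compile(r'''
    (?:            # GROUP OF:
       [^("]*        # anything but open-paren and double quote
       [(]           # an open-paren
       [^)]*         # anything but closing-paren
       [)]           # a closing-paren
    )*             # any number of times
    [^("]*         # anything but open-paren and double quote
    "([^"]*)"      # quote, anything except quote, then the closing quote
''', re.VERBOSE)


def quoted_outside_parens(text):
    # Emulates \G: each match must start exactly where the previous one ended.
    results = []
    pos = 0
    while True:
        m = QUOTED_OUTSIDE_PARENS.match(text, pos)  # anchored at pos
        if m is None:
            return results
        results.append(m.group(1))
        pos = m.end()  # always advances: every match consumes two quotes


print(quoted_outside_parens('"a" ("b") "c"'))
```

The usage line prints ['a', 'c']: the bracketed "b" is consumed by the group and never captured. As the answer says, this only works for non-nested brackets.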

天涯离梦残月幽梦, 2022-06-07, reply #2

In PCRE, as far as I know, lookbehinds have to be fixed-width. If the same is true of the .NET regex engine that C# uses, then you're not going to be able to do it the way you're trying.
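Python's re has the same fixed-width rule as PCRE, which makes the restriction easy to check: a fixed-width lookbehind compiles, and a variable-width one raises re.error. This sketch only demonstrates that rule in Python (the helper compiles is hypothetical). .NET's regex engine is not PCRE and is documented to accept variable-length lookbehinds such as (?<!\(\s*), so this particular limitation should not apply in C#.

```python
import re


def compiles(pattern):
    """Return True if Python's re accepts pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Fixed width: exactly one "(" followed by exactly one whitespace character.
print(compiles(r'(?<!\(\s)"'))
# Variable width: \s* makes the lookbehind's length unbounded, so re rejects it.
print(compiles(r'(?<!\(\s*)"'))
```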

百变从容, 2022-06-07, reply #1

This should work:

/(?<![(\s])\s*"([^"]*)"\s*(?![\s)])/
  • The first part, (?<![(\s]), asserts that there is no whitespace or left parenthesis immediately before the match.

  • Then \s* matches any number of whitespace characters.

  • "([^"]*)" matches a quoted string and captures its content.

  • \s* again matches any number of whitespace characters.

  • Last, (?![\s)]) asserts that there is no whitespace or right parenthesis immediately after.

Together they ensure that each \s* consumes the whole run of whitespace on its side, so the match can only succeed when the quoted string is not bordered by a parenthesis.
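Since both lookarounds test exactly one character, this approach also works in engines with the fixed-width restriction. A minimal sketch in Python's re, writing the lookbehind as the first bullet describes it, (?<![(\s]), i.e. not preceded by "(" or whitespace; the wrapper quoted_not_in_parens is hypothetical:

```python
import re

# Answer 1's pattern with the lookbehind as the first bullet describes it.
# Both lookarounds test exactly one character, so re's fixed-width rule is met.
QUOTED = re.compile(r'(?<![(\s])\s*"([^"]*)"\s*(?![\s)])')


def quoted_not_in_parens(text):
    # With one capturing group, findall returns the captured contents.
    return QUOTED.findall(text)


for sample in ['"blah"', '("blah")', '( "blah")', '(  "blah")']:
    print(repr(sample), quoted_not_in_parens(sample))
```

Only '"blah"' yields a match (['blah']); the three bracketed samples return empty lists because the lookarounds stop either \s* from starting or ending partway through a whitespace run.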