Complex SQL where clause: whether to factor logic

Complex SQL where clause: whether to factor logic

酷到爆炸 发布于 2021-11-28 字数 1556 浏览 946 回复 3 原文

I've got a complex SQL where clause that just got more complex because of a requirements change. There are four basic sets of cases, each with a different combination of other factors. It's more readable (in my opinion) to have the four cases as separate branches of the where clause, and to repeat the redundant criteria in each branch. But I don't know how well the database engine will optimize that.

Here's the expression in its redundant form. I've replaced the actual criteria with letters. A is the "branching" criterion, which comes in four forms. All of the expressions are in the form field='value', unless otherwise noted.

A1 AND B AND C AND D
OR A2 AND B AND C AND D AND E AND F1 AND G
OR A3 AND A3a AND B AND C AND D AND E AND F1 AND G
OR A4 AND B AND C AND D AND F2

All of the A's except A4 are in the form field in ('value1','value2'). D is field > 'value'. G is in the form field not in (subquery).

Here's the expression, factored to (I think) its least redundant form.

B AND C AND D AND (
    A1
    OR (
        E AND F1 AND G AND (
            A2
            OR (A3 AND A3a)
        )
    )
    OR (A4 AND F2)

My question is whether I should factor this expression into its simplest (least redundant) logical form, or whether it's OK to keep it in it's more redundant but also more readable form. The target database is Sybase, but I'd like to know the answer for RDMBSs generally.

如果你对这篇文章有疑问,欢迎到本站 社区 发帖提问或使用手Q扫描下方二维码加群参与讨论,获取更多帮助。

扫码加入群聊

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

一世旳自豪 2022-06-07 3 楼

I would refactor it. Eventually, duplicated logic will run you into problems. The second example may take a couple seconds longer to understand, but in the big scope of things it's easier to see what is going on as you can quickly look at the entire where clause and begin to determine what affects what.

呆萌少年 2022-06-07 2 楼

If I were attacking this problem on M$ SQL Server, I would write it the way I wanted and look at the query execution plan. If it (a) ran slow and (b) had a bad execution plan, then I would refactor and document. Not sure what the mechanism is in Sybase for viewing HOW the optimizer executes the query.

故人如初 2022-06-07 1 楼

In an RDBMS world I wouldn't bother of redundancy much, efficiency is more important here.

In your case, I would UNION all the four queries using A's as a top condition, like this:

SELECT  *
FROM    mytable
WHERE   A1 AND B AND C
UNION
SELECT  *
FROM    mytable
WHERE   A2 AND B AND C AND D AND E AND F1 AND G
…

I didn't look into Sybase for more than 7 years, but in all major RDBMS's UNION's are more efficient than OR's.

See this article in my blog for approach to a silimar problem in Oracle:

and also this article for comparison of UNION versus OR in MySQL:

I think these approaches will work well for Sybase too.

You also need to create indexes on columns used in your conditions to benefit from UNION's

Update:

Since condition G is a subquery, it may probably happen that it needs a HASH JOIN to perform fast. HASH JOIN requires a full scan on all unfiltered values, that's why it may probable be better to filter all values in a single fullscan and then perform a HASH JOIN:

SELECT  *
FROM    (
        SELECT  *
        FROM    foo
        WHERE   condition_set_1
        UNION
        SELECT  *
        FROM    foo
        WHERE   condition_set_2_but_no_g
        …
        ) q
WHERE   G

To make any further judgements, it will be much better to see the query itself, really.