该页面描述了 Calcite 的默认 SQL 解析器识别的 SQL 方言。
语法
SQL 语法采用了 BNF 风格。
1 | statement: |
在 insert
中,如果 INSERT
或 UPSERT
语句未指定目标列列表,则查询必须具有与目标表相同的列数,某些一致性级别除外。
在 merge
中,至少必须存在 WHEN MATCHED
和 WHEN NOT MATCHED
子句之一。
tablePrimary
只能在某些一致性级别中包含 EXTEND
子句;在这些相同的一致性级别中,insert 中的任何列都可以由 columnDecl
替换,这与将其包含在 EXTEND 子句中具有类似的效果。
在 orderItem
中,如果 expression
是正整数 n,则表示 SELECT 子句中的第 n 项。
在查询中,count
和 start
可以分别是无符号整数字面量或值为整数的动态参数。
聚合查询是在 SELECT 子句中包含 GROUP BY 或 HAVING 子句,或包含聚合函数的查询。在聚合查询的 SELECT、HAVING 和 ORDER BY 子句中,所有表达式必须是当前组内的常量(即:由 GROUP BY 子句定义的分组常量或常量)、或者是聚合函数,或者是常量和聚合函数的组合。聚合和分组函数只能出现在聚合查询中,并且只能出现在 SELECT、HAVING 或 ORDER BY 子句中。
标量子查询是指用作表达式的子查询。如果子查询没有返回行,则值为 NULL,如果它返回多于一行,则会报错。
IN
、EXISTS
、UNIQUE
和标量子查询,可以出现在任何可以出现表达式的位置(例如 JOIN 的 SELECT 子句、WHERE 子句、ON 子句,或作为聚合函数的参数)。
IN
、EXISTS
、UNIQUE
或标量子查询可以是相关的,即:它可以引用一个封闭查询中 FROM 子句的表。
GROUP BY DISTINCT
删除重复的分组集(例如:GROUP BY DISTINCT GROUPING SETS ((a), (a, b), (a))
相当于 GROUP BY GROUPING SETS ((a), (a, b))
),GROUP BY ALL
和 GROUP BY
是等价的。
selectWithoutFrom
相当于 VALUES
,但它不是标准 SQL,并且仅在某些一致性级别中允许使用。
MINUS
相当于 EXCEPT
,但不是标准 SQL,仅在某些一致性级别中允许使用。
CROSS APPLY
和 OUTER APPLY
仅允许在某些一致性级别中使用。
LIMIT start, count
等价于 LIMIT count OFFSET start
,但仅在某些一致性级别中允许使用。
在某些一致性级别中,OFFSET start
可能发生在 LIMIT count
之前。
VALUE
与 VALUES
等效,但不是标准 SQL,并且仅在某些一致性级别中允许使用。
关键字
以下是 SQL 关键字的列表。保留的关键字使用粗体展示。
A, ABS, ABSENT, ABSOLUTE, ACTION, ADA, ADD, ADMIN, AFTER, ALL, ALLOCATE, ALLOW, ALTER, ALWAYS, AND, ANY, APPLY, ARE, ARRAY, ARRAY_AGG, ARRAY_CONCAT_AGG, ARRAY_MAX_CARDINALITY, AS, ASC, ASENSITIVE, ASSERTION, ASSIGNMENT, ASYMMETRIC, AT, ATOMIC, ATTRIBUTE, ATTRIBUTES, AUTHORIZATION, AVG, BEFORE, BEGIN, BEGIN_FRAME, BEGIN_PARTITION, BERNOULLI, BETWEEN, BIGINT, BINARY, BIT, BLOB, BOOLEAN, BOTH, BREADTH, BY, C, CALL, CALLED, CARDINALITY, CASCADE, CASCADED, CASE, CAST, CATALOG, CATALOG_NAME, CEIL, CEILING, CENTURY, CHAIN, CHAR, CHARACTER, CHARACTERISTICS, CHARACTERS, CHARACTER_LENGTH, CHARACTER_SET_CATALOG, CHARACTER_SET_NAME, CHARACTER_SET_SCHEMA, CHAR_LENGTH, CHECK, CLASSIFIER, CLASS_ORIGIN, CLOB, CLOSE, COALESCE, COBOL, COLLATE, COLLATION, COLLATION_CATALOG, COLLATION_NAME, COLLATION_SCHEMA, COLLECT, COLUMN, COLUMN_NAME, COMMAND_FUNCTION, COMMAND_FUNCTION_CODE, COMMIT, COMMITTED, CONDITION, CONDITIONAL, CONDITION_NUMBER, CONNECT, CONNECTION, CONNECTION_NAME, CONSTRAINT, CONSTRAINTS, CONSTRAINT_CATALOG, CONSTRAINT_NAME, CONSTRAINT_SCHEMA, CONSTRUCTOR, CONTAINS, CONTAINS_SUBSTR, CONTINUE, CONVERT, CORR, CORRESPONDING, COUNT, COVAR_POP, COVAR_SAMP, CREATE, CROSS, CUBE, CUME_DIST, CURRENT, CURRENT_CATALOG, CURRENT_DATE, CURRENT_DEFAULT_TRANSFORM_GROUP, CURRENT_PATH, CURRENT_ROLE, CURRENT_ROW, CURRENT_SCHEMA, CURRENT_TIME, CURRENT_TIMESTAMP, CURRENT_TRANSFORM_GROUP_FOR_TYPE, CURRENT_USER, CURSOR, CURSOR_NAME, CYCLE, DATA, DATABASE, DATE, DATETIME, DATETIME_DIFF, DATETIME_INTERVAL_CODE, DATETIME_INTERVAL_PRECISION, DATETIME_TRUNC, DATE_DIFF, DATE_TRUNC, DAY, DAYOFWEEK, DAYOFYEAR, DAYS, DEALLOCATE, DEC, DECADE, DECIMAL, DECLARE, DEFAULT, DEFAULTS, DEFERRABLE, DEFERRED, DEFINE, DEFINED, DEFINER, DEGREE, DELETE, DENSE_RANK, DEPTH, DEREF, DERIVED, DESC, DESCRIBE, DESCRIPTION, DESCRIPTOR, DETERMINISTIC, DIAGNOSTICS, DISALLOW, DISCONNECT, DISPATCH, DISTINCT, DOMAIN, DOT, DOUBLE, DOW, DOY, DROP, DYNAMIC, DYNAMIC_FUNCTION, DYNAMIC_FUNCTION_CODE, EACH, ELEMENT, ELSE, EMPTY, ENCODING, END, END-EXEC, END_FRAME, END_PARTITION, EPOCH, EQUALS, ERROR, ESCAPE, EVERY, EXCEPT, EXCEPTION, EXCLUDE, EXCLUDING, EXEC, EXECUTE, EXISTS, EXP, EXPLAIN, EXTEND, EXTERNAL, EXTRACT, FALSE, FETCH, FILTER, FINAL, FIRST, FIRST_VALUE, FLOAT, FLOOR, FOLLOWING, FOR, FOREIGN, FORMAT, FORTRAN, FOUND, FRAC_SECOND, FRAME_ROW, FREE, FRIDAY, FROM, FULL, FUNCTION, FUSION, G, GENERAL, GENERATED, GEOMETRY, GET, GLOBAL, GO, GOTO, GRANT, GRANTED, GROUP, GROUPING, GROUPS, GROUP_CONCAT, HAVING, HIERARCHY, HOLD, HOP, HOUR, HOURS, IDENTITY, IGNORE, ILIKE, IMMEDIATE, IMMEDIATELY, IMPLEMENTATION, IMPORT, IN, INCLUDE, INCLUDING, INCREMENT, INDICATOR, INITIAL, INITIALLY, INNER, INOUT, INPUT, INSENSITIVE, INSERT, INSTANCE, INSTANTIABLE, INT, INTEGER, INTERSECT, INTERSECTION, INTERVAL, INTO, INVOKER, IS, ISODOW, ISOLATION, ISOYEAR, JAVA, JOIN, JSON, JSON_ARRAY, JSON_ARRAYAGG, JSON_EXISTS, JSON_OBJECT, JSON_OBJECTAGG, JSON_QUERY, JSON_SCOPE, JSON_VALUE, K, KEY, KEY_MEMBER, KEY_TYPE, LABEL, LAG, LANGUAGE, LARGE, LAST, LAST_VALUE, LATERAL, LEAD, LEADING, LEFT, LENGTH, LEVEL, LIBRARY, LIKE, LIKE_REGEX, LIMIT, LN, LOCAL, LOCALTIME, LOCALTIMESTAMP, LOCATOR, LOWER, M, MAP, MATCH, MATCHED, MATCHES, MATCH_NUMBER, MATCH_RECOGNIZE, MAX, MAXVALUE, MEASURES, MEMBER, MERGE, MESSAGE_LENGTH, MESSAGE_OCTET_LENGTH, MESSAGE_TEXT, METHOD, MICROSECOND, MILLENNIUM, MILLISECOND, MIN, MINUS, MINUTE, MINUTES, MINVALUE, MOD, MODIFIES, MODULE, MONDAY, MONTH, MONTHS, MORE, MULTISET, MUMPS, NAME, NAMES, NANOSECOND, NATIONAL, NATURAL, NCHAR, NCLOB, NESTING, NEW, NEXT, NO, NONE, NORMALIZE, NORMALIZED, NOT, NTH_VALUE, NTILE, NULL, NULLABLE, NULLIF, NULLS, NUMBER, NUMERIC, OBJECT, OCCURRENCES_REGEX, OCTETS, OCTET_LENGTH, OF, OFFSET, OLD, OMIT, ON, ONE, ONLY, OPEN, OPTION, OPTIONS, OR, ORDER, ORDERING, ORDINAL, ORDINALITY, OTHERS, OUT, OUTER, OUTPUT, OVER, OVERLAPS, OVERLAY, OVERRIDING, PAD, PARAMETER, PARAMETER_MODE, PARAMETER_NAME, PARAMETER_ORDINAL_POSITION, PARAMETER_SPECIFIC_CATALOG, PARAMETER_SPECIFIC_NAME, PARAMETER_SPECIFIC_SCHEMA, PARTIAL, PARTITION, PASCAL, PASSING, PASSTHROUGH, PAST, PATH, PATTERN, PER, PERCENT, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK, PERIOD, PERMUTE, PIVOT, PLACING, PLAN, PLI, PORTION, POSITION, POSITION_REGEX, POWER, PRECEDES, PRECEDING, PRECISION, PREPARE, PRESERVE, PREV, PRIMARY, PRIOR, PRIVILEGES, PROCEDURE, PUBLIC, QUALIFY, QUARTER, QUARTERS, RANGE, RANK, READ, READS, REAL, RECURSIVE, REF, REFERENCES, REFERENCING, REGR_AVGX, REGR_AVGY, REGR_COUNT, REGR_INTERCEPT, REGR_R2, REGR_SLOPE, REGR_SXX, REGR_SXY, REGR_SYY, RELATIVE, RELEASE, REPEATABLE, REPLACE, RESET, RESPECT, RESTART, RESTRICT, RESULT, RETURN, RETURNED_CARDINALITY, RETURNED_LENGTH, RETURNED_OCTET_LENGTH, RETURNED_SQLSTATE, RETURNING, RETURNS, REVOKE, RIGHT, RLIKE, ROLE, ROLLBACK, ROLLUP, ROUTINE, ROUTINE_CATALOG, ROUTINE_NAME, ROUTINE_SCHEMA, ROW, ROWS, ROW_COUNT, ROW_NUMBER, RUNNING, SAFE_CAST, SAFE_OFFSET, SAFE_ORDINAL, SATURDAY, SAVEPOINT, SCALAR, SCALE, SCHEMA, SCHEMA_NAME, SCOPE, SCOPE_CATALOGS, SCOPE_NAME, SCOPE_SCHEMA, SCROLL, SEARCH, SECOND, SECONDS, SECTION, SECURITY, SEEK, SELECT, SELF, SENSITIVE, SEPARATOR, SEQUENCE, SERIALIZABLE, SERVER, SERVER_NAME, SESSION, SESSION_USER, SET, SETS, SHOW, SIMILAR, SIMPLE, SIZE, SKIP, SMALLINT, SOME, SOURCE, SPACE, SPECIFIC, SPECIFICTYPE, SPECIFIC_NAME, SQL, SQLEXCEPTION, SQLSTATE, SQLWARNING, SQL_BIGINT, SQL_BINARY, SQL_BIT, SQL_BLOB, SQL_BOOLEAN, SQL_CHAR, SQL_CLOB, SQL_DATE, SQL_DECIMAL, SQL_DOUBLE, SQL_FLOAT, SQL_INTEGER, SQL_INTERVAL_DAY, SQL_INTERVAL_DAY_TO_HOUR, SQL_INTERVAL_DAY_TO_MINUTE, SQL_INTERVAL_DAY_TO_SECOND, SQL_INTERVAL_HOUR, SQL_INTERVAL_HOUR_TO_MINUTE, SQL_INTERVAL_HOUR_TO_SECOND, SQL_INTERVAL_MINUTE, SQL_INTERVAL_MINUTE_TO_SECOND, SQL_INTERVAL_MONTH, SQL_INTERVAL_SECOND, SQL_INTERVAL_YEAR, SQL_INTERVAL_YEAR_TO_MONTH, SQL_LONGVARBINARY, SQL_LONGVARCHAR, SQL_LONGVARNCHAR, SQL_NCHAR, SQL_NCLOB, SQL_NUMERIC, SQL_NVARCHAR, SQL_REAL, SQL_SMALLINT, SQL_TIME, SQL_TIMESTAMP, SQL_TINYINT, SQL_TSI_DAY, SQL_TSI_FRAC_SECOND, SQL_TSI_HOUR, SQL_TSI_MICROSECOND, SQL_TSI_MINUTE, SQL_TSI_MONTH, SQL_TSI_QUARTER, SQL_TSI_SECOND, SQL_TSI_WEEK, SQL_TSI_YEAR, SQL_VARBINARY, SQL_VARCHAR, SQRT, START, STATE, STATEMENT, STATIC, STDDEV_POP, STDDEV_SAMP, STREAM, STRING_AGG, STRUCTURE, STYLE, SUBCLASS_ORIGIN, SUBMULTISET, SUBSET, SUBSTITUTE, SUBSTRING, SUBSTRING_REGEX, SUCCEEDS, SUM, SUNDAY, SYMMETRIC, SYSTEM, SYSTEM_TIME, SYSTEM_USER, TABLE, TABLESAMPLE, TABLE_NAME, TEMPORARY, THEN, THURSDAY, TIES, TIME, TIMESTAMP, TIMESTAMPADD, TIMESTAMPDIFF, TIMESTAMP_DIFF, TIMESTAMP_TRUNC, TIMEZONE_HOUR, TIMEZONE_MINUTE, TIME_DIFF, TIME_TRUNC, TINYINT, TO, TOP_LEVEL_COUNT, TRAILING, TRANSACTION, TRANSACTIONS_ACTIVE, TRANSACTIONS_COMMITTED, TRANSACTIONS_ROLLED_BACK, TRANSFORM, TRANSFORMS, TRANSLATE, TRANSLATE_REGEX, TRANSLATION, TREAT, TRIGGER, TRIGGER_CATALOG, TRIGGER_NAME, TRIGGER_SCHEMA, TRIM, TRIM_ARRAY, TRUE, TRUNCATE, TRY_CAST, TUESDAY, TUMBLE, TYPE, UESCAPE, UNBOUNDED, UNCOMMITTED, UNCONDITIONAL, UNDER, UNION, UNIQUE, UNKNOWN, UNNAMED, UNNEST, UNPIVOT, UPDATE, UPPER, UPSERT, USAGE, USER, USER_DEFINED_TYPE_CATALOG, USER_DEFINED_TYPE_CODE, USER_DEFINED_TYPE_NAME, USER_DEFINED_TYPE_SCHEMA, USING, UTF16, UTF32, UTF8, VALUE, VALUES, VALUE_OF, VARBINARY, VARCHAR, VARYING, VAR_POP, VAR_SAMP, VERSION, VERSIONING, VIEW, WEDNESDAY, WEEK, WEEKS, WHEN, WHENEVER, WHERE, WIDTH_BUCKET, WINDOW, WITH, WITHIN, WITHOUT, WORK, WRAPPER, WRITE, XML, YEAR, YEARS, ZONE.
标识符
标识符是 SQL 查询中使用的表、列和其他元数据元素的名称。
不带引号的标识符(例如 emp)必须以字母开头,并且只能包含字母、数字和下划线。它们被隐式转换为大写。
带引号的标识符,例如 "Employee Name"
,以双引号开头和结尾。它们几乎可以包含任何字符,包括空格和其他标点符号。如果您希望在标识符中包含双引号,请使用另一个双引号对其进行转义,例如:"An employee called ""Fred""."
。
在 Calcite 中,将标识符与引用对象的名称匹配是区分大小写的。但请记住,未加引号的标识符在匹配之前会隐式转换为大写,并且如果它引用的对象是使用未加引号的标识符作为其名称创建的,则其名称也将转换为大写。
数据类型
标量类型
数据类型 | 描述 | 范围和示例字面量 |
---|---|---|
BOOLEAN | 逻辑值 | 值:TRUE, FALSE, UNKNOWN |
TINYINT | 1 字节有符号整数 | 范围是 -128 到 127 |
SMALLINT | 2 字节有符号整数 | 范围为 -32768 至 32767 |
INTEGER, INT | 4 字节有符号整数 | 范围为 -2147483648 至 2147483647 |
BIGINT | 8 字节有符号整数 | 范围为 -9223372036854775808 至 9223372036854775807 |
DECIMAL(p, s) | 定点数(即:小数点位置固定) | 示例:123.45 和 DECIMAL ‘123.45’ 是相同的值,并且类型为 DECIMAL(5, 2) |
NUMERIC(p, s) | 定点数(即:小数点位置固定) | DECIMAL 的同义词 |
REAL | 4 字节浮点数 | 6 位小数精度;示例:CAST(1.2 AS REAL)、CAST(‘Infinity’ AS REAL) |
DOUBLE | 8 字节浮点数 | 15 位小数精度;示例:1.4E2、CAST(‘-Infinity’ AS DOUBLE)、CAST(‘NaN’ AS DOUBLE) |
FLOAT | 8 字节浮点数 | DOUBLE 的同义词 |
CHAR(n), CHARACTER(n) | 定长字符串 | ‘Hello’、‘’(空字符串)、_latin1’Hello’、n’Hello’、_UTF16’Hello’、‘Hello’ ‘there’(字面量分为多个部分)、e’Hello\nthere’(字面量包含 C 风格的转义符) |
VARCHAR(n), CHARACTER VARYING(n) | 变长字符串 | 作为 CHAR(n) |
BINARY(n) | 固定宽度的二进制字符串 | x’45F0AB’、x’‘(空二进制字符串)、x’AB’ ‘CD’(多部分二进制字符串字面量) |
VARBINARY(n), BINARY VARYING(n) | 变长二进制字符串 | 作为 BINARY(n) |
DATE | 日期 | 示例:DATE “1969-07-20” |
TIME | 一天中的时间 | 示例:TIME “20:17:40” |
TIMESTAMP [ WITHOUT TIME ZONE ] | 日期和时间 | 示例:TIMESTAMP ‘1969-07-20 20:17:40’ |
TIMESTAMP WITH LOCAL TIME ZONE | 带有当地时区的日期和时间 | 示例:TIMESTAMP ‘1969-07-20 20:17:40 America/Los Angeles’ |
TIMESTAMP WITH TIME ZONE | 带时区的日期和时间 | 示例:TIMESTAMP ‘1969-07-20 20:17:40 America/Los Angeles’ |
INTERVAL timeUnit [ TO timeUnit ] | 日期时间间隔 | 示例:INTERVAL ‘1-5’ YEAR TO MONTH, INTERVAL ‘45’ DAY, INTERVAL ‘1 2:34:56.789’ DAY TO SECOND |
GEOMETRY | 几何类型 | 示例: ST_GeomFromText(‘POINT (30 10)’) |
timeUnit
包含了以下可选值:
1 | timeUnit: |
注意:
- DATE、TIME 和 TIMESTAMP 没有时区。对于这些类型,甚至没有隐式时区,例如 UTC(如 Java 中)或本地时区。由用户或应用程序提供时区。反过来,TIMESTAMP WITH LOCAL TIME ZONE 不会在内部存储时区,但它将依赖于提供的时区来提供正确的语义。
- 仅在某些一致性级别中才允许使用几何类型。
- 间隔字面量只能使用时间单位 YEAR、QUARTER、MONTH、WEEK、DAY、HOUR、MINUTE 和 SECOND。在某些一致性级别中,我们还允许使用复数形式:YEARS、QUARTERS、MONTHS、WEEKS、DAYS、HOURS、MINUTES 和 SECONDS。
非标量类型
类型 | 描述 | 示例字面量 |
---|---|---|
ANY | 所有类型的联合 | |
UNKNOWN | 未知类型的值,用作占位符 | |
ROW | 具有 1 列或多列的行 | 示例:row(f0 int null, f1 varchar) |
MAP | 键值对集合 | 示例:(int, varchar) map |
MULTISET | 可能包含重复项的无序集合 | 示例:int multiset |
ARRAY | 可能包含重复项的有序连续集合 | 示例:varchar(10) array |
CURSOR | 执行结果之上的游标 |
注意:
- 每个
ROW
列类型,都可以有一个可选的[ NULL | NOT NULL ]
后缀,用来声明此列类型是否可为空,默认值是不可为空。
空间类型
空间数据使用字符串(众所周知的 text (WKT) 编码)或者二进制字符串进行表示(众所周知的 binary (WKB) 编码)。
在要使用字面量的地方,应用 ST_GeomFromText
函数,例如 ST_GeomFromText('POINT (30 10)')
。
数据类型 | 类型编码 | WKT 中的示例 |
---|---|---|
GEOMETRY | 0 | 点、曲线、曲面、几何集合的泛化 |
POINT | 1 | ST_GeomFromText('POINT (30 10)') 是 2D 空间中的点; ST_GeomFromText('POINT Z(30 10 2)') 是 3D 空间中的点 |
CURVE | 13 | LINESTRING 的泛化 |
LINESTRING | 2 | ST_GeomFromText('LINESTRING (30 10, 10 30, 40 40)') |
SURFACE | 14 | 多边形、多面体曲面的泛化 |
POLYGON | 3 | ST_GeomFromText('POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))') 是五边形; ST_GeomFromText('POLYGON ((35 10, 45 45, 15 40, 10 20, 35 10), (20 30, 35 35, 30 20, 20 30))') 是一个有四边形孔的五边形 |
POLYHEDRALSURFACE | 15 | |
GEOMETRYCOLLECTION | 7 | 零个或多个 GEOMETRY 实例的集合;多点、多线、多多边形的概括 |
MULTIPOINT | 4 | ST_GeomFromText(‘MULTIPOINT ((10 40), (40 30), (20 20), (30 10))’) 相当于 ST_GeomFromText(‘MULTIPOINT (10 40, 40 30, 20 20, 30 10)’) |
MULTICURVE | - | MULTILINESTRING 的泛化 |
MULTILINESTRING | 5 | ST_GeomFromText('MULTILINESTRING ((10 10, 20 20, 10 40), (40 40, 30 30, 40 20, 30 10))') |
MULTISURFACE | - | MULTIPOLYGON 的泛化 |
MULTIPOLYGON | 6 | ST_GeomFromText('MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))') |
运算符和函数
运算符优先级
运算符优先级和结合性,从高到低。
运算符 | 结合性 |
---|---|
. | 左 |
:: | 左 |
[ ](集合元素) | 左 |
+ -(一元加、减) | 右 |
* / % || | 左 |
+ - | 左 |
BETWEEN, IN, LIKE, SIMILAR, OVERLAPS, CONTAINS 等 | - |
< > = <= >= <> != <=> | 左 |
IS NULL, IS FALSE, IS NOT TRUE 等 | - |
NOT | 右 |
AND | 左 |
OR | 左 |
注意:::
、 <=>
是特定于方言的,但为了完整性起见在此表中显示。
比较运算符
运算符语法 | 描述 |
---|---|
value1 = value2 | 等于 |
value1 <> value2 | 不等于 |
value1 != value2 | 不相等(仅在某些一致性级别) |
value1 > value2 | 大于 |
value1 >= value2 | 大于等于 |
value1 < value2 | 小于 |
value1 <= value2 | 小于等于 |
value1 <=> value2 | 两个值是否相等,将 null 值视为相同 |
value IS NULL | 值是否为 null |
value IS NOT NULL | 值是否不为 null |
value1 IS DISTINCT FROM value2 | 两个值是否不相等,将 null 值视为相同 |
value1 IS NOT DISTINCT FROM value2 | 两个值是否相等,将 null 值视为相同 |
value1 BETWEEN value2 AND value3 | value1 是否大于等于 value2 且小于等于 value3 |
value1 NOT BETWEEN value2 AND value3 | value1 是否小于 value2 并且大于 value3 |
string1 LIKE string2 [ ESCAPE string3 ] | string1 是否与模式 string2 匹配 |
string1 NOT LIKE string2 [ ESCAPE string3 ] | string1 是否与模式 string2 不匹配 |
string1 SIMILAR TO string2 [ ESCAPE string3 ] | string1 是否与正则表达式 string2 匹配 |
string1 NOT SIMILAR TO string2 [ ESCAPE string3 ] | string1 是否与正则表达式 string2 不匹配 |
value IN (value [, value ]*) | value 是否等于列表中的值 |
value NOT IN (value [, value ]*) | value 是否不等于列表中的每个值 |
value IN (sub-query) | value 是否等于子查询返回的行 |
value NOT IN (sub-query) | value 是否不等于子查询返回的每一行 |
value comparison SOME (sub-query or collection) | 是否值比较 SOME 的子查询或集合至少返回一行 |
value comparison ANY (sub-query or collection) | SOME 的同义词 |
value comparison ALL (sub-query or collection) | 是否值比较 ALL 的子查询或集合返回所有值 |
EXISTS (sub-query) | 子查询是否至少返回一行 |
UNIQUE (sub-query) | 子查询返回的行是否唯一(忽略空值) |
1 | comp: |
逻辑运算符
运算符语法 | 描述 |
---|---|
boolean1 OR boolean2 | boolean1 为 TRUE 或者 boolean2 为 TRUE |
boolean1 AND boolean2 | boolean1 为 TRUE 并且 boolean2 为 TRUE |
NOT boolean | 布尔值是否不为 TRUE;如果布尔值未知,则返回 UNKNOWN |
boolean IS FALSE | 布尔值是否为FALSE;如果布尔值未知则返回 FALSE |
boolean IS NOT FALSE | 布尔值是否不为 FALSE;如果布尔值未知则返回 TRUE |
boolean IS TRUE | 布尔值是否为 TRUE;如果布尔值未知则返回 FALSE |
boolean IS NOT TRUE | 布尔值是否不为 TRUE;如果布尔值未知则返回 TRUE |
boolean IS UNKNOWN | 布尔值是否未知 |
boolean IS NOT UNKNOWN | 布尔值是否不为 UNKNOWN |
算术运算符和函数
运算符语法 | 描述 |
---|---|
+ numeric | 返回数字 |
- numeric | 返回负数 |
numeric1 + numeric2 | 返回 numeric1 加 numeric2 |
numeric1 - numeric2 | 返回 numeric1 减去 numeric2 |
numeric1 * numeric2 | 返回 numeric1 乘以 numeric2 |
numeric1 / numeric2 | 返回 numeric1 除以 numeric2 |
numeric1 % numeric2 | 作为 MOD(numeric1, numeric2)(仅在某些一致性级别) |
POWER(numeric1, numeric2) | 返回 numeric1 的 numeric2 次方 |
ABS(numeric) | 返回数字的绝对值 |
MOD(numeric1, numeric2) | 返回 numeric1 除以 numeric2 的余数(模)。仅当 numeric1 为负数时结果才为负数 |
SQRT(numeric) | 返回数字的平方根 |
LN(numeric) | 返回数值的自然对数(以 e 为底) |
LOG10(numeric) | 返回 numeric 以 10 为底的对数 |
EXP(numeric) | 返回 e 的数值次方 |
CEIL(numeric) | 将 numeric 向上舍入,返回大于或等于 numeric 的最小整数 |
FLOOR(numeric) | 将数字向下舍入,返回小于或等于数字的最大整数 |
RAND([seed]) | 生成 0 到 1(含)之间的随机双精度数,可选择使用种子初始化随机数生成器 |
RAND_INTEGER([seed, ] numeric) | 生成 0 到 numeric - 1(含)之间的随机整数,可选择使用种子初始化随机数生成器 |
ACOS(numeric) | 返回数值的反余弦 |
ASIN(numeric) | 返回数字的反正弦值 |
ATAN(numeric) | 返回数值的反正切值 |
ATAN2(numeric, numeric) | 返回数字坐标的反正切值 |
CBRT(numeric) | 返回数字的立方根 |
COS(numeric) | 返回数字的余弦值 |
COT(numeric) | 返回数值的余切值 |
DEGREES(numeric) | 将数值从弧度转换为度数 |
PI() | 返回比任何其他值更接近 pi 的值 |
RADIANS(numeric) | 将数值从度数转换为弧度 |
ROUND(numeric1 [, numeric2]) | 将 numeric1 舍入到小数点右边可选的 numeric2(如果未指定 0)位 |
SIGN(numeric) | 返回数字的符号 |
SIN(numeric) | 返回数字的正弦值 |
TAN(numeric) | 返回数字的正切值 |
TRUNCATE(numeric1 [, numeric2]) | 将 numeric1 截断为可选的 numeric2(如果未指定 0)小数点右边的位置 |
字符串运算符和函数
TODO
OPERATOR SYNTAX | DESCRIPTION |
---|---|
string || string | Concatenates two character strings |
CHAR_LENGTH(string) | Returns the number of characters in a character string |
CHARACTER_LENGTH(string) | As CHAR_LENGTH(string) |
UPPER(string) | Returns a character string converted to upper case |
LOWER(string) | Returns a character string converted to lower case |
POSITION(substring IN string) | Returns the position of the first occurrence of substring in string |
POSITION(substring IN string FROM integer) | Returns the position of the first occurrence of substring in string starting at a given point (not standard SQL) |
TRIM( { BOTH | LEADING | TRAILING } string1 FROM string2) | Removes the longest string containing only the characters in string1 from the start/end/both ends of string1 |
OVERLAY(string1 PLACING string2 FROM integer [ FOR integer2 ]) | Replaces a substring of string1 with string2 |
SUBSTRING(string FROM integer) | Returns a substring of a character string starting at a given point |
SUBSTRING(string FROM integer FOR integer) | Returns a substring of a character string starting at a given point with a given length |
INITCAP(string) | Returns string with the first letter of each word converter to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters. |
Not implemented:
- SUBSTRING(string FROM regexp FOR regexp)
二进制字符串运算符和函数
OPERATOR SYNTAX | DESCRIPTION |
---|---|
binary || binary | Concatenates two binary strings |
OCTET_LENGTH(binary) | Returns the number of bytes in binary |
POSITION(binary1 IN binary2) | Returns the position of the first occurrence of binary1 in binary2 |
POSITION(binary1 IN binary2 FROM integer) | Returns the position of the first occurrence of binary1 in binary2 starting at a given point (not standard SQL) |
OVERLAY(binary1 PLACING binary2 FROM integer [ FOR integer2 ]) | Replaces a substring of binary1 with binary2 |
SUBSTRING(binary FROM integer) | Returns a substring of binary starting at a given point |
SUBSTRING(binary FROM integer FOR integer) | Returns a substring of binary starting at a given point with a given length |
日期/时间函数
OPERATOR SYNTAX | DESCRIPTION |
---|---|
LOCALTIME | Returns the current date and time in the session time zone in a value of datatype TIME |
LOCALTIME(precision) | Returns the current date and time in the session time zone in a value of datatype TIME, with precision digits of precision |
本地时间戳 | 以数据类型 TIMESTAMP 的值返回会话时区中的当前日期和时间 |
本地时间戳(精度) | 以数据类型 TIMESTAMP 的值返回会话时区中的当前日期和时间,精度为precision |
当前时间 | 返回会话时区中的当前时间,采用数据类型 TIMESTAMP WITH TIME ZONE 的值 |
当前日期 | 以数据类型 DATE 的值返回会话时区中的当前日期 |
CURRENT_TIMESTAMP | 返回会话时区中的当前日期和时间,采用数据类型 TIMESTAMP WITH TIME ZONE 的值 |
EXTRACT(timeUnit FROM 日期时间) | 从日期时间值表达式中提取并返回指定日期时间字段的值 |
FLOOR(日期时间 TO 时间单位) | 将日期时间向下舍入为timeUnit |
CEIL(日期时间 TO 时间单位) | 将日期时间向上舍入到timeUnit |
年(日期) | 相当于EXTRACT(YEAR FROM date) 。返回一个整数。 |
季度(日期) | 相当于EXTRACT(QUARTER FROM date) 。返回 1 到 4 之间的整数。 |
月(日期) | 相当于EXTRACT(MONTH FROM date) 。返回 1 到 12 之间的整数。 |
周(日期) | 相当于EXTRACT(WEEK FROM date) 。返回 1 到 53 之间的整数。 |
当年某日(日期) | 相当于EXTRACT(DOY FROM date) 。返回 1 到 366 之间的整数。 |
某月某日(日期) | 相当于EXTRACT(DAY FROM date) 。返回 1 到 31 之间的整数。 |
星期几(日期) | 相当于EXTRACT(DOW FROM date) 。返回 1 到 7 之间的整数。 |
小时(日期) | 相当于EXTRACT(HOUR FROM date) 。返回 0 到 23 之间的整数。 |
分钟(日期) | 相当于EXTRACT(MINUTE FROM date) 。返回 0 到 59 之间的整数。 |
第二(日期) | 相当于EXTRACT(SECOND FROM date) 。返回 0 到 59 之间的整数。 |
TIMESTAMPADD(时间单位,整数,日期时间) | 返回添加了(有符号)整数**timeUnit间隔的日期时间。相当于 datetime + INTERVAL 'integer' timeUnit |
TIMESTAMPDIFF(时间单位,日期时间,日期时间2) | 返回datetime和datetime2之间的**timeUnit间隔数(有符号)。相当于(datetime2 - datetime) timeUnit |
LAST_DAY(日期) | 以数据类型 DATE 的值返回该月最后一天的日期;例如,对于 DATE’2020-02-10’ 和 TIMESTAMP’2020-02-10 10:10:10’ 均返回 DATE’2020-02-29’ |
对 niladic 函数的调用(例如,CURRENT_DATE
在标准 SQL 中不接受括号)。带括号的调用,例如在某些一致性级别CURRENT_DATE()
中被接受 。
未实现:
- CEIL(间隔)
- 楼层(间隔)
- + 间隔
- - 间隔
- 间隔+间隔
- 间隔-间隔
- 间隔/间隔
系统函数
运算符语法 | 描述 |
---|---|
用户 | 相当于 CURRENT_USER |
当前用户 | 当前执行上下文的用户名 |
SESSION_USER | 会话用户名 |
系统用户 | 返回操作系统识别的当前数据存储用户的名称 |
当前_路径 | 返回一个字符串,表示当前查找范围以引用用户定义的例程和类型 |
当前角色 | 返回当前活动角色 |
CURRENT_SCHEMA | 返回当前模式 |
条件函数和运算符
运算符语法 | 描述 |
---|---|
CASE 值 WHEN value1 [, value11 ]* THEN result1 [ WHEN valueN [, valueN1 ]* THEN resultN ]* [ ELSE resultZ ] END | 简单案例 |
CASE WHEN 条件 1 THEN 结果 1 [ WHEN 条件 N THEN 结果 N ]* [ ELSE 结果 Z ] END | 搜索案例 |
NULLIF(值,值) | 如果值相同则返回 NULL。 例如,NULLIF(5, 5) 返回NULL;NULLIF(5, 0) 返回 5。 |
COALESCE(值, 值[, 值]*) | 如果第一个值为 null,则提供一个值。 例如,COALESCE(NULL, 5) 返回 5。 |
类型转换
通常,表达式不能包含不同数据类型的值。例如,表达式不能将 5 乘以 10,然后添加“JULIAN”。但是,Calcite 支持将值从一种数据类型隐式和显式转换为另一种数据类型。
隐式和显式类型转换
Calcite 建议您指定显式转换,而不是依赖隐式或自动转换,原因如下:
- 使用显式数据类型转换函数时,SQL 语句更容易理解。
- 隐式数据类型转换可能会对性能产生负面影响,尤其是当列值的数据类型转换为常量数据类型而不是相反时。
- 隐式转换取决于它发生的上下文,并且在每种情况下可能不会以相同的方式工作。例如,从日期时间值到 VARCHAR 值的隐式转换可能会返回意外的格式。
隐式转换的算法可能会在 Calcite 版本之间发生变化。显式转换的行为更可预测。
显式类型转换
运算符语法 | 描述 |
---|---|
CAST(值 AS 类型) | 将值转换为给定类型。整数类型之间的转换朝 0 截断 |
CONVERT(字符串,字符集1,字符集2) | 将字符串从charSet1转换为charSet2 |
CONVERT(使用转码名称的值) | 将值从一种基本字符集更改为transcodingName |
TRANSLATE(值使用转码名称) | 将值从一种基本字符集更改为transcodingName |
支持的数据类型语法:
1 | type: |
隐式类型转换
当转换有意义时,方解石会自动将值从一种数据类型转换为另一种数据类型。下表是方解石类型转换的矩阵。该表显示了所有可能的转换,而不考虑进行转换的上下文。管理这些细节的规则如下表所示。
从到 | 无效的 | 布尔值 | 天音 | 小智 | INT | BIGINT | 十进制 | 浮点数或实数 | 双倍的 | 间隔 | 日期 | 时间 | 时间戳 | CHAR 或 VARCHAR | 二进制或 VARBINARY | 几何学 | 大批 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
无效的 | 我 | 我 | 我 | 我 | 我 | 我 | 我 | 我 | 我 | 我 | 我 | 我 | 我 | 我 | 我 | 我 | X |
布尔值 | X | 我 | X | X | X | X | X | X | X | X | X | X | X | 我 | X | X | X |
天音 | X | e | 我 | 我 | 我 | 我 | 我 | 我 | 我 | e | X | X | e | 我 | X | X | X |
小智 | X | e | 我 | 我 | 我 | 我 | 我 | 我 | 我 | e | X | X | e | 我 | X | X | X |
INT | X | e | 我 | 我 | 我 | 我 | 我 | 我 | 我 | e | X | X | e | 我 | X | X | X |
BIGINT | X | e | 我 | 我 | 我 | 我 | 我 | 我 | 我 | e | X | X | e | 我 | X | X | X |
十进制 | X | e | 我 | 我 | 我 | 我 | 我 | 我 | 我 | e | X | X | e | 我 | X | X | X |
浮点/实数 | X | e | 我 | 我 | 我 | 我 | 我 | 我 | 我 | X | X | X | e | 我 | X | X | X |
双倍的 | X | e | 我 | 我 | 我 | 我 | 我 | 我 | 我 | X | X | X | e | 我 | X | X | X |
间隔 | X | X | e | e | e | e | e | X | X | 我 | X | X | X | e | X | X | X |
日期 | X | X | X | X | X | X | X | X | X | X | 我 | X | 我 | 我 | X | X | X |
时间 | X | X | X | X | X | X | X | X | X | X | X | 我 | e | 我 | X | X | X |
时间戳 | X | X | e | e | e | e | e | e | e | X | 我 | e | 我 | 我 | X | X | X |
CHAR 或 VARCHAR | X | e | 我 | 我 | 我 | 我 | 我 | 我 | 我 | 我 | 我 | 我 | 我 | 我 | 我 | 我 | 我 |
二进制或 VARBINARY | X | X | X | X | X | X | X | X | X | X | e | e | e | 我 | 我 | X | X |
几何学 | X | X | X | X | X | X | X | X | X | X | X | X | X | 我 | X | 我 | X |
大批 | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | X | 我 |
i:隐式转换 / e:显式转换 / x:不允许
转化背景和策略
- Set operation (
UNION
,EXCEPT
,INTERSECT
): compare every branch row data type and find the common type of each fields pair; - Binary arithmetic expression (
+
,-
,&
,^
,/
,%
): promote string operand to data type of the other numeric operand; - Binary comparison (
=
,<
,<=
,<>
,>
,>=
): if operands areSTRING
andTIMESTAMP
, promote toTIMESTAMP
; make1 = true
and0 = false
always evaluate toTRUE
; if there is numeric type operand, find common type for both operands. IN
sub-query: compare type of LHS and RHS, and find the common type; if it is struct type, find wider type for every field;IN
expression list: compare every expression to find the common type;CASE WHEN
expression orCOALESCE
: find the common wider type of theTHEN
andELSE
operands;- Character +
INTERVAL
or character -INTERVAL
: promote character toTIMESTAMP
; - Built-in function: look up the type families registered in the checker, find the family default type if checker rules allow it;
- User-defined function (UDF): coerce based on the declared argument types of the
eval()
method; INSERT
andUPDATE
: coerce a source field to counterpart target table field’s type if the two fields differ with type name or precision(scale).
Note:
Implicit type coercion of following cases are ignored:
- One of the type is
ANY
; - Type coercion within
CHARACTER
types are always ignored, i.e. fromCHAR(20)
toVARCHAR(30)
; - Type coercion from a numeric to another with higher precedence is ignored, i.e. from
INT
toLONG
.
寻找共同类型的策略
- If the operator has expected data types, just take them as the desired one. (e.g. the UDF would have
eval()
method which has reflection argument types); - If there is no expected data type but the data type families are registered, try to coerce the arguments to the family’s default data type, i.e. the String family will have a
VARCHAR
type; - If neither expected data type nor families are specified, try to find the tightest common type of the node types, i.e.
INTEGER
andDOUBLE
will returnDOUBLE
, the numeric precision does not lose for this case; - If no tightest common type is found, try to find a wider type, i.e.
VARCHAR
andINTEGER
will returnINTEGER
, we allow some precision loss when widening decimal to fractional, or promote toVARCHAR
type.
值构造函数
OPERATOR SYNTAX | DESCRIPTION |
---|---|
ROW (value [, value ]*) | Creates a row from a list of values. |
(value [, value ]* ) | Creates a row from a list of values. |
row ‘[’ index ‘]’ | Returns the element at a particular location in a row (1-based index). |
row ‘[’ name ‘]’ | Returns the element of a row with a particular name. |
map ‘[’ key ‘]’ | Returns the element of a map with a particular key. |
array ‘[’ index ‘]’ | Returns the element at a particular location in an array (1-based index). |
ARRAY ‘[’ value [, value ]* ‘]’ | Creates an array from a list of values. |
MAP ‘[’ key, value [, key, value ]* ‘]’ | Creates a map from a list of key-value pairs. |
集合函数
OPERATOR SYNTAX | DESCRIPTION |
---|---|
ELEMENT(value) | Returns the sole element of an array or multiset; null if the collection is empty; throws if it has more than one element. |
CARDINALITY(value) | Returns the number of elements in an array or multiset. |
value MEMBER OF multiset | Returns whether the value is a member of multiset. |
multiset IS A SET | Whether multiset is a set (has no duplicates). |
multiset IS NOT A SET | Whether multiset is not a set (has duplicates). |
multiset IS EMPTY | Whether multiset contains zero elements. |
multiset IS NOT EMPTY | Whether multiset contains one or more elements. |
multiset SUBMULTISET OF multiset2 | Whether multiset is a submultiset of multiset2. |
multiset NOT SUBMULTISET OF multiset2 | Whether multiset is not a submultiset of multiset2. |
multiset MULTISET UNION [ ALL | DISTINCT ] multiset2 | Returns the union multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default). |
multiset MULTISET INTERSECT [ ALL | DISTINCT ] multiset2 | Returns the intersection of multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default). |
multiset MULTISET EXCEPT [ ALL | DISTINCT ] multiset2 | Returns the difference of multiset and multiset2, eliminating duplicates if DISTINCT is specified (ALL is the default). |
See also: the UNNEST relational operator converts a collection to a relation.
句点谓词
OPERATOR SYNTAX | DESCRIPTION |
---|---|
period1 CONTAINS datetime | |
period1 CONTAINS period2 | |
period1 OVERLAPS period2 | |
period1 EQUALS period2 | |
period1 PRECEDES period2 | |
period1 IMMEDIATELY PRECEDES period2 | |
period1 SUCCEEDS period2 | |
period1 IMMEDIATELY SUCCEEDS period2 |
Where period1 and period2 are period expressions:
1 | period: |
JDBC 函数转义
数字
OPERATOR SYNTAX | DESCRIPTION |
---|---|
{fn ABS(numeric)} | Returns the absolute value of numeric |
{fn ACOS(numeric)} | Returns the arc cosine of numeric |
{fn ASIN(numeric)} | Returns the arc sine of numeric |
{fn ATAN(numeric)} | Returns the arc tangent of numeric |
{fn ATAN2(numeric, numeric)} | Returns the arc tangent of the numeric coordinates |
{fn CBRT(numeric)} | Returns the cube root of numeric |
{fn CEILING(numeric)} | Rounds numeric up, and returns the smallest number that is greater than or equal to numeric |
{fn COS(numeric)} | Returns the cosine of numeric |
{fn COT(numeric)} | Returns the cotangent of numeric |
{fn DEGREES(numeric)} | Converts numeric from radians to degrees |
{fn EXP(numeric)} | Returns e raised to the power of numeric |
{fn FLOOR(numeric)} | Rounds numeric down, and returns the largest number that is less than or equal to numeric |
{fn LOG(numeric)} | Returns the natural logarithm (base e) of numeric |
{fn LOG10(numeric)} | Returns the base-10 logarithm of numeric |
{fn MOD(numeric1, numeric2)} | Returns the remainder (modulus) of numeric1 divided by numeric2. The result is negative only if numeric1 is negative |
{fn PI()} | Returns a value that is closer than any other value to pi |
{fn POWER(numeric1, numeric2)} | Returns numeric1 raised to the power of numeric2 |
{fn RADIANS(numeric)} | Converts numeric from degrees to radians |
{fn RAND(numeric)} | Returns a random double using numeric as the seed value |
{fn ROUND(numeric1, numeric2)} | Rounds numeric1 to numeric2 places right to the decimal point |
{fn SIGN(numeric)} | Returns the signum of numeric |
{fn SIN(numeric)} | Returns the sine of numeric |
{fn SQRT(numeric)} | Returns the square root of numeric |
{fn TAN(numeric)} | Returns the tangent of numeric |
{fn TRUNCATE(numeric1, numeric2)} | Truncates numeric1 to numeric2 places right to the decimal point |
字符串
OPERATOR SYNTAX | DESCRIPTION |
---|---|
{fn ASCII(string)} | Returns the ASCII code of the first character of string; if the first character is a non-ASCII character, returns its Unicode code point; returns 0 if string is empty |
{fn CHAR(integer)} | Returns the character whose ASCII code is integer % 256, or null if integer < 0 |
{fn CONCAT(character, character)} | Returns the concatenation of character strings |
{fn INSERT(string1, start, length, string2)} | Inserts string2 into a slot in string1 |
{fn LCASE(string)} | Returns a string in which all alphabetic characters in string have been converted to lower case |
{fn LENGTH(string)} | Returns the number of characters in a string |
{fn LOCATE(string1, string2 [, integer])} | Returns the position in string2 of the first occurrence of string1. Searches from the beginning of string2, unless integer is specified. |
{fn LEFT(string, length)} | Returns the leftmost length characters from string |
{fn LTRIM(string)} | Returns string with leading space characters removed |
{fn REPLACE(string, search, replacement)} | Returns a string in which all the occurrences of search in string are replaced with replacement; if replacement is the empty string, the occurrences of search are removed |
{fn REVERSE(string)} | Returns string with the order of the characters reversed |
{fn RIGHT(string, length)} | Returns the rightmost length characters from string |
{fn RTRIM(string)} | Returns string with trailing space characters removed |
{fn SUBSTRING(string, offset, length)} | Returns a character string that consists of length characters from string starting at the offset position |
{fn UCASE(string)} | Returns a string in which all alphabetic characters in string have been converted to upper case |
日期/时间
OPERATOR SYNTAX | DESCRIPTION |
---|---|
{fn CURDATE()} | Equivalent to CURRENT_DATE |
{fn CURTIME()} | Equivalent to LOCALTIME |
{fn NOW()} | Equivalent to LOCALTIMESTAMP |
{fn YEAR(date)} | Equivalent to EXTRACT(YEAR FROM date) . Returns an integer. |
{fn QUARTER(date)} | Equivalent to EXTRACT(QUARTER FROM date) . Returns an integer between 1 and 4. |
{fn MONTH(date)} | Equivalent to EXTRACT(MONTH FROM date) . Returns an integer between 1 and 12. |
{fn WEEK(date)} | Equivalent to EXTRACT(WEEK FROM date) . Returns an integer between 1 and 53. |
{fn DAYOFYEAR(date)} | Equivalent to EXTRACT(DOY FROM date) . Returns an integer between 1 and 366. |
{fn DAYOFMONTH(date)} | Equivalent to EXTRACT(DAY FROM date) . Returns an integer between 1 and 31. |
{fn DAYOFWEEK(date)} | Equivalent to EXTRACT(DOW FROM date) . Returns an integer between 1 and 7. |
{fn HOUR(date)} | Equivalent to EXTRACT(HOUR FROM date) . Returns an integer between 0 and 23. |
{fn MINUTE(date)} | Equivalent to EXTRACT(MINUTE FROM date) . Returns an integer between 0 and 59. |
{fn SECOND(date)} | Equivalent to EXTRACT(SECOND FROM date) . Returns an integer between 0 and 59. |
{fn TIMESTAMPADD(timeUnit, count, datetime)} | Adds an interval of count timeUnits to a datetime |
{fn TIMESTAMPDIFF(timeUnit, timestamp1, timestamp2)} | Subtracts timestamp1 from timestamp2 and returns the result in timeUnits |
系统
OPERATOR SYNTAX | DESCRIPTION |
---|---|
{fn DATABASE()} | Equivalent to CURRENT_CATALOG |
{fn IFNULL(value1, value2)} | Returns value2 if value1 is null |
{fn USER()} | Equivalent to CURRENT_USER |
转换
OPERATOR SYNTAX | DESCRIPTION |
---|---|
{fn CONVERT(value, type)} | Cast value into type |
聚合函数
Syntax:
1 | aggregateCall: |
其中agg是下表中的运算符之一,或者是用户定义的聚合函数。
如果FILTER
存在,则聚合函数仅考虑 条件评估为 TRUE 的行。
如果DISTINCT
存在,则在传递给聚合函数之前消除重复的参数值。
如果WITHIN DISTINCT
存在,则在传递给聚合函数之前,参数值在指定键的每个值中都是不同的。
如果WITHIN GROUP
存在,则聚合函数在聚合值之前根据ORDER BY
内部子句对输入行进行排序WITHIN GROUP
。WITHIN GROUP
仅允许用于假设集合函数 ( RANK
、 DENSE_RANK
和)、逆分布函数 (PERCENT_RANK
和)和集合函数 ( 和)。CUME_DIST``PERCENTILE_CONT``PERCENTILE_DISC``COLLECT``LISTAGG
运算符语法 | 描述 |
---|---|
ANY_VALUE([ 全部 | DISTINCT ] 值) | 返回所有输入值中value的值之一;SQL 标准中没有指定这一点 |
ARG_MAX(值,补偿) | 返回组中comp的最大值 |
ARG_MIN(值,补偿) | 返回组中comp的最小值 |
APPROX_COUNT_DISTINCT(值[,值]*) | 返回value的不同值的近似数量;数据库可以使用近似值,但不要求 |
AVG([ ALL | DISTINCT ] 数字) | 返回所有输入值的平均值(算术平均值) |
BIT_AND([ 全部 | DISTINCT ] 值) | 返回所有非空输入值的按位与,如果没有则返回 null;支持整数和二进制类型 |
BIT_OR( [ 全部 | DISTINCT ] 值) | 返回所有非空输入值的按位或,如果没有则返回 null;支持整数和二进制类型 |
BIT_XOR([全部|不同]值) | 返回所有非空输入值的按位异或,如果没有则返回 null;支持整数和二进制类型 |
收集([全部|不同]值) | 返回值的多重集 |
数数(*) | 返回输入行数 |
COUNT([ 全部 | DISTINCT ] 值 [, 值 ]*) | 返回值不为空的输入行数(如果值是复合值则完全不为空) |
COVAR_POP(数字1,数字2) | 返回所有输入值对 ( numeric1 , numeric2 )的总体协方差 |
COVAR_SAMP(数字1,数字2) | 返回所有输入值对 ( numeric1 , numeric2 )的样本协方差 |
每个(条件) | 如果条件的所有值都为TRUE,则返回 TRUE |
FUSION(多组) | Returns the multiset union of multiset across all input values |
INTERSECTION(multiset) | Returns the multiset intersection of multiset across all input values |
LISTAGG( [ ALL | DISTINCT ] value [, separator]) | Returns values concatenated into a string, delimited by separator (default ‘,’) |
MAX( [ ALL | DISTINCT ] value) | Returns the maximum value of value across all input values |
MIN( [ ALL | DISTINCT ] value) | Returns the minimum value of value across all input values |
MODE(value) | Returns the most frequent value of value across all input values |
REGR_COUNT(numeric1, numeric2) | Returns the number of rows where both dependent and independent expressions are not null |
REGR_SXX(numeric1, numeric2) | Returns the sum of squares of the dependent expression in a linear regression model |
REGR_SYY(numeric1, numeric2) | Returns the sum of squares of the independent expression in a linear regression model |
SOME(condition) | Returns TRUE if one or more of the values of condition is TRUE |
STDDEV( [ ALL | DISTINCT ] numeric) | Synonym for STDDEV_SAMP |
STDDEV_POP( [ ALL | DISTINCT ] numeric) | Returns the population standard deviation of numeric across all input values |
STDDEV_SAMP( [ ALL | DISTINCT ] numeric) | Returns the sample standard deviation of numeric across all input values |
SUM( [ ALL | DISTINCT ] numeric) | Returns the sum of numeric across all input values |
VAR_POP( [ ALL | DISTINCT ] value) | Returns the population variance (square of the population standard deviation) of numeric across all input values |
VAR_SAMP( [ ALL | DISTINCT ] numeric) | Returns the sample variance (square of the sample standard deviation) of numeric across all input values |
Not implemented:
- REGR_AVGX(numeric1, numeric2)
- REGR_AVGY(numeric1, numeric2)
- REGR_INTERCEPT(numeric1, numeric2)
- REGR_R2(numeric1, numeric2)
- REGR_SLOPE(numeric1, numeric2)
- REGR_SXY(numeric1, numeric2)
有序集聚合函数
The syntax is as for aggregateCall, except that WITHIN GROUP
is required.
In the following:
- fraction is a numeric literal between 0 and 1, inclusive, and represents a percentage
OPERATOR SYNTAX | DESCRIPTION |
---|---|
PERCENTILE_CONT(fraction) WITHIN GROUP (ORDER BY orderItem) | Returns a percentile based on a continuous distribution of the column values, interpolating between adjacent input items if needed |
PERCENTILE_DISC(fraction) WITHIN GROUP (ORDER BY orderItem [, orderItem ]*) | Returns a percentile based on a discrete distribution of the column values returning the first input value whose position in the ordering equals or exceeds the specified fraction |
窗口函数
Syntax:
1 | windowedAggregateCall: |
其中agg是下表中的运算符之一,或者是用户定义的聚合函数。
DISTINCT
,FILTER
并且WITHIN GROUP
如聚合函数所述。
运算符语法 | 描述 |
---|---|
COUNT(值 [, 值 ]*) OVER 窗口 | 返回窗口中**值不为空的行数(如果值为复合值则完全不为空) |
COUNT(*) OVER 窗口 | 返回窗口中的行数 |
AVG(数字) OVER 窗口 | 返回窗口中所有值的数值平均值(算术平均值) |
SUM(数字) OVER 窗口 | 返回窗口中所有值的数值总和 |
MAX(值) OVER 窗口 | 返回窗口中所有值的最大值 |
MIN(值) OVER 窗口 | 返回窗口中所有值的最小值 |
RANK() OVER 窗口 | 返回当前行有间隙的排名;与其第一个对等点的 ROW_NUMBER 相同 |
DENSE_RANK() 超过窗口 | 返回当前行的排名,没有间隙;该函数对同级组进行计数 |
ROW_NUMBER() 超过窗口 | 返回其分区内当前行的编号,从 1 开始计数 |
FIRST_VALUE(值) 超过窗口 | 返回在窗口框架第一行的行处计算的值 |
LAST_VALUE(值) 超过窗口 | 返回在窗框最后一行计算的值 |
LEAD(值、偏移、默认) OVER 窗口 | 返回在分区内当前行之后的偏移行处计算的值;如果没有这样的行,则返回default。偏移量和默认值都是相对于当前行进行评估的。如果省略,offset默认为 1,默认为 NULL |
LAG(值、偏移量、默认值) OVER 窗口 | 返回在分区内当前行之前的偏移行处计算的值;如果没有这样的行,则返回default。偏移量和默认值都是相对于当前行进行评估的。如果省略,offset默认为 1,默认为 NULL |
NTH_VALUE(值, 第 n) OVER 窗口 | 返回在窗口框架的第n行处计算的值 |
NTILE(value) OVER 窗口 | 返回一个从 1 到value 的整数,尽可能均等地划分分区 |
笔记:
- You may specify null treatment (
IGNORE NULLS
,RESPECT NULLS
) forFIRST_VALUE
,LAST_VALUE
,NTH_VALUE
,LEAD
andLAG
functions. The syntax handled by the parser, but onlyRESPECT NULLS
is implemented at runtime.
Not implemented:
- COUNT(DISTINCT value [, value ]*) OVER window
- APPROX_COUNT_DISTINCT(value [, value ]*) OVER window
- PERCENT_RANK(value) OVER window
- CUME_DIST(value) OVER window
分组函数
OPERATOR SYNTAX | DESCRIPTION |
---|---|
GROUPING(expression [, expression ]*) | Returns a bit vector of the given grouping expressions |
GROUP_ID() | Returns an integer that uniquely identifies the combination of grouping keys |
GROUPING_ID(expression [, expression ]*) | Synonym for GROUPING |
描述符
OPERATOR SYNTAX | DESCRIPTION |
---|---|
DESCRIPTOR(name [, name ]*) | DESCRIPTOR appears as an argument in a function to indicate a list of names. The interpretation of names is left to the function. |
表函数
Table functions occur in the FROM
clause.
Table functions may have generic table parameters (i.e., no row type is declared when the table function is created), and the row type of the result might depend on the row type(s) of the input tables. Besides, input tables are classified by three characteristics. The first characteristic is semantics. Input tables have either row semantics or set semantics, as follows:
- Row semantics means that the result of the table function depends on a row-by-row basis.
- Set semantics means that the outcome of the function depends on how the data is partitioned.
The second characteristic, which applies only to input tables with set semantics, is whether the table function can generate a result row even if the input table is empty.
- If the table function can generate a result row on empty input, the table is said to be “keep when empty”.
- The alternative is called “prune when empty”, meaning that the result would be pruned out if the input table is empty.
The third characteristic is whether the input table supports pass-through columns or not. Pass-through columns is a mechanism enabling the table function to copy every column of an input row into columns of an output row.
The input tables with set semantics may be partitioned on one or more columns. The input tables with set semantics may be ordered on one or more columns.
Note:
- The input tables with row semantics may not be partitioned or ordered.
- A polymorphic table function may have multiple input tables. However, at most one input table could have row semantics.
TUMBLE
In streaming queries, TUMBLE assigns a window for each row of a relation based on a timestamp column. An assigned window is specified by its beginning and ending. All assigned windows have the same length, and that’s why tumbling sometimes is named as “fixed windowing”. The first parameter of the TUMBLE table function is a generic table parameter. The input table has row semantics and supports pass-through columns.
OPERATOR SYNTAX | DESCRIPTION |
---|---|
TUMBLE(data, DESCRIPTOR(timecol), size [, offset ]) | 指示timecol的**大小间隔的翻滚窗口,可以选择在offset处对齐。 |
这是一个例子:
1 | SELECT * FROM TABLE( |
将范围为一分钟的滚动窗口应用于orders
表中的行。rowtime
是表中带水印的列orders
,用于通知数据是否完整。
跳
在流式查询中,HOP 分配覆盖大小间隔内的行的窗口,并根据时间戳列移动每张*幻灯片。*分配的窗口可能有重叠,因此有时跳跃被称为“滑动窗口”。HOP 表函数的第一个参数是通用表参数。输入表具有行语义并支持传递列。
运算符语法 | 描述 |
---|---|
HOP(数据, DESCRIPTOR(timecol), 幻灯片, 大小 [, 偏移量]) | 指示timecol的跳跃窗口,覆盖size间隔内的行,移动每张幻灯片并可选择以offset对齐。 |
这是一个例子:
1 | SELECT * FROM TABLE( |
对桌子上的行应用 5 分钟间隔大小的跳跃orders
,并每 2 分钟移动一次。rowtime
是表顺序的带水印的列,表明数据的完整性。
会议
在流式查询中,SESSION 分配基于datetime覆盖行的窗口。在会话窗口内,行的距离小于间隔。会话窗口适用于每个键。SESSION 表函数的第一个参数是通用表参数。输入表具有设定的语义并支持传递列。此外,如果输入表为空,SESSION 表函数将不会生成结果行。
运算符语法 | 描述 |
---|---|
会话(数据,描述符(时间列),描述符(键),大小) | 指示timecol的**大小间隔的会话窗口。会话窗口适用于每个键。 |
这是一个例子:
1 | SELECT * FROM TABLE( |
对表 中的行应用具有 20 分钟非活动间隙的会话orders
。 rowtime
是表顺序的带水印的列,表明数据的完整性。会话适用于每个产品。
注意:Tumble
、Hop
和Session
窗口表函数将原始表中的每一行分配给一个窗口。输出表具有与原始表相同的所有列以及两个附加列window_start
和window_end
,分别表示窗口间隔的开始和结束。
分组窗口函数
警告:不推荐使用分组窗口函数。
分组窗口函数出现在GROUP BY
子句中,并定义一个表示包含多行的窗口的键值。
在某些窗口函数中,一行可能属于多个窗口。例如,如果使用 分组查询 HOP(t, INTERVAL '2' HOUR, INTERVAL '1' HOUR)
,则时间戳为“10:15:00”的行将出现在 10:00 - 11:00 和 11:00 - 12:00 总计中。
运算符语法 | 描述 |
---|---|
HOP(日期时间,幻灯片,大小[,时间]) | 表示日期时间的跳跃窗口,覆盖大小间隔内的行,移动每张幻灯片,并且可以选择在时间上对齐。 |
会话(日期时间,间隔[,时间]) | 指示datetime**间隔的会话窗口,可以选择在时间上对齐 |
TUMBLE(日期时间, 间隔 [, 时间 ]) | 指示datetime的时间间隔的滚动窗口,可以选择在时间上对齐 |
分组辅助功能
分组辅助函数允许您访问由分组窗口函数定义的窗口的属性。
运算符语法 | 描述 |
---|---|
HOP_END(表情,幻灯片,大小[,时间]) | 返回由函数调用定义的窗口末尾的表达式的值HOP |
HOP_START(表情,幻灯片,大小[,时间]) | 返回由函数调用定义的窗口开头的表达式的值HOP |
SESSION_END(表达式,间隔[,时间]) | 返回由函数调用定义的窗口末尾的表达式的值SESSION |
SESSION_START(表达式, 间隔[, 时间]) | 返回由函数调用定义的窗口开头的表达式的值SESSION |
TUMBLE_END(表达式, 间隔 [, 时间 ]) | 返回由函数调用定义的窗口末尾的表达式的值TUMBLE |
TUMBLE_START(表达式, 间隔 [, 时间 ]) | 返回由函数调用定义的窗口开头的表达式的值TUMBLE |
空间功能
在下面的:
- geom是几何;
- geomCollection是一个 GEOMETRYCOLLECTION;
- 点是一个点;
- lineString是一个 LINESTRING;
- iMatrix是DE-9IM 交集矩阵;
- distance、tolerance、segmentLengthFraction、offsetDistance均为 double 类型;
- 维度、quadSegs、srid、zoom都是整数类型;
- LayerType为字符串;
- gml是包含地理标记语言(GML)的字符串;
- wkt是包含众所周知文本(WKT)的字符串;
- wkb是包含众所周知的二进制 (WKB) 的二进制字符串。
在“C”(“兼容性”)列中,“o”表示该函数实现了 OpenGIS Simple Features Implementing Specific for SQL 版本 1.2.1;“p”表示该函数是 PostGIS对OpenGIS的扩展;“h”表示该函数是 H2GIS扩展。
几何转换函数(2D)
C | 运算符语法 | 描述 |
---|---|---|
p | ST_AsBinary(几何) | 同义词ST_AsWKB |
p | ST_AsEWKB(几何) | 同义词ST_AsWKB |
p | ST_AsEWKT(几何) | Converts GEOMETRY → EWKT |
p | ST_AsGeoJSON(geom) | Converts GEOMETRY → GeoJSON |
p | ST_AsGML(geom) | Converts GEOMETRY → GML |
p | ST_AsText(geom) | Synonym for ST_AsWKT |
o | ST_AsWKB(geom) | Converts GEOMETRY → WKB |
o | ST_AsWKT(geom) | Converts GEOMETRY → WKT |
o | ST_Force2D(geom) | 3D GEOMETRY → 2D GEOMETRY |
o | ST_GeomFromEWKB(wkb [, srid ]) | Synonym for ST_GeomFromWKB |
o | ST_GeomFromEWKT(wkb [, srid ]) | Converts EWKT → GEOMETRY |
o | ST_GeomFromGeoJSON(json) | Converts GeoJSON → GEOMETRY |
o | ST_GeomFromGML(wkb [, srid ]) | Converts GML → GEOMETRY |
o | ST_GeomFromText(wkt [, srid ]) | Synonym for ST_GeomFromWKT |
o | ST_GeomFromWKB(wkb [, srid ]) | Converts WKB → GEOMETRY |
o | ST_GeomFromWKT(wkb [, srid ]) | Converts WKT → GEOMETRY |
o | ST_LineFromText(wkt [, srid ]) | Converts WKT → LINESTRING |
o | ST_LineFromWKB(wkt [, srid ]) | Converts WKT → LINESTRING |
o | ST_MLineFromText(wkt [, srid ]) | Converts WKT → MULTILINESTRING |
o | ST_MPointFromText(wkt [, srid ]) | Converts WKT → MULTIPOINT |
o | ST_MPolyFromText(wkt [, srid ]) Converts WKT → MULTIPOLYGON | |
o | ST_PointFromText(wkt [, srid ]) | Converts WKT → POINT |
o | ST_PointFromWKB(wkt [, srid ]) | Converts WKB → POINT |
o | ST_PolyFromText(wkt [, srid ]) | Converts WKT → POLYGON |
o | ST_PolyFromWKB(wkt [, srid ]) | Converts WKB → POLYGON |
p | ST_ReducePrecision(geom, gridSize) | Reduces the precision of a geom to the provided gridSize |
h | ST_ToMultiPoint(geom) | Converts the coordinates of geom (which may be a GEOMETRYCOLLECTION) into a MULTIPOINT |
h | ST_ToMultiLine(geom) | Converts the coordinates of geom (which may be a GEOMETRYCOLLECTION) into a MULTILINESTRING |
h | ST_ToMultiSegments(geom) | Converts geom (which may be a GEOMETRYCOLLECTION) into a set of distinct segments stored in a MULTILINESTRING |
Not implemented:
- ST_GoogleMapLink(geom [, layerType [, zoom ]]) GEOMETRY → Google map link
- ST_OSMMapLink(geom [, marker ]) GEOMETRY → OSM map link
Geometry conversion functions (3D)
C | OPERATOR SYNTAX | DESCRIPTION |
---|---|---|
o | ST_Force3D(geom) | 2D GEOMETRY → 3D GEOMETRY |
Geometry creation functions (2D)
C | OPERATOR SYNTAX | DESCRIPTION |
---|---|---|
h | ST_BoundingCircle(geom) | Returns the minimum bounding circle of geom |
h | ST_Expand(geom, distance) | Expands geom’s envelope |
h | ST_Expand(geom, deltaX, deltaY) | Expands geom’s envelope |
h | ST_MakeEllipse(point, width, height) | Constructs an ellipse |
p | ST_MakeEnvelope(xMin, yMin, xMax, yMax [, srid ]) | Creates a rectangular POLYGON |
h | ST_MakeGrid(geom, deltaX, deltaY) | Calculates a regular grid of POLYGONs based on geom |
h | ST_MakeGridPoints(geom, deltaX, deltaY) | Calculates a regular grid of points based on geom |
o | ST_MakeLine(point1 [, point ]*) | Creates a line-string from the given POINTs (or MULTIPOINTs) |
p | ST_MakePoint(x, y [, z ]) | Synonym for ST_Point |
p | ST_MakePolygon(lineString [, hole ]*) | Creates a POLYGON from lineString with the given holes (which are required to be closed LINESTRINGs) |
h | ST_MinimumDiameter(geom) | Returns the minimum diameter of geom |
h | ST_MinimumRectangle(geom) | Returns the minimum rectangle enclosing geom |
h | ST_OctogonalEnvelope(geom) | Returns the octogonal envelope of geom |
o | ST_Point(x, y [, z ]) | Constructs a point from two or three coordinates |
Not implemented:
- ST_RingBuffer(geom, distance, bufferCount [, endCapStyle [, doDifference]]) Returns a MULTIPOLYGON of buffers centered at geom and of increasing buffer size
Geometry creation functions (3D)
Not implemented:
- ST_Extrude(geom, height [, flag]) Extrudes a GEOMETRY
- ST_GeometryShadow(geom, point, height) Computes the shadow footprint of geom
- ST_GeometryShadow(geom, azimuth, altitude, height [, unify ]) Computes the shadow footprint of geom
Geometry properties (2D)
C | OPERATOR SYNTAX | DESCRIPTION |
---|---|---|
o | ST_Boundary(geom [, srid ]) | Returns the boundary of geom |
o | ST_Centroid(geom) | Returns the centroid of geom |
o | ST_CoordDim(geom) | Returns the dimension of the coordinates of geom |
o | ST_Dimension(geom) | Returns the dimension of geom |
o | ST_Distance(geom1, geom2) | Returns the distance between geom1 and geom2 |
h | ST_ExteriorRing(geom) | Returns the exterior ring of geom, or null if geom is not a polygon |
o | ST_GeometryType(geom) | Returns the type of geom |
o | ST_GeometryTypeCode(geom) | Returns the OGC SFS type code of geom |
p | ST_EndPoint(lineString) | Returns the last coordinate of geom |
o | ST_Envelope(geom [, srid ]) | Returns the envelope of geom (which may be a GEOMETRYCOLLECTION) as a GEOMETRY |
o | ST_Extent(geom) | Returns the minimum bounding box of geom (which may be a GEOMETRYCOLLECTION) |
h | ST_GeometryN(geomCollection, n) | Returns the nth GEOMETRY of geomCollection |
h | ST_InteriorRingN(geom) | Returns the nth interior ring of geom, or null if geom is not a polygon |
h | ST_IsClosed(geom) | Returns whether geom is a closed LINESTRING or MULTILINESTRING |
o | ST_IsEmpty(geom) | Returns whether geom is empty |
o | ST_IsRectangle(geom) | Returns whether geom is a rectangle |
h | ST_IsRing(geom) | Returns whether geom is a closed and simple line-string or MULTILINESTRING |
o | ST_IsSimple(geom) | Returns whether geom is simple |
o | ST_IsValid(geom) | Returns whether geom is valid |
h | ST_NPoints(geom) | Returns the number of points in geom |
h | ST_NumGeometries(geom) | Returns the number of geometries in geom (1 if it is not a GEOMETRYCOLLECTION) |
h | ST_NumInteriorRing(geom) | Synonym for ST_NumInteriorRings |
h | ST_NumInteriorRings(geom) | Returns the number of interior rings of geom |
h | ST_NumPoints(geom) | Returns the number of points in geom |
p | ST_PointN(geom, n) | Returns the nth point of a geom |
p | ST_PointOnSurface(geom) | Returns an interior or boundary point of geom |
o | ST_SRID(geom) | Returns SRID value of geom or 0 if it does not have one |
p | ST_StartPoint(geom) | Returns the first point of geom |
o | ST_X(geom) | Returns the x-value of the first coordinate of geom |
o | ST_XMax(geom) | Returns the maximum x-value of geom |
o | ST_XMin(geom) | Returns the minimum x-value of geom |
o | ST_Y(geom) | Returns the y-value of the first coordinate of geom |
o | ST_YMax(geom) | Returns the maximum y-value of geom |
o | ST_YMin(geom) | Returns the minimum y-value of geom |
Not implemented:
- ST_CompactnessRatio(polygon) Returns the square root of polygon’s area divided by the area of the circle with circumference equal to its perimeter
- ST_Explode(query [, fieldName]) Explodes the GEOMETRYCOLLECTIONs in the fieldName column of a query into multiple geometries
- ST_IsValidDetail(geom [, selfTouchValid ]) Returns a valid detail as an array of objects
- ST_IsValidReason(geom [, selfTouchValid ]) Returns text stating whether geom is valid, and if not valid, a reason why
Geometry properties (3D)
C | OPERATOR SYNTAX | DESCRIPTION |
---|---|---|
p | ST_Is3D(s) | Returns whether geom has at least one z-coordinate |
o | ST_Z(geom) | Returns the z-value of the first coordinate of geom |
o | ST_ZMax(geom) | Returns the maximum z-value of geom |
o | ST_ZMin(geom) | Returns the minimum z-value of geom |
Geometry predicates
C | OPERATOR SYNTAX | DESCRIPTION |
---|---|---|
o | ST_Contains(geom1, geom2) | Returns whether geom1 contains geom2 |
p | ST_ContainsProperly(geom1, geom2) | Returns whether geom1 contains geom2 but does not intersect its boundary |
p | ST_CoveredBy(geom1, geom2) | Returns whether no point in geom1 is outside geom2. |
p | ST_Covers(geom1, geom2) | Returns whether no point in geom2 is outside geom1 |
o | ST_Crosses(geom1, geom2) | Returns whether geom1 crosses geom2 |
o | ST_Disjoint(geom1, geom2) | Returns whether geom1 and geom2 are disjoint |
p | ST_DWithin(geom1, geom2, distance) | Returns whether geom1 and geom are within distance of one another |
o | ST_EnvelopesIntersect(geom1, geom2) | Returns whether the envelope of geom1 intersects the envelope of geom2 |
o | ST_Equals(geom1, geom2) | Returns whether geom1 equals geom2 |
o | ST_Intersects(geom1, geom2) | 返回geom1是否与**geom2相交 |
哦 | ST_Overlaps(geom1, geom2) | 返回geom1是否与**geom2重叠 |
哦 | ST_Relate(geom1, geom2) | 返回geom1和geom2的DE-9IM交集矩阵 |
哦 | ST_Relate(geom1, geom2, iMatrix) | 返回geom1和geom2是否通过给定的交集矩阵**iMatrix相关 |
哦 | ST_Touches(geom1, geom2) | 返回geom1是否接触geom2 |
哦 | ST_Within(geom1, geom2) | 返回geom1是否在geom2内 |
未实现:
- ST_OrderingEquals(geom1, geom2) 返回geom1是否等于geom2以及它们的坐标和组件几何图形以相同的顺序列出
几何运算符(2D)
以下函数结合了 2D 几何图形。
C | 运算符语法 | 描述 |
---|---|---|
p | ST_Buffer(几何,距离[,quadSegs,endCapStyle]) | 计算geom周围的缓冲区 |
p | ST_Buffer(几何,距离[,bufferStyle]) | 计算geom周围的缓冲区 |
哦 | ST_ConvexHull(几何) | 计算包含geom中所有点的最小凸多边形 |
哦 | ST_Difference(geom1, geom2) | 计算两个几何图形之间的差异 |
哦 | ST_SymDifference(geom1, geom2) | 计算两个几何图形之间的对称差 |
哦 | ST_Intersection(geom1, geom2) | 计算geom1和geom2的交集 |
p | ST_OffsetCurve(几何、距离、bufferStyle) | 计算线串的偏移线 |
哦 | ST_Union(geom1, geom2) | 计算geom1和geom2的并集 |
哦 | ST_Union(geomCollection) | 计算geomCollection中几何图形的并集 |
另请参见:ST_Union
聚合函数。
仿射变换函数(3D 和 2D)
以下函数转换 2D 几何图形。
C | 运算符语法 | 描述 |
---|---|---|
哦 | ST_Rotate(几何,角度[,原点| x,y]) | 将几何图形绕**原点(或点 ( x , y ))逆时针旋转角度(以弧度为单位) |
哦 | ST_Scale(几何、x 因子、y 因子) | 通过将纵坐标乘以指定的比例因子来缩放几何 |
哦 | ST_翻译(几何,x,y) | 通过向量 (x, y)平移geom |
未实现:
- ST_Scale(geom, xFactor, yFactor [, zFactor ])通过将纵坐标乘以指定的比例因子来缩放geom
- ST_Translate(geom, x, y, [, z]) 翻译geom
几何编辑功能(2D)
以下函数修改 2D 几何图形。
C | 运算符语法 | 描述 |
---|---|---|
p | ST_AddPoint(线串, 点 [, 索引]) | 将点添加到给定索引处的**线串(如果未指定索引,则添加到末尾) |
H | ST_Densify(几何,公差) | 通过沿线段插入额外的顶点来加密几何图形 |
H | ST_Flip坐标(几何) | 翻转几何体的 X 和 Y 坐标 |
H | ST_孔(几何) | 返回几何体中的孔(可能是 GEOMETRYCOLLECTION) |
H | ST_Normalize(几何) | 将geom转换为正常形式 |
p | ST_RemoveRepeatedPoints(geom [, 公差]) | 从几何中删除重复的坐标 |
H | ST_RemoveHoles(几何) | 去除几何体的孔 |
p | ST_RemovePoint(线串,索引) | 删除线串中给定索引处的**点 |
H | ST_Reverse(几何) | 反转几何坐标的顺序 |
未实现:
- ST_CollectionExtract(geom,Dimension) 过滤geom ,返回具有给定**维度的那些成员的多几何图形(1 = 点,2 = 线串,3 = 多边形)
几何编辑功能(3D)
以下函数修改 3D 几何图形。
C | 运算符语法 | 描述 |
---|---|---|
H | ST_AddZ(geom, zToAdd) | 将zToAdd添加到geom的 z 坐标 |
未实现:
- ST_Interpolate3DLine(geom) 返回带有 z 值插值的geom,如果不是线串或多行字符串,则返回 null
- ST_MultiplyZ(geom, zFactor) 返回z 值乘以zFactor的**geom
- ST_Reverse3DLine(geom [, sortOrder ]) 可能根据其第一个和最后一个坐标的 z 值反转geom
- ST_UpdateZ(geom, newZ [, updateCondition ]) 更新geom的 z 值
- ST_ZUpdateLineExtremities(geom, startZ, endZ [, interpolate ]) 更新geom的开始和结束 z 值
几何测量功能(2D)
以下函数测量几何形状。
C | 运算符语法 | 描述 |
---|---|---|
哦 | ST_Area(几何) | 返回geom的面积(可能是 GEOMETRYCOLLECTION) |
H | ST_ClosestCooperative(点,几何) | 返回最接近点的**几何坐标 |
H | ST_ClosestPoint(geom1, geom2) | 返回geom1最接近geom2的点 |
H | ST_Furthest坐标(几何,点) | 返回距离点最远的几何坐标 |
H | ST_长度(几何) | 返回geom的长度 |
H | ST_LocateAlong(几何,segmentLengthFraction,offsetDistance) | 返回一个 MULTIPOINT,其中包含沿geom线段的点,位于segmentLengthFraction和offsetDistance |
H | ST_LongestLine(geom1, geom2) | 返回geom1和geom2点之间的二维最长线串 |
H | ST_MaxDistance(geom1, geom2) | 计算geom1和geom2之间的最大距离 |
H | ST_周长(多边形) | 返回多边形周长(可能是 MULTIPOLYGON) |
H | ST_ProjectPoint(点, 线串) | 项目指向线串(可能是多线串) |
几何测量功能(3D)
未实现:
- ST_3DArea(geom) 返回多边形的 3D 面积
- ST_3DLength(geom) 返回线串的 3D 长度
- ST_3DPerimeter(geom) 返回多边形或 MULTIPOLYGON 的 3D 周长
- ST_SunPosition(point [, timestamp ]) 计算点和时间戳处的太阳位置(现在默认)
几何处理功能(2D)
以下函数处理几何形状。
C | 运算符语法 | 描述 |
---|---|---|
哦 | ST_LineMerge(几何) | 合并线性分量的集合以形成最大长度的线串 |
哦 | ST_MakeValid(几何) | 将给定的无效几何图形变为有效几何图形 |
哦 | ST_Polygonize(几何) | 从几何体的边缘创建一个多重多边形 |
哦 | ST_PrecisionReducer(geom, n) | 将geom的精度降低到小数点后n位 |
哦 | ST_Simplify(几何,距离) | 使用具有距离容差的Douglas-Peuker 算法简化几何 |
哦 | ST_SimplifyPreserveTopology(几何,距离) | 简化geom,保留其拓扑 |
哦 | ST_Snap(geom1, geom2, 公差) | 将geom1和geom2捕捉在一起 |
p | ST_Split(几何,刀片) | 通过刀片分割几何体 |
未实现:
- ST_LineIntersector(geom1, geom2) 将geom1(线串)与geom2分割
- ST_LineMerge(geom) 合并线性分量的集合以形成最大长度的线串
- ST_MakeValid(geom [,preserveGeomDim[,preserveDuplicateCoord[,preserveCoordDim]]]) 使geom有效
- ST_RingSideBuffer(geom, distance, bufferCount [, endCapStyle [, doDifference]]) 计算一侧的环形缓冲区
- ST_SideBuffer(geom, distance [, bufferStyle ]) 计算一侧的单个缓冲区
几何投影函数
由于其使用条款的限制,EPSG 数据集与 Proj4J 分开发布。为了使用 Apache Calcite 中的投影函数,用户必须在其依赖项中包含 EPSG 数据集。
C | 运算符语法 | 描述 |
---|---|---|
哦 | ST_SetSRID(geom, srid) | 返回具有新 SRID 的geom副本 |
哦 | ST_Transform(geom, srid) | 将geom从一个坐标参考系 (CRS)转换为srid指定的 CRS |
三角函数
未实现:
- ST_Azimuth(point1, point2) 返回从point1到point2的线段的方位角
地形函数
未实现:
- ST_TriangleAspect(geom) 返回三角形的长宽比
- ST_TriangleContouring(query [, z1, z2, z3 ][, varArgs ]*) 根据类别将三角形分割成更小的三角形
- ST_TriangleDirection(geom) 计算三角形最陡上升的方向并将其作为线串返回
- ST_TriangleSlope(geom) 以百分比形式计算三角形的斜率
- ST_Voronoi(geom [, outDimension [, EnvelopePolygon ]]) 创建 Voronoi 图
三角测量函数
C | 运算符语法 | 描述 |
---|---|---|
H | ST_ConstrainedDelaunay(geom [, 标志]) | 基于geom计算约束 Delaunay 三角剖分 |
H | ST_Delaunay(geom [, 标志]) | 基于geom中的点计算 Delaunay 三角剖分 |
未实现:
- ST_Tessellate(polygon)用自适应三角形对多边形(可能是 MULTIPOLYGON)进行细分
几何聚合函数
C | 运算符语法 | 描述 |
---|---|---|
H | ST_Accum(几何) | 将geom累加到数组中 |
H | ST_Collect(几何) | 将geom收集到 GeometryCollection 中 |
H | ST_Union(几何) | 计算geom中几何图形的并集 |
JSON 函数
在下面的:
- jsonValue是包含 JSON 值的字符串;
- path是包含 JSON 路径表达式的字符串;模式标志
strict
orlax
应该在路径的开头指定。
查询功能
运算符语法 | 描述 |
---|---|
JSON_EXISTS(jsonValue, 路径 [ { TRUE | FALSE | UNKNOWN | ERROR } ON ERROR ] ) | jsonValue是否满足使用JSON路径表达式path描述的搜索条件 |
JSON_VALUE(jsonValue, 路径 [ 返回类型 ] [ { ERROR | NULL | DEFAULT expr } ON EMPTY ] [ { ERROR | NULL | DEFAULT expr } ON ERROR ] ) | 使用 JSON 路径表达式路径从**jsonValue中提取 SQL 标量 |
JSON_QUERY(jsonValue, 路径 [ {WITHOUT [ ARRAY ] | WITH [ CONDITIONAL | UNCONDITIONAL ] [ ARRAY ] } WRAPPER ] [ { ERROR | NULL | EMPTY ARRAY | EMPTY OBJECT } ON EMPTY ] [ { ERROR | NULL | EMPTY ARRAY | EMPTY OBJECT } 错误]) | 使用路径JSON 路径表达式从jsonValue中提取 JSON 对象或 JSON 数组 |
笔记:
- and
ON ERROR
子句ON EMPTY
定义抛出错误或即将返回空值时函数的回退行为。 - 该
ARRAY WRAPPER
子句定义了如何在JSON_QUERY
函数中表示 JSON 数组结果。以下示例比较了包装器行为。
示例数据:
1 | {"a": "[1,2]", "b": [1,2], "c": "hi"} |
比较:
操作员 | $.A | $.B | $.C |
---|---|---|---|
JSON_VALUE | [1, 2] | 错误 | 你好 |
没有数组包装器的 JSON 查询 | 错误 | [1, 2] | 错误 |
使用无条件数组包装器的 JSON 查询 | [“[1,2]”] | [ [1,2] ] | [ “你好” ] |
带条件数组包装器的 JSON 查询 | [“[1,2]”] | [1,2] | [ “你好” ] |
未实现:
- JSON_TABLE
Constructor Functions
OPERATOR SYNTAX | DESCRIPTION |
---|---|
JSON_OBJECT( jsonKeyVal [, jsonKeyVal ]* [ nullBehavior ] ) | Construct JSON object using a series of key-value pairs |
JSON_OBJECTAGG( jsonKeyVal [ nullBehavior ] ) | Aggregate function to construct a JSON object using a key-value pair |
JSON_ARRAY( [ jsonVal [, jsonVal ]* ] [ nullBehavior ] ) | Construct a JSON array using a series of values |
JSON_ARRAYAGG( jsonVal [ ORDER BY orderItem [, orderItem ]* ] [ nullBehavior ] ) | Aggregate function to construct a JSON array using a value |
1 | jsonKeyVal: |
Note:
- The flag
FORMAT JSON
indicates the value is formatted as JSON character string. WhenFORMAT JSON
is used, the value should be de-parse from JSON character string to a SQL structured value. ON NULL
clause defines how the JSON output represents null values. The default null behavior ofJSON_OBJECT
andJSON_OBJECTAGG
isNULL ON NULL
, and forJSON_ARRAY
andJSON_ARRAYAGG
it isABSENT ON NULL
.- If
ORDER BY
clause is provided,JSON_ARRAYAGG
sorts the input rows into the specified order before performing aggregation.
Comparison Operators
OPERATOR SYNTAX | DESCRIPTION |
---|---|
jsonValue IS JSON [ VALUE ] | Whether jsonValue is a JSON value |
jsonValue IS NOT JSON [ VALUE ] | Whether jsonValue is not a JSON value |
jsonValue IS JSON SCALAR | Whether jsonValue is a JSON scalar value |
jsonValue IS NOT JSON SCALAR | Whether jsonValue is not a JSON scalar value |
jsonValue IS JSON OBJECT | Whether jsonValue is a JSON object |
jsonValue IS NOT JSON OBJECT | Whether jsonValue is not a JSON object |
jsonValue IS JSON ARRAY | Whether jsonValue is a JSON array |
jsonValue IS NOT JSON ARRAY | Whether jsonValue is not a JSON array |
Dialect-specific Operators
The following operators are not in the SQL standard, and are not enabled in Calcite’s default operator table. They are only available for use in queries if your session has enabled an extra operator table.
To enable an operator table, set the fun connect string parameter.
The ‘C’ (compatibility) column contains value:
- ‘*’ for all libraries,
- ‘b’ for Google BigQuery (‘fun=bigquery’ in the connect string),
- ‘c’ for Apache Calcite (‘fun=calcite’ in the connect string),
- ‘h’ for Apache Hive (‘fun=hive’ in the connect string),
- ‘m’ for MySQL (‘fun=mysql’ in the connect string),
- ‘q’ for Microsoft SQL Server (‘fun=mssql’ in the connect string),
- ‘o’ for Oracle (‘fun=oracle’ in the connect string),
- ‘p’ for PostgreSQL (‘fun=postgresql’ in the connect string),
- ’s’ for Apache Spark (‘fun=spark’ in the connect string).
One operator name may correspond to multiple SQL dialects, but with different semantics.
BigQuery’s type system uses confusingly different names for types and functions:
- BigQuery’s
DATETIME
type represents a local date time, and corresponds to Calcite’sTIMESTAMP
type; - BigQuery’s
TIMESTAMP
type represents an instant, and corresponds to Calcite’sTIMESTAMP WITH LOCAL TIME ZONE
type; - The timestampLtz parameter, for instance in
DATE(timestampLtz)
, has Calcite typeTIMESTAMP WITH LOCAL TIME ZONE
; - The
TIMESTAMP(string)
function, designed to be compatible the BigQuery function, return a CalciteTIMESTAMP WITH LOCAL TIME ZONE
; - Similarly,
DATETIME(string)
returns a CalciteTIMESTAMP
.
C | OPERATOR SYNTAX | DESCRIPTION |
---|---|---|
p | expr :: type | Casts expr to type |
m | expr1 <=> expr2 | Whether two values are equal, treating null values as the same, and it’s similar to IS NOT DISTINCT FROM |
* | ACOSH(numeric) | Returns the inverse hyperbolic cosine of numeric |
s | ARRAY(expr [, expr ]*) | Construct an array in Apache Spark |
s | ARRAY_APPEND(array, element) | Appends an element to the end of the array and returns the result. Type of element should be similar to type of the elements of the array. If the array is null, the function will return null. If an element that is null, the null element will be added to the end of the array |
s | ARRAY_COMPACT(array) | Removes null values from the array |
b | ARRAY_CONCAT(array [, array ]*) | Concatenates one or more arrays. If any input argument is NULL the function returns NULL |
s | ARRAY_CONTAINS(array, element) | Returns true if the array contains the element |
s | ARRAY_DISTINCT(array) | Removes duplicate values from the array that keeps ordering of elements |
s | ARRAY_EXCEPT(array1, array2) | Returns an array of the elements in array1 but not in array2, without duplicates |
s | ARRAY_INSERT(array, pos, element) | Places element into index pos of array. Array index start at 1, or start from the end if index is negative. Index above array size appends the array, or prepends the array if index is negative, with NULL elements. |
s | ARRAY_INTERSECT(array1, array2) | Returns an array of the elements in the intersection of array1 and array2, without duplicates |
s | ARRAY_JOIN(array, delimiter [, nullText ]) | Synonym for ARRAY_TO_STRING |
b | ARRAY_LENGTH(array) | Synonym for CARDINALITY |
s | ARRAY_MAX(array) | Returns the maximum value in the array |
s | ARRAY_MIN(array) | Returns the minimum value in the array |
s | ARRAY_POSITION(array, element) | Returns the (1-based) index of the first element of the array as long |
s | ARRAY_REMOVE(array, element) | Remove all elements that equal to element from the array |
s | ARRAY_PREPEND(array, element) | Appends an element to the beginning of the array and returns the result. Type of element should be similar to type of the elements of the array. If the array is null, the function will return null. If an element that is null, the null element will be added to the beginning of the array |
s | ARRAY_REPEAT(element, count) | Returns the array containing element count times. |
b | ARRAY_REVERSE(array) | Reverses elements of array |
s | ARRAY_SIZE(array) | Synonym for CARDINALITY |
b | ARRAY_TO_STRING(array, delimiter [, nullText ]) | Returns a concatenation of the elements in array as a STRING and take delimiter as the delimiter. If the nullText parameter is used, the function replaces any NULL values in the array with the value of nullText. If the nullText parameter is not used, the function omits the NULL value and its preceding delimiter. Returns NULL if any argument is NULL |
s | ARRAY_UNION(array1, array2) | Returns an array of the elements in the union of array1 and array2, without duplicates |
s | ARRAYS_OVERLAP(array1, array2) | Returns true if array1 contains at least a non-null element present also in *array2. If the arrays have no common element and they are both non-empty and either of them contains a null element null is returned, false otherwise |
s | ARRAYS_ZIP(array [, array ]*) | Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays |
s | SORT_ARRAY(array [, ascendingOrder]) | Sorts the array in ascending or descending order according to the natural ordering of the array elements. The default order is ascending if ascendingOrder is not specified. Null elements will be placed at the beginning of the returned array in ascending order or at the end of the returned array in descending order |
* | ASINH(numeric) | Returns the inverse hyperbolic sine of numeric |
* | ATANH(numeric) | Returns the inverse hyperbolic tangent of numeric |
s | BIT_LENGTH(binary) | Returns the bit length of binary |
s | BIT_LENGTH(string) | Returns the bit length of string |
s | BIT_GET(value, position) | Returns the bit (0 or 1) value at the specified position of numeric value. The positions are numbered from right to left, starting at zero. The position argument cannot be negative |
b | CEIL(value) | Similar to standard CEIL(value) except if value is an integer type, the return type is a double |
m s | CHAR(integer) | Returns the character whose ASCII code is integer % 256, or null if integer < 0 |
b o p | CHR(integer) | Returns the character whose UTF-8 code is integer |
b | CODE_POINTS_TO_BYTES(integers) | Converts integers, an array of integers between 0 and 255 inclusive, into bytes; throws error if any element is out of range |
b | CODE_POINTS_TO_STRING(integers) | Converts integers, an array of integers between 0 and 0xD7FF or between 0xE000 and 0x10FFFF inclusive, into string; throws error if any element is out of range |
o | CONCAT(string, string) | Concatenates two strings, returns null only when both string arguments are null, otherwise treats null as empty string |
b m | CONCAT(string [, string ]*) | Concatenates one or more strings, returns null if any of the arguments is null |
p q | CONCAT(string [, string ]*) | Concatenates one or more strings, null is treated as empty string |
m p | CONCAT_WS(separator, str1 [, string ]*) | Concatenates one or more strings, returns null only when separator is null, otherwise treats null arguments as empty strings |
q | CONCAT_WS(separator, str1, str2 [, string ]*) | Concatenates two or more strings, requires at least 3 arguments (up to 254), treats null arguments as empty strings |
m | COMPRESS(string) | Compresses a string using zlib compression and returns the result as a binary string |
b | CONTAINS_SUBSTR(expression, string [ , json_scope => json_scope_value ]) | Returns whether string exists as a substring in expression. Optional json_scope argument specifies what scope to search if expression is in JSON format. Returns NULL if a NULL exists in expression that does not result in a match |
q | CONVERT(type, expression [ , style ]) | Equivalent to CAST(expression AS type) ; ignores the style operand |
p | CONVERT_TIMEZONE(tz1, tz2, datetime) | Converts the timezone of datetime from tz1 to tz2 |
* | COSH(numeric) | Returns the hyperbolic cosine of numeric |
* | COTH(numeric) | Returns the hyperbolic cotangent of numeric |
* | CSC(numeric) | Returns the cosecant of numeric in radians |
* | CSCH(numeric) | Returns the hyperbolic cosecant of numeric |
b | CURRENT_DATETIME([ timeZone ]) | Returns the current time as a TIMESTAMP from timezone |
m | DAYNAME(datetime) | Returns the name, in the connection’s locale, of the weekday in datetime; for example, it returns ‘星期日’ for both DATE ‘2020-02-10’ and TIMESTAMP ‘2020-02-10 10:10:10’ |
b | DATE(timestamp) | Extracts the DATE from a timestamp |
b | DATE(timestampLtz) | Extracts the DATE from timestampLtz (an instant; BigQuery’s TIMESTAMP type), assuming UTC |
b | DATE(timestampLtz, timeZone) | Extracts the DATE from timestampLtz (an instant; BigQuery’s TIMESTAMP type) in timeZone |
b | DATE(string) | Equivalent to CAST(string AS DATE) |
b | DATE(year, month, day) | Returns a DATE value for year, month, and day (all of type INTEGER) |
p q | DATEADD(timeUnit, integer, datetime) | Equivalent to TIMESTAMPADD(timeUnit, integer, datetime) |
p q | DATEDIFF(timeUnit, datetime, datetime2) | Equivalent to TIMESTAMPDIFF(timeUnit, datetime, datetime2) |
q | DATEPART(timeUnit, datetime) | Equivalent to EXTRACT(timeUnit FROM datetime) |
b | DATETIME(date, time) | Converts date and time to a TIMESTAMP |
b | DATETIME(date) | Converts date to a TIMESTAMP value (at midnight) |
b | DATETIME(date, timeZone) | Converts date to a TIMESTAMP value (at midnight), in timeZone |
b | DATETIME(year, month, day, hour, minute, second) | Creates a TIMESTAMP for year, month, day, hour, minute, second (all of type INTEGER) |
b | DATETIME_ADD(timestamp, interval) | Returns the TIMESTAMP value that occurs interval after timestamp |
b | DATETIME_DIFF(timestamp, timestamp2, timeUnit) | Returns the whole number of timeUnit between timestamp and timestamp2 |
b | DATETIME_SUB(timestamp, interval) | Returns the TIMESTAMP that occurs interval before timestamp |
b | DATETIME_TRUNC(timestamp, timeUnit) | Truncates timestamp to the granularity of timeUnit, rounding to the beginning of the unit |
b | DATE_FROM_UNIX_DATE(integer) | Returns the DATE that is integer days after 1970-01-01 |
p | DATE_PART(timeUnit, datetime) | Equivalent to EXTRACT(timeUnit FROM datetime) |
b | DATE_ADD(date, interval) | Returns the DATE value that occurs interval after date |
b | DATE_DIFF(date, date2, timeUnit) | Returns the whole number of timeUnit between date and date2 |
b | DATE_SUB(date, interval) | Returns the DATE value that occurs interval before date |
b | DATE_TRUNC(date, timeUnit) | Truncates date to the granularity of timeUnit, rounding to the beginning of the unit |
o | DECODE(value, value1, result1 [, valueN, resultN ]* [, default ]) | Compares value to each valueN value one by one; if value is equal to a valueN, returns the corresponding resultN, else returns default, or NULL if default is not specified |
p | DIFFERENCE(string, string) | 返回两个字符串相似度的度量,即它们的值具有共同的字符位置的数量SOUNDEX :如果值相同则为 4 ,如果值完全不同则为SOUNDEX 0SOUNDEX |
乙 | ENDS_WITH(字符串1, 字符串2) | 返回string2是否是**string1的后缀 |
哦 | EXTRACT(xml, xpath, [, 命名空间]) | 返回与 XPath 表达式匹配的一个或多个元素的 XML 片段。可选的命名空间值,指定前缀的默认映射或命名空间映射,在计算 XPath 表达式时使用 |
哦 | EXISTSNODE(xml, xpath, [, 命名空间]) | 确定使用指定的 xpath 遍历 XML 文档是否会产生任何节点。如果对 XPath 表达式匹配的一个或多个元素的文档片段应用 XPath 遍历后没有剩余节点,则返回 0。如果还有任何节点,则返回 1。可选的命名空间值,指定前缀的默认映射或命名空间映射,在计算 XPath 表达式时使用。 |
米 | EXTRACTVALUE(xml, xpathExpr)) | 返回第一个文本节点的文本,该文本节点是与 XPath 表达式匹配的一个或多个元素的子级。 |
HS | 阶乘(整数) | 返回整数的阶乘,整数范围为 [0, 20]。否则,返回 NULL |
HS | FIND_IN_SET(matchStr, textStr) | 返回逗号分隔的textStr中给定**matchStr的索引(从 1 开始)。如果未找到给定的matchStr或**matchStr包含逗号,则返回 0 。例如,FIND_IN_SET(‘bc’, ‘a,bc,def’) 返回 2 |
乙 | 楼层(值) | 与标准类似,FLOOR(value) 但如果value是整数类型,则返回类型是 double |
乙 | FORMAT_DATE(字符串,日期) | 根据指定的格式字符串格式化日期 |
乙 | FORMAT_DATETIME(字符串,时间戳) | 根据指定的格式字符串格式化时间戳 |
HS | FORMAT_NUMBER(值,十进制值) | 将数字值格式化为“#,###,###.##”,四舍五入到小数位DecimalVal。如果decimalVal为0,则结果没有小数点或小数部分 |
HS | FORMAT_NUMBER(值,格式) | 将数字值格式化为 MySQL 的 FORMAT格式,例如 ‘#,###,###.##0.00’ |
乙 | FORMAT_TIME(字符串,时间) | 根据指定的格式字符串格式化时间 |
乙 | FORMAT_TIMESTAMP(字符串时间戳) | 根据指定的格式字符串格式化时间戳 |
s | GETBIT(值,位置) | 相当于BIT_GET(value, position) |
博 | GREATEST(expr [, expr ]*) | 返回最大的表达式 |
黑社会 | IF(条件, 值1, 值2) | 如果条件为 TRUE,则返回value1 ,否则返回value2 |
乙 | IFNULL(值1,值2) | 相当于NVL(value1, value2) |
p | string1 ILIKE string2 [ ESCAPE string3 ] | string1是否匹配模式string2,忽略大小写(类似于LIKE ) |
p | string1 不喜欢 string2 [ ESCAPE string3 ] | string1是否与模式string2不匹配,忽略大小写(类似于NOT LIKE ) |
博 | INSTR(字符串, 子字符串 [, 来自 [, 出现次数 ] ]) | 返回子字符串在string中的位置,从 from (默认 1)开始搜索,直到找到子字符串第 n 次**出现(默认 1) |
米 | INSTR(字符串,子字符串) | 相当于POSITION(substring IN string) |
乙 | IS_INF(值) | 返回值是否为无穷大 |
乙 | IS_NAN(值) | 返回值是否为 NaN |
米 | JSON_TYPE(json值) | 返回一个字符串值,表示jsonValue的类型 |
米 | JSON_DEPTH(json值) | 返回一个整数值,表示jsonValue的深度 |
米 | JSON_PRETTY(jsonValue) | 返回jsonValue的漂亮打印 |
米 | JSON_LENGTH(jsonValue [, 路径]) | 返回一个整数,表示jsonValue的长度 |
米 | JSON_INSERT(jsonValue, 路径, val [, 路径, val ]*) | 返回一个 JSON 文档,插入jsonValue、path、val的数据 |
米 | JSON_KEYS(jsonValue [, 路径]) | 返回一个字符串,指示 JSON jsonValue的键 |
米 | JSON_REMOVE(jsonValue, 路径 [, 路径 ]) | 使用一系列路径表达式从**jsonValue中删除数据并返回结果 |
米 | JSON_REPLACE(jsonValue, 路径, val [, 路径, val ]*) | 返回一个 JSON 文档,替换jsonValue、path、val的数据 |
米 | JSON_SET(jsonValue, 路径, val [, 路径, val ]*) | 返回一个 JSON 文档,设置jsonValue、path、val的数据 |
米 | JSON_STORAGE_SIZE(json值) | 返回用于存储jsonValue的二进制表示形式的字节数 |
博 | LEAST(expr [, expr ]* ) | 返回最少的表达式 |
图像格式 | 左(字符串,长度) | 返回字符串中最左边的长度字符 |
乙 | 长度(字符串) | 相当于CHAR_LENGTH(string) |
HS | LEVENSHTEIN(字符串1, 字符串2) | 返回string1和string2之间的编辑距离 |
乙 | LOG(数字1 [, 数字2 ]) | 返回numeric1以**numeric2为底的对数,如果numeric2不存在,则返回以 e 为底的对数 |
博 | LPAD(字符串, 长度[, 模式]) | 返回一个字符串或字节值,该值由前面带有模式的长度**字符串组成 |
乙 | TO_BASE32(字符串) | 将字符串转换为 Base-32 编码形式并返回编码字符串 |
乙 | FROM_BASE32(字符串) | 以字符串形式返回 base-32字符串的解码结果 |
米 | TO_BASE64(字符串) | 将字符串转换为 base-64 编码形式并返回编码字符串 |
BM | FROM_BASE64(字符串) | 将 Base-64字符串的解码结果作为字符串返回 |
乙 | TO_HEX(二进制) | 将二进制转换为十六进制 varchar |
乙 | FROM_HEX(varchar) | 将十六进制编码的varchar转换为字节 |
博 | LTRIM(字符串) | 返回从开头删除所有空格的字符串 |
s | 地图() | 返回一个空地图 |
s | MAP(键,值[,键,值]*) | 返回具有给定键/值对的映射 |
s | MAP_CONCAT(地图[,地图]*) | 连接一个或多个地图。如果任何输入参数是NULL 该函数返回NULL 。请注意,方解石使用的是 LAST_WIN 策略 |
s | MAP_ENTRIES(地图) | 以数组形式返回映射的条目,条目的顺序未定义 |
s | MAP_KEYS(地图) | 以数组形式返回映射的键,条目的顺序未定义 |
s | MAP_VALUES(地图) | 以数组形式返回映射的值,条目的顺序未定义 |
s | MAP_FROM_ARRAYS(数组1,数组2) | 返回从array1和array2创建的映射。请注意,两个数组的长度应该相同,方解石使用 LAST_WIN 策略 |
s | MAP_FROM_ENTRIES(行数组) | 返回从具有两个字段的行数组创建的映射。注意一行中的字段数必须为2。注意方解石使用的是LAST_WIN策略 |
s | STR_TO_MAP(字符串[, stringDelimiter [, keyValueDelimiter]]) | 使用分隔符将字符串拆分为键/值对后返回映射。stringDelimiter 的默认分隔符为“,” , keyValueDelimiter的默认**分隔符为“:” 。请注意,方解石使用的是 LAST_WIN 策略 |
图像格式 | MD5(字符串) | 计算字符串的 MD5 128 位校验和并将其作为十六进制字符串返回 |
米 | 月份名称(日期) | 返回datetime中月份的名称(在连接的区域设置中);例如,它对于 DATE ‘2020-02-10’ 和 TIMESTAMP ‘2020-02-10 10:10:10’ 都返回 ‘二月’ |
哦 | NVL(值1,值2) | 如果value1不为 null,则返回value1 ,否则返回**value2 |
乙 | 偏移量(索引) | 对数组进行索引时,将index包裹起来会返回从 0 开始的索引处OFFSET 的值;如果索引越界则抛出错误 |
乙 | 序数(索引) | 类似于OFFSET 除了索引从 1 开始 |
乙 | PARSE_DATE(格式,字符串) | 使用 format 指定的格式将日期的**字符串表示形式转换为 DATE 值 |
乙 | PARSE_DATETIME(格式,字符串) | 使用 format 指定的格式将日期时间的**字符串表示形式转换为 TIMESTAMP 值 |
乙 | PARSE_TIME(格式,字符串) | 使用 format 指定的格式将时间的**字符串表示形式转换为 TIME 值 |
乙 | PARSE_TIMESTAMP(格式,字符串[,时区]) | 使用 format 指定的格式将时间戳的**字符串表示形式转换为timeZone中的 TIMESTAMP WITH LOCAL TIME ZONE 值 |
HS | PARSE_URL(urlString,partToExtract [,keyToExtract]) | 从urlString返回指定的partToExtract。partToExtract的有效值包括 HOST、PATH、QUERY、REF、PROTOCOL、AUTHORITY、FILE 和 USERINFO。keyToExtract指定要提取的查询 |
乙 | POW(数字1,数字2) | 返回numeric1的**numeric2次方 |
乙 | REGEXP_CONTAINS(字符串,正则表达式) | 返回字符串是否与正则**表达式部分匹配 |
乙 | REGEXP_EXTRACT(字符串, 正则表达式 [, 位置 [, 出现次数]]) | 返回string中与**regexp匹配的子字符串,从位置(默认 1)开始搜索,直到找到第 n 个匹配项(默认 1)。如果没有匹配则返回NULL |
乙 | REGEXP_EXTRACT_ALL(字符串,正则表达式) | 返回string中与**regexp匹配的所有子字符串的数组。如果没有匹配则返回空数组 |
乙 | REGEXP_INSTR(字符串, 正则表达式 [, 位置 [, 出现次数 [, 出现位置]]]) | 返回string中与**regexp匹配的子字符串的从 1 开始的最低位置,从位置(默认 1)开始搜索,直到找到第 n 个匹配项(默认 1)。将occurrence_position(默认0)设置为1返回子字符串的结束位置+1。如果没有匹配则返回0 |
蒙特利尔银行 | REGEXP_REPLACE(字符串,正则表达式,代表[,位置[,出现次数[,匹配类型]]]) | 将与regexp匹配的字符串的所有子字符串替换为 expr 中起始位置处的**rep(如果省略,默认为 1),occurrence指定要搜索的匹配项的出现次数(如果省略,默认为 1),matchType指定如何执行匹配 |
乙 | REGEXP_SUBSTR(string, regexp [, position [, occurrence]]) | Synonym for REGEXP_EXTRACT |
b m p | REPEAT(string, integer) | Returns a string consisting of string repeated of integer times; returns an empty string if integer is less than 1 |
b m | REVERSE(string) | Returns string with the order of the characters reversed |
b m p | RIGHT(string, length) | Returns the rightmost length characters from the string |
h s | string1 RLIKE string2 | Whether string1 matches regex pattern string2 (similar to LIKE , but uses Java regex) |
h s | string1 NOT RLIKE string2 | Whether string1 does not match regex pattern string2 (similar to NOT LIKE , but uses Java regex) |
b o | RPAD(string, length[, pattern ]) | Returns a string or bytes value that consists of string appended to length with pattern |
b o | RTRIM(string) | Returns string with all blanks removed from the end |
b | SAFE_ADD(numeric1, numeric2) | Returns numeric1 + numeric2, or NULL on overflow |
b | SAFE_CAST(value AS type) | Converts value to type, returning NULL if conversion fails |
b | SAFE_DIVIDE(numeric1, numeric2) | Returns numeric1 / numeric2, or NULL on overflow or if numeric2 is zero |
b | SAFE_MULTIPLY(numeric1, numeric2) | Returns numeric1 * numeric2, or NULL on overflow |
b | SAFE_NEGATE(numeric) | Returns numeric * -1, or NULL on overflow |
b | SAFE_OFFSET(index) | Similar to OFFSET except null is returned if index is out of bounds |
b | SAFE_ORDINAL(index) | Similar to OFFSET except index begins at 1 and null is returned if index is out of bounds |
b | SAFE_SUBTRACT(numeric1, numeric2) | Returns numeric1 - numeric2, or NULL on overflow |
* | SEC(numeric) | Returns the secant of numeric in radians |
* | SECH(numeric) | Returns the hyperbolic secant of numeric |
b m p | SHA1(string) | Calculates a SHA-1 hash value of string and returns it as a hex string |
b p | SHA256(string) | Calculates a SHA-256 hash value of string and returns it as a hex string |
b p | SHA512(string) | Calculates a SHA-512 hash value of string and returns it as a hex string |
* | SINH(numeric) | Returns the hyperbolic sine of numeric |
b m o p | SOUNDEX(string) | Returns the phonetic representation of string; throws if string is encoded with multi-byte encoding such as UTF-8 |
s | SOUNDEX(string) | Returns the phonetic representation of string; return original string if string is encoded with multi-byte encoding such as UTF-8 |
m | SPACE(integer) | Returns a string of integer spaces; returns an empty string if integer is less than 1 |
b | SPLIT(string [, delimiter ]) | Returns the string array of string split at delimiter (if omitted, default is comma) |
b | STARTS_WITH(string1, string2) | Returns whether string2 is a prefix of string1 |
m | STRCMP(string, string) | Returns 0 if both of the strings are same and returns -1 when the first argument is smaller than the second and 1 when the second one is smaller than the first one |
b p | STRPOS(string, substring) | Equivalent to POSITION(substring IN string) |
b m o p | SUBSTR(string, position [, substringLength ]) | Returns a portion of string, beginning at character position, substringLength characters long. SUBSTR calculates lengths using characters as defined by the input character set |
* | TANH(numeric) | Returns the hyperbolic tangent of numeric |
b | TIME(hour, minute, second) | Returns a TIME value hour, minute, second (all of type INTEGER) |
b | TIME(timestamp) | Extracts the TIME from timestamp (a local time; BigQuery’s DATETIME type) |
b | TIME(instant) | Extracts the TIME from timestampLtz (an instant; BigQuery’s TIMESTAMP type), assuming UTC |
b | TIME(instant, timeZone) | Extracts the time from timestampLtz (an instant; BigQuery’s TIMESTAMP type), in timeZone |
b | TIMESTAMP(string) | Equivalent to CAST(string AS TIMESTAMP WITH LOCAL TIME ZONE) |
b | TIMESTAMP(string, timeZone) | Equivalent to CAST(string AS TIMESTAMP WITH LOCAL TIME ZONE) , converted to timeZone |
b | TIMESTAMP(date) | Converts date to a TIMESTAMP WITH LOCAL TIME ZONE value (at midnight) |
b | TIMESTAMP(date, timeZone) | Converts date to a TIMESTAMP WITH LOCAL TIME ZONE value (at midnight), in timeZone |
b | TIMESTAMP(timestamp) | Converts timestamp to a TIMESTAMP WITH LOCAL TIME ZONE, assuming a UTC |
b | TIMESTAMP(timestamp, timeZone) | Converts timestamp to a TIMESTAMP WITH LOCAL TIME ZONE, in timeZone |
b | TIMESTAMP_ADD(timestamp, interval) | Returns the TIMESTAMP value that occurs interval after timestamp |
b | TIMESTAMP_DIFF(timestamp, timestamp2, timeUnit) | Returns the whole number of timeUnit between timestamp and timestamp2. Equivalent to TIMESTAMPDIFF(timeUnit, timestamp2, timestamp) and (timestamp - timestamp2) timeUnit |
b | TIMESTAMP_MICROS(integer) | Returns the TIMESTAMP that is integer microseconds after 1970-01-01 00:00:00 |
b | TIMESTAMP_MILLIS(integer) | Returns the TIMESTAMP that is integer milliseconds after 1970-01-01 00:00:00 |
b | TIMESTAMP_SECONDS(integer) | Returns the TIMESTAMP that is integer seconds after 1970-01-01 00:00:00 |
b | TIMESTAMP_SUB(timestamp, interval) | Returns the TIMESTAMP value that is interval before timestamp |
b | TIMESTAMP_TRUNC(timestamp, timeUnit) | Truncates timestamp to the granularity of timeUnit, rounding to the beginning of the unit |
b | TIME_ADD(time, interval) | Adds interval to time, independent of any time zone |
b | TIME_DIFF(time, time2, timeUnit) | Returns the whole number of timeUnit between time and time2 |
b | TIME_SUB(time, interval) | Returns the TIME value that is interval before time |
b | TIME_TRUNC(time, timeUnit) | Truncates time to the granularity of timeUnit, rounding to the beginning of the unit |
m o p | TO_CHAR(timestamp, format) | Converts timestamp to a string using the format format |
b | TO_CODE_POINTS(string) | Converts string to an array of integers that represent code points or extended ASCII character values |
o p | TO_DATE(string, format) | Converts string to a date using the format format |
o p | TO_TIMESTAMP(string, format) | Converts string to a timestamp using the format format |
b o p | TRANSLATE(expr, fromString, toString) | Returns expr with all occurrences of each character in fromString replaced by its corresponding character in toString. Characters in expr that are not in fromString are not replaced |
b | TRUNC(numeric1 [, numeric2 ]) | Truncates numeric1 to optionally numeric2 (if not specified 0) places right to the decimal point |
q | TRY_CAST(value AS type) | Converts value to type, returning NULL if conversion fails |
b | UNIX_MICROS(timestamp) | Returns the number of microseconds since 1970-01-01 00:00:00 |
b | UNIX_MILLIS(timestamp) | Returns the number of milliseconds since 1970-01-01 00:00:00 |
b | UNIX_SECONDS(timestamp) | Returns the number of seconds since 1970-01-01 00:00:00 |
b | UNIX_DATE(date) | Returns the number of days since 1970-01-01 |
s | URL_DECODE(string) | 使用特定的编码方案解码“application/x-www-form-urlencoded”格式的字符串*,解码错误时返回原始字符串* |
s | URL_ENCODE(字符串) | 使用特定的编码方案将字符串转换为“application/x-www-form-urlencoded”格式 |
哦 | XMLTRANSFORM(xml,xslt) | 将 XSLT 转换xslt转换为 XML 字符串xml并返回结果 |
笔记:
Calcite 没有 Redshift 库,因此使用 Postgres 库。这些函数
DATEADD
是DATEDIFF
在 Redshift 而不是 Postgres 中实现的,但仍然出现在 Calcite 的 Postgres 库中函数
DATEADD
,DATEDIFF
,DATE_PART
需要 Babel 解析器JSON_TYPE
///如果参数为null则JSON_DEPTH
返回JSON_PRETTY
nullJSON_STORAGE_SIZE
JSON_LENGTH
//如果第一个参数为null则返回JSON_KEYS
nullJSON_REMOVE
JSON_TYPE
通常返回一个大写字符串标志,指示 JSON 输入的类型。目前支持的类型标志有:- 整数
- 细绳
- 漂浮
- 双倍的
- 长的
- 布尔值
- 日期
- 目的
- 大批
- 无效的
JSON_DEPTH
定义 JSON 值的深度如下:- 空数组、空对象或标量值的深度为 1;
- 仅包含深度为 1 的元素的非空数组或仅包含深度为 1 的成员值的非空对象的深度为 2;
- 否则,JSON 文档的深度大于 2。
JSON_LENGTH
定义 JSON 值的长度如下:- 标量值的长度为 1;
- 数组或对象的长度是包含的元素数量。
方言特定的聚合函数。
C | 运算符语法 | 描述 |
---|---|---|
C | 骨料(m) | 在当前 GROUP BY 键的上下文中计算度量m |
BP | ARRAY_AGG( [ ALL | DISTINCT ] 值 [ 尊重 NULLS | 忽略 NULLS ] [ ORDER BY orderItem [, orderItem ]* ] ) | 将值收集到数组中 |
BP | ARRAY_CONCAT_AGG([ ALL | DISTINCT ] 值 [ ORDER BY orderItem [, orderItem ]* ] ) | 将数组连接成数组 |
p | BOOL_AND(条件) | 同义词EVERY |
p | BOOL_OR(条件) | 同义词SOME |
乙 | COUNTIF(条件) | 返回条件为 TRUE 的行数;相当于COUNT(*) FILTER (WHERE condition) |
米 | GROUP_CONCAT( [ ALL | DISTINCT ] value [, value ]* [ ORDER BY orderItem [, orderItem ]* ] [ SEPARATOR 分隔符 ] ) | MySQL 特定的变体LISTAGG |
乙 | 逻辑与(条件) | 同义词EVERY |
乙 | LOGICAL_OR(条件) | 同义词SOME |
s | MAX_BY(值,补偿) | 同义词ARG_MAX |
s | MIN_BY(值,补偿) | 同义词ARG_MIN |
乙 | PERCENTILE_CONT(值, 分数 [ RESPECT NULLS | IGNORE NULLS ] ) OVER windowSpec | 标准的同义词PERCENTILE_CONT ,PERCENTILE_CONT(value, fraction) OVER (ORDER BY value) 相当于标准PERCENTILE_CONT(fraction) WITHIN GROUP (ORDER BY value) |
乙 | PERCENTILE_DISC(值, 分数 [ RESPECT NULLS | IGNORE NULLS ] ) OVER windowSpec | 标准的同义词PERCENTILE_DISC ,PERCENTILE_DISC(value, fraction) OVER (ORDER BY value) 相当于标准PERCENTILE_DISC(fraction) WITHIN GROUP (ORDER BY value) |
BP | STRING_AGG( [ ALL | DISTINCT ] 值 [, 分隔符] [ ORDER BY orderItem [, orderItem ]* ] ) | 同义词LISTAGG |
用法示例:
JSON_TYPE 示例
SQL
1 | SELECT JSON_TYPE(v) AS c1, |
结果
C1 | C2 | C3 | C4 |
---|---|---|---|
目的 | 大批 | 整数 | 布尔值 |
JSON_DEPTH 示例
SQL
1 | SELECT JSON_DEPTH(v) AS c1, |
结果
C1 | C2 | C3 | C4 |
---|---|---|---|
3 | 2 | 1 | 1 |
JSON_LENGTH 示例
SQL
1 | SELECT JSON_LENGTH(v) AS c1, |
结果
C1 | C2 | C3 | C4 |
---|---|---|---|
1 | 2 | 1 | 1 |
JSON_INSERT 示例
SQL
1 | SELECT JSON_INSERT(v, '$.a', 10, '$.c', '[1]') AS c1, |
结果
C1 | C2 |
---|---|
{“a”:1,“b”:[2],“c”:“[1]”} | {“a”:1,“b”:[2],“c”:“[1]”} |
JSON_KEYS 示例
SQL
1 | SELECT JSON_KEYS(v) AS c1, |
结果
C1 | C2 | C3 | C4 | C5 |
---|---|---|---|---|
[“a”,“b”] | 无效的 | [“C”] | 无效的 | 无效的 |
JSON_REMOVE 示例
SQL
1 | SELECT JSON_REMOVE(v, '$[1]') AS c1 |
结果
C1 |
---|
[“广告”] |
JSON_REPLACE 示例
SQL
1 | SELECT |
结果
C1 | C2 |
---|---|
{“a”:1,“b”:[2],“c”:“[1]”} | {“a”:1 , “b”:[2] , “c”:“[1]”}”) |
JSON_SET 示例
SQL
1 | SELECT |
结果
C1 | C2 |
---|---|
{“a”:10,“b”:[2]} | 10 |
JSON_STORAGE_SIZE 示例
SQL
1 | SELECT |
结果
C1 | C2 | C3 | C4 |
---|---|---|---|
29 | 35 | 37 | 36 |
解码示例
SQL
1 | SELECT DECODE(f1, 1, 'aa', 2, 'bb', 3, 'cc', 4, 'dd', 'ee') as c1, |
结果
C1 | C2 | C3 | C4 | C5 |
---|---|---|---|---|
啊 | BB | 抄送 | DD | 伊 |
翻译示例
SQL
1 | SELECT TRANSLATE('Aa*Bb*Cc''D*d', ' */''%', '_') as c1, |
结果
C1 | C2 | C3 | C4 |
---|---|---|---|
Aa_Bb_CcD_d | Aa_Bb_CcD_d | Aa_Bb_CcD_d | Aa_Bb_CcD_d |
用户定义函数
方解石是可延伸的。您可以使用用户代码定义每种函数。对于每种函数,通常有多种定义函数的方法,从方便到高效。
要实现标量函数,有 3 个选项:
- 创建一个具有公共静态方法的类
eval
,并注册该类; - 创建一个具有公共非静态
eval
方法和不带参数的公共构造函数的类,并注册该类; - 创建一个具有一个或多个公共静态方法的类,并注册每个类/方法组合。
要实现聚合函数,有两种选择:
- 创建一个具有 public static 和方法的类
init
,add
并result
注册该类; - 创建一个具有公共非静态 和 方法的类
init
,以及一个不带参数的公共构造函数,并注册该类。
merge
(可选)向类添加公共方法;这允许 Calcite 生成合并小计的代码。
(可选)让您的类实现 SqlSplittableAggFunction 接口;这使得 Calcite 可以跨多个聚合阶段分解函数,从汇总表中汇总,并通过连接推送它。
要实现表函数,有 3 个选项:
eval
创建一个具有返回 ScannableTable 或 QueryableTable的静态方法的类,并注册该类;eval
创建一个具有返回 ScannableTable 或 QueryableTable的非静态方法的类,并注册该类;- 创建一个具有一个或多个返回 ScannableTable 或 QueryableTable的公共静态方法的类,并注册每个类/方法组合。
要实现表宏,有 3 个选项:
eval
创建一个具有返回 TranslatableTable 的静态方法的类,并注册该类;eval
创建一个具有返回 TranslatableTable 的非静态方法的类,并注册该类;- 创建一个具有一个或多个返回 TranslatableTable 的公共静态方法的类,并注册每个类/方法组合。
Calcite 从实现函数的 Java 方法的参数和返回类型推导出函数的参数类型和结果类型。此外,您可以使用参数注释指定每个参数的名称和可选性 。
使用命名参数和可选参数调用函数
通常,当您调用函数时,需要按顺序指定其所有参数。但如果函数有很多参数,特别是如果您想随着时间的推移添加更多参数,这可能会成为问题。
为了解决这个问题,SQL 标准允许您按名称传递参数,并定义可选参数(即,具有默认值,如果未指定则使用默认值)。
假设您有一个 function f
,声明如下伪语法:
1 | FUNCTION f( |
该函数的所有参数都有名称和parameters,并且 有默认值b
,因此都是可选的。(在方解石中,是可选参数唯一允许的默认值; 将来可能会改变。)
当调用带有可选参数的函数时,可以省略列表末尾的可选参数,或者DEFAULT
对任何可选参数使用关键字。这里有些例子:
f(1, 2, 3, 4, 5)
按顺序为每个参数提供一个值;f(1, 2, 3, 4)
省略e
,获取其默认值NULL
;f(1, DEFAULT, 3)
省略d
和e
,并指定使用默认值b
;f(1, DEFAULT, 3, DEFAULT, DEFAULT)
和前面的例子效果一样;f(1, 2)
不合法,因为c
不是可选的;f(1, 2, DEFAULT, 4)
不合法,因为c
不是可选的。
您可以使用语法按名称指定参数=>
。如果一个参数被命名,那么它们都必须被命名。参数可以是任何其他参数,但不得多次指定任何参数,并且您需要为每个参数提供一个不可选的值。这里有些例子:
f(c => 3, d => 1, a => 0)
相当于f(0, NULL, 3, 1, NULL)
;f(c => 3, d => 1)
不合法,因为您尚未指定 的值a
并且a
不是可选的。
SQL Hint
提示是给优化器的指令。在编写SQL时,您可能会知道优化器未知的数据信息。提示使您能够做出通常由优化器做出的决策。
- 规划器执行者:没有完美的规划器,因此实现提示以允许用户更好地控制执行是有意义的。例如:“永远不要将此子查询与其他子查询合并”(
/*+ no_merge */
);“将这些表视为前导表”(/*+ leading */
) 以影响连接顺序等; - 附加元数据/统计信息:一些统计信息,例如“扫描的表索引”或“某些洗牌键的倾斜信息”对于查询来说是动态的,用提示配置它们会非常方便,因为我们来自规划器的规划元数据非常方便通常不太准确;
- 算子资源限制:在很多情况下,我们会给执行算子一个默认的资源配置,即最小并行度、内存(资源消耗 UDF)、特殊资源要求(GPU 或 SSD 磁盘)……对资源进行分析会非常灵活每个查询都有提示(不是作业)。
句法
方解石支持两个位置的提示:
- 查询提示:关键字后
SELECT
; - 表提示:位于引用的表名称之后。
例如:
1 | SELECT /*+ hint1, hint2(a=1, b=2) */ |
语法如下:
1 | hintComment: |
它在方解石中处于实验阶段,尚未完全实施,我们已实施的是:
- 解析器支持上述语法;
RelHint
代表一个提示项;- 在 sql-to-rel 转换和规划器规划期间传播提示的机制。
我们还没有添加任何内置提示项,如果我们认为提示足够稳定,我们会引入更多。
MATCH_识别
MATCH_RECOGNIZE
是一个 SQL 扩展,用于识别复杂事件处理 (CEP) 中的事件序列。
它在方解石中处于实验阶段,但尚未完全实施。
句法
1 | matchRecognize: |
在patternQuantifier中,repeat是正整数,minRepeat和maxRepeat是非负整数。
DDL 扩展
DDL 扩展仅在 calcite-server 模块中可用。要启用,请包含calcite-server.jar
在类路径中,并添加 parserFactory=org.apache.calcite.sql.parser.ddl.SqlDdlParserImpl#FACTORY
到 JDBC 连接字符串(请参阅连接字符串属性 parserFactory)。
1 | ddlStatement: |
在createTableStatement中,如果指定AS query ,则可以省略**tableElement列表 ,或者可以省略任何tableElement的数据类型,在这种情况下,它只是重命名基础列。
在columnGenerator中,如果未指定VIRTUAL
或STORED
对于生成的列,VIRTUAL
则为默认值。
在createFunctionStatement和usingFile中,classNameLiteral 和filePathLiteral是字符文字。
声明用户定义类型的对象
在架构中定义并安装对象类型后,您可以使用它在任何 SQL 块中声明对象。例如,您可以使用对象类型来指定属性、列、变量、绑定变量、记录字段、表元素、形式参数或函数结果的数据类型。在运行时,创建对象类型的实例;也就是说,该类型的对象被实例化。每个对象可以保存不同的值。
例如,我们可以声明类型address_typ
和employee_typ
:
1 | CREATE TYPE address_typ AS ( |
使用这些类型,您可以实例化对象,如下所示:
1 | employee_typ(315, 'Francis', 'Logan', 'FLOGAN', |
写在最后
笔者因为工作原因接触到 Calcite,前期学习过程中,深感 Calcite 学习资料之匮乏,因此创建了 Calcite 从入门到精通知识星球,希望能够将学习过程中的资料和经验沉淀下来,为更多想要学习 Calcite 的朋友提供一些帮助。