Pig adapter

Original link: https://calcite.apache.org/docs/pig_adapter.html

Overview

The Pig adapter lets you write queries in SQL and execute them using Apache Pig.

A simple example

Let's start with a simple example. First, we need a model definition, as follows.

{
  "version": "1.0",
  "defaultSchema": "PIG",
  "schemas": [ {
    "name": "PIG",
    "type": "custom",
    "factory": "org.apache.calcite.adapter.pig.PigSchemaFactory",
    "tables": [ {
      "name": "t",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.pig.PigTableFactory",
      "operand": {
        "file": "data.txt",
        "columns": ["tc0", "tc1"]
      }
    }, {
      "name": "s",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.pig.PigTableFactory",
      "operand": {
        "file": "data2.txt",
        "columns": ["sc0", "sc1"]
      }
    } ]
  } ]
}

Now, if you write the SQL query

select *
from "t"
join "s" on "tc1" = "sc0"

the Pig adapter generates the Pig Latin script

t = LOAD 'data.txt' USING PigStorage() AS (tc0:chararray, tc1:chararray);
s = LOAD 'data2.txt' USING PigStorage() AS (sc0:chararray, sc1:chararray);
t = JOIN t BY tc1, s BY sc0;

and then executes it using the Pig runtime, typically MapReduce on Apache Hadoop.
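
To make the flow above concrete, here is a minimal sketch of how the query might be submitted through Calcite's JDBC driver. It rests on several assumptions that are not part of the original text: the model definition is saved as model.json in the working directory, the Calcite core and Pig adapter jars (plus a Pig runtime) are on the classpath, and the class name PigAdapterExample and its helper methods are hypothetical names chosen for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical example class; not part of Calcite.
class PigAdapterExample {

  /** Builds a Calcite JDBC URL that points at a model file. */
  static String calciteUrl(String modelPath) {
    return "jdbc:calcite:model=" + modelPath;
  }

  /** Returns the join query from the example. */
  static String joinQuery() {
    return "select *\n"
        + "from \"t\"\n"
        + "join \"s\" on \"tc1\" = \"sc0\"";
  }

  public static void main(String[] args) throws SQLException {
    // Assumption: the model file path is passed as the first argument,
    // falling back to model.json in the working directory.
    String modelPath = args.length > 0 ? args[0] : "model.json";

    try (Connection connection = DriverManager.getConnection(calciteUrl(modelPath));
         Statement statement = connection.createStatement();
         ResultSet rs = statement.executeQuery(joinQuery())) {
      ResultSetMetaData metaData = rs.getMetaData();
      int columnCount = metaData.getColumnCount();
      // Print each joined row as tab-separated name=value pairs.
      while (rs.next()) {
        StringBuilder row = new StringBuilder();
        for (int i = 1; i <= columnCount; i++) {
          if (i > 1) {
            row.append('\t');
          }
          row.append(metaData.getColumnLabel(i)).append('=').append(rs.getString(i));
        }
        System.out.println(row);
      }
    }
  }
}
```

The code only uses the standard java.sql API; the translation to Pig Latin and the execution on the Pig runtime happen inside the adapter once the connection is opened with the model. Whether the query runs locally or on a Hadoop cluster depends on how Pig is configured in your environment.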

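Relationship to Piglet

Calcite has another component called Piglet. It lets you write queries in a subset of Pig Latin and execute them using any applicable Calcite adapter. Piglet is therefore essentially the converse of the Pig adapter.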

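Closing note

I came across Calcite through my work, and while learning it I was struck by how scarce the learning material is. I therefore created the "Calcite 从入门到精通" (Calcite: From Beginner to Expert) Knowledge Planet group, hoping to collect the material and experience gathered along the way and to offer some help to others who want to learn Calcite.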
