Abstract
Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD. Calcite’s architecture consists of a modular and extensible query optimizer
with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter architecture designed for extensibility, and support for heterogeneous data models and stores (relational, semi-structured, streaming, and geospatial). This flexible, embeddable, and extensible architecture is what makes Calcite an attractive choice for adoption in big-data frameworks. It is an active project that continues to introduce support for new types of data sources, query languages, and approaches to query processing and optimization.
Apache Calcite is a foundational framework that provides query processing, an optimizer, and extensible query languages, serving many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD.
Main components
Module | Description |
---|---|
Query parser | Handles diverse query languages |
Query optimizer | Modular and extensible, with hundreds of built-in optimization rules |
Adapters | Extensible; support heterogeneous data models (relational, semi-structured, streaming, geospatial) |
Keywords
Apache Calcite, relational semantics, data management, query algebra, modular optimizer, storage adapters
1. Introduction
Following the seminal System R, conventional relational database engines dominated the data processing landscape. Yet, as far back as 2005, Stonebraker and Çetintemel [49] predicted that we would see the rise of a collection of specialized engines such as column stores, stream processing engines, text search engines, and so forth. They argued that specialized engines can offer more cost-effective performance and that they would bring the end of the “one size fits all” paradigm. Their vision seems today more relevant than ever. Indeed, many specialized open-source data systems have since become popular such as Storm [50] and Flink [16] (stream processing),
Elasticsearch [15] (text search), Apache Spark [47], Druid [14], etc. As organizations have invested in data processing systems tailored towards their specific needs, two overarching problems have arisen:
- The developers of such specialized systems have encountered related problems, such as query optimization [4, 25] or the need to support query languages such as SQL and related extensions (e.g., streaming queries [26]) as well as language-integrated queries inspired by LINQ [33]. Without a unifying framework, having multiple engineers independently develop similar optimization logic and language support wastes engineering effort.
- Programmers using these specialized systems often have to integrate several of them together. An organization might rely on Elasticsearch, Apache Spark, and Druid. We need to build systems capable of supporting optimized queries across heterogeneous data sources [55].
Apache Calcite was developed to solve these problems. It is a complete query processing system that provides much of the common functionality—query execution, optimization, and query languages—required by any database management system, except for data storage and management, which are left to specialized engines. Calcite was quickly adopted by Hive, Drill [13], Storm, and many other data processing engines, providing them with advanced query optimizations and query languages. For example, Hive [24] is a popular data warehouse project built on top of Apache Hadoop. As Hive moved from its batch processing roots towards an interactive SQL query answering platform, it became clear that the project needed a powerful optimizer at its core. Thus, Hive adopted Calcite as its optimizer and their integration has been growing since. Many other projects and products have followed suit, including Flink, MapD [12], etc.
Furthermore, Calcite enables cross-platform optimization by exposing a common interface to multiple systems. To be efficient, the optimizer needs to reason globally, e.g., make decisions across different systems about materialized view selection.
As far back as 2005, Stonebraker and Çetintemel [49] foresaw that a collection of specialized engines, such as column stores, stream processing engines, and text search engines, would gradually emerge and end the “one size fits all” era.
As organizations built data processing systems tailored to their specific needs, two problems arose:
- The engineers developing these custom systems ran into much the same problems, such as query optimization [4, 25] and support for query languages (SQL [26]) and their extensions (streaming queries, LINQ-style language-integrated queries [33]). Without a unifying framework, many engineers independently develop similar optimization logic and language support, wasting a great deal of effort that cannot be reused.
- Developers using these custom data systems often have to integrate several of them together; a team might rely on Elasticsearch, Apache Spark, and Druid. We need the ability to support optimized queries across heterogeneous data sources.
Apache Calcite was developed to solve these problems. It is a complete query processing system that provides many core, common capabilities, such as query execution, an optimizer, and query languages. It does not provide data storage or data management; those are left for each custom system to implement.
Building a common framework does not come without challenges. In particular, the framework needs to be extensible and flexible enough to accommodate the different types of systems requiring integration. We believe that the following features have contributed to Calcite’s wide adoption in the open source community and industry:
- Open source friendliness. Many of the major data processing platforms of the last decade have been either open source or largely based on open source. Calcite is an open-source framework, backed by the Apache Software Foundation (ASF) [5], which provides the means to collaboratively develop the project. Furthermore, the software is written in Java, making it easier to interoperate with many of the latest data processing systems [12, 13, 16, 24, 28, 44] that are often written in Java (or in the JVM-based Scala), especially those in the Hadoop ecosystem.
- Multiple data models. Calcite provides support for query optimization and query languages using both streaming and conventional data processing paradigms. Calcite treats streams as time-ordered sets of records or events that are not persisted to disk as they would be in conventional data processing systems.
- Flexible query optimizer. Each component of the optimizer is pluggable and extensible, ranging from rules to cost models. In addition, Calcite includes support for multiple planning engines. Hence, the optimization can be broken down into phases handled by different optimization engines depending on which one is best suited for the stage.
- Cross-system support. The Calcite framework can run and optimize queries across multiple query processing systems and database backends.
- Reliability. Calcite is reliable, as its wide adoption over many years has led to exhaustive testing of the platform. Calcite also contains an extensive test suite validating all components of the system, including query optimizer rules and integration with backend data sources.
- Support for SQL and its extensions. Many systems do not provide their own query language, but rather prefer to rely on existing ones such as SQL. For those, Calcite provides support for ANSI standard SQL, as well as various SQL dialects and extensions, e.g., for expressing queries on streaming or nested data. In addition, Calcite includes a driver conforming to the standard Java API (JDBC).
Building a common framework is not without challenges: the framework needs to be flexible and extensible enough to accommodate the integration needs of different systems. The following features have helped Calcite gain wide adoption in the community and in production:
- Open source friendliness
- Multiple data models
- Flexible optimizer
- Cross-system support
- Reliability
- Support for SQL and its extensions
The remainder is organized as follows. Section 2 discusses related work. Section 3 introduces Calcite’s architecture and its main components. Section 4 describes the relational algebra at the core of Calcite. Section 5 presents Calcite’s adapters, an abstraction to define how to read external data sources. In turn, Section 6 describes Calcite’s optimizer and its main features, while Section 7 presents the extensions to handle different query processing paradigms. Section 8 provides an overview of the data processing systems already using Calcite. Section 9 discusses possible future extensions for the framework before we conclude in Section 10.
The remainder is organized as follows:
Section 2: discusses related work.
Section 3: introduces Calcite's architecture and its main components.
Section 4: describes the relational algebra at Calcite's core.
Section 5: presents Calcite's adapters, an abstraction that defines how to read external data sources.
Section 6: describes Calcite's optimizer and its main features.
Section 7: presents the extensions for handling different query processing paradigms.
Section 8: surveys the data processing engines already using Calcite.
Section 9: discusses possible future extensions of the framework.
Section 10: concludes.
2. Related Work
Although Calcite is now used by many big-data analytics engines in the Hadoop ecosystem, many of the techniques behind it are not novel. For example:
Its query optimization ideas come from:
- Volcano [The Volcano Optimizer Generator: Extensibility and Efficient Search]
- Cascades [The Cascades Framework for Query Optimization]
Materialized view rewriting:
- [Optimizing Queries with Materialized Views. In Proceedings of the Eleventh International Conference on Data Engineering]
- [Optimizing Queries Using Materialized Views: A Practical, Scalable Solution]
- [Implementing Data Cubes Efficiently]
Though Calcite is currently the most widely adopted optimizer for big-data analytics in the Hadoop ecosystem, many of the ideas that lie behind it are not novel. For instance, the query optimizer builds on ideas from the Volcano [20] and Cascades [19] frameworks, incorporating other widely used optimization techniques such as materialized view rewriting [10, 18, 22]. There are other systems that try to fill a similar role to Calcite.
Orca [45] is a modular query optimizer used in data management products such as Greenplum and HAWQ. Orca decouples the optimizer from the query execution engine by implementing a framework for exchanging information between the two known as Data eXchange Language. Orca also provides tools for verifying the correctness and performance of generated query plans. In contrast to Orca, Calcite can be used as a standalone query execution engine that federates multiple storage and processing backends, including pluggable planners and optimizers.
Spark SQL [3] extends Apache Spark to support SQL query execution, which can also execute queries over multiple data sources as in Calcite. However, although the Catalyst optimizer in Spark SQL also attempts to minimize query execution cost, it lacks the dynamic programming approach used by Calcite and risks falling into local minima.
Algebricks [6] is a query compiler architecture that provides a data model agnostic algebraic layer and compiler framework for big data query processing. High-level languages are compiled to Algebricks logical algebra. Algebricks then generates an optimized job targeting the Hyracks parallel processing backend. While Calcite shares a modular approach with Algebricks, Calcite also includes support for cost-based optimizations. In the current version of Calcite, the query optimizer architecture uses dynamic programming-based planning based on Volcano [20] with extensions for multi-stage optimizations as in Orca [45]. Though in principle Algebricks could support multiple processing backends (e.g., Apache Tez, Spark), Calcite has provided well-tested support for diverse backends for many years.
Garlic [7] is a heterogeneous data management system which represents data from multiple systems under a unified object model. However, Garlic does not support query optimization across different systems and relies on each system to optimize its own queries.
FORWARD [17] is a federated query processor that implements a superset of SQL called SQL++ [38]. SQL++ has a semi-structured data model that integrates both JSON and relational data models, whereas Calcite supports semi-structured data models by representing them in the relational data model during query planning.
FORWARD decomposes federated queries written in SQL++ into subqueries and executes them on the underlying databases according to the query plan. The merging of data happens inside the FORWARD engine.
Another federated data storage and processing system is BigDAWG, which abstracts a wide spectrum of data models including relational, time-series and streaming. The unit of abstraction in BigDAWG is called an island of information. Each island of information has a query language, data model and connects to one or more storage systems. Cross storage system querying is supported within the boundaries of a single island of information. Calcite instead provides a unifying relational abstraction which allows querying across backends with different data models.
Myria is a general-purpose engine for big data analytics, with advanced support for the Python language [21]. It produces query plans for other backend engines such as Spark and PostgreSQL.
3. Architecture
Calcite contains many of the pieces that comprise a typical database management system. However, it omits some key components, e.g., storage of data, algorithms to process data, and a repository for storing metadata. These omissions are deliberate: it makes Calcite an excellent choice for mediating between applications having one
or more data storage locations and using multiple data processing engines. It is also a solid foundation for building bespoke data processing systems.
Figure 1 outlines the main components of Calcite’s architecture. Calcite’s optimizer uses a tree of relational operators as its internal representation. The optimization engine primarily consists of three components: rules, metadata providers, and planner engines. We discuss these components in more detail in Section 6. In the figure, the dashed lines represent possible external interactions with the framework. There are different ways to interact with Calcite.
Figure 1 outlines the main components of Calcite's architecture. The Calcite optimizer uses a tree of relational expressions as its internal representation. The optimizer consists mainly of three parts: rules, metadata providers, and planner engines. The dashed lines represent possible external interactions with the framework; there are several different ways to interact with Calcite.
First, Calcite contains a query parser and validator that can translate a SQL query to a tree of relational operators. As Calcite does not contain a storage layer, it provides a mechanism to define table schemas and views in external storage engines via adapters (described in Section 5), so it can be used on top of these engines.
Second, although Calcite provides optimized SQL support to systems that need such database language support, it also provides optimization support to systems that already have their own language parsing and interpretation:
- Some systems support SQL queries, but without or with limited query optimization. For example, both Hive and Spark initially offered support for the SQL language, but they did not include an optimizer. For such cases, once the query has been optimized, Calcite can translate the relational expression back to SQL. This feature allows Calcite to work as a stand-alone system on top of any data management system with a SQL interface, but no optimizer.
- The Calcite architecture is not only tailored towards optimizing SQL queries. It is common that data processing systems choose to use their own parser for their own query language. Calcite can help optimize these queries as well. Indeed, Calcite also allows operator trees to be easily constructed by directly instantiating relational operators. One can use the built-in relational expressions builder interface. For instance, assume that we want to express the following Apache Pig [41] script using the expression builder. This interface exposes the main constructs necessary for building relational expressions. After the optimization phase is finished, the application can retrieve the optimized relational expression, which can then be mapped back to the system's query processing unit.
First, Calcite contains a query parser and validator that can translate a SQL query into a tree of relational expressions. Since Calcite has no storage layer of its own, it represents external storage engines through adapters.
Second, Calcite also supports frameworks that have their own SQL parser and optimizes afterwards.
For example, a SQL statement from a framework with its own SQL parser can be translated with builder code into a Calcite RelNode relational expression. Calcite can then optimize the RelNode and translate the relational expression back to SQL.
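The fluent-builder idea described above can be sketched in miniature. The real interface is Calcite's Java RelBuilder; the toy Python below (all names illustrative, not Calcite's API) builds an operator tree for a Pig-like "group employees by deptno, then count and sum sal" pipeline and interprets it:

```python
# Toy sketch of a relational-expression builder (illustrative names,
# not Calcite's RelBuilder): calls push operators onto a stack, build()
# returns the assembled operator tree, execute() interprets it.

class OpBuilder:
    def __init__(self, tables):
        self.tables = tables      # table name -> list of dict rows
        self.stack = []           # operator tree under construction

    def scan(self, name):
        self.stack.append(("scan", name))
        return self

    def aggregate(self, group_key, aggs):
        # aggs: list of (alias, fn, field), where fn is 'count' or 'sum'
        child = self.stack.pop()
        self.stack.append(("aggregate", group_key, aggs, child))
        return self

    def build(self):
        return self.stack.pop()

    def execute(self, node):
        if node[0] == "scan":
            return list(self.tables[node[1]])
        _, key, aggs, child = node
        groups = {}
        for row in self.execute(child):
            groups.setdefault(row[key], []).append(row)
        out = []
        for k, grp in groups.items():
            result = {key: k}
            for alias, fn, field in aggs:
                vals = [r[field] for r in grp]
                result[alias] = len(vals) if fn == "count" else sum(vals)
            out.append(result)
        return out

tables = {"emp": [{"deptno": 10, "sal": 100},
                  {"deptno": 10, "sal": 50},
                  {"deptno": 20, "sal": 75}]}
builder = OpBuilder(tables)
plan = (builder.scan("emp")
               .aggregate("deptno", [("c", "count", "sal"),
                                     ("s", "sum", "sal")])
               .build())
rows = builder.execute(plan)
```

In Calcite, the analogous tree would be handed to the optimizer rather than interpreted directly, and the optimized RelNode mapped back to the host system's execution layer.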
4. Query Algebra
Operators. Relational algebra [11] lies at the core of Calcite. In addition to the operators that express the most common data manipulation operations, such as filter, project, join etc., Calcite includes additional operators that serve different purposes, e.g., being able to concisely represent complex operations, or recognize optimization opportunities more efficiently. For instance, it has become common for OLAP, decision making, and streaming applications to use window definitions to express complex analytic functions such as the moving average of a quantity over a time period or number of rows. Thus, Calcite introduces a window operator that encapsulates the window definition, i.e., upper and lower bound, partitioning etc., and the aggregate functions to execute on each window.
Traits. Calcite does not use different entities to represent logical and physical operators. Instead, it describes the physical properties associated with an operator using traits. These traits help the optimizer evaluate the cost of different alternative plans. Changing a trait value does not change the logical expression being evaluated, i.e., the rows produced by the given operator will still be the same.
During optimization, Calcite tries to enforce certain traits on relational expressions, e.g., the sort order of certain columns. Relational operators can implement a converter interface that indicates how to convert traits of an expression from one value to another.
Calcite includes common traits that describe the physical properties of the data produced by a relational expression, such as ordering, grouping, and partitioning. Similar to the SCOPE optimizer [57], the Calcite optimizer can reason about these properties and exploit them to find plans that avoid unnecessary operations. For example, if the input to the sort operator is already correctly ordered—possibly because this is the same order used for rows in the backend system—then the sort operation can be removed.
In addition to these properties, one of the main features of Calcite is the calling convention trait. Essentially, the trait represents the data processing system where the expression will be executed. Including the calling convention as a trait allows Calcite to meet its goal of transparently optimizing queries whose execution might span different engines, i.e., the convention will be treated as any other physical property.
For example, consider joining a Products table held in MySQL to an Orders table held in Splunk (see Figure 2). Initially, the scan of Orders takes place in the splunk convention and the scan of Products is in the jdbc-mysql convention. The tables have to be scanned inside their respective engines. The join is in the logical convention, meaning that no implementation has been chosen. Moreover, the SQL query in Figure 2 contains a filter (where clause) which is pushed into splunk by an adapter-specific rule (see Section 5). One possible implementation is to use Apache Spark as an external engine: the join is converted to spark convention, and its inputs are converters from jdbc-mysql and splunk to spark convention. But there is a more efficient implementation: exploiting the fact that Splunk can perform lookups into MySQL via ODBC, a planner rule pushes the join through the splunk-to-spark converter, and the join is now in splunk convention, running inside the Splunk engine.
4.1 Operators
Relational algebra [11] lies at the core of Calcite. Besides operators that express the most common data manipulation operations (e.g., filter, project, join), Calcite includes additional operators that serve different purposes, such as concisely expressing complex operations or recognizing optimization opportunities more efficiently.
4.2 Traits
Calcite does not use different entities to represent logical and physical operators; instead, it describes physical properties with traits attached to operators. These traits help the optimizer evaluate the cost of alternative plans. Changing a trait's value does not change the logical expression: the rows produced by the operator remain the same.
RelTrait and RelTraitDef
A RelTrait represents a property of a RelNode, used to specify physical execution attributes such as whether sorting is required or how the data is distributed. Each trait is defined by a RelTraitDef; there are currently three kinds:
- ConventionTraitDef: indicates which data processing engine handles the expression; the corresponding RelTrait type is Convention. In a logical plan the value defaults to org.apache.calcite.plan.Convention#NONE; for a physical plan the value is specified before optimization via org.apache.calcite.plan.RelOptPlanner#changeTraits. Calcite already defines EnumerableConvention, BindableConvention, JdbcConvention, and others. With EnumerableConvention, the generated physical plan is executed by Calcite's linq4j engine. Each Convention also comes with its own rules for converting relational expressions.
- RelCollationTraitDef: defines sort order; the corresponding RelTrait is RelCollation. For example, the sort expression (operator) org.apache.calcite.rel.core.Sort has a RelCollation-typed attribute, collation.
- RelDistributionTraitDef: describes how data is distributed across physical storage; the corresponding RelTrait is RelDistribution.
In addition, every RelNode object has a RelTraitSet, an ordered collection of RelTraits in which the node's traits are stored.
During optimization, Calcite tries to enforce certain traits on relational expressions, e.g., the sort order of certain columns. Relational operators can implement a converter interface that indicates how to convert an expression's trait from one value to another.
The query above (Figure 2) illustrates the convention trait: the scan of Orders takes place in the splunk convention and the scan of Products in the jdbc-mysql convention, each table being scanned inside its own engine.
One possible implementation uses Apache Spark as an external engine: the join is converted to the spark convention, with converters from jdbc-mysql and splunk as its inputs.
But there is a more efficient implementation: exploiting the fact that Splunk can look up MySQL via ODBC, a planner rule pushes the join through the splunk-to-spark converter, so the join runs in the splunk convention, inside the Splunk engine.
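The trait-enforcement idea can be sketched in miniature. Calcite's real mechanism lives in its Java planner (RelTraitSet, converters); the toy Python below (all names illustrative) shows the essential behavior: a converter node is inserted only when the required trait differs from what the input already delivers.

```python
# Toy sketch of traits and converters (illustrative names, not Calcite's
# classes): each operator carries a trait dict; enforce() adds a converter
# node only when the requested trait value is not already satisfied.

class Node:
    def __init__(self, kind, traits, inputs=()):
        self.kind = kind
        self.traits = dict(traits)
        self.inputs = list(inputs)

def enforce(node, trait, value):
    """Return a plan delivering trait=value, adding a converter if needed."""
    if node.traits.get(trait) == value:
        return node  # already satisfied: no extra work (cf. removing a Sort)
    return Node(f"convert[{trait}->{value}]",
                {**node.traits, trait: value},
                [node])

scan = Node("scan(Orders)", {"convention": "splunk", "collation": None})
# Asking for the convention the scan already has adds nothing:
same = enforce(scan, "convention", "splunk")
# Asking for a different engine inserts an adapter/converter node:
moved = enforce(scan, "convention", "spark")
```

The same check is what lets an optimizer drop a redundant sort when the backend already delivers rows in the required order.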
5. Adapters
An adapter is an architectural pattern that defines how Calcite incorporates diverse data sources for general access. Figure 3 depicts its components. Essentially, an adapter consists of a model, a schema, and a schema factory. The model is a specification of the physical properties of the data source being accessed. A schema is the definition of the data (format and layout) found in the model. The data itself is physically accessed via tables. Calcite interfaces with the tables defined in the adapter to read the data as the query is being executed. The adapter may define a set of rules that are added to the planner. For instance, it typically includes rules to convert various types of logical relational expressions to the corresponding relational expressions of the adapter's convention. The schema factory component acquires the metadata information from the model and generates a schema.
Figure 3: Design of Calcite's data source adapters
As discussed in Section 4, Calcite uses a physical trait known as the calling convention to identify relational operators which correspond to a specific database backend. These physical operators implement the access paths for the underlying tables in each adapter. When a query is parsed and converted to a relational algebra expression, an operator is created for each table, representing a scan of the data on that table. It is the minimal interface that an adapter must implement. If an adapter implements the table scan operator, the Calcite optimizer is then able to use client-side operators such as sorting, filtering, and joins to execute arbitrary SQL queries against these tables.
This table scan operator contains the necessary information the adapter requires to issue the scan to the adapter's backend database. To extend the functionality provided by adapters, Calcite defines an enumerable calling convention. Relational operators with the enumerable calling convention simply operate over tuples via an iterator interface. This calling convention allows Calcite to implement operators which may not be available in each adapter's backend. For example, the EnumerableJoin operator implements joins by collecting rows from its child nodes and joining on the desired attributes.
For queries which only touch a small subset of the data in a table, it is inefficient for Calcite to enumerate all tuples. Fortunately, the same rule-based optimizer can be used to implement adapter-specific rules for optimization. For example, suppose a query involves filtering and sorting on a table. An adapter which can perform filtering on the backend can implement a rule which matches a LogicalFilter and converts it to the adapter's calling convention. This rule converts the LogicalFilter into another Filter instance. This new Filter node has a lower associated cost, which allows Calcite to optimize queries across adapters.
The use of adapters is a powerful abstraction that enables not only optimization of queries for a specific backend, but also across multiple backends. Calcite is able to answer queries involving tables across multiple backends by pushing down all possible logic to each backend and then performing joins and aggregations on the resulting data. Implementing an adapter can be as simple as providing a table scan operator or it can involve the design of many advanced optimizations. Any expression represented in the relational algebra can be pushed down to adapters with optimizer rules.
Essentially, an adapter consists of a model, a schema, and a schema factory. The model is a specification of the physical properties of the data source being accessed (in Calcite, a JSON configuration file); the schema is the definition of the data (format and layout) found in the model.
For this part, see the demo in the source code.
For a query that touches only a small subset of a table's data, it is inefficient for Calcite to enumerate all tuples. The rule-based optimizer can be used to implement adapter-specific optimization rules. For example, suppose a query filters and sorts a table: an adapter that can perform filtering in the backend can implement a rule that fires when it matches a LogicalFilter and converts it into the adapter's calling convention, replacing the LogicalFilter with another Filter instance. The new Filter node has a lower associated cost, which lets Calcite optimize queries across adapters (predicate pushdown).
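The adapter-rule idea above can be sketched as a toy rewrite (illustrative names and plan shapes, not Calcite's API): a LogicalFilter sitting on a scan of a backend that can filter natively is converted into a backend-convention filter with a lower cost, which is how the planner learns to prefer pushdown.

```python
# Toy sketch of an adapter-specific pushdown rule. Plans are nested
# tuples; "BackendFilter" stands in for an adapter-convention Filter
# such as CassandraFilter. All names here are illustrative.

def push_filter_rule(node):
    """Match Filter(Scan) and convert the filter to the scan's convention."""
    if node[0] == "LogicalFilter":
        pred, child = node[1], node[2]
        if child[0] == "Scan":
            convention = child[1]["convention"]
            return ("BackendFilter",
                    {"convention": convention, "pred": pred},
                    child)
    return node  # rule does not match: leave the node unchanged

def cost(node):
    # Crude cost model: a logical filter enumerates every tuple
    # client-side; a backend filter is assumed cheaper because
    # non-matching rows never leave the store.
    if node[0] == "LogicalFilter":
        return 100 + cost(node[2])
    if node[0] == "BackendFilter":
        return 10 + cost(node[2])
    return 50  # Scan

scan = ("Scan", {"convention": "cassandra", "table": "events"})
plan = ("LogicalFilter", "ts > 0", scan)
rewritten = push_filter_rule(plan)
```

Because the rewritten node reports a lower cost, a cost-based planner will keep it and discard the purely logical plan.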
6. Query Processing and Optimization
The query optimizer is the main component in the framework. Calcite optimizes queries by repeatedly applying planner rules to a relational expression. A cost model guides the process, and the planner engine tries to generate an alternative expression that has the same semantics as the original but a lower cost.
Every component in the optimizer is extensible. Users can add relational operators, rules, cost models, and statistics.
The query optimizer is the central component of the framework. Calcite optimizes a query by repeatedly applying planner rules to a relational expression. A cost model guides the process, and the planner engine tries to generate an alternative relational expression with the same semantics as the original but a lower cost. Every component of the optimizer is extensible: users can add relational operators, rules, cost models, and statistics.
6.1 Planner Rules
Calcite includes a set of planner rules to transform expression trees. In particular, a rule matches a given pattern in the tree and executes a transformation that preserves the semantics of that expression. Calcite includes several hundred optimization rules. However, it is rather common for data processing systems relying on Calcite for optimization to include their own rules to allow specific rewritings.
For example, Calcite provides an adapter for Apache Cassandra [29], a wide-column store which partitions data by a subset of columns in a table and then, within each partition, sorts rows based on another subset of columns. As discussed in Section 5, it is beneficial for adapters to push down as much query processing as possible to each backend for efficiency. A rule to push a Sort into Cassandra must check two conditions:
(1) the table has been previously filtered to a single partition (since rows are only sorted within a partition), and
(2) the sorting of partitions in Cassandra has some common prefix with the required sort.
This requires that a LogicalFilter has been rewritten to a CassandraFilter to ensure the partition filter is pushed down to the database. The effect of the rule is simple (convert a LogicalSort into a CassandraSort), but the flexibility in rule matching enables backends to push down operators even in complex scenarios.
For an example of a rule with more complex effects, consider the following query:
The query corresponds to the relational algebra expression presented in Figure 4a. Because the WHERE clause only applies to the sales table, we can move the filter before the join as in Figure 4b. This optimization can significantly reduce query execution time since we do not need to perform the join for rows which do not match the predicate. Furthermore, if the sales and products tables were contained in separate backends, moving the filter before the join also potentially enables an adapter to push the filter into the backend. Calcite implements this optimization via FilterIntoJoinRule, which matches a filter node with a join node as a parent and checks if the filter can be performed by the join. This optimization illustrates the flexibility of the Calcite approach to optimization.
This query can be represented as in Figure 4a.
Figure 4: Application of FilterIntoJoinRule
Calcite includes a set of planner rules that transform relational expression trees. A rule that matches a pattern in the tree applies a transformation that preserves the expression's semantics. Calcite ships several hundred optimization rules, and database systems built on the Calcite optimizer (e.g., Dremio) can add their own rules to perform specific rewrites.
For example, Calcite provides an adapter for Apache Cassandra, a wide-column storage engine that partitions data by a subset of a table's columns and, within each partition, sorts rows by another subset of columns. As discussed in Section 5, it pays for an adapter to push as much query processing as possible down to the backend engine. Pushing a Sort into Cassandra must satisfy two conditions:
(1) the table has already been filtered down to a single partition, because rows are sorted only within a partition; and
(2) the partition sort order in Cassandra shares a common prefix with the required sort order.
This requires that the LogicalFilter be rewritten to a CassandraFilter to ensure the partition filter is pushed down to the database. The rule's effect is simple (converting a LogicalSort into a CassandraSort), but the flexible rule-matching mechanism lets backends push down operators even in complex scenarios.
Because the WHERE clause applies only to the sales table, we can move the filter before the join, as Figure 4b shows. This optimization can significantly reduce query time, since we need not join rows that do not match the predicate. Moreover, if the sales and products tables live in different backends, moving the filter before the join may also let an adapter push the filter into its backend. Calcite implements this optimization with FilterIntoJoinRule, which matches a filter node whose parent is a join node and checks whether the filter can be performed by the join. This example illustrates the flexibility of Calcite's approach to optimization.
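The filter-into-join transposition can be sketched as a toy rule (illustrative plan shapes, not Calcite's FilterIntoJoinRule implementation): when the predicate above a join mentions only columns of one join input, the filter is pushed below the join on that side.

```python
# Toy sketch of the FilterIntoJoinRule idea. Plans are nested tuples:
#   ("Scan", name, columns)  ("Filter", pred_cols, pred, child)
#   ("Join", left, right)
# All names here are illustrative.

def columns(node):
    if node[0] == "Scan":
        return node[2]
    if node[0] == "Filter":
        return columns(node[3])
    return columns(node[1]) | columns(node[2])  # Join

def filter_into_join(node):
    """Push Filter(Join(l, r)) below the join when its predicate allows."""
    if node[0] != "Filter":
        return node
    pred_cols, pred, child = node[1], node[2], node[3]
    if child[0] != "Join":
        return node
    _, left, right = child
    if pred_cols <= columns(left):    # predicate touches only the left side
        return ("Join", ("Filter", pred_cols, pred, left), right)
    if pred_cols <= columns(right):
        return ("Join", left, ("Filter", pred_cols, pred, right))
    return node

sales = ("Scan", "sales", {"productId", "discount"})
products = ("Scan", "products", {"productId", "name"})
# A filter on sales only, sitting above the join, as in the Figure 4 example:
plan = ("Filter", {"discount"}, "discount IS NOT NULL",
        ("Join", sales, products))
pushed = filter_into_join(plan)
```

After the rewrite, the filter runs before the join, so rows that fail the predicate are never joined, and an adapter holding the sales table gets the chance to absorb the filter entirely.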
6.2 Metadata Providers
Metadata is an important part of Calcite's optimizer, and it serves two main purposes: (i) guiding the planner towards the goal of reducing the cost of the overall query plan, and (ii) providing information to the rules while they are being applied.
Metadata is an important part of Calcite's optimizer. It serves two main purposes:
(1) guiding the planner towards the goal of reducing the overall plan cost, and
(2) providing the information rules need while they execute.
Metadata providers are responsible for supplying that information to the optimizer. In particular, the default metadata provider implementation in Calcite contains functions that return the overall cost of executing a subexpression in the operator tree, the number of rows and the data size of the results of that expression, and the maximum degree of parallelism with which it can be executed. In turn, it can also provide information about the plan structure, e.g., filter conditions that are present below a certain tree node.
Calcite provides interfaces that allow data processing systems to plug their metadata information into the framework. These systems may choose to write providers that override the existing functions, or provide their own new metadata functions that might be used during the optimization phase. However, for many of them, it is sufficient to provide statistics about their input data, e.g., the number of rows and size of a table, or whether values for a given column are unique, and Calcite will do the rest of the work by using its default implementation.
As the metadata providers are pluggable, they are compiled and instantiated at runtime using Janino [27], a lightweight Java compiler. Their implementation includes a cache for metadata results, which yields significant performance improvements, e.g., when we need to compute multiple types of metadata such as cardinality, average row size, and selectivity for a given join, and all these computations rely on the cardinality of their inputs.
Metadata providers are responsible for supplying this information to the optimizer.
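The value of caching metadata results, as described above, can be sketched in miniature (illustrative names and made-up statistics, not Calcite's provider API): several metadata kinds, such as cost and cardinality, are derived from the same input cardinalities, so memoizing them avoids recomputation.

```python
# Toy sketch of a pluggable metadata provider with a result cache.
# Calcite compiles its providers with Janino and caches metadata results
# in a similar spirit; here a simple lru_cache stands in for that cache.

from functools import lru_cache

BASE_ROWS = {"sales": 1_000_000, "products": 10_000}  # supplied statistics

@lru_cache(maxsize=None)
def row_count(plan):
    """Cardinality metadata, derived recursively from base statistics."""
    kind = plan[0]
    if kind == "Scan":
        return BASE_ROWS[plan[1]]
    if kind == "Filter":                  # assume 10% selectivity
        return max(1, row_count(plan[1]) // 10)
    if kind == "Join":                    # crude equi-join estimate
        return max(row_count(plan[1]), row_count(plan[2]))

def walk(plan):
    yield plan
    for child in plan[1:]:
        if isinstance(child, tuple):
            yield from walk(child)

def cost(plan):
    # A second metadata kind that reuses the cached cardinalities:
    # here, simply the total number of rows flowing through the plan.
    return sum(row_count(p) for p in walk(plan))

plan = ("Join", ("Filter", ("Scan", "sales")), ("Scan", "products"))
```

Both `row_count(plan)` and `cost(plan)` hit the same cached sub-results, which is the effect Calcite's metadata cache is after.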
6.3 Planner Engines
The main goal of a planner engine is to trigger the rules provided to the engine until it reaches a given objective. At the moment, Calcite provides two different engines. New engines are pluggable in the framework.
The main goal of a planner engine is to fire the rules it has been given until it reaches the optimization objective. Calcite currently provides two different engines.
6.3.1 Cost-Based Engine (CBO)
The first one, a cost-based planner engine, triggers the input rules with the goal of reducing the overall expression cost. The engine uses a dynamic programming algorithm, similar to Volcano [20], to create and track different alternative plans created by firing the rules given to the engine. Initially, each expression is registered with the planner, together with a digest based on the expression's attributes and its inputs. When a rule is fired on an expression e1 and the rule produces a new expression e2, the planner will add e2 to the set of equivalence expressions Sa that e1 belongs to. In addition, the planner generates a digest for the new expression, which is compared with those previously registered in the planner. If a similar digest associated with an expression e3 that belongs to a set Sb is found, the planner has found a duplicate and hence will merge Sa and Sb into a new set of equivalences. The process continues until the planner reaches a configurable fix point. In particular, it can (i) exhaustively explore the search space until all rules have been applied on all expressions, or (ii) use a heuristic-based approach to stop the search when the plan cost has not improved by more than a given threshold δ in the last planner iterations. The cost function that allows the optimizer to decide which plan to choose is supplied through metadata providers. The default cost function implementation combines estimations for the CPU, IO, and memory resources used by a given expression.
The cost-based engine fires rules with the goal of reducing the overall expression cost, using a Volcano-like dynamic programming algorithm to create and track the different alternative plans produced by firing the engine's rules. The default cost function accounts for the CPU, IO, and memory resources used by an expression.
6.3.2 Exhaustive Rule-Based Engine (RBO)
The second engine is an exhaustive planner, which triggers rules exhaustively until it generates an expression that is no longer modified by any rules. This planner is useful to quickly execute rules without taking into account the cost of each expression.
It fires rules exhaustively until it produces an expression that can no longer be modified by any rule.
Users may choose to use one of the existing planner engines depending on their concrete needs, and switching from one to another, when their system requirements change, is straightforward. Alternatively, users may choose to generate multi-stage optimization logic, in which different sets of rules are applied in consecutive phases of the optimization process. Importantly, the existence of two planners allows Calcite users to reduce the overall optimization time by guiding the search for different query plans.
6.4 Materialized Views
One of the most powerful techniques to accelerate query processing in data warehouses is the precomputation of relevant summaries, or materialized views. Multiple Calcite adapters and projects relying on Calcite have their own notion of materialized views. For instance, Cassandra allows the user to define materialized views based on existing tables which are automatically maintained by the system.
These engines expose their materialized views to Calcite. The optimizer then has the opportunity to rewrite incoming queries to use these views instead of the original tables. In particular, Calcite provides implementations of two different materialized-view-based rewriting algorithms.
The first approach is based on view substitution [10, 18]. The aim is to substitute part of the relational algebra tree with an equivalent expression which makes use of a materialized view, and the algorithm proceeds as follows: (i) the scan operator over the materialized view and the materialized view definition plan are registered with the planner, and (ii) transformation rules that try to unify expressions in the plan are triggered. Views do not need to exactly match expressions in the query being replaced, as the rewriting algorithm in Calcite can produce partial rewritings that include additional operators to compute the desired expression, e.g., filters with residual predicate conditions.
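As an illustration of a partial rewriting with a residual predicate, consider the following sketch (the table, view, and column names are hypothetical, not from the paper):

```sql
-- A materialized view over a base table:
CREATE MATERIALIZED VIEW mv AS
SELECT empid, deptno FROM emps WHERE deptno >= 10;

-- An incoming query whose filter is subsumed by the view's filter:
SELECT empid FROM emps WHERE deptno >= 20;

-- The unification rules can answer it from the view,
-- placing a residual filter on top of the view scan:
SELECT empid FROM mv WHERE deptno >= 20;
```

The view's predicate (deptno >= 10) is weaker than the query's (deptno >= 20), so the rewriting keeps the stricter condition as a residual filter over the view.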
The second approach is based on lattices [22]. Once the data sources are declared to form a lattice, Calcite represents each of the materializations as a tile, which in turn can be used by the optimizer
to answer incoming queries. On the one hand, the rewriting algorithm is especially efficient in matching expressions over data sources organized in a star schema, which are common in OLAP applications. On the other hand, it is more restrictive than view substitution, as it imposes restrictions on the underlying schema.
7. Calcite extensions
As we have mentioned in the previous sections, Calcite is not only tailored towards SQL processing. In fact, Calcite provides extensions to SQL for expressing queries over other data abstractions, such as semi-structured, streaming, and geospatial data. Its internal operators adapt to these queries. In addition to extensions to SQL, Calcite also includes a language-integrated query language. We describe these extensions throughout this section and provide some examples.
7.1 Semi-structured data
Calcite supports several complex column data types that enable a hybrid of relational and semi-structured data to be stored in tables. Specifically, columns can be of type ARRAY, MAP, or MULTISET. Furthermore, these complex types can be nested, so it is possible, for example, to have a MAP where the values are of type ARRAY. Data within the ARRAY and MAP columns (and nested data therein) can be extracted using the [] operator. The specific type of values stored in any of these complex types need not be predefined.
For example, Calcite contains an adapter for MongoDB [36], a document store which stores documents consisting of data roughly equivalent to JSON documents. To expose MongoDB data to Calcite, a table is created for each document collection with a single column named _MAP: a map from document identifiers to their data. In many cases, documents can be expected to have a common structure. A collection of documents representing zip codes may each contain columns with a city name, latitude, and longitude. It can be useful to expose this data as a relational table. In Calcite, this is achieved by creating a view after extracting the desired values and casting them to the appropriate type:
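A sketch of such a view query, following the zip-code example above (the collection name and document fields, e.g. a loc array holding longitude and latitude, are assumptions for illustration):

```sql
SELECT CAST(_MAP['city']   AS varchar(20)) AS city,
       CAST(_MAP['loc'][0] AS float)       AS longitude,
       CAST(_MAP['loc'][1] AS float)       AS latitude
FROM mongo_raw.zips;
```

Each row extracts typed fields from the single _MAP column, turning the schema-free collection into a relational view.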
With views over semi-structured data defined in this manner, it becomes easier to manipulate data from different semi-structured sources in tandem with relational data.
7.2 Streaming
Calcite provides first-class support for streaming queries [26] based on a set of streaming-specific extensions to standard SQL, namely STREAM extensions, windowing extensions, implicit references to streams via window expressions in joins, and others. These extensions were inspired by the Continuous Query Language [2]
while also trying to integrate effectively with standard SQL. The primary extension, the STREAM directive, tells the system that the user is interested in incoming records, not existing ones.
In the absence of the STREAM keyword when querying a stream, the query becomes a regular relational query, indicating the system should process existing records which have already been received from a stream, not the incoming ones.
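For instance, a continuous query over a hypothetical Orders stream (the stream and column names are assumptions) might be written as:

```sql
-- Emit qualifying orders as they arrive; without the STREAM
-- keyword, the same query would return only rows already received.
SELECT STREAM rowtime, productId, units
FROM Orders
WHERE units > 25;
```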
Due to the inherently unbounded nature of streams, windowing is used to unblock blocking operators such as aggregates and joins. Calcite's streaming extensions use SQL analytic functions to express sliding and cascading window aggregations, as shown in the following example.
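A sliding-window aggregation over the same hypothetical Orders stream can be expressed with a standard SQL analytic function (names assumed):

```sql
-- Units sold per product over the preceding hour,
-- recomputed for every incoming row.
SELECT STREAM rowtime, productId, units,
       SUM(units) OVER (
         PARTITION BY productId
         ORDER BY rowtime
         RANGE INTERVAL '1' HOUR PRECEDING) AS unitsLastHour
FROM Orders;
```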
Tumbling, hopping, and session windows are enabled by the TUMBLE, HOP, and SESSION functions and related utility functions such as TUMBLE_END and HOP_END, which can be used in GROUP BY clauses and projections, respectively.
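For example, a tumbling one-hour window over the hypothetical Orders stream (names assumed):

```sql
-- One aggregate row per product per hour-long window.
SELECT STREAM
  TUMBLE_END(rowtime, INTERVAL '1' HOUR) AS rowtime,
  productId,
  COUNT(*)   AS c,
  SUM(units) AS units
FROM Orders
GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), productId;
```

TUMBLE in the GROUP BY assigns each row to a window, while TUMBLE_END in the projection reports the window's upper bound, a monotonic expression that lets the aggregate emit results.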
Streaming queries involving window aggregates require the presence of monotonic or quasi-monotonic expressions in the GROUP BY clause, or in the ORDER BY clause in the case of sliding and cascading window queries. Streaming queries which involve more complex stream-to-stream joins can be expressed using an implicit (time) window expression in the JOIN clause.
In the case of an implicit window, Calcite’s query planner validates that the expression is monotonic.
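A stream-to-stream join with such an implicit window might look like the following sketch (the Orders and Shipments streams and their columns are assumptions):

```sql
-- Match each order with shipments occurring within one hour of it;
-- the BETWEEN predicate over the rowtime columns acts as the
-- implicit window that bounds the join.
SELECT STREAM o.rowtime, o.productId, o.orderId,
       s.rowtime AS shipTime
FROM Orders AS o
JOIN Shipments AS s
  ON o.orderId = s.orderId
 AND s.rowtime BETWEEN o.rowtime
                   AND o.rowtime + INTERVAL '1' HOUR;
```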
7.3 Geospatial queries
Geospatial support is preliminary in Calcite, but is being implemented using Calcite's relational algebra. The core of this implementation consists of adding a new GEOMETRY data type which encapsulates different geometric objects such as points, curves, and polygons. It is expected that Calcite will be fully compliant with the OpenGIS Simple Feature Access [39] specification, which defines a standard for SQL interfaces to access geospatial data. An example
query finds the country which contains the city of Amsterdam:
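A sketch of such a query using OpenGIS-style functions (the country table, its boundary column, and the polygon coordinates approximating Amsterdam are placeholders for illustration):

```sql
SELECT name
FROM (
  SELECT name,
         -- rough bounding polygon around Amsterdam (placeholder)
         ST_GeomFromText('POLYGON ((4.82 52.43, 4.97 52.43,
                                    4.97 52.33, 4.82 52.33,
                                    4.82 52.43))') AS amsterdam,
         ST_GeomFromText(boundary) AS country_shape
  FROM country)
WHERE ST_Contains(country_shape, amsterdam);
```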
7.4 Language-integrated queries
Calcite can be used to query multiple data sources, beyond just relational databases. But it also aims to support more than just the SQL language. Though SQL remains the primary database language,
many programmers favour language-integrated query languages like LINQ [33]. Unlike SQL embedded within Java or C++ code, language-integrated query languages allow the programmer to write all of her code using a single language. Calcite provides Language-Integrated Query for Java (LINQ4J, for short), which closely follows the convention set forth by Microsoft's LINQ for the .NET languages.
8. Industry and academia adoption
Calcite enjoys wide adoption, especially among open-source projects used in industry. As Calcite provides certain integration flexibility, these projects have chosen to either (i) embed Calcite within their core, i.e., use it as a library, or (ii) implement an adapter to allow Calcite to federate query processing. In addition, we see a growing interest in the research community to use Calcite as the cornerstone of the development of data management projects. In the following, we describe how different systems are using Calcite.
8.1 Embedded Calcite
Table 1: List of systems that embed Calcite
Table 1 provides a list of software that incorporates Calcite as a library, including (i) the query language interface that they expose to users, (ii) whether they use Calcite’s JDBC driver (called Avatica),
(iii) whether they use the SQL parser and validator included in Calcite, (iv) whether they use Calcite’s query algebra to represent their operations over data, and (v) the engine that they rely on for execution, e.g., their own native engine, Calcite’s operators (referred to as enumerable), or any other project.
Drill [13] is a flexible data processing engine based on the Dremel system [34] that internally uses a schema-free JSON document data model. Drill uses its own dialect of SQL that includes extensions to express queries on semi-structured data, similar to SQL++ [38].
Hive [24] first became popular as a SQL interface on top of the MapReduce programming model [52]. It has since moved towards being an interactive SQL query answering engine, adopting Calcite as its rule- and cost-based optimizer. Instead of relying on Calcite's JDBC driver, SQL parser, and validator, Hive uses its own implementation of these components. The query is then translated into Calcite operators, which after optimization are translated into Hive's physical algebra. Hive operators can be executed by multiple engines, the most popular being Apache Tez [43, 51] and Apache Spark [47, 56].
Apache Solr [46] is a popular full-text distributed search platform built on top of the Apache Lucene library [31]. Solr exposes multiple query interfaces to users, including REST-like HTTP/XML and JSON APIs. In addition, Solr integrates with Calcite to provide SQL compatibility.
Apache Phoenix [40] and Apache Kylin [28] both work on top of Apache HBase [23], a distributed key-value store modeled after Bigtable [9]. In particular, Phoenix provides a SQL interface and orchestration layer to query HBase. Kylin focuses on OLAP-style SQL queries instead, building cubes that are declared as materialized views and stored in HBase, hence allowing Calcite's optimizer to rewrite the input queries to be answered using those cubes. In Kylin, query plans are executed using a combination of Calcite native operators and HBase.
Recently, Calcite has become popular among streaming systems too. Projects such as Apache Apex [1], Flink [16], Apache Samza [44], and Storm [50] have chosen to integrate with Calcite, using its components to provide a streaming SQL interface to their users. Finally, other commercial systems, such as MapD, have adopted Calcite.
8.2 Calcite adapters
Instead of using Calcite as a library, other systems integrate with Calcite via adapters which read their data sources. Table 2 provides the list of available adapters in Calcite. One of the key components of the implementation of these adapters is the converter, responsible for translating the algebra expression to be pushed to the system into the query language supported by that system. Table 2 also shows the languages that Calcite translates into for each of these adapters.
The JDBC adapter supports the generation of multiple SQL dialects, including those supported by popular RDBMSes such as PostgreSQL and MySQL. In turn, the adapter for Cassandra [8] generates its own SQL-like language called CQL, whereas the adapter for Apache Pig [41] generates queries expressed in Pig Latin [37]. The adapter for Apache Spark [47] uses the Java RDD API. Finally, Druid [14], Elasticsearch [15], and Splunk [48] are queried through REST HTTP API requests. The queries generated by Calcite for these systems are expressed in JSON or XML.
Table 2: List of Calcite adapters
8.3 Research using Calcite
In a research setting, Calcite has been considered [54] as a polystore alternative for precision medicine and clinical analysis scenarios. In those scenarios, heterogeneous medical data has to be logically assembled and aligned to assess the best treatments based on the comprehensive medical history and the genomic profile of the patient. The data comes from relational sources representing patients' electronic medical records; structured and semi-structured sources representing various reports (oncology, psychiatry, laboratory tests, radiology, etc.); and imaging, signal, and sequence data stored in scientific databases. In those circumstances, Calcite represents a good foundation with its uniform query interface and flexible adapter architecture, but the ongoing research efforts are aimed at (i) introducing new adapters for array and textual sources, and (ii) supporting efficient joins over heterogeneous data sources.
9. Future work
Future work on Calcite will focus on the development of new features and the expansion of its adapter architecture:
- Enhancements to the design of Calcite to further support its use as a standalone engine, which would require support for data definition languages (DDL), materialized views, indexes, and constraints.
- Ongoing improvements to the design and flexibility of the planner, including making it more modular and allowing users of Calcite to supply planner programs (collections of rules organized into planning phases) for execution.
- Incorporation of new parametric approaches [53] into the design of the optimizer.
- Support for an extended set of SQL commands, functions, and utilities, including full compliance with OpenGIS.
- New adapters for non-relational data sources such as array databases for scientific computing.
- Improvements to performance profiling and instrumentation.
9.1 Performance testing and evaluation
Though Calcite contains a performance testing module, it does not evaluate query execution. It would be useful to assess the performance of systems built with Calcite, for example by comparing Calcite with similar frameworks. Unfortunately, it might be difficult to craft fair comparisons. For example, like Calcite, Algebricks optimizes queries for Hive. Borkar et al. [6] compared Algebricks with the Hyracks scheduler against Hive version 0.12 (without Calcite). That work precedes significant engineering and architectural changes in Hive. Comparing Calcite against Algebricks fairly in terms of timings does not seem feasible, as one would need to ensure that each uses the same execution engine: Hive applications rely mostly on either Apache Tez or Apache Spark as execution engines, whereas Algebricks is tied to its own framework (including Hyracks).
Moreover, to assess the performance of Calcite-based systems, we need to consider two distinct use cases. Indeed, Calcite can be used either as part of a single system, as a tool to accelerate the construction of that system, or for the more difficult task of combining several distinct systems, as a common layer. The former is tied to the characteristics of the data processing system, and because Calcite is so versatile and widely used, many distinct benchmarks are needed. The latter is limited by the availability of existing heterogeneous benchmarks. BigDAWG [55] has been used to integrate PostgreSQL with Vertica, and on a standard benchmark the integrated system proved superior to a baseline where entire tables are copied from one system to another to answer specific queries. Based on real-world experience, we believe that more ambitious goals are possible for integrated multiple systems: they should be superior to the sum of their parts.
10. Conclusion
Emerging data management practices and associated analytic uses of data continue to evolve towards an increasingly diverse and heterogeneous spectrum of scenarios. At the same time, relational data sources, accessed through SQL, remain an essential means by which enterprises work with their data. In this somewhat dichotomous
space, Calcite plays a unique role with its strong support for both traditional, conventional data processing and for other data sources, including those with semi-structured, streaming, and geospatial models. In addition, Calcite's design philosophy, with its focus on flexibility, adaptivity, and extensibility, has been another factor in Calcite becoming the most widely adopted query optimizer, used in a large number of open-source frameworks. Calcite's dynamic and flexible query optimizer and its adapter architecture allow it to be embedded selectively by a variety of data management frameworks such as Hive, Drill, MapD, and Flink. Calcite's support for heterogeneous data processing, as well as for the extended set of relational functions, will continue to improve in both functionality and performance.