The first thing to understand is why we would use Presto or Trino at all.
- We had been running two clusters: a Hortonworks (HDP) variant and a Cloudera (CDP) variant.
- Hive tables built on HDP were mostly ORC, whereas our tables on CDP were mostly Parquet.
- We wanted to add ad-hoc querying to our cluster, and we came across Apache Impala as an excellent tool for this purpose.
- Only CDP supported Apache Impala.
- Impala was limited to working with Parquet, Kudu, and HBase. Before CDP 6.*, Impala had no support for the ORC file format.
- That led us to PrestoDB, which was built at Facebook and is an excellent distributed SQL engine for ad-hoc querying.
- It not only supports ORC but also has connectors for multiple data sources.
- Presto was developed at Facebook (2012).
- It is supported by the Presto Foundation, established by the Linux Foundation (2019).
- The original developers and the Linux Foundation came into conflict over naming and branding.
- The original developers maintained a hard fork, PrestoSQL, and rebranded it as Trino (Dec 2020).
- Trino is supported by a non-profit organization, the Trino Software Foundation.
- Now there are two separate variants: PrestoDB and Trino.
- They certainly have different visions.
| Presto | Trino |
| --- | --- |
| Apache License 2.0; supported by the Presto Foundation, hosted by the Linux Foundation. | Apache License 2.0; supported by the Trino Software Foundation. |
| Presto on YARN is available (https://prestodb.io/presto-yarn/). It depends on Apache Slider, which was supported by HDP but not by CDP (https://www.cloudera.com/products/open-source/apache-hadoop/apache-slider.html). | Trino on YARN has been abandoned. |
| Trino is not used at Facebook (source: https://ahana.io/presto-vs-trino/). | Claims that Trino is used at Facebook are false. |
| PrestoDB still leads in GitHub stars. | PrestoSQL/Trino is catching up with PrestoDB. |
| Ahana is still part of the Presto Foundation and supports PrestoDB. | Starburst is also a member of the Presto Foundation and manages a conformance program with other members to produce enterprise-grade distributions of Presto, which they develop from Trino. They still suggest that it is the same software: https://www.starburst.io/blog/prestosql-becomes-trino/ |
| Less inclined towards creating new connectors. Refer to https://prestodb.io/docs/current/connector.html | Trino seems more inclined towards creating new connectors; it already has Atop, Ignite, Kinesis, and SingleStore connectors, which are not available in PrestoDB. Refer to https://trino.io/docs/current/connector.html |
| Presto has worked towards performance gains: Aria (pushes down entire expressions to the data source for some file formats like ORC); Presto Unlimited (creates temporary in-memory bucketed tables); dynamic SQL functions; Presto-on-Spark (ETL fault tolerance); the RaptorX project (caching); disaggregated coordinator (horizontal scaling). Ahana has developed more features: https://ahana.io/presto-vs-trino/ | Trino lacks these Presto developments, but they may be supported by Starburst Enterprise. |
Reliability and scalability.
- We were not interested in new connectors, or in Docker / cloud deployment, at this moment. Our interest was in performance gains such as RaptorX caching, Aria scan and predicate pushdown, and Presto on Spark (for reliability and fault tolerance).
- PrestoDB is hosted by the Linux Foundation, which gives us confidence in using it.