Actian Vector Explained

Actian Vector
Developer: Actian Corporation
Latest Release Version: Vector 7.0
Latest Release Date: [1]
Operating System: Cross-platform
Genre: RDBMS
License: Proprietary

Actian Vector (formerly known as VectorWise) is an SQL relational database management system designed for high performance in analytical database applications.[2] It has published record-breaking results on the Transaction Processing Performance Council's TPC-H benchmark for database sizes of 100 GB, 300 GB, 1 TB and 3 TB on non-clustered hardware.[3] [4] [5] [6]

Vectorwise originated from the X100 research project carried out at Centrum Wiskunde & Informatica (CWI, the Dutch National Research Institute for Mathematics and Computer Science) between 2003 and 2008. It was spun off as a start-up company in 2008 and acquired by Ingres Corporation in 2011.[7] It was released as a commercial product in June 2010,[8] [9] [10] [11] initially for 64-bit Linux platforms and later also for Windows. Starting with the 3.5 release in April 2014, the product name was shortened to "Vector".[12] In June 2014, Actian announced Actian Vortex, a clustered massively parallel processing (MPP) version of Vector running in Hadoop with storage in HDFS.[13] [14] Actian Vortex was later renamed Actian Vector in Hadoop.

Technology

The basic architecture and design principles of the X100 engine of the VectorWise database are described in detail in the PhD theses of two VectorWise founders, Marcin Żukowski ("Balancing Vectorized Query Execution with Bandwidth-Optimized Storage")[15] and Sandor Héman ("Updating Compressed Column Stores"),[16] both supervised by another founder, Professor Peter Boncz. The X100 engine was integrated with the Ingres SQL front-end, allowing the database to use Ingres SQL syntax and the Ingres set of client and database administration tools.[17]

The query execution architecture uses "vectorized query execution": operators process data in chunks, or vectors, sized to fit in the CPU cache. This allows the engine to apply the principles of vector processing and single instruction, multiple data (SIMD), performing the same operation on multiple data items simultaneously and exploiting data-level parallelism on modern hardware. It also reduces the per-tuple overheads of the traditional "row-at-a-time" processing found in most RDBMSes.
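To make the contrast concrete, the following Python sketch evaluates the same filter-and-sum query twice: once row-at-a-time, and once vector-at-a-time, where each primitive processes a whole chunk and passes along a selection vector (a list of qualifying positions, a common technique in vectorized engines). It is a hypothetical illustration: the function names and the `VECTOR_SIZE` value are assumptions rather than Vector's API, and plain Python lists cannot reproduce the SIMD and cache effects that the real engine obtains from compiled primitives. It only shows the shape of the interface.

```python
from typing import Iterator, List

# Hypothetical chunk size; a real engine tunes this so one vector fits in CPU cache.
VECTOR_SIZE = 1024


def scan(column: List[int], vector_size: int = VECTOR_SIZE) -> Iterator[List[int]]:
    """Yield the column one vector (chunk) at a time instead of one value at a time."""
    for start in range(0, len(column), vector_size):
        yield column[start:start + vector_size]


def select_gt(vector: List[int], threshold: int) -> List[int]:
    """Selection primitive: one tight loop over a whole vector.

    Returns a selection vector (positions of qualifying values) instead of
    copying the qualifying values themselves.
    """
    return [i for i, v in enumerate(vector) if v > threshold]


def sum_selected(vector: List[int], selection: List[int]) -> int:
    """Aggregation primitive that reads only the positions in the selection vector."""
    return sum(vector[i] for i in selection)


def sum_where_gt_vectorized(column: List[int], threshold: int,
                            vector_size: int = VECTOR_SIZE) -> int:
    """Sum of values greater than threshold, evaluated vector-at-a-time."""
    total = 0
    for vector in scan(column, vector_size):
        # Two primitive calls per vector: call overhead is amortized over the chunk.
        total += sum_selected(vector, select_gt(vector, threshold))
    return total


def sum_where_gt_row_at_a_time(column: List[int], threshold: int) -> int:
    """The same query evaluated row-at-a-time, for comparison."""
    total = 0
    for value in column:
        # Predicate and aggregation are dispatched once per value.
        if value > threshold:
            total += value
    return total


data = list(range(10))
print(sum_where_gt_vectorized(data, 4, vector_size=3))  # 35
print(sum_where_gt_row_at_a_time(data, 4))              # 35
```

Both functions return the same result; the difference is that the vectorized version pays interpretation overhead once per chunk rather than once per value, which is where compiled engines gain their speed.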

The database storage uses a compressed column-oriented format[18] with a scan-optimised buffer manager. Actian Vortex uses the same proprietary format for its storage in HDFS.
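The sketch below shows, under simplifying assumptions, why a column-oriented layout lends itself to compression: once the values of one column are stored together, a lightweight scheme such as frame-of-reference encoding can store each value as a small offset from a common base. Vector's on-disk format is proprietary, so the encoding choice, the function names and the sample data here are hypothetical illustrations, not its actual format.

```python
from typing import Dict, List, Sequence, Tuple


def to_columns(rows: Sequence[Tuple], names: Sequence[str]) -> Dict[str, List]:
    """Transpose row tuples into one list per column (column-oriented layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def for_encode(values: List[int]) -> Tuple[int, int, List[int]]:
    """Frame-of-reference encoding: a base value plus small non-negative offsets.

    Returns (base, bit_width, offsets), where bit_width is how many bits each
    offset would need if the offsets were bit-packed.
    """
    if not values:
        raise ValueError("cannot encode an empty column")
    base = min(values)
    offsets = [v - base for v in values]
    bit_width = max(max(offsets).bit_length(), 1)  # at least 1 bit per value
    return base, bit_width, offsets


def for_decode(base: int, offsets: List[int]) -> List[int]:
    """Reverse the encoding by adding the base back to each offset."""
    return [base + o for o in offsets]


rows = [(1, 1000), (2, 1003), (3, 1007)]
columns = to_columns(rows, ["id", "price"])
base, bits, offsets = for_encode(columns["price"])
print(base, bits, offsets)  # 1000 3 [0, 3, 7]
assert for_decode(base, offsets) == columns["price"]
```

In this example each price needs only 3 bits once packed, instead of a full machine integer, because the values in a single column tend to be similar to one another.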

Bulk loading of large amounts of data is supported through direct appends to stable storage, while small transactional updates are handled by patent-pending[19] Positional Delta Trees (PDTs),[20] specialized B-tree-like structures that index differences on top of stable storage. These differences are seamlessly merged ("patched") into scans and transparently propagated to stable storage by a background process. Because differences are kept in patch-like structures and stable storage is rewritten in bulk, the design can work on a filesystem such as HDFS, in which files are append-only.
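The following Python sketch illustrates the idea of differential updates under heavy simplification. Stable storage is treated as an immutable list, pending inserts, deletes and modifications are kept separately and merged into every scan, and a checkpoint rewrites stable storage in bulk. It is not a Positional Delta Tree. A real PDT is a B-tree-like structure that also maps current row positions to stable positions, whereas this hypothetical `PositionalDeltas` class addresses changes only by their stable position and keeps them in plain dictionaries.

```python
from typing import Dict, Iterator, List, Set


class PositionalDeltas:
    """Hypothetical, heavily simplified stand-in for a Positional Delta Tree.

    Changes are addressed by stable position (the tuple's index in stable
    storage) and kept in plain dicts/sets rather than a B-tree-like structure.
    Values that were inserted as deltas cannot themselves be modified or
    deleted in this sketch.
    """

    def __init__(self, stable: List[str]) -> None:
        self.stable: List[str] = list(stable)    # stable storage, never edited in place
        self.inserts: Dict[int, List[str]] = {}  # values to emit before a stable position
        self.deletes: Set[int] = set()           # stable positions hidden from scans
        self.modifies: Dict[int, str] = {}       # replacement values by stable position

    def _check(self, sid: int, allow_end: bool = False) -> None:
        limit = len(self.stable) + (1 if allow_end else 0)
        if not 0 <= sid < limit:
            raise IndexError(f"stable position {sid} out of range")

    def insert_before(self, sid: int, value: str) -> None:
        """Insert a value before stable position sid (sid == len(stable) appends)."""
        self._check(sid, allow_end=True)
        self.inserts.setdefault(sid, []).append(value)

    def delete(self, sid: int) -> None:
        self._check(sid)
        self.deletes.add(sid)
        self.modifies.pop(sid, None)  # a deleted tuple no longer needs its modification

    def modify(self, sid: int, value: str) -> None:
        self._check(sid)
        if sid not in self.deletes:
            self.modifies[sid] = value

    def scan(self) -> Iterator[str]:
        """Merge the differences into stable data on the fly ("patching" the scan)."""
        for sid, value in enumerate(self.stable):
            yield from self.inserts.get(sid, [])
            if sid in self.deletes:
                continue
            yield self.modifies.get(sid, value)
        yield from self.inserts.get(len(self.stable), [])  # appends after the last tuple

    def checkpoint(self) -> None:
        """Propagate the differences by writing a new stable copy in bulk."""
        self.stable = list(self.scan())
        self.inserts.clear()
        self.deletes.clear()
        self.modifies.clear()


store = PositionalDeltas(["a", "b", "c"])
store.modify(0, "A")
store.insert_before(1, "x")
store.delete(2)
store.insert_before(3, "z")  # 3 == len(stable), so this appends
print(list(store.scan()))    # ['A', 'x', 'b', 'z']
store.checkpoint()
print(store.stable)          # ['A', 'x', 'b', 'z']
```

Because `checkpoint` produces a complete new copy rather than editing stable data in place, the same approach fits append-only storage; in the real system this propagation runs as a background process.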

History

A comparative Transaction Processing Performance Council TPC-H performance test of MonetDB carried out by its original creator at Centrum Wiskunde & Informatica (CWI) in 2003 showed room for improvement in its performance as an analytical database. As a result, CWI researchers proposed a new architecture using pipelined query processing ("vectorised processing") to improve the performance of analytical queries. This led to the creation of the "X100" project, with the intention of designing a new kernel for MonetDB, to be called "MonetDB/X100".[21] [22]

The X100 project team won the 2007 DaMoN Best Paper Award for the paper "Vectorized Data Processing on the Cell Broadband Engine"[23] [24] as well as the 2008 DaMoN Best Paper Award for the paper "DSM vs. NSM: CPU Performance Tradeoffs in Block-Oriented Query Processing".[25] [26]

In August 2009, the originators of the X100 project won the "Ten Year Best Paper Award" at the 35th International Conference on Very Large Data Bases (VLDB) for their 1999 paper "Database architecture optimized for the new bottleneck: Memory access". The VLDB recognised that the project team had made great progress in implementing the ideas contained in the paper over the previous 10 years.[27] The central premise of the paper is that traditional relational database systems were designed in the late 1970s and early 1980s, when database performance was dictated by the time required to read data from and write data to hard disk. At that time, CPUs were relatively slow and main memory was relatively small, so very little data could be loaded into memory at once. Over time, hardware improved, with CPU speed and memory size doubling roughly every two years in accordance with Moore’s law, but the design of traditional relational database systems had not adapted. The CWI research team described improvements in database code and data structures to make the best use of modern hardware.[28]

In 2008 the X100 project was spun off from MonetDB as a separate project, with its own company, and renamed "VectorWise". Co-founders included Peter A. Boncz and Marcin Żukowski.[29] [30]

In June 2010, the VectorWise technology was officially announced by Ingres Corporation,[31] with the release of Ingres VectorWise 1.0.[32]

In March 2011, VectorWise 1.5 was released,[33] publishing a record-breaking result on the TPC-H 100 GB benchmark.[34] New features included parallel query execution (a single query executed on multiple CPU cores), improved bulk loading and enhanced SQL support. In June 2011, VectorWise 1.6 was released, publishing record-breaking results on the non-clustered TPC-H 100 GB,[35] 300 GB[36] and 1 TB[37] benchmarks.

In December 2011, VectorWise 2.0 was released,[38] adding SQL support for analytical functions such as rank and percentile, enhanced date, time and timestamp data types, and support for disk spilling in hash joins and aggregation.

In June 2012, VectorWise 2.5 was released.[39] In this release, the storage format was reorganized to allow the database to be stored in multiple locations, the background mechanism that propagates updates from PDTs to stable storage was enhanced to rewrite only the changed blocks instead of performing full rewrites, and a new patented[40] Predictive Buffer Manager (PBM) was introduced.[41]

In March 2013, VectorWise 3.0 was released.[42] New features included a more efficient storage engine, support for more data types and analytical SQL functions, enhanced DDL features, and more accessible monitoring and profiling.

In March 2014, Actian Vector 3.5 was released under a new, shortened name. New features included support for partitioned tables, improved disk spilling, online backup capabilities and improved SQL support, e.g. MERGE/UPSERT DML operations and the FIRST_VALUE and LAST_VALUE window aggregation functions.

In June 2014, at Hadoop Summit 2014 in San Jose, Actian announced Actian Vortex, a clustered MPP version of Vector with the same level of SQL support, running in Hadoop with storage directly in HDFS. Actian Vortex was later renamed Actian Vector in Hadoop, and non-clustered Actian Vector releases were updated to match. In March 2015, Actian Vector 4 was released, and Actian Vector in Hadoop 4 followed in December 2015.[43]

In March 2019, Actian Avalanche was released as a cloud data platform, with Vector as the core engine for the Warehouse offering.[44] In November 2023, Actian rebranded and relaunched Avalanche as Actian Data Platform, including new capabilities for Data Quality.[45]

Release history

Actian Vector

General availability | End of Enterprise Support | End of Extended Support | End of Obsolescence Support | Marquee features
October 22, 2024 | October 31, 2027 | October 31, 2029 | October 31, 2031 | Auto Partitioning, Developer SDK, Table Cloning, Advanced External Tables, Spark UDFs, REGEX Pattern Matching, ML Inference using TensorFlow[46]
December 2022 | December 31, 2025 | December 31, 2027 | December 31, 2029 | Automatic Log Rotation, Add UDF Engine Startup to Ingstart Utility, Exception Handling in Database Procedures, Extend Pattern-Matching Capabilities, Extend UDF visibility, Query Result Caching - Spill to Disk, Remote File System Support for Vector Non-MPP, Smart MinMax Index, Warm Standby[47]
November 2021 | November 30, 2024 | November 30, 2026 | November 30, 2028 | Automatic Partitioning, Query Result Caching, Spark Vector Connector 3.0, UUID support for the ODBC driver, Workload Management Enhancements, Scalar User-defined Functions Enhancements, Encryption Key Management[48]
June 2020 | June 30, 2023 | June 30, 2025 | June 30, 2027 | JSON Support, Scalar UDFs, Workload Management, Reverse Strings, Data at Rest Encryption Enhancements, Wildcards in File Names for COPY VWLOAD, Pivot Tables, External Table Enhancement[49]
May 2018 | September 30, 2021 | September 30, 2023 | September 30, 2025 | Function-based encryption, Column masking, External tables, MEDIAN and PERCENTILE_CONT aggregate functions, Support for Vector tables in database procedures, Alterable min-max index, Nullable unique keys[50]
May 2018 | June 30, 2021 | June 30, 2023 | June 30, 2025 | Vector 5.1 was made available in the Amazon AWS Marketplace and the Microsoft Azure Marketplace for deployment in the cloud
June 2016 | June 30, 2020 | June 30, 2022 | June 30, 2024 | UUID data type and functions, Clonedb utility, SQL syntax for parallel vwload and CSV export, Spark-Vector Connector enhancements, Distributed Write Ahead Log, Automatic histogram generation for statistics, SET SERVER_TRACE and SET SESSION_TRACE statements[51]
March 2015 | December 31, 2018 | December 31, 2020 | December 31, 2022 | Query level auditing (C2 security), Data at rest encryption, Statements MODIFY…TO COMBINE and MODIFY…TO RECONSTRUCT, CREATE/DROP STATISTICS statements, INTERSECT/EXCEPT set operators, CREATE TABLE IF NOT EXISTS statement, Aggregate Window Functions additions, Spark-Vector Connector and Loader, min-max indexes on a subset of columns[52]
March 2014 | March 31, 2017 | March 31, 2019 | March 31, 2021 | Partitioned tables, Parallel vwload, I/O performance improvements, Secondary Indexes, Incremental Backup, MERGE statement, FIRST_VALUE and LAST_VALUE functions, Declaration-only constraints[53]
April 2013 | April 15, 2016 | April 30, 2017 | Not Available | New Analytical Functions and SQL extensions (e.g. LAG, LEAD, ROLLUP, CUBE, etc.), New Data Types and Functions, Time zone support, Performance and Connectivity enhancements.[54] In SP1, key updates include IPV4 and IPV6 data types, New SQL functions, Disaster Recovery and High Availability improvements.[55]
June 2012 | June 1, 2015 | April 30, 2017 | Not Available |
November 2011 | November 2011 | April 30, 2017 | Not Available |

Actian Vector in Hadoop

General availability | End of Enterprise Support | End of Extended Support | End of Obsolescence Support | Marquee features
April 24, 2020 | April 30, 2023 | April 30, 2026 | Not Available | Same improvements as those made available in Vector 6.0[56]
November 2018 | November 30, 2021 | November 30, 2023 | Not Available | Support for HDFS Federation, Support for ADL Storage (Gen 1 and 2), Data Import/Export enhancements, Database Administration and Performance improvements, External Tables enhancement[57]
October 2018 | October 31, 2020 | October 31, 2022 | Not Available | Detection of Hadoop YARN resources at install time, Support for Apache Knox and Apache Ranger, Data Import/Export enhancements, Database Administration and Performance improvements, External Tables[58]
December 2015 | December 31, 2018 | December 31, 2020 | December 31, 2022 | Hadoop YARN integration, Data at rest encryption, Query level auditing (C2 security), Performance optimizations, Installer improvements, Support for 2048 columns, CSVEXPORT system command, Aggregate window functions additions[59]

In 2024, after discontinuing the marketing of the Actian Vector in Hadoop product line, Actian decided to withdraw its Obsolescence Support phase, making 6.0 the last release of Vector in Hadoop and leaving Actian Data Platform's Cloud Data Warehouse service as the only MPP implementation of Vector available.

Notes and References

  1. Web site: Vector 6.3 Delivers Easier Administration, Greater Automation and Better Productivity for Data Analytics. 9 December 2022. 2023-04-13.
  2. Web site: Vectorwise Enterprise. 3 May 2012. Actian Corporation.
  3. Web site: TPC-H - Top Ten Performance Results - Non-Clustered. 3 May 2012. Transaction Processing Performance Council.
  4. Vectorwise Smashes TPC-H Record at Scale Factor 100 Delivering 340% of Previous Best Record. Actian Corporation. 15 February 2011. 7 February 2016.
  5. Vectorwise Breaks 300GB and 1TB TPC-H Benchmark Records Hands Down. Actian Corporation. 4 May 2011. 7 February 2011.
  6. Web site: Actian Analytics Platform Outperforms All Others By 2X, Sets New Record In Latest TPC-H Benchmark. 20 Aug 2016. Actian Corporation.
  7. Web site: CWI spin-off company VectorWise sold to Ingres Corporation.
  8. News: Clarke, Gavin. Ingres' VectorWise rises to answer Microsoft. The Register. 2 February 2010.
  9. News: Babcock, Charles. Ingres Unveils VectorWise Database Engine. InformationWeek. 9 June 2010.
  10. News: Suleman, Khidr. Ingres launches VectorWise database engine. V3.co.uk. 8 June 2010.
  11. Book: Zukowski, Marcin; Boncz, Peter. From x100 to vectorwise. Proceedings of the 2012 international conference on Management of Data - SIGMOD '12. 2012. p. 861. doi:10.1145/2213836.2213967. ISBN 978-1-4503-1247-9. S2CID 9187072.
  12. Web site: Pssst: Want to Hear About Actian Vector 3.5?. 2016-05-04.
  13. Web site: Vector(wise) goes Hadoop.
  14. Web site: Peter Boncz - Actian Vector on Hadoop: The First Industrial-strength DBMS to Truly Leverage Hadoop. YouTube.
  15. Żukowski, Marcin. Balancing vectorized query execution with bandwidth-optimized storage. Universiteit van Amsterdam. 11 September 2009. 7 February 2016.
  16. Héman, Sandor. Updating Compressed Column Stores. Vrije Universiteit Amsterdam. 2015. 7 February 2016.
  17. Inkster, Doug; Żukowski, Marcin; Boncz, Peter. Integration of VectorWise with Ingres. SIGMOD Record. 40 (3): 45–53. September 2011. 7 February 2016. doi:10.1145/2070736.2070747. hdl:1871/33100. S2CID 6372175.
  18. Zukowski, Marcin; Boncz, Peter. Vectorwise: Beyond Column Stores. IEEE Data Engineering Bulletin. 35 (1): 21–27. March 2012. 4 May 2012.
  19. US patent application 20100235335. Column-store database architecture utilizing positional delta tree update system and methods. 2010-09-16. 2010-03-08. 2009-03-11. Sandor ABC Heman, Peter A. Boncz, Marcin Zukowski, Nicolaas J. Nes.
  20. Héman, Sándor; Żukowski, Marcin; Nes, Niels; Sidirourgos, Lefteris; Boncz, Peter. Positional update handling in column stores. SIGMOD Conference 2010. pp. 543–554.
  21. Web site: Homepage of Peter Boncz. 7 February 2016.
  22. Web site: Faster database technology with MonetDB/X100. 4 May 2012. CWI Amsterdam.
  23. Héman, S.; Nes, N.J.; Zukowski, M.; Boncz, P.A. Vectorized Data Processing on the Cell Broadband Engine. Universiteit van Amsterdam. 2007. 4 May 2012.
  24. Web site: Third International Workshop on Data Management on New Hardware (DaMoN 2007). 4 May 2012. Carnegie Mellon’s School of Computer Science (SCS).
  25. Book: Zukowski, Marcin; Nes, Niels; Boncz, Peter. DSM vs. NSM. Proceedings of the 4th international workshop on Data management on new hardware - DaMoN '08. 2008. p. 47. doi:10.1145/1457150.1457160. ISBN 9781605581842. S2CID 11946467.
  26. Web site: Fourth International Workshop on Data Management on New Hardware (DaMoN 2008). 4 May 2012.
  27. Web site: 10-year Best Paper Award – VLDB 2009. 4 May 2012.
  28. Book: Boncz, Peter; Manegold, Stefan; Kersten, Martin L. Database architecture optimized for the new bottleneck: Memory access. Proceedings of the 25th International Conference on Very Large Data Bases. 15 June 1999. pp. 54–65. ISBN 1-55860-615-7. 11 December 2013.
  29. Web site: Goodbye VectorWise, farewell ParAccel?. DBMS2. Curt Monash. 25 April 2013. 11 December 2013.
  30. Web site: Peter Boncz. Staff web page. CWI. 11 December 2013.
  31. News: Clark. Don. Database-Software Firm Tries 'Action Apps'. The Wall Street Journal. 22 September 2011.
  32. Web site: Ingres Vectorwise 1.0. 7 February 2016.
  33. Web site: An early look at Actian VectorWise 1.5.
  34. Web site: TPC-H SF100 Vectorwise 1.5.
  35. Web site: TPC-H SF100 Vectorwise 1.6.
  36. Web site: TPC-H SF300 Vectorwise 1.6.
  37. Web site: TPC-H SF1000 Vectorwise 1.6.
  38. Web site: An even faster VectorWise.
  39. Web site: Actian Releases Vectorwise 2.5 – Record-Breaking Database Is Now Even Faster.
  40. US patent 8825959 B1. Method and apparatus for using data access time prediction for improving data buffering policies. 2014-09-02. 2012-07-31. Michal Switakowski, Peter Boncz, Marcin Zukowski.
  41. Świtakowski, Michał; Boncz, Peter; Żukowski, Marcin. From Cooperative Scans to Predictive Buffer Management. VLDB 2012. Proceedings of the VLDB Endowment. 5 (12): 1759–1770. August 2012. 7 February 2016. doi:10.14778/2367502.2367515. arXiv:1208.4170. Bibcode:2012arXiv1208.4170S. S2CID 17184937.
  42. Web site: Actian Announces Availability of Vectorwise 3.0 for Getting Fast Answers from Big Data.
  43. Web site: Lifecycle Dates - Actian Vector and Vector in Hadoop.
  44. Web site: Actian Avalanche Real-Time Connected Data Warehouse adds integration.
  45. Web site: Actian Data Platform Relaunches With Integration as a Service.
  46. Web site: Actian Vector - New Features in Version 7.0. 2024-11-01. Actian Documentation.
  47. Web site: Actian Vector - New Features in Version 6.3. 2024-11-01. Actian Documentation.
  48. Web site: Actian Vector - New Features in Version 6.2. 2024-11-01. Actian Documentation.
  49. Web site: Actian Vector - New Features in Version 6.0. 2024-11-01. Actian Documentation.
  50. Web site: Actian Vector - New Features in Version 5.1. 2024-11-01. Actian Documentation.
  51. Web site: Actian Vector - New Features in Version 5.0. 2024-11-01. Actian Documentation.
  52. Web site: Actian Vector - New Features in Version 4.2. 2024-11-01. Actian Documentation.
  53. Web site: Actian Vector - New Features in Version 3.5. 2024-11-01. Actian Documentation.
  54. Web site: Actian Vector - New Features in Version 3.0. 2024-11-01. Actian Documentation.
  55. Web site: Actian Vector - New Features in Version 3.0 SP1. 2024-11-01. Actian Documentation.
  56. Web site: Actian Vector in Hadoop - New Features in Version 6.0. 2024-11-01. Actian Documentation.
  57. Web site: Actian Vector in Hadoop - New Features in Version 5.1. 2024-11-01. Actian Documentation.
  58. Web site: Actian Vector in Hadoop - New Features in Version 5.0. 2024-11-01. Actian Documentation.
  59. Web site: Actian Vector in Hadoop - New Features in Version 4.2. 2024-11-01. Actian Documentation.