History of Vertica

The Vertica project, which grew out of the C-Store research project, was launched in 2005 by Michael Stonebraker and Andrew Palmer. The main idea behind this Big Data analytics warehouse was a simple and effective MPP (massively parallel processing) architecture that combines real-time loading of massive data volumes with efficient ad hoc queries over large data sets, without strict requirements for server administration or manual query tuning. To achieve this, Vertica implements several architectural innovations that address the requirements placed on such systems.
2005 - Vertica was founded
2011 - Platform 5 released: tighter integration with Hadoop and column-oriented storage organization
2011 - Vertica was acquired by HP
2013 - Platform 6 (Bulldozer) released: enhanced functionality and updated components and drivers
late 2013 - Platform 7 (Crane) released: enhanced capability to handle semi-structured data at Open-SQL level with support of hot and cold data; enhanced functionality
2015 - Platform 7.2 (Excavator) released
2016 - Platform 7.3 released
2016 - Platform 8 (FrontLoader) released
2017 - rights to Vertica transferred to Micro Focus
Why Big Data?
Traditional DBMSs face several problems when dealing with large data volumes:
- Speed and volume of data loading
- Volume and duration of data storage
- Storage and analysis of arbitrary data
- Speed and volume of analysis
- Constant trade-off between business needs and what can be implemented
What is Big Data?
Big Data refers to data sets that are so large or complex that traditional data processing software cannot handle them adequately.
DWH requirements
- Controlled scalability
- 24/7/365 fault tolerance
- Concurrent real time data loading
- Zero administration
- Automatic performance control during ad hoc query execution
- Effective data compression
- Provision of development and testing areas
Controlled scalability
The final volume of a data warehouse cannot be determined in advance: it grows progressively as new data sources are connected and the scope of imported data widens. It is therefore important to be able to raise performance and expand storage continuously by adding new servers, without extra costs for licensing or for rebuilding the warehouse. These requirements restrict the choice to MPP systems and rule out options such as SMP servers. Vertica allows adding processing power and increasing total disk volume as required.
* Vertica licensing is based on raw data size.
24/7/365 Fault Tolerance
A data warehouse should have maximum protection against critical failures and minimal downtime for maintenance and for software and hardware upgrades. Continuous availability of real-time data is a must, because warehouse clients are users and other systems with specific data availability requirements. The vast majority of Vertica maintenance operations do not require service interruption. Shutting down one or more servers in a cluster results only in some loss of performance.
Concurrent Real Time Data Loading
Warehouse data comes from information generated by other systems or equipment across multiple sources. Continuous collection and loading of this information is required to keep the data up to date. Given the large volumes of incoming data, loads must be able to run concurrently, because queuing them one after another is not always possible.
Zero administration
New projects bring new requirements for the warehouse. Outsourcing warehouse enhancements is inefficient because handing tasks over to an external contractor is time-consuming (terms of reference, approvals, design, testing, acceptance, etc.). The data warehouse must therefore run largely on its own, simply and effectively, so that routine maintenance, enhancements and optimization can be done in-house. Vertica server administration comes down to organizing users' connection pools, which make it possible to control resource consumption per session, and to regular audits for warehouse optimization, which run automatically.
Efficiency Management
The data warehouse supplies data to other systems and BI tools. The full range of queries from these systems cannot be identified in advance, so the system cannot be tuned for them beforehand. Therefore, during data model design the warehouse must allow defining the most frequently used parts of the data and how they are stored and sorted, so that ad hoc queries are optimized automatically, without manual tuning of incoming queries. In Vertica, data handling is optimized through careful storage design, achieved by controllable segmentation, partitioning and sorting of data. Projections let you describe duplicate data structures that contain the appropriate table fields with their own segmentation, sorting and, if required, grouping, so that they are stored in one block. This considerably accelerates query execution.
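The effect of a segmented, sorted projection can be illustrated with a toy Python sketch. It is not Vertica code: the helper names `build_projection` and `range_query`, the sample rows and the two-node layout are all assumptions made for illustration. The sketch copies the table rows into per-node segments, sorts each segment by a chosen column, and then answers a range query with binary search instead of a full scan.

```python
from bisect import bisect_left, bisect_right

# Toy base table: (customer_id, day, amount). The sample data is invented.
ROWS = [
    (3, "2017-01-02", 40.0),
    (1, "2017-01-01", 10.0),
    (2, "2017-01-03", 25.0),
    (1, "2017-01-03", 5.0),
    (3, "2017-01-01", 15.0),
    (2, "2017-01-01", 30.0),
]


def build_projection(rows, sort_col, segment_col, node_count):
    """Copy rows into per-node segments, each sorted by sort_col."""
    segments = [[] for _ in range(node_count)]
    for row in rows:
        # Segmentation: the value of segment_col decides the owning node.
        segments[hash(row[segment_col]) % node_count].append(row)
    projection = []
    for segment in segments:
        segment.sort(key=lambda r: r[sort_col])
        # Keep the sort keys separately so lookups can use binary search.
        projection.append({"keys": [r[sort_col] for r in segment],
                           "rows": segment})
    return projection


def range_query(projection, low, high):
    """Return rows whose sort key lies in [low, high], node by node."""
    result = []
    for segment in projection:
        start = bisect_left(segment["keys"], low)
        stop = bisect_right(segment["keys"], high)
        result.extend(segment["rows"][start:stop])
    return result


# Hypothetical projection: segmented by customer_id, sorted by day.
by_day = build_projection(ROWS, sort_col=1, segment_col=0, node_count=2)
first_day = range_query(by_day, "2017-01-01", "2017-01-01")
```

Here `first_day` holds the three rows dated 2017-01-01. Each node contributes only the slice of its segment that matches, which is the reason a sort order chosen at design time can speed up ad hoc queries without tuning the queries themselves.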
Organization of Development Areas
Fault tolerance and high availability requirements rule out deploying development and testing areas for new functionality on a production server. Separate hardware is required for such work. Vertica allows deploying test and development areas at no additional cost under the same license.
Vertica architecture
Write-Optimized Store – WOS
Read-Optimized Store – ROS
Tuple Mover – TM (data optimizer)
- query processing 50 to 1,000 times faster than in traditional row-oriented systems
- up to 10x faster data loading
- easy to install and use
- high scalability and massively parallel processing
- industry-standard x86 platform
- Hybrid in-memory/on-disk architecture
- Storage of data near CPU
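The interplay of the three components listed above can be sketched with a toy Python model. This is an illustration under stated assumptions, not Vertica's implementation: the class `HybridStore` and its methods `insert`, `moveout` and `scan` are hypothetical names, and "disk" is simulated with in-memory lists. Writes land in an unsorted row buffer (the WOS role), a Tuple Mover step sorts them and rewrites them column by column (the ROS role), and reads combine both stores.

```python
class HybridStore:
    """Toy model of a write-optimized buffer plus read-optimized containers."""

    def __init__(self, columns, sort_col):
        self.columns = list(columns)
        self.sort_col = sort_col
        self.wos = []  # row-oriented and unsorted: cheap to append to
        self.ros = []  # containers sorted by sort_col, stored per column

    def insert(self, row):
        """Loads append to the WOS without sorting or reorganizing data."""
        self.wos.append(tuple(row))

    def moveout(self):
        """Tuple Mover step: sort buffered rows and store them as columns."""
        if not self.wos:
            return
        idx = self.columns.index(self.sort_col)
        batch = sorted(self.wos, key=lambda r: r[idx])
        container = {name: [r[i] for r in batch]
                     for i, name in enumerate(self.columns)}
        self.ros.append(container)
        self.wos = []

    def scan(self, column):
        """Queries see ROS containers plus rows still waiting in the WOS."""
        i = self.columns.index(column)
        values = []
        for container in self.ros:
            values.extend(container[column])  # reads only the needed column
        values.extend(r[i] for r in self.wos)
        return values


store = HybridStore(["id", "amount"], sort_col="id")
store.insert((2, 20))
store.insert((1, 10))
store.moveout()        # rows are now sorted by id and stored per column
store.insert((3, 30))  # still in the write buffer
amounts = store.scan("amount")  # [10, 20, 30]
```

The point of the split is that loading never waits for sorting, while most reads hit data that is already sorted and laid out by column.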
Data compression
- Optimization of repeated data storage
- Over 12 compression schemes
- Data-driven choice
- System-chosen compression
- 50% – 90% compression on average
- Queries are executed internally on data in compressed (encoded) form
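Two of the points above, compact storage of repeated values and query execution on encoded data, can be illustrated with run-length encoding. The source does not list the compression schemes, so run-length encoding is chosen here only as a familiar example; the functions `rle_encode` and `count_equal` and the sample column are assumptions for illustration.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def count_equal(encoded, target):
    """Count matching rows directly from the runs, without decoding."""
    return sum(length for value, length in encoded if value == target)


# Sorting the column first puts equal values next to each other,
# so the whole column collapses into a few runs.
column = sorted(["NY", "LA", "NY", "SF", "LA", "NY", "NY"])
encoded = rle_encode(column)       # [("LA", 2), ("NY", 4), ("SF", 1)]
ny_rows = count_equal(encoded, "NY")  # 4
```

Seven stored values shrink to three pairs, and the count is computed from the pairs alone. The gain depends on the data: a column with few distinct values and a favorable sort order compresses well, while a column of unique values would not benefit from this particular scheme.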
Vertica Flex Zone is a special area for storing and processing semi-structured data.
Vertica allows creating flex tables and loading data into them from CSV and JSON files.
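The idea of loading records without a fixed schema can be sketched in plain Python. This is a conceptual illustration only, not the Vertica flex table API: the functions `load_json_lines`, `load_csv` and `select` and the sample data are hypothetical. Each record keeps whatever keys it arrived with, and a query simply gets an empty value for keys a record does not have.

```python
import csv
import io
import json


def load_json_lines(text):
    """Parse one JSON object per line; each record keeps its own keys."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_csv(text):
    """Parse CSV with a header row into one dict per record."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def select(records, *keys):
    """Project the requested keys; keys missing from a record give None."""
    return [tuple(record.get(key) for key in keys) for record in records]


# Invented sample data: the two JSON records do not share the same keys.
JSON_DATA = '{"user": "ann", "clicks": 3}\n{"user": "bob", "country": "DE"}\n'
CSV_DATA = "user,clicks\ncid,7\n"

flex_rows = load_json_lines(JSON_DATA) + load_csv(CSV_DATA)
result = select(flex_rows, "user", "clicks")
# [("ann", 3), ("bob", None), ("cid", "7")]
```

Note that the CSV value stays the string "7" because CSV carries no type information, whereas JSON preserves the number 3. A real system has to decide when and how such values are cast to typed columns.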
Column-oriented storage
Vertica optimization
Administration
Vertica Advantages
Key success factors