+7 499 390-88-92
BIG DATA
laboratory
Vertica

History of Vertica

The Vertica project (previously known as C-Store) was launched in 2005 by Michael Stonebraker and Andrew Palmer. The goal of this Big Data analytics warehouse was a simple, effective MPP architecture that supports massive real-time data loading alongside efficient ad hoc queries over large data volumes, without heavy demands on server administration or manual query tuning. To achieve this, Vertica implements several architectural innovations designed to meet all of these requirements.
2005 - Vertica was founded
2011 - Platform 5 released: tighter integration with Hadoop and column-oriented storage organization
2011 - Vertica was acquired by HP
2013 - Platform 6 (Bulldozer) released: enhanced functionality and updated components and drivers
late 2013 - Platform 7 (Crane) released: enhanced capability to handle semi-structured data at Open-SQL level with support of hot and cold data; enhanced functionality
2015 - Platform 7.2 (Excavator) released
2016 - Platform 7.3 released
2016 - Platform 8 (FrontLoader) released
2017 - rights to Vertica transferred to Micro Focus

Why Big Data?

Traditional DBMSs face several problems when dealing with large data volumes:
  • Speed and volume of data loading
  • Volume and retention period of stored data
  • Storage and analysis of arbitrary data
  • Speed and volume of analysis
  • The constant trade-off between business needs and feasibility of implementation

What is Big Data?

Big Data refers to data sets so large or complex that traditional data processing software cannot handle them adequately.

DWH requirements

  • Controlled scalability
  • 24/7/365 fault tolerance
  • Concurrent real time data loading
  • Zero administration
  • Automatic performance control during ad hoc query execution
  • Effective data compression
  • Dedicated development and testing environments

Controlled scalability

The final volume of a data warehouse cannot be defined in advance: it grows progressively as new data sources are connected and the scope of imported data widens. It is therefore important to be able to raise performance and expand storage continuously by adding new servers, without extra costs for licensing or rebuilding the warehouse. These requirements narrow the choice to MPP systems and rule out options such as SMP servers. Vertica allows processing power and total disk capacity to be added as required.

* Vertica licensing is based on raw data size.

24/7/365 Fault Tolerance

A data warehouse should have maximum protection against critical failures and minimal downtime for maintenance and for software or hardware upgrades. Continuous availability of real-time data is a must, because the warehouse's clients are users and other systems with specific data availability requirements. The vast majority of Vertica maintenance operations do not require a service interruption. Shutting down one or more servers in a cluster results only in some loss of performance.

Concurrent Real Time Data Loading

Warehouse data comes from information generated by other systems and equipment across multiple sources. Continuous collection and loading of this information is required to keep the data up to date. Given the large volumes of incoming data, the loading process must run concurrently, because queueing loads one after another is not always possible.
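
As a rough illustration of concurrent rather than queued loading, the sketch below loads batches from several sources in parallel threads. The `load_batch` and `load_concurrently` functions, the in-memory `warehouse` list and the source names are hypothetical stand-ins chosen for this example; they are not Vertica APIs.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for a warehouse table.
warehouse = []
warehouse_lock = threading.Lock()

def load_batch(source, rows):
    """Append one batch from one source; the lock keeps each batch atomic."""
    with warehouse_lock:
        warehouse.extend((source, row) for row in rows)
    return len(rows)

def load_concurrently(batches, max_workers=4):
    """Submit every batch to a thread pool instead of loading them one by one."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(load_batch, src, rows) for src, rows in batches.items()]
        return sum(f.result() for f in futures)

# Assumed example sources; real sources would be other systems or equipment.
batches = {
    "billing": [1, 2, 3],
    "sensors": [10, 20],
    "weblogs": [100, 200, 300, 400],
}
loaded = load_concurrently(batches)
```

The order in which batches land in `warehouse` is not deterministic, which is the point: no source has to wait for another to finish.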

Zero administration

New projects bring new requirements for the warehouse. Outsourcing warehouse enhancements is inefficient, because handing tasks to an external contractor is time-consuming (terms of reference, approvals, design, testing, acceptance, etc.). The data warehouse must therefore run largely automatically and be simple enough that routine maintenance, enhancements and optimization can be done in-house. Vertica server administration comes down to organizing user connection pools, which makes it possible to control resource consumption per session, and to running regular (automatic) audits for the purpose of warehouse optimization.

Efficiency Management

A data warehouse supplies data to other systems and BI tools. The full range of queries from these systems cannot be identified in advance, so the system cannot be tuned for them beforehand. Instead, the warehouse must let designers define, at the data modeling stage, which parts of the data are accessed most often and how they are stored and sorted, so that ad hoc queries are optimized automatically without manual tuning of each incoming query. In Vertica, optimization relies on careful storage design through controlled segmentation, partitioning and sorting of data. Projections let you describe duplicate data structures that hold selected table columns with their own segmentation, sort order and, if required, grouping of columns to be stored in one block. This considerably accelerates query execution.
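
The idea of a projection with its own sort order and segmentation can be sketched in a few lines. This is a toy model under stated assumptions, not Vertica's implementation: the sample `rows`, the helper names `build_projection`, `segment` and `range_query`, and the use of a CRC32 hash for segmentation are all chosen only for illustration.

```python
import bisect
import zlib

# Assumed sample table; column names are hypothetical.
rows = [
    {"id": 1, "region": "EU", "amount": 30},
    {"id": 2, "region": "US", "amount": 10},
    {"id": 3, "region": "EU", "amount": 50},
    {"id": 4, "region": "AS", "amount": 20},
    {"id": 5, "region": "US", "amount": 40},
]

def build_projection(rows, columns, sort_key):
    """Keep only `columns` and store them sorted by `sort_key`."""
    projected = [tuple(r[c] for c in columns) for r in rows]
    idx = columns.index(sort_key)
    return sorted(projected, key=lambda t: t[idx])

def segment(rows, key, node_count):
    """Assign each row to a node by hashing the segmentation key."""
    nodes = [[] for _ in range(node_count)]
    for r in rows:
        h = zlib.crc32(str(r[key]).encode())
        nodes[h % node_count].append(r)
    return nodes

def range_query(projection, low, high):
    """Return rows whose first column lies in [low, high].

    Assumes the projection is sorted on its first column, so the
    matching rows form one contiguous slice found by binary search.
    The key list is rebuilt here only to keep the sketch short.
    """
    keys = [t[0] for t in projection]
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return projection[start:end]

by_amount = build_projection(rows, ["amount", "region"], sort_key="amount")
hits = range_query(by_amount, 20, 40)
nodes = segment(rows, "id", node_count=3)
```

Because `by_amount` is stored in sorted order, the range query touches only the rows it returns; the same table could carry a second projection sorted by `region` for a different query pattern.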

Organization of Development Areas

Fault tolerance and high availability requirements rule out deploying development and testing environments for new functionality on a production server; separate hardware is required for such work. Vertica allows test and development environments to be deployed at no additional cost under the same license.

Vertica architecture

Write-Optimized Store – WOS

Read-Optimized Store – ROS

Tuple Mover – TM (data optimizer)

  • 50 to 1,000 times faster query processing than traditional row-oriented systems
  • up to 10 times faster data loading
  • easy to install and use
  • high scalability and massively parallel processing
  • industry-standard x86 platform
  • hybrid in-memory/on-disk architecture
  • storage of data near the CPU
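
The interplay of the WOS, ROS and Tuple Mover named above can be sketched with a toy model: writes are buffered unsorted, then moved into sorted containers, which are later merged. The `ToyStore` class and its method names are assumptions made for this illustration and say nothing about how Vertica implements these components internally.

```python
import heapq

class ToyStore:
    """Toy model: unsorted WOS buffer, sorted ROS containers, tuple mover steps."""

    def __init__(self):
        self.wos = []  # write-optimized: rows kept in arrival order
        self.ros = []  # read-optimized: list of sorted containers

    def insert(self, row):
        """Writes are cheap: just append to the buffer."""
        self.wos.append(row)

    def moveout(self):
        """Tuple mover, step 1: sort buffered rows into a new ROS container."""
        if self.wos:
            self.ros.append(sorted(self.wos))
            self.wos = []

    def mergeout(self):
        """Tuple mover, step 2: merge all ROS containers into one."""
        if len(self.ros) > 1:
            self.ros = [list(heapq.merge(*self.ros))]

    def scan(self):
        """A query sees rows from both stores."""
        return sorted(row for part in self.ros + [self.wos] for row in part)

store = ToyStore()
for value in [5, 1, 4]:
    store.insert(value)
store.moveout()
for value in [3, 2]:
    store.insert(value)
store.moveout()
store.insert(6)          # still in the WOS buffer
visible = store.scan()   # includes the row that has not been moved yet
store.mergeout()         # two sorted containers become one
```

Separating the two stores lets loading stay fast while queries still read mostly sorted data.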

Data compression
  • Optimized storage of repeated values
  • Over 12 compression schemes
  • Scheme chosen according to the data
  • Compression selected automatically by the system
  • 50% – 90% compression on average
  • Queries run internally on compressed (encoded) data
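
To make the last point concrete, the sketch below uses run-length encoding, picked here as one simple example of a compression scheme, and answers a count query directly on the encoded runs without decoding them. The function names and the sample column are assumptions for illustration only.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(v, sum(1 for _ in grp)) for v, grp in groupby(values)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [v for v, n in runs for _ in range(n)]

def count_equal(runs, target):
    """Count matching rows by summing run lengths; no decoding needed."""
    return sum(n for v, n in runs if v == target)

# Assumed sample column, already sorted so that repeats are adjacent.
column = ["AS", "AS", "EU", "EU", "EU", "EU", "US", "US", "US"]
runs = rle_encode(column)
eu_count = count_equal(runs, "EU")
```

Sorting a column before encoding is what makes this scheme effective: nine stored values shrink to three runs, and the count touches three pairs instead of nine rows.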
Vertica Flex Zone is a special area for storing and processing unstructured data.
Vertica allows you to create flex tables and load data into them from CSV and JSON files.
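
The schema-on-read idea behind flex tables can be sketched as follows: records from JSON and CSV are loaded without declaring columns first, and columns are resolved only when queried. The functions `load_json_lines`, `load_csv` and `select` and the sample data are hypothetical and written for this example; they do not reproduce Vertica's flex table interface.

```python
import csv
import io
import json

def load_json_lines(text):
    """Load one JSON object per line; each record may have different keys."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def load_csv(text):
    """Load CSV rows as dicts keyed by the header line (values stay strings)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def select(table, *columns):
    """Resolve columns at read time; keys a record lacks come back as None."""
    return [tuple(rec.get(c) for c in columns) for rec in table]

# Assumed sample inputs with deliberately inconsistent fields.
json_text = '{"user": "ann", "clicks": 3}\n{"user": "bob", "device": "phone"}\n'
csv_text = "user,clicks\ncid,7\n"

flex_table = load_json_lines(json_text) + load_csv(csv_text)
result = select(flex_table, "user", "clicks")
```

Note that the CSV value arrives as the string `"7"` while the JSON value is the integer `3`; a real system would need type coercion at query time, which this sketch leaves out.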

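Column-oriented storage

[Figure: column-oriented storage]

Vertica optimization

[Figure: Vertica optimization]

Administration

[Figure: administration]

Vertica Advantages

[Figure: Vertica advantages]

Key success factors

[Figure: key success factors]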


We are the only company in Russia holding the Accredited Solutions Expert Vertica Big Data Solutions Administrator certification. Among Russian companies accredited by Vertica, we have the most extensive practical experience with the platform.