Platformcloud9.com

Ingres could be steaming with VectorWise

Database vendor Ingres may just have come up with the application that opens up Cloud-based services to a far wider set of users. A claimed performance improvement of up to 70x in analysing data is likely to be a serious attraction all by itself. It could also have set rolling a small stone that snowballs into a ‘killer application’, one that makes moving to parallel processing a very sensible option for the near future.

The company, which to be fair has had a somewhat chequered recent history, last year took its open source database and joined it with a processing engine provided by VectorWise, a commercial off-shoot of the CWI Research Institute in Amsterdam. This uses a new vectorized data processing architecture that allows full utilization of the computational power of modern CPUs, exploiting features like CPU caches, super-scalar execution pipelines and SIMD instructions. Much of this work has been done in close collaboration with Intel.
 
The result has already started to create waves, particularly around the early claims for its startling performance. It is said to perform many analytical tasks up to 70 times faster than currently available relational database systems. Speed, however, is of limited help if achieving it requires specialised hardware or complex new software that has to be learned and understood. This combination, known as Ingres VectorWise, avoids that trap by running standard SQL on commodity 64-bit x86 servers.
 
This combination should prove attractive to a wide range of users. Those running large, compute-hungry on-premise applications such as financial risk analysis will certainly have a use for it. But so will the growing number of cloud service providers, where the performance advantages could improve service throughput.
 
Still in Beta, Ingres VectorWise got its first UK showing this week at the Ingres User Association meeting in London, where it was demonstrated using Amazon EC2 with Elastic Block Store. It ran the same benchmark, on the same Amazon instance, as an Oracle database, the only difference being that the Oracle test was set up to query 50 million rows, while VectorWise was set to query 100 million rows.
 
According to Emma McGrattan, Senior VP of Engineering at Ingres, the Oracle system came back with a response in just under one minute. By contrast, VectorWise came back in just over one second.
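
As a rough back-of-envelope check on those demo figures, the sketch below converts them into rows per second. The timings are approximations: "just under one minute" is rounded to 60 seconds and "just over one second" to 1 second, so the result is indicative only and is not a figure quoted by Ingres.

```python
# Approximate figures taken from the demo description; the rounding is an assumption.
ORACLE_ROWS, ORACLE_SECONDS = 50_000_000, 60.0      # "just under one minute"
VECTORWISE_ROWS, VECTORWISE_SECONDS = 100_000_000, 1.0  # "just over one second"


def rows_per_second(rows: int, seconds: float) -> float:
    """Scan throughput implied by a row count and an elapsed time."""
    return rows / seconds


oracle_rate = rows_per_second(ORACLE_ROWS, ORACLE_SECONDS)
vectorwise_rate = rows_per_second(VECTORWISE_ROWS, VECTORWISE_SECONDS)

# Ratio of throughputs, which accounts for VectorWise scanning twice as many rows.
throughput_ratio = vectorwise_rate / oracle_rate

print(f"Oracle:     {oracle_rate:,.0f} rows/s")
print(f"VectorWise: {vectorwise_rate:,.0f} rows/s")
print(f"Ratio:      about {throughput_ratio:.0f}x")
```

Because the VectorWise run covered twice as many rows in roughly a sixtieth of the time, the implied throughput gap on this one benchmark is larger than the ratio of the elapsed times alone.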
 
VectorWise is specifically geared for business intelligence, data warehousing and analytic workloads, so McGrattan is confident it will work well as the basis of a managed service or for a SaaS provider. The performance improvement should allow service providers to serve a larger number of customers. As yet, the company has not announced partnerships with any cloud vendors. “But you can imagine that the next steps will be to interest them in offering it as part of their own offerings,” McGrattan said.
 
At this early stage, the current release does not support multi-tenancy, so Ingres recommends a separate instance for every application. But the database and processing engine run in Linux on any standard 64-bit x86 processor from Intel or AMD, and these are capable of running increasingly large numbers of virtual servers. These are the types of server now being specified both by service providers and by on-premise users as they consolidate existing server farms.
 
“Many customers are also concerned about the utilisation of their existing servers,” she said. “As customers have been looking to trial VectorWise they have been setting up virtual machines on existing servers and ring-fencing them. One customer set up a trial on a three-year-old server that wasn’t being used. So it doesn’t take specialist equipment to run it, as long as it has a 64-bit chip.”
 
Applications are written as SQL queries, and VectorWise supports the standard interfaces such as JDBC, ODBC and Microsoft .NET. So a wide range of existing applications should port directly. “So long as standard SQL is used, porting applications is somewhat trivial,” she said. “Where we do have challenges is where someone is using proprietary technology such as PL/SQL where there might be a need for some rewriting. But where there is a standard reporting tool such as Business Objects it is just a case of defining a different JDBC data source.”
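
The porting argument can be illustrated with a minimal sketch. It is not VectorWise-specific: it uses Python's built-in `sqlite3` module purely as a stand-in data source, and the `sales` table, the `open_data_source` helper and the `revenue_by_region` report are hypothetical names invented for the example. The point is only that a report written in standard SQL depends on the connection it is handed, so pointing it at another database means changing how the connection is opened, not the query.

```python
import sqlite3


def open_data_source() -> sqlite3.Connection:
    """Open the data source. SQLite is a stand-in here (an assumption);
    a real deployment would open a JDBC or ODBC connection instead."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region VARCHAR(20), amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales (region, amount) VALUES (?, ?)",
        [("north", 10), ("south", 5), ("north", 7)],
    )
    return conn


def revenue_by_region(conn) -> list:
    """Hypothetical report using only standard SQL, so it does not care
    which database sits behind the connection."""
    cur = conn.cursor()
    cur.execute(
        "SELECT region, SUM(amount) FROM sales "
        "GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


rows = revenue_by_region(open_data_source())
print(rows)
```

Code that leans on vendor-specific procedural extensions, such as the PL/SQL case McGrattan mentions, would not carry over this cleanly.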
 
As a result, the system should not require any specific training, and there is a large installed base of existing user skills available to exploit it. This should make it particularly attractive to smaller specialists such as systems integrators, particularly as they consider a move to Cloud-based service delivery. Most of their existing applications should port directly, and be available to run as a delivered service.
 
The move to vector-based processing also offers the opportunity to move a wide range of applications on to parallel processing architectures as that technology starts to develop. Collaborator Intel is already looking seriously at the potential of parallel processing, both with its current multicore processors and with so-called ‘manycore’ devices such as graphics processors, which are already being used for processing large, business-oriented datasets. These have a strong vector processing capability, and this work by Intel suggests that VectorWise could become something of a ‘killer’ application for upcoming parallel architectures.
 
“You’re absolutely right,” McGrattan said. “It is not something we are working on currently. The focus has been on getting the product through to GA for today’s demo. But it is something that would make a lot of sense for collaboration on in the future. It’s not officially on the road map yet, though further parallelisation is.”
 
So for now Ingres is just following the road map set out by Intel for the Xeon product line. But Ingres has put down a clear marker that Ingres VectorWise is being lined up to exploit the processing power possible with a move to parallel architectures, offering users a clear and direct transition vehicle for many of their core business applications.
 
Technically, Ingres VectorWise uses a column-based storage approach in which the data set is dynamically divided up into vectors. Vectorized execution passes multiple tuples (single data records) between relational operators as vectors of values, rather than processing one tuple at a time. The VectorWise team has developed a suite of vectorized versions of the relational operators, such as vectorized selection, projection, join, sort and the rest.
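
The difference between the two execution styles can be sketched in a few lines. This is an illustration of the general idea only, not VectorWise's implementation: the operator names, the toy query and the deliberately tiny `VECTOR_SIZE` are all assumptions made for the example.

```python
from typing import Callable, Iterable, Iterator, List

VECTOR_SIZE = 4  # hypothetical and tiny, so the example is easy to follow


def scan(column: List[int], vector_size: int = VECTOR_SIZE) -> Iterator[List[int]]:
    """Read one column and hand it on as consecutive vectors of values."""
    for start in range(0, len(column), vector_size):
        yield column[start:start + vector_size]


def select(vectors: Iterable[List[int]],
           predicate: Callable[[int], bool]) -> Iterator[List[int]]:
    """Vectorized selection: each call filters a whole vector."""
    for vec in vectors:
        kept = [v for v in vec if predicate(v)]
        if kept:
            yield kept


def project(vectors: Iterable[List[int]],
            fn: Callable[[int], int]) -> Iterator[List[int]]:
    """Vectorized projection: each call transforms a whole vector."""
    for vec in vectors:
        yield [fn(v) for v in vec]


def run_vectorized(column: List[int]) -> List[int]:
    """Toy query: keep even values, then multiply them by 10."""
    pipeline = project(select(scan(column), lambda v: v % 2 == 0),
                       lambda v: v * 10)
    return [v for vec in pipeline for v in vec]


def run_tuple_at_a_time(column: List[int]) -> List[int]:
    """The same query, handling one value per step for comparison."""
    out = []
    for v in column:
        if v % 2 == 0:
            out.append(v * 10)
    return out


data = list(range(10))
print(run_vectorized(data))
print(run_tuple_at_a_time(data))
```

Both functions return the same rows. In a real engine the benefit comes from each operator running a tight loop over a whole vector, which amortises per-call overhead and suits SIMD instructions; plain Python lists only mimic the structure, not the speed.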
 
In addition, the processing takes place in the cache memory of the processor, which means that the associated main memory gets used as the buffer for data I/O and the holding of large data structures. It is this approach which speeds up the performance of VectorWise to a level where it is comparable with task-specific code written in C.
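
To make the cache argument concrete, the sketch below estimates how long a vector can be if all the vectors an operator is working on must fit in cache together. Every number in it is an assumption chosen for illustration (a 32 KiB cache, 8-byte values, four vectors in flight); the article does not state the vector size VectorWise actually uses.

```python
def values_per_vector(cache_bytes: int, value_bytes: int, live_vectors: int) -> int:
    """Largest vector length such that `live_vectors` vectors of
    `value_bytes`-wide values fit in `cache_bytes` at the same time."""
    return cache_bytes // (value_bytes * live_vectors)


# Assumed figures, for illustration only.
CACHE_BYTES = 32 * 1024   # hypothetical per-core cache budget
VALUE_BYTES = 8           # e.g. 64-bit integers
LIVE_VECTORS = 4          # input, output and intermediates held at once

vector_length = values_per_vector(CACHE_BYTES, VALUE_BYTES, LIVE_VECTORS)
print(vector_length)
```

Under these assumptions a vector holds about a thousand values: large enough to amortise per-operator overhead, small enough that intermediate results stay in cache rather than spilling to main memory.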

 
