
Flash readiness requires wisdom and insight

Nick Isitt

There is no doubt that the ‘holy grail’ for data centers today is peak performance. You don’t have to look far to find lively debates about the virtues and limitations of flash memory. What is often overlooked, though, is the reality of today’s heterogeneous IT infrastructure: from a cost-versus-performance perspective, spending thousands on flash arrays alone may not deliver optimum performance. Across the entire SAN, even with super-fast new flash arrays installed, overall output can still be dragged down by bottlenecks elsewhere. Without a deep understanding of what is really going on inside your data center’s unique and complex environment, how would you know that spending on flash, or any other performance-enhancing technology, is even the right decision?

The use case for flash memory storage is compelling: the fastest possible data access, from a technology that delivers a far better cost-to-performance ratio than traditional disk storage, in a form factor that makes spinning media look decidedly ‘bloated’. This stuff should be flying off the shelves. But until the industry reaches that magical inflection point, where the cost delta ($ per GB) between silicon and disk ceases to exist, there are still some barriers to widespread adoption.

Cost aside, the placement and packaging options are numerous and confusing. Will a dedicated array deliver a noticeable performance gain over a server-based memory extension product? Do you opt for a hardware-centric or a purpose-built, software-optimized design? Should you consider a SAN caching appliance, or would a hybrid model (SSDs with auto-tiering in your legacy disk array) suffice?

A whole heap of other variables also come into play when evaluating flash memory storage: access patterns, I/O mix, I/O parameters and load properties all need careful consideration, particularly in a multi-tenant scenario.
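To make that concrete, here is a minimal Python sketch of the kind of workload characterisation I have in mind: reducing a captured I/O trace to its read/write mix, dominant block sizes, sequentiality and latency percentiles. The record fields (op, size_bytes, offset, latency_ms) are hypothetical placeholders, not the output of any particular tool.

```python
# Hypothetical sketch: summarising the I/O profile of a captured trace before
# sizing flash. Field names are assumptions, not any specific tool's format.
from collections import Counter
from statistics import mean, quantiles

def summarise_io_trace(records):
    """Reduce a list of I/O records to headline workload properties."""
    reads = [r for r in records if r["op"] == "read"]
    sizes = Counter(r["size_bytes"] for r in records)
    # Sequentiality: how often a request starts where the previous one ended.
    sequential = sum(
        1 for a, b in zip(records, records[1:])
        if b["offset"] == a["offset"] + a["size_bytes"]
    )
    latencies = [r["latency_ms"] for r in records]
    return {
        "read_pct": 100.0 * len(reads) / len(records),
        "top_block_sizes": sizes.most_common(3),
        "sequential_pct": 100.0 * sequential / max(len(records) - 1, 1),
        "mean_latency_ms": mean(latencies),
        "p95_latency_ms": quantiles(latencies, n=20)[18],  # 95th percentile
    }
```

A profile like this, built per tenant, is what tells you whether a workload will actually benefit from flash or whether it is already bound elsewhere.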

Then there’s the added nuisance that different vendor solutions will inevitably deliver varying real-world performance. How can that be, when ‘all flash is created equal’?

What the vendor marketing spec sheets claim (millions of IOPS, sub-millisecond latency) and what you actually attain during real application testing can diverge worryingly (Google ‘the write-cliff effect’ and you’ll see what I mean). It’s enough to cause a migraine!
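As a trivial illustration of why a short benchmark run can mislead, the sketch below scans per-second IOPS samples from a sustained write test for the point where performance drops well below the initial burst. The 30-second warm-up window and 50% threshold are arbitrary assumptions, not a standard.

```python
# Illustrative only: spotting a 'write cliff' in per-second IOPS samples from a
# long, sustained write test. Warm-up window and threshold are arbitrary.
def find_write_cliff(iops_per_second, warmup=30, drop_ratio=0.5):
    """Return the first second at which IOPS falls below drop_ratio of the
    warm-up average, i.e. where garbage collection starts to bite."""
    if len(iops_per_second) <= warmup:
        return None
    baseline = sum(iops_per_second[:warmup]) / warmup
    for t, iops in enumerate(iops_per_second[warmup:], start=warmup):
        if iops < drop_ratio * baseline:
            return t
    return None

# Example with invented numbers: a fresh drive bursts, then settles far lower.
samples = [400_000] * 60 + [150_000] * 60
print(find_write_cliff(samples))  # -> 60
```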

‘Application acceleration’ is an appealing proposition, though. Appealing enough that most organisations I’ve spoken to in the last couple of years have deployed solid-state storage without fully understanding and evaluating all of these factors, and, more importantly, without determining how the surrounding infrastructure will respond to the introduction of memory-speed storage.

It baffles me slightly (accepting my technical ignorance) that customers, and flash vendors, often seem unmindful of the other elements and devices sitting in the data path, which collectively determine the ultimate speed and performance of a SAN. If response times are throttled by the weakest link in the transaction flow, surely it makes sense to analyse storage traffic throughout the entire stack?
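To put rough numbers on the ‘weakest link’ point (and these figures are invented purely for illustration), a back-of-the-envelope decomposition of response time by hop shows how quickly something other than the media becomes the limiting factor once flash removes the seek time:

```python
# Hypothetical sketch of the 'weakest link' argument: if end-to-end response
# time is the sum of per-hop contributions, the biggest remaining contributor
# bounds any gain from speeding up the media alone. Numbers are invented.
hop_latency_ms = {
    "host / driver queueing": 0.40,
    "HBA": 0.05,
    "fabric (switch hops)": 0.10,
    "array front-end": 0.15,
    "media": 4.00,   # the only part a flash array actually replaces
}

total = sum(hop_latency_ms.values())
after_flash = total - hop_latency_ms["media"] + 0.20  # assumed flash media time
print(f"before: {total:.2f} ms, after flash: {after_flash:.2f} ms")
print("new weakest link:", max(
    (k for k in hop_latency_ms if k != "media"),
    key=hop_latency_ms.get,
))
```

In this made-up example the media drops out and host-side queueing becomes the new ceiling; without per-hop measurement you would never know which component now sets it.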

I’m stating the obvious perhaps, but without this level of visibility, hundreds of thousands of dollars get invested in shiny new hardware based largely on a high-level interpretation of I/O intensity, which is inferred and imprecise. The problem is that almost all standard benchmarking for flash storage focuses on database trace file analysis (performance monitoring output), which looks at database I/O processing tasks rather than fully characterizing the workflow end to end. Flash vendors use this to review CPU utilization, throughput, queue depths and I/O-wait latency and to highlight potential areas for improvement. While this shows whether there is an overall path latency problem, it can’t pinpoint where and why, or propose suitable remediation.

This doesn’t provide a truly holistic, correlated view, so potential ‘gremlins’ lurking in the shadows remain undetected until it’s too late and you’ve already parted with your hard-earned cash. Only then do unfortunate resource conflicts and ‘misalignments’ elsewhere in the infrastructure start to surface: wrongly configured parallelisation settings, O/S versions that aren’t optimized for flash storage, under-sized I/O interfaces, network congestion and other stressed components that are now overloaded. At least, that has been my experience.

Don’t get me wrong, solid-state storage is a great enabling technology, but in the majority of situations where I’ve seen it deployed, it concentrates load and exposes weaknesses and bottlenecks elsewhere in the SAN that would otherwise remain masked and undetected.

So what’s the answer?

Virtual Instruments’ (VI) unique method of instrumenting the SAN, coupled with powerful analytics, makes impact analysis measured and precise. Real-time acquisition and correlation of performance data, extracting every individual read/write command in the data flow directly from the Fibre Channel protocol, provides a high-fidelity, unbiased view of the entire SAN and explicit, dynamic measurement of all the interactions between layers that could cause application latency.
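For readers who want a feel for what ‘measuring every read/write command’ means in practice, here is a deliberately simplified Python sketch of the underlying idea: pairing command and status frames by exchange and timing the gap. This is an illustration of the concept only, not VI’s implementation, and the frame fields (exchange_id, kind, op, timestamp) are hypothetical.

```python
# Simplified illustration (not Virtual Instruments' implementation): pair each
# SCSI command frame with the status frame of the same exchange and measure the
# gap, giving a per-command exchange completion time. Frame fields are invented.
def exchange_completion_times(frames):
    """Yield (exchange_id, op, latency_seconds) for each completed exchange."""
    pending = {}  # exchange_id -> (op, command timestamp)
    for f in sorted(frames, key=lambda f: f["timestamp"]):
        if f["kind"] == "command":
            pending[f["exchange_id"]] = (f["op"], f["timestamp"])
        elif f["kind"] == "status" and f["exchange_id"] in pending:
            op, started = pending.pop(f["exchange_id"])
            yield f["exchange_id"], op, f["timestamp"] - started

frames = [
    {"exchange_id": 0x1A, "kind": "command", "op": "read",  "timestamp": 0.000000},
    {"exchange_id": 0x2B, "kind": "command", "op": "write", "timestamp": 0.000050},
    {"exchange_id": 0x1A, "kind": "status",  "op": "read",  "timestamp": 0.000480},
    {"exchange_id": 0x2B, "kind": "status",  "op": "write", "timestamp": 0.001900},
]
for xid, op, latency in exchange_completion_times(frames):
    print(f"exchange {xid:#x}: {op} completed in {latency * 1000:.2f} ms")
```

Measured this way, per command and per device, latency can be attributed to a specific layer rather than inferred from host-side averages.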

VirtualWisdom4 gathers hundreds of critical performance metrics from various sources at the physical layer (something that non-physical monitoring tools simply can’t do), providing full transaction tracing, which can be used for performance impact modelling and evidence-based decision support, completely removing the guesswork, minimising risk and ensuring the success of your flash implementation.

VirtualWisdom4 helps eliminate performance ‘blind spots’ and prevents configuration-based issues from impacting the performance and availability of your flash investment. It keeps your infrastructure properly balanced and optimally tuned, resolving problems proactively before they occur rather than simply shifting the bottleneck elsewhere.

In the face of continual change and increasing complexity (layers of abstraction and the dynamic demands of a virtualized compute environment), detailed planning and consideration of this nature is essential as enterprises strive to virtualize more latency-sensitive applications.

An assessment should be a precursor to any flash storage implementation: it verifies the likely impact, validates ROI and TCO computations and provides valuable prescriptive guidance for optimization and forward-engineering of the SAN.
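On the ROI and TCO side, even the simple arithmetic is sensitive to assumptions that only measurement can validate, such as the achievable data-reduction ratio for your actual workload. A minimal sketch with invented placeholder figures:

```python
# Back-of-the-envelope comparison of the kind an assessment should validate.
# Every figure below is an invented placeholder; substitute measured data from
# your own environment.
def cost_per_usable_gb(list_price, raw_gb, data_reduction=1.0, overprovision=0.0):
    """Effective $/GB once data reduction and over-provisioning are applied."""
    usable_gb = raw_gb * data_reduction * (1.0 - overprovision)
    return list_price / usable_gb

flash = cost_per_usable_gb(list_price=150_000, raw_gb=20_000,
                           data_reduction=3.0, overprovision=0.20)
disk = cost_per_usable_gb(list_price=60_000, raw_gb=200_000)

print(f"flash: ${flash:.2f}/GB, disk: ${disk:.2f}/GB")
```

Change the assumed data-reduction ratio from 3:1 to 1.5:1 and the flash figure doubles, which is exactly why those inputs should come from an assessment of the real environment rather than a spec sheet.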