AI storage: NAS vs SAN vs object for training and inference


Artificial intelligence places different demands on storage during training, inference and other phases. We examine the roles of NAS, SAN and object storage in AI, and how they can be balanced in AI projects.

Published: 23 May 2025

Artificial intelligence (AI) relies on vast amounts of data.

AI projects are a priority for enterprises. Large language models (LLMs) and generative AI (GenAI) require large volumes of data to train models and to store the outputs of AI-enabled systems.

This data is unlikely to be stored in one system or location. Customers will use multiple data sources, including structured data held in databases and, often, unstructured information. Some of these sources will be on-premises, while others will be in the cloud.

To satisfy AI’s appetite for data, system designers need to consider storage options across storage area networks (SAN), network-attached storage (NAS) and, possibly, object storage.

This article examines the pros and cons for AI projects of file, block and object storage and the challenges for organisations in finding the right mix.

AI’s data mountain

Current AI projects are rarely, if ever, characterised by a single data source. Generative AI models use a variety of data, most of it unstructured – documents, images, audio, video and computer code are just a few examples.

“Understanding relationships is the key to generative AI. You still have your source data in your unstructured files or objects, and vectorised data on blocks”

Patrick Smith, Pure Storage

The more data sources, the better. Enterprises also link LLMs to their own data sources, either directly or via retrieval augmented generation (RAG), which improves accuracy and relevance. This data can be documents, but it can also include enterprise applications that store data in relational databases.
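The RAG flow described above can be sketched in a few lines. This is a minimal toy illustration, not a production pattern: the word-count "embedding" and in-memory index below are hypothetical stand-ins for a trained embedding model and a vector database, but the retrieve-then-augment shape is the same.

```python
# Toy sketch of retrieval augmented generation (RAG).
# Assumption: a bag-of-words vector stands in for a real neural embedding.
from collections import Counter
import math

def embed(text):
    """Toy embedding: a word-count vector over lowercased tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Enterprise documents, vectorised once; source stays as files/objects,
# while the vectors form the searchable index.
documents = [
    "Quarterly revenue grew nine percent on storage sales",
    "The HR handbook covers leave policy and benefits",
    "GPU clusters need high throughput storage for training",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda p: cosine(q, p[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# The retrieved passage is prepended to the LLM prompt to ground the answer.
context = retrieve("what storage do GPU training clusters need")
prompt = f"Answer using this context: {context[0]}"
```

In a real deployment, the index would live in a vector database and the prompt would go to an LLM; here the point is only that retrieval adds enterprise data to the model's input.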

“A large part of AI is driven from unstructured information, so applications will point to files, images and video – all unstructured,” says Patrick Smith, field chief technology officer for EMEA at storage supplier Pure Storage. “But people look at their production data and want to link it to their generative AI projects.”

He adds that this includes vectorisation in databases, which is now commonly supported by the main relational database providers, such as Oracle.

NAS

For system architects supporting AI projects, this raises the question of where to store data. It is often easier to leave data sources where they are, but that is not always possible.

Data may need further processing, the AI application may need to be isolated from production systems, or current storage systems may lack the throughput the AI application requires.

Vectorisation can also lead to a large increase in data volume – 10 times is not uncommon – which puts additional demands on storage systems.
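A back-of-envelope calculation shows where that inflation comes from. The figures below are illustrative assumptions, not measurements: a 1 MB source document split into 500-character chunks, each embedded as a 1,536-dimension float32 vector (a common size for text embedding models).

```python
# Rough sketch of vectorisation storage overhead (assumed figures).
source_bytes = 1_000_000   # 1 MB source document
chunk_chars = 500          # characters per chunk
dims = 1536                # embedding dimensions (assumed typical)
bytes_per_float = 4        # float32

chunks = source_bytes // chunk_chars             # 2,000 chunks
vector_bytes = chunks * dims * bytes_per_float   # 12,288,000 bytes

ratio = vector_bytes / source_bytes
print(f"{ratio:.1f}x the source data")  # roughly 12x
```

Under these assumptions the embeddings alone are roughly 12 times the source data, before counting the index structures a vector database adds on top, which is consistent with the 10-times figure above.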

Storage needs to be flexible, scalable and able to handle AI project data at each stage. Inference – running the model in production – might not need as much raw data, but may require higher throughput with minimal latency.

Enterprises keep the majority of their unstructured files on file-access NAS storage. NAS is relatively cheap and easier to scale and manage than alternatives such as direct-attached storage or block-access SAN storage.

Structured data is more likely to be held in block storage, usually on a SAN, although direct-attached storage may be sufficient for smaller AI projects.

Here, achieving optimal performance from the storage array – both in IOPS (input/output operations per second) and throughput – offsets the complexity of the SAN. Enterprise production systems such as enterprise resource planning (ERP) or customer relationship management (CRM) will use a SAN to store their data. In practice, AI data is likely to come from both SAN and NAS environments.

AI data can be stored on SAN or NAS. According to Bruce Kornfeld, chief product officer at StorMagic, it comes down to how the AI tools need or want to access the data. “You can store AI data on a SAN, but AI tools will not typically read the blocks. They’ll use some type of file-access protocol to access the block data.”

No one protocol is guaranteed to be better than another; it depends on the nature and output of the AI system.

A NAS system might be sufficient for a primarily image- or document-based AI system. For applications such as autonomous driving, surveillance or video analytics, systems may use SAN storage or even high-speed local storage.

Data architects will need to differentiate between the training and inference phases in their projects, and determine whether the overhead associated with moving data between storage systems is greater than any performance benefits, particularly in training.

Enter object storage

Some organisations have looked to object storage to unify data sources for AI. Enterprises are increasingly using object storage, and not just in the cloud – on-premises object stores have also gained market share.

Object storage has many advantages for AI, not least its flat structure, global namespace, relatively low management overhead, ease of expansion and low cost.
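The flat structure mentioned above can be illustrated with a few lines of code. The class below is a hypothetical in-memory stand-in for an S3-style store, not any vendor's API: objects live in one namespace keyed by name, and "folders" are nothing more than key prefixes.

```python
# Illustrative sketch of object storage's flat namespace (hypothetical API).
class ObjectStore:
    def __init__(self):
        self._bucket = {}  # flat mapping of key -> bytes; no directory tree

    def put(self, key, data):
        self._bucket[key] = data

    def get(self, key):
        return self._bucket[key]

    def list(self, prefix=""):
        """Prefix listing simulates folders without any real hierarchy."""
        return sorted(k for k in self._bucket if k.startswith(prefix))

store = ObjectStore()
store.put("training/images/cat001.jpg", b"...")
store.put("training/images/cat002.jpg", b"...")
store.put("models/v1/weights.bin", b"...")

print(store.list("training/"))  # both image keys; no directories involved
```

Because every object is addressed by a single key, expansion is just adding more keys – there is no directory tree to rebalance, which is part of why object stores scale and stay cheap to manage.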

Performance, however, has not been an object storage strength. This has made it more suitable for tasks such as archiving than for applications that require low latency and high data throughput.

However, suppliers are working to close this performance gap. Pure Storage and NetApp both sell storage systems capable of handling file and object – and, in some cases, block – access, among them Pure’s FlashBlade and systems running NetApp’s ONTAP operating system. These technologies allow storage managers to use the best format without being tied to specific hardware.

Other companies, such as Hammerspace with its Hyperscale NAS, aim to squeeze more performance out of equipment running NFS (Network File System). They argue this prevents bottlenecks when storage cannot keep up with data-hungry graphics processing units (GPUs).

Checking all the boxes

Until better-performing object storage becomes more widely available, or more enterprises migrate to universal storage platforms, AI is likely to use NAS, SAN and object storage in combination. The balance between these elements will change over the course of an AI project, and as AI tools and applications evolve. Pure Storage’s Smith has seen customers request new hardware to handle unstructured data, but says most meet their block and vector database needs with existing hardware.

“Generative AI is all about relationships,” he says. “You still have your source data in your unstructured data – file or object – and your vectorised data on block.”


By Antony Adshead
