Unstructured Data Management: On-Premises and Cloud Security Guide
From documents and emails to videos and project files, most of your business-critical data is unstructured, and it is growing rapidly. Yet this data often lacks clear ownership, structure or protection, making it vulnerable to ransomware, human error and compliance risks.
In this guide, we explore the biggest challenges in unstructured data management and explain how to protect your data across NAS, file servers and cloud environments.
What is Unstructured Data?
Unstructured data is information that does not follow a predefined data model and is not organized in a systematic manner. Unlike structured data, which is neatly arranged in rows and columns within relational databases, unstructured data is typically stored as files in file systems or as objects in object storage. It can take the form of text, audio, photos, video and other content types, each stored in specific file formats.
Examples of unstructured data include:
- Text documents: PDF, DOC, TXT, RTF and other files.
- Multimedia content, such as images, videos and audio files: RAW, JPG, PNG, TIF, CDR, PSD, AVI, MOV, MPG, MP4, MP3, M4A, WAV and other file formats.
- Emails and messages.
- Sensor data and logs.
The majority of data generated worldwide can be categorized as unstructured data, which is why it is necessary to protect it.
The role of NAS in storing and managing unstructured data
Network Attached Storage (NAS) is popular among small and medium-sized businesses and individual users for storing unstructured data such as files. While storing structured data involves managing a relational database, which requires advanced knowledge, storing files on NAS is affordable and convenient. The main advantages of using NAS to store files are scalability, reliability and accessibility:
- You can add more disks (HDD and SSD) to expand storage space on NAS. If there are no free ports for disks, expansion units can be used.
- Configuring software RAID 1 or RAID 10 provides data redundancy. If a hard disk drive fails, the data remains available on the healthy disks, and the array can be rebuilt from them after the failed disk is replaced.
- It is convenient to configure file shares to have centralized data storage and provide access to multiple users, allowing them to collaborate effectively. NAS devices support multiple file-sharing protocols, such as SMB, NFS and AFP, to configure file shares for Windows, Linux and macOS.
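As a rough illustration of the redundancy trade-off in RAID 1 and RAID 10, the sketch below estimates usable capacity. It assumes two-way mirrors for RAID 10 and that every disk is limited to the size of the smallest disk in the array. The function name and the sample sizes are hypothetical, and a real NAS may reserve additional space for the system.

```python
def usable_capacity_tb(level: str, disk_sizes_tb: list[float]) -> float:
    """Estimate usable capacity (TB) of a simple RAID 1 or RAID 10 array."""
    count = len(disk_sizes_tb)
    smallest = min(disk_sizes_tb)  # larger disks are truncated to the smallest one
    if level == "raid1":
        if count < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return smallest  # every disk holds a full copy of the data
    if level == "raid10":
        if count < 4 or count % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return smallest * count / 2  # stripe across two-way mirrored pairs
    raise ValueError(f"unsupported level: {level}")


two_disk_mirror = usable_capacity_tb("raid1", [4, 4])
four_disk_raid10 = usable_capacity_tb("raid10", [4, 4, 4, 4])
```

In both layouts, half or more of the raw capacity is spent on redundancy, which is worth factoring into storage planning.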
NAS devices are cost-efficient for small and medium-sized businesses compared to advanced Storage Area Network (SAN) systems used primarily by large enterprises. In certain cases, using NAS as unstructured data storage can be more practical than using a traditional file server.
Key Risks and Challenges in Protecting Unstructured Data
Like any other important data, unstructured data should be protected. However, due to its diverse nature, volume and other characteristics, configuring unstructured data protection presents some risks and challenges.
Increasing data volume and complexity
The amount of data an organization generates continuously increases, meaning more storage is needed for unstructured data backups. Additional storage incurs additional costs, so using space-saving approaches and minimizing unnecessary data storage is essential. Replication costs more than backup because replication requires higher storage performance, especially if you are using real-time replication. Scaling up production systems also requires scaling up backup storage.
Unstructured data consists of files in different formats, and it may be necessary to protect only the most critical data. Protecting only the needed data saves time and storage. For example, you may not need to back up log or temporary files. However, unstructured data usually lacks consistent metadata, which makes it harder to categorize and prioritize what to back up.
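One simple way to skip log and temporary files is to filter by file name before a backup runs. The sketch below is a minimal example of this idea; the exclusion patterns and the function name are assumptions chosen for illustration, not a recommended list.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical exclusion list: logs, temp files, Office lock files, thumbnail caches.
EXCLUDE_PATTERNS = ["*.log", "*.tmp", "~$*", "Thumbs.db"]


def should_back_up(path: str, exclude=EXCLUDE_PATTERNS) -> bool:
    """Return True unless the file name matches an exclusion pattern."""
    name = PurePosixPath(path).name.lower()
    # Compare in lower case so the result does not depend on the operating system.
    return not any(fnmatchcase(name, pattern.lower()) for pattern in exclude)


candidates = ["projects/plan.docx", "var/app/server.LOG", "share/~$budget.xlsx"]
selected = [path for path in candidates if should_back_up(path)]
```

Name-based filtering is only a starting point, because a file name says little about how critical the content is.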
Unstructured data can be stored in multiple locations: file servers, NAS devices and cloud storage. Depending on the data location (platform), different approaches to backing up data may be needed. When source systems change, the backup configuration must be adjusted accordingly.
Risks of data loss and downtime
Applications, services and users rely on unstructured data for regular operations. Unstructured data is widely used in the media and healthcare industries, as well as for user collaboration through file sharing. Losing this data can cause downtime, interrupted operations and other negative consequences. Because unstructured data grows and changes quickly, some new data may be left unprotected if the Recovery Point Objective (RPO) is too long.
Unstructured data often includes unique content, such as documents, project files, photos, videos and other important files. Losing them can be critical for users and businesses. If a Service Level Agreement (SLA) is violated, penalties can be applied.
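To make the RPO risk concrete, the following sketch flags file shares whose most recent backup is older than the target RPO. The share names, timestamps and the 4-hour RPO are hypothetical values used only for illustration.

```python
from datetime import datetime, timedelta


def shares_missing_rpo(last_backups: dict, now: datetime, rpo: timedelta) -> list:
    """Return the shares whose latest backup is older than the RPO."""
    return sorted(
        share for share, finished in last_backups.items() if now - finished > rpo
    )


now = datetime(2025, 1, 10, 12, 0)
last_backups = {
    "finance": datetime(2025, 1, 10, 9, 0),  # 3 hours old
    "media": datetime(2025, 1, 9, 6, 0),     # 30 hours old
}
at_risk = shares_missing_rpo(last_backups, now, rpo=timedelta(hours=4))
```

Any data created on a flagged share since its last backup would be lost if the share failed at this moment.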
Security threats and data privacy risks
Sophisticated cyber attacks pose a threat to unstructured data. Ransomware typically encrypts data, and the most dangerous variants also steal data by uploading it to the attackers before encryption. Ransomware can infect end-user computers and then encrypt or destroy files on any file shares those computers can access.
Ransomware attacks often start by targeting end-users via email. Phishing, social engineering and other sophisticated techniques trick users into opening infected files or malicious links. Ransomware also uses software vulnerabilities to penetrate a system, infect it, spread over the network and destroy or encrypt files. If ransomware uploads data to attackers, it is a serious privacy risk because unstructured data often contains private information.
Unstructured data governance and GDPR compliance
Compliance with regulatory standards, such as the General Data Protection Regulation (GDPR), adds more complexity to unstructured data protection. It is important to identify sensitive data that must be protected to meet the GDPR requirements, as this data can be scattered across multiple locations and mixed with other data types.
GDPR mandates protecting Personally Identifiable Information (PII). Unstructured data, such as emails, contracts and videos, often contains PII embedded in free-form content, making it hard to locate and track. Unstructured data stored across different geographic locations must also comply with GDPR rules on cross-border data transfers, which further complicates compliance.
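The difficulty of locating PII in free-form content can be illustrated with a very small scanner. The sketch below looks only for email addresses using a simplified regular expression. Both the pattern and the function name are assumptions for illustration, and real PII discovery needs to cover far more data types and file formats.

```python
import re

# Simplified pattern: good enough for a demo, not a full email address validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text: str) -> list[str]:
    """Return the unique email addresses found in a block of text."""
    return sorted(set(EMAIL_PATTERN.findall(text)))


sample = "Contact jane.doe@example.com or JOHN@example.org. Invoice #1234 attached."
found = find_emails(sample)
```

A scan like this works on plain text only. Scanned contracts, images and videos would first need text extraction, which is part of why PII in unstructured data is hard to track.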
GDPR requires organizations to retain personal data only for as long as necessary. Applying retention policies to unstructured data is challenging due to its unorganized nature. Identifying and deleting redundant or outdated unstructured data to comply with GDPR can be resource-intensive. According to GDPR, organizations must delete personal user data if a user requests it.
The difficulties in unstructured data governance or management include the lack of metadata to categorize data and data sprawl caused by growing data volumes. Thus, solutions to store and back up data must be scalable. Data is often shared within and outside organizations, so it can be difficult to configure access control that satisfies all parties and to implement consistent policies.
Strategies for On-Premises Data Protection
Optimal data protection strategies for unstructured data stored on-premises can minimize the risk of data loss. These strategies ensure data integrity, availability and confidentiality while reducing the risks of cyberattacks, hardware failures or insider misuse. One of the main points to consider is the protection of unstructured data storage.
Regular backups for unstructured data protection
Regular backups are a crucial element of any data protection strategy. You can configure automated backups to run regularly for data stored locally on servers and NAS devices.
- Determine backup frequency based on the criticality of data and organizational needs.
- Use a combination of full and incremental backups to balance storage efficiency and recovery speed.
- Follow the 3-2-1 backup rule for an effective and reliable on-premises data protection strategy.
- Test backups to ensure that data is consistent and you can recover it when needed.
- Enable backup encryption to avoid unauthorized access and data leaks in case of cyber-attacks.
- Enable backup immutability to prevent backups from being encrypted, altered and destroyed by ransomware.
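The 3-2-1 rule mentioned above (at least three copies of data, on two different types of media, with one copy offsite) can be checked with a few lines of code. The sketch below is a minimal example; the class, field names and sample locations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    location: str  # where the copy lives (illustrative label)
    media: str     # storage type, e.g. "nas", "tape", "object-storage"
    offsite: bool  # True if stored outside the primary site


def meets_3_2_1(copies: list) -> bool:
    """Check: at least 3 copies, at least 2 media types, at least 1 offsite copy."""
    return (
        len(copies) >= 3
        and len({copy.media for copy in copies}) >= 2
        and any(copy.offsite for copy in copies)
    )


copies = [
    DataCopy("production NAS", "nas", offsite=False),
    DataCopy("local backup repository", "nas", offsite=False),
    DataCopy("cloud bucket", "object-storage", offsite=True),
]
compliant = meets_3_2_1(copies)
```

In this example, removing the cloud copy would fail all three conditions at once, which shows how much a single offsite copy contributes.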
Best practices for securing NAS systems
NAS devices are widely used to store large volumes of unstructured data on-premises. At the same time, NAS devices are attractive targets for cybercriminals, so organizations must protect NAS systems and the data stored on them. Following the best practices below can help protect unstructured data on NAS devices:
- Disable default accounts and create customized accounts with strong passwords for administrators.
- Configure access permissions correctly.
- Configure the firewall to ensure that NAS can be accessed only from allowed locations for allowed protocols and ports. Disable unnecessary protocols in NAS settings.
- Install security patches and updates provided by the NAS vendor as soon as they are released to fix known vulnerabilities.
- Back up data stored on NAS regularly.
- Ensure the physical location of the NAS device is secure, with restricted access to authorized users only.
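The firewall recommendation above boils down to an allow list of source networks per port. The sketch below models that logic in Python for illustration only. The subnets are hypothetical, and an actual NAS firewall is configured in the device settings rather than in code like this. Ports 445 and 2049 are the standard ports for SMB and NFS.

```python
from ipaddress import ip_address, ip_network

# Hypothetical allow list: which client subnets may reach which NAS ports.
ALLOWED = {
    445: [ip_network("10.0.10.0/24")],   # SMB for office workstations
    2049: [ip_network("10.0.20.0/24")],  # NFS for Linux servers
}


def is_allowed(client_ip: str, port: int, rules=ALLOWED) -> bool:
    """Allow a connection only if the port is listed and the client is in an allowed subnet."""
    address = ip_address(client_ip)
    return any(address in network for network in rules.get(port, []))


smb_from_office = is_allowed("10.0.10.15", 445)
ssh_from_office = is_allowed("10.0.10.15", 22)
```

Anything not explicitly listed is denied, which mirrors the advice to disable unnecessary protocols.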
Effective governance strategies for on-premises data
Effective strategies for unstructured data management help manage data across various formats (documents, images, video, etc.) while ensuring security and compliance. Due to the decentralized nature of unstructured data, implementing effective governance requires a combination of policies, tools and processes.
- Classify unstructured data based on its sensitivity, value and compliance requirements.
- Define classification categories, such as sensitive, restricted and public, to prioritize protection efforts.
- Configure role-based access control to restrict access and ensure users access only the data they need.
- Define data retention and lifecycle management policies. Define retention periods for different types of unstructured data to ensure that data is stored only as long as necessary. Align retention policies with compliance requirements (such as GDPR, HIPAA and others) to avoid legal issues. Configure automatic deletion of data that has exceeded its retention period to reduce storage costs and minimize the risk of exposure.
- Monitor your infrastructure to detect issues and fix them as soon as possible before serious failures occur. Use data loss prevention systems to detect abnormal activities related to ransomware infections. Configure automatic notifications and alerts.
- Educate employees about cyber threats, the main strategies for cyber attacks and the symptoms of ransomware infections. If users can detect an attempted cyber attack and perform the right actions, they can help prevent it. Users should report suspicious behavior that can signify a cyber attack.
- Create an incident response plan, disaster recovery plan and business continuity plan. Test these plans to ensure they work as expected in case of failure or disaster.
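The retention policy described above can be sketched as a lookup of retention periods per data category. The categories and periods below are hypothetical placeholders, not values required by any regulation. Items in unknown categories are never flagged, so that nothing is deleted without an explicit policy.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days per data category.
RETENTION_DAYS = {"personal": 365, "project": 1825, "log": 90}


def expired_items(items, today: date, retention=RETENTION_DAYS) -> list:
    """Return paths whose age exceeds the retention period of their category.

    items: iterable of (path, category, last_modified) tuples.
    """
    expired = []
    for path, category, last_modified in items:
        limit = retention.get(category)
        if limit is not None and today - last_modified > timedelta(days=limit):
            expired.append(path)
    return expired


inventory = [
    ("logs/app-2024-09.txt", "log", date(2024, 9, 1)),
    ("hr/candidate-cv.pdf", "personal", date(2024, 6, 1)),
    ("misc/unknown.bin", "unclassified", date(2015, 1, 1)),
]
to_delete = expired_items(inventory, today=date(2025, 1, 1))
```

A sketch like this depends on data already being classified, which is why classification comes first in the list above.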
How Can You Protect Your Data in the Cloud?
Cloud platforms are becoming increasingly popular due to their reliability and availability. Amazon S3 and Azure Blob storage are the most popular examples of object storage for storing unstructured data in the public cloud. However, data stored in the cloud can be lost due to accidental deletion, human error, software error or cyber attack. That’s why you should consider unstructured data protection when storing data in the public cloud.
Top strategies for cloud data protection
Consider the following unstructured data protection strategies to secure data in the public cloud:
- Use encrypted connections to access cloud storage. Enable encryption for files in transit and at rest. This strategy prevents data interception by third parties and related data leaks.
- Configure role-based access control. Define roles and set permissions for users. Apply the principle of least privilege: give users the permissions they need to perform their tasks and no more. Enable multi-factor authentication to access cloud resources. Use Identity and Access Management (IAM) tools to create policies and manage permissions across cloud services.
- Configure backups for unstructured data storage in the cloud. Develop and test a plan for cloud-based disaster recovery.
- Ensure that your data protection strategies align with regulatory frameworks like GDPR, HIPAA or PCI-DSS, which require specific measures to protect unstructured data.
- Understand the shared responsibility model for security with your cloud provider. According to this model, the provider secures the infrastructure, while the customer is responsible for securing data and access.
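Role-based access control combined with least privilege and multi-factor authentication can be modeled as a simple permission lookup. The roles and actions below are hypothetical. In practice, these rules are expressed in the cloud provider's IAM policies, not in application code like this.

```python
# Hypothetical roles, each granted only the actions it needs.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "delete", "share"},
}


def is_permitted(role: str, action: str, mfa_verified: bool) -> bool:
    """Deny by default: require MFA, a known role and an explicitly granted action."""
    if not mfa_verified:
        return False
    return action in ROLE_PERMISSIONS.get(role, set())


editor_can_write = is_permitted("editor", "write", mfa_verified=True)
editor_can_delete = is_permitted("editor", "delete", mfa_verified=True)
```

The important design choice is the default: an unknown role or an unlisted action results in denial.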
Cloud-native tools and automation
Cloud storage vendors usually provide native tools that can be used for data protection and recovery purposes. These tools include Identity and Access Management (IAM), multi-factor authentication, encryption, ACLs, firewalls and others.
- Enable version control (versioning) to roll back changes and recover a previous version of a file if the latest version is corrupted or contains unwanted changes.
- Enable immutability using the write-once-read-many (WORM) approach if object storage is used in the cloud.
- Use storage tiers to optimize costs for primary and secondary data as well as backups stored in the cloud.
- Consider lifecycle management tools for data governance and for automating the archiving or deletion of outdated data.
- Use cloud-native monitoring tools to monitor data stored in the cloud and identify abnormal activities.
- Enable the data loss prevention system, if provided by the cloud vendor.
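As an example of storage tiers and lifecycle management, the sketch below builds a lifecycle rule in the dictionary layout used by Amazon S3 lifecycle configurations. The prefix, day counts and helper name are assumptions for illustration. Check your provider's current documentation before applying a rule like this, because the supported storage classes and minimum durations differ between providers.

```python
def build_lifecycle_rule(prefix: str, transitions: list, expire_after_days: int) -> dict:
    """Build one lifecycle rule that tiers objects down over time and then expires them.

    transitions: list of (days, storage_class) pairs, e.g. [(30, "STANDARD_IA")].
    """
    days = [day for day, _ in transitions]
    if days != sorted(days) or len(set(days)) != len(days):
        raise ValueError("transition days must be strictly increasing")
    if days and expire_after_days <= days[-1]:
        raise ValueError("expiration must come after the last transition")
    return {
        "ID": f"tier-and-expire-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": day, "StorageClass": storage_class}
            for day, storage_class in transitions
        ],
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical policy: infrequent-access tier after 30 days, archive after 90, delete after 365.
lifecycle_configuration = {
    "Rules": [
        build_lifecycle_rule("projects/", [(30, "STANDARD_IA"), (90, "GLACIER")], 365)
    ]
}
```

With the AWS SDK for Python, a dictionary of this shape can be passed as the `LifecycleConfiguration` argument of the S3 client's `put_bucket_lifecycle_configuration` call.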
Ensuring privacy in multi-cloud and hybrid setups
If an organization stores data in multiple public clouds (a multi-cloud setup) or in a combination of public cloud and private cloud or on-premises storage (a hybrid setup), it should implement measures to ensure privacy across these environments. Possible complications include data fragmentation across different platforms, varying compliance requirements and the need to secure data in multiple locations. Protecting data in hybrid environments primarily means combining protection measures for cloud and on-premises environments.
- Enable data encryption across all environments. Use a centralized Key Management System (KMS) to manage encryption keys. Encrypt data before transferring it to the cloud using client-side encryption.
- Back up data stored in all locations – in cloud storage and on-premises. Select a backup solution that supports all cloud providers and local platforms where data is stored.
Effective Unstructured Data Management and Governance Tips
Unstructured data, such as emails, documents, videos and other media content, makes up the majority of organizational data but often lacks structure, making it harder to manage and protect. Implementing solid governance and management strategies helps organizations handle this data more effectively.
Consider the following points when configuring access controls as part of an unstructured data management strategy.
- Which permissions should a user have?
- Should a user be permitted to write changes?
- Is read-only access enough for this user?
- Who are users allowed to share content with?
- What are the methods to share data? Is it a file share on NAS or a cloud storage platform?
Organizations should be able to control data access and usage granularly as well as provide data-centric access controls. These strategies can help prevent unauthorized access and data loss.
How NAKIVO Simplifies NAS Data Protection
NAKIVO Backup & Replication is a dedicated data protection solution that can back up unstructured data stored on file servers, NAS devices and in the cloud. The NAKIVO solution supports file share backups for NFS and SMB file shares with various useful features that ensure high performance and reliability.
- Automated backup of file shares. Schedule backup jobs to ensure that you don’t miss a backup.
- Flexible retention settings. Configure retention policies to keep the needed recovery points for different periods. The GFS retention scheme and other complex retention schemes are supported. You can configure retention settings to meet the necessary compliance requirements.
- Backup encryption. Encrypt backups in transit (when transferring data over the network) and at rest (in a backup repository). Source backup encryption allows you to encrypt backups on the source side, transfer encrypted data and store it in a backup repository.
- Immutable backups. Enable backup immutability to protect backups against ransomware and other types of unauthorized modification or deletion.
- Space-saving capabilities. Run incremental backups to copy only data changes and reduce storage space consumption. Running incremental backups with periodic full backups improves reliability and reduces the risk of incremental backup chain corruption. Backup compression allows you to save more storage space in backup repositories.
- Multiple backup locations. You can store backups in different storage locations on-premises and in the cloud. Local folders, SMB/NFS file shares and tape can be used for local storage. For cost-effective backup storage in the public cloud, you can use supported object storage, such as Azure Blob Storage, Amazon S3 and other S3-compatible storage.
- Granular recovery. You can select specific files from a backup to quickly recover the needed data. Full recovery is also supported.
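For readers unfamiliar with the GFS (grandfather-father-son) retention scheme mentioned above, the sketch below shows the general idea: keep the most recent daily recovery points, plus the newest point from each recent week and each recent month. This is a generic illustration under simplified assumptions and does not describe how NAKIVO Backup & Replication implements retention. The function name and parameters are hypothetical.

```python
from datetime import date, timedelta


def gfs_keep(points, daily: int, weekly: int, monthly: int) -> set:
    """Select recovery points to keep under a simplified GFS scheme."""
    ordered = sorted(set(points), reverse=True)  # newest first
    keep = set(ordered[:daily])                  # "sons": latest daily points

    def keep_newest_per_bucket(bucket_of, limit):
        seen = []
        for point in ordered:
            bucket = bucket_of(point)
            if bucket not in seen:
                if len(seen) == limit:
                    break
                seen.append(bucket)
                keep.add(point)  # newest point in this week or month

    # "Fathers": newest point of each recent ISO week.
    keep_newest_per_bucket(lambda d: tuple(d.isocalendar())[:2], weekly)
    # "Grandfathers": newest point of each recent calendar month.
    keep_newest_per_bucket(lambda d: (d.year, d.month), monthly)
    return keep


# One recovery point per day from 2024-01-01 to 2024-03-31.
points = [date(2024, 1, 1) + timedelta(days=i) for i in range(91)]
kept = gfs_keep(points, daily=7, weekly=4, monthly=3)
```

The result keeps recent history dense and older history sparse, which is how GFS limits storage use while still covering long retention periods.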
NAKIVO Backup & Replication also supports Microsoft 365 backup, including unstructured data stored as emails, documents and other information in Exchange Online, OneDrive for Business, SharePoint Online and Microsoft Teams.
Conclusion
Unstructured data protection requires implementing backups to ensure data availability and minimize the chance of data loss and downtime. You can also configure file share access control to adhere to compliance requirements. Protect backups against unauthorized access by using encryption, immutability and access control. Use NAKIVO Backup & Replication to back up unstructured data stored on file shares hosted by file servers and NAS devices and in the cloud.