How We Dramatically Cut Data Storage Costs in Just One Quarter | No Paid Tools Needed!

In today’s cloud-driven world, data storage is a critical component for any business. At Haptik, we primarily rely on Azure Blob Storage along with AWS S3 for our data storage requirements. Their ease of use and integration with other cloud services make them our go-to solutions for many use cases. However, this convenience can come at a steep price if storage isn’t properly optimized. Costs can skyrocket with unused storage, redundant data, and inefficient management.

The Problem: Rising Storage Costs

Azure Blob Storage and AWS S3 offer scalability, flexibility, and security, making them excellent choices for managing vast amounts of data. Without proper management, however, storage costs can quickly spiral out of control. Other factors, such as data transfer and usage, also have an impact on the cost of these storage solutions. Even so, we managed to reduce our storage costs dramatically within a quarter, all without investing in paid tools. Here’s how we did it.

1. Implement Lifecycle Policies

One of the most effective ways to manage storage costs is through lifecycle policies. Both Azure Blob Storage and AWS S3 offer lifecycle management features that automatically transition data to cheaper storage tiers or delete it after a certain period.

Best Practices:

  • Data Management Policies: Set up clear policies for data retention and automated transitions. For example, the following command sets a lifecycle policy in Azure Blob Storage that deletes blobs n days after creation:
# Set Lifecycle policy for <RetentionDays> for Storage Account in Azure:
az storage account management-policy create \
    --account-name <StorageAccountName> \
    --resource-group <ResourceGroupName> \
    --policy '{
        "rules": [
            {
                "enabled": true,
                "name": "delete-old-blobs",
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": [
                            "blockBlob"
                        ]
                    },
                    "actions": {
                        "baseBlob": {
                            "delete": {
                            "daysAfterCreationGreaterThan": <RetentionDays>
                            }
                        }
                    }
                }
            }
        ]
    }'
  • Multipart Uploads: Keep an eye on incomplete multipart uploads. These can accumulate over time, leading to unnecessary costs. Implement a policy to automatically delete incomplete uploads after a set time, as in the S3 sketch below.
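
For S3, a minimal sketch of such a policy could look like the following. The bucket name is a placeholder and the 7-day window is only an example; adjust both to your setup.

# Abort incomplete multipart uploads older than 7 days in an S3 bucket
aws s3api put-bucket-lifecycle-configuration \
    --bucket <BucketName> \
    --lifecycle-configuration '{
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}
            }
        ]
    }'

Note that this call replaces the bucket’s entire lifecycle configuration, so include any existing rules in the same payload.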

Things to look out for:

  • Ensure that lifecycle policies align with your data access patterns. Transitioning data too soon might lead to increased retrieval costs. 

  • For pattern-based lifecycle management, apply the pattern to a smaller dataset in a test bucket before implementing it on your critical data.

  • Monitor the effects of these policies closely to avoid accidental data loss.

2. Leverage Different Storage Classes in S3

AWS S3 offers various storage classes, each designed for different use cases. Leveraging these can drastically reduce costs. They are especially suitable for archival data such as audit data or long-term backup data.

Best Practices:

  • Transitioning Storage Classes: Regularly transition data to the most cost-effective storage class based on access patterns. For example, use S3 Standard for frequently accessed data and S3 Glacier for long-term archives. This can be easily achieved using lifecycle management, as shown in the example after this list.

  • Intelligent Tiering: Utilize S3 Intelligent-Tiering, which automatically moves data between two access tiers (frequent and infrequent) based on changing access patterns.
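
As a sketch, a lifecycle rule that archives objects to S3 Glacier 90 days after creation could look like this. The bucket name is a placeholder, and the audit-logs/ prefix and 90-day value are hypothetical examples; adjust them to your own data layout.

# Transition objects under the audit-logs/ prefix to Glacier 90 days after creation
aws s3api put-bucket-lifecycle-configuration \
    --bucket <BucketName> \
    --lifecycle-configuration '{
        "Rules": [
            {
                "ID": "archive-audit-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "audit-logs/"},
                "Transitions": [
                    {"Days": 90, "StorageClass": "GLACIER"}
                ]
            }
        ]
    }'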

Things to look out for:

  • Be cautious with frequent transitions between classes, as this might incur additional costs. When transitioning TBs of data at once, expect a one-time high cost for moving it to infrequent access tiers.

  • Be mindful of the minimum storage duration for infrequent access classes; deleting data before that duration is met will still be charged as if it had been stored for the full minimum period.

3. Handle Versions in Azure Blob Storage and S3

Data versioning can be both a lifesaver and a hidden cost driver. Managing versions effectively is crucial for cost control.

Best Practices:

  • Check for Versions: Regularly audit stored versions to identify and delete outdated or unnecessary versions.

  • Use automation for Version Management: A very simple bash script can help automate the process of setting a retention policy to ensure deletion of outdated versions.

For example, modify the Azure Blob Storage properties to set version retention:

# Enable versioning and set delete retention to <RetentionDays> days for a Storage Account in Azure
az storage account blob-service-properties update \
    --account-name <StorageAccountName> \
    --resource-group <ResourceGroupName> \
    --enable-versioning true \
    --enable-delete-retention true \
    --delete-retention-days <RetentionDays>
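
For S3, a comparable sketch assumes bucket versioning is already enabled: first audit what noncurrent versions exist, then let a lifecycle rule expire them after a set number of days. The bucket name is a placeholder and the 30-day value is only an example.

# List noncurrent object versions to see what versioning is actually retaining
aws s3api list-object-versions \
    --bucket <BucketName> \
    --query 'Versions[?IsLatest==`false`].[Key, VersionId, LastModified]' \
    --output table

# Expire noncurrent versions 30 days after they stop being the current version
aws s3api put-bucket-lifecycle-configuration \
    --bucket <BucketName> \
    --lifecycle-configuration '{
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30}
            }
        ]
    }'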

Things to look out for:

  • Ensure that version deletion scripts don’t inadvertently remove important data. Test extensively before automating and add the necessary logging.

4. Choose the Right Disk Type

Disk optimization is another crucial aspect that is often overlooked. Choosing the appropriate disk type is critical for cost management, especially when dealing with databases like RDS.

Best Practices:

  • Leverage the Right Disk Types: Use Standard disks for less critical applications in Azure and Premium disks for high-performance needs. For AWS EC2 workloads, the gp3 volume type is 20% cheaper than gp2 for storage and offers performance benefits for high I/O operations.

  • Do once, then automate: Simple tasks like scanning for and updating disk types can easily be handled by automation scripts; see the loop sketch below.

For example:

# Update an EBS volume to gp3 in AWS
aws ec2 modify-volume --volume-id <VolumeId> --volume-type gp3
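
To go from a one-off fix to automation, a minimal bash sketch could list all gp2 volumes in the current region and migrate each one. Review the output of the describe step before letting the loop modify anything.

# Find all gp2 EBS volumes in the current region and migrate them to gp3
for vol in $(aws ec2 describe-volumes \
        --filters Name=volume-type,Values=gp2 \
        --query 'Volumes[].VolumeId' \
        --output text); do
    echo "Migrating $vol to gp3"
    aws ec2 modify-volume --volume-id "$vol" --volume-type gp3
done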

Things to look out for:

  • gp3 volumes can become expensive on RDS for large amounts of storage, so ensure they are only used where necessary.
  • Regular review of snapshots and volumes, along with disk types, is a must to keep costs in check; let GPT help you write automation scripts for such simple tasks.

Long-Term Monitoring Solutions

Finally, consistent monitoring is key to sustaining cost reductions. Here are a few simple practices we have put in place to ensure this.

  • Leverage tools like Azure Cost Management and AWS Cost Explorer to track storage usage and costs over time. 
  • Set up alerts for unusual spending patterns or less-used objects using Azure Alerts and AWS CloudWatch; see the alarm sketch after this list.
  • Monitor for scenarios where a large number of small files are uploaded to S3 or Blob Storage; these also add to the cost and may not show up as a drastic increase in storage size.
  • Set up regular audits, which often reveal opportunities to further optimize storage and reduce costs.
  • Regularly revisit the automations in place, such as lifecycle policies and scripts, to ensure coverage is maintained.
  • One more crucial step is to monitor the costs of security and auditing services, as they also increase in tandem with activity such as frequent file uploads and deletions.
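
For example, a minimal CloudWatch billing alarm on S3 spend could look like the sketch below. It assumes billing alerts are enabled for the account, that an SNS topic (<SnsTopicArn>) already exists for notifications, and that the command runs against us-east-1, where billing metrics are published; the 500 USD threshold is just an example.

# Alarm when estimated monthly S3 charges exceed 500 USD
aws cloudwatch put-metric-alarm \
    --alarm-name "s3-monthly-spend" \
    --namespace "AWS/Billing" \
    --metric-name "EstimatedCharges" \
    --dimensions Name=Currency,Value=USD Name=ServiceName,Value=AmazonS3 \
    --statistic Maximum \
    --period 21600 \
    --evaluation-periods 1 \
    --threshold 500 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions <SnsTopicArn>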

That’s it: by implementing these strategies, we managed to slash our storage costs drastically in just one quarter, without the need for paid tools. The key is to stay proactive, regularly review your storage usage, and adjust your strategies as needed. With the right approach, significant cost savings are well within reach.
