In Linux-based virtual machines, every additional gigabyte of allocated disk space directly increases storage costs, especially when multiplied across many instances in cloud or data center environments. Keeping Linux VM disks as lean as possible not only helps control spend, but also streamlines backups, snapshots, and system maintenance. Additionally, a filesystem that reaches full capacity can cause application outages that are not always easy to diagnose and fix. Regular housekeeping, such as rotating and pruning log files, clearing temporary data, removing unused packages and old kernels, and cleaning out cached files, prevents silent disk bloat over time. By adopting disciplined disk management practices and regularly reviewing what's stored on each file system, teams can avoid paying for idle or forgotten data while keeping their Linux VMs efficient, manageable, and cost-effective. This article describes how to identify large files and folders in Linux environments.
First, we'll dive into the basic file system management commands. The following can be used to get a sense of the size of your disks and how much space files are consuming on them:
lsblk - Lists information about the block devices attached to the machine
df - Lists each file system along with its total size, used space, and mount point; the "-h" flag prints sizes in a human-readable format
du - Lists the disk usage of files and directories recursively; the "-h" flag prints sizes in a human-readable format, and the "--max-depth=n" option (or its short form "-d n", which also works on BSD-derived systems) limits the report to files and folders n levels deep
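Taken together, the three commands form a quick survey of a machine's storage. The sketch below assumes GNU coreutils and util-linux syntax; /usr is just an example path chosen because it is readable on most systems:

```shell
#!/bin/sh
# Quick disk survey using the three commands above.

lsblk                       # block devices attached to the machine
df -h                       # mounted filesystems, human-readable sizes
du -h --max-depth=1 /usr    # per-directory usage one level below /usr
```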
That last command is the one we'll primarily use. To locate areas of heavy disk usage, navigate to the root directory with "cd /" and run "du -h --max-depth=1" to display the size of each top-level directory. The --max-depth value can be increased to return more levels of results. Alternatively, once directories with large concentrations of files are identified, simply navigate to them with "cd <directory path>" and rerun the du command above. Repeating this drill-down quickly reveals what is taking up space on disk.
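Rather than eyeballing the output, du can be piped through sort to rank directories by size. This one-liner is a sketch assuming GNU du and GNU sort (for the human-readable "-h" sort key):

```shell
#!/bin/sh
# Show the ten heaviest directories directly under /, largest first.
# -x keeps du on the root filesystem, so pseudo-mounts such as /proc
# and other mounted filesystems are skipped; sort -rh orders the
# human-readable sizes in descending order.
du -xh --max-depth=1 / 2>/dev/null | sort -rh | head -n 10
```

The same pipeline works from any starting directory; replace "/" with the directory currently under investigation as you drill down.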
Once large concentrations of files have been identified, the next step is deleting them. Before deleting anything, make absolutely certain it is not a file or folder required by the operating system or by any application on the server. We also strongly recommend taking reliable snapshots or backups of your VMs prior to any cleanup operations. Typically, log files, temp files, and old installers are safe to delete. The actual deletion of these files is outside the scope of this article: sometimes files can simply be deleted with "rm", and sometimes applications must be reconfigured first.
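As an illustrative sketch only (the /var/log path, the 100 MB threshold, and the LOGFILE name are hypothetical examples, not recommendations), the commands below show two common cleanup patterns: listing unusually large files before touching anything, and truncating a log in place rather than deleting it, since a running process that still holds the file open would otherwise keep the space allocated until it restarts:

```shell
#!/bin/sh
# List files larger than 100 MB under /var/log, staying on one filesystem.
find /var/log -xdev -type f -size +100M -exec ls -lh {} \; 2>/dev/null

# Truncate a log in place instead of using "rm": if a process still has
# the file open, deleting it would not free the space until that process
# restarts. LOGFILE is a hypothetical path; adjust for your application.
LOGFILE=/var/log/myapp/app.log
if [ -f "$LOGFILE" ]; then
    : > "$LOGFILE"
fi
```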
Effective disk management on Linux virtual machines is not a one-time task, but an ongoing operational discipline. By regularly assessing file system usage with tools like lsblk, df, and du, and then thoughtfully targeting large, non-essential files and directories, teams can keep disk usage under control without compromising system stability. Combined with sensible housekeeping practices—such as log rotation, cleanup of temporary data, and review of legacy installers or artifacts—these techniques help organizations reduce storage costs, streamline backups and maintenance, and maintain predictable performance across their Linux environments. Over time, a consistent approach to monitoring and cleanup becomes a key component of both cost optimization and operational resilience.