Database design : preferred field length for file paths

The Length of a file path cannot be predicted. It could be very short as 'C:\' or could be very lengthy like 'C:\Program Files\Microsoft SQL Server\110\LocalDB\Binn\Resources\1033' or even more. But in database level there is no harm using something like VARCHAR(MAX)

See Maximum size of VARCHAR(MAX)

You can use VARCHAR(MAX) or NVARCHAR(MAX).

These are variable length fields meaning they are designed to store values of different length. There is no extra overhead for longer values over shorter values.

Defining MAX means the field can be up to 2GB.

From MSDN (varchar), nvarchar has similar documentation:

Use varchar when the sizes of the column data entries vary considerably.

Use varchar(max) when the sizes of the column data entries vary considerably, and the size might exceed 8,000 bytes.

Use the appropriate length for the data you intend to support. Since you're using SQL Server you should use nvarchar(260) as the upper limit for storing path names, since that is the specification limit for typical Windows machines. Under certain circumstances you can create paths longer than that, however Windows Explorer will tend to have problems handling them. SQL Server cannot handle filenames longer than 260 characters. This includes SQL Server on Linux.

I can prove SQL Server uses an nvarchar(260) column internally to store SQL Server Database filenames, with the path included. Checking the definition of the sys.master_files view, we see the following T-SQL:

 CREATE VIEW sys.master_files AS
    SELECT
        database_id     = f.dbid,
        file_id         = f.fileid,
        file_guid       = f.fileguid,
        type            = f.filetype,
        type_desc       = ft.name,
        data_space_id   = f.grpid,
        name            = f.lname,
        physical_name   = f.pname,
        state           = convert(tinyint, case f.filestate     -- Map enum EMDFileState to AvailablityStates
                                when 0 then 0 when 10 then 0    -- ONLINE
                                when 4 then 7   -- DEFUNCT
                                when 5 then 3 when 9 then 3 -- RECOVERY_PENDING
                                when 7 then 1 when 8 then 1 when 11 then 1  -- RESTORING
                                when 12 then 4  -- SUSPECT
                                else 6 end),    -- OFFLINE
        state_desc      = st.name,
        f.size,
        max_size            = f.maxsize,
        f.growth,
        is_media_read_only  = sysconv(bit, f.status & 8),       -- FIL_READONLY_MEDIA
        is_read_only            = sysconv(bit, f.status & 16),  -- FIL_READONLY
        is_sparse           = sysconv(bit, f.status & 256), -- FIL_SPARSE_FILE
        is_percent_growth   = sysconv(bit, f.status & 32),  -- FIL_PERCENT_GROWTH
        is_name_reserved        = sysconv(bit, case f.filestate when 3 then 1 else 0 end), -- x_efs_DroppedReusePending
        create_lsn          = GetNumericLsn(f.createlsn),
        drop_lsn                = GetNumericLsn(f.droplsn),
        read_only_lsn           = GetNumericLsn(f.readonlylsn),
        read_write_lsn      = GetNumericLsn(f.readwritelsn),
        differential_base_lsn   = GetNumericLsn(f.diffbaselsn),
        differential_base_guid  = f.diffbaseguid,
        differential_base_time  = nullif(f.diffbasetime, 0),
        redo_start_lsn          = GetNumericLsn(f.redostartlsn),
        redo_start_fork_guid    = f.redostartforkguid,
        redo_target_lsn     = GetNumericLsn(f.redotargetlsn),
        redo_target_fork_guid   = f.forkguid,
        backup_lsn          = GetNumericLsn(f.backuplsn),
        credential_id       = cr.credential_id
    FROM sys.sysbrickfiles f
    LEFT JOIN sys.syspalvalues st ON st.class = 'DBFS' AND st.value = f.filestate
    LEFT JOIN sys.syspalvalues ft ON ft.class = 'DBFT' AND ft.value = f.filetype
    LEFT JOIN sys.credentials cr ON f.pname LIKE cr.name + N'%' COLLATE database_default
    WHERE f.dbid < 0x7fff -- consistent with sys.databases
        AND f.pruid = 0
        AND f.filestate NOT IN (1, 2)   -- x_efs_Dummy, x_efs_Dropped
        AND has_access('MF', 1) = 1

Microsoft Docs for sys.master_files says this about the physical_name column:

physical_name nvarchar(260) Operating-system file name.

But let's not trust that. We see the physical file name is referenced as physical_name = f.pname. And the table alias "f" points to FROM sys.sysbrickfiles f. Therefore, SQL Server stores the filename in sys.sysbrickfiles, which is an internal table that is only visible from the Dedicated Administrator Connection, or DAC as its frequently known. Connecting to the DAC, and generating a temp table from the output of sys.sysbrickfiles, we see the following:

CREATE TABLE #sysbrickfiles
(
      brickid           int              NOT NULL
    , dbid              int              NOT NULL
    , pruid             int              NOT NULL
    , fileid            int              NOT NULL
    , grpid             int              NOT NULL
    , status            int              NOT NULL
    , filetype          tinyint          NOT NULL
    , filestate         tinyint          NOT NULL
    , size              int              NOT NULL
    , maxsize           int              NOT NULL
    , growth            int              NOT NULL
    , lname             nvarchar(128)    NOT NULL
    , pname             nvarchar(260)    NOT NULL
    , createlsn         binary(10)       NULL
    , droplsn           binary(10)       NULL
    , fileguid          uniqueidentifier NULL
    , internalstatus    int              NOT NULL
    , readonlylsn       binary(10)       NULL
    , readwritelsn      binary(10)       NULL
    , readonlybaselsn   binary(10)       NULL
    , firstupdatelsn    binary(10)       NULL
    , lastupdatelsn     binary(10)       NULL
    , backuplsn         binary(10)       NULL
    , diffbaselsn       binary(10)       NULL
    , diffbaseguid      uniqueidentifier NULL
    , diffbasetime      datetime         NOT NULL
    , diffbaseseclsn    binary(10)       NULL
    , redostartlsn      binary(10)       NULL
    , redotargetlsn     binary(10)       NULL
    , forkguid          uniqueidentifier NULL
    , forklsn           binary(10)       NULL
    , forkvc            bigint           NOT NULL
    , redostartforkguid uniqueidentifier NULL
);

As you can see, the pname column is indeed defined as nvarchar(260).

Also, if we attempt to create a database using a filename that is longer than 260 characters, we see an error is returned:

Msg 103, Level 15, State 3, Line 7
The file that starts with 'F:\AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAARGH.mdf' is too long. Maximum length is 259.

Using anything other than an nvarchar(260) column to store filenames in SQL Server is both wasteful, and creates technical debt.

The length of a column is extremely important performance-wise. Column length directly affects:

memory grants for queries against the column. When the query processor creates a query plan, it uses the size of each column present in the query as a basis for the amount of memory required to run the query. It doesn't use the actual size of the data present in each column, instead it "guesses" that the average size of the data will be 50% of the maximum length of the column.
Ability to index the column efficiently. Larger columns create significantly larger indexes. Larger indexes require the more memory and disk throughput than smaller indexes. SQL Server has a maximum key length of 1700 bytes for non-clustered indexes (as of SQL Server 2016) and a maximum key length of 900 bytes for clustered indexes. If you attempt to create an index on columns larger than those maximum amounts, you get errors, and possibly not until run-time when it can be very costly to fix.
character-based primary/foreign key performance is severely affected by larger columns lengths. When primary keys are referenced via foreign keys, the size requirements for memory, disk, and I/O are duplicated for each foreign key. Take for example a Customer table where the key is the CustomerName column, defined as varchar(500). Every table that references customers will now require a 500-byte CustomerName column. If that column was defined as a varchar(100) instead, every query referencing those columns will save 200 bytes per row in memory and disk I/O.
Erik Darling shows that Predicate Pushdown does not work for (MAX) data types, which can severely limit performance.

If your using SQL Server, it's good to know that Microsoft is using nvarchar(260) fields to store file path and name in the system tables (like sys.database_files, or sys.sysaltfiles, or sys.master_files).

Column name      Data type        Description
-------------    -------------    ---------------------------
physical_name    nvarchar(260)    Operating-system file name.

Good practice could be to use the same format to store your path and file name.

You will, of course, need to enforce the size in your UI to be sure that it will not be truncated during INSERT or UPDATE.

Database design : preferred field length for file paths

Tags:

Database

Sql Server

Database Design

Related

Recent Posts