Files

This page provides technical documentation on how Metrici processes files. See File library for details of file processing components.

How Metrici processes files

Metrici does not allow end users to store and retrieve files directly on the Metrici server. Instead, Metrici represents files as nodes, known as file nodes. When the node is accessed, the file is retrieved.

Different sorts of files

Metrici supports three classes of file nodes:

  • Content files, which serve the value of the File Content (system.FILE_CONTENT) member of the node as a file.
  • System files, which serve files directly from the server. This is reserved for internal use.
  • User files, which hold files uploaded by users or generated on their behalf.

All file node types use two display properties:

  • The Display Class (system.PROPERTIES.displayClass) property must be set to "file".
  • The File Class (system.PROPERTIES.fileClass) property must be set to identify the class of file. It is set to "content", "system" or "user".
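
As an illustration, the two display properties can be checked with a small helper. This is a minimal sketch in Python, not part of Metrici: the function name and the plain dictionary used to hold the properties are assumptions made for the example. Only the property names and the allowed values come from the list above.

```python
# Allowed values for the File Class property, as listed above.
FILE_CLASSES = {"content", "system", "user"}


def is_file_node_type(properties):
    """Return True if the display properties describe a file node type.

    `properties` is a hypothetical plain dict mapping display property
    names to values; how the properties are really read is outside the
    scope of this sketch.
    """
    return (
        properties.get("displayClass") == "file"
        and properties.get("fileClass") in FILE_CLASSES
    )
```

For example, is_file_node_type({"displayClass": "file", "fileClass": "user"}) is true, while a dictionary with no displayClass entry is rejected.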

Members on the file node are used to specify information about the file.

  • File Extension (system.FILE_EXTENSION) specifies an extension to add to the URL so that the file is saved with the correct extension when downloaded. This is also used to guess the correct content type.
  • File Size (system.FILE_SIZE) specifies the size of the file in bytes.
  • File Name (system.FILE_NAME) provides a file name for the file. This should include the file name with the extension but without a path. This is optional, and may be used to build a download attribute in a file link - see the section below on download.

User files

User files are held in a user file store. They are written to the store when a user uploads a file, or by internal routines that generate user files.

Files in the user file store are identified using a storage key. User file nodes store this key in the File Storage Key (system.FILE_STORAGE_KEY) member.

File Storage Key contains a long, random sequence that is effectively unguessable. Any node with the file storage key is considered authorised to access the data. This allows file nodes to be copied in an intuitive way without duplicating the underlying storage.

Files in the user file area are arranged by the user who initially uploaded them. The uploader has no bearing on how the files are used, but is used when accounting for storage use.

The files in the user file area are managed automatically. A daily housekeeping job removes any files that are not referenced by file nodes. The system may change the value of storage keys, even within frozen nodes, in order to manage the underlying files. Solutions should not store storage keys anywhere other than in the File Storage Key member, and should allow for storage keys to change over time.

Accessing files

Files are accessed using normal node conventions. For example, the file node somecompany.myfiles.profile_photo can be accessed as ${rootPath}somecompany/myfiles/profile_photo.

Files which may contain HTML can be dangerous when unwittingly loaded into the browser. Metrici therefore serves HTML files as downloads, rather than displaying them in the browser. The file mechanism cannot be used to circumvent Metrici's content protection.

To help the browser interpret the file, when you access a file node, Metrici will check to see if there is an extension at the end of the URL.

  • If there is an extension and it matches the one defined on the file node, the file is served.
  • If there is no extension, or the given extension does not match the one defined on the file node, the browser is redirected to access the node with the correct extension.

For this reason, scripts that generate URLs for files should examine and append the file extension.  If, for example, the extension for somecompany.myfiles.profile_photo is 'png', then the generated URL would be ${rootPath}somecompany/myfiles/profile_photo.png.
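
The URL construction described above can be sketched as follows. This is an illustrative Python helper rather than a Metrici API: the function name is hypothetical, and it assumes that the value of ${rootPath} is passed in as a string ending in "/".

```python
def file_node_url(root_path, node_reference, extension=None):
    """Build a node URL for a file node, appending the extension if known.

    root_path      -- the value of ${rootPath}, assumed to end with "/"
    node_reference -- dotted node reference, e.g. "somecompany.myfiles.profile_photo"
    extension      -- the node's file extension (from system.FILE_EXTENSION), if any
    """
    # Dots in the node reference become path separators in the URL.
    url = root_path + node_reference.replace(".", "/")
    if extension:
        # Tolerate an extension given with or without a leading dot.
        url += "." + extension.lstrip(".")
    return url
```

With a root path of "/" (an assumed value), file_node_url("/", "somecompany.myfiles.profile_photo", "png") gives /somecompany/myfiles/profile_photo.png. Appending the extension up front avoids the redirect that would otherwise be needed.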

Extensions can be used on nodes that are not file nodes; they are just ignored.

Resources

As well as accessing files using normal node conventions, the resources method provides an alternative way of accessing files, which is more suitable for resources referenced within pages, such as images, CSS and JavaScript files. It also allows the entries within a zip file to be served as separate files.

The URL for files accessed through the resources mechanism is:

${rootPath}resources/full.node.reference

or, for an entry within a zip file

${rootPath}resources/full.node.reference/path.foo

where full.node.reference is a node or node version reference, and path.foo is the path to the entry within the zip file.
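
As a sketch, the two URL forms can be produced by one helper. The function name is hypothetical, ${rootPath} is assumed to be passed in as a string ending in "/", and percent-encoding the entry path is an assumption made for safety in the example, not documented Metrici behaviour.

```python
from urllib.parse import quote


def resource_url(root_path, node_reference, entry_path=None):
    """Build a resources URL for a file node or for an entry in a zip file.

    root_path      -- the value of ${rootPath}, assumed to end with "/"
    node_reference -- node or node version reference, kept in dotted form
    entry_path     -- optional path of an entry within the zip file
    """
    url = root_path + "resources/" + node_reference
    if entry_path:
        # quote() leaves "/" unescaped by default, so nested paths survive.
        url += "/" + quote(entry_path.lstrip("/"))
    return url
```

For instance, resource_url("/", "somecompany.myfiles.site_assets", "css/site.css") gives /resources/somecompany.myfiles.site_assets/css/site.css, where the node reference is made up for the example.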

The differences between serving files through node URLs and serving them through the resources method are summarised in the table below:

Node URL | Resources method
Serves user files, content files and system files. | Serves user files and system files.
No special support for zip files, whole zip file is served. | Allows the entries (files) within a zip file to be served as separate files.
Full node access checks performed each time. | Node access checks are simplified and results cached.
Integrated with normal node error handling, which provides the user with suitable actions if files are not found or they are not authorised. | Errors result in HTTP 404 "not found" conditions, which is more suitable for page resources.
Files are not cached on the browser. | Files may be cached on the browser. Not suitable for sensitive files.
Files are always up-to-date. | In rare circumstances users may need to sign out and in again to pick up latest versions of files.
Uses and appends extensions. | Does not use extensions except when serving entries from a zip file.
Suitable for files that are served occasionally in their own right, such as word processing documents or PDF files. | Suitable for files that are served constantly as page resources, such as background images, CSS and JavaScript.

Credentials

If you are using a node URL and are not signed in to Metrici, you will be redirected to the sign in page. Similarly, if the file does not exist, you will be redirected to an error page.

You can pass credentials with the URL, either as a GET or a POST. In this case, if your credentials are not valid or the file does not exist, Metrici will return the appropriate HTTP status code (403 for not authorized, 404 for not found), rather than a redirect to sign in or a Metrici error page, i.e. the same behaviour as the resources method. This is useful when retrieving files as part of an integration solution, as it allows standard HTTP status codes to be used to signal error conditions, rather than returning a page of HTML when there is an error.
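
An integration client might use these status codes along the following lines. This is a sketch only: the function name is hypothetical, and because the names of the credential parameters are not given on this page, they are supplied by the caller as a dictionary. The credentials are sent as a POST. The opener argument exists so that the function can be exercised without a network connection.

```python
import urllib.error
import urllib.parse
import urllib.request


def fetch_file(url, credentials, opener=urllib.request.urlopen):
    """Fetch a file by node URL, passing credentials as POST fields.

    credentials -- dict of credential parameter names to values; the
                   names are an assumption left to the caller.
    Returns the file content as bytes, or raises PermissionError (403)
    or FileNotFoundError (404).
    """
    data = urllib.parse.urlencode(credentials).encode("ascii")
    # Supplying a request body makes urllib send a POST.
    request = urllib.request.Request(url, data=data)
    try:
        with opener(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 403:
            raise PermissionError(f"not authorised: {url}") from err
        if err.code == 404:
            raise FileNotFoundError(f"file not found: {url}") from err
        raise  # any other status is passed on unchanged
```

Mapping the status codes to exceptions means the calling code never has to parse an HTML error page to find out what went wrong.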

Download

Because of browser differences and browser security controls, the behaviour of files when downloaded is quite complicated.

Metrici distinguishes between files that can be safely rendered in the browser (images, text, PDF), and other files which might (in some browsers) not be safe to render. In particular, XML files (including SVG) and HTML files are not considered safe to render in the browser, because they could contain JavaScript which would be executed within the Metrici session. All unknown types (such as spreadsheets or office documents) are considered unsafe.

Files that are safe to render may be rendered in the browser. Metrici attempts to force other file types to download. Because of browser restrictions, Metrici sets a file name for this download which most browsers will follow. The file name will be based on the system.FILE_NAME member of the file node, and defaults to the local reference of the node plus the extension.

Other file types can be forced to download by adding the HTML5 download attribute. The value of the attribute should be set to the file name, either read from system.FILE_NAME or calculated from the local reference and extension. If you set a different file name in the download attribute, you will not get consistent results. In most modern browsers, for "safe" files such as images, the download attribute will be followed, but for "unsafe" files such as Excel spreadsheets, the system.FILE_NAME or default file name will be used and the value of the download attribute ignored. (This is because forcing these files to download requires that a file name is sent in the header, which in most browsers overrides the one set in the download attribute.)

You can also set a download parameter on the URL, which works much the same as a download attribute and which will not be ignored for "unsafe" files. You can use this approach to download a file from a form submit, by redirecting the user to a link with ?download=filename on it. Add the class "submit-multiple" to the submit button that triggers the download or the form will be blocked to prevent multiple submission.

In conclusion:

  • If you want the user to browse safe files in the browser and download others, just provide a link to the file.
  • If you want to force downloads, set the download attribute based on system.FILE_NAME or local reference + extension.
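
The file name rule and the download URL parameter can be sketched as below. Both function names are hypothetical, and percent-encoding the file name in the query string is an assumption of the example rather than documented behaviour.

```python
from urllib.parse import quote


def download_file_name(local_reference, extension, file_name=None):
    """Return the name to use for a download.

    Uses the system.FILE_NAME value if the node has one, otherwise the
    local reference plus the extension.
    """
    if file_name:
        return file_name
    return f"{local_reference}.{extension}" if extension else local_reference


def download_url(file_url, name):
    """Append a download parameter carrying the file name to a file URL."""
    return f"{file_url}?download={quote(name, safe='')}"
```

For example, a node with local reference profile_photo, extension png and no File Name member gets the name profile_photo.png, and download_url("/somecompany/myfiles/profile_photo.png", "profile_photo.png") gives /somecompany/myfiles/profile_photo.png?download=profile_photo.png. The same name can be used as the value of a download attribute on a link.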

How file upload works

After normal request parameters have been processed, the request is tested to see if it has any uploaded files. (Specifically, it tests for the multipart/form-data encoding type.)

Any uploaded files are retrieved and written to the user file store.

The file upload parameter is converted into a text parameter with a series of newline-delimited keyword=value pairs. This contains the following keywords:

  • storageKey – the file storage key.
  • contentType – the content type of the file.
  • extension – the extension part of the file name.
  • size – the size of the file, in bytes.
  • fileName – the name portion of the uploaded file.
  • localReference – the name portion of the uploaded file, without an extension, converted into a valid local reference for the node.

The receiving process decides what to do with the file upload parameter. Typically the storageKey, extension and size are written to the appropriate members of the file node, the file name is used as a node name, and the local reference used as a local reference.
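
A receiving script could unpack the parameter along these lines. This is a minimal sketch: the function names are hypothetical, and the sketch assumes that each line holds exactly one keyword=value pair split at the first "=". The member names are those given earlier on this page.

```python
def parse_upload_parameter(text):
    """Parse newline-delimited keyword=value pairs into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split at the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        fields[key.strip()] = value
    return fields


def file_node_members(fields):
    """Map parsed upload fields to the members of a user file node."""
    return {
        "system.FILE_STORAGE_KEY": fields["storageKey"],
        "system.FILE_EXTENSION": fields["extension"],
        "system.FILE_SIZE": int(fields["size"]),
    }
```

The remaining fields, fileName and localReference, would typically supply the node name and local reference when the node is created.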

The file upload parameter can be intercepted by a script and used to create a file node.

Uploaded files must not exceed a system-defined maximum, currently set to 50MB.