Explore the latest on remote analysis with omero2pandas.
OMERO Plus manages large and complex image analysis results alongside managed images. Classically, such data is stored in OMERO’s Binary Repository, a binary data store managed by the OMERO server. Now, as announced in our roadmap for analytical data, large tabular analysis results can be stored and registered remotely. The flexibility offered by this feature permits data to be stored in the most convenient and cost-effective location for each use case. Glencoe has also adopted a file format for tabular data storage (TileDB) that is compatible with cloud-based object storage such as Amazon S3. See our roadmap for our next steps in this effort.
An API for remote table registration is now available in the latest release of OMERO Plus, and the open-source omero2pandas Python package has been updated with a reference implementation of how to use this feature.
Remote table registration
Registering a table which is stored outside the OMERO Binary Repository requires two key steps:
- Convert the table to the TileDB file format.
- Register the resulting file with the OMERO Plus server using an HTTP JSON API.
If the storage to which the TileDB file will be written is visible to both the client and the server, and is mapped to the same path on each (e.g. a mapped network drive), the omero2pandas library provides an interface for doing this as a single operation:
import omero2pandas

omero2pandas.upload_table(
    "/path/to/input.csv", "<Table name>", parent_type="Image", parent_id=101,
    local_path="/shared_drive/tables/my_omero_table.tiledb",
)
If the mapping of the shared storage from the server’s perspective is different from the client machine’s, you can also manually specify the path where the server should see the file using the remote_path argument. For example:
import omero2pandas

omero2pandas.upload_table(
    "/path/to/input.csv", "<Table name>", parent_type="Image", parent_id=101,
    local_path="J:/shared_drive/tables/my_omero_table.tiledb",
    remote_path="/server_mount/tables/my_omero_table.tiledb",
)
This system also provides support for scenarios where the client and server use different file path formats, such as a client running Windows uploading to a server running Linux.
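When many tables are uploaded from a Windows client to a Linux server, it can be convenient to derive the remote_path value from the local path rather than typing both. The following is a minimal sketch using only the Python standard library; the helper name to_server_path and its local_root and server_root parameters are hypothetical and are not part of omero2pandas. It assumes the shared storage is mounted at one known root on each machine.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_server_path(local_path: str, local_root: str, server_root: str) -> str:
    """Map a Windows client path onto the matching Linux server path.

    Hypothetical helper for illustration; not part of omero2pandas.
    """
    # The Pure* classes only parse path syntax and never touch the
    # filesystem, so this behaves the same on any operating system.
    relative = PureWindowsPath(local_path).relative_to(PureWindowsPath(local_root))
    # Rebuild the relative portion under the server's mount point.
    return str(PurePosixPath(server_root).joinpath(*relative.parts))


remote_path = to_server_path(
    "J:/shared_drive/tables/my_omero_table.tiledb",
    local_root="J:/shared_drive",
    server_root="/server_mount",
)
print(remote_path)
```

The resulting string could then be passed as the remote_path argument shown above. If the local path does not sit under local_root, relative_to raises a ValueError, which is a useful early warning that the file is not on the shared storage.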
Security
The remote registration API includes protections to ensure that both the client and the server have access to the TileDB file. These rely on a “SecretToken” which is embedded in the TileDB metadata.
The token should be provided when registering a table with the new API. This provides two important checks:
- Knowing the SecretToken signals that the user does have permission to read the table.
- Token matching confirms that the file seen by the server is the same file that the user requested the server to register.
The reference implementation in omero2pandas generates a random SecretToken using Python's secrets module, which is designed to produce cryptographically strong random values. While it is possible to create and register a TileDB file with a non-random SecretToken, or with no SecretToken at all, this is not recommended.
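To illustrate the idea behind the token check, here is a minimal sketch using only the Python standard library. It is not the omero2pandas or OMERO Plus implementation: the function names, the token length, and the use of a plain dictionary as a stand-in for the TileDB metadata are all assumptions made for the example.

```python
import hmac
import secrets


def new_secret_token(n_bytes: int = 32) -> str:
    """Generate a random, URL-safe token (hypothetical helper)."""
    # secrets draws from the operating system's secure random source.
    return secrets.token_urlsafe(n_bytes)


def tokens_match(supplied: str, stored: str) -> bool:
    """Compare the token sent by the client with the one read from the file."""
    # compare_digest takes time independent of where the strings differ,
    # which avoids leaking information through response timing.
    return hmac.compare_digest(supplied.encode(), stored.encode())


token = new_secret_token()
# Stand-in for the metadata embedded in the TileDB file (assumed layout).
metadata = {"SecretToken": token}

print(tokens_match(token, metadata["SecretToken"]))
print(tokens_match("a-guessed-token", metadata["SecretToken"]))
```

A match indicates both that the caller could read the file's metadata and that the server is looking at the same file the caller wrote, which corresponds to the two checks listed above.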
Creating and registering the table separately
It may sometimes be necessary to create a table locally, upload it to storage visible to OMERO Plus, and then perform the registration. The omero2pandas library also supports such a workflow, allowing the steps described above to be run individually. Documentation on this is available here.
Future development
Storing large analytical datasets outside the Binary Repository provides greater flexibility and scalability for data management and analysis within and beyond the OMERO Plus ecosystem. Glencoe Software will continue to improve the omero2pandas library and exploit this new functionality for the integrated analysis workflows used by our customers.