AiiDA v2.7.0 preview#
As the release of aiida-core version 2.7.0 is just around the corner, in this blog post we'd like to give you an overview of the exciting new features and important bug fixes of this minor release. You can already find release candidates on PyPI and conda-forge, as well as a Docker image for testing purposes. Feedback is welcome!
Asynchronous SSH connection (#6626)#
Previously, when a data transfer with a remote computer was active, the responsible transport plugins blocked further program execution until the communication was completed. This long-standing limitation left considerable room for performance improvements.
With the introduction of the new asynchronous SSH transport plugin (core.ssh_async), multiple communications with a remote machine can now happen concurrently.[1]
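To try it out, select the plugin as the transport of a computer (e.g., via verdi computer setup or the Python API). Below is a minimal sketch using the ORM; the label, hostname, scheduler, and work directory are placeholders for illustration:

from aiida import load_profile
from aiida.orm import Computer

load_profile()

# Create a computer that uses the asynchronous SSH transport instead of core.ssh.
# All values below are illustrative placeholders.
computer = Computer(
    label='my_cluster_async',
    hostname='cluster.example.com',
    transport_type='core.ssh_async',
    scheduler_type='core.slurm',
    workdir='/scratch/{username}/aiida/',
).store()

# Credentials are configured afterwards, e.g. with:
#   verdi computer configure core.ssh_async my_cluster_async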
🚀 When core.ssh_async outperforms core.ssh#
core.ssh_async offers significant performance gains in scenarios where the worker is blocked by heavy transfer tasks, such as uploading, downloading, or copying large files.
Example: Submitting two WorkGraphs/WorkChains with the following logic:
WorkGraph 1 – Heavy I/O operations:
- Uploads a 10 MB file
- Remotely copies a 1 GB file
- Retrieves a 1 GB file
WorkGraph 2 – Lightweight task:
- Executes a simple shell command: touch file
Measured time until the second WorkGraph is processed (single worker):
- core.ssh_async: Only 4 seconds! 🚀🚀🚀🚀 A dramatic improvement!
- core.ssh: 108 seconds (the second task waits for the first to finish)
⚖️ When core.ssh_async and core.ssh perform similarly#
For mixed workloads involving numerous uploads and downloads—a common real-world use case—the performance gains depend on the specific conditions.
Large file transfers (~1 GB):#
core.ssh_async typically outperforms due to concurrent upload and download streams.
In favorable network conditions, this can nearly double the effective bandwidth.
Example: On a network with a baseline of 11.8 MB/s, the asynchronous mode approached nearly twice that speed under light load (see graph in PR #6626).
Test case:
Two WorkGraphs: one uploads 1 GB, the other retrieves 1 GB using RemoteData.
- core.ssh_async: 120 seconds
- core.ssh: 204 seconds
Small file transfers (many small files):#
Here, the overhead of managing asynchronous operations can outweigh the benefits.
Test case:
25 WorkGraphs, each transferring several ~1 MB files.
- core.ssh_async: 105 seconds
- core.ssh: 65 seconds
To conclude, the best transport plugin depends on your specific application: use core.ssh_async for workloads involving large file transfers or when you need to prevent I/O operations from blocking other tasks, but stick with core.ssh for scenarios dominated by many small file transfers, where the asynchronous overhead may reduce performance.
Extended dumping support for profiles and groups (#6723)#
In version 2.6.0, AiiDA introduced the ability to dump processes from the database into a human-readable, structured folder format.
Building on this feature, support has now been extended to allow dumping of entire groups and profiles, enabling users to retrieve AiiDA data more easily.
This enhancement is part of our broader roadmap to improve AiiDA's usability, especially for new users, who may find it challenging to manually construct the appropriate queries to extract data from the database.
The functionality is accessible via the verdi CLI:
verdi profile dump --all # This dumps the whole current profile
verdi profile dump --groups <PK> # This dumps one selected group as part of the profile dumping operation
verdi group dump <PK> # This dumps only the selected group, disregarding other profile data
Since dumping an entire profile can be a resource- and I/O-intensive operation (for large profiles), significant effort has been made to provide flexible options for fine-tuning which nodes are included in the dump.[2] Below is a snippet from the command’s help output:
Usage: verdi profile dump [OPTIONS] [--]
Dump all data in an AiiDA profiles storage to disk.
Options:
-p, --path PATH Base path for dump operations that write to
disk.
-n, --dry-run Perform a dry run.
-o, --overwrite Overwrite file/directory when writing to
disk.
-a, --all Include all entries, disregarding all other
filter options and flags.
-X, --codes CODE... One or multiple codes identified by their
ID, UUID or label.
-Y, --computers COMPUTER... One or multiple computers identified by
their ID, UUID or label.
-G, --groups GROUP... One or multiple groups identified by their
ID, UUID or label.
-u, --user USER Email address of the user.
-p, --past-days PAST_DAYS Only include entries created in the last
PAST_DAYS number of days.
--start-date TEXT Start date for node mtime range selection
for node collection dumping.
--end-date TEXT End date for node mtime range selection for
node collection dumping.
--filter-by-last-dump-time / --no-filter-by-last-dump-time
Only select nodes whose mtime is after the
last dump time. [default: filter-by-last-
dump-time]
--only-top-level-calcs / --no-only-top-level-calcs
Dump calculations in their own dedicated
directories, not just as part of the dumped
workflow. [default: only-top-level-calcs]
--only-top-level-workflows / --no-only-top-level-workflows
If a top-level workflow calls sub-workflows,
create a designated directory only for the
top-level workflow. [default: only-top-
level-workflows]
--delete-missing / --no-delete-missing
If a previously dumped group or node is
deleted from the DB, delete the
corresponding dump directory. [default:
delete-missing]
--symlink-calcs / --no-symlink-calcs
Symlink workflow sub-calculations to their
own dedicated directories. [default: no-
symlink-calcs]
--organize-by-groups / --no-organize-by-groups
If the collection of nodes to be dumped is
organized in groups, reproduce its
hierarchy. [default: organize-by-groups]
--also-ungrouped / --no-also-ungrouped
Dump also data of nodes that are not part of
any group. [default: no-also-ungrouped]
--relabel-groups / --no-relabel-groups
Update directories and log entries for the
dumping if groups have been relabeled since
the last dump. [default: relabel-groups]
--include-inputs / --exclude-inputs
Include linked input nodes of
`CalculationNode`(s). [default: include-
inputs]
--include-outputs / --exclude-outputs
Include linked output nodes of
`CalculationNode`(s). [default: exclude-
outputs]
--include-attributes / --exclude-attributes
Include attributes in the
`aiida_node_metadata.yaml` written for every
`ProcessNode`. [default: include-
attributes]
--include-extras / --exclude-extras
Include extras in the
`aiida_node_metadata.yaml` written for every
`ProcessNode`. [default: exclude-extras]
-f, --flat Dump files in a flat directory for every
step of a workflow.
--dump-unsealed / --no-dump-unsealed
Also allow the dumping of unsealed process
nodes. [default: no-dump-unsealed]
-v, --verbosity [notset|debug|info|report|warning|error|critical]
Set the verbosity of the output.
-h, --help Show this message and exit.
Another key feature is the incremental nature of the command, which ensures that the dumping process synchronizes the output folder with the internal state of AiiDA’s DB by gradually adding or removing files on successive executions of the command. This allows for efficient updates without having to overwrite everything, and is in contrast to AiiDA archive creation, which is a one-shot process. The behavior can further be adjusted using:
- --dry-run (-n): to simulate the dump without writing any files.
- --overwrite (-o): to fully overwrite the target directory if it already exists.
Finally, the command provides various options to customize the output folder structure, for instance, to reflect the group hierarchy of AiiDA's internal DB state, symlink duplicate calculations (e.g., calculations contained in multiple groups), create dedicated directories for the sub-workflows and calculations of top-level workflows, and more.
These enhancements aim to make data export from AiiDA more robust, customizable, and user-friendly.
Stashing (#6746, #6772)#
With this feature, you can bundle your data into a (compressed) tar archive during stashing by specifying one of the stash_mode options "tar", "tar.bz2", "tar.gz", or "tar.xz".
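These string values correspond to members of the StashMode enum that is used in the example below. To check which stash modes your installation provides, you can simply iterate over the enum:

from aiida.common import StashMode

# Print every available stash mode together with its string value
for mode in StashMode:
    print(f'{mode.name}: {mode.value}')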
When specifying the stashing operation during the setup of your calculation, compression can be configured as follows:
from aiida.plugins import CalculationFactory
from aiida.engine import run
from aiida.common import StashMode
from aiida.orm import load_computer

MyCalculation = CalculationFactory('your.plugin')  # placeholder: any CalcJob plugin

inputs = {
    # ..., i.e. the usual inputs of your calculation
    'metadata': {
        'computer': load_computer(label="localhost"),
        'options': {
            'resources': {'num_machines': 1},
            'stash': {
                'stash_mode': StashMode.COMPRESS_TARGZ,
                'target_base': '/scratch/',
                'source_list': ['heavy_data.xyz'],  # ['*'] to stash everything
            },
        },
    },
}

# If you use a builder, use
# builder.metadata = {'options': {...}, ...}
run(MyCalculation, **inputs)
In addition, it was historically only possible to enable stashing before running a CalcJob: the instruction had to be "attached" to the original CalcJob before its execution. If a user realized only after running the calculation that something needed to be stashed, this was not possible.
With v2.7.0, we introduce the new StashCalculation CalcJob, which can perform a stashing operation after a calculation has finished, provenance included!
The usage is very similar, and for consistency and user-friendliness, we keep the instructions as part of the metadata.
The main input is the remote_folder output node (an instance of RemoteData) of the calculation to be stashed, for example:
from aiida.plugins import CalculationFactory
from aiida.engine import run
from aiida.common import StashMode
from aiida.orm import load_node

StashCalculation = CalculationFactory('core.stash')

calcjob_node = load_node(<CALCJOB_PK>)

inputs = {
    'metadata': {
        'computer': calcjob_node.computer,
        'options': {
            'resources': {'num_machines': 1},
            'stash': {
                'stash_mode': StashMode.COPY.value,
                'target_base': '/scratch/',
                'source_list': ['heavy_data.xyz'],
            },
        },
    },
    'source_node': calcjob_node.outputs.remote_folder,
}
result = run(StashCalculation, **inputs)
Forcefully killing processes (#6793)#
Prior to version 2.7.0, the verdi process kill command could hang if a connection to the remote computer could not be established.
A new --force option has been introduced to terminate a process without waiting for a response from the remote machine.
Note: Using --force may result in orphaned jobs on the remote system if the remote job cancellation fails.
verdi process kill --force <PROCESS_ID>
In addition, a pending kill action is now cancelled when the kill command is resent by the user. This allows the user to adapt the parameters of the exponential backoff mechanism (EBM) applied by AiiDA via verdi config and then resend the kill command with the new parameters:
verdi process kill --timeout 5 <PROCESS_ID>
verdi config set transport.task_maximum_attempts 1
verdi config set transport.task_retry_initial_interval 5
verdi daemon restart
verdi process kill <PROCESS_ID>
Furthermore, the timeout and wait options were not behaving correctly; they have now been fixed and merged into a single timeout option. Passing --timeout 0 replicates the old --no-wait behavior, meaning the command does not block until the action has finished, while --timeout inf (the default, replicating --wait without a timeout) makes the command block until a response is received.
For more information see issue #6524.
Serialization of ORM nodes (#6723)#
AiiDA’s Python API provides an object relational mapper (ORM) that abstracts the various entities that can be stored inside the provenance graph (via the SQL database) and the relationships between them. In most use cases, users use this ORM directly in Python to construct new instances of entities and retrieve existing ones, in order to get access to their data and manipulate it. A shortcoming of the current ORM is that it is not possible to programmatically introspect the schema of each entity: that is to say, what data each entity stores. This makes it difficult for external applications to provide interfaces to create and/or retrieve entity instances. It also makes it difficult to take the data outside of the Python environment since the data would have to be serialized. However, without a well-defined schema, doing this without an ad-hoc solution is practically impossible.
With the implementation of a pydantic Model for each Entity, we now allow external applications to programmatically determine the schema of all AiiDA ORM entities and automatically (de)serialize entity instances to and from other data formats, e.g., JSON.
An example of how this is done for an AiiDA integer node:
from aiida import load_profile
from aiida.orm import Int

load_profile()

node = Int(5)  # can be any ORM node
serialized_node = node.serialize()
print(serialized_node)
# Out: {'pk': None, 'uuid': '485c2ec8-441d-484d-b7d9-374a3cdd98ae', 'node_type': 'data.core.int.Int.', 'process_type': None, 'repository_metadata': {}, 'ctime': datetime.datetime(2025, 5, 2, 10, 20, 41, 275443, tzinfo=datetime.timezone(datetime.timedelta(seconds=7200), 'CEST')), 'mtime': None, 'label': '', 'description': '', 'attributes': {'value': 5}, 'extras': {}, 'computer': None, 'user': 1, 'repository_content': {}, 'source': None, 'value': 5}

node_deserialized = Int.from_serialized(**serialized_node)
print(node_deserialized)
# Out: uuid: 77e9c19a-5ecb-40cf-8238-ea5c55fbb83f (unstored) value: 5
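Beyond (de)serialization, the schema itself can also be inspected programmatically. The following is a minimal sketch that assumes the nested pydantic model is exposed as the Model attribute of the entity class (with fields matching the serialized dictionary above):

from aiida.orm import Int

# Assumption: the pydantic model backing the entity schema is available as `Int.Model`,
# so standard pydantic v2 tooling such as model_json_schema() can be used to inspect it.
schema = Int.Model.model_json_schema()
print(list(schema['properties']))
# Expected to list fields such as 'pk', 'uuid', 'attributes', 'value', ...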
For an extensive overview of the implications see AEP 010.
Miscellaneous#
- aiida-core is now compatible with Python 3.13 #6600
- Improved Windows support #6715
- RemoteData extended by member function get_size_on_disk #6584
- SinglefileData extended by constructor from_bytes #6653
- Allow zero memory specification for SLURM #6605
- Add filters to verdi group delete #6556
- verdi storage maintain shows a progress bar #6562
- New transport endpoints compress & extract #6743
- Implementation of missing SQLite endpoints (en route to full feature parity between PostgreSQL and SQLite):