create-sibling-*
commands reimplements the GitHub-platform support of create-sibling-github
and adds support to interface three new platforms in a unified fashion: GIN (create-sibling-gin
), GOGS (create-sibling-gogs
), and Gitea (create-sibling-gitea
). All commands rely on personal access tokens only for authentication, allow for specifying one of several stored credentials via a uniform --credential
parameter, and support a uniform --dry-run
mode for testing without network. #5949 (by @mih)create-sibling-github
now supports direct specification of organization repositories via a [<org>/]repo
syntax #5949 (by @mih)create-sibling-gitlab
gained a --dry-run
parameter to match the corresponding parameters in create-sibling-{github,gin,gogs,gitea}
#6013 (by @adswa)--new-store-ok
parameter of create-sibling-ria
only creates new RIA stores when explicitly provided #6045 (by @adswa)status()
and diff()
commands is improved by up to 700% by removing file-type evaluation as a default operation, and simplifying the type reporting rule #6097 (by @mih)drop()
and remove()
were reimplemented in full, conceptualized as the antagonist commands to get()
and clone()
. A new, harmonized set of parameters (--what ['filecontent', 'allkeys', 'datasets', 'all']
, --reckless ['modification', 'availability', 'undead', 'kill']
) simplifies their API. Both commands include additional safeguards. uninstall
is replaced with a thin shim command around drop()
#6111 (by @mih)add_archive_content()
was refactored into a dataset method and gained progress bars #6105 (by @adswa)datalad
and datalad-archives
special remotes have been reimplemented based on AnnexRemote
#6165 (by @mih)result_renderer()
semantics were decomplexified and harmonized. The previous default
result renderer was renamed to generic
. #6174 (by @mih)get_status_dict
learned to include exit codes in the case of CommandErrors #5642 (by @yarikoptic)datalad clone
can now pass options to git-clone
, adding support for cloning specific tags or branches, naming siblings something other than origin
, and exposing git clone
's optimization arguments #6218 (by @kyleam and @mih)export-archive-ora
learned to filter files exported to 7z archives #6234 (by @mih and @bpinsard)datalad run
learned to glob recursively #6262 (by @AKSoo)BatchedCommand
and AnnexRepo
#6244 (by @christian-monch)run
and rerun
now support parallel jobs via --jobs
#6279 (by @AKSoo)foreach-dataset
plumbing command allows to run commands on each (sub)dataset, similar to git submodule foreach
#5517 (by @yarikoptic)dataset
parameter is not restricted to only locally resolvable file-URLs anymore #6276 (by @christian-monch)git-credential
by specifying credential type git
in the respective provider configuration #5796 (by @bpoldrack)git-credential-datalad
allowing Git to query DataLad's credential system #5796 (by @bpoldrack and @mih)ConfigManager
now supports reading committed dataset configuration in bare repositories. Analog to reading .datalad/config
from a worktree, blob:HEAD:.datalad/config
is read (e.g., the config committed in the default branch). The support includes reload() change detection using the gitsha of this file. The behavior for non-bare repositories is unchanged. #6332 (by @mih)Interface.on_failure
to be one of the supported modes (stop, continue, ignore). Previously, such a modification was only possible on a per-call basis. #6430 (by @mih)run
command changed its default "on-failure" behavior from continue
to stop
. This change prevents the execution of a command in case a declared input cannot be obtained. Previously, only an error result was yielded (and run eventually yielded a non-zero exit code or an IncompleteResultsError
), but the execution proceeded and potentially saved a dataset modification despite incomplete inputs, in case the command succeeded. This previous default behavior can still be achieved by calling run with the equivalent of --on-failure continue
#6430 (by @mih)create-sibling --since=^
mode will now be as fast as push --since=^
to figure out for which subdatasets to create siblings #6436 (by @yarikoptic)save
(datalad.save.windows-compat-warning
) will either do nothing (none
), emit an incompatibility warning (warning
, default), or cause save
to error (error
) #6291 (by @adswa)datalad drop
in datasets with a large annex. #6580 (by @christian-monch)save
code might operate faster on heavy file trees #6581 (by @yarikoptic)datalad.support.extensions
offers the utility functions register_config()
and has_config()
that allow extension developers to announce additional configuration items to the central configuration management. #6601 (by @mih)export-to-figshare
now yields an impossible result instead of raising a RuntimeError #6543 (by @adswa)datalad.ssh.executable
. This key allows specifying an ssh-client executable that should be used by datalad to establish ssh-connections. The default value is ssh
unless on a Windows system where $WINDIR\System32\OpenSSH\ssh.exe
exists. In this case, the value defaults to $WINDIR\System32\OpenSSH\ssh.exe
. #6553 (by @christian-monch)--since
specification, since it would consider only submodules related to the changes since that point. #6528 (by @yarikoptic)datalad.ssh.try-use-annex-bundled-git=yes|no
can be used to influence the default remote git-annex bundle sensing for SSH connections. This was previously done unconditionally for any call to datalad sshrun
(which is also used for any SSH-related Git or git-annex functionality triggered by DataLad-internal processing) and could incur a substantial per-call runtime cost. The new default is to not perform this sensing, because for, e.g., use as GIT_SSH_COMMAND there is no expectation to have a remote git-annex installation, and even with an existing git-annex/Git bundle on the remote, it is not certain that the bundled Git version is to be preferred over any other Git installation in a user's PATH. #6533 (by @mih)run
now yields a result record immediately after executing a command. This allows callers to use the standard --on-failure switch
to control whether dataset modifications will be saved for a command that exited with an error. #6447 (by @mih)--pbs-runner
commandline option (deprecated in 0.15.0
) was removed #5981 (by @mih)create-sibling-github
's credential handling was trimmed down to only allow personal access tokens, because GitHub discontinued user/password based authentication #5949 (by @mih)create-sibling-gitlab
's --dryrun
parameter is deprecated in favor of --dry-run
#6013 (by @adswa)GitRepo.*_submodule
methods were moved to datalad-deprecated
#6010 (by @mih)datalad/support/versions.py
is unused in DataLad core and removed #6115 (by @yarikoptic)datalad.api.result-renderer
config setting has been dropped #6174 (by @mih)result_renderer=None
is replaced with result_renderer='disabled'
#6174 (by @mih)remove
's --recursive
argument has been deprecated #6257 (by @mih)get_repo_instance()
is discontinued and deprecated #6268 (by @mih)datalad.interface.common_opts.eval_default
has been deprecated. All (command-specific) defaults for common interface parameters can be read from Interface
class attributes (#6391 (by @mih)datalad.interface.utils
helpers cls2cmdlinename
and path_is_under
#6392 (by @mih)main()
#6394 (by @mih)create-sibling
will now require "^"
instead of an empty string for the since option #6436 (by @yarikoptic)run
no longer raises a CommandError
exception for failed commands, but yields an error
result that includes a superset of the information provided by the exception. This change impacts command line usage insofar as the exit code of the underlying command is no longer relayed as the exit code of the run
command call -- although run
continues to exit with a non-zero exit code in case of an error. For Python API users, the nature of the raised exception changes from CommandError
to IncompleteResultsError
, and the exception handling is now configurable using the standard on_failure
command argument. The original CommandError
exception remains available via the exception
property of the newly introduced result record for the command execution, and this result record is available via IncompleteResultsError.failed
, if such an exception is raised. #6447 (by @mih)bundled
parameter of get_connection_hash()
is now ignored and will be removed with a future release. #6532 (by @mih)BaseDownloader.fetch()
is logging download attempts on DEBUG (previously INFO) level to avoid polluting output of higher-level commands. #6564 (by @mih)create-sibling-gitlab
erroneously overwrote existing sibling configurations. A safeguard will now prevent overwriting and exit with an error result #6015 (by @adswa)create-sibling-gogs
now relays HTTP500 errors, such as "no space left on device" #6019 (by @mih)annotate_paths()
is removed from the last parts of code base that still contained it #6128 (by @mih)add_archive_content()
doesn't crash with --key
and --use-current-dir
anymore #6105 (by @adswa)run-procedure
now returns an error result when a non-existent procedure name is specified #6143 (by @mslw)download-url --archive
when extracting the archive #6172 (by @adswa)formatters
module is not found #6212 (by @adswa)create-sibling-gin
does not disable git-annex availability on Gin remotes anymore #6230 (by @mih)keyring.delete()
call was fixed to not call an uninitialized private attribute anymore #6253 (by @bpoldrack)format()
method instead of get_status_dict()
of create-sibling-ria
has been fixed #6256 (by @adswa)status
, run-procedure
, and metadata
are no longer swallowing result-related messages in renderers #6280 (by @mih)uninstall
now recommends the new --reckless
parameter instead of the deprecated --nocheck
parameter when reporting hints #6277 (by @adswa)download-url
learned to handle Path objects #6317 (by @adswa)ConfigManager
that could have caused a crash in rare cases when a config file is removed during the process runtime #6332 (by @mih)
ConfigManager.get_from_source()
now accesses the correct information when using the documented source='local'
, avoiding a crash #6332 (by @mih)run
no longer lets the internal call to save
render its results unconditionally, but the parameterization of run determines the effective rendering format. #6421 (by @mih)create-sibling-ria
no longer creates an annex/objects
directory in-store, when called with --no-storage-sibling
. #6495 (by @bpoldrack)clone
. #6500 (by @mih)keyring >= 20.0
to ensure that token-based authentication can be used. #6515 (by @adswa)require_dataset()
now uniformly raises NoDatasetFound
when no dataset was found. Implementations that catch the previously documented InsufficientArgumentsError
or the actually raised ValueError
will continue to work, because NoDatasetFound
is derived from both types. #6521 (by @mih)annex.skipunknown
config. #6550 (by @bpoldrack)save
can now commit a change in which a file becomes a directory that contains a file staged for commit. #6581 (by @yarikoptic)create-sibling
will no longer create siblings for not yet saved new subdatasets, and will now create sub-datasets nested in the subdatasets which did not yet have those siblings. #6603 (by @yarikoptic)disabled
result renderer mode is documented #6174 (by @mih)datalad
and datalad-archives
special remotes #6181 (by @mih)BatchedCommand
and BatchedAnnex
#6203 (by @christian-monch)create-sibling-gin
's examples have been improved to suggest push
as an additional step to ensure proper configuration #6289 (by @mslw)--since=^
mode of operation of create-sibling
is documented now #6436 (by @yarikoptic)status()
helper was equipped with docstrings and promotes "breadth-first" reporting with a new parameter reporting_order
#6006 (by @mih)AnnexRepo.get_file_annexinfo()
is introduced for more convenient queries for single files and replaces a now deprecated AnnexRepo.get_file_key()
to receive information with fewer calls to Git #6104 (by @mih)get_paths_by_ds()
helper exposes status
' path normalization and sorting #6110 (by @mih)status
is optimized with a cache for dataset roots #6137 (by @yarikoptic)get_func_args_doc()
helper with Python 2 is removed from DataLad core #6175 (by @yarikoptic)AddArchiveContent
is moved from datalad/interface
to datalad/local
(#6188 (by @mih)), Clean
is moved from datalad/interface
to datalad/local
(#6191 (by @mih)), Unlock
is moved from datalad/interface
to datalad/local
(#6192 (by @mih)), DownloadURL
is moved from datalad/interface
to datalad/local
(#6217 (by @mih)), Rerun
is moved from datalad/interface
to datalad/local
(#6220 (by @mih)), RunProcedure
is moved from datalad/interface
to datalad/local
(#6222 (by @mih)). The interface command list is restructured and resorted #6223 (by @mih)wrapt
is replaced with functools' wraps
#6190 (by @yarikoptic)appdirs
library has been replaced with platformdirs
#6198 (by @adswa)datalad/__init__.py
has been cleaned up #6271 (by @mih)GitRepo.call_git_items
is implemented with a generator-based runner #6278 (by @christian-monch)*
#6176 (by @yarikoptic), #6304 (by @christian-monch)GitRepo.bare
does not require the ConfigManager anymore #6323 (by @mih)_get_dot_git()
was reimplemented to be more efficient and consistent, by testing for common scenarios first and introducing a consistently applied resolved
flag for result path reporting #6325 (by @mih)datalad
are now included when installing DataLad #6336 (by @jwodder)get_refds_path()
was deprecated #6387 (by @adswa)datalad.interface.base.Interface
is now an abstract class #6391 (by @mih)datalad.support.json_py
#6398 (by @mih)ArgumentParser.parse_known_args
instead of protected _parse_known_args
#6414 (by @yarikoptic)add-archive-content
does not rely on the deprecated tempfile.mktemp
anymore, but uses the more secure tempfile.mkdtemp
#6428 (by @adswa)annexstatus
is deprecated. In its place, a new test helper assists the few tests that rely on it #6413 (by @adswa)config
has been refactored from where[="dataset"]
to scope[="branch"]
#5969 (by @yarikoptic)serve_path_via_http
to the command line to deploy an ad-hoc instance of the HTTP server used for internal testing, with SSL and auth, if desired. #6169 (by @mih)BatchedCommand
gained a basic test #6203 (by @christian-monch)with_testrepo
is discontinued in all core tests #6224 (by @mih)git-annex.filter.annex.process
configuration is enabled by default on Windows to speed up the test suite #6245 (by @mih)GIT_CONFIG_GLOBAL
to configure a fake home directory instead of overwriting HOME
on OSX (#6251 (by @bpoldrack)) and HOME
and USERPROFILE
on Windows #6260 (by @adswa)download-url
's test under http_proxy
are skipped when a session can't be established #6361 (by @yarikoptic)datalad clean
was fixed to be invoked within a dataset #6359 (by @yarikoptic)test_source_candidate_subdataset
has been marked as @slow
#6429 (by @yarikoptic)CLI
benchmarks exist now #6381 (by @mih)readthedocs-theme
and Sphinx
versions were pinned to reenable rendering of bullet points in the documentation #6346 (by @adswa)__full_version__
and datalad.version
#6073 (@jwodder)
__doc__
and "Parameters" in build_doc
docstrings #6004 (@jwodder)
Command execution is now performed by a new Runner
implementation that is
no longer based on the asyncio
framework, which was found to exhibit
fragile performance in interaction with other asyncio
-using code, such as
Jupyter notebooks. The new implementation is based on threads. It also supports
the specification of "protocols" that were introduced with the switch to the
asyncio
implementation in 0.14.0. ([#5667][])
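To illustrate the general idea of a thread-based runner (as opposed to an asyncio-based one), here is a minimal sketch using only the Python standard library. It is not DataLad's implementation; the function name `run_with_threads` is hypothetical, and the sketch omits the "protocol" abstraction mentioned above.

```python
import subprocess
import sys
import threading


def run_with_threads(cmd):
    """Run cmd, collecting stdout and stderr with one reader thread per stream."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    captured = {"stdout": [], "stderr": []}

    def drain(stream, name):
        # Read line by line until the child process closes the pipe.
        for line in iter(stream.readline, b""):
            captured[name].append(line.decode())
        stream.close()

    readers = [
        threading.Thread(target=drain, args=(proc.stdout, "stdout")),
        threading.Thread(target=drain, args=(proc.stderr, "stderr")),
    ]
    for reader in readers:
        reader.start()
    for reader in readers:
        reader.join()
    returncode = proc.wait()
    return returncode, "".join(captured["stdout"]), "".join(captured["stderr"])


# Example: run a tiny Python child process and capture its output.
result = run_with_threads([sys.executable, "-c", "print('hello')"])
```

Because each pipe is drained by its own thread, no event loop is involved, which is one way to avoid interference with other asyncio-using code in the same process.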
clone
now supports arbitrary URL transformations based on regular
expressions. One or more transformation steps can be defined via
datalad.clone.url-substitute.<label>
configuration settings. The feature can
be (and is now) used to support convenience mappings, such as
https://osf.io/q8xnk/
(displayed in a browser window) to osf://q8xnk
(clonable via the datalad-osf
extension). ([#5749][])
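The concept of regex-based URL transformation can be sketched in plain Python as follows. This only illustrates the idea of applying substitution steps in order; the rule list, the pattern, and the function name `substitute_url` are assumptions made for this example and do not reflect the actual syntax of the datalad.clone.url-substitute.<label> settings.

```python
import re

# Hypothetical rule set: (pattern, replacement) pairs applied in order.
SUBSTITUTIONS = [
    (r"^https://osf\.io/([^/]+)/?$", r"osf://\1"),
]


def substitute_url(url, rules=SUBSTITUTIONS):
    """Apply each regex rule in turn; non-matching rules leave the URL unchanged."""
    for pattern, replacement in rules:
        url = re.sub(pattern, replacement, url)
    return url


mapped = substitute_url("https://osf.io/q8xnk/")
untouched = substitute_url("https://example.com/dataset")
```

With this rule, the browser URL from the example above is rewritten to its osf:// form, while unrelated URLs pass through unmodified.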
Homogenize SSH use and configurability between DataLad and git-annex, by
instructing git-annex to use DataLad's sshrun
for SSH calls (instead of SSH
directly). ([#5389][])
The ORA special remote has received several new features:
It now supports a push-url
setting as an alternative to url
for write
access. An analog parameter was also added to create-sibling-ria
.
([#5420][], [#5428][])
Access of RIA stores now performs homogeneous availability checks, regardless of access protocol. Before, broken HTTP-based access due to misspecified URLs could have gone unnoticed. ([#5459][], [#5672][])
Error reporting was introduced to inform about undesirable conditions in remote RIA stores. ([#5683][])
create-sibling-ria
now supports --alias
for the specification of a
convenience dataset alias name in a RIA store. ([#5592][])
Analog to git commit
, save
now features an --amend
mode to support
incremental updates of a dataset state. ([#5430][])
run
now supports a dry-run mode that can be used to inspect the result of
parameter expansion on the effective command to ease the composition of more
complicated command lines. ([#5539][])
run
now supports a --assume-ready
switch to avoid the (possibly
expensive) preparation of inputs and outputs with large datasets that have
already been readied through other means. ([#5431][])
update
now features --how
and --how-subds
parameters to configure how
an update shall be performed. Supported modes are fetch
(unchanged
default), and merge
(previously also possible via --merge
), but also new
strategies like reset
or checkout
. ([#5534][])
update
has a new --follow=parentds-lazy
mode that only performs a fetch
operation in subdatasets when the desired commit is not yet present. During
recursive updates involving many subdatasets this can substantially speed up
performance. ([#5474][])
DataLad's command line API can now report the version for individual commands
via datalad <cmd> --version
. The output has been homogenized to
<providing package> <version>
. ([#5543][])
create-sibling
now logs information on an auto-generated sibling name, in
the case that no --name/-s
was provided. ([#5550][])
create-sibling-github
has been updated to emit result records like any
standard DataLad command. Previously it was implemented as a "plugin", which
did not support all standard API parameters. ([#5551][])
copy-file
now also works with content-less files in datasets on crippled
filesystems (adjusted mode), when a recent enough git-annex (8.20210428 or
later) is available. ([#5630][])
addurls
can now be instructed how to behave in the event of file name
collision via a new parameter --on-collision
. ([#5675][])
addurls
reporting now informs which particular subdatasets were created.
([#5689][])
Credentials can now be provided or overwritten via all means supported by
ConfigManager
. Importantly, datalad.credential.<name>.<field>
configuration settings and analog specification via environment variables are
now supported (rather than custom environment variables only). Previous
specification methods are still supported too. ([#5680][])
A new datalad.credentials.force-ask
configuration flag can now be used to
force re-entry of already known credentials. This simplifies credential
updates without having to use an approach native to individual credential
stores. ([#5777][])
Suppression of rendering repeated similar results is now configurable via the
configuration switches datalad.ui.suppress-similar-results
(bool), and
datalad.ui.suppress-similar-results-threshold
(int). ([#5681][])
The performance of status
and similar functionality when determining local
file availability has been improved. ([#5692][])
push
now renders a result summary on completion. ([#5696][])
A dedicated info log message indicates when dataset repositories are subjected to an annex version upgrade. ([#5698][])
Error reporting improvements:
The NoDatasetFound
exception now provides information for which purpose a
dataset is required. ([#5708][])
Wording of the MissingExternalDependency
error was rephrased to account
for cases of non-functional installations. ([#5803][])
push
reports when a --to
parameter specification was (likely)
forgotten. ([#5726][])
Detailed information is now given when DataLad fails to obtain a lock for credential entry in a timely fashion. Previously only a generic debug log message was emitted. ([#5884][])
Clarified error message when create-sibling-gitlab
was called without
--project
. ([#5907][])
add-readme
now provides a README template with more information on the
nature and use of DataLad datasets. A README file is no longer annex'ed by
default, but can be using the new --annex
switch. ([#5723][], [#5725][])
clean
now supports a --dry-run
mode to inform about cleanable content.
([#5738][])
A new configuration setting datalad.locations.locks
can be used to control
the placement of lock files. ([#5740][])
wtf
now also reports branch names and states. ([#5804][])
AnnexRepo.whereis()
now supports batch mode. ([#5533][])
The minimum supported git-annex version is now 8.20200309. ([#5512][])
ORA special remote configuration items ssh-host
, and base-path
are
deprecated. They are completely replaced by ria+<protocol>://
URL
specifications. ([#5425][])
The deprecated no_annex
parameter of create()
was removed from the Python
API. ([#5441][])
The unused GitRepo.pull()
method has been removed. ([#5558][])
Residual support for "plugins" (a mechanism used before DataLad supported
extensions) was removed. This includes the configuration switches
datalad.locations.{system,user}-plugins
. ([#5554][], [#5564][])
Several features and commands have been moved to the datalad-deprecated
package. This package must now be installed to keep using this
functionality.
The publish
command. Use push
instead. ([#5837][])
The ls
command. ([#5569][])
The web UI that is deployable via datalad create-sibling --ui
. ([#5555][])
The "automagic IO" feature. ([#5577][])
AnnexRepo.copy_to()
has been deprecated. The push
command should be used
instead. ([#5560][])
AnnexRepo.sync()
has been deprecated. AnnexRepo.call_annex(['sync', ...])
should be used instead. ([#5461][])
All GitRepo.*_submodule()
methods have been deprecated and will be removed
in a future release. ([#5559][])
create-sibling-github
's --dryrun
switch was deprecated, use --dry-run
instead.
([#5551][])
The datalad --pbs-runner
option has been deprecated, use condor_run
(or similar) instead. ([#5956][])
Prevent invalid declaration of a publication dependency for 'origin' on any auto-detected ORA special remotes, when cloning from a RIA store. An ORA remote is now checked whether it actually points to the RIA store the clone was made from. ([#5415][])
The ORA special remote implementation has received several fixes:
It can now handle HTTP redirects. ([#5792][])
Prevents failure when URL-type annex keys contain the '/' character. ([#5823][])
Properly support the specification of usernames, passwords and ports in
ria+<protocol>://
URLs. ([#5902][])
It is now possible to specifically select the default (or generic) result
renderer via datalad -f default
and with that override a tailored
result
renderer that may be preconfigured for a particular command. ([#5476][])
Starting with 0.14.0, original URLs given to clone
were recorded in a
subdataset record. This was initially done in a second commit, leading to
inflation of commits and slowdown in superdatasets with many subdatasets. Such
subdataset record annotation is now collapsed into a single commit.
([#5480][])
run
no longer removes leading empty directories as part of the output
preparation. This was surprising behavior for commands that do not ensure on
their own that output directories exist. ([#5492][])
A potentially existing message
property is no longer removed when using the
json
or json_pp
result renderer to avoid undesired withholding of
relevant information. ([#5536][])
subdatasets
now reports state=present
, rather than state=clean
, for
installed subdatasets to complement state=absent
reports for uninstalled
datasets. ([#5655][])
create-sibling-ria
now executes commands with a consistent environment
setup that matches all other command execution in other DataLad commands.
([#5682][])
save
no longer saves unspecified subdatasets when called with an explicit
path (list). The fix required a behavior change of
GitRepo.get_content_info()
in its interpretation of None
vs. []
path
argument values that now aligns the behavior of GitRepo.diff|status()
with
their respective documentation. ([#5693][])
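The distinction between None and [] as path arguments can be illustrated with a small sketch. It assumes that None means "no path constraint" and that an empty list means "nothing was requested"; the function `select_paths` is hypothetical and is not part of GitRepo.

```python
def select_paths(known, paths=None):
    """Return the entries of `known` selected by `paths`.

    Assumption for this sketch: None means "no constraint", while an
    empty list means "nothing requested".
    """
    if paths is None:
        return list(known)
    wanted = set(paths)
    return [p for p in known if p in wanted]


known = ["a.txt", "sub/b.txt", "c.txt"]
everything = select_paths(known)          # no constraint given
nothing = select_paths(known, [])         # explicit, but empty, selection
only_c = select_paths(known, ["c.txt"])   # explicit selection
```

Treating the two values differently matters for callers that build a path list programmatically: a list that happens to end up empty should not silently widen the operation to all content.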
get
now prefers the location of a subdataset that is recorded in a
superdataset's .gitmodules
record. Previously, DataLad tried to obtain a
subdataset from an assumed checkout of the superdataset's origin. This new
default order is (re-)configurable via the
datalad.get.subdataset-source-candidate-<priority-label>
configuration
mechanism. ([#5760][])
create-sibling-gitlab
no longer skips the root dataset when .
is given as
a path. ([#5789][])
siblings
now rejects a value given to --as-common-datasrc
that clashes
with the respective Git remote. ([#5805][])
The usage synopsis reported by siblings
now lists all supported actions.
([#5913][])
siblings
now renders non-ok results to avoid silent failure. ([#5915][])
.gitattributes
file manipulations no longer leave the file without a
trailing newline. ([#5847][])
Prevent crash when trying to delete a non-existing keyring credential field. ([#5892][])
git-annex is no longer called with an unconditional annex.retry=3
configuration. Instead, this parameterization is now limited to annex get
and annex copy
calls. ([#5904][])
file://
URLs are no longer the predominant test case for AnnexRepo
functionality. A built-in HTTP server is now used in most cases. ([#5332][])siblings
and improve usage synopsis #5913 (@mih)
--no-changelog
to auto shipit
if changelog already has entry #5952 (@jwodder)
Following an internal call to git-clone
, [clone][] assumed that
the remote name was "origin", but this may not be the case if
clone.defaultRemoteName
is configured (available as of Git 2.30).
([#5572][])
Several test fixes, including updates for changes in git-annex. ([#5612][]) ([#5632][]) ([#5639][])
For outputs that include a glob, [run][] didn't re-glob after
executing the command, which is necessary to catch changes if
--explicit
or --expand={outputs,both}
is specified. ([#5594][])
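Why re-globbing after execution matters can be shown with a short, self-contained sketch. It uses only the standard library; `run_and_collect_outputs` and the stand-in `command` are hypothetical names for illustration and are unrelated to the internals of run.

```python
import glob
import os
import tempfile


def run_and_collect_outputs(pattern, command):
    """Expand `pattern` before and after calling `command`; return both expansions."""
    before = sorted(glob.glob(pattern))
    command()
    # Re-glob so that files created by the command are picked up.
    after = sorted(glob.glob(pattern))
    return before, after


with tempfile.TemporaryDirectory() as workdir:
    pattern = os.path.join(workdir, "*.txt")

    def command():
        # Stand-in for the executed command: it creates a new output file.
        with open(os.path.join(workdir, "result.txt"), "w") as f:
            f.write("done\n")

    before, after = run_and_collect_outputs(pattern, command)
    new_outputs = [os.path.basename(p) for p in after if p not in before]
```

A glob expanded only before execution would miss result.txt entirely, because the file does not exist until the command has run.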
[run][] now gives an error result rather than a warning when an input glob doesn't match. ([#5594][])
The procedure for creating a RIA store checks for an existing ria-layout-version file and makes sure its version matches the desired version. This check wasn't done correctly for SSH hosts. ([#5607][])
A helper for transforming git-annex JSON records into DataLad results didn't account for the unusual case where the git-annex record doesn't have a "file" key. ([#5580][])
The test suite required updates for recent changes in PyGithub and git-annex. ([#5603][]) ([#5609][])
datalad shell-completion
. ([#5544][])
[push][] now works bottom-up, pushing submodules first so that hooks on the remote can aggregate updated subdataset information. ([#5416][])
[run-procedure][] didn't ensure that the configuration of subdatasets was reloaded. ([#5552][])
The recent default branch changes on GitHub's side can lead to "git-annex" being selected over "master" as the default branch on GitHub when setting up a sibling with [create-sibling-github][]. To work around this, the current branch is now pushed first. ([#5010][])
The logic for reading in a JSON line from git-annex failed if the response exceeded the buffer size (256 KB on *nix systems).
Calling [unlock][] with a path of "." from within an untracked subdataset incorrectly aborted, complaining that the "dataset containing given paths is not underneath the reference dataset". ([#5458][])
[clone][] didn't account for the possibility of multiple accessible ORA remotes or the fact that none of them may be associated with the RIA store being cloned. ([#5488][])
[create-sibling-ria][] didn't call git update-server-info
after
setting up the remote repository and, as a result, the repository
couldn't be fetched until something else (e.g., a push) triggered a
call to git update-server-info
. ([#5531][])
The parser for git-config output didn't properly handle multi-line values and got thrown off by unexpected and unrelated lines. ([#5509][])
The 0.14 release introduced regressions in the handling of progress bars for git-annex actions, including collapsing progress bars for concurrent operations. ([#5421][]) ([#5438][])
[save][] failed if the user configured Git's diff.ignoreSubmodules
to a non-default value. ([#5453][])
An interprocess lock is now used to prevent a race between checking for an SSH socket's existence and creating it. ([#5466][])
If a Python procedure script is executable, [run-procedure][]
invokes it directly rather than passing it to sys.executable
. The
non-executable Python procedures that ship with DataLad now include
shebangs so that invoking them has a chance of working on file
systems that present all files as executable. ([#5436][])
DataLad's wrapper around argparse
failed if an underscore was used
in a positional argument. ([#5525][])
DATALAD_FOO_X__Y
to datalad.foo.x-y
) doesn't work
if the subsection name ("FOO") has an underscore. This limitation
can be sidestepped with the new DATALAD_CONFIG_OVERRIDES_JSON
environment variable, which can be set to a JSON record of
configuration values. ([#5505][])
Git versions below v2.19.1 are no longer supported. ([#4650][])
The minimum git-annex version is still 7.20190503, but, if you're on Windows (or use adjusted branches in general), please upgrade to at least 8.20200330 but ideally 8.20210127 to get subdataset-related fixes. ([#4292][]) ([#5290][])
The minimum supported version of Python is now 3.6. ([#4879][])
[publish][] is now deprecated in favor of [push][]. It will be removed in the 0.15.0 release at the earliest.
A new command runner was added in v0.13. Functionality related to
the old runner has now been removed: Runner
, GitRunner
, and
run_gitcommand_on_file_list_chunks
from the datalad.cmd
module
along with the datalad.tests.protocolremote
,
datalad.cmd.protocol
, and datalad.cmd.protocol.prefix
configuration options. ([#5229][])
The --no-storage-sibling
switch of create-sibling-ria
is
deprecated in favor of --storage-sibling=off
and will be removed
in a later release. ([#5090][])
The get_git_dir
static method of GitRepo
is deprecated and will
be removed in a later release. Use the dot_git
attribute of an
instance instead. ([#4597][])
The ProcessAnnexProgressIndicators
helper from
datalad.support.annexrepo
has been removed. ([#5259][])
The save
argument of [install][], a noop since v0.6.0, has been
dropped. ([#5278][])
The get_URLS
method of AnnexCustomRemote
is deprecated and will
be removed in a later release. ([#4955][])
ConfigManager.get
now returns a single value rather than a tuple
when there are multiple values for the same key, as very few callers
correctly accounted for the possibility of a tuple return value.
Callers can restore the old behavior by passing get_all=True
.
([#4924][])
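The single-value versus tuple semantics can be sketched with a toy multi-value store. `ToyConfig` is a hypothetical stand-in, not DataLad's ConfigManager, and the choice that the most recently added value wins is an assumption made for this example.

```python
class ToyConfig:
    """Hypothetical stand-in for a multi-value config store (not DataLad's class)."""

    def __init__(self):
        self._store = {}

    def add(self, key, value):
        self._store.setdefault(key, []).append(value)

    def get(self, key, default=None, get_all=False):
        values = self._store.get(key)
        if not values:
            return default
        if get_all:
            # Old-style behavior: a tuple only when there are several values.
            return tuple(values) if len(values) > 1 else values[0]
        # Assumption: the most recently added value wins.
        return values[-1]


cfg = ToyConfig()
cfg.add("remote.origin.fetch", "refs/heads/main")
cfg.add("remote.origin.fetch", "refs/heads/git-annex")
single = cfg.get("remote.origin.fetch")
multiple = cfg.get("remote.origin.fetch", get_all=True)
```

Callers that only ever expect a string can use the plain get() without a type check, while callers that need every value opt in explicitly.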
In 0.12.0, all of the assure_*
functions in datalad.utils
were
renamed as ensure_*
, keeping the old names around as compatibility
aliases. The assure_*
variants are now marked as deprecated and
will be removed in a later release. ([#4908][])
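One common way to keep an old name as a deprecated compatibility alias is sketched below. The helper `deprecated_alias` is hypothetical, and `ensure_list` here is a simplified stand-in written for this example, not the function from datalad.utils.

```python
import functools
import warnings


def deprecated_alias(func, old_name):
    """Return a wrapper that calls `func` but warns that `old_name` is deprecated."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated, use {func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return func(*args, **kwargs)

    return wrapper


def ensure_list(value):
    """Simplified stand-in: wrap non-list values in a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# The old name keeps working, but each call emits a DeprecationWarning.
assure_list = deprecated_alias(ensure_list, "assure_list")
```

Existing code that calls assure_list("x") still gets ["x"], and the warning points the caller to the new name.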
The datalad.interface.run
module, which was deprecated in 0.12.0
and kept as a compatibility shim for datalad.core.local.run
, has
been removed. ([#4583][])
The saver
argument of datalad.core.local.run.run_command
, marked
as obsolete in 0.12.0, has been removed. ([#4583][])
The dataset_only
argument of the ConfigManager
class was
deprecated in 0.12 and has now been removed. ([#4828][])
The linux_distribution_name
, linux_distribution_release
, and
on_debian_wheezy
attributes in datalad.utils
are no longer set
at import time and will be removed in a later release. Use
datalad.utils.get_linux_distribution
instead. ([#4696][])
datalad.distribution.clone
, which was marked as obsolete in v0.12
in favor of datalad.core.distributed.clone
, has been removed.
([#4904][])
datalad.support.annexrepo.N_AUTO_JOBS
, announced as deprecated in
v0.12.6, has been removed. ([#4904][])
The compat
parameter of GitRepo.get_submodules
, added in v0.12
as a temporary compatibility layer, has been removed. ([#4904][])
The long-deprecated (and non-functional) url
parameter of
GitRepo.__init__
has been removed. ([#5342][])
Cloning onto a system that enters adjusted branches by default (as Windows does) did not properly record the clone URL. ([#5128][])
The RIA-specific handling after calling [clone][] was correctly
triggered by ria+http
URLs but not ria+https
URLs. ([#4977][])
If the registered commit wasn't found when cloning a subdataset, the failed attempt was left around. ([#5391][])
The remote calls to cp
and chmod
in [create-sibling][] were not
portable and failed on macOS. ([#5108][])
A more reliable check is now done to decide if configuration files need to be reloaded. ([#5276][])
The internal command runner's handling of the event loop has been improved to play nicer with outside applications and scripts that use asyncio. ([#5350][]) ([#5367][])
The subdataset handling for adjusted branches, which is particularly
important on Windows where git-annex enters an adjusted branch by
default, has been improved. A core piece of the new approach is
registering the commit of the primary branch, not its checked out
adjusted branch, in the superdataset. Note: This means that git
status
will always consider a subdataset on an adjusted branch as
dirty while datalad status
will look more closely and see if the
tip of the primary branch matches the registered commit.
([#5241][])
The performance of the [subdatasets][] command has been improved, with substantial speedups for recursive processing of many subdatasets. ([#4868][]) ([#5076][])
Adding new subdatasets via [save][] has been sped up. ([#4793][])
[get][], [save][], and [addurls][] gained support for parallel
operations that can be enabled via the --jobs
command-line option
or the new datalad.runtime.max-jobs
configuration option. ([#5022][])
[addurls][]
--drop-after
switch that signals to drop a file's
content after downloading and adding it to the annex. ([#5081][])
--key
option. ([#5184][])
[create-sibling-github][] learned how to create private repositories (thanks to Nolan Nichols). ([#4769][])
[create-sibling-ria][] gained a --storage-sibling
option. When
--storage-sibling=only
is specified, the storage sibling is
created without an accompanying Git sibling. This enables using
hosts without Git installed for storage. ([#5090][])
The download machinery (and thus the datalad
special remote)
gained support for a new scheme, shub://
, which follows the same
format used by singularity run
and friends. In contrast to the
short-lived URLs obtained by querying Singularity Hub directly,
shub://
URLs are suitable for registering with git-annex. ([#4816][])
A provider is now included for https://registry-1.docker.io URLs. This is useful for storing an image's blobs in a dataset and registering the URLs with git-annex. ([#5129][])
The add-readme
command now links to the [DataLad
handbook][handbook] rather than http://docs.datalad.org. ([#4991][])
New option datalad.locations.extra-procedures
specifies an
additional location that should be searched for procedures. ([#5156][])
The class for handling configuration values, ConfigManager
, now
takes a lock before writes to allow for multiple processes to modify
the configuration of a dataset. ([#4829][])
[clone][] now records the original, unresolved URL for a subdataset
under submodule.<name>.datalad-url
in the parent's .gitmodules,
enabling later [get][] calls to use the original URL. This is
particularly useful for ria+
URLs. ([#5346][])
Installing a subdataset now uses custom handling rather than calling
git submodule update --init
. This avoids some locking issues when
running [get][] in parallel and enables more accurate source URLs to
be recorded. ([#4853][])
GitRepo.get_content_info
, a helper that gets triggered by many
commands, got faster by tweaking its git ls-files
call. ([#5067][])
[wtf][] now includes credentials-related information (e.g. active backends) in its output. ([#4982][])
The call_git*
methods of GitRepo
now have a read_only
parameter. Callers can set this to True
to promise that the
provided command does not write to the repository, bypassing the
cost of some checks and locking. ([#5070][])
New call_annex*
methods in the AnnexRepo
class provide an
interface for running git-annex commands similar to that of the
GitRepo.call_git*
methods. ([#5163][])
It's now possible to register a custom metadata indexer that is discovered by [search][] and used to generate an index. ([#4963][])
The ConfigManager
methods get
, getbool
, getfloat
, and
getint
now return a single value (with same precedence as git
config --get
) when there are multiple values for the same key (in
the non-committed git configuration, if the key is present there, or
in the dataset configuration). For get
, the old behavior can be
restored by specifying get_all=True
. ([#4924][])
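The single-value lookup described above can be illustrated with a toy function. It is a sketch under stated assumptions, not the `ConfigManager` implementation: `get_value` and the plain dictionary of value lists are hypothetical, and only the "last value wins" rule is taken from how `git config --get` behaves.

```python
def get_value(values_by_key, key, default=None, get_all=False):
    """Look up key in a mapping of key -> list of values in definition order."""
    values = values_by_key.get(key)
    if not values:
        return default
    if get_all:
        # Older behavior: a tuple when several values exist, else the single value.
        return tuple(values) if len(values) > 1 else values[0]
    # Same precedence as `git config --get`: the last value wins.
    return values[-1]


config = {"remote.origin.fetch": ["refs/a", "refs/b"], "user.name": ["Jane"]}
print(get_value(config, "remote.origin.fetch"))                # refs/b
print(get_value(config, "remote.origin.fetch", get_all=True))  # ('refs/a', 'refs/b')
```

Returning one value by default means callers no longer need to check whether they received a string or a tuple.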
Command-line scripts are now defined via the entry_points
argument
of setuptools.setup
instead of the scripts
argument. ([#4695][])
Interactive use of --help
on the command-line now invokes a pager
on more systems and installation setups. ([#5344][])
The datalad
special remote now tries to eliminate some unnecessary
interactions with git-annex by being smarter about how it queries
for URLs associated with a key. ([#4955][])
The GitRepo
class now does a better job of handling bare
repositories, a step towards bare repository support in DataLad.
([#4911][])
More internal work to move the code base over to the new command runner. ([#4699][]) ([#4855][]) ([#4900][]) ([#4996][]) ([#5002][]) ([#5141][]) ([#5142][]) ([#5229][])
Cloning from a RIA store on the local file system initialized annex
in the Git sibling of the RIA source, which is problematic because
all annex-related functionality should go through the storage
sibling. [clone][] now sets remote.origin.annex-ignore
to true
after cloning from RIA stores to prevent this. ([#5255][])
[create-sibling][] invoked cp
in a way that was not compatible
with macOS. ([#5269][])
Due to a bug in older Git versions (before 2.25), calling [status][]
with a file under .git/ (e.g., datalad status .git/config
)
incorrectly reported the file as untracked. A workaround has been
added. ([#5258][])
Update tests for compatibility with latest git-annex. ([#5254][])
An assortment of fixes for Windows compatibility. ([#5113][]) ([#5119][]) ([#5125][]) ([#5127][]) ([#5136][]) ([#5201][]) ([#5200][]) ([#5214][])
Adding a subdataset on a system that defaults to using an adjusted branch (i.e. doesn't support symlinks) didn't properly set up the submodule URL if the source dataset was not in an adjusted state. ([#5127][])
[push][] failed to push to a remote that did not have an
annex-uuid
value in the local .git/config
. ([#5148][])
The default renderer has been improved to avoid a spurious leading space, which led to the displayed path being incorrect in some cases. ([#5121][])
[siblings][] showed an uninformative error message when asked to configure an unknown remote. ([#5146][])
[drop][] confusingly relayed a suggestion from git annex drop
to
use --force
, an option that does not exist in datalad drop
.
([#5194][])
[create-sibling-github][] no longer offers user/password authentication because it is no longer supported by GitHub. ([#5218][])
The internal command runner's handling of the event loop has been tweaked to hopefully fix issues with running DataLad from IPython. ([#5106][])
SSH cleanup wasn't reliably triggered by the ORA special remote on failure, leading to a stall with a particular version of git-annex, 8.20201103. (This is also resolved on git-annex's end as of 8.20201127.) ([#5151][])
The credential helper no longer asks the user to repeat tokens or AWS keys. ([#5219][])
The new option datalad.locations.sockets
controls where DataLad
stores SSH sockets, allowing users to more easily work around file
system and path length restrictions. ([#5238][])
SSH connection handling has been reworked to fix cloning on Windows.
A new configuration option, datalad.ssh.multiplex-connections
,
defaults to false on Windows. ([#5042][])
The ORA special remote and post-clone RIA configuration now provide authentication via DataLad's credential mechanism and better handling of HTTP status codes. ([#5025][]) ([#5026][])
By default, if a git executable is present in the same location as
git-annex, DataLad modifies PATH
when running git and git-annex so
that the bundled git is used. This logic has been tightened to
avoid unnecessarily adjusting the path, reducing the cases where the
adjustment interferes with the local environment, such as special
remotes in a virtual environment being masked by the system-wide
variants. ([#5035][])
git-annex is now consistently invoked as "git annex" rather than "git-annex" to work around failures on Windows. ([#5001][])
[push][] called git annex sync ...
on plain git repositories.
([#5051][])
[save][] in general doesn't support registering multiple levels of
untracked subdatasets, but it can now properly register nested
subdatasets when all of the subdataset paths are passed explicitly
(e.g., datalad save -d. sub-a sub-a/sub-b
). ([#5049][])
When called with --sidecar
and --explicit
, [run][] didn't save
the sidecar. ([#5017][])
A couple of spots didn't properly quote format fields when combining substrings into a format string. ([#4957][])
The default credentials configured for indi-s3
prevented anonymous
access. ([#5045][])
Messages about suppressed similar results are now rate limited to improve performance when there are many similar results coming through quickly. ([#5060][])
[create-sibling-github][] can now be told to replace an existing
sibling by passing --existing=replace
. ([#5008][])
Progress bars now react to changes in the terminal's width (requires tqdm 2.1 or later). ([#5057][])
Ephemeral clones mishandled bare repositories. ([#4899][])
The post-clone logic for configuring RIA stores didn't consider
https://
URLs. ([#4977][])
DataLad custom remotes didn't escape newlines in messages sent to git-annex. ([#4926][])
The datalad-archives special remote incorrectly treated file names as percent-encoded. ([#4953][])
The result handler didn't properly escape "%" when constructing its message template. ([#4953][])
In v0.13.0, the tailored rendering for specific subtypes of external command failures (e.g., "out of space" or "remote not available") was unintentionally switched to the default rendering. ([#4966][])
Various fixes and updates for the NDA authenticator. ([#4824][])
The helper for getting a versioned S3 URL did not support anonymous access or buckets with "." in their name. ([#4985][])
Several issues with the handling of S3 credentials and token expiration have been addressed. ([#4927][]) ([#4931][]) ([#4952][])
A warning is now given if the detected Git is below v2.13.0 to let users that run into problems know that their Git version is likely the culprit. ([#4866][])
A fix to [push][] in v0.13.2 introduced a regression that surfaces
when push.default
is configured to "matching" and prevents the
git-annex branch from being pushed. Note that, as part of the fix,
the current branch is now always pushed even when it wouldn't be
based on the configured refspec or push.default
value. ([#4896][])
[publish][]
--since=
as ^
for consistency with [push][]. ([#4683][])
--since=
with HEAD
rather than
the working tree to speed up the operation. ([#4448][])
[rerun][]
The archives are handled with p7zip, if available, since DataLad v0.12.0. This implementation now supports .tgz and .tbz2 archives. ([#4877][])
Work around a Python bug that led to our asyncio-based command runner intermittently failing to capture the output of commands that exit very quickly. ([#4835][])
[push][] displayed an overestimate of the transfer size when multiple files pointed to the same key. ([#4821][])
When [download-url][] calls git annex addurl
, it catches and
reports any failures rather than crashing. A change in v0.12.0
broke this handling in a particular case. ([#4817][])
allow_quick
parameter of AnnexRepo.file_has_content
and
AnnexRepo.is_under_annex
is now ignored and will be removed in a
later release. This parameter was only relevant for git-annex
versions before 7.20190912. ([#4736][])
Updates for compatibility with recent git and git-annex releases. ([#4746][]) ([#4760][]) ([#4684][])
[push][] didn't sync the git-annex branch when --data=nothing
was
specified. ([#4786][])
The datalad.clone.reckless
configuration wasn't stored in
non-annex datasets, preventing the values from being inherited by
annex subdatasets. ([#4749][])
Running the post-update hook installed by create-sibling --ui
could overwrite web log files from previous runs in the unlikely
event that the hook was executed multiple times in the same second.
([#4745][])
[clone][] inspected git's standard error in a way that could cause an attribute error. ([#4775][])
When cloning a repository whose HEAD
points to a branch without
commits, [clone][] tries to find a more useful branch to check out.
It unwisely considered adjusted branches. ([#4792][])
Since v0.12.0, SSHManager.close
hasn't closed connections when the
ctrl_path
argument was explicitly given. ([#4757][])
When working in a dataset in which git annex init
had not yet been
called, the file_has_content
and is_under_annex
methods of
AnnexRepo
incorrectly took the "allow quick" code path on file
systems that did not support it. ([#4736][])
[create][] now assigns version 4 (random) UUIDs instead of version 1 UUIDs that encode the time and hardware address. ([#4790][])
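The difference between the two UUID versions can be seen with Python's standard `uuid` module. This only illustrates the two variants; it does not show how [create][] generates or stores the dataset ID.

```python
import uuid

# Version 1: derived from a timestamp and a node identifier
# (typically the hardware address of a network interface).
v1 = uuid.uuid1()

# Version 4: generated from random bits, so it carries no time or host information.
v4 = uuid.uuid4()

print(v1.version, v4.version)
```

Because a version 4 UUID is random, it does not reveal when or on which machine a dataset was created.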
The documentation for [create][] now does a better job of describing
the interaction between --dataset
and PATH
. ([#4763][])
The format_commit
and get_hexsha
methods of GitRepo
have been
sped up. ([#4807][]) ([#4806][])
A better error message is now shown when the ^
or ^.
shortcuts
for --dataset
do not resolve to a dataset. ([#4759][])
A more helpful error message is now shown if a caller tries to
download an ftp://
link but does not have request_ftp
installed.
([#4788][])
[clone][] now tries harder to get up-to-date availability
information after auto-enabling type=git
special remotes. ([#2897][])
Cloning a subdataset should inherit the parent's
datalad.clone.reckless
value, but that did not happen when cloning
via datalad get
rather than datalad install
or datalad clone
.
([#4657][])
The default result renderer crashed when the result did not have a
path
key. ([#4666][]) ([#4673][])
datalad push
didn't show information about git push
errors when
the output was not in the format that it expected. ([#4674][])
datalad push
silently accepted an empty string for --since
even
though it is an invalid value. ([#4682][])
Our JavaScript testing setup on Travis grew stale and has now been updated. (Thanks to Xiao Gui.) ([#4687][])
The new class for running Git commands (added in v0.13.0) ignored any changes to the process environment that occurred after instantiation. ([#4703][])
datalad push
now avoids unnecessary git push
dry runs and pushes
all refspecs with a single git push
call rather than invoking git
push
for each one. ([#4692][]) ([#4675][])
The readability of SSH error messages has been improved. ([#4729][])
datalad.support.annexrepo
avoids calling
datalad.utils.get_linux_distribution
at import time and caches the
result once it is called because, as of Python 3.8, the function
uses distro
underneath, adding noticeable overhead. ([#4696][])
Third-party code should be updated to use get_linux_distribution
directly in the unlikely event that the code relied on the
import-time call to get_linux_distribution
setting the
linux_distribution_name
, linux_distribution_release
, or
on_debian_wheezy
attributes in datalad.utils.
A handful of new commands, including copy-file
, push
, and
create-sibling-ria
, along with various fixes and enhancements
The no_annex
parameter of [create][], which is exposed in the
Python API but not the command line, is deprecated and will be
removed in a later release. Use the new annex
argument instead,
flipping the value. Command-line callers that use --no-annex
are
unaffected. ([#4321][])
datalad add
, which was deprecated in 0.12.0, has been removed.
([#4158][]) ([#4319][])
The following GitRepo
and AnnexRepo
methods have been removed:
get_changed_files
, get_missing_files
, and get_deleted_files
.
([#4169][]) ([#4158][])
The get_branch_commits
method of GitRepo
and AnnexRepo
has
been renamed to get_branch_commits_
. ([#3834][])
The custom commit
method of AnnexRepo
has been removed, and
AnnexRepo.commit
now resolves to the parent method,
GitRepo.commit
. ([#4168][])
GitPython's git.repo.base.Repo
class is no longer available via
the .repo
attribute of GitRepo
and AnnexRepo
. ([#4172][])
AnnexRepo.get_corresponding_branch
now returns None
rather than
the current branch name when a managed branch is not checked out.
([#4274][])
The special UUID for git-annex web remotes is now available as
datalad.consts.WEB_SPECIAL_REMOTE_UUID
. It remains accessible as
AnnexRepo.WEB_UUID
for compatibility, but new code should use
consts.WEB_SPECIAL_REMOTE_UUID
([#4460][]).
Widespread improvements in functionality and test coverage on Windows and crippled file systems in general. ([#4057][]) ([#4245][]) ([#4268][]) ([#4276][]) ([#4291][]) ([#4296][]) ([#4301][]) ([#4303][]) ([#4304][]) ([#4305][]) ([#4306][])
AnnexRepo.get_size_from_key
incorrectly handled file chunks.
([#4081][])
[create-sibling][] would too readily clobber existing paths when
called with --existing=replace
. It now gets confirmation from the
user before doing so if running interactively and unconditionally
aborts when running non-interactively. ([#4147][])
[update][] ([#4159][])
When the caller included --bare
as a git init
option, [create][]
crashed creating the bare repository, which is currently
unsupported, rather than aborting with an informative error message.
([#4065][])
The logic for automatically propagating the 'origin' remote when cloning a local source could unintentionally trigger a fetch of a non-local remote. ([#4196][])
All remaining get_submodules()
call sites that relied on the
temporary compatibility layer added in v0.12.0 have been updated.
([#4348][])
The custom result summary renderer for [get][], which was visible
with --output-format=tailored
, displayed incorrect and confusing
information in some cases. The custom renderer has been removed
entirely. ([#4471][])
The documentation for the Python interface of a command listed an
incorrect default when the command overrode the value of command
parameters such as result_renderer
. ([#4480][])
The default result renderer learned to elide a chain of results after seeing ten consecutive results that it considers similar, which improves the display of actions that have many results (e.g., saving hundreds of files). ([#4337][])
The default result renderer, in addition to the "tailored" result renderer, now triggers the custom summary renderer, if any. ([#4338][])
The new command [create-sibling-ria][] provides support for creating a sibling in a [RIA store][handbook-scalable-datastore]. ([#4124][])
DataLad ships with a new special remote, git-annex-remote-ora, for interacting with [RIA stores][handbook-scalable-datastore] and a new command [export-archive-ora][] for exporting an archive from a local annex object store. ([#4260][]) ([#4203][])
The new command [push][] provides an alternative interface to [publish][] for pushing a dataset hierarchy to a sibling. ([#4206][]) ([#4581][]) ([#4617][]) ([#4620][])
The new command [copy-file][] copies files and associated availability information from one dataset to another. ([#4430][])
The command examples have been expanded and improved. ([#4091][]) ([#4314][]) ([#4464][])
The tooling for linking to the [DataLad Handbook][handbook] from DataLad's documentation has been improved. ([#4046][])
The --reckless
parameter of [clone][] and [install][] learned two
new modes:
[clone][]
ria+<protocol>://<storelocation>#~<aliasname>
.
([#4459][])
datalad.get.subdataset-source-candidate-NAME
to see
if NAME
starts with three digits, which is taken as a "cost".
Sources with lower costs will be tried first. ([#4619][])
[update][] ([#4167][])
ff-only
is
given to the --merge
option.
--follow
option that controls how --merge
behaves,
adding support for merging in the revision that is registered in
the parent dataset rather than merging in the configured branch
from the sibling.
[create-sibling][] now supports local paths as targets in addition to SSH URLs. ([#4187][])
[siblings][] now
The rendering of command errors has been improved. ([#4157][])
[save][] now
--to-git
. ([#4290][])
[diff][] and [save][] learned about scenarios where they could avoid unnecessary and expensive work. ([#4526][]) ([#4544][]) ([#4549][])
Calling [diff][] without --recursive
but with a path constraint
within a subdataset ("
New option datalad.annex.retry
controls how many times git-annex
will retry on a failed transfer. It defaults to 3 and can be set to
0 to restore the previous behavior. ([#4382][])
[wtf][] now warns when the specified dataset does not exist. ([#4331][])
The repr
and str
output of the dataset and repo classes got a
facelift. ([#4420][]) ([#4435][]) ([#4439][])
The DataLad Singularity container now comes with p7zip-full.
DataLad emits a log message when the current working directory is resolved to a different location due to a symlink. This is now logged at the DEBUG rather than WARNING level, as it typically does not indicate a problem. ([#4426][])
DataLad now lets the caller know that git annex init
is scanning
for unlocked files, as this operation can be slow in some
repositories. ([#4316][])
The log_progress
helper learned how to set the starting point to a
non-zero value and how to update the total of an existing progress
bar, two features needed for planned improvements to how some
commands display their progress. ([#4438][])
The ExternalVersions
object, which is used to check versions of
Python modules and external tools (e.g., git-annex), gained an add
method that enables DataLad extensions and other third-party code to
include other programs of interest. ([#4441][])
All of the remaining spots that use GitPython have been rewritten
without it. Most notably, this includes rewrites of the clone
,
fetch
, and push
methods of GitRepo
. ([#4080][]) ([#4087][])
([#4170][]) ([#4171][]) ([#4175][]) ([#4172][])
When GitRepo.commit
splits its operation across multiple calls to
avoid exceeding the maximum command line length, it now amends the
initial commit rather than creating multiple commits. ([#4156][])
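The idea of splitting a long path list across several calls while still ending up with one commit can be sketched as follows. This is an assumption-laden illustration, not the `GitRepo.commit` implementation: `chunk_by_length`, `plan_commits`, and the character budget are hypothetical, and the functions only build argument lists without running Git.

```python
def chunk_by_length(paths, max_chars):
    """Yield lists of paths whose combined length (plus separators) fits max_chars."""
    chunk, size = [], 0
    for path in paths:
        cost = len(path) + 1  # one extra character for the separating space
        if chunk and size + cost > max_chars:
            yield chunk
            chunk, size = [], 0
        chunk.append(path)
        size += cost
    if chunk:
        yield chunk


def plan_commits(paths, max_chars, message):
    """Build one git argument list per chunk; later chunks amend the first commit."""
    commands = []
    for index, chunk in enumerate(chunk_by_length(paths, max_chars)):
        if index == 0:
            cmd = ["git", "commit", "-m", message]
        else:
            # Fold the remaining paths into the commit created by the first call.
            cmd = ["git", "commit", "--amend", "--no-edit"]
        commands.append(cmd + ["--"] + chunk)
    return commands


for cmd in plan_commits(["a.txt", "b.txt", "c.txt"], max_chars=12, message="save"):
    print(cmd)
```

Amending rather than creating a new commit per chunk keeps the history identical to what a single, unsplit call would have produced.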
GitRepo
gained a get_corresponding_branch
method (which always
returns None), allowing a caller to invoke the method without
needing to check if the underlying repo class is GitRepo
or
AnnexRepo
. ([#4274][])
A new helper function datalad.core.local.repo.repo_from_path
returns a repo class for a specified path. ([#4273][])
New AnnexRepo
method localsync
performs a git annex sync
that
disables external interaction and is particularly useful for
propagating changes on an adjusted branch back to the main branch.
([#4243][])
Requesting tailored output (--output=tailored
) from a command with
a custom result summary renderer produced repeated output. ([#4463][])
A longstanding regression in argcomplete-based command-line
completion for Bash has been fixed. You can enable completion by
configuring a Bash startup file to run eval
"$(register-python-argcomplete datalad)"
or source DataLad's
tools/cmdline-completion
. The latter should work for Zsh as well.
([#4477][])
[publish][] didn't prevent git-fetch
from recursing into
submodules, leading to a failure when the registered submodule was
not present locally and the submodule did not have a remote named
'origin'. ([#4560][])
[addurls][] botched path handling when the file name format started with "./" and the call was made from a subdirectory of the dataset. ([#4504][])
Double dash options in manpages were unintentionally escaped. ([#4332][])
The check for HTTP authentication failures crashed in situations where content came in as bytes rather than unicode. ([#4543][])
A check in AnnexRepo.whereis
could lead to a type error. ([#4552][])
When installing a dataset to obtain a subdataset, [get][] confusingly displayed a message that described the containing dataset as "underneath" the subdataset. ([#4456][])
A couple of Makefile rules didn't properly quote paths. ([#4481][])
With DueCredit support enabled (DUECREDIT_ENABLE=1
), the query for
metadata information could flood the output with warnings if
datasets didn't have aggregated metadata. The warnings are now
silenced, with the overall failure of a [metadata][] call logged at
the debug level. ([#4568][])
The resource identifier helper learned to recognize URLs with embedded Git transport information, such as gcrypt::https://example.com. ([#4529][])
When running non-interactively, a more informative error is now signaled when the UI backend, which cannot display a question, is asked to do so. ([#4553][])
datalad.support.annexrepo.N_AUTO_JOBS
is no longer
considered. The variable will be removed in a later release.
([#4409][])
Starting with v0.12.0, datalad save
recorded the current branch of
a parent dataset as the branch
value in the .gitmodules entry for
a subdataset. This behavior is problematic for a few reasons and
has been reverted. ([#4375][])
The default for the --jobs
option, "auto", instructed DataLad to
pass a value to git-annex's --jobs
equal to min(8, max(3, <number
of CPUs>))
, which could lead to issues due to the large number of
child processes spawned and file descriptors opened. To avoid this
behavior, --jobs=auto
now results in git-annex being called with
--jobs=1
by default. Configure the new option
datalad.runtime.max-annex-jobs
to control the maximum value that
will be considered when --jobs='auto'
. ([#4409][])
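For reference, the previous "auto" heuristic quoted above, min(8, max(3, number of CPUs)), works out as in this small sketch. The function name `old_auto_jobs` is hypothetical and exists only to show the arithmetic.

```python
import os


def old_auto_jobs(n_cpus=None):
    """Previous heuristic: at least 3 and at most 8 jobs, depending on CPU count."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    return min(8, max(3, n_cpus))


for cpus in (1, 4, 64):
    print(cpus, "CPUs ->", old_auto_jobs(cpus), "jobs")
```

Even a single-CPU machine got three parallel git-annex jobs under the old rule, which is the behavior the new default of one job avoids.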
Various commands have been adjusted to better handle the case where a remote's HEAD ref points to an unborn branch. ([#4370][])
[search]
--show-keys short
. ([#4354][])
The code for parsing Git configuration did not follow Git's behavior of accepting a key with no value as shorthand for key=true. ([#4421][])
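The "key with no value means true" rule can be sketched with a small parser. It assumes NUL-terminated records with a newline between key and value, as produced by `git config -z --list`; the function `parse_git_config_z` and the key `my.flag` are hypothetical and this is not DataLad's parsing code.

```python
def parse_git_config_z(output):
    """Parse NUL-terminated "key<newline>value" records into a dict of value lists."""
    parsed = {}
    for record in output.split("\0"):
        if not record:
            continue  # skip the empty piece after the final NUL
        key, sep, value = record.partition("\n")
        if not sep:
            # No newline means the key was given without a value: treat as true.
            value = "true"
        parsed.setdefault(key, []).append(value)
    return parsed


sample = "core.bare\nfalse\0my.flag\0user.name\nJane Doe\0"
print(parse_git_config_z(sample))
```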
AnnexRepo.info
needed a compatibility update for a change in how
git-annex reports file names. ([#4431][])
[create-sibling-github][] did not gracefully handle a token that did not have the necessary permissions. ([#4400][])
[search] learned to use the query as a regular expression that
restricts the keys that are shown for --show-keys short
. ([#4354][])
datalad <subcommand>
learned to point to the [datalad-container][]
extension when a subcommand from that extension is given but the
extension is not installed. ([#4400][]) ([#4174][])
Fix some bugs and make the world an even better place.
Our log_progress
helper mishandled the initial display and step of
the progress bar. ([#4326][])
AnnexRepo.get_content_annexinfo
is designed to accept init=None
,
but passing that led to an error. ([#4330][])
Update a regular expression to handle an output change in Git v2.26.0. ([#4328][])
We now set LC_MESSAGES
to 'C' while running git to avoid failures
when parsing output that is marked for translation. ([#4342][])
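Setting the variable for a child process only, without touching the parent environment, might look like the sketch below. The helper `run_with_c_messages` is hypothetical, and a Python child process stands in for git so that the example runs anywhere.

```python
import os
import subprocess
import sys


def run_with_c_messages(cmd):
    """Run cmd with LC_MESSAGES=C so that its messages are not translated."""
    env = dict(os.environ, LC_MESSAGES="C")  # copy; the parent environment is unchanged
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout


# Stand-in child process that reports the value it sees.
child = [sys.executable, "-c", "import os; print(os.environ['LC_MESSAGES'])"]
print(run_with_c_messages(child))
```

Note that a set `LC_ALL` takes precedence over `LC_MESSAGES`, so a real caller may need to account for that variable as well.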
The helper for decoding JSON streams loaded the last line of input without decoding it if the line didn't end with a new line, a regression introduced in the 0.12.0 release. ([#4361][])
The clone command failed to git-annex-init a fresh clone whenever it considered adding the origin of the origin as a remote. ([#4367][])
The main purpose of this release is to have one on PyPI that has no associated wheel to enable a working installation on Windows ([#4315][]).
log.outputs
config switch did not keep up
with code changes and incorrectly stated that the output would be
logged at the DEBUG level; logging actually happens at a lower
level. ([#4317][])
Updates for compatibility with the latest git-annex, along with a few miscellaneous fixes
NoDatasetArgumentFound
exception now raise
a NoDatasetFound
exception to better reflect the situation: it is
the dataset rather than the argument that is not found. For
compatibility, the latter inherits from the former, but new code
should prefer the latter. ([#4285][])
Updates for compatibility with git-annex version 8.20200226. ([#4214][])
datalad export-to-figshare
failed to export if the generated title
was fewer than three characters. It now queries the caller for the
title and guards against titles that are too short. ([#4140][])
Authentication was requested multiple times when git-annex launched
parallel downloads from the datalad
special remote. ([#4308][])
At verbose logging levels, DataLad requests that git-annex display debugging information too. Work around a bug in git-annex that prevented that from happening. ([#4212][])
The internal command runner looked in the wrong place for some
configuration variables, including datalad.log.outputs
, resulting
in the default value always being used. ([#4194][])
[publish][] failed when trying to publish to a git-lfs special remote for the first time. ([#4200][])
AnnexRepo.set_remote_url
is supposed to establish shared SSH
connections but failed to do so. ([#4262][])
The message provided when a command cannot determine what dataset to operate on has been improved. ([#4285][])
The "aws-s3" authentication type now allows specifying the host through "aws-s3_host", which was needed to work around an authorization error due to a longstanding upstream bug. ([#4239][])
The xmp metadata extractor now recognizes ".wav" files.
Mostly a bugfix release with various robustifications, but also makes the first step towards versioned dataset installation requests.
The class for handling configuration values, ConfigManager
,
inappropriately considered the current working directory's dataset,
if any, for both reading and writing when instantiated with
dataset=None
. This misbehavior is fairly inaccessible through
typical use of DataLad. It affects datalad.cfg
, the top-level
configuration instance that should not consider repository-specific
values. It also affects Python users that call Dataset
with a
path that does not yet exist and persists until that dataset is
created. ([#4078][])
[update][] saved the dataset when called with --merge
, which is
unnecessary and risks committing unrelated changes. ([#3996][])
Confusing and irrelevant information about Python defaults has been dropped from the command-line help. ([#4002][])
The logic for automatically propagating the 'origin' remote when cloning a local source didn't properly account for relative paths. ([#4045][])
Various fixes to file name handling and quoting on Windows. ([#4049][]) ([#4050][])
When cloning failed, error lines were not bubbled up to the user in some scenarios. ([#4060][])
[clone][] (and thus [install][])
reckless
mode from the superdataset when
cloning a dataset into it. ([#4037][])
ria+<protocol>://
URLs that point to
[RIA][handbook-scalable-datastore] stores. ([#4022][])
ria+
URLs and install that
version of a dataset ([#4036][]) and to apply URL rewrites
configured through Git's url.*.insteadOf
mechanism ([#4064][]).
datalad.get.subdataset-source-candidate-<name>
options configured within the superdataset into the subdataset.
This is particularly useful for RIA data stores. ([#4073][])
Archives are now (optionally) handled with 7-Zip instead of
patool
. 7-Zip will be used by default, but patool
will be used
on non-Windows systems if the datalad.runtime.use-patool
option is
set or the 7z
executable is not found. ([#4041][])
Fix some fallout after major release.
Revert incorrect relative path adjustment to URLs in [clone][]. ([#3538][])
Various small fixes to internal helpers and tests to run on Windows ([#2566][]) ([#2534][])
This release is the result of more than a year of development that includes fixes for a large number of issues, yielding more robust behavior across a wider range of use cases, and introduces major changes in API and behavior. It is the first release for which extensive user documentation is available in a dedicated [DataLad Handbook][handbook]. Python 3 (3.5 and later) is now the only supported Python flavor.
[save][] fully replaces [add][] (which is obsolete now, and will be removed in a future release).
A new Git-annex aware [status][] command enables detailed inspection of dataset hierarchies. The previously available [diff][] command has been adjusted to match [status][] in argument semantics and behavior.
The ability to configure dataset procedures prior and after the execution of particular commands has been replaced by a flexible "hook" mechanism that is able to run arbitrary DataLad commands whenever command results are detected that match a specification.
Support of the Windows platform has been improved substantially. While performance and feature coverage on Windows still fall behind Unix-like systems, typical data consumer use cases and standard dataset operations, such as [create][] and [save][], are now working. Basic support for data provenance capture via [run][] is also functional.
Support for Git-annex direct mode repositories has been removed, following the end of support in Git-annex itself.
The semantics of relative paths in command line arguments have changed. Previously, a call `datalad save --dataset /tmp/myds some/relpath` would have been interpreted as saving a file at `/tmp/myds/some/relpath` into dataset `/tmp/myds`. This has changed to saving `$PWD/some/relpath` into dataset `/tmp/myds`. More generally, relative paths are now always treated as relative to the current working directory, except for path arguments of [Dataset][] class instance methods of the Python API. The resulting partial duplication of path specifications between path and dataset arguments is mitigated by the introduction of two special symbols that can be given as dataset argument: `^` and `^.`, which identify the topmost superdataset and the closest dataset that contains the working directory, respectively.
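To illustrate what the two special symbols stand for, here is a minimal sketch under stated assumptions. The helper name and the precomputed `dataset_roots` list are hypothetical. Real code would discover dataset roots on the file system, and would have to confirm that nested roots are actually registered as subdatasets.

```python
from pathlib import PurePosixPath


def resolve_dataset_symbol(symbol, cwd, dataset_roots):
    """Map "^" or "^." to a dataset root (hypothetical helper).

    `dataset_roots` is an assumed, precomputed list of dataset root
    directories.
    """
    cwd = PurePosixPath(cwd)
    roots = [PurePosixPath(r) for r in dataset_roots]
    # Keep only roots that contain the working directory, outermost first.
    containing = sorted(
        (r for r in roots if r == cwd or r in cwd.parents),
        key=lambda r: len(r.parts),
    )
    if not containing:
        raise ValueError("working directory is not inside a known dataset")
    if symbol == "^":
        return containing[0]   # topmost superdataset
    if symbol == "^.":
        return containing[-1]  # closest dataset containing cwd
    raise ValueError(f"unknown dataset symbol: {symbol!r}")
```

With roots `/data/super` and `/data/super/sub` and a working directory of `/data/super/sub/code`, `^` maps to `/data/super` and `^.` maps to `/data/super/sub`.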
The concept of a "core API" has been introduced. Commands situated in the module `datalad.core` (such as [create][], [save][], [run][], [status][], [diff][]) receive additional scrutiny regarding API and implementation, and are meant to provide longer-term stability. Application developers are encouraged to preferentially build on these commands.
[clone][] has been incorporated into the growing core API. The public `--alternative-source` parameter has been removed, and a `clone_dataset` function with multi-source capabilities is provided instead. The `--reckless` parameter can now take literal mode labels instead of just being a binary flag, but backwards compatibility is maintained.
The `get_file_content` method of `GitRepo` was no longer used internally or in any known DataLad extensions and has been removed. ([#3812][])
The function `get_dataset_root` has been replaced by `rev_get_dataset_root`. `rev_get_dataset_root` remains as a compatibility alias and will be removed in a later release. ([#3815][])
The `add_sibling` module, marked obsolete in v0.6.0, has been removed. ([#3871][])
`mock` is no longer declared as an external dependency because we can rely on it being in the standard library now that our minimum required Python version is 3.5. ([#3860][])
[download-url][] now requires that directories be indicated with a trailing slash rather than interpreting a path as directory when it doesn't exist. This avoids confusion that can result from typos and makes it possible to support directory targets that do not exist. ([#3854][])
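The trailing-slash rule can be pictured with a short sketch. This is an illustration only: the helper name is hypothetical, and taking the file name from the last URL path component is an assumption made for the example, not a statement about how [download-url][] names files.

```python
import posixpath
from urllib.parse import urlparse


def download_target(path_arg, url):
    """Hypothetical helper: decide where a download of `url` would land."""
    if path_arg.endswith("/"):
        # Trailing slash: treat the argument as a directory, whether or
        # not it exists yet, and place the file inside it.
        filename = posixpath.basename(urlparse(url).path)
        return posixpath.join(path_arg, filename)
    # No trailing slash: the argument names the file itself.
    return path_arg
```

Under this rule a mistyped directory name without a slash is treated as a file name rather than silently becoming a new directory.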
The `dataset_only` argument of the `ConfigManager` class is deprecated. Use `source="dataset"` instead. ([#3907][])
The `--proc-pre` and `--proc-post` options have been removed, and configuration values for `datalad.COMMAND.proc-pre` and `datalad.COMMAND.proc-post` are no longer honored. The new result hook mechanism provides an alternative for `proc-post` procedures. ([#3963][])
[publish][] crashed when called with a detached HEAD. It now aborts with an informative message. ([#3804][])
Since 0.12.0rc6 the call to [update][] in [siblings][] resulted in a spurious warning. ([#3877][])
[siblings][] crashed if it encountered an annex repository that was marked as dead. ([#3892][])
The update of [rerun][] in v0.12.0rc3 for the rewritten [diff][] command didn't account for a change in the output of `diff`, leading to `rerun --report` unintentionally including unchanged files in its diff values. ([#3873][])
In 0.12.0rc5 [download-url][] was updated to follow the new path handling logic, but its calls to AnnexRepo weren't properly adjusted, resulting in incorrect path handling when called from a dataset subdirectory. ([#3850][])
[download-url][] called `git annex addurl` in a way that failed to register a URL when its header didn't report the content size. ([#3911][])
With Git v2.24.0, saving new subdatasets failed due to a bug in that Git release. ([#3904][])
With DataLad configured to stop on failure (e.g., specifying `--on-failure=stop` from the command line), a failing result record was not rendered. ([#3863][])
Installing a subdataset yielded an "ok" status in cases where the repository was not yet in its final state, making it ineffective for a caller to operate on the repository in response to the result. ([#3906][])
The internal helper for converting git-annex's JSON output did not relay information from the "error-messages" field. ([#3931][])
[run-procedure][] reported relative paths that were confusingly not relative to the current directory in some cases. It now always reports absolute paths. ([#3959][])
[diff][] inappropriately reported files as deleted in some cases when `to` was a value other than `None`. ([#3999][])
An assortment of fixes for Windows compatibility. ([#3971][]) ([#3974][]) ([#3975][]) ([#3976][]) ([#3979][])
Subdatasets installed from a source given by relative path will now have this relative path used as 'url' in their .gitmodules record, instead of an absolute path generated by Git. ([#3538][])
[clone][] will now correctly interpret '~/...' paths as absolute path specifications. ([#3958][])
[run-procedure][] mistakenly reported a directory as a procedure. ([#3793][])
The cleanup for batched git-annex processes has been improved. ([#3794][]) ([#3851][])
The function for adding a version ID to an AWS S3 URL doesn't support URLs with an "s3://" scheme and raises a `NotImplementedError` exception when it encounters one. The function learned to return a URL untouched if an "s3://" URL comes in with a version ID. ([#3842][])
A few spots needed to be adjusted for compatibility with git-annex's new `--sameas` [feature][gx-sameas], which allows special remotes to share a data store. ([#3856][])
The `swallow_logs` utility failed to capture some log messages due to an incompatibility with Python 3.7. ([#3935][])
[siblings][] failed when `--inherit` was passed but the parent dataset did not have a remote with a matching name. ([#3954][])
By default, datasets cloned from local source paths will now get a configured remote for any recursively discoverable 'origin' sibling that is also available from a local path in order to maximize automatic file availability across local annexes. ([#3926][])
The new [result hooks mechanism][hooks] allows callers to specify, via local Git configuration values, DataLad command calls that will be triggered in response to matching result records (i.e., what you see when you call a command with `-f json_pp`). ([#3903][])
The command interface classes learned to use a new `_examples_` attribute to render documentation examples for both the Python and command-line API. ([#3821][])
Candidate URLs for cloning a submodule can now be generated based on configured templates that have access to various properties of the submodule, including its dataset ID. ([#3828][])
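As a rough picture of template-based candidate generation, the sketch below fills placeholders in URL templates from a submodule's properties. The helper, the example URLs, and the placeholder names (`id`, `path`) are assumptions chosen for illustration, not DataLad's actual configuration syntax.

```python
def candidate_urls(templates, properties):
    """Fill each URL template with submodule properties (hypothetical helper)."""
    return [template.format(**properties) for template in templates]


# Assumed example data: a dataset ID and the submodule's path.
submodule = {"id": "1234-abcd", "path": "inputs/raw"}
templates = [
    "https://store.example.org/{id}",
    "ssh://backup.example.org/datasets/{path}",
]
urls = candidate_urls(templates, submodule)
```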
DataLad's check that the user's Git identity is configured has been sped up and now considers the appropriate environment variables as well. ([#3807][])
The `tag` method of `GitRepo` can now tag revisions other than `HEAD` and accepts a list of arbitrary `git tag` options. ([#3787][])
When `get` clones a subdataset and the subdataset's HEAD differs from the commit that is registered in the parent, the active branch of the subdataset is moved to the registered commit if the registered commit is an ancestor of the subdataset's HEAD commit. This handling has been moved to a more central location within `GitRepo`, and now applies to any `update_submodule(..., init=True)` call. ([#3831][])
The output of `datalad -h` has been reformatted to improve readability. ([#3862][])
[unlock][] has been sped up. ([#3880][])
[run-procedure][] learned to provide and render more information about discovered procedures, including whether the procedure is overridden by another procedure with the same base name. ([#3960][])
[save][] now runs `git annex sync` when saving a dataset on an adjusted branch so that the changes are brought into the mainline branch. ([#3817][])
[subdatasets][] now aborts when its `dataset` argument points to a non-existent dataset. ([#3940][])
[wtf][] now
The `ConfigManager` class can now exclude `.datalad/config` as a source of configuration values, restricting the sources to standard Git configuration files, when called with `source="local"`. ([#3907][])
The `where` argument of the `ConfigManager` class now allows Python callers to more conveniently override configuration. ([#3970][])
Commands now accept a `dataset` value of "^." as shorthand for "the dataset to which the current directory belongs". ([#3242][])
bet we will fix some bugs and make a world even a better place.
DataLad no longer supports Python 2. The minimum supported version of Python is now 3.5. ([#3629][])
Much of the user-focused content at http://docs.datalad.org has been removed in favor of more up to date and complete material available in the [DataLad Handbook][handbook]. Going forward, the plan is to restrict http://docs.datalad.org to technical documentation geared at developers. ([#3678][])
[update][] used to allow the caller to specify which dataset(s) to update as a `PATH` argument or via the `--dataset` option; now only the latter is supported. Path arguments only serve to restrict which subdatasets are updated when operating recursively. ([#3700][])
Result records from a [get][] call no longer have a "state" key. ([#3746][])
[update][] and [get][] no longer support operating on independent hierarchies of datasets. ([#3700][]) ([#3746][])
The [run][] update in 0.12.0rc4 for the new path resolution logic broke the handling of inputs and outputs for calls from a subdirectory. ([#3747][])
The `is_submodule_modified` method of `GitRepo` as well as two helper functions in gitrepo.py, `kwargs_to_options` and `split_remote_branch`, were no longer used internally or in any known DataLad extensions and have been removed. ([#3702][]) ([#3704][])
The `only_remote` option of `GitRepo.is_with_annex` was not used internally or in any known extensions and has been dropped. ([#3768][])
The `get_tags` method of `GitRepo` used to sort tags by committer date. It now sorts them by the tagger date for annotated tags and the committer date for lightweight tags. ([#3715][])
The `rev_resolve_path` helper replaced the `resolve_path` helper. ([#3797][])
Correctly handle relative paths in [publish][]. ([#3799][]) ([#3102][])
Do not erroneously discover a directory as a procedure. ([#3793][])
Correctly extract version from manpage to trigger use of manpages for `--help`. ([#3798][])
The `cfg_yoda` procedure saved all modifications in the repository rather than saving only the files it modified. ([#3680][])
Some spots in the documentation that were supposed to appear as two hyphens were incorrectly rendered in the HTML output as en-dashes. ([#3692][])
[create][], [install][], and [clone][] treated paths as relative to the dataset even when the string form was given, violating the new path handling rules. ([#3749][]) ([#3777][]) ([#3780][])
Providing the "^" shortcut to `--dataset` didn't work properly when called from a subdirectory of a subdataset. ([#3772][])
We failed to propagate some errors from git-annex when working with its JSON output. ([#3751][])
With the Python API, callers are allowed to pass a string or list of strings as the `cfg_proc` argument to [create][], but the string form was mishandled. ([#3761][])
Incorrect command quoting for SSH calls on Windows that rendered basic SSH-related functionality (e.g., [sshrun][]) on Windows unusable. ([#3688][])
Annex JSON result handling assumed platform-specific paths on Windows instead of the POSIX-style paths that are used across all platforms. ([#3719][])
`path_is_under()` was incapable of comparing Windows paths with different drive letters. ([#3728][])
Provide a collection of "public" `call_git*` helpers within `GitRepo` and replace use of "private" and less specific `_git_custom_command` calls. ([#3791][])
[status][] gained a `--report-filetype` option. Setting it to "raw" can give a performance boost for the price of no longer distinguishing symlinks that point to annexed content from other symlinks. ([#3701][])
[save][] disables file type reporting by [status][] to improve performance. ([#3712][])
[subdatasets][] now adds a `contains` field to its result records that lists which `contains` arguments matched a given subdataset. ([#3743][])
[subdatasets][] now reports when a `contains` argument wasn't matched to any of the reported subdatasets. ([#3743][])
[install][] now shows more readable output when cloning fails. ([#3775][])
`SSHConnection` now displays a more informative error message when it cannot start the `ControlMaster` process. ([#3776][])
If the new configuration option `datalad.log.result-level` is set to a single level, all result records will be logged at that level. If you've been bothered by DataLad's double reporting of failures, consider setting this to "debug". ([#3754][])
Configuration values from `datalad -c OPTION=VALUE ...` are now validated to provide better errors. ([#3695][])
[rerun][] learned how to handle history with merges. As was already the case when cherry picking non-run commits, re-creating merges may result in conflicts, and `rerun` does not yet provide an interface to let the user handle these. ([#2754][])
The `fsck` method of `AnnexRepo` has been enhanced to expose more features of the underlying `git fsck` command. ([#3693][])
`GitRepo` now has a `for_each_ref_` method that wraps `git for-each-ref`, which is used in various spots that used to rely on GitPython functionality. ([#3705][])
Do not pretend to be able to work in optimized (`python -O`) mode, crash early with an informative message. ([#3803][])
Various fixes and enhancements that bring the 0.12.0 release closer.
The two modules below have a new home. The old locations still exist as compatibility shims and will be removed in a future release.
`datalad.distribution.subdatasets` has been moved to `datalad.local.subdatasets` ([#3429][])
`datalad.interface.run` has been moved to `datalad.core.local.run` ([#3444][])
The `lock` method of `AnnexRepo` and the `options` parameter of `AnnexRepo.unlock` were unused internally and have been removed. ([#3459][])
The `get_submodules` method of `GitRepo` has been rewritten without GitPython. When the new `compat` flag is true (the current default), the method returns a value that is compatible with the old return value. This backwards-compatible return value and the `compat` flag will be removed in a future release. ([#3508][])
The logic for resolving relative paths given to a command has changed ([#3435][]). The new rule is that relative paths are taken as relative to the dataset only if a dataset instance is passed by the caller. In all other scenarios they're considered relative to the current directory.
The main user-visible difference from the command line is that using the `--dataset` argument does not result in relative paths being taken as relative to the specified dataset. (The undocumented distinction between "rel/path" and "./rel/path" no longer exists.)
All commands under `datalad.core` and `datalad.local`, as well as `unlock` and `addurls`, follow the new logic. The goal is for all commands to eventually do so.
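The resolution rule can be sketched in a few lines. This is a simplified illustration, not DataLad's code: the `Dataset` class below is a stand-in with only a `path` attribute, and the `resolve_path` signature (including the explicit `cwd` parameter) is assumed for the example.

```python
from pathlib import PurePosixPath


class Dataset:
    """Minimal stand-in for a dataset instance (illustration only)."""

    def __init__(self, path):
        self.path = PurePosixPath(path)


def resolve_path(path, dataset=None, cwd="/"):
    """Resolve `path` following the rule described above."""
    path = PurePosixPath(path)
    if path.is_absolute():
        return path
    if isinstance(dataset, Dataset):
        # A dataset *instance* was passed: relative to the dataset root.
        return dataset.path / path
    # Anything else (no dataset, or a dataset given as a string):
    # relative to the current working directory.
    return PurePosixPath(cwd) / path
```

Called with `dataset="/tmp/myds"` (a string, as on the command line) and `cwd="/home/me"`, the sketch resolves `some/relpath` to `/home/me/some/relpath`. Called with `Dataset("/tmp/myds")` it yields `/tmp/myds/some/relpath`.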
The function for loading JSON streams wasn't clever enough to handle content that included a Unicode line separator like U+2028. ([#3524][])
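One plausible way such a problem arises in Python is shown below. Whether this was the exact cause in DataLad is an assumption. `str.splitlines()` treats U+2028 as a line boundary, so a line-delimited JSON stream with that character inside a string value is cut in the wrong place, whereas splitting on `"\n"` alone keeps each record intact.

```python
import json

records = [{"name": "a\u2028b"}, {"name": "c"}]
# One JSON document per line; ensure_ascii=False keeps U+2028 literal.
stream = "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

# str.splitlines() also breaks on U+2028, cutting the first record in two.
naive_lines = stream.splitlines()

# Splitting on "\n" only keeps every JSON document whole.
safe_records = [json.loads(line) for line in stream.split("\n") if line]
```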
When [unlock][] was called without an explicit target (i.e., a directory or no paths at all), the call failed if any of the files did not have content present. ([#3459][])
`AnnexRepo.get_content_info` failed in the rare case of a key without size information. ([#3534][])
[save][] ignored `--on-failure` in its underlying call to [status][]. ([#3470][])
Calling [remove][] with a subdirectory displayed spurious warnings about the subdirectory files not existing. ([#3586][])
Our processing of `git-annex --json` output mishandled info messages from special remotes. ([#3546][])
[create][] did not work correctly with `--force` as of 0.12.0rc3 ([#3552][])
[create][] did not work correctly when `--cfg-proc` was used with `--dataset` ([#3591][])
The base downloader had some error handling that wasn't compatible with Python 3. ([#3622][])
Fixed a number of Unicode py2-compatibility issues. ([#3602][])
`AnnexRepo.get_content_annexinfo` did not properly chunk file arguments to avoid exceeding the command-line character limit. ([#3587][])
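Chunking file arguments to respect a command-line length limit can be sketched as follows. This is a generic illustration, not the code used in `AnnexRepo`: the helper name, the character-based cost model (one separator per argument), and the `base_len` parameter for the fixed part of the command are assumptions.

```python
def chunk_args(files, max_len, base_len=0):
    """Yield lists of file arguments whose combined length fits max_len.

    Hypothetical helper: `base_len` accounts for the fixed part of the
    command line. A single argument longer than the limit is still
    emitted alone.
    """
    chunk, used = [], base_len
    for name in files:
        cost = len(name) + 1  # +1 for the separating space
        if chunk and used + cost > max_len:
            yield chunk
            chunk, used = [], base_len
        chunk.append(name)
        used += cost
    if chunk:
        yield chunk
```

Each yielded chunk would then be passed to a separate command invocation, and the results combined.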
New command `create-sibling-gitlab` provides an interface for creating a publication target on a GitLab instance. ([#3447][])
[subdatasets][] ([#3429][])
save
and status
[subdatasets][] gained a `--contains=PATH` option that can be used to restrict the output to datasets that include a specific path. ([#3429][])
[status][] learned to accept a plain `--annex` (no value) as shorthand for `--annex basic`. ([#3534][])
The `.dirty` property of `GitRepo` and `AnnexRepo` has been sped up. ([#3460][])
The `get_content_info` method of `GitRepo`, used by `status` and commands that depend on `status`, now restricts its git calls to a subset of files, if possible, for a performance gain in repositories with many files. ([#3508][])
Extensions that do not provide a command, such as those that provide only metadata extractors, are now supported. ([#3531][])
When calling git-annex with `--json`, we log standard error at the debug level rather than the warning level if a non-zero exit is expected behavior. ([#3518][])
[create][] no longer refuses to create a new dataset in the odd scenario of an empty .git/ directory upstairs. ([#3475][])
As of v2.22.0 Git treats a sub-repository on an unborn branch as a repository rather than as a directory. Our documentation and tests have been updated appropriately. ([#3476][])
[addurls][] learned to accept a `--cfg-proc` value and pass it to its `create` calls. ([#3562][])
With the replacement of the `save` command implementation with `rev-save` the revolution effort is now over, and the set of key commands for local dataset operations (`create`, `run`, `save`, `status`, `diff`) is now complete. This new core API is available from `datalad.core.local` (and also via `datalad.api`, as any other command).
The `add` command is now deprecated. It will be removed in a future release.
Remove hard-coded dependencies on POSIX path conventions in SSH support code ([#3400][])
Emit an `add` result when adding a new subdataset during [save][] ([#3398][])
SSH file transfer now actually opens a shared connection, if none exists yet ([#3403][])
`SSHConnection` now offers methods for file upload and download (`get()`, `put()`). The previous `copy()` method only supported upload and was discontinued ([#3401][])
Continues API consolidation and replaces the `create` and `diff` commands with more performant implementations.
The previous `diff` command has been replaced by the diff variant from the [datalad-revolution][] extension. ([#3366][])
`rev-create` has been renamed to `create`, and the previous `create` has been removed. ([#3383][])
The procedure `setup_yoda_dataset` has been renamed to `cfg_yoda` ([#3353][]).
The `--nosave` option of `addurls` now affects only added content, not newly created subdatasets ([#3259][]).
`Dataset.get_subdatasets` (deprecated since v0.9.0) has been removed. ([#3336][])
The `.is_dirty` method of `GitRepo` and `AnnexRepo` has been replaced by `.status` or, for a subset of cases, the `.dirty` property. ([#3330][])
`AnnexRepo.get_status` has been replaced by `AnnexRepo.status`. ([#3330][])
[status][] did not work correctly when `--annex basic` was specified ([#3378][])
An informative error wasn't given when a download provider was invalid. ([#3258][])
Calling `rev-save PATH` saved unspecified untracked subdatasets. ([#3288][])
The available choices for command-line options that take values are now displayed more consistently in the help output. ([#3326][])
The new pathlib-based code had various encoding issues on Python 2. ([#3332][])
[wtf][] now includes information about the Python version. ([#3255][])
When operating in an annex repository, checking whether git-annex is available is now delayed until a call to git-annex is actually needed, allowing systems without git-annex to operate on annex repositories in a restricted fashion. ([#3274][])
The `load_stream` helper now supports auto-detection of compressed files. ([#3289][])
create
(formerly rev-create
)
status
([#3294][])
`create` gained a `--cfg-proc` (or `-c`) convenience option for running configuration procedures (or more accurately any procedure that begins with "cfg_") in the newly created dataset ([#3353][])
`AnnexRepo.set_metadata` now returns a list while `AnnexRepo.set_metadata_` returns a generator, a behavior which is consistent with the `add` and `add_` method pair. ([#3298][])
`AnnexRepo.get_metadata` now supports batch querying of known annex files. Note, however, that callers should carefully validate the input paths because the batch call will silently hang if given non-annex files. ([#3364][])
[status][] gained an `eval_subdataset_state` option that controls how the subdataset state is evaluated. Depending on the information you need, you can select a less expensive mode to make `status` faster. ([#3324][])
Querying repository content is faster due to batching of `git cat-file` calls. ([#3301][])
The dataset ID of a subdataset is now recorded in the superdataset. ([#3304][])
GitRepo.diffstatus
`GitRepo.get_content_info` now supports disabling the file type evaluation, which gives a performance boost in cases where this information isn't needed. ([#3362][])
The XMP metadata extractor now filters based on file name to improve its performance. ([#3329][])
`GitRepo.dirty` does not report on nested empty directories ([#3196][]).
`GitRepo.save()` reports results on deleted files.
Absorb a new set of core commands from the datalad-revolution extension:
`rev-status`: like `git status`, but simpler and working with dataset hierarchies
`rev-save`: a 2-in-1 replacement for save and add
`rev-create`: a ~30% faster create
JSON support tools can now read and write compressed files.
Dataset and Repo object instances are now hashable, and can be created based on pathlib Path object instances
Imported various additional methods for the Repo classes to query information and save changes.
Prepared for upstream changes in git-annex, including support for the latest git-annex
The `cfg_text2git` procedure, as well as the `--text-no-annex` option of [create][], now configure .gitattributes so that empty files are stored in git rather than annex. ([#3667][])
Primarily bugfixes with some optimizations and refactorings.
[addurls][]
[run-procedure][]
`sys.executable` is now used. ([#3624][])
[run-procedure][] now quotes values with `shlex.quote`, but note that on Windows values are left unquoted. ([#3626][])
[siblings][] now displays an informative error message if a local path is given to `--url` but `--name` isn't specified. ([#3555][])
[sshrun][], the command DataLad uses for `GIT_SSH_COMMAND`, didn't support all the parameters that Git expects it to. ([#3616][])
Fixed a number of Unicode py2-compatibility issues. ([#3597][])
[download-url][] now will create leading directories of the output path if they do not exist ([#3646][])
The [annotate-paths][] helper now caches subdatasets it has seen to avoid unnecessary calls. ([#3570][])
A repeated configuration query has been dropped from the handling of `--proc-pre` and `--proc-post`. ([#3576][])
Calls to `git annex find` now use `--in=.` instead of the alias `--in=here` to take advantage of an optimization that git-annex (as of the current release, 7.20190730) applies only to the former. ([#3574][])
[addurls][] now suggests close matches when the URL or file format contains an unknown field. ([#3594][])
Shared logic used in the setup.py files of DataLad and its extensions has been moved to modules in the _datalad_build_support/ directory. ([#3600][])
Get ready for upcoming git-annex dropping support for direct mode ([#3631][])
Primarily bug fixes to achieve more robust performance
Our tests needed various adjustments to keep up with upstream changes in Travis and Git. ([#3479][]) ([#3492][]) ([#3493][])
`AnnexRepo.is_special_annex_remote` was too selective in what it considered to be a special remote. ([#3499][])
We now provide information about unexpected output when git-annex is called with `--json`. ([#3516][])
Exception logging in the `__del__` method of `GitRepo` and `AnnexRepo` no longer fails if the names it needs are no longer bound. ([#3527][])
[addurls][] botched the construction of subdataset paths that were more than two levels deep and failed to create datasets in a reliable, breadth-first order. ([#3561][])
Cloning a `type=git` special remote showed a spurious warning about the remote not being enabled. ([#3547][])
For calls to git and git-annex, we disable automatic garbage collection due to past issues with GitPython's state becoming stale, but doing so results in a larger .git/objects/ directory that isn't cleaned up until garbage collection is triggered outside of DataLad. Tests with the latest GitPython didn't reveal any state issues, so we've re-enabled automatic garbage collection. ([#3458][])
[rerun][] learned an `--explicit` flag, which it relays to its calls to [run][]. This makes it possible to call `rerun` in a dirty working tree ([#3498][]).
The [metadata][] command aborts earlier if a metadata extractor is unavailable. ([#3525][])
Should be faster and less buggy, with a few enhancements.
`--ui`.
`git submodule update --init` is no longer called from the post-update hook.
`--inherit` is given for a dataset without a superdataset, a warning is now given instead of raising an error.
`env` argument had unicode values. ([#3332][])
`annex.largefiles` threshold. The logic of this workaround was faulty, leading to files being displayed as typechanged in the index following the commit. ([#3365][])
`-R` is now available for the `--recursion-limit` flag, a flag shared by several subcommands. ([#3340][])
`datalad.ui.progressbar` can be used to configure the default backend for progress reporting ("none", for example, results in no progress bars being shown). ([#3396][])
`datalad.ui.color` configuration option when deciding to color output. The default value, "auto", retains the current behavior of coloring output if attached to a TTY ([#3407][]).
`git clone`. ([#3425][])
`dist` package if `platform.dist`, which has been removed in the yet-to-be-released Python 3.8, does not exist. ([#3439][])
`--section` option for limiting the output to specific sections and a `--decor` option, which currently knows how to format the output as GitHub's `<details>` section. ([#3440][])
Largely a bug fix release with a few enhancements
Extraction of .gz files is broken without p7zip installed. We now abort with an informative error in this situation. ([#3176][])
Committing failed in some cases because we didn't ensure that the path passed to `git read-tree --index-output=...` resided on the same filesystem as the repository. ([#3181][])
Some pointless warnings during metadata aggregation have been eliminated. ([#3186][])
With Python 3 the LORIS token authenticator did not properly decode a response ([#3205][]).
With Python 3 downloaders unnecessarily decoded the response when getting the status, leading to an encoding error. ([#3210][])
In some cases, our internal command Runner did not adjust the environment's `PWD` to match the current working directory specified with the `cwd` parameter. ([#3215][])
The specification of the pyliblzma dependency was broken. ([#3220][])
[search] displayed an uninformative blank log message in some cases. ([#3222][])
The logic for finding the location of the aggregate metadata DB anchored the search path incorrectly, leading to a spurious warning. ([#3241][])
Some progress bars were still displayed when stdout and stderr were not attached to a tty. ([#3281][])
Check for stdin/out/err to not be closed before checking for `.isatty`. ([#3268][])
Creating a new repository now aborts if any of the files in the directory are tracked by a repository in a parent directory. ([#3211][])
[run] learned to replace the `{tmpdir}` placeholder in commands with a temporary directory. ([#3223][])
[duecredit][] support has been added for citing DataLad itself as well as datasets that an analysis uses. ([#3184][])
The `eval_results` interface helper unintentionally modified one of its arguments. ([#3249][])
A few DataLad constants have been added, changed, or renamed ([#3250][]):
`HANDLE_META_DIR` is now `DATALAD_DOTDIR`. The old name should be considered deprecated.
`METADATA_DIR` now refers to `DATALAD_DOTDIR/metadata` rather than `DATALAD_DOTDIR/meta` (which is still available as `OLDMETADATA_DIR`).
`DATASET_METADATA_FILE` refers to `METADATA_DIR/dataset.json`.
`DATASET_CONFIG_FILE` refers to `DATALAD_DOTDIR/config`.
`METADATA_FILENAME` has been renamed to `OLDMETADATA_FILENAME`.
Just a few important fixes and minor enhancements.
The logic for setting the maximum command line length now works around Python 3.4 returning an unreasonably high value for `SC_ARG_MAX` on Debian systems. ([#3165][])
DataLad commands that are conceptually "read-only", such as `datalad ls -L`, can fail when the caller lacks write permissions because git-annex tries merging remote git-annex branches to update information about availability. DataLad now disables `annex.merge-annex-branches` in some common "read-only" scenarios to avoid these failures. ([#3164][])
Accessing an "unbound" dataset method now automatically imports the necessary module rather than requiring an explicit import from the Python caller. For example, calling `Dataset.add` no longer needs to be preceded by `from datalad.distribution.add import Add` or an import of `datalad.api`. ([#3156][])
Configuring the new variable `datalad.ssh.identityfile` instructs DataLad to pass a value to the `-i` option of `ssh`. ([#3149][]) ([#3168][])
A variety of bugfixes and enhancements
datalad.cmd.get_runner
has been removed. ([#3104][])SC_ARG_MAX
didn't check that the
reported value was a sensible, positive number. ([#3025][])git
and git-annex
with file
arguments learned to split up the command calls when it is likely
that the command would fail due to exceeding the maximum supported
length. ([#3138][])setup_yoda_dataset
procedure created a malformed
.gitattributes line. ([#3057][])--no-save
was given. ([#3029][])--onto
didn't exist. ([#3019][])run
didn't preserve the current directory prefix ("./") on
inputs and outputs, which is problematic if the caller relies on
this representation when formatting the command. ([#3037][])save
instead of add
even though run
uses
add
underneath. ([#3080][])git worktree
checkout of the
source repository. ([#3129][])GIT_SSH_VARIANT=ssh
to git processes to be able to specify
alternative ports in SSH urls--group
option so that the caller can specify the file
system group for the repository. ([#3098][])man git-fetch
). ([#3146][])--input
and --output
can now be shortened to -i
and -o
.
([#3066][])interface.run.run_command
gained an extra_inputs
argument so
that wrappers like [datalad-container][] can specify additional inputs
that aren't considered when formatting the command string. ([#3038][])run
and those for
the command in ambiguous cases. ([#3119][])create_tree
and ok_file_has_content
now support
".gz" files. ([#3049][])GitRepo.set_gitattributes
now accepts a mode
argument that
controls whether the .gitattributes file is appended to (default) or
overwritten. ([#3115][])
datalad --help
now avoids using man
so that the list of
subcommands is shown. ([#3124][])
Rushed out bugfix release to stay fully compatible with recent [git-annex][], which introduced v7 to replace v6.
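Splitting long `git` and `git-annex` calls into several invocations ([#3138][], and [#3001][] for the 50% safety margin) can be sketched roughly as follows. The function names `arg_max` and `chunk_command` are hypothetical and the fallback limit is an assumption; DataLad's own logic may differ in how it measures and budgets the command length.

```python
import os


def arg_max(default=2048):
    """Return the system's SC_ARG_MAX, or `default` if it is unavailable
    or not a sensible, positive number."""
    try:
        value = os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on some platforms or may not know the name.
        return default
    return value if value > 0 else default


def chunk_command(prefix, files, max_len, margin=0.5):
    """Split `files` over several calls of `prefix` so that each call stays
    within `margin` * `max_len` characters (one separator per argument)."""
    budget = int(max_len * margin)
    base = sum(len(arg) + 1 for arg in prefix)
    chunks, current, used = [], [], base
    for path in files:
        cost = len(path) + 1
        # Start a new call once the next path would exceed the budget.
        if current and used + cost > budget:
            chunks.append(prefix + current)
            current, used = [], base
        current.append(path)
        used += cost
    if current:
        chunks.append(prefix + current)
    return chunks


calls = chunk_command(["git", "add", "--"], ["a.txt", "b.txt", "c.txt"], max_len=48)
```

With the tiny `max_len=48` used here, the budget is 24 characters, so the three files end up in two calls. In practice one would pass `arg_max()` instead of a fixed number.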
-r
invocation aggregating all subdatasets of the specified dataset
as well
annex
commands are now chunked assuming 50% "safety margin" on the
maximal command line length. Should resolve crashes while operating
on too many files at once ([#3001][])
run
sidecar config processing ([#2991][])
ds.repo.set_gitattributes
([#2974][]) ([#2954][])
os.getcwd()
if inconsistency with env var
$PWD
is detected ([#2914][])
tools/bisect-git-annex
provides a helper for running
git bisect
on git-annex using that Singularity container ([#2995][])
.zenodo.json
for better integration with Zenodo for citation
annex
metadata extractor now extracts annex.key
metadata record.
This should now make it possible to identify uses of specific files, etc. ([#2952][])
CommandError
(e.g. in case of "out of space"
error) ([#2958][])
[git-annex][] 6.20180913 (or later) is now required. It provides a number of fixes for v6 mode operations, etc.
datalad.consts.LOCAL_CENTRAL_PATH
constant was deprecated in favor
of datalad.locations.default-dataset
[configuration][config] variable
([#2835][])
"notneeded"
messages are no longer reported by default results
renderer
explicit
is true and no outputs are specified ([#2922][])
get_git_dir
moved into GitRepo ([#2886][])
_gitpy_custom_call
removed from GitRepo ([#2894][])
GitRepo.get_merge_base
argument is now called commitishes
instead
of treeishes
([#2903][])
jobs
set to be "auto"
(not None
) to take
advantage of possible parallel get if in -g
mode ([#2861][])
git-annex
is not installed etc ([#2865][]),
([#2865][]), ([#2918][]), ([#2917][])
__del__
should not access .repo
but ._repo
to avoid attempts
for reinstantiation etc ([#2901][])
.git
right in GitRepo.add_submodule
to avoid
added submodules being non git-annex friendly ([#2909][]), ([#2904][])
.py
or .sh
suffixes
.gitattributes
handling while setting annex backend
([#2912][])
GlobbedPaths.expand(..., full=True)
incorrectly returned relative
paths when called more than once ([#2921][])
sth_like_file_has_content
was removed ([#2860][])
git annex init
operation is now logged ([#2881][])
GitRepo.cherry_pick
([#2900][])
GitRepo.format_commit
([#2902][])
Emergency bugfix to address forgotten boost of version in
datalad/version.py
.
This is largely a bugfix release which addressed many (but not yet all)
issues of working with git-annex direct and version 6 modes, and operation
on Windows in general. Among enhancements you will see the
support of public S3 buckets (even with periods in their names),
ability to configure new providers interactively, and improved egrep
search backend.
Although we do not require it with this release, it is recommended to make
sure that you are using a recent git-annex
since it also had a variety
of fixes and enhancements in the past months.
datalad save
instructions shown by datalad run
for a command
with a non-zero exit were incorrectly formatted. ([#2692][])
datalad
add-archive-content
) failed on Python 3. ([#2702][])
BadName
issue. ([#2712][]), ([#2794][])
datalad add-readme
halted when no aggregated metadata was found
rather than displaying a warning. ([#2731][])
datalad rerun
failed if --onto
was specified and the history
contained no run commits. ([#2761][])
install
). ([#2788][])
datalad install
removed the directory after a failed clone. ([#2788][])
datalad run
incorrectly handled inputs and outputs for paths with
spaces and other characters that require shell escaping. ([#2798][])
datalad run
didn't work correctly
if a subdataset wasn't installed. ([#2796][])
pa*:findme
) is a hit, when any
matching field matches the query.
datalad run
has
been improved. ([#2703][])
datalad --version
now simply shows the version without the
license. ([#2733][])
datalad export-archive
learned to export under an existing
directory via its --filename
option. ([#2723][])
datalad export-to-figshare
now generates the zip archive in the
root of the dataset unless --filename
is specified. ([#2723][])
datalad.api
, help(datalad.api)
(or
datalad.api?
in IPython) now shows a summary of the available
DataLad commands. ([#2728][])
datalad
from IPython has been improved. ([#2722][])
datalad wtf
now returns structured data and reports the version of
each extension. ([#2741][])
datalad create
--force
no longer duplicates existing attributes. ([#2744][])
add_url_to_file
method (called by commands like datalad
download-url
and datalad add-archive-content
) learned how to
display a progress bar. ([#2738][])
Primarily a bugfix release to accommodate a recent git-annex release that forbids file:// and http://localhost/ URLs, which might lead to revealing private files if the annex is publicly shared.
yoda
procedure will instantiate README.md
--discover
option added to [run-procedure][] to list available
procedures
This is a minor bugfix release.
run-procedure
.
rerun
error when trying to unlock non-available files.
This release is a major leap forward in metadata support.
.datalad/meta
is no
longer used or supported. Metadata must be reaggregated using 0.10
version
datalad.metadata.nativetype
config
(could contain multiple values)
export_tarball
plugin has been generalized to export_archive
and
can now also generate ZIP archives.
A number of fixes did not make it into the 0.9.x series:
-c
option were not in effect.
save
is now more robust with respect to invocation in subdirectories
of a dataset.
unlock
now reports correct paths when running in a dataset subdirectory.
get
is more robust to paths that contain symbolic links.
add
now correctly saves staged subdataset additions.
datalad save
in a dataset no longer adds untracked content to the
dataset. In order to add content a path has to be given, e.g. datalad save .
wtf
now works reliably with a DataLad that wasn't installed from Git (but,
e.g., via pip)
simple_with_archives
crawler pipeline.
search
can now discover individual files.
datalad.metadata.create-aggregate-annex-limit
).
metadata --get-aggregates
datalad.metadata.maxfieldsize
to exclude too large
metadata fields from aggregation.
datalad.metadata.nativetype
was introduced to enable
one or more particular metadata extractors for a dataset.
datalad.metadata.store-aggregate-content
to enable
the storage of aggregated metadata for dataset content (i.e. file-based metadata)
in contrast to just metadata describing a dataset as a whole.
search
was completely reimplemented. It offers three different modes now:
--sensitive=some
or --sensitive=all
.
-d <parent> --nosave
now registers subdatasets, when possible.--fake-dates
configures dataset to use fake-dates
datalad rerun
now has a --script
option that can be used to extract
previous commands into a file.
datalad --report-status
has a new value 'all' that can be used to
temporarily re-enable reporting that was disabled by configuration settings.
Some important bug fixes which should improve usability
datalad-archives
special remote will now lock when acquiring or
extracting an archive, which allows it to be used with the -J flag
for parallel operation
datalad ls
should now list "authored date" and also work for datasets
in detached HEAD mode
datalad save
will now save original file as well, if file was
"git mv"ed, so you can now datalad run git mv old new
and have
changes recorded
--jobs
argument can now take the auto
value which would decide on
git-annex
> 6.20180314 is recommended to avoid regression with -J.
RI
meta-constructor -- should speed up operation a
bit
DATALAD_SEED
environment variable can be used to seed the Python RNG
and provide reproducible UUIDs etc. (useful for testing and demos)
Largely a bugfix release with a few enhancements.
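The idea behind `DATALAD_SEED` can be illustrated with a minimal sketch, assuming the variable holds an integer. `make_uuid_factory` is a hypothetical helper, not a DataLad function, and how DataLad itself consumes the seed may differ.

```python
import os
import random
import uuid


def make_uuid_factory(env_var="DATALAD_SEED"):
    """Return a zero-argument callable that produces UUIDs.

    If `env_var` is set, UUIDs are drawn from a random generator seeded
    with its integer value, so repeated runs yield the same sequence.
    Otherwise the regular, non-reproducible uuid.uuid4 is returned.
    """
    seed = os.environ.get(env_var)
    if seed is None:
        return uuid.uuid4
    rng = random.Random(int(seed))

    def seeded_uuid():
        # Build a version-4 UUID from 128 seeded pseudo-random bits.
        return uuid.UUID(int=rng.getrandbits(128), version=4)

    return seeded_uuid
```

Two factories created under the same seed produce identical UUID sequences, which is what makes seeded test runs and demos repeatable.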
remove
if annex drop
failed
datalad rerun
command capable of rerunning entire
sequences of previously run
commands.
Reproducibility through VCS. Use run
even if not interested in rerun
git
is not yet configured but git operations
are requested
git-hub
tool to
"attach" commits to an issue making it into a PR
swallow_logs
in the code was refactored away -- fewer
mysteries now, just increase logging level
wtf
plugin will report more information about environment, externals
and the system
Minor bugfix release
files
argument of [save][] has been renamed to path
to be uniform with
any other command
--transfer-data
switch that allows for an
unambiguous specification of whether to publish data -- independent of
the selection which datasets to publish (which is done via their paths).
Moreover, [publish][] now transfers data before repository content is pushed.
--since=
in considering only the
differences since the last "pushed" state
largefiles
if
specified in .gitattributes
datalad-recursiveinstall
submodule configuration property)
annex.largefiles
setting if any was set within .gitattributes
(e.g. by datalad create --text-no-annex
)tools/cast*
tools and sample cast scripts under
doc/casts
which are published at datalad.org/features.html
Bugfixes
A variety of fixes and enhancements
git-annex
branch even if no other changes
were done
git-annex
special remotes copy_to
got progress bar report now and support of --jobs
New features, refactorings, and bug fixes.
here
wanted
(as previously supported in other commands), and now
also required
This release includes a huge refactoring to make code base and functionality more robust and flexible
--output-format
, --report-status
, --report-type
, and --report-type
options for [datalad][] command.
add-sibling
and rewrite-urls
were refactored in favor of new [siblings][]
command which should be used for siblings manipulations
post-update
hook script now should be more robust
(tolerate directory names with spaces, etc.)
--modified
to summarize changes between different points in
the history
benchmarks/
collection of Airspeed velocity
benchmarks initiated. See reports at http://datalad.github.io/datalad/
move
) were removed from
the interface
A bugfix release
.gitattributes
. Now that decision is left to annex by default
tools/testing/run_doc_examples
used to run
doc examples as tests, fixed up to provide status per each example
and not fail at once
doc/examples
doc/examples
This release includes an avalanche of bug fixes, enhancements, and additions which by and large should stay consistent with previous behavior but provide better functioning. Lots of code was refactored to provide a more consistent code base, and some API breakage has happened. Further work is ongoing to standardize output and results reporting ([#1350][])
-a
is deprecated in favor of -u
or --all-updates
so only changes to known components get saved, and no new files
are automagically added
-S
no longer stores the originating dataset in its commit
message
-m
-s
(--name
)
option, not a positional argument
--publish-depends
to set up publishing data and code to multiple
repositories (e.g. github + webserver) should now be functional
see this comment
--publish-by-default
to specify what refs should be published
by default
--annex-wanted
, --annex-groupwanted
and --annex-group
settings which would be used to instruct annex about preferred
content. [publish][] then will publish data using those settings if
wanted
is set.
--inherit
option to automagically figure out url/wanted and
other git/annex settings for new remote sub-dataset to be constructed
--skip-failing
refactored into --missing
option
which could use new feature of [create-sibling][] --inherit
--what
to specify explicitly what cleaning steps to perform
and now could be invoked with -r
datalad
and git-annex-remote*
scripts now do not use setuptools
entry points mechanism and rely on simple import to shorten start up time
_prep
for arguments validation
and pre-processing to avoid recursive invocations
Now requires GitPython >= 2.1.0
git config
to avoid leakage of possibly
sensitive settings to the logs
sibling
of a dataset on
github
git annex enableremote datalad
to make them available)
Primarily it is a bugfix release but because of significant refactoring of the [install][] and [get][] implementation, it gets a new minor release.
--recursion-limit=existing
to not recurse into not-installed
subdatasets
-n
to possibly install sub-datasets without getting any data
--jobs|-J
to specify number of parallel jobs for annex
[get][] call could use (ATM would not work when data comes from archives)
Primarily bugfixes but also a number of enhancements and core refactorings
-r
or
-g
)
.datalad/config
and local within .git/config
variables we have used were renamed to match configuration names
tarball
plugin to export
datasets.api
functions with rendering of results in command line
got a _-suffixed sibling, which would render results in Python
as well (e.g., using search_
instead of search
would also render
results, not only output them back as Python objects)
--jobs
option (passed to annex get
) for parallel downloads
--reckless
mode option
-d^
or -d///
to point to top-most or centrally
installed meta-datasets
-s
option to specify which fields (only) to search
tqdm
library (progressbar
is no longer
used/supported)
Lots of everything, including but not limited to
New features and bugfix release
New feature and bugfix release
Major RFing to switch from relying on rdf to git native submodules etc
Release primarily focusing on interface functionality including initial publishing