Changelogs
Python
[0.X.X] - 2020-XX-XX
Features
Expose the TreeSequence.coiterate() method to allow iteration over two sequences simultaneously, aiding comparison of trees from two sequences. (@jeromekelleher, @hyanwong, #1021, #1022)
tskit is now supported on, and has wheels for, Python 3.9. (@benjeffery, #982, #907)
Tree.newick() now has an extra option include_branch_lengths to allow branch lengths to be omitted (@hyanwong, #931).
Added Tree.generate_star static method to create star-topologies (@hyanwong, #934).
Added Tree.generate_comb and Tree.generate_balanced methods to create example trees. (@jeromekelleher, #1026).
Added equals method to TreeSequence, TableCollection and each of the tables which provides more flexible equality comparisons, for example, allowing users to ignore metadata or provenance in the comparison. (@mufernando, @jeromekelleher, #896, #897, #913, #917).
Added __eq__ to TreeSequence. (@benjeffery, #1011, #1020)
ts.dump and tskit.load now support reading and writing file objects such as FIFOs and sockets. (@benjeffery, #657, #909)
Added tskit.write_ms for writing to MS format. (@saurabhbelsare, #727, #854)
Added TableCollection.indexes for access to the edge insertion/removal order indexes. (@benjeffery, #4, #916)
The dictionary representation of a TableCollection now contains its index. (@benjeffery, #870, #921)
Added TreeSequence._repr_html_ for use in jupyter notebooks. (@benjeffery, #872, #923)
Added TreeSequence.__repr__ to display a summary for terminal usage. (@benjeffery, #938, #985)
Added TableCollection.dump and TableCollection.load. This allows table collections that are not valid tree sequences to be manipulated. (@benjeffery, #14, #986)
Added nbytes method to tables, TableCollection and TreeSequence which reports the size in bytes of those objects. (@jeromekelleher, @benjeffery, #54, #871)
Added TableCollection.clear to clear data table rows and optionally provenances, table schemas and tree-sequence level metadata and schema. (@benjeffery, #929, #1001)
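The idea behind iterating over two sequences simultaneously can be illustrated without tskit. The following is a minimal pure-Python sketch, not the library's implementation: each sequence is modelled as a non-empty, sorted, contiguous list of (left, right, label) tuples covering the same span, and the helper name coiterate_intervals is invented here for illustration.

```python
def coiterate_intervals(seq_a, seq_b):
    """Yield ((left, right), a_label, b_label) for two interval lists.

    Assumes both inputs are non-empty lists of (left, right, label) tuples
    that are sorted, contiguous and cover the same total span.
    """
    i = j = 0
    left = seq_a[0][0]
    while i < len(seq_a) and j < len(seq_b):
        _, a_right, a_label = seq_a[i]
        _, b_right, b_label = seq_b[j]
        # The shared interval ends at the nearest breakpoint of either input.
        right = min(a_right, b_right)
        yield (left, right), a_label, b_label
        # Advance whichever sequence ends at this breakpoint (possibly both).
        if a_right == right:
            i += 1
        if b_right == right:
            j += 1
        left = right


# Hypothetical data: two sequences with breakpoints at 4 and 6 respectively.
trees_a = [(0, 4, "A0"), (4, 10, "A1")]
trees_b = [(0, 6, "B0"), (6, 10, "B1")]
pairs = list(coiterate_intervals(trees_a, trees_b))
```

Each yielded interval is the largest stretch over which neither input changes, which is what makes a tree-by-tree comparison of two sequences straightforward.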
Bugfixes
LightWeightTableCollection.asdictandTableCollection.asdictnow return copies of arrays. (@benjeffery, #1025, #1029)
Breaking changes
The argument to ts.dump and tskit.load has been renamed from path to file.
All arguments to Tree.newick() except precision are now keyword-only.
Renamed ts.trait_regression to ts.trait_linear_model.
[0.3.2] - 2020-09-29
Breaking changes
The argument order of Tree.unrank and combinatorics.num_labellings now positions the number of leaves before the tree rank (@daniel-goldstein, #950, #978)
Change several methods (simplify(), trees(), Tree()) so most parameters are keyword only, not positional. This allows reordering of parameters, so that deprecated parameters can be moved, and the parameter order in similar functions, e.g. TableCollection.simplify and TreeSequence.simplify() can be made consistent (@hyanwong, #374, #846, #851)
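To show what the switch to keyword-only parameters means for calling code, here is a small sketch using a stand-in function. The parameter list is made up for illustration and is not the real signature of any tskit method; only the bare `*` in the signature is the point.

```python
def simplify(samples=None, *, filter_sites=True, keep_unary=False):
    """Stand-in with a hypothetical parameter list; only `samples` may be positional."""
    return {
        "samples": samples,
        "filter_sites": filter_sites,
        "keep_unary": keep_unary,
    }


def call_is_accepted(*args, **kwargs):
    """Return True if the stand-in accepts this call, False on a TypeError."""
    try:
        simplify(*args, **kwargs)
    except TypeError:
        return False
    return True


# Passing options by name works; passing them by position is rejected.
by_keyword = call_is_accepted([0, 1], keep_unary=True)
by_position = call_is_accepted([0, 1], True)
```

Because callers must name every option, the library is free to reorder or retire parameters after the `*` without silently changing the meaning of existing calls.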
Features
Add split_polytomies method to the Tree class (@hyanwong, @jeromekelleher, #809, #815)
Tree accessor functions (e.g. ts.first(), ts.at()) pass extra parameters such as sample_indexes to the underlying Tree constructor; also root_threshold can be specified when calling ts.trees() (@hyanwong, #847, #848)
Genomic intervals returned by Python functions are now namedtuples, allowing .left, .right and .span usage (@hyanwong, #784, #786, #811)
Added include_terminal parameter to edge diffs iterator, to output the last edges at the end of a tree sequence (@hyanwong, #783, #787)
#832 - Add metadata_bytes method to allow access to raw TableCollection metadata (@benjeffery, #842)
tskit.is_unknown_time can now check arrays. (@benjeffery, #857).
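A named tuple with a derived span can be sketched in a few lines. This is an illustrative stand-in, assuming a two-field tuple with a computed span property; the actual class definition in tskit may differ.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic interval with named fields (illustrative stand-in)."""

    left: float
    right: float

    @property
    def span(self):
        # Length of the interval along the genome.
        return self.right - self.left


interval = Interval(2.0, 5.5)
left, right = interval  # still unpacks like a plain tuple
```

Existing code that unpacks or indexes the returned tuple keeps working, while new code can use the more readable attribute access.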
[0.3.1] - 2020-09-04
Bugfixes
#823 - Fix mutation time error when using simplify(keep_input_roots=True) (@petrelharp, #823).
#821 - Fix mutation rows with unknown time never being equal (@petrelharp, #822).
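The second fix reflects a general property of NaN sentinels: a NaN never compares equal to itself, so rows holding an unknown time need a bitwise check instead of `==`. The sketch below shows the idea with the standard library only. The bit pattern is an assumption chosen for illustration (this changelog does not state the value tskit uses), and the helper names are hypothetical.

```python
import struct

# Assumed payload for illustration: a quiet NaN with a nonzero low bit.
UNKNOWN_TIME_BITS = 0x7FF8000000000001
UNKNOWN_TIME = struct.unpack("<d", struct.pack("<Q", UNKNOWN_TIME_BITS))[0]


def is_unknown_time(value):
    """Return True only if value carries the sentinel's exact bit pattern."""
    return struct.pack("<d", value) == struct.pack("<Q", UNKNOWN_TIME_BITS)


def times_equal(a, b):
    """Compare two times, treating two unknown times as equal."""
    if is_unknown_time(a) and is_unknown_time(b):
        return True
    return a == b


# Plain comparison fails for the sentinel, the bitwise-aware one does not.
naive = UNKNOWN_TIME == UNKNOWN_TIME
aware = times_equal(UNKNOWN_TIME, UNKNOWN_TIME)
```

Comparing bit patterns also distinguishes the sentinel from NaN values that arise from ordinary arithmetic, which a plain isnan check would not.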
[0.3.0] - 2020-08-27
Major feature release for metadata schemas, set-like operations, mutation times, SVG drawing improvements and many others.
Breaking changes
The default display order for tree visualisations has been changed to minlex (see below) to stabilise the node ordering and to make trees more readily comparable. The old behaviour is still available with order="tree".
File system operations such as dump/load now raise an appropriate OSError instead of tskit.FileFormatError. Loading from an empty file now raises an EOFError.
Bad tree topologies are detected earlier, so that it is no longer possible to create a TreeSequence object which contains a parent with contradictory children on an interval. Previously an error was thrown when some operation building the trees was attempted (@jeromekelleher, #709).
The TableCollection object no longer implements the iterator protocol. Previously list(tables) returned a sequence of (table_name, table_instance) tuples. This has been replaced with the more intuitive and future-proof TableCollection.name_map and TreeSequence.tables_dict attributes, which perform the same function (@jeromekelleher, #500, #694).
The arguments to TreeSequence.genotype_matrix, TreeSequence.haplotypes and TreeSequence.variants must now be keyword arguments, not positional. This is to support the change from impute_missing_data to isolated_as_missing in the arguments to these methods. (@benjeffery, #716, #794)
New features
New methods to perform set operations on TableCollections and TreeSequences. TableCollection.subset subsets and reorders table collections by nodes (@mufernando, @petrelharp, #663, #690). TableCollection.union forms the node-wise union of two table collections (@mufernando, @petrelharp, #381, #623).
Mutations now have an optional double-precision floating-point time column. If not specified, this defaults to a particular NaN value (tskit.UNKNOWN_TIME) indicating that the time is unknown. For a tree sequence to be considered valid it must meet new criteria for mutation times, see Mutation requirements. Also added function TableCollection.compute_mutation_times. Table sorting orders mutations by non-increasing time per-site, which is also a requirement for a valid tree sequence (@benjeffery, #672).
Add support for trees with internal samples for the Kendall-Colijn tree distance metric. (@daniel-goldstein, #610)
Add background shading to SVG tree sequences to reflect tree position along the sequence (@hyanwong, #563).
Tables with a metadata column now have a metadata_schema that is used to validate and encode metadata that is passed to add_row and decode metadata on calls to table[j] and e.g. tree_sequence.node(j). See Metadata (@benjeffery, #491, #542, #543, #601).
The tree-sequence now has top-level metadata with a schema (@benjeffery, #666, #644, #642).
Add classes to SVG drawings to allow easy adjustment and styling, and document the new tskit.Tree.draw_svg() and tskit.TreeSequence.draw_svg() methods. This also fixes #467 for duplicate SVG entity ids in Jupyter notebooks (@hyanwong, #555).
Add a to_nexus function that outputs a tree sequence in Nexus format (@saunack, #550).
Add extension of Kendall-Colijn tree distance metric for tree sequences computed by TreeSequence.kc_distance (@daniel-goldstein, #548).
Add an optional node traversal order in tskit.Tree that uses the minimum lexicographic order of leaf nodes visited. This ordering ("minlex_postorder") adds more determinism because it constrains the order in which children of a node are visited (@brianzhang01, #411).
Add an order argument to the tree visualisation functions which supports two node orderings: "tree" (the previous default) and "minlex" which stabilises the node ordering (making it easier to compare trees). The default node ordering is changed to "minlex" (@brianzhang01, @jeromekelleher, #389, #566).
Add _repr_html_ to tables, so that jupyter notebooks render them as html tables (@benjeffery, #514).
Remove support for kc_distance on trees with unary nodes (@daniel-goldstein, #508).
Improve Kendall-Colijn tree distance algorithm to operate in O(n^2) time instead of O(n^2 * log(n)) where n is the number of samples (@daniel-goldstein, #490).
Add a metadata column to the migrations table. Works similarly to existing metadata columns on other tables (@benjeffery, #505).
Add a metadata column to the edges table. Works similarly to existing metadata columns on other tables (@benjeffery, #496).
Allow sites with missing data to be output by the haplotypes method, by default replacing with -. Errors are no longer raised for missing data with isolated_as_missing=True; the error types returned for bad alleles (e.g. multiletter or non-ascii) have also changed from _tskit.LibraryError to TypeError, or ValueError if the missing data character clashes (@hyanwong, #426).
Access the number of children of a node in a tree directly using tree.num_children(u) (@hyanwong, #436).
User specified allele mapping for genotypes in variants and genotype_matrix (@jeromekelleher, #430).
New root_threshold option for the Tree class, which allows us to efficiently iterate over ‘real’ roots when we have missing data (@jeromekelleher, #462).
Add tree.as_dict_of_dicts() function to enable use with networkx. See Traversals with networkx (@winni2k, #457).
Add tree_sequence.to_macs() function to convert tree sequence to MACS format (@winni2k, #727)
Add a keep_input_roots option to simplify which, if enabled, adds edges from the MRCAs of samples in the simplified tree sequence back to the roots in the input tree sequence (@jeromekelleher, #775, #782).
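The "minlex" ordering mentioned above can be sketched on a plain dictionary tree. This is a simplified illustration under stated assumptions, not tskit's implementation: the tree is a dict mapping each node to its list of children, leaves are simply absent from the dict, and the function name is chosen here for the example.

```python
def minlex_postorder(children, root):
    """Return nodes in postorder, visiting children by smallest leaf id.

    `children` maps a node to a list of child nodes; nodes missing from the
    mapping are treated as leaves (an assumption of this sketch).
    """

    def visit(u):
        kids = children.get(u, [])
        if not kids:
            return u, [u]  # a leaf is its own minimum leaf
        # Order the subtrees by the smallest leaf id each one contains.
        subtrees = sorted((visit(c) for c in kids), key=lambda item: item[0])
        nodes = [v for _, sub in subtrees for v in sub]
        nodes.append(u)  # postorder: the parent comes after its subtrees
        return subtrees[0][0], nodes

    return visit(root)[1]


# Children are deliberately listed in "scrambled" order.
tree = {6: [5, 4], 5: [3, 2], 4: [1, 0]}
order = minlex_postorder(tree, 6)
```

Whatever order the children are stored in, the traversal comes out the same, which is why this ordering makes drawings of equivalent trees directly comparable.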
Bugfixes
#453 - Fix LibraryError when tree.newick() is called with large node time values (@jeromekelleher, #637).
#777 - Mutations over isolated samples were incorrectly decoded as missing data. (@jeromekelleher, #778)
#776 - Fix a segfault when a partial list of samples was provided to the variants iterator. (@jeromekelleher, #778)
Deprecated
The sample_counts feature has been deprecated and is now ignored. Sample counts are now always computed.
For TreeSequence.genotype_matrix, TreeSequence.haplotypes and TreeSequence.variants the impute_missing_data argument is deprecated and replaced with isolated_as_missing. Note that to get the same behaviour impute_missing_data=True should be replaced with isolated_as_missing=False. (@benjeffery, #716, #794)
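Because the two flags have opposite meanings, migrating call sites is easy to get wrong. The helper below is a hypothetical shim, not part of tskit, showing one way a wrapper might translate the deprecated argument; the default of True for isolated_as_missing is an assumption of this sketch.

```python
import warnings


def resolve_isolated_as_missing(isolated_as_missing=None, impute_missing_data=None):
    """Map the deprecated flag onto the new one (illustrative helper only)."""
    if impute_missing_data is not None:
        if isolated_as_missing is not None:
            raise ValueError(
                "Specify only one of isolated_as_missing and impute_missing_data"
            )
        warnings.warn(
            "impute_missing_data is deprecated; use isolated_as_missing",
            FutureWarning,
            stacklevel=2,
        )
        # The two flags are logical opposites.
        return not impute_missing_data
    # Assumed default for this sketch: isolated samples count as missing.
    return True if isolated_as_missing is None else isolated_as_missing
```

With this mapping, a legacy call using impute_missing_data=True resolves to isolated_as_missing=False, matching the note in the entry above.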
[0.2.3] - 2019-11-22
Minor feature release, providing a tree distance metric and various methods to manipulate tree sequence data.
New features
Kendall-Colijn tree distance metric computed by Tree.kc_distance (@awohns, #172).
New “timeasc” and “timedesc” orders for tree traversals (@benjeffery, #246, #399).
Up to 2X performance improvements to tree traversals (@benjeffery, #400).
Add trim, delete_sites, keep_intervals and delete_intervals methods to edit tree sequence data. (@hyanwong, #364, #372, #377, #390).
Various documentation improvements (@hyanwong, @jeromekelleher, @petrelharp).
Rename the map_ancestors function to link_ancestors (@hyanwong, @gtsambos; #406, #262). The original function is retained as a deprecated alias.
Bugfixes
Fix height scaling issues with SVG tree drawing (@jeromekelleher, #407, #383, #378).
Do not reuse buffers in LdCalculator (@jeromekelleher). See #397 and #396.
[0.2.2] - 2019-09-01
Minor bugfix release.
Relaxes overly-strict input requirements on individual location data that caused some SLiM tree sequences to fail loading in version 0.2.1 (see #351).
New features
Add log_time height scaling option for drawing SVG trees (@marianne-aspbury). See #324 and #303.
Bugfixes
Allow 4G metadata columns (@jeromekelleher). See #342 and #341.
[0.2.1] - 2019-08-23
Major feature release, adding support for population genetic statistics, improved VCF output and many other features.
Note: Version 0.2.0 was skipped because of an error uploading to PyPI which could not be undone.
Breaking changes
Genotype arrays returned by TreeSequence.variants and TreeSequence.genotype_matrix have changed from unsigned 8 bit values to signed 8 bit values to accommodate missing data (see #144 for discussion). Specifically, the dtype of the genotypes arrays has changed from numpy uint8 to int8. This should not affect client code in any way unless it specifically depends on the type of the returned numpy array.
The VCF written by the write_vcf method is no longer compatible with previous versions, which had significant shortcomings. Position values are now rounded to the nearest integer by default, REF and ALT values are derived from the actual allelic states (rather than always being A and T). Sample names are now of the form tsk_j for sample ID j. Most of the legacy behaviour can be recovered with new options, however.
The positional parameter reference_sets in genealogical_nearest_neighbours and mean_descendants TreeSequence methods has been renamed to sample_sets.
New features
Support for general windowed statistics. Implementations of diversity, divergence, segregating sites, Tajima’s D, Fst, Patterson’s F statistics, Y statistics, trait correlations and covariance, and k-dimensional allele frequency spectra (@petrelharp, @jeromekelleher, @molpopgen).
Add the keep_unary option to simplify (@gtsambos). See #1 and #143.
Add the map_ancestors method to TableCollection (@gtsambos). See #175.
Add the squash method to EdgeTable (@gtsambos). See #59 and #285.
Add support for individuals to VCF output, and fix major issues with output format (@jeromekelleher). Position values are transformed in a much more straightforward manner and output has been generalised substantially. Adds individual_names and position_transform arguments. See #286, and issues #2, #30 and #73.
Control height scale in SVG trees using ‘tree_height_scale’ and ‘max_tree_height’ (@hyanwong, @jeromekelleher). See #167, #168. Various other improvements to tree drawing (#235, #241, #242, #252, #259).
Add Tree.max_root_time property (@hyanwong, @jeromekelleher). See #170.
Improved input checking on various methods taking numpy arrays as parameters (@hyanwong). See #8 and #185.
Define the branch length over roots in trees to be zero (previously raised an error; @jeromekelleher). See #188 and #191.
Implementation of the genealogical nearest neighbours statistic (@hyanwong, @jeromekelleher).
New delete_intervals and keep_intervals methods for the TableCollection to allow slicing out of topology from specific intervals (@hyanwong, @andrewkern, @petrelharp, @jeromekelleher). See #225 and #261.
Support for missing data via a topological definition (@jeromekelleher). See #270 and #272.
Add ability to set columns directly in the Tables API (@jeromekelleher). See #12 and #307.
Various documentation improvements from @brianzhang01, @hyanwong, @petrelharp and @jeromekelleher.
Deprecated
Deprecate Tree.length in favour of Tree.span (@hyanwong). See #169.
Deprecate TreeSequence.pairwise_diversity in favour of the new diversity method. See #215, #312.
Bugfixes
[0.1.5] - 2019-03-27
This release removes support for Python 2, adds more flexible tree access and a new tskit command line interface.
New features
More flexible tree API (#121). Adds TreeSequence.at and TreeSequence.at_index methods to find specific trees, and efficient support for backwards traversal using reversed(ts.trees()).
Add initial tskit CLI (#80)
Add tskit info CLI command (#66)
Enable drawing SVG trees with coloured edges (@hyanwong; #149).
Add Tree.is_descendant method (#120)
Add Tree.copy method (#122)
Bugfixes
[0.1.4] - 2019-02-01
Minor feature update. Using the C API 0.99.1.
New features
Add interface for setting TableCollection.sequence_length: https://github.com/tskit-dev/tskit/issues/107
Add support for building and dropping TableCollection indexes: https://github.com/tskit-dev/tskit/issues/108
[0.1.3] - 2019-01-14
Bugfix release.
Bugfixes
Fix missing provenance schema: https://github.com/tskit-dev/tskit/issues/81
[0.1.2] - 2019-01-14
Bugfix release.
Bugfixes
Fix memory leak in table collection. https://github.com/tskit-dev/tskit/issues/76
[0.1.1] - 2019-01-11
Fixes broken distribution tarball for 0.1.0.
[0.1.0] - 2019-01-11
Initial release after separation from msprime 0.6.2. Code that reads tree sequence files and processes them should be able to work without changes.
Breaking changes
Removal of the previously deprecated sort_tables, simplify_tables and load_tables functions. All code should change to using corresponding TableCollection methods.
Rename SparseTree class to Tree.
[1.1.0a1] - 2019-01-10
Initial alpha version posted to PyPI for bootstrapping.
[0.0.0] - 2019-01-10
Initial extraction of tskit code from msprime. Relicense to MIT. Code copied at hash 29921408661d5fe0b1a82b1ca302a8b87510fd23
C API
[0.99.8] - 2020-XX-XX
Breaking changes
Added an options argument to tsk_table_collection_equals and table equality methods to allow for more flexible equality criteria (e.g., ignore top-level metadata and schema or provenance tables). Existing code should add an extra final parameter 0 to retain the current behaviour. (@mufernando, @jeromekelleher, #896, #897, #913, #917).
Changed default behaviour of tsk_table_collection_clear to not clear provenances and added options argument to optionally clear provenances and schemas. (@benjeffery, #929, #1001)
Exposed tsk_table_collection_set_indexes to the API. (@benjeffery, #870, #921)
Renamed ts.trait_regression to ts.trait_linear_model.
[0.99.7] - 2020-09-29
Added TSK_INCLUDE_TERMINAL option to tsk_diff_iter_init to output the last edges at the end of a tree sequence (@hyanwong, #783, #787)
Added tsk_bug_assert for assertions that should be compiled into release binaries (@benjeffery, #860)
[0.99.6] - 2020-09-04
Bugfixes
#823 - Fix mutation time error when using tsk_table_collection_simplify with TSK_KEEP_INPUT_ROOTS (@petrelharp, #823).
[0.99.5] - 2020-08-27
Breaking changes
The macro TSK_IMPUTE_MISSING_DATA is renamed to TSK_ISOLATED_NOT_MISSING (@benjeffery, #716, #794)
New features
Add a TSK_KEEP_INPUT_ROOTS option to simplify which, if enabled, adds edges from the MRCAs of samples in the simplified tree sequence back to the roots in the input tree sequence (@jeromekelleher, #775, #782).
Bugfixes
#777 - Mutations over isolated samples were incorrectly decoded as missing data. (@jeromekelleher, #778)
#776 - Fix a segfault when a partial list of samples was provided to the variants iterator. (@jeromekelleher, #778)
[0.99.4] - 2020-08-12
Note
The TSK_VERSION_PATCH macro was incorrectly set to 4 for 0.99.3, so both 0.99.4 and 0.99.3 have the same value.
Changes
Mutation times can be a mixture of known and unknown as long as for each individual site they are either all known or all unknown (@benjeffery, #761).
Bugfixes
Fix for including core.h under C++ (@petrelharp, #755).
[0.99.3] - 2020-07-27
Breaking changes
tsk_mutation_table_add_row has an extra time argument. If the time is unknown TSK_UNKNOWN_TIME should be passed. (@benjeffery, #672)
Change genotypes from unsigned to signed to accommodate missing data (see #144 for discussion). This only affects users of the tsk_vargen_t class. Genotypes are now stored as int8_t and int16_t types rather than the former unsigned types. The field names in the genotypes union of the tsk_variant_t struct returned by tsk_vargen_next have been renamed to i8 and i16 accordingly; care should be taken when updating client code to ensure that types are correct. The number of distinct alleles supported by 8 bit genotypes has therefore dropped from 255 to 127, with a similar reduction for 16 bit genotypes.
Change the tsk_vargen_init method to take an extra parameter alleles. To keep the current behaviour, set this parameter to NULL.
Edges can now have metadata. Hence edge methods now take two extra arguments: metadata and metadata length. The file format has also changed to accommodate this, but is backwards compatible. Edge metadata can be disabled for a table collection with the TSK_NO_EDGE_METADATA flag. (@benjeffery, #496, #712)
Migrations can now have metadata. Hence migration methods now take two extra arguments: metadata and metadata length. The file format has also changed to accommodate this, but is backwards compatible. (@benjeffery, #505)
The text dump of tables with metadata now includes the metadata schema as a header. (@benjeffery, #493)
Bad tree topologies are detected earlier, so that it is no longer possible to create a tsk_treeseq_t object which contains a parent with contradictory children on an interval. Previously an error occurred when some operation building the trees was attempted (@jeromekelleher, #709).
New features
New methods to perform set operations on table collections. tsk_table_collection_subset subsets and reorders table collections by nodes (@mufernando, @petrelharp, #663, #690). tsk_table_collection_union forms the node-wise union of two table collections (@mufernando, @petrelharp, #381, #623).
Mutations now have an optional double-precision floating-point time column. If not specified, this defaults to a particular NaN value (TSK_UNKNOWN_TIME) indicating that the time is unknown. For a tree sequence to be considered valid it must meet new criteria for mutation times, see Mutation requirements. Add tsk_table_collection_compute_mutation_times and new flag to tsk_table_collection_check_integrity: TSK_CHECK_MUTATION_TIME. Table sorting orders mutations by non-increasing time per-site, which is also a requirement for a valid tree sequence. (@benjeffery, #672)
Add metadata and metadata_schema fields to table collection, with accessors on tree sequence. These store arbitrary bytes and are optional in the file format. (@benjeffery, #641)
Add the TSK_KEEP_UNARY option to simplify (@gtsambos). See #1 and #143.
Add a set_root_threshold option to tsk_tree_t which allows us to set the number of samples a node must be an ancestor of to be considered a root (#462).
Change the semantics of tsk_tree_t so that sample counts are always computed, and add a new TSK_NO_SAMPLE_COUNTS option to turn this off (#462).
Tables with metadata now have an optional metadata_schema field that can contain arbitrary bytes. (@benjeffery, #493)
Tables loaded from a file can now be edited in the same way as any other table collection (@jeromekelleher, #536, #530).
Support for reading/writing to arbitrary file streams with the loadf/dumpf variants for tree sequence and table collection load/dump (@jeromekelleher, @grahamgower, #565, #599).
Add low-level sorting API and TSK_NO_CHECK_INTEGRITY flag (@jeromekelleher, #627, #626).
Add extension of Kendall-Colijn tree distance metric for tree sequences computed by tsk_treeseq_kc_distance (@daniel-goldstein, #548)
Deprecated
The TSK_SAMPLE_COUNTS option is now ignored and will print out a warning if used (#462).
[0.99.2] - 2019-03-27
Bugfix release. Changes:
Fix incorrect errors on tbl_collection_dump (#132)
Catch table overflows (#157)
[0.99.1] - 2019-01-24
Refinements to the C API as we move towards 1.0.0. Changes:
Change the _tbl_ abbreviation to _table_ to improve readability. Hence, we now have, e.g., tsk_node_table_t etc.
Change tsk_tbl_size_t to tsk_size_t.
Standardise public API to use tsk_size_t and tsk_id_t as appropriate.
Add tsk_flags_t typedef and consistently use this as the type used to encode bitwise flags. To avoid confusion, functions now have an options parameter.
Rename tsk_table_collection_position_t to tsk_bookmark_t.
Rename tsk_table_collection_reset_position to tsk_table_collection_truncate and tsk_table_collection_record_position to tsk_table_collection_record_num_rows.
Generalise tsk_table_collection_sort to take a bookmark as start argument.
Relax restriction that nodes in the samples argument to simplify must currently be marked as samples. (https://github.com/tskit-dev/tskit/issues/72)
Allow tsk_table_collection_simplify to take a NULL samples argument to specify “all samples in the current tables”.
Add support for building as a meson subproject.
[0.99.0] - 2019-01-14
Initial alpha version of the tskit C API tagged. Version 0.99.x represents the series of releases leading to version 1.0.0 which will be the first stable release. After 1.0.0, semver rules regarding API/ABI breakage will apply; however, in the 0.99.x series arbitrary changes may happen.
[0.0.0] - 2019-01-10
Initial extraction of tskit code from msprime. Relicense to MIT. Code copied at hash 29921408661d5fe0b1a82b1ca302a8b87510fd23