C API
This is the documentation for the tskit C API, a low-level library for manipulating and processing tree sequence data. The library is written using the C99 standard and is fully thread safe. Tskit uses kastore to define a simple storage format for the tree sequence data.
To see the API in action, please see the Examples section.
Overview
Do I need the C API?
The tskit C API is generally useful in the following situations:
- You want to use the tskit API in a larger C/C++ application (e.g., in order to output data in the .trees format);
- You need to perform lots of tree traversals, loops, etc. to analyse some data that is in tree sequence form.
For high-level operations that are not performance sensitive, the Python API is generally more useful. Python is much more convenient than C, and since the tskit Python module is essentially a wrapper for this C library, there’s often no real performance penalty for using it.
API structure
Tskit uses a set of conventions to provide a pseudo object oriented API. Each ‘object’ is represented by a C struct and has a set of ‘methods’. This is most easily explained by an example:
#include <stdio.h>
#include <stdlib.h>
#include <tskit/tables.h>
#define check_tsk_error(val) \
    if (val < 0) { \
        fprintf(stderr, "line %d: %s", __LINE__, tsk_strerror(val)); \
        exit(EXIT_FAILURE); \
    }
int
main(int argc, char **argv)
{
    int j, ret;
    tsk_edge_table_t edges;
    ret = tsk_edge_table_init(&edges, 0);
    check_tsk_error(ret);
    for (j = 0; j < 5; j++) {
        ret = tsk_edge_table_add_row(&edges, 0, 1, j + 1, j, NULL, 0);
        check_tsk_error(ret);
    }
    tsk_edge_table_print_state(&edges, stdout);
    tsk_edge_table_free(&edges);
    return EXIT_SUCCESS;
}
In this program we create a tsk_edge_table_t instance, add five rows using tsk_edge_table_add_row(), print out its contents using the tsk_edge_table_print_state() debugging method, and finally free the memory used by the edge table object. We define this edge table ‘class’ by using some simple naming conventions which are adhered to throughout tskit. This is simply a naming convention that helps to keep code written in plain C logically structured; there are no extra C++ style features. We use object oriented terminology freely throughout this documentation with this understanding.
In this convention, a class is defined by a struct class_name_t (e.g., tsk_edge_table_t) and its methods all have the form class_name_method_name, whose first argument is always a pointer to an instance of the class (e.g., tsk_edge_table_add_row above).
Each class has an initialise and free method, called class_name_init and class_name_free, respectively. The init method must be called to ensure that the object is correctly initialised (except for functions such as tsk_table_collection_load() and tsk_table_collection_copy() which automatically initialise the object by default for convenience). The free method must always be called to avoid leaking memory, even in the case of an error occurring during initialisation. If class_name_init has been called successfully, we say the object has been “initialised”; if not, it is “uninitialised”. After class_name_free has been called, the object is again uninitialised.
It is important to note that the init methods only allocate internal memory; the memory for the instance itself must be allocated either on the heap or the stack:
// Instance allocated on the stack
tsk_node_table_t nodes;
tsk_node_table_init(&nodes, 0);
tsk_node_table_free(&nodes);
// Instance allocated on the heap
tsk_edge_table_t *edges = malloc(sizeof(tsk_edge_table_t));
tsk_edge_table_init(edges, 0);
tsk_edge_table_free(edges);
free(edges);
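The init/free convention described above can be illustrated outside tskit with a toy class. The following is only a sketch: toy_int_list_t and its methods are hypothetical names invented for this example and are not part of the tskit API. It shows the three ideas from this section: methods take a pointer to the instance as their first argument, init allocates only the internal memory, and free is safe to call whether or not init succeeded.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy "class", NOT part of tskit; it only mirrors the convention. */
typedef struct {
    int *values;       /* internal memory owned by the object */
    size_t num_values; /* number of values currently stored */
    size_t max_values; /* capacity fixed at init time */
} toy_int_list_t;

/* Allocates the internal memory only; the caller provides the struct itself.
 * Returns 0 on success or a negative value on failure. */
int
toy_int_list_init(toy_int_list_t *self, size_t max_values)
{
    assert(self != NULL);
    self->num_values = 0;
    self->max_values = max_values;
    /* Always request at least one element so that NULL unambiguously means failure. */
    self->values = malloc((max_values > 0 ? max_values : 1) * sizeof(*self->values));
    return self->values == NULL ? -1 : 0;
}

/* Appends a value. Returns the index of the new entry, or a negative value
 * if the list is full. */
int
toy_int_list_add(toy_int_list_t *self, int value)
{
    if (self->num_values == self->max_values) {
        return -2;
    }
    self->values[self->num_values] = value;
    self->num_values++;
    return (int) (self->num_values - 1);
}

/* Releases the internal memory. Safe to call after a failed init because
 * free(NULL) is a no-op. Always returns 0. */
int
toy_int_list_free(toy_int_list_t *self)
{
    free(self->values);
    self->values = NULL;
    self->num_values = 0;
    self->max_values = 0;
    return 0;
}
```

A caller would declare a toy_int_list_t on the stack (or malloc one), call toy_int_list_init, use the object, and then call toy_int_list_free exactly once, mirroring the stack and heap patterns shown above for the tskit tables.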
Error handling
C does not have a mechanism for propagating exceptions, and great care must be taken to ensure that errors are correctly and safely handled. The convention adopted in tskit is that every function (except for trivial accessor methods) returns an integer. If this return value is negative an error has occurred which must be handled. A description of the error that occurred can be obtained using the tsk_strerror() function. The following example illustrates the key conventions around error handling in tskit:
#include <stdio.h>
#include <stdlib.h>
#include <tskit.h>
int
main(int argc, char **argv)
{
    int ret;
    tsk_treeseq_t ts;
    if (argc != 2) {
        fprintf(stderr, "usage: <tree sequence file>\n");
        exit(EXIT_FAILURE);
    }
    ret = tsk_treeseq_load(&ts, argv[1], 0);
    if (ret < 0) {
        /* Error condition. Free and exit */
        tsk_treeseq_free(&ts);
        fprintf(stderr, "%s\n", tsk_strerror(ret));
        exit(EXIT_FAILURE);
    }
    /* The counts are tsk_size_t (an unsigned 32-bit type), so print them as unsigned. */
    printf("Loaded tree sequence with %u nodes and %u edges from %s\n",
        (unsigned int) tsk_treeseq_get_num_nodes(&ts),
        (unsigned int) tsk_treeseq_get_num_edges(&ts),
        argv[1]);
    tsk_treeseq_free(&ts);
    return EXIT_SUCCESS;
}
In this example we load a tree sequence from file and print out a summary of the number of nodes and edges it contains. After calling tsk_treeseq_load() we check the return value ret to see if an error occurred. If an error has occurred we exit with an error message produced by tsk_strerror(). Note that in this example we call tsk_treeseq_free() whether or not an error occurs: in general, once a function that initialises an object (e.g., X_init, X_copy or X_load) is called, then X_free must be called to ensure that memory is not leaked.
Most functions in tskit return an error status; we recommend that every return value is checked.
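One common way to check every return value without duplicating cleanup code is a single exit label reached with goto. The sketch below demonstrates the pattern with hypothetical toy functions and error codes (toy_sum_copy, toy_strerror, TOY_ERR_*), which are assumptions made for illustration and not part of tskit. As in the convention above, 0 means success and negative values are errors.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error codes for this sketch; tskit defines its own TSK_ERR_* values. */
#define TOY_ERR_NO_MEMORY (-1)
#define TOY_ERR_BAD_PARAM (-2)

/* Maps an error code to a human-readable message, in the spirit of tsk_strerror(). */
const char *
toy_strerror(int err)
{
    switch (err) {
        case 0:
            return "Success";
        case TOY_ERR_NO_MEMORY:
            return "Out of memory";
        case TOY_ERR_BAD_PARAM:
            return "Bad parameter value";
        default:
            return "Unknown error";
    }
}

/* Copies n ints into a temporary buffer and stores their sum in *result.
 * Every failure path jumps to `out`, so the buffer is released exactly once. */
int
toy_sum_copy(const int *src, size_t n, long *result)
{
    int ret = 0;
    int *buffer = NULL;
    size_t j;

    if (src == NULL || result == NULL || n == 0) {
        ret = TOY_ERR_BAD_PARAM;
        goto out;
    }
    buffer = malloc(n * sizeof(*buffer));
    if (buffer == NULL) {
        ret = TOY_ERR_NO_MEMORY;
        goto out;
    }
    memcpy(buffer, src, n * sizeof(*buffer));
    *result = 0;
    for (j = 0; j < n; j++) {
        *result += buffer[j];
    }
out:
    /* free(NULL) is a no-op, so this is safe on every path. */
    free(buffer);
    assert(ret <= 0);
    return ret;
}
```

The caller tests the result with `if (ret < 0)` and passes it to the message function, just as the tskit example above does with tsk_strerror().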
Using tskit in your project
Tskit is built as a standard C library and so there are many different ways in which it can be included in downstream projects. It is possible to install tskit onto a system (i.e., installing a shared library and header files to standard locations on Unix) and link against it, but there are many different ways in which this can go wrong. In the interest of simplicity and improving the end-user experience we recommend embedding tskit directly into your applications.
There are many different build systems and approaches to compiling code, and so it’s not possible to give definitive documentation on how tskit should be included in downstream projects. Please see the build examples repo for some examples of how to incorporate tskit into different project structures and build systems.
Tskit uses the meson build system internally, and supports being used as a meson subproject. We show an example in which this is combined with git submodules to neatly abstract many details of cross platform C development.
Some users may choose to check the source for tskit (and kastore) directly into their source control repositories. If you wish to do this, the code is in the c subdirectory of the tskit and kastore repos. The following header files should be placed in the search path: kastore.h, tskit.h, and tskit/*.h. The C files kastore.c and tskit*.c should be compiled. For those who wish to minimise the size of their compiled binaries, tskit is quite modular, and C files can be omitted if not needed. For example, if you are just using the Tables API then only the files tskit/core.[c,h] and tskit/tables.[c,h] are needed.
However you include tskit in your project, please ensure that it is a released version. Released versions are tagged on GitHub using the convention C_{VERSION}. The code can either be downloaded from GitHub on the releases page or checked out using git. For example, to check out the C_0.99.1 release:
$ git clone https://github.com/tskit-dev/tskit.git
$ cd tskit
$ git checkout C_0.99.1
Git submodules may also be considered; see the example for how to set these up and check out a specific release.
Basic Types
- typedef int32_t tsk_id_t
Tskit Object IDs.
All objects in tskit are referred to by integer IDs corresponding to the row they occupy in the relevant table. The tsk_id_t type should be used when manipulating these ID values. The reserved value TSK_NULL (-1) defines missing data.
- typedef uint32_t tsk_size_t
Tskit sizes.
Sizes in tskit are defined by the tsk_size_t type.
- typedef uint32_t tsk_flags_t
Container for bitwise flags.
Bitwise flags are used in tskit as a column type and also as a way to specify options to API functions.
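To make the role of row IDs and the reserved null value concrete, here is a small sketch that walks a column of parent IDs until it reaches a null entry. It uses local stand-ins (example_id_t and EXAMPLE_NULL, defined to match the int32_t and -1 values documented above) rather than the real tskit headers, and the parent array and example_depth function are hypothetical illustrations, not a tskit data structure or call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins matching the documented definitions of tsk_id_t and TSK_NULL. */
typedef int32_t example_id_t;
#define EXAMPLE_NULL (-1)

/* Given a column where parent[u] is the row ID of u's parent (or EXAMPLE_NULL
 * for a root), returns the number of steps from u to its root. Returns -1 if
 * any ID is out of range or the column contains a cycle. */
int
example_depth(const example_id_t *parent, size_t num_nodes, example_id_t u)
{
    int depth = 0;

    assert(parent != NULL);
    if (u < 0 || (size_t) u >= num_nodes) {
        return -1;
    }
    while (parent[u] != EXAMPLE_NULL) {
        example_id_t p = parent[u];
        /* A valid path visits at most num_nodes rows, so a longer walk means a cycle. */
        if (p < 0 || (size_t) p >= num_nodes || (size_t) depth >= num_nodes) {
            return -1;
        }
        u = p;
        depth++;
    }
    return depth;
}
```

Because IDs are signed, the null value can be tested with a simple comparison, and every ID should be range-checked against the table size before it is used as an array index.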
Common options
- TSK_DEBUG (1u << 31)
Turn on debugging output. Not supported by all functions.
- TSK_NO_INIT (1u << 30)
Do not initialise the parameter object.
- TSK_NO_CHECK_INTEGRITY (1u << 29)
Do not run integrity checks before performing an operation.
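Options like these are combined with bitwise OR and tested with bitwise AND. The sketch below uses local stand-in macros with the same bit positions as documented above; the EXAMPLE_ names and the example_flag_is_set helper are hypothetical and exist only so the snippet compiles without the tskit headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins with the bit positions documented above. In real code,
 * use TSK_DEBUG, TSK_NO_INIT and TSK_NO_CHECK_INTEGRITY from the tskit headers. */
typedef uint32_t example_flags_t;
#define EXAMPLE_DEBUG (1u << 31)
#define EXAMPLE_NO_INIT (1u << 30)
#define EXAMPLE_NO_CHECK_INTEGRITY (1u << 29)

/* Returns true if every bit of `flag` is set in `options`. */
bool
example_flag_is_set(example_flags_t options, example_flags_t flag)
{
    assert(flag != 0);
    return (options & flag) == flag;
}

/* Returns `options` with the bits of `flag` added, leaving other bits untouched. */
example_flags_t
example_flag_add(example_flags_t options, example_flags_t flag)
{
    return options | flag;
}
```

Passing 0 selects the default behaviour, and several options can be requested in one call by OR-ing them together, for example `EXAMPLE_NO_INIT | EXAMPLE_NO_CHECK_INTEGRITY`.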
Tables API
The tables API section of tskit is defined in the tskit/tables.h header.
Table collections
- struct tsk_table_collection_t
A collection of tables defining the data for a tree sequence.
Public Members
- double sequence_length
The sequence length defining the tree sequence’s coordinate space.
- char *metadata
The tree-sequence metadata.
- char *metadata_schema
The metadata schema.
- tsk_individual_table_t individuals
The individual table.
- tsk_node_table_t nodes
The node table.
- tsk_edge_table_t edges
The edge table.
- tsk_migration_table_t migrations
The migration table.
- tsk_site_table_t sites
The site table.
- tsk_mutation_table_t mutations
The mutation table.
- tsk_population_table_t populations
The population table.
- tsk_provenance_table_t provenances
The provenance table.
- struct tsk_bookmark_t
A bookmark recording the position of all the tables in a table collection.
Public Members
- tsk_size_t individuals
The position in the individual table.
- tsk_size_t nodes
The position in the node table.
- tsk_size_t edges
The position in the edge table.
- tsk_size_t migrations
The position in the migration table.
- tsk_size_t sites
The position in the site table.
- tsk_size_t mutations
The position in the mutation table.
- tsk_size_t populations
The position in the population table.
- tsk_size_t provenances
The position in the provenance table.
- int tsk_table_collection_init(tsk_table_collection_t *self, tsk_flags_t options)
Initialises the table collection by allocating the internal memory and initialising all the constituent tables.
This must be called before any operations are performed on the table collection. See the API structure for details on how objects are initialised and freed.
Options
Options can be specified by providing one or more of the following bitwise flags:
- TSK_NO_EDGE_METADATA
Do not allocate space to store metadata in the edge table. Operations attempting to add non-empty metadata to the edge table will fail with error TSK_ERR_METADATA_DISABLED.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self: A pointer to an uninitialised tsk_table_collection_t object.
options: Allocation time options; see above for the available bitwise flags. For the default behaviour, a value of 0 should be provided.
- int tsk_table_collection_free(tsk_table_collection_t *self)
Free the internal memory for the specified table collection.
- Return
Always returns 0.
- Parameters
self: A pointer to an initialised tsk_table_collection_t object.
- int tsk_table_collection_clear(tsk_table_collection_t *self, tsk_flags_t options)
Clears data tables (and optionally provenances and metadata) in this table collection.
By default this operation clears all tables except the provenance table, retaining table metadata schemas and the tree-sequence level metadata and schema.
Options
Options can be specified by providing one or more of the following bitwise flags:
- TSK_CLEAR_PROVENANCE
Additionally clear the provenance table
- TSK_CLEAR_METADATA_SCHEMAS
Additionally clear the table metadata schemas
- TSK_CLEAR_TS_METADATA_AND_SCHEMA
Additionally clear the tree-sequence metadata and schema
No memory is freed as a result of this operation; please use tsk_table_collection_free() to free internal resources.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self: A pointer to a tsk_table_collection_t object.
options: Bitwise clearing options.
- bool tsk_table_collection_equals(const tsk_table_collection_t *self, const tsk_table_collection_t *other, tsk_flags_t options)
Returns true if the data in the specified table collection is equal to the data in this table collection.
Returns true if the two table collections are equal. The indexes are not considered as these are derived from the tables. We also do not consider the file_uuid, since it is a property of the file that the set of tables is stored in.
Options
Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) two table collections are considered equal if all of the tables are byte-wise identical, and the sequence lengths, metadata and metadata schemas of the two table collections are identical.
- TSK_CMP_IGNORE_PROVENANCE
Do not include the provenance table in comparison.
- TSK_CMP_IGNORE_METADATA
Do not include metadata when comparing the table collections. This includes both the top-level tree sequence metadata as well as the metadata for each of the tables (i.e., TSK_CMP_IGNORE_TS_METADATA is implied). All metadata schemas are also ignored.
- TSK_CMP_IGNORE_TS_METADATA
Do not include the top-level tree sequence metadata and metadata schemas in the comparison.
- TSK_CMP_IGNORE_TIMESTAMPS
Do not include the timestamp information when comparing the provenance tables. This has no effect if TSK_CMP_IGNORE_PROVENANCE is specified.
- Return
Return true if the specified table collection is equal to this table collection.
- Parameters
self: A pointer to a tsk_table_collection_t object.
other: A pointer to a tsk_table_collection_t object.
options: Bitwise comparison options.
- int tsk_table_collection_copy(const tsk_table_collection_t *self, tsk_table_collection_t *dest, tsk_flags_t options)
Copies the state of this table collection into the specified destination.
By default the method initialises the specified destination table collection. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self: A pointer to a tsk_table_collection_t object.
dest: A pointer to a tsk_table_collection_t object. If the TSK_NO_INIT option is specified, this must be an initialised table collection. If not, it must be an uninitialised table collection.
options: Bitwise option flags.
- void tsk_table_collection_print_state(const tsk_table_collection_t *self, FILE *out)
Print out the state of this table collection to the specified stream.
This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.
- Parameters
self: A pointer to a tsk_table_collection_t object.
out: The stream to write the summary to.
- int tsk_table_collection_load(tsk_table_collection_t *self, const char *filename, tsk_flags_t options)
Load a table collection from a file path.
Loads the data from the specified file into this table collection. By default, the table collection is also initialised. The resources allocated must be freed using tsk_table_collection_free() even in error conditions.
If the TSK_NO_INIT option is set, the table collection is not initialised, allowing an already initialised table collection to be overwritten with the data from a file.
If the file contains multiple table collections, this function will load the first. Please see tsk_table_collection_loadf() for details on how to sequentially load table collections from a stream.
Options
Options can be specified by providing one or more of the following bitwise flags:
- TSK_NO_INIT
Do not initialise this tsk_table_collection_t before loading.
Examples
int ret;
tsk_table_collection_t tables;
ret = tsk_table_collection_load(&tables, "data.trees", 0);
if (ret != 0) {
    fprintf(stderr, "Load error:%s\n", tsk_strerror(ret));
    exit(EXIT_FAILURE);
}
- Return
Return 0 on success or a negative value on failure.
- Parameters
self: A pointer to an uninitialised tsk_table_collection_t object if the TSK_NO_INIT option is not set (default), or an initialised tsk_table_collection_t otherwise.
filename: A NULL terminated string containing the filename.
options: Bitwise options. See above for details.
- int tsk_table_collection_loadf(tsk_table_collection_t *self, FILE *file, tsk_flags_t options)
Load a table collection from a stream.
Loads a table collection definition from the specified file stream into this table collection. By default, the table collection is also initialised. The resources allocated must be freed using tsk_table_collection_free() even in error conditions.
If the TSK_NO_INIT option is set, the table collection is not initialised, allowing an already initialised table collection to be overwritten with the data from a file.
If the stream contains multiple table collection definitions, this function will load the next table collection from the stream. If the stream contains no more table collection definitions the error value TSK_ERR_EOF will be returned. Note that EOF is only returned in the case where zero bytes are read from the stream; malformed files or other errors will result in different error conditions. Please see the File streaming section for an example of how to sequentially load tree sequences from a stream.
Options
Options can be specified by providing one or more of the following bitwise flags:
- TSK_NO_INIT
Do not initialise this tsk_table_collection_t before loading.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self: A pointer to an uninitialised tsk_table_collection_t object if the TSK_NO_INIT option is not set (default), or an initialised tsk_table_collection_t otherwise.
file: A FILE stream opened in an appropriate mode for reading (e.g. “r”, “r+” or “w+”) positioned at the beginning of a table collection definition.
options: Bitwise options. See above for details.
- int tsk_table_collection_dump(tsk_table_collection_t *self, const char *filename, tsk_flags_t options)
Write a table collection to file.
Writes the data from this table collection to the specified file. Usually we expect that data written to a file will be in a form that can be read directly and used to create a tree sequence; that is, we assume that by default the tables are sorted and indexed. Following these assumptions, if the tables are not already indexed, we index the tables before writing to file to save the cost of building these indexes at load time. This behaviour requires that the tables are sorted. If this automatic indexing is not desired, it can be disabled using the TSK_NO_BUILD_INDEXES option.
If an error occurs the file path is deleted, ensuring that only complete and well formed files will be written.
Options
Options can be specified by providing one or more of the following bitwise flags:
- TSK_NO_BUILD_INDEXES
Do not build indexes for this table before writing to file. This is useful if you wish to write unsorted tables to file, as building the indexes will raise an error if the table is unsorted.
Examples
int ret;
tsk_table_collection_t tables;
ret = tsk_table_collection_init(&tables, 0);
error_check(ret);
tables.sequence_length = 1.0;
// Write out the empty tree sequence
ret = tsk_table_collection_dump(&tables, "empty.trees", 0);
error_check(ret);
- Return
Return 0 on success or a negative value on failure.
- Parameters
self: A pointer to an initialised tsk_table_collection_t object.
filename: A NULL terminated string containing the filename.
options: Bitwise options. See above for details.
- int tsk_table_collection_dumpf(tsk_table_collection_t *self, FILE *file, tsk_flags_t options)
Write a table collection to a stream.
Writes the data from this table collection to the specified FILE stream. Semantics are identical to tsk_table_collection_dump().
Please see the File streaming section for an example of how to sequentially dump and load tree sequences from a stream.
Options
Options can be specified by providing one or more of the following bitwise flags:
- TSK_NO_BUILD_INDEXES
Do not build indexes for this table before writing to file. This is useful if you wish to write unsorted tables to file, as building the indexes will raise an error if the table is unsorted.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self: A pointer to an initialised tsk_table_collection_t object.
file: A FILE stream opened in an appropriate mode for writing (e.g. “w”, “a”, “r+” or “w+”).
options: Bitwise options. See above for details.
- int tsk_table_collection_record_num_rows(const tsk_table_collection_t *self, tsk_bookmark_t *bookmark)
Record the number of rows in each table in the specified tsk_bookmark_t object.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self: A pointer to an initialised tsk_table_collection_t object.
bookmark: A pointer to a tsk_bookmark_t which is updated to contain the number of rows in all tables.
- int tsk_table_collection_truncate(tsk_table_collection_t *self, tsk_bookmark_t *bookmark)
Truncates the tables in this table collection according to the specified bookmark.
Truncate the tables in this collection so that each one has the number of rows specified in the parameter tsk_bookmark_t. Use the tsk_table_collection_record_num_rows() function to record the number of rows for each table in a table collection at a particular time.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self: A pointer to a tsk_table_collection_t object.
bookmark: The number of rows to retain in each table.
- int tsk_table_collection_sort(tsk_table_collection_t *self, const tsk_bookmark_t *start, tsk_flags_t options)
Sorts the tables in this collection.
Some of the tables in a table collection must satisfy specific sortedness requirements in order to define a valid tree sequence. This method sorts the edge, site and mutation tables such that these requirements are guaranteed to be fulfilled. The individual, node, population and provenance tables do not have any sortedness requirements, and are therefore ignored by this method.
Note
The current implementation may sort in such a way that exceeds these requirements, but this behaviour should not be relied upon and later versions may weaken the level of sortedness. However, the method does guarantee that the resulting tables describe a valid tree sequence.
Warning
Sorting migrations is currently not supported and an error will be raised if a table collection containing a non-empty migration table is specified.
The specified tsk_bookmark_t allows us to specify a start position for sorting in each of the tables; rows before this value are assumed to already be in sorted order and this information is used to make sorting more efficient. Positions in tables that are not sorted (individual, node, population and provenance) are ignored and can be set to arbitrary values.
Warning
The current implementation only supports specifying a start position for the edge table and in a limited form for the site and mutation tables. Specifying a non-zero migration start position results in an error. The start positions for the site and mutation tables can either be 0 or the length of the respective tables, allowing these tables to either be fully sorted, or not sorted at all.
The table collection will always be unindexed after sort successfully completes.
See the table sorting section for more details. For more control over the sorting process, see the Low-level sorting section.
Options
Options can be specified by providing one or more of the following bitwise flags:
- TSK_NO_CHECK_INTEGRITY
Do not run integrity checks using
tsk_table_collection_check_integrity()
before sorting, potentially leading to a small reduction in execution time. This performance optimisation should not be used unless the calling code can guarantee reference integrity within the table collection. References to rows not in the table or bad offsets will result in undefined behaviour.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self: A pointer to a tsk_table_collection_t object.
start: The position to begin sorting in each table; all rows less than this position must fulfill the tree sequence sortedness requirements. If this is NULL, sort all rows.
options: Sort options; see above for the available bitwise flags. For the default behaviour, a value of 0 should be provided.
- int tsk_table_collection_simplify(tsk_table_collection_t *self, const tsk_id_t *samples, tsk_size_t num_samples, tsk_flags_t options, tsk_id_t *node_map)
Simplify the tables to remove redundant information.
Simplification transforms the tables to remove redundancy and canonicalise tree sequence data. See the simplification section for more details.
A mapping from the node IDs in the table before simplification to their equivalent values after simplification can be obtained via the node_map argument. If this is non NULL, node_map[u] will contain the new ID for node u after simplification, or TSK_NULL if the node has been removed. Thus, node_map must be an array of at least self->nodes.num_rows tsk_id_t values.
Options:
Options can be specified by providing one or more of the following bitwise flags:
- TSK_FILTER_SITES
Remove sites from the output if there are no mutations that reference them.
- TSK_FILTER_POPULATIONS
Remove populations from the output if there are no nodes or migrations that reference them.
- TSK_FILTER_INDIVIDUALS
Remove individuals from the output if there are no nodes that reference them.
- TSK_REDUCE_TO_SITE_TOPOLOGY
Reduce the topological information in the tables to the minimum necessary to represent the trees that contain sites. If there are zero sites this will result in zero output edges. When the number of sites is greater than zero, every tree in the output tree sequence will contain at least one site. For a given site, the topology of the tree containing that site will be identical (up to node ID remapping) to the topology of the corresponding tree in the input.
- TSK_KEEP_UNARY
By default simplify removes unary nodes (i.e., nodes with exactly one child) along the path from samples to root. If this option is specified such unary nodes will be preserved in the output.
- TSK_KEEP_INPUT_ROOTS
By default simplify removes all topology ancestral to the MRCAs of the samples. This option inserts edges from these MRCAs back to the roots of the input trees.
Note
Migrations are currently not supported by simplify, and an error will be raised if we attempt to call simplify on a table collection with greater than zero migrations. See https://github.com/tskit-dev/tskit/issues/20
The table collection will always be unindexed after simplify successfully completes.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self: A pointer to a tsk_table_collection_t object.
samples: Either NULL or an array of num_samples distinct and valid node IDs. If non-null the nodes in this array will be marked as samples in the output. If NULL, the num_samples parameter is ignored and the samples in the output will be the same as the samples in the input. This is equivalent to populating the samples array with all of the sample nodes in the input in increasing order of ID.
num_samples: The number of node IDs in the input samples array. Ignored if the samples array is NULL.
options: Simplify options; see above for the available bitwise flags. For the default behaviour, a value of 0 should be provided.
node_map: If not NULL, this array will be filled to define the mapping between node IDs in the table collection before and after simplification.
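Once simplify has filled node_map, any node IDs that the calling code stored beforehand need to be translated through it. The following sketch shows one way to do that, assuming the semantics described above (node_map[u] is the new ID, or the null value if node u was removed). It uses local stand-ins example_id_t and EXAMPLE_NULL for tsk_id_t and TSK_NULL, and example_remap_ids is a hypothetical helper rather than a tskit function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins matching the documented definitions of tsk_id_t and TSK_NULL. */
typedef int32_t example_id_t;
#define EXAMPLE_NULL (-1)

/* Translates ids[] in place from pre-simplify to post-simplify node IDs using
 * a node_map of length num_nodes. Returns the number of IDs that still refer
 * to a node after simplification, or -1 (leaving ids[] untouched) if any ID
 * is outside the range of the map. */
int
example_remap_ids(const example_id_t *node_map, size_t num_nodes,
    example_id_t *ids, size_t num_ids)
{
    size_t j;
    int num_kept = 0;

    assert(node_map != NULL && ids != NULL);
    /* Validate everything first so that a failure does not leave ids[] half rewritten. */
    for (j = 0; j < num_ids; j++) {
        if (ids[j] < 0 || (size_t) ids[j] >= num_nodes) {
            return -1;
        }
    }
    for (j = 0; j < num_ids; j++) {
        ids[j] = node_map[ids[j]];
        if (ids[j] != EXAMPLE_NULL) {
            num_kept++;
        }
    }
    return num_kept;
}
```

In real code the map would be allocated with one entry per row of the node table before calling simplify, since the number of rows changes during the call.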
- int tsk_table_collection_subset(tsk_table_collection_t *self, const tsk_id_t *nodes, tsk_size_t num_nodes)
Subsets and reorders a table collection according to an array of nodes.
Reduces the table collection to contain only the entries referring to the provided list of nodes, with nodes reordered according to the order they appear in the nodes argument. Specifically, this subsets and reorders each of the tables as follows:
- Nodes: if in the list of nodes, and in the order provided.
- Individuals and Populations: if referred to by a retained node, and in the order first seen when traversing the list of retained nodes.
- Edges: if both parent and child are retained nodes.
- Mutations: if the mutation’s node is a retained node.
- Sites: if any mutations remain at the site after removing mutations.
Retained edges, mutations, and sites appear in the same order as in the original tables.
If nodes is the entire list of nodes in the tables, then the resulting tables will be identical to the original tables, but with nodes (and individuals and populations) reordered.
Note
Migrations are currently not supported by subset, and an error will be raised if we attempt to call subset on a table collection with greater than zero migrations.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self: A pointer to a tsk_table_collection_t object.
nodes: An array of num_nodes valid node IDs.
num_nodes: The number of node IDs in the input nodes array.
- int tsk_table_collection_union(tsk_table_collection_t *self, const tsk_table_collection_t *other, const tsk_id_t *other_node_mapping, tsk_flags_t options)
Forms the node-wise union of two table collections.
Expands this table collection by adding the non-shared portions of another table collection to itself. The other_node_mapping encodes which nodes in other are equivalent to a node in self. The positions in the other_node_mapping array correspond to node IDs in other, and the elements encode the equivalent node ID in self or TSK_NULL if the node is exclusive to other. Nodes that are exclusive to other are added to self, along with:
- Individuals which are new to self.
- Edges whose parent or child are new to self.
- Sites which were not present in self.
- Mutations whose nodes are new to self.
By default, populations of newly added nodes are assumed to be new populations, and added to the population table as well.
This operation will also sort the resulting tables, so the tables may change even if nothing new is added, if the original tables were not sorted.
Options:
Options can be specified by providing one or more of the following bitwise flags:
- TSK_UNION_NO_CHECK_SHARED
By default, union checks that the portions of shared history in self and other, as implied by other_node_mapping, are indeed equivalent. It does so by subsetting both self and other on the equivalent nodes specified in other_node_mapping, and then checking for equality of the subsets. If this option is specified, the check is skipped.
- TSK_UNION_NO_ADD_POP
By default, all nodes new to self are assigned new populations. If this option is specified, nodes that are added to self will retain the population IDs they have in other.
Note
Migrations are currently not supported by union, and an error will be raised if we attempt to call union on a table collection with migrations.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self: A pointer to a tsk_table_collection_t object.
other: A pointer to a tsk_table_collection_t object.
other_node_mapping: An array of node IDs that relate nodes in other to nodes in self: the k-th element of other_node_mapping should be the index of the equivalent node in self, or TSK_NULL if the node is not present in self (in which case it will be added to self).
options: Union options; see above for the available bitwise flags. For the default behaviour, a value of 0 should be provided.
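As an illustration of the other_node_mapping layout, the sketch below fills a mapping for one simple, assumed scenario: the first num_shared nodes of other are the same nodes, with the same IDs, as in self, and every later node exists only in other. This scenario, the helper name example_fill_shared_prefix_mapping, and the stand-ins example_id_t and EXAMPLE_NULL (for tsk_id_t and TSK_NULL) are all assumptions made for the example; real mappings depend on how the two table collections were produced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins matching the documented definitions of tsk_id_t and TSK_NULL. */
typedef int32_t example_id_t;
#define EXAMPLE_NULL (-1)

/* Fills other_node_mapping (one entry per node in `other`) for the case where
 * nodes 0..num_shared-1 of `other` are identical to the same-numbered nodes in
 * `self`, and all remaining nodes are exclusive to `other`.
 * Returns 0 on success or -1 if num_shared exceeds num_other_nodes. */
int
example_fill_shared_prefix_mapping(example_id_t *other_node_mapping,
    size_t num_other_nodes, size_t num_shared)
{
    size_t k;

    assert(other_node_mapping != NULL);
    if (num_shared > num_other_nodes) {
        return -1;
    }
    for (k = 0; k < num_other_nodes; k++) {
        /* The k-th entry is the equivalent node in self, or the null value
         * for a node that must be added to self. */
        other_node_mapping[k] = k < num_shared ? (example_id_t) k : EXAMPLE_NULL;
    }
    return 0;
}
```

With five nodes in other and three shared, the resulting mapping is {0, 1, 2, -1, -1}, so the last two nodes would be added to self.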
- int tsk_table_collection_set_metadata(tsk_table_collection_t *self, const char *metadata, tsk_size_t metadata_length)
Set the metadata.
Copies the metadata string to this table collection, replacing any existing metadata.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self: A pointer to a tsk_table_collection_t object.
metadata: A pointer to a char array.
metadata_length: The size of the metadata in bytes.
- int tsk_table_collection_set_metadata_schema(tsk_table_collection_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)
Set the metadata schema.
Copies the metadata schema string to this table collection, replacing any existing metadata schema.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self: A pointer to a tsk_table_collection_t object.
metadata_schema: A pointer to a char array.
metadata_schema_length: The size of the metadata schema in bytes.
-
bool
tsk_table_collection_has_index
(const tsk_table_collection_t *self, tsk_flags_t options)¶ Returns true if this table collection is indexed.
This method returns true if the table collection has an index for the edge table. It guarantees that the index exists, and that it is for the same number of edges that are in the edge table. It does not guarantee that the index is valid (e.g., the rows in the edge table may have been permuted in some way since the index was built).
See the Table indexes section for details on the index life-cycle.
- Return
Return true if there is an index present for this table collection.
- Parameters
self
: A pointer to a tsk_table_collection_t object.
options
: Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.
-
int
tsk_table_collection_drop_index
(tsk_table_collection_t *self, tsk_flags_t options)¶ Deletes the indexes for this table collection.
Unconditionally drop the indexes that may be present for this table collection. It is not an error to call this method on an unindexed table collection. See the Table indexes section for details on the index life-cycle.
- Return
Always returns 0.
- Parameters
self
: A pointer to a tsk_table_collection_t object.
options
: Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.
-
int
tsk_table_collection_build_index
(tsk_table_collection_t *self, tsk_flags_t options)¶ Builds indexes for this table collection.
Builds the tree traversal indexes for this table collection. Any existing index is first dropped using tsk_table_collection_drop_index(). See the Table indexes section for details on the index life-cycle.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_table_collection_t object.
options
: Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.
-
int
tsk_table_collection_set_indexes
(tsk_table_collection_t *self, tsk_id_t *edge_insertion_order, tsk_id_t *edge_removal_order)¶ Sets the edge insertion/removal index for this table collection.
This method sets the edge insertion/removal index for this table collection. The index arrays should each contain one entry per edge in the edge table. The index is not checked for validity.
See the Table indexes section for details on the index life-cycle.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_table_collection_t object.
edge_insertion_order
: Array of tsk_id_t edge ids.
edge_removal_order
: Array of tsk_id_t edge ids.
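As a rough illustration of what the two index arrays contain, the following sketch builds them for a handful of edges without calling tskit. It assumes the usual definition of these orders (insertion sorted by left coordinate and then increasing parent time; removal sorted by right coordinate and then decreasing parent time). The type example_edge_t and the function build_orders are hypothetical, and the library's tie-breaking rules may differ; see the Table indexes section for the authoritative definition.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flattened edge record: coordinates plus the parent's time. */
typedef struct {
    double left;
    double right;
    double parent_time;
    int id; /* row index of the edge in the edge table */
} example_edge_t;

static int
cmp_double(double a, double b)
{
    return (a > b) - (a < b);
}

/* Insertion order: by left coordinate, then increasing parent time. */
static int
cmp_insertion(const void *a, const void *b)
{
    const example_edge_t *x = a, *y = b;
    int c = cmp_double(x->left, y->left);

    return c != 0 ? c : cmp_double(x->parent_time, y->parent_time);
}

/* Removal order: by right coordinate, then decreasing parent time. */
static int
cmp_removal(const void *a, const void *b)
{
    const example_edge_t *x = a, *y = b;
    int c = cmp_double(x->right, y->right);

    return c != 0 ? c : cmp_double(y->parent_time, x->parent_time);
}

/* Fills the two order arrays (each of length n); scratch holds n records. */
static void
build_orders(const example_edge_t *edges, size_t n, example_edge_t *scratch,
    int *insertion_order, int *removal_order)
{
    size_t j;

    for (j = 0; j < n; j++) {
        scratch[j] = edges[j];
    }
    qsort(scratch, n, sizeof(*scratch), cmp_insertion);
    for (j = 0; j < n; j++) {
        insertion_order[j] = scratch[j].id;
    }
    qsort(scratch, n, sizeof(*scratch), cmp_removal);
    for (j = 0; j < n; j++) {
        removal_order[j] = scratch[j].id;
    }
}
```

Arrays produced this way have one entry per edge, which is the shape that tsk_table_collection_set_indexes() expects. Since the index is not checked for validity, the caller remains responsible for the ordering itself.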
-
tsk_id_t
tsk_table_collection_check_integrity
(const tsk_table_collection_t *self, tsk_flags_t options)¶ Runs integrity checks on this table collection.
Checks the integrity of this table collection. The default checks (i.e., with options = 0) guarantee the integrity of memory and entity references within the table collection. All spatial values (along the genome) are checked to see if they are finite values and within the required bounds. Time values are checked to see if they are finite or marked as unknown.
To check if a set of tables fulfills the requirements needed for a valid tree sequence, use the TSK_CHECK_TREES option. When this method is called with TSK_CHECK_TREES, the number of trees in the tree sequence is returned. Thus, to check for errors client code should verify that the return value is less than zero. All other options will return zero on success and a negative value on failure.
More fine-grained checks can be achieved using bitwise combinations of the other options.
Options:
Options can be specified by providing one or more of the following bitwise flags:
- TSK_CHECK_EDGE_ORDERING
Check edge ordering constraints for a tree sequence.
- TSK_CHECK_SITE_ORDERING
Check that sites are in nondecreasing position order.
- TSK_CHECK_SITE_DUPLICATES
Check for any duplicate site positions.
- TSK_CHECK_MUTATION_ORDERING
Check constraints on the ordering of mutations. Any non-null mutation parents and known times are checked for ordering constraints.
- TSK_CHECK_INDEXES
Check that the table indexes exist, and contain valid edge references.
- TSK_CHECK_TREES
All checks needed to define a valid tree sequence. Note that this implies all of the above checks.
It is sometimes useful to disregard some parts of the data model when performing checks:
- TSK_NO_CHECK_POPULATION_REFS
Do not check integrity of references to populations. This can be safely combined with the other checks.
- Return
Return a negative error value if any problems are detected in the tree sequence. If the TSK_CHECK_TREES option is provided, the number of trees in the tree sequence will be returned on success; otherwise zero is returned on success.
- Parameters
self
: A pointer to a tsk_table_collection_t object.
options
: Bitwise options.
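Because the return value doubles as a tree count when TSK_CHECK_TREES is given, client code has to test for a negative value rather than a non-zero one. The following sketch captures that convention in isolation; check_outcome_t and interpret_check_return are hypothetical names that are not part of tskit, and the value passed in stands for whatever tsk_table_collection_check_integrity() returned.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type; not part of tskit. */
typedef struct {
    bool ok;           /* false if the return value signalled an error */
    int64_t num_trees; /* tree count, or -1 when it was not requested */
} check_outcome_t;

/*
 * Interprets the value returned by an integrity check. Any negative
 * value is an error. A non-negative value is the number of trees only
 * if the caller requested the tree checks; otherwise it carries no
 * further information.
 */
static check_outcome_t
interpret_check_return(int64_t ret, bool trees_requested)
{
    check_outcome_t outcome;

    outcome.ok = ret >= 0;
    outcome.num_trees = (outcome.ok && trees_requested) ? ret : -1;
    return outcome;
}
```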
Individuals¶
-
struct
tsk_individual_t
¶ A single individual defined by a row in the individual table.
See the data model section for the definition of an individual and its properties.
Public Members
-
tsk_flags_t
flags
¶ Bitwise flags.
-
const double *
location
¶ Spatial location. The number of dimensions is defined by location_length.
-
tsk_size_t
location_length
¶ Number of spatial dimensions.
-
const char *
metadata
¶ Metadata.
-
tsk_size_t
metadata_length
¶ Size of the metadata in bytes.
-
struct
tsk_individual_table_t
¶ The individual table.
See the individual table definition for details of the columns in this table.
Public Members
-
tsk_size_t
num_rows
¶ The number of rows in this table.
-
tsk_size_t
location_length
¶ The total length of the location column.
-
tsk_size_t
metadata_length
¶ The total length of the metadata column.
-
tsk_flags_t *
flags
¶ The flags column.
-
double *
location
¶ The location column.
-
tsk_size_t *
location_offset
¶ The location_offset column.
-
char *
metadata
¶ The metadata column.
-
tsk_size_t *
metadata_offset
¶ The metadata_offset column.
-
char *
metadata_schema
¶ The metadata schema.
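The location and location_offset columns together encode a variable number of values per row. A minimal sketch of how one row might be read is given below; it uses plain arrays rather than tskit types, row_location is a hypothetical helper, size_t stands in for tsk_size_t, and the offset array is assumed to hold num_rows + 1 entries.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of tskit. Returns a pointer to the
 * location values of one row and stores their count in *length.
 * Assumes the offset array has num_rows + 1 entries, so row j occupies
 * location[offset[j]] up to (but not including) location[offset[j + 1]].
 */
static const double *
row_location(const double *location, const size_t *location_offset,
    size_t num_rows, size_t row, size_t *length)
{
    assert(row < num_rows);
    *length = location_offset[row + 1] - location_offset[row];
    return location + location_offset[row];
}
```

For example, with location = {0.5, 1.5, 2.0, 3.0, 4.0} and location_offset = {0, 2, 2, 5}, row 0 has two dimensions, row 1 has none and row 2 has three. The metadata and metadata_offset columns follow the same pattern with bytes in place of doubles.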
-
int
tsk_individual_table_init
(tsk_individual_table_t *self, tsk_flags_t options)¶ Initialises the table by allocating the internal memory.
This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to an uninitialised tsk_individual_table_t object.
options
: Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.
-
int
tsk_individual_table_free
(tsk_individual_table_t *self)¶ Free the internal memory for the specified table.
- Return
Always returns 0.
- Parameters
self
: A pointer to an initialised tsk_individual_table_t object.
-
tsk_id_t
tsk_individual_table_add_row
(tsk_individual_table_t *self, tsk_flags_t flags, const double *location, tsk_size_t location_length, const char *metadata, tsk_size_t metadata_length)¶ Adds a row to this individual table.
Add a new individual with the specified flags, location and metadata to the table. Copies of the location and metadata parameters are taken immediately. See the table definition for details of the columns in this table.
- Return
Return the ID of the newly added individual on success, or a negative value on failure.
- Parameters
self
: A pointer to a tsk_individual_table_t object.
flags
: The bitwise flags for the new individual.
location
: A pointer to a double array representing the spatial location of the new individual. Can be NULL if location_length is 0.
location_length
: The number of dimensions in the location. Note that this is the number of elements in the corresponding double array, not the number of bytes.
metadata
: The metadata to be associated with the new individual. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.
metadata_length
: The size of the metadata array in bytes.
-
int
tsk_individual_table_clear
(tsk_individual_table_t *self)¶ Clears this table, setting the number of rows to zero.
No memory is freed as a result of this operation; please use tsk_individual_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_individual_table_t object.
-
int
tsk_individual_table_truncate
(tsk_individual_table_t *self, tsk_size_t num_rows)¶ Truncates this table so that only the first num_rows are retained.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_individual_table_t object.
num_rows
: The number of rows to retain in the table.
-
bool
tsk_individual_table_equals
(const tsk_individual_table_t *self, const tsk_individual_table_t *other, tsk_flags_t options)¶ Returns true if the data in the specified table is identical to the data in this table.
Options
Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.
- TSK_CMP_IGNORE_METADATA
Do not include metadata or metadata schemas in the comparison.
- Return
Return true if the specified table is equal to this table.
- Parameters
self
: A pointer to a tsk_individual_table_t object.
other
: A pointer to a tsk_individual_table_t object.
options
: Bitwise comparison options.
-
int
tsk_individual_table_copy
(const tsk_individual_table_t *self, tsk_individual_table_t *dest, tsk_flags_t options)¶ Copies the state of this table into the specified destination.
By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.
Indexes that are present are also copied to the destination table.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_individual_table_t object.
dest
: A pointer to a tsk_individual_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised individual table. If not, it must be an uninitialised individual table.
options
: Bitwise option flags.
-
int
tsk_individual_table_get_row
(const tsk_individual_table_t *self, tsk_id_t index, tsk_individual_t *row)¶ Get the row at the specified index.
Updates the specified individual struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_individual_table_t object.
index
: The requested table row.
row
: A pointer to a tsk_individual_t struct that is updated to reflect the values in the specified row.
-
int
tsk_individual_table_set_metadata_schema
(tsk_individual_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)¶ Set the metadata schema.
Copies the metadata schema string to this table, replacing any existing schema.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_individual_table_t object.
metadata_schema
: A pointer to a char array.
metadata_schema_length
: The size of the metadata schema in bytes.
-
void
tsk_individual_table_print_state
(const tsk_individual_table_t *self, FILE *out)¶ Print out the state of this table to the specified stream.
This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.
- Parameters
self
: A pointer to a tsk_individual_table_t object.
out
: The stream to write the summary to.
Nodes¶
-
struct
tsk_node_t
¶ A single node defined by a row in the node table.
See the data model section for the definition of a node and its properties.
-
struct
tsk_node_table_t
¶ The node table.
See the node table definition for details of the columns in this table.
Public Members
-
tsk_size_t
num_rows
¶ The number of rows in this table.
-
tsk_size_t
metadata_length
¶ The total length of the metadata column.
-
tsk_flags_t *
flags
¶ The flags column.
-
double *
time
¶ The time column.
-
char *
metadata
¶ The metadata column.
-
tsk_size_t *
metadata_offset
¶ The metadata_offset column.
-
char *
metadata_schema
¶ The metadata schema.
-
int
tsk_node_table_init
(tsk_node_table_t *self, tsk_flags_t options)¶ Initialises the table by allocating the internal memory.
This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to an uninitialised tsk_node_table_t object.
options
: Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.
-
int
tsk_node_table_free
(tsk_node_table_t *self)¶ Free the internal memory for the specified table.
- Return
Always returns 0.
- Parameters
self
: A pointer to an initialised tsk_node_table_t object.
-
tsk_id_t
tsk_node_table_add_row
(tsk_node_table_t *self, tsk_flags_t flags, double time, tsk_id_t population, tsk_id_t individual, const char *metadata, tsk_size_t metadata_length)¶ Adds a row to this node table.
Add a new node with the specified flags, time, population, individual and metadata to the table. A copy of the metadata parameter is taken immediately. See the table definition for details of the columns in this table.
- Return
Return the ID of the newly added node on success, or a negative value on failure.
- Parameters
self
: A pointer to a tsk_node_table_t object.
flags
: The bitwise flags for the new node.
time
: The time for the new node.
population
: The population for the new node. Set to TSK_NULL if not known.
individual
: The individual for the new node. Set to TSK_NULL if not known.
metadata
: The metadata to be associated with the new node. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.
metadata_length
: The size of the metadata array in bytes.
-
int
tsk_node_table_clear
(tsk_node_table_t *self)¶ Clears this table, setting the number of rows to zero.
No memory is freed as a result of this operation; please use tsk_node_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_node_table_t object.
-
int
tsk_node_table_truncate
(tsk_node_table_t *self, tsk_size_t num_rows)¶ Truncates this table so that only the first num_rows are retained.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_node_table_t object.
num_rows
: The number of rows to retain in the table.
-
bool
tsk_node_table_equals
(const tsk_node_table_t *self, const tsk_node_table_t *other, tsk_flags_t options)¶ Returns true if the data in the specified table is identical to the data in this table.
Options
Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.
- TSK_CMP_IGNORE_METADATA
Do not include metadata or metadata schemas in the comparison.
- Return
Return true if the specified table is equal to this table.
- Parameters
self
: A pointer to a tsk_node_table_t object.
other
: A pointer to a tsk_node_table_t object.
options
: Bitwise comparison options.
-
int
tsk_node_table_copy
(const tsk_node_table_t *self, tsk_node_table_t *dest, tsk_flags_t options)¶ Copies the state of this table into the specified destination.
By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_node_table_t object.
dest
: A pointer to a tsk_node_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised node table. If not, it must be an uninitialised node table.
options
: Bitwise option flags.
-
int
tsk_node_table_get_row
(const tsk_node_table_t *self, tsk_id_t index, tsk_node_t *row)¶ Get the row at the specified index.
Updates the specified node struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_node_table_t object.
index
: The requested table row.
row
: A pointer to a tsk_node_t struct that is updated to reflect the values in the specified row.
-
int
tsk_node_table_set_metadata_schema
(tsk_node_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)¶ Set the metadata schema.
Copies the metadata schema string to this table, replacing any existing schema.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_node_table_t object.
metadata_schema
: A pointer to a char array.
metadata_schema_length
: The size of the metadata schema in bytes.
-
void
tsk_node_table_print_state
(const tsk_node_table_t *self, FILE *out)¶ Print out the state of this table to the specified stream.
This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.
- Parameters
self
: A pointer to a tsk_node_table_t object.
out
: The stream to write the summary to.
Edges¶
-
struct
tsk_edge_t
¶ A single edge defined by a row in the edge table.
See the data model section for the definition of an edge and its properties.
-
struct
tsk_edge_table_t
¶ The edge table.
See the edge table definition for details of the columns in this table.
Public Members
-
tsk_size_t
num_rows
¶ The number of rows in this table.
-
tsk_size_t
metadata_length
¶ The total length of the metadata column.
-
double *
left
¶ The left column.
-
double *
right
¶ The right column.
-
char *
metadata
¶ The metadata column.
-
tsk_size_t *
metadata_offset
¶ The metadata_offset column.
-
char *
metadata_schema
¶ The metadata schema.
-
tsk_flags_t
options
¶ Flags for this table.
-
int
tsk_edge_table_init
(tsk_edge_table_t *self, tsk_flags_t options)¶ Initialises the table by allocating the internal memory.
This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.
Options
Options can be specified by providing one or more of the following bitwise flags:
- TSK_NO_METADATA
Do not allocate space to store metadata in this table. Operations attempting to add non-empty metadata to the table will fail with error TSK_ERR_METADATA_DISABLED.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to an uninitialised tsk_edge_table_t object.
options
: Allocation time options.
-
int
tsk_edge_table_free
(tsk_edge_table_t *self)¶ Free the internal memory for the specified table.
- Return
Always returns 0.
- Parameters
self
: A pointer to an initialised tsk_edge_table_t object.
-
tsk_id_t
tsk_edge_table_add_row
(tsk_edge_table_t *self, double left, double right, tsk_id_t parent, tsk_id_t child, const char *metadata, tsk_size_t metadata_length)¶ Adds a row to this edge table.
Add a new edge with the specified left, right, parent, child and metadata to the table. See the table definition for details of the columns in this table.
- Return
Return the ID of the newly added edge on success, or a negative value on failure.
- Parameters
self
: A pointer to a tsk_edge_table_t object.
left
: The left coordinate for the new edge.
right
: The right coordinate for the new edge.
parent
: The parent node for the new edge.
child
: The child node for the new edge.
metadata
: The metadata to be associated with the new edge. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.
metadata_length
: The size of the metadata array in bytes.
-
int
tsk_edge_table_clear
(tsk_edge_table_t *self)¶ Clears this table, setting the number of rows to zero.
No memory is freed as a result of this operation; please use tsk_edge_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_edge_table_t object.
-
int
tsk_edge_table_truncate
(tsk_edge_table_t *self, tsk_size_t num_rows)¶ Truncates this table so that only the first num_rows are retained.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_edge_table_t object.
num_rows
: The number of rows to retain in the table.
-
bool
tsk_edge_table_equals
(const tsk_edge_table_t *self, const tsk_edge_table_t *other, tsk_flags_t options)¶ Returns true if the data in the specified table is identical to the data in this table.
Options
Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.
- TSK_CMP_IGNORE_METADATA
Do not include metadata or metadata schemas in the comparison.
- Return
Return true if the specified table is equal to this table.
- Parameters
self
: A pointer to a tsk_edge_table_t object.
other
: A pointer to a tsk_edge_table_t object.
options
: Bitwise comparison options.
-
int
tsk_edge_table_copy
(const tsk_edge_table_t *self, tsk_edge_table_t *dest, tsk_flags_t options)¶ Copies the state of this table into the specified destination.
By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_edge_table_t object.
dest
: A pointer to a tsk_edge_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised edge table. If not, it must be an uninitialised edge table.
options
: Bitwise option flags.
-
int
tsk_edge_table_get_row
(const tsk_edge_table_t *self, tsk_id_t index, tsk_edge_t *row)¶ Get the row at the specified index.
Updates the specified edge struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_edge_table_t object.
index
: The requested table row.
row
: A pointer to a tsk_edge_t struct that is updated to reflect the values in the specified row.
-
int
tsk_edge_table_set_metadata_schema
(tsk_edge_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)¶ Set the metadata schema.
Copies the metadata schema string to this table, replacing any existing schema.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_edge_table_t object.
metadata_schema
: A pointer to a char array.
metadata_schema_length
: The size of the metadata schema in bytes.
-
void
tsk_edge_table_print_state
(const tsk_edge_table_t *self, FILE *out)¶ Print out the state of this table to the specified stream.
This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.
- Parameters
self
: A pointer to a tsk_edge_table_t object.
out
: The stream to write the summary to.
Migrations¶
-
struct
tsk_migration_t
¶ A single migration defined by a row in the migration table.
See the data model section for the definition of a migration and its properties.
-
struct
tsk_migration_table_t
¶ The migration table.
See the migration table definition for details of the columns in this table.
Public Members
-
tsk_size_t
num_rows
¶ The number of rows in this table.
-
tsk_size_t
metadata_length
¶ The total length of the metadata column.
-
double *
left
¶ The left column.
-
double *
right
¶ The right column.
-
double *
time
¶ The time column.
-
char *
metadata
¶ The metadata column.
-
tsk_size_t *
metadata_offset
¶ The metadata_offset column.
-
char *
metadata_schema
¶ The metadata schema.
-
int
tsk_migration_table_init
(tsk_migration_table_t *self, tsk_flags_t options)¶ Initialises the table by allocating the internal memory.
This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to an uninitialised tsk_migration_table_t object.
options
: Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.
-
int
tsk_migration_table_free
(tsk_migration_table_t *self)¶ Free the internal memory for the specified table.
- Return
Always returns 0.
- Parameters
self
: A pointer to an initialised tsk_migration_table_t object.
-
tsk_id_t
tsk_migration_table_add_row
(tsk_migration_table_t *self, double left, double right, tsk_id_t node, tsk_id_t source, tsk_id_t dest, double time, const char *metadata, tsk_size_t metadata_length)¶ Adds a row to this migration table.
Add a new migration with the specified left, right, node, source, dest, time and metadata to the table. See the table definition for details of the columns in this table.
- Return
Return the ID of the newly added migration on success, or a negative value on failure.
- Parameters
self
: A pointer to a tsk_migration_table_t object.
left
: The left coordinate for the new migration.
right
: The right coordinate for the new migration.
node
: The node ID for the new migration.
source
: The source population ID for the new migration.
dest
: The destination population ID for the new migration.
time
: The time for the new migration.
metadata
: The metadata to be associated with the new migration. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.
metadata_length
: The size of the metadata array in bytes.
-
int
tsk_migration_table_clear
(tsk_migration_table_t *self)¶ Clears this table, setting the number of rows to zero.
No memory is freed as a result of this operation; please use tsk_migration_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_migration_table_t object.
-
int
tsk_migration_table_truncate
(tsk_migration_table_t *self, tsk_size_t num_rows)¶ Truncates this table so that only the first num_rows are retained.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_migration_table_t object.
num_rows
: The number of rows to retain in the table.
-
bool
tsk_migration_table_equals
(const tsk_migration_table_t *self, const tsk_migration_table_t *other, tsk_flags_t options)¶ Returns true if the data in the specified table is identical to the data in this table.
Options
Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.
- TSK_CMP_IGNORE_METADATA
Do not include metadata or metadata schemas in the comparison.
- Return
Return true if the specified table is equal to this table.
- Parameters
self
: A pointer to a tsk_migration_table_t object.
other
: A pointer to a tsk_migration_table_t object.
options
: Bitwise comparison options.
-
int
tsk_migration_table_copy
(const tsk_migration_table_t *self, tsk_migration_table_t *dest, tsk_flags_t options)¶ Copies the state of this table into the specified destination.
By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_migration_table_t object.
dest
: A pointer to a tsk_migration_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised migration table. If not, it must be an uninitialised migration table.
options
: Bitwise option flags.
-
int
tsk_migration_table_get_row
(const tsk_migration_table_t *self, tsk_id_t index, tsk_migration_t *row)¶ Get the row at the specified index.
Updates the specified migration struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_migration_table_t object.
index
: The requested table row.
row
: A pointer to a tsk_migration_t struct that is updated to reflect the values in the specified row.
-
int
tsk_migration_table_set_metadata_schema
(tsk_migration_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)¶ Set the metadata schema.
Copies the metadata schema string to this table, replacing any existing schema.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_migration_table_t object.
metadata_schema
: A pointer to a char array.
metadata_schema_length
: The size of the metadata schema in bytes.
-
void
tsk_migration_table_print_state
(const tsk_migration_table_t *self, FILE *out)¶ Print out the state of this table to the specified stream.
This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.
- Parameters
self
: A pointer to a tsk_migration_table_t object.
out
: The stream to write the summary to.
Sites¶
-
struct
tsk_site_t
¶ A single site defined by a row in the site table.
See the data model section for the definition of a site and its properties.
Public Members
-
double
position
¶ Position coordinate.
-
const char *
ancestral_state
¶ Ancestral state.
-
tsk_size_t
ancestral_state_length
¶ Ancestral state length in bytes.
-
const char *
metadata
¶ Metadata.
-
tsk_size_t
metadata_length
¶ Metadata length in bytes.
-
struct
tsk_site_table_t
¶ The site table.
See the site table definition for details of the columns in this table.
Public Members
-
tsk_size_t
num_rows
¶ The number of rows in this table.
-
tsk_size_t
metadata_length
¶ The total length of the metadata column.
-
double *
position
¶ The position column.
-
char *
ancestral_state
¶ The ancestral_state column.
-
tsk_size_t *
ancestral_state_offset
¶ The ancestral_state_offset column.
-
char *
metadata
¶ The metadata column.
-
tsk_size_t *
metadata_offset
¶ The metadata_offset column.
-
char *
metadata_schema
¶ The metadata schema.
-
int
tsk_site_table_init
(tsk_site_table_t *self, tsk_flags_t options)¶ Initialises the table by allocating the internal memory.
This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to an uninitialised tsk_site_table_t object.
options
: Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.
-
int
tsk_site_table_free
(tsk_site_table_t *self)¶ Free the internal memory for the specified table.
- Return
Always returns 0.
- Parameters
self
: A pointer to an initialised tsk_site_table_t object.
-
tsk_id_t
tsk_site_table_add_row
(tsk_site_table_t *self, double position, const char *ancestral_state, tsk_size_t ancestral_state_length, const char *metadata, tsk_size_t metadata_length)¶ Adds a row to this site table.
Add a new site with the specified position, ancestral_state and metadata to the table. Copies of ancestral_state and metadata are immediately taken. See the table definition for details of the columns in this table.
- Return
Return the ID of the newly added site on success, or a negative value on failure.
- Parameters
self
: A pointer to a tsk_site_table_t object.
position
: The position coordinate for the new site.
ancestral_state
: The ancestral_state for the new site.
ancestral_state_length
: The length of the ancestral_state in bytes.
metadata
: The metadata to be associated with the new site. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.
metadata_length
: The size of the metadata array in bytes.
-
int
tsk_site_table_clear
(tsk_site_table_t *self)¶ Clears this table, setting the number of rows to zero.
No memory is freed as a result of this operation; please use tsk_site_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_site_table_t object.
-
int
tsk_site_table_truncate
(tsk_site_table_t *self, tsk_size_t num_rows)¶ Truncates this table so that only the first num_rows are retained.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_site_table_t object.
num_rows
: The number of rows to retain in the table.
-
bool
tsk_site_table_equals
(const tsk_site_table_t *self, const tsk_site_table_t *other, tsk_flags_t options)¶ Returns true if the data in the specified table is identical to the data in this table.
Options
Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.
- TSK_CMP_IGNORE_METADATA
Do not include metadata or metadata schemas in the comparison.
- Return
Return true if the specified table is equal to this table.
- Parameters
self
: A pointer to a tsk_site_table_t object.
other
: A pointer to a tsk_site_table_t object.
options
: Bitwise comparison options.
-
int
tsk_site_table_copy
(const tsk_site_table_t *self, tsk_site_table_t *dest, tsk_flags_t options)¶ Copies the state of this table into the specified destination.
By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_site_table_t object.
dest
: A pointer to a tsk_site_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised site table. If not, it must be an uninitialised site table.
options
: Bitwise option flags.
-
int
tsk_site_table_get_row
(const tsk_site_table_t *self, tsk_id_t index, tsk_site_t *row)¶ Get the row at the specified index.
Updates the specified site struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_site_table_t object.
index
: The requested table row.
row
: A pointer to a tsk_site_t struct that is updated to reflect the values in the specified row.
-
int
tsk_site_table_set_metadata_schema
(tsk_site_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)¶ Set the metadata schema.
Copies the metadata schema string to this table, replacing any existing schema.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_site_table_t object.
metadata_schema
: A pointer to a char array.
metadata_schema_length
: The size of the metadata schema in bytes.
-
void
tsk_site_table_print_state
(const tsk_site_table_t *self, FILE *out)¶ Print out the state of this table to the specified stream.
This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.
- Parameters
self
: A pointer to a tsk_site_table_t object.
out
: The stream to write the summary to.
Mutations¶
-
struct
tsk_mutation_t
¶ A single mutation defined by a row in the mutation table.
See the data model section for the definition of a mutation and its properties.
Public Members
-
double
time
¶ Mutation time.
-
const char *
derived_state
¶ Derived state.
-
tsk_size_t
derived_state_length
¶ Size of the derived state in bytes.
-
const char *
metadata
¶ Metadata.
-
tsk_size_t
metadata_length
¶ Size of the metadata in bytes.
-
struct
tsk_mutation_table_t
¶ The mutation table.
See the mutation table definition for details of the columns in this table.
Public Members
-
tsk_size_t
num_rows
¶ The number of rows in this table.
-
tsk_size_t
metadata_length
¶ The total length of the metadata column.
-
double *
time
¶ The time column.
-
char *
derived_state
¶ The derived_state column.
-
tsk_size_t *
derived_state_offset
¶ The derived_state_offset column.
-
char *
metadata
¶ The metadata column.
-
tsk_size_t *
metadata_offset
¶ The metadata_offset column.
-
char *
metadata_schema
¶ The metadata schema.
-
int
tsk_mutation_table_init
(tsk_mutation_table_t *self, tsk_flags_t options)¶ Initialises the table by allocating the internal memory.
This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to an uninitialised tsk_mutation_table_t object.
options
: Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.
-
int
tsk_mutation_table_free
(tsk_mutation_table_t *self)¶ Free the internal memory for the specified table.
- Return
Always returns 0.
- Parameters
self
: A pointer to an initialised tsk_mutation_table_t object.
-
tsk_id_t
tsk_mutation_table_add_row
(tsk_mutation_table_t *self, tsk_id_t site, tsk_id_t node, tsk_id_t parent, double time, const char *derived_state, tsk_size_t derived_state_length, const char *metadata, tsk_size_t metadata_length)¶ Adds a row to this mutation table.
Add a new mutation with the specified site, node, parent, time, derived_state and metadata to the table. Copies of derived_state and metadata are immediately taken. See the table definition for details of the columns in this table.
- Return
Return the ID of the newly added mutation on success, or a negative value on failure.
- Parameters
self
: A pointer to a tsk_mutation_table_t object.
site
: The site ID for the new mutation.
node
: The ID of the node this mutation occurs over.
parent
: The ID of the parent mutation.
time
: The time of the mutation.
derived_state
: The derived_state for the new mutation.
derived_state_length
: The length of the derived_state in bytes.
metadata
: The metadata to be associated with the new mutation. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.
metadata_length
: The size of the metadata array in bytes.
-
int
tsk_mutation_table_clear
(tsk_mutation_table_t *self)¶ Clears this table, setting the number of rows to zero.
No memory is freed as a result of this operation; please use tsk_mutation_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_mutation_table_t object.
-
int
tsk_mutation_table_truncate
(tsk_mutation_table_t *self, tsk_size_t num_rows)¶ Truncates this table so that only the first num_rows are retained.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_mutation_table_t object.
num_rows
: The number of rows to retain in the table.
-
bool
tsk_mutation_table_equals
(const tsk_mutation_table_t *self, const tsk_mutation_table_t *other, tsk_flags_t options)¶ Returns true if the data in the specified table is identical to the data in this table.
Options
Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.
- TSK_CMP_IGNORE_METADATA
Do not include metadata or metadata schemas in the comparison.
- Return
Return true if the specified table is equal to this table.
- Parameters
self
: A pointer to a tsk_mutation_table_t object.
other
: A pointer to a tsk_mutation_table_t object.
options
: Bitwise comparison options.
-
int
tsk_mutation_table_copy
(const tsk_mutation_table_t *self, tsk_mutation_table_t *dest, tsk_flags_t options)¶ Copies the state of this table into the specified destination.
By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_mutation_table_t object.
dest
: A pointer to a tsk_mutation_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised mutation table. If not, it must be an uninitialised mutation table.
options
: Bitwise option flags.
-
int
tsk_mutation_table_get_row
(const tsk_mutation_table_t *self, tsk_id_t index, tsk_mutation_t *row)¶ Get the row at the specified index.
Updates the specified mutation struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_mutation_table_t object.
index
: The requested table row.
row
: A pointer to a tsk_mutation_t struct that is updated to reflect the values in the specified row.
-
int
tsk_mutation_table_set_metadata_schema
(tsk_mutation_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)¶ Set the metadata schema.
Copies the metadata schema string to this table, replacing any existing schema.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_mutation_table_t object.
metadata_schema
: A pointer to a char array.
metadata_schema_length
: The size of the metadata schema in bytes.
-
void
tsk_mutation_table_print_state
(const tsk_mutation_table_t *self, FILE *out)¶ Print out the state of this table to the specified stream.
This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.
- Parameters
self
: A pointer to a tsk_mutation_table_t object.
out
: The stream to write the summary to.
Populations¶
-
struct
tsk_population_t
¶ A single population defined by a row in the population table.
See the data model section for the definition of a population and its properties.
Public Members
-
const char *
metadata
¶ Metadata.
-
tsk_size_t
metadata_length
¶ Metadata length in bytes.
-
struct
tsk_population_table_t
¶ The population table.
See the population table definition for details of the columns in this table.
Public Members
-
tsk_size_t
num_rows
¶ The number of rows in this table.
-
tsk_size_t
metadata_length
¶ The total length of the metadata column.
-
char *
metadata
¶ The metadata column.
-
tsk_size_t *
metadata_offset
¶ The metadata_offset column.
-
char *
metadata_schema
¶ The metadata schema.
-
int
tsk_population_table_init
(tsk_population_table_t *self, tsk_flags_t options)¶ Initialises the table by allocating the internal memory.
This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to an uninitialised tsk_population_table_t object.
options
: Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.
-
int
tsk_population_table_free
(tsk_population_table_t *self)¶ Free the internal memory for the specified table.
- Return
Always returns 0.
- Parameters
self
: A pointer to an initialised tsk_population_table_t object.
-
tsk_id_t
tsk_population_table_add_row
(tsk_population_table_t *self, const char *metadata, tsk_size_t metadata_length)¶ Adds a row to this population table.
Add a new population with the specified metadata to the table. A copy of the metadata is immediately taken. See the table definition for details of the columns in this table.
- Return
Return the ID of the newly added population on success, or a negative value on failure.
- Parameters
self
: A pointer to a tsk_population_table_t object.
metadata
: The metadata to be associated with the new population. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.
metadata_length
: The size of the metadata array in bytes.
-
int
tsk_population_table_clear
(tsk_population_table_t *self)¶ Clears this table, setting the number of rows to zero.
No memory is freed as a result of this operation; please use tsk_population_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_population_table_t object.
-
int
tsk_population_table_truncate
(tsk_population_table_t *self, tsk_size_t num_rows)¶ Truncates this table so that only the first num_rows are retained.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_population_table_t object.
num_rows
: The number of rows to retain in the table.
-
bool
tsk_population_table_equals
(const tsk_population_table_t *self, const tsk_population_table_t *other, tsk_flags_t options)¶ Returns true if the data in the specified table is identical to the data in this table.
Options
Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.
- TSK_CMP_IGNORE_METADATA
Do not include metadata in the comparison. Note that as metadata is the only column in the population table, two population tables with the same number of rows are considered equal when this flag is specified.
- Return
Return true if the specified table is equal to this table.
- Parameters
self
: A pointer to a tsk_population_table_t object.
other
: A pointer to a tsk_population_table_t object.
options
: Bitwise comparison options.
-
int
tsk_population_table_copy
(const tsk_population_table_t *self, tsk_population_table_t *dest, tsk_flags_t options)¶ Copies the state of this table into the specified destination.
By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_population_table_t object.
dest
: A pointer to a tsk_population_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised population table. If not, it must be an uninitialised population table.
options
: Bitwise option flags.
-
int
tsk_population_table_get_row
(const tsk_population_table_t *self, tsk_id_t index, tsk_population_t *row)¶ Get the row at the specified index.
Updates the specified population struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_population_table_t object.
index
: The requested table row.
row
: A pointer to a tsk_population_t struct that is updated to reflect the values in the specified row.
-
int
tsk_population_table_set_metadata_schema
(tsk_population_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)¶ Set the metadata schema.
Copies the metadata schema string to this table, replacing any existing schema.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_population_table_t object.
metadata_schema
: A pointer to a char array.
metadata_schema_length
: The size of the metadata schema in bytes.
-
void
tsk_population_table_print_state
(const tsk_population_table_t *self, FILE *out)¶ Print out the state of this table to the specified stream.
This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.
- Parameters
self
: A pointer to a tsk_population_table_t object.
out
: The stream to write the summary to.
Provenances¶
-
struct
tsk_provenance_t
¶ A single provenance defined by a row in the provenance table.
See the data model section for the definition of a provenance object and its properties. See the Provenance section for more information on how provenance records should be structured.
Public Members
-
const char *
timestamp
¶ The timestamp.
-
tsk_size_t
timestamp_length
¶ The timestamp length in bytes.
-
const char *
record
¶ The record.
-
tsk_size_t
record_length
¶ The record length in bytes.
-
const char *
-
struct
tsk_provenance_table_t
¶ The provenance table.
See the provenance table definition for details of the columns in this table.
Public Members
-
tsk_size_t
num_rows
¶ The number of rows in this table.
-
tsk_size_t
timestamp_length
¶ The total length of the timestamp column.
-
tsk_size_t
record_length
¶ The total length of the record column.
-
char *
timestamp
¶ The timestamp column.
-
tsk_size_t *
timestamp_offset
¶ The timestamp_offset column.
-
char *
record
¶ The record column.
-
tsk_size_t *
record_offset
¶ The record_offset column.
-
int
tsk_provenance_table_init
(tsk_provenance_table_t *self, tsk_flags_t options)¶ Initialises the table by allocating the internal memory.
This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to an uninitialised tsk_provenance_table_t object.
options
: Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.
-
int
tsk_provenance_table_free
(tsk_provenance_table_t *self)¶ Free the internal memory for the specified table.
- Return
Always returns 0.
- Parameters
self
: A pointer to an initialised tsk_provenance_table_t object.
-
tsk_id_t
tsk_provenance_table_add_row
(tsk_provenance_table_t *self, const char *timestamp, tsk_size_t timestamp_length, const char *record, tsk_size_t record_length)¶ Adds a row to this provenance table.
Add a new provenance with the specified timestamp and record to the table. Copies of the timestamp and record are immediately taken. See the table definition for details of the columns in this table.
- Return
Return the ID of the newly added provenance on success, or a negative value on failure.
- Parameters
self
: A pointer to a tsk_provenance_table_t object.
timestamp
: The timestamp to be associated with the new provenance. This is a pointer to arbitrary memory. Can be NULL if timestamp_length is 0.
timestamp_length
: The size of the timestamp array in bytes.
record
: The record to be associated with the new provenance. This is a pointer to arbitrary memory. Can be NULL if record_length is 0.
record_length
: The size of the record array in bytes.
-
int
tsk_provenance_table_clear
(tsk_provenance_table_t *self)¶ Clears this table, setting the number of rows to zero.
No memory is freed as a result of this operation; please use tsk_provenance_table_free() to free the table’s internal resources.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_provenance_table_t object.
-
int
tsk_provenance_table_truncate
(tsk_provenance_table_t *self, tsk_size_t num_rows)¶ Truncates this table so that only the first num_rows are retained.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_provenance_table_t object.
num_rows
: The number of rows to retain in the table.
-
bool
tsk_provenance_table_equals
(const tsk_provenance_table_t *self, const tsk_provenance_table_t *other, tsk_flags_t options)¶ Returns true if the data in the specified table is identical to the data in this table.
Options
Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns.
- TSK_CMP_IGNORE_TIMESTAMPS
Do not include the timestamp column when comparing provenance tables.
- Return
Return true if the specified table is equal to this table.
- Parameters
self
: A pointer to a tsk_provenance_table_t object.
other
: A pointer to a tsk_provenance_table_t object.
options
: Bitwise comparison options.
-
int
tsk_provenance_table_copy
(const tsk_provenance_table_t *self, tsk_provenance_table_t *dest, tsk_flags_t options)¶ Copies the state of this table into the specified destination.
By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_provenance_table_t object.
dest
: A pointer to a tsk_provenance_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised provenance table. If not, it must be an uninitialised provenance table.
options
: Bitwise option flags.
-
int
tsk_provenance_table_get_row
(const tsk_provenance_table_t *self, tsk_id_t index, tsk_provenance_t *row)¶ Get the row at the specified index.
Updates the specified provenance struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to a tsk_provenance_table_t object.
index
: The requested table row.
row
: A pointer to a tsk_provenance_t struct that is updated to reflect the values in the specified row.
-
void
tsk_provenance_table_print_state
(const tsk_provenance_table_t *self, FILE *out)¶ Print out the state of this table to the specified stream.
This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.
- Parameters
self
: A pointer to a tsk_provenance_table_t object.
out
: The stream to write the summary to.
Table indexes¶
Along with the tree sequence ordering requirements, the table indexes allow us to take a table collection and efficiently operate on the trees defined within it. This section defines the rules for safely operating on table indexes and describes their life-cycle.
The edge index used for tree generation consists of two arrays,
each holding N
edge IDs (where N
is the size of the edge
table). When the index is computed using
tsk_table_collection_build_index()
, we store the current size
of the edge table along with the two arrays of edge IDs. The
function tsk_table_collection_has_index()
then returns true
iff (a) both of these arrays are not NULL and (b) the stored
number of edges is the same as the current size of the edge table.
Updating the edge table does not automatically invalidate the indexes.
Thus, if we call tsk_edge_table_clear()
on an edge table
which has an index, this index will still exist. However, it will
not be considered a valid index by
tsk_table_collection_has_index()
because of the size mismatch.
The same applies to functions that increase the size of the table.
Note that it is therefore possible for
tsk_table_collection_has_index()
to return true even though the index
is not actually valid: for example, if the user has manipulated the
node and edge tables to describe a different topology that happens
to have the same number of edges. The behaviour of methods that
use the indexes is undefined in this case.
Thus, if you are manipulating an existing table collection that may
be indexed, it is always recommended to call
tsk_table_collection_drop_index()
first.
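The validity rule described above can be sketched as a small toy model. This is not tskit's implementation: toy_index_t and toy_has_index are hypothetical names, the field names are illustrative, and plain int stands in for tsk_id_t. The sketch only encodes the two stated conditions, namely that both arrays are non-NULL and that the stored edge count matches the current size of the edge table.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the index state described above (not tskit's struct). */
typedef struct {
    int *edge_insertion_order; /* N edge IDs, or NULL if there is no index */
    int *edge_removal_order;   /* N edge IDs, or NULL if there is no index */
    size_t num_edges;          /* edge table size when the index was built */
} toy_index_t;

/* Returns true iff (a) both arrays are set and (b) the stored number of
 * edges equals the current number of edges. */
static bool
toy_has_index(const toy_index_t *index, size_t current_num_edges)
{
    return index->edge_insertion_order != NULL
           && index->edge_removal_order != NULL
           && index->num_edges == current_num_edges;
}
```

Under this model, clearing or growing the edge table makes the check fail because of the size mismatch, whereas replacing the edges with a different set of the same size would still pass. That second case is the one in which the index must be dropped explicitly.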
Tree sequences¶
Warning
This part of the API is more preliminary and may be subject to change.
-
struct
tsk_treeseq_t
¶ The tree sequence object.
-
int
tsk_treeseq_init
(tsk_treeseq_t *self, const tsk_table_collection_t *tables, tsk_flags_t options)¶
-
int
tsk_treeseq_load
(tsk_treeseq_t *self, const char *filename, tsk_flags_t options)¶
-
int
tsk_treeseq_loadf
(tsk_treeseq_t *self, FILE *file, tsk_flags_t options)¶
-
int
tsk_treeseq_dump
(tsk_treeseq_t *self, const char *filename, tsk_flags_t options)¶
-
int
tsk_treeseq_dumpf
(tsk_treeseq_t *self, FILE *file, tsk_flags_t options)¶
-
int
tsk_treeseq_copy_tables
(const tsk_treeseq_t *self, tsk_table_collection_t *tables, tsk_flags_t options)¶
-
int
tsk_treeseq_free
(tsk_treeseq_t *self)¶
-
void
tsk_treeseq_print_state
(const tsk_treeseq_t *self, FILE *out)¶
Trees¶
Warning
This part of the API is more preliminary and may be subject to change.
-
struct
tsk_tree_t
¶ A single tree in a tree sequence.
A tsk_tree_t object has two basic functions:
1. Represent the state of a single tree in a tree sequence;
2. Provide methods to transform this state into different trees in the sequence.
The state of a single tree in the tree sequence is represented using the quintuply linked encoding: please see the data model section for details on how this works. The left-to-right ordering of nodes in this encoding is arbitrary, and may change depending on the order in which trees are accessed within the sequence. Please see the Tree traversals examples for recommended usage.
On initialisation, a tree is in a “null” state: each sample is a root and there are no edges. We must call one of the ‘seeking’ methods to make the state of the tree object correspond to a particular tree in the sequence. Please see the Tree iteration examples for recommended usage.
Public Members
-
const tsk_treeseq_t *
tree_sequence
¶ The parent tree sequence.
-
tsk_id_t
left_root
¶ The leftmost root in the tree. Roots are siblings, and other roots can be found using right_sib.
-
tsk_id_t *
parent
¶ The parent of node u is parent[u]. Equal to TSK_NULL if node u is a root or is not a node in the current tree.
-
tsk_id_t *
left_child
¶ The leftmost child of node u is left_child[u]. Equal to TSK_NULL if node u is a leaf or is not a node in the current tree.
-
tsk_id_t *
right_child
¶ The rightmost child of node u is right_child[u]. Equal to TSK_NULL if node u is a leaf or is not a node in the current tree.
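As a rough illustration of how these linked arrays are walked, the sketch below uses hand-built arrays instead of a real tsk_tree_t. The helper names are hypothetical, plain int stands in for tsk_id_t, and TOY_NULL stands in for TSK_NULL (which tskit defines as -1). It relies on a right_sib array, mentioned above for roots, to step from one sibling to the next.

```c
#define TOY_NULL (-1) /* stands in for TSK_NULL */

/* Count the children of node u: start at left_child[u], then follow
 * right_sib until the end of the sibling list. */
static int
toy_num_children(const int *left_child, const int *right_sib, int u)
{
    int v;
    int n = 0;

    for (v = left_child[u]; v != TOY_NULL; v = right_sib[v]) {
        n++;
    }
    return n;
}

/* Number of edges between node u and its root, following parent[]. */
static int
toy_depth(const int *parent, int u)
{
    int depth = 0;

    while (parent[u] != TOY_NULL) {
        u = parent[u];
        depth++;
    }
    return depth;
}

/* Count the roots: start at left_root and follow right_sib. */
static int
toy_num_roots(const int *right_sib, int left_root)
{
    int u;
    int n = 0;

    for (u = left_root; u != TOY_NULL; u = right_sib[u]) {
        n++;
    }
    return n;
}
```

With a real tree the same loops would read the arrays from the tsk_tree_t struct after a seeking method such as tsk_tree_first() has been called. Because the left-to-right ordering is arbitrary, code should not depend on the order in which children are visited.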
-
int
tsk_tree_init
(tsk_tree_t *self, const tsk_treeseq_t *tree_sequence, tsk_flags_t options)¶
-
int
tsk_tree_free
(tsk_tree_t *self)¶
-
tsk_id_t
tsk_tree_get_index
(const tsk_tree_t *self)¶
-
tsk_size_t
tsk_tree_get_num_roots
(const tsk_tree_t *self)¶
-
int
tsk_tree_first
(tsk_tree_t *self)¶
-
int
tsk_tree_last
(tsk_tree_t *self)¶
-
int
tsk_tree_next
(tsk_tree_t *self)¶
-
int
tsk_tree_prev
(tsk_tree_t *self)¶
-
int
tsk_tree_clear
(tsk_tree_t *self)¶
-
void
tsk_tree_print_state
(const tsk_tree_t *self, FILE *out)¶
Low-level sorting¶
In some highly performance sensitive cases it can be useful to have more control over the process of sorting tables. This low-level API allows a user to provide their own edge sorting function. This can be useful, for example, to use parallel sorting algorithms, or to take advantage of the more efficient sorting procedures available in C++. It is the user’s responsibility to ensure that the edge sorting requirements are fulfilled by this function.
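To give a feel for what a replacement edge sort has to do, here is a minimal sketch that sorts a toy array of edge records with qsort. It is not a drop-in sort_edges callback: a real callback receives the sorter object and works on the columns of the edge table, whereas toy_edge_t, toy_cmp_edge and toy_sort_edges are hypothetical names operating on an array of structs. The sort key used here (parent time, then parent ID, then child ID, then left coordinate) is an assumption based on the edge sorting requirements in the tskit data model and should be checked against that section.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy edge record; parent_time is the time of the parent node. */
typedef struct {
    double parent_time;
    int parent;
    int child;
    double left;
} toy_edge_t;

/* Order by parent time, then parent ID, then child ID, then left. */
static int
toy_cmp_edge(const void *a, const void *b)
{
    const toy_edge_t *x = (const toy_edge_t *) a;
    const toy_edge_t *y = (const toy_edge_t *) b;
    int ret = (x->parent_time > y->parent_time) - (x->parent_time < y->parent_time);

    if (ret == 0) {
        ret = (x->parent > y->parent) - (x->parent < y->parent);
    }
    if (ret == 0) {
        ret = (x->child > y->child) - (x->child < y->child);
    }
    if (ret == 0) {
        ret = (x->left > y->left) - (x->left < y->left);
    }
    return ret;
}

/* Sort rows start..num_edges-1 and leave earlier rows untouched, mirroring
 * the start offset that is passed to the edge sorting function.
 * Assumes start <= num_edges. */
static void
toy_sort_edges(toy_edge_t *edges, size_t num_edges, size_t start)
{
    qsort(edges + start, num_edges - start, sizeof(*edges), toy_cmp_edge);
}
```

A parallel or C++ std::sort based implementation would keep the same comparison and only change how the range is sorted.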
Todo
Create an idiomatic C++11 example where we load a table collection file from argv[1], and sort the edges using std::sort, based on the example in tests/test_minimal_cpp.cpp. We can include this in the examples below, and link to it here.
-
struct
_tsk_table_sorter_t
¶ Low-level table sorting method.
Public Members
-
tsk_table_collection_t *
tables
¶ The input tables that are being sorted.
-
int (*
sort_edges
)(struct _tsk_table_sorter_t *self, tsk_size_t start)¶ The edge sorting function. If set to NULL, edges are not sorted.
-
void *
user_data
¶ An opaque pointer for use by client code.
-
int
tsk_table_sorter_init
(struct _tsk_table_sorter_t *self, tsk_table_collection_t *tables, tsk_flags_t options)¶ Initialises the memory for the sorter object.
This must be called before any operations are performed on the table sorter and initialises all fields. The sort_edges function is set to the default method using qsort. The user_data field is set to NULL. This method supports the same options as tsk_table_collection_sort().
- Return
Return 0 on success or a negative value on failure.
- Parameters
self
: A pointer to an uninitialised tsk_table_sorter_t object.
tables
: The table collection to sort.
options
: Sorting options.
-
int
tsk_table_sorter_run
(struct _tsk_table_sorter_t *self, const tsk_bookmark_t *start)¶ Runs the sort using the configured functions.
Runs the sorting process:
Drop the table indexes.
If the
sort_edges
function pointer is not NULL, run it. The first parameter to the called function will be a pointer to this table_sorter_t object. The second parameter will be the valuestart.edges
. This specifies the offset at which sorting should start in the edge table. This offset is guaranteed to be within the bounds of the edge table.Sort the site table, building the mapping between site IDs in the current and sorted tables.
Sort the mutation table.
If an error occurs during the execution of a user-supplied sorting function a non-zero value must be returned. This value will then be returned by
tsk_table_sorter_run
. The error return value should be chosen to avoid conflicts with tskit error codes.See
tsk_table_collection_sort()
for details on thestart
parameter.- Return
Return 0 on success or a negative value on failure.
- Parameters
self: A pointer to a tsk_table_sorter_t object.
start: The position in the tables at which sorting starts.
-
int
tsk_table_sorter_free
(struct _tsk_table_sorter_t *self)¶ Free the internal memory for the specified table sorter.
- Return
Always returns 0.
- Parameters
self
: A pointer to an initialised tsk_table_sorter_t object.
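A sort_edges callback has to order the edge rows from the given start offset onwards and leave the earlier rows alone. The sketch below shows that idea on a plain array so that it compiles without the tskit headers. The names edge_sort_rec_t, cmp_edge and sort_edges_from are hypothetical, and the sort key (parent time, then parent, child and left) is an assumption made for illustration, not a statement of what the library's default function does. A real callback would operate on the edge table reached through the sorter's tables field and return a non-zero value on error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one edge row plus the time of its parent node.
 * This is not a tskit type. */
typedef struct {
    double parent_time;
    int parent;
    int child;
    double left;
    double right;
} edge_sort_rec_t;

/* Assumed key: parent time, then parent ID, then child ID, then left. */
static int
cmp_edge(const void *a, const void *b)
{
    const edge_sort_rec_t *ea = (const edge_sort_rec_t *) a;
    const edge_sort_rec_t *eb = (const edge_sort_rec_t *) b;
    int ret = (ea->parent_time > eb->parent_time) - (ea->parent_time < eb->parent_time);

    if (ret == 0) {
        ret = (ea->parent > eb->parent) - (ea->parent < eb->parent);
    }
    if (ret == 0) {
        ret = (ea->child > eb->child) - (ea->child < eb->child);
    }
    if (ret == 0) {
        ret = (ea->left > eb->left) - (ea->left < eb->left);
    }
    return ret;
}

/* Sort edges[start..num_edges) in place; edges[0..start) are not touched.
 * Returns 0, mirroring the "0 on success" convention used in this API. */
int
sort_edges_from(edge_sort_rec_t *edges, size_t num_edges, size_t start)
{
    assert(start <= num_edges);
    qsort(edges + start, num_edges - start, sizeof(*edges), cmp_edge);
    return 0;
}
```

Calling sort_edges_from(edges, n, 0) sorts the whole array, while a positive start keeps an already sorted prefix untouched.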
Miscellaneous functions¶
-
const char *
tsk_strerror
(int err)¶ Return a description of the specified error.
The memory for the returned string is handled by the library and should not be freed by client code.
- Return
A description of the error.
- Parameters
err
: A tskit error code.
Constants¶
API Version¶
-
TSK_VERSION_MAJOR
0¶ The library major version. Incremented when breaking changes to the API or ABI are introduced. This includes any changes to the signatures of functions and the sizes and types of externally visible structs.
-
TSK_VERSION_MINOR
99¶ The library minor version. Incremented when non-breaking backward-compatible changes to the API or ABI are introduced, e.g., the addition of a new function.
-
TSK_VERSION_PATCH
8¶ The library patch version. Incremented when any changes not relevant to the API or ABI are introduced, e.g., internal refactors or bug fixes.
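Client code that depends on a particular release can compare against these three constants. The following is a minimal sketch, assuming the constants are visible as preprocessor macros. The helper example_version_at_least is hypothetical and not part of tskit, and the fallback definitions simply copy the values listed above so that the sketch compiles without the tskit headers.

```c
#include <assert.h>

/* Fallback values copied from the constants documented above. With tskit
 * installed, include the tskit headers before this point instead. */
#ifndef TSK_VERSION_MAJOR
#define TSK_VERSION_MAJOR 0
#define TSK_VERSION_MINOR 99
#define TSK_VERSION_PATCH 8
#endif

/* Hypothetical helper (not a tskit function): returns 1 if the library
 * version is at least major.minor.patch, and 0 otherwise. */
int
example_version_at_least(int major, int minor, int patch)
{
    assert(major >= 0 && minor >= 0 && patch >= 0);
    if (TSK_VERSION_MAJOR != major) {
        return TSK_VERSION_MAJOR > major;
    }
    if (TSK_VERSION_MINOR != minor) {
        return TSK_VERSION_MINOR > minor;
    }
    return TSK_VERSION_PATCH >= patch;
}
```

The comparison is lexicographic, so the minor and patch numbers only matter when the more significant fields are equal.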
Generic Errors¶
-
TSK_ERR_GENERIC
-1¶ Generic error thrown when no other message can be generated.
-
TSK_ERR_NO_MEMORY
-2¶ Memory could not be allocated.
-
TSK_ERR_IO
-3¶ An IO error occurred.
-
TSK_ERR_BAD_PARAM_VALUE
-4¶
-
TSK_ERR_BUFFER_OVERFLOW
-5¶
-
TSK_ERR_UNSUPPORTED_OPERATION
-6¶
-
TSK_ERR_GENERATE_UUID
-7¶
-
TSK_ERR_EOF
-8¶ The file stream ended after reading zero bytes.
File format errors¶
-
TSK_ERR_FILE_FORMAT
-100¶ A file could not be read because it is in the wrong format.
-
TSK_ERR_FILE_VERSION_TOO_OLD
-101¶ The file is in tskit format, but the version is too old for the library to read. The file should be upgraded to the latest version using the
tskit upgrade
command line utility.
-
TSK_ERR_FILE_VERSION_TOO_NEW
-102¶ The file is in tskit format, but the version is too new for the library to read. To read the file you must upgrade the version of tskit.
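An application that loads user-supplied files may want to turn the three file format errors into advice for the user. The sketch below is one possible way to do that. The function example_file_error_hint is hypothetical, the message strings paraphrase the descriptions above, and the fallback definitions copy the documented values so that the code compiles without the tskit headers.

```c
#include <assert.h>
#include <stddef.h>

/* Fallback values copied from the constants documented above. */
#ifndef TSK_ERR_FILE_FORMAT
#define TSK_ERR_FILE_FORMAT -100
#define TSK_ERR_FILE_VERSION_TOO_OLD -101
#define TSK_ERR_FILE_VERSION_TOO_NEW -102
#endif

/* Hypothetical helper (not a tskit function): returns a short hint for the
 * file format errors, or NULL for any other error code. */
const char *
example_file_error_hint(int err)
{
    /* Only meaningful for failed calls, which return negative values. */
    assert(err < 0);
    switch (err) {
        case TSK_ERR_FILE_FORMAT:
            return "the file is in the wrong format";
        case TSK_ERR_FILE_VERSION_TOO_OLD:
            return "upgrade the file using the tskit upgrade utility";
        case TSK_ERR_FILE_VERSION_TOO_NEW:
            return "upgrade tskit to read this file";
        default:
            return NULL;
    }
}
```

A caller would typically print tsk_strerror(err) first and append the hint when it is not NULL.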
Todo
Add in groups for rest of the error types and document.
Examples¶
Basic forwards simulator¶
This is an example of using the tables API to define a simple haploid Wright-Fisher simulator. Because this simple example repeatedly sorts the edge data, it is quite inefficient and should not be used as the basis of a large-scale simulator.
Note
This example uses the C function rand
and constant
RAND_MAX
for random number generation. These methods
are used for example purposes only and a high-quality
random number library should be preferred for code
used for research. Examples include, but are not
limited to:
The GNU Scientific Library, which is licensed under the GNU General Public License, version 3 (GPL3+).
For C++ projects using C++11 or later, the built-in random number library.
The numpy C API may be useful for those writing Python extension modules in C/C++.
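The simulator listing in this section turns rand() output into parent indexes and breakpoints by dividing by 1 + RAND_MAX. As a minimal sketch of that scaling in isolation, the helpers below (all hypothetical names) produce a value in [0, 1), an index in [0, n), and a breakpoint in the open interval (0, 1). As the note says, rand() is used for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Uniform double in [0, 1). Dividing by RAND_MAX + 1.0 rather than RAND_MAX
 * keeps the result strictly below 1. */
double
example_uniform(void)
{
    return rand() / (1.0 + RAND_MAX);
}

/* Index in [0, n), suitable for picking one of n parents. */
size_t
example_random_index(size_t n)
{
    assert(n > 0);
    return (size_t) (example_uniform() * (double) n);
}

/* Breakpoint in (0, 1): redraw in the rare case that exactly 0 comes up, so
 * that neither of the two edges either side of it has zero length. */
double
example_breakpoint(void)
{
    double x;

    do {
        x = example_uniform();
    } while (x == 0.0);
    return x;
}
```

Because the upper bound is exclusive, example_random_index(N) can never index one past the end of an array of N parents.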
Todo
Give a pointer to an example that caches and flushes edge data efficiently. Probably using the C++ API?
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <err.h>
#include <tskit/tables.h>

#define check_tsk_error(val) \
    if (val < 0) { \
        errx(EXIT_FAILURE, "line %d: %s", __LINE__, tsk_strerror(val)); \
    }

void
simulate(
    tsk_table_collection_t *tables, int N, int T, int simplify_interval)
{
    tsk_id_t *buffer, *parents, *children, child, left_parent, right_parent;
    double breakpoint;
    int ret, j, t, b;

    /* A zero interval would divide by zero in the modulo below */
    assert(simplify_interval != 0);
    buffer = malloc(2 * N * sizeof(tsk_id_t));
    if (buffer == NULL) {
        errx(EXIT_FAILURE, "Out of memory");
    }
    tables->sequence_length = 1.0;
    parents = buffer;
    for (j = 0; j < N; j++) {
        parents[j]
            = tsk_node_table_add_row(&tables->nodes, 0, T, TSK_NULL, TSK_NULL, NULL, 0);
        check_tsk_error(parents[j]);
    }
    b = 0;
    for (t = T - 1; t >= 0; t--) {
        /* Alternate between using the first and last N values in the buffer */
        parents = buffer + (b * N);
        b = (b + 1) % 2;
        children = buffer + (b * N);
        for (j = 0; j < N; j++) {
            child = tsk_node_table_add_row(
                &tables->nodes, 0, t, TSK_NULL, TSK_NULL, NULL, 0);
            check_tsk_error(child);
            /* NOTE: the use of rand() is discouraged for
             * research code and proper random number generator
             * libraries should be preferred.
             */
            left_parent = parents[(size_t)((rand()/(1.+RAND_MAX))*N)];
            right_parent = parents[(size_t)((rand()/(1.+RAND_MAX))*N)];
            do {
                breakpoint = rand()/(1.+RAND_MAX);
            } while (breakpoint == 0); /* reject the rare draw of exactly 0 */
            ret = tsk_edge_table_add_row(
                &tables->edges, 0, breakpoint, left_parent, child, NULL, 0);
            check_tsk_error(ret);
            ret = tsk_edge_table_add_row(
                &tables->edges, breakpoint, 1, right_parent, child, NULL, 0);
            check_tsk_error(ret);
            children[j] = child;
        }
        if (t % simplify_interval == 0) {
            printf("Simplify at generation %d: (%d nodes %d edges)",
                t,
                (int) tables->nodes.num_rows,
                (int) tables->edges.num_rows);
            /* Note: Edges must be sorted for simplify to work, and we use a brute force
             * approach of sorting each time here for simplicity. This is inefficient. */
            ret = tsk_table_collection_sort(tables, NULL, 0);
            check_tsk_error(ret);
            ret = tsk_table_collection_simplify(tables, children, N, 0, NULL);
            check_tsk_error(ret);
            printf(" -> (%d nodes %d edges)\n",
                (int) tables->nodes.num_rows,
                (int) tables->edges.num_rows);
            /* After simplify, the N samples have IDs 0, ..., N - 1 */
            for (j = 0; j < N; j++) {
                children[j] = j;
            }
        }
    }
    free(buffer);
}

int
main(int argc, char **argv)
{
    int ret;
    tsk_table_collection_t tables;

    if (argc != 6) {
        errx(EXIT_FAILURE, "usage: N T simplify-interval output-file seed");
    }
    ret = tsk_table_collection_init(&tables, 0);
    check_tsk_error(ret);
    srand((unsigned)atoi(argv[5]));
    simulate(&tables, atoi(argv[1]), atoi(argv[2]), atoi(argv[3]));
    ret = tsk_table_collection_dump(&tables, argv[4], 0);
    check_tsk_error(ret);
    tsk_table_collection_free(&tables);
    return 0;
}
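The simulator above keeps the parent and child generations in the two halves of a single buffer and swaps their roles each generation. The sketch below isolates that bookkeeping using plain int IDs. The function example_swap_halves is a hypothetical helper introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Given a flat buffer of 2 * n IDs, point *parents at the half selected by
 * b (0 or 1) and *children at the other half. Returns the value of b to
 * pass in for the next generation. */
int
example_swap_halves(int *buffer, int n, int b, int **parents, int **children)
{
    assert(buffer != NULL && n > 0 && (b == 0 || b == 1));
    *parents = buffer + (b * n);
    b = (b + 1) % 2;
    *children = buffer + (b * n);
    return b;
}
```

Each call makes the previous generation's children the new parents without copying any IDs, which is why the simulator can overwrite children[j] after simplification and see those values as parents in the next iteration.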
Tree iteration¶
#include <stdio.h>
#include <stdlib.h>
#include <err.h>
#include <tskit.h>

#define check_tsk_error(val) \
    if (val < 0) { \
        errx(EXIT_FAILURE, "line %d: %s", __LINE__, tsk_strerror(val)); \
    }

int
main(int argc, char **argv)
{
    int ret, iter;
    tsk_treeseq_t ts;
    tsk_tree_t tree;

    if (argc != 2) {
        errx(EXIT_FAILURE, "usage: <tree sequence file>");
    }
    ret = tsk_treeseq_load(&ts, argv[1], 0);
    check_tsk_error(ret);
    ret = tsk_tree_init(&tree, &ts, 0);
    check_tsk_error(ret);

    printf("Iterate forwards\n");
    for (iter = tsk_tree_first(&tree); iter == 1; iter = tsk_tree_next(&tree)) {
        printf("\ttree %d has %d roots\n",
            tsk_tree_get_index(&tree),
            tsk_tree_get_num_roots(&tree));
    }
    check_tsk_error(iter);

    printf("Iterate backwards\n");
    for (iter = tsk_tree_last(&tree); iter == 1; iter = tsk_tree_prev(&tree)) {
        printf("\ttree %d has %d roots\n",
            tsk_tree_get_index(&tree),
            tsk_tree_get_num_roots(&tree));
    }
    check_tsk_error(iter);

    tsk_tree_free(&tree);
    tsk_treeseq_free(&ts);
    return 0;
}
Tree traversals¶
In this example we load a tree sequence file, and then traverse the first tree in three different ways:
We first traverse the tree in preorder using recursion. This is a very common way of navigating around trees and can be very convenient for some applications. For example, here we compute the depth of each node (i.e., its distance from the root) and use this when printing out the nodes as we visit them.
Then we traverse the tree in preorder using an iterative approach. This is a little more efficient than using recursion, and is sometimes more convenient than structuring the calculation recursively. Note that we allocate a stack here with space to hold the total number of nodes in the tree sequence. This is safe, but it is likely to be a massive overestimate. However, this makes very little difference in practice even for tree sequences with millions of nodes, since it’s likely that only the first page (usually 4K) will be written to and the rest of the stack will therefore never be mapped to physical memory.
In the third example we iterate upwards from the samples rather than downwards from the root.
#include <stdio.h>
#include <stdlib.h>
#include <err.h>
#include <tskit.h>

#define check_tsk_error(val) \
    if (val < 0) { \
        errx(EXIT_FAILURE, "line %d: %s", __LINE__, tsk_strerror(val)); \
    }

static void
_traverse(tsk_tree_t *tree, tsk_id_t u, int depth)
{
    tsk_id_t v;
    int j;

    for (j = 0; j < depth; j++) {
        printf(" ");
    }
    printf("Visit recursive %d\n", u);
    for (v = tree->left_child[u]; v != TSK_NULL; v = tree->right_sib[v]) {
        _traverse(tree, v, depth + 1);
    }
}

static void
traverse_recursive(tsk_tree_t *tree)
{
    tsk_id_t root;

    for (root = tree->left_root; root != TSK_NULL; root = tree->right_sib[root]) {
        _traverse(tree, root, 0);
    }
}

static void
traverse_stack(tsk_tree_t *tree)
{
    int stack_top;
    tsk_id_t u, v, root;
    tsk_id_t *stack
        = malloc(tsk_treeseq_get_num_nodes(tree->tree_sequence) * sizeof(*stack));

    if (stack == NULL) {
        errx(EXIT_FAILURE, "Out of memory");
    }
    for (root = tree->left_root; root != TSK_NULL; root = tree->right_sib[root]) {
        stack_top = 0;
        stack[stack_top] = root;
        while (stack_top >= 0) {
            u = stack[stack_top];
            stack_top--;
            printf("Visit stack %d\n", u);
            /* Put nodes on the stack right-to-left, so we visit in left-to-right */
            for (v = tree->right_child[u]; v != TSK_NULL; v = tree->left_sib[v]) {
                stack_top++;
                stack[stack_top] = v;
            }
        }
    }
    free(stack);
}

static void
traverse_upwards(tsk_tree_t *tree)
{
    const tsk_id_t *samples = tsk_treeseq_get_samples(tree->tree_sequence);
    tsk_size_t num_samples = tsk_treeseq_get_num_samples(tree->tree_sequence);
    tsk_size_t j;
    tsk_id_t u;

    for (j = 0; j < num_samples; j++) {
        u = samples[j];
        while (u != TSK_NULL) {
            printf("Visit upwards: %d\n", u);
            u = tree->parent[u];
        }
    }
}

int
main(int argc, char **argv)
{
    int ret;
    tsk_treeseq_t ts;
    tsk_tree_t tree;

    if (argc != 2) {
        errx(EXIT_FAILURE, "usage: <tree sequence file>");
    }
    ret = tsk_treeseq_load(&ts, argv[1], 0);
    check_tsk_error(ret);
    ret = tsk_tree_init(&tree, &ts, 0);
    check_tsk_error(ret);
    ret = tsk_tree_first(&tree);
    check_tsk_error(ret);

    traverse_recursive(&tree);
    traverse_stack(&tree);
    traverse_upwards(&tree);

    tsk_tree_free(&tree);
    tsk_treeseq_free(&ts);
    return 0;
}
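The stack-based traversal above can also be tried without a tree sequence file. The sketch below applies the same push-right-to-left idea to a tree stored in plain int arrays that play the role of the right_child and left_sib arrays, and additionally records each node's depth, as the recursive version does. The function example_preorder and the EXAMPLE_NULL marker are hypothetical names introduced for this illustration and are not part of tskit.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the null node marker in this sketch. */
#define EXAMPLE_NULL (-1)

/* Visit the subtree below root in preorder. Node IDs are written to order[]
 * in visiting order and depth[u] is set to the distance of u from root.
 * Returns the number of nodes visited, or -1 if memory allocation fails. */
int
example_preorder(const int *right_child, const int *left_sib, int root,
    int num_nodes, int *order, int *depth)
{
    int *stack;
    int stack_top = 0;
    int num_visited = 0;
    int u, v;

    assert(num_nodes > 0 && root >= 0 && root < num_nodes);
    stack = malloc((size_t) num_nodes * sizeof(*stack));
    if (stack == NULL) {
        return -1;
    }
    stack[0] = root;
    depth[root] = 0;
    while (stack_top >= 0) {
        u = stack[stack_top];
        stack_top--;
        order[num_visited] = u;
        num_visited++;
        /* Push children right-to-left so that they are popped left-to-right. */
        for (v = right_child[u]; v != EXAMPLE_NULL; v = left_sib[v]) {
            stack_top++;
            stack[stack_top] = v;
            depth[v] = depth[u] + 1;
        }
    }
    free(stack);
    return num_visited;
}
```

For a five-node tree in which node 4 is the root with children 2 and 3, and node 3 has children 0 and 1, the arrays right_child = {-1, -1, -1, 1, 3} and left_sib = {-1, 0, -1, 2, -1} give the visiting order 4, 2, 3, 0, 1.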
File streaming¶
It is often useful to read tree sequence files from a stream rather than
from a fixed filename. This example shows how to do this using the
tsk_table_collection_loadf()
and
tsk_table_collection_dumpf()
functions. Here, we sequentially
load table collections from the stdin
stream and write them
back out to stdout
with their mutations removed.
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <tskit/tables.h>

#define check_tsk_error(val) \
    if (val < 0) { \
        fprintf(stderr, "Error: line %d: %s\n", __LINE__, tsk_strerror(val)); \
        exit(EXIT_FAILURE); \
    }

int
main(int argc, char **argv)
{
    int ret;
    int j = 0;
    tsk_table_collection_t tables;

    ret = tsk_table_collection_init(&tables, 0);
    check_tsk_error(ret);
    while (true) {
        ret = tsk_table_collection_loadf(&tables, stdin, TSK_NO_INIT);
        if (ret == TSK_ERR_EOF) {
            break;
        }
        check_tsk_error(ret);
        fprintf(stderr, "Tree sequence %d had %d mutations\n", j,
            (int) tables.mutations.num_rows);
        ret = tsk_mutation_table_truncate(&tables.mutations, 0);
        check_tsk_error(ret);
        ret = tsk_table_collection_dumpf(&tables, stdout, 0);
        check_tsk_error(ret);
        j++;
    }
    tsk_table_collection_free(&tables);
    return EXIT_SUCCESS;
}
Note that we use the value TSK_ERR_EOF
to detect when the stream
ends, as we don’t know how many tree sequences to expect on the input.
In this case, TSK_ERR_EOF
is not considered an error and we exit
normally.
Running this program on some tree sequence files we might get:
$ cat tmp1.trees tmp2.trees | ./build/streaming > no_mutations.trees
Tree sequence 0 had 38 mutations
Tree sequence 1 had 132 mutations
Then, running this program again on the output of the previous command,
we see that we now have two tree sequences with their mutations removed
stored in the file no_mutations.trees
:
$ ./build/streaming < no_mutations.trees > /dev/null
Tree sequence 0 had 0 mutations
Tree sequence 1 had 0 mutations
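The end-of-stream convention used above, where a stream that ends after zero bytes is a normal stopping point but one that ends part-way through an object is an error, can be sketched with the standard library alone. In the following, example_read_record, example_count_records and the EXAMPLE_ERR_* codes are hypothetical stand-ins (their values mirror TSK_ERR_IO and TSK_ERR_EOF as listed in this document), and a record is simply one uint32_t. This is not how tskit itself parses files.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical return codes for this sketch; not tskit constants. */
#define EXAMPLE_ERR_IO (-3)
#define EXAMPLE_ERR_EOF (-8)

/* Read one fixed-size record. Returns 0 on success, EXAMPLE_ERR_EOF if the
 * stream ended before any byte of the record was read, and EXAMPLE_ERR_IO
 * if it ended part-way through a record or a read error occurred. */
int
example_read_record(FILE *stream, uint32_t *value)
{
    size_t n;

    assert(stream != NULL && value != NULL);
    n = fread(value, 1, sizeof(*value), stream);
    if (n == sizeof(*value)) {
        return 0;
    }
    if (n == 0 && feof(stream)) {
        return EXAMPLE_ERR_EOF;
    }
    return EXAMPLE_ERR_IO;
}

/* Count the records on a stream of unknown length, treating the EOF code
 * as normal termination. Returns the count, or a negative error code. */
long
example_count_records(FILE *stream)
{
    long count = 0;
    uint32_t value;
    int ret;

    for (;;) {
        ret = example_read_record(stream, &value);
        if (ret == EXAMPLE_ERR_EOF) {
            break;
        }
        if (ret < 0) {
            return ret;
        }
        count++;
    }
    return count;
}
```

The loop has the same shape as the streaming program: the EOF code is checked first and breaks out of the loop, and only the remaining negative values are treated as failures.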