C API

This is the documentation for the tskit C API, a low-level library for manipulating and processing tree sequence data. The library is written using the C99 standard and is fully thread safe. Tskit uses kastore to define a simple storage format for the tree sequence data.

To see the API in action, please see the Examples section.

Overview

Do I need the C API?

The tskit C API is generally useful in the following situations:

  • You want to use the tskit API in a larger C/C++ application (e.g., in order to output data in the .trees format);

  • You need to perform many tree traversals or loops to analyse data that is in tree sequence form.

For high-level operations that are not performance sensitive, the Python API is generally more useful. Python is much more convenient than C, and since the tskit Python module is essentially a wrapper for this C library, there is often no real performance penalty for using it.

API structure

Tskit uses a set of conventions to provide a pseudo object oriented API. Each ‘object’ is represented by a C struct and has a set of ‘methods’. This is most easily explained by an example:

#include <stdio.h>
#include <stdlib.h>
#include <tskit/tables.h>

#define check_tsk_error(val)                                                            \
    do {                                                                                \
        if ((val) < 0) {                                                                \
            fprintf(stderr, "line %d: %s\n", __LINE__, tsk_strerror(val));              \
            exit(EXIT_FAILURE);                                                         \
        }                                                                               \
    } while (0)

int
main(int argc, char **argv)
{
    int j, ret;
    tsk_edge_table_t edges;

    ret = tsk_edge_table_init(&edges, 0);
    check_tsk_error(ret);
    for (j = 0; j < 5; j++) {
        ret = tsk_edge_table_add_row(&edges, 0, 1, j + 1, j, NULL, 0);
        check_tsk_error(ret);
    }
    tsk_edge_table_print_state(&edges, stdout);
    tsk_edge_table_free(&edges);

    return EXIT_SUCCESS;
}

In this program we create a tsk_edge_table_t instance, add five rows using tsk_edge_table_add_row(), print out its contents using the tsk_edge_table_print_state() debugging method, and finally free the memory used by the edge table object. We define this edge table ‘class’ by using some simple naming conventions which are adhered to throughout tskit. This is simply a naming convention that helps to keep code written in plain C logically structured; there are no extra C++ style features. We use object oriented terminology freely throughout this documentation with this understanding.

In this convention, a class is defined by a struct class_name_t (e.g., tsk_edge_table_t) and its methods all have the form class_name_method_name, whose first argument is always a pointer to an instance of the class (e.g., tsk_edge_table_add_row above). Each class has an initialise and a free method, called class_name_init and class_name_free, respectively. The init method must be called to ensure that the object is correctly initialised (except for functions such as tsk_table_collection_load() and tsk_table_collection_copy(), which automatically initialise the object by default for convenience). The free method must always be called to avoid leaking memory, even if an error occurs during initialisation. If class_name_init has been called successfully, we say the object has been “initialised”; if not, it is “uninitialised”. After class_name_free has been called, the object is again uninitialised.

It is important to note that the init methods only allocate internal memory; the memory for the instance itself must be allocated either on the heap or the stack:

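// Instance allocated on the stack (error checking omitted for brevity)
tsk_node_table_t nodes;
tsk_node_table_init(&nodes, 0);
tsk_node_table_free(&nodes);

// Instance allocated on the heap
tsk_edge_table_t *edges = malloc(sizeof(tsk_edge_table_t));
if (edges == NULL) {
    exit(EXIT_FAILURE); // malloc failed; there is nothing to free yet
}
tsk_edge_table_init(edges, 0);
tsk_edge_table_free(edges);
free(edges); // tsk_edge_table_free releases only the internal memory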
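To make these conventions concrete, here is a minimal sketch of a hypothetical class written in the same style. The id_list_t type and its id_list_* methods are invented for illustration and are not part of tskit; only the pattern is taken from the description above: init allocates internal memory for a caller-provided instance, free is safe to call even after a failed init, and errors are reported as negative return values.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical class following the tskit naming conventions (not tskit API). */
typedef struct {
    int32_t *data;
    size_t num_rows;
    size_t max_rows;
} id_list_t;

/* Allocate internal memory only; the instance itself belongs to the caller. */
int
id_list_init(id_list_t *self, size_t max_rows)
{
    /* Zero the struct first so that id_list_free is safe even if malloc fails. */
    memset(self, 0, sizeof(*self));
    self->data = malloc(max_rows * sizeof(*self->data));
    if (self->data == NULL) {
        return -1; /* negative values signal errors */
    }
    self->max_rows = max_rows;
    return 0;
}

/* Append a value, returning its row index (>= 0) or a negative error code. */
int
id_list_add_row(id_list_t *self, int32_t value)
{
    if (self->num_rows == self->max_rows) {
        return -2; /* hypothetical "table full" error */
    }
    self->data[self->num_rows] = value;
    self->num_rows++;
    return (int) (self->num_rows - 1);
}

/* Release internal memory; safe after either a failed or a successful init. */
int
id_list_free(id_list_t *self)
{
    free(self->data);
    self->data = NULL;
    self->num_rows = 0;
    self->max_rows = 0;
    return 0;
}
```

Because success is any non-negative value, the same `val < 0` test used by check_tsk_error works for id_list_add_row as well.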

Error handling

C does not have a mechanism for propagating exceptions, and great care must be taken to ensure that errors are correctly and safely handled. The convention adopted in tskit is that every function (except for trivial accessor methods) returns an integer. If this return value is negative, an error has occurred which must be handled. A description of the error that occurred can be obtained using the tsk_strerror() function. The following example illustrates the key conventions around error handling in tskit:

#include <stdio.h>
#include <stdlib.h>
#include <tskit.h>

int
main(int argc, char **argv)
{
    int ret;
    tsk_treeseq_t ts;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <tree sequence file>\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    ret = tsk_treeseq_load(&ts, argv[1], 0);
    if (ret < 0) {
        /* Error condition. Free and exit */
        tsk_treeseq_free(&ts);
        fprintf(stderr, "%s\n", tsk_strerror(ret));
        exit(EXIT_FAILURE);
    }
    /* Cast the tsk_size_t counts so that they match the %lld specifier */
    printf("Loaded tree sequence with %lld nodes and %lld edges from %s\n",
        (long long) tsk_treeseq_get_num_nodes(&ts),
        (long long) tsk_treeseq_get_num_edges(&ts),
        argv[1]);
    tsk_treeseq_free(&ts);

    return EXIT_SUCCESS;
}

In this example we load a tree sequence from file and print out a summary of the number of nodes and edges it contains. After calling tsk_treeseq_load() we check the return value ret to see if an error occurred. If an error has occurred, we exit with an error message produced by tsk_strerror(). Note that in this example we call tsk_treeseq_free() whether or not an error occurs: in general, once a function that initialises an object (e.g., X_init, X_copy or X_load) is called, then X_free must be called to ensure that memory is not leaked.

Most functions in tskit return an error status; we recommend that every return value is checked.
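The example above exits immediately on error, but library-style code usually has to release several objects before returning an error. A common C idiom for this, and one used widely inside tskit's own C sources, is a single cleanup label reached with goto. The sketch below shows the idiom with a hypothetical buffer_t type that follows the init/free convention; buffer_t, buffer_init, buffer_free and process are illustrative names, not tskit API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object following the init/free convention (not a tskit type). */
typedef struct {
    double *values;
    size_t size;
} buffer_t;

static int
buffer_init(buffer_t *self, size_t size)
{
    memset(self, 0, sizeof(*self)); /* make buffer_free safe on failure */
    if (size == 0) {
        return -1; /* hypothetical "bad parameter" error */
    }
    self->values = calloc(size, sizeof(*self->values));
    if (self->values == NULL) {
        return -2; /* hypothetical "out of memory" error */
    }
    self->size = size;
    return 0;
}

static void
buffer_free(buffer_t *self)
{
    free(self->values);
    self->values = NULL;
    self->size = 0;
}

/* Initialise two objects; any failure jumps to the single cleanup label. */
int
process(size_t n1, size_t n2, size_t *total)
{
    int ret;
    buffer_t a, b;

    /* Zero both up front so the cleanup is safe whichever step fails. */
    memset(&a, 0, sizeof(a));
    memset(&b, 0, sizeof(b));

    ret = buffer_init(&a, n1);
    if (ret != 0) {
        goto out;
    }
    ret = buffer_init(&b, n2);
    if (ret != 0) {
        goto out;
    }
    *total = a.size + b.size;
out:
    /* Free is always called, mirroring the rule for tskit objects. */
    buffer_free(&a);
    buffer_free(&b);
    return ret;
}
```

Keeping exactly one exit path means each object's free function appears once, so adding a new step cannot introduce a leak on an early return.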

Using tskit in your project

Tskit is built as a standard C library and so there are many different ways in which it can be included in downstream projects. It is possible to install tskit onto a system (i.e., installing a shared library and header files to standard locations on Unix) and link against it, but there are many ways in which this can go wrong. In the interest of simplicity and improving the end-user experience, we recommend embedding tskit directly into your applications.

There are many different build systems and approaches to compiling code, and so it’s not possible to give definitive documentation on how tskit should be included in downstream projects. Please see the build examples repo for some examples of how to incorporate tskit into different project structures and build systems.

Tskit uses the meson build system internally, and supports being used as a meson subproject. We show an example in which this is combined with git submodules to neatly abstract many details of cross-platform C development.

Some users may choose to check the source for tskit (and kastore) directly into their source control repositories. If you wish to do this, the code is in the c subdirectory of the tskit and kastore repos. The following header files should be placed in the search path: kastore.h, tskit.h, and tskit/*.h. The C files kastore.c and tskit*.c should be compiled. For those who wish to minimise the size of their compiled binaries, tskit is quite modular, and C files can be omitted if not needed. For example, if you are just using the Tables API then only the files tskit/core.[c,h] and tskit/tables.[c,h] are needed.

However you include tskit in your project, please ensure that it is a released version. Released versions are tagged on GitHub using the convention C_{VERSION}. The code can either be downloaded from GitHub on the releases page or checked out using git. For example, to check out the C_0.99.1 release:

$ git clone https://github.com/tskit-dev/tskit.git
$ cd tskit
$ git checkout C_0.99.1

Git submodules may also be considered; see the example for how to set these up and check out a specific release.

Basic Types

typedef int32_t tsk_id_t

Tskit Object IDs.

All objects in tskit are referred to by integer IDs corresponding to the row they occupy in the relevant table. The tsk_id_t type should be used when manipulating these ID values. The reserved value TSK_NULL (-1) defines missing data.

typedef uint32_t tsk_size_t

Tskit sizes.

Sizes in tskit are defined by the tsk_size_t type.

typedef uint32_t tsk_flags_t

Container for bitwise flags.

Bitwise flags are used in tskit as a column type and also as a way to specify options to API functions.

Common options

TSK_DEBUG (1u << 31)

Turn on debugging output. Not supported by all functions.

TSK_NO_INIT (1u << 30)

Do not initialise the parameter object.

TSK_NO_CHECK_INTEGRITY (1u << 29)

Do not run integrity checks before performing an operation.
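Because each option is a single bit, options are combined with bitwise OR and tested with bitwise AND. The following minimal sketch shows that pattern. The three values are copied from the list above and guarded with #ifndef so the snippet also compiles without the tskit headers; example_flags_t, has_option and without_option are illustrative names rather than tskit API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same width as tsk_flags_t (uint32_t) in this version of the API. */
typedef uint32_t example_flags_t;

/* Values copied from the "Common options" list; guarded in case tskit.h
 * has already defined them. */
#ifndef TSK_DEBUG
#define TSK_DEBUG (1u << 31)
#endif
#ifndef TSK_NO_INIT
#define TSK_NO_INIT (1u << 30)
#endif
#ifndef TSK_NO_CHECK_INTEGRITY
#define TSK_NO_CHECK_INTEGRITY (1u << 29)
#endif

/* True if every bit of `flag` is set in `options`. */
static bool
has_option(example_flags_t options, example_flags_t flag)
{
    return (options & flag) == flag;
}

/* Return `options` with the bits of `flag` cleared. */
static example_flags_t
without_option(example_flags_t options, example_flags_t flag)
{
    return options & ~flag;
}
```

For example, passing TSK_NO_INIT | TSK_NO_CHECK_INTEGRITY as a single options argument would request both behaviours from a function that supports them, and 0 always selects the defaults.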

Tables API

The tables API section of tskit is defined in the tskit/tables.h header.

Table collections

struct tsk_table_collection_t

A collection of tables defining the data for a tree sequence.

Public Members

double sequence_length

The sequence length defining the tree sequence’s coordinate space.

char *metadata

The tree-sequence metadata.

char *metadata_schema

The metadata schema.

tsk_individual_table_t individuals

The individual table.

tsk_node_table_t nodes

The node table.

tsk_edge_table_t edges

The edge table.

tsk_migration_table_t migrations

The migration table.

tsk_site_table_t sites

The site table.

tsk_mutation_table_t mutations

The mutation table.

tsk_population_table_t populations

The population table.

tsk_provenance_table_t provenances

The provenance table.

struct tsk_bookmark_t

A bookmark recording the position of all the tables in a table collection.

Public Members

tsk_size_t individuals

The position in the individual table.

tsk_size_t nodes

The position in the node table.

tsk_size_t edges

The position in the edge table.

tsk_size_t migrations

The position in the migration table.

tsk_size_t sites

The position in the site table.

tsk_size_t mutations

The position in the mutation table.

tsk_size_t populations

The position in the population table.

tsk_size_t provenances

The position in the provenance table.

int tsk_table_collection_init(tsk_table_collection_t *self, tsk_flags_t options)

Initialises the table collection by allocating the internal memory and initialising all the constituent tables.

This must be called before any operations are performed on the table collection. See the API structure for details on how objects are initialised and freed.

Options

Options can be specified by providing one or more of the following bitwise flags:

TSK_NO_EDGE_METADATA

Do not allocate space to store metadata in the edge table. Operations attempting to add non-empty metadata to the edge table will fail with error TSK_ERR_METADATA_DISABLED.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to an uninitialised tsk_table_collection_t object.

  • options: Allocation time options. See above for details.

int tsk_table_collection_free(tsk_table_collection_t *self)

Free the internal memory for the specified table collection.

Return

Always returns 0.

Parameters
  • self: A pointer to a tsk_table_collection_t object.

int tsk_table_collection_clear(tsk_table_collection_t *self, tsk_flags_t options)

Clears data tables (and optionally provenances and metadata) in this table collection.

By default this operation clears all tables except the provenance table, retaining table metadata schemas and the tree-sequence-level metadata and schema.

Options

Options can be specified by providing one or more of the following bitwise flags:

TSK_CLEAR_PROVENANCE

Additionally clear the provenance table.

TSK_CLEAR_METADATA_SCHEMAS

Additionally clear the table metadata schemas.

TSK_CLEAR_TS_METADATA_AND_SCHEMA

Additionally clear the tree-sequence metadata and schema.

No memory is freed as a result of this operation; please use tsk_table_collection_free() to free internal resources.

Return

Return 0 on success or a negative value on failure.

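Parameters
  • self: A pointer to a tsk_table_collection_t object.

  • options: Bitwise clearing options; see above for details.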

bool tsk_table_collection_equals(const tsk_table_collection_t *self, const tsk_table_collection_t *other, tsk_flags_t options)

Returns true if the data in the specified table collection is equal to the data in this table collection.

The indexes are not considered, as these are derived from the tables. We also do not consider the file_uuid, since it is a property of the file that the set of tables is stored in.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) two table collections are considered equal if all of the tables are byte-wise identical, and the sequence lengths, metadata and metadata schemas of the two table collections are identical.

TSK_CMP_IGNORE_PROVENANCE

Do not include the provenance table in comparison.

TSK_CMP_IGNORE_METADATA

Do not include metadata when comparing the table collections. This includes both the top-level tree sequence metadata as well as the metadata for each of the tables (i.e., TSK_CMP_IGNORE_TS_METADATA is implied). All metadata schemas are also ignored.

TSK_CMP_IGNORE_TS_METADATA

Do not include the top-level tree sequence metadata and metadata schemas in the comparison.

TSK_CMP_IGNORE_TIMESTAMPS

Do not include the timestamp information when comparing the provenance tables. This has no effect if TSK_CMP_IGNORE_PROVENANCE is specified.

Return

Return true if the specified table collection is equal to this table collection.

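Parameters
  • self: A pointer to a tsk_table_collection_t object.

  • other: A pointer to a tsk_table_collection_t object.

  • options: Bitwise comparison options; see above for details.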

int tsk_table_collection_copy(const tsk_table_collection_t *self, tsk_table_collection_t *dest, tsk_flags_t options)

Copies the state of this table collection into the specified destination.

By default the method initialises the specified destination table collection. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_table_collection_t object.

  • dest: A pointer to a tsk_table_collection_t object. If the TSK_NO_INIT option is specified, this must be an initialised table collection. If not, it must be an uninitialised table collection.

  • options: Bitwise option flags.

void tsk_table_collection_print_state(const tsk_table_collection_t *self, FILE *out)

Print out the state of this table collection to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self: A pointer to a tsk_table_collection_t object.

  • out: The stream to write the output to.

int tsk_table_collection_load(tsk_table_collection_t *self, const char *filename, tsk_flags_t options)

Load a table collection from a file path.

Loads the data from the specified file into this table collection. By default, the table collection is also initialised. The resources allocated must be freed using tsk_table_collection_free() even in error conditions.

If the TSK_NO_INIT option is set, the table collection is not initialised, allowing an already initialised table collection to be overwritten with the data from a file.

If the file contains multiple table collections, this function will load the first. Please see tsk_table_collection_loadf() for details on how to sequentially load table collections from a stream.

Options

Options can be specified by providing one or more of the following bitwise flags:

TSK_NO_INIT

Do not initialise this tsk_table_collection_t before loading.

Examples

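int ret;
tsk_table_collection_t tables;
ret = tsk_table_collection_load(&tables, "data.trees", 0);
if (ret != 0) {
    fprintf(stderr, "Load error: %s\n", tsk_strerror(ret));
    // Free even on error, since load initialises the object by default
    tsk_table_collection_free(&tables);
    exit(EXIT_FAILURE);
}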

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to an uninitialised tsk_table_collection_t object if the TSK_NO_INIT option is not set (default), or an initialised tsk_table_collection_t otherwise.

  • filename: A NULL terminated string containing the filename.

  • options: Bitwise options. See above for details.

int tsk_table_collection_loadf(tsk_table_collection_t *self, FILE *file, tsk_flags_t options)

Load a table collection from a stream.

Loads a tables definition from the specified file stream to this table collection. By default, the table collection is also initialised. The resources allocated must be freed using tsk_table_collection_free() even in error conditions.

If the TSK_NO_INIT option is set, the table collection is not initialised, allowing an already initialised table collection to be overwritten with the data from a file.

If the stream contains multiple table collection definitions, this function will load the next table collection from the stream. If the stream contains no more table collection definitions, the error value TSK_ERR_EOF will be returned. Note that TSK_ERR_EOF is only returned in the case where zero bytes are read from the stream; malformed files or other errors will result in different error conditions. Please see the File streaming section for an example of how to sequentially load tree sequences from a stream.

Options

Options can be specified by providing one or more of the following bitwise flags:

TSK_NO_INIT

Do not initialise this tsk_table_collection_t before loading.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to an uninitialised tsk_table_collection_t object if the TSK_NO_INIT option is not set (default), or an initialised tsk_table_collection_t otherwise.

  • file: A FILE stream opened in an appropriate mode for reading (e.g. “r”, “r+” or “w+”) positioned at the beginning of a table collection definition.

  • options: Bitwise options. See above for details.

int tsk_table_collection_dump(tsk_table_collection_t *self, const char *filename, tsk_flags_t options)

Write a table collection to file.

Writes the data from this table collection to the specified file. Usually we expect that data written to a file will be in a form that can be read directly and used to create a tree sequence; that is, we assume that by default the tables are sorted and indexed. Following these assumptions, if the tables are not already indexed, we index the tables before writing to file to save the cost of building these indexes at load time. This behaviour requires that the tables are sorted. If this automatic indexing is not desired, it can be disabled using the TSK_NO_BUILD_INDEXES option.

If an error occurs the file path is deleted, ensuring that only complete and well formed files will be written.

Options

Options can be specified by providing one or more of the following bitwise flags:

TSK_NO_BUILD_INDEXES

Do not build indexes for this table before writing to file. This is useful if you wish to write unsorted tables to file, as building the indexes will raise an error if the table is unsorted.

Examples

int ret;
tsk_table_collection_t tables;

// check_tsk_error is the macro defined in the API structure example
ret = tsk_table_collection_init(&tables, 0);
check_tsk_error(ret);
tables.sequence_length = 1.0;
// Write out the empty tree sequence
ret = tsk_table_collection_dump(&tables, "empty.trees", 0);
check_tsk_error(ret);
tsk_table_collection_free(&tables);

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to an initialised tsk_table_collection_t object.

  • filename: A NULL terminated string containing the filename.

  • options: Bitwise options. See above for details.

int tsk_table_collection_dumpf(tsk_table_collection_t *self, FILE *file, tsk_flags_t options)

Write a table collection to a stream.

Writes the data from this table collection to the specified FILE stream. Semantics are identical to tsk_table_collection_dump().

Please see the File streaming section for an example of how to sequentially dump and load tree sequences from a stream.

Options

Options can be specified by providing one or more of the following bitwise flags:

TSK_NO_BUILD_INDEXES

Do not build indexes for this table before writing to file. This is useful if you wish to write unsorted tables to file, as building the indexes will raise an error if the table is unsorted.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to an initialised tsk_table_collection_t object.

  • file: A FILE stream opened in an appropriate mode for writing (e.g. “w”, “a”, “r+” or “w+”).

  • options: Bitwise options. See above for details.

int tsk_table_collection_record_num_rows(const tsk_table_collection_t *self, tsk_bookmark_t *bookmark)

Record the number of rows in each table in the specified tsk_bookmark_t object.

Return

Return 0 on success or a negative value on failure.

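Parameters
  • self: A pointer to a tsk_table_collection_t object.

  • bookmark: A pointer to a tsk_bookmark_t object, which is filled with the number of rows in each table.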

int tsk_table_collection_truncate(tsk_table_collection_t *self, tsk_bookmark_t *bookmark)

Truncates the tables in this table collection according to the specified bookmark.

Truncate the tables in this collection so that each one has the number of rows specified in the parameter tsk_bookmark_t. Use the tsk_table_collection_record_num_rows() function to record the number of rows for each table in a table collection at a particular time.

Return

Return 0 on success or a negative value on failure.

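Parameters
  • self: A pointer to a tsk_table_collection_t object.

  • bookmark: A pointer to a tsk_bookmark_t object specifying the number of rows to retain in each table.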
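A typical use of record_num_rows and truncate together is to roll back a speculative batch of additions. The sketch below models that pattern on a hypothetical two-table structure: toy_tables_t and toy_bookmark_t stand in for tsk_table_collection_t and tsk_bookmark_t, and none of the toy names are tskit API. In this toy model, a bookmark that asks for more rows than a table currently holds is treated as an error.

```c
#include <stddef.h>

/* Toy stand-in for a table collection: only the row counts are modelled. */
typedef struct {
    size_t num_nodes;
    size_t num_edges;
} toy_tables_t;

/* Plays the role of tsk_bookmark_t: one row count per table. */
typedef struct {
    size_t nodes;
    size_t edges;
} toy_bookmark_t;

/* Remember the current size of every table. */
static void
toy_record_num_rows(const toy_tables_t *self, toy_bookmark_t *bookmark)
{
    bookmark->nodes = self->num_nodes;
    bookmark->edges = self->num_edges;
}

/* Shrink every table back to the bookmarked size; never grows a table. */
static int
toy_truncate(toy_tables_t *self, const toy_bookmark_t *bookmark)
{
    if (bookmark->nodes > self->num_nodes || bookmark->edges > self->num_edges) {
        return -1;
    }
    self->num_nodes = bookmark->nodes;
    self->num_edges = bookmark->edges;
    return 0;
}
```

With the real API the shape is the same: record a bookmark, append rows, and if the batch turns out to be unwanted, pass the bookmark to tsk_table_collection_truncate() to restore the earlier row counts.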

int tsk_table_collection_sort(tsk_table_collection_t *self, const tsk_bookmark_t *start, tsk_flags_t options)

Sorts the tables in this collection.

Some of the tables in a table collection must satisfy specific sortedness requirements in order to define a valid tree sequence. This method sorts the edge, site and mutation tables such that these requirements are guaranteed to be fulfilled. The individual, node, population and provenance tables do not have any sortedness requirements, and are therefore ignored by this method.

Note

The current implementation may sort in such a way that exceeds these requirements, but this behaviour should not be relied upon and later versions may weaken the level of sortedness. However, the method does guarantee that the resulting tables describe a valid tree sequence.
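For reference, tskit's data model documentation orders edges by the time of the parent, then parent ID, then child ID, then left coordinate. Assuming that ordering, the sketch below expresses it as a qsort comparator. The edge_key_t struct is hypothetical: the parent's time is copied into each record because a standard C qsort comparator cannot receive the node table as context. This illustrates the sort key only; it is not how tsk_table_collection_sort() is implemented.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flattened edge record with the parent's time copied in. */
typedef struct {
    double left;
    double right;
    int32_t parent;
    int32_t child;
    double parent_time;
} edge_key_t;

/* Three-way comparisons returning -1, 0 or 1. */
static int
cmp_double(double a, double b)
{
    return (a > b) - (a < b);
}

static int
cmp_id(int32_t a, int32_t b)
{
    return (a > b) - (a < b);
}

/* Order by parent time, then parent ID, then child ID, then left. */
static int
cmp_edge_key(const void *pa, const void *pb)
{
    const edge_key_t *a = pa;
    const edge_key_t *b = pb;
    int c = cmp_double(a->parent_time, b->parent_time);
    if (c == 0) {
        c = cmp_id(a->parent, b->parent);
    }
    if (c == 0) {
        c = cmp_id(a->child, b->child);
    }
    if (c == 0) {
        c = cmp_double(a->left, b->left);
    }
    return c;
}

void
sort_edge_keys(edge_key_t *edges, size_t num_edges)
{
    qsort(edges, num_edges, sizeof(*edges), cmp_edge_key);
}
```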

Warning

Sorting migrations is currently not supported and an error will be raised if a table collection containing a non-empty migration table is specified.

The specified tsk_bookmark_t allows us to specify a start position for sorting in each of the tables; rows before this value are assumed to already be in sorted order and this information is used to make sorting more efficient. Positions in tables that are not sorted (individual, node, population and provenance) are ignored and can be set to arbitrary values.

Warning

The current implementation only supports specifying a start position for the edge table and in a limited form for the site and mutation tables. Specifying a non-zero start position for the migration table results in an error. The start positions for the site and mutation tables can either be 0 or the length of the respective tables, allowing these tables to either be fully sorted, or not sorted at all.
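The restrictions in this warning can be checked before calling sort. The following sketch does so on a hypothetical sort_start_t struct holding the edge, migration, site and mutation fields of a tsk_bookmark_t; the struct and function names are illustrative only. The extra check that the edge start does not exceed the edge table length is an assumption about what a meaningful start position is, not a rule stated above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the tsk_bookmark_t fields that sort looks at. */
typedef struct {
    size_t edges;
    size_t migrations;
    size_t sites;
    size_t mutations;
} sort_start_t;

/* Returns true if `start` respects the limits described in the warning,
 * given the current number of rows in each table. */
bool
sort_start_is_supported(const sort_start_t *start, const sort_start_t *num_rows)
{
    /* Assumption: a start past the end of the edge table is not meaningful. */
    if (start->edges > num_rows->edges) {
        return false;
    }
    /* Migrations: only a zero start position is accepted. */
    if (start->migrations != 0) {
        return false;
    }
    /* Sites and mutations: either sort everything (0) or nothing (length). */
    if (start->sites != 0 && start->sites != num_rows->sites) {
        return false;
    }
    if (start->mutations != 0 && start->mutations != num_rows->mutations) {
        return false;
    }
    return true;
}
```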

The table collection will always be unindexed after sort successfully completes.

See the table sorting section for more details. For more control over the sorting process, see the Low-level sorting section.

Options

Options can be specified by providing one or more of the following bitwise flags:

TSK_NO_CHECK_INTEGRITY

Do not run integrity checks using tsk_table_collection_check_integrity() before sorting, potentially leading to a small reduction in execution time. This performance optimisation should not be used unless the calling code can guarantee reference integrity within the table collection. References to rows not in the table or bad offsets will result in undefined behaviour.
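To illustrate the kind of problem that skipping these checks leaves undetected, here is a simplified sketch of one referential check: every edge's parent and child must be a valid row index in the node table. The function and its parallel-array layout are hypothetical, and tsk_table_collection_check_integrity() covers more than this single check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified reference check for an edge table stored as
 * parallel arrays. Returns 0 if every reference is valid, else -1. */
int
check_edge_references(const int32_t *parent, const int32_t *child, size_t num_edges,
    size_t num_nodes)
{
    size_t j;

    for (j = 0; j < num_edges; j++) {
        /* Negative IDs (including TSK_NULL, -1) and out-of-range rows are
         * both rejected for edge endpoints. */
        if (parent[j] < 0 || (size_t) parent[j] >= num_nodes) {
            return -1;
        }
        if (child[j] < 0 || (size_t) child[j] >= num_nodes) {
            return -1;
        }
    }
    return 0;
}
```

Without such a check, sorting code that looks up node times via parent[j] would read outside the node table, which is the undefined behaviour the option description warns about.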

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_table_collection_t object.

  • start: The position to begin sorting in each table; all rows less than this position must fulfill the tree sequence sortedness requirements. If this is NULL, sort all rows.

  • options: Sort options; see above for the available bitwise flags. For the default behaviour, a value of 0 should be provided.

int tsk_table_collection_simplify(tsk_table_collection_t *self, const tsk_id_t *samples, tsk_size_t num_samples, tsk_flags_t options, tsk_id_t *node_map)

Simplify the tables to remove redundant information.

Simplification transforms the tables to remove redundancy and canonicalise tree sequence data. See the simplification section for more details.

A mapping from the node IDs in the table before simplification to their equivalent values after simplification can be obtained via the node_map argument. If this is non-NULL, node_map[u] will contain the new ID for node u after simplification, or TSK_NULL if the node has been removed. Thus, node_map must be an array of at least self->nodes.num_rows tsk_id_t values.
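As an example of working with node_map, the sketch below inverts it, recording for each node in the simplified tables the ID it had before simplification. It relies only on facts stated in this document: tsk_id_t is an int32_t and TSK_NULL is -1 (mirrored here by EXAMPLE_NULL). In practice num_new would be the number of rows in the node table after simplification. invert_node_map is a hypothetical helper, not a tskit function.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_NULL ((int32_t) -1) /* same value as TSK_NULL */

/* Given node_map (old ID -> new ID or null), fill new_to_old (new ID -> old ID).
 * Returns 0, or -1 if node_map refers to a new ID outside [0, num_new). */
int
invert_node_map(const int32_t *node_map, size_t num_old, int32_t *new_to_old,
    size_t num_new)
{
    size_t u, v;

    for (v = 0; v < num_new; v++) {
        new_to_old[v] = EXAMPLE_NULL;
    }
    for (u = 0; u < num_old; u++) {
        if (node_map[u] == EXAMPLE_NULL) {
            continue; /* node u was removed by simplification */
        }
        if (node_map[u] < 0 || (size_t) node_map[u] >= num_new) {
            return -1;
        }
        new_to_old[node_map[u]] = (int32_t) u;
    }
    return 0;
}
```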

Options:

Options can be specified by providing one or more of the following bitwise flags:

TSK_FILTER_SITES

Remove sites from the output if there are no mutations that reference them.

TSK_FILTER_POPULATIONS

Remove populations from the output if there are no nodes or migrations that reference them.

TSK_FILTER_INDIVIDUALS

Remove individuals from the output if there are no nodes that reference them.

TSK_REDUCE_TO_SITE_TOPOLOGY

Reduce the topological information in the tables to the minimum necessary to represent the trees that contain sites. If there are zero sites, this will result in zero output edges. When the number of sites is greater than zero, every tree in the output tree sequence will contain at least one site. For a given site, the topology of the tree containing that site will be identical (up to node ID remapping) to the topology of the corresponding tree in the input.

TSK_KEEP_UNARY

By default simplify removes unary nodes (i.e., nodes with exactly one child) along the path from samples to root. If this option is specified such unary nodes will be preserved in the output.

TSK_KEEP_INPUT_ROOTS

By default simplify removes all topology ancestral to the MRCAs of the samples. This option inserts edges from these MRCAs back to the roots of the input trees.

Note

Migrations are currently not supported by simplify, and an error will be raised if we attempt to call simplify on a table collection with greater than zero migrations. See https://github.com/tskit-dev/tskit/issues/20

The table collection will always be unindexed after simplify successfully completes.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_table_collection_t object.

  • samples: Either NULL or an array of num_samples distinct and valid node IDs. If non-NULL, the nodes in this array will be marked as samples in the output. If NULL, the num_samples parameter is ignored and the samples in the output will be the same as the samples in the input. This is equivalent to populating the samples array with all of the sample nodes in the input in increasing order of ID.

  • num_samples: The number of node IDs in the input samples array. Ignored if the samples array is NULL.

  • options: Simplify options; see above for the available bitwise flags. For the default behaviour, a value of 0 should be provided.

  • node_map: If not NULL, this array will be filled to define the mapping between node IDs in the table collection before and after simplification.
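The default behaviour for samples = NULL can be pictured with the following sketch, which collects sample node IDs in increasing order from a node flags column. It assumes that the sample flag is bit 0 (TSK_NODE_IS_SAMPLE in tskit), which this section does not state; collect_samples and EXAMPLE_NODE_IS_SAMPLE are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to match tskit's TSK_NODE_IS_SAMPLE node flag (bit 0). */
#define EXAMPLE_NODE_IS_SAMPLE 1u

/* Collect, in increasing ID order, the nodes whose flags mark them as samples.
 * `samples` must have room for num_nodes entries; returns the count written. */
size_t
collect_samples(const uint32_t *node_flags, size_t num_nodes, int32_t *samples)
{
    size_t u;
    size_t num_samples = 0;

    for (u = 0; u < num_nodes; u++) {
        if (node_flags[u] & EXAMPLE_NODE_IS_SAMPLE) {
            samples[num_samples] = (int32_t) u;
            num_samples++;
        }
    }
    return num_samples;
}
```

Under that assumption, passing the resulting array and count to simplify should be equivalent to passing samples = NULL.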

int tsk_table_collection_subset(tsk_table_collection_t *self, const tsk_id_t *nodes, tsk_size_t num_nodes)

Subsets and reorders a table collection according to an array of nodes.

Reduces the table collection to contain only the entries referring to the provided list of nodes, with nodes reordered according to the order they appear in the nodes argument. Specifically, this subsets and reorders each of the tables as follows:

  1. Nodes: if in the list of nodes, and in the order provided.

  2. Individuals and Populations: if referred to by a retained node, and in the order first seen when traversing the list of retained nodes.

  3. Edges: if both parent and child are retained nodes.

  4. Mutations: if the mutation’s node is a retained node.

  5. Sites: if any mutations remain at the site after removing mutations.

Retained edges, mutations, and sites appear in the same order as in the original tables.

If nodes is the entire list of nodes in the tables, then the resulting tables will be identical to the original tables, but with nodes (and individuals and populations) reordered.
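The node and edge rules above can be sketched on plain arrays: first build an old-to-new node map from the nodes argument, then keep only the edges whose parent and child both survive, remapping their IDs and preserving the original order. This illustrates the listed rules under stated assumptions (distinct, valid node IDs; hypothetical function names) and is not tskit's implementation of tsk_table_collection_subset().

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_NULL ((int32_t) -1) /* same value as TSK_NULL */

/* Node nodes[k] becomes new node k; every other node maps to null.
 * Assumes the IDs in `nodes` are distinct and < num_old_nodes. */
void
subset_node_map(const int32_t *nodes, size_t num_nodes, size_t num_old_nodes,
    int32_t *node_map)
{
    size_t j;

    for (j = 0; j < num_old_nodes; j++) {
        node_map[j] = EXAMPLE_NULL;
    }
    for (j = 0; j < num_nodes; j++) {
        node_map[nodes[j]] = (int32_t) j;
    }
}

/* Keep edges whose parent and child are both retained, remapping IDs in
 * place and preserving order. Returns the number of edges kept. */
size_t
subset_edges(int32_t *parent, int32_t *child, size_t num_edges, const int32_t *node_map)
{
    size_t j;
    size_t k = 0;

    for (j = 0; j < num_edges; j++) {
        int32_t p = node_map[parent[j]];
        int32_t c = node_map[child[j]];
        if (p != EXAMPLE_NULL && c != EXAMPLE_NULL) {
            parent[k] = p;
            child[k] = c;
            k++;
        }
    }
    return k;
}
```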

Note

Migrations are currently not supported by subset, and an error will be raised if we attempt to call subset on a table collection with greater than zero migrations.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_table_collection_t object.

  • nodes: An array of num_nodes valid node IDs.

  • num_nodes: The number of node IDs in the input nodes array.

int tsk_table_collection_union(tsk_table_collection_t *self, const tsk_table_collection_t *other, const tsk_id_t *other_node_mapping, tsk_flags_t options)

Forms the node-wise union of two table collections.

Expands this table collection by adding the non-shared portions of another table collection to itself. The other_node_mapping encodes which nodes in other are equivalent to a node in self. The positions in the other_node_mapping array correspond to node IDs in other, and the elements encode the equivalent node ID in self, or TSK_NULL if the node is exclusive to other. Nodes that are exclusive to other are added to self, along with:

  1. Individuals which are new to self.

  2. Edges whose parent or child are new to self.

  3. Sites which were not present in self.

  4. Mutations whose nodes are new to self.

By default, populations of newly added nodes are assumed to be new populations, and added to the population table as well.

This operation will also sort the resulting tables, so the tables may change even if nothing new is added, if the original tables were not sorted.

Options:

Options can be specified by providing one or more of the following bitwise flags:

TSK_UNION_NO_CHECK_SHARED

By default, union checks that the portion of shared history between self and other, as implied by other_node_mapping, are indeed equivalent. It does so by subsetting both self and other on the equivalent nodes specified in other_node_mapping, and then checking for equality of the subsets.

TSK_UNION_NO_ADD_POP

By default, all nodes new to self are assigned new populations. If this option is specified, nodes that are added to self will retain the population IDs they have in other.

Note

Migrations are currently not supported by union, and an error will be raised if we attempt to call union on a table collection with migrations.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_table_collection_t object.

  • other: A pointer to a tsk_table_collection_t object.

  • other_node_mapping: An array of node IDs that relate nodes in other to nodes in self: the k-th element of other_node_mapping should be the index of the equivalent node in self, or TSK_NULL if the node is not present in self (in which case it will be added to self).

  • options: Union options; see above for the available bitwise flags. For the default behaviour, a value of 0 should be provided.
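One situation where other_node_mapping is easy to build arises when self and other were both extended from a common table collection without renumbering, so that their first num_shared nodes refer to the same individuals in history. Under that assumption, which must actually hold for your data, the mapping is the identity on the shared prefix and TSK_NULL elsewhere. shared_prefix_mapping and EXAMPLE_NULL are hypothetical names used only for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_NULL ((int32_t) -1) /* same value as TSK_NULL */

/* Assumes node j < num_shared means the same node in both collections and
 * every later node of `other` is exclusive to it. */
void
shared_prefix_mapping(size_t num_other_nodes, size_t num_shared, int32_t *other_node_mapping)
{
    size_t j;

    for (j = 0; j < num_other_nodes; j++) {
        other_node_mapping[j] = j < num_shared ? (int32_t) j : EXAMPLE_NULL;
    }
}
```

Leaving TSK_UNION_NO_CHECK_SHARED unset lets union verify that the assumed shared history really is equivalent in both collections.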

int tsk_table_collection_set_metadata(tsk_table_collection_t *self, const char *metadata, tsk_size_t metadata_length)

Set the metadata.

Copies the metadata string to this table collection, replacing any existing metadata.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_table_collection_t object.

  • metadata: A pointer to a char array

  • metadata_length: The size of the metadata in bytes.

int tsk_table_collection_set_metadata_schema(tsk_table_collection_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)

Set the metadata schema.

Copies the metadata schema string to this table collection, replacing any existing metadata schema.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_table_collection_t object.

  • metadata_schema: A pointer to a char array.

  • metadata_schema_length: The size of the metadata schema in bytes.

bool tsk_table_collection_has_index(const tsk_table_collection_t *self, tsk_flags_t options)

Returns true if this table collection is indexed.

This method returns true if the table collection has an index for the edge table. It guarantees that the index exists, and that it is for the same number of edges that are in the edge table. It does not guarantee that the index is valid (e.g., the rows in the edge table may have been permuted in some way since the index was built).

See the Table indexes section for details on the index life-cycle.

Return

Return true if there is an index present for this table collection.

Parameters
  • self: A pointer to a tsk_table_collection_t object.

  • options: Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.
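Since tsk_table_collection_has_index() only confirms that an index of the right length exists, client code that obtains index arrays by other means may want a stronger sanity check. The sketch below tests that an order array is a permutation of the edge IDs 0 to num_edges - 1. It is an illustrative check with a hypothetical name (index_is_permutation), not tskit's implementation of TSK_CHECK_INDEXES, and it does not verify that the edges appear in the correct sort order.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: return true if `order` (length num_edges) contains
 * every edge ID in 0..num_edges-1 exactly once. */
bool
index_is_permutation(const int32_t *order, size_t num_edges)
{
    bool *seen;
    bool ok = true;
    size_t j;

    if (num_edges == 0) {
        return true;
    }
    seen = calloc(num_edges, sizeof(*seen));
    if (seen == NULL) {
        return false; /* allocation failure: cannot confirm validity */
    }
    for (j = 0; j < num_edges && ok; j++) {
        /* Reject out-of-range edge references and repeated edges. */
        if (order[j] < 0 || (size_t) order[j] >= num_edges || seen[order[j]]) {
            ok = false;
        } else {
            seen[order[j]] = true;
        }
    }
    free(seen);
    return ok;
}
```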

int tsk_table_collection_drop_index(tsk_table_collection_t *self, tsk_flags_t options)

Deletes the indexes for this table collection.

Unconditionally drop the indexes that may be present for this table collection. It is not an error to call this method on an unindexed table collection. See the Table indexes section for details on the index life-cycle.

Return

Always returns 0.

Parameters
  • self: A pointer to a tsk_table_collection_t object.

  • options: Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

int tsk_table_collection_build_index(tsk_table_collection_t *self, tsk_flags_t options)

Builds indexes for this table collection.

Builds the tree traversal indexes for this table collection. Any existing index is first dropped using tsk_table_collection_drop_index(). See the Table indexes section for details on the index life-cycle.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_table_collection_t object.

  • options: Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

int tsk_table_collection_set_indexes(tsk_table_collection_t *self, tsk_id_t *edge_insertion_order, tsk_id_t *edge_removal_order)

Sets the edge insertion/removal index for this table collection.

This method sets the edge insertion/removal index for this table collection. Each index array should contain one entry per row of the edge table. The index is not checked for validity.

See the Table indexes section for details on the index life-cycle.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_table_collection_t object.

  • edge_insertion_order: Array of tsk_id_t edge ids.

  • edge_removal_order: Array of tsk_id_t edge ids.
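Because tsk_table_collection_set_indexes() does not validate its input, the caller is responsible for supplying correctly sorted arrays. The sketch below computes both arrays from raw edge and node-time columns with qsort. It assumes the ordering used by tsk_table_collection_build_index() in recent tskit versions: insertion order sorts edges by left, then by increasing parent time, parent ID and child ID; removal order sorts by right, then by decreasing parent time, parent ID and child ID. Treat that ordering as an assumption to confirm against the Table indexes section. The names edge_key_t and compute_edge_indexes are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* One sort key per edge; `edge` is the row index in the edge table. */
typedef struct {
    int32_t edge;
    double k1, k2;
    int32_t k3, k4;
} edge_key_t;

/* Lexicographic comparison on (k1, k2, k3, k4). */
static int
cmp_edge_key(const void *a, const void *b)
{
    const edge_key_t *x = a;
    const edge_key_t *y = b;

    if (x->k1 != y->k1) {
        return x->k1 < y->k1 ? -1 : 1;
    }
    if (x->k2 != y->k2) {
        return x->k2 < y->k2 ? -1 : 1;
    }
    if (x->k3 != y->k3) {
        return x->k3 < y->k3 ? -1 : 1;
    }
    if (x->k4 != y->k4) {
        return x->k4 < y->k4 ? -1 : 1;
    }
    return 0;
}

/* Hypothetical helper: fill insertion[] and removal[] (each num_edges long).
 * Assumes every parent ID is a valid index into node_time.
 * Returns 0 on success or -1 if memory cannot be allocated. */
int
compute_edge_indexes(size_t num_edges, const double *left, const double *right,
    const int32_t *parent, const int32_t *child, const double *node_time,
    int32_t *insertion, int32_t *removal)
{
    edge_key_t *keys = malloc((num_edges + 1) * sizeof(*keys));
    size_t j;

    if (keys == NULL) {
        return -1;
    }
    /* Insertion: increasing left, then increasing parent time, parent, child. */
    for (j = 0; j < num_edges; j++) {
        keys[j] = (edge_key_t){ (int32_t) j, left[j], node_time[parent[j]],
            parent[j], child[j] };
    }
    qsort(keys, num_edges, sizeof(*keys), cmp_edge_key);
    for (j = 0; j < num_edges; j++) {
        insertion[j] = keys[j].edge;
    }
    /* Removal: increasing right, then decreasing parent time, parent, child. */
    for (j = 0; j < num_edges; j++) {
        keys[j] = (edge_key_t){ (int32_t) j, right[j], -node_time[parent[j]],
            -parent[j], -child[j] };
    }
    qsort(keys, num_edges, sizeof(*keys), cmp_edge_key);
    for (j = 0; j < num_edges; j++) {
        removal[j] = keys[j].edge;
    }
    free(keys);
    return 0;
}
```

In most programs, calling tsk_table_collection_build_index() is simpler and safer; a hand-computed index like this is mainly useful when the arrays have to be produced outside tskit.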

tsk_id_t tsk_table_collection_check_integrity(const tsk_table_collection_t *self, tsk_flags_t options)

Runs integrity checks on this table collection.

Checks the integrity of this table collection. The default checks (i.e., with options = 0) guarantee the integrity of memory and entity references within the table collection. All spatial values (along the genome) are checked to see if they are finite values and within the required bounds. Time values are checked to see if they are finite or marked as unknown.

To check if a set of tables fulfills the requirements needed for a valid tree sequence, use the TSK_CHECK_TREES option. When this method is called with TSK_CHECK_TREES, the number of trees in the tree sequence is returned on success. Thus, client code should treat only negative return values as errors. All other options return zero on success and a negative value on failure.

More fine-grained checks can be achieved using bitwise combinations of the other options.

Options:

Options can be specified by providing one or more of the following bitwise flags:

TSK_CHECK_EDGE_ORDERING

Check edge ordering constraints for a tree sequence.

TSK_CHECK_SITE_ORDERING

Check that sites are in nondecreasing position order.

TSK_CHECK_SITE_DUPLICATES

Check for any duplicate site positions.

TSK_CHECK_MUTATION_ORDERING

Check constraints on the ordering of mutations. Any non-null mutation parents and known times are checked for ordering constraints.

TSK_CHECK_INDEXES

Check that the table indexes exist, and contain valid edge references.

TSK_CHECK_TREES

All checks needed to define a valid tree sequence. Note that this implies all of the above checks.

It is sometimes useful to disregard some parts of the data model when performing checks:

TSK_NO_CHECK_POPULATION_REFS

Do not check integrity of references to populations. This can be safely combined with the other checks.
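As a concrete reading of TSK_CHECK_SITE_ORDERING and TSK_CHECK_SITE_DUPLICATES, the sketch below applies both rules to a raw array of site positions. It illustrates the documented conditions and is not tskit's code; check_site_positions and its return codes are hypothetical.

```c
#include <stddef.h>

/* Hypothetical return codes for this sketch (tskit uses its own error codes). */
#define SITES_OK 0
#define SITES_UNSORTED (-1)
#define SITES_DUPLICATE (-2)

/* Check that positions are nondecreasing and, if check_duplicates is
 * non-zero, that no two sites share a position. */
int
check_site_positions(const double *position, size_t num_sites, int check_duplicates)
{
    size_t j;

    for (j = 1; j < num_sites; j++) {
        if (position[j] < position[j - 1]) {
            return SITES_UNSORTED;
        }
        /* Once sorted, equal positions can only be adjacent. */
        if (check_duplicates && position[j] == position[j - 1]) {
            return SITES_DUPLICATE;
        }
    }
    return SITES_OK;
}
```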

Return

Return a negative error value if any problems are detected in the tree sequence. If the TSK_CHECK_TREES option is provided, the number of trees in the tree sequence is returned on success; otherwise, 0 is returned on success.

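Parameters
  • self: A pointer to a tsk_table_collection_t object.

  • options: Bitwise options; see above for the available flags.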

Individuals

struct tsk_individual_t

A single individual defined by a row in the individual table.

See the data model section for the definition of an individual and its properties.

Public Members

tsk_id_t id

Non-negative ID value corresponding to table row.

tsk_flags_t flags

Bitwise flags.

const double *location

Spatial location. The number of dimensions is defined by location_length.

tsk_size_t location_length

Number of spatial dimensions.

const char *metadata

Metadata.

tsk_size_t metadata_length

Size of the metadata in bytes.

struct tsk_individual_table_t

The individual table.

See the individual table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows

The number of rows in this table.

tsk_size_t location_length

The total length of the location column.

tsk_size_t metadata_length

The total length of the metadata column.

tsk_flags_t *flags

The flags column.

double *location

The location column.

tsk_size_t *location_offset

The location_offset column.

char *metadata

The metadata column.

tsk_size_t *metadata_offset

The metadata_offset column.

char *metadata_schema

The metadata schema.
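The location and location_offset members together form a ragged column. Assuming the usual tskit layout, in which the offset array has num_rows + 1 entries and starts at 0, the coordinates of row j occupy location[location_offset[j]] up to, but not including, location[location_offset[j + 1]]; metadata and metadata_offset follow the same pattern. The helper row_location below is a hypothetical sketch of that lookup, with uint64_t standing in for tsk_size_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return a pointer to the coordinates of row j and
 * store their count in *len. `location_offset` has num_rows + 1 entries. */
const double *
row_location(const double *location, const uint64_t *location_offset,
    size_t j, uint64_t *len)
{
    /* Consecutive offsets delimit the row; nothing is copied. */
    *len = location_offset[j + 1] - location_offset[j];
    return location + location_offset[j];
}
```

An individual with no location has equal consecutive offsets, so the helper reports a length of 0 for it.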

int tsk_individual_table_init(tsk_individual_table_t *self, tsk_flags_t options)

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to an uninitialised tsk_individual_table_t object.

  • options: Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

int tsk_individual_table_free(tsk_individual_table_t *self)

Free the internal memory for the specified table.

Return

Always returns 0.

Parameters
  • self: A pointer to a tsk_individual_table_t object.

tsk_id_t tsk_individual_table_add_row(tsk_individual_table_t *self, tsk_flags_t flags, const double *location, tsk_size_t location_length, const char *metadata, tsk_size_t metadata_length)

Adds a row to this individual table.

Add a new individual with the specified flags, location and metadata to the table. Copies of the location and metadata parameters are taken immediately. See the table definition for details of the columns in this table.

Return

Return the ID of the newly added individual on success, or a negative value on failure.

Parameters
  • self: A pointer to a tsk_individual_table_t object.

  • flags: The bitwise flags for the new individual.

  • location: A pointer to a double array representing the spatial location of the new individual. Can be NULL if location_length is 0.

  • location_length: The number of dimensions in the location. Note that this is the number of elements in the corresponding double array, not the number of bytes.

  • metadata: The metadata to be associated with the new individual. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length: The size of the metadata array in bytes.
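Two details of the parameter list are easy to get wrong: location_length counts double elements rather than bytes, and the values are copied when the row is added, so the caller's array can be reused straight away. The sketch below models that append step on a hypothetical fixed-capacity location column (location_column_t and append_location are illustrative names); tskit manages growable buffers internally, which this sketch does not attempt to reproduce.

```c
#include <stdint.h>
#include <string.h>

#define MAX_COORDS 64

/* Hypothetical, fixed-capacity stand-in for the location columns. */
typedef struct {
    double location[MAX_COORDS];
    uint64_t location_offset[MAX_COORDS + 1];
    uint64_t num_rows;
} location_column_t;

/* Append one row; location_length is a count of doubles, not bytes.
 * Returns the new row ID, or -1 if the column is full. */
int64_t
append_location(location_column_t *col, const double *location, uint64_t location_length)
{
    uint64_t start = col->location_offset[col->num_rows];

    if (col->num_rows >= MAX_COORDS || start + location_length > MAX_COORDS) {
        return -1;
    }
    /* Copy the caller's values; the byte count is elements * sizeof(double). */
    if (location_length > 0) {
        memcpy(col->location + start, location, location_length * sizeof(double));
    }
    col->num_rows++;
    col->location_offset[col->num_rows] = start + location_length;
    return (int64_t) (col->num_rows - 1);
}
```

Passing sizeof(xy) instead of sizeof(xy) / sizeof(xy[0]) for a two-element array xy would, with 8-byte doubles, claim 16 coordinates instead of 2.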

int tsk_individual_table_clear(tsk_individual_table_t *self)

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_individual_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_individual_table_t object.

int tsk_individual_table_truncate(tsk_individual_table_t *self, tsk_size_t num_rows)

Truncates this table so that only the first num_rows are retained.

Return

Return 0 on success or a negative value on failure.

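Parameters
  • self: A pointer to a tsk_individual_table_t object.

  • num_rows: The number of rows to retain in the table.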

bool tsk_individual_table_equals(const tsk_individual_table_t *self, const tsk_individual_table_t *other, tsk_flags_t options)

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

TSK_CMP_IGNORE_METADATA

Do not include metadata or metadata schemas in the comparison.

Return

Return true if the specified table is equal to this table.

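Parameters
  • self: A pointer to a tsk_individual_table_t object.

  • other: A pointer to a tsk_individual_table_t object.

  • options: Bitwise comparison options.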

int tsk_individual_table_copy(const tsk_individual_table_t *self, tsk_individual_table_t *dest, tsk_flags_t options)

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Indexes that are present are also copied to the destination table.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_individual_table_t object.

  • dest: A pointer to a tsk_individual_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised individual table. If not, it must be an uninitialised individual table.

  • options: Bitwise option flags.

int tsk_individual_table_get_row(const tsk_individual_table_t *self, tsk_id_t index, tsk_individual_t *row)

Get the row at the specified index.

Updates the specified individual struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Return

Return 0 on success or a negative value on failure.

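Parameters
  • self: A pointer to a tsk_individual_table_t object.

  • index: The requested table row.

  • row: A pointer to a tsk_individual_t struct that is updated to reflect the values in the specified row.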

int tsk_individual_table_set_metadata_schema(tsk_individual_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing metadata schema.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_individual_table_t object.

  • metadata_schema: A pointer to a char array.

  • metadata_schema_length: The size of the metadata schema in bytes.

void tsk_individual_table_print_state(const tsk_individual_table_t *self, FILE *out)

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self: A pointer to a tsk_individual_table_t object.

  • out: The stream to write the summary to.

Nodes

struct tsk_node_t

A single node defined by a row in the node table.

See the data model section for the definition of a node and its properties.

Public Members

tsk_id_t id

Non-negative ID value corresponding to table row.

tsk_flags_t flags

Bitwise flags.

double time

Time.

tsk_id_t population

Population ID.

tsk_id_t individual

Individual ID.

const char *metadata

Metadata.

tsk_size_t metadata_length

Size of the metadata in bytes.

struct tsk_node_table_t

The node table.

See the node table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows

The number of rows in this table.

tsk_size_t metadata_length

The total length of the metadata column.

tsk_flags_t *flags

The flags column.

double *time

The time column.

tsk_id_t *population

The population column.

tsk_id_t *individual

The individual column.

char *metadata

The metadata column.

tsk_size_t *metadata_offset

The metadata_offset column.

char *metadata_schema

The metadata schema.

int tsk_node_table_init(tsk_node_table_t *self, tsk_flags_t options)

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to an uninitialised tsk_node_table_t object.

  • options: Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

int tsk_node_table_free(tsk_node_table_t *self)

Free the internal memory for the specified table.

Return

Always returns 0.

Parameters
  • self: A pointer to a tsk_node_table_t object.

tsk_id_t tsk_node_table_add_row(tsk_node_table_t *self, tsk_flags_t flags, double time, tsk_id_t population, tsk_id_t individual, const char *metadata, tsk_size_t metadata_length)

Adds a row to this node table.

Add a new node with the specified flags, time, population, individual and metadata to the table. A copy of the metadata parameter is taken immediately. See the table definition for details of the columns in this table.

Return

Return the ID of the newly added node on success, or a negative value on failure.

Parameters
  • self: A pointer to a tsk_node_table_t object.

  • flags: The bitwise flags for the new node.

  • time: The time for the new node.

  • population: The population for the new node. Set to TSK_NULL if not known.

  • individual: The individual for the new node. Set to TSK_NULL if not known.

  • metadata: The metadata to be associated with the new node. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length: The size of the metadata array in bytes.

int tsk_node_table_clear(tsk_node_table_t *self)

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_node_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_node_table_t object.

int tsk_node_table_truncate(tsk_node_table_t *self, tsk_size_t num_rows)

Truncates this table so that only the first num_rows are retained.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_node_table_t object.

  • num_rows: The number of rows to retain in the table.
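Under the ragged-column layout, truncation only needs to lower the row count and recompute the total length of each ragged column from the retained offsets; no data has to move. The sketch below models this for a hypothetical table with a single metadata column (ragged_table_t and truncate_rows are illustrative names, and -1 is a stand-in error code). In this model, truncating to zero rows has the same effect as the clear operation described above: the row count becomes zero and no memory is freed.

```c
#include <stdint.h>

/* Hypothetical stand-in for the fields a truncate must adjust. */
typedef struct {
    uint64_t num_rows;
    uint64_t metadata_length;  /* total bytes in the metadata column */
    uint64_t *metadata_offset; /* num_rows + 1 entries */
} ragged_table_t;

/* Keep only the first num_rows rows. Returns 0 on success, or -1 if
 * num_rows exceeds the current row count. No memory is released. */
int
truncate_rows(ragged_table_t *self, uint64_t num_rows)
{
    if (num_rows > self->num_rows) {
        return -1;
    }
    self->num_rows = num_rows;
    /* The offset of the first dropped row is the new column length. */
    self->metadata_length = self->metadata_offset[num_rows];
    return 0;
}
```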

bool tsk_node_table_equals(const tsk_node_table_t *self, const tsk_node_table_t *other, tsk_flags_t options)

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

TSK_CMP_IGNORE_METADATA

Do not include metadata or metadata schemas in the comparison.

Return

Return true if the specified table is equal to this table.

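Parameters
  • self: A pointer to a tsk_node_table_t object.

  • other: A pointer to a tsk_node_table_t object.

  • options: Bitwise comparison options.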

int tsk_node_table_copy(const tsk_node_table_t *self, tsk_node_table_t *dest, tsk_flags_t options)

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_node_table_t object.

  • dest: A pointer to a tsk_node_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised node table. If not, it must be an uninitialised node table.

  • options: Bitwise option flags.

int tsk_node_table_get_row(const tsk_node_table_t *self, tsk_id_t index, tsk_node_t *row)

Get the row at the specified index.

Updates the specified node struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_node_table_t object.

  • index: The requested table row.

  • row: A pointer to a tsk_node_t struct that is updated to reflect the values in the specified row.
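Because the pointers in the row struct refer to memory owned by the table, metadata that must outlive the next modification of the table should be copied first. A minimal sketch, assuming a caller-owned heap copy is acceptable; dup_metadata is a hypothetical helper whose arguments correspond to the metadata and metadata_length members of tsk_node_t, with uint64_t standing in for tsk_size_t.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of a row's metadata that stays
 * valid after the table is modified. The caller must free() it.
 * Returns NULL if there is no metadata or if allocation fails. */
char *
dup_metadata(const char *metadata, uint64_t metadata_length)
{
    char *copy;

    if (metadata_length == 0) {
        return NULL;
    }
    copy = malloc((size_t) metadata_length);
    if (copy != NULL) {
        memcpy(copy, metadata, (size_t) metadata_length);
    }
    return copy;
}
```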

int tsk_node_table_set_metadata_schema(tsk_node_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing metadata schema.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_node_table_t object.

  • metadata_schema: A pointer to a char array.

  • metadata_schema_length: The size of the metadata schema in bytes.

void tsk_node_table_print_state(const tsk_node_table_t *self, FILE *out)

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self: A pointer to a tsk_node_table_t object.

  • out: The stream to write the summary to.

Edges

struct tsk_edge_t

A single edge defined by a row in the edge table.

See the data model section for the definition of an edge and its properties.

Public Members

tsk_id_t id

Non-negative ID value corresponding to table row.

tsk_id_t parent

Parent node ID.

tsk_id_t child

Child node ID.

double left

Left coordinate.

double right

Right coordinate.

const char *metadata

Metadata.

tsk_size_t metadata_length

Size of the metadata in bytes.

struct tsk_edge_table_t

The edge table.

See the edge table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows

The number of rows in this table.

tsk_size_t metadata_length

The total length of the metadata column.

double *left

The left column.

double *right

The right column.

tsk_id_t *parent

The parent column.

tsk_id_t *child

The child column.

char *metadata

The metadata column.

tsk_size_t *metadata_offset

The metadata_offset column.

char *metadata_schema

The metadata schema.

tsk_flags_t options

Flags for this table.

int tsk_edge_table_init(tsk_edge_table_t *self, tsk_flags_t options)

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Options

Options can be specified by providing one or more of the following bitwise flags:

TSK_NO_METADATA

Do not allocate space to store metadata in this table. Operations attempting to add non-empty metadata to the table will fail with error TSK_ERR_METADATA_DISABLED.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to an uninitialised tsk_edge_table_t object.

  • options: Allocation time options.
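To illustrate the TSK_NO_METADATA rule together with the return value of tsk_edge_table_add_row() (documented below), the sketch models a tiny fixed-capacity edge table. Everything in it is hypothetical: NO_METADATA and ERR_METADATA_DISABLED are local stand-ins for TSK_NO_METADATA and TSK_ERR_METADATA_DISABLED, whose real values are not assumed, and metadata bytes are not stored. It shows that empty metadata is still accepted when the flag is set, and that the returned ID is the row index of the new edge.

```c
#include <stdint.h>

#define NO_METADATA (1u << 0)      /* stand-in for TSK_NO_METADATA */
#define ERR_METADATA_DISABLED (-1) /* stand-in error code */
#define ERR_TABLE_FULL (-2)        /* stand-in error code */
#define EDGE_CAPACITY 16

/* Hypothetical fixed-size edge table; metadata storage is omitted. */
typedef struct {
    unsigned options; /* flags given at initialisation time */
    int32_t num_rows;
    double left[EDGE_CAPACITY], right[EDGE_CAPACITY];
    int32_t parent[EDGE_CAPACITY], child[EDGE_CAPACITY];
} mini_edge_table_t;

/* Append an edge and return its ID (the previous row count),
 * or a negative error code. */
int32_t
mini_edge_add_row(mini_edge_table_t *self, double left, double right,
    int32_t parent, int32_t child, const char *metadata, uint64_t metadata_length)
{
    int32_t id = self->num_rows;

    (void) metadata; /* this sketch does not store metadata bytes */
    /* Empty metadata is always allowed; non-empty metadata is rejected
     * when the table was initialised without metadata support. */
    if ((self->options & NO_METADATA) && metadata_length > 0) {
        return ERR_METADATA_DISABLED;
    }
    if (id >= EDGE_CAPACITY) {
        return ERR_TABLE_FULL;
    }
    self->left[id] = left;
    self->right[id] = right;
    self->parent[id] = parent;
    self->child[id] = child;
    self->num_rows++;
    return id;
}
```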

int tsk_edge_table_free(tsk_edge_table_t *self)

Free the internal memory for the specified table.

Return

Always returns 0.

Parameters
  • self: A pointer to a tsk_edge_table_t object.

tsk_id_t tsk_edge_table_add_row(tsk_edge_table_t *self, double left, double right, tsk_id_t parent, tsk_id_t child, const char *metadata, tsk_size_t metadata_length)

Adds a row to this edge table.

Add a new edge with the specified left, right, parent, child and metadata to the table. See the table definition for details of the columns in this table.

Return

Return the ID of the newly added edge on success, or a negative value on failure.

Parameters
  • self: A pointer to a tsk_edge_table_t object.

  • left: The left coordinate for the new edge.

  • right: The right coordinate for the new edge.

  • parent: The parent node for the new edge.

  • child: The child node for the new edge.

  • metadata: The metadata to be associated with the new edge. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length: The size of the metadata array in bytes.

int tsk_edge_table_clear(tsk_edge_table_t *self)

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_edge_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_edge_table_t object.

int tsk_edge_table_truncate(tsk_edge_table_t *self, tsk_size_t num_rows)

Truncates this table so that only the first num_rows are retained.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_edge_table_t object.

  • num_rows: The number of rows to retain in the table.

bool tsk_edge_table_equals(const tsk_edge_table_t *self, const tsk_edge_table_t *other, tsk_flags_t options)

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

TSK_CMP_IGNORE_METADATA

Do not include metadata or metadata schemas in the comparison.
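The comparison rule can be sketched on raw columns: with options = 0, every column and the metadata schema must match byte for byte, whereas TSK_CMP_IGNORE_METADATA skips the metadata column, its offsets and the schema. The edge_columns_t struct, the CMP_IGNORE_METADATA stand-in and edge_columns_equal are hypothetical; the real tsk_edge_table_t is not used here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CMP_IGNORE_METADATA (1u << 0) /* stand-in for TSK_CMP_IGNORE_METADATA */

/* Hypothetical view of an edge table's columns. */
typedef struct {
    size_t num_rows;
    const double *left, *right;
    const int32_t *parent, *child;
    size_t metadata_length;
    const char *metadata;
    const uint64_t *metadata_offset; /* num_rows + 1 entries */
    size_t metadata_schema_length;
    const char *metadata_schema;
} edge_columns_t;

/* Skip memcmp for empty ranges so NULL pointers are never passed to it. */
static bool
bytes_equal(const void *a, const void *b, size_t n)
{
    return n == 0 || memcmp(a, b, n) == 0;
}

bool
edge_columns_equal(const edge_columns_t *x, const edge_columns_t *y, unsigned options)
{
    size_t n = x->num_rows;

    if (n != y->num_rows) {
        return false;
    }
    if (!bytes_equal(x->left, y->left, n * sizeof(double))
        || !bytes_equal(x->right, y->right, n * sizeof(double))
        || !bytes_equal(x->parent, y->parent, n * sizeof(int32_t))
        || !bytes_equal(x->child, y->child, n * sizeof(int32_t))) {
        return false;
    }
    if (options & CMP_IGNORE_METADATA) {
        return true;
    }
    /* Metadata bytes, offsets and schema must all be identical. */
    return x->metadata_length == y->metadata_length
           && x->metadata_schema_length == y->metadata_schema_length
           && bytes_equal(x->metadata, y->metadata, x->metadata_length)
           && bytes_equal(x->metadata_offset, y->metadata_offset,
               (n + 1) * sizeof(uint64_t))
           && bytes_equal(x->metadata_schema, y->metadata_schema,
               x->metadata_schema_length);
}
```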

Return

Return true if the specified table is equal to this table.

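Parameters
  • self: A pointer to a tsk_edge_table_t object.

  • other: A pointer to a tsk_edge_table_t object.

  • options: Bitwise comparison options.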

int tsk_edge_table_copy(const tsk_edge_table_t *self, tsk_edge_table_t *dest, tsk_flags_t options)

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_edge_table_t object.

  • dest: A pointer to a tsk_edge_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised edge table. If not, it must be an uninitialised edge table.

  • options: Bitwise option flags.

int tsk_edge_table_get_row(const tsk_edge_table_t *self, tsk_id_t index, tsk_edge_t *row)

Get the row at the specified index.

Updates the specified edge struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_edge_table_t object.

  • index: The requested table row.

  • row: A pointer to a tsk_edge_t struct that is updated to reflect the values in the specified row.

int tsk_edge_table_set_metadata_schema(tsk_edge_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing metadata schema.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_edge_table_t object.

  • metadata_schema: A pointer to a char array.

  • metadata_schema_length: The size of the metadata schema in bytes.

void tsk_edge_table_print_state(const tsk_edge_table_t *self, FILE *out)

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self: A pointer to a tsk_edge_table_t object.

  • out: The stream to write the summary to.

Migrations

struct tsk_migration_t

A single migration defined by a row in the migration table.

See the data model section for the definition of a migration and its properties.

Public Members

tsk_id_t id

Non-negative ID value corresponding to table row.

tsk_id_t source

Source population ID.

tsk_id_t dest

Destination population ID.

tsk_id_t node

Node ID.

double left

Left coordinate.

double right

Right coordinate.

double time

Time.

const char *metadata

Metadata.

tsk_size_t metadata_length

Size of the metadata in bytes.

struct tsk_migration_table_t

The migration table.

See the migration table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows

The number of rows in this table.

tsk_size_t metadata_length

The total length of the metadata column.

tsk_id_t *source

The source column.

tsk_id_t *dest

The dest column.

tsk_id_t *node

The node column.

double *left

The left column.

double *right

The right column.

double *time

The time column.

char *metadata

The metadata column.

tsk_size_t *metadata_offset

The metadata_offset column.

char *metadata_schema

The metadata schema.

int tsk_migration_table_init(tsk_migration_table_t *self, tsk_flags_t options)

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to an uninitialised tsk_migration_table_t object.

  • options: Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

int tsk_migration_table_free(tsk_migration_table_t *self)

Free the internal memory for the specified table.

Return

Always returns 0.

Parameters
  • self: A pointer to a tsk_migration_table_t object.

tsk_id_t tsk_migration_table_add_row(tsk_migration_table_t *self, double left, double right, tsk_id_t node, tsk_id_t source, tsk_id_t dest, double time, const char *metadata, tsk_size_t metadata_length)

Adds a row to this migration table.

Add a new migration with the specified left, right, node, source, dest, time and metadata to the table. See the table definition for details of the columns in this table.

Return

Return the ID of the newly added migration on success, or a negative value on failure.

Parameters
  • self: A pointer to a tsk_migration_table_t object.

  • left: The left coordinate for the new migration.

  • right: The right coordinate for the new migration.

  • node: The node ID for the new migration.

  • source: The source population ID for the new migration.

  • dest: The destination population ID for the new migration.

  • time: The time for the new migration.

  • metadata: The metadata to be associated with the new migration. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length: The size of the metadata array in bytes.

int tsk_migration_table_clear(tsk_migration_table_t *self)

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_migration_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_migration_table_t object.

int tsk_migration_table_truncate(tsk_migration_table_t *self, tsk_size_t num_rows)

Truncates this table so that only the first num_rows are retained.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_migration_table_t object.

  • num_rows: The number of rows to retain in the table.

bool tsk_migration_table_equals(const tsk_migration_table_t *self, const tsk_migration_table_t *other, tsk_flags_t options)

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

TSK_CMP_IGNORE_METADATA

Do not include metadata or metadata schemas in the comparison.

Return

Return true if the specified table is equal to this table.

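Parameters
  • self: A pointer to a tsk_migration_table_t object.

  • other: A pointer to a tsk_migration_table_t object.

  • options: Bitwise comparison options.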

int tsk_migration_table_copy(const tsk_migration_table_t *self, tsk_migration_table_t *dest, tsk_flags_t options)

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_migration_table_t object.

  • dest: A pointer to a tsk_migration_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised migration table. If not, it must be an uninitialised migration table.

  • options: Bitwise option flags.

int tsk_migration_table_get_row(const tsk_migration_table_t *self, tsk_id_t index, tsk_migration_t *row)

Get the row at the specified index.

Updates the specified migration struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_migration_table_t object.

  • index: The requested table row.

  • row: A pointer to a tsk_migration_t struct that is updated to reflect the values in the specified row.

int tsk_migration_table_set_metadata_schema(tsk_migration_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing metadata schema.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_migration_table_t object.

  • metadata_schema: A pointer to a char array.

  • metadata_schema_length: The size of the metadata schema in bytes.

void tsk_migration_table_print_state(const tsk_migration_table_t *self, FILE *out)

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters

Sites

struct tsk_site_t

A single site defined by a row in the site table.

See the data model section for the definition of a site and its properties.

Public Members

tsk_id_t id

Non-negative ID value corresponding to table row.

double position

Position coordinate.

const char *ancestral_state

Ancestral state.

tsk_size_t ancestral_state_length

Ancestral state length in bytes.

const char *metadata

Metadata.

tsk_size_t metadata_length

Metadata length in bytes.

struct tsk_site_table_t

The site table.

See the site table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows

The number of rows in this table.

tsk_size_t metadata_length

The total length of the metadata column.

double *position

The position column.

char *ancestral_state

The ancestral_state column.

tsk_size_t *ancestral_state_offset

The ancestral_state_offset column.

char *metadata

The metadata column.

tsk_size_t *metadata_offset

The metadata_offset column.

char *metadata_schema

The metadata schema.

int tsk_site_table_init(tsk_site_table_t *self, tsk_flags_t options)

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to an uninitialised tsk_site_table_t object.

  • options: Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

int tsk_site_table_free(tsk_site_table_t *self)

Free the internal memory for the specified table.

Return

Always returns 0.

Parameters
  • self: A pointer to an initialised tsk_site_table_t object.

tsk_id_t tsk_site_table_add_row(tsk_site_table_t *self, double position, const char *ancestral_state, tsk_size_t ancestral_state_length, const char *metadata, tsk_size_t metadata_length)

Adds a row to this site table.

Add a new site with the specified position, ancestral_state and metadata to the table. Copies of ancestral_state and metadata are immediately taken. See the table definition for details of the columns in this table.

Return

Return the ID of the newly added site on success, or a negative value on failure.

Parameters
  • self: A pointer to a tsk_site_table_t object.

  • position: The position coordinate for the new site.

  • ancestral_state: The ancestral_state for the new site.

  • ancestral_state_length: The length of the ancestral_state in bytes.

  • metadata: The metadata to be associated with the new site. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length: The size of the metadata array in bytes.

int tsk_site_table_clear(tsk_site_table_t *self)

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_site_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_site_table_t object.

int tsk_site_table_truncate(tsk_site_table_t *self, tsk_size_t num_rows)

Truncates this table so that only the first num_rows are retained.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_site_table_t object.

  • num_rows: The number of rows to retain in the table.

bool tsk_site_table_equals(const tsk_site_table_t *self, const tsk_site_table_t *other, tsk_flags_t options)

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

TSK_CMP_IGNORE_METADATA

Do not include metadata or metadata schemas in the comparison.

Return

Return true if the specified table is equal to this table.

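Parameters
  • self: A pointer to a tsk_site_table_t object.

  • other: A pointer to a tsk_site_table_t object.

  • options: Bitwise comparison options.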

int tsk_site_table_copy(const tsk_site_table_t *self, tsk_site_table_t *dest, tsk_flags_t options)

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_site_table_t object.

  • dest: A pointer to a tsk_site_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised site table. If not, it must be an uninitialised site table.

  • options: Bitwise option flags.

int tsk_site_table_get_row(const tsk_site_table_t *self, tsk_id_t index, tsk_site_t *row)

Get the row at the specified index.

Updates the specified site struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_site_table_t object.

  • index: The requested table row.

  • row: A pointer to a tsk_site_t struct that is updated to reflect the values in the specified row.
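Because the pointers in a row struct refer to memory owned by the table, and because values in ragged columns are stored back to back without a terminating NUL, code that needs a value after the next table modification (or needs it as a C string) must copy it first. A minimal sketch using a hypothetical helper, copy_borrowed_bytes:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a borrowed, non-NUL-terminated buffer (such as a
 * row's ancestral_state and ancestral_state_length) into caller-owned,
 * NUL-terminated memory. Returns NULL if allocation fails; free() the result. */
static char *
copy_borrowed_bytes(const char *data, size_t length)
{
    char *copy;

    assert(data != NULL || length == 0);
    copy = malloc(length + 1);
    if (copy == NULL) {
        return NULL;
    }
    if (length > 0) {
        memcpy(copy, data, length); /* avoid memcpy with a NULL source */
    }
    copy[length] = '\0';
    return copy;
}
```

For example, after tsk_site_table_get_row(&sites, j, &site) succeeds, copy_borrowed_bytes(site.ancestral_state, site.ancestral_state_length) returns a copy that remains valid after further rows are added.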

int tsk_site_table_set_metadata_schema(tsk_site_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing schema.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_site_table_t object.

  • metadata_schema: A pointer to a char array holding the schema.

  • metadata_schema_length: The size of the metadata schema in bytes.

void tsk_site_table_print_state(const tsk_site_table_t *self, FILE *out)

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self: A pointer to a tsk_site_table_t object.

  • out: The stream to write the summary to.

Mutations

struct tsk_mutation_t

A single mutation defined by a row in the mutation table.

See the data model section for the definition of a mutation and its properties.

Public Members

tsk_id_t id

Non-negative ID value corresponding to table row.

tsk_id_t site

Site ID.

tsk_id_t node

Node ID.

tsk_id_t parent

Parent mutation ID.

double time

Mutation time.

const char *derived_state

Derived state.

tsk_size_t derived_state_length

Size of the derived state in bytes.

const char *metadata

Metadata.

tsk_size_t metadata_length

Size of the metadata in bytes.

struct tsk_mutation_table_t

The mutation table.

See the mutation table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows

The number of rows in this table.

tsk_size_t metadata_length

The total length of the metadata column.

tsk_id_t *node

The node column.

tsk_id_t *site

The site column.

tsk_id_t *parent

The parent column.

double *time

The time column.

char *derived_state

The derived_state column.

tsk_size_t *derived_state_offset

The derived_state_offset column.

char *metadata

The metadata column.

tsk_size_t *metadata_offset

The metadata_offset column.

char *metadata_schema

The metadata schema.

int tsk_mutation_table_init(tsk_mutation_table_t *self, tsk_flags_t options)

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to an uninitialised tsk_mutation_table_t object.

  • options: Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

int tsk_mutation_table_free(tsk_mutation_table_t *self)

Free the internal memory for the specified table.

Return

Always returns 0.

Parameters
  • self: A pointer to an initialised tsk_mutation_table_t object.

tsk_id_t tsk_mutation_table_add_row(tsk_mutation_table_t *self, tsk_id_t site, tsk_id_t node, tsk_id_t parent, double time, const char *derived_state, tsk_size_t derived_state_length, const char *metadata, tsk_size_t metadata_length)

Adds a row to this mutation table.

Add a new mutation with the specified site, node, parent, time, derived_state and metadata to the table. Copies of derived_state and metadata are immediately taken. See the table definition for details of the columns in this table.

Return

Return the ID of the newly added mutation on success, or a negative value on failure.

Parameters
  • self: A pointer to a tsk_mutation_table_t object.

  • site: The site ID for the new mutation.

  • node: The ID of the node this mutation occurs over.

  • parent: The ID of the parent mutation, or TSK_NULL if the mutation has no parent.

  • time: The time of the mutation.

  • derived_state: The derived_state for the new mutation.

  • derived_state_length: The length of the derived_state in bytes.

  • metadata: The metadata to be associated with the new mutation. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length: The size of the metadata array in bytes.
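The parent argument links a mutation to an earlier mutation at the same site, with TSK_NULL (-1) meaning that it has none. To illustrate how the parent column can be used, here is a minimal sketch of a hypothetical helper that follows parent links back to the first mutation in a lineage. It works on a plain int32_t array standing in for the table's parent column (tsk_id_t is a signed 32-bit integer) and assumes, as tskit requires of valid tables, that parent mutations are listed before their children.

```c
#include <assert.h>
#include <stdint.h>

#define NULL_MUTATION (-1) /* stand-in for TSK_NULL */

/* Hypothetical helper: follow the parent column from mutation m back to the
 * first mutation in its lineage, i.e. the one whose parent is TSK_NULL. */
static int32_t
oldest_ancestral_mutation(const int32_t *parent, int32_t num_rows, int32_t m)
{
    assert(m >= 0 && m < num_rows);
    while (parent[m] != NULL_MUTATION) {
        /* Parents precede children, so m strictly decreases and the loop ends. */
        assert(parent[m] >= 0 && parent[m] < m);
        m = parent[m];
    }
    return m;
}
```

With a real table, the first two arguments would be mutations.parent and (tsk_id_t) mutations.num_rows.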

int tsk_mutation_table_clear(tsk_mutation_table_t *self)

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_mutation_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_mutation_table_t object.

int tsk_mutation_table_truncate(tsk_mutation_table_t *self, tsk_size_t num_rows)

Truncates this table so that only the first num_rows are retained.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_mutation_table_t object.

  • num_rows: The number of rows to retain in the table.

bool tsk_mutation_table_equals(const tsk_mutation_table_t *self, const tsk_mutation_table_t *other, tsk_flags_t options)

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

TSK_CMP_IGNORE_METADATA

Do not include metadata or metadata schemas in the comparison.

Return

Return true if the specified table is equal to this table.

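Parameters
  • self: A pointer to a tsk_mutation_table_t object.

  • other: A pointer to a tsk_mutation_table_t object.

  • options: Bitwise comparison options.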

int tsk_mutation_table_copy(const tsk_mutation_table_t *self, tsk_mutation_table_t *dest, tsk_flags_t options)

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_mutation_table_t object.

  • dest: A pointer to a tsk_mutation_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised mutation table. If not, it must be an uninitialised mutation table.

  • options: Bitwise option flags.

int tsk_mutation_table_get_row(const tsk_mutation_table_t *self, tsk_id_t index, tsk_mutation_t *row)

Get the row at the specified index.

Updates the specified mutation struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_mutation_table_t object.

  • index: The requested table row.

  • row: A pointer to a tsk_mutation_t struct that is updated to reflect the values in the specified row.

int tsk_mutation_table_set_metadata_schema(tsk_mutation_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing schema.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_mutation_table_t object.

  • metadata_schema: A pointer to a char array holding the schema.

  • metadata_schema_length: The size of the metadata schema in bytes.

void tsk_mutation_table_print_state(const tsk_mutation_table_t *self, FILE *out)

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self: A pointer to a tsk_mutation_table_t object.

  • out: The stream to write the summary to.

Populations

struct tsk_population_t

A single population defined by a row in the population table.

See the data model section for the definition of a population and its properties.

Public Members

tsk_id_t id

Non-negative ID value corresponding to table row.

const char *metadata

Metadata.

tsk_size_t metadata_length

Metadata length in bytes.

struct tsk_population_table_t

The population table.

See the population table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows

The number of rows in this table.

tsk_size_t metadata_length

The total length of the metadata column.

char *metadata

The metadata column.

tsk_size_t *metadata_offset

The metadata_offset column.

char *metadata_schema

The metadata schema.

int tsk_population_table_init(tsk_population_table_t *self, tsk_flags_t options)

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to an uninitialised tsk_population_table_t object.

  • options: Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

int tsk_population_table_free(tsk_population_table_t *self)

Free the internal memory for the specified table.

Return

Always returns 0.

Parameters
  • self: A pointer to an initialised tsk_population_table_t object.

tsk_id_t tsk_population_table_add_row(tsk_population_table_t *self, const char *metadata, tsk_size_t metadata_length)

Adds a row to this population table.

Add a new population with the specified metadata to the table. A copy of the metadata is immediately taken. See the table definition for details of the columns in this table.

Return

Return the ID of the newly added population on success, or a negative value on failure.

Parameters
  • self: A pointer to a tsk_population_table_t object.

  • metadata: The metadata to be associated with the new population. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length: The size of the metadata array in bytes.

int tsk_population_table_clear(tsk_population_table_t *self)

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_population_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_population_table_t object.

int tsk_population_table_truncate(tsk_population_table_t *self, tsk_size_t num_rows)

Truncates this table so that only the first num_rows are retained.

Return

Return 0 on success or a negative value on failure.

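Parameters
  • self: A pointer to a tsk_population_table_t object.

  • num_rows: The number of rows to retain in the table.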

bool tsk_population_table_equals(const tsk_population_table_t *self, const tsk_population_table_t *other, tsk_flags_t options)

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

TSK_CMP_IGNORE_METADATA

Do not include metadata in the comparison. Because metadata is the only column in the population table, two population tables are considered equal under this flag if they have the same number of rows.

Return

Return true if the specified table is equal to this table.

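Parameters
  • self: A pointer to a tsk_population_table_t object.

  • other: A pointer to a tsk_population_table_t object.

  • options: Bitwise comparison options.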

int tsk_population_table_copy(const tsk_population_table_t *self, tsk_population_table_t *dest, tsk_flags_t options)

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_population_table_t object.

  • dest: A pointer to a tsk_population_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised population table. If not, it must be an uninitialised population table.

  • options: Bitwise option flags.

int tsk_population_table_get_row(const tsk_population_table_t *self, tsk_id_t index, tsk_population_t *row)

Get the row at the specified index.

Updates the specified population struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Return

Return 0 on success or a negative value on failure.

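Parameters
  • self: A pointer to a tsk_population_table_t object.

  • index: The requested table row.

  • row: A pointer to a tsk_population_t struct that is updated to reflect the values in the specified row.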

int tsk_population_table_set_metadata_schema(tsk_population_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing schema.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_population_table_t object.

  • metadata_schema: A pointer to a char array holding the schema.

  • metadata_schema_length: The size of the metadata schema in bytes.

void tsk_population_table_print_state(const tsk_population_table_t *self, FILE *out)

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self: A pointer to a tsk_population_table_t object.

  • out: The stream to write the summary to.

Provenances

struct tsk_provenance_t

A single provenance defined by a row in the provenance table.

See the data model section for the definition of a provenance object and its properties. See the Provenance section for more information on how provenance records should be structured.

Public Members

tsk_id_t id

Non-negative ID value corresponding to table row.

const char *timestamp

The timestamp.

tsk_size_t timestamp_length

The timestamp length in bytes.

const char *record

The record.

tsk_size_t record_length

The record length in bytes.

struct tsk_provenance_table_t

The provenance table.

See the provenance table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows

The number of rows in this table.

tsk_size_t timestamp_length

The total length of the timestamp column.

tsk_size_t record_length

The total length of the record column.

char *timestamp

The timestamp column.

tsk_size_t *timestamp_offset

The timestamp_offset column.

char *record

The record column.

tsk_size_t *record_offset

The record_offset column.

int tsk_provenance_table_init(tsk_provenance_table_t *self, tsk_flags_t options)

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to an uninitialised tsk_provenance_table_t object.

  • options: Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

int tsk_provenance_table_free(tsk_provenance_table_t *self)

Free the internal memory for the specified table.

Return

Always returns 0.

Parameters
  • self: A pointer to an initialised tsk_provenance_table_t object.

tsk_id_t tsk_provenance_table_add_row(tsk_provenance_table_t *self, const char *timestamp, tsk_size_t timestamp_length, const char *record, tsk_size_t record_length)

Adds a row to this provenance table.

Add a new provenance with the specified timestamp and record to the table. Copies of the timestamp and record are immediately taken. See the table definition for details of the columns in this table.

Return

Return the ID of the newly added provenance on success, or a negative value on failure.

Parameters
  • self: A pointer to a tsk_provenance_table_t object.

  • timestamp: The timestamp to be associated with the new provenance. This is a pointer to arbitrary memory. Can be NULL if timestamp_length is 0.

  • timestamp_length: The size of the timestamp array in bytes.

  • record: The record to be associated with the new provenance. This is a pointer to arbitrary memory. Can be NULL if record_length is 0.

  • record_length: The size of the record array in bytes.
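Provenance timestamps are conventionally ISO 8601 strings. A minimal sketch of producing one with the standard C library, using a hypothetical helper iso8601_timestamp; the choice of UTC and one-second precision is an assumption here, not a tskit requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format t as an ISO 8601 UTC timestamp such as
 * "2020-01-31T12:00:00Z". Returns the number of characters written
 * (excluding the terminating NUL), or 0 if buf is too small. */
static size_t
iso8601_timestamp(time_t t, char *buf, size_t buf_size)
{
    /* gmtime is not thread safe; prefer gmtime_r where POSIX is available. */
    const struct tm *utc;

    assert(buf != NULL);
    utc = gmtime(&t);
    if (utc == NULL) {
        return 0;
    }
    return strftime(buf, buf_size, "%Y-%m-%dT%H:%M:%SZ", utc);
}
```

The buffer and returned length can then be passed as the timestamp and timestamp_length arguments of tsk_provenance_table_add_row(); because the length is given explicitly, the terminating NUL is not stored in the table.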

int tsk_provenance_table_clear(tsk_provenance_table_t *self)

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_provenance_table_free() to free the table’s internal resources.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_provenance_table_t object.

int tsk_provenance_table_truncate(tsk_provenance_table_t *self, tsk_size_t num_rows)

Truncates this table so that only the first num_rows are retained.

Return

Return 0 on success or a negative value on failure.

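Parameters
  • self: A pointer to a tsk_provenance_table_t object.

  • num_rows: The number of rows to retain in the table.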

bool tsk_provenance_table_equals(const tsk_provenance_table_t *self, const tsk_provenance_table_t *other, tsk_flags_t options)

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns.

TSK_CMP_IGNORE_TIMESTAMPS

Do not include the timestamp column when comparing provenance tables.

Return

Return true if the specified table is equal to this table.

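Parameters
  • self: A pointer to a tsk_provenance_table_t object.

  • other: A pointer to a tsk_provenance_table_t object.

  • options: Bitwise comparison options.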

int tsk_provenance_table_copy(const tsk_provenance_table_t *self, tsk_provenance_table_t *dest, tsk_flags_t options)

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_provenance_table_t object.

  • dest: A pointer to a tsk_provenance_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised provenance table. If not, it must be an uninitialised provenance table.

  • options: Bitwise option flags.

int tsk_provenance_table_get_row(const tsk_provenance_table_t *self, tsk_id_t index, tsk_provenance_t *row)

Get the row at the specified index.

Updates the specified provenance struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Return

Return 0 on success or a negative value on failure.

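Parameters
  • self: A pointer to a tsk_provenance_table_t object.

  • index: The requested table row.

  • row: A pointer to a tsk_provenance_t struct that is updated to reflect the values in the specified row.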

void tsk_provenance_table_print_state(const tsk_provenance_table_t *self, FILE *out)

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self: A pointer to a tsk_provenance_table_t object.

  • out: The stream to write the summary to.

Table indexes

Together with the tree sequence ordering requirements, the table indexes allow us to take a table collection and operate efficiently on the trees defined within it. This section defines the rules for safely operating on table indexes and describes their life-cycle.

The edge index used for tree generation consists of two arrays, each holding N edge IDs (where N is the size of the edge table). When the index is computed using tsk_table_collection_build_index(), we store the current size of the edge table along with the two arrays of edge IDs. The function tsk_table_collection_has_index() then returns true if and only if (a) neither of these arrays is NULL and (b) the stored number of edges equals the current size of the edge table.

Updating the edge table does not automatically invalidate the index. Thus, if we call tsk_edge_table_clear() on an edge table that has an index, the index still exists; however, tsk_table_collection_has_index() will not consider it valid because of the size mismatch. The same applies to functions that increase the size of the table. It is therefore possible for tsk_table_collection_has_index() to return true when the index is not actually valid: for example, if the user has manipulated the node and edge tables to describe a different topology that happens to have the same number of edges. The behaviour of methods that use the index is undefined in this case.

Thus, if you are manipulating an existing table collection that may be indexed, we recommend always calling tsk_table_collection_drop_index() first.
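The validity rule for the edge index can be summarised in a few lines. The following is a simplified, hypothetical model of the check described for tsk_table_collection_has_index(), not tskit's actual implementation; the struct and field names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a stored edge index (not the real tskit struct). */
typedef struct {
    const int32_t *insertion_order; /* N edge IDs, or NULL if dropped */
    const int32_t *removal_order;   /* N edge IDs, or NULL if dropped */
    uint32_t num_edges;             /* edge table size when the index was built */
} edge_index_model;

/* Mirrors the rule above: both arrays present and the stored size matches. */
static bool
index_looks_valid(const edge_index_model *index, uint32_t current_num_edges)
{
    assert(index != NULL);
    return index->insertion_order != NULL && index->removal_order != NULL
           && index->num_edges == current_num_edges;
}
```

Note that index_looks_valid() would still return true if all the edges were rewritten to describe a different topology with the same number of edges, which is exactly the false positive described above; only dropping and rebuilding the index avoids it.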

Tree sequences

Warning

This part of the API is more preliminary and may be subject to change.

struct tsk_treeseq_t

The tree sequence object.

int tsk_treeseq_init(tsk_treeseq_t *self, const tsk_table_collection_t *tables, tsk_flags_t options)
int tsk_treeseq_load(tsk_treeseq_t *self, const char *filename, tsk_flags_t options)
int tsk_treeseq_loadf(tsk_treeseq_t *self, FILE *file, tsk_flags_t options)
int tsk_treeseq_dump(tsk_treeseq_t *self, const char *filename, tsk_flags_t options)
int tsk_treeseq_dumpf(tsk_treeseq_t *self, FILE *file, tsk_flags_t options)
int tsk_treeseq_copy_tables(const tsk_treeseq_t *self, tsk_table_collection_t *tables, tsk_flags_t options)
int tsk_treeseq_free(tsk_treeseq_t *self)
void tsk_treeseq_print_state(const tsk_treeseq_t *self, FILE *out)

Trees

Warning

This part of the API is more preliminary and may be subject to change.

struct tsk_tree_t

A single tree in a tree sequence.

A tsk_tree_t object has two basic functions:

  1. Represent the state of a single tree in a tree sequence;

  2. Provide methods to transform this state into different trees in the sequence.

The state of a single tree in the tree sequence is represented using the quintuply linked encoding: please see the data model section for details on how this works. The left-to-right ordering of nodes in this encoding is arbitrary, and may change depending on the order in which trees are accessed within the sequence. Please see the Tree traversals examples for recommended usage.

On initialisation, a tree is in a “null” state: each sample is a root and there are no edges. We must call one of the ‘seeking’ methods to make the state of the tree object correspond to a particular tree in the sequence. Please see the Tree iteration examples for recommended usage.

Public Members

const tsk_treeseq_t *tree_sequence

The parent tree sequence.

tsk_id_t left_root

The leftmost root in the tree. Roots are siblings, and other roots can be found using right_sib.

tsk_id_t *parent

The parent of node u is parent[u]. Equal to TSK_NULL if node u is a root or is not a node in the current tree.

tsk_id_t *left_child

The leftmost child of node u is left_child[u]. Equal to TSK_NULL if node u is a leaf or is not a node in the current tree.

tsk_id_t *right_child

The rightmost child of node u is right_child[u]. Equal to TSK_NULL if node u is a leaf or is not a node in the current tree.

tsk_id_t *left_sib

The sibling to the left of node u is left_sib[u]. Equal to TSK_NULL if node u has no siblings to its left.

tsk_id_t *right_sib

The sibling to the right of node u is right_sib[u]. Equal to TSK_NULL if node u has no siblings to its right.

int tsk_tree_init(tsk_tree_t *self, const tsk_treeseq_t *tree_sequence, tsk_flags_t options)
int tsk_tree_free(tsk_tree_t *self)
tsk_id_t tsk_tree_get_index(const tsk_tree_t *self)
tsk_size_t tsk_tree_get_num_roots(const tsk_tree_t *self)
int tsk_tree_first(tsk_tree_t *self)
int tsk_tree_last(tsk_tree_t *self)
int tsk_tree_next(tsk_tree_t *self)
int tsk_tree_prev(tsk_tree_t *self)
int tsk_tree_clear(tsk_tree_t *self)
void tsk_tree_print_state(const tsk_tree_t *self, FILE *out)

Low-level sorting

In some highly performance sensitive cases it can be useful to have more control over the process of sorting tables. This low-level API allows a user to provide their own edge sorting function. This can be useful, for example, to use parallel sorting algorithms, or to take advantage of the more efficient sorting procedures available in C++. It is the user’s responsibility to ensure that the edge sorting requirements are fulfilled by this function.
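A custom sort_edges implementation must reproduce the edge sorting requirements, which order edges by the time of their parent node, then by parent ID, then by child ID, and finally by left coordinate. The comparator below is a minimal sketch of that ordering for qsort, written against a hypothetical array-of-structs edge_record rather than the column-oriented edge table. A real callback would have to gather the rows from the start offset onwards, look up each parent's time in the node table, sort, and write the rows back.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flattened edge: parent_time is copied from the node table. */
typedef struct {
    double left;
    double right;
    int32_t parent;
    int32_t child;
    double parent_time;
} edge_record;

/* Order by parent time, then parent ID, then child ID, then left coordinate. */
static int
compare_edge_records(const void *a, const void *b)
{
    const edge_record *x = (const edge_record *) a;
    const edge_record *y = (const edge_record *) b;

    if (x->parent_time != y->parent_time) {
        return x->parent_time < y->parent_time ? -1 : 1;
    }
    if (x->parent != y->parent) {
        return x->parent < y->parent ? -1 : 1;
    }
    if (x->child != y->child) {
        return x->child < y->child ? -1 : 1;
    }
    if (x->left != y->left) {
        return x->left < y->left ? -1 : 1;
    }
    return 0;
}

static void
sort_edge_records(edge_record *edges, size_t num_edges)
{
    assert(edges != NULL || num_edges == 0);
    if (num_edges > 1) {
        qsort(edges, num_edges, sizeof(*edges), compare_edge_records);
    }
}
```

Writing the comparator as explicit tie-breaks, rather than subtracting fields, avoids overflow and the truncation of double differences to int.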

Todo

Create an idiomatic C++11 example where we load a table collection file from argv[1], and sort the edges using std::sort, based on the example in tests/test_minimal_cpp.cpp. We can include this in the examples below, and link to it here.

struct _tsk_table_sorter_t

Low-level table sorting method.

Public Members

tsk_table_collection_t *tables

The input tables that are being sorted.

int (*sort_edges)(struct _tsk_table_sorter_t *self, tsk_size_t start)

The edge sorting function. If set to NULL, edges are not sorted.

void *user_data

An opaque pointer for use by client code.

tsk_id_t *site_id_map

Mapping from input site IDs to output site IDs.

int tsk_table_sorter_init(struct _tsk_table_sorter_t *self, tsk_table_collection_t *tables, tsk_flags_t options)

Initialises the memory for the sorter object.

This must be called before any operations are performed on the table sorter, and it initialises all fields. The sort_edges function pointer is set to the default method, which uses qsort. The user_data field is set to NULL. This method supports the same options as tsk_table_collection_sort().

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to an uninitialised tsk_table_sorter_t object.

  • tables: The table collection to sort.

  • options: Sorting options.

int tsk_table_sorter_run(struct _tsk_table_sorter_t *self, const tsk_bookmark_t *start)

Runs the sort using the configured functions.

Runs the sorting process:

  1. Drop the table indexes.

  2. If the sort_edges function pointer is not NULL, run it. The first parameter to the called function will be a pointer to this tsk_table_sorter_t object. The second parameter will be the value of start->edges, which specifies the offset at which sorting should start in the edge table. This offset is guaranteed to be within the bounds of the edge table.

  3. Sort the site table, building the mapping between site IDs in the current and sorted tables.

  4. Sort the mutation table.

If an error occurs during the execution of a user-supplied sorting function a non-zero value must be returned. This value will then be returned by tsk_table_sorter_run. The error return value should be chosen to avoid conflicts with tskit error codes.

See tsk_table_collection_sort() for details on the start parameter.

Return

Return 0 on success or a negative value on failure.

Parameters
  • self: A pointer to a tsk_table_sorter_t object.

  • start: The position in the tables at which sorting starts.

int tsk_table_sorter_free(struct _tsk_table_sorter_t *self)

Free the internal memory for the specified table sorter.

Return

Always returns 0.

Parameters
  • self: A pointer to an initialised tsk_table_sorter_t object.

Miscellaneous functions

const char *tsk_strerror(int err)

Return a description of the specified error.

The memory for the returned string is handled by the library and should not be freed by client code.

Return

A description of the error.

Parameters
  • err: A tskit error code.

Constants

API Version

TSK_VERSION_MAJOR 0

The library major version. Incremented when breaking changes to the API or ABI are introduced. This includes any changes to the signatures of functions and the sizes and types of externally visible structs.

TSK_VERSION_MINOR 99

The library minor version. Incremented when non-breaking, backward-compatible changes to the API or ABI are introduced, e.g., the addition of a new function.

TSK_VERSION_PATCH 8

The library patch version. Incremented when any changes not relevant to the API or ABI are introduced, e.g., internal refactors or bug fixes.
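Under these rules, code written against version MAJOR.MINOR should work with any library that has the same major version and an equal or higher minor version. A minimal sketch of that check, using a hypothetical helper; in practice one might pass TSK_VERSION_MAJOR and TSK_VERSION_MINOR as the library version, bearing in mind that these macros describe the headers the program was compiled against.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if a library at version lib_major.lib_minor should
 * provide every API feature that code written against req_major.req_minor
 * relies on. The patch version is ignored since it does not affect the API. */
static bool
version_satisfies(int lib_major, int lib_minor, int req_major, int req_minor)
{
    assert(lib_major >= 0 && lib_minor >= 0 && req_major >= 0 && req_minor >= 0);
    /* A different major version may contain breaking changes. */
    if (lib_major != req_major) {
        return false;
    }
    /* Minor releases only add backward-compatible features. */
    return lib_minor >= req_minor;
}
```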

Generic Errors

TSK_ERR_GENERIC -1

Generic error returned when no other message can be generated.

TSK_ERR_NO_MEMORY -2

Memory could not be allocated.

TSK_ERR_IO -3

An IO error occurred.

TSK_ERR_BAD_PARAM_VALUE -4
TSK_ERR_BUFFER_OVERFLOW -5
TSK_ERR_UNSUPPORTED_OPERATION -6
TSK_ERR_GENERATE_UUID -7
TSK_ERR_EOF -8

The file stream ended after reading zero bytes.
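This distinction matters when several records are read back-to-back from one stream: reading zero bytes at a record boundary means the stream is simply finished, whereas a short read part-way through a record means the data is truncated. The stdio-only sketch below shows one way to tell the two apart. `read_record()`, `TOY_ERR_IO` and `TOY_ERR_EOF` are hypothetical names that only reuse the values of TSK_ERR_IO and TSK_ERR_EOF; reporting a short read as an I/O error is an assumption made for illustration.

```c
#include <stdio.h>

#define TOY_ERR_IO (-3)  /* same value as TSK_ERR_IO above */
#define TOY_ERR_EOF (-8) /* same value as TSK_ERR_EOF above */

/* Read exactly `size` bytes into `buf`. Returns 0 on success, TOY_ERR_EOF if
 * the stream ended before any byte was read, and TOY_ERR_IO on a partial
 * read or a stream error. */
int
read_record(FILE *stream, void *buf, size_t size)
{
    size_t n = fread(buf, 1, size, stream);

    if (n == size) {
        return 0;
    }
    if (n == 0 && feof(stream)) {
        return TOY_ERR_EOF; /* clean end of stream: callers can stop normally */
    }
    return TOY_ERR_IO;
}
```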

File format errors

TSK_ERR_FILE_FORMAT -100

A file could not be read because it is in the wrong format.

TSK_ERR_FILE_VERSION_TOO_OLD -101

The file is in tskit format, but the version is too old for the library to read. The file should be upgraded to the latest version using the tskit upgrade command line utility.

TSK_ERR_FILE_VERSION_TOO_NEW -102

The file is in tskit format, but the version is too new for the library to read. To read the file you must upgrade the version of tskit.

Todo

Add in groups for rest of the error types and document.

Examples

Basic forwards simulator

This is an example of using the tables API to define a simple haploid Wright-Fisher simulator. Because this simple example repeatedly sorts the edge data, it is quite inefficient and should not be used as the basis of a large-scale simulator.

Note

This example uses the C function rand and the constant RAND_MAX for random number generation. They are used here for illustration only; code used for research should use a high-quality random number library instead. Examples include, but are not limited to:

  1. The GNU Scientific Library, which is licensed under the GNU General Public License, version 3 (GPL3+).

  2. For C++ projects using C++11 or later, the built-in random number library.

  3. The numpy C API may be useful for those writing Python extension modules in C/C++.
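The simulator below draws parent indexes with the expression `(size_t)((rand()/(1.+RAND_MAX))*N)`. Dividing by `1. + RAND_MAX` rather than by `RAND_MAX` keeps the quotient in [0, 1), so the index always lies in [0, N) and can never equal N. A minimal sketch of the two helpers this implies follows; `uniform_unit()` and `uniform_index()` are hypothetical names, and they inherit all the statistical weaknesses of rand noted above.

```c
#include <stddef.h>
#include <stdlib.h>

/* A double in [0, 1): rand() never exceeds RAND_MAX, so dividing by
 * RAND_MAX + 1 keeps the result strictly below 1. */
double
uniform_unit(void)
{
    return rand() / (1.0 + RAND_MAX);
}

/* An index in [0, n) for n > 0, e.g. to pick a random parent. */
size_t
uniform_index(size_t n)
{
    return (size_t)(uniform_unit() * n);
}
```

Because uniform_unit() can return exactly 0, the simulator additionally redraws the breakpoint until it is non-zero, so that neither of the two edges it creates for a child is empty.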

Todo

Give a pointer to an example that caches and flushes edge data efficiently. Probably using the C++ API?

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <err.h>

#include <tskit/tables.h>

#define check_tsk_error(val)                                                            \
    if (val < 0) {                                                                      \
        errx(EXIT_FAILURE, "line %d: %s", __LINE__, tsk_strerror(val));                 \
    }

void
simulate(
    tsk_table_collection_t *tables, int N, int T, int simplify_interval)
{
    tsk_id_t *buffer, *parents, *children, child, left_parent, right_parent;
    double breakpoint;
    int ret, j, t, b;

    assert(simplify_interval != 0); // leads to division by zero
    buffer = malloc(2 * N * sizeof(tsk_id_t));
    if (buffer == NULL) {
        errx(EXIT_FAILURE, "Out of memory");
    }
    tables->sequence_length = 1.0;
    parents = buffer;
    for (j = 0; j < N; j++) {
        parents[j]
            = tsk_node_table_add_row(&tables->nodes, 0, T, TSK_NULL, TSK_NULL, NULL, 0);
        check_tsk_error(parents[j]);
    }
    b = 0;
    for (t = T - 1; t >= 0; t--) {
        /* Alternate between using the first and last N values in the buffer */
        parents = buffer + (b * N);
        b = (b + 1) % 2;
        children = buffer + (b * N);
        for (j = 0; j < N; j++) {
            child = tsk_node_table_add_row(
                &tables->nodes, 0, t, TSK_NULL, TSK_NULL, NULL, 0);
            check_tsk_error(child);
            /* NOTE: the use of rand() is discouraged for
             * research code and proper random number generator
             * libraries should be preferred.
             */
            left_parent = parents[(size_t)((rand()/(1.+RAND_MAX))*N)];
            right_parent = parents[(size_t)((rand()/(1.+RAND_MAX))*N)];
            do {
                breakpoint = rand()/(1.+RAND_MAX);
            } while (breakpoint == 0); /* tiny proba of breakpoint being 0 */
            ret = tsk_edge_table_add_row(
                &tables->edges, 0, breakpoint, left_parent, child, NULL, 0);
            check_tsk_error(ret);
            ret = tsk_edge_table_add_row(
                &tables->edges, breakpoint, 1, right_parent, child, NULL, 0);
            check_tsk_error(ret);
            children[j] = child;
        }
        if (t % simplify_interval == 0) {
            printf("Simplify at generation %d: (%d nodes %d edges)",
                t,
                (int) tables->nodes.num_rows,
                (int) tables->edges.num_rows);
            /* Note: Edges must be sorted for simplify to work, and we use a brute force
             * approach of sorting each time here for simplicity. This is inefficient. */
            ret = tsk_table_collection_sort(tables, NULL, 0);
            check_tsk_error(ret);
            ret = tsk_table_collection_simplify(tables, children, N, 0, NULL);
            check_tsk_error(ret);
            printf(" -> (%d nodes %d edges)\n",
                (int) tables->nodes.num_rows,
                (int) tables->edges.num_rows);
            /* After simplification the samples (the current children) are the
             * first N nodes, in the order they were given, so renumber them. */
            for (j = 0; j < N; j++) {
                children[j] = j;
            }
        }
    }
    free(buffer);
}

int
main(int argc, char **argv)
{
    int ret;
    tsk_table_collection_t tables;

    if (argc != 6) {
        errx(EXIT_FAILURE, "usage: N T simplify-interval output-file seed");
    }
    ret = tsk_table_collection_init(&tables, 0);
    check_tsk_error(ret);
    srand((unsigned)atoi(argv[5]));
    simulate(&tables, atoi(argv[1]), atoi(argv[2]), atoi(argv[3]));
    ret = tsk_table_collection_dump(&tables, argv[4], 0);
    check_tsk_error(ret);

    tsk_table_collection_free(&tables);
    return 0;
}

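Tree iteration

In this example we load a tree sequence file and iterate over its trees, first forwards and then backwards, printing the index and the number of roots of each tree.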

#include <stdio.h>
#include <stdlib.h>
#include <err.h>

#include <tskit.h>

#define check_tsk_error(val)                                                            \
    if (val < 0) {                                                                      \
        errx(EXIT_FAILURE, "line %d: %s", __LINE__, tsk_strerror(val));                 \
    }

int
main(int argc, char **argv)
{
    int ret, iter;
    tsk_treeseq_t ts;
    tsk_tree_t tree;

    if (argc != 2) {
        errx(EXIT_FAILURE, "usage: <tree sequence file>");
    }
    ret = tsk_treeseq_load(&ts, argv[1], 0);
    check_tsk_error(ret);
    ret = tsk_tree_init(&tree, &ts, 0);
    check_tsk_error(ret);

    printf("Iterate forwards\n");
    for (iter = tsk_tree_first(&tree); iter == 1; iter = tsk_tree_next(&tree)) {
        printf("\ttree %d has %d roots\n",
            tsk_tree_get_index(&tree),
            tsk_tree_get_num_roots(&tree));
    }
    check_tsk_error(iter);

    printf("Iterate backwards\n");
    for (iter = tsk_tree_last(&tree); iter == 1; iter = tsk_tree_prev(&tree)) {
        printf("\ttree %d has %d roots\n",
            tsk_tree_get_index(&tree),
            tsk_tree_get_num_roots(&tree));
    }
    check_tsk_error(iter);

    tsk_tree_free(&tree);
    tsk_treeseq_free(&ts);
    return 0;
}
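Both loops rely on the return convention of the iteration functions: 1 while the tree is positioned on a tree, 0 once iteration runs off the end, and a negative error code on failure. This is why `iter` is passed to check_tsk_error() after each loop. The tskit-free sketch below mimics that convention with a hypothetical `interval_cursor_t` over the intervals between sorted breakpoints; all names are illustrative and none of them are part of tskit.

```c
#include <stddef.h>

/* Hypothetical cursor over the intervals [b[i], b[i + 1]) of a sorted
 * breakpoint array; index is -1 when it is not positioned on an interval. */
typedef struct {
    const double *breakpoints;
    size_t num_breakpoints;
    long index;
} interval_cursor_t;

/* Return 1 if positioned on the first interval, negative on error. */
int
cursor_first(interval_cursor_t *self)
{
    if (self->breakpoints == NULL || self->num_breakpoints < 2) {
        return -1;
    }
    self->index = 0;
    return 1;
}

/* Return 1 if moved to the next interval, 0 if we ran off the end. */
int
cursor_next(interval_cursor_t *self)
{
    if (self->index < 0) {
        return cursor_first(self); /* not positioned: start from the beginning */
    }
    if ((size_t) self->index + 2 >= self->num_breakpoints) {
        self->index = -1; /* back to the "not positioned" state */
        return 0;
    }
    self->index++;
    return 1;
}

/* Same loop shape as the example: continue while 1, then check for errors. */
long
count_intervals(interval_cursor_t *cursor)
{
    long count = 0;
    int iter;

    for (iter = cursor_first(cursor); iter == 1; iter = cursor_next(cursor)) {
        count++;
    }
    return iter < 0 ? (long) iter : count;
}
```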

Tree traversals

In this example we load a tree sequence file, and then traverse the first tree in three different ways:

  1. We first traverse the tree in preorder using recursion. This is a very common way of navigating around trees and can be very convenient for some applications. For example, here we compute the depth of each node (i.e., its distance from the root) and use it to indent the nodes as we print them out.

  2. Then we traverse the tree in preorder using an iterative approach. This is a little more efficient than using recursion, and is sometimes more convenient than structuring the calculation recursively. Note that we allocate a stack here with space to hold the total number of nodes in the tree sequence. This is safe, but is likely to be a massive overestimate. However, this makes very little difference in practice, even for tree sequences with millions of nodes: only the first page (usually 4K) of the stack is likely to be written to, so the rest of it will typically never be mapped to physical memory.

  3. In the third example we iterate upwards from the samples rather than downwards from the root.

#include <stdio.h>
#include <stdlib.h>
#include <err.h>

#include <tskit.h>

#define check_tsk_error(val)                                                            \
    if (val < 0) {                                                                      \
        errx(EXIT_FAILURE, "line %d: %s", __LINE__, tsk_strerror(val));                 \
    }

static void
_traverse(tsk_tree_t *tree, tsk_id_t u, int depth)
{
    tsk_id_t v;
    int j;

    for (j = 0; j < depth; j++) {
        printf("    ");
    }
    printf("Visit recursive %d\n", u);
    for (v = tree->left_child[u]; v != TSK_NULL; v = tree->right_sib[v]) {
        _traverse(tree, v, depth + 1);
    }
}

static void
traverse_recursive(tsk_tree_t *tree)
{
    tsk_id_t root;

    for (root = tree->left_root; root != TSK_NULL; root = tree->right_sib[root]) {
        _traverse(tree, root, 0);
    }
}

static void
traverse_stack(tsk_tree_t *tree)
{
    int stack_top;
    tsk_id_t u, v, root;
    tsk_id_t *stack
        = malloc(tsk_treeseq_get_num_nodes(tree->tree_sequence) * sizeof(*stack));

    if (stack == NULL) {
        errx(EXIT_FAILURE, "Out of memory");
    }
    for (root = tree->left_root; root != TSK_NULL; root = tree->right_sib[root]) {
        stack_top = 0;
        stack[stack_top] = root;
        while (stack_top >= 0) {
            u = stack[stack_top];
            stack_top--;
            printf("Visit stack %d\n", u);
            /* Put nodes on the stack right-to-left, so we visit in left-to-right */
            for (v = tree->right_child[u]; v != TSK_NULL; v = tree->left_sib[v]) {
                stack_top++;
                stack[stack_top] = v;
            }
        }
    }
    free(stack);
}

static void
traverse_upwards(tsk_tree_t *tree)
{
    const tsk_id_t *samples = tsk_treeseq_get_samples(tree->tree_sequence);
    tsk_size_t num_samples = tsk_treeseq_get_num_samples(tree->tree_sequence);
    tsk_size_t j;
    tsk_id_t u;

    for (j = 0; j < num_samples; j++) {
        u = samples[j];
        while (u != TSK_NULL) {
            printf("Visit upwards: %d\n", u);
            u = tree->parent[u];
        }
    }
}

int
main(int argc, char **argv)
{
    int ret;
    tsk_treeseq_t ts;
    tsk_tree_t tree;

    if (argc != 2) {
        errx(EXIT_FAILURE, "usage: <tree sequence file>");
    }
    ret = tsk_treeseq_load(&ts, argv[1], 0);
    check_tsk_error(ret);
    ret = tsk_tree_init(&tree, &ts, 0);
    check_tsk_error(ret);
    ret = tsk_tree_first(&tree);
    check_tsk_error(ret);

    traverse_recursive(&tree);

    traverse_stack(&tree);

    traverse_upwards(&tree);

    tsk_tree_free(&tree);
    tsk_treeseq_free(&ts);
    return 0;
}
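The recursive and stack-based preorders in the example above visit nodes in the same order. The sketch below checks this without tskit, using plain arrays that mirror the `left_child`, `right_child`, `left_sib`, `right_sib` and `left_root` fields of tsk_tree_t used above; the `toy_tree_t` type, the `MAX_NODES` bound and the function names are hypothetical. As in traverse_stack(), each node is pushed at most once, so a stack with one slot per node is always large enough.

```c
#include <stddef.h>

#define NULL_NODE (-1)
#define MAX_NODES 64 /* hypothetical fixed bound for this sketch */

/* Hypothetical stand-in for the tsk_tree_t arrays used above. */
typedef struct {
    int left_child[MAX_NODES], right_child[MAX_NODES];
    int left_sib[MAX_NODES], right_sib[MAX_NODES];
    int left_root;
} toy_tree_t;

static void
visit_recursive(const toy_tree_t *t, int u, int *order, int *n)
{
    order[(*n)++] = u;
    for (int v = t->left_child[u]; v != NULL_NODE; v = t->right_sib[v]) {
        visit_recursive(t, v, order, n);
    }
}

/* Preorder by recursion; returns the number of nodes written to order. */
int
preorder_recursive(const toy_tree_t *t, int *order)
{
    int n = 0;

    for (int root = t->left_root; root != NULL_NODE; root = t->right_sib[root]) {
        visit_recursive(t, root, order, &n);
    }
    return n;
}

/* Preorder with an explicit stack; returns the number of nodes written. */
int
preorder_stack(const toy_tree_t *t, int *order)
{
    int stack[MAX_NODES];
    int top, u, v, n = 0;

    for (int root = t->left_root; root != NULL_NODE; root = t->right_sib[root]) {
        top = 0;
        stack[top] = root;
        while (top >= 0) {
            u = stack[top--];
            order[n++] = u;
            /* Push children right-to-left so they are popped left-to-right */
            for (v = t->right_child[u]; v != NULL_NODE; v = t->left_sib[v]) {
                stack[++top] = v;
            }
        }
    }
    return n;
}
```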

File streaming

It is often useful to read tree sequence files from a stream rather than from a fixed filename. This example shows how to do this using the tsk_table_collection_loadf() and tsk_table_collection_dumpf() functions. Here, we sequentially load table collections from the stdin stream and write them back out to stdout with their mutations removed.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <tskit/tables.h>

#define check_tsk_error(val)                                                            \
    if (val < 0) {                                                                      \
        fprintf(stderr, "Error: line %d: %s\n", __LINE__, tsk_strerror(val));           \
        exit(EXIT_FAILURE);                                                             \
    }

int
main(int argc, char **argv)
{
    int ret;
    int j = 0;
    tsk_table_collection_t tables;

    ret = tsk_table_collection_init(&tables, 0);
    check_tsk_error(ret);

    while (true) {
        ret = tsk_table_collection_loadf(&tables, stdin, TSK_NO_INIT);
        if (ret == TSK_ERR_EOF) {
            break;
        }
        check_tsk_error(ret);
        fprintf(stderr, "Tree sequence %d had %d mutations\n", j,
            (int) tables.mutations.num_rows);
        ret = tsk_mutation_table_truncate(&tables.mutations, 0);
        check_tsk_error(ret);
        ret = tsk_table_collection_dumpf(&tables, stdout, 0);
        check_tsk_error(ret);
        j++;
    }
    tsk_table_collection_free(&tables);
    return EXIT_SUCCESS;
}

Note that we use the value TSK_ERR_EOF to detect when the stream ends, as we don’t know how many tree sequences to expect on the input. In this case, TSK_ERR_EOF is not considered an error and we exit normally.

Running this program on some tree sequence files we might get:

$ cat tmp1.trees tmp2.trees | ./build/streaming > no_mutations.trees
Tree sequence 0 had 38 mutations
Tree sequence 1 had 132 mutations

Then, running this program again on the output of the previous command, we see that the file no_mutations.trees now contains two tree sequences, both with their mutations removed:

$ ./build/streaming < no_mutations.trees > /dev/null
Tree sequence 0 had 0 mutations
Tree sequence 1 had 0 mutations