Visualization#

Yan Wong

It is often helpful to visualize a single tree — or multiple trees along a tree sequence — together with sites and mutations. Tskit provides methods to do this, outputting either plain ascii or unicode text, or the more flexible Scalable Vector Graphics (SVG) format. The first two sections of this tutorial give details of these methods, using a few 50kb tree sequences generated by msprime as examples: one in which selection has occurred, and others which involve subdivision into 3 populations (labelled A, B, and C).

Hide code cell source
import tskit
# To see how these tree sequences were made, inspect the removed cell at the top of this
# notebook, visible in e.g. the `.md` (markdown) version downloadable from the menubar
ts_tiny = tskit.load("data/viz_ts_tiny.trees")
ts_small = tskit.load("data/viz_ts_small.trees")
ts_full = tskit.load("data/viz_ts_full.trees")

If you just want a quick look at visualization possibilities, you might want to skip the explanations and just browse some Examples, which contain fully reproduceable code.

Note

This tutorial is primarily focussed on showing a tree sequence as a set of marginal trees along a genome. The section titled Other visualizations provides examples of other representations of tree sequences, or the processes that can create them.

Text format#

The TreeSequence.draw_text() and Tree.draw_text() methods provide a quick way to print out a tree sequence, or an individual tree within it. They are primarily useful for looking at topologies in small datasets (e.g. fewer than 20 sampled genomes), and do not display mutations.

# Print a tree sequence
print(ts_tiny.draw_text())

print("The first tree in the tree sequence above, but replacing some node ids with names:")
print(ts_tiny.first().draw_text(
    # An example of how to change or omit node labels: unspecified nodes are omitted
    # The same convention applies to SVG graphics
    node_labels={0: "Alice", 1: "Bob", 2:"Chris", 3: "Dora", 6: "MRCA"}
))
11428.48┊         ┊         ┊         ┊         ┊   13    ┊         ┊         ┊         ┊  
        ┊         ┊         ┊         ┊         ┊  ┏━┻━┓  ┊         ┊         ┊         ┊  
11144.74┊         ┊         ┊         ┊         ┊  ┃   ┃  ┊   12    ┊         ┊         ┊  
        ┊         ┊         ┊         ┊         ┊  ┃   ┃  ┊  ┏━┻━┓  ┊         ┊         ┊  
9753.69 ┊         ┊         ┊         ┊         ┊  ┃   ┃  ┊  ┃   ┃  ┊   11    ┊         ┊  
        ┊         ┊         ┊         ┊         ┊  ┃   ┃  ┊  ┃   ┃  ┊  ┏━┻━┓  ┊         ┊  
6204.42 ┊         ┊         ┊         ┊   10    ┊  ┃   ┃  ┊  ┃   ┃  ┊  ┃   ┃  ┊         ┊  
        ┊         ┊         ┊         ┊  ┏━┻━┓  ┊  ┃   ┃  ┊  ┃   ┃  ┊  ┃   ┃  ┊         ┊  
3893.05 ┊         ┊         ┊         ┊  ┃   ┃  ┊  ┃   ┃  ┊  ┃   ┃  ┊  ┃   ┃  ┊    9    ┊  
        ┊         ┊         ┊         ┊  ┃   ┃  ┊  ┃   ┃  ┊  ┃   ┃  ┊  ┃   ┃  ┊  ┏━┻━┓  ┊  
3378.92 ┊         ┊    8    ┊    8    ┊  ┃   ┃  ┊  ┃   ┃  ┊  ┃   ┃  ┊  ┃   ┃  ┊  ┃   ┃  ┊  
        ┊         ┊  ┏━┻━┓  ┊  ┏━┻━┓  ┊  ┃   ┃  ┊  ┃   ┃  ┊  ┃   ┃  ┊  ┃   ┃  ┊  ┃   ┃  ┊  
2076.43 ┊         ┊  ┃   ┃  ┊  7   ┃  ┊  7   ┃  ┊  7   ┃  ┊  7   ┃  ┊  7   ┃  ┊  7   ┃  ┊  
        ┊         ┊  ┃   ┃  ┊ ┏┻┓  ┃  ┊ ┏┻┓  ┃  ┊ ┏┻┓  ┃  ┊ ┏┻┓  ┃  ┊ ┏┻┓  ┃  ┊ ┏┻┓  ┃  ┊  
1941.72 ┊   6     ┊  6   ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊  
        ┊ ┏━┻━┓   ┊ ┏┻┓  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊  
1439.14 ┊ ┃   5   ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊  
        ┊ ┃ ┏━┻┓  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊ ┃ ┃  ┃  ┊  
33.99   ┊ ┃ ┃  4  ┊ ┃ ┃  4  ┊ ┃ ┃  4  ┊ ┃ ┃  4  ┊ ┃ ┃  4  ┊ ┃ ┃  4  ┊ ┃ ┃  4  ┊ ┃ ┃  4  ┊  
        ┊ ┃ ┃ ┏┻┓ ┊ ┃ ┃ ┏┻┓ ┊ ┃ ┃ ┏┻┓ ┊ ┃ ┃ ┏┻┓ ┊ ┃ ┃ ┏┻┓ ┊ ┃ ┃ ┏┻┓ ┊ ┃ ┃ ┏┻┓ ┊ ┃ ┃ ┏┻┓ ┊  
0.00    ┊ 0 1 2 3 ┊ 0 1 2 3 ┊ 0 1 2 3 ┊ 0 1 2 3 ┊ 0 1 2 3 ┊ 0 1 2 3 ┊ 0 1 2 3 ┊ 0 1 2 3 ┊  
        0       9855      13901     21285     23424     25213     30463     46237     50000

The first tree in the tree sequence above, but replacing some node ids with names:
    MRCA            
  ┏━━━┻━━━━┓        
  ┃        ┃        
  ┃    ┏━━━┻━━━┓    
  ┃    ┃       ┃    
  ┃    ┃    ┏━━┻━━┓ 
Alice Bob Chris Dora

SVG format#

Most users will want to use the SVG drawing functions TreeSequence.draw_svg() and Tree.draw_svg() for visualization. Being a vectorised format, SVG files are suitable for presentations, publication, and editing or converting to other graphic formats; some basic forms of animation are also possible. Both functions produce an SVG string which is automatically drawn if the string is the result of the last call in a Jupyter notebook:

from IPython.display import display
svg_size = (800, 250) # Height and width for the SVG: optional but useful for this notebook
svg_string = ts_tiny.draw_svg(
    size=svg_size,
    y_axis=True, y_label=" ",  # optional: show a time scale on the left
    time_scale="rank", x_scale="treewise",  # Force same axis settings as the text view
)
display(svg_string)  # If the last line in a cell, wrapping this in display() is not needed
_images/1b055881e84ea5657973e0fee71168b00c1656e634184c7247a818b8a602b103.svg

By default, sample nodes are drawn as black squares, and non-sample nodes are drawn as black circles (but see below for ways to e.g. hide or colour these node symbols - NB. apologies to US readers: the British spelling of “colour” will be used in the rest of this tutorial).

Axes and scales#

For ease of drawing, the text representation and the SVG image above use unconventional non-linear X and Y coordinate systems. By default, the SVG output uses a more conventional linear scale for both time (Y) and genome position (X), and indicates the position of each tree along the genome by an alternating shaded background. Although more intuitive, linear scales can obscure some features of the trees, for example causing labels to overlap:

ts_tiny.draw_svg(size=svg_size, y_axis=True)
_images/8b6cc0827026a6603664b0db7fa12a33b9d53e05fd7df33b0fe7645c794238e1.svg

One way to avoid overlapping labels on the Y axis is to use the y_ticks parameter, which will be used in most subsequent examples in this tutorial.

Larger tree sequences#

So far, we have plotted only very small tree sequences. To visualize larger tree sequences it is sometimes advisable to focus on a small region of the genome, possibly even a single tree. The x_lim parameter allows you to plot the part of a tree sequence that spans a particular genomic region: here’s a slighly larger tree sequence with 8 samples, but where we’ve restricted the amount of the tree sequence we plot:

x_limits = [5000, 15000]
# Create evenly-spaced y tick positions to avoid overlap
y_tick_pos = [0, 1000, 2000, 3000, 4000]

print("The tree sequence between positions {} and {} ...".format(*x_limits))
display(ts_small.draw_svg(y_axis=True, y_ticks=y_tick_pos, x_lim=x_limits))

third_tree = ts_small.at_index(2)
print("... or just look at (say) the third tree")
display(third_tree.draw_svg())
The tree sequence between positions 5000 and 15000 ...
_images/6c497ca94a7f22adcf4c09e3286551ccea483da7a183cb0d46b476e05fef2c01.svg
... or just look at (say) the third tree
_images/cef53842fe98e50b6393039663a98d6a8d74470ac16e6ef13e060914342a3ff4.svg

As the number of sample nodes increases, internal nodes often bunch up at recent time points, obscuring relationships. Setting time_scale="rank", as in the first SVG plot, is one way to solve this. Another is to use a log-scale on the time axis, which can be done by specifying time_scale="log_time", as below. To compare node times across the plot, this example also uses the y_gridlines option, which puts a very faint grid line at each y tick (if you are finding the lines difficult to see, note that the line intensity, along with many other plot features, can be modified through styling, which we also use in this example to avoid overlapping text by shrinking the node labels and rotating those associated with leaves; styling is detailed later in this tutorial).

print("A larger tree, on a log timescale")
wide_fmt = (1200, 250)
# Create a stylesheet that shrinks labels and rotates leaf labels, to avoid overlap
node_label_style = (
    ".node > .lab {font-size: 80%}"
    ".leaf > .lab {text-anchor: start; transform: rotate(90deg) translate(6px)}"
)
ts_full.first().draw_svg(
    size=wide_fmt,
    time_scale="log_time",
    y_gridlines=True,
    y_axis=True,
    y_ticks=[1, 10, 100, 1000],
    style=node_label_style,
)
A larger tree, on a log timescale
_images/2d3313982e829ec0c677d060b67584f4f9e6deba13f1ce71675f67feba5c123c.svg

For even larger numbers of samples, you can plot the trees for a subset of samples by applying the TreeSequence.simplify() method (make sure to specify filter_nodes=False to retain the same node IDs). For a fancier solution, see Simplifying larger plots below.

Plotting mutations#

By default the SVG visualization also plots sites and mutations on the tree or tree sequence (this can be disabled using the omit_sites parameter). For example, adding mutations to the 8-sample tree sequence above gives the following plot: each mutation is marked by a red cross on the branch where it occurs. Symbols are either placed at the mutation’s known time, or (if the mutation time is unknown or the time_scale parameter has been set to "rank") spaced evenly along the branch. By default, each mutation is also labelled with its mutation ID.

If the X axis is shown (which it is by default when drawing a tree sequence, but not when drawing an individual tree) then the sites are plotted using a black tickmark above the axis line. Each plotted mutation at the site is then overlaid on top of this as a red downwards-pointing chevron.

ts_mutated = msprime.sim_mutations(ts_small, rate=1e-7, random_seed=342)
ts_mutated.draw_svg(y_axis=True, y_ticks=y_tick_pos, x_lim=x_limits)
_images/ae61d9b198eb2fe3b8642c319cbfef46a3aaf4b0d04c1418d49a2b3fd4be3f35.svg

Note that, unusually, the rightmost site on the axis has more than one stacked chevron, indicating that multiple mutations in the tree occur at the same site. These could be mutations to different allelic states, or recurrent/back mutations. In this case the mutations, 14 and 15 (above nodes 1 and 6) are recurrent mutations from T to G.

Hide code cell source
ts_mutated = tskit.load("data/viz_ts_small_mutated.trees")
site_descr = str(next(ts_mutated.at_index(2).sites()))
print(site_descr.replace("[", "[\n  ").replace("),", "),\n ").replace("],", "],\n"))
Site(id=14, position=13963.0, ancestral_state='T', mutations=[
  Mutation(id=14, site=14, node=1, derived_state='G', parent=-1, metadata=b'', time=619.6499091610359, edge=14),
  Mutation(id=15, site=14, node=6, derived_state='G', parent=-1, metadata=b'', time=370.24979591084764, edge=4)],
 metadata=b'')

Which mutations are shown?#

When using the x_lim parameter, only the mutations in the plotted region are shown. For the third tree in the tree sequence visualization above, we thus haven’t plotted mutations above position 15000. We can see all the mutations in the tree by changing the plot region, or simply plotting the tree itself:

third_tree = ts_mutated.at_index(2)
print(f"The third tree in the mutated tree sequence, which covers {third_tree.interval}")
third_tree.draw_svg(size=(200, 300))
The third tree in the mutated tree sequence, which covers Interval(left=13901.0, right=21285.0)
_images/9f74ca04afafbf9a2d6fa17e1359582055355093cf6c5b0d310650037ce2007e.svg

However, when plotting a single tree it may not be evident that identical branches may exist in several adjacent trees, indicating an edge that persists across adjacent trees. For instance the rightmost branch in the tree above, from node 10 down to 7, exists in the previous two trees too. Indeed, this edge has a mutation on it at position 6295, in the first tree. This mutation is not plotted in the tree above, but if you want all the mutations on each edge to be plotted, you can set the all_edge_mutations parameter to True. This adds any extra mutations that are associated with an edge in the tree but which fall outside the interval of that tree; by default these mutations are drawn in a slightly different shade (e.g. mutation 64 below).

third_tree.draw_svg(size=(200, 300), all_edge_mutations=True)
_images/d8033e8a3b138ac60ae2453c11b58f30d45bafbc3c73db317905c67d7a076f29.svg

Labelling#

Although the default node and mutation labels show unique identifiers, they are’t terribly intuituive. The node_labels and mutation_labels parameters can be used to set more meaningful labels (for example from the tree sequence Metadata). See Dynamic effects if you want to dynamically hide and show such labels.

nd_labels = {}  # An array of labels for the nodes
for n in ts_mutated.nodes():
    # Set sample node labels from metadata. Here we use the population name, but you might want
    # to use the *individual* name instead, if the individuals in your tree sequence have names
    if n.is_sample():
        nd_labels[n.id] = ts_mutated.population(n.population).metadata["name"]

mut_labels = {}  # An array of labels for the mutations
for mut in ts_mutated.mutations():  # Make pretty labels showing the change in state
    site = ts_mutated.site(mut.site)
    older_mut = mut.parent >= 0  # is there an older mutation at the same position?
    prev = ts_mutated.mutation(mut.parent).derived_state if older_mut else site.ancestral_state
    mut_labels[mut.id] = f"{prev}{mut.derived_state}"

ts_mutated.draw_svg(
    y_axis=True, y_ticks=y_tick_pos, x_lim=x_limits,
    node_labels=nd_labels,
    mutation_labels=mut_labels,
)
_images/31a8a8b4fa98aeb40f0e9d9370abe3b86699dc3e0c71bbef39c773c3e8b26d5f.svg

Styling#

The SVG output produced by tskit contains a large number of classes which can be used to target different elements of the drawing, allowing them to be hidden, styled, or otherwise manipulated. This is done by passing a cascading style sheet (CSS) string to draw_svg. A common use of styles is to colour nodes by their population:

styles = []
# Create a style for each population, programmatically (or just type the string by hand)
for colour, p in zip(['red', 'green', 'blue'], ts_full.populations()):
    # target the symbols only (class "sym")
    s = f".node.p{p.id} > .sym " + "{" + f"fill: {colour}" + "}"
    styles.append(s)
    print(f'"{s}" applies to nodes from population {p.metadata["name"]} (id {p.id})')
css_string = " ".join(styles)
print(f'CSS string applied:\n    "{css_string}"')

ts_full.first().draw_svg(
    size=wide_fmt,
    node_labels={},    # Remove all node labels for a clearer viz
    style=css_string,  # Apply the stylesheet
)
".node.p0 > .sym {fill: red}" applies to nodes from population A (id 0)
".node.p1 > .sym {fill: green}" applies to nodes from population B (id 1)
".node.p2 > .sym {fill: blue}" applies to nodes from population C (id 2)
CSS string applied:
    ".node.p0 > .sym {fill: red} .node.p1 > .sym {fill: green} .node.p2 > .sym {fill: blue}"
_images/fe7f931e03f742453adfbb1722ea2880020b45988e7ce50eb2875ec7e99be936.svg

Colouring nodes by population makes it immediately clear that, while the tree structure does not exactly reflect the population divisions, there’s still considerable population substructure present in this larger tree.

Todo

The (older) Tree.draw() function also has a node_colour argument that can be used to colour tree nodes, which is used in some of the other tskit tutorials. Under the hood, this function simply sets appropriate SVG styles on nodes. We intend to make it easier to set colours in a similar way: see https://github.com/tskit-dev/tskit/issues/579.

The CSS string used to style the tree above takes advantage of the general classes defined in a tskit SVG file: a node symbol always has a class named sym, which is contained within a grouping element of class node. Moreover, elements such as node have additional classes, such as p1, indicating that the node in this case belongs to the population with ID 1.

Available css classes#

Here are the css classes in a tskit SVG which can be used to style specific elements.

Within the plotting area#
  • tree: a grouping element containing each tree

  • node: a grouping element within a tree, containing a node and its descendant elements such as a node symbol, an edge, mutations, and other nodes.

  • mut: a grouping element containing a mutation symbol and label

  • extra: an extra class for mutations outside the tree

  • lab: a label element (for a node, mutation, axis, tick number, etc.)

  • sym: a symbol element (e.g. a node, mutation, or site symbol)

  • edge: an edge element (i.e. a branch in a tree)

  • root, leaf and sample: additional classes applied to a node group if the node is a root node, a leaf node or a sample node

  • unknown_time: a class added to mut groups if the time of the mutation is tskit.UNKNOWN_TIME.

  • rgt and lft: additional classes applied to labels for left- or right-justification

Outside the plotting area#
  • axes: a grouping element containing the X and Y axes, if either are present

  • x-axis, y-axis: more specific grouping elements contained within axes

  • tick: a single tick on an axis, containing a tickmark line and a label

  • site: a grouping element representing a site (plotted on the X axis), containing a site symbol (a tick line) and zero or more mut groups, each containing a chevron-shaped mutation symbol

  • background: the shaded background of a tree sequence plot

  • grid: a gridline

ID-based classes#

Elements have additional classes based on the IDs of trees, nodes, parent (ancestor) nodes, individuals, populations, mutations, and sites. These class names start with a single letter (respectively t, n, a, i, p, m, and s) followed by a numerical ID. For example, here’s a typical node in an tskit SVG plot:

<g class="a10 i3 leaf m16 m17 node n7 p2 s15 s16 sample">...</g>

This corresponds to node 7, the rightmost leaf in the third tree in the mutated tree sequence (plotted in the previous section but one). The classes indicate that it has an immediate ancestor (parent) node with ID 10 (a10), and that the node belongs to an individual with ID 3 (i3). The classes n7 and p2 tell us that the node ID is 7 and is is from the population with ID 2 (p2). Other ID classes on the node tell us about the mutations above that node, of which there are two in this case, with IDs 16 and 17 (m16, m17); those mutations are associated with site IDs 15 and 16 (s16, s17).

Other grouping elements apart from nodes can also contain ID-based classes. For example the tree group contains the ID of the tree (e.g. t0), the site group on the X axis contains the site ID (e.g. s15) the mut class contains the mutation ID (e.g. m16), and so on.

Styling graphical elements#

The classes above make it easy to target specific nodes or edges in one or multiple trees. For example, we can colour branches that are shared between trees (identified below as ones that have the same parent and child):

css_string = ".a15.n9 > .edge {stroke: cyan; stroke-width: 2px}"  # branches from 15->9
ts_small.draw_svg(time_scale="rank", size=wide_fmt, style=css_string)
_images/bcc4111c905db0df2f142193a70e6056e876151ed9a16830d574bb788a81ca1a.svg

By generating the css string programatically, you can target all the edges present in a particular tree, and see how they gradually disappear from adjacent trees. Below, for example the branches in the central tree have been coloured red, as have the identical branches in adjacent trees. The central tree represents a location in the genome that has seen a selective sweep, and therefore has short branch lengths: adjacent trees are not under direct selection and thus the black branches tend to be longer. These (red) shared branches extending far on either side represent shared haplotypes, and this shows how long, shared haplotypes can extend much further away from a sweep than the region of reduced diversity (which is the region spanned by the short tree in the middle). For visual clarity, node symbols and labels have been turned off.

Hide code cell source
ts_selection = tskit.load("data/viz_ts_selection.trees")
css_edge_targets = []  # collect the css targets of all the edges in the selected tree
sweep_location = ts_selection.sequence_length / 2  # NB: sweep is in the middle of the ts
focal_tree = ts_selection.at(sweep_location)
for node_id in focal_tree.nodes():
    parent_id = focal_tree.parent(node_id)
    if parent_id != tskit.NULL:
        css_edge_targets.append(f".a{parent_id}.n{node_id}>.edge")
css_string = ",".join(css_edge_targets) + "{stroke: red} .sym {display: none}"
css_string += (  # Rotate the position labels etc
    ".x-axis .ticks .lab {text-anchor: start; transform: translate(6px) rotate(90deg)}"
    ".x-axis .title .lab {text-anchor: start}"
)
wide_tall_fmt = (1200, 400)
ts_selection.draw_svg(
    style=css_string,
    size=wide_tall_fmt,
    canvas_size=(wide_tall_fmt[0], wide_tall_fmt[1] + 30),
    x_lim=[1e4, 4e4],
    node_labels={},
)
_images/b54ab87cca207130f4801b9ca43dc525d59a431e4c75a38d6be2e401c313b375.svg

Note

Branches in multiple trees that have the same parent and child do not always correspond to a single edge in a tree sequence: for example, edges have the additional constraint that they must belong to adjacent trees.

Dynamic effects#

In the previous example, the large size of the plotted trees meant that, for clarity, the node and mutation labels were turned off (in that case by passing empty mappings to the node_labels and mutation_labels parameters). Nevertheless, it can be useful to identify nodes and mutations, and this can be done dynamically (on “mouseover”) by setting the CSS display property to none vs initial, and combining it with the CSS :hover pseudoclass. Here’s an example using a region from within the previous example:

from IPython.display import HTML
# add some mutations
ts = msprime.sim_mutations(ts_selection, rate=2e-8, random_seed=1)

node_label_css = (
    # hide node labels by default
    "#hover_example .node > .sym ~ .lab {display: none}"
    # Unless the adjacent node or the label is hovered over
    "#hover_example .node > .sym:hover ~ .lab {display: inherit}"
    "#hover_example .node > .sym ~ .lab:hover {display: inherit}"
)

mut_label_css = (
    # hide mutation labels by default
    "#hover_example .mut .sym ~ .lab {display: none}"
    # Unless the adjacent node or the label is hovered over
    "#hover_example .mut .sym:hover ~ .lab {display: inherit}"
    "#hover_example .mut .sym ~ .lab:hover {display: inherit}"
)

optional_css = (
    # These are optional, but setting e.g. the node label text to bold with grey stroke
    # and a black fill, serves to make black text readable against a black tree 
    "svg#hover_example {background-color: white}"
    "#hover_example .tree .plotbox .lab {stroke: #CCC; fill: black; font-weight: bold}"
    "#hover_example .tree .mut .lab {stroke: #FCC; fill: red; font-weight: bold}"
)

HTML(ts.draw_svg(
    style=optional_css + node_label_css + mut_label_css,
    y_axis=True,
    y_ticks={0: "0", 500: "", 1000: "1000"},
    x_lim=[2.3e4, 2.7e4],    
    root_svg_attributes={"id": "hover_example"},
        # Label node by name in metadata, if it exists, else node ID
    node_labels={u.id: u.metadata.get("name", f"NodeID={u.id}") for u in ts.nodes()},
    mutation_labels={
        # Label mutation by site position, prev state, and new state
        m.id: (
            f"pos {s.position:g}: " +
            (s.ancestral_state if m.parent<0 else ts.mutation(m.parent).derived_state) +
            f"→{m.derived_state}"
        )
        for s in ts.sites()
        for m in s.mutations
    },
))
Genome position2416624621Time (generations)01000Sample FSample GSample HSample JSample KNodeID=19NodeID=24Sample CSample ENodeID=15Sample DSample BSample INodeID=12Sample ASample LNodeID=18NodeID=20NodeID=21NodeID=25NodeID=26NodeID=27NodeID=28Sample FSample GSample HSample JSample KNodeID=16NodeID=24Sample CSample ENodeID=15pos 24257: G→ASample DSample BSample INodeID=12Sample ASample LNodeID=18NodeID=20NodeID=21NodeID=25NodeID=26NodeID=27NodeID=28Sample FSample GSample Hpos 26806: T→GSample Jpos 26623: G→Apos 25540: C→GSample KNodeID=22NodeID=24Sample CSample ENodeID=15Sample DSample BSample INodeID=12Sample ASample LNodeID=18NodeID=20NodeID=21NodeID=25NodeID=26NodeID=27NodeID=28

Note

Above we have wrapped the svg in an IPython HTML class, and given the SVG a unique id as described below in More about styling. This forces the SVG plot to be rendered inline (rather than inside an <img> tag), allowing the hover functionality to work in all supported Jupyter notebook implementations. However, depending on your Jupyter setup, the HTML() wrapper may not be necessary.

Using the transformations discussed in the next section, it is also possible to animate SVG images, as shown in the Animation code within the SVG examples section near the end of this tutorial.

Transforming and masking elements#

We can also use styles to transform elements of the drawing, shifting them into different locations or changing their orientation. For example, earlier in this tutorial we used the following CSS string to rotate leaf labels:

.leaf > .lab {text-anchor: start; transform: rotate(90deg) translate(6px)}

Transformations not only allow us to shift elements about, but also resize and skew them. When applied to both symbols and labels this can create rather different formatting styles:

css_string = (
    # Draw large yellow circles for nodes ...
    ".node > .sym {transform: scale(2.2); fill: yellow; stroke: black; stroke-width: 0.5px}"

    # ...but for leaf nodes, override the yellow circle using a more specific CSS target
    ".node.leaf > .sym {transform: scale(1); fill:black}"

    # Override default node text position to be based at (0, 0) relative to the node pos
    # Note that the .tree specifier is needed to make this more specific than the default
    # positioning which is targetted at ".lab.lft" and ".lab.rgt"
    ".tree .node > .lab {transform: translate(0, 0); text-anchor: middle; font-size: 7pt}"

    # For leaf nodes, override the above positioning using a subsequent CSS style
    ".node.leaf > .lab {transform: translate(0, 12px); font-size: 10pt}"
)
ts_small.first().draw_svg(style=css_string)
_images/b5251ed9a4c58e0217a9bb27227863a1bcb71d0fd643f2d448e6200660c88125.svg

Note that when transforming elements, parts of the drawing may be plotted outsize of of the standard canvas. You can use the canvas_size parameter to increase the size of the canvas (adding more space to the right and bottom of the plot), without rescaling the graphics. This is particularly useful when performing more radical CSS transformations, for example to create 3D effects:

skew = 0.8  # How skewed the trees are, in radians

# CSS transforms used to skew the trees
style = f".tree .plotbox {{transform: skewY({skew}rad)}}"
# Shift the x axis to make room for the skewed trees
shift_axis = (10, 50)
style += f".x-axis {{transform: translate({shift_axis[0]}px, {shift_axis[1]}px)}}"

# Must define a bigger canvas size so we don't crop the axis off
size = (800, 200) # width, height of svg
canvas_size = (size[0] + shift_axis[0], size[1] + shift_axis[1])

ts_tiny.draw_svg(size=size, x_scale="treewise", style=style, canvas_size=canvas_size)
_images/a01caa900793c2858be20f3334ce59c0b9063ab21639a3796845d082ccb5fea5.svg

Note

Using transform in styles is an SVG2 feature, and has not yet been implemented in the software programs Inkscape or librsvg. Therefore if you are converting or editing the examples above, the transformed elements may be positioned incorrectly. For changing symbols, the symbol_size option can be used to simply change the size of all symbols in the plot, but otherwise you may need to use the chromium workaround documented here.

Although it is hard to change the style of a node symbol, the visible area of the symbol can be modified using the clip-path CSS property. This can be useful to show, for instance, a triangle to summarise the descendants of a MRCA.

# Check that MRCA of 2 & 3 is node 4 in all trees, assumed later
assert all([4 == tree.mrca(2, 3) for tree in ts_tiny.trees()])

styles = [
    # Set all node labels to be rotated and small
    ".node > .lab {text-anchor: start; transform: rotate(90deg) translate(6px); font-size: 8px}",

    # Hide all nodes descending from node 4. We then treat node 4 as a summary node
    ".n4 > .node {display: none}",
    
    # Use clipping & scaling to change the symbol for node 4 into a summary triangle
    ".n4 > .sym {clip-path: polygon(50% 50%, 75% 75%, 25% 75%); transform: scale(8.0, 8.0)}",
    
    # Make the font bigger for this summary node label
    ".n4 > .lab {transform: rotate(90deg) translate(14px); font-size: 16px}"
]

node_labels = {0: "Nd. 0", 1: "Nd. 1", 4: "Two samples"}

ts_tiny.draw_svg(
    size=(800, 300),
    x_scale="treewise",
    time_scale="log_time",
    style="".join(styles),
    node_labels=node_labels,
)
_images/d32433d477f69b5566e0948201b12c31e155099b6c966cadad1d449fc673743b.svg

In the example above we simply hid the descendant topology for each “summary MRCA”, meaning more horizontal space was taken up than expected. For a more sophisticated example, see Simplifying larger plots in which some descendant samples are actually removed from the tree sequence entirely, and their MRCA is changed into a sample node instead.

Styling and SVG structure#

To take full advantage of the SVG styling capabilities in tskit, it is worth knowing how the SVG file is structured. In particular tskit SVGs use a hierarchical grouping structure that reflects the tree topology. This allows easy styling and manipulation of both individual elements and entire subtrees. Currently, the hierarchical structure of a simple 2-tip SVG tree produced by tskit looks something like this:

<g class="tree t0">
  <g class="plotbox">
    <g class="node n2 root">
      <g class="node n1 a2 i1 p1 m0 s0 sample leaf">
        <path class="edge" ... />
        <g class="mut m0 s0" ...>
          <line .../>
          <path class="sym" .../>
          <text class="lab">Mutation 0</text>
        </g>
        <rect class="sym" ... />
        <text class="lab" ...>Node 1</text>
      </g>
      <g class="node n0 a2 i2 p1 sample leaf">
        <path class="edge" ... />
        <rect class="sym" .../>
        <text class="lab" ...>Node 0</text>
      </g>
      <path class="edge" ... />
      <circle class="sym" ... />
      <text class="lab">Root (Node 2)</text>
    </g>
  </g>
</g>

And in a tree sequence plot, the SVG simply consists of a set of such trees, together with groups containing the background and axes, if required.

<g class="tree-sequence">
  <g class="background"></g>
  <g class="axes"></g>
  <g class="trees">
    <g class="tree t0">...</g>
    <g class="tree t1">...</g>
    <g class="tree t2">...</g>
    ...
    </g>
  </g>
</g>
    

Styling subtrees#

The nested grouping structure makes it easy to target a node and all its descendants. For instance, here’s how to draw all the edges of node 13 and its descendants using a thicker blue line:

edge_style = ".n13 .edge {stroke: blue; stroke-width: 2px}"
nd_labs = {n: n for n in [0, 1, 2, 3, 4, 5, 6, 7, 13, 17]}
ts_mutated.draw_svg(x_lim=x_limits, node_labels=nd_labs, style=edge_style)
_images/f52fc2c39c0a8b50455f72c43a6a9da07ff4d721f5becafd3d306ac775ad7942.svg

This might not be quite what you expected: the branch leading from node 13 to its parent (node 17) has also been coloured. That’s because the SVG node group deliberately contains the branch that leads to the parent (this can be helpful, for example, for hiding the entire subtree leading to node 13, using e.g. .n13 {display: none}). To colour the branches descending from node 13, you therefore need to target the nodes nested at least one level deep within the n13 group. One way to do that is to add an extra .node class to the style, e.g.

edge_style = ".n13 .node .edge {stroke: blue; stroke-width: 2px}"
# NB to target the edges in only (say) the 1st tree you could use ".t0 .n13 .node .edge ..."
ts_mutated.draw_svg(x_lim=x_limits, node_labels=nd_labs, style=edge_style)
_images/e7365cf8e3fc97e8bc4114ffd3f16fb5eec0269e355b8bd777028fa43cce0b4b.svg

If you want to colour the branches descending from a particular mutation (say mutation 7) then you need to colour not only the edges, but also part of an edge (i.e. the line that connects a mutation downwards to its associated node). The tskit SVG format provides a special <line> element to enable this, which is normally made invisible using fill: none stroke: none in the default stylesheet. Here’s an example of activating this normally-hidden line:

default_muts = ".mut .lab {fill: gray} .mut .sym {stroke: gray}"  # all other muts in gray
m8_mut = (
    ".m8 .node .edge, "  # the descendant edges
    ".mut.m8 line, "  # activate the hidden line between the mutation and the node
    ".mut.m8 .sym "  # the mutation symbols on the tree and the axis
    "{stroke: red; stroke-width: 2px}"
    ".mut.m8 .lab {fill: red}"  # colour the label "8" in red too
)
css_string = default_muts + m8_mut
ts_mutated.draw_svg(x_lim=x_limits, node_labels=nd_labs, style=css_string)
_images/29a9c8bac7ea1326eedbe177fcd42d061483608103801f0106795973bb672041.svg

Restricting styling#

Sometimes the hierarchical nesting leads to styles being applied too widely. For example, since style selectors include all the descendants of a target, to target just the node itself (and not its descendants) a slightly different specification is required, involving, the “>” symbol, or child combinator (we have, in fact, used it in several previous examples). The following plot shows the difference when all decendant symbols are targetted, versus just the immediate child symbol:

node_style1 = ".n13 .sym {fill: yellow}"  # All symbols under node 13 
node_style2 = ".n15 > .sym {fill: cyan}"  # Only symbols that are an immediate child of node 15
css_string = node_style1 + node_style2
ts_small.draw_svg(y_axis=True, y_ticks=y_tick_pos, x_lim=x_limits, style=css_string)
_images/7895272486bcbbd8aebb37cf7eda5e8bb81fb0f8c9afb7efb17ab5cfb9880acf.svg

Another example of modifying the style target is negation. This is needed, for example, to target nodes that are not leaves (i.e. internal nodes). One way to do this is to target all the node symbols first, then replace the style with a more specific targetting of the leaf symbols only:

hide_internal_symlabs = ".node > .sym, .node > .lab {display: none}"
show_leaf_symlabs = ".node.leaf > .sym, .node.leaf > .lab {display: initial}"
css_string = hide_internal_symlabs + show_leaf_symlabs
ts_small.draw_svg(y_axis=True, y_ticks=y_tick_pos, x_lim=x_limits, style=css_string)
_images/d1d0f508d0f1a6ef86e6338d6c35db2da272b351ca79f2d7aae09778141b0b74.svg

Alternatively, the :not selector can be used to target nodes that are not leaves, so the following style specification should produce the same effect in SVG viewers that support it (note, however, as of v1.2 Inkscape does not appear to support this selector).

style_string = ".node:not(.leaf) > .sym, .node:not(.leaf) > .lab {display: none}"

More about styling#

NOTE: if your SVG is embedded directly into an HTML page (a common way for jupyter notebooks to render SVGs), then according to the HTML specifications, any styles applied to one SVG will apply to all SVGs in the document. To avoid this confusing state of affairs, we recommend that you tag the SVG with a unique ID using the root_svg_attributes parameter, then prepend this ID to the style string:

ts_small.draw_svg(
    x_lim=x_limits,
    root_svg_attributes={'id': "myUID"},
    style="#myUID .background * {fill: #00FF00}",  # apply any old style to this specific SVG
)
_images/809c0b4a9eb62e84f1b4aff59d5fe5a471490b92d85b66c1f43a5fae930f7342.svg

SVG styles allow a huge amount of flexibility in formatting your plot, even extending to animations. Feel free to browse the examples for inspiration.

Converting and editing SVG#

Converting#

Inkscape is an open source SVG editor that can also be scripted to output bitmap files.

Imagemagick is a common piece of software used to convert between image formats. It can be configured to delegate to one of several different SVG libraries when converting SVGs to bitmap formats. Currently, both the librsvg library or the Inkscape library produce reasonable output, although librsvg currently misaligns some labels due to ignoring certain SVG properties.

Note

A few stylesheet specifications, such as the transform property, are SVG2 features, and have not yet been implemented in Inkscape or librsvg. Therefore if you use these in your own custom SVG stylesheet (such as the example above where we rotated leaf labels), they will not be applied properly when converted with those tools. For custom stylesheets like this, a workaround is to convert the SVG to PDF first, using e.g. the programmable chromium engine:

chromium --headless --print-to-pdf=out.pdf in.svg

The resulting PDF file can then be converted by Inkscape, retaining the correct transformations.

Editing the SVG#

  • Editing can be done in Inkscape (subject to the note above)

Todo

Tips on how to cope with the hierarchical grouping when editing (e.g. in Inkscape using Extensions menu > Arrange > Deep Ungroup, but note that this will mess with the styles!)

Examples#

Text examples#

Tree orientation#

In the text format, trees (but not tree sequences) can be displayed in different orientations

Hide code cell source
from IPython.display import HTML
import msprime

ts = msprime.sim_ancestry(4)

orient = "top", "left", "bottom", "right"
html = []
for o in orient:
    tree_string = ts.first().draw_text(orientation=o)
    html.append(f"<pre style='display: inline-block'>{tree_string}</pre>")
HTML(("&nbsp;"*10).join(html))
         14    
      ┏━━━┻━━┓ 
     13      ┃ 
    ┏━┻━━┓   ┃ 
   12    ┃   ┃ 
  ┏━┻━┓  ┃   ┃ 
 11   ┃  ┃   ┃ 
┏━┻┓  ┃  ┃   ┃ 
┃ 10  ┃  ┃   ┃ 
┃ ┏┻┓ ┃  ┃   ┃ 
┃ ┃ ┃ ┃  ┃   9 
┃ ┃ ┃ ┃  ┃  ┏┻┓
┃ ┃ ┃ ┃  8  ┃ ┃
┃ ┃ ┃ ┃ ┏┻┓ ┃ ┃
0 1 4 3 6 7 2 5
          
           ┏━━━━━━━0
           ┃        
        ┏11┫  ┏━━━━1
        ┃  ┗10┫     
     ┏12┫     ┗━━━━4
     ┃  ┃           
  ┏13┫  ┗━━━━━━━━━━3
  ┃  ┃              
  ┃  ┃            ┏6
  ┃  ┗━━━━━━━━━━━8┫ 
14┫               ┗7
  ┃                 
  ┃             ┏━━2
  ┗━━━━━━━━━━━━9┫   
                ┗━━5
          
0 1 4 3 6 7 2 5
┃ ┃ ┃ ┃ ┗┳┛ ┃ ┃
┃ ┃ ┃ ┃  8  ┃ ┃
┃ ┃ ┃ ┃  ┃  ┗┳┛
┃ ┃ ┃ ┃  ┃   9 
┃ ┗┳┛ ┃  ┃   ┃ 
┃ 10  ┃  ┃   ┃ 
┗━┳┛  ┃  ┃   ┃ 
 11   ┃  ┃   ┃ 
  ┗━┳━┛  ┃   ┃ 
   12    ┃   ┃ 
    ┗━┳━━┛   ┃ 
     13      ┃ 
      ┗━━━┳━━┛ 
         14    
          
0━━━━━━━┓           
        ┃           
1━━━━┓  ┣11┓        
     ┣10┛  ┃        
4━━━━┛     ┣12┓     
           ┃  ┃     
3━━━━━━━━━━┛  ┣13┓  
              ┃  ┃  
6┓            ┃  ┃  
 ┣8━━━━━━━━━━━┛  ┃  
7┛               ┣14
                 ┃  
2━━┓             ┃  
   ┣9━━━━━━━━━━━━┛  
5━━┛                

SVG examples#

A standard ts plot#

Note that this tree sequence also illustrates a few features which are not normally produced e.g. by msprime simulations, in particular a “empty” site (with no associated mutations) at position 50, and some mutations that occur above root nodes in the trees. Graphically, root mutations necessitate a line above the root node on which to place them, so each tree in this SVG has a nominal “root branch” at the top. Normally, root branches are not drawn, unless the force_root_branch parameter is specified.

Hide code cell source
# Make a tree sequence with multiple mutations including some above the root
import io
import tskit

def make_unusual_ts():
    nodes = io.StringIO(
        """\
    id      is_sample   population      individual      time    metadata
    0       1       0       -1      0
    1       1       0       -1      0
    2       1       0       -1      0
    3       1       0       -1      0
    4       0       0       -1      0.1145014598813
    5       0       0       -1      1.11067965364865
    6       0       0       -1      1.75005250750382
    7       0       0       -1      5.31067154311640
    8       0       0       -1      6.57331354884652
    9       0       0       -1      9.08308317451295
    """
    )
    edges = io.StringIO(
        """\
    id      left   right   parent  child
    0       0      100     4       0
    1       0      100     4       1
    2       0      100     5       2
    3       0      100     5       3
    4       80     85      6       4
    5       80     85      6       5
    6       6      80      7       4
    7       85     91      7       4
    8       6      80      7       5
    9       85     91      7       5
    10      91     100     8       4
    11      91     100     8       5
    12      0      6       9       4
    13      0      6       9       5
    """
    )
    sites = io.StringIO(
        """\
    position    ancestral_state
    4           A
    6           0
    30          Empty
    50          XXX
    91          T
    """
    )
    muts = io.StringIO(
        """\
    site   node    derived_state    parent    time
    0      9       T                -1        15
    0      9       G                0         9.1
    0      5       1                1         9
    1      4       C                -1        1.6
    1      4       G                3         1.5
    2      7       G                -1        10
    2      3       C                5         1
    4      3       G                -1        1
    """
    )
    return tskit.load_text(nodes, edges, sites=sites, mutations=muts, strict=False)


ts = make_unusual_ts()
ts.draw_svg()
_images/735a3ab251c0addfd7114dd955617209cf1c96c45602b0d3e94e170faf6c0d03.svg

Highlighted mutations#

Specific mutations can be given a different colour. Moreover, the descendant lineages of specific mutations can be coloured and the branch colours overlay each other as expected. Note that in this example, internal node labels and symbols have been hidden for clarity.

Hide code cell source
ts = make_unusual_ts()  # Defined in the first SVG example
css_string = (
    ".edge {stroke: grey}"
    ".mut .sym{stroke:pink} .mut text{fill:pink}"
    ".mut.m2 .sym, .m2>line, .m2>.node .edge{stroke:blue} .mut.m2 .lab{fill:blue}"
    ".mut.m3 .sym, .m3>line, .m3>.node .edge{stroke:cyan} .mut.m3 .lab{fill:cyan}"
    ".mut.m4 .sym, .m4>line, .m4>.node .edge{stroke:red} .mut.m4 .lab{fill:red}"
    # Hide internal node labels & symbols
    ".node:not(.leaf) > .sym, .node:not(.leaf) > .lab {display: none}"
)
ts.draw_svg(style=css_string, time_scale="rank", x_lim=[0, 30])
_images/82ddc79963bee0a22403f8180b0eededbb4bfb3dfa7939e2d5d69b050e17625b.svg

Leaf, sample & isolated nodes#

By default, sample nodes are square and non-sample nodes circular (at the moment this can’t easily be changed). However, neither need to be at specific times: sample nodes can be at times other than 0, and nonsample nodes can be at time 0. Moreover, leaves need not be samples, and samples need not be leaves. Here we change the previous tree sequence to make some leaves non-samples and some samples internal nodes. To highlight the change, we have plotted sample nodes in green, and leaf nodes (if not samples) in blue.

Hide code cell source
def swap_samples_ts():
    tables = make_unusual_ts().dump_tables()  # Defined in the first SVG example
    tables.mutations.clear()
    tables.sites.clear()
    flags = tables.nodes.flags
    flags[:] = [0, 0, 1, 1, 0, 0, 0, 1, 1, 0]
    tables.nodes.flags = flags
    return tables.tree_sequence()


ts = swap_samples_ts()
css_string=".leaf .sym {fill: blue} .sample > .sym {fill: green}"
ts.draw_svg(
    style=css_string,
    x_scale="treewise",
    time_scale="rank",
    y_axis=True,
    y_gridlines=True,
    x_lim=[0, 10],
)
_images/46fb6b52e3c9991185c9fdd7511351ac7b5750876d2ef8a4735d30d76d1a61e3.svg

Note

By definition, if a node is a sample, it must be present in every tree. This means that there can be sample nodes which are “isolated” in a tree. These are drawn unconnected to the main topology in one or more trees (e.g. nodes 7 and 8 above).

A fancy formatted plot#

Here we have activated the Y axis, and changed the node style. In particular, we have coloured nodes by time, and increased the internal node symbol size while moving the internal node labels into the symbol; node labels have also been plotted in a sans-serif font. Axis tick labels have been changed to avoid potential overlapping (some Y tick labels have been removed, and the X tick labels rotated).

Hide code cell source
import msprime
import numpy as np

def make_standard_ts():
    seed=370009
    return msprime.sim_ancestry(
        7, ploidy=1, sequence_length=1000, random_seed=seed, recombination_rate=0.001)


ts = make_standard_ts()

# Thin the tick values so we don't get labels within 0.01 of each other
y_ticks = ts.tables.nodes.time
y_ticks = np.delete(y_ticks, np.argwhere(np.ediff1d(y_ticks) <= 0.01))

css_string = (
    ".tree .lab {font-family: sans-serif}"
    # Normal X axis tick labels have dominant baseline: hanging, but it needs centring when rotated
    + ".x-axis .tick .lab {text-anchor: start; dominant-baseline: central; transform: rotate(90deg)}"
    + ".y-axis .grid {stroke: #DDDDDD}"
    + ".tree :not(.leaf).node > .lab  {transform: translate(0,0); text-anchor:middle; fill: white}"
    + ".tree :not(.leaf).node > .sym {transform: scale(3.5)}"
    + "".join(
        f".tree .n{n.id} > .sym {{fill: hsl({int((1-n.time/ts.max_root_time)*260)}, 50%, 50%)}}"
        for n in ts.nodes()
      )
)
ts.draw_svg(size=(1000, 350), y_axis=True, y_gridlines=True, y_ticks=y_ticks, style=css_string)
_images/a88b0b499d6ca8bdd33c2211b5f9df4052f4abd096666bbfef8b4b5679604f26.svg

Ticks, labels, and gridlines#

Y tick labels can be specified explicitly, which allows time scales to be plotted e.g. in years even if the tree sequence ticks in generations. The title class allows axis titles to be moved out of the way of tick labels. Finally, grid lines associated with each y tick can also be changed or even hidden individually using the CSS nth-child pseudo-selector, where tickmarks are indexed from the bottom. Below is an example of all 3 techniques, drawing on an example from the Introgression tutorial:

Hide code cell source
import msprime

time_units = 1000 / 25  # Conversion factor for kya to generations

def make_introgression_ts(sequence_length, random_seed=None):
    """
    Function from the introgression tutorial - see there for justification
    """
    demography = msprime.Demography()
    # The same size for all populations; highly unrealistic!
    Ne = 10**4
    demography.add_population(name="Africa", initial_size=Ne)
    demography.add_population(name="Eurasia", initial_size=Ne)
    demography.add_population(name="Neanderthal", initial_size=Ne)

    # 2% introgression 50 kya
    demography.add_mass_migration(
        time=50 * time_units, source='Eurasia', dest='Neanderthal', proportion=0.02)
    # Eurasian 'merges' backwards in time into Africa population, 70 kya
    demography.add_mass_migration(
        time=70 * time_units, source='Eurasia', dest='Africa', proportion=1)
    # Neanderthal 'merges' backwards in time into African population, 300 kya
    demography.add_mass_migration(
        time=300 * time_units, source='Neanderthal', dest='Africa', proportion=1)

    return msprime.sim_ancestry(
        recombination_rate=1e-8,
        sequence_length=sequence_length,  
        samples=[
            msprime.SampleSet(1, ploidy=1, population='Africa'),
            msprime.SampleSet(1, ploidy=1, population='Eurasia'),
            # Neanderthal sample taken 30 kya
            msprime.SampleSet(1, ploidy=1, time=30 * time_units, population='Neanderthal'),
        ],
        demography = demography,
        record_migrations=True,  # Needed for tracking segments.
        random_seed=random_seed,
    )


ts = make_introgression_ts(20 * 10**6, random_seed=1)

base_size = (1200, 500)
x_shift = 60
css = f".tree-sequence {{transform: translateX({x_shift}px)}}"  # Move rightwards
css += ".sample .lab {text-anchor: start; transform: rotate(90deg) translate(6px); font-size: 80%}"
css += ".x-axis .tick .lab {font-size: 85%}"
css += ".y-axis .title {transform: translateY(250px)}"  # Move out of the way of y_tick labels
css += ".y-axis .tick .grid {stroke: lightgrey}"  # Default gridline type
css += ".y-axis .ticks .tick:nth-child(3) .grid {stroke-dasharray: 4}"  # 3rd line from bottom
css += ".y-axis .ticks .tick:nth-child(3) .grid {stroke: magenta}"  # also 3rd line from bottom
css += ".y-axis .ticks .tick:nth-child(4) .grid {stroke: blue}"  # 4th line from bottom
css += ".y-axis .ticks .tick:nth-child(5) .grid {stroke: darkgrey}"  # 5th line from bottom
y_ticks = {
    0: "0",
    30: "30",
    50: "Introgression event",
    70: "European origin",
    300: "Neanderthal origin",
    1000: "1000",
}
ts.draw_svg(
    size=base_size,
    x_lim=(0, 25_000),
    time_scale="log_time",
    node_labels = {0: "Africa", 1: "Europe", 2: "Neanderthal"},
    y_axis=True,
    y_ticks={y * time_units: lab for y, lab in y_ticks.items()},
    y_gridlines=True,
    style=css,
    canvas_size=(base_size[0] + x_shift, base_size[1]),
)
_images/4c6e025d442470a609c9a2839abc4786b769ea6692d1a7caeaded857861d711c.svg

Simplifying larger plots#

It is common to want to visualise a tree sequence with many samples and trees. If there are many trees, the max_num_trees parameter can be used to just show those at the start and end of the genome. To reduce the size of each tree, multiple samples can be clustered into a single representative clade. If that clade has the same set of descendant samples throughout the tree sequence, Simplification can be used to to turn the MRCA of these samples into a sample node itself, while removing the original descendants. By using the scaling and masking method described in Transforming and masking elements this summary MRCA can be shown as a large triangle, of size proportional to the number of samples underneath it. The example below shows how a tree sequence of 40 sample nodes can be visualised relatively compactly using these techniques:

Hide code cell source
import msprime
import numpy as np

def clonal_mrcas(ts, most_recent=True):
    """
    Identify the nodes in a tree sequence which define clonal subtrees (i.e. in which the
    samples descending from that node are identical and show identical relationships
    to each other over the entire tree sequence). This includes, at its limit, nodes
    with only a single descendant sample.
    
    :param bool most_recent: If True, and the clonal node is a unary node, return IDs of
        the most recent node that defines the clonal subtree. In this case, the returned
        IDs represent cases where the node is either a tip or a coalescent point.
    :return: a list of nodes defining constant subtrees over the entire tree sequence
    :rtype: list
    """
    for interval, edges_out, edges_in in ts.edge_diffs():
        if interval.left==0:
            is_full_length_clonal = np.ones(ts.num_nodes, dtype=bool)  # nodes start clonal
        else:
            for e in edges_in:
                is_full_length_clonal[e.parent] = False
        for e in edges_out:
            is_full_length_clonal[e.parent] = False
    clonal_nodes = np.where(is_full_length_clonal)[0]

    tables = ts.dump_tables()
    edges = tables.edges
    # only keep edges where both the child and the parent are full-length clonal nodes
    keep_edge = np.logical_and(
        np.isin(edges.child, clonal_nodes),
        np.isin(edges.parent, clonal_nodes),
    )
    tables.edges.set_columns(
            left = tables.edges.left[keep_edge],
            right=tables.edges.right[keep_edge],
            parent=tables.edges.parent[keep_edge],
            child=tables.edges.child[keep_edge],
        )
    clonal_ts = tables.tree_sequence()
    assert clonal_ts.num_trees == 1

    # Also remove all the edges ascending from removed edges
    tree = clonal_ts.first()
    non_clonal_ancestors = set()
    deleted_edges = np.logical_not(keep_edge)
    for u in np.unique(ts.edges_parent[deleted_edges]):
        while u != tskit.NULL and u not in non_clonal_ancestors:
            non_clonal_ancestors.add(u)
            u = tree.parent(u)
    non_clonal_ancestors = np.array(list(non_clonal_ancestors))
    tables = clonal_ts.dump_tables()
    remove_edge = np.isin(tables.edges.parent, non_clonal_ancestors)
    tables.edges.replace_with(tables.edges[np.logical_not(remove_edge)])
    clonal_ts = tables.tree_sequence()   
    
    tree = ts.first(sample_lists=True)
    clonal_tree = clonal_ts.first(sample_lists=True)
    clonal_nodes = []
    for root in clonal_tree.roots:
        # Clonal trees should subtend the same set of samples as in the original tree
        assert set(tree.samples(root)) == set(clonal_tree.samples(root))
        u = root
        if most_recent:
            # decend to the first coalescent node (i.e. MRCA)
            while clonal_tree.num_children(u) == 1:
                u = clonal_tree.children(u)[0]
        clonal_nodes.append(u)
    return clonal_nodes

ts = msprime.sim_ancestry(
        20, population_size=1e2, sequence_length=1e4, recombination_rate=1e-6, random_seed=83)
clones = clonal_mrcas(ts)
# Simplify but keep the same node IDs using filter_nodes=False
ts_simp = ts.simplify(clones, filter_nodes=False)

styles = [
    ".node > .lab {font-size: 9px}",
    ".leaf > .lab {text-anchor: start; transform: rotate(90deg) translate(5px); font-size: 12px}",

]

node_labels = {u: u for u in range(ts.num_nodes)}
for u in ts.samples():
    node_labels[u] = f"S{u}"

tree = ts.first()
for u in clones:
    num_samples = tree.num_samples(u)
    if num_samples > 1:
        node_labels[u] = f"{num_samples} samples"
        styles.append(
            f".n{u} > .sym {{clip-path: polygon(50% 50%, 100% 100%, 0% 100%);"+
            f"transform: scale({(num_samples-1)/5 + 1}, 4.0)}}" +
            f".n{u} > .lab {{transform: rotate(90deg) translate(15px); font-size: 13px}}"
        )
ts_simp.draw_svg(
    size=(1000, 400),
    style="".join(styles),
    node_labels=node_labels,
    time_scale="log_time",
    y_axis=True,
    y_ticks=[0, 1, 10, 100],
    max_num_trees=4)
_images/ea9556beea74c9de943aa64e0be335c66f2905be7d3b5f305f458f0ad2d70f13.svg

3D effects#

We can use various CSS transforms, as discussed previously, to skew the trees and stagger them. With a bit of trigonometry, this can create flexible and tolerably good 3D effects for presentations, etc.

Hide code cell source
import math
import msprime

def make_7_tree_4_tip_ts():
    ts = msprime.sim_ancestry(
        4, ploidy=1, random_seed=889, sequence_length=1000, recombination_rate=0.001)
    ts = msprime.sim_mutations(ts, rate=2e-3, random_seed=123)

    # Check we have picked a random seed that gives a nice plot of 7 trees
    tip_orders = {
        tuple(u for u in t.nodes(order="minlex_postorder") if t.is_sample(u))
        for t in ts.trees()
    }
    topologies = {tree.rank() for tree in ts.trees()}
    assert tip_orders == {(0, 1, 2, 3)} and len(topologies) > 1 and ts.num_trees == 7

    return ts


ts = make_7_tree_4_tip_ts()

# Set some parameters: these can be adjusted to your liking
tree_width = 100
height = 200 # Normal height for tree + x-axis
y_step = 40  # Stagger between trees (i.e. 0 for all trees in a horizontal line)
skew = 0.6  # How skewed the trees are, in radians

width = tree_width * ts.num_trees + 20 + 20  # L & R margins in draw_svg = 20px
angle = math.atan(y_step/tree_width)
ax_mv = y_step, (ts.num_trees - 1) * y_step + math.tan(skew) * (tree_width * .9)

# CSS transforms used to skew the axis and stagger + skew the trees
style = f".x-axis {{transform: translate({ax_mv[0]}px, {ax_mv[1]}px) skewY(-{angle}rad)}}"
for i in range(ts.num_trees):
    # Stagger each tree vertically by y_step, transforming the "plotbox" tree container
    style += (
        f".tree.t{i} > .plotbox " + "{transform:" +
        f"translateY({(ts.num_trees - i - 1) * y_step}px) skewY({skew}rad)" + "}"
    )

# Define a bigger canvas size so we don't crop the moved trees from the drawing
size = (width, height)
canvas_size = (width + y_step, height + ts.num_trees*y_step + math.tan(skew)*tree_width)

ts.draw_svg(size=size, x_scale="treewise", style=style, canvas_size=canvas_size)
_images/ef7c7af0bf4f4518b2f17b8c5b1f212c4d9a413469f9de287b84df0b98ef2f97.svg

Animation#

The classes attached to the SVG also allow elements to be animated. Here’s a d3.js-based animation of sucessive subtree-prune-and-regraft (SPR) operations, using the ARG representation of a tree sequence to allow identification of pruned edges.

Hide code cell source
from IPython.display import HTML
import msprime

def make_full_arg_for_spr_animation():
    # created with record_full_arg needed to track recombination nodes (branch positions)
    # random_seed chosen to produce a ts whose leaves are plotted in the same order
    return msprime.sim_ancestry(
        5, ploidy=1,
        sequence_length=10000,
        recombination_rate=0.00005,
        random_seed=6787, model="smc_prime", record_full_arg=True)                
   

css_string = (
    "#anim_svg {background-color: white} "
    "#anim_svg .node:not(.sample) > .lab, #anim_svg .node:not(.sample) > .sym {display: none}"
)
html_string = r"""
%s
<script type="text/javascript" src="https://d3js.org/d3.v4.min.js"></script>
<script type="text/javascript">
function diff(A) {return A.slice(1).map((n, i) => { return n - A[i]; });};
function mean(A) {return A.reduce((sum, a) => { return 0 + sum + a },0)/(A.length||1);};
function getRelativeXY(canvas, element, x, y) {
  var p = canvas._groups[0][0].createSVGPoint();
  var ctm = element.getCTM();
  p.x = x || 0;
  p.y = y || 0;
  return p.matrixTransform(ctm);
};

function animate_SPR(canvas, num_trees) {
  d3.selectAll("#anim_svg .tree").attr("opacity", 0);
  for(var i=0; i<num_trees - 1; i++) 
  {
    var source_tree = "#anim_svg .tree.t" + i;
    var target_tree = "#anim_svg .tree.t" + (i+1);
    var dur = 2000;
    var delay = i * dur;
    d3.select(source_tree)
      .datum(function() { return d3.select(this).attr("transform")}) // store the original value
      .transition()
      .on("start", function() {d3.select(this).attr("opacity", "1");}) 
      .delay(delay)
      .duration(dur)
      .attr(
        "transform", d3.select(target_tree).attr("transform"))
      .on("end", function() {
        d3.select(this).attr("opacity", "0");
        d3.select(this).attr("transform", d3.select(this).datum()); // reset
      });
    transform_tree(canvas, source_tree, target_tree, dur, delay);
  }
};

// NB - this is buggy and doesn't correctly reset the transformations on the elements
// because it is hard to put the subtree back into the correct place in the hierarchy

function transform_tree(canvas, src_tree, target_tree, dur, delay) {
  canvas.selectAll(src_tree + " .node").each(function() {
    var n_ids = d3.select(this).attr("class").split(/\s+/g).filter(x=>x.match(/^n\d/));
    if (n_ids.length != 1) {alert("Bad node classes in SVG tree")};
    var node_id = n_ids[0].replace(/^n/, "")
    var src = src_tree + " .node.n" + node_id;
    var target = target_tree + " .node.n" + node_id;

    if (d3.select(src).nodes()[0] && d3.select(target).nodes()[0]) {
      // The same source and target edges exist, so we can simply move them
      d3.select(src).transition()
        .delay(delay)
        .duration(dur)
        .attr("transform", d3.select(target).attr("transform"))
      var selection = d3.select(src + " > .edge");
      if (!selection.empty()) { // the root may not have an edge
        selection.transition()
          .delay(delay)
          .duration(dur)
          .attr("d", d3.select(target +" > .edge").attr("d"))
      }
    } else {
      // No matching node: this could be a recombination
      // Hack: the equivalent recombination node is the next one labelled
      var target = target_tree + " .node.n" + (1+parseInt(node_id));
      // Extract the edge
      src_tree_pos = getRelativeXY(canvas, d3.select(src_tree).node());
      target_tree_pos = getRelativeXY(canvas, d3.select(target_tree).node());
      src_xy = getRelativeXY(canvas, d3.select(src).node());
      target_xy = getRelativeXY(canvas, d3.select(target).node());
      // Move the subtree out of the hierarchy and into the local tree space,
      // so that movements of the containing hierarchy do not affect position
      d3.select(src_tree).append(
        () => d3.select(src)
          .attr(
            "transform",
            "translate(" + (src_xy.x - src_tree_pos.x) + " " + (src_xy.y - src_tree_pos.y) + ")"
          )
          .remove()
          .node()
      );
      d3.select(src).transition()
        .delay(delay)
        .duration(dur)
        .attr(
          "transform",
          "translate(" + (target_xy.x-target_tree_pos.x) + " " + (target_xy.y-target_tree_pos.y) + ")")
      selection = d3.select(src + " > .edge");
      if (!selection.empty()) {
        selection.transition()
          .delay(delay)
          .duration(dur)
          .attr("d", d3.select(target +" > .edge").attr("d"))
      }
    }
  })
};

var svg_text = document.getElementById("anim_svg").innerHTML;

</script>

<button onclick='animate_SPR(d3.select("#anim_svg"), %s);'>Animate</button>
<button onclick='document.getElementById("anim_svg").innerHTML = svg_text;'>Reset</button>
"""

ts = make_full_arg_for_spr_animation()
HTML(html_string % (
    ts.draw_svg(root_svg_attributes={"id": "anim_svg"}, style=css_string),
    ts.num_trees,
))
Genome position0741138132751000001418911263451315161701412789113451315161701278912143451315161701434512781013151617

Other visualizations#

As well as visualizing a tree sequence as, well, a sequence of local trees, or by plotting statistical summaries, other visualizations are possible, some of which are outlined below.

Graph representations#

A tree sequence can be treated as a specific form of (directed) graph consisting of nodes connected by edges. Standard graph visualization software, such as graphviz can therefore be used to depict tree sequence topologies. Alternatively, the tskit_arg_visualizer project will draw a interactive tskit graph directly. Below is an example, which uses the variable_edge_width option to highlight the spans of genome inherited through different routes. Nodes can be dragged horizontally and embedded trees highlighted:

Hide code cell source
%%javascript
require.config({paths: {d3: 'https://d3js.org/d3.v7.min'}});
require(["d3"], function(d3) {window.d3 = d3;});
import msprime
import tskit_arg_visualizer
ts = msprime.sim_ancestry(4, sequence_length=1000, recombination_rate=0.001, random_seed=3)
d3arg = tskit_arg_visualizer.D3ARG.from_ts(ts=ts)
tip_order = [0, 6, 3, 1, 2, 7, 4, 5]  # Found by trial and error for this seed
d3arg.draw(width=500, height=500, variable_edge_width=True, sample_order=tip_order)

For an msprime “full ARG” tree sequence, edge_type="ortho" can be used to draw a traditional “Ancestral Recombination Graph” style plot (variable edge widths turned off for clarity):

import msprime
import tskit_arg_visualizer
tip_order = [3, 0, 1, 2, 6, 7, 4, 5]
full_arg_ts = msprime.sim_ancestry(
    4, sequence_length=1000, recombination_rate=0.001, random_seed=3, record_full_arg=True)
d3arg = tskit_arg_visualizer.D3ARG.from_ts(ts=full_arg_ts)
d3arg.draw(width=500, height=500, edge_type="ortho", sample_order=tip_order)

For more general graph plots, it can be helpful convert the tree sequence to a networkx graph first, as described in the Other graph analysis section of the ARGs as tree sequences tutorial. This provides interfaces to graph plotting software such as graphviz, which provides the dot layout engine for directed graphs:

Hide code cell source
## Networkx conversion code taken from the ARG tutorial

import networkx as nx
import pandas as pd
import tskit

def to_networkx_graph(ts, interval_lists=False):
    """
    Make an nx graph from a tree sequence. If `intervals_lists` is True, then
    each graph edge will have an ``intervals`` attribute containing a *list*
    of tskit.Intervals per parent/child combination. Otherwise each graph edge
    will correspond to a tskit edge, with a ``left`` and ``right`` attribute.
    """
    D = dict(source=ts.edges_parent, target=ts.edges_child, left=ts.edges_left, right=ts.edges_right)
    G = nx.from_pandas_edgelist(pd.DataFrame(D), edge_attr=True, create_using=nx.MultiDiGraph)
    if interval_lists:
        GG = nx.DiGraph()  # Mave a new graph with one edge that can contai
        for parent, children in G.adjacency():
            for child, edict in children.items():
                ilist = [tskit.Interval(v['left'], v['right']) for v in edict.values()]
                GG.add_edge(parent, child, intervals=ilist)
        G = GG
    nx.set_node_attributes(G, {n.id: {'flags':n.flags, 'time': n.time} for n in ts.nodes()})
    return G
import networkx as nx
from IPython.display import SVG

def graphviz_svg(networkx_graph):
    AG = nx.drawing.nx_agraph.to_agraph(networkx_graph)  # Convert to graphviz "agraph"
    nodes_at_time_0 = [k for k, v in networkx_graph.nodes(data=True) if v['time'] == 0]
    AG.add_subgraph(nodes_at_time_0, rank='same')  # put time=0 at same rank
    return AG.draw(prog="dot", format="svg")

G = to_networkx_graph(ts, interval_lists=True)  # Function from the ARG tutorial
print("Converted `ts` to a networkx graph named `G`")
print("Plotting using graphviz...")
SVG(graphviz_svg(G))
Converted `ts` to a networkx graph named `G`
Plotting using graphviz...
_images/acb153e64e86280ae05d28010e7335c69e195a4e63be9344b2bd363c5bcdaace.svg

Alternatively, you can read the graphviz positions back into networkx and use the networkx drawing functionality, which relies upon the matplotlib library. This allows modification of node colours and symbols, labels, rotations, annotations, etc., as shown below:

Hide code cell source
from matplotlib import pyplot as plt
import string

def get_graphviz_positions(networkx_graph):
    AG = nx.drawing.nx_agraph.to_agraph(networkx_graph)  # Convert to graphviz "agraph"
    nodes_at_time_0 = [k for k, v in networkx_graph.nodes(data=True) if v['time'] == 0]
    AG.add_subgraph(nodes_at_time_0, rank='same')  # put time=0 at same rank
    AG.layout(prog="dot")  # create the layout, storing positions in the "pos" attribute
    return {n: [float(x) for x in AG.get_node(n).attr["pos"].split(",")] for n in G.nodes()}

pos=get_graphviz_positions(G)
edge_labels = {
    edge[0:2]: "\n".join([f"[{int(i.left)},{int(i.right)})" for i in edge[2]["intervals"]])
    for edge in G.edges(data=True)
}

samples = set(ts.samples())
nonsamples = set(range(ts.num_nodes)) - samples

plt.figure(figsize=(10, 4))
# Sample nodes as dark green squares with white text
nx.draw(G, pos, node_color="#007700", font_size=9, node_size=250, node_shape="s", nodelist=samples, edgelist=[])
nx.draw_networkx_labels(G, pos, font_size=9, labels={u: u for u in samples}, font_color="white")
# Others as blue circles with alphabetic labels
nx.draw(G, pos, node_color="#22CCFF", font_size=9, edgecolors="black", nodelist=nonsamples, edgelist=[])
nx.draw_networkx_labels(G, pos, font_size=9, labels={u: string.ascii_lowercase[u] for u in nonsamples})

nx.draw_networkx_edges(G, pos, edge_color="lightgrey", arrows=False, width=2);
nx.draw_networkx_edge_labels(G, pos, edge_labels, font_size=7, rotate=False);
_images/bb7473460c28f085ac3a65be533d84626d3da1fe6aeca1bd8b549e2070626776.png

Note, however, that finding node and edge layout positions that avoid too much overlap can be tricky, even for the graphviz layout engine, and there is no easy functionality to place nodes at specific vertical (time) positions.

Demographic processes#

If you are generating a tree sequence via a Demes model, then you can visualize a schematic of the demography itself (rather than the resulting tree sequence) using the DemesDraw software. For example, here’s the plotting code to generate the demography plot from the “What is a tree sequence?” tutorial:

Hide code cell source
import matplotlib_inline
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')

import demes
import demesdraw

def size_max(graph):
    return max(
        max(epoch.start_size, epoch.end_size)
        for deme in graph.demes
        for epoch in deme.epochs
    )

# See https://popsim-consortium.github.io/demes-docs/ for the yml spec for the file below
graph = demes.load("data/whatis_example.yml")
w = 1.5 * size_max(graph)
positions = dict(Ancestral_population=0, A=-w, B=w)
ax = demesdraw.tubes(graph, positions=positions, seed=1)
plt.show(ax.figure)
_images/e7556c494e4f78949b36c0a3f38bd67573e7b2df85b17a7843c79730cbd10e71.svg

Geography#

Todo

How to get lat/long information out of a tree sequence and plot ancestors (or a tree) on a geographical landscape.