Frontend - Visualizing Progression Tree to Root Out Cancer

Scatter-Plot script

Representing and dealing with scatter plots

Note: 'centroids' is misspelled as 'centeroid' in some parts of the code, and correcting it might not be safe

For the documentation of the function 'each' refer to

Visual Controls Options
      Contains the values of the options that are used to control the visual behaviors of the node-link diagram
       Defined clusters, nodes, links, and graph
       Are meant to be just a sample of how to use and initialize the graph properly (refer to them to see how they are related to each other)

Reduced Elements
      Contains the information used to represent each of the reduced node-link diagrams after the dimensionality reduction of the original one
      It is necessary to keep a separate copy for each one, even if they share the same attributes, because of the changes that happen to them in the visualization stage (such as branch selection, dragging, etc.)

Reduction Functions
      Wrapper functions that call the reduction functions with their proper settings. To keep the calls short and avoid mistakes and inconsistency, it is better to keep them this way so that changes to the settings occur in only one place

get_centroids() Function
      A function that gets the data (centers and average grade values) from the HTML file and creates the original non-reduced clusters and graph structure
jsonStr() Function
      Not necessary, but added for arrays in case a browser fails to convert an array into a JSON object directly
init_reduce() Function
      Handles the dimensionality reduction of the original graph structure and creates a new graph for each reduction type

For details on how to properly initialize the graph to work with the node-link diagram, refer to the example at the top of the script file.

reconstruct() Function
      Parameters:
       centroids: An array of the centroids on which the drawing of the graph is based
       links: An array of the links that connect the centroids
       clusters: An array of the cluster set, is not used in the functionality of the function but is needed for the creation of the node-link diagram
       graph: A reference to the graph that is to be reconstructed. Must be an existing graph
       SVG ID: For the nodelink() function to be given to the created SVG

Given the above parameters, the function resets and recalculates the nodes and edges of the graph, then creates a new node-link diagram with the specified ID.

changeDR() Function
      Parameters:
       algorithm: The name of the algorithm to which the view should be switched
       reconstruct: A boolean variable that determines whether the graph should be reconstructed due to changes in the parameters or not

Given the above parameters, the function hides every SVG element in the view, then shows the SVG element associated with the specified algorithm. If the value of reconstruct is true, the function calls reconstruct() with the corresponding arguments.

Normalized Array
      Shows how to properly set the array that can be passed to a scatter plot
      Two examples of how to initialize a scatter plot are also given, but they are commented out

window.resize() Event Handler
      Currently does not function. It is meant to handle changes in the sizes of the node-link diagram and scatter plots

Features Array
      Holds the data of the features that will be read from the HTML

Selected Features
      Holds the names of the selected features, both indicator and grade

The rest of the AUTOCOMPLETE section consists of procedures that handle autocompletion and selection

Coloring Section

Custom Colors Object
An object whose attributes are named after the corresponding color scaling types. Each attribute is an array of palettes, where each palette is an array of colors.


Format { 'scale type 1' : [ ['color 1', 'color 2', ..], [..], .., [..] ], 'scale type 2' : [....] }

`colorscale()` Constructor

Parameters:
       domain: The minimum and the maximum values that will be scaled to match a color set
       color set: An array of colors that will be used as the range of the scale

Functions:
       maps(): Maps a value that lies within the domain to an index of a color in the specified color set
       clrScale variable: variable that will serve as the global color scale

       init_colorScales(): Creates HTML elements (divs, radio buttons, and SVG) that represent the color scales and palettes. It also creates the events of the radio buttons and hides all the scaling types except the first one (in this case, the sequential scaling)
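
As a rough illustration of the palette format and of the maps() behavior described above, the sketch below builds a small palette object and a scale that maps a value in the domain to an index into the color set. The palette contents, the name ColorScaleSketch, and the equal-width binning rule are assumptions made for this example; the script's own colorscale() may bin values differently.

```javascript
// Hypothetical palettes that follow the documented format:
// { 'scale type': [ ['color 1', 'color 2', ...], ... ] }
var exampleColors = {
  sequential: [["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]],
  diverging: [["#d7191c", "#ffffbf", "#2c7bb6"]]
};

// Sketch of a color scale: domain is [min, max], colorSet is an array of colors.
function ColorScaleSketch(domain, colorSet) {
  this.domain = domain;
  this.colorSet = colorSet;
}

// Maps a value to an index into colorSet.
// Assumption: the domain is split into equal-width bins, one per color.
ColorScaleSketch.prototype.maps = function (value) {
  var min = this.domain[0];
  var max = this.domain[1];
  if (max === min) { return 0; }
  var ratio = (value - min) / (max - min);
  // Clamp so that out-of-range values still produce a valid index.
  ratio = Math.max(0, Math.min(1, ratio));
  var index = Math.floor(ratio * this.colorSet.length);
  return Math.min(index, this.colorSet.length - 1);
};

var scale = new ColorScaleSketch([0, 100], exampleColors.sequential[0]);
var lowColor = scale.colorSet[scale.maps(10)];   // first color of the palette
var highColor = scale.colorSet[scale.maps(100)]; // last color of the palette
```

Returning an index rather than the color itself matches the description of maps(); the caller then looks the color up in the chosen palette.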

Events Section

kmeansUpdate.submit() Event Handler

       Sends an AJAX request to the server to rerun the K-means algorithm with a new number of clusters. Upon success, it changes the attributes of the HTML element that holds the data of the graph, then calls get_centroids() and init_reduce() (both documented previously) to recalculate everything
       switch_opts_algos(), switch_opts_red(), switch_clr_scales() Functions: Are used to show and hide certain elements based on their specified targets (called through the proper event handlers in the HTML file)
       window.load() Event Handler: Handles the operations that should be executed once the page has been loaded
       btn_plot.click() Event Handler: Sends a request to the server to calculate a projection of data onto a branch. Upon success, the data will be displayed on a scatter plot based on the selected scatter plot
        Parameters:
         params: refer to the scatter script documentation
         axis data: refer to the example of how to initialize a scatter plot to know the format of the axis data

Note: changing the IDs of the divs in the HTML will require changing them accordingly in this function

visual-params-lable.click() Event Handler: Only handles showing and hiding the Visual Parameters div

Bugs and Errors

There is a bug in the server: it can only project the data for the 'dataInit01' and 'dataInit02' indicator features, not for the others

To make it easier for the maintainer to debug any bug that happens with selecting a branch later on, the console.log() calls were not removed

Vector3 and Vector2
Plain vector definitions, nothing fancy in them

Cluster

Represents a cluster sent from the server
The data is a reference to the array of the points which the cluster contains
The centroid is the value of the center of the cluster, should be an array of any length (N-dimensions)
The grade values field is an object holding the value of this cluster for each grade feature

Node
Refers to a cluster, but is the one used in the graph structure

Weighted Edge

Represents an edge between two nodes (or clusters)
Is directed in structure (with respect to the naming of source and target), but the connection can be reversed anytime
Has the weight, and holds the distance between the two nodes in their original n-dimensional space
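
The structures above can be pictured with a few plain constructors. This is only a sketch: the constructor and field names are hypothetical, and the Euclidean metric used for the distance is an assumption, since the documentation does not say how the distance in the original n-dimensional space is computed.

```javascript
// Hypothetical constructor mirroring the documented cluster fields.
function ClusterSketch(data, centroid, gradeValues) {
  this.data = data;               // reference to the array of points in the cluster
  this.centroid = centroid;       // array of N coordinates
  this.gradeValues = gradeValues; // { gradeFeatureName: value, ... }
}

// Euclidean distance between two centroids of equal length (assumed metric).
function centroidDistance(a, b) {
  var sum = 0;
  for (var i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) * (a[i] - b[i]);
  }
  return Math.sqrt(sum);
}

// Hypothetical edge: directed by naming only, so it can be reversed at any time.
function WeightedEdgeSketch(source, target, weight, distance) {
  this.source = source;
  this.target = target;
  this.weight = weight;
  this.distance = distance;
}

WeightedEdgeSketch.prototype.reverse = function () {
  var previousSource = this.source;
  this.source = this.target;
  this.target = previousSource;
};

var first = new ClusterSketch([], [0, 0, 0], { grade: 1 });
var second = new ClusterSketch([], [1, 2, 2], { grade: 3 });
var edge = new WeightedEdgeSketch(0, 1, 0.8, centroidDistance(first.centroid, second.centroid));
edge.reverse(); // now points from node 1 to node 0
```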

Graph Structure

Nodes and Edges
      Array of nodes and edges for the graph (refer to their documentation for more information)

Selection
      An array of the indices of the selected nodes in the graph, representing a selected branch

Threshold
      The value that will be used to filter out some edges; if their weight is less than the threshold they will not be considered edges

N-L Map
      The node-link map, used as a lookup table for the connected edges

Acts as a hashmap to make retrieving data faster
JavaScript objects can serve as hashmaps due to their structure and the way they operate, so no dedicated hashmap implementation is used (a built-in Map only became available with ES2015)
Is formatted as follows { node key : [ [ [the index of a child, the weight of the edge], ... ], [ [the index of a parent, the weight of the edge], ... ] ], node key : ... }

addNode() Function
Given a node, the function assigns a new index to it which is its index in the nodes array. Then it creates a new key in the nlMap which is going to be the index of that node

findChildren() Function
      Gets the list of children of a certain node from nlMap given its index. Then the children are filtered after the weights of their edges are tested against the threshold, leaving only the ones that are greater than or equal to the threshold value
findParents() Function
      The same as findChildren() but works in the other direction
findEdge() Function
      Loops through the array of edges to find an edge between the node with the index _source_index and the node with the index _target_index. The indices are interchangeable; it does not matter in which order they are passed. Should use nlMap instead of the edges array
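
The nlMap layout and the lookup functions described above can be sketched as follows. The names SketchGraph, addEdge, filterByThreshold, and findEdgeWeight are hypothetical, as is the choice to return plain node indices from findChildren() and findParents(); the sketch also shows the edge lookup going through nlMap, as suggested above, rather than through the edges array.

```javascript
// Documented nlMap layout:
// { nodeIndex: [ [ [childIndex, weight], ... ], [ [parentIndex, weight], ... ] ] }
function SketchGraph(threshold) {
  this.nodes = [];
  this.edges = [];
  this.threshold = threshold;
  this.nlMap = {};
}

// Assigns the node its index in the nodes array and creates its nlMap key.
SketchGraph.prototype.addNode = function (node) {
  node.index = this.nodes.length;
  this.nodes.push(node);
  this.nlMap[node.index] = [[], []]; // [children, parents]
  return node.index;
};

// Hypothetical helper: registers an edge in the edges array and in nlMap.
SketchGraph.prototype.addEdge = function (source, target, weight) {
  this.edges.push({ source: source, target: target, weight: weight });
  this.nlMap[source][0].push([target, weight]);
  this.nlMap[target][1].push([source, weight]);
};

// Keeps only the neighbors whose edge weight is >= the threshold.
SketchGraph.prototype.filterByThreshold = function (entries) {
  var threshold = this.threshold;
  return entries
    .filter(function (entry) { return entry[1] >= threshold; })
    .map(function (entry) { return entry[0]; });
};

SketchGraph.prototype.findChildren = function (index) {
  return this.filterByThreshold(this.nlMap[index][0]);
};

SketchGraph.prototype.findParents = function (index) {
  return this.filterByThreshold(this.nlMap[index][1]);
};

// Edge lookup through nlMap; the two indices can be passed in either order.
SketchGraph.prototype.findEdgeWeight = function (a, b) {
  var candidates = this.nlMap[a][0].concat(this.nlMap[a][1]);
  for (var i = 0; i < candidates.length; i++) {
    if (candidates[i][0] === b) { return candidates[i][1]; }
  }
  return null;
};

var g = new SketchGraph(0.5);
var a = g.addNode({ name: "A" });
var b = g.addNode({ name: "B" });
var c = g.addNode({ name: "C" });
g.addEdge(a, b, 0.9);
g.addEdge(a, c, 0.2); // below the threshold, so it is filtered out of the lookups
```

With this layout a neighbor lookup touches only the entries of one node, instead of scanning every edge in the graph.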

Visited
      An array of the previously visited nodes. It was made to avoid infinite loops that happen because of revisiting the same nodes multiple times

The visited array must be reset after each call to any of the branch functions, but not inside the functions. Finding branches is recursive; the array could be defined and reset inside the functions, but that would require some changes to them. Keeping the list outside the functions and resetting it independently is more convenient and safer

All Branch Functions (differences explained below)

Parameters:
source index: The index of the node from which the function should start searching
target index: The index of the node at which the function should stop once it is reached

Given the source and the target, the function tries to find a set of nodes that link those two nodes. The functions are intended to use breadth-first search (BFS) to find the nodes that make up the branch. They were first implemented as depth-first search (DFS), and BFS was later found to be more appropriate; as a result, the implementation is not a proper BFS but rather a DFS with one step before it.

Functions:
safeBranch()
       Puts the children and the parents in one array and branches based on that (direction does not matter)
branch()
       Gets only the list of children (goes in one direction where the target is a child or a grandchild of the source)
inverseBranch()
       Gets only the list of parents (goes in one direction where the target is a parent or a grandparent of the source)

Examples:
Find the branch between A(source) and E(target)

A -> B -> C -> D -> E : branch() finds its way as it scans the children and E is a grandchild of A and can be reached by going in one direction. inverseBranch() will fail to find a path between A and E in this case
A <- B <- C <- D <- E : branch() will fail to find a path since E cannot be reached by a series of children from A. However, inverseBranch() will work fine
A -> B -> C <- D <- E : both branch() and inverseBranch() will fail in this case since the connection between A and E does not go in one direction

safeBranch() works for all of the three cases.
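
The third example can be reproduced with a small sketch. It uses a plain recursive depth-first search with an external visited array, which is a simplification and not the script's actual traversal; the helper findBranch and the children/parents lookup tables are hypothetical stand-ins for the graph's nlMap.

```javascript
// Edges of the third example: A -> B -> C <- D <- E (indices 0 to 4).
var children = { 0: [1], 1: [2], 2: [], 3: [2], 4: [3] };
var parents = { 0: [], 1: [0], 2: [1, 3], 3: [4], 4: [] };

// Kept outside the functions and reset by the caller before each search.
var visited = [];

// Generic recursive search; neighborsOf decides which directions are allowed.
function findBranch(source, target, neighborsOf) {
  visited.push(source);
  if (source === target) { return [source]; }
  var next = neighborsOf(source);
  for (var i = 0; i < next.length; i++) {
    if (visited.indexOf(next[i]) !== -1) { continue; } // avoid revisiting nodes
    var rest = findBranch(next[i], target, neighborsOf);
    if (rest !== null) { return [source].concat(rest); }
  }
  return null; // no path in the allowed direction(s)
}

function branch(s, t) {
  return findBranch(s, t, function (i) { return children[i]; });
}

function inverseBranch(s, t) {
  return findBranch(s, t, function (i) { return parents[i]; });
}

function safeBranch(s, t) {
  return findBranch(s, t, function (i) { return children[i].concat(parents[i]); });
}

visited = [];
var viaChildren = branch(0, 4);       // null: E is not a descendant of A
visited = [];
var viaParents = inverseBranch(0, 4); // null: E is not an ancestor of A
visited = [];
var viaBoth = safeBranch(0, 4);       // [0, 1, 2, 3, 4]
```

Resetting visited between the calls is what keeps one search from being blocked by nodes that an earlier search already marked.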

Note: To make it easier for the maintainer to debug any bug that happens with selecting a branch later on, the console.log() calls were not removed

node_default_opts Object
Holds the default options that are to be used in the nodelink() to customize the attributes of the elements

sizeScale() Function : scales the size passed to it according to its ratio from minSize to maxSize attributes
color_func Function : maps a certain attribute of a node to a color based on the value of colorScale. It exists as a default setting but is not the one actually being used; it applies only if the colors of the nodes were not set manually
radius_func() Function : based on the size of the node (the number of elements its cluster contains), the function gives that node a proper radius to represent it. It actually maps the size to an area, which works better with big numbers, but the equation is rearranged to give the radius
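
The idea of mapping the size to an area and then solving for the radius can be sketched as below. The option values, the function names, and the linear interpolation between the smallest and largest areas are assumptions for illustration; the real functions live in node_default_opts and may use a different formula.

```javascript
// Hypothetical option values; the real ones are in node_default_opts.
var sketchOpts = { minSize: 5, maxSize: 30 };

// Linear interpolation between minSize and maxSize for a ratio in [0, 1].
function sizeScaleSketch(ratio) {
  return sketchOpts.minSize + ratio * (sketchOpts.maxSize - sketchOpts.minSize);
}

// Area-based radius: the circle's area grows linearly with the cluster size,
// so the radius grows with its square root (area = pi * r^2, r = sqrt(area / pi)).
function radiusSketch(size, largestSize) {
  var minArea = Math.PI * sketchOpts.minSize * sketchOpts.minSize;
  var maxArea = Math.PI * sketchOpts.maxSize * sketchOpts.maxSize;
  var area = minArea + (size / largestSize) * (maxArea - minArea);
  return Math.sqrt(area / Math.PI);
}

var smallest = radiusSketch(0, 1000);   // radius at the minimum size
var largest = radiusSketch(1000, 1000); // radius at the maximum size
var halfway = radiusSketch(500, 1000);  // larger than the linear midpoint of 17.5
```

Scaling the area instead of the radius keeps a cluster with twice the elements from looking four times as large.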

clear_selection() Function
Resets the selection array of a graph, and resets the elements in an SVG to their normal settings (resets the visual representation of a selected branch)

Nodelink Structure

Parameters:

graph: The graph which has the nodes and the edges to be visualized
clusters: An array of the clusters that correspond to the nodes of the graph
element ID: The ID of the div where the SVG will be added
SVG ID: The ID of the SVG that will be created
opts (options): A set of options that define the settings of the nodes (has to be in the same format as node_default_opts)
force: A boolean value that determines whether the diagram will rely on the default force-directed placement of the nodes (true) or the positions will be the same as the ones given by a dimensionality reduction algorithm

Settings:

change_mode(): switches between the 'select' mode and the 'drag' mode
change_scale(): switches between radius scaling modes. If 'true', it applies the radius_func() of the given options
set_threshold(): filters the displayed edges based on the weight of that edge in graph
set_colors(): maps the value of a grade feature of each cluster into a color for the node that represents it

Nodes and Labels

Note that the labels are drawn above the nodes, so that every label is above every node. This makes it easier for the user to determine whether a node is covering a smaller one behind it. If that is not needed, the nodes and labels should be combined into one group ('g' element) to make manipulating them easier

node_click() Event Handler

Handles selecting the nodes when the SVG has the attribute mode set to 'select'. Does not just do the selection visually, it calls the given graph to find a selection, then based on the result from the call of the function it selects the points.
It uses safeBranch() to find the branch between two points. Previously it used branch() and, when that failed, checked inverseBranch() to find another way. Now that safeBranch() is used, that fallback should not be included unless the graph is meant to be directed.

`node_drag()` Behavior & Its Functions

Refer to D3 Drag Behavior for detailed documentation of how it works

force_start(): Event Handler

The handler is only called once, that is when D3 starts to draw the elements. At that stage D3 assigns values to the positions of the nodes (d.{x, y}) and their links (d.{source, target}.{x, y}). In the case of force-directed graph (force is set to be true), the handler does not forcefully set the value for D3, it leaves the value D3 assigned the way it is.
When a force-directed graph is not wanted, the handler calls force.stop(), since there is no point in recalculating the positions again. With the force-directed graph, however, it lets D3 continue its calculations.

force_tick(): Event Handler

A normal classic tick handler, it just repositions the nodes, edges, and labels based on their location if it was changed from the previous one.

force_end(): Event Handler

Pretty much the same as the force_tick() handler, but is to be called when either D3 stops changing the position of the points, or when the force.stop() function is called.

Zoom Behavior

Refer to SVG Geometric Zooming Example

Bugs and Errors

After zooming, the nodes are both scaled and shifted based on the focal points. However, the tooltip does not go with that, therefore after scaling the position of the tooltip will not be the same as the hovered node.
Failed attempt: using d3.mouse(this) inside the event handler for hovered event did not work.
Suggested action: remove the focal point and always zoom into one certain point; in that case it is fairly easy to locate the new position of the node.
Sometimes after doing some work with the large scatter plots, the node-link SVG disappears either partially or entirely. The causes of the bug are unknown, and attempts to reproduce it intentionally, to find a scenario where it occurs, failed
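
For the tooltip problem, the arithmetic of a geometric zoom may help when locating a node after zooming. The sketch below assumes the zoom is applied to the container as a "translate(tx, ty) scale(k)" transform, as in the geometric zooming example; the function name and the sample numbers are hypothetical, and this has not been tested against the script.

```javascript
// Converts a node position in untransformed SVG coordinates to the position
// where it appears after "translate(tx, ty) scale(k)" has been applied.
function zoomedPosition(node, translate, scale) {
  return {
    x: node.x * scale + translate[0],
    y: node.y * scale + translate[1]
  };
}

// A node at (100, 40), zoomed by a factor of 2 and shifted by (20, -10).
var tip = zoomedPosition({ x: 100, y: 40 }, [20, -10], 2);
// tip is { x: 220, y: 70 }
```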

Suggestions

Instead of going through the hassle of trying to locate the hovered node and position the tooltip accordingly, it might be a good idea to have a small box in one of the corners, it serves as a tooltip but it doesn't move based on the node's position. The box might be draggable as well to make it more convenient to the user

A set of functions used to test the dimensionality reduction algorithms. They are general utility functions and can be used for any other purpose. The function each() is widely used across the scripts.

generate_normal_point() Function:

Generates a point that will most likely form a normal distribution with the other points generated by that function. The more times the function is called, the closer the points come to a normal distribution

generta_conflicting() Function:

Generates consecutive points that conflict with each other in some dimensions. The function does not generate a set of points; it generates one point with each call, but its static parameters ensure that some points will have the same value in some dimensions

loop() Function:

Wraps the for loop. It is more or less obsolete now, but it was useful for some previous tests. It still exists in the script for future use

each() Function:

A widely used function in almost every script file. It wraps the for loop but loops through all the elements (loop() required setting the lower and upper bounds of the loop). It makes the script more dynamic, shorter, and easily modifiable, but costs performance, since it introduces a function call inside the body of the loop instead of executing the instructions directly.
It is better to replace the each() calls with normal 'for' loops for better performance.
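
The trade-off can be seen in a minimal sketch. The helper below is a stand-in with an assumed callback signature of (element, index); the script's own each() may pass different arguments.

```javascript
// Sketch of an each()-style helper: calls fn(element, index) for every element.
function eachSketch(array, fn) {
  for (var i = 0; i < array.length; i++) {
    fn(array[i], i);
  }
}

var values = [3, 5, 8];

// Version using the helper: one extra function call per element.
var sumWithEach = 0;
eachSketch(values, function (v) { sumWithEach += v; });

// Equivalent plain loop: the body runs directly, without the extra call.
var sumWithFor = 0;
for (var j = 0; j < values.length; j++) {
  sumWithFor += values[j];
}
```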

generate_points() Function:

Given a function that generates one point and the number of points that should be generated, the function then generates a set of points (each point is generated by a call to the function func() passed as an argument)
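
A minimal sketch of this pattern follows. The argument order (generator first, count second) and the sample generator are assumptions; in particular, averaging uniform samples is only one common way to approximate a normal distribution and is not necessarily what generate_normal_point() does.

```javascript
// Calls func() count times and collects the generated points.
function generatePointsSketch(func, count) {
  var points = [];
  for (var i = 0; i < count; i++) {
    points.push(func());
  }
  return points;
}

// Hypothetical single-point generator: averaging several uniform samples
// per coordinate gives values clustered around 0.5.
function approxNormalPoint() {
  function coordinate() {
    var sum = 0;
    for (var k = 0; k < 6; k++) { sum += Math.random(); }
    return sum / 6;
  }
  return [coordinate(), coordinate()];
}

var cloud = generatePointsSketch(approxNormalPoint, 200);
```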

The rest of the functions are just for displaying and outputting the values

regression_params() Function:

Calculates the slope (m) and the y-intercept (b) of the regression line equation for a set of points

regression_points() Function:

Gets the parameters of the line by calling regression_params(), then it calculates the starting and ending points of the regression line.
The first point lies at x = 0, where substituting into the line equation gives y = b. The other point is at x = _xLim, where y = (m * _xLim) + b
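
The two regression functions can be sketched as follows, assuming an ordinary least-squares fit and points stored as [x, y] arrays; the documentation does not state which fitting method or point format the script uses, and the function names here are hypothetical.

```javascript
// Ordinary least-squares slope (m) and y-intercept (b) for [x, y] points.
function regressionParamsSketch(points) {
  var n = points.length;
  var sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
  points.forEach(function (p) {
    sumX += p[0];
    sumY += p[1];
    sumXY += p[0] * p[1];
    sumXX += p[0] * p[0];
  });
  var m = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
  var b = (sumY - m * sumX) / n;
  return { m: m, b: b };
}

// Start and end points of the regression line between x = 0 and x = xLim.
function regressionPointsSketch(points, xLim) {
  var params = regressionParamsSketch(points);
  return [[0, params.b], [xLim, params.m * xLim + params.b]];
}

// Three points that lie exactly on y = 2x + 1.
var line = regressionPointsSketch([[0, 1], [1, 3], [2, 5]], 10);
// line is [[0, 1], [10, 21]]
```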

merge_points() Function:

As each point lies on a certain edge between two nodes, each point has its location relative to those two nodes. In order to deal with the points together, each one must be given a location relative to the starting and ending nodes only. For more information about the parameter t that is used to calculate the new location, refer to the documentation of "Dataview data" for the branch_axis() Function
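
One way to picture this conversion is sketched below. It assumes that t is a value in [0, 1] measured along the point's own edge and that edges contribute in proportion to their lengths; both are assumptions, as are the function name and the input format, so the branch_axis() documentation remains the reference for what t actually means.

```javascript
// edgeLengths[i] is the length of the i-th edge along the selected branch.
// Each point is { edge: index of its edge, t: position along that edge in [0, 1] }.
// Returns each point's position in [0, 1] along the whole branch.
function mergePointsSketch(points, edgeLengths) {
  var offsets = [];
  var total = 0;
  for (var i = 0; i < edgeLengths.length; i++) {
    offsets.push(total); // distance from the starting node to the start of edge i
    total += edgeLengths[i];
  }
  return points.map(function (p) {
    return (offsets[p.edge] + p.t * edgeLengths[p.edge]) / total;
  });
}

// Two edges of lengths 2 and 4: halfway along the first edge is 1/6 of the
// branch, and a quarter of the way along the second edge is 1/2 of the branch.
var merged = mergePointsSketch(
  [{ edge: 0, t: 0.5 }, { edge: 1, t: 0.25 }],
  [2, 4]
);
```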

scatter() Function:

Plain D3 procedures, documentation embedded within the code

QCRI - Visualizing Progression Tree to Root out Cancer

Frontend Documentation

Dataview Script

Graph Script

Node-link script

Data Reduction script

Scatter-Plot script

Dataview script

Coloring Section

`colorscale()` Constructor

Events Section

Bugs and Errors

Graph Script

Graph Structure

All Branch Functions (differences explained below)

Node-link Script

Nodelink Structure

`node_drag()` Behavior & Its Functions

Zoom Behavior

Bugs and Errors

Suggestions

Data Reduction Script

Scatter-plot Script
