File Structure

Server file structure

Routes

Server Routing and URLs

Data Representation

Data representation and structure

File Parser

Parses the uploaded files

K-means Algorithm

K-means API and usage

Competitive Hebbian Learning

Tree/Graph generation

Data Projection

Data Projection API and usage

Exporter

Incomplete

Export the computed data to re-use

Utilities

Functions used all over the project

This is the file structure of the main directory:

QCRI/
├── bin/
│   ├── www
│   ├── RUNS THE SERVER
├── docs/
│   ├── CONTAINS THE DOCS YOU ARE VIEWING RIGHT NOW
├── public/
│   ├── ALL FILES PUBLICLY SERVED ARE STORED HERE
│   ├── javascript/
│   ├── stylesheets/
├── routes/
│   ├── CONTAINS SERVER ROUTES AND ALGORITHMS
│   ├── index.js
│   ├── kmeans.js
│   ├── tree.js
│   ├── projection.js
│   ├── file-parser.js
│   ├── file-writer.js
│   ├── redis-util.js
│   ├── exporter.js
│   ├── matrix_utils.js
├── views/
│   ├── CONTAINS HTML PAGES THAT ARE SERVED TO THE USER
│   ├── index.ejs
│   ├── dataview.ejs
│   ├── error.ejs
├── redis/
│   ├── CONTAINS SCRIPTS THAT TAKE CARE OF REDIS
│   ├── install_redis.sh
│   ├── start_redis.sh
├── uploads/
│   ├── CONTAINS UPLOADED FILES BY THE USER
├── app.js ─ SERVER CONFIGURATION FILE
├── package.json ─ DEPENDENCIES FILE
└── run_server.js ─ SCRIPT TO RUN THE SERVER

Most of the URLs, which are called routes, are found inside routes/index.js

/
      Renders the homepage
/uploads
      Computes the results and renders the visualization box
/kUpdate
      Runs the K-means update and returns the result using AJAX
/download (Incomplete)
      Exports all the files and returns a zip file (zips all the files together)
/projected
      Projects the data and returns it using AJAX
/getData
      Returns the data associated with the user in the browser (data stored inside redis)
/remUser
      Removes the user from redis with all of the associated data

Point

A point is represented as an array of integers/floats.
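
As a small illustration of this representation, the sketch below builds two points and measures the distance between them. The helper euclideanDistance is hypothetical and not part of the project; the Euclidean metric is an assumption made only for this example.

```javascript
// A point is a plain array of numbers; its length is the dimension of the data.
const p = [1, 2.5, 3];
const q = [4, 6.5, 3];

// Hypothetical helper (not from the project): Euclidean distance between two points.
function euclideanDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error('Points must have the same dimension');
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

console.log(euclideanDistance(p, q)); // 5
```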

The file parser is a class called FileParser. It is used to parse the files the user uploads. The source code can be found inside routes/file-parser.js.

Class Methods


The methods below are all public methods.

FileParser.parseData(filePath)
      Parses the data file, *.dat files, and stores the result inside this.dat variable.
      Requires: the file path of the data file
      Returns: this.dat, contains the parsed data

FileParser.getSbp(filePath)
      Filters the data using the subspace file, *.sbp files, and stores the result inside this.sbp variable.
      Requires: the file path of the subspace file
      Returns: this.sbp, contains the filtered subspace

FileParser.getFts(filePath)
      Parses the features file, *.fts files, and stores the result inside this.fts variable.
      Requires: the file path of the features file
      Returns: this.fts, contains the features

FileParser.getTis(filePath)
      Parses the tissue file, *.tis files, and stores the result inside this.tis variable.
      Requires: the file path of the tissue file
      Returns: this.tis, contains the tissues

FileParser.getGrd(filePath)
      Parses the grade file, *.grd files, and stores the result inside this.grd variable.
      Requires: the file path of the grade file
      Returns: this.grd, contains the grades

FileParser.getInd(filePath)
      Parses the indicator features file, *.ind files, and stores the result inside this.ind variable.
      Requires: the file path of the indicator features file
      Returns: this.ind, contains the indicator features

FileParser.parseSbp()
      Parses the subspace depending on the features file and stores the result inside this.dat variable.
      Requires: none
      Returns: this.dat, which contains the correctly parsed data

This method overwrites the this.dat variable.

There are more unsupported methods that still need to be implemented.

Class Properties


	// data to be parsed from uploaded raw data
	this.dat = []; // data file
	this.sbp = []; // subspace file
	this.tis = []; // tissue file
	this.fts = []; // feature file
	this.grd = []; // grade file
	this.ind = []; // indicator file
	this.dimension = ''; // dimension of the uploaded data

	/*********
	 * Variables below are unsupported
	 ********/

	// data to be parsed from pre-computed data
	// unzipped folder
	this.unZipped = '';
	// Kmeans data
	this.centers = '';
	this.dist2Clusters = '';
	this.clusters = '';
	// Tree data
	this.tree = '';
	// Projection data
	this.projection = '';

Usage


Use the above methods in the following recommended sequence:


	var parser = new FileParser();

	// parse the data
	parser.parseData(filePathDat);

	// load the subspace, features, tissue, grade and indicator files
	var sbp = parser.getSbp(filePathSbp);
	var fts = parser.getFts(filePathFts);
	var tis = parser.getTis(filePathTis);
	var grd = parser.getGrd(filePathGrd);
	var ind = parser.getInd(filePathInd);
	var dim = parser.dimension;

	// parse the subspace (this overwrites parser.dat)
	var dat = parser.parseSbp();

	// it's safe to use the data now

The K-means algorithm is a class called kmeans. The source code can be found inside routes/kmeans.js.

Class Methods


The methods below are all public methods.

kmeans(data, numK, dim, tisNames, centers)
      Runs the kmeans algorithm on the data
      Requires:
         data: array of points
         numK: number of prototypes
         dim: dimension of data
         tisNames: array containing the tissue names
         centers (optional): previously computed centers to start from, instead of randomly generated prototypes (used when updating the K-means result)
      Returns: An object of the following format


output: { "centers": [c1, c2,...,ck], // coordinates of the centers of the prototypes

          "clusters": [ [pi, pi+1,...],...,[pi+4, pi+n,...] ], // 2D array containing the points closest to each cluster (which points belong to which cluster)

          "dist2Clusters": [ [{cIndex: 1, dis: distance},...,{cIndex: k, dis: distance}],...], // 2D array containing the distances between each point and all clusters

          "tisNames": [[tisX1, tisX2,...],...,[tisX45, tisX49]] // 2D array containing the tissue names of the points in each cluster
        }
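
To make the output format more concrete, here is a minimal sketch of how a caller might read it. The sample object is hand-written with made-up values, and the helpers nearestClusterPerPoint and clusterSizes are hypothetical names that do not exist in the project. The cIndex values 1 and 2 simply mirror the notation above; whether the real indices start at 0 or 1 is not stated here.

```javascript
// Hand-written sample in the documented output shape (values are made up for illustration).
const output = {
  centers: [[0, 0], [10, 10]],
  clusters: [[[1, 1], [0, 2]], [[9, 11]]],
  dist2Clusters: [
    [{ cIndex: 1, dis: 1.41 }, { cIndex: 2, dis: 12.73 }],
    [{ cIndex: 1, dis: 2.0 }, { cIndex: 2, dis: 12.81 }],
    [{ cIndex: 1, dis: 14.21 }, { cIndex: 2, dis: 1.41 }]
  ],
  tisNames: [['tisA', 'tisB'], ['tisC']]
};

// Hypothetical helper: for every point, return the cIndex of the closest cluster.
function nearestClusterPerPoint(dist2Clusters) {
  return dist2Clusters.map(function (row) {
    const closest = row.reduce(function (best, entry) {
      return entry.dis < best.dis ? entry : best;
    });
    return closest.cIndex;
  });
}

// Hypothetical helper: number of points assigned to each cluster.
function clusterSizes(clusters) {
  return clusters.map(function (points) {
    return points.length;
  });
}

console.log(nearestClusterPerPoint(output.dist2Clusters)); // [ 1, 1, 2 ]
console.log(clusterSizes(output.clusters)); // [ 2, 1 ]
```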

Competitive Hebbian Learning (CHL) is used to generate a tree/graph from the data produced by the K-means algorithm.
The CHL algorithm is implemented as a function called genTree. The source code can be found inside routes/tree.js.

Main Function


genTree(distances, numK)
      Generates an undirected graph, which in some cases is a tree
      Requires:
         distances: 2D array containing the distances between each point and all clusters generated by the K-means algorithm
         numK: number of prototypes
         dim: dimension of data
      Returns: An array of edges of the following format


output: [ {w: weight, n: [src, dest]},...]
		w --> weight of the edge
		src --> source node (int representing the number of the node)
		dest --> destination node (int representing the number of the node)
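
For readers unfamiliar with CHL, the basic idea is that each data point links its two closest prototypes. The sketch below shows that idea only and is not the project's genTree. The function name chlEdges is hypothetical, the input is assumed to follow the dist2Clusters shape produced by kmeans, and using the number of supporting points as the edge weight is an assumption, since the meaning of w is not specified above.

```javascript
// Sketch of Competitive Hebbian Learning (NOT the project's genTree implementation).
// distances: one row per point, each row an array of { cIndex, dis } entries.
function chlEdges(distances) {
  const counts = new Map(); // "a-b" -> number of points supporting that edge

  for (const row of distances) {
    if (row.length < 2) continue; // an edge needs at least two prototypes

    // Find the closest and second-closest prototypes for this point.
    const sorted = row.slice().sort(function (x, y) {
      return x.dis - y.dis;
    });
    const a = Math.min(sorted[0].cIndex, sorted[1].cIndex);
    const b = Math.max(sorted[0].cIndex, sorted[1].cIndex);

    // The graph is undirected, so the pair is stored with the smaller index first.
    const key = a + '-' + b;
    counts.set(key, (counts.get(key) || 0) + 1);
  }

  // Assumption: weight = number of points that voted for the edge.
  return Array.from(counts, function ([key, w]) {
    return { w: w, n: key.split('-').map(Number) };
  });
}

// Made-up distances for three points and three prototypes.
const sampleDistances = [
  [{ cIndex: 1, dis: 0.5 }, { cIndex: 2, dis: 1.0 }, { cIndex: 3, dis: 5.0 }],
  [{ cIndex: 1, dis: 1.2 }, { cIndex: 2, dis: 0.3 }, { cIndex: 3, dis: 4.0 }],
  [{ cIndex: 1, dis: 6.0 }, { cIndex: 2, dis: 1.5 }, { cIndex: 3, dis: 0.4 }]
];

console.log(chlEdges(sampleDistances));
// [ { w: 2, n: [ 1, 2 ] }, { w: 1, n: [ 2, 3 ] } ]
```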

The exporter is a class called Exporter. It is used to export the data computed and stored inside redis. The source code can be found inside routes/exporter.js.

This class is incomplete due to a race condition when zipping the files.
This class depends on CSVwriter, found inside routes/file-writer.js, to output the files in CSV format. The race condition occurs between writing the CSV files and zipping them together: the Exporter zips empty files because CSVwriter has not yet finished writing them when the zipping starts.
Please view the source code to better understand how it operates. I made several attempts, but could not overcome the race condition. I would be happy/curious to know if anyone could find a solution.
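
One common way to avoid this kind of race in Node.js is to wrap each file write in a Promise that resolves on the write stream's 'finish' event, and to start zipping only after Promise.all has resolved. The sketch below is not taken from the project: writeCsv is a hypothetical stand-in for CSVwriter, exportAll and the zipFiles callback are made-up names, and the actual zip step is left as a placeholder. It may or may not fit how the Exporter is structured.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical stand-in for CSVwriter: writes rows as CSV lines and
// resolves only once all data has been flushed to the file.
function writeCsv(filePath, rows) {
  return new Promise(function (resolve, reject) {
    const stream = fs.createWriteStream(filePath);
    stream.on('error', reject);
    stream.on('finish', function () {
      resolve(filePath);
    });
    rows.forEach(function (row) {
      stream.write(row.join(',') + '\n');
    });
    stream.end();
  });
}

// Starts all writes, then calls zipFiles only after every file is complete.
// tables: object mapping a file name to its rows (assumed shape).
function exportAll(dir, tables, zipFiles) {
  const writes = Object.keys(tables).map(function (name) {
    return writeCsv(path.join(dir, name + '.csv'), tables[name]);
  });
  return Promise.all(writes).then(zipFiles);
}

// Usage: the callback is a placeholder for the real zip step.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'export-'));
exportAll(dir, { centers: [[0, 0], [10, 10]] }, function (files) {
  files.forEach(function (file) {
    console.log(file, fs.statSync(file).size, 'bytes');
  });
});
```

The key design choice is that the zip step receives the file paths only from the resolved Promise.all, so it cannot run while any CSV is still being written.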

Main Function


Exporter(user, writer)
      Exports all the computed data and stores them inside tmp/user/
      Requires:
         user: current username
         writer: writer to be used to export the data with (the only writer available is CSVwriter)
      Returns: nothing

Please view the source code to get a better idea of how it works

There are two utility files, which contain simple functions that are used all over the project.
The first file is routes/matrix_utils.js, which has simple matrix utility functions that are used when projecting the data.
The second file is routes/redis-util.js, which has redis utility functions; they could be used for reference.