CS4, CT3, TS5
References:
1. S-Plus 2000 Guide to Statistics, Volumes I and II. MathSoft Corporation.
2. Cramer, D. (2003) Advanced Quantitative Data Analysis. Open University Press.
3. Evans, J.R. and Olson, D.L. (2007) Statistics, Data Analysis, and Decision Modeling. Prentice Hall.
4. Miller, D.C. and Salkind, J. (1983) Handbook of Research Design and Social Measurements. Sage Publications.
5. Derr, J. (2000) Statistical Consulting: A Guide to Effective Communication. Pacific Grove: Duxbury.
6. Jarman, K.H. (2013) The Art of Data Analysis: How to Answer Almost Any Question Using Basic Statistics. John Wiley & Sons.

Description: Introduction to statistical methods and tools for the analysis of very large data sets and the discovery of interesting and unexpected relationships in the data. Data preprocessing and exploration: data quality and data cleaning. Data exploration: summarizing and visualizing data; principal component analysis; multidimensional scaling. Data analysis and uncertainty: handling uncertainty; statistical inference; sampling. Statistical approach to data mining and data mining algorithms: regression; validation; classification and clustering: k-means, CART, decision trees; artificial neural networks; boosting; support vector machines; association rule mining. Modelling: descriptive and predictive modelling. Data organization.
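The description above names k-means among the clustering algorithms. As an illustration only, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python, assuming points are equal-length numeric tuples and that initial centroids are drawn at random from the data. The function and parameter names (`kmeans`, `nearest`, `max_iter`, `seed`) are hypothetical and are not part of the course material.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length numeric tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # hypothetical choice: k random data points as initial centroids
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # an empty cluster keeps its old centroid
        if new_centroids == centroids:  # no centroid moved, so the iteration has converged
            break
        centroids = new_centroids
    # Final assignment against the centroids being returned.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centres, labels = kmeans(data, k=2)
    print(centres, labels)
```

Lloyd's algorithm only finds a local optimum that depends on the starting centroids, which is why the sketch fixes a `seed`; in practice several random starts are usually compared.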
Assessment:
Continuous Assessment: 40%
Medium of Instruction:
English
Humanity Skill:
CS3, CT3, LL2
References:
1. Adriaans, P. and Zantinge, D. (1996). Data Mining. Addison-Wesley.
2. Hand, D., Mannila, H. and Smyth, P. (2001). Principles of Data Mining. MIT Press.
3. Cios, K.J. et al. (2010). Data Mining: A Knowledge Discovery Approach. New York: Springer-Verlag.

Techniques of statistical sampling with applications in the analysis of sample survey data. Topics include simple random sampling, stratified sampling, systematic sampling, cluster sampling, two-stage sampling, and ratio and regression estimation.
Assessment:
Continuous Assessment: 40%
Final Examination: 60%
Medium of Instruction:
English
Humanity Skill:
CT4, LL2
References:
1. Scheaffer, R. L. (2006), Elementary Survey Sampling, 6th ed., Duxbury.
2. Thompson, S. K. (2002), Sampling, 2nd ed., Wiley.
3. Lohr, S. L. (2010), Sampling: Design and Analysis, 2nd ed., Cengage Learning.
4. Cochran, W. (1977), Sampling Techniques, 3rd ed., Wiley.

SIT3009 STATISTICAL PROCESS CONTROL
Methods and philosophy of statistical process control. Control charts for variables and attributes. CUSUM and EWMA charts. Process capability analysis. Multivariate control charts. Acceptance sampling by attributes and variables.
Assessment:
Continuous Assessment: 40%
Final Examination: 60%
Medium of Instruction:
English
Humanity Skill:
CS3, CS3, TS2, LL2
References:
1. D. C. Montgomery, Introduction to Statistical Quality Control, 6th ed., Wiley, 2009.
2. R. S. Kenett and S. Zacks, Modern Industrial Statistics: Design and Control of Quality and Reliability, Duxbury Press, 1998.
3. A. J. Duncan, Quality Control and Industrial Statistics, 5th ed., Irwin, 1986.

Statistical modelling of DNA/protein sequences: Assessing statistical significance in BLAST using the Gumbel distribution; DNA substitution models; Poisson and negative binomial models for gene counts; hidden Markov models.
Algorithms for sequence analysis and tree construction: Dynamic programming for sequence alignment and Viterbi decoding; neighbour-joining, UPGMA, parsimony and maximum likelihood tree-building.
Analysis of high-dimensional microarray/RNA-Seq gene expression data: Statistical tests for detecting differential expression, feature selection, visualization, and phenotype classification.
Assessment:
Continuous Assessment: 40%
Final Examination: 60%
Medium of Instruction:
Humanity Skill:
CS3, CT3, LL2
References:
1. Jones, N.C. & Pevzner, P.A. (2004). An Introduction to Bioinformatics Algorithms. Massachusetts: MIT Press.
2. Durbin, R., Eddy, S., Krogh, A. & Mitchison, G. (1998). Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge: Cambridge University Press.
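The sequence-analysis module lists dynamic programming for sequence alignment. As an illustration only, here is a minimal sketch of one standard instance of that idea, Needleman-Wunsch global alignment with a linear gap penalty, in plain Python. The scoring values (`match=1`, `mismatch=-1`, `gap=-2`) and the function name `needleman_wunsch` are hypothetical choices, not taken from the course or its references.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    n, m = len(a), len(b)

    def sub(x, y):
        # Substitution score for aligning character x with character y.
        return match if x == y else mismatch

    # F[i][j] holds the best score for aligning the prefixes a[:i] and b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        F[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # match or mismatch
                F[i - 1][j] + gap,  # gap in b
                F[i][j - 1] + gap,  # gap in a
            )

    # Traceback from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("AC", "A")` returns `(-1, 'AC', 'A-')`. Viterbi decoding for hidden Markov models, also listed in the module, fills a table with the same kind of recurrence, taking a maximum over predecessor states instead of over alignment moves.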