Purpose and Goals: The purpose of this one-credit Math 410 course is to introduce students to big data analysis, data science, and possible undergraduate research projects in these topics at William and Mary. The course will consist mainly of weekly talks by faculty, followed by class discussions and/or exercises related to the presented topics. The typical student in this course will be a sophomore or junior with an interest in pursuing a research project related to computational mathematics. For many, this course can serve as a gateway to establishing a research project through the EXTREEMS-QED program.
Date | Title | Speaker | Abstract/Reading material/video |
Week 2 (1/22) | What is big data? | Junping Shi (mathematics) | Article: Data Driven: The New Big Science; video: explaining big data (8 min) |
Week 3 (1/29) | Cancelled due to snow | | |
Week 4 (2/5) | Data from a Stone: Aid Transparency, PDF Ghettos, and Data Mining | Albert Decatur (AidData) | AidData is one organization among many working to increase aid transparency through open data. To find out more about an aid project, we look at the documents that have already been produced about it. So far we have worked with human coders, whom we trust. But aid documents are probably being written far faster than our coders can read them, and we have yet to get through an enormous backlog of documents. We'd like to partner with skilled and creative mathematicians and computer scientists to mine our documents for data. Are you ready to contribute? |
Week 5 (2/12) | Using a Computer Algebra System in Data Analysis | Larry Leemis (mathematics) | |
Week 6 (2/19) | Adaptive Social Networks | Leah Shaw (applied science) | |
Week 7 (2/26) | Decomposition of quantum gates | Chi-Kwong Li (mathematics) | In quantum computing, quantum operations are applied to quantum states to process information. Mathematically, quantum states are represented by complex vectors, and quantum operations are represented by unitary transformations. Because of hardware constraints and the very high-dimensional spaces involved, it is important to derive efficient schemes for implementing unitary transformations. In this talk, I will describe some recent work with undergraduate and graduate students, as well as future research directions in this line of study. |
Week 8 (3/5) | Spring Break (no class) | | |
Week 9 (3/12) | Saving Infants in a Heartbeat! | John Delos (physics) | |
Week 10 (3/19) | "Big Data" from RNA-Seq Experiments | Margaret Saha (biology) | The introduction of “next-generation” sequencing technology has allowed biologists to obtain unprecedented amounts of sequence data in short periods of time. In particular, this technology has revolutionized our ability to analyze gene expression on a global level through RNA-Seq, a method that converts the RNA in a given sample into cDNA, which is then sequenced. RNA-Seq is quickly becoming an essential tool in every field of biology, from biomedicine and drug development to evolution and ecology. A typical RNA-Seq experiment can produce 30-100 million bases in less than three hours. However, the sheer amount of data and the unanticipated complexity of the transcriptome (the complete collection of transcribed RNA) have made data analysis extremely challenging and have necessitated the development of novel statistical and computational tools. In this lecture we will review the nature of RNA-Seq data and discuss the major challenges (and opportunities!) presented by these data sets. |
Week 11 (3/26) | Visual and Virtual Data: Using Simulation to Manage Your Expectations | Greg Smith (applied science) | |
Week 12 (4/2) | Data Mining at NASA Langley Research Center | Nipa Phojanamongkolkij (NASA), Ersin Ancel (NASA) | The NASA Aviation Safety program needs data mining, especially text mining, of the narrative sections of accident/incident reports from the NTSB (National Transportation Safety Board). There are so many incident reports that it would take individuals too long to read every narrative and find the key phrases that signal incident precursors. |
Week 13 (4/9) | Exploratory Methods for the Integrated Analysis of Multi-Source Data | Eric Lock (Duke University) | Research in molecular biology and other fields often requires the analysis of datasets in which multiple related sources of data are available for a common sample set. We describe two exploratory methods for the integrated analysis of such datasets: Joint and Individual Variation Explained (JIVE) and Bayesian Consensus Clustering (BCC). JIVE gives a general decomposition of variation consisting of three terms: a low-rank approximation capturing joint variation across data sources, low-rank approximations capturing structured variation individual to each data source, and residual noise. JIVE quantifies the amount of joint variation between data sources, reduces the dimensionality of the data in an insightful way, and allows for the visual exploration of joint and individual structure. BCC is a tool to cluster a set of objects based on multi-source data. The Bayesian model permits a separate clustering of the objects for each data source, and these source-specific clusterings adhere loosely to an overall clustering. We illustrate the above methods with applications to publicly available data from The Cancer Genome Atlas. This is joint work with collaborators at The University of North Carolina and Duke University. |
Week 14 (4/16) | Computing phylogenetic trees (quickly) | Anke van Zuylen (mathematics) and Jamie Bieron | I will talk about a research project that I have been working on with two students. One of them, Jamie Bieron, will explain his contribution: a new algorithm for the problem we are considering. I plan to deliver about 40 minutes of the lecture, and he will take about 10 minutes. |
Week 15 (4/23) | Using Big Data for Marine Species to Achieve Conservation Goals | Rom Lipcius (marine science) | |
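The display below is a rough illustration of the three-term decomposition described in Eric Lock's JIVE abstract. It is a sketch in our own notation and does not come from the talk. We assume k data sources X_1, ..., X_k, each a d_i × n matrix measured on the same n samples. The joint parts J_i, stacked together, have rank r. Each individual part A_i has rank r_i, and E_i is residual noise. The final ratio is one natural way to summarize how much of source i's variation is joint. It is shown for intuition only, not as the method's definitive output.

```latex
% Assumed notation: X_i is d_i x n; all sources share the same n columns (samples).
\[
X_i = J_i + A_i + E_i, \qquad i = 1, \dots, k,
\]
\[
\operatorname{rank}\begin{pmatrix} J_1 \\ \vdots \\ J_k \end{pmatrix} = r,
\qquad
\operatorname{rank}(A_i) = r_i .
\]
% Illustrative share of source i's variation attributed to joint structure
% (squared Frobenius norms):
\[
\frac{\lVert J_i \rVert_F^2}{\lVert X_i \rVert_F^2}
\]
```

Under this sketch, the joint term is what the sources have in common across samples. The individual terms capture structure specific to one source.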
Colloquium talks related to big data analysis and suitable for undergraduate students (normally Fridays, 2-2:50pm, Jones Hall 301):
2. April 4, 2-3pm, Jones Hall 301, Mathematics Colloquium: Ana Moura, University of Aveiro, Portugal. Title: A mixed integer programming model to solve the short sea shipping distribution problem