Coarse geometry and cohomology of large data sets
Description
The digital economy is founded on data. The current trend towards intelligent, customer-led, interactive, real-time systems requires the ability to handle and interpret vast amounts of data efficiently, quickly, and with a degree of accuracy matched to the requirements. Two important recent developments underline this point. First, the Smarter Planet initiative, supported by IBM, envisages 'instrumented, interconnected data systems' in which the main elements of the physical environment are equipped with sensors that constantly exchange information. Second, the new transparency drive of the UK government will make huge data sets available to the public, creating 'an opportunity to build innovative applications which will bring significant economic benefit'. Synthetic geometric methods are needed in data analysis because the sets involved are both large and high-dimensional. This proposal will extend important recent theoretical results to create a set of geometric and topological tools for data analysis, with special emphasis on flexibility, efficiency, and close alignment with potential practical applications. This is an ideal time to launch a project of this nature, and its results are very likely to have direct and important consequences for the initiatives mentioned above and for many other applications.
A central theme of the proposal is the study of geometric properties of large data sets at various scales, which correspond to varying degrees of 'sharpness' with which a data set is viewed. For example, searching a large collection of digital photographs for those that contain people requires a different resolution from trying to identify a specific person. This proposal offers a very exciting opportunity to develop pure mathematical methods to the point where they can be applied directly to important, difficult, and timely practical problems.
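To make the idea of viewing a data set at different scales concrete, here is a minimal, illustrative sketch (not the project's method). It assumes the data are points in Euclidean space and counts the connected components of the graph that joins two points whenever their distance is at most a scale parameter `r`. This count is the zeroth Betti number of the Vietoris–Rips complex at scale `r`, one of the simplest scale-dependent topological invariants. The function name `components_at_scale` and the toy point set are hypothetical.

```python
import math
from itertools import combinations


def components_at_scale(points, r):
    """Count connected components of the graph joining points at distance <= r.

    Uses a simple union-find over all pairs (O(n^2) distance checks),
    which is fine for illustration but not for genuinely large data sets.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= r:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_i] = root_j  # merge the two components

    return len({find(i) for i in range(len(points))})


# Hypothetical toy data: two tight clusters of three points, far apart.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
          (10.0, 10.0), (10.5, 10.0), (10.0, 10.5)]

for r in (0.1, 1.0, 20.0):
    print(f"scale r={r}: {components_at_scale(points, r)} component(s)")
```

At the finest scale every point is its own component (6), at an intermediate scale the two clusters emerge (2), and at a coarse scale the whole set looks like a single blob (1). The same data thus has different "shapes" depending on the sharpness with which it is viewed.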
The proposed work is adventurous and interdisciplinary, bringing together pure and applied mathematicians and experts in OR, computer science, statistics, and energy systems. Its potential for long-term practical applications will be tested in two specific application areas within the wider Smarter Planet initiative. A main objective of the project is to develop geometric and cohomological tools of scale-dependent coarse geometry, with special emphasis on applications to finite metric spaces and, more specifically, to data sets. We will place strong emphasis on methods that can be developed into efficient tools for data analysis, and the research will be informed by specific problems arising from applications ranging from the theoretical to the more practical. We will test the theoretical ideas and results in two important cases. First, we will consider data sets arising from the UK Government's Open Data initiative. Second, in the context of smart grids, we will consider data generated by a large number of sensors monitoring various aspects of the performance of a power grid, with the objective of accurately matching supply and demand.
More Information
Potential Impact:
This interdisciplinary proposal will develop new mathematical tools to study large data sets and will investigate algorithmic implementations of the most promising results, which will be tested on specific practical problems of direct economic importance. This opens a wide spectrum of possible routes to impact. The theoretical problems studied in the proposal arise from intensive recent work in pure mathematics, where important developments have opened up possibilities for direct applications of these new results and methods. Our main applications will be to the analysis of data sets, and by making this the main focus of our work, we are aiming at the very centre of the digital economy: whatever kind of digital economic activity one considers, there is a database behind it. Our methods will be flexible, scalable, portable, and applicable in a great variety of economic contexts. We will test their strength on specific problems arising in real-world systems, to ensure the best possible fit between theoretical results and practical requirements. Specifically, we will contribute directly to the analysis of data emerging through the UK Government's Transparency Initiative, where our work can both create new possibilities for exploiting the data and influence Government policy on data acquisition and processing. Through Professor Nigel Shadbolt (CI), who is a member of the Public Sector Transparency Board working with the UK Government on its Open Data policy, we will have an opportunity to influence policy in this area. Another specific project will concern citation data emerging from the use of the arXiv preprint database, which is of crucial importance to the scientific community. We will also study specific data problems arising from the design of intelligent instrumentation to support the Smart Grids idea for efficient management of energy supply and demand.
All significant results achieved during this investigation will be published in appropriate peer-reviewed scientific journals and presented at conferences and seminars. We will ensure that potential beneficiaries have the opportunity to engage with this research by disseminating findings via CORMSIS, the Centre for Operational Research, Management Science, and Information Systems at the University of Southampton, and via SIMM, the Southampton Initiative in Mathematical Modelling. CORMSIS and SIMM have direct contacts with more than 90 different organisations in business, industry, and government, among them IBM, BA, bmi, Ford, Boeing, Qualcomm, AA, BT, Logical Transport, Philips, Dstl, Southern Water, Hampshire Container Terminals, American Express, Barclays, JP Morgan, Tesco, NATS, Unilever, Rolls Royce, several NHS trusts, various City Councils, HM Revenue and Customs, the Met Office, MoD, and ONS. In a similar way, we will exploit connections offered by the Durham Energy Institute. A very important pathway to impact is through the Web Science initiative at Southampton, which is led by Professor Dame Wendy Hall, Professor Sir Tim Berners-Lee, and Professor Nigel Shadbolt (a CI on the project), and through the University Strategic Research Group in Digital Economy. These connections will link the project with a wide variety of potential users in academia, business, and politics, and we will explore these possibilities vigorously.
University of Southampton | LEAD_ORG |
Queen Mary, University of London | COLLAB_ORG |
Jacek Brodzki | PI_PER |
Chris Dent | COI_PER |
Joerg Fliege | COI_PER |
Nigel Shadbolt | COI_PER |
Leslie Carr | COI_PER |
Janusz Bialek | COI_PER |
Benjamin MacArthur | COI_PER |
Jonathan Forster | COI_PER |
Subjects by relevance
- Data
- Data mining
- Machine learning
- Mathematics
Extracted key phrases
- Scale-dependent coarse geometry
- Large data set
- Specific data problem
- Open data initiative
- Interconnected data system
- ArXiv preprint database
- Huge data set available
- Data acquisition
- Citation data
- Potential practical application
- Recent important theoretic result
- Specific practical problem
- Long-term practical application
- Main application
- Direct application