Data
Integration Systems provide users with uniform data access and
efficient information sharing. The ability to share information
is particularly important for interdisciplinary research, where
a comprehensive picture of the subject requires large amounts of
historical data from disparate sources across a
variety of disciplines. For example, epidemiological data
analysis often relies upon knowledge of population dynamics,
climate change, migration of biological species, drug
development, etc. As another example, consider the task of
exploring long-term and short-term social changes, which requires
consolidating a comprehensive set of data on
social-scientific, health, and environmental dynamics. In
this project, we address the challenges in developing a
global integrated repository of historical data to support a
wide range of interdisciplinary research. The major tasks of the
project are:
Task 1: Scalable architectures for historical data integration
Task 2: Reliable fusion of historical data
Task 1: Scalable architectures for historical data integration
Nowadays, there are numerous historical data sources available
from various groups worldwide. Such data sources, however,
cannot be easily consolidated, metadata-indexed, and maintained
by small groups of developers. The solution that we propose is
to engage a large community of researchers to share their data,
collectively resolve the data heterogeneities, and harmonize
their efforts in data reliability assessment and data fusion.
We propose an approach, based on the collective intelligence of
research communities, which supports efficient “crowdsourcing”
of the large-scale historical data integration task. This
research is undertaken in conjunction with the
World-Historical Dataverse project (www.dataverse.pitt.edu)
of the World History Center at the University of Pittsburgh and
the Collaborative for Historical Information and Analysis (CHIA)
initiative (http://chia.pitt.edu).
CHIA currently involves nine different research groups
throughout the U.S. and Europe; it aims to create a major
repository of consolidated global historical data from the past
several centuries. In particular, my group is developing an
advanced Col*Fusion infrastructure for systematic accumulation,
integration, and utilization of historical data.
Task 2: Reliable fusion of historical data
Historical data sources may have different levels of reliability
for many reasons, e.g., issues with the primary sources of
information, faulty data collection methodology, etc.
Integration of the historical data sources may also face severe
data conflicts. It is common to have multiple reports
about the same event within overlapping time intervals.
We may also have multiple reports on historical statistics for
overlapping locations. Another challenge is
overlapping names: evolving concepts may be reported under
different names and categories co-existing at different time
intervals. Note that historical data conflicts do not necessarily
imply data inconsistency. If the overlapping historical reports
are accurate, the conflicts reflect data redundancy, which
prevents researchers from obtaining reliable aggregate query
results. Meanwhile, data inconsistency is caused by inaccurate
reports. In many cases, such inconsistency can be discovered
through analysis of relationships between existing reports in
the integrated database. In this task we devise a systematic and
efficient approach to address the problem of large-scale
historical data fusion to ensure data reliability. We develop
integrated data reliability analysis methods to explore data
conflicts and data inconsistencies so as to provide automatic
information reliability assessment.
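As a minimal sketch of the first kind of conflict described above (multiple reports about the same event within overlapping time intervals), the following Python fragment flags overlapping reports and shows how a naive aggregate counts the overlapping period more than once. The `Report` record, the source labels, the event name, and all numbers are hypothetical and chosen only for illustration; this is not the project's actual fusion method.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Report:
    source: str  # hypothetical source label
    event: str   # hypothetical name of the reported event
    start: int   # first year covered (inclusive)
    end: int     # last year covered (inclusive)
    count: int   # reported total for the whole interval


def overlaps(a: Report, b: Report) -> bool:
    """True when two reports on the same event share at least one year."""
    return a.event == b.event and a.start <= b.end and b.start <= a.end


def conflicting_pairs(reports):
    """Return every pair of reports whose time intervals overlap."""
    return [(a, b) for a, b in combinations(reports, 2) if overlaps(a, b)]


# Made-up example data, not taken from any real historical source.
reports = [
    Report("source_A", "event_x", 1900, 1904, 500),
    Report("source_B", "event_x", 1903, 1906, 300),
    Report("source_C", "event_x", 1910, 1912, 200),
]

pairs = conflicting_pairs(reports)

# A naive aggregate simply sums all reports, so the years 1903-1904 are
# covered by two reports and may be counted twice.
naive_total = sum(r.count for r in reports)

for a, b in pairs:
    print(f"{a.source} and {b.source} overlap in "
          f"{max(a.start, b.start)}-{min(a.end, b.end)}")
print("naive total:", naive_total)
```

Detecting the overlap only identifies where an aggregate query may be unreliable; deciding how to reconcile the overlapping counts is the fusion problem that this task addresses.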
PhD Students:
Ying-Feng Hsu
Evgeny Karataev
Julian Lee
Fatimah Radwan
Selected References:
- van Panhuis, W.G., Grefenstette, J., Jung, S.Y., Chok, N.S.,
Cross, A., Eng, H., Lee, B., Zadorozhny, V., Brown, S.,
Cummings, D., Burke, D. Surveillance and control of
contagious diseases in the United States from 1888 to the
present. To appear in The New England Journal of Medicine, 2013.
- Zadorozhny, V., Manning, P., Bain, D., Mostern, R. Collaborative for
Historical Information and Analysis: Vision and Work Plan.
Journal of World-Historical Information, v. 1, N.1, 2013.
- Zadorozhny, V., Hsu, Y.-F. Conflict-Aware Fusion of
Historical Data. Proc. of 5th International Conference on
Scalable Uncertainty Management (SUM'11), 2011.
- Pelechrinis, K., Zadorozhny, V., Oleshchuk, V. Collaborative Assessment
of Information Provider's Reliability and Expertise using
Subjective Logic. Proc. of the 7th International
Conference on Collaborative Computing (CollaborateCom'11), 2011.
Complete list of publications