|
BMe Research Grant |
|
BMe Research Grant - 2010
1st Prize
Nowadays, information technology is present
both in our work and personal life. Accumulated information results in large
and continuously increasing amount of digital data, a significant part of which is confidential
and cannot be reproduced easily, therefore their safe storage is essential.
Numerous methods and applications are available to
simultaneously produce and back up
copies, but the difficulty and high
costs of the archiving process
preclude most users to provide safety to their data.
My research aims at
studying backup service in preferably
peer-to-peer (P2P, distributed network of functionally equal peers) systems,
where users save their backup data on one of the others' unexploited storage devices over the Internet for free. As a main
consequence, no scalability problems arise (higher number of users come with
larger overall storage space); moreover, geographical as well as
ownership diversity of the storing hosts assure great safety for the backed up
data. However, the management of users who are not willing to share their local resources with other participants is extremely important for the system to remain operational. Furthermore, assuring the quality of service in a distributed system
requires careful planning.
I carried out my research work at BME Department of Telecommunications and
Media Informatics in Hungary, at Telecom Bretagne and Eurecom in France. All
three institutions conduct research on distributed computer systems and networks
using game and distributed algorithmic theory-based models. The quality of
their work is reflected in the large number of high quality international
publications on prestigious and selective forums.
Additionally to the widespread use of personal computers at firms and households, recently we are also experiencing
an explosive growth in the number of mobile devices capable of creating digital data: PDAs,
tablet PCs, smartphones etc. The volume of user-generated content
(documents, photos, audio and video files) increases at a greater rate than ever
before. It is therefore a relevant issue to find safe storage for the
large amount of information that is usually
difficult or impossible to re-generate. The importance of data duplication and
storage at different locations, i.e., data backup, stems from the provided
protection against data perishing, possibly due to unintentional deletion,
random device failure, unforeseen natural disaster or
theft. When preparing archives, it is important that the backed up files
are saved in geographically remote locations, preferably in more than one copy;
when restoring lost data the availability of archives is essential.
Two
important aspects incur when
creating safe backups: costs and
automation. Currently available options mostly improve one at the expense of the
other. The cheapest and least
automated solution is to save the data on some secondary storage device from time to
time (e.g., on external hard drive, USB flash drive, CD, DVD,
Blu-ray) and to place the device to a safe location. At the other end of
the spectrum are the online storage services (Amazon S3, Dropbox, Wuala etc.): saved data is transferred
over the Internet to profit-oriented companies'
reliable servers, data centers where they store it after replication.
Although this latter solution is
automated (e.g., files placed in a designated
folder are saved continuously), the occurring cost is much higher than with
manual, local (offline) backups.
My
research focused on the feasibility of a data backup system that would combine
existing methods' advantages, without their drawbacks, of course.
In addition to the relatively high
expenses of user-friendly online backup services, other problems arise in relation to data privacy and reliability.
Outsourcing important and often sensitive data into the hands of a single,
profit-oriented company comes with serious security risks. Encrypting the
files appropriately solves the secrecy
issues. The consequences of the data
center's vulnerability and the service
provider company's eventual bankruptcy, however, are still not
remedied.
In a P2P system, backed up files can be stored in a
distributed manner at several users at different locations (Figure 1). Every user shares resources with the
community, and in return, can save backup data on other members' storage devices
having network access. The system is designed to maintain high
availability and accessibility for the backup, and to assure the
recoverability of local
files in the event of data loss (Figure 2). Nevertheless, participants may
join to misuse the system. Exclusion of those, who do not wish to
share their own resources (disk space, Internet connection, time spent online) with the community, is
important to maintain the service for ethical users of the network [6]. Once appropriate incentives are in place, distributed architecture
outperforms the centralized one in scalability: a growing number of users does
not lead to service performance degradation. On the contrary, saved data
can be further dispersed on the larger user
base.
My research mainly focused on these incentives: on their impact,
efficiency and the related control burden. Storing data on temporarily unavailable users,
even with implemented incentives, causes temporary data loss. To remedy
this phenomenon, data must be stored in multiple copies, i.e. with redundancy. The management of data redundancy and network operations greatly affect the quality of service; the investigation of these aspects was also subject of my research.
Figure 1. The user's program, running on its computer being connected to the Internet, safely stores its data by scattering the secure copies on other participants' computers.
Figure 2. After a loss of local data, the program recovers the required data from the storing participants.
I
started my research work by creating a theoretical model of P2P storage systems,
in order to study possible incentive mechanisms (policies and rules) [1]. I suggested and
compared two approaches for encouraging resource
sharing.
In symmetric systems, the service that each participant
may receive is limited to the user's contribution. In payment-based
systems a profit-oriented agent buys and sells
network storage from and to participants, respectively. I described user "selfishness" with
a non-cooperative game theoretic [7] model and
studied the overall user benefit in both systems. I also provided necessary and
sufficient conditions for a system to socially
outperform the other.
Next, I developed the symmetric system model so that users were able to
"selfishly" choose the peers, on which they stored data [2, 3]. Selection of storing
partners is based on their
characteristics (network availability, Internet connection's upload and download
bandwidths). In case of mutual willingness, two users offer the same amount of
storage space to each other. I demonstrated that the selfish peer selection
encourages users to raise their devoted contribution to the system in terms of
online availability and dedicated bandwidth. This essentially improves
the overall quality of service offered by the system. My work also included
extension of a well-known matching theory problem [8], which allowed me to
formulate the algorithmic problem of peer
selection in the game theoretic context. I have proposed a fast (polynomial time)
algorithm that increases the user satisfaction as
much as possible via the organized peer selection. I have verified my model and the effectiveness of the algorithm through
simulation tests.
Basically, to maintain a P2P backup
system, participating users must share three types of resources: disk space,
bandwidth and time spent connected to the network. However, a system with a small number of users might not be able to guarantee the appropriate quality of service merely with the shared user resources available. Therefore, I examined the effects of
introducing a central storage server in order to avoid such (temporary) situations and presented the cost implications of persistent quality guarantees [4].
In such a
hybrid system, the central, reliable (highly available) server might be used to
store data in exchange for the reimbursement of costs. Furthermore, it can also be used to restore data from the remaining copies when backup is
lost on peers that leave the system. I have
shown that the relatively low occurring costs of such a hybrid system allow for
highly relaxing certain incentive rules.
Subsequently, I built analytic models to study the system's other, relevant operational parameters both in pure P2P and hybrid systems. Then, I evaluated various options of data redundancy, data maintenance and network data transfer schemes with simulations. To show reasonable comparison between different system designs, I introduced simple metrics to describe the quality of service, e.g., duration of archiving and data recovery process, probability of backup loss. I determined the settings that provided well-suited service quality for backup purposes and, at the same time, implied the possible smallest burden on users (i.e., minimal shared resources). Also suggested a new procedure for determining data redundancy, and evaluated its performance against currently available techniques. The redundancy-determining method is based on the time required to retrieve backup data and it guarantees high service quality levels while significantly reducing the applied redundancy and so users' storage and bandwidth requirements.
Finally, I validated the results of my research by implementing the developed algorithms in a P2P backup system application [5]. I wrote a user program with the established incentive rules and tested the created prototype in various settings. The experiments were implemented on a global research network, the PlanetLab and included hundreds of simulated users. The need for incentives has been shown by comparing performance results of experiments with and without symmetric peer selection mechanism. The careful adjustments of data redundancy also proved beneficial to other, existing methods. The experiments revealed that safe, free and user-friendly backup service is feasible in a P2P system.
My research findings include extensive investigation of distributed storage and
backup systems with
various features. The presented
combination of matching and game theory models may provide new insights
into existing problems. The defined
performance metrics and the novel data redundancy
calculation approach based on the data retrieval duration may
affect other P2P storage system designs (not necessarily only for backup
purposes). In
addition, the prototype can provide a basis for developing practical backup
applications.
The real impact of the proposed incentive mechanism
could only be measured during operation in a system with real users. Therefore, setting up a real experimental
testbed is certainly one of the directions of future work. Observable online availability pattern of users also worth further investigations:
peer selection based on the daily, weekly recurring regularity may further reduce
the resources necessary to share.
Related own publications
[1] Patrick
Maille, Laszlo Toka: Managing Peer-to-Peer Data Storage System in a Selfish
Society, IEEE JSAC Special issue on "Game Theory in Communication Systems",
Volume 26, Issue 7, September 2008, pp. 1295–1301
[2] Pietro Michiardi,
Laszlo Toka: Selfish Neighbor Selection in Peer-to-Peer Backup and Storage
Applications, Euro-Par'09, 15th International Conference on Parallel and
Distributed Computing, Delft, The Netherlands, 2009
[3] Laszlo Toka, Pietro
Michiardi: Analysis of User-Driven Peer Selection in Peer-to-Peer Backup and
Storage Systems, Springer Telecommunication Systems, Special Issue dedicated
to GameComm'08, 2010
[4] Laszlo Toka, Matteo Dell'Amico, Pietro Michiardi: Online Data
Backup: Peer-Assisted Approach, IEEE P2P'10, IEEE International Conference
on Peer-to-Peer Computing, Delft, The Netherlands, 2010
[5] Attila Csoma, Laszlo Toka, Attila Vidacs: A Peer-to-Peer Backup
System with Incentives, TSP'10, 33rd International Conference on
Telecommunications and Signal Processing, Vienna, Austria, 2010
(submitted)
Link collection
BME Department of Telecommunications and Media
Informatics: http://www.tmit.bme.hu/?bl=y
HSNLab:
http://hsnlab.tmit.bme.hu/Welcome
Télécom Bretagne:
http://www.telecom-bretagne.eu/
Eurecom: http://www.eurecom.fr/
Amazon S3:
http://aws.amazon.com/s3/
Dropbox: http://www.dropbox.com/
Wuala:
http://www.wuala.com/
Game Theory:
http://en.wikipedia.org/wiki/Game_theory
PlanetLab:
http://www.planet-lab.org/
List of references
[6] Eytan Adar,
Bernardo A. Huberman: Free Riding in Gnutella, Technical report, Xerox
Palo Alto Research Center, 2000
[7] Martin J. Osborne, Ariel Rubinstein: A
Course in Game Theory, MIT Press, 1994
[8] David Gale, Lloyd S. Shapley: College admissions and the
stability of marriage, American Mathematical Monthly, Volume 69, Issue 1,
1962, pp. 9–15