Cloud2HDD: Large-Scale HDD Data Analysis on Cloud for Cloud Datacenters

dc.authorid Şuayb Şefik Arslan / 0000-0003-3779-0731
dc.authorid Şuayb Şefik Arslan / K-2883-2015
dc.contributor.author Zeydan, Engin
dc.contributor.author Arslan, Şefik Şuayb
dc.date.accessioned 2020-05-31T13:51:23Z
dc.date.available 2020-05-31T13:51:23Z
dc.date.issued 2020
dc.department Faculty of Engineering, Department of Computer Engineering en_US
dc.description.WoSDocumentType Proceedings Paper
dc.description.WoSIndexDate 2020 en_US
dc.description.WoSPublishedMonth February en_US
dc.description.WoSYOKperiod YÖK - 2019-20 en_US
dc.description.abstract The main focus of this paper is to develop a distributed, large-scale data analysis platform for the open-source data of the Backblaze cloud datacenter, which consists of operational hard disk drive (HDD) information collected over an observable period of 2272 days (over 74 months). To carefully analyze the intrinsic characteristics of hard disk behavior, we have exploited a large volume of data and the benefits of the Hadoop ecosystem as our big data processing engine. In other words, we have utilized a special distributed scheme on cloud for cloud HDD data, which we term Cloud2HDD. To classify the remaining lifetime of hard disk drives based on health indicators such as the built-in S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) features, we used several state-of-the-art classification algorithms and compared their accuracy, precision, and recall rates. In addition, the importance of various S.M.A.R.T. features in predicting the true remaining lifetime of HDDs is identified. For instance, our analysis results indicate that, compared to other classification approaches based on a specific subset of S.M.A.R.T. features, the Random Forest Classifier (RFC) can yield up to 94% accuracy with the highest precision and recall in a reasonable time by classifying the remaining lifetime of drives into one of three different classes, namely critical, high and low ideal states. en_US
dc.description.sponsorship TÜBİTAK, MINECO en_US
dc.description.woscitationindex Conference Proceedings Citation Index - Science en_US
dc.identifier.citation Zeydan, E. & Arslan, S. S. (February 01, 2020). Cloud2HDD: large-scale HDD data analysis on cloud for cloud datacenters, 23rd Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN 2020), Paris, France, IEEE, Article number: 9059482, pp. 243-249, DOI: https://doi.org/10.1109/ICIN48450.2020.9059482 en_US
dc.identifier.doi 10.1109/ICIN48450.2020.9059482
dc.identifier.endpage 249 en_US
dc.identifier.isbn 9781728151281
dc.identifier.isbn 9781728151274
dc.identifier.issn 2472-8144
dc.identifier.issn 2162-3414
dc.identifier.scopus 2-s2.0-85084061181
dc.identifier.scopusquality N/A
dc.identifier.startpage 243 en_US
dc.identifier.uri https://hdl.handle.net/20.500.11779/1325
dc.identifier.uri https://doi.org/10.1109/ICIN48450.2020.9059482
dc.identifier.wos WOS:000569984100041
dc.identifier.wosquality N/A
dc.institutionauthor Arslan, Şuayb Şefik
dc.language.iso en en_US
dc.publisher IEEE en_US
dc.relation.ispartof 23rd Conference on Innovation in Clouds, Internet and Networks and Workshops = ICIN 2020 en_US
dc.relation.publicationcategory Conference Item - International - Institutional Faculty Member en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Lifetime en_US
dc.subject Hadoop en_US
dc.subject Cloud en_US
dc.subject Machine learning en_US
dc.subject Data center en_US
dc.subject HDDs en_US
dc.title Cloud2HDD: Large-Scale HDD Data Analysis on Cloud for Cloud Datacenters en_US
dc.type Conference Object en_US

Files

Original bundle

Name:
Şefik Şuayb ARSLAN.pdf
Size:
1.13 MB
Format:
Adobe Portable Document Format
Description:
Full Text - Conference Proceeding
