On the Distribution Modeling of Heavy-Tailed Disk Failure Lifetime in Big Data Centers

dc.authorid Şuayb Şefik Arslan / 0000-0003-3779-0731
dc.authorid Şuayb Şefik Arslan / K-2883-2015
dc.contributor.author Arslan, Şuayb Şefik
dc.contributor.author Zeydan, Engin
dc.date.accessioned 2021-07-09T08:48:36Z
dc.date.available 2021-07-09T08:48:36Z
dc.date.issued 2021
dc.department Faculty of Engineering, Department of Computer Engineering (Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü) en_US
dc.description.WoSDocumentType Article
dc.description.WoSIndexDate 2021 en_US
dc.description.WoSInternationalCollaboration Conducted with international collaboration - YES en_US
dc.description.WoSPublishedMonth June en_US
dc.description.WoSYOKperiod YÖK - 2020-21 en_US
dc.description.abstract It has become commonplace to observe frequent multiple disk failures in big data centers in which thousands of drives operate simultaneously. Disks are typically protected by replication or erasure coding to guarantee a predetermined reliability. However, to optimize data protection, real-life disk failure trends need to be modeled appropriately. The classical approach is to estimate the probability density function of failures using nonparametric estimation techniques such as kernel density estimation (KDE). However, these techniques are suboptimal when the true underlying density function is unknown. Moreover, insufficient data may lead to overfitting. In this article, we propose applying a set of transformations to the collected failure data to achieve almost perfect regression in the transform domain. Then, by inverse transformation, we analytically estimate the failure density through the efficient computation of moment generating functions and, hence, the density functions. We also develop a visualization platform to extract useful statistical information such as the model-based mean time to failure. Our results indicate that, for other heavy-tailed data, the complex Gaussian hypergeometric distribution and the classical KDE approach can perform best if the overfitting problem can be avoided and the complexity burden is overcome. On the other hand, we show that the failure data follow a less complex Argus-like distribution after the Box–Cox transformation, up to appropriate scaling and shifting operations. en_US
dc.description.sponsorship Turkiye Bilimsel ve Teknolojik Arastirma Kurumu (TUBITAK) 115C111 - 119E235 / Spanish MINEC TEC2017-88373-R / Generalitat de Catalunya 2017SGR1195 en_US
dc.description.woscitationindex Science Citation Index Expanded en_US
dc.identifier.citation Arslan, S. S., & Zeydan, E. (2021). On the Distribution Modeling of Heavy-Tailed Disk Failure Lifetime in Big Data Centers. IEEE Transactions on Reliability, 70(2), 507–524. https://doi.org/10.1109/tr.2020.3007127 en_US
dc.identifier.doi 10.1109/TR.2020.3007127
dc.identifier.issn 1558-1721
dc.identifier.issn 0018-9529
dc.identifier.issue 2 en_US
dc.identifier.scopus 2-s2.0-85110818271
dc.identifier.scopusquality Q1
dc.identifier.startpage 507 - 524 en_US
dc.identifier.uri https://hdl.handle.net/20.500.11779/1512
dc.identifier.uri https://doi.org/10.1109/TR.2020.3007127
dc.identifier.volume 70 en_US
dc.identifier.wos WOS:000659549200008
dc.identifier.wosquality Q1
dc.institutionauthor Arslan, Şuayb Şefik
dc.language.iso en en_US
dc.publisher IEEE en_US
dc.relation.journal IEEE Transactions on Reliability en_US
dc.relation.publicationcategory Article - International Peer-Reviewed Journal - Institutional Faculty Member en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject Estimation en_US
dc.subject Kernel density estimation (KDE) en_US
dc.subject Kernel en_US
dc.subject Reliability en_US
dc.subject Probability density function en_US
dc.subject Measurement en_US
dc.subject Modeling en_US
dc.subject Predictive models en_US
dc.subject Hard-disk systems en_US
dc.subject Data analytics en_US
dc.subject Data models en_US
dc.subject Data storage en_US
dc.title On the Distribution Modeling of Heavy-Tailed Disk Failure Lifetime in Big Data Centers en_US
dc.type Article en_US
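The abstract contrasts classical kernel density estimation (KDE) with fitting in the Box–Cox transform domain. As a minimal, hypothetical sketch (not the authors' code, and not their moment-generating-function method), the Python below implements the Box–Cox transform and its inverse, a Gaussian KDE with Silverman's rule-of-thumb bandwidth (an assumed choice), and a model-based mean time to failure (MTTF) obtained by numerically integrating x f(x) under the KDE. The function names, the bandwidth rule, the Box–Cox parameter, and the sample lifetimes are all illustrative assumptions.

```python
import math
from statistics import mean, stdev


def box_cox(x: float, lam: float) -> float:
    """Box-Cox transform of a strictly positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def inv_box_cox(y: float, lam: float) -> float:
    """Inverse Box-Cox transform, mapping a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(y)
    base = lam * y + 1.0
    if base <= 0:
        raise ValueError("y lies outside the range of the Box-Cox transform")
    return base ** (1.0 / lam)


def silverman_bandwidth(samples: list[float]) -> float:
    """Silverman's rule of thumb, h = 1.06 * s * n^(-1/5); an assumed bandwidth choice."""
    return 1.06 * stdev(samples) * len(samples) ** -0.2


def gaussian_kde_pdf(x: float, samples: list[float], h: float) -> float:
    """Gaussian KDE: f(x) = 1 / (n h) * sum_i phi((x - x_i) / h)."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)


def kde_mttf(samples: list[float], h: float, num_points: int = 4001) -> float:
    """Model-based MTTF: integral of x * f(x) via the trapezoidal rule.

    The grid extends 8 bandwidths past the data so the truncated tail mass is negligible.
    """
    lo = min(samples) - 8.0 * h
    hi = max(samples) + 8.0 * h
    step = (hi - lo) / (num_points - 1)
    total = 0.0
    for k in range(num_points):
        x = lo + k * step
        weight = 0.5 if k in (0, num_points - 1) else 1.0  # trapezoid end weights
        total += weight * x * gaussian_kde_pdf(x, samples, h)
    return total * step


if __name__ == "__main__":
    # Hypothetical disk lifetimes in hours; NOT data from the article.
    lifetimes = [1200.0, 3400.0, 560.0, 8900.0, 15000.0, 2300.0]
    lam = 0.25  # hypothetical Box-Cox parameter
    transformed = [box_cox(t, lam) for t in lifetimes]
    # The inverse transform should recover the original lifetimes.
    assert all(abs(inv_box_cox(y, lam) - t) < 1e-6 for y, t in zip(transformed, lifetimes))
    h = silverman_bandwidth(lifetimes)
    print("KDE-based MTTF (hours):", kde_mttf(lifetimes, h))
    print("Sample mean (hours):   ", mean(lifetimes))
```

Because every Gaussian kernel is centered on a sample point, the KDE's model-based MTTF should match the sample mean up to integration error. Note also that a Gaussian KDE places some probability mass on negative lifetimes, which is one limitation of applying it directly to positive, heavy-tailed failure data.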

Files

Original bundle

Name: On the Distribution Modeling.pdf
Size: 1.39 MB
Format: Adobe Portable Document Format
Description: Full Text - Article

License bundle

Name: license.txt
Size: 1.44 KB
Format: Item-specific license agreed upon to submission