E2GSM: Energy Effective Gear-Shifting Mechanism in Cloud Storage System

  • You, Xindong (Beijing Institute Of Graphic Communication) ;
  • Han, GuangJie (College of Internet of Things Engineering, Hohai University) ;
  • Zhu, Chuan (College of Internet of Things Engineering, Hohai University) ;
  • Dong, Chi (Alibaba Cloud Computing Co. Ltd, Alibaba Group) ;
  • Shen, Jian (School of Computer and Software, Nanjing University of Information Science and Technology)
  • Received : 2016.04.10
  • Accepted : 2016.07.27
  • Published : 2016.10.31

Abstract

Recently, the massive energy consumption of cloud storage systems has attracted great attention from both industry and the research community. However, most existing solutions use a single method and reduce energy consumption in only one aspect. This paper proposes an energy-effective gear-shifting mechanism (E2GSM) for cloud storage systems that saves energy in multiple aspects. E2GSM is built on a data classification mechanism and a data replication management strategy. The data classification mechanism classifies data according to its properties and places it into the corresponding zones. The data replication management strategy determines the minimum number of replicas through a mathematical model and decides where the replicas are placed. Based on these two components, E2GSM automatically shifts gears among the nodes. A mathematical analytical model shows that the proposed E2GSM is energy effective. Simulation experiments based on GridSim show that the proposed gear-shifting mechanism is cost effective. Compared with other energy-saving mechanisms, E2GSM saves a substantial amount of energy at the expense of a slight performance loss while still meeting the user's QoS.

1. Introduction

Cloud storage has become a major trend in storage development. With the increasing popularity of data-intensive applications and services, large-scale data centers consume an enormous amount of energy. It is reported that energy costs alone could account for 23%-50% of data center expenses, and that this bill amounts to $30 billion worldwide [1,2]. Storage systems account for 25%-35% of the energy consumption of data centers [11]. As the storage demand of applications grows at an annual rate of 60%, the energy consumed by storage systems cannot be ignored. Reducing the energy consumption of cloud storage devices in large data centers is therefore an urgent problem.

Research indicates that, although cloud storage systems consume an enormous amount of energy, the utilization of their servers and disks is as low as 25%~30% [17]. Making the energy consumed proportional to the utilization (energy proportionality) is therefore an effective way to reduce energy consumption in cloud storage. To achieve exact or approximate energy proportionality, recent research employs many techniques, such as data classification, data concentration, data replication, workload consolidation, and dynamic power management [4,10~12,17~21,31~39]. These studies achieve the objective to some degree, but because each employs only a single technique, the energy reduction is limited. We carefully combine several of these techniques to save energy along more dimensions. This paper proposes an energy-effective gear-shifting mechanism (E2GSM), in which a data partitioning mechanism and data replication management strategies, together with dynamic rotation speed management, are designed to construct the gear-shifting mechanism. The data partitioning mechanism divides data into four categories based on its properties and places it into the corresponding zones. Different zones use different disk rotation speeds, which saves energy in the first stage. The data replication management strategies, which include the replica number determination model and the replica placement strategy, save energy in the second stage. E2GSM determines the minimum number of data replicas through a mathematical model and then places the replicas on specific nodes so that gear shifting can be carried out smoothly.
Based on the above classification mechanism and replica management mechanism, the energy-effective gear-shifting mechanism (E2GSM) is established. It uses a neural network to predict the load of the next period, which makes it feasible to shift among the different gears and save energy in the third stage. A mathematical analytical model shows that the proposed E2GSM is energy effective. Simulation experiments based on GridSim show that the proposed gear-shifting mechanism is cost effective: it saves 43% of the energy on average while meeting the user's QoS, and the maximum energy saving is about 78%.

Compared with current energy-saving techniques, the main contributions of this paper are as follows:

1) Energy is saved along multiple dimensions: the data classification and replica management mechanisms reduce energy consumption in the first and second stages, and the gear-shifting mechanism saves energy in the third stage.

2) A neural network is used to predict the load of the next period, which makes an energy-effective gear-shifting mechanism possible.

3) An energy consumption model is constructed, and the energy effectiveness of the proposed E2GSM is confirmed through a mathematical proof.

4) Simulation experiments are carried out on the GridSim simulator to verify the energy-effective mechanism.

 

2. Related Work

Recently, a wide variety of research has addressed the energy consumption of cloud storage, in which data management strategies combined with rotation speed adjustment are the most active topic. In general, data classification is the primary technique for reducing energy consumption. Based on the different access rates in the storage system, E. Pinheiro et al. proposed the PDC (Popular Data Concentration) model [3], which periodically migrates popular data to a few hot disks and stores the data with lower access rates on the remaining disks. PDC clearly reduces energy consumption, but it has a side effect on system performance: because most application requests concentrate on a small subset of the disks, those disks become heavily loaded and the I/O response time increases. In [7], AutoMig collects comprehensive parameters such as file access history, file size, and disk utilization to classify files dynamically. Furthermore, it uses the LRU strategy to maintain the state of the files held in the flash memory device. Experiments in hierarchical storage systems show that, compared with existing methods, AutoMig effectively shortens the I/O response time of the front end. Rini T. Kaushik et al. proposed GreenHDFS, named Lightning, in [8,9]. According to the characteristics of the data, the cloud storage system is divided into a Hot Zone and a Cold Zone. Data that has not been accessed for a long time is stored in the Cold Zone, which can stay in an off or low-speed state with low energy consumption for as long as possible. Frequently accessed data is placed in the Hot Zone, which has high energy consumption because of its high rotation speed. Simulation experiments show that dividing the system into hot and cold zones can reduce energy consumption by 26%. Well-designed data placement or data layout is another way to reduce energy consumption.
Li, Hongyan proposed the REST architecture in [13]: by slightly changing the data layout strategy, REST can safely keep many redundant storage nodes in standby mode for a relatively long period of time. Even in the event of a power failure, REST can keep the redundant nodes safe. The data classification used in E2GSM differs from the above strategies, as it classifies data into more types according to the characteristics of the data.

Data replication is widely used in cloud storage systems to achieve traditional performance goals such as data availability, scalability, reliability, load balancing, and parallel access [16][24~28]. In recent years, replication based on multi-speed disks has also been widely used to reduce energy consumption. Liu Jingyu et al. proposed the S-RAID structure, which mixes SSDs with ordinary disks [6] and saves energy by spinning down some of the idle disks. Experiments show that a hybrid S-RAID5 array composed of 12 ordinary disks and two SSDs consumes only 28% of the energy of a RAID5 array of the same level. C. Weddle et al. [5] built the power-aware RAID (PARAID) on carefully designed data placement and replication strategies, which reduces the energy use of commodity server-class disks without specialized hardware. PARAID uses a skewed striping pattern to adapt to the system load by varying the number of powered disks. By spinning disks down during light loads, PARAID can reduce power consumption while still meeting performance demands. Inspired by PARAID, Kim and Rotem proposed the FREP (Fractional Replication for Energy Proportionality) mechanism [15]. In FREP, data is replicated at node granularity, so entire nodes can be shut down when the system load falls below a certain threshold. Simulation experiments on DiskSim show that, compared with PARAID, FREP is more energy efficient and has a shorter response time. In addition, a large number of simulation experiments show that, in theory, FREP can reduce energy consumption by 90% at the cost of a negligible performance loss.
Saiqin Long et al. proposed the TPES energy-saving strategy in [14]: by designing a replication management mechanism with a variable replica factor and reconfiguring the cluster based on the best total cost, TPES shifts the node mode through workload prediction and observation. Data replication is also used to reduce the energy consumption of network components [29][30]. In our mechanism, data replication serves not only data availability but also makes the energy-effective mechanism possible.

The related work above shows that techniques such as data classification, data replication, data placement, and gear shifting with dynamic rotation speed management can reduce energy consumption effectively. However, each of these approaches employs only one or two of the techniques, which limits the energy reduction. We combine several of the techniques to save energy along more dimensions and to make the energy reduction more quantifiable. We therefore propose an energy-effective gear-shifting mechanism (E2GSM) for cloud storage systems, in which a data classification mechanism, a data replication management strategy, dynamic rotation speed management, and a gear-shifting mechanism are integrated so that each saves energy in its own aspect. A mathematical analytical model shows that the proposed E2GSM is energy effective. Simulation experiments on GridSim show that E2GSM is cost effective: it saves 43% of the energy on average while guaranteeing the QoS, and the maximum energy saving reaches 78%.

 

3. E2GSM: Energy-Effective Gear-Shifting Mechanism

3.1 System Architecture

In E2GSM, the Data Classification Strategy, the Replica Number Determination Model, and the Replica Placement Strategy are designed to construct the energy gear-shifting mechanism, which also employs a neural network and dynamic rotation speed management. This section describes the energy gear-shifting mechanism in detail.

The architecture of E2GSM is shown in Fig. 1. Data requests from applications are processed by the energy gear-shifting mechanism, in which the neural-network prediction technique is used to classify data and distribute it to the Cold Zone or the Hot Zone. Based on the data replica management strategies (the replica number decision model and the replica placement strategy) and the rotation speed adjustment technique, the proposed E2GSM can be carried out while guaranteeing data availability. Each component of E2GSM is described in detail in the following subsections.

Fig. 1. The architecture of E2GSM

3.2 Data Classification Strategy

In order to describe the problem clearly, the following definitions are given.

Data temperature: The average number of accesses to the data. The higher the average number of accesses, the higher the data temperature, and vice versa.

Cold data: Data whose average number of accesses is less than the cold temperature threshold.

Hot data: Data whose average number of accesses is greater than the hot temperature threshold.

Seasonal data: Data whose average number of accesses is sometimes greater than the hot temperature threshold and sometimes less than the cold temperature threshold. That is, the data temperature fluctuates. Seasonal data is therefore divided into seasonal hot data and seasonal cold data.
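The definitions above can be illustrated with a minimal sketch. It is not the AGDC strategy used by E2GSM; it only applies the two thresholds to a hypothetical list of per-period average access counts. The names `Temperature` and `classify`, and the `UNCLASSIFIED` fallback for data that lies between the thresholds (a case the definitions do not cover), are assumptions made for illustration.

```python
from enum import Enum


class Temperature(Enum):
    COLD = "cold"
    HOT = "hot"
    SEASONAL = "seasonal"
    UNCLASSIFIED = "unclassified"  # assumption: the text does not define this case


def classify(period_averages, cold_threshold, hot_threshold):
    """Classify one data item from its per-period average access counts."""
    if not period_averages:
        raise ValueError("at least one period is required")
    sometimes_hot = any(a > hot_threshold for a in period_averages)
    sometimes_cold = any(a < cold_threshold for a in period_averages)
    # Seasonal: the temperature fluctuates across both thresholds.
    if sometimes_hot and sometimes_cold:
        return Temperature.SEASONAL
    overall = sum(period_averages) / len(period_averages)
    if overall > hot_threshold:
        return Temperature.HOT
    if overall < cold_threshold:
        return Temperature.COLD
    return Temperature.UNCLASSIFIED


print(classify([60, 2, 70, 1], cold_threshold=5, hot_threshold=40))
```

Seasonal data is checked first because a fluctuating item can have an overall average that falls on either side of the thresholds.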

Accordingly, the cloud storage system is divided into a Cold Zone and a Hot Zone. Nodes in the Hot Zone consume more energy because of their higher disk rotation speed, and nodes in the Cold Zone consume less energy because of their lower disk rotation speed. Data is distributed between the two zones according to its temperature. To support the energy-saving replica placement strategy, the Hot Zone is further divided into the source hot zone, the first backup zone, the second backup zone, the third backup zone, ..., the k-th backup zone (k is generally less than 4), where the i-th backup zone stores the i-th replica. The Cold Zone is divided into the source cold zone and the backup zone (cold data has only one replica). Furthermore, a node can be in one of the following modes and statuses.

Sleep mode: the node does not accept any request and has the lowest power consumption.

Active mode: the node can accept and process requests, with normal power consumption.

Positive status: the node is processing tasks, with high power consumption.

Idle status: the node is active but is not processing any request, with lower power consumption.

The Data Classification Strategy of E2GSM is described as follows:

1) Data is initially divided into cold data, hot data, or seasonal data by the anticipation-based green data classification strategy (AGDC), which is our preliminary work in [22];

2) Cold data is stored in the source cold zone, which initially contains only one node. Only when that node is full does E2GSM open a new node in the source cold zone to store data; go to step 3);

3) Seasonal data that is pre-classified as cold is placed separately in the source cold zone of the Cold Zone; go to step 4);

4) Hot data is put into the source hot zone, which initially contains only one node. Only when that node is full does E2GSM open a new node in the source hot zone to store data. As nodes are added, the node ids increase from low to high and the temperature of the data they hold gradually decreases; go to step 5);

5) Seasonal data that is pre-classified as hot is placed separately in the opened nodes of the source hot zone; a new node is opened if and only if the current node is full, and node ids are assigned in increasing order. Hot data and seasonal hot data may share the node where the two groups meet, but the id of a node that stores seasonal hot data is greater than or equal to the id of any node that stores hot data.
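Steps 4) and 5) can be sketched as a sequential fill in which a new node is opened only when the current one is full. The function name `fill_zone`, the `(name, size)` item format, and the single uniform `node_capacity` are assumptions for illustration; the paper does not specify these details.

```python
def fill_zone(groups, node_capacity):
    """Place groups of (name, size) items on nodes in order.

    Groups are placed one after another (for example hot data sorted from
    hottest to coolest, then seasonal hot data), so node ids grow as the
    temperature falls. A new node is opened only when the current one is full.
    """
    nodes = [[]]  # node id = index in this list
    used = 0
    for group in groups:
        for name, size in group:
            if size > node_capacity:
                raise ValueError(f"{name} does not fit on a single node")
            if used + size > node_capacity:
                nodes.append([])  # open the next node
                used = 0
            nodes[-1].append(name)
            used += size
    return nodes


hot = [("h1", 4), ("h2", 4), ("h3", 4)]  # hypothetical items, hottest first
seasonal_hot = [("s1", 4)]
print(fill_zone([hot, seasonal_hot], node_capacity=8))
```

In this example the last hot item and the seasonal hot item share one node, which matches the remark that seasonal hot data never lands on a node with a smaller id than hot data.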

3.3 Data Replication Management Strategy

A well-designed replica management strategy can achieve favorable traditional performance in terms of data availability, load balance, response time, and so on. It can also serve as the basis of the gear-shifting mechanism that saves energy. The minimum replica number decision model and the replica placement strategy are described in the following subsections.

3.3.1 The minimum replica number decided model

Our replica number decision model considers only the hot data classified above, because cold data has only one replica to guarantee its availability.

We assume that the number of nodes is n and the number of files is m. Node Si contains mi files, which form the file collection fi = {fi1, fi2, …, fimi}; pi is the fault tolerance of node Si, and ri is the replica number of fi. P(NA) is the probability of node availability, so the probability of node unavailability is 1−P(NA). P(FA) is the probability of file availability, so the probability of file unavailability is 1−P(FA).

Nodes in system are independent to each other, thus:

Then , assume that the user expectation for data availability is Aexpect, then , this formula can derive to the minimum replica number rmin.
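As an illustration of how a minimum replica number can follow from an availability target, the sketch below uses a simplified model that is an assumption, not the paper's formula: every node fails independently with the same probability, and a file is available as long as at least one of its r replicas is on a live node, so its availability is 1 − p^r. The names `min_replicas`, `node_failure_prob`, and `max_replicas` are hypothetical.

```python
def min_replicas(node_failure_prob, expected_availability, max_replicas=64):
    """Smallest r with 1 - p**r >= expected availability (simplified model)."""
    if not 0.0 < node_failure_prob < 1.0:
        raise ValueError("node_failure_prob must be in (0, 1)")
    if not 0.0 < expected_availability < 1.0:
        raise ValueError("expected_availability must be in (0, 1)")
    for r in range(1, max_replicas + 1):
        # Under the independence assumption, all r replicas are lost
        # together with probability p**r.
        if 1.0 - node_failure_prob ** r >= expected_availability:
            return r
    raise ValueError("target availability not reachable within max_replicas")


print(min_replicas(node_failure_prob=0.05, expected_availability=0.999))
```

A linear search is used instead of a closed-form logarithm so that the result is checked directly against the availability condition.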

3.3.2 Data Replicating

This subsection presents the data replicating strategy. The cloud storage system is divided into a cold zone and a hot zone. The hot zone contains the source hot zone, the first backup zone, the second backup zone, ..., and the m-th backup zone; the cold zone contains the source cold zone and the backup zone. To describe the problem clearly, the related parameters are given in Table 1.

Table 1. Parameter description

We assume that, after the Data Classification Strategy has been applied, node HSNj of the source hot zone contains source hot data HSDj and source seasonal hot data HSSDj, and node CSNj of the source cold zone contains source cold data CSDj and source seasonal cold data CSSDj. In general, hot data has at least rmin replicas, seasonal hot data has exactly rmin replicas, and all data in the cold zone has only one replica. The hot zone is further divided into m+1 zones (m>rmin): one source hot zone and m backup zones. Backup zones 1 to rmin replicate the hot data and the seasonal hot data, and the remaining (m−rmin) zones replicate the ai% hottest data (where i is the number of the backup zone). The hot data blocks in the i-th backup zone are {HSD1, HSD2, …, HSDhsn}∗(ai%), and only backup zones 1, 2, …, rmin contain the source seasonal hot blocks {HSSDhsn, HSSDhsn+1, …, HSSDhsn+hssn}. The default value of a1%, a2%, …, armin% is generally 100%.

The procedure for determining the values of m and ai% is as follows (by default, rmin replicas of each hot data block are guaranteed even if the node capacity is insufficient):

1) First, the data blocks in the source hot zone and in the source cold zone are fully replicated into their respective backup zones: the first backup zone for the source hot zone and the backup zone for the source cold zone. The number of remaining nodes is then s1=n−hsn−hssn−csn−cssn; go to step 2);

2) If s1>0 && s1≥hsn+hssn, the data in the source hot zone is fully (100%) replicated again into the second backup zone, and the number of remaining nodes is s2=s1−hsn−hssn. The hot data blocks are replicated completely up to the rmin-th backup zone in the same way, after which the number of remaining nodes is srmin=s(rmin−1)−hsn−hssn; go to step 3);

3) If srmin>0 && srmin≥|hsn∗armin+1%|, the armin+1% hottest data blocks of the source hot zone are replicated into the (rmin+1)-th backup zone, and the number of remaining nodes is s(rmin+1)=srmin−|hsn∗armin+1%|; go to step 4). Otherwise, replicas of the source hot data are placed on new nodes in rank order until the remaining nodes are filled; this zone is the (rmin+1)-th backup zone, and replication ends;

4) If s(rmin+1)>0 && s(rmin+1)≥|hsn∗armin+2%|, the armin+2% hottest data blocks are replicated into the (rmin+2)-th backup zone, and the number of remaining nodes is s(rmin+2)=s(rmin+1)−|hsn∗armin+2%|; go to step 5). Otherwise, replicas of the source hot data are placed on new nodes in rank order until the remaining nodes are filled; this zone is the (rmin+2)-th backup zone, and replication ends;

5) If s(rmin+2)>0 && s(rmin+2)≥|hsn∗armin+3%|, the armin+3% hottest data blocks are replicated into the (rmin+3)-th backup zone, and the number of remaining nodes is s(rmin+3)=s(rmin+2)−|hsn∗armin+3%|; any further backup zones are handled in the same way. Otherwise, replicas of the source hot data are placed on new nodes in rank order until the remaining nodes are filled; this zone is the (rmin+3)-th backup zone, and replication ends.

In general, the maximum number of replicas is set to 5.
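The node budget in steps 2) to 5) can be sketched as follows. Several details are assumptions rather than statements from the paper: the function starts from s1 (the nodes left after step 1), |hsn∗ai%| is interpreted as a ceiling, and `max_zone` caps the backup zone index at 5. The name `plan_backup_zones` and its return format are hypothetical.

```python
import math


def plan_backup_zones(s1, hsn, hssn, r_min, partial_percents, max_zone=5):
    """Return ([(zone_index, nodes_used), ...], nodes_left) for zones 2 and up."""
    plan = []
    remaining = s1
    full_copy = hsn + hssn  # nodes needed for a 100% replica of the source hot zone
    # Full replicas up to the r_min-th backup zone.
    for zone in range(2, r_min + 1):
        if remaining <= 0 or remaining < full_copy:
            return plan, remaining
        plan.append((zone, full_copy))
        remaining -= full_copy
    # Partial replicas of the hottest a_i% blocks in the later zones.
    for offset, percent in enumerate(partial_percents, start=1):
        zone = r_min + offset
        if zone > max_zone:
            break
        needed = math.ceil(hsn * percent / 100)
        if remaining > 0 and remaining >= needed:
            plan.append((zone, needed))
            remaining -= needed
        else:
            # Not enough nodes: fill what is left in rank order and stop.
            if remaining > 0:
                plan.append((zone, remaining))
                remaining = 0
            break
    return plan, remaining


print(plan_backup_zones(s1=20, hsn=4, hssn=2, r_min=2, partial_percents=[50, 25, 10]))
```

With only 7 nodes left instead of 20, the same call produces one full zone and a final zone that takes the single remaining node.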

3.4 Energy-Effective Gear-Shifting Mechanism

This section describes in detail the proposed energy-effective gear-shifting mechanism (E2GSM), which is built on the data classification strategy and the data replication strategy above. E2GSM uses a neural network to predict the load of the next period in order to carry out automatic gear shifting, for which a performance model and an energy consumption model are built.

Based on the collected history data, we assume that the request arrival rates of the K periods before the current one are {v1, v2, …, vK}, where vi is the request arrival rate of the i-th period. Let numv be the total number of requests that arrive between the beginning time ts and the current moment t; the request arrival rate of the current period is then $v_0 = num_v / (t - t_s)$. From {v0, v1, v2, …, vK}, the neural network predicts the request arrival rate v of the next period. If the number of nodes in active status is numo, the request arrival rate of a single node is $v / num_o$. If the node task-processing rate is cnum, the following parameters can be deduced:

The arrival time of the i-th request (RTi) is:

The beginning time of the i-th request (Pti) is:

Then the waiting time for the i-th request (Wti) is:

Then the maximum waiting time for all requests (MWT) is: $MWT = \max_{1 \le i \le a} Wt_i$

As shown above, we can predict the maximum waiting time MWT among all requests in the current period. If the user sets the waiting time threshold to thwt, the gear-shifting mechanism operates as follows: when MWT ≥ thwt, the gear shifts up (that is, the number of nodes in active status is increased); when MWT < thwt, the gear shifts down (that is, the number of nodes in active status is decreased).

Fig. 2 depicts the gear-shifting architecture of E2GSM, which shifts gears automatically from high to low or from low to high according to the predicted MWT and the user's threshold thwt. The highest gear means that all nodes in the system are active, which gives the best performance but the highest energy consumption rate. The lowest gear means that only the nodes in the source data area are active, while the nodes in the backup area are in sleep state (shown in green in the figure). This gear has the lowest energy consumption rate but poor performance. Since the maximum waiting time for all tasks (MWT) depends on the system load, E2GSM shifts automatically between the highest and lowest gears based on the current load: when the system load is heavy, E2GSM shifts up; otherwise it shifts down. In this way, energy consumption is reduced as far as possible while the user's requirements are still guaranteed. In the next section, we evaluate the traditional performance and the energy consumption of E2GSM through mathematical analysis and simulation experiments.

Fig. 2. Gear-shifting architecture diagram of E2GSM
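The gear-shifting decision can be sketched with a small single-node queue simulation. This is an illustration under assumptions, not the paper's model: the predicted arrival rate is taken as an input instead of coming from a neural network, requests arrive at evenly spaced times, each node serves requests first come, first served with a fixed service time, and one gear step is taken per period. The function names are hypothetical.

```python
def max_waiting_time(arrival_rate, active_nodes, service_rate, num_requests):
    """Maximum waiting time (s) on one node under evenly spaced arrivals."""
    per_node_rate = arrival_rate / active_nodes  # requests/s seen by one node
    interarrival = 1.0 / per_node_rate
    service_time = 1.0 / service_rate
    previous_finish = 0.0
    mwt = 0.0
    for i in range(num_requests):
        arrival = i * interarrival
        start = max(arrival, previous_finish)  # wait while the node is busy
        mwt = max(mwt, start - arrival)
        previous_finish = start + service_time
    return mwt


def next_gear(current_gear, mwt, threshold, lowest_gear, highest_gear):
    """Shift up when MWT reaches the threshold, otherwise shift down."""
    if mwt >= threshold:
        return min(current_gear + 1, highest_gear)
    return max(current_gear - 1, lowest_gear)


mwt = max_waiting_time(arrival_rate=100, active_nodes=2, service_rate=40, num_requests=5)
print(next_gear(current_gear=2, mwt=mwt, threshold=0.01, lowest_gear=1, highest_gear=4))
```

With two active nodes each node receives requests faster than it can serve them, so the waiting time grows and the gear shifts up; with four active nodes the same load produces no waiting.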

 

4. Evaluation through mathematic analysis

4.1 Performance Model of the Gear-shifting Mechanism

This section establishes the performance model of the gear-shifting mechanism, in which fnumi is the number of nodes that hold file i, Nij is the j-th node that holds file i, rsNijk is the size of the k-th task on Nij, rnNij is the number of tasks on Nij, st is the average positioning time of the system disk (s), and tt is the transfer rate of the system disk (M/s).

Then the time of finishing k-th task of Nij is :

Assume the task of reading file i is allocated to Nij, the waiting time wtNij is as follow:

Then we can obtain the minimum waiting time required to read file i on the current gear is

Assume that t is the performance testing period and acr is the request arrival rate, so the number of tasks is numr=acr∗t. Let R={r1, r2,…, rnumr} be the set of files requested by the tasks, where sizei is the size (M) of file i. By formula (3), the minimum waiting time required to read file ri on the current gear is:

Then the average waiting time of the current gear is

The average actual service time is

Then the average response time is

The average delay is

Once the average response time of the cloud storage system (AAT) is known and the QoS can be guaranteed, the gear-shifting mechanism can be used to save energy while meeting the user's requirements.
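The quantities defined in this subsection can be combined in a short sketch. The formulas used here are assumptions inferred from the definitions, not the paper's equations: a task takes the positioning time st plus its size divided by the transfer rate tt, the waiting time on a node is the sum of the times of the tasks already queued there, and a read is sent to the replica node with the smallest waiting time. All function names are hypothetical.

```python
def task_time(size_mb, st, tt):
    """Assumed time to finish one task: positioning time plus transfer time."""
    return st + size_mb / tt


def node_waiting_time(queued_sizes_mb, st, tt):
    """Assumed waiting time on a node: total time of the tasks queued on it."""
    return sum(task_time(size, st, tt) for size in queued_sizes_mb)


def min_waiting_time(replica_queues, st, tt):
    """Smallest waiting time over all nodes that hold a replica of the file."""
    return min(node_waiting_time(queue, st, tt) for queue in replica_queues)


def average_response_time(requests, st, tt):
    """requests: list of (file_size_mb, replica_queues) pairs."""
    total = 0.0
    for size_mb, replica_queues in requests:
        total += min_waiting_time(replica_queues, st, tt) + task_time(size_mb, st, tt)
    return total / len(requests)


# One 20 MB read; two nodes hold the file, with 150 MB and 10 MB already queued.
print(average_response_time([(20, [[100, 50], [10]])], st=0.01, tt=100))
```

Opening more backup zones adds entries to `replica_queues`, which can only lower the minimum waiting time; this is the performance side of shifting to a higher gear.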

4.2 Energy Consumption Model of the Gear-shifting Mechanism

Based on the gear-shifting mechanism, we establish the corresponding energy consumption model. Let R={r1,…,rb,rp,…,rx} be the set of data requests, Rh={r1,…,rh,…,rb} the set of hot data requests, and Rc={rp,…,rc,…,rx} the set of cold data requests, where R=Rh∪Rc and Rh∩Rc=∅. Let F={f1,…,fu,fv,…,fm} be the set of data files, Fh={f1,…,fh,…,fu} the set of hot data files, and Fc={fv,…,fc,…,fm} the set of cold data files, where F=Fh∪Fc and Fh∩Fc=∅. Let Sh be the average size of a requested hot file and Sc the average size of a requested cold file, and let the current total number of disks be n.

We assume that each disk can be configured in high-speed rotation mode or low-speed rotation mode. Once the mode is set, the disk cannot be dynamically switched to the other mode while it is providing service, although the administrator can assign it to the other mode. D={d1,…,de,df,…,dn} is the set of disks opened on the current gear, Dh={d1,…,dh,…,de} is the set of fast-rotating disks, and Dc={df,…,dc,…,dn} is the set of relatively slow-rotating disks, where D=Dh∪Dc and Dh∩Dc=∅. th (Mb/s) is the transfer rate of a high-speed disk, ch (s) is the average seek time of a high-speed disk, ph (J/Mb) is the positive (active) energy of a high-speed disk, and ih (J/s) is the idle energy consumption of a high-speed disk. Likewise, tc (Mb/s) is the transfer rate of a low-speed disk, cc (s) is the average seek time of a low-speed disk, pc (J/Mb) is the positive energy of a low-speed disk, ic (J/s) is the idle energy consumption of a low-speed disk, and is (J/s) is the energy consumption of a hibernated disk. In addition, according to the relevant literature, the ratio of the transfer speeds of the disks is related to the ratio of their positive energy consumption and the ratio of their idle energy consumption [1].

Assume that the current gear has opened rmin+k−X (1−rmin≤k≤3) nodes, where rmin+k is the current maximum number of backups of the hot zone data and X indicates that the cold zone has currently opened X copies of its data. According to the data replicating strategy, the current number of hot nodes |Dh| is:

The current number of cold nodes |Dc| is $|D_c| = X(csn + cssn)$, so the number of disks in sleep status is $|D_s| = n - |D_h| - |D_c|$.

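Assume that the energy consumed when a request rh in the request set Rh accesses the file fh (fh∈Fh) is

$$S_h\,p_h$$

The average service time for the request rh accessing the file fh is

$$S_h/t_h + c_h$$

The total testing time is tceshi. The total energy consumed by the hot request set Rh accessing files is:

$$|R_h|\,S_h\,p_h$$

The total service time provided for the hot request set Rh accessing files is:

$$|R_h|\,(S_h/t_h + c_h)$$

During the test time, the idle energy consumption of the hot disks is:

$$i_h\,\bigl(|D_h|\,t_{ceshi} - |R_h|(S_h/t_h + c_h)\bigr)$$

Then the energy consumption of the hot disks is:

$$|R_h|\,S_h\,p_h + i_h\,\bigl(|D_h|\,t_{ceshi} - |R_h|(S_h/t_h + c_h)\bigr)$$

Similarly, the energy consumption of the cold disks is:

$$|R_c|\,S_c\,p_c + i_c\,\bigl(|D_c|\,t_{ceshi} - |R_c|(S_c/t_c + c_c)\bigr)$$

The energy consumption of the disks in sleep status is:

$$i_s\,|D_s|\,t_{ceshi}$$

Then, when hot and cold disks are distinguished, the total energy consumption is:

$$e_{total} = |R_h| S_h p_h + i_h\bigl(|D_h| t_{ceshi} - |R_h|(S_h/t_h + c_h)\bigr) + |R_c| S_c p_c + i_c\bigl(|D_c| t_{ceshi} - |R_c|(S_c/t_c + c_c)\bigr) + i_s |D_s| t_{ceshi}$$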

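Similarly, when the zones of the cloud storage system are not partitioned, the total energy consumption is:

$$e'_{total} = |R_h| S_h p_h + |R_c| S_c p_h + i_h\bigl(n\,t_{ceshi} - |R_h|(S_h/t_h + c_h) - |R_c|(S_c/t_h + c_h)\bigr)$$

Because $p_c < p_h$, we have $|R_c| S_c p_c < |R_c| S_c p_h$. Since $n = |D_h| + |D_c| + |D_s|$, the unpartitioned total can be written as

$$e'_{total} = |R_h| S_h p_h + |R_c| S_c p_h + i_h\bigl(|D_h| t_{ceshi} - |R_h|(S_h/t_h + c_h)\bigr) + i_h\bigl(|D_c| t_{ceshi} - |R_c|(S_c/t_h + c_h)\bigr) + i_h |D_s| t_{ceshi}$$

Then, because $i_s < i_c < i_h$, the idle and sleep terms can be bounded one by one.

As $t_h > t_c$, $S_c/t_h < S_c/t_c$; taking the seek time of the high-speed disk to be no longer than that of the low-speed disk ($c_h \le c_c$),

$$|D_c| t_{ceshi} - |R_c|(S_c/t_c + c_c) < |D_c| t_{ceshi} - |R_c|(S_c/t_h + c_h),$$

so, as $i_c < i_h$, $i_c\bigl(|D_c| t_{ceshi} - |R_c|(S_c/t_c + c_c)\bigr) < i_h\bigl(|D_c| t_{ceshi} - |R_c|(S_c/t_h + c_h)\bigr)$.

As $i_s |D_s| t_{ceshi} < i_h |D_s| t_{ceshi}$, we obtain

$$|R_h| S_h p_h + |R_c| S_c p_h + i_h\bigl(|D_h| t_{ceshi} - |R_h|(S_h/t_h + c_h)\bigr) + i_c\bigl(|D_c| t_{ceshi} - |R_c|(S_c/t_c + c_c)\bigr) + i_s |D_s| t_{ceshi} < e'_{total}$$

The partitioned total $e_{total}$ is the left-hand side with $|R_c| S_c p_h$ replaced by the smaller term $|R_c| S_c p_c$, so that $e_{total} < e'_{total}$.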

The above mathematical derivation shows that E2GSM is energy effective in theory. In the next section, we carry out simulation experiments to verify that it is energy effective at the cost of an acceptable performance loss.
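The comparison between the partitioned and unpartitioned totals can be checked numerically. The sketch below follows the unpartitioned expression given in this section and assumes that the partitioned total is the sum of the hot, cold, and sleep terms in the same form. All parameter values are made up for illustration and are not taken from Table 2; the function names are hypothetical.

```python
def energy_unpartitioned(n, t_test, R_h, R_c, S_h, S_c, p_h, i_h, t_h, c_h):
    """All n disks run at high speed; energy in joules."""
    busy = R_h * (S_h / t_h + c_h) + R_c * (S_c / t_h + c_h)
    return R_h * S_h * p_h + R_c * S_c * p_h + i_h * (n * t_test - busy)


def energy_partitioned(D_h, D_c, D_s, t_test, R_h, R_c, S_h, S_c,
                       p_h, p_c, i_h, i_c, i_s, t_h, t_c, c_h, c_c):
    """Hot, cold, and sleeping disks are accounted for separately."""
    hot = R_h * S_h * p_h + i_h * (D_h * t_test - R_h * (S_h / t_h + c_h))
    cold = R_c * S_c * p_c + i_c * (D_c * t_test - R_c * (S_c / t_c + c_c))
    sleep = i_s * D_s * t_test
    return hot + cold + sleep


# Illustrative values only: 6 hot, 2 cold, 2 sleeping disks over a 100 s test.
common = dict(t_test=100, R_h=1000, R_c=100, S_h=1, S_c=1, p_h=0.2, i_h=8, t_h=50, c_h=0.005)
e_prime = energy_unpartitioned(n=10, **common)
e_total = energy_partitioned(D_h=6, D_c=2, D_s=2, p_c=0.1, i_c=4, i_s=1,
                             t_c=25, c_c=0.01, **common)
print(e_total, e_prime)
```

For these values the partitioned configuration uses less energy, consistent with the inequality derived above; the gap comes mostly from the lower idle power of cold and sleeping disks.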

 

5. Simulations and Discussions

The proposed E2GSM is integrated into the GridSim simulator [23], in which the corresponding energy parameters are added to the nodes and the data classification mechanism and replica management strategy are embedded. To verify the effectiveness of the proposed mechanism, the traditional performance (mean response time of requests) and the energy consumption are compared among a cloud storage system without any data partition algorithm (NPS), a cloud storage system integrated with the traditional data classification algorithm (TDCS) [8], and a cloud storage system integrated with our energy-effective gear-shifting mechanism (E2GSM). We assume that the number of replicas in the NPS system is 3, that the number of replicas in the TDCS system is 3 for hot data and 1 for cold data, and that the seasonal hot data in the TDCS system is all stored in a hot zone. We evaluate how performance and energy consumption are affected by different synthetic loads, different proportions of hot data and seasonal hot data, and different ratios of hot disks to cold disks.

5.1 Parameter Description

The nodes in our simulation experiments are implemented on top of the cold/hot disk array simulator. Two kinds of parameters directly affect the simulation results: the characteristics of the disks and the characteristics of the workload. The disk-related parameters are shown in Table 2.

Table 2. Disk-related parameters

Workload characteristics are affected by many parameters, of which we identify the following five as the major ones:

(1) The Number of Files. Because the total number of files directly determines how the load is distributed over the disks, we set the total number of files to 5000. The number of files on each disk is determined by the actual situation.

(2) Request Arrival Distribution. The request arrival rate directly affects the load trend in the cloud storage system, and thereby the gears and the resulting level of energy consumption. This paper assumes that request inter-arrival times follow an exponential distribution, such as exp(100), exp(50), exp(20), exp(15), exp(10), exp(9), exp(8), exp(5), and exp(3), where exp(a) indicates that the mean inter-arrival time is a ms.

(3) Data Temperature Distribution. The proportion of hot and cold data affects the data replication strategy, and thereby the gear shifting and the energy consumption. In the experiments, we assume that data temperature follows a Zipf distribution. Different Zipf indices direct requests to hot and cold files in different proportions; the index expresses that A percent of all accesses are directed to B percent of the files. In general, we set the value of θ to 1.8.

(4) The Coverage of the File System. We set the coverage of the system to 100%, which means that every file in the file system is accessed at least once in the parallel disk array system.

(5) The Ratio of Hot Disks to Cold Disks. Setting the ratio of hot disks to cold disks reasonably can save energy effectively. Based on the previous formula, we set the ratio of hot disks to cold disks to 1:2, 2:2, 3:2, 4:2, 5:2, 6:2, 7:2, 8:2, 9:2, 10:2, 11:2, 12:2, and 13:2 respectively, where E2GSM(k:l) indicates that the number of hot nodes in E2GSM is k and the number of cold nodes is l. The default value is 6:2.
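A synthetic workload with the first three characteristics can be sketched with the Python standard library. This is an illustration, not the generator used in the experiments: file popularity is modeled with weights proportional to 1/rank^exponent, and whether the θ in the text corresponds to this exponent is not stated, so the value used below is a placeholder. The function names are hypothetical.

```python
import random


def exponential_arrivals(mean_interval_ms, count, rng):
    """Arrival times (ms) whose gaps are exponential with the given mean."""
    now = 0.0
    arrivals = []
    for _ in range(count):
        now += rng.expovariate(1.0 / mean_interval_ms)  # rate = 1 / mean
        arrivals.append(now)
    return arrivals


def zipf_weights(num_files, exponent):
    """Normalized popularity weights, with rank 1 as the hottest file."""
    raw = [1.0 / (rank ** exponent) for rank in range(1, num_files + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_requests(num_files, exponent, count, rng):
    """File ids (0 is the hottest) drawn according to the Zipf-like weights."""
    weights = zipf_weights(num_files, exponent)
    return rng.choices(range(num_files), weights=weights, k=count)


rng = random.Random(42)  # fixed seed so a run can be repeated
times = exponential_arrivals(mean_interval_ms=10, count=1000, rng=rng)
files = sample_requests(num_files=5000, exponent=1.8, count=1000, rng=rng)
print(times[-1], len(set(files)))
```

A smaller mean interval, such as exp(3), produces a heavier load than exp(100), which is the variation explored by the synthetic loads in the experiments.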

5.2 Different Synthetic Load Impact on Performance and Energy Consumption

To evaluate the performance and energy efficiency of E2GSM, we select different synthetic loads: S11, S12, S13, S14, S15, S16, S17, S18, and S19, where the default proportion of seasonal hot data is 4% (the detailed parameters are listed in Table 3 and Table 4). We adopt the energy reduction percentage (compared with NPS) and the mean response time as metrics. The simulation experiments run on the modified GridSim (in which the properties of the nodes and the gridlets are extended and TDCS and E2GSM are implemented in the related classes), and the results are shown in Fig. 3 and Fig. 4. As shown in Fig. 3, the highest energy reduction percentage of TDCS is about 16% and the lowest is about 13%, while the average energy reduction percentage of the proposed E2GSM(6:2) system is about 43%, with a minimum of about 16% and a maximum of about 70%. E2GSM is clearly effective, saving about 28% more energy than TDCS. As shown in Fig. 4, the response time of E2GSM is 0.2 milliseconds higher than that of the TDCS system, with a maximum difference of about 1.7 ms from the NPS system. The performance loss of E2GSM is small, while the energy saving is significant. When the load is heavy, the response time of E2GSM is 6 milliseconds higher than that of NPS, which shows that under heavy load the partition into hot and cold disks may fail to make full use of all the nodes in the system, thereby reducing system performance. However, even under heavy load, E2GSM can still meet the users' requirements and reach nearly 16% energy reduction at the expense of an increase of about 6 ms in response time.

Table 3. Description for the Relevant Data Used in the Experiments

Table 4. Synthetic Load Table

Fig. 3. Load impact on the energy consumption of TDCS and E2GSM

Fig. 4. Load impact on mean response time of NPS, TDCS, and E2GSM
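
The two metrics used in these experiments can be illustrated with a minimal sketch. It assumes that the energy reduction percentage is the relative saving with respect to the NPS baseline; the function names and all numeric values are hypothetical and are not taken from the simulation results.

```python
from statistics import mean


def energy_reduction_pct(baseline_energy: float, scheme_energy: float) -> float:
    """Energy saved by a scheme relative to the baseline (NPS), in percent."""
    if baseline_energy <= 0:
        raise ValueError("baseline energy must be positive")
    return 100.0 * (baseline_energy - scheme_energy) / baseline_energy


def mean_response_time_ms(response_times_ms) -> float:
    """Arithmetic mean of the per-request response times, in milliseconds."""
    return mean(response_times_ms)


if __name__ == "__main__":
    # Hypothetical energy totals (arbitrary units) for one synthetic load.
    nps_energy = 1000.0
    e2gsm_energy = 570.0
    print(energy_reduction_pct(nps_energy, e2gsm_energy))

    # Hypothetical per-request response times in milliseconds.
    print(mean_response_time_ms([1.0, 2.0, 3.0]))
```

Both metrics are computed per scheme and per load, so a curve such as those in Fig. 3 and Fig. 4 corresponds to one value per synthetic load.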

5.3 Proportion of Hot data and Seasonal Hot Data Impact on Performance and Energy Consumption

In order to evaluate the impact of the proportion of seasonal hot data on performance and energy efficiency, we set the proportion of seasonal hot data to 2%, 4%, 6%, 8%, 10%, 12%, 14%, 16%, 18%, and 20%, respectively, in our simulation experiments, in which the synthetic load is S15. We again adopt the energy reduction percentage (compared to NPS) and the mean response time as the metrics. The results of the simulation experiments run in Gridsim are shown in Fig. 5 and Fig. 6.

Fig. 5. Proportion of hot data and seasonal hot data impact on energy consumption

Fig. 6. Proportion of hot data and seasonal hot data impact on mean response time

As shown in Fig. 5, the energy reduction percentages of TDCS and E2GSM (6:2) decrease gradually as the proportion of seasonal hot data increases. However, the E2GSM proposed in this paper saves more energy than TDCS. Compared to NPS, the energy reduction percentage of TDCS is about 15.5% and that of E2GSM (6:2) is about 42%. As shown in Fig. 6, the response time of the E2GSM (6:2) system is about 0.2 ms higher than that of the TDCS system, so the performance difference between the two strategies is not significant. However, the average response time of E2GSM (6:2) is about 2.3 ms higher than that of NPS. E2GSM (6:2) thus reduces energy consumption at the cost of performance, but the simulation also shows that E2GSM (6:2) can meet the requirements of the user through the gear-shifting mechanism.

5.4 The Ratio of Hot Disk and Cold Disk Impact on Energy Consumption and Performance

In order to evaluate the impact of the ratio of hot disks to cold disks on performance and energy efficiency, we set the ratio to E2GSM (1:2), E2GSM (2:2), E2GSM (3:2), E2GSM (4:2), E2GSM (5:2), E2GSM (6:2), E2GSM (7:2), E2GSM (8:2), E2GSM (9:2), E2GSM (10:2), E2GSM (11:2), E2GSM (12:2), and E2GSM (13:2), where the synthetic load is S11 and the default proportion of seasonal hot data is 4%. The simulation results are shown in Fig. 7 and Fig. 8.

Fig. 7. Ratio of Hot Disk and Cold Disk Impact on Energy Consumption in E2GSM

Fig. 8. Ratio of Hot Disk and Cold Disk Impact on Mean Response Time in E2GSM

As shown in Fig. 7, E2GSM reduces energy consumption compared to both NPS and TDCS. In particular, the maximum energy reduction percentage reaches about 60% compared to NPS. As shown in Fig. 8, the mean response time of E2GSM is higher than that of NPS and TDCS; however, the QoS can still be met through the automatic gear-shifting mechanism in E2GSM.

6. Conclusion and Future Work

This paper proposed an energy effective gear-shifting mechanism (E2GSM) in Cloud Storage System, which includes a data classification mechanism and a data replication management strategy. E2GSM automatically carries out gear-shifting by using a neural network model to predict the load of the next period. A mathematical analytical model verifies that E2GSM is energy effective. Simulation experiments based on Gridsim show that the proposed gear-shifting mechanism is cost effective: it saves energy substantially at the expense of a slight performance loss while meeting the QoS of the user. One direction of our future work is to implement E2GSM in Hadoop, which will be deployed in the Hangzhou Dianzi Cloud Storage System, and then to evaluate the energy effectiveness of E2GSM in a real Cloud Storage System. The other is to encapsulate more detailed QoS in the request of the user and to design a more flexible gear-shifting mechanism that saves energy more effectively.

References

  1. Elnozahy, M., Kistler, M. and Rajamony, R. et al., “Energy conservation policies for web servers,” in Proc. of the 4th USENIX Symposium on Internet Technologies and Systems, Berkeley, CA, USA, 26–28 March 2003.
  2. Raghavendra, R., Ranganathan, P., Talwar, V., Wang, Z. and Zhu, X., “No "power" struggles: coordinated multi-level power management for the data center,” SIGARCH Comput. Archit. News, 36, 48–59, 2008. https://doi.org/10.1145/1353534.1346289
  3. Pinheiro E, Bianchini, “Energy Conservation Techniques for Disk Array-Based Servers [C],” in Proc. of the 18th International Conference on Supercomputing (ICS), New York, NY, USA: ACM, 68-78, 2004.
  4. Andrew Krioukov, Sara Alspaugh, et al., “Design and Evaluation of an Energy Agile Computing Cluster,” Electrical Engineering and Computer Sciences, University of California at Berkeley, Technical Report No. UCB/EECS-2012-13, Jan. 17, 2012.
  5. Weddle C, Oldham M, Qian Jin, et al., “PARAID: A Gear-Shifting Power-Aware RAID [C],” in Proc. of the 5th USENIX Conference on File and Storage Technologies (FAST), Berkeley, CA, USA: USENIX, 245-260, 2007.
  6. Liu Jingyu, Zheng Jun, Li Yuanzhang, Sun Zhizhuo and Wang Wenming, “Hybrid S-RAID: An Energy-Efficient Data Layout for Sequential Data Storage [J],” Journal of Computer Research and Development, 50(1), 2013.
  7. Zhang Guangyan, Qiu Jianping, “An Approach for Migrating Data Adaptively in Hierarchical Storage Systems [J],” Journal of Computer Research and Development, 49(8), 2012.
  8. Kaushik, R. T. and Bhandarkar, M., “GreenTDCS: towards an energy-conserving, storage-efficient, hybrid hadoop compute cluster,” in Proc. of the 2010 International Conference on Power Aware Computing and Systems, HotPower '10, USENIX Association, Berkeley, CA, USA, 1-9, 2010.
  9. Kaushik, R. T., Cherkasova, L., Campbell, R., and Nahrstedt, K., “Lightning: self-adaptive, energy-conserving, multi-zoned, commodity green cloud storage system,” in Proc. of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC '10, ACM, New York, NY, USA, 332-35.
  10. Zhu, Q. and Zhou, Y., “Power-aware storage cache management,” IEEE Trans. Computers, 54, 587-602, 2005.
  11. Kim, J. and Rotem, D., “Energy proportionality for disk storage using replication,” in Proc. of the 14th International Conference on Extending Database Technology, EDBT/ICDT'11, ACM, New York, NY, USA, 81-92, 2011.
  12. Li H., “REST: A Redundancy-Based Energy-Efficient Cloud Storage System [C],” in Proc. of the 2012 13th International Conference on Parallel and Distributed Computing, Applications and Technologies, IEEE Computer Society, 537-542, 2012.
  13. Saiqin Long, Yuelong Zhao and Wei Chen, “A three-phase energy-saving strategy for cloud storage systems,” Journal of Systems and Software, vol. 87, no. 1, pp. 38-47, January 2014. https://doi.org/10.1016/j.jss.2013.08.018
  14. Long S Q, Zhao Y L, Chen W., “MORM: A Multi-objective Optimized Replication Management strategy for cloud storage cluster [J],” Journal of Systems Architecture, 2013.
  15. Kim, J. and Rotem, D., “Energy proportionality for disk storage using replication,” in Proc. of the 14th International Conference on Extending Database Technology, EDBT/ICDT'11, ACM, New York, NY, USA, 81-92.
  16. Qingsong Wei, Bharadwaj Veeravalli, Bozhao Gong, Lingfang Zeng, Dan Feng, “CDRM: A Cost-effective Dynamic Replication Management Scheme for Cloud Storage Cluster,” in Proc. of 2010 IEEE International Conference on Cluster Computing, 2010.
  17. L. A. Barroso and U. Hölzle, “The Case for Energy-Proportional Computing,” IEEE Computer, 40(12), 2007. https://doi.org/10.1109/MC.2007.443
  18. Mais Nijim, Xiao Qin, Meikang Qiu and Kenli Li, “An adaptive energy-conserving strategy for parallel disk systems,” Journal of Future Generation Computer Systems 29, 196-20, 2013. https://doi.org/10.1016/j.future.2012.05.003
  19. Hrishikesh Amur, James Cipar et al., “Robust and Flexible Power-Proportional Storage,” in Proc. of the 1st ACM Symposium on Cloud Computing, SoCC’10, June 10–11, Indianapolis, Indiana, USA, 2010.
  20. A. Verma, R. Koller, L. Useche, and R. Rangaswami, “SRCMap: energy proportional storage using dynamic consolidation,” FAST, pages 267–280, 2010.
  21. Jinoh Kim, Jerry Chou, and Doron Rotem, “Energy Proportionality and Performance in Data Parallel Computing Clusters,” SSDBM, 20-22 July 2011.
  22. Xindong You, Chi Dong, Li Zhou, et al., “Anticipation-based green data classification strategy in Cloud Storage System,” App. Math. Inf. Sci. 6, No. 1, 29-37, 2014.
  23. Gridsim simulator.
  24. Guangjie Han, Wenhui Que, Gangyong Jia, Lei Shu, “An Efficient Virtual Machine Consolidation Scheme for Multimedia Cloud Computing,” Sensors, Vol. 16, No. 2, Article 246, 2016.
  25. Guangjie Han, Liangtian Wan, Lei Shu, Naixing Feng, “Two Novel DoA Estimation Approaches for Real Time Assistant Calibration System in Future Vehicle Industrial,” IEEE Systems Journal, 2015.
  26. Nagamani H Shahapure, P Jayarekha, “Replication: A Technique for scalability in Cloud Computing,” International Journal of Computer Applications (0975-8887), Vol. 122, No. 5, July 2015.
  27. Wenhao Li, Dong Yuan, “A Novel Cost-effective Dynamic Data Replication Strategy for Reliability in Cloud Data Centres.”
  28. Tao Chen, Rami Bahsoon, Abdel-Rahman H. Tawil, “Scalable service-oriented replication with flexible consistency guarantee in the cloud,” Information Sciences 264, 349–370, 2014. https://doi.org/10.1016/j.ins.2013.11.024
  29. Dzmitry Kliazovich, Pascal Bouvry, Samee Ullah Khan, “GreenCloud: a packet-level simulator of energy-aware cloud computing data centres,” Journal of Supercomputing, 2011.
  30. Dejene Boru, Dzmitry Kliazovich, et al., “Energy-efficient data replication in cloud computing datacenters,” Journal of Cluster Computing, 2014.
  31. Zhihua Xia, Xinhui Wang, Xingming Sun, and Qian Wang, "A Secure and Dynamic Multi-keyword Ranked Search Scheme over Encrypted Cloud Data," IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 2, pp. 340-352, 2015. https://doi.org/10.1109/TPDS.2015.2401003
  32. Zhangjie Fu, Xingming Sun, Qi Liu, Lu Zhou, and Jiangang Shu, "Achieving Efficient Cloud Search Services: Multi-keyword Ranked Search over Encrypted Cloud Data Supporting Parallel Computing," IEICE Transactions on Communications, vol. E98-B, no. 1, pp. 190-200, 2015. https://doi.org/10.1587/transcom.E98.B.190
  33. Yongjun Ren, Jian Shen, Jin Wang, Jin Han, and Sungyoung Lee, "Mutual Verifiable Provable Data Auditing in Public Cloud Storage," Journal of Internet Technology, vol. 16, no. 2, pp. 317-323, 2015. https://doi.org/10.6138/JIT.2015.16.2.20140918
  34. Tinghuai Ma, Jinjuan Zhou, Meili Tang, Yuan Tian, Abdullah Al-Dhelaan, Mznah Al-Rodhaan, and Sungyoung Lee, "Social network and tag sources based augmenting collaborative recommender system," IEICE Transactions on Information and Systems, vol. E98-D, no. 4, pp. 902-910, Apr. 2015. https://doi.org/10.1587/transinf.2014EDP7283
  35. Dawei Sun, Guangyan Zhang, Songlin Yang, Weimin Zheng, Samee U. Khan, Keqin Li, “Re-Stream: real-time and energy-efficient resource scheduling in big data stream computing environments,” Information Sciences, 319: 92-112, 2015. https://doi.org/10.1016/j.ins.2015.03.027
  36. G. Jia, G. Han, J. Jiang, J. Rodrigues, “PARS: A Scheduling of Periodically Active Rank to Optimize Power Efficiency for Main Memory,” Journal of Network and Computer Applications, Vol. 58, pp. 327-336, 2015. https://doi.org/10.1016/j.jnca.2015.08.001
  37. G. Jia, G. Han, D. Zhang, L. Li, L. Shu, “An Adaptive Framework for Improving Quality of Service in Industrial Systems,” IEEE Access, Vol. 3, pp. 2129-2139, 2015. https://doi.org/10.1109/ACCESS.2015.2496959
  38. G. Jia, G. Han, J. Jiang, N. Sun, K. Wang, “Dynamic Resource Partitioning for Heterogeneous Multi core-based Cloud Computing in Smart Cities,” IEEE Access, Vol. 4, pp. 108-118, 2016. https://doi.org/10.1109/ACCESS.2015.2507576
  39. G. Jia, X. Li, J. Wan, L. Shi and C. Wang, “Coordinate Page Allocation and Thread Group for Improving Main Memory Power Efficiency,” in Proc. of HotPower, in conjunction with SOSP’13, 2013.