
So many people are surprised when their restore is slower than their backup. You shouldn’t be, as it’s quite common. The good news is there are things you can do to make it faster – if you know them in advance. W. Curtis Preston (Mr. Backup) and Prasanna Malaiyandi tackle the seven reasons why your restore may be slower than your backup. Topics covered include RAID penalties, tape issues, database concerns, and others. You’ll walk away knowing what to do in order to find out how slow your restores are – and how to fix them. This podcast is packed with good info! (And the death of a USB hub.)


Transcript

 

[00:00:00] Prasanna Malaiyandi: Are we going to have a funeral for your USB hub?

[00:00:04] W. Curtis Preston: I don’t think we’ll have a funeral.

[00:00:06] Prasanna Malaiyandi: Here lies a USB hub. It served me well.

[00:00:32] W. Curtis Preston: Hi, and welcome to Backup Central’s Restore it All podcast. I’m your host, W. Curtis Preston. AKA Mr. Backup and I have with me, my computer peripheral consultant, Prasanna Malaiyandi. How’s it going?

[00:00:43] Prasanna Malaiyandi: Good. I do not specialize in recommending mice or keyboards or other accessories

[00:00:50] W. Curtis Preston: Uh, but you know, as usual you came in very handy during my peripheral crisis, the Preston, the Preston peripheral put, I need something with a P Preston peripheral. Come on, give me a word, a problem.

[00:01:13] Prasanna Malaiyandi: It’s going to be one of those.

[00:01:15] W. Curtis Preston: the Preston peripheral problem of 2022, uh, I got to buy two new USB hubs, man, and no turning it off and on again. Didn’t fix it.

[00:01:26] Prasanna Malaiyandi: I’m telling

[00:01:26] W. Curtis Preston: out thing.

[00:01:27] Prasanna Malaiyandi: surge protectors.

[00:01:29] W. Curtis Preston: Yeah. I don’t know why I don’t have a surge protector. You think, you know, as a computer guy, I would know.

[00:01:34] Prasanna Malaiyandi: Well, and it’s funny that you bring this up because literally two days ago I just went and replaced most of the surge protectors in my house. Did you know, surge protectors only have a finite amount of life. And then after that they do nothing.

[00:01:48] W. Curtis Preston: no,

[00:01:49] Prasanna Malaiyandi: Yeah,

[00:01:50] W. Curtis Preston: you know what I do on all my outlets that’s not a surge protector, what the thing is, this is what I’m curious. I got to look into and see if it is a surge protector. Although if it did it, didn’t do its job. Um, I have the.

[00:02:03] Prasanna Malaiyandi: whole house.

[00:02:05] W. Curtis Preston: No, no. The, the, the power monitoring thingy, um, I have one on every outlet that matters, uh, so that I can figure out, you know, where all the power’s going to my house.

[00:02:22] Prasanna Malaiyandi: Gotcha. But those don’t do surge protection.

[00:02:25] W. Curtis Preston: You don’t think so.

[00:02:25] Prasanna Malaiyandi: Nope. And

[00:02:27] W. Curtis Preston: You’d for all that money that, you know, cause it’s one outlet.

[00:02:30] Prasanna Malaiyandi: yeah. Well, so here’s the thing with surge protectors. Uh, there’s also a notion, so I did a lot of research because that’s just the type of

[00:02:38] W. Curtis Preston: Of course you did. How many YouTube videos did you watch?

[00:02:41] Prasanna Malaiyandi: None. I just read a lot of articles, but, but there is something right. Some surge protectors have what they call let-through voltage or a pass-through voltage, which is how much it actually allows in before it like clamps down on the surge, because that’s what a surge protector is supposed to do.

Right. You get a spike and it’s supposed to clamp down to prevent it. And so some of them have. Normally you want it to be like 400 volts or less, which is still a lot of voltage which could fry your device, but it’s much better than letting it all pass through. And so the lower, the number, the better it is.

And the challenge is a lot of surge protectors after their life has gone, they don’t automatically shut down. So they’re just kind of letting everything pass through and they’re not protecting you at all,

but there are some brands. Yeah. There are some brands where, once the protection is gone, it actually shuts off the outlet.

[00:03:35] W. Curtis Preston: Huh,

[00:03:36] Prasanna Malaiyandi: So, you know, you have to replace it because how many people go and look at the green little protected light that’s on their surge protector, right? That’s hidden in a corner behind, like underneath your desk. Like no one ever does that. And so.

[00:03:49] W. Curtis Preston: you know, this falls under the, I could go and spend a whole lot of money every few years, and I don’t feel like I should have to. Like, it it’s bad enough that I got to buy it in the first place, but then if I got to I, and so I didn’t know this, I didn’t know, to even look at the little green light. I didn’t know that was a thing.

Am I losing my, my tech cred?

[00:04:13] Prasanna Malaiyandi: Some of them don’t but it’s one of the things you should just take a look and yeah, they say two to five years. It also depends on like the power in your area and how clean it is. If you get a lot of spikes, things like that, or you can live in like an area with lots of lightning. I think California, it’s pretty good for the most part.

[00:04:32] W. Curtis Preston: We would need to have some rain,

[00:04:35] Prasanna Malaiyandi: Yeah. So not as big of a concern, but it’s just one of those things that periodically you might want to change or if you have like a UPS. Yeah. Or if you have a

[00:04:45] W. Curtis Preston: on the other hand,

[00:04:46] Prasanna Malaiyandi: yeah. Florida on the other hand, lots of lightning and everything else, but yeah. Or if you have a UPS, typically that already has surge protection built in as well.

[00:04:54] W. Curtis Preston: Right, right.

[00:04:56] Prasanna Malaiyandi: So things to think about.

[00:04:57] W. Curtis Preston: Once Again, see, this is why you’re my computer peripheral consultant. Doesn’t that, that counts as a peripheral. Doesn’t it. But a surge protector

[00:05:07] Prasanna Malaiyandi: I think so.

[00:05:08] W. Curtis Preston: into the computer.

[00:05:09] Prasanna Malaiyandi: Yeah.

[00:05:10] W. Curtis Preston: It’s a accessory. It’s it’s on the periphery.

[00:05:14] Prasanna Malaiyandi: Yeah. Curtis, Curtis.

[00:05:19] W. Curtis Preston: Uh, it’s been an interesting week. We did have a minor and I mean really minor, just random surge a while back. And, um, and my, and my USB hub just stopped delivering data. Like it

[00:05:35] Prasanna Malaiyandi: But that’s weird that it only stops delivering data.

[00:05:39] W. Curtis Preston: yeah. Yeah. And it was, it was immediately that moment because I was actually using my camera, which is a USB device.

And it just, you know, the second it happened, it was like, Oop, no

[00:05:50] Prasanna Malaiyandi: but everything else works. Right. All the other devices plugged in. So you should probably be glad that that $30 hub took the hit rather than you having to go replace like five devices.

[00:06:02] W. Curtis Preston: Yeah. That’s good. Yeah. It’s just like, I’m trying to figure out, well, I can’t, I can’t figure out exactly what happened. You know what I mean? From an electrical electrical perspective,

[00:06:12] Prasanna Malaiyandi: No.

[00:06:13] W. Curtis Preston: it just, it fried the brains, but it didn’t fry. You know what it is? Is it fried the chip?

[00:06:19] Prasanna Malaiyandi: Yeah, but not the power circuit.

[00:06:21] W. Curtis Preston: power circuit. Yeah.

The power circuit is probably pretty, pretty basic. And then, yeah, good times. May you live in interesting times?

[00:06:32] Prasanna Malaiyandi: Are we going to have a funeral for your USB hub?

[00:06:37] W. Curtis Preston: I don’t think we’ll have a funeral.

[00:06:39] Prasanna Malaiyandi: Here lies a USB hub. It served me well.

[00:06:45] W. Curtis Preston: It served me well. Curtis should have bought a surge protector.

I want to move on.

[00:06:52] Prasanna Malaiyandi: What do you want to talk about?

[00:06:54] W. Curtis Preston: I want to talk about slow restores.

This is seven ways to have a slow restore. Should be really popular as a podcast, seven ways to have a slow restore. And, uh, it’s based on this article that I found on Network World, this guy seems to really know what he’s talking about. What do you think?

[00:07:17] Prasanna Malaiyandi: Who was it? The

[00:07:19] W. Curtis Preston: Uh, it’s a W. Curtis Preston, that guy.

[00:07:26] Prasanna Malaiyandi: Is he a relative of yours?

[00:07:28] W. Curtis Preston: He is related He is related. Uh, I see him on a pretty regular basis. Uh, although sometimes when I’m looking at him, uh, I

[00:07:36] Prasanna Malaiyandi: He’s dashing.

[00:07:38] W. Curtis Preston: Oh, he’s, he’s gorgeous. Uh, and um, I like the picture that they have on the article. Just some random dude looking into some sort of computer innards, like he’s going to figure out anything, but. You know what I mean? Like you look at that picture. What’s that guy going to figure out.

[00:07:58] Prasanna Malaiyandi: everything in the world,

[00:08:00] W. Curtis Preston: Yeah. So I I’d like to start this podcast with a story.

[00:08:06] Prasanna Malaiyandi: not our disclaimer.

[00:08:08] W. Curtis Preston: oh yeah, sure. We’ll do the disclaimer. Prasanna and I work for different companies. He works for Zoom. I work for Druva. This is not a, uh, this is not a podcast of either company. The opinions that you hear are all Prasanna’s and, uh, be sure to rate our podcast at ratethispodcast.com/restore.

And if you care about this topic and any of the related topics, security, you know, cybersecurity, ransomware, backup, recovery, disaster recovery, uh, you know, I don’t know, did I forget a category? So it’s barbecue and what’s that all privacy. Yeah, absolutely. If you’re, you know, anything that we can, that’s in the periphery, it’s a big word today.

that word’s going to pop up at least one

[00:08:56] Prasanna Malaiyandi: the word of the day.

[00:08:58] W. Curtis Preston: Uh, then just reach out to me @wcpreston on Twitter, you can DM me or, uh, wcurtispreston@gmail. And, uh, so yeah, so I wanna, I want to tell a little story and I’m I’m, if you’re a, if you’re a longtime listener of the podcast, you may have heard this story before, but you know, based on the listenership, I don’t think anybody out there has listened to all the podcasts except for maybe Daniel Rosehill.

Hi Daniel. Um, he’s our backup anorak.

[00:09:26] Prasanna Malaiyandi: Hi Daniel, hope the M-disc is working out.

[00:09:29] W. Curtis Preston: He’s he’s been a guest on the podcast and, uh, you know, big fan of the podcast anyway. So back in the day when I got my first. Uh, commercial backup and recovery tool. And I, and I can actually say what it was because this it’s a, it’s a company that’s gone by the wayside.

The company’s name was Software Moguls. They were headquartered in, um, uh, Minnesota, in a suburb of Minneapolis, and the, the name of the product was SM-arch.

[00:10:06] Prasanna Malaiyandi: I think you brought this up a couple times.

[00:10:07] W. Curtis Preston: Yeah. Yeah. W which is funny because you know, it should be a SM dash back, but that’s a whole other thing. That’s a different podcast.

Um, cause archive is not backup, right? Yeah. Okay. And in case that wasn’t obvious to everybody, and if you don’t understand the difference then look at our podcasts, we definitely have talked about that topic.

[00:10:28] Prasanna Malaiyandi: Or purchase curtis’s book.

[00:10:31] W. Curtis Preston: Well, actually, you know what? You don’t even have to purchase it. Now you can get a, uh, you know, if you, if you do it in time, uh, for a limited time only, you can get a free ebook version of my book by going to druva.com/podcast. And you can get a free copy. So the, um, and there is a whole chapter that basically says archive is not backup. And wow. I really got off the topic. All right. So they had a feature. This is way before deduplication. This is way before multiplexing, but they had a feature that was inline compression. And this was again, before all tape drives had compression. And so they were going to really make my tapes like so much bigger.

Right. And. So I turned on this feature and, you know, I had been running this new commercial backup product for a couple of months, but being the paranoid backup person that I was, I still had the old system running in parallel.

[00:11:40] Prasanna Malaiyandi: just in case.

[00:11:42] W. Curtis Preston: Yeah. And then we had our first major restore. And I remember, um, I remember exactly where I was.

I remember exactly, you know, where the server was. And I remember that I had to hop in my car and drive down. Um, if there’s any listeners in Delaware, I was in, I was on Christiana road. Uh, Newark Delaware. And I drove down the street and I, I remember going in there and what I did is I put the old backup tapes in my back pocket, but I brought the new fancy backup tapes in my front pocket.

And I put the tape in the drive and I went to go, um, I kicked off the first restore and me being who I was, I created a while loop that, you know, you know, while true; do df -k /directory; sleep 60; done. Right. And I’m watching this thing and I’m watching it. I’m watching, I’m watching it.

It’s not changing. The restore is just running and like, after what I felt was a really long period of time, it finally changed to 1%. And I’m like, based on, based on the current rate, this restore is going to take, yeah, it was going to take forever. And by the way, it was probably two gigabytes, you know what I mean?

Yeah. Back in the day it was, it was, it was probably less than two gigabytes. Cause I remember our biggest server was Zeus and Zeus was, uh, six gigabytes. Um, and that was, that was the entire server. So this is one file system. So it could have been, you know, it could have been one gigabyte. And so I was like, what is going on?
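The "watch df in a loop and extrapolate" approach from the story can be sketched in Python. This is a minimal illustration, not a tool from the episode: the function names, the polling defaults, and the idea of passing in an expected restore size are all assumptions made for the example.

```python
import shutil
import time


def estimate_remaining_seconds(samples, expected_bytes):
    """Estimate time left from (timestamp_seconds, bytes_used) samples.

    Returns None when there are too few samples or no progress yet.
    """
    if len(samples) < 2:
        return None
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    restored = b1 - b0
    elapsed = t1 - t0
    if restored <= 0 or elapsed <= 0:
        return None
    rate = restored / elapsed  # bytes per second so far
    remaining = max(expected_bytes - restored, 0)
    return remaining / rate


def watch_restore(path, expected_bytes, interval=60, polls=10):
    """Poll used space on the file system holding `path` and print a rough ETA."""
    samples = []
    for _ in range(polls):
        samples.append((time.time(), shutil.disk_usage(path).used))
        eta = estimate_remaining_seconds(samples, expected_bytes)
        print("ETA unknown" if eta is None else f"about {eta / 60:.0f} min left")
        time.sleep(interval)
```

Calling something like `watch_restore("/directory", expected_bytes)` during a test restore gives an early warning when the projected finish time is far beyond what the business expects.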

And I went over to the tape drive and I’m looking at the tape drive and I’m looking at the little Blinky light that indicates that that data is being read or written. And I see blink, blink.

[00:13:40] Prasanna Malaiyandi: pause.

[00:13:42] W. Curtis Preston: Long pause, long pause, blink, blink. And this went on. I made like a 911 call to Software Moguls. I’m like, Hey man, whiskey tango foxtrot (WTF)? I’m restoring this primary server. This is the first time I am using my new fancy backup system that we paid all this money for. Uh, I remember that it was $16,000. I remember that, that, you know, the

[00:14:15] Prasanna Malaiyandi: That was.

[00:14:16] W. Curtis Preston: caboodle,

it was a lot of money.

It was a lot of servers, but it was a lot of money. Right. And, um, this was 1993, right, ’93, ’94. And, um, they’re like, well, did you, by chance, turn on the compression feature? Yeah.

[00:14:37] Prasanna Malaiyandi: You’re like, of course you’re saying that I can save space. Of course, I’m going to turn it on.

[00:14:40] W. Curtis Preston: Yeah. So they’re like, so here’s how the compression feature works. During backup, it runs compress -c (this is all Unix stuff), which sends the compressed output to standard out, and then it redirects that to a temporary file in /tmp, right?

A filename.Z. And then we copy filename.Z to, uh, to the tape. During restore, we, um, we restore filename.Z to /tmp, and then we run uncompress in place. And then once it’s done, then we copy it from /tmp to the file system. And so, yeah, basically working as designed, dude. Like, you did test the restore, right?
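To see why that staged design hurts, here is a rough Python sketch that counts the bytes written by a staged restore versus a streaming one. It is only an illustration under stated assumptions: gzip stands in for the Unix compress/uncompress pair, an ordinary file stands in for the tape, and the function names are made up for the example.

```python
import gzip
import os
import shutil
import tempfile


def restore_staged(archive, dest, tmpdir):
    """Copy the archive to tmpdir, decompress it there, then copy the result to dest.

    Returns the total number of bytes written along the way.
    """
    staged = os.path.join(tmpdir, os.path.basename(archive))
    shutil.copyfile(archive, staged)          # write 1: compressed copy in tmp
    unpacked = staged + ".unpacked"
    with gzip.open(staged, "rb") as src, open(unpacked, "wb") as out:
        shutil.copyfileobj(src, out)          # write 2: uncompressed copy in tmp
    shutil.copyfile(unpacked, dest)           # write 3: final copy to the target
    return (os.path.getsize(staged) + os.path.getsize(unpacked)
            + os.path.getsize(dest))


def restore_streaming(archive, dest):
    """Decompress straight from the archive into dest; returns bytes written."""
    with gzip.open(archive, "rb") as src, open(dest, "wb") as out:
        shutil.copyfileobj(src, out)
    return os.path.getsize(dest)


def demo():
    """Run both restores on a small sample and compare the bytes written."""
    with tempfile.TemporaryDirectory() as workdir:
        archive = os.path.join(workdir, "backup.gz")
        with gzip.open(archive, "wb") as f:
            f.write(b"restore me\n" * 100_000)
        tmpdir = os.path.join(workdir, "tmp")
        os.mkdir(tmpdir)
        staged_dest = os.path.join(workdir, "staged.out")
        streamed_dest = os.path.join(workdir, "streamed.out")
        staged_bytes = restore_staged(archive, staged_dest, tmpdir)
        streamed_bytes = restore_streaming(archive, streamed_dest)
        with open(staged_dest, "rb") as a, open(streamed_dest, "rb") as b:
            same = a.read() == b.read()
        return staged_bytes, streamed_bytes, same
```

Both paths produce the same file, but the staged path writes the data more than twice over, and on a real system each of those passes also has to wait for the one before it to finish.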

When you, you should have known. Right. And so basically, you know, I was, luckily the story had a happy ending. I had, I had the

[00:15:45] Prasanna Malaiyandi: Yeah, you had the old tapes in the back pocket. And,

[00:15:47] W. Curtis Preston: I restored it and everything was beautiful, you know, and

[00:15:50] Prasanna Malaiyandi: and did you get rid of that software or that solution?

[00:15:53] W. Curtis Preston: No, I did not get rid of the software. Uh, in fact, uh, we’ll, we’ll bring that full circle.

So I did not, I did not get rid of the software. I turned off the feature. Right. Um, and, uh, continued to use the software for the next couple of years. And then when I left. Uh, MBNA which at that time was the second largest credit card company. I left MBNA to go into consulting and they put me in the headquarters of Amoco was my first account, which was the American oil company in Chicago.

And, uh, they didn’t have any decent backups. Right. And so I was like, man, they need, they need like commercial backup software. And they said, well, we, we had some commercial backup software, but we kinda dumped it because nobody could figure out how to use it. I’m like, what did you have? And they go SM-arch. Seriously, like the one commercial product out of 50 that I know. And you, this is literally complete coincidence. So like, when I say that, like, I ended up being Mr. Backup because due to a series of events beyond my control, this is an example of that. My first ever client was using the one and only commercial piece of product that I, that I, you know, that I knew.

And I was able to call up to SM-arch to the company, software moguls. And I said, listen, I’m at Amoco and I’m going to save your ass. Right. So if you could just rework the license so that it’ll work in the current environment, because whatever they bought, it doesn’t match what they, what they have now. And they agreed to do it for me.

And so we got a license and we got the, we got everything backed up and that’s when, that’s when, uh, everything started falling apart. And didn’t we have a podcast on why I used to be called Crash?

[00:17:37] Prasanna Malaiyandi: Yes, we did have an episode.

[00:17:39] W. Curtis Preston: Yeah. So that episode kicks in after this episode, because basically what happened is that the moment I got a decent backup of the entire data center, the data center just started falling apart.

Like they, you know, and we ended up restoring like crazy. So I got really, really good at restoring servers.

[00:17:56] Prasanna Malaiyandi: But if you think about it, most people probably don’t get that experience.

Right. It’s kind of like you learn trial by fire. Right? What you learned in the matter of many months is probably more than most people learn over like five years.

[00:18:13] W. Curtis Preston: I have also fought a giant fire as well, an actual fire. Remember, that’s a whole other story.

[00:18:18] Prasanna Malaiyandi: That is another story as

[00:18:19] W. Curtis Preston: I have lots of lots of lessons. That’s what, that’s the one advantage of being old.

So, uh, yeah, so, okay. So what we’re talking about here is reasons, and this is not one of them, but this is just, uh, just to sort of give you an example of the fact that the restore speed of a given backup will almost always be slower than the backup speed of that backup. Okay. And there are a lot of reasons for that.

And I don’t think that this is especially if, like you said, if you haven’t. Um, done this, you know, I used the phrase fired in anger, right? If you’ve never fired your backups in anger, then you don’t know what I’m talking about. Trust me, this is the case for many, many systems. Not all, right? It’s not a universal truth, but it, but, but there are many reasons that it can often be the case.

So, uh, you are looking at the article just as I am, right? Talk about the first problem that we have here.

[00:19:33] Prasanna Malaiyandi: Yeah. So the first one is just sort of around if you’re using a disk-based subsystem, more than likely you’re going to have RAID. Right. Which

[00:19:43] W. Curtis Preston: Right?

[00:19:44] Prasanna Malaiyandi: simple explanation: it’s a bunch of disks brought together to make it look like one disk, with some level of redundancy within the, uh, within those disks. Right.

And there are different types of encoding. Of course, erasure coding versus your normal parity based raid. But what happens is when you’re writing to a raid disk or set of disks, Every time you do a write. There is some amount of additional writes that need to happen because you are keeping additional information with the data.

So in case one disk fails, it can always be recalculated and you can get back your data. And this is normally known as parity information,

[00:20:27] W. Curtis Preston: Right, right.

[00:20:28] Prasanna Malaiyandi: Calculating parity isn’t free, right? It’s a bunch of checksum operations or other mechanisms in order to be able to calculate that. And then you have to end up writing that data across all of those discs.

And so when you’re doing the write. The performance could have some penalty, right? Because you have to do all the calculations and send all the writes to the appropriate places
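The write penalty can be put into rough numbers. The sketch below uses the commonly quoted back-end I/O counts for a small random write at each RAID level; these figures are a rule of thumb brought in for illustration, not something stated in the episode, and full-stripe writes or vendor optimizations can do considerably better. The function names and the per-disk IOPS figure are hypothetical.

```python
# Commonly quoted back-end I/Os per small random write (a rule of thumb,
# assumed for this sketch; real arrays vary).
WRITE_PENALTY = {
    "raid0": 1,   # no redundancy, one write
    "raid10": 2,  # write both mirror copies, no parity to compute
    "raid5": 4,   # read data, read parity, write data, write parity
    "raid6": 6,   # same idea as RAID 5, but with two parity blocks
}


def effective_write_iops(disks, iops_per_disk, level):
    """Front-end random-write IOPS once the write penalty is paid."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


def read_iops(disks, iops_per_disk):
    """Random reads carry no parity penalty in this simple model."""
    return disks * iops_per_disk
```

With eight disks at a hypothetical 150 IOPS each, this model gives 1,200 read IOPS but only 200 random-write IOPS on RAID 6. A backup mostly reads from the array and a restore mostly writes to it, which is where the gap comes from.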

[00:20:58] W. Curtis Preston: Right. So just generally speaking, and by the way, this generally only applies to, like, RAID two through six. No one uses RAID two, so really three through six. No one uses three or four or five anymore, so really what we’re talking about is RAID six. And why don’t they use RAID five?

Well, I mean, some people still do, but they really shouldn’t. And why not? But like anything lower than

[00:21:27] Prasanna Malaiyandi: Yeah. So in most cases it basically only handles a single disk failure. So you lose one disc and you can handle that case. But if you

[00:21:36] W. Curtis Preston: Like, how often do you lose multiple disks, though?

[00:21:41] Prasanna Malaiyandi: At the worst possible time, right?

Because, because normally what I’ve seen happen, right? Having worked at past storage vendors is you are, you have a disk fail. Now, if you have a spare, it’s going to start rebuilding and repopulating that new disc that just got added and now the problem is. When you’re in the process of rebuilding that disc, you now have to do reads across all the other discs and put additional load on your system, which could potentially lead to another disk failure, especially if your discs have been bought around the same time or have similar age, right.

Or come from similar batches. Right. All these sorts of issues. And so, if you have one disk fail, it’s highly likely that another disk may fail. And you’re hoping that you can finish a rebuild before the next disk fails. But if you start thinking about like eight, 12 terabyte drives, that might take some time. And so

[00:22:34] W. Curtis Preston: that’s the real problem. Yeah, that’s the real modern problem is that you’ve got these giant disk drives that take a really long time to rebuild. And so that, that risky time in between when you’ve had a disk failure and when you’ve rebuilt that failed disk, that can be a really long time. And during that time you could suffer another disk failure and then you’d be, then you’d be SOL.
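A back-of-the-envelope calculation shows how long that risky window can be. This is a best-case sketch: the sustained rebuild rate is a made-up input, and a real rebuild on a busy array is usually slower because it competes with production I/O.

```python
def rebuild_hours(capacity_tb, rebuild_mb_per_s):
    """Best-case hours to rewrite one whole drive at a sustained rebuild rate."""
    capacity_bytes = capacity_tb * 10**12          # decimal terabytes
    seconds = capacity_bytes / (rebuild_mb_per_s * 10**6)
    return seconds / 3600
```

For example, `rebuild_hours(12, 100)` comes out to a little over 33 hours for a 12 TB drive at an assumed 100 MB/s, and the array is exposed to a second failure for that entire time.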

And that’s why everybody uses RAID six, or at least they should be. And if you’re not, you should really look into that. But I just want to make a point, this isn’t a problem with RAID 10, right. Or RAID one, which isn’t really RAID, right? Uh, well, no, RAID one is fine. Zero. Yeah. RAID one is mirroring, uh, but generally most people use RAID 10, uh, which is, um, it’s mirroring plus striping,

[00:23:25] Prasanna Malaiyandi: Yep. And, and there are optimizations that some vendors do, trying to minimize the amount of data that gets written out. Like you write in full stripes rather than partial writes. You try to aggregate as much data as possible. Right. There are optimizations people try to do, right. But in the end, there’s only so much you could do for having to recompute parity.

Right. And the checksums and then send it to.

[00:23:53] W. Curtis Preston: exactly. And, but just the general rule is that it is slower to write to a RAID array than it is to read from a RAID array, a parity-based RAID array. And I would say the same is also true of an erasure-coding-based array. Um, so, and so that is the first reason why you might have, um, um, a, you know, a penalty when writing and then next we have is this little thing called copy-on-write.

Snapshots. So I’m a huge fan of snapshots. Um, I am right. Um, I mean, you know, I ha I have caveats that, you know, they need to be copied in order to be a backup.

[00:24:37] Prasanna Malaiyandi: They’re there for good purpose.

[00:24:39] W. Curtis Preston: they are, they have a great purpose. Having said that I am less of a fan of the copy-on-write. Style of snapshots. Right. And, uh, I need to explain what that means.

So once you create a snapshot, uh, it creates a moment in time that the snapshot, you, you didn’t really create anything. You just create a, like a, it’s like a view into your storage, right? You didn’t copy anything. Then when you go to overwrite a block of data with new data, the snapshot system needs to preserve the old block that you had when you created the snapshot.

So it copies that block out into a snapshot area. And so when you go to read that snapshot, it gets most of the data from the main drive, and then it gets. Any before images from that other thing. So that’s why it’s called copy on, write? Because when you write, you’re going to copy the data. The longer you hold on to a snapshot, the more blocks that have to get copied out.

And the more blocks you have to read when you go to, um, to do that, and which is why, um, and by the way, this is different than redirect-on-write, which is, um, where you simply write the new block in a new place. Right. And it’s a series of pointers. It’s a lot more complicated. And redirect-on-write is close to, but not the same as, what NetApp does. Really close. NetApp says it’s different.

And you know, I’m sure it is, but it’s close enough to that, but this is why NetApp and products like NetApp, they can have tons of snapshots without it impacting their write performance. But it is an absolute certainty that if you have copy-on-write snapshots and you keep a lot, you keep them around for a long time.

Um, or you just created a copy-on-write snapshot. And now you go to do a large restore. It’s going to do a copy of every single block that you’re trying to overwrite before it can overwrite it, which means there’s just going to be a big penalty when you’re going to do that. Um, and you know, th th this isn’t me, you know, I remember I never worked for NetApp, but I remember when I was explaining this to somebody and they’re like, oh, or you’re just a NetApp lover. And you’re just, I’m like, okay, it’s just a fact, right? Like I’m I remember being at a large, not Amoco, but a really large oil and gas company and a certain other large storage vendor.

[00:27:14] Prasanna Malaiyandi: Yup.

[00:27:15] W. Curtis Preston: One that you might be very familiar with, um, came in and we just asked them a point blank question.

The customer wants to keep six months of user browsable snapshots, what would happen to their performance? And they were like, no one does that. That was a response. No one does that. Well, we’re doing it here with the NetApp systems. We already have what would happen if they, and they literally had to guess. They guessed, like, a 50% performance hit was, was the best guess. So anyway, so if you have a copy-on-write snapshot-based storage array, you’ve created a snapshot and then you go to do a large restore. You’re going to have a huge write penalty when you, um, overwrite that. Um, so, and then the next, uh, what about this file system bit?
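The copy-on-write penalty during a large restore can be modeled with a simple I/O counter. This is a toy model under stated assumptions, not any vendor's implementation: one active snapshot, one read plus one extra write to preserve each old block the first time it is overwritten, and redirect-on-write treated as a single write with pointer updates ignored. The function names are hypothetical.

```python
def cow_restore_io(blocks_to_overwrite, already_preserved=()):
    """Count (reads, writes) for overwriting blocks under a copy-on-write snapshot.

    The first overwrite of a block since the snapshot costs one read of the
    old block and one write to the snapshot area, plus the new write itself.
    """
    preserved = set(already_preserved)
    reads = writes = 0
    for block in blocks_to_overwrite:
        if block not in preserved:
            reads += 1    # read the old contents
            writes += 1   # copy them out to the snapshot area
            preserved.add(block)
        writes += 1       # finally write the restored data
    return reads, writes


def row_restore_io(blocks_to_overwrite):
    """Redirect-on-write: new data lands in a new location, one write per block."""
    return 0, len(list(blocks_to_overwrite))
```

In this model, restoring 1,000 blocks right after taking a snapshot costs 1,000 reads and 2,000 writes with copy-on-write, against 1,000 writes with redirect-on-write.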

[00:28:03] Prasanna Malaiyandi: Yeah. So the challenge with file systems, right? And this is when it comes to writing into a file system, right? It’s no longer the small, like your laptop file systems we’re talking about, but these very, very dense file systems. If you look at some of the scale-out file systems out there with millions and millions of files on it, right when you’re restoring the file.

First, it creates a file that it wants to restore the data to. And then separately, it has to pull the data for the actual data contained within the file system itself. Right. And so, because there are these two steps, and depending on how many files are in the file system, because in the end all of that needs to be tracked in that system, creating these files can actually take quite a while. And so if you’re pulling all this data and say you have millions and millions of files, that could take you much longer than, say, having one large file versus a million small files.

[00:29:10] W. Curtis Preston: Yeah, exactly. That, that it could actually take more time to create the files than it does to actually transfer the data.

[00:29:17] Prasanna Malaiyandi: Most people think it’s just, oh, I’m just creating the file. Isn’t that simple. But you have to remember when you’re restoring file. It’s not only creating the file. Right. It’s also setting appropriate permissions on the file or anything else that needs to be done to the file in addition to moving the data.

[00:29:33] W. Curtis Preston: And it is a really small amount of time, but when you have millions of files, it’s a small amount of time multiplied times millions. And I, and this is why, by the way, there, there, there are products that are specifically designed to do it. The only one that’s coming to my mind and I hope I get the product name.

Cause it’s been a while, but NetBackup had, maybe still has, a product called FlashBackup. And what it does is when you have a scenario like this, where you have a very dense file system, they can back it up at the block level. And then when you need to restore the entire file system, they restore it at the block level.

When you restore it at the block level, you don’t have this problem. Right. We don’t have this problem when we restore an entire VM.
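The "small amount of time multiplied by millions" point is easy to check with arithmetic. The sketch below splits a file-level restore into per-file overhead and raw data transfer; the function name and the example inputs (per-file cost, throughput) are hypothetical numbers chosen for illustration, so measure your own system before relying on them.

```python
def restore_time_breakdown(file_count, total_bytes, per_file_seconds, bytes_per_second):
    """Split a file-level restore into (metadata_seconds, data_seconds).

    metadata_seconds covers creating each file and setting its permissions;
    data_seconds covers moving the actual contents.
    """
    metadata_seconds = file_count * per_file_seconds
    data_seconds = total_bytes / bytes_per_second
    return metadata_seconds, data_seconds
```

With 10 million files totaling 1 TB, an assumed 1 ms of overhead per file, and an assumed 200 MB/s of throughput, file creation takes roughly 10,000 seconds while the data itself moves in about 5,000. A block-level restore avoids the first term entirely.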

[00:30:19] Prasanna Malaiyandi: Yep. I will say I’ll just want to say yeah. Image based or app aware image-based backups is I think the industry

[00:30:27] W. Curtis Preston: exactly. So, but if you are, if you are restoring at the file system level and you have a very dense file system, it’s going to take a while, it’s going to take longer than, than, than otherwise. Um, the next one it’s, it’s a bit odd, but it is what it is. It’s a bit, it’s a bit like it’s a bit more out there.

But if you, you know, and I, I mentioned overburdened transaction logs. So in a transaction log, in a database, it’s got to keep track of all the transactions depending on how you. Do your restore. So if you’ve got a re the way you do a database restore is you generally have a backup of the database from, let’s say 24 hours ago, maybe even longer than that could be 12 or whatever.

It depends on a number of factors. And then you have a number of transaction logs that you use to move that database forward in time from when the backup was taken up to the point in time of the outage. And if the transaction logs are, um, you know, if the storage, if the performance of that is not up to snuff, it can really slow down the playing replaying of all those transaction logs.

And this is something you might not notice during normal operations, but the replaying of the transaction logs, it’s like you’re taking. What could be 24 hours of transactions and you’re playing them all within 20 minutes. Right. And so it could really bog down your, your transaction logs and if your transaction logs, uh, if the storage is not up to snuff, uh, so what, what does this say?

It put your transaction logs on flash and that’s all I got to say

[00:32:07] Prasanna Malaiyandi: The other thing I would also say is: make sure you understand how long it’ll take to replay those logs. Right? So for instance, in your example you had 24 hours between your normal backups, but say the customer decides, oh, I’m only going to do it once a week. Now you have seven days’ worth of transaction logs to play back. Maybe in the case of a not-so-heavily-used database, that’s fine.

But if this is like amazon.com, right, and you’re trying to play back a week’s worth of transactions, that’s a lot of records to replay against the database.

[00:32:44] W. Curtis Preston: Yeah. And we’re going to cover this again at the end, but basically I would do this: do a test restore and see how long it takes. You know, if you’re doing backups once a day, and you replay a typical day’s worth of transaction logs, and you’re like, oh, well, that takes one hour. Are we okay with, you know, a one-hour RTO?

Actually, it’s going to be more than a one-hour RTO, ’cause it’s going to be the time to restore the database and then the time to replay the transaction logs. So then you can perhaps adjust your backup frequency.

[00:33:14] Prasanna Malaiyandi: And do it for a production backup that you did. Don’t do it for, like, a test instance that you’re just trying out, right? Do it for an actual production instance that you can actually test, and see what those transactions look like in real life. And you’ll get an understanding of your RTO.
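
Editor’s sketch: the arithmetic described here (RTO is the time to restore the database plus the time to replay the logs) can be written down as a rough estimate. This is a minimal sketch and not a sizing tool. The function name `estimate_rto_hours` and all of the sample numbers are hypothetical, and it assumes replay time grows linearly with the amount of log accumulated since the last backup, which only a real test restore can confirm.

```python
def estimate_rto_hours(db_restore_hours, replay_hours_per_day_of_logs, days_since_backup):
    """Rough RTO: time to restore the database plus time to replay the logs.

    Assumption (hypothetical): replay time grows linearly with the number
    of days of transaction logs accumulated since the last backup.
    """
    replay_hours = replay_hours_per_day_of_logs * days_since_backup
    return db_restore_hours + replay_hours


# Hypothetical measurements from a test restore: 2 hours to restore the
# database, 1 hour to replay a typical day's worth of transaction logs.
daily = estimate_rto_hours(2.0, 1.0, days_since_backup=1)
weekly = estimate_rto_hours(2.0, 1.0, days_since_backup=7)

print(f"Daily backups:  about {daily:.1f} hours to recover")
print(f"Weekly backups: about {weekly:.1f} hours to recover")
```

Plugging in figures measured from a production backup restored to a test area, rather than these placeholders, is what makes the estimate worth discussing with stakeholders.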

[00:33:34] W. Curtis Preston: Would you recommend doing like our friend in Alaska did?

[00:33:39] Prasanna Malaiyandi: Yeah. Don’t do it in your production environment while you’re tearing down. Right. Paul, we love you. But man, that was a crazy story.

[00:33:47] W. Curtis Preston: Oh God, that was a crazy story. Um, uh, what was it, what was that episode? Um,

[00:33:54] Prasanna Malaiyandi: It was with Paul van Dyke. Episode 135, “IT admin deletes entire data center, then tests his backups.”

[00:34:05] W. Curtis Preston: Yeah, that would be the one. Uh, so when you say testing it with production data, you don’t mean testing your restore by restoring your production database on top of your desk. You mean using your production backups and restoring them to a test area?

[00:34:23] Prasanna Malaiyandi: That is

[00:34:23] W. Curtis Preston: Key differentiator there. So the next thing, and I think this is less of a problem for most people, but for those of you still backing up to tape, this is a real problem.

So, multiplexing. And again, if you go back to the backup is evil episode from four or five episodes ago, we talked about this. We talked about how multiplexing is evil. It was, and still is, a necessary evil if you’re backing up to a modern tape drive. The reason is that the tape drive wants to go a lot faster than the backup can go.

And so you interleave a bunch of different backups together.

[00:35:05] Prasanna Malaiyandi: Which sounds amazing.

[00:35:07] W. Curtis Preston: Which sounds amazing. And it makes a tape drive happy, and it eliminates shoe shining, or at least reduces shoe shining. And everything’s great. But the problem is, when you go to restore, you have to read all of those backups and throw away all but the one that you need. And modern multiplexing settings are as high as, like, 32.

So you’re throwing away, you know, I dunno, what, 32 divided by a hundred? Uh, no. Would that be 97? What

[00:35:39] Prasanna Malaiyandi: You’re throwing away 97% of the data.

[00:35:42] W. Curtis Preston: Is that really 97% of the data? Are you just really good in your head, or did you divide

[00:35:47] Prasanna Malaiyandi: No. I can in my head,

[00:35:50] W. Curtis Preston: Okay. Uh, so yeah, so you’re throwing away 97% of the data, which means that your restore speed is gonna suck!

[00:35:58] Prasanna Malaiyandi: It’s actually 96.37, sorry. 8

[00:36:01] W. Curtis Preston: Good. That was a pretty good in-your-head thing there. So, um, this is why we stopped using tape.
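
Editor’s sketch: the mental arithmetic in this exchange is easy to check. The snippet below is a minimal sketch that assumes the interleaved backups are equal in size and evenly mixed on the tape, so restoring one of N streams means reading everything and keeping 1/N of it. The function name `discarded_fraction` is made up for illustration.

```python
def discarded_fraction(multiplex_setting):
    """Fraction of the data read from tape that is thrown away when
    restoring one backup out of `multiplex_setting` interleaved streams.

    Assumption: the streams are equal in size and evenly interleaved.
    """
    if multiplex_setting < 1:
        raise ValueError("multiplex setting must be at least 1")
    return 1 - 1 / multiplex_setting


for setting in (1, 4, 8, 32):
    print(f"multiplexing {setting:>2}: discard {discarded_fraction(setting):.3%}")
```

With a setting of 32, this works out to 31/32, or 96.875%, which rounds to the 97% quoted above. Under the same assumption, the useful restore throughput is roughly 1/32 of whatever the tape drive is delivering.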

This is why I stopped recommending the use of tape as a primary protection mechanism. I’m not even that big of a fan of it as a secondary protection mechanism. Um, you know, we talked about this in an episode with Brian Greenberg and his colleague. There is a group of people that are bigger fans of tape now because of ransomware, but you’ve got to address this issue, and you’ve got to make sure you understand that when you restore from tape, if you used multiplexing, you have this problem.

If you did not use multiplexing, then you don’t have this problem, but you have a different problem.

If you have full backups and you have one stream, you’ll more than likely be suffering shoe shining during your restore instead of shoe shining during your backups, because you’ll get the RAID penalty and the write speed.

Even if you don’t get the RAID penalty, your disk storage probably has a limit at which it can write, and it’s probably different than the speed at which the tape drive can go. A lot of people don’t realize that tape drives are typically way faster than most disks in terms of throughput. Not random access, but throughput.

[00:37:33] Prasanna Malaiyandi: Yup.
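
Editor’s sketch: shoe shining happens when whatever is on the other end of the tape drive cannot keep up with the slowest speed at which the drive can keep the tape moving. The snippet below is only an illustration of that comparison. The function name `can_stream` and every number in it are hypothetical and are not taken from any real drive or array.

```python
def can_stream(data_rate_mb_s, min_streaming_rate_mb_s):
    """True if the other end (backup source or restore target) keeps up
    with the slowest rate at which the tape drive can stream without
    stopping and repositioning the tape."""
    return data_rate_mb_s >= min_streaming_rate_mb_s


# Hypothetical figures, in MB/s, purely for illustration.
MIN_STREAMING_RATE = 100

scenarios = {
    "single backup stream": 40,
    "multiplexed backup (several streams)": 160,
    "restore writing to a RAID array": 70,
}

for name, rate in scenarios.items():
    status = "streams" if can_stream(rate, MIN_STREAMING_RATE) else "shoe-shines"
    print(f"{name}: {rate} MB/s -> {status}")
```

The point of the comparison is that multiplexing fixes the backup side only. A non-multiplexed restore can still shoe-shine if the disk being written to is slower than the drive.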

[00:37:34] W. Curtis Preston: So your choices both suck. That’s why I don’t like using tape anymore for backups. I like using it for archive, because it’s much better than disk at holding onto data for long periods of time. Um, so that’s the end of the reasons, and some of those you can address. You could potentially say, well, because of the restore speed problem, we’re going to stop using RAID 6, or we’re going to go to RAID 10, right? That’s a huge cost, because that is a significant difference in the number of disks that you will need. Although the jump from RAID 6 to 10 is not

[00:38:17] Prasanna Malaiyandi: As bad as, yeah.

[00:38:19] W. Curtis Preston: five to 10. Right. Um, and by the way, it should be RAID 10, not RAID zero plus one.

There is a difference between RAID 10 and RAID zero plus one. Uh, there is a difference in the number of drives that you can lose.
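
Editor’s sketch: the difference in how many drives you can lose can be shown by brute force. The snippet below is a simplified model, not a description of any particular controller. It assumes RAID 10 is a stripe across mirrored pairs, and that RAID 0+1 is a mirror of two stripe sets where any single drive failure takes its whole stripe set offline. The function names are made up for illustration, and the drive count must be even.

```python
from itertools import combinations


def raid10_survives(failed, n_drives):
    """RAID 10 (stripe of mirrors): drives (0,1), (2,3), ... are mirror
    pairs. The array survives unless some pair loses both of its drives."""
    return all(not ({a, a + 1} <= failed) for a in range(0, n_drives, 2))


def raid01_survives(failed, n_drives):
    """RAID 0+1 (mirror of stripes): the first half of the drives is one
    stripe set and the second half is its mirror. In this simplified model
    any failure takes its whole stripe set offline, so the array survives
    only if at least one stripe set has no failed drives."""
    half = n_drives // 2
    first = set(range(half))
    second = set(range(half, n_drives))
    return not (failed & first) or not (failed & second)


def count_survivable(survives, n_drives, n_failures=2):
    """Count the failure combinations of the given size the layout survives."""
    return sum(
        survives(set(combo), n_drives)
        for combo in combinations(range(n_drives), n_failures)
    )


n = 8
total = len(list(combinations(range(n), 2)))
print(f"RAID 10  survives {count_survivable(raid10_survives, n)} of {total} two-drive failures")
print(f"RAID 0+1 survives {count_survivable(raid01_survives, n)} of {total} two-drive failures")
```

In this model with eight drives, RAID 10 survives 24 of the 28 possible two-drive failures, while RAID 0+1 survives only 12, which is the reason for preferring RAID 10.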

[00:38:32] Prasanna Malaiyandi: Or do you think some of this also goes away if you’re considering, like, SSD for primary storage?

[00:38:42] W. Curtis Preston: You know, that’s a good question. Uh, and the answer is I have no idea. Um, SSD is really good at random, you know, it’s fast at writing, but if the problem is the calculation, then maybe it doesn’t

[00:38:58] Prasanna Malaiyandi: Yeah, I don’t know. Maybe if you get a wide RAID group plus, yeah. Interesting. Yeah.

[00:39:06] W. Curtis Preston: I don’t know. Um, I mean, we’re all going to be moving to SSDs.

[00:39:10] Prasanna Malaiyandi: Yeah.

[00:39:11] W. Curtis Preston: I honestly think that we’re going to get to a point where almost everything is either on SSD or tape,

[00:39:16] Prasanna Malaiyandi: Yeah. You do the two ends of the spectrum, you decide where your workload runs, and you’re good to go.

[00:39:21] W. Curtis Preston: Yeah. Yeah. So all I’m saying here is just be aware of these things now. Don’t be like me. Don’t be like what happened, right, and find out about your RAID penalty when you go to do a large restore and everyone is looking at you,

[00:39:41] Prasanna Malaiyandi: Yeah.

[00:39:42] W. Curtis Preston: Figure this out now. Think about the worst-case scenario that you have, and then go test

[00:39:47] Prasanna Malaiyandi: Yeah. Or,

[00:39:48] W. Curtis Preston: the biggest server.

[00:39:49] Prasanna Malaiyandi: I was thinking, when doing your file server restores, don’t just pick a single directory with, like, a hundred files in it, right? Pick something more substantial to restore, so you can understand what the real-world performance looks like, rather than having to find out when you’re in urgent need.

[00:40:07] W. Curtis Preston: You need to do test restores, and you need to do representative test restores: similar sizes, similar hardware. Um, you know, generally you’re going to get slower hardware to test on. I do think that VMware, and I’ll just say virtualization in general, makes this a lot easier. It’s a whole lot easier to restore an entire VM than “back in the day,” when we did a bare metal recovery of a physical server. That was a giant pain in the butt. You’ll notice, for those of you that get the new book, Modern Data Protection, that I barely mention BMR, because you just shouldn’t be doing that at this point. Right? Um, everything should be virtualized at this point.

It should either be a VM in the cloud or a VM in, you know, pick your favorite hypervisor. The advantages from a backup and recovery perspective alone, um, you know, figure that out or work it out. So all I’m saying is: test it now, and then set expectations, because it’s just like, you know, fights in a marriage.

So many times you get into a fight over something so stupid, and it’s because one of you just had a different set of expectations than the other. Just make sure that you go in, like, you have a meeting before the bad thing happens, and say, listen, I’ve been doing some test restores, and it turns out that the RAID 5 penalty of our umpty-squat array means that our restore is going to take roughly 50% more time than the backups. Let’s talk about that now. We can either accept that, and then don’t yell at me when this happens during production, or, um, let’s make a change to the design.
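
Editor’s sketch: the “roughly 50% more time” conversation is easier to have with numbers from an actual test restore in hand. The snippet below is a minimal sketch. The function names and the figures are hypothetical placeholders for your own measurements and your own agreed RTO.

```python
def restore_slowdown_percent(backup_minutes, restore_minutes):
    """How much longer the test restore took than the backup, in percent."""
    return (restore_minutes - backup_minutes) / backup_minutes * 100


def meets_rto(restore_minutes, rto_minutes):
    """True if the measured restore time fits inside the agreed RTO."""
    return restore_minutes <= rto_minutes


# Hypothetical measurements: a 4-hour backup, a 6-hour test restore,
# and a 5-hour RTO that the business believes it has.
backup_minutes = 240
restore_minutes = 360
rto_minutes = 300

slowdown = restore_slowdown_percent(backup_minutes, restore_minutes)
print(f"Restore took {slowdown:.0f}% longer than the backup")

if meets_rto(restore_minutes, rto_minutes):
    print("Within RTO")
else:
    print("Exceeds RTO: accept the longer restore or change the design")
```

Rerunning the same comparison after each representative test restore keeps the expectation current as data sets and requirements change.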

[00:42:05] Prasanna Malaiyandi: I think another important point is that it’s not just a one-time thing at the start of a project and you’re done, because data sets change and requirements change. This is an ongoing process. You should be doing realistic restores, going back, communicating with your stakeholders, right? Keeping them up to date on what’s going on, because of what might have been agreed upon on day one, right? Three months from now, when the requirements have changed, if you don’t go back, communicate, and set expectations, things may still blow up.

[00:42:39] W. Curtis Preston: And I’d like to suggest, and maybe we should do a whole podcast on this, just ways to effect test restores. But one thing that I tried was this: when we procured a new server, we, the backup team, were given access to that server for a little while before it got production access. And what we would do is use that to test full server restores.

And you can do that with a new box that you brought in to be a VMware server or Hyper-V or AHV or whatever. Um, and then just test the crap out of it. Test different VMs. You know, make sure that it’s in some kind of bubble.

[00:43:26] Prasanna Malaiyandi: Yeah,

[00:43:28] W. Curtis Preston: So that it doesn’t start sending out Exchange email.

[00:43:31] Prasanna Malaiyandi: Cool.

[00:43:34] W. Curtis Preston: Speaking of Exchange, what’s my opinion on on-prem Exchange, Prasanna?

[00:43:41] Prasanna Malaiyandi: Don’t do it.

[00:43:42] W. Curtis Preston: That’s right. Who the hell is still doing on-prem Exchange? You know what, if you’re out there and you’re listening to this and you’re doing on-prem Exchange and you’re like, why does he keep yelling at me? I want to know: what is your deal? What is it that you like about on-prem Exchange that you, you know, that you don’t get. Sure.

[00:44:04] Prasanna Malaiyandi: I think one of them could be data residency related.

[00:44:09] W. Curtis Preston: Do you really think that’s a thing? Like, people wanting the copy of their data just in their data center?

[00:44:17] Prasanna Malaiyandi: Not in their data center, but if there are regulations.

[00:44:21] W. Curtis Preston: What regulations would keep that, you know, that’s a good question. I don’t know. You know, there are so many industries and so many regulations, there could be something, but I am not aware of any.

[00:44:35] Prasanna Malaiyandi: I’m not particularly aware either, but yeah, that’s the only thing that comes to mind. And most, like Microsoft Azure, are pretty good in terms of where they’re located these days.

So I could see that being less of an issue versus, like, five years ago. But I’m just wondering if there are still some of those customers.

[00:44:54] W. Curtis Preston: I could see there being, like, the touchy-feely problem. Like, I just want to touch it with my hot little hands. I get that, although I disagree with it. Uh, you know, the value of physically touching your server is vastly overrated. Um, and I think that you gain a vast amount of security and whatnot by using SaaS services and by using, um, you know, IaaS services, where you can just point and click and say, I need this firewall and this set of rules and this set of things.

And you just get all that stuff with a point-and-click button, rather than having to piece it all together. Um, you know, I’ll take the security of an average cloud vendor over the average data center any day of the week. And that isn’t just because I work for Druva. I’ve always said that. Um, so anyway, well, it’s been, uh, you know, it’s been one of those sad episodes.

[00:45:47] Prasanna Malaiyandi: It’s not

[00:45:47] W. Curtis Preston: we delivered

nothing but bad

[00:45:49] Prasanna Malaiyandi: It’s not sad. It’s just things that we think people should be aware of.

Right? It would be sad if we didn’t tell them this information and then things blew up and escalated right away. ’Cause, like you, right, if they have an issue where they need to do a restore and they had never tested it out, now they’re like, what happened?

[00:46:11] W. Curtis Preston: Yeah. Yeah. Don’t be like Curtis. Yeah. We need the little stick figure. Here it is. Didn’t test his restores. Curtis had to use the old backup. Don’t be like Curtis.

[00:46:23] Prasanna Malaiyandi: At least Curtis had an old backup.

[00:46:25] W. Curtis Preston: Yeah. You know, actually there is in the book, there is a stick figure. There is one of those stick figures that, that talks about Curtis.

I forgot exactly. I forgot which one it was, but we did one of those little stick figure memes of don’t be like Curtis. Um, so anyway, well, uh, you know, thanks for discussing this article written by this brilliant guy.

[00:46:48] Prasanna Malaiyandi: Anytime, Curtis. Yeah, that author was really good. Maybe we should have him on the podcast.

[00:46:55] W. Curtis Preston: Great minds think alike. I like that. And, uh, thanks to the listeners. Um, you know, we’d be nothing without you and remember to subscribe so that you can restore it all.

—– Signature and Disclaimer —–

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I’ve specialized in backup & recovery since 1993. I’ve written the O’Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Evangelist at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

The post Seven reasons why your restore may slower than your backup appeared first on Backup Central.
