3: Booting and Coredumps 1
Hi and welcome back. This is module two, which will be on booting and managing core dumps. Now let's have a look at the boot procedure of our system. In this module we will have a look at what happens when you boot a node. We will also introduce a problem to the var file system, and we will fix that problem. Next to that, we're going to have a look at core dumps. Now, a controller should only be powered on after the other devices, like the shelves and switches, have already been booted. When a controller is powered on, it starts from the LOADER. The NetApp LOADER is the equivalent of the BIOS of any other system, and the LOADER is identical on all controllers, so you have the same look and feel no matter what controller type you have. Now, the LOADER has many variables.
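As an illustration, inspecting and leaving the LOADER on the console looks roughly like this; the variable names and values differ per platform, so treat this as a sketch, not literal output:

    LOADER> printenv        # list all LOADER environment variables
    LOADER> boot_ontap      # continue booting ONTAP from the default image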
Some of these variables are used to locate the kernel, and we will see that in a second. So when the LOADER is started, it will find the kernel and the root file system, which are both located on the CF card. The LOADER will access the CF card to get to these two items, the kernel and the root file system. Now don't confuse the root file system with vol0; vol0 is not in play yet. We're talking about a file system that is on the CF card, on the compact flash. The root file system is mounted read-only as the first file system that gets loaded, so you cannot ever change that file system. Then the environment is activated, and the var file system is restored from the CF card as well. So in short, on the CF card you will find the kernel, the root file system and the var file system. You will be able to get to the boot menu by pressing Ctrl-C. If you don't do that, the booting will simply continue.
Now, one of the very important files in the var file system is the /var/rdb/_sitelist file, and this file contains cluster information. For example, it states the name of the cluster that this node will have to join, what the other nodes in the cluster are and what their IDs are, and what the IP addresses of the cluster interconnect interfaces are. So you may think that the cluster configuration only involves vol0, but the node will start the cluster software and join the cluster before vol0 is mounted, and then it will synchronize the RDBs after vol0 has mounted. Actually, the mounting of vol0 is the very last thing that is done before the login prompt appears. Now, after you've logged in, you can see how the cluster booted by viewing all of the environment variables, by executing the command kenv, which is probably short for kernel environment. It will display all the kernel variables. Now let's have a look at the demo. We're going to log into the system shell. Then we're going to run the kenv command, we're going to check the CF card, and we're also going to have a look at the _sitelist file.
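In the FreeBSD system shell this is simply the kenv command; a minimal sketch (the output in the demo is much longer):

    % kenv              # dump all kernel environment variables
    % kenv | more       # page through them one screen at a time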
So first we specify the password of the admin user to get into the cluster shell, and then we go to the system shell: we type set d to get into diag mode, and we run the systemshell command and specify, in this case, node1. So we're going to log into the system shell of node1, and now we're in the system shell.
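The command sequence looks like this, with prompts and warnings abbreviated; the node name node1 and cluster name cluster1 are from this lab setup:

    cluster1::> set d                       # switch to diagnostic privilege level
    cluster1::*> systemshell -node node1    # open the BSD system shell on node1
    diag@node1's password:                  # log in as the diag user
    node1%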
Okay. Now first let's have a look at these kernel variables, and you will see that there are loads of kernel variables, most of which you will never touch. To be precise: if we want to know how many variables there are, we pipe the output to a word count and count the number of lines, and we see we've got 172 variables.
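Counting them is a matter of piping kenv into wc:

    node1% kenv | wc -l     # count the number of kernel variables
         172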
Okay. To be a bit more precise about the CF card variables, we can grep for cfcard, and that will tell us that we've got some variables that define the location of the kernel, that define the base directory for all the images that we can run (a maximum of two, by the way, so you can have two kernels on one CF card), and we also see the root file system location. So let's go to the CF card to have a closer look.
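The filter used in the demo is a simple case-insensitive grep; the exact variable names in the output differ per release, so they are not reproduced here:

    node1% kenv | grep -i cfcard    # kernel path, image base dir, root fs location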
Okay. We go to x86_64/freebsd, and when we do a listing we see the image1 directory, in which we'll find the kernel, but we also see the var file system. So we go to image1, and there we see the kernel and the root file system. Then we go to /var to have a look at the _sitelist file.
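The walk across the CF card looks roughly like this; the directory names are from this lab's x86_64 FreeBSD build:

    node1% cd /cfcard/x86_64/freebsd    # base directory for the boot images
    node1% ls                           # shows image1/ and the var file system (varfs.tgz)
    node1% cd image1 && ls              # the kernel and the root file system live here
    node1% cd /var                      # the restored, writable var file system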
So we go to rdb and we open the _sitelist file. These are the nodes, node1 up to and including node4, and we also see the IP addresses that are used to connect to the cluster interconnect network, the epsilon ID, which we'll cover in another module, and the number of nodes, which is four. The cluster name is cluster1. So we exit, and now let's have a look at the cluster. We see that we've got four nodes and that epsilon is hosted by node2. Again, epsilon will be discussed later on in greater depth. We see that node2 holds epsilon: node2 has the ID 1001, which is also the epsilon ID, so there's our match. So you can imagine what happens if you lose this file, or the file system in which it resides. Now, in the following demo, we will break our node by removing the var file system from the CF card.
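The two checks side by side, as a sketch (cluster show needs the cluster shell again, and the Epsilon column is shown at the advanced/diag privilege level):

    node1% cat /var/rdb/_sitelist   # node names, IDs, interconnect IPs, epsilon ID
    node1% exit                     # back to the cluster shell
    cluster1::*> cluster show       # four healthy nodes, epsilon on node2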
This simulates the corruption of the card or the replacement of a node. If, for example, you have replaced a node, you will no longer be able to boot that node into the cluster, because the node has no idea about the cluster. Still, a backup of the var file system is on vol0 of that particular node. So if you are able to get it back from vol0, your node has its cluster config in the _sitelist file again, so it knows how to connect to the cluster interconnect and successfully boot into the cluster. So we will remove the var file system from the CF card. Then we will reset our system. We will not reboot it, because the var file system will also be in NVRAM: if you would just reboot your system, then the NVRAM content, or at least the var file system, would be restored to the compact flash. So we will hard-reset the system, and at boot time we will go to the boot menu. From there we can restore it from vol0. Now let's do it. So we log into our system and we have a look at the cluster environment. Everything is fine, everything is healthy.
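On real hardware that hard reset would come from the Service Processor rather than from VMware; a sketch, assuming an SP session on node4:

    SP node4> system power cycle    # hard power-cycle, skipping a graceful shutdown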
Then we go to the system shell of node4; we can pick any node, so we pick node4. We log in with the diag user, we're in /home/diag, which is our login directory, and we cd to vol0, because we want to see whether the var file system backup is there. And there it is, 103 kilobytes. Now we're going to check it on the CF card. So we go to the CF card, we go to the directory that contains the var file system, and we have a look at it, and we see that it is exactly the same file system of 103 kilobytes. Now let's remove it. Oh, sorry, we have to be root first. Then, just to make sure that we remove everything that has to do with var, we not only remove varfs.tgz, but we also remove the checksum that verifies it, and the old varfs.tgz, which would be the previous version of the same file system.
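The removal step, sketched, assuming sudo is available in the system shell as in this lab; the names of the checksum and previous-version files vary by release, so the wildcard stands in for them:

    node4% sudo csh                     # the files are root-owned, so become root first
    node4# cd /cfcard/x86_64/freebsd
    node4# rm varfs*                    # varfs.tgz plus its checksum and the old copy
    node4# ls                           # kernel, root file system and friends must stay!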
So we've removed everything that has to do with var. We see that the kernel and the root file system are still there, as well as all the other stuff that has to do with the kernel; we do need that in order to be able to boot. Now let's reset the system. In a real-life environment, you would reset your system by using the Service Processor or simply by removing the power to the system. In VMware, we will hard-reset it, and then we type Ctrl-C to go to the boot menu. We will pause the video for a while, because you have an important decision to make right now. You should not run a normal boot. If you would do that, then the system will act as if it is being set up for the first time, because there is no _sitelist file. It will then try to create a new cluster, and that's definitely not what you want. You want to join the existing cluster, so you need to restore the var file system with the _sitelist file. So we choose option 6, and then it will prompt us to confirm our choice, and we say yes. Then it will start restoring the var file system from vol0. As you can see, it says it is restoring using /mroot/etc/varfs.tgz, so it mounted vol0 to get to the backup and restored it to the compact flash successfully. And then it will automatically reboot.
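For orientation, the boot-menu part of that exchange looks roughly like this; the menu wording varies slightly between ONTAP releases:

    Please make a selection:
    (1) Normal Boot.
    (2) Boot without /etc/rc.
    (3) Change password.
    (4) Clean configuration and initialize all disks.
    (5) Maintenance mode boot.
    (6) Update flash from backup config.
    (7) Install new software first.
    (8) Reboot node.
    Selection (1-8)? 6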
Okay. And then, after some time, we get the login prompt again. We log in and, of course, we check whether the cluster is healthy, and node4 is up and running and is part of the cluster again. Now let's have a look at core dumps.