Tuesday, July 22, 2014

How to Backup/Restore a Windows 2003 Domain Controller

Posted by General Zod in Microsoft, Tech.
A couple of years back, I was working for a rather large company with hundreds of sites in about 50 different countries, all linked by a single global network… except for 4 or 5 data center sites that were called “solution centers”.  I worked at one of these special sites.  The purpose of the solution centers was to house whatever services a customer company required of us while keeping them separate from our company’s global network.  As we were not part of the global network, we were considered the black sheep of the company… and I was the lone systems engineer responsible for keeping the servers at my site running.  No bother… I do my best work when I’m left to my own devices.
However, this did present many additional complications that others in my company did not have to contend with.  The largest challenge to overcome was our site’s disaster recovery plan.  We could not simply plan to relocate to a new site, because we would need to recover our own environment, which included our own domain.
Yes, I know… I could have just housed one of our domain controllers at another location and established a special VPN just for the communications between the DCs.  That would be a valid solution, but just not good enough.  During a DR event, that would leave me heavily dependent upon the IT staff at that other location… and call me crazy, but I want to be able to ensure that I could perform the recovery 100% without the assistance of anyone else.
I spent a lot of time reading over Microsoft white papers and procedures written by various individuals, throwing ideas around with colleagues, and plucking away at ideas in an attempt to develop a procedure that would fulfill our needs.  Eventually, I developed the procedure that you’ll read below… and tested it successfully on several occasions.  Knowing that someone else out there is probably looking for the same thing, I figured it would be grand to share it with you.

How to Backup the Domain Controller(s)

Obviously, before you can restore your domain, you have to back it up first.  :)
Mainly what we’re interested in backing up is the System State of a Domain Controller.  So what is the System State?

The System State of your server includes the Registry, the boot files, some system files, the Active Directory database, and other components.  You cannot pick and choose which components are backed up during a System State backup.  It’s an all or nothing situation.
Since this includes the whole of your Registry, you have to understand that it also contains information about the original system’s installed hardware.  This may complicate the restore process somewhat.  If you backed up the System State from a DC on an HP ProLiant DL380 G5 series server… and attempt to restore it on a Dell PowerEdge T100… you will most likely have issues booting the OS afterwards, because the hardware set is significantly different.
As part of your DR plan, I recommend making a point of documenting the hostname, IP address, Operating System, Service Pack level, and the hardware make/model of each of your domain controllers.  You may find this information useful when the time comes.
These instructions use the hostname "DC123" as the name of the domain controller, and assume that you want to run your System State backup every day at 3:00am.
Login to your domain controller, and perform the following steps:
  1. Create a C:\Backup\ folder.
  2. Click Start — All Programs — Accessories — System Tools — Backup.
  3. Click [Next] — Select Backup Files and Settings — [Next].
  4. Select Let me choose what to back up — [Next].
  5. Expand My Computer — Check System State — [Next].
  6. Set the location of the backup file to C:\Backup\ folder.
    Set the Name of the Backup to “DC123 System State”.
  7. Click [Next] — [Advanced] — Select Normal — [Next].
  8. Check the Verify Data after Backup box — [Next].
  9. Select Replace the existing backups — [Next].
  10. Select Later — Set the Job Name to “DC123 System State”.
  11. Click [Set Schedule] — Schedule the job to run Daily at 3:00am.
  12. Click [OK] — Enter a set of user credentials — [OK].
  13. Click [Next] — Enter a set of the user credentials — [OK] — [OK] — [Finish].
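If you'd rather not click through the wizard on every DC, the same System State backup can also be run from NTBACKUP's command line, which makes it easy to drop into a scheduled task.  A minimal sketch, reusing the folder and job name from the steps above (adjust to taste):

```shell
REM Back up the System State to C:\Backup\, verifying the data afterwards.
REM /J names the job; /F sets the target .bkf file; /V:yes enables verification.
ntbackup backup systemstate /J "DC123 System State" /F "C:\Backup\DC123 System State.bkf" /V:yes
```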
The actual backup job itself will probably take somewhere between 15 – 30 minutes to run.  Then, you can back up the C:\Backup\ folder to tape.  Personally, I preferred to schedule another task that launched at 4:00am to “robocopy” (which can be found in the Windows Server 2003 Resource Kit Tools download) each of the backup files to another server, where they were all dumped to tape a few hours later.
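That 4:00am copy job can be sketched roughly as follows.  The destination server and share names here are made up for illustration; substitute your own:

```shell
REM Mirror the local backup folder to a central server for the nightly tape run.
REM \\BACKUPSRV\DCBackups$ is a hypothetical destination share.
robocopy C:\Backup \\BACKUPSRV\DCBackups$\DC123 /MIR /R:2 /W:5 /LOG:C:\Backup\robocopy.log

REM Schedule the copy to run daily at 4:00am (run once, as an administrator):
schtasks /create /tn "DC123 Backup Copy" /sc daily /st 04:00:00 ^
  /tr "robocopy C:\Backup \\BACKUPSRV\DCBackups$\DC123 /MIR /R:2 /W:5"
```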
You only really need to back up 1 domain controller for this to work, but then you’re pretty much locked into a single hardware set when it comes time to do the restore.  Since I was never sure what kind of hardware I would have available to me when it came time to do the restores, I made a practice of housing each domain controller on a different model of server… and backing each of them up individually.  Each backup ran me somewhere between 600 – 800 MB of disk space (a pittance by today’s standards).
Yes, this was probably a significant amount of overkill on my part.  However, I find that the more paranoid you are, the better prepared you tend to find yourself.  And I tend to be rather paranoid about things like DR.

How to Restore the Domain Controller(s)

Now let’s pretend that a disaster has struck!
You’ve retrieved your tapes from off-site storage and acquired your target hardware, so let’s get to work!  (Remember that matching the hardware to the DC restore would be best, but you can make substitutions.  It’s not an exact science, so some experimentation may be required.)
Note:  These instructions are written with a few assumptions in mind.
  1. We assume that your entire domain has been leveled by some catastrophic event.
  2. We assume that your domain controllers are running a Windows 2003 operating system.
  3. We assume that whoever is doing the work knows the login credentials (from the original domain) for the domain’s Administrator account, or for a user account that is a member of both the domain’s "Domain Admins" and "Schema Admins" groups.
  1. Build a stand-alone Windows 2003 server, and bring it up to the same Service Pack level as the original DC.
  2. Name the server with the same hostname as your original DC.
  3. Restore your System State backup files from tape, and copy them to the new server’s local hard disk.
  4. Reboot the server.
  5. After POST, hit [F8] and select to boot into “Directory Services Restore Mode (Windows domain controllers only)”.
  6. Click Start — All Programs — Accessories — System Tools — Backup.
  7. Click [Next] — Select Restore files and settings — [Next] — Browse to the location of the backup file — [Next].
  8. Expand File – System State Backup — Check the System State box — [Next].
  9. Click [Advanced] — Select Original Location — [Next] — [OK] — Select Leave existing files (Recommended) — [Next].
  10. Check the boxes for: *  Restore Security Settings
    *  Restore junction points, but not the folder and file data
    *  Preserve existing volume mount points
    *  When restoring replicated data sets, mark the restored data as the primary data for all replicas
  11. Click [Next] — [Finish].
  12. After the restore is completed, click [Close] — [Yes] to reboot the system.
If your server hardware is significantly different from the original DC’s, then you may have difficulty booting into the GUI.  If this is the case, you might still be able to recover the OS by booting into Safe Mode, or by booting from an original Windows 2003 OS CD to perform a Repair.
Once you get into the GUI, you will need to login using the local Administrator password from the original DC.
Now you will be able to seize the FSMO roles.  (Note:  After each "seize" command, click [Yes] and allow 3-5 minutes for the task to complete.)
  1. Click Start — Run — NTDSUTIL — [OK].
  2. Type the following commands into NTDSUTIL:
    roles
    connections
    connect to server DC123
    q
    seize domain naming master
    seize infrastructure master
    seize PDC
    seize RID master
    seize schema master
    q
    q
Next, confirm that your DC is a Global Catalog server.
  1. Launch AD Sites and Services
    (C:\Windows\System32\dssite.msc)
  2. Expand Sites – Default-First-Site-Name – Servers – DC123.
  3. Right-click and select NTDS Settings — On the General tab, verify that the Global Catalog box is checked.
  4. Perform a clean reboot of the system.
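If you prefer the command line, the Global Catalog status can also be checked without the MMC.  A quick sketch (repadmin ships with the Windows Server 2003 Support Tools, so it may need installing first):

```shell
REM List every DC currently advertising as a Global Catalog;
REM DC123 should appear in the output.
dsquery server -isgc

REM Or inspect the DC's NTDS options directly; look for IS_GC in the output.
repadmin /options DC123
```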
Now we’ll clean the old domain controllers out of the AD database.
  1. Click Start — Run — NTDSUTIL — [OK].
  2. Type the following commands into NTDSUTIL:
    metadata cleanup
    connections
    connect to server DC123
    quit
    select operation target
    list domains
    select domain <#>
    list sites
    select site <#>
    list servers in site
    select server <# of bad DC>
    quit
    remove selected server
    quit
  3. Launch Active Directory Sites and Services(C:\Windows\System32\dssite.msc).
  4. Expand Sites – Default-First-Site-Name – Servers.
  5. Right-click on the old DC’s server object — Select Delete.
  6. Launch Active Directory Users and Computers (C:\Windows\System32\dsa.msc).
  7. Expand the domain — Open the Domain Controllers container.
  8. Right-click on the old DC’s computer object — Select Delete.
  9. Select The domain controller is permanently offline and can no longer be demoted using Active Directory Installation Wizard (DCPROMO).
  10. Click [Delete] — [Yes] to confirm.
Your domain should now be successfully restored, but don’t consider yourself finished at this point.  This restored server should be considered hinky at best, and should not be kept as a long-term solution.
Before doing anything else, I recommend that you build a 2nd “clean” domain controller alongside this restored 1st DC.  Then, transfer the FSMO roles to the 2nd DC.  Finally, demote the 1st DC to a member server and retire it from the domain.  That will hopefully ensure that your domain is running on a clean and stable DC that you can rely upon.  Then, build a new 2nd DC to ensure some redundancy.
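Transferring (as opposed to seizing) the FSMO roles to the 2nd DC uses the same NTDSUTIL tool, but with the graceful transfer commands, since both DCs are online.  Assuming a hypothetical new DC named "DC456", the session would look roughly like this:

```shell
REM Inside NTDSUTIL, connect to the DC that should RECEIVE the roles,
REM then transfer each role gracefully (the old role holder is still online).
ntdsutil
roles
connections
connect to server DC456
q
transfer domain naming master
transfer infrastructure master
transfer PDC
transfer RID master
transfer schema master
q
q
```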
Congratulations!  Your domain is restored.  Now get to work on restoring everything else.  :)

Windows Server 2003 Disaster Recovery Planning (Part 1)


In this article, we will discuss what every Microsoft Windows Administrator and Engineer should think about when trying to manage their environments in the scope of planning for Disaster Recovery and Business Continuity.  This is Part 1 of a four-part article series covering many of the details administrators and engineers need to know about planning Disaster Recovery for Windows systems, as well as for their networks in general.  In Part 1, we will look at Windows 2000 & Windows Server 2003 Clustering & Load Balancing for high availability, as well as general planning information.



Planning for High Availability  

Windows Server Disaster Recovery Planning can be a chore, but if you have the details and a plan, setup can go smoothly, and it will be a lifesaver when your systems start to smoke and your VPs are knocking on your office door asking what the heck is going on!  In this section we will look at how to plan for High Availability.
Taking the time to plan and design is the key to your success, and it’s not only the design, but also the study efforts you put in. I always joke with my administrators and tell them they’re doctors of technology. I say, “When you become a doctor, you’re expected to be a professional and maintain that professionalism by educational growth through constant learning and updating of your skills.” Many IT staff technicians think their job is 9 to 5, with no studying done after hours. I have one word for them: Wrong! You need to treat your profession as if you’re a highly trained surgeon except, instead of working on human life, you’re working on technology. And that’s how planning for High Availability solutions needs to be addressed. You can’t simply wing it and you can’t guess at it. You must be precise, otherwise, your investment goes down the drain – and all the work you put in will be not only useless, but also wasteful.

Plan Your Downtime

You need to achieve as close to 100 percent uptime as possible.  You know 100 percent uptime isn’t realistic, though, and it can never be guaranteed.  Breakdowns occur because of disk crashes, power or UPS failures, application problems resulting in system crashes, or any other hardware or software malfunction.  So, the next best thing is 99.999 percent, which is still somewhat reasonable with today’s technology.  You can also define in a Service Level Agreement (SLA) what 99.999 percent means to both parties.  If you promised 99.999 percent uptime for a single year, that translates to a downtime allowance of only about five minutes per year.

I would strive for a larger number, one that’s more realistic given scheduled outages and possible disaster-recovery testing performed by your staff.  Go for 99.9 percent uptime, which allots for about nine hours of downtime per year.  This is more practical and feasible to obtain.  Whether providing or receiving such a service, both sides should test planned outages to see if delivery schedules can be met.

You can figure this formula by taking the number of hours in a day (24) and multiplying it by the number of days in the year (365).  This equals 8,760 hours in a year.  Use the following equation:

    percent of uptime per year = (8,760 - number of total hours down per year) / 8,760

If you schedule eight hours of downtime per month for maintenance and outages (96 hours total), then the percentage of uptime per year is 8,760 minus 96, divided by 8,760, which works out to about 98.9 percent uptime for your systems.  This should be an easy way for you to provide an accurate accounting of your downtime.  Remember, you must account for downtime accurately when you plan for high availability.  Downtime can be planned or, worse, unexpected.  Sources of unexpected downtime include the following:
  • Disk crash or failure
  • Power or UPS failure
  • Application problems resulting in system crashes
  • Any other hardware or software malfunction
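The uptime arithmetic above is easy to sanity-check with a few lines of code.  A small Python sketch of the same formula:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

def uptime_percent(hours_down_per_year: float) -> float:
    """Percent of uptime per year = (8,760 - hours down) / 8,760."""
    return (HOURS_PER_YEAR - hours_down_per_year) / HOURS_PER_YEAR * 100

# Eight hours of scheduled downtime per month (96 hours/year):
print(round(uptime_percent(96), 1))  # 98.9

# "Five nines" (99.999%) allows only about 5.3 minutes of downtime per year:
print(round((1 - 0.99999) * HOURS_PER_YEAR * 60, 1))  # 5.3
```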

Building the Highly Available Solutions’ Plan

Let’s look at the plan to use a Highly Available design in your organization and review the many questions you need to ask before implementing it ‘live’. Remember, if the server is down, people can’t work, and millions of dollars can be lost within hours. The following is a list of what could happen in sequence:
  1. A company uses a server to access an application that accepts orders and does transactions.
  2. The application, when it runs, serves not only the sales staff, but also three other companies that do business-to-business (B2B) transactions.  The estimate is that, in a single peak hour, the money made can exceed 2.5 million dollars.
  3. The server crashes and you don’t have a High Availability solution in place.  This means no failover, redundancy, or load balancing exists at all.  It simply fails.
  4. It takes you (the systems engineer) 5 minutes to be paged, but about 15 minutes to get onsite. You then take 40 minutes to troubleshoot and resolve the problem.
  5. The company’s server is brought back online and connections are reestablished.
Everything appears functional again.  The problem was simple this time: an application glitch caused a service to stop and, once restarted, everything was okay.  Now, the problem with this whole scenario is this: although it was a true disaster, it was also a simple one.  The systems engineer happened to be nearby and was able to diagnose the problem quite quickly.  Even better, the problem was a simple fix.  This easy problem still took the companies’ shared application down for at least one hour and, if this had been a peak-time period, over 2 million dollars could have been lost.  The companies involved now want assurance that the possibility of 2 million dollars in sales evaporating never arises again.  Worse still, the companies you connect to and your own clientele start to lose faith in your ability to serve them.  This could also cost you revenue and the possibility of acquiring new clients moving forward.  People talk, and the uneducated could take this small glitch as a major problem with your company’s people, instead of the technology.
Let’s look at this scenario again, except with a Highly Available solution in place:
  1. A company uses a server to access an application that accepts orders and does transactions.
  2. The application, when it runs, serves not only the sales staff, but also three other companies that do business-to-business (B2B) transactions.  The estimate is that, in a single peak hour, the money made can exceed 2.5 million dollars.
  3. The server crashes, but you do have a Highly Available solution in place. (Note, at this point, it doesn’t matter what the solution is. What matters is that you added redundancy into the service.)
  4. Server and application are redundant, so when a glitch takes place, the redundancy spares the application from failing.
  5. Customers are unaffected. Business resumes as normal. Nothing is lost and no downtime is accumulated.
  6. The ‘one hour’ you saved your business in downtime just paid for the entire Highly Available solution you implemented.
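The reason redundancy rescues the second scenario can be put into numbers.  A standard reliability formula (not from the article itself, but worth keeping in your DR notes) says a service backed by N independent redundant nodes is down only when all N fail at once:

```python
def parallel_availability(node_availability: float, nodes: int) -> float:
    """Combined availability of N redundant nodes, assuming independent
    failures: the service is unavailable only if every node is down."""
    return 1 - (1 - node_availability) ** nodes

# Two nodes that are each only 99% available yield roughly "four nines":
print(round(parallel_availability(0.99, 2) * 100, 2))  # 99.99
```

The independence assumption is the catch in practice; shared power, storage, or a common software bug can take out both nodes together, which is why the article stresses drills and monitoring as well.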

Human Resources and Highly Available Solutions

Human Resources (people) need to be trained and work on site to deal with a disaster. They also need to know how to work under fire. As a former United States Marine, I know about the “fog of war,” where you find yourself tired, disoriented, and probably unfocused on the job. These characteristics don’t help your response time with management. In any organization, especially with a system as complex as one that’s highly available, you need the right people to run it.

Managing Your Services

In this section, you see all the factors to consider while designing a Highly Available solution. The following is a list of the main services to remember:
  • Service Management is the management of the true components of Highly Available solutions: the people, the process in place, and the technology needed to create the solution.  Keeping this balance to have a truly viable solution is important.  Service Management includes the design and deployment phases.
  • Change Management is crucial to the ongoing success of the solution during the production phase. This type of management is used to monitor and log changes on the system.
  • Problem Management addresses the process for Help Desks and Server monitoring.
  • Security Management, as discussed in Chapter 7, is tasked with preventing unauthorized penetrations of the system.
  • Performance Management is discussed in greater detail in this chapter.  This type of management addresses the overall performance of the service, its availability, and its reliability.
Other main services also exist, but the most important ones are highlighted here.  Service Management is crucial to the development of your Highly Available solution.  You must cater to your customer’s demands for uptime.  If you promise it, you better deliver it.

Highly Available System Assessment Ideas

The following is a list of items for you to use during the postproduction-planning phase. Make sure you covered all your bases with this list:
  • Now that you have your solution configured, document it!  A lack of documentation will surely spell disaster for you.  Documentation isn’t difficult to do; it’s simply tedious.  But all that work will pay off in the end if you need it.
  • Train your staff. Make sure your staff has access to a test lab, books to read, and advanced training classes. Go to free seminars to learn more about High Availability. If you can ignore the sales pitch, they’re quite informative.
  • Test your staff with incident response drills and disaster scenarios.  Written procedures are important, but live drills are even better for seeing how your staff responds.  Remember, if you have a failure on a system, it could fail over to another system, but you must quickly resolve the problem on the first system that failed.  You could have the same issue on the other nodes in your cluster and, if that’s the case, you’re on borrowed time.  Set up a scenario and test it.
  • Assess your current business climate, so you know what’s expected of your systems at all times. Plan for future capacity especially as you add new applications, and as hardware and traffic increase.
  • Revisit your overall business goals and objectives. Make sure what you intend to do with your high-availability solution is being provided. If you want faster access to the systems, is it, in fact, faster? When you have a problem, is the failover seamless? Are customers affected? You don’t want to implement a high-availability solution and have performance that gets worse. This won’t look good for you!
  • Do a data-flow analysis on the connections the high-availability solution uses.  You’d be surprised at the effect that damaged NICs, the wrong drivers, excessive protocols, bottlenecks, and mismatched port speeds or duplex settings, to name a few problems, have on the system.  I’ve made significant differences in networks simply by running an analysis on the data flow on the wire and, through this analysis, have achieved great speed improvements.  A good example: if you had old ISA-based NIC cards that only ran at 10 Mbps and you plugged your system into a port that supports 100 Mbps, you would still only run at 10, because that’s as fast as the NIC will go.  What would happen if the switch port was hard-set to 100 Mbps instead of autonegotiating?  The NIC wouldn’t communicate on the network at all because of the mismatch in speeds.  Issues like this are common on networks and could quite possibly be the reason for poor or no data flow on your network.
  • Monitor the services you consider essential to operation and make sure they’re always up and operational.  Never assume a system will run flawlessly just because no change has been implemented . . . at times, systems choke up on themselves, either from a hung thread or process.  You can use network-monitoring tools like GFI, Tivoli, NetIQ, or Argent’s software solutions to monitor such services.
  • Assess your total cost of ownership (TCO) and see if it was all worth it.

Cost Analysis

Do a final cost analysis to check whether you made the right decision.  The best way to determine TCO is to go online and use a TCO calculator that computes TCO based on your own unique business model.  Because, for the most part, all business models are different, run the calculator and figure TCO based on your own answers to the calculator’s questions.  Many such calculators are available online; just run a search in a search engine (like Google.com) on ROI/TCO calculators and you will see them.
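Online calculators aside, the core of any TCO estimate is simply up-front costs plus recurring costs over the ownership period.  A deliberately simplified sketch (the cost categories and figures here are illustrative, not a standard model):

```python
def simple_tco(hardware: float, software: float,
               annual_support: float, annual_admin_hours: float,
               hourly_rate: float, years: int = 3) -> float:
    """Rough TCO: one-time purchase costs plus recurring support
    and administration labor over the ownership period."""
    recurring_per_year = annual_support + annual_admin_hours * hourly_rate
    return hardware + software + recurring_per_year * years

# Example: $20k hardware, $8k licenses, $3k/yr support,
# 100 admin hours/yr at $75/hr, over 3 years:
print(simple_tco(20_000, 8_000, 3_000, 100, 75))  # 59500.0
```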

Testing a High Availability System

Now that you have the planning and design fundamentals down, let’s discuss the process of testing your high-availability systems.  You need to ensure the test runs long enough to get a solid sampling of how the system operates normally, without stress (or activity), and how it runs under load.  Run the test long enough to obtain a solid baseline, so you know how your systems operate on a daily basis, and use that baseline for comparison during times of activity.
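The baseline-versus-activity comparison described above is straightforward to automate.  A small Python sketch (the three-sigma threshold is a common convention, not something the article prescribes):

```python
from statistics import mean, stdev

def is_anomalous(baseline, current, z=3.0):
    """Compare a new reading against a baseline sample collected during
    normal operation; flag it if it deviates more than z standard deviations."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(current - mu) > z * sigma

# Baseline: response times (ms) sampled while the system ran normally.
normal_ms = [102, 98, 101, 99, 100, 103, 97, 100]
print(is_anomalous(normal_ms, 150))  # True  (well outside the baseline)
print(is_anomalous(normal_ms, 104))  # False (within normal variation)
```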

In Sum

This should give you a good running start on advanced planning for high availability, and it gives you many things to check and think about, especially when you’re done with your implementation.

Common Scenarios for Active Directory Related Backup and Disaster Recovery

Common Scenarios for Active Directory Related Backup and Disaster Recovery


(Or, Everything you ever wanted to know about AD DR Plans but couldn’t find in one place..)
As part of our Active Directory Risk Assessment Program, we perform an Operational Interview portion of the engagement.  During this, we talk about things we can’t really ask the machines: SLAs, OLAs, DR plans, and various other things that involve humans more than computers but are a vital part of the health of, and risk associated with, an enterprise environment.  One of the things that often comes up is the set of common scenarios in your Disaster Recovery plans.  This came up for one of my customers, and they asked me to compile information on how to handle these common topics.  So I compiled a list of TechNet and KB articles that will hopefully help you plug these into your DR plans.  (Even having the links to the online articles handy when a disaster comes up can save you time, money, frustration, and sanity.)  Hope this helps!
- How to recover an Active Directory forest
Planning for Active Directory Forest Recovery
http://technet.microsoft.com/en-us/library/planning-active-directory-forest-recovery(WS.10).aspx
Word Doc of the entire Forest Recovery Whitepaper:
http://go.microsoft.com/fwlink/?LinkId=152459
- How to recover domains
Recovering Active Directory Domain Services
http://technet.microsoft.com/en-us/library/cc816751(WS.10).aspx
- How to recover DNS
(Mostly covered in the Recovering Active Directory Domain Services article but additional info found here)
How to reinstall a dynamic DNS Active Directory-integrated zone
http://support.microsoft.com/kb/294328
- How to seize and transfer FSMO roles
Using Ntdsutil.exe to transfer or seize FSMO roles to a domain controller
http://support.microsoft.com/kb/255504
How to view and transfer FSMO roles in the graphical user interface
http://support.microsoft.com/kb/255690
- How to perform metadata cleanup
Clean Up Server Metadata (2008 & 2008R2)
http://technet.microsoft.com/en-us/library/cc816907%28WS.10%29.aspx
Clean up server metadata (2000, 2003 & 2003R2)
http://technet.microsoft.com/en-us/library/cc736378(WS.10).aspx
- How to recover an entire server
Windows Server Backup Step-by-Step Guide for Windows Server 2008
http://technet.microsoft.com/en-us/library/cc770266(WS.10).aspx
Performing a Full Server Recovery of a Domain Controller
http://technet.microsoft.com/en-us/library/cc772519(WS.10).aspx
- How to perform authoritative restores
- Active Directory database
Performing Authoritative Restore of Active Directory Objects
http://technet.microsoft.com/en-us/library/cc816878(WS.10).aspx
Performing Authoritative Restore of an Application Directory Partition
http://technet.microsoft.com/en-us/library/cc816934(WS.10).aspx
- SYSVOL (requires special recovery procedures)
For DFS Replicated SYSVOL
Restoring and Rebuilding SYSVOL
http://technet.microsoft.com/en-us/library/cc816596(WS.10).aspx
How to force an authoritative and non-authoritative synchronization for DFSR-replicated SYSVOL (like "D4/D2" for FRS)
http://support.microsoft.com/kb/2218556
For FRS Replicated SYSVOL
Using the BurFlags registry key to reinitialize File Replication Service replica sets
http://support.microsoft.com/kb/290762
How to rebuild the SYSVOL tree and its content in a domain
http://support.microsoft.com/kb/315457
- Successfully restoring users and their group memberships
How to restore deleted user accounts and their group memberships in Active Directory
http://support.microsoft.com/kb/840001
- How to perform non-authoritative restores
- Active Directory database
Performing Nonauthoritative Restore of Active Directory Domain Services
http://technet.microsoft.com/en-us/library/cc816627(WS.10).aspx
- SYSVOL (requires special recovery procedures) (Note: Same articles as Authoritative Restore since they include both procedures in the info.)
For DFS Replicated SYSVOL
Restoring and Rebuilding SYSVOL
http://technet.microsoft.com/en-us/library/cc816596(WS.10).aspx
How to force an authoritative and non-authoritative synchronization for DFSR-replicated SYSVOL (like "D4/D2" for FRS)
http://support.microsoft.com/kb/2218556
For FRS Replicated SYSVOL
Using the BurFlags registry key to reinitialize File Replication Service replica sets
http://support.microsoft.com/kb/290762
How to rebuild the SYSVOL tree and its content in a domain
http://support.microsoft.com/kb/315457