March 2008
Oracle 11g: How NOT to RAC Your Brain
by Matthew Zito, GridApp Systems Chief Scientist
Over the past few years, it seems that Oracle's database product has become more and more about add-on features and functionality than the core RDBMS product. That is to say, the lead up to 11g has shown us exactly how important the add-on options and new features are to Oracle's ongoing revenue demands. For example, while the Oracle 11g New Features Guide lists more than 150 new features (I stopped counting at 150), the major features that Oracle announced had to do with online upgrades, enhanced Data Guard replication, table compression, and improved encryption.
However, the dodgy bit about those features is that they are all additional licensable options. This wasn't revealed until after the official launch of 11g, thereby conveniently hiding the bad news until after the good news had some time to percolate into the Oracle DBA consciousness. Of course, there was a predictable outcry from some of the folks on Oracle-L, the excellent Oracle mailing list, complaining that Oracle is already expensive enough, so why make new features an additional cost?
The answer is, of course, that they're very useful new features, and that Oracle needs to continue to "draw from the well" - generate more revenue from existing customers.
So what does this have to do with automation and DBAs, and anything of any technical value whatsoever?
The DBA: Caught in the Middle
So, if we look at the players in this 11g rollout, one clear winner is Oracle, who gets to roll out a new database version that continues its commanding lead, feature-wise, over Microsoft SQL Server. They get to take a lot of the existing technologies they have, log miner, RAC, Data Guard, and improve and integrate them, so they can create whole new features out of the integration. Even better, they get to generate revenue from these new features, increasing shareholder value and in general helping maintain their market share and support revenue. So, they're doing okay.
The business side of the Oracle shops wins, because they get a pile of new features they can use in their applications and reports - get better uptime, more functional queries, faster reports, and improved overall access to information. Of course, after they've paid extra for these features, they want to make sure they're actually being used and taken advantage of.
So who loses? The DBA. DBAs are now saddled with more features to learn, implement, maintain, and leverage. Not to mention, they have to coordinate and oversee the upgrades and rollouts of a whole new major release of Oracle, with its bugs, quirks, and complexities. Finally, there's the continuing shift towards moving more and more responsibility on database servers from systems and storage administrators to the database team - contributing to the increased load on the DBA.
11g: A Chance for a Fresh Start
It may seem like based on all of this, we think 11g is a bad idea, or something that's not worth doing. The reality is quite the contrary however, as we see the move towards 11g, both for upgrades and provisioning new databases, as a marvelous opportunity to implement automation and take control of the new complexities of 11g before they get out of hand.
Why is automation so important for DBAs? Because there's plenty of moving parts in existing 10gR2 environments before we start adding all of the new bells and whistles in 11g. It's critical to establish standards, best practices, and automation methodologies at the beginning, when there isn't any organizational inertia around doing things a particular way.
So let's start from the ground up, and talk about how we can automate your database operations.
Step 1: To RAC or not to RAC?
If you're already running 10g RAC, then 11g RAC shouldn't be a huge paradigm shift, though it's worth thinking about what you would change about your current way of doing things if you had the opportunity. If you're not running RAC, the question is, "Why not?"
RAC today is stable enough, widely deployed enough, and intelligent enough to add a lot of value to your organization. Oracle 11g, from a RAC perspective, is primarily an enhancement release, with some additional optimizations and management improvements. This means that the jump from 10gR2 to 11g is not nearly as dramatic as the jump from 9i RAC to 10gR1 RAC, with all of the trials and tribulations that ensued.
Another interesting shift that we've seen over the past two years or so is a move from RAC being purely a high availability solution to actually being used for scalability. In addition, we've seen more people moving toward shared clusters, where there are multiple databases and instances co-hosted on the same physical cluster, often 4 nodes or more. Oracle has been advocating this in its Grid architecture since 10g's introduction, but it's only until recently that this has moved from a slide in an Oracle marketing deck to a successful reality at a number of customers.
So today RAC can bring you better availability, scalability, and server efficiency when used in conjunction with a shared cluster. By leveraging technologies like OCFS2 and ASM, you can also remove the need for traditional volume management tools like Veritas Volume Manager, which can help offset some of the cost of the RAC licenses themselves. It's definitely worth considering as part of your 11g rollout.
When it comes to RAC itself, what is there to automate and standardize? We recommend standardizing the following things:
Directly automating the deployment of Oracle CRS is not for the faint of heart, but is possible if you plan to roll out a lot of clusters. For an easier return on your investment, manually create the clusters, and then automate the database deployments, cloning, scale, etc. in your environments. After all, you'll only need to do one cluster install per cluster, but you'll need to do many database creates in a shared cluster model.
If your organization uses a datacenter automation tool such as Bladelogic or Opsware, it is definitely worth working with your systems administrators to build an OS profile that includes the Oracle prerequisites, as that is a lot of the complexity we see in our customer environments with RAC.
Step 2: ASM and Storage
ASM, though fairly straightforward conceptually, is one of the more ambitious features in Oracle today. In a world where traditionally DBAs have worked closely with storage and systems administrators to finely tune the storage configuration, creating individual LUNs and volumes for things like redo logs; ASM throws all of that out the window. With ASM, there's a ton of automation you get just by using it - after all, you no longer have to deal with the placement and layout of individual files. Instead, Oracle will take care of evenly distributing data across however many physical disks are available to it.
So, when it comes to ASM, the best automation you can do is simply make your storage configuration consistent. Create disk groups comprised of identically sized disks - yes, Oracle can cheerfully deal with differently sized devices, but you want to be able to scale in reasonable increments that are consistent and predictable. Creation of the ASM instance itself can be automated, though since you do this once per server/cluster, it doesn't necessarily make sense to spend the effort automating this part of the puzzle.
11g ASM, like RAC, is mostly basic enhancements or fixing glaring problems with the ASM in 10gR2. For example, one of the great weaknesses of using ASM to do mirroring in 10gR2 is that if one side of the mirror goes down, when it comes back up, the entire contents of the disk need to be replicated from the surviving disk, even if only a few megabytes of data changed on the surviving disk during that time. In 11g, it keeps track of what changed since when the disk vanished, and it will do a "fast resilvering", immediately checking the consistency of the disk and repairing any damaged/missing blocks.
The one big management enhancement for ASM in 11g is the concept of the SYSASM group. Similar to the SYSDBA group for Oracle, these are OS users who have privileges to log into the ASM instance on that node. This is a golden opportunity to push the administration of ASM back on to the systems or storage administrators. You may feel an internal resistance at this idea - after all ASM is part of Oracle, and Oracle is your responsibility. Fight this urge! There is nothing high-value or unique about being able to add disks to an ASM disk group, and better the people who are experienced in storage management do it. The other resistance you'll likely get is from the systems people, who were probably looking forward to getting rid of that annoying responsibility. However, there is no reason to not take advantage of this capability to divide storage management back the way it was before ASM.
Step 3: Database Automation
Almost nothing of this section is new or unique to Oracle 11g, as Oracle has been fairly consistent with its methodology for unattended provisioning for a number of releases now. It is important though, as we said earlier, to take the opportunity of an upgrade to 11g as a chance to do the right thing, and get your environment standardized and consistent. It can be time-consuming and frustrating up-front, but it is a no-brainer in terms of long-term value to you and the rest of your team.
In general, with automation, it's best to try to have one consistent workflow, or path, for doing as many different kinds of environments as possible. This is so that as you need to maintain and continue to develop your different standards, there's as much code reuse as possible. Based on that, then, while you can have the OUI create a database for you, we recommend running a software-only installation with the OUI, then running the DBCA and the NetCA. This is because on certain environments, like ASM, you need to follow this process. In addition, separating out these steps allows you to have finer-grained control over what happens throughout the process.
The OUI, like all of the Oracle installation tools, uses response files to pass in information that would normally be prompted for in the GUI installer. Response files are in the format of name/value pairs, and the easiest way to get a starting response file is to record one yourself. If you simply run: ./runInstaller -record-destinationFile /path/to/some/file and go through the GUI install process, the inputs you entered will be stored as a response file that you can then edit and "tweak" for your specific environments.
Running the OUI itself actually has two modes - silent and suppressed. "Silent" mode actually prevents the GUI from showing up at all during the installation process, so it's perfect for low-bandwidth links or other situations where you really don't want to have to deal with an X GUI popping up on your display. "Suppressed" mode is also useful - the GUI still pops up, but it prompts you only for questions that weren't filled in by the response file. This is a simple starting point for automated database provisioning that requires almost no scripting. Just make a response file with everything filled in except for the few values you want a user to be able to select (like home name, etc.), and then make a little wrapper script that calls the OUI with the response file in suppressed mode. Then, the people doing the database setup just answer the one or two questions they get to answer, and the rest is done automatically. A slightly more advanced version of this is a script that prompts the user for the fields that are missing in the response file, and it then passes those files to the runInstaller on the command line, like this: ./runInstaller -silent "ORACLE_HOME_NAME=OraDBHome1". This allows you to fill in the missing pieces of the response file without having to deal with editing the file or programmatically removing/setting different values.
On the database creation side of things, there's also response files that dictate things like file paths, SID, etc. - basically the -path and configuration- information that's *separate* from the actual contents of the database - its schema, users, etc. All of that information is stored in the database template - you'd recognize that from the DBCA where it asks if you want to create a "Starter Database" - those are all templates.
There are two kinds of DBCA templates - "seed" and "non-seed". Seed templates are actually copies of datafiles, and they are simply copied into place. This has the advantage of allowing you to easily include data in the template, as well as being much faster to deploy, since there's no data dictionary to create (it's already in the datafiles). It has the disadvantage that the size of the template itself is equal to the size of the database you end up with. Hence, it's probably not a good idea to take that 1TB data warehouse and turn it into a seed template.
Non-seed templates simply have the data definition for your database, so it's dramatically smaller. It also, however, takes longer to provision, as it needs to build the data dictionary, create the datafiles, etc. In both cases, templates can be created from existing databases or templates, or from scratch, using the DBCA.
Conclusion
Oracle 11g is an interesting release - a lot of new features, new prices, and new complexity for the DBA who needs to take advantage of it all. While upgrading is going to be a complex, in-depth process, it gives DBAs the opportunity to bring a consistent standardized environment, and put automation in place to make things as efficient as possible, allowing DBAs to spend time on the 100+ new features that come along with Oracle 11g. Three main target areas for automation and standardization with 11g are RAC, ASM, and the database provisioning and creation process. With all of these areas, some best practices and automation can be the key to a successful rollout of 11g.