Time to talk about a subject that most DBAs hate, but some DBAs like me have a love-hate relation with it. Replication. Not storage replication or Availability groups. That old workhorse SQL Server data replication.
- Limit the number of articles per publication: Having to occasionally rebuild replication is a humongous pain in the backside, but it’s a reality of working with replication. You will have to rebuild replication. If you need to replicate a large number of tables, rebuilding replication can take a long time. Creating a new snapshot of all those tables, transferring it to the subscriber, and importing all that data can take a really long time. Often times, when you are forced to rebuild replication, it is because only one or a few tables are having issues. If you break the database apart into separate publications with only a limited number of tables per publication, you can rebuild replication for just that publication and cut your rebuild time by a huge amount.
- Don’t add an article to more than one publication: The distributor contains transactions for every article in every publication. If a table is in more than 1 publication, it’s transactions get replicated in the distributor. Since each publication could have different retention policies or different filters or different settings, it can’t rely on a single copy of the transaction to be stored for a table. It has to store the transactions for the table for each publication. If it’s in 2 publications, it doubles the amount of data stored for that table. If it’s in 3 publications, it triples the data stored. And so on. The more data that is stored in the distributor, the more data that has to be scanned/seeked by the agents that apply the data on the subscribers. More data = more slower (poor grammar for emphasis).
- Know how long it takes to remove and rebuild replication: As I mentioned above, rebuilding replication is matter of when, not if. Prior to rebuilding replication, you’re probably going to spend some time troubleshooting the current problem that may or may not lead to rebuilding replication. And this is precisely why you need to know how long it takes to rebuild replication.
An issue I see over and over are people who spend a lot more time troubleshooting replication than it would take to rebuild it. I get it, rebuilding replication sucks! But spending 4 times the amount of time to rebuild replication on troubleshooting because you hate rebuilding it and then having to rebuild it anyway, sucks 5 times as much.
My number 1 rule of thumb for managing replication is “never spend more time troubleshooting replication than it takes to rebuild it”. - Avoid replicating LOB data if you can: In general, you shouldn’t replicate any data you don’t need, but we all end up replicating more than we need due to the complexity involved around being selective. I definitely do it. So I’m offering a compromise here. LOB data is one area where the complexity of being selective is worth it.
At one of the places I worked at previously, the company bought another company, and when we were working on incorporating them into our infrastructure, we learned what a nightmare their replication was. They were merge replicating their data which was them re-published via merge replication to kisks over the web. And they were storing massive amounts of images in the database … twice. Yes, twice. They stored the image in it’s raw format and then encrypted it. They kept both versions of the images and replicated both versions of the images.
Does that sound bad? Well, the kiosks created the images and never had a reason to read them after creating them initially. They never displayed them. One table alone was merge replicating over 300 GB of images OVER THE WEB. The images should never had been replicated back to the machines. And definitely not 2 versions of the images. - Become best friends with Replication Monitor: Replication Monitor should be your best friend if you are managing replication. It shows you how replication is performing, how far behind it is if it isn’t performing, and gives you details about errors it experiences.
Learn it … love it … make it your new best friend.
P.S. Replication monitor is also Database Mirroring Monitor and its real name is SQL Monitor.