DATA PROTECTION ~ TYPES OF BACKUP AND CONSIDERATIONS

Businesses and individuals alike should be aware of the options they have for data protection and data storage. With a solid backup and recovery implementation in place, they can feel secure in knowing they will not lose important and irreplaceable information. Time and labor costs also decrease, because recovery times are minimized when steps have been taken to protect data in business and personal applications, remote offices, and desktop/mobile systems.

Data protection using external storage refers to various techniques and devices for storing large amounts of data. The earliest storage devices were punched paper cards, which were used as early as 1804 to control silk-weaving looms. Modern mass storage devices include all types of disk drives and tape drives. Mass data storage, or auxiliary storage, is distinct from your computer's memory, which refers to temporary storage areas within the computer. Your computer's main memory, or RAM (random-access memory), is read and write memory; that is, you can both write data into RAM and read data from RAM. RAM is physical memory in the form of chips. This is in contrast to ROM (read-only memory), which holds the instructions to start up your computer and can only be read. Most RAM is volatile, which means that it requires a steady flow of electricity to maintain its contents. As soon as the power is turned off, whatever data was in RAM is lost. Unlike main memory, mass data storage devices retain data even when the computer is turned off.
Mass data storage is measured in kilobytes (1,024 bytes), megabytes (1,024 kilobytes), gigabytes (1,024 megabytes) and terabytes (1,024 gigabytes).
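
The unit ladder above can be sketched in a few lines of Python. This is only an illustration of the 1,024-based arithmetic described in the text; the function name format_size and the unit labels are chosen here for the example and do not come from any particular tool.

```python
# Binary storage units as defined in the text: each unit is 1,024 of the one before it.
UNITS = ["bytes", "KB", "MB", "GB", "TB"]


def format_size(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value below 1,024."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop when the value fits in this unit, or when no larger unit is left.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


print(format_size(1024))          # one kilobyte
print(format_size(1024 ** 3))     # one gigabyte
print(format_size(1_474_560))     # capacity of a high-density 3.5-inch floppy
```

Note that the last call reports about 1.41 MB: the familiar "1.44MB" label mixes decimal and binary units (1,440 kilobytes of 1,024 bytes each).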

The main types of data storage are:

  • Floppy disks : Relatively slow and have a small capacity, but they are portable, inexpensive, and universal.
    a) 8-inch: The first floppy disk design, invented by IBM in the late 1960s and used in the early 1970s as first a read-only format and then as a read-write format. The typical desktop/laptop computer does not use the 8-inch floppy disk.
    b) 5¼-inch: The common size for PCs made before 1987 and the successor to the 8-inch floppy disk. This type of floppy is generally capable of storing between 100K and 1.2MB (megabytes) of data. The most common sizes are 360K and 1.2MB.
    c) 3½-inch: Floppy is something of a misnomer for these disks, as they are encased in a rigid envelope. Despite their small size, microfloppies have a larger storage capacity than their 5¼-inch cousins -- from 400K to 1.44MB of data. The most common sizes for PCs are 720K (double-density) and 1.44MB (high-density). Macintoshes support disks of 400K, 800K, and 1.44MB.
  • Hard disks : Very fast and with more capacity than floppy disks, but also more expensive. A hard disk can store anywhere from 10 to more than 100 gigabytes. Disks are random-access media, meaning that a disk drive can access any point at random without passing through intervening points. Most hard disk systems are not portable, although it is possible to buy removable hard disks (cartridges).
  • Optical disks : Unlike floppy and hard disks, which use electromagnetism to encode data, optical disk systems use a laser to read and write data. Optical disks have very large storage capacity - up to 6 gigabytes (6 billion bytes), but they are not as fast as hard disks. In addition, the inexpensive optical disk drives are read-only. Read/write varieties are expensive.
    There are three basic types of optical disks:
    a) CD-ROM : Like audio CDs, CD-ROMs come with data already encoded onto them. The data is permanent and can be read any number of times, but CD-ROMs cannot be modified.

    b) WORM : Stands for write-once, read-many. With a WORM disk drive, you can write data onto a WORM disk, but only once. After that, the WORM disk behaves just like a CD-ROM.

    c) erasable: Optical disks that can be erased and loaded with new data, just like magnetic disks. These are often referred to as EO (erasable optical) disks.

    These three technologies are not compatible with one another; each requires a different type of disk drive and disk. Even within one category, there are many competing formats, although CD-ROMs are relatively standardized.

  • Tapes : Tapes are relatively inexpensive and can have very large storage capacities, ranging from a few hundred kilobytes to several gigabytes. They do not permit random access to data, however. Tape is a sequential-access medium, which means that to get to a particular point on the tape, the tape must go through all the preceding points. Accessing data on tapes is therefore much slower than accessing data on disks. Because tapes are so slow, they are generally used only for long-term storage and backup. Data to be used regularly is almost always kept on a disk. Tapes (sometimes called streamers or streaming tapes) are also used for transporting large amounts of data. Tapes come in a variety of sizes and formats.
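
The difference between sequential access (tape) and random access (disk) can be illustrated with a toy cost model. The sketch below is a simplification made up for this example: it only counts how many blocks pass under the head for each request and ignores real-world factors such as seek time and rotational delay.

```python
def tape_blocks_passed(head: int, target: int) -> int:
    """Sequential access: every block between the head and the target must pass by."""
    return abs(target - head)


def disk_blocks_passed(head: int, target: int) -> int:
    """Random access: the drive goes to the target without passing intervening blocks."""
    return 0


def total_cost(requests, cost_fn, start=0):
    """Sum the blocks passed while serving the requests in order."""
    head, total = start, 0
    for target in requests:
        total += cost_fn(head, target)
        head = target  # the head now rests on the block just read
    return total


requests = [900, 10, 500]  # hypothetical block numbers to read, in order
print("tape:", total_cost(requests, tape_blocks_passed))
print("disk:", total_cost(requests, disk_blocks_passed))
```

Scattered requests are what make tape slow; a backup job that writes or reads one long stream avoids most of this cost, which is why tape suits backup and long-term storage.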

OTHER TYPES OF BACKUP TECHNOLOGY TO CONSIDER:

VIRTUAL TAPE LIBRARY - A VTL is an archival backup solution that combines traditional tape backup methodology (software or appliance based) with low-cost disk technology to create an optimized backup and recovery solution. It provides better backup and recovery performance than tape-based solutions while letting users keep the technologies and processes designed for their tape environments. A VTL is an intelligent disk-based library that acts like a tape library but has the performance of modern disk drives: data is deposited onto disk drives just as it would be onto a tape library, only faster. A VTL can be used as a stand-alone tape library solution. It generally consists of a virtual tape appliance or server, and software that emulates traditional tape devices and formats. Vendors include ADIC, Alacritus, Diligent, FalconStor, Neartek, Overland, Quantum, Sepaton, and SpectraLogic.

NEAR-LINE DISK TARGET - A disk array that acts as a target or cache for tape backup. These arrays typically offer faster backup and recovery times than tape and are cost-effective because they are increasingly based on low-cost Advanced Technology Attachment (ATA) disk drives. Unlike virtual tape libraries, however, they typically require configuration and process changes to existing backup/recovery operations. A disk array is a linked group of one or more physically independent hard disk drives, generally used to replace larger, single disk drive systems. The most common disk arrays are in a daisy-chain configuration or implement RAID (Redundant Array of Independent Disks) technology. A disk array may contain several disk drive trays, and is structured to improve speed and increase protection against loss of data.
Disk arrays organize their data storage into Logical Units (LUs), which appear as linear block spaces to their clients. A small disk array, with a few disks, might support up to 8 LUs; a large one, with hundreds of disk drives, can support thousands.
Disk arrays are an integral part of high-performance storage systems, and their importance and scale are growing as continuous access to information becomes critical to the day-to-day operation of modern business.
Vendors include Engenio, Network Appliance, and Nexsan.

CONTENT-ADDRESSED STORAGE (CAS) - A disk-based storage system that uses the content of the data as a locator for the information, eliminating dependence on file system locators or volume/block/device descriptors to identify and locate specific data. CAS is an object-oriented system for storing data that are not intended to be changed once they are stored (e.g., medical images, sales invoices, archived e-mail). CAS assigns a unique identifying logical address to the data record when it is stored, and that address is neither duplicated nor changed, which ensures that the record always contains exactly the same data as were originally stored. CAS relies on disk storage instead of removable media, such as tape. CAS is often used as a new storage paradigm for archiving reference information. EMC's Centera is an example of CAS.
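
The idea of using the content itself as the locator can be sketched as follows. This is a minimal in-memory illustration, not a description of how Centera or any other product works; the class name ContentStore is hypothetical, and the use of a SHA-256 digest as the address is an assumption made for the example.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: a record's address is derived from its content."""

    def __init__(self) -> None:
        self._objects = {}  # address (hex digest) -> stored bytes

    def put(self, data: bytes) -> str:
        # Assumption for this sketch: the address is the SHA-256 digest of the data.
        address = hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same address, so it is stored once.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # Re-hash on read to confirm the record still matches what was stored.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored record no longer matches its address")
        return data


store = ContentStore()
invoice_address = store.put(b"invoice #1001: 3 widgets")
print(invoice_address)
print(store.get(invoice_address))
```

Because the address depends only on the content, a changed record would receive a different address, which is why this model suits fixed reference data such as archived e-mail.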

MASSIVE ARRAY OF IDLE DISKS (MAID) - A disk system in which disks spin only when necessary (such as during read/write operations), reducing total power consumption and enabling massive high-capacity disk systems with comparable economics to tape libraries. The many hundreds of disks share a cabinet's power supply, controller, and cabling infrastructure. An algorithm decides which disks in a cabinet should spin and which should not. Inactive disks are powered down, and then spun up again when needed. Reactivation typically takes under 10 seconds. Disks are spun on a regular basis even when not used, to keep them operational. This so-called duty cycle management can reduce the number of stops experienced by a drive by a quarter. For comparison, a typical ATA drive is built for 40,000 stops over its life.
Copan Systems' Revolution 200T is an example of MAID. The idea is to treat the disk drives like tapes in a library and power them up only when needed, hence the name massive array of idle disks. The Copan array will contain hundreds of terabytes of disk in a single cabinet (think bladed disks). Copan claims it will be able to offer disk backup at tape prices because of the savings involved.
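
The text does not say which algorithm a MAID cabinet uses to decide which disks spin. As a rough sketch of the general idea only, the following example assumes a simple idle-timeout policy: a disk is spun up on access and powered down once it has been idle for a set period. The class MaidCabinet and its parameters are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class MaidCabinet:
    """Toy MAID cabinet using an assumed idle-timeout policy (not a vendor algorithm)."""

    def __init__(self, disk_count: int, idle_timeout: float) -> None:
        self.disk_count = disk_count
        self.idle_timeout = idle_timeout  # seconds a disk may sit idle before power-down
        self.spinning = {}                # disk id -> time of last access
        self.spin_ups = 0                 # how many times any disk had to be started

    def access(self, disk: int, now: float) -> None:
        """Serve a read/write on a disk, spinning it up first if it is idle."""
        if not 0 <= disk < self.disk_count:
            raise IndexError(f"no such disk: {disk}")
        if disk not in self.spinning:
            self.spin_ups += 1
        self.spinning[disk] = now

    def power_down_idle(self, now: float) -> list:
        """Power down every spinning disk that has been idle for the timeout or longer."""
        idle = [d for d, last in self.spinning.items()
                if now - last >= self.idle_timeout]
        for d in idle:
            del self.spinning[d]
        return idle


cabinet = MaidCabinet(disk_count=4, idle_timeout=60.0)
cabinet.access(0, now=0.0)
cabinet.access(1, now=50.0)
print(cabinet.power_down_idle(now=70.0))  # disk 0 has been idle for 70 seconds
print(sorted(cabinet.spinning))           # disk 1 is still spinning
```

A real controller would also have to schedule the periodic exercise spins described above and limit how often each drive is stopped, which this sketch leaves out.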

SNAPSHOTS AND INCREMENTAL CAPTURE - A snapshot is a copy of a volume that is essentially empty but has pointers to existing files. The most common method is a copy-on-write technique: when one of the existing files changes, the snap volume creates a copy of the original file just before the new file is written to disk on the original volume. This gives IT administrators a second copy of data saved to disk that they can use for instantaneous recovery or as an offline copy for backups. Incremental capture solutions can take snapshots at the block, file, or volume level. This provides users with more granularity when capturing data and offers unique integration capabilities with applications because these products typically write at the block level. A wide variety of vendors offer some type of snapshot capability. Software vendors with volume management capabilities, such as Microsoft and Veritas, also provide snapshot functionality. Vendors such as FilesX can either replace existing backup technologies or co-exist with them.
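
The copy-on-write behavior described above can be sketched at the file level. This is a simplified in-memory model written for illustration; the Volume and Snapshot classes are hypothetical and do not reflect any vendor's implementation.

```python
class Snapshot:
    """Starts essentially empty: it holds pointers to the volume, not copies."""

    def __init__(self, volume) -> None:
        self._volume = volume
        self._names = set(volume.files)  # files that existed at snapshot time
        self._saved = {}                 # originals copied just before being overwritten

    def preserve(self, name: str) -> None:
        # Copy the original only once, the first time the file is about to change.
        if name in self._names and name not in self._saved:
            self._saved[name] = self._volume.files[name]

    def read(self, name: str) -> bytes:
        if name not in self._names:
            raise KeyError(name)
        if name in self._saved:
            return self._saved[name]       # file changed since the snapshot
        return self._volume.files[name]    # unchanged: follow the pointer


class Volume:
    def __init__(self, files: dict) -> None:
        self.files = dict(files)
        self.snapshots = []

    def snapshot(self) -> Snapshot:
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, name: str, data: bytes) -> None:
        # Copy-on-write: save the old version before the new one lands on the volume.
        for snap in self.snapshots:
            snap.preserve(name)
        self.files[name] = data


volume = Volume({"report.doc": b"version 1"})
snap = volume.snapshot()
volume.write("report.doc", b"version 2")
print(snap.read("report.doc"))      # the snapshot still sees the original
print(volume.files["report.doc"])   # the live volume has the new data
```

Taking the snapshot costs almost nothing; space is consumed only as files change afterwards.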

INCREMENTAL CAPTURE - Vendors in this category can replace existing backup technologies or co-exist with them. Incremental capture solutions can take snapshots at the block, file, or volume level. This gives users more detail when capturing data and offers unique integration capabilities with applications because these products typically write at the block level. FilesX is an example of incremental capture.

CONTINUOUS CAPTURE - This segment of the data-protection market includes software or appliances designed to capture every write made to primary storage and make a time-stamped copy on a secondary device. The main objective is to have the ability to re-create a data set as it existed at any point in time with the goal of being able to rapidly restore applications. Representative vendors include Alacritus, Mendocino Software (via acquired assets from Vyant Software), Revivio, and StorageTek. While it will be a while before these technologies become mainstream, today they are helping end users who need instantaneous recoverability for their applications.
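
A minimal sketch of the continuous-capture idea follows: every write is recorded with a timestamp, and the data set is re-created for a chosen point in time by replaying the writes up to that moment. The WriteJournal class is hypothetical, it keeps everything in memory, and it assumes writes arrive in timestamp order.

```python
class WriteJournal:
    """Toy secondary device holding a time-stamped copy of every captured write."""

    def __init__(self) -> None:
        self._entries = []  # (timestamp, key, data), assumed to arrive in time order

    def capture(self, timestamp: float, key: str, data: bytes) -> None:
        self._entries.append((timestamp, key, data))

    def restore(self, point_in_time: float) -> dict:
        """Re-create the data set as it existed at the given point in time."""
        state = {}
        for timestamp, key, data in self._entries:
            if timestamp > point_in_time:
                break  # later writes are not part of the requested view
            state[key] = data
        return state


journal = WriteJournal()
journal.capture(1.0, "orders.db", b"10 orders")
journal.capture(2.0, "orders.db", b"11 orders")
journal.capture(3.0, "orders.db", b"corrupted")
print(journal.restore(point_in_time=2.5))  # state just before the bad write
```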

ARRAY-BASED REPLICATION - These products have been around for a long time and have traditionally come from large disk-array vendors such as EMC, Hitachi Data Systems, and IBM. These products run on high-end arrays and are very robust (and expensive). They usually come in two types: synchronous or asynchronous. In the past, these replication technologies only worked between homogeneous arrays from the same vendor, requiring two expensive arrays with two expensive software licenses for each replication pair. As host-based replication became more robust, the array-based replication vendors began to add more flexibility in their solutions. For example, the requirement to replicate from one high-end array to another no longer exists, allowing companies to deploy lower-cost arrays at remote sites. Additionally, prices have come down, and new vendors are getting into the game. For example, vendors such as EqualLogic, Exagrid, and Intransa provide replication with their disk arrays at relatively low prices.

HOST-BASED REPLICATION - Host-based replication software runs on servers. As writes are made to one array, they are also written to a second array. Vendors in this category have eliminated many of the complexities in their products, making them easier to deploy and manage. Representative vendors of host-based replication software include EMC-Legato, DataCore Software, NSI, Softek, Sun, Topio, and Veritas.
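
The mirroring of writes to a second array, together with the synchronous and asynchronous modes mentioned under array-based replication, can be sketched as follows. The ReplicatedVolume class is a hypothetical in-memory model; real products also handle network failures, write ordering across volumes, and resynchronization, none of which is shown.

```python
from collections import deque


class ReplicatedVolume:
    """Toy replication: each write to the primary is also applied to a secondary."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.primary = {}        # block number -> data on the first array
        self.secondary = {}      # block number -> data on the second array
        self._pending = deque()  # writes not yet applied to the secondary

    def write(self, block: int, data: bytes) -> None:
        self.primary[block] = data
        if self.synchronous:
            # Synchronous: the write completes only after the secondary has it too.
            self.secondary[block] = data
        else:
            # Asynchronous: queue the write and return without waiting.
            self._pending.append((block, data))

    def drain(self) -> int:
        """Apply queued writes to the secondary in order; return how many were sent."""
        sent = 0
        while self._pending:
            block, data = self._pending.popleft()
            self.secondary[block] = data
            sent += 1
        return sent


volume = ReplicatedVolume(synchronous=False)
volume.write(7, b"payroll")
print(volume.secondary)  # empty until the queue is drained
volume.drain()
print(volume.secondary)
```

The trade-off the sketch makes visible: synchronous mode keeps both copies identical at all times but makes every write wait for the second array, while asynchronous mode returns sooner at the cost of a window in which the secondary lags behind.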

FABRIC-BASED REPLICATION - The new debate raging in the storage industry revolves around the following question: "Where should storage services, or applications, reside: on hosts, arrays, or in the fabric on switches or appliances?" A Storage Area Network (SAN) is a high-speed subnetwork of shared storage devices. The hardware that connects workstations and servers to storage devices in a SAN is referred to as a "fabric." The SAN fabric enables any-server-to-any-storage-device connectivity through the use of Fibre Channel switching technology. Because stored data does not reside directly on any of a network's servers, server power is utilized for business applications, and network capacity is released to the end user. Fabric-based applications are relatively new, but IT professionals expect a strong trend toward fabric-based intelligence over the next couple of years due to a number of potential advantages. For example, the sooner an I/O is captured, the sooner it can be sent to a secondary device, thus enabling better performance. Examples of vendors with solutions in this space include Brocade, Candera, Cisco, CNT, FalconStor, IBM, Kashya, Maranti Networks, McDATA, and Troika. A variety of traditional switch vendors are putting intelligent blades into their core products, and third-party developers are porting their applications to the blades. A blade is a single circuit board populated with components such as processors, memory, and network connections that are usually found on multiple boards. Server blades are designed to slide into existing servers. They are more cost-efficient, smaller, and consume less power than traditional box-based servers.



Sources:

MAID: further information at: http://sc-2002.org/paperpdfs/pap.pap312.pdf

http://is.pennnet.com/Articles/Article_Display.cfm?Section=Curri&Subsection=Display&P=23&ARTICLE_ID=203173&KEYWORD=Kenniston

Network World, May 16, 2005, Vol. 22, Number 19, Page 41.