data compression, also called data compaction, the process of reducing the amount of data needed for the storage or transmission of a given piece of information, typically by the use of encoding techniques.
Numerous techniques exist for compressing the binary data used by digital computers and communications devices. Standard character codes typically represent each alphanumeric character with a string of eight binary digits (bits), each of which is either a 0 or a 1, and which together form a byte. A rudimentary data-compression system is key-word encoding, whereby frequently occurring words such as "the" (three bytes) are replaced by a two-byte token. More advanced techniques analyze, identify, and then replace commonly occurring text patterns with single characters or symbols; for example, "ing to" in phrases such as "going to" could be converted to "$", which, repeated throughout a large block of text, can significantly reduce its size. These techniques may also represent characters and symbols with strings of fewer than eight bits, giving the most frequently used characters the fewest bits. Successful decoding of such variable-length codes requires that the boundary between one character's code and the next be unambiguous; this is usually ensured by making the codes prefix-free, so that no code is the beginning of another. Huffman coding is a widely used form of this technique. Run-length encoding is used for data containing long runs of a repeated character; it stores the repeated character once together with the number of consecutive occurrences.
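Huffman coding and run-length encoding can both be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production coder: the function names (`huffman_code`, `huffman_encode`, `huffman_decode`, `rle_encode`, `rle_decode`) are hypothetical, and bits are shown as strings of "0" and "1" characters rather than packed into bytes so that the codes stay readable.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text: str) -> dict[str, str]:
    """Build a prefix-free Huffman table mapping each character to a bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(n, next(tiebreak), {ch: ""}) for ch, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; prefix their codes with 0 and 1.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def huffman_encode(text: str, table: dict[str, str]) -> str:
    """Concatenate the variable-length code of every character."""
    return "".join(table[ch] for ch in text)


def huffman_decode(bits: str, table: dict[str, str]) -> str:
    """Read bits until they match a code; prefix-freedom makes the match unambiguous."""
    reverse = {code: ch for ch, code in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Store each run of a repeated character once, with its length."""
    runs: list[tuple[str, int]] = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand each (character, count) pair back into the original run."""
    return "".join(ch * n for ch, n in runs)


if __name__ == "__main__":
    text = "abracadabra"
    table = huffman_code(text)
    bits = huffman_encode(text, table)
    print(len(bits), "bits instead of", 8 * len(text))  # 23 bits instead of 88
    assert huffman_decode(bits, table) == text
    print(rle_encode("WWWWBBBW"))  # [('W', 4), ('B', 3), ('W', 1)]
```

For "abracadabra" the most frequent letter, "a", receives a one-bit code, and the word needs 23 bits instead of 88 at eight bits per character. The decoder can work one bit at a time because no code is a prefix of another, so the first match it finds is always the correct character.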
The principal benefits of data compression are greater effective data-storage capacity, particularly on CD-ROM devices; more efficient transmission of information over facsimile machines and modems; and a measure of concealment, since compressed data cannot be read until it has been decoded, although compression by itself is not a secure form of encryption. Most advanced methods of data compression involve a trade-off between speed and degree of compression. Typically, the more time a compression program is allowed to analyze data, the greater the compression it achieves, although that rule is subject to diminishing marginal returns.
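The trade-off between speed and degree of compression can be observed with any compressor that exposes a tuning setting. The sketch below assumes Python's standard `zlib` module, whose DEFLATE compression levels run from 1 (fastest) to 9 (most thorough search for repeated patterns); the helper `compare_levels` and the sample text are hypothetical choices for illustration, and the sizes and timings it prints will vary with the data and the machine.

```python
import time
import zlib


def compare_levels(data: bytes, levels=(1, 6, 9)) -> dict[int, tuple[int, float]]:
    """Return {level: (compressed_size_in_bytes, seconds)} for each zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Every level must still decompress to exactly the original bytes.
        assert zlib.decompress(packed) == data
        results[level] = (len(packed), elapsed)
    return results


if __name__ == "__main__":
    sample = b"going to the store, going to the park, going to the beach. " * 2000
    for level, (size, secs) in compare_levels(sample).items():
        print(f"level {level}: {size} bytes in {secs * 1000:.2f} ms "
              f"(original {len(sample)} bytes)")
```

Higher levels spend more time searching for matches; on many inputs the step from the default level 6 to level 9 saves relatively little additional space for the extra time, an instance of the diminishing returns noted above.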