Error-correcting codes and cryptology

Ruud Pellikaan [1], Xin-Wen Wu [2], Stanislav Bulygin [3] and Relinde Jurrius [4]

PRELIMINARY VERSION 23 January 2012

All rights reserved. To be published by Cambridge University Press. No part of this manuscript is to be reproduced without written consent of the authors and the publisher.

[1] [email protected], Department of Mathematics and Computing Science, Eindhoven University of Technology, P.O. Box 513, NL-5600 MB Eindhoven, The Netherlands
[2] [email protected], School of Information and Communication Technology, Griffith University, Gold Coast, QLD 4222, Australia
[3] [email protected], Department of Mathematics, Technische Universität Darmstadt, Mornewegstrasse 32, 64293 Darmstadt, Germany
[4] [email protected], Department of Mathematics and Computing Science, Eindhoven University of Technology, P.O. Box 513, NL-5600 MB Eindhoven, The Netherlands
Contents

1 Introduction 11
  1.1 Notes 11

2 Error-correcting codes 13
  2.1 Block codes 14
    2.1.1 Repetition, product and Hamming codes 15
    2.1.2 Codes and Hamming distance 18
    2.1.3 Exercises 21
  2.2 Linear codes 21
    2.2.1 Linear codes 21
    2.2.2 Generator matrix and systematic encoding 22
    2.2.3 Exercises 25
  2.3 Parity checks and dual code 26
    2.3.1 Parity check matrix 26
    2.3.2 Hamming and simplex codes 28
    2.3.3 Inner product and dual codes 30
    2.3.4 Exercises 32
  2.4 Decoding and the error probability 33
    2.4.1 Decoding problem 34
    2.4.2 Symmetric channel 35
    2.4.3 Exercises 38
  2.5 Equivalent codes 38
    2.5.1 Number of generator matrices and codes 38
    2.5.2 Isometries and equivalent codes 40
    2.5.3 Exercises 44
  2.6 Notes 45

3 Code constructions and bounds 47
  3.1 Code constructions 47
    3.1.1 Constructing shorter and longer codes 47
    3.1.2 Product codes 52
    3.1.3 Several sum constructions 55
    3.1.4 Concatenated codes 60
    3.1.5 Exercises 62
  3.2 Bounds on codes 63
    3.2.1 Singleton bound and MDS codes 63
    3.2.2 Griesmer bound 68
    3.2.3 Hamming bound 69
    3.2.4 Plotkin bound 71
    3.2.5 Gilbert and Varshamov bounds 72
    3.2.6 Exercises 73
  3.3 Asymptotically good codes 74
    3.3.1 Asymptotic Gilbert-Varshamov bound 74
    3.3.2 Some results for the generic case 77
    3.3.3 Exercises 78
  3.4 Notes 78

4 Weight enumerator 79
  4.1 Weight enumerator 79
    4.1.1 Weight spectrum 79
    4.1.2 Average weight enumerator 83
    4.1.3 MacWilliams identity 85
    4.1.4 Exercises 88
  4.2 Error probability 88
    4.2.1 Error probability of undetected error 89
    4.2.2 Probability of decoding error 89
    4.2.3 Random coding 90
    4.2.4 Exercises 90
  4.3 Finite geometry and codes 90
    4.3.1 Projective space and projective systems 90
    4.3.2 MDS codes and points in general position 95
    4.3.3 Exercises 97
  4.4 Extended weight enumerator 97
    4.4.1 Arrangements of hyperplanes 97
    4.4.2 Weight distribution of MDS codes 102
    4.4.3 Extended weight enumerator 104
    4.4.4 Puncturing and shortening 107
    4.4.5 Exercises 110
  4.5 Generalized weight enumerator 111
    4.5.1 Generalized Hamming weights 111
    4.5.2 Generalized weight enumerators 113
    4.5.3 Generalized weight enumerators of MDS codes 115
    4.5.4 Connections 118
    4.5.5 Exercises 120
  4.6 Notes 120

5 Codes and related structures 123
  5.1 Graphs and codes 123
    5.1.1 Colorings of a graph 123
    5.1.2 Codes on graphs 126
    5.1.3 Exercises 127
  5.2 Matroids and codes 128
    5.2.1 Matroids 128
    5.2.2 Realizable matroids 129
    5.2.3 Graphs and matroids 130
    5.2.4 Tutte and Whitney polynomial of a matroid 131
    5.2.5 Weight enumerator and Tutte polynomial 132
    5.2.6 Deletion and contraction of matroids 133
    5.2.7 MacWilliams type property for duality 134
    5.2.8 Exercises 136
  5.3 Geometric lattices and codes 136
    5.3.1 Posets, the Möbius function and lattices 136
    5.3.2 Geometric lattices 141
    5.3.3 Geometric lattices and matroids 144
    5.3.4 Exercises 145
  5.4 Characteristic polynomial 146
    5.4.1 Characteristic and Möbius polynomial 146
    5.4.2 Characteristic polynomial of an arrangement 148
    5.4.3 Characteristic polynomial of a code 150
    5.4.4 Minimal codewords and subcodes 156
    5.4.5 Two variable zeta function 157
    5.4.6 Overview 157
    5.4.7 Exercises 158
  5.5 Combinatorics and codes 158
    5.5.1 Orthogonal arrays and codes 158
    5.5.2 Designs and codes 161
    5.5.3 Exercises 161
  5.6 Notes 162

6 Complexity and decoding 165
  6.1 Complexity 165
    6.1.1 Big-Oh notation 165
    6.1.2 Boolean functions 166
    6.1.3 Hard problems 171
    6.1.4 Exercises 172
  6.2 Decoding 173
    6.2.1 Decoding complexity 173
    6.2.2 Decoding erasures 174
    6.2.3 Information and covering set decoding 177
    6.2.4 Nearest neighbor decoding 184
    6.2.5 Exercises 184
  6.3 Difficult problems in coding theory 184
    6.3.1 General decoding and computing minimum distance 184
    6.3.2 Is decoding up to half the minimum distance hard? 187
    6.3.3 Other hard problems 188
  6.4 Notes 188

7 Cyclic codes 189
  7.1 Cyclic codes 189
    7.1.1 Definition of cyclic codes 189
    7.1.2 Cyclic codes as ideals 191
    7.1.3 Generator polynomial 192
    7.1.4 Encoding cyclic codes 195
    7.1.5 Reversible codes 196
    7.1.6 Parity check polynomial 197
    7.1.7 Exercises 200
  7.2 Defining zeros 201
    7.2.1 Structure of finite fields 201
    7.2.2 Minimal polynomials 205
    7.2.3 Cyclotomic polynomials and cosets 206
    7.2.4 Zeros of the generator polynomial 211
    7.2.5 Exercises 213
  7.3 Bounds on the minimum distance 214
    7.3.1 BCH bound 214
    7.3.2 Quadratic residue codes 217
    7.3.3 Hamming, simplex and Golay codes as cyclic codes 217
    7.3.4 Exercises 218
  7.4 Improvements of the BCH bound 219
    7.4.1 Hartmann-Tzeng bound 219
    7.4.2 Roos bound 220
    7.4.3 AB bound 223
    7.4.4 Shift bound 224
    7.4.5 Exercises 228
  7.5 Locator polynomials and decoding cyclic codes 229
    7.5.1 Mattson-Solomon polynomial 229
    7.5.2 Newton identities 230
    7.5.3 APGZ algorithm 232
    7.5.4 Closed formulas 234
    7.5.5 Key equation and Forney's formula 235
    7.5.6 Exercises 238
  7.6 Notes 239

8 Polynomial codes 241
  8.1 RS codes and their generalizations 241
    8.1.1 Reed-Solomon codes 241
    8.1.2 Extended and generalized RS codes 243
    8.1.3 GRS codes under transformations 247
    8.1.4 Exercises 250
  8.2 Subfield and trace codes 251
    8.2.1 Restriction and extension by scalars 251
    8.2.2 Parity check matrix of a restricted code 252
    8.2.3 Invariant subspaces 254
    8.2.4 Cyclic codes as subfield subcodes 257
    8.2.5 Trace codes 258
    8.2.6 Exercises 258
  8.3 Some families of polynomial codes 259
    8.3.1 Alternant codes 259
    8.3.2 Goppa codes 260
    8.3.3 Counting polynomials 263
    8.3.4 Exercises 265
  8.4 Reed-Muller codes 266
    8.4.1 Punctured Reed-Muller codes as cyclic codes 266
    8.4.2 Reed-Muller codes as subfield subcodes and trace codes 267
    8.4.3 Exercises 270
  8.5 Notes 270

9 Algebraic decoding 271
  9.1 Error-correcting pairs 271
    9.1.1 Decoding by error-correcting pairs 271
    9.1.2 Existence of error-correcting pairs 275
    9.1.3 Exercises 276
  9.2 Decoding by key equation 277
    9.2.1 Algorithm of Euclid-Sugiyama 277
    9.2.2 Algorithm of Berlekamp-Massey 278
    9.2.3 Exercises 281
  9.3 List decoding by Sudan's algorithm 281
    9.3.1 Error-correcting capacity 282
    9.3.2 Sudan's algorithm 285
    9.3.3 List decoding of Reed-Solomon codes 287
    9.3.4 List decoding of Reed-Muller codes 291
    9.3.5 Exercises 292
  9.4 Notes 292

10 Cryptography 295
  10.1 Symmetric cryptography and block ciphers 295
    10.1.1 Symmetric cryptography 295
    10.1.2 Block ciphers. Simple examples 296
    10.1.3 Security issues 300
    10.1.4 Modern ciphers. DES and AES 302
    10.1.5 Exercises 308
  10.2 Asymmetric cryptosystems 308
    10.2.1 RSA 311
    10.2.2 Discrete logarithm problem and public-key cryptography 314
    10.2.3 Some other asymmetric cryptosystems 316
    10.2.4 Exercises 317
  10.3 Authentication, orthogonal arrays, and codes 317
    10.3.1 Authentication codes 317
    10.3.2 Authentication codes and other combinatorial objects 321
    10.3.3 Exercises 324
  10.4 Secret sharing 324
    10.4.1 Exercises 328
  10.5 Basics of stream ciphers. Linear feedback shift registers 329
    10.5.1 Exercises 334
  10.6 PKC systems using error-correcting codes 335
    10.6.1 McEliece encryption scheme 336
    10.6.2 Niederreiter's encryption scheme 338
    10.6.3 Attacks 340
    10.6.4 The attack of Sidelnikov and Shestakov 343
    10.6.5 Exercises 344
  10.7 Notes 345
    10.7.1 Section 10.1 345
    10.7.2 Section 10.2 347
    10.7.3 Section 10.3 348
    10.7.4 Section 10.4 348
    10.7.5 Section 10.5 349
    10.7.6 Section 10.6 349

11 The theory of Gröbner bases and its applications 351
  11.1 Polynomial system solving 352
    11.1.1 Linearization techniques 352
    11.1.2 Gröbner bases 355
    11.1.3 Exercises 362
  11.2 Decoding codes with Gröbner bases 363
    11.2.1 Cooper's philosophy 363
    11.2.2 Newton identities based method 368
    11.2.3 Decoding arbitrary linear codes 371
    11.2.4 Exercises 373
  11.3 Algebraic cryptanalysis 374
    11.3.1 Toy example 374
    11.3.2 Writing down equations 375
    11.3.3 General S-Boxes 378
    11.3.4 Exercises 379
  11.4 Notes 380

12 Coding theory with computer algebra packages 381
  12.1 Singular 381
  12.2 Magma 383
    12.2.1 Linear codes 384
    12.2.2 AG-codes 385
    12.2.3 Algebraic curves 385
  12.3 GAP 386
  12.4 Sage 387
    12.4.1 Coding theory 387
    12.4.2 Cryptography 388
    12.4.3 Algebraic curves 388
  12.5 Coding with computer algebra 388
    12.5.1 Introduction 388
    12.5.2 Error-correcting codes 388
    12.5.3 Code constructions and bounds 392
    12.5.4 Weight enumerator 395
    12.5.5 Codes and related structures 397
    12.5.6 Complexity and decoding 397
    12.5.7 Cyclic codes 397
    12.5.8 Polynomial codes 399
    12.5.9 Algebraic decoding 401

13 Bézout's theorem and codes on plane curves 403
  13.1 Affine and projective space 403
  13.2 Plane curves 403
  13.3 Bézout's theorem 406
    13.3.1 Another proof of Bézout's theorem by the footprint 413
  13.4 Codes on plane curves 413
  13.5 Conics, arcs and Segre 414
  13.6 Cubic plane curves 414
    13.6.1 Elliptic curves 414
    13.6.2 The addition law on elliptic curves 414
    13.6.3 Number of rational points on an elliptic curve 414
    13.6.4 The discrete logarithm on elliptic curves 414
  13.7 Quartic plane curves 414
    13.7.1 Flexes and bitangents 414
    13.7.2 The Klein quartic 414
  13.8 Divisors 414
  13.9 Differentials on a curve 417
  13.10 The Riemann-Roch theorem 419
  13.11 Codes from algebraic curves 421
  13.12 Rational functions and divisors on plane curves 424
  13.13 Resolution or normalization of curves 424
  13.14 Newton polygon of plane curves 424
  13.15 Notes 425

14 Curves 427
  14.1 Algebraic varieties 428
  14.2 Curves 428
  14.3 Curves and function fields 428
  14.4 Normal rational curves and Segre's problems 428
  14.5 The number of rational points 428
    14.5.1 Zeta function 428
    14.5.2 Hasse-Weil bound 428
    14.5.3 Serre's bound 428
    14.5.4 Ihara's bound 428
    14.5.5 Drinfeld-Vlăduţ bound 428
    14.5.6 Explicit formulas 428
    14.5.7 Oesterlé's bound 428
  14.6 Trace codes and curves 428
  14.7 Good curves 428
    14.7.1 Maximal curves 428
    14.7.2 Shimura modular curves 428
    14.7.3 Drinfeld modular curves 428
    14.7.4 Tsfasman-Vlăduţ-Zink bound 428
    14.7.5 Towers of Garcia-Stichtenoth 428
  14.8 Applications of AG codes 429
    14.8.1 McEliece cryptosystem with AG codes 429
    14.8.2 Authentication codes 429
    14.8.3 Fast multiplication in finite fields 431
    14.8.4 Correlation sequences and pseudo random sequences 431
    14.8.5 Quantum codes 431
    14.8.6 Exercises 431
  14.9 Notes 431
Chapter 1
Introduction
1.1 Notes
Chapter 2
Error-correcting codes

Ruud Pellikaan and Xin-Wen Wu

The idea of redundant information is a well-known phenomenon in reading a newspaper. Misspellings usually go unnoticed by a casual reader, while the meaning is still grasped. In Semitic languages such as Hebrew, and even earlier in the hieroglyphics in the tombs of the pharaohs of Egypt, only the consonants are written while the vowels are left out, so that we do not know for sure how to pronounce these words nowadays. The letter "e" is the most frequently occurring symbol in the English language, and leaving out all these letters would still give in almost all cases an understandable text, at the expense of greater attention of the reader. The art and science of deleting redundant information in a clever way, such that it can be stored in less memory or space and still can be expanded to the original message, is called data compression or source coding. It is not the topic of this book. So we can compress data, but an error made in a compressed text would give a different message that is most of the time completely meaningless. The idea of error-correcting codes is the converse. One adds redundant information in such a way that it is possible to detect or even correct errors after transmission. In radio contacts between pilots and radar controllers the letters of the alphabet are spoken phonetically as "Alpha, Bravo, Charlie, ...", but "Adams, Boston, Chicago, ..." is more commonly used for spelling in a telephone conversation. The addition of a parity check symbol enables one to detect an error, as on the former punch cards that were fed to a computer, in the ISBN code for books, the European Article Numbering (EAN) and the Universal Product Code (UPC) for articles. Error-correcting codes are common in numerous situations, such as audio-visual media, fault-tolerant computers and deep space telecommunication.
[Figure: a source produces a message, the sender encodes it (e.g. 001...), the noisy channel may corrupt it (e.g. 011...), and the receiver decodes the received word into a message for the target.]

Figure 2.1: Block diagram of a communication system
2.1 Block codes
Legend has it that Hamming was so frustrated that the computer halted every time it detected an error after he had handed in a stack of punch cards, that he thought about a way in which the computer would be able not only to detect the error but also to correct it automatically. He came up with his now famous code, named after him. Whereas the theory of Hamming is about the actual construction, the encoding and decoding of codes, and uses tools from combinatorics and algebra, the approach of Shannon leads to information theory, and his theorems tell us what is and what is not possible in a probabilistic sense.
According to Shannon we have a message m in a certain alphabet and of a certain length; we encode m to c by expanding the length of the message and adding redundant information. One can define the information rate R that measures the slowing down of the transmission of the data. The encoded message c is sent over a noisy channel such that the symbols are changed, according to certain probabilities that are characteristic of the channel. The received word r is decoded to m′. Now, given the characteristics of the channel, one can define the capacity C of the channel, and it has the property that for every R < C it is possible to find an encoding and decoding scheme such that the error probability that m′ ≠ m is arbitrarily small. For R > C such a scheme is not possible. The capacity is explicitly known as a function of the characteristic probability for quite a number of channels. The notion of a channel must be taken in a broad sense. Not only the transmission of data via satellite or telephone, but also the storage of information on a hard disk of a computer or a compact disc for music and film can be modeled by a channel. The theorem of Shannon tells us the existence of certain encoding and decoding schemes, and one can even say that they exist in abundance and that almost all schemes satisfy the required conditions, but it does not tell us how to construct a specific scheme efficiently. The information-theoretic part of error-correcting codes is considered in this book only insofar as it motivates the construction of coding and decoding algorithms.
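For one standard channel the capacity is known in closed form: for the binary symmetric channel, where each transmitted bit is flipped independently with crossover probability p, the capacity is C = 1 − h(p), with h the binary entropy function. The following sketch is our own illustration of this formula, not part of the book:

```python
from math import log2

def h2(p: float) -> float:
    """Binary entropy function h(p) = -p log2(p) - (1-p) log2(1-p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p: float) -> float:
    """Capacity C = 1 - h(p) of the binary symmetric channel
    with crossover probability p."""
    return 1.0 - h2(p)
```

For a noiseless channel (p = 0) the capacity is 1 bit per transmitted bit; for p = 1/2 the output is independent of the input and the capacity drops to 0.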
The situation for the best codes in terms of the maximal number of errors that one can correct for a given information rate and code length is not so clear. Several existence and nonexistence theorems are known, but the exact bound is in fact still an open problem.
2.1.1 Repetition, product and Hamming codes
Adding a parity check such that the number of ones is even is a well-known way to detect one error. But this does not correct the error.

Example 2.1.1 Replacing every symbol by a threefold repetition gives the possibility of correcting one error in every 3-tuple of symbols in a received word by a majority vote. The price one has to pay is that the transmission is three times slower. We see here the two conflicting demands of error-correction: to correct as many errors as possible and to transmit as fast as possible. Notice furthermore that in case two errors are introduced by transmission, the majority decoding rule will introduce a decoding error.

Example 2.1.2 An improvement is the following product construction. Suppose we want to transmit a binary message (m1, m2, m3, m4) of length 4 by adding 5 redundant bits (r1, r2, r3, r4, r5). Put these 9 bits in a 3 × 3 array as shown below. The redundant bits are defined by the following condition: the number of ones in every row and in every column should be even.

m1 m2 r1
m3 m4 r2
r3 r4 r5
It is clear that r1, r2, r3 and r4 are well defined by these rules. The condition on the last row and the condition on the last column are equivalent, given the rules for the first two rows and columns. Hence r5 is also well defined. If in the transmission of this word of 9 bits one symbol is flipped from 0 to 1 or vice versa, then the receiver will notice this and is able to correct it: if the error occurred in row i and column j, then the receiver will detect an odd parity in this row and this column, and an even parity in the remaining rows and columns. Suppose that the message is m = (1, 1, 0, 1). Then the redundant part is r = (0, 1, 1, 0, 1) and c = (1, 1, 0, 1, 0, 1, 1, 0, 1) is transmitted. Suppose that y = (1, 1, 0, 1, 0, 0, 1, 0, 1) is the received word:

1 1 0
0 1 0   ←
1 0 1
    ↑
Then the receiver detects an error in row 2 and column 3 and will change the corresponding symbol. So this product code can also correct one error, like the repetition code, but its information rate is improved from 1/3 to 4/9. This decoding scheme is incomplete in the sense that in some cases it is not decided what to do and the scheme will fail to determine a candidate for the transmitted word. That is called a decoding failure. Sometimes two errors can be corrected: if the first error is in row i and column j, and the second in row i′ and column j′ with i′ > i and j′ ≠ j, then the receiver will detect odd parities in rows i and i′ and in columns j and j′. There are two error patterns of two errors with this behavior: errors at the positions (i, j) and (i′, j′), or at the two positions (i, j′) and (i′, j). If the receiver decides to change the first two positions if j′ > j, and the second two positions if j′ < j, then it will recover the transmitted word half of the time this pattern of two errors takes place. If for instance the word c = (1, 1, 0, 1, 0, 1, 1, 0, 1) is transmitted and y = (1, 0, 0, 1, 0, 0, 1, 0, 1) is received, then the above decoding scheme will change it correctly into c. But if y′ = (1, 1, 0, 0, 1, 1, 1, 0, 1) is received, then the scheme will change it into the codeword c′ = (1, 0, 0, 0, 1, 0, 1, 0, 1) and we have a decoding error.

1 0 0   ←        1 1 1   ←
0 1 0   ←        0 0 1   ←
1 0 1            1 0 1
  ↑ ↑              ↑ ↑
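The encoding rule and the single-error correction rule of this product code can be summarized in a short program. This is our own sketch of the construction described above (the function names are not from the book), and it handles the single-error case only:

```python
def encode_product(m):
    """Encode 4 message bits into the 9-bit codeword (m1..m4, r1..r5)
    of the 3x3 product code: every row and column has even parity."""
    m1, m2, m3, m4 = m
    r1 = (m1 + m2) % 2          # parity of row 1
    r2 = (m3 + m4) % 2          # parity of row 2
    r3 = (m1 + m3) % 2          # parity of column 1
    r4 = (m2 + m4) % 2          # parity of column 2
    r5 = (r1 + r2) % 2          # parity of the parities
    return [m1, m2, m3, m4, r1, r2, r3, r4, r5]

def correct_single_error(y):
    """Correct at most one flipped bit by locating the unique row and
    column with odd parity; returns the (possibly corrected) 9-bit word."""
    # array layout: rows are (m1, m2, r1), (m3, m4, r2), (r3, r4, r5)
    a = [[y[0], y[1], y[4]],
         [y[2], y[3], y[5]],
         [y[6], y[7], y[8]]]
    bad_rows = [i for i in range(3) if sum(a[i]) % 2 == 1]
    bad_cols = [j for j in range(3) if sum(a[i][j] for i in range(3)) % 2 == 1]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        a[bad_rows[0]][bad_cols[0]] ^= 1   # flip the bit at the crossing
    return [a[0][0], a[0][1], a[1][0], a[1][1],
            a[0][2], a[1][2], a[2][0], a[2][1], a[2][2]]
```

With m = (1, 1, 0, 1) this reproduces the codeword c = (1, 1, 0, 1, 0, 1, 1, 0, 1) of the example, and it corrects the received word y = (1, 1, 0, 1, 0, 0, 1, 0, 1) back to c.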
If two errors take place in the same row, then the receiver will see an even parity in all rows and odd parities in the columns j and j′. We can expand the decoding rule to change the bits at the positions (1, j) and (1, j′). Likewise we will change the bits at the positions (i, 1) and (i′, 1) if the columns give even parity and the rows i and i′ have an odd parity. This decoding scheme will correct all patterns with 1 error correctly, and sometimes the patterns with 2 errors. But it is still incomplete, since the received word (1, 1, 0, 1, 1, 0, 0, 1, 0) has an odd parity in every row and in every column, and the scheme fails to decode. One could extend the decoding rule to get a complete decoding, in such a way that every received word is decoded to a nearest codeword. This nearest codeword is not always unique. In case the transmission is by means of certain electromagnetic pulses or waves, one has to consider modulation and demodulation. The message consists of letters of a finite alphabet, say consisting of zeros and ones, and these are modulated, transmitted as waves, received and demodulated into zeros and ones. In the demodulation part one has to make a hard decision between a zero and a one. But usually there is a probability that the signal represents a zero. The hard decision together with this probability is called a soft decision. One can make use of this information in the decoding algorithm: one considers the list of all nearest codewords, and one chooses the codeword in this list that has the highest probability. Example 2.1.3 An improvement of the repetition code of rate 1/3 and the product code of rate 4/9 is given by Hamming. Suppose we have a message (m1, m2, m3, m4) of 4 bits. Put them in the middle of the Venn diagram of three intersecting circles as given in Figure 2.2. Complete the three empty areas of the circles according to the rule that the number of ones in every circle is even.
In this way we get 3 redundant bits (r1, r2, r3) that we add to the message and transmit over the channel. In every block of 7 bits the receiver can correct one error, since the parity in every circle should be even: if the parity is even we declare the circle correct, if the parity is odd we declare the circle incorrect. The error is in the incorrect circles and in the complement of the correct circles. We see that every pattern of at most one error can be corrected in this way. For instance, if m = (1, 1, 0, 1) is the message, then r = (0, 0, 1) is the redundant information
Figure 2.2: Venn diagram of the Hamming code
Figure 2.3: Venn diagram of a received word for the Hamming code
added and c = (1, 1, 0, 1, 0, 0, 1) the codeword sent. If after transmission one symbol is flipped and y = (1, 0, 0, 1, 0, 0, 1) is the received word, as given in Figure 2.3, then we conclude that the error is in the left and the upper circle, but not in the right one, so the error is at m2. But in case of 2 errors, if for instance the word y′ = (1, 0, 0, 1, 1, 0, 1) is received, then the receiver would assume that the error occurred in the upper circle and not in the two lower circles, and would therefore conclude that the transmitted codeword was (1, 0, 0, 1, 1, 0, 0). Hence the decoding scheme creates an extra error. The redundant information r can be obtained from the message m by means of three linear equations or parity checks modulo two:

r1 = m2 + m3 + m4
r2 = m1 + m3 + m4
r3 = m1 + m2 + m4
Let c = (m, r) be the codeword. Then c is a codeword if and only if Hc^T = 0, where

H =
0 1 1 1 1 0 0
1 0 1 1 0 1 0
1 1 0 1 0 0 1
The information rate is improved from 1/3 for the repetition code and 4/9 for the product code to 4/7 for the Hamming code. *** gate diagrams of encoding/decoding scheme ***
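The encoder and the one-error decoder of Example 2.1.3 can be sketched as follows in Python (an illustrative sketch, not the authors' code; the book uses GAP for its computer examples). The syndrome Hy^T of a received word y is zero for a codeword, and for a single error it equals the column of H at the error position, which is exactly the circle rule.

```python
# Parity check matrix of the [7,4] Hamming code as given above.
H = [
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]

def encode(m):
    """Encode a 4-bit message as (m1, m2, m3, m4, r1, r2, r3)."""
    m1, m2, m3, m4 = m
    r1 = (m2 + m3 + m4) % 2
    r2 = (m1 + m3 + m4) % 2
    r3 = (m1 + m2 + m4) % 2
    return [m1, m2, m3, m4, r1, r2, r3]

def decode(y):
    """Correct at most one bit error using the syndrome H y^T."""
    s = tuple(sum(H[i][j] * y[j] for j in range(7)) % 2 for i in range(3))
    if s != (0, 0, 0):
        # the syndrome equals the column of H at the error position
        cols = [tuple(row[j] for row in H) for j in range(7)]
        j = cols.index(s)
        y = y[:j] + [1 - y[j]] + y[j + 1:]
    return y[:4]
```

For example, the received word y = (1, 0, 0, 1, 0, 0, 1) of Figure 2.3 is decoded to the message (1, 1, 0, 1).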
2.1.2 Codes and Hamming distance
In general the alphabets of the message word and the encoded word might be distinct. Furthermore the length of both the message word and the encoded word might vary, as in a convolutional code. We restrict ourselves to [n, k] block codes: the message words have a fixed length of k symbols and the encoded words a fixed length of n symbols, both from the same alphabet Q. For the purpose of error control, before transmission, we add redundant symbols to the message in a clever way.

Definition 2.1.4 Let Q be a set of q symbols called the alphabet. Let Q^n be the set of all n-tuples x = (x1, . . . , xn), with entries xi ∈ Q. A block code C of length n over Q is a nonempty subset of Q^n. The elements of C are called codewords. If C contains M codewords, then M is called the size of the code. We call a code with length n and size M an (n, M) code. If M = q^k, then C is called an [n, k] code. For an (n, M) code defined over Q, the value n − log_q(M) is called the redundancy. The information rate is defined as R = log_q(M)/n.

Example 2.1.5 The repetition code has length 3 and 2 codewords, so its information rate is 1/3. The product code has length 9 and 2^4 codewords, hence its rate is 4/9. The Hamming code has length 7 and 2^4 codewords, therefore its rate is 4/7.

Example 2.1.6 Let C be the binary block code of length n consisting of all words with exactly two ones. This is an (n, n(n − 1)/2) code. In this example the number of codewords is not a power of the size of the alphabet.

Definition 2.1.7 Let C be an [n, k] block code over Q. An encoder of C is a one-to-one map E : Q^k → Q^n such that C = E(Q^k). Let c ∈ C be a codeword. Then there exists a unique m ∈ Q^k with c = E(m). This m is called the message or source word of c.

In order to measure the difference between two distinct words and to evaluate the error-correcting capability of the code, we need to introduce an appropriate metric on Q^n. A natural metric used in coding theory is the Hamming distance.
Definition 2.1.8 For x = (x1, . . . , xn), y = (y1, . . . , yn) ∈ Q^n, the Hamming distance d(x, y) is defined as the number of places where they differ:

d(x, y) = #{ i | xi ≠ yi }.
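Definition 2.1.8 translates directly into code; the following Python fragment (an illustration, not from the book) works for words given as any equal-length sequences over an arbitrary alphabet.

```python
def hamming_distance(x, y):
    """Number of positions where x and y differ (Definition 2.1.8)."""
    assert len(x) == len(y)
    return sum(1 for xi, yi in zip(x, y) if xi != yi)
```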
Figure 2.4: Triangle inequality
Proposition 2.1.9 The Hamming distance is a metric on Q^n; that means that the following properties hold for all x, y, z ∈ Q^n:
(1) d(x, y) ≥ 0, and equality holds if and only if x = y,
(2) d(x, y) = d(y, x) (symmetry),
(3) d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).
Proof. Properties (1) and (2) are trivial from the definition. We leave (3) to the reader as an exercise.

Definition 2.1.10 The minimum distance of a code C of length n is defined as

d = d(C) = min{ d(x, y) | x, y ∈ C, x ≠ y }

if C consists of more than one element, and is by definition n + 1 if C consists of one word. We denote by (n, M, d) a code C with length n, size M and minimum distance d.

The main problem of error-correcting codes from "Hamming's point of view" is to construct, for a given length and number of codewords, a code with the largest possible minimum distance, and to find efficient encoding and decoding algorithms for such a code.

Example 2.1.11 The triple repetition code consists of two codewords: (0, 0, 0) and (1, 1, 1), so its minimum distance is 3. The product and Hamming code both correct one error. So the minimum distance is at least 3, by the triangle inequality. The product code has minimum distance 4 and the Hamming code has minimum distance 3. Notice that all three codes have the property that x + y is again a codeword if x and y are codewords.

Definition 2.1.12 Let x ∈ Q^n. The ball of radius r around x, denoted by B_r(x), is defined by B_r(x) = { y ∈ Q^n | d(x, y) ≤ r }. The sphere of radius r around x is denoted by S_r(x) and defined by S_r(x) = { y ∈ Q^n | d(x, y) = r }.
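Definition 2.1.10 can be checked by brute force for small codes; the following sketch (not from the book) computes the minimum distance of a code given as an explicit list of codewords.

```python
from itertools import combinations

def minimum_distance(code):
    """Minimum distance of a code given as a list of equal-length words;
    n + 1 by definition if the code consists of one word."""
    n = len(code[0])
    if len(code) == 1:
        return n + 1
    return min(sum(a != b for a, b in zip(x, y))
               for x, y in combinations(code, 2))

# the triple repetition code of Example 2.1.11
repetition = [(0, 0, 0), (1, 1, 1)]
```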
Figure 2.5: Ball of radius √2 in the Euclidean plane

Figure 2.6: Balls of radius 0 and 1 in the Hamming metric

Figure 2.5 shows the ball in the Euclidean plane. This is misleading in some respects, but it gives an indication of what we should have in mind. Figure 2.6 shows Q^2, where the alphabet Q consists of 5 elements. The ball B_0(x) consists of the point inside the circle, B_1(x) is depicted by the points inside the cross, and B_2(x) consists of all 25 dots.

Proposition 2.1.13 Let Q be an alphabet of q elements and x ∈ Q^n. Then

|S_i(x)| = \binom{n}{i} (q − 1)^i and |B_r(x)| = Σ_{i=0}^{r} \binom{n}{i} (q − 1)^i.
Proof. Let y ∈ S_i(x). Let I be the subset of {1, . . . , n} consisting of all positions j such that yj ≠ xj. Then the number of elements of I is equal to i, and (q − 1)^i is the number of words y ∈ S_i(x) that have the same fixed I. The number of possibilities to choose the subset I with a fixed number of elements i is equal to \binom{n}{i}. This shows the formula for the number of elements of S_i(x). Furthermore B_r(x) is the disjoint union of the subsets S_i(x) for i = 0, . . . , r. This proves the statement about the number of elements of B_r(x).
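For small q and n the formulas of Proposition 2.1.13 can be verified by enumerating Q^n; the following fragment (an illustration, not part of the text) compares the formula for |B_r(x)| with a direct count.

```python
from itertools import product
from math import comb

def ball_size_formula(q, n, r):
    """|B_r(x)| = sum_{i=0}^{r} C(n, i) (q-1)^i (Proposition 2.1.13)."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

def ball_size_enumerated(q, n, r, x=None):
    """Count words of Q^n within Hamming distance r of x directly."""
    if x is None:
        x = (0,) * n
    return sum(1 for y in product(range(q), repeat=n)
               if sum(a != b for a, b in zip(x, y)) <= r)
```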
2.1.3 Exercises
2.1.1 Consider the code of length 8 that is obtained by deleting the last entry r5 from the product code of Example 2.1.2. Show that this code corrects one error.

2.1.2 Give a gate diagram of the decoding algorithm for the product code of Example 2.1.2 that always corrects 1 error and sometimes corrects 2 errors.

2.1.3 Give a proof of Proposition 2.1.9 (3), that is, the triangle inequality of the Hamming distance.

2.1.4 Let Q be an alphabet of q elements. Let x, y ∈ Q^n have distance d. Show that the number of elements in the intersection B_r(x) ∩ B_s(y) is equal to

Σ_{i,j,k} \binom{d}{i} \binom{d − i}{j} \binom{n − d}{k} (q − 2)^j (q − 1)^k,

where i, j and k are nonnegative integers such that i + j ≤ d, k ≤ n − d, i + j + k ≤ r and d − i + k ≤ s.

2.1.5 Write a procedure in GAP that takes n as an input and constructs the code as in Example 2.1.6.
2.2 Linear Codes
Linear codes are introduced when the alphabet is a finite field. These codes have more structure and are therefore more tangible than arbitrary codes.
2.2.1 Linear codes
If the alphabet Q is a finite field, then Q^n is a vector space. This is for instance the case if Q = {0, 1} = F2. Therefore it is natural to look at codes in Q^n that have more structure, in particular codes that are linear subspaces.

Definition 2.2.1 A linear code C is a linear subspace of F_q^n, where F_q stands for the finite field with q elements. The dimension of a linear code is its dimension as a linear space over F_q. We denote a linear code C over F_q of length n and dimension k by [n, k]_q, or simply by [n, k]. If furthermore the minimum distance of the code is d, then we call [n, k, d]_q or [n, k, d] the parameters of the code.

It is clear that for a linear [n, k] code over F_q, its size is M = q^k. The information rate is R = k/n and the redundancy is n − k.

Definition 2.2.2 For a word x ∈ F_q^n, its support, denoted by supp(x), is defined as the set of nonzero coordinate positions, so supp(x) = { i | xi ≠ 0 }. The weight of x is defined as the number of elements of its support, and is denoted by wt(x). The minimum weight of a code C, denoted by mwt(C), is defined as the minimal value of the weights of the nonzero codewords:

mwt(C) = min{ wt(c) | c ∈ C, c ≠ 0 },

in case there is a c ∈ C not equal to 0, and n + 1 otherwise.
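Definition 2.2.2 can be transcribed directly; the fragment below (an illustrative sketch, not the authors' code) represents words over F_q as tuples of integers mod q.

```python
def support(x):
    """supp(x): the set of nonzero coordinate positions (0-based here)."""
    return {i for i, xi in enumerate(x) if xi != 0}

def weight(x):
    """wt(x) = |supp(x)|."""
    return len(support(x))

def minimum_weight(code):
    """mwt(C): smallest nonzero-codeword weight, or n + 1 if C = {0}."""
    nonzero = [c for c in code if any(c)]
    if not nonzero:
        return len(code[0]) + 1
    return min(weight(c) for c in nonzero)
```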
Proposition 2.2.3 The minimum distance of a linear code C is equal to its minimum weight.

Proof. Since C is a linear code, we have that 0 ∈ C and for any c1, c2 ∈ C also c1 − c2 ∈ C. Then the conclusion follows from the fact that wt(c) = d(0, c) and d(c1, c2) = wt(c1 − c2).

Definition 2.2.4 Consider the situation of two F_q-linear codes C and D of length n. If D ⊆ C, then D is called a subcode of C, and C a supercode of D.

Remark 2.2.5 Suppose C is an [n, k, d] code. Then, for any r with 1 ≤ r ≤ k, there exist subcodes of dimension r, and for a given r there may exist more than one subcode of dimension r. The minimum distance of a subcode is always greater than or equal to d. So, by taking an appropriate subcode, we can get a new code of the same length which has a larger minimum distance. We will discuss this later in Section 3.1. Now let us see some examples of linear codes.

Example 2.2.6 The repetition code over F_q of length n consists of all words c = (c, c, . . . , c) with c ∈ F_q. This is a linear code of dimension 1 and minimum distance n.

Example 2.2.7 Let n be an integer with n ≥ 2. The even weight code C of length n over F_q consists of all words in F_q^n of even weight. The minimum weight of C is by definition 2; the minimum distance of C is 2 if q = 2 and 1 otherwise. The code C is linear if and only if q = 2.

Example 2.2.8 Let C be a binary linear code. Consider the subset C_ev of C consisting of all codewords in C of even weight. Then C_ev is a linear subcode and is called the even weight subcode of C. If C ≠ C_ev, then there exists a codeword c in C of odd weight and C is the disjoint union of the cosets c + C_ev and C_ev. Hence dim(C_ev) ≥ dim(C) − 1.

Example 2.2.9 The Hamming code C of Example 2.1.3 consists of all the words c ∈ F_2^7 satisfying Hc^T = 0, where

H =
0 1 1 1 1 0 0
1 0 1 1 0 1 0
1 1 0 1 0 0 1

This code is linear of dimension 4, since it is given by the solutions of three independent homogeneous linear equations.
The minimum weight is 3 as shown in Example 2.1.11. So it is a [7, 4, 3] code.
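The parameters [7, 4, 3] of the Hamming code can be verified by building the code as the null space of H and searching exhaustively, as in the following sketch (an illustration; the book's own computer examples use GAP).

```python
from itertools import product

# Parity check matrix of Example 2.2.9
H = [[0, 1, 1, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [1, 1, 0, 1, 0, 0, 1]]

# The code is the null space of H over F_2.
code = [c for c in product((0, 1), repeat=7)
        if all(sum(h * x for h, x in zip(row, c)) % 2 == 0 for row in H)]

dimension = len(code).bit_length() - 1   # log_2 of the size
min_weight = min(sum(c) for c in code if any(c))
```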
2.2.2 Generator matrix and systematic encoding
Let C be an [n, k] linear code over F_q. Since C is a k-dimensional linear subspace of F_q^n, there exists a basis that consists of k linearly independent codewords, say g1, . . . , gk. Suppose gi = (gi1, . . . , gin) for i = 1, . . . , k. Denote

    ( g1 )   ( g11 g12 · · · g1n )
G = ( g2 ) = ( g21 g22 · · · g2n )
    ( .. )   (  ..  ..  . .  ..  )
    ( gk )   ( gk1 gk2 · · · gkn )
Every codeword c can be written uniquely as a linear combination of the basis elements, so c = m1 g1 + · · · + mk gk where m1, . . . , mk ∈ F_q. Let m = (m1, . . . , mk) ∈ F_q^k. Then c = mG. The encoding E : F_q^k → F_q^n, from the message word m ∈ F_q^k to the codeword c ∈ F_q^n, can be done efficiently by a matrix multiplication:

c = E(m) := mG.

Definition 2.2.10 A k × n matrix G with entries in F_q is called a generator matrix of an F_q-linear code C if the rows of G are a basis of C.

A given [n, k] code C can have more than one generator matrix; however, every generator matrix of C is a k × n matrix of rank k. Conversely, every k × n matrix of rank k is the generator matrix of an F_q-linear [n, k] code.

Example 2.2.11 The linear codes with parameters [n, 0, n + 1] and [n, n, 1] are the trivial codes {0} and F_q^n, and they have the empty matrix and the n × n identity matrix I_n as generator matrix, respectively.

Example 2.2.12 The repetition code of length n has generator matrix

G = ( 1 1 · · · 1 ).

Example 2.2.13 The binary even weight code of length n has for instance the following two generator matrices, both of size (n − 1) × n:

1 1 0 · · · 0 0        1 0 · · · 0 0 1
0 1 1 · · · 0 0        0 1 · · · 0 0 1
. . . . . . . .   and  . . . . . . . .
0 0 0 · · · 1 1        0 0 · · · 0 1 1

Example 2.2.14 The Hamming code C of Example 2.1.3 is a [7, 4] code. The message symbols mi for i = 1, . . . , 4 are free to choose. If we take mi = 1 and the remaining mj = 0 for j ≠ i we get the codeword gi. In this way we get the basis g1, g2, g3, g4 of the code C, which are the rows of the following generator matrix

G =
1 0 0 0 0 1 1
0 1 0 0 1 0 1
0 0 1 0 1 1 0
0 0 0 1 1 1 1

From the example, the generator matrix G of the Hamming code has the form (I_k | P), where I_k is the k × k identity matrix and P a k × (n − k) matrix.
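The systematic encoding c = mG can be sketched as a single matrix-vector multiplication over F_q; the fragment below (an illustration, not the authors' code) uses the generator matrix of Example 2.2.14.

```python
# Generator matrix G = (I_4 | P) of the [7,4] Hamming code
G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]

def encode(m, G, q=2):
    """Encoding E(m) = mG over F_q."""
    n = len(G[0])
    return [sum(mi * gi[j] for mi, gi in zip(m, G)) % q for j in range(n)]
```

Since G is systematic at the first four positions, the message reappears as the first four symbols of the codeword.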
Remark 2.2.15 Let G be a generator matrix of C. From linear algebra, see Section ??, we know that we can transform G by Gaussian elimination into a row equivalent matrix in row reduced echelon form by a sequence of the three elementary row operations:
1) interchanging two rows,
2) multiplying a row with a nonzero constant,
3) adding one row to another row.
Moreover, for a given matrix G, there is exactly one row equivalent matrix that is in row reduced echelon form, denoted by rref(G). In the following proposition it is stated that rref(G) is also a generator matrix of C.

Proposition 2.2.16 Let G be a generator matrix of C. Then rref(G) is also a generator matrix of C and rref(G) = MG, where M is an invertible k × k matrix with entries in F_q.

Proof. The row reduced echelon form rref(G) of G is obtained from G by a sequence of elementary row operations. The code C is equal to the row space of G, and the row space does not change under elementary row operations. So rref(G) generates the same code C. Furthermore rref(G) = E1 · · · El G, where E1, . . . , El are the elementary matrices that correspond to the elementary row operations. Let M = E1 · · · El. Then M is an invertible matrix, since the Ei are invertible, and rref(G) = MG.

Proposition 2.2.17 Let G1 and G2 be two k × n generator matrices generating the codes C1 and C2 over F_q. Then the following statements are equivalent:
1) C1 = C2,
2) rref(G1) = rref(G2),
3) there is a k × k invertible matrix M with entries in F_q such that G2 = MG1.

Proof. 1) implies 2): The row spaces of G1 and G2 are the same, since C1 = C2. So G1 and G2 are row equivalent. Hence rref(G1) = rref(G2).
2) implies 3): Let Ri = rref(Gi). There is a k × k invertible matrix Mi such that Gi = Mi Ri for i = 1, 2, by Proposition 2.2.16. Let M = M2 M1^{-1}. Then MG1 = M2 M1^{-1} M1 R1 = M2 R1 = M2 R2 = G2, since R1 = R2.
3) implies 1): Suppose G2 = MG1 for some k × k invertible matrix M.
Then every codeword of C2 is a linear combination of the rows of G1, which are in C1. So C2 is a subcode of C1. Similarly C1 ⊆ C2, since G1 = M^{-1} G2. Hence C1 = C2.

Remark 2.2.18 Although a generator matrix G of a code C is not unique, the row reduced echelon form rref(G) is unique. That is to say, if G is a generator matrix of C, then rref(G) is also a generator matrix of C, and furthermore if G1 and G2 are generator matrices of C, then rref(G1) = rref(G2). Therefore the row reduced echelon form rref(C) of a code C is well-defined, being rref(G) for any generator matrix G of C, by Proposition 2.2.17.

Example 2.2.19 The generator matrix G2 of Example 2.2.13 is in row reduced echelon form and a generator matrix of the binary even weight code C. Hence G2 = rref(G1) = rref(C).
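The row reduction of Remark 2.2.15 can be sketched for a prime field F_q as follows (an illustration, not the authors' code; it relies on Python's modular inverse pow(x, -1, q)).

```python
def rref(G, q=2):
    """Row reduced echelon form of G over F_q, q prime.

    Rows are lists of integers mod q; G itself is not modified."""
    G = [row[:] for row in G]
    pivot_row = 0
    for col in range(len(G[0])):
        for r in range(pivot_row, len(G)):
            if G[r][col] % q != 0:
                # swap the pivot row into place and scale it to 1
                G[pivot_row], G[r] = G[r], G[pivot_row]
                inv = pow(G[pivot_row][col], -1, q)
                G[pivot_row] = [(x * inv) % q for x in G[pivot_row]]
                # eliminate the pivot column from all other rows
                for r2 in range(len(G)):
                    if r2 != pivot_row and G[r2][col] % q != 0:
                        f = G[r2][col]
                        G[r2] = [(a - f * b) % q
                                 for a, b in zip(G[r2], G[pivot_row])]
                pivot_row += 1
                break
    return G
```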
Definition 2.2.20 Let C be an [n, k] code. The code is called systematic at the positions (j1, . . . , jk) if for all m ∈ F_q^k there exists a unique codeword c such that c_{ji} = mi for all i = 1, . . . , k. In that case, the set {j1, . . . , jk} is called an information set. A generator matrix G of C is called systematic at the positions (j1, . . . , jk) if the k × k submatrix G′ consisting of the k columns of G at the positions (j1, . . . , jk) is the identity matrix. For such a matrix G the mapping m ↦ mG is called systematic encoding.

Remark 2.2.21 If a generator matrix G of C is systematic at the positions (j1, . . . , jk) and c is a codeword, then c = mG for a unique m ∈ F_q^k and c_{ji} = mi for all i = 1, . . . , k. Hence C is systematic at the positions (j1, . . . , jk). Now suppose that the ji with 1 ≤ j1 < · · · < jk ≤ n indicate the positions of the pivots of rref(G). Then the code C and the generator matrix rref(G) are systematic at the positions (j1, . . . , jk).

Proposition 2.2.22 Let C be a code with generator matrix G. Then C is systematic at the positions j1, . . . , jk if and only if the k columns of G at the positions j1, . . . , jk are linearly independent.

Proof. Let G be a generator matrix of C. Let G′ be the k × k submatrix of G consisting of the k columns at the positions (j1, . . . , jk). Suppose C is systematic at the positions (j1, . . . , jk). Then the map given by x ↦ xG′ is injective. Hence the columns of G′ are linearly independent. Conversely, if the columns of G′ are linearly independent, then there exists a k × k invertible matrix M such that MG′ is the identity matrix. Hence MG is a generator matrix of C and C is systematic at (j1, . . . , jk).

Example 2.2.23 Consider a code C with a generator matrix G such that

rref(C) = rref(G) =
1 0 1 0 1 0 1 0
0 1 1 0 0 1 1 0
0 0 0 1 1 1 1 0
0 0 0 0 0 0 0 1

Then the code is systematic at the positions 1, 2, 4 and 8. By the way, we notice that the minimum distance of the code is 1.
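Proposition 2.2.22 gives an effective test for information sets: check whether the chosen columns are independent. The following sketch (not from the book) does this over F_2 for the row reduced echelon form of Example 2.2.23, using bitmask elimination to compute the rank.

```python
# rref(G) of Example 2.2.23
R = [[1, 0, 1, 0, 1, 0, 1, 0],
     [0, 1, 1, 0, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1, 0],
     [0, 0, 0, 0, 0, 0, 0, 1]]

def rank_f2(vectors):
    """Rank over F_2 of a list of 0/1 vectors (xor-basis elimination)."""
    pivots = {}          # highest set bit -> basis vector
    for v in vectors:
        cur = int("".join(map(str, v)), 2)
        while cur:
            h = cur.bit_length()
            if h in pivots:
                cur ^= pivots[h]
            else:
                pivots[h] = cur
                break
    return len(pivots)

def is_systematic(G, positions):
    """Proposition 2.2.22: positions (1-based) form an information set
    iff the corresponding columns of G are linearly independent."""
    cols = [[row[j - 1] for row in G] for j in positions]
    return rank_f2(cols) == len(G)
```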
2.2.3 Exercises
2.2.1 Determine for the product code of Example 2.1.2 the number of codewords, the number of codewords of a given weight, the minimum weight and the minimum distance. Express the redundant bits rj for j = 1, . . . , 5 as linear equations over F2 in the message bits mi for i = 1, . . . , 4. Give a 5 × 9 matrix H such that c = (m, r) is a codeword of the product code if and only if Hc^T = 0, where m is the message of 4 bits mi and r is the vector with the 5 redundant bits rj.
2.2.2 Let x and y be binary words of the same length. Show that

wt(x + y) = wt(x) + wt(y) − 2|supp(x) ∩ supp(y)|.

2.2.3 Let C be an F_q-linear code with generator matrix G. Let q = 2. Show that every codeword of C has even weight if and only if every row of G has even weight. Show by means of a counterexample that the above statement is not true if q ≠ 2.

2.2.4 Consider the following matrix with entries in F5

G =
1 1 1 1 1
0 1 2 3 4
0 1 4 4 1
Show that G is a generator matrix of a [5, 3, 3] code. Give the row reduced echelon form of this code. 2.2.5 Compute the complexity of the encoding of a linear [n, k] code by an arbitrary generator matrix G and in case G is systematic, respectively, in terms of the number of additions and multiplications.
2.3 Parity checks and dual code

Linear codes are implicitly defined by parity check equations, and the dual of a code is introduced.
2.3.1 Parity check matrix
There are two standard ways to describe a subspace: explicitly, by giving a basis, or implicitly, by the solution space of a set of homogeneous linear equations. Therefore there are two ways to describe a linear code: explicitly, as we have seen, by a generator matrix, or implicitly by a set of homogeneous linear equations, that is, by the null space of a matrix.

Let C be an F_q-linear [n, k] code. Suppose that H is an m × n matrix with entries in F_q and let C be the null space of H. So C is the set of all c ∈ F_q^n such that Hc^T = 0. These m homogeneous linear equations are called parity check equations, or simply parity checks. The dimension k of C is at least n − m. If there are dependent rows in the matrix H, that is if k > n − m, then we can delete a few rows until we obtain an (n − k) × n matrix H′ with independent rows and with the same null space as H. So H′ has rank n − k.

Definition 2.3.1 An (n − k) × n matrix of rank n − k is called a parity check matrix of an [n, k] code C if C is the null space of this matrix.

Remark 2.3.2 The parity check matrix of a code can be used for error detection. This is useful in a communication channel where one asks for retransmission in case more than a certain number of errors occurred. Suppose that C is a linear code of minimum distance d and H is a parity check matrix of C. Suppose that the codeword c is transmitted and r = c + e is received. Then e
is called the error vector and wt(e) the number of errors. Now Hr^T = 0 if there is no error and Hr^T ≠ 0 for all e such that 0 < wt(e) < d. Therefore we can detect any pattern of t errors with t < d, but not more, since if the error vector is equal to a nonzero codeword of minimal weight d, then the receiver would assume that no errors have been made. The vector Hr^T is called the syndrome of the received word.

We show that every linear code has a parity check matrix, and we give a method to obtain such a matrix in case we have a generator matrix G of the code.

Proposition 2.3.3 Suppose C is an [n, k] code. Let I_k be the k × k identity matrix. Let P be a k × (n − k) matrix. Then (I_k | P) is a generator matrix of C if and only if (−P^T | I_{n−k}) is a parity check matrix of C.

Proof. Every codeword c is of the form mG with m ∈ F_q^k. Suppose that the generator matrix G is systematic at the first k positions. So c = (m, r) with r ∈ F_q^{n−k} and r = mP. Hence for a word of the form c = (m, r) with m ∈ F_q^k and r ∈ F_q^{n−k} the following statements are equivalent:

c is a codeword,
−mP + r = 0,
−P^T m^T + r^T = 0,
(−P^T | I_{n−k}) (m, r)^T = 0,
(−P^T | I_{n−k}) c^T = 0.

Hence (−P^T | I_{n−k}) is a parity check matrix of C. The converse is proved similarly.

Example 2.3.4 The trivial codes {0} and F_q^n have I_n and the empty matrix as parity check matrix, respectively.

Example 2.3.5 As a consequence of Proposition 2.3.3 we see that a parity check matrix of the binary even weight code is equal to the generator matrix ( 1 1 · · · 1 ) of the repetition code, and the generator matrix G2 of the binary even weight code of Example 2.2.13 is a parity check matrix of the repetition code.

Example 2.3.6 The ISBN code of a book consists of a word (b1, . . . , b10) of 10 symbols over the alphabet with the 11 elements 0, 1, 2, . . . , 9 and X of the finite field F11, where X is the symbol representing 10, that satisfies the parity check equation

b1 + 2b2 + 3b3 + · · · + 10b10 = 0.

Clearly this code detects one error.
This code corrects many patterns of one transposition of two consecutive symbols. Suppose that the symbols bi and bi+1 are interchanged and there are no other errors. Then the parity check gives as outcome

s = i b_{i+1} + (i + 1) b_i + Σ_{j ≠ i, i+1} j b_j.
We know that Σ_j j b_j = 0, since (b1, . . . , b10) is an ISBN codeword. Hence s = b_i − b_{i+1}. But this position i is in general not unique. Consider for instance the following code: 0444815933. Then the checksum gives 4, so it is not a valid ISBN code. Now assume that the code is the result of a transposition of two consecutive symbols. Then 4044815933, 0448415933, 0444185933, 0444851933 and 0444819533 are the possible ISBN codes. The first and third code do not match with existing books. The second, fourth and fifth code correspond to books with the titles: "The revenge of the dragon lady," "The theory of error-correcting codes" and "Nagasaki's symposium on Chernobyl," respectively.

Example 2.3.7 The generator matrix G of the Hamming code in Example 2.2.14 is of the form (I_4 | P) and in Example 2.2.9 we see that the parity check matrix is equal to (P^T | I_3).

Remark 2.3.8 Let G be a generator matrix of an [n, k] code C. Then the row reduced echelon form G1 = rref(G) is not necessarily systematic at the first k positions, but it is systematic at the positions (j1, . . . , jk) with 1 ≤ j1 < · · · < jk ≤ n. After a permutation π of the n positions with corresponding n × n permutation matrix, denoted by Π, we may assume that G2 = G1Π is of the form (I_k | P). Now G2 is a generator matrix of the code C2, which is not necessarily equal to C. A parity check matrix H2 for C2 is given by (−P^T | I_{n−k}), according to Proposition 2.3.3. A parity check matrix H for C is now of the form (−P^T | I_{n−k})Π^T, since Π^{-1} = Π^T. This remark motivates the following definition.

Definition 2.3.9 Let I = {i1, . . . , ik} be an information set of the code C. Then its complement {1, . . . , n} \ I is called a check set.

Example 2.3.10 Consider the code C of Example 2.2.23 with generator matrix G. The row reduced echelon form G1 = rref(G) is systematic at the positions 1, 2, 4 and 8. Let π be the permutation (348765) with corresponding permutation matrix Π.
Then G2 = G1Π = (I_4 | P) and H2 = (P^T | I_4) with

G2 =
1 0 0 0 1 1 0 1
0 1 0 0 1 0 1 1
0 0 1 0 0 1 1 1
0 0 0 1 0 0 0 0

H2 =
1 1 0 0 1 0 0 0
1 0 1 0 0 1 0 0
0 1 1 0 0 0 1 0
1 1 1 0 0 0 0 1

Now π^{-1} = (356784) and

H = H2 Π^T =
1 1 1 0 0 0 0 0
1 0 0 1 1 0 0 0
0 1 0 1 0 1 0 0
1 1 0 1 0 0 1 0

is a parity check matrix of C.
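Returning to the ISBN code of Example 2.3.6, its parity check over F11 can be sketched in a few lines (an illustration, not the authors' code).

```python
def isbn_check(isbn):
    """ISBN-10 parity check b1 + 2 b2 + ... + 10 b10 = 0 over F_11.

    The symbol 'X' represents the element 10."""
    digits = [10 if ch == "X" else int(ch) for ch in isbn]
    return sum((i + 1) * b for i, b in enumerate(digits)) % 11 == 0
```

As in the text, the checksum of 0444815933 is 4, so it is rejected, while the transposed word 0444851933 passes the check.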
2.3.2 Hamming and simplex codes
The following proposition gives a method to determine the minimum distance of a code in terms of the number of dependent columns of the parity check matrix.
Proposition 2.3.11 Let H be a parity check matrix of a code C. Then the minimum distance of C is the smallest integer d such that some d columns of H are linearly dependent.

Proof. Let h1, . . . , hn be the columns of H. Let c be a nonzero codeword of weight w. Let supp(c) = {j1, . . . , jw} with 1 ≤ j1 < · · · < jw ≤ n. Then Hc^T = 0, so c_{j1} h_{j1} + · · · + c_{jw} h_{jw} = 0 with c_{ji} ≠ 0 for all i = 1, . . . , w. Therefore the columns h_{j1}, . . . , h_{jw} are dependent. Conversely, if h_{j1}, . . . , h_{jw} are dependent, then there exist constants a1, . . . , aw, not all zero, such that a1 h_{j1} + · · · + aw h_{jw} = 0. Let c be the word defined by cj = 0 if j ≠ ji for all i, and cj = ai if j = ji for some i. Then Hc^T = 0. Hence c is a nonzero codeword of weight at most w.

Remark 2.3.12 Let H be a parity check matrix of a code C. As a consequence of Proposition 2.3.11 we have the following special cases. The minimum distance of a code is 1 if and only if H has a zero column. An example of this is seen in Example 2.3.10. Now suppose that H has no zero column; then the minimum distance of C is at least 2. The minimum distance is equal to 2 if and only if H has two columns, say h_{j1}, h_{j2}, that are dependent. In the binary case that means h_{j1} = h_{j2}. In other words, the minimum distance of a binary code is at least 3 if and only if H has no zero columns and all columns are mutually distinct. This is the case for the Hamming code of Example 2.2.9. For a given redundancy r the length of a binary linear code C of minimum distance 3 is at most 2^r − 1, the number of all nonzero binary columns of length r. For arbitrary F_q, the number of nonzero columns with entries in F_q is q^r − 1. Two such columns are dependent if and only if one is a nonzero multiple of the other. Hence the length of an F_q-linear code C with d(C) ≥ 3 and redundancy r is at most (q^r − 1)/(q − 1).

Definition 2.3.13 Let n = (q^r − 1)/(q − 1).
Let Hr(q) be an r × n matrix over F_q with nonzero columns, such that no two columns are dependent. The code Hr(q) with Hr(q) as parity check matrix is called a q-ary Hamming code. The code with Hr(q) as generator matrix is called a q-ary simplex code and is denoted by Sr(q).

Proposition 2.3.14 Let r ≥ 2. Then the q-ary Hamming code Hr(q) has parameters [(q^r − 1)/(q − 1), (q^r − 1)/(q − 1) − r, 3].

Proof. The rank of the matrix Hr(q) is r, since the r standard basis vectors of weight 1 are among the columns of the matrix. So indeed Hr(q) is a parity check matrix of a code with redundancy r. Any 2 columns are independent by construction. And a column of weight 2 is a linear combination of two columns of weight 1, and such a triple of columns exists, since r ≥ 2. Hence the minimum distance is 3 by Proposition 2.3.11.

Example 2.3.15 Consider the ternary Hamming code H3(3) of redundancy 3 and length 13 with parity check matrix

H3(3) =
1 1 1 1 1 1 1 1 1 0 0 0 0
2 2 2 1 1 1 0 0 0 1 1 1 0
2 1 0 2 1 0 2 1 0 2 1 0 1
By Proposition 2.3.14 the code H3(3) has parameters [13, 10, 3]. Notice that all rows of H3(3) have weight 9. In fact every linear combination xH3(3) with x ∈ F_3^3 and x ≠ 0 has weight 9. So all nonzero codewords of the ternary simplex code of dimension 3 have weight 9. Hence S3(3) is a constant weight code. This is a general fact of simplex codes, as is stated in the following proposition.

Proposition 2.3.16 The q-ary simplex code Sr(q) is a constant weight code with parameters [(q^r − 1)/(q − 1), r, q^{r−1}].

Proof. We have seen already in Proposition 2.3.14 that Hr(q) has rank r, so it is indeed a generator matrix of a code of dimension r. Let c be a nonzero codeword of the simplex code. Then c = mHr(q) for some nonzero m ∈ F_q^r. Let hj^T be the j-th column of Hr(q). Then cj = 0 if and only if m · hj = 0. Now m · x = 0 is a nontrivial homogeneous linear equation. This equation has q^{r−1} solutions x ∈ F_q^r, and q^{r−1} − 1 nonzero solutions. It has (q^{r−1} − 1)/(q − 1) solutions x such that x^T is a column of Hr(q), since for every nonzero x ∈ F_q^r there is exactly one column in Hr(q) that is a nonzero multiple of x^T. So the number of zeros of c is (q^{r−1} − 1)/(q − 1). Hence the weight of c is the number of nonzero coordinates, which is q^{r−1}.
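Proposition 2.3.16 can be checked exhaustively for S3(3), since there are only 3^3 − 1 = 26 nonzero message words; the following sketch (not from the book) collects the weights of all nonzero codewords m·H3(3).

```python
from itertools import product

# Parity check matrix of H3(3), i.e. generator matrix of the simplex code S3(3)
H = [[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
     [2, 2, 2, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0],
     [2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 1]]

# weights of all nonzero codewords m H over F_3
weights = {sum(1 for j in range(13)
               if sum(x * H[i][j] for i, x in enumerate(m)) % 3 != 0)
           for m in product(range(3), repeat=3) if any(m)}
```

Every nonzero codeword has weight 3^{3−1} = 9, so the set of weights is {9}.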
2.3.3 Inner product and dual codes
Definition 2.3.17 The inner product on F_q^n is defined by

x · y = x1 y1 + · · · + xn yn

for x, y ∈ F_q^n. This inner product is bilinear, symmetric and nondegenerate, but the notion of "positive definite" makes no sense over a finite field as it does over the real numbers. For instance, for a binary word x ∈ F_2^n we have that x · x = 0 if and only if the weight of x is even.

Definition 2.3.18 For an [n, k] code C we define the dual or orthogonal code C⊥ as

C⊥ = { x ∈ F_q^n | c · x = 0 for all c ∈ C }.

Proposition 2.3.19 Let C be an [n, k] code with generator matrix G. Then C⊥ is an [n, n − k] code with parity check matrix G.

Proof. From the definition of dual codes, the following statements are equivalent:

x ∈ C⊥,
c · x = 0 for all c ∈ C,
mGx^T = 0 for all m ∈ F_q^k,
Gx^T = 0.

This means that C⊥ is the null space of G. Because G is a k × n matrix of rank k, the linear space C⊥ has dimension n − k and G is a parity check matrix of C⊥.
Example 2.3.20 The trivial codes {0} and F_q^n are dual codes.

Example 2.3.21 The binary even weight code and the repetition code of the same length are dual codes.

Example 2.3.22 The simplex code Sr(q) and the Hamming code Hr(q) are dual codes, since Hr(q) is a parity check matrix of Hr(q) and a generator matrix of Sr(q).

A subspace C of a real vector space R^n has the property that C ∩ C⊥ = {0}, since the standard inner product is positive definite. Over finite fields this is not always the case.

Definition 2.3.23 Two codes C1 and C2 in F_q^n are called orthogonal if x · y = 0 for all x ∈ C1 and y ∈ C2, and they are called dual if C2 = C1⊥. If C ⊆ C⊥, we call C weakly self-dual or self-orthogonal. If C = C⊥, we call C self-dual. The hull of a code C is defined by H(C) = C ∩ C⊥. A code is called complementary dual if H(C) = {0}.
Example 2.3.24 The binary repetition code of length n is self-orthogonal if and only if n is even. This code is self-dual if and only if n = 2. Proposition 2.3.25 Let C be an [n, k] code. Then: (1) (C⊥)⊥ = C. (2) C is self-dual if and only if C is self-orthogonal and n = 2k. Proof. (1) Let c ∈ C. Then c · x = 0 for all x ∈ C⊥. So C ⊆ (C⊥)⊥. Moreover, applying Proposition 2.3.19 twice, we see that C and (C⊥)⊥ have the same finite dimension. Therefore equality holds. (2) Suppose C is self-orthogonal; then C ⊆ C⊥. Now C = C⊥ if and only if k = n − k, by Proposition 2.3.19. So C is self-dual if and only if n = 2k. Example 2.3.26 Consider
G =
1 0 0 0 0 1 1 1
0 1 0 0 1 0 1 1
0 0 1 0 1 1 0 1
0 0 0 1 1 1 1 0
Let C be the binary [8,4] code with generator matrix G. Notice that GG^T = 0. So x · y = 0 for all x, y ∈ C. Hence C is self-orthogonal. Furthermore n = 2k. Therefore C is self-dual. Notice that all rows of G have weight 4; therefore all codewords have weight divisible by 4 by Exercise 2.3.11. Hence C has parameters [8,4,4]. Remark 2.3.27 Notice that x · x ≡ wt(x) mod 2 if x ∈ F_2^n and x · x ≡ wt(x) mod 3 if x ∈ F_3^n. Therefore all weights are even for a binary self-orthogonal code, and all weights are divisible by 3 for a ternary self-orthogonal code.
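The claims in this example can be verified directly. A small Python sketch (illustrative only, not part of the text) that checks GG^T = 0 over F_2 and computes the weight distribution of all 16 codewords:

```python
from itertools import product

G = [[1, 0, 0, 0, 0, 1, 1, 1],
     [0, 1, 0, 0, 1, 0, 1, 1],
     [0, 0, 1, 0, 1, 1, 0, 1],
     [0, 0, 0, 1, 1, 1, 1, 0]]

# G * G^T = 0 over F_2: every pair of rows (including each row with
# itself) has even inner product, so the code is self-orthogonal.
for u in G:
    for v in G:
        assert sum(a * b for a, b in zip(u, v)) % 2 == 0

# Enumerate all 16 codewords; every weight is divisible by 4.
weights = set()
for m in product(range(2), repeat=4):
    c = [sum(mi * gi for mi, gi in zip(m, col)) % 2 for col in zip(*G)]
    weights.add(sum(c))
assert weights == {0, 4, 8}
```

Since n = 2k and the code is self-orthogonal, this confirms that C is a self-dual [8,4,4] code.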
Example 2.3.28 Consider the ternary code C with generator matrix G = (I6 | A) with

A =
0 1 1 1 1 1
1 0 1 2 2 1
1 1 0 1 2 2
1 2 1 0 1 2
1 2 2 1 0 1
1 1 2 2 1 0
It is left as an exercise to show that C is self-dual. The linear combination of any two columns of A has weight at least 3, and the linear combination of any two columns of I6 has weight at most 2. So no three columns of G are dependent, and G is also a parity check matrix of C. Hence the minimum distance of C is at least 4, and therefore it is 6 by Remark 2.3.27. Thus C has parameters [12, 6, 6] and it is called the extended ternary Golay code. By puncturing C we get an [11, 6, 5] code, called the ternary Golay code. Corollary 2.3.29 Let C be a linear code. Then: (1) G is a generator matrix of C if and only if G is a parity check matrix of C⊥. (2) H is a parity check matrix of C if and only if H is a generator matrix of C⊥. Proof. The first statement is Proposition 2.3.19, and the second statement is a consequence of the first applied to the code C⊥, using Proposition 2.3.25(1). Proposition 2.3.30 Let C be an [n, k] code. Let G be a k × n generator matrix of C and let H be an (n − k) × n matrix of rank n − k. Then H is a parity check matrix of C if and only if GH^T = 0, the k × (n − k) zero matrix. Proof. Suppose H is a parity check matrix. For any m ∈ F_q^k, mG is a codeword of C. So HG^T m^T = H(mG)^T = 0. This implies that mGH^T = 0. Since m can be any vector in F_q^k, we have GH^T = 0. Conversely, suppose GH^T = 0. We assumed that G is a k × n matrix of rank k and H is an (n − k) × n matrix of rank n − k. So H is the parity check matrix of an [n, k] code C′. For any c ∈ C, we have c = mG for some m ∈ F_q^k. Now Hc^T = (mGH^T)^T = 0. So c ∈ C′. This implies that C ⊆ C′. Hence C′ = C, since both C and C′ have dimension k. Therefore H is a parity check matrix of C. Remark 2.3.31 A consequence of Proposition 2.3.30 is another proof of Proposition 2.3.3. Indeed, let G = (Ik | P) be a generator matrix of C. Let H = (−P^T | In−k). Then G has rank k, H has rank n − k and GH^T = 0. Therefore H is a parity check matrix of C.
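The claims of Example 2.3.28 can also be checked by brute force over the 3^6 = 729 codewords. A Python sketch (ours), using the standard generator matrix (I6 | A) of the extended ternary Golay code:

```python
from itertools import product

A = [[0, 1, 1, 1, 1, 1],
     [1, 0, 1, 2, 2, 1],
     [1, 1, 0, 1, 2, 2],
     [1, 2, 1, 0, 1, 2],
     [1, 2, 2, 1, 0, 1],
     [1, 1, 2, 2, 1, 0]]
I6 = [[1 if i == j else 0 for j in range(6)] for i in range(6)]
G = [I6[i] + A[i] for i in range(6)]          # G = (I6 | A)

# Self-duality: G * G^T = 0 over F_3, and n = 2k.
for u in G:
    for v in G:
        assert sum(a * b for a, b in zip(u, v)) % 3 == 0

# Minimum distance: enumerate all 3^6 = 729 codewords.
min_wt = min(sum(1 for x in (sum(mi * gi for mi, gi in zip(m, col)) % 3
                             for col in zip(*G)) if x != 0)
             for m in product(range(3), repeat=6) if any(m))
assert min_wt == 6
```

This confirms the parameters [12, 6, 6] claimed in the example.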
2.3.4
Exercises
2.3.1 Assume that 3540461335 is obtained from an ISBN code by interchanging two neighboring symbols. What are the possible ISBN codes? Now assume moreover that it is an ISBN code of an existing book. What is the title of this book?
2.3.2 Consider the binary product code C of Example 2.1.2. Give a parity check matrix and a generator matrix of this code. Determine the parameters of the dual of C. 2.3.3 Give a parity check matrix of the code C of Exercise 2.2.4. Show that C is self-dual. 2.3.4 Consider the binary simplex code S3(2) with generator matrix H as given in Example 2.2.9. Show that there are exactly seven triples (i1, i2, i3) with increasing coordinate positions such that S3(2) is not systematic at (i1, i2, i3). Give the seven four-tuples of positions that are not systematic with respect to the Hamming code H3(2) with parity check matrix H. 2.3.5 Let C1 and C2 be linear codes of the same length. Show the following statements: (1) If C1 ⊆ C2, then C2⊥ ⊆ C1⊥. (2) C1 and C2 are orthogonal if and only if C1 ⊆ C2⊥ if and only if C2 ⊆ C1⊥. (3) (C1 ∩ C2)⊥ = C1⊥ + C2⊥. (4) (C1 + C2)⊥ = C1⊥ ∩ C2⊥. 2.3.6 Show that a linear code C with generator matrix G has a complementary dual if and only if det(GG^T) ≠ 0. 2.3.7 Show that there exists a [2k, k] self-dual code over Fq if and only if there is a k × k matrix P with entries in Fq such that PP^T = −Ik. 2.3.8 Give an example of a ternary [4,2] self-dual code and show that there is no ternary self-dual code of length 6. 2.3.9 Show that the extended ternary Golay code in Example 2.3.28 is self-dual. 2.3.10 Show that a binary code is self-orthogonal if the weights of all codewords are divisible by 4. Hint: use Exercise 2.2.2. 2.3.11 Let C be a binary self-orthogonal code that has a generator matrix all of whose rows have weight divisible by 4. Show that the weights of all codewords are divisible by 4. 2.3.12 Write a procedure either in GAP or Magma that determines whether a given code is self-dual or not. Test the correctness of your procedure with the commands IsSelfDualCode and IsSelfDual in GAP and Magma, respectively.
2.4
Decoding and the error probability
2.4.1
Decoding problem
Definition 2.4.1 Let C be a linear code in F_q^n of minimum distance d. If c is a transmitted codeword and r is the received word, then {i : r_i ≠ c_i} is the set of error positions, and the number of error positions is called the number of errors of the received word. Let e = r − c. Then e is called the error vector and r = c + e. Hence supp(e) is the set of error positions and wt(e) the number of errors. The e_i's are called the error values. Remark 2.4.2 If r is the received word and t′ = d(C, r) is the distance of r to the code C, then there exists a nearest codeword c′ such that t′ = d(c′, r). So there exists an error vector e′ such that r = c′ + e′ and wt(e′) = t′. If the number of errors t is at most (d − 1)/2, then we are sure that c = c′ and e = e′. In other words, the nearest codeword to r is unique when r has distance at most (d − 1)/2 to C. ***Picture*** Definition 2.4.3 e(C) = ⌊(d(C) − 1)/2⌋ is called the error-correcting capacity or decoding radius of the code C. Definition 2.4.4 A decoder D for the code C is a map D : F_q^n → F_q^n ∪ {∗} such that D(c) = c for all c ∈ C. If E : F_q^k → F_q^n is an encoder of C and D : F_q^n → F_q^k ∪ {∗} is a map such that D(E(m)) = m for all m ∈ F_q^k, then D is called a decoder with respect to the encoder E. Remark 2.4.5 If E is an encoder of C and D is a decoder with respect to E, then the composition E ◦ D is a decoder of C. It is allowed that the decoder gives as outcome the symbol ∗ in case it fails to find a codeword. This is called a decoding failure. If c is the codeword sent and r is the received word and D(r) = c′ ≠ c, then this is called a decoding error. If D(r) = c, then r is decoded correctly. Notice that a decoding failure is noted on the receiving end, whereas there is no way that the decoder can detect a decoding error. Definition 2.4.6 A complete decoder is a decoder that always gives a codeword in C as outcome.
A nearest neighbor decoder, also called a minimum distance decoder, is a complete decoder with the property that D(r) is a nearest codeword. A decoder D for a code C is called a t-bounded distance decoder, or a decoder that corrects t errors, if D(r) is a nearest codeword for all received words r with d(C, r) ≤ t. A decoder for a code C with error-correcting capacity e(C) decodes up to half the minimum distance if it is an e(C)-bounded distance decoder, where e(C) = ⌊(d(C) − 1)/2⌋ is the error-correcting capacity of C. Remark 2.4.7 If D is a t-bounded distance decoder, then it is not required that D gives a decoding failure as outcome for a received word r if the distance of r to the code is strictly larger than t. In other words, D is also a t′-bounded distance decoder for all t′ ≤ t. A nearest neighbor decoder is a t-bounded distance decoder for all t ≤ ρ(C), where ρ(C) is the covering radius of the code. A ρ(C)-bounded distance decoder is a nearest neighbor decoder, since d(C, r) ≤ ρ(C) for all received words r.
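A nearest neighbor decoder can be realized naively by exhaustive search over all codewords; this is exponential in the dimension k, so the following Python sketch (ours, not from the text) is purely illustrative. It decodes received words for the binary triple repetition code:

```python
def hamming_distance(x, y):
    return sum(1 for a, b in zip(x, y) if a != b)

def nearest_neighbor_decoder(r, codewords):
    """A complete decoder: return a codeword nearest to r."""
    return min(codewords, key=lambda c: hamming_distance(c, r))

# Binary triple repetition code: d = 3, so e(C) = (3 - 1) // 2 = 1.
C = [(0, 0, 0), (1, 1, 1)]
assert nearest_neighbor_decoder((0, 1, 0), C) == (0, 0, 0)  # one error corrected
assert nearest_neighbor_decoder((1, 1, 0), C) == (1, 1, 1)  # two errors: a decoding error
```

Since this decoder is complete, it never produces a decoding failure, in agreement with Example 2.4.18 below.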
Definition 2.4.8 Let r be a received word with respect to a code C. A coset leader of r + C is a choice of an element of minimal weight in the coset r + C. The weight of a coset is the minimal weight of an element in the coset. Let α_i be the number of cosets of C that are of weight i. Then the coset leader weight enumerator α_C(X, Y) of C is the polynomial defined by

α_C(X, Y) = Σ_{i=0}^{n} α_i X^(n−i) Y^i.
Remark 2.4.9 The choice of a coset leader of the coset r + C is unique if d(C, r) ≤ (d − 1)/2, and α_i = (n choose i)(q − 1)^i for all i ≤ (d − 1)/2, where d is the minimum distance of C. Let ρ(C) be the covering radius of the code; then there is at least one codeword c such that d(c, r) ≤ ρ(C). Hence the weight of a coset leader is at most ρ(C) and α_i = 0 for i > ρ(C). Therefore the coset leader weight enumerator of a perfect code C of minimum distance d = 2t + 1 is given by

α_C(X, Y) = Σ_{i=0}^{t} (n choose i) (q − 1)^i X^(n−i) Y^i.

The computation of the coset leader weight enumerator of a code is in general a very hard problem. Definition 2.4.10 Let r be a received word. Let e be the chosen coset leader of the coset r + C. The coset leader decoder gives r − e as output. Remark 2.4.11 The coset leader decoder is a nearest neighbor decoder. Definition 2.4.12 Let r be a received word with respect to a code C of dimension k. Choose an (n − k) × n parity check matrix H of the code C. Then s = rH^T ∈ F_q^(n−k) is called the syndrome of r with respect to H. Remark 2.4.13 Let C be a code of dimension k. Let r be a received word. Then r + C is called the coset of r. Now the cosets of the received words r1 and r2 are the same if and only if r1H^T = r2H^T. Therefore there is a one-to-one correspondence between cosets of C and values of syndromes. Furthermore every element of F_q^(n−k) is the syndrome of some received word r, since H has rank n − k. Hence the number of cosets is q^(n−k). A list decoder gives as output the collection of all nearest codewords.
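Syndromes give a table-driven implementation of the coset leader decoder: precompute one minimal-weight leader per syndrome, then decode by a lookup. A Python sketch (ours, not from the text) for the binary [7, 4, 3] Hamming code, whose parity check matrix has the nonzero vectors of F_2^3 as columns:

```python
from itertools import product

# Parity check matrix of the binary [7,4,3] Hamming code:
# the columns are the nonzero vectors of F_2^3.
H = [[0, 0, 0, 1, 1, 1, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]
n, redundancy = 7, 3

def syndrome(word):
    return tuple(sum(h * w for h, w in zip(row, word)) % 2 for row in H)

# Coset leader table: for each syndrome, an error vector of minimal
# weight (words are visited in order of increasing weight).
leaders = {}
for e in sorted(product(range(2), repeat=n), key=sum):
    leaders.setdefault(syndrome(e), e)
assert len(leaders) == 2**redundancy  # one leader per coset

def coset_leader_decoder(r):
    e = leaders[syndrome(r)]
    return tuple((ri - ei) % 2 for ri, ei in zip(r, e))

r = (1, 0, 1, 1, 0, 1, 0)           # a received word
c = coset_leader_decoder(r)
assert syndrome(c) == (0, 0, 0)     # the output is a codeword
```

Since the Hamming code is perfect with d = 3, every coset here has a unique leader of weight at most 1.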
Knowing that a decoder exists is nice from a theoretical point of view; in practice the problem is to find an efficient algorithm that computes the outcome of the decoder. Computing, for a given vector in Euclidean n-space, the closest vector in a given linear subspace can be done efficiently by an orthogonal projection onto the subspace. The corresponding problem for linear codes is in general not such an easy task. This is treated in Section 6.2.1.
2.4.2
Symmetric channel
Definition 2.4.14 The q-ary symmetric channel (qSC) is a channel where q-ary words are sent with independent errors with the same crossover probability p at each coordinate, with 0 ≤ p ≤ 1/2, such that each of the q − 1 wrong symbols occurs with the same probability p/(q − 1). So a symbol is transmitted correctly with probability 1 − p. The special case q = 2 is called the binary symmetric channel (BSC). ***Picture***
Remark 2.4.15 Let P(x) be the probability that the codeword x is sent. This probability is assumed to be the same for all codewords. Hence P(c) = 1/|C| for all c ∈ C. Let P(r|c) be the probability that r is received given that c is sent. Then

P(r|c) = (1 − p)^(n−d(c,r)) (p/(q − 1))^(d(c,r))

for a q-ary symmetric channel. Definition 2.4.16 For every decoding scheme and channel one defines three probabilities Pcd(p), Pde(p) and Pdf(p): the probability of correct decoding, decoding error and decoding failure, respectively. Then

Pcd(p) + Pde(p) + Pdf(p) = 1 for all 0 ≤ p ≤ 1/2.
So it suffices to find formulas for two of these three probabilities. The error probability, also called the error rate, is defined by Perr(p) = 1 − Pcd(p). Hence Perr(p) = Pde(p) + Pdf(p). Proposition 2.4.17 The probability of correct decoding of a decoder that corrects up to t errors, with 2t + 1 ≤ d, of a code C of minimum distance d on a q-ary symmetric channel with crossover probability p is given by

Pcd(p) = Σ_{w=0}^{t} (n choose w) p^w (1 − p)^(n−w).

Proof. Every codeword has the same probability of transmission. So

Pcd(p) = Σ_{c∈C} P(c) Σ_{d(c,r)≤t} P(r|c) = (1/|C|) Σ_{c∈C} Σ_{d(c,r)≤t} P(r|c).

Now P(r|c) depends only on the distance between r and c by Remark 2.4.15. So without loss of generality we may assume that 0 is the codeword sent. Hence

Pcd(p) = Σ_{d(0,r)≤t} P(r|0) = Σ_{w=0}^{t} (n choose w) (q − 1)^w (p/(q − 1))^w (1 − p)^(n−w)

by Proposition 2.1.13. Clearing the factor (q − 1)^w in the numerator and the denominator gives the desired result. □ In Proposition 4.2.6 a formula will be derived for the probability of decoding error for a decoding algorithm that corrects errors up to half the minimum distance.
Example 2.4.18 Consider the binary triple repetition code. Assume that (0, 0, 0) is transmitted. In case the received word has weight 0 or 1, then it is correctly decoded to (0, 0, 0). If the received word has weight 2 or 3, then it is decoded to (1, 1, 1), which is a decoding error. Hence there are no decoding failures and

Pcd(p) = (1 − p)^3 + 3p(1 − p)^2 = 1 − 3p^2 + 2p^3 and Perr(p) = Pde(p) = 3p^2 − 2p^3.

If the Hamming code is used, then there are no decoding failures and

Pcd(p) = (1 − p)^7 + 7p(1 − p)^6 and Perr(p) = Pde(p) = 21p^2 − 70p^3 + 105p^4 − 84p^5 + 35p^6 − 6p^7.

This shows that the error probability of the repetition code is smaller than the one for the Hamming code. This comparison is not fair, since only one bit of information is transmitted with the repetition code and four bits with the Hamming code. One could transmit 4 bits of information by using the repetition code four times. This would give the error probability

1 − (1 − 3p^2 + 2p^3)^4 = 12p^2 − 8p^3 − 54p^4 + 72p^5 + 84p^6 − 216p^7 + · · ·

***Plot of these functions***
Suppose that four bits of information are transmitted uncoded, by the Hamming code and by the triple repetition code, respectively. Then the error probabilities are 0.04, 0.002 and 0.001, respectively, if the crossover probability is 0.01. The error probability for the repetition code is in fact smaller than that of the Hamming code for all p ≤ 1/2, but the transmission by the Hamming code is almost twice as fast as by the repetition code. Example 2.4.19 Consider the binary n-fold repetition code. Let t = (n − 1)/2. Use the decoding algorithm correcting all patterns of t errors. Then

Perr(p) = Σ_{i=t+1}^{n} (n choose i) p^i (1 − p)^(n−i).

Hence the error probability becomes arbitrarily small for increasing n. The price one has to pay is that the information rate R = 1/n tends to 0. The remarkable result of Shannon states that for a fixed rate R < C(p), where

C(p) = 1 + p log2(p) + (1 − p) log2(1 − p)

is the capacity of the binary symmetric channel, one can devise encoding and decoding schemes such that Perr(p) becomes arbitrarily small. This will be treated in Theorem 4.2.9. The main problem of error-correcting codes from "Shannon's point of view" is to construct efficient encoding and decoding algorithms of codes with the smallest error probability for a given information rate and crossover probability. Proposition 2.4.20 The probability of correct decoding of the coset leader decoder on a q-ary symmetric channel with crossover probability p is given by

Pcd(p) = α_C(1 − p, p/(q − 1)).
Proof. This is left as an exercise. □
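As a numerical sanity check (not a proof) of Proposition 2.4.20, one can compare the formula with Example 2.4.18 for the binary triple repetition code, whose four cosets have leaders of weights 0, 1, 1, 1, so that α_C(X, Y) = X^3 + 3X^2 Y:

```python
# Binary triple repetition code C = {000, 111}: alpha_0 = 1, alpha_1 = 3.
def alpha_C(X, Y):
    return X**3 + 3 * X**2 * Y

def P_cd(p):
    # Probability of correct decoding from Example 2.4.18.
    return (1 - p)**3 + 3 * p * (1 - p)**2

# For q = 2 the formula of Proposition 2.4.20 reads alpha_C(1 - p, p).
for p in [0.0, 0.01, 0.1, 0.5]:
    assert abs(alpha_C(1 - p, p) - P_cd(p)) < 1e-12
```

The agreement is exact here because the coset leader decoder of this perfect code coincides with the nearest neighbor decoder of Example 2.4.18.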
Example 2.4.21 ...........
2.4.3
Exercises
2.4.1 Consider the binary repetition code of length n. Compute the probabilities of correct decoding, decoding error and decoding failure in case of incomplete decoding up to t = ⌊(n − 1)/2⌋ errors and of complete decoding by choosing one nearest neighbor. 2.4.2 Consider the product code of Example 2.1.2. Compute the probabilities of correct decoding, decoding error and decoding failure in case the decoding algorithm corrects all error patterns of at most t errors, for t = 1, t = 2 and t = 3, respectively. 2.4.3 Give a proof of Proposition 2.4.20. 2.4.4 ***Give the probability of correct decoding for the code .... for a coset leader decoder.*** 2.4.5 ***Product code has error probability at most P1(P2(p)).***
2.5
Equivalent codes
Notice that a Hamming code over Fq of a given redundancy r is defined up to the order of the columns of the parity check matrix and up to multiplying a column with a nonzero constant. A permutation of the columns and multiplying the columns with nonzero constants gives another code with the same parameters, which is in a certain sense equivalent.
2.5.1
Number of generator matrices and codes
The set of all invertible n × n matrices over the finite field Fq is denoted by Gl(n, q). Now Gl(n, q) is a finite group with respect to matrix multiplication and it is called the general linear group. Proposition 2.5.1 The number of elements of Gl(n, q) is (q n − 1)(q n − q) · · · (q n − q n−1 ). Proof. Let M be an n × n matrix with rows m1 , . . . , mn . Then M is invertible if and only if m1 , . . . , mn are independent and that is if and only if m1 6= 0 and mi is not in the linear subspace generated by m1 , . . . , mi−1 for all i = 2, . . . , n. Hence for an invertible matrix M we are free to choose a nonzero vector for the first row. There are q n − 1 possibilities for the first row. The second row should not be a multiple of the first row, so we have q n − q possibilities for the second row for every nonzero choice of the first row. The subspace generated by m1 , . . . , mi−1 has dimension i − 1 and q i−1 elements. The ith row is not in this subspace if M is invertible. So we have q n − q i−1 possible choices for the ith row for every legitimate choice of the first i − 1 rows. This proves the claim.
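The formula can be confirmed by exhaustive enumeration for small parameters. A Python sketch (ours; the helper names are ad hoc, and the invertibility test works for prime q only, since it uses modular inverses) counting Gl(2, 3):

```python
from itertools import product

def order_gl(n, q):
    # (q^n - 1)(q^n - q) ... (q^n - q^(n-1))
    result = 1
    for i in range(n):
        result *= q**n - q**i
    return result

def is_invertible_mod_p(M, p, n):
    # Gaussian elimination over the prime field F_p.
    M = [row[:] for row in M]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] % p != 0), None)
        if pivot is None:
            return False
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)
        for r in range(col + 1, n):
            f = (M[r][col] * inv) % p
            M[r] = [(a - f * b) % p for a, b in zip(M[r], M[col])]
    return True

# Brute force for n = 2, q = 3: all 3^4 = 81 matrices.
count = sum(1 for entries in product(range(3), repeat=4)
            if is_invertible_mod_p([list(entries[:2]), list(entries[2:])], 3, 2))
assert count == order_gl(2, 3) == 48
```

So 48 of the 81 matrices over F_3 are invertible, matching (3^2 − 1)(3^2 − 3).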
Proposition 2.5.2 (1) The number of k × n generator matrices over Fq is (q^n − 1)(q^n − q) · · · (q^n − q^(k−1)). (2) The number of [n, k] codes over Fq is equal to the Gaussian binomial

[n choose k]_q := (q^n − 1)(q^n − q) · · · (q^n − q^(k−1)) / ((q^k − 1)(q^k − q) · · · (q^k − q^(k−1))).

Proof. (1) A k × n generator matrix consists of k independent rows of length n over Fq. The counting of the number of these matrices is done similarly as in the proof of Proposition 2.5.1. (2) The second statement is a consequence of Propositions 2.5.1 and 2.2.17, and the fact that MG = G if and only if M = Ik for every M ∈ Gl(k, q) and k × n generator matrix G, since G has rank k. It is a consequence of Proposition 2.5.2 that the Gaussian binomials are integers for every choice of n, k and q. In fact more is true. Proposition 2.5.3 The number of [n, k] codes over Fq is a polynomial in q of degree k(n − k) with nonnegative integers as coefficients. Proof. There is another way to count the number of [n, k] codes over Fq, since the row reduced echelon form rref(C) of a generator matrix of C is unique by Proposition 2.2.17. Now suppose that rref(C) has pivots at j = (j1, . . . , jk) with 1 ≤ j1 < · · · < jk ≤ n; then the remaining entries are free to choose as long as the row reduced echelon form at the given pivots (j1, . . . , jk) is respected. Let the number of these free entries be e(j). Then the number of [n, k] codes over Fq is equal to

Σ_{1≤j1<···<jk≤n} q^(e(j)).
Furthermore e(j) is maximal and equal to k(n − k) for j = (1, 2, . . . , k). This is left as Exercise 2.5.2 to the reader. Example 2.5.4 Let us compute the number of [3, 2] codes over Fq. According to Proposition 2.5.2 it is equal to

[3 choose 2]_q = (q^3 − 1)(q^3 − q) / ((q^2 − 1)(q^2 − q)) = q^2 + q + 1,

which is a polynomial of degree 2 · (3 − 2) = 2 with nonnegative integers as coefficients. This is in agreement with Proposition 2.5.3. If we follow the proof of this proposition, then the possible reduced row echelon forms are

1 0 ∗      1 ∗ 0       0 1 0
0 1 ∗ ,    0 0 1  and   0 0 1 ,

where the ∗'s denote the entries that are free to choose. So e(1, 2) = 2, e(1, 3) = 1 and e(2, 3) = 0. Hence the number of [3, 2] codes is equal to q^2 + q + 1, as we have seen before.
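The count of Example 2.5.4 can be reproduced by listing the 2-dimensional subspaces of F_2^3 directly. A Python sketch (illustrative; over F_2 any two distinct nonzero vectors are automatically independent, which simplifies the enumeration):

```python
from itertools import product

def gaussian_binomial(n, k, q):
    num = den = 1
    for i in range(k):
        num *= q**n - q**i
        den *= q**k - q**i
    return num // den

# A 2-dimensional subspace of F_2^3 as the set of its 4 vectors.
def span(g1, g2):
    return frozenset(tuple((a * x + b * y) % 2 for x, y in zip(g1, g2))
                     for a in range(2) for b in range(2))

vectors = [v for v in product(range(2), repeat=3) if any(v)]
subspaces = {span(g1, g2) for g1 in vectors for g2 in vectors if g1 != g2}
assert len(subspaces) == gaussian_binomial(3, 2, 2) == 7

# The polynomial form of Example 2.5.4: [3 choose 2]_q = q^2 + q + 1.
for q in [2, 3, 4, 5]:
    assert gaussian_binomial(3, 2, q) == q**2 + q + 1
```

Note that only the closed formula is evaluated for q = 4; the direct subspace enumeration here is specific to the prime field F_2.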
2.5.2
Isometries and equivalent codes
Definition 2.5.5 Let M ∈ Gl(n, q). Then the map M : F_q^n → F_q^n defined by M(x) = xM is a one-to-one linear map. Notice that the map and the matrix are both denoted by M. Let S be a subset of F_q^n. The operation xM, where x ∈ S and M ∈ Gl(n, q), is called an action of the group Gl(n, q) on S. For a given M ∈ Gl(n, q), the set SM = {xM : x ∈ S}, also denoted by M(S), is called the image of S under M. Definition 2.5.6 The group of permutations of {1, . . . , n} is called the symmetric group on n letters and is denoted by Sn. Let π ∈ Sn. Define the corresponding permutation matrix Π with entries p_ij by p_ij = 1 if i = π(j) and p_ij = 0 otherwise. Remark 2.5.7 Sn is indeed a group and has n! elements. Let Π be the permutation matrix of a permutation π in Sn. Then Π is invertible and orthogonal, that is, Π^T = Π^(−1). The corresponding map Π : F_q^n → F_q^n is given by Π(x) = y with y_i = x_π(i) for all i. Now Π is an invertible linear map. Let e_i be the i-th standard basis row vector. Then Π^(−1)(e_i) = e_π(i) by the above conventions. The set of n × n permutation matrices is a subgroup of Gl(n, q) with n! elements. Definition 2.5.8 Let v ∈ F_q^n. Then diag(v) is the n × n diagonal matrix with v on its diagonal and zeros outside the diagonal. An n × n matrix with entries in Fq is called monomial if every row has exactly one nonzero entry and every column has exactly one nonzero entry. Let Mono(n, q) be the set of all n × n monomial matrices with entries in Fq. Remark 2.5.9 The matrix diag(v) is invertible if and only if every entry of v is nonzero. Hence the set of n × n invertible diagonal matrices is a subgroup of Gl(n, q) with (q − 1)^n elements. Let M be an element of Mono(n, q). Define the vector v ∈ F_q^n with nonzero entries and the map π from {1, . . . , n} to itself by π(j) = i if v_i is the unique nonzero entry of M in the i-th row and the j-th column. Now π is a permutation by the definition of a monomial matrix. So M has entries m_ij with m_ij = v_i if i = π(j) and m_ij = 0 otherwise. Hence M = diag(v)Π.
Therefore a matrix is monomial if and only if it is the product of a diagonal and a permutation matrix. The corresponding monomial map M : F_q^n → F_q^n of the monomial matrix M is given by M(x) = y with y_i = v_i x_π(i). The set Mono(n, q) is a subgroup of Gl(n, q) with (q − 1)^n n! elements. Definition 2.5.10 A map ϕ : F_q^n → F_q^n is called an isometry if it leaves the Hamming metric invariant, that means that d(ϕ(x), ϕ(y)) = d(x, y) for all x, y ∈ F_q^n. Let Isom(n, q) be the set of all isometries of F_q^n. Proposition 2.5.11 Isom(n, q) is a group under the composition of maps.
Proof. The identity map is an isometry. Let ϕ and ψ be isometries of F_q^n. Let x, y ∈ F_q^n. Then d((ϕ ◦ ψ)(x), (ϕ ◦ ψ)(y)) = d(ϕ(ψ(x)), ϕ(ψ(y))) = d(ψ(x), ψ(y)) = d(x, y). Hence ϕ ◦ ψ is an isometry. Let ϕ be an isometry of F_q^n. Suppose that x, y ∈ F_q^n and ϕ(x) = ϕ(y). Then 0 = d(ϕ(x), ϕ(y)) = d(x, y). So x = y. Hence ϕ is bijective. Therefore it has an inverse map ϕ^(−1). Let x, y ∈ F_q^n. Then d(x, y) = d(ϕ(ϕ^(−1)(x)), ϕ(ϕ^(−1)(y))) = d(ϕ^(−1)(x), ϕ^(−1)(y)), since ϕ is an isometry. Therefore ϕ^(−1) is an isometry. So Isom(n, q) is not empty and closed under taking the composition of maps and taking the inverse. Therefore Isom(n, q) is a group. □ Remark 2.5.12 Permutation matrices define isometries. Translations and invertible diagonal matrices, and more generally the coordinatewise permutations of the elements of Fq, also define isometries. Conversely, every isometry is a composition of the aforementioned isometries. This fact we leave as Exercise 2.5.4. The following proposition characterizes linear isometries. Proposition 2.5.13 Let M ∈ Gl(n, q). Then the following statements are equivalent: (1) M is an isometry; (2) wt(M(x)) = wt(x) for all x ∈ F_q^n, so M leaves the weight invariant; (3) M is a monomial matrix. Proof. Statements (1) and (2) are equivalent, since M(x − y) = M(x) − M(y) and d(x, y) = wt(x − y). Statement (3) implies (1), since permutation matrices and invertible diagonal matrices leave the weight of a vector invariant, and a monomial matrix is a product of such matrices by Remark 2.5.9. Statement (2) implies (3): Let e_i be the i-th standard basis vector of F_q^n. Then e_i has weight 1. So M(e_i) also has weight 1. Hence M(e_i) = v_i e_π(i), where v_i is a nonzero element of Fq, and π is a map from {1, . . . , n} to itself. Now π is a bijection, since M is invertible. So π is a permutation and M = diag(v)Π^(−1). Therefore M is a monomial matrix. □
Corollary 2.5.14 An isometry is linear if and only if it is a map coming from a monomial matrix, that is Gl(n, q) ∩ Isom(n, q) = Mono(n, q). Proof. This follows directly from the definitions and Proposition 2.5.13.
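The decomposition M = diag(v)Π of Remark 2.5.9 and the weight invariance of Proposition 2.5.13 can be illustrated in a few lines of Python (a sketch with ad hoc names; indices are 0-based, and the permutation and entries over F_5 are arbitrary choices):

```python
from itertools import product

def permutation_matrix(pi, n):
    # entries p_ij = 1 if i = pi(j) (0-based indices in this sketch)
    return [[1 if i == pi[j] else 0 for j in range(n)] for i in range(n)]

def diag(v):
    return [[v[i] if i == j else 0 for j in range(len(v))] for i in range(len(v))]

def mat_mul(A, B, q):
    return [[sum(a * b for a, b in zip(row, col)) % q for col in zip(*B)]
            for row in A]

def weight(x):
    return sum(1 for xi in x if xi != 0)

q, n = 5, 3
pi = [2, 0, 1]                 # a permutation of {0, 1, 2}
v = [3, 1, 4]                  # nonzero diagonal entries
M = mat_mul(diag(v), permutation_matrix(pi, n), q)

# Every row and every column of M has exactly one nonzero entry.
assert all(weight(row) == 1 for row in M)
assert all(weight(col) == 1 for col in zip(*M))

# The linear map x -> xM preserves the Hamming weight (Proposition 2.5.13).
for x in product(range(q), repeat=n):
    xM = [sum(xi * mij for xi, mij in zip(x, col)) % q for col in zip(*M)]
    assert weight(xM) == weight(x)
```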
Definition 2.5.15 Let C and D be codes in F_q^n that are not necessarily linear. Then C is called equivalent to D if there exists an isometry ϕ of F_q^n such that ϕ(C) = D. If moreover C = D, then ϕ is called an automorphism of C. The
42
CHAPTER 2. ERRORCORRECTING CODES
automorphism group of C is the set of all isometries ϕ such that ϕ(C) = C and is denoted by Aut(C). C is called permutation equivalent to D, denoted by D ≡ C, if there exists a permutation matrix Π such that Π(C) = D. If moreover C = D, then Π is called a permutation automorphism of C. The permutation automorphism group of C is the set of all permutation automorphisms of C and is denoted by PAut(C). C is called generalized equivalent or monomial equivalent to D, denoted by D ≅ C, if there exists a monomial matrix M such that M(C) = D. If moreover C = D, then M is called a monomial automorphism of C. The monomial automorphism group of C is the set of all monomial automorphisms of C and is denoted by MAut(C). Proposition 2.5.16 Let C and D be two Fq-linear codes of the same length. Then: (1) If C ≡ D, then C⊥ ≡ D⊥. (2) If C ≅ D, then C⊥ ≅ D⊥. (3) If C ≡ D, then C ≅ D. (4) If C ≅ D, then C and D have the same parameters. Proof. We leave the proof to the reader as an exercise. □
Remark 2.5.17 Every [n, k] code is equivalent to a code which is systematic at the first k positions, that is, with a generator matrix of the form (Ik | P) according to Remark 2.3.8. Notice that in the binary case C ≡ D if and only if C ≅ D. Example 2.5.18 Let C be a binary [7,4,3] code with parity check matrix H. Then H is a 3 × 7 matrix such that all columns are nonzero and mutually distinct by Proposition 2.3.11, since C has minimum distance 3. There are exactly 7 binary nonzero column vectors with 3 entries. Hence H is a permutation of the columns of a parity check matrix of the [7,4,3] Hamming code. Therefore: every binary [7,4,3] code is permutation equivalent with the Hamming code. Proposition 2.5.19 (1) Every Fq-linear code with parameters [(q^r − 1)/(q − 1), (q^r − 1)/(q − 1) − r, 3] is generalized equivalent with the Hamming code Hr(q). (2) Every Fq-linear code with parameters [(q^r − 1)/(q − 1), r, q^(r−1)] is generalized equivalent with the simplex code Sr(q). Proof. (1) Let n = (q^r − 1)/(q − 1). Then n is the number of lines in F_q^r through the origin. Let H be a parity check matrix of an Fq-linear code C with parameters [n, n − r, 3]. Then there are no zero columns in H and every two columns are independent by Proposition 2.3.11. Every column of H generates a unique line in F_q^r through the origin, and every such line is obtained in this way. Let H′ be the parity check matrix of a code C′ with the same parameters [n, n − r, 3]. Then for every column h′_j of H′ there is a unique column h_i of H such that h′_j is a nonzero multiple of h_i. Hence H′ = HM for some monomial matrix M. Hence C and C′ are generalized equivalent. (2) The second statement follows from the first one, since the simplex code is the dual of the Hamming code.
Remark 2.5.20 A code of length n is called cyclic if the cyclic permutation of coordinates σ(i) = i − 1 modulo n leaves the code invariant. A cyclic code of length n has an element of order n in its automorphism group. Cyclic codes are extensively treated in Chapter 7.1. Remark 2.5.21 Let C be an Fq-linear code of length n. Then PAut(C) is a subgroup of Sn and MAut(C) is a subgroup of Mono(n, q). If C is a trivial code, then PAut(C) = Sn and MAut(C) = Mono(n, q). The matrices λIn are in MAut(C) for all nonzero λ ∈ Fq. So MAut(C) always contains F∗q as a subgroup. Furthermore Mono(n, q) = Sn and MAut(C) = PAut(C) if q = 2. Example 2.5.22 Let C be the n-fold repetition code. Then PAut(C) = Sn and MAut(C) is isomorphic with F∗q × Sn. Proposition 2.5.23 Let G be a generator matrix of an Fq-linear code C of length n. Let Π be an n × n permutation matrix. Let M ∈ Mono(n, q). Then: (1) Π ∈ PAut(C) if and only if rref(G) = rref(GΠ); (2) M ∈ MAut(C) if and only if rref(G) = rref(GM). Proof. (1) Let Π be an n × n permutation matrix. Then GΠ is a generator matrix of Π(C). Moreover Π(C) = C if and only if rref(G) = rref(GΠ) by Proposition 2.2.17. (2) The second statement is proved similarly. Example 2.5.24 Let C be the code with generator matrix G and let M be the monomial matrix given by

G = 1 0 a1          M = 0  x2 0
    0 1 a2              x1 0  0
                        0  0  x3,

where the ai and xj are nonzero elements of Fq. Now G is already in reduced row echelon form. One verifies that

rref(GM) = 1 0 a2x3/x1
           0 1 a1x3/x2.

Hence M is a monomial automorphism of C if and only if a1x1 = a2x3 and a2x2 = a1x3. Definition 2.5.25 A map f from the set of all (linear) codes to another set is called an invariant of a (linear) code if f(C) = f(ϕ(C)) for every code C in F_q^n and every isometry ϕ of F_q^n. The map f is called a permutation invariant if f(C) = f(Π(C)) for every code C in F_q^n and every n × n permutation matrix Π.
The map f is called a monomial invariant if f(C) = f(M(C)) for every code C in F_q^n and every M ∈ Mono(n, q). Remark 2.5.26 The length, the number of elements and the minimum distance are clearly invariants of a code. The dimension is a permutation and a monomial invariant of a linear code. The isomorphism class of the group of automorphisms of a code is an invariant of a code. The isomorphism classes of PAut(C) and MAut(C) are permutation and monomial invariants, respectively, of a linear code.
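Proposition 2.5.23 turns the automorphism test of Example 2.5.24 into a comparison of reduced row echelon forms. A Python sketch (ours; the concrete values a1 = 1, a2 = 2, x1 = 2, x2 = 3, x3 = 1 over F_5 are chosen to satisfy a1x1 = a2x3 and a2x2 = a1x3):

```python
def rref(M, q):
    """Reduced row echelon form over the prime field F_q."""
    M = [row[:] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] % q != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        inv = pow(M[r][c], -1, q)
        M[r] = [(x * inv) % q for x in M[r]]
        for i in range(rows):
            if i != r and M[i][c] % q != 0:
                f = M[i][c]
                M[i] = [(x - f * y) % q for x, y in zip(M[i], M[r])]
        r += 1
    return M

def mat_mul(A, B, q):
    return [[sum(x * y for x, y in zip(row, col)) % q for col in zip(*B)]
            for row in A]

q = 5
a1, a2 = 1, 2
x1, x2, x3 = 2, 3, 1      # a1*x1 = a2*x3 and a2*x2 = a1*x3 (mod 5)
G = [[1, 0, a1], [0, 1, a2]]
M = [[0, x2, 0], [x1, 0, 0], [0, 0, x3]]

# M is a monomial automorphism of C: rref(GM) = rref(G).
assert rref(mat_mul(G, M, q), q) == rref(G, q)
```

Changing x3 to a value violating the two conditions makes the assertion fail, as Proposition 2.5.23 predicts.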
2.5.3
Exercises
2.5.1 Determine the number of [5, 3] codes over Fq by Proposition 2.5.2 and show by division that it is a polynomial in q. Determine the exponent e(j) and the number of codes such that rref(C) is systematic at a given 3-tuple (j1, j2, j3), for all 3-tuples with 1 ≤ j1 < j2 < j3 ≤ 5, as in Proposition 2.5.3, and verify that they sum up to the total number of [5, 3] codes. 2.5.2 Show that e(j) = Σ_{t=1}^{k} t(j_{t+1} − j_t − 1) for every k-tuple (j1, . . . , jk) with 1 ≤ j1 < . . . < jk ≤ n and j_{k+1} = n + 1 in the proof of Proposition 2.5.3. Show that the maximum of e(j) is equal to k(n − k) and that this maximum is attained by exactly one k-tuple, namely (1, 2, . . . , k). 2.5.3 Let p be a prime. Let q = p^m. Consider the map ϕ : F_q^n → F_q^n defined by ϕ(x1, . . . , xn) = (x1^p, . . . , xn^p). Show that ϕ is an isometry that permutes the elements of the alphabet Fq coordinatewise. Prove that ϕ is a linear map if and only if m = 1. So ϕ is not linear if m > 1. Show that ϕ(C) is a linear code if C is a linear code. 2.5.4 Show that permutation matrices and the coordinatewise permutations of the elements of Fq define isometries. Show that every element of Isom(n, q) is the composition of a permutation matrix and coordinatewise permutations of the elements of Fq. Moreover such a composition is unique. Show that the number of elements of Isom(n, q) is equal to n!(q!)^n. 2.5.5 Give a proof of Proposition 2.5.16. 2.5.6 Show that every binary (7, 16, 3) code is isometric with the Hamming code. 2.5.7 Let C be a linear code of length n. Assume that n is a power of a prime. Show that if there exists an element in PAut(C) of order n, then C is equivalent with a cyclic code. Show that the assumption on n being a prime power is necessary by means of a counterexample. 2.5.8 A code C is called quasi self-dual if it is monomial equivalent with its dual. Consider the [2k, k] code over Fq with generator matrix (Ik | Ik).
Show that this code is quasi self-dual for all q and self-dual if q is even.

2.5.9 Let C be an F_q-linear code of length n with hull H(C) = C ∩ C^⊥. Let Π be an n × n permutation matrix, let D be an invertible n × n diagonal matrix and let M ∈ Mono(n, q).
(1) Show that (Π(C))^⊥ = Π(C^⊥).
(2) Show that H(Π(C)) = Π(H(C)).
(3) Show that (D(C))^⊥ = D^{−1}(C^⊥).
(4) Show that H(M(C)) = M(H(C)) if q = 2 or q = 3.
(5) Show by means of a counterexample that the dimension of the hull of a linear code over F_q is not a monomial invariant for q > 3.

2.5.10 Show that every linear code over F_q is monomial equivalent to a code with a complementary dual if q > 3.
2.5.11 Let C be the code of Example 2.5.24. Show that this code has 6(q − 1) monomial automorphisms. Compute Aut(C) for all possible choices of the a_i.

2.5.12 Show that PAut(C^⊥) and MAut(C^⊥) are isomorphic as groups with PAut(C) and MAut(C), respectively.

2.5.13 Determine the automorphism group of the ternary code with generator matrix

( 1 0 1 1 )
( 0 1 1 2 ).

2.5.14 Show that in Example 12.5.5 the permutation automorphism groups obtained for the Hamming codes in the GAP and Magma programs are different. This implies that these codes are not the same. Find the permutation equivalence between these codes.
2.6 Notes
One considers the seminal papers of Shannon [107] and Hamming [61] as the starting point of information theory and coding theory. Many papers that appeared in the early days of coding theory and information theory were published in the Bell System Technical Journal, the IEEE Transactions on Information Theory and Problemy Peredachi Informatsii. They were collected as key papers in [21, 10, 111]. We mention the classical textbooks in coding theory [3, 11, 19, 62, 75, 76, 78, 84, 93] and several more recent ones [20, 67, 77]. The Handbook on coding theory [95] gives a wealth of information. Applications include audiovisual media, the compact disc and DVD [76, 105], fault-tolerant computers ...[] and deep space telecommunication [86, 134]. ***Elias, sequence of codes with R > 0 and error probability going to zero.*** ***Forney, concatenated codes, sequence of codes with R near capacity, error probability going to zero and an efficient decoding algorithm.*** ***Elias, Wozencraft, list decoding***.
Chapter 3

Code constructions and bounds

Ruud Pellikaan and Xin-Wen Wu

This chapter treats the existence and nonexistence of codes. Several constructions show that the existence of one particular code gives rise to a cascade of derived codes. Upper bounds in terms of the parameters exclude codes and lower bounds show the existence of codes.
3.1 Code constructions
In this section, we discuss some classical methods of constructing new codes using known codes.
3.1.1 Constructing shorter and longer codes
The most obvious way to make a shorter code out of a given code is to delete several coordinates. This is called puncturing.

Definition 3.1.1 Let C be an [n, k, d] code. For any codeword, the process of deleting one or more fixed coordinates is called puncturing. Let P be a subset of {1, . . . , n} consisting of p integers such that its complement is the set {i_1, . . . , i_{n−p}} with 1 ≤ i_1 < · · · < i_{n−p} ≤ n. Let x ∈ F_q^n. Define x_P = (x_{i_1}, . . . , x_{i_{n−p}}) ∈ F_q^{n−p}. Let C_P be the set of all punctured codewords of C, where the puncturing takes place at all the positions of P:

C_P = { c_P | c ∈ C }.

We will also use a notation with respect to the non-punctured positions.

Definition 3.1.2 Let R be a subset of {1, . . . , n} consisting of r integers {i_1, . . . , i_r} with 1 ≤ i_1 < · · · < i_r ≤ n. Let x ∈ F_q^n. Define x_(R) = (x_{i_1}, . . . , x_{i_r}) ∈ F_q^r. Let C_(R) be the set of all codewords of C restricted to the positions of R:

C_(R) = { c_(R) | c ∈ C }.
Remark 3.1.3 So C_P is a code of length n − p, where p is the number of elements of P. Furthermore C_P is linear, since C is linear. In fact, suppose G is a generator matrix of C. Then C_P is a linear code generated by the rows of G_P, where G_P is the k × (n − p) matrix consisting of the n − p columns at the positions i_1, . . . , i_{n−p} of G. If we consider the restricted code C_(R), then its generator matrix G_(R) is the k × r submatrix of G composed of the columns indexed by j_1, . . . , j_r, where R = {j_1, . . . , j_r}.

Proposition 3.1.4 Let C be an [n, k, d] code. Suppose P consists of p elements. Then the punctured code C_P is an [n − p, k_P, d_P] code with d − p ≤ d_P ≤ d and k − p ≤ k_P ≤ k. If moreover p < d, then k_P = k.

Proof. The given upper bounds are clear. Let c ∈ C. Then at most p nonzero positions are deleted from c to obtain c_P. Hence wt(c_P) ≥ wt(c) − p, and therefore d_P ≥ d − p. The column rank of G, which is equal to the row rank, is k. The column rank of G_P must be at least k − p, since p columns are deleted. This implies that the row rank of G_P is at least k − p. So k_P ≥ k − p. Suppose p < d. If c and c' are two distinct codewords in C, then d(c_P, c'_P) ≥ d − p > 0, so c_P and c'_P are distinct. Therefore C and C_P have the same number of codewords. Hence k = k_P.

Example 3.1.5 It is worth pointing out that the dimension of C_P can be smaller than k. From the definition of puncturing, C_P seemingly has the same number of codewords as C. However, it is possible that C contains distinct codewords that have the same coordinates outside the positions of P. In this case, after deleting the coordinates at the positions of P, the number of codewords of C_P is less than that of C. Look at the following simple example. Let C be the binary code with generator matrix

G = [ 1 1 0 0
      1 1 1 0
      0 0 1 1 ].

This is a [4, 3, 1] code. Let P = {4}. Then the rows of G_P are (1, 1, 0), (1, 1, 1) and (0, 0, 1).
It is clear that the second row is the sum of the first and the third one. So G_P has row rank 2, and C_P has dimension 2. In this example we have d = 1 = p.

We now introduce an inverse process to puncturing the code C, which is called extending the code.

Definition 3.1.6 Let C be a linear code of length n. Let v ∈ F_q^n. The extended code C^e(v) of length n + 1 is defined as follows. For every codeword c = (c_1, . . . , c_n) ∈ C, construct the word c^e(v) by adding the symbol c_{n+1}(v) ∈ F_q at the end of c such that the following parity check holds:

v_1 c_1 + v_2 c_2 + · · · + v_n c_n + c_{n+1} = 0.

Now C^e(v) consists of all the codewords c^e(v), where c is a codeword of C. In case v is the all-ones vector, C^e(v) is denoted by C^e.
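For small codes these operations are easy to check by brute force. The following Python sketch (not part of the text; the helper names are ad hoc) verifies for the [4,3,1] code above that puncturing at a single position of minimum weight drops the dimension, and that extending with the all-ones parity check and puncturing at the last position are inverse operations.

```python
from itertools import product

def codewords(G, q=2):
    """All codewords m*G over F_q for a generator matrix G (tuple of rows)."""
    k, n = len(G), len(G[0])
    return {tuple(sum(m[i] * G[i][j] for i in range(k)) % q for j in range(n))
            for m in product(range(q), repeat=k)}

def puncture(C, P):
    """Delete the positions in P (0-based) from every codeword."""
    n = len(next(iter(C)))
    return {tuple(c[j] for j in range(n) if j not in P) for c in C}

def extend(C, q=2):
    """Append the parity symbol c_{n+1} = -(c_1 + ... + c_n) to every codeword."""
    return {c + ((-sum(c)) % q,) for c in C}

def min_dist(C):
    """Minimum distance = minimum weight for a linear code."""
    return min(sum(1 for x in c if x) for c in C if any(c))

# The [4,3,1] code of Example 3.1.5:
C = codewords(((1, 1, 0, 0), (1, 1, 1, 0), (0, 0, 1, 1)))
CP = puncture(C, {3})                 # puncture at the last position (p = d = 1)
assert len(C) == 8 and len(CP) == 4   # the dimension drops from 3 to 2

Ce = extend(C)                        # d = 1 is odd, so the extension has d + 1 = 2
assert min_dist(C) == 1 and min_dist(Ce) == 2
assert puncture(Ce, {4}) == C         # extending, then puncturing at the end, gives C back
```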
Remark 3.1.7 Let C be an [n, k] code. Then it is clear that C^e(v) is a linear subspace of F_q^{n+1} of dimension k. So C^e(v) is an [n + 1, k] code. Suppose G and H are generator and parity check matrices of C, respectively. Then C^e(v) has a generator matrix G^e(v) and a parity check matrix H^e(v), which are given by

G^e(v) = ( G | g )   and   H^e(v) = [ v_1 v_2 · · · v_n  1 ]
                                    [        H         | 0 ],

where the last column g of G^e(v) has entries g_{i,n+1} = − Σ_{j=1}^{n} g_{ij} v_j.

Example 3.1.8 The extension of the [7,4,3] binary Hamming code with the generator matrix given in Example 2.2.14 is equal to the [8,4,4] code with the generator matrix given in Example 2.3.26. The increase of the minimum distance by one in the extension of a code of odd minimum distance is a general phenomenon for binary codes.

Proposition 3.1.9 Let C be a binary [n, k, d] code. Then C^e has parameters [n + 1, k, d^e] with d^e = d if d is even and d^e = d + 1 if d is odd.

Proof. Let C be a binary [n, k, d] code. Then C^e is an [n + 1, k] code by Remark 3.1.7. The minimum distance d^e of the extended code satisfies d ≤ d^e ≤ d + 1, since wt(c) ≤ wt(c^e) ≤ wt(c) + 1 for all c ∈ C. Assume that d is even. Then there is a codeword c of weight d, and c^e is obtained from c by extending with a zero. So c^e also has weight d. If d is odd, then the claim follows, since all the codewords of the extended code C^e have even weight by the parity check c_1 + · · · + c_{n+1} = 0.

Example 3.1.10 The binary [2^r − 1, 2^r − r − 1, 3] Hamming code H_r(2) has the extension H_r(2)^e with parameters [2^r, 2^r − r − 1, 4]. The binary [2^r − 1, r, 2^{r−1}] simplex code S_r(2) has the extension S_r(2)^e with parameters [2^r, r, 2^{r−1}]. These claims are a direct consequence of Propositions 2.3.14 and 2.3.16, Remark 3.1.7 and Proposition 3.1.9.

The operations of extending and puncturing at the last position are inverse to each other.

Proposition 3.1.11 Let C be a linear code of length n. Let v ∈ F_q^n.
Let P = {n + 1} and Q = {n}. Then (C^e(v))_P = C. If the all-ones vector is a parity check of C, then (C_Q)^e = C.

Proof. The first statement is a consequence of the fact that (c^e(v))_P = c for all words c. The last statement is left as an exercise.

Example 3.1.12 Puncturing the extended binary Hamming code H_r(2)^e at the last position gives the original Hamming code back.

By taking subcodes appropriately, we can get new codes. The following technique of constructing a new code involves taking a subcode and puncturing.
Definition 3.1.13 Let C be an [n, k, d] code. Let S be a subset of {1, . . . , n}. Let C(S) be the subcode of C consisting of all c ∈ C such that c_i = 0 for all i ∈ S. The shortened code C^S is defined by C^S = (C(S))_S. It is obtained by puncturing the subcode C(S) at S, so by deleting the coordinates at the positions of S.

Remark 3.1.14 Let S consist of s elements. Let x ∈ F_q^{n−s}. Let x^S ∈ F_q^n be the unique word of length n such that x = (x^S)_S and the entries of x^S at the positions of S are zero, obtained by extending x with zeros appropriately. Then x ∈ C^S if and only if x^S ∈ C. Furthermore x^S · y = x · y_S for all x ∈ F_q^{n−s} and y ∈ F_q^n.

Proposition 3.1.15 Let C be an [n, k, d] code. Suppose S consists of s elements. Then the shortened code C^S is an [n − s, k_S, d_S] code with

k − s ≤ k_S ≤ k   and   d ≤ d_S.
Proof. The dimension of C S is equal to the dimension of the subcode C(S) of C, and C(S) is defined by s homogeneous linear equations of the form ci = 0. This proves the statement about the dimension. The minimum distance of C S is the same as the minimum distance of C(S), and C(S) is a subcode of C. Hence d ≤ dS . Example 3.1.16 Consider the binary [8,4,4] code of Example 2.3.26. In the following diagram we show what happens with the generator matrix by shortening at the first position in the left column of the diagram, by puncturing at the first position in the right column, and by taking the dual in the upper and lower row of the diagram .
    1 0 0 0 0 1 1 1              1 0 0 0 0 1 1 1
    0 1 0 0 1 0 1 1     dual     0 1 0 0 1 0 1 1
    0 0 1 0 1 1 0 1     ←→      0 0 1 0 1 1 0 1
    0 0 0 1 1 1 1 0              0 0 0 1 1 1 1 0

 ↓ shorten at first position   ↓ puncture at first position

    1 0 0 1 0 1 1                0 0 0 0 1 1 1
    0 1 0 1 1 0 1       dual     1 0 0 1 0 1 1
    0 0 1 1 1 1 0       ←→      0 1 0 1 1 0 1
                                 0 0 1 1 1 1 0
Notice that the diagram commutes. This is a general fact as stated in the following proposition.
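For this particular code the commutativity of the diagram can be confirmed by brute force; a Python sketch (the helper names are ours, not from the text) that checks the self-duality of the [8,4,4] code and that shortening and puncturing at the same position produce dual codes:

```python
from itertools import product

def codewords(G, q=2):
    """All codewords m*G over F_q for a generator matrix G."""
    k, n = len(G), len(G[0])
    return {tuple(sum(m[i] * G[i][j] for i in range(k)) % q for j in range(n))
            for m in product(range(q), repeat=k)}

def dual(C, q=2):
    """Brute-force dual: all words orthogonal to every codeword."""
    n = len(next(iter(C)))
    return {x for x in product(range(q), repeat=n)
            if all(sum(a * b for a, b in zip(x, c)) % q == 0 for c in C)}

def puncture(C, P):
    n = len(next(iter(C)))
    return {tuple(c[j] for j in range(n) if j not in P) for c in C}

def shorten(C, S):
    """Take the subcode that is zero on S, then puncture at S."""
    return puncture({c for c in C if all(c[i] == 0 for i in S)}, S)

G = ((1, 0, 0, 0, 0, 1, 1, 1),
     (0, 1, 0, 0, 1, 0, 1, 1),
     (0, 0, 1, 0, 1, 1, 0, 1),
     (0, 0, 0, 1, 1, 1, 1, 0))
C = codewords(G)
assert dual(C) == C                              # the [8,4,4] code is self-dual
# Shortening and puncturing at the same position give dual codes:
assert dual(shorten(C, {0})) == puncture(C, {0})
assert dual(puncture(C, {0})) == shorten(C, {0})
```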
Proposition 3.1.17 Let C be an [n, k, d] code. Let P and S be subsets of {1, . . . , n}. Then

(C_P)^⊥ = (C^⊥)^P   and   (C^S)^⊥ = (C^⊥)_S,

dim C_P + dim (C^⊥)^P = n − |P|   and   dim C^S + dim (C^⊥)_S = n − |S|.
Proof. Let x ∈ (C_P)^⊥ and z ∈ C. Then z_P ∈ C_P. So x^P · z = x · z_P = 0 by Remark 3.1.14. Hence x^P ∈ C^⊥ and x ∈ (C^⊥)^P. Therefore (C_P)^⊥ ⊆ (C^⊥)^P. Conversely, let x ∈ (C^⊥)^P. Then x^P ∈ C^⊥. Let y ∈ C_P. Then y = z_P for some z ∈ C. So x · y = x · z_P = x^P · z = 0. Hence x ∈ (C_P)^⊥. Therefore (C^⊥)^P ⊆ (C_P)^⊥, and in fact equality holds, since the converse inclusion was already shown. The statement on the dimensions is a direct consequence of the corresponding equality of the codes. The claim about shortening C at S is a consequence of the equality for puncturing, with S = P, applied to the dual C^⊥.

If we want to increase the size of a code without changing its length, we can augment the code by adding a word which is not in the code.

Definition 3.1.18 Let C be an F_q-linear code of length n. Let v ∈ F_q^n. The augmented code, denoted by C^a(v), is defined by

C^a(v) = { αv + c | α ∈ F_q, c ∈ C }.

If v is the all-ones vector, then we denote C^a(v) by C^a.

Remark 3.1.19 The augmented code C^a(v) is a linear code. Suppose that G is a generator matrix of C. Then the (k + 1) × n matrix G^a(v), which is obtained by adding the row v to G, is a generator matrix of C^a(v) if v is not an element of C.

Proposition 3.1.20 Let C be a code of minimum distance d. Suppose that the vector v is not in C and has weight w. Then

min{d − w, w} ≤ d(C^a(v)) ≤ min{d, w}.

In particular d(C^a(v)) = w if w ≤ d/2.

Proof. C is a subcode and v is an element of the augmented code. This implies the upper bound. The lower bound is trivially satisfied if d ≤ w. Suppose w < d. Let x be a nonzero element of C^a(v). Then x = αv + c for some α ∈ F_q and c ∈ C. If α = 0, then wt(x) = wt(c) ≥ d > w. If c = 0, then wt(x) = wt(αv) = w. If α ≠ 0 and c ≠ 0, then c = x − αv. So d ≤ wt(c) ≤ wt(x) + w, hence d − w ≤ wt(x). If w ≤ d/2, then the upper and lower bounds are both equal to w.

Suppose C is a binary [n, k, d] code. We get a new code by deleting the codewords of odd weight.
In other words, the new code Cev consists of all the codewords in C which have even weight. It is called the even weight subcode in Example 2.2.8. This process is also called expurgating the code C.
Definition 3.1.21 Let C be an F_q-linear code of length n. Let v ∈ F_q^n. The expurgated code of C is denoted by C_e(v) and is defined by

C_e(v) = { c | c ∈ C and c · v = 0 }.

If v = 1, then C_e(1) is denoted by C_e.

Proposition 3.1.22 Let C be an [n, k, d] code. Then (C^a(v))^⊥ = (C^⊥)_e(v).

Proof. If v ∈ C, then C^a(v) = C and c · v = 0 for all c ∈ C^⊥, so (C^⊥)_e(v) = C^⊥. Suppose v is not an element of C. Let G be a generator matrix of C. Then G is a parity check matrix of C^⊥ by Proposition 2.3.29. Now G^a(v) is a generator matrix of C^a(v) by definition. Hence G^a(v) is a parity check matrix of (C^a(v))^⊥. Furthermore G^a(v) is also a parity check matrix of (C^⊥)_e(v) by definition. Hence (C^a(v))^⊥ = (C^⊥)_e(v).

Lengthening a code is a technique which combines augmenting and extending.

Definition 3.1.23 Let C be an [n, k] code. Let v ∈ F_q^n. The lengthened code C^l(v) is obtained by first augmenting C by v, and then extending it:

C^l(v) = (C^a(v))^e.

If v = 1, then C^l(v) is denoted by C^l.

Remark 3.1.24 The lengthening of an [n, k] code is a linear code. If v is not an element of C, then C^l(v) is an [n + 1, k + 1] code.
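Proposition 3.1.22 lends itself to a brute-force check on a small code; a Python sketch (the helper names are ours, not from the text):

```python
from itertools import product

def dual(C, q=2):
    """Brute-force dual of a code given as a set of tuples."""
    n = len(next(iter(C)))
    return {x for x in product(range(q), repeat=n)
            if all(sum(a * b for a, b in zip(x, c)) % q == 0 for c in C)}

def augment(C, v, q=2):
    """C^a(v) = { alpha*v + c : alpha in F_q, c in C }."""
    return {tuple((a * vi + ci) % q for vi, ci in zip(v, c))
            for a in range(q) for c in C}

def expurgate(C, v, q=2):
    """C_e(v) = { c in C : c . v = 0 }."""
    return {c for c in C if sum(a * b for a, b in zip(c, v)) % q == 0}

# C = <1100, 0011> is a self-dual binary [4,2] code; v = 1010 is not in C.
C = {(0, 0, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 1, 1)}
v = (1, 0, 1, 0)
assert len(augment(C, v)) == 8                        # a [4,3] code
assert dual(augment(C, v)) == expurgate(dual(C), v)   # (C^a(v))^perp = (C^perp)_e(v)
```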
3.1.2 Product codes
We describe a method for combining two codes to get a new code. In Example 2.1.2 the [9,4,4] product code was introduced. This construction will be generalized in this section.

Consider the identification of the space of all n_1 × n_2 matrices with entries in F_q with the space F_q^n, where n = n_1 n_2 and the matrix X = (x_{ij})_{1≤i≤n_1, 1≤j≤n_2} is mapped to the vector x with entries x_{(i−1)n_2+j} = x_{ij}. In other words, the rows of X are put in linear order behind each other:

x = (x_{11}, x_{12}, . . . , x_{1n_2}, x_{21}, . . . , x_{2n_2}, x_{31}, . . . , x_{n_1 n_2}).

For α ∈ F_q and n_1 × n_2 matrices (x_{ij}) and (y_{ij}) with entries in F_q, the scalar multiplication and addition are defined by

α(x_{ij}) = (αx_{ij})   and   (x_{ij}) + (y_{ij}) = (x_{ij} + y_{ij}).

These operations on matrices correspond to the operations on the vectors under the identification. Hence the identification of the space of n_1 × n_2 matrices with the space F_q^n is an isomorphism of vector spaces. In the following these two spaces are identified.

Definition 3.1.25 Let C_1 and C_2 be [n_1, k_1, d_1] and [n_2, k_2, d_2] codes, respectively. Let n = n_1 n_2. The product code, denoted by C_1 ⊗ C_2, is defined by

C_1 ⊗ C_2 = { (c_{ij})_{1≤i≤n_1, 1≤j≤n_2} | (c_{ij})_{1≤i≤n_1} ∈ C_1 for all j, and (c_{ij})_{1≤j≤n_2} ∈ C_2 for all i }.
From the definition, the product code C_1 ⊗ C_2 is exactly the set of all n_1 × n_2 arrays whose columns belong to C_1 and whose rows belong to C_2. In the literature, the product code is also called the direct product, Kronecker product or tensor product code.

Example 3.1.26 Let C_1 = C_2 be the [3, 2, 2] binary even weight code. So it consists of the following codewords: (0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1). This is the set of all words (m_1, m_2, m_1 + m_2) where m_1 and m_2 are arbitrary bits. By the definition, the 16 arrays

[ m_1        m_2        m_1 + m_2
  m_3        m_4        m_3 + m_4
  m_1 + m_3  m_2 + m_4  m_1 + m_2 + m_3 + m_4 ],

where the m_i are free to choose, are the codewords of the product code C_1 ⊗ C_2. So indeed this is the product code of Example 2.1.2. The sum of two arrays (c_{ij}) and (c'_{ij}) is the array (c_{ij} + c'_{ij}). Therefore C_1 ⊗ C_2 is a linear code of length 9 = 3 × 3 and dimension 4 = 2 × 2. And it is clear that the minimum distance of C_1 ⊗ C_2 is 4 = 2 × 2. This is a general fact, but before we state this result we need some preparations.

Definition 3.1.27 For two vectors x = (x_1, . . . , x_{n_1}) and y = (y_1, . . . , y_{n_2}), we define their tensor product, denoted by x ⊗ y, as the n_1 × n_2 array whose (i, j)-entry is x_i y_j.

Remark 3.1.28 It is clear that C_1 ⊗ C_2 is a linear code if C_1 and C_2 are both linear. Remark that x ⊗ y ∈ C_1 ⊗ C_2 if x ∈ C_1 and y ∈ C_2, since the i-th row of x ⊗ y is x_i y ∈ C_2 and the j-th column is y_j x^T with y_j x ∈ C_1. But the set of all x ⊗ y with x ∈ C_1 and y ∈ C_2 is not equal to C_1 ⊗ C_2. In the previous example

[ 0 1 1
  1 0 1
  1 1 0 ]

is in the product code, but it is not of the form x ⊗ y with x ∈ C_1 and y ∈ C_2, since otherwise it would have at least one zero row and at least one zero column. In general, the number of pairs (x, y) with x ∈ C_1 and y ∈ C_2 is q^{k_1+k_2}, but x ⊗ y = 0 if x = 0 or y = 0, and moreover λ(x ⊗ y) = (λx) ⊗ y = x ⊗ (λy) for all λ ∈ F_q. Hence we get at most (q^{k_1} − 1)(q^{k_2} − 1)/(q − 1) + 1 such elements.
If k_1 > 1 and k_2 > 1, then this is smaller than q^{k_1 k_2}, the number of elements of C_1 ⊗ C_2 according to the following proposition.

Proposition 3.1.29 Let x_1, . . . , x_k ∈ F_q^{n_1} and y_1, . . . , y_k ∈ F_q^{n_2}. If y_1, . . . , y_k are independent and x_1 ⊗ y_1 + · · · + x_k ⊗ y_k = 0, then x_i = 0 for all i.

Proof. Suppose that y_1, . . . , y_k are independent and x_1 ⊗ y_1 + · · · + x_k ⊗ y_k = 0. Let x_{js} be the s-th entry of x_j. Then the s-th row of Σ_j x_j ⊗ y_j is equal to Σ_j x_{js} y_j, which is equal to 0 by assumption. Hence x_{js} = 0 for all j and s, since the y_j are independent. Hence x_j = 0 for all j.
Corollary 3.1.30 Let x_1, . . . , x_{k_1} ∈ F_q^{n_1} and y_1, . . . , y_{k_2} ∈ F_q^{n_2}. If x_1, . . . , x_{k_1} and y_1, . . . , y_{k_2} are both independent, then

{ x_i ⊗ y_j | 1 ≤ i ≤ k_1, 1 ≤ j ≤ k_2 }

is an independent set of matrices.

Proof. Suppose that Σ_{i,j} λ_{ij} x_i ⊗ y_j = 0 for certain scalars λ_{ij} ∈ F_q. Then Σ_j (Σ_i λ_{ij} x_i) ⊗ y_j = 0 and y_1, . . . , y_{k_2} ∈ F_q^{n_2} are independent. So Σ_i λ_{ij} x_i = 0 for all j by Proposition 3.1.29. Hence λ_{ij} = 0 for all i, j, since x_1, . . . , x_{k_1} are independent.

Proposition 3.1.31 Let x_1, . . . , x_{k_1} ∈ F_q^{n_1} be a basis of C_1 and y_1, . . . , y_{k_2} ∈ F_q^{n_2} a basis of C_2. Then

{ x_i ⊗ y_j | 1 ≤ i ≤ k_1, 1 ≤ j ≤ k_2 }

is a basis of C_1 ⊗ C_2.

Proof. The given set is an independent set by Corollary 3.1.30, and it is a subset of C_1 ⊗ C_2. So the dimension of C_1 ⊗ C_2 is at least k_1 k_2. Now we show that it is in fact a basis of C_1 ⊗ C_2. Without loss of generality we may assume that C_1 is systematic at the first k_1 coordinates with generator matrix (I_{k_1} | A), and C_2 is systematic at the first k_2 coordinates with generator matrix (I_{k_2} | B). Then U is an l × n_2 matrix with rows in C_2 if and only if U = (M | MB), where M is an l × k_2 matrix. And V is an n_1 × m matrix with columns in C_1 if and only if V^T = (N | NA), where N is an m × k_1 matrix. Now let M be a k_1 × k_2 matrix. Then (M | MB) is a k_1 × n_2 matrix with rows in C_2, and the stacked matrix

[ M
  A^T M ]

is an n_1 × k_2 matrix with columns in C_1. Therefore

[ M      MB
  A^T M  A^T M B ]

is an n_1 × n_2 matrix with columns in C_1 and rows in C_2, for every k_1 × k_2 matrix M, and conversely every codeword of C_1 ⊗ C_2 is of this form. Hence the dimension of C_1 ⊗ C_2 is equal to k_1 k_2 and the given set is a basis of C_1 ⊗ C_2.

Theorem 3.1.32 Let C_1 and C_2 be [n_1, k_1, d_1] and [n_2, k_2, d_2] codes, respectively. Then the product code C_1 ⊗ C_2 is an [n_1 n_2, k_1 k_2, d_1 d_2] code.

Proof. By definition n = n_1 n_2 is the length of the product code. It was already mentioned that C_1 ⊗ C_2 is a linear subspace of F_q^{n_1 n_2}.
The dimension of the product code is k_1 k_2 by Proposition 3.1.31. Next we prove that the minimum distance of C_1 ⊗ C_2 is d_1 d_2. In any codeword of C_1 ⊗ C_2, which is an n_1 × n_2 array, every nonzero column has weight at least d_1 and every nonzero row has weight at least d_2. A nonzero codeword has a nonzero entry; its column is nonzero, so there are at least d_1 nonzero rows, each of weight at least d_2. So the weight of a nonzero codeword of the product code is at least d_1 d_2. This implies that the minimum distance of C_1 ⊗ C_2 is at least d_1 d_2. Now suppose x ∈ C_1 has weight d_1 and y ∈ C_2 has weight d_2. Then x ⊗ y is a codeword of C_1 ⊗ C_2 of weight d_1 d_2.
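For the [3,2,2] binary even weight code this theorem can be confirmed by enumerating all 3 × 3 arrays whose rows and columns are codewords; a Python sketch (not part of the text):

```python
from itertools import product

even = {c for c in product(range(2), repeat=3) if sum(c) % 2 == 0}  # the [3,2,2] code

# All 3x3 binary arrays whose rows and columns lie in the even weight code:
prod_code = [M for M in product(even, repeat=3)
             if all(tuple(M[i][j] for i in range(3)) in even for j in range(3))]

assert len(prod_code) == 16                     # dimension k1 * k2 = 4
weights = [sum(sum(row) for row in M) for M in prod_code]
assert min(w for w in weights if w > 0) == 4    # minimum distance d1 * d2 = 4
```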
Definition 3.1.33 Let A = (a_{ij}) be a k_1 × n_1 matrix and B = (b_{ij}) a k_2 × n_2 matrix. The Kronecker product or tensor product A ⊗ B of A and B is the k_1 k_2 × n_1 n_2 matrix obtained from A by replacing every entry a_{ij} by a_{ij}B.

Remark 3.1.34 The tensor product x ⊗ y of the two row vectors x and y of lengths n_1 and n_2, respectively, as defined in Definition 3.1.27, is the same as the Kronecker product of x^T and y, now considered as an n_1 × 1 and a 1 × n_2 matrix, respectively, as in Definition 3.1.33.

Proposition 3.1.35 Let G_1 be a generator matrix of C_1 and G_2 a generator matrix of C_2. Then G_1 ⊗ G_2 is a generator matrix of C_1 ⊗ C_2.

Proof. In this proposition the codewords are considered as elements of F_q^n and no longer as matrices. Let x_i be the i-th row of G_1, and denote by y_j the j-th row of G_2. So x_1, . . . , x_{k_1} ∈ F_q^{n_1} is a basis of C_1 and y_1, . . . , y_{k_2} ∈ F_q^{n_2} is a basis of C_2. Hence the set { x_i ⊗ y_j | 1 ≤ i ≤ k_1, 1 ≤ j ≤ k_2 } is a basis of C_1 ⊗ C_2 by Proposition 3.1.31. Furthermore, if l = (i − 1)k_2 + j, then x_i ⊗ y_j is the l-th row of G_1 ⊗ G_2. Hence the matrix G_1 ⊗ G_2 is a generator matrix of C_1 ⊗ C_2.

Example 3.1.36 Consider the ternary codes C_1 and C_2 with generator matrices

G_1 = [ 1 1 1        G_2 = [ 1 1 1 0
        0 1 2 ]  and         0 1 2 0
                             0 1 1 1 ],

respectively. Then G_1 ⊗ G_2 is the 6 × 12 matrix

[ 1 1 1 0 1 1 1 0 1 1 1 0
  0 1 2 0 0 1 2 0 0 1 2 0
  0 1 1 1 0 1 1 1 0 1 1 1
  0 0 0 0 1 1 1 0 2 2 2 0
  0 0 0 0 0 1 2 0 0 2 1 0
  0 0 0 0 0 1 1 1 0 2 2 2 ].

The second row of G_1 is x_2 = (0, 1, 2), and y_2 = (0, 1, 2, 0) is the second row of G_2. Then x_2 ⊗ y_2 is equal to

[ 0 0 0 0
  0 1 2 0
  0 2 1 0 ],

considered as a matrix, and equal to (0, 0, 0, 0, 0, 1, 2, 0, 0, 2, 1, 0) written as a vector, which is indeed equal to the (2 − 1)·3 + 2 = 5-th row of G_1 ⊗ G_2.
3.1.3 Several sum constructions
We have seen that, given an [n_1, k_1] code C_1 and an [n_2, k_2] code C_2, the product construction yields an [n_1 n_2, k_1 k_2] code. The product code has information rate (k_1 k_2)/(n_1 n_2) = R_1 R_2, where R_1 and R_2 are the rates of C_1 and C_2, respectively. In this subsection we introduce some simple constructions by which we can get new codes of greater rate from two given codes.
Definition 3.1.37 Let C_1 be an [n_1, k_1] code and C_2 an [n_2, k_2] code. Their direct sum C_1 ⊕ C_2, also called the (u|v) construction, is defined by

C_1 ⊕ C_2 = { (u|v) | u ∈ C_1, v ∈ C_2 },

where (u|v) denotes the word (u_1, . . . , u_{n_1}, v_1, . . . , v_{n_2}) if u = (u_1, . . . , u_{n_1}) and v = (v_1, . . . , v_{n_2}).

Proposition 3.1.38 Let C_i be an [n_i, k_i, d_i] code with generator matrix G_i for i = 1, 2. Let d = min{d_1, d_2}. Then C_1 ⊕ C_2 is an [n_1 + n_2, k_1 + k_2, d] code with generator matrix

G = [ G_1  0
      0    G_2 ].

Proof. Let x_1, . . . , x_{k_1} and y_1, . . . , y_{k_2} be bases of C_1 and C_2, respectively. Then (x_1|0), . . . , (x_{k_1}|0), (0|y_1), . . . , (0|y_{k_2}) is a basis of the direct sum code. Therefore the direct sum is an [n_1 + n_2, k_1 + k_2] code with the given generator matrix G. The minimum distance of the direct sum is min{d_1, d_2}.

The direct sum or (u|v) construction is defined by the juxtaposition of arbitrary codewords u ∈ C_1 and v ∈ C_2. In the following definition only a restricted set of pairs of codewords are put behind each other. This definition depends on the choice of the generator matrices of the codes C_1 and C_2.

Definition 3.1.39 Let C_1 be an [n_1, k, d_1] code and C_2 an [n_2, k, d_2] code with generator matrices G_1 and G_2, respectively. The juxtaposition of the codes C_1 and C_2 is the code with generator matrix (G_1 | G_2).

Proposition 3.1.40 Let C_i be an [n_i, k, d_i] code for i = 1, 2. Then the juxtaposition of the codes C_1 and C_2 is an [n_1 + n_2, k, d] code with d ≥ d_1 + d_2.

Proof. The length and the dimension are clear from the definition. A nonzero codeword c is of the form mG = (mG_1 | mG_2) for a nonzero element m in F_q^k. So mG_i is a nonzero codeword of C_i. Hence the weight of c is at least d_1 + d_2.

The rate of the direct sum is (k_1 + k_2)/(n_1 + n_2), which is greater than (k_1 k_2)/(n_1 n_2), the rate of the product code. Now a more intelligent construction is studied.

Definition 3.1.41 Let C_1 be an [n, k_1, d_1] code and C_2 an [n, k_2, d_2] code.
The (u|u + v) construction is the code

{ (u|u + v) | u ∈ C_1, v ∈ C_2 }.

Theorem 3.1.42 Let C_i be an [n, k_i, d_i] code with generator matrix G_i for i = 1, 2. Then the (u|u + v) construction of C_1 and C_2 is a [2n, k_1 + k_2, d] code with minimum distance d = min{2d_1, d_2} and generator matrix

G = [ G_1  G_1
      0    G_2 ].
Proof. It is straightforward to check the linearity of the (u|u + v) construction. Suppose x_1, . . . , x_{k_1} and y_1, . . . , y_{k_2} are bases of C_1 and C_2, respectively. Then it is easy to see that (x_1|x_1), . . . , (x_{k_1}|x_{k_1}), (0|y_1), . . . , (0|y_{k_2}) is a basis of the (u|u + v) construction. So it is a [2n, k_1 + k_2] code with generator matrix G as given. Consider the minimum distance d of the (u|u + v) construction. For any codeword (x|x + y) we have wt(x|x + y) = wt(x) + wt(x + y). If y = 0, then wt(x|x + y) = 2wt(x) ≥ 2d_1. If y ≠ 0, then

wt(x|x + y) = wt(x) + wt(x + y) ≥ wt(x) + wt(y) − wt(x) = wt(y) ≥ d_2.

Hence d ≥ min{2d_1, d_2}. Let x_0 be a codeword of C_1 with weight d_1, and y_0 a codeword of C_2 with weight d_2. Then either (x_0|x_0) or (0|y_0) has weight min{2d_1, d_2}.

Example 3.1.43 The (u|u + v) construction of the binary even weight [4,3,2] code and the 4-tuple repetition [4,1,4] code gives an [8,4,4] code with generator matrix

[ 1 0 0 1 1 0 0 1
  0 1 0 1 0 1 0 1
  0 0 1 1 0 0 1 1
  0 0 0 0 1 1 1 1 ],

which is equivalent with the extended Hamming code of Example 2.3.26.

Remark 3.1.44 For two vectors u of length n_1 and v of length n_2, we can still define the sum u + v as a vector of length max{n_1, n_2} by adding enough zeros at the end of the shorter vector. With this definition of the sum, the (u|u + v) construction still works for codes C_1 and C_2 of different lengths.

Proposition 3.1.45 If C_1 is an [n_1, k_1, d_1] code and C_2 an [n_2, k_2, d_2] code, then the (u|u + v) construction is an [n_1 + max{n_1, n_2}, k_1 + k_2, min{2d_1, d_2}] linear code.

Proof. The proof is similar to the proof of Theorem 3.1.42.
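Example 3.1.43 can be reproduced mechanically; a Python sketch of the (u|u+v) construction (the helper name `uuv` is ours, not from the text):

```python
from itertools import product

def uuv(C1, C2, q=2):
    """The (u|u+v) construction: concatenate u with u + v."""
    return {u + tuple((a + b) % q for a, b in zip(u, v)) for u in C1 for v in C2}

even4 = {c for c in product(range(2), repeat=4) if sum(c) % 2 == 0}  # [4,3,2]
rep4 = {(0, 0, 0, 0), (1, 1, 1, 1)}                                  # [4,1,4]

C = uuv(even4, rep4)
assert len(C) == 2 ** 4                           # dimension k1 + k2 = 4
assert min(sum(c) for c in C if any(c)) == 4      # d = min{2*2, 4} = 4
```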
Definition 3.1.46 The (u + v|u − v) construction is a slightly modified construction, defined as the code

{ (u + v|u − v) | u ∈ C_1, v ∈ C_2 }.

When we consider this construction, we restrict ourselves to the case q odd, since u + v = u − v if q is even.

Proposition 3.1.47 Let C_i be an [n, k_i, d_i] code with generator matrix G_i for i = 1, 2. Assume that q is odd. Then the (u + v|u − v) construction of C_1 and C_2 is a [2n, k_1 + k_2, d] code with d ≥ min{2d_1, 2d_2, max{d_1, d_2}} and generator matrix

G = [ G_1  G_1
      G_2  −G_2 ].
Proof. The proof of the proposition is similar to that of Theorem 3.1.42. In fact, suppose x_1, . . . , x_{k_1} and y_1, . . . , y_{k_2} are bases of C_1 and C_2, respectively. Then every codeword is of the form (u + v|u − v) = (u|u) + (v|−v) with u ∈ C_1 and v ∈ C_2. So (u|u) is a linear combination of (x_1|x_1), . . . , (x_{k_1}|x_{k_1}), and (v|−v) is a linear combination of (y_1|−y_1), . . . , (y_{k_2}|−y_{k_2}). Using the assumption that q is odd, we can prove that this set of vectors (x_i|x_i), (y_j|−y_j) is linearly independent. Suppose that

Σ_i λ_i (x_i|x_i) + Σ_j μ_j (y_j|−y_j) = 0.

Then

Σ_i λ_i x_i + Σ_j μ_j y_j = 0   and   Σ_i λ_i x_i − Σ_j μ_j y_j = 0.

Adding the two equations and dividing by 2 gives Σ_i λ_i x_i = 0. So λ_i = 0 for all i, since the x_i are independent. Similarly, subtracting the equations gives that μ_j = 0 for all j. So the (x_i|x_i), (y_j|−y_j) are independent and generate the code. Hence they form a basis, and this shows that the given G is a generator matrix of this construction.

Let (u + v|u − v) be a nonzero codeword. The weight of this word is at least 2d_1 if v = 0, and at least 2d_2 if u = 0. Now suppose u ≠ 0 and v ≠ 0. Then the weight of u − v is at least wt(u) − w, where w is the number of positions i such that u_i = v_i ≠ 0. If u_i = v_i ≠ 0, then u_i + v_i ≠ 0, since q is odd. Hence wt(u + v) ≥ w, and

wt(u + v|u − v) ≥ w + (wt(u) − w) = wt(u) ≥ d_1.

In the same way wt(u + v|u − v) ≥ d_2. Hence wt(u + v|u − v) ≥ max{d_1, d_2}. This proves the estimate on the minimum distance.

Example 3.1.48 Consider the following ternary codes C_1 = {000, 110, 220},
C2 = {000, 011, 022}.
They are [3, 1, 2] codes. The (u + v|u − v) construction of these codes is a [6, 2, d] code with d ≥ 2 by Proposition 3.1.47. It consists of the following nine codewords:

(0, 0, 0, 0, 0, 0), (0, 1, 1, 0, 2, 2), (0, 2, 2, 0, 1, 1),
(1, 1, 0, 1, 1, 0), (1, 2, 1, 1, 0, 2), (1, 0, 2, 1, 2, 1),
(2, 2, 0, 2, 2, 0), (2, 0, 1, 2, 1, 2), (2, 1, 2, 2, 0, 1).

Hence d = 4. On the other hand, the (u|u + v) construction gives a [6, 2, 2] code, which has a smaller minimum distance than the (u + v|u − v) construction.

Now a more complicated construction is given.

Definition 3.1.49 Let C_1 and C_2 be [n, k_1] and [n, k_2] codes, respectively. The (a + x|b + x|a + b − x) construction of C_1 and C_2 is the code

{ (a + x|b + x|a + b − x) | a, b ∈ C_1, x ∈ C_2 }.
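Returning to Example 3.1.48, the nine codewords and the value d = 4 can be reproduced by brute force; a Python sketch (the helper name is ours, not from the text):

```python
def upv_umv(C1, C2, q=3):
    """The (u+v|u-v) construction over F_q."""
    return {tuple((a + b) % q for a, b in zip(u, v)) +
            tuple((a - b) % q for a, b in zip(u, v))
            for u in C1 for v in C2}

C1 = {(0, 0, 0), (1, 1, 0), (2, 2, 0)}   # ternary [3,1,2] codes
C2 = {(0, 0, 0), (0, 1, 1), (0, 2, 2)}

C = upv_umv(C1, C2)
assert len(C) == 9
assert (1, 2, 1, 1, 0, 2) in C           # u = 110, v = 011
assert min(sum(1 for x in c if x) for c in C if any(c)) == 4   # d = 4
```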
Proposition 3.1.50 Let C_1 and C_2 be [n, k_1] and [n, k_2] codes over F_q, respectively. Suppose q is not a power of 3. Then the (a + x|b + x|a + b − x) construction of C_1 and C_2 is a [3n, 2k_1 + k_2] code with generator matrix

G = [ G_1  0    G_1
      0    G_1  G_1
      G_2  G_2  −G_2 ].

Proof. Let x_1, . . . , x_{k_1} and y_1, . . . , y_{k_2} be bases of C_1 and C_2, respectively. Consider the following 2k_1 + k_2 vectors:

(x_1|0|x_1), . . . , (x_{k_1}|0|x_{k_1}),
(0|x_1|x_1), . . . , (0|x_{k_1}|x_{k_1}),
(y_1|y_1|−y_1), . . . , (y_{k_2}|y_{k_2}|−y_{k_2}).

It is left as an exercise to check that they form a basis of this construction in case q is not a power of 3. This shows that the given G is a generator matrix of the code and that its dimension is 2k_1 + k_2.

For binary codes, some simple inequalities, for example Exercise 3.1.9, can be used to estimate the minimum distance of the last construction. In general we have the following estimate for the minimum distance.

Proposition 3.1.51 Let C_1 and C_2 be [n, k_1, d_1] and [n, k_2, d_2] codes over F_q, respectively. Suppose q is not a power of 3. Let d_0 and d_3 be the minimum distances of C_1 ∩ C_2 and C_1 + C_2, respectively. Then the minimum distance d of the (a + x|b + x|a + b − x) construction of C_1 and C_2 is at least min{d_0, 2d_1, 3d_3}.

Proof. This is left as an exercise.
The choice of the minus sign in the (a + x|b + x|a + b − x) construction becomes apparent in the construction of self-dual codes over F_q for arbitrary q not divisible by 3.

Proposition 3.1.52 Let C_1 and C_2 be self-dual [2k, k] codes. Then the codes obtained from C_1 and C_2 by the direct sum construction, the (u|u + v) construction if C_1 = C_2, the (u + v|u − v) construction, and, in case q is not divisible by 3, the (a + x|b + x|a + b − x) construction, are also self-dual.

Proof. The generator matrix G_i of C_i has size k × 2k and satisfies G_i G_i^T = 0 for i = 1, 2. In all the constructions the generator matrix G, of size 2k × 4k or 3k × 6k as given in Theorem 3.1.42 and Propositions 3.1.38, 3.1.47 and 3.1.50, also satisfies GG^T = 0. For instance, in the case of the (a + x|b + x|a + b − x) construction we have

GG^T = [ G_1  0    G_1     [ G_1^T  0      G_2^T
         0    G_1  G_1       0      G_1^T  G_2^T
         G_2  G_2  −G_2 ]    G_1^T  G_1^T  −G_2^T ].

All the entries in this product are sums of terms of the form G_i G_i^T or G_1 G_2^T − G_1 G_2^T, which are all zero. Hence GG^T = 0.
60
CHAPTER 3. CODE CONSTRUCTIONS AND BOUNDS
Example 3.1.53 Let C1 be the binary [8, 4, 4] self-dual code with generator matrix G1 of the form (I4 | A1) as given in Example 2.3.26. Let C2 be the code with generator matrix G2 = (I4 | A2), where A2 is obtained from A1 by a cyclic shift of the columns:

        ( 0 1 1 1 )          ( 1 0 1 1 )
   A1 = ( 1 0 1 1 ) ,   A2 = ( 1 1 0 1 )
        ( 1 1 0 1 )          ( 1 1 1 0 )
        ( 1 1 1 0 )          ( 0 1 1 1 )

The codes C1 and C2 are both [8, 4, 4] self-dual codes, C1 ∩ C2 = {0, 1} and C1 + C2 is the even weight code. Let C be the (a+x | b+x | a+b+x) construction applied to C1 and C2 (over F2 the minus sign plays no role). Then C is a binary self-dual [24, 12, 8] code. By Proposition 3.1.52 the claim on the minimum distance is the only remaining statement to verify. Let G be the generator matrix of C as given in Proposition 3.1.50. The weights of the rows of G are all divisible by 4. Hence the weights of all codewords are divisible by 4 by Exercise ??. Let c = (a+x | b+x | a+b+x) be a nonzero codeword with a, b ∈ C1 and x ∈ C2. If a + x = 0, then a = x ∈ C1 ∩ C2. So a = x = 0 and c = (0 | b | b), or a = x = 1 and c = (0 | b+1 | b), and in both cases the weight of c is at least 8, since the weight of b is at least 4 and the weight of the all-ones vector 1 is 8. Similarly it is argued that the weight of c is at least 8 if b + x = 0 or a + b + x = 0. So we may assume that none of a + x, b + x and a + b + x is zero. Hence all three are nonzero even weight codewords and wt(c) ≥ 6. But the weight is divisible by 4. Hence the minimum distance is at least 8. Let a be a codeword of C1 of weight 4; then c = (a | 0 | a) is a codeword of weight 8. In this way we have constructed a binary self-dual [24, 12, 8] code. It is called the extended binary Golay code. The binary Golay code is the [23, 12, 7] code obtained by puncturing one coordinate.
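This example is easy to verify by machine. The sketch below assumes A1 = J − I (the all-ones matrix minus the identity), one choice consistent with the example, takes A2 to be a cyclic shift of its columns, and checks the minimum weight by enumerating all 2^12 codewords; helper names are ours:

```python
# Assumed reconstruction of Example 3.1.53: A1 = J - I, A2 its column shift,
# binary (a+x | b+x | a+b+x) construction applied to C1 = (I4|A1), C2 = (I4|A2).
from itertools import product

def rowspan_min_weight(G):
    """Minimum weight of the nonzero F2-codewords generated by the rows of G."""
    k, n = len(G), len(G[0])
    best = n
    for coeffs in product([0, 1], repeat=k):
        if any(coeffs):
            c = [sum(coeffs[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]
            best = min(best, sum(c))
    return best

I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
A1 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]      # J - I
A2 = [[A1[i][(j - 1) % 4] for j in range(4)] for i in range(4)]      # column shift
G1 = [I4[i] + A1[i] for i in range(4)]
G2 = [I4[i] + A2[i] for i in range(4)]

zero = [0] * 8
G = ([r + zero + r for r in G1] +     # (a | 0 | a)
     [zero + r + r for r in G1] +     # (0 | b | b)
     [r + r + r for r in G2])         # (x | x | x), since -1 = 1 over F2

d = rowspan_min_weight(G)             # minimum distance of the [24,12] code
```

Enumerating 4096 codewords takes a moment but confirms d = 8 and self-duality directly.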
3.1.4 Concatenated codes
For this section we need some theory of finite fields; see Section 7.2.1. Let q be a prime power and k a positive integer. The finite field F_{q^k} with q^k elements contains Fq as a subfield. Now F_{q^k} is a k-dimensional vector space over Fq. Let ξ1, ..., ξk be a basis of F_{q^k} over Fq. Consider the map

ϕ : F_q^k → F_{q^k}

defined by ϕ(a) = a1·ξ1 + ··· + ak·ξk. Then ϕ is an isomorphism of vector spaces, with inverse map ϕ^{−1}. The set F_q^{K×k} of K × k matrices over Fq forms a vector space of dimension Kk over Fq, and it is linearly isometric with F_q^{Kk} by taking some ordering of the Kk entries of such matrices. Let M be a K × k matrix over Fq with i-th row m_i. The map

ϕ_K : F_q^{K×k} → F_{q^k}^K

is defined by ϕ_K(M) = (ϕ(m_1), ..., ϕ(m_K)). The inverse map

ϕ_N^{−1} : F_{q^k}^N → F_q^{N×k}
is given by ϕ_N^{−1}(a_1, ..., a_N) = P, where P is the N × k matrix with i-th row p_i = ϕ^{−1}(a_i).
Let A be an [N, K] code over F_{q^k}, and B an [n, k] code over Fq. Let GA and GB be generator matrices of A and B, respectively. The N-fold direct sum G_B^{(N)} = GB ⊕ ··· ⊕ GB : F_q^{N×k} → F_q^{N×n} is defined by G_B^{(N)}(P) = Q, where Q is the N × n matrix with i-th row q_i = p_i·GB, for a given N × k matrix P with i-th row p_i in F_q^k. By the following concatenation procedure a message of length Kk over Fq is encoded to a codeword of length Nn over Fq.
Step 1: The K × k matrix M is mapped to m = ϕ_K(M).
Step 2: m in F_{q^k}^K is mapped to a = m·GA in F_{q^k}^N.
Step 3: a in F_{q^k}^N is mapped to P = ϕ_N^{−1}(a).
Step 4: The N × k matrix P with i-th row p_i is mapped to the N × n matrix Q with i-th row q_i = p_i·GB.

The encoding map

E : F_q^{K×k} → F_q^{N×n}

is the composition of the four maps explained above:

E = G_B^{(N)} ∘ ϕ_N^{−1} ∘ GA ∘ ϕ_K.
Let C = { E(M) | M ∈ F_q^{K×k} }. We call C the concatenated code with outer code A and inner code B.

Theorem 3.1.54 Let A be an [N, K, D] code over F_{q^k}, and B an [n, k, d] code over Fq. Let C be the concatenated code with outer code A and inner code B. Then C is an Fq-linear [Nn, Kk] code and its minimum distance is at least Dd.

Proof. The encoding map E is Fq-linear, since it is a composition of four Fq-linear maps. The first and third maps are isomorphisms, and the second and fourth are injective, since they are given by generator matrices of full rank. Hence E is injective, and the concatenated code C is an Fq-linear code of length Nn and dimension Kk. Next, consider the minimum distance of C. Since A is an [N, K, D] code, every nonzero codeword a obtained in Step 2 has weight at least D. As a result, the N × k matrix P obtained in Step 3 has at least D nonzero rows p_i. Now, because B is an [n, k, d] code, every p_i·GB with p_i nonzero has weight at least d. Therefore, the minimum distance of C is at least Dd.

Example 3.1.55 The definition of the concatenated code depends on the choice of the map ϕ, that is, on the choice of the basis ξ1, ..., ξk. In fact the minimum distance of the concatenated code can be strictly larger than Dd, as the following example shows. The field F9 contains the ternary field F3 as a subfield and an element ξ such that ξ² = 1 + ξ, since the polynomial X² − X − 1 is irreducible in F3[X]. Now take ξ1 = 1 and ξ2 = ξ as a basis of F9 over F3. Let A be the [2, 1, 2] outer code
over F9 with generator matrix GA = [1, ξ²]. Let B be the trivial [2, 2, 1] code over F3 with generator matrix GB = I2. Let M = (m1, m2) ∈ F3^{1×2}. Then m = ϕ1(M) = m1 + m2·ξ ∈ F9. So a = m·GA = (m1 + m2·ξ, (m1 + m2) + (m1 − m2)·ξ), since ξ³ = 1 − ξ. Hence

   Q = P = ϕ2^{−1}(a) = ( m1         m2      )
                        ( m1 + m2    m1 − m2 )

Therefore the concatenated code has minimum distance 3 > Dd. Suppose we would have taken ξ′1 = 1 and ξ′2 = ξ² as a basis instead. Take M = (1, 0). Then m = ϕ1(M) = 1 ∈ F9. So a = m·GA = (1, ξ²). Hence Q = P = ϕ2^{−1}(a) = I2 is a codeword in the concatenated code that has weight 2 = Dd. Thus, the definition and the parameters of a concatenated code depend on the specific choice of the map ϕ.
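The example above can be checked numerically. The following sketch (helper names are ours) hand-codes arithmetic in F9 = F3[ξ] with ξ² = 1 + ξ, runs the encoding for both bases, and compares the resulting weights:

```python
# Numerical check of Example 3.1.55 (a sketch; names are ours, not the book's).
# An element a + b*xi of F9 is stored as the pair (a, b), with xi^2 = 1 + xi.
from itertools import product

def mul(x, y):
    a, b = x
    c, d = y
    # (a + b xi)(c + d xi) = ac + (ad + bc) xi + bd (1 + xi)
    return ((a * c + b * d) % 3, (a * d + b * c + b * d) % 3)

xi2 = mul((0, 1), (0, 1))                       # xi^2 = 1 + xi, i.e. (1, 1)

def encode(m, expand):
    """Outer code GA = [1, xi^2], inner code GB = I2: map m in F9 to
    a = (m, m*xi^2) and expand each coordinate on the chosen F3-basis."""
    return [c for ai in (m, mul(m, xi2)) for c in expand(ai)]

# Basis (1, xi): a + b*xi has coordinates (a, b).
std = lambda ab: list(ab)
d1 = min(sum(1 for c in encode((m1, m2), std) if c)
         for m1, m2 in product(range(3), repeat=2) if (m1, m2) != (0, 0))

# Basis (1, xi^2): a + b*xi = (a - b)*1 + b*xi^2, coordinates (a - b, b).
# We only expand the single message m = 1, where both message maps agree.
alt = lambda ab: [(ab[0] - ab[1]) % 3, ab[1]]
w2 = sum(1 for c in encode((1, 0), alt) if c)
```

Here d1 = 3 > Dd = 2 for the basis (1, ξ), while the basis (1, ξ²) yields a codeword of weight 2.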
3.1.5 Exercises
3.1.1 Prove Proposition 3.1.11.

3.1.2 Let C be the binary [9,4,4] product code of Example 2.1.2. Show that puncturing C at position i gives an [8,4,3] code for every choice of i = 1, ..., 9. Is it possible to obtain the binary [7,4,3] Hamming code by puncturing C? Show that shortening C at position i gives an [8,3,4] code for every choice of i. Is it possible to obtain the binary [7,3,4] simplex code by a combination of puncturing and shortening the product code?

3.1.3 Suppose that there exists an [n′, k′, d′]_q code and an [n, k, d]_q code with an [n, k − k′, d + d′]_q subcode. Use a generalization of the construction for C^e(v) to show that there exists an [n + n′, k, d + d′]_q code.

3.1.4 Let C be a binary code with minimum distance d. Let d′ be the largest weight of any codeword of C. Suppose that the all-ones vector is not in C. Show that the augmented code C^a has minimum distance min{d, n − d′}.

3.1.5 Let C be an Fq-linear code of length n. Let v ∈ F_q^n and S = {n + 1}. Suppose that the all-ones vector is a parity check of C but not of v. Show that (C^l(v))_S = C.

3.1.6 Show that the shortened binary [7,3,4] code is a product code of codes of length 2 and 3.

3.1.7 Let C be a nontrivial linear code of length n. Show that C is the direct sum of two codes of lengths strictly smaller than n if and only if C = v ∗ C for some v ∈ F_q^n with nonzero entries that are not all the same.

3.1.8 Show that the punctured binary [7,3,4] code is equal to the (u | u+v) construction of a [3, 2, 2] code and a [3, 1, 3] code.

3.1.9 Show that for binary vectors a, b and x,

wt(a+x | b+x | a+b+x) ≥ 2·wt(a + b + a ∗ b) − wt(x),

with equality if and only if a_i = 1 or b_i = 1 or x_i = 0 for all i, where a ∗ b = (a1b1, ..., anbn).
3.1.10 Give a parity check matrix for the direct sum, the (u | u+v), the (u+v | u−v) and the (a+x | b+x | a+b−x) constructions in terms of the parity check matrices H1 and H2 of the codes C1 and C2, respectively.

3.1.11 Give proofs of Propositions 3.1.50 and 3.1.51.

3.1.12 Let Ci be an [n, ki, di] code over Fq for i = 1, 2, where q is a power of 3. Let k0 be the dimension of C1 ∩ C2 and d3 the minimum distance of C1 + C2. Show that the (a+x | b+x | a+b−x) construction with C1 and C2 gives a [3n, 2k1 + k2 − k0, d] code with d ≥ min{2d1, 3d3}.

3.1.13 Show that C1 ∩ C2 = {0, 1} and that C1 + C2 is the even weight code, for the codes C1 and C2 of Example 3.1.53.

3.1.14 Show the existence of a binary [45,15,16] code.

3.1.15 Show the existence of a binary self-dual [72,36,12] code.

3.1.16 [CAS] Construct a random binary [100, 50] code and check that the identities from Proposition 3.1.17 hold for different position sets: the last position, the last five positions, and five random positions.

3.1.17 [CAS] Write procedures that take generator matrices G1 and G2 of the codes C1 and C2 and return a matrix G that is the generator matrix of the code C, which is the result of the
• (u+v | u−v) construction of Proposition 3.1.47;
• (a+x | b+x | a+b−x) construction of Proposition 3.1.50.

3.1.18 [CAS] Using the previous exercise construct the extended Golay code as in Example 3.1.53. Compare this code with the one returned by ExtendedBinaryGolayCode() (in GAP) and GolayCode(GF(2),true) (in Magma).

3.1.19 Show by means of an example that the concatenation of a [3, 2, 2] outer code and a [2, 2, 1] inner code gives a [6, 4] code of minimum distance 2 or 3, depending on the choice of the basis of the extension field.
3.2 Bounds on codes
We have introduced some parameters of a linear code in the previous sections. In coding theory one of the most basic problems is to find the best value of one parameter when the other parameters are given. In this section we discuss some bounds on the code parameters.
3.2.1 Singleton bound and MDS codes
The following bound gives the maximal minimum distance of a code with a given length and dimension. This bound is called the Singleton bound.

Theorem 3.2.1 (Singleton bound) If C is an [n, k, d] code, then d ≤ n − k + 1.
Proof. Let H be a parity check matrix of C. This is an (n − k) × n matrix of row rank n − k. The minimum distance of C is the smallest integer d such that H has d linearly dependent columns, by Proposition 2.3.11. This means that every d − 1 columns of H are linearly independent. Hence the column rank of H is at least d − 1. Since the column rank of a matrix is equal to its row rank, we have n − k ≥ d − 1. This implies the Singleton bound.

Definition 3.2.2 Let C be an [n, k, d] code. If d = n − k + 1, then C is called a maximum distance separable code, or an MDS code for short.

Remark 3.2.3 By the Singleton bound, a maximum distance separable code achieves the maximum possible value of the minimum distance, given the code length and dimension.

Example 3.2.4 The minimum distance of the zero code of length n is n + 1, by definition. Hence the zero code has parameters [n, 0, n + 1] and is MDS. Its dual is the whole space F_q^n with parameters [n, n, 1] and is also MDS. The n-fold repetition code has parameters [n, 1, n] and its dual is an [n, n − 1, 2] code; both are MDS.

Proposition 3.2.5 Let C be an [n, k, d] code over Fq. Let G be a generator matrix and H a parity check matrix of C. Then the following statements are equivalent:
(1) C is an MDS code,
(2) every (n − k)-tuple of columns of a parity check matrix H is linearly independent,
(3) every k-tuple of columns of a generator matrix G is linearly independent.

Proof. As the minimum distance of C is d, any d − 1 columns of H are linearly independent, by Proposition 2.3.11. Now d ≤ n − k + 1 by the Singleton bound. So d = n − k + 1 if and only if every n − k columns of H are independent. Hence (1) and (2) are equivalent.
Now let us assume (3). Let c be an element of C which is zero at k given coordinates. Let c = xG for some x ∈ F_q^k. Let G′ be the square matrix consisting of the k columns of G corresponding to the k given zero coordinates of c. Then xG′ = 0.
Hence x = 0, since the k columns of G′ are independent by assumption. So c = 0. This implies that the minimum distance of C is at least n − (k − 1) = n − k + 1. Therefore C is an [n, k, n − k + 1] MDS code, by the Singleton bound.
Conversely, assume that C is MDS. Let G be a generator matrix of C. Let G′ be the square matrix consisting of k chosen columns of G. Let x ∈ F_q^k be such that xG′ = 0. Then c = xG is a codeword and its weight is at most n − k. So c = 0, since the minimum distance is n − k + 1. Hence x = 0, since the rank of G is k. Therefore the k columns are independent.

Example 3.2.6 Consider the code C over F5 of length 5 and dimension 2 with generator matrix

   G = ( 1 1 1 1 1 )
       ( 0 1 2 3 4 )
Note that while the first row of the generator matrix is the all-ones vector, the entries of the second row are distinct. Every nonzero codeword of C is a linear combination x1·r1 + x2·r2 of the two rows; since the entries of the second row are distinct, such a codeword has at most one zero coordinate, so the minimum distance of C is at least 4. On the other hand, the second row is a word of weight 4. Hence C is a [5, 2, 4] MDS code. The matrix G is a parity check matrix for the dual code C⊥. All columns of G are nonzero, and every two columns are independent, since

   det ( 1 1 ) = j − i ≠ 0
       ( i j )

for all 0 ≤ i < j ≤ 4. Therefore C⊥ is also an MDS code. In fact, we have the following general result.

Corollary 3.2.7 The dual of an [n, k, n − k + 1] MDS code is an [n, n − k, k + 1] MDS code.

Proof. The trivial codes are MDS and are duals of each other by Example 3.2.4. Assume 0 < k < n. Let H be a parity check matrix of an [n, k, n − k + 1] MDS code C. Then any n − k columns of H are linearly independent, by (2) of Proposition 3.2.5. Now H is a generator matrix of the dual code. Therefore C⊥ is an [n, n − k, k + 1] MDS code, since (3) of Proposition 3.2.5 holds.

Definition 3.2.8 Let a be a vector of F_q^k. Then V(a) is the Vandermonde matrix with entries a_j^{i−1}.

Lemma 3.2.9 Let a be a vector of F_q^k. Then det V(a) =
∏_{1≤r<s≤k} (a_s − a_r).
Proof. This is left as an exercise.
Proposition 3.2.10 Let n ≤ q. Let a = (a1, ..., an) be an n-tuple of mutually distinct elements of Fq. Let k be an integer such that 0 ≤ k ≤ n. Define the matrices G_k(a) and G′_k(a) by

            ( 1          ···   1         )             ( 1          ···   1          0 )
   G_k(a) = ( a1         ···   an        ),  G′_k(a) = ( a1         ···   an         0 )
            ( ⋮          ⋱     ⋮         )             ( ⋮          ⋱     ⋮          ⋮ )
            ( a1^{k−1}   ···   an^{k−1}  )             ( a1^{k−1}   ···   an^{k−1}   1 )
The codes with generator matrices G_k(a) and G′_k(a) are MDS codes.

Proof. Consider a k × k submatrix of G_k(a). This is a Vandermonde matrix and its determinant is nonzero by Lemma 3.2.9, since the a_i are mutually distinct. So any system of k columns of G_k(a) is independent. Hence G_k(a) is the generator matrix of an MDS code by Proposition 3.2.5. The proof for G′_k(a) is similar and is left as an exercise.
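The MDS property of G_k(a) is easy to verify exhaustively for small parameters. The sketch below (our code, not the book's) takes q = 7, n = 7, k = 3, checks that every 3 columns are independent, and confirms that the minimum distance equals n − k + 1:

```python
# Sketch: check that G_k(a) generates an MDS code for q = 7, n = 7, k = 3.
from itertools import combinations, product

q, n, k = 7, 7, 3
a = list(range(n))
G = [[pow(aj, i, q) for aj in a] for i in range(k)]   # rows 1, a, a^2

def det3(m):
    # Determinant of a 3x3 matrix modulo q (cofactor expansion).
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])) % q

# Every k-tuple of columns is a Vandermonde matrix with nonzero determinant.
all_independent = all(
    det3([[G[i][j] for j in cols] for i in range(k)]) != 0
    for cols in combinations(range(n), k))

# Minimum distance by exhausting the q^k codewords.
d = min(sum(1 for j in range(n) if sum(x[i] * G[i][j] for i in range(k)) % q)
        for x in product(range(q), repeat=k) if any(x))
# d equals n - k + 1 = 5: the Singleton bound is met with equality.
```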
Remark 3.2.11 The codes defined in Proposition 3.2.10 are called generalized Reed-Solomon codes and are the prime examples of MDS codes. These codes will be treated in Section 8.1. The notion of an MDS code has a nice interpretation in terms of n points in general position in projective space, as we will see in Section 4.3.1. Proposition 3.2.10 shows the existence of MDS codes over Fq with parameters [n, k, n − k + 1] for all possible values of k and n such that 0 ≤ k ≤ n ≤ q + 1.

Example 3.2.12 Let q be a power of 2. Let n = q + 2 and let a1, a2, ..., aq be an enumeration of the elements of Fq. Consider the code C with generator matrix

   G = ( 1     1     ···   1     0  0 )
       ( a1    a2    ···   aq    0  1 )
       ( a1²   a2²   ···   aq²   1  0 )

Then any 3 columns of this matrix are independent, since by Proposition 3.2.10 the only remaining nontrivial case to check is

   det ( 1    1    0 )
       ( ai   aj   1 ) = −(aj² − ai²) = (ai − aj)² ≠ 0 in characteristic 2,
       ( ai²  aj²  0 )

for all 1 ≤ i < j ≤ q. Hence C is a [q + 2, 3, q] code.

Remark 3.2.13 From (3) of Proposition 3.2.5 and Proposition 2.2.22 we see that any k symbols of the codewords of an MDS code of dimension k may be taken as message symbols. This is another reason for the name maximum distance separable codes.

Corollary 3.2.14 Let C be an [n, k, d] code. Then C is MDS if and only if for any given d coordinate positions i1, i2, ..., id there is a minimum weight codeword with the set of these positions as support. Furthermore, two codewords of an MDS code of minimum weight with the same support are nonzero multiples of each other.

Proof. Let G be a generator matrix of C. Suppose d < n − k + 1. There exist k positions j1, j2, ..., jk such that the columns of G at these positions are independent. The complement of these k positions consists of n − k elements and d ≤ n − k. Choose a subset {i1, i2, ..., id} of d elements in this complement. Let c be a codeword with support contained in {i1, i2, ..., id}. Then c is zero at the positions j1, j2, ..., jk. Hence c = 0 and the support of c is empty.
If C is MDS, then d = n − k + 1. Let {i1, i2, ..., id} be a set of d coordinate positions. The complement of this set consists of k − 1 elements j1, j2, ..., j_{k−1}. Let jk = i1. Then j1, j2, ..., jk are k elements that can be used for systematic encoding by Remark 3.2.13. So there is a unique codeword c such that c_j = 0 for all j = j1, j2, ..., j_{k−1} and c_{jk} = 1. Hence c is a nonzero codeword of weight at most d and support contained in {i1, i2, ..., id}. Therefore c is a codeword of weight d and support equal to {i1, i2, ..., id}, since d is the minimum weight of the code. Furthermore, let c′ be another codeword of weight d and support equal to
{i1, i2, ..., id}. Then c′_j = 0 for all j = j1, ..., j_{k−1} and c′_{jk} ≠ 0. Then c′ and c′_{jk}·c are two codewords that coincide at j1, j2, ..., jk. Hence c′ = c′_{jk}·c.

Remark 3.2.15 It follows from Corollary 3.2.14 that the number of nonzero codewords of minimum weight n − k + 1 of an [n, k] MDS code is equal to

(q − 1)·(n choose n−k+1).

In Section 4.1 we will introduce the weight distribution of a linear code. Using the above result the weight distribution of an MDS code can be completely determined. This will be done in Proposition 4.4.22.

Remark 3.2.16 Let C be an [n, k, n − k + 1] code. Then it is systematic at the first k positions. Hence C has a generator matrix of the form (I_k | A). It is left as an exercise to show that every square submatrix of A is nonsingular. The converse is also true.

Definition 3.2.17 Let n ≤ q. Let a, b, r and s be vectors of F_q^k such that a_i ≠ b_j for all i, j. Then C(a, b) is the k × k Cauchy matrix with entries 1/(a_i − b_j), and C(a, b; r, s) is the k × k generalized Cauchy matrix with entries r_i·s_j/(a_i − b_j). Let k be an integer such that 0 ≤ k ≤ n. Let A(a) be the k × (n − k) matrix with entries 1/(a_{j+k} − a_i) for 1 ≤ i ≤ k and 1 ≤ j ≤ n − k. Then the Cauchy code C_k(a) is the code with generator matrix (I_k | A(a)). If r_i is nonzero for all i, then A(a, r) is the k × (n − k) matrix with entries

r_{j+k}·r_i^{−1}/(a_{j+k} − a_i) for 1 ≤ i ≤ k and 1 ≤ j ≤ n − k.

The generalized Cauchy code C_k(a, r) is the code with generator matrix (I_k | A(a, r)).

Lemma 3.2.18 Let a, b, r and s be vectors of F_q^k such that a_i ≠ b_j for all i, j. Then

det C(a, b; r, s) = (∏_{i=1}^{k} r_i·s_i) · (∏_{1≤i<j≤k} (a_j − a_i)(b_i − b_j)) / (∏_{i=1}^{k} ∏_{j=1}^{k} (a_i − b_j)).
Proposition 3.2.19 Let n ≤ q. Let a be an n-tuple of mutually distinct elements of Fq, and r an n-tuple of nonzero elements of Fq. Let k be an integer such that 0 ≤ k ≤ n. Then the generalized Cauchy code C_k(a, r) is an [n, k, n − k + 1] code.

Proof. Every square t × t submatrix of A(a, r) is a generalized Cauchy matrix of the form

C((a_{i1}, ..., a_{it}), (a_{k+j1}, ..., a_{k+jt}); (r_{i1}^{−1}, ..., r_{it}^{−1}), (r_{k+j1}, ..., r_{k+jt})).

The determinant of this matrix is nonzero by Lemma 3.2.18, since the entries of a are mutually distinct and the entries of r are nonzero. Hence (I_k | A(a, r)) is the generator matrix of an MDS code by Remark 3.2.16.

In Section 8.1 it will be shown that generalized Reed-Solomon codes and Cauchy codes are the same.
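The Cauchy determinant formula of Lemma 3.2.18 can be sanity-checked with exact rational arithmetic. The sketch below (the naive Leibniz-expansion `det` helper is ours) compares the formula for the plain Cauchy matrix C(a, b) with a direct determinant computation for one 3 × 3 instance over the rationals:

```python
# Sanity check (a sketch) of the Cauchy determinant formula over Q:
# det [1/(a_i - b_j)] = prod_{i<j}(a_j - a_i)(b_i - b_j) / prod_{i,j}(a_i - b_j)
from fractions import Fraction
from itertools import permutations

a = [1, 2, 3]
b = [5, 7, 11]
C = [[Fraction(1, ai - bj) for bj in b] for ai in a]

def det(m):
    """Leibniz formula; fine for a small exact check."""
    n = len(m)
    total = Fraction(0)
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        prod = Fraction(1)
        for i in range(n):
            prod *= m[i][perm[i]]
        total += (-1) ** inversions * prod
    return total

num = Fraction(1)
for i in range(3):
    for j in range(i + 1, 3):
        num *= (a[j] - a[i]) * (b[i] - b[j])
den = Fraction(1)
for ai in a:
    for bj in b:
        den *= (ai - bj)
lhs, rhs = det(C), num / den
```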
3.2.2 Griesmer bound
Clearly, the Singleton bound can be viewed as a lower bound on the code length n for given dimension k and minimum distance d, namely n ≥ d + k − 1. In this subsection we give another lower bound on the length.

Theorem 3.2.20 (Griesmer bound) If C is an [n, k, d] code with k > 0, then

n ≥ ∑_{i=0}^{k−1} ⌈d/q^i⌉.

Note that the Griesmer bound implies the Singleton bound: ⌈d/q^0⌉ = d and ⌈d/q^i⌉ ≥ 1 for i = 1, ..., k − 1, so the right-hand side is at least d + k − 1.

In the previous Section 3.1 we introduced some methods to construct new codes from a given code. In the following we give another construction, which will be used to prove Theorem 3.2.20. Let C be an [n, k, d] code, and c a codeword with w = wt(c). Let I = supp(c) (see the definition in Subsection 2.1.2). The residual code of C with respect to c, denoted by Res(C, c), is the code of length n − w obtained by puncturing C on all the coordinates of I.

Proposition 3.2.21 Suppose C is an [n, k, d] code over Fq and c is a codeword of weight w < qd/(q − 1). Then Res(C, c) is an [n − w, k − 1, d′] code with

d′ ≥ d − w + ⌈w/q⌉.

Proof. By replacing C by an equivalent code we may assume without loss of generality that c = (1, 1, ..., 1, 0, ..., 0), where the first w components are equal to 1 and the other components are 0. Clearly, the dimension of Res(C, c) is at most k − 1. If the dimension were strictly less than k − 1, then there would be a nonzero codeword in C of the form x = (x1, ..., xw, 0, ..., 0), where not all the xi are the same. There exists α ∈ Fq such that at least w/q coordinates of (x1, ..., xw) equal α. Thus,

d ≤ wt(x − αc) ≤ w − w/q = w(q − 1)/q,

which contradicts the assumption on w. Hence dim Res(C, c) = k − 1. Next, consider the minimum distance. Let (x_{w+1}, ..., x_n) be any nonzero codeword in Res(C, c), and x = (x1, ..., xw, x_{w+1}, ..., x_n) a corresponding codeword in C.
There exists α ∈ Fq such that at least w/q coordinates of (x1, ..., xw) equal α. Therefore,

d ≤ wt(x − αc) ≤ w − w/q + wt((x_{w+1}, ..., x_n)).

Thus every nonzero codeword of Res(C, c) has weight at least d − w + ⌈w/q⌉.

Proof of Theorem 3.2.20. We prove the theorem by induction on k. If k = 1, the inequality to be proved is n ≥ d, which
is obviously true. Now suppose k > 1. Let c be a codeword of weight d. By Proposition 3.2.21, Res(C, c) is an [n − d, k − 1, d′] code with d′ ≥ ⌈d/q⌉. Applying the induction hypothesis to Res(C, c), we have

n − d ≥ ∑_{i=0}^{k−2} ⌈d′/q^i⌉ ≥ ∑_{i=0}^{k−2} ⌈d/q^{i+1}⌉.
The Griesmer bound follows.
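The bound is a direct transcription into code. The sketch below (function name is ours) computes the Griesmer lower bound on the length and checks a few familiar parameter sets:

```python
# The Griesmer bound of Theorem 3.2.20 as a function: a lower bound on the
# length n of any [n, k, d] code over Fq.

def griesmer(q, k, d):
    # n >= sum_{i=0}^{k-1} ceil(d / q^i), using exact integer ceiling division
    return sum((d + q**i - 1) // q**i for i in range(k))

# It refines the Singleton bound n >= d + k - 1:
assert griesmer(2, 4, 3) == 7        # met by the [7,4,3] Hamming code
assert griesmer(2, 12, 7) == 22      # the binary Golay code has n = 23 >= 22
assert all(griesmer(q, k, d) >= d + k - 1
           for q in (2, 3, 4) for k in range(1, 6) for d in range(1, 10))
```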
3.2.3 Hamming bound
In practical applications, given the length and the minimum distance, codes with more codewords (in other words, codes of larger size) are often preferred. A natural question is: what is the maximal possible size of a code, given the length and minimum distance? Denote by A_q(n, d) the maximum number of codewords in any code over Fq (which can be linear or nonlinear) of length n and minimum distance d. The maximum when restricted to linear codes is denoted by B_q(n, d). Clearly B_q(n, d) ≤ A_q(n, d). The following is a well-known upper bound for A_q(n, d).

Remark 3.2.22 Denote by V_q(n, t) the number of vectors in B_t(x), the ball of radius t around a given vector x ∈ F_q^n as defined in 2.1.12. Then

V_q(n, t) = ∑_{i=0}^{t} (n choose i)·(q − 1)^i

by Proposition 2.1.13.

Theorem 3.2.23 (Hamming or sphere-packing bound)

B_q(n, d) ≤ A_q(n, d) ≤ q^n / V_q(n, t),

where t = ⌊(d − 1)/2⌋.

Proof. Let C be any code over Fq (which can be linear or nonlinear) of length n and minimum distance d. Denote by M the number of codewords of C. Since the distance between any two distinct codewords is at least d ≥ 2t + 1, the balls of radius t around the codewords must be disjoint. By Proposition 2.1.13, each of these M balls contains V_q(n, t) vectors. The total number of vectors in the space F_q^n is q^n. Thus, we have M·V_q(n, t) ≤ q^n. As C is an arbitrary code of length n and minimum distance d, the theorem is established.

Definition 3.2.24 The covering radius ρ(C) of a code C of length n over Fq is defined to be the smallest integer ρ such that

∪_{c∈C} B_ρ(c) = F_q^n,
that is, every vector of F_q^n is in the union of the balls of radius ρ around the codewords. A code of covering radius ρ is called perfect if the balls B_ρ(c), c ∈ C, are mutually disjoint.

Theorem 3.2.25 (Sphere-covering bound) Let C be a code of length n with M codewords and covering radius ρ. Then M·V_q(n, ρ) ≥ q^n.

Proof. By definition
∪_{c∈C} B_ρ(c) = F_q^n.
Now |B_ρ(c)| = V_q(n, ρ) for all c in C by Proposition 2.1.13. So M·V_q(n, ρ) ≥ q^n.

Example 3.2.26 If C = F_q^n, then the balls B_0(c) = {c}, c ∈ C, cover F_q^n and are mutually disjoint. So F_q^n is perfect and has covering radius 0. If C = {0}, then the ball B_n(0) covers F_q^n and there is only one codeword. Hence C is perfect and has covering radius n. Therefore the trivial codes are perfect.

Remark 3.2.27 It is easy to see that

ρ(C) = max_{x∈F_q^n} min_{c∈C} d(x, c).
Let e(C) = ⌊(d(C) − 1)/2⌋. Then obviously e(C) ≤ ρ(C). Let C be a code of length n and minimum distance d with more than one codeword. Then C is a perfect code if and only if ρ(C) = e(C).

Proposition 3.2.28 The following codes are perfect:
(1) the trivial codes,
(2) the (2e + 1)-fold binary repetition code,
(3) the Hamming codes,
(4) the binary and ternary Golay codes.

Proof. (1) The trivial codes are perfect, as shown in Example 3.2.26.
(2) The (2e + 1)-fold binary repetition code consists of two codewords, has minimum distance d = 2e + 1 and error-correcting capacity e. Now

2^{2e+1} = ∑_{i=0}^{2e+1} (2e+1 choose i) = ∑_{i=0}^{e} (2e+1 choose i) + ∑_{i=0}^{e} (2e+1 choose e+1+i).

Since (2e+1 choose e+1+i) = (2e+1 choose e−i), we get 2·∑_{i=0}^{e} (2e+1 choose i) = 2^{2e+1}, so M·V_2(2e + 1, e) = 2^{2e+1}. Therefore the covering radius is e and the code is perfect.
(3) From Definition 2.3.13 and Proposition 2.3.14, the q-ary Hamming code H_r(q) is an [n, k, d] code with

n = (q^r − 1)/(q − 1), k = n − r, and d = 3.
For this code t = 1, n = k + r, and the number of codewords is M = q^k. Thus,

M·V_q(n, 1) = M·(1 + (q − 1)n) = M·q^r = q^{k+r} = q^n.

Therefore H_r(q) is a perfect code.
(4) It is left to the reader to show that the binary and ternary Golay codes are perfect.
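The sphere-packing equalities behind this proposition are quick to verify numerically. The sketch below (our code; only parameters are checked, no codes are constructed) confirms them for the Hamming, Golay and repetition families:

```python
# Numeric check of the sphere-packing equalities of Proposition 3.2.28.
from math import comb

def V(q, n, t):
    # Volume of a Hamming ball of radius t in Fq^n (Remark 3.2.22).
    return sum(comb(n, i) * (q - 1)**i for i in range(t + 1))

# Hamming codes H_r(q): M = q^(n-r), t = 1, n = (q^r - 1)/(q - 1).
for q, r in [(2, 3), (2, 4), (3, 3)]:
    n = (q**r - 1) // (q - 1)
    assert q**(n - r) * V(q, n, 1) == q**n

# Binary Golay [23, 12, 7] and ternary Golay [11, 6, 5]:
assert 2**12 * V(2, 23, 3) == 2**23
assert 3**6 * V(3, 11, 2) == 3**11

# (2e+1)-fold binary repetition code: M = 2, t = e.
assert all(2 * V(2, 2 * e + 1, e) == 2**(2 * e + 1) for e in range(1, 8))
```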
3.2.4 Plotkin bound
The Plotkin bound is an upper bound on A_q(n, d) which is valid when d is large enough compared with n.

Theorem 3.2.29 (Plotkin bound) Let C be an (n, M, d) code over Fq such that qd > (q − 1)n. Then

M ≤ qd/(qd − (q − 1)n).

Proof. We calculate the sum

S = ∑_{x∈C} ∑_{y∈C} d(x, y)
in two ways. First, since d(x, y) ≥ d for any x, y ∈ C with x ≠ y, we have

S ≥ M(M − 1)d.

On the other hand, let M be the M × n matrix whose rows are the codewords of C. For i = 1, ..., n, let n_{i,α} be the number of times that α ∈ Fq occurs in column i of M. Clearly, ∑_{α∈Fq} n_{i,α} = M for every i. Now we have

S = ∑_{i=1}^{n} ∑_{α∈Fq} n_{i,α}·(M − n_{i,α}) = nM² − ∑_{i=1}^{n} ∑_{α∈Fq} n_{i,α}².
Using the Cauchy-Schwarz inequality,

∑_{α∈Fq} n_{i,α}² ≥ (1/q)·(∑_{α∈Fq} n_{i,α})².
Thus,

S ≤ nM² − ∑_{i=1}^{n} (1/q)·(∑_{α∈Fq} n_{i,α})² = n·(1 − 1/q)·M².
Combining the above two inequalities on S, we prove the theorem.
Example 3.2.30 Consider the simplex code S3(3), that is, the dual code of the Hamming code H3(3) over F3 of Example 2.3.15. This is a [13, 3, 9] code with M = 3³ = 27 codewords. Every nonzero codeword in this code has Hamming weight 9, and d(x, y) = 9 for any two distinct codewords x and y. Thus, qd = 27 > 26 = (q − 1)n. Since

qd/(qd − (q − 1)n) = 27/1 = 27 = M,

this code achieves the Plotkin bound.

Remark 3.2.31 A code in which all the nonzero codewords have the same weight is called a constant weight code; a code in which the distances between any two distinct codewords are the same is called an equidistant code. A linear code is a constant weight code if and only if it is an equidistant code. From the proof of Theorem 3.2.29, only constant weight and equidistant codes can achieve the Plotkin bound. So the simplex code S_r(q) achieves the Plotkin bound by Proposition 2.3.16.

Remark 3.2.32 ***Improved Plotkin bound in the binary case.***
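The claims of the example can be checked by brute force. The sketch below (our code) builds a generator matrix of S_3(3) from one representative per projective point of PG(2, 3), enumerates all 27 codewords, and compares with the Plotkin bound:

```python
# Check (a sketch) that the simplex code S_3(3) meets the Plotkin bound.
from itertools import product

q, r = 3, 3
# One representative per projective point: first nonzero coordinate = 1.
cols = [v for v in product(range(q), repeat=r)
        if any(v) and v[next(i for i in range(r) if v[i])] == 1]
n = len(cols)                                  # 13 columns

words = [[sum(x[i] * c[i] for i in range(r)) % q for c in cols]
         for x in product(range(q), repeat=r)]
weights = {sum(1 for e in w if e) for w in words if any(w)}
M, d = len(words), min(weights)
# Constant weight 9, and M equals the Plotkin bound q*d/(q*d - (q-1)*n) = 27.
```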
3.2.5 Gilbert and Varshamov bounds
The Hamming and Plotkin bounds give upper bounds for A_q(n, d) and B_q(n, d). In this subsection we discuss lower bounds for these numbers. Since B_q(n, d) ≤ A_q(n, d), each lower bound for B_q(n, d) is also a lower bound for A_q(n, d).

Theorem 3.2.33 (Gilbert bound)

log_q(A_q(n, d)) ≥ n − log_q(V_q(n, d − 1)).

Proof. Let C be a code over Fq, not necessarily linear, of length n and minimum distance d, which has M = A_q(n, d) codewords. If M·V_q(n, d − 1) < q^n, then the union of the balls of radius d − 1 around all codewords of C is not equal to F_q^n by Proposition 2.1.13. Take x ∈ F_q^n outside this union. Then d(x, c) ≥ d for all c ∈ C. So C ∪ {x} is a code of length n with M + 1 codewords and minimum distance d. This contradicts the maximality of A_q(n, d). Hence A_q(n, d)·V_q(n, d − 1) ≥ q^n.

In the following, by a greedy algorithm one can construct a linear code of length n, minimum distance at least d, and dimension k as large as possible, and therefore with as many codewords as possible.

Theorem 3.2.34 Let n and d be integers satisfying 2 ≤ d ≤ n. If

k ≤ n − log_q(1 + V_q(n − 1, d − 2)),    (3.1)

then there exists an [n, k] code over Fq with minimum distance at least d.
Proof. Suppose k is an integer satisfying inequality (3.1), which is equivalent to

V_q(n − 1, d − 2) < q^{n−k}.    (3.2)

We construct by induction columns h_1, ..., h_n ∈ F_q^{n−k} of an (n − k) × n matrix H over Fq such that every d − 1 columns of H are linearly independent. Choose for h_1 any nonzero vector. Suppose that j < n and h_1, ..., h_j are chosen such that any d − 1 of them are linearly independent. Choose h_{j+1} such that h_{j+1} is not a linear combination of any d − 2 or fewer of the vectors h_1, ..., h_j. The above procedure is a greedy algorithm. We now prove the correctness of the algorithm by induction on j. When j = 1, it is trivial that there exists a nonzero vector h_1. Suppose that j < n and any d − 1 of h_1, ..., h_j are linearly independent. The number of different linear combinations of d − 2 or fewer of the h_1, ..., h_j is

∑_{i=0}^{d−2} (j choose i)·(q − 1)^i ≤ ∑_{i=0}^{d−2} (n−1 choose i)·(q − 1)^i = V_q(n − 1, d − 2).
Hence under condition (3.2) there always exists a vector h_{j+1} which is not a linear combination of d − 2 or fewer of h_1, ..., h_j. By induction, we find h_1, ..., h_n such that h_j is not a linear combination of any d − 2 or fewer of the vectors h_1, ..., h_{j−1}. Hence every d − 1 of h_1, ..., h_n are linearly independent. The null space of H is a code C of dimension at least k and minimum distance at least d by Proposition 2.3.11. Let C′ be a subcode of C of dimension k. Then the minimum distance of C′ is at least d.

Corollary 3.2.35 (Varshamov bound)

log_q B_q(n, d) ≥ n − ⌈log_q(1 + V_q(n − 1, d − 2))⌉.

Proof. The largest integer k satisfying (3.1) of Theorem 3.2.34 is given by the right-hand side of the inequality.

In the next subsection we will see that the Gilbert bound and the Varshamov bound are asymptotically the same. In the literature either of them is sometimes called the Gilbert-Varshamov bound. The resulting asymptotic bound is called the asymptotic Gilbert-Varshamov bound.
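The greedy procedure from the proof of Theorem 3.2.34 is short to implement for binary codes. The sketch below (our code; columns are represented as bitmask integers, so the F2-sum of columns is XOR) picks parity-check columns one by one, forbidding every sum of d − 2 or fewer previously chosen columns:

```python
# Greedy construction from the proof of Theorem 3.2.34, sketched for F2.
from itertools import combinations

def greedy_parity_check(n, k, d):
    """Return n columns of length n-k over F2 (as bitmask integers) such that
    every d-1 of them are linearly independent, or None if the search fails.
    Varshamov's condition V_2(n-1, d-2) < 2^(n-k) guarantees success."""
    r = n - k
    cols = []
    for _ in range(n):
        forbidden = {0}
        for t in range(1, d - 1):
            for combo in combinations(cols, t):
                s = 0
                for c in combo:
                    s ^= c                      # F2 sum of columns = XOR
                forbidden.add(s)
        h = next((c for c in range(1, 2**r) if c not in forbidden), None)
        if h is None:
            return None
        cols.append(h)
    return cols

# A binary [10, 5] code with minimum distance >= 3 (columns distinct, nonzero):
H = greedy_parity_check(10, 5, 3)
```

For d = 3 the forbidden set is just the previously chosen columns and 0, so the greedy choice reduces to picking distinct nonzero columns, exactly as for Hamming-like codes.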
3.2.6 Exercises
3.2.1 Show that for an arbitrary code, possibly nonlinear, of length n over an alphabet with q elements, with M codewords and minimum distance d, the following form of the Singleton bound holds: M ≤ q^{n+1−d}.

3.2.2 Let C be an [n, k] code. Let d⊥ be the minimum distance of C⊥. Show that d⊥ ≤ k + 1, and that equality holds if and only if C is MDS.
3.2.3 Give a proof of the formula in Lemma 3.2.9 for the determinant of a Vandermonde matrix.

3.2.4 Prove that the code with generator matrix G′_k(a) in Proposition 3.2.10 is MDS.

3.2.5 Let C be an [n, k, d] code over Fq. Prove that the number of codewords of minimum weight d is divisible by q − 1 and is at most equal to (q − 1)·(n choose d). Show that C is MDS in case equality holds.

3.2.6 Give a proof of Remark 3.2.16.

3.2.7 Give a proof of the formula in Lemma 3.2.18 for the determinant of a Cauchy matrix.

3.2.8 Let C be a binary MDS code. Show that if C is not trivial, then it is a repetition code or an even weight code.

3.2.9 [20] ***Show that the code C1 in Proposition 3.2.10 is self-orthogonal if n = q and k ≤ n/2. Self-dual ***

3.2.10 [CAS] Take q = 256 in Proposition 3.2.10 and construct the matrices G_10(a) and G′_10(a). Construct the corresponding codes with these matrices as generator matrices. Show that these codes are MDS by using the commands IsMDSCode in GAP and IsMDS in Magma.

3.2.11 Give a proof of the statements made in Remark 3.2.27.

3.2.12 Show that the binary and ternary Golay codes are perfect.

3.2.13 Let C be the binary [7, 4, 3] Hamming code. Let D be the F4-linear code with the same generator matrix as C. Show that ρ(C) = 2 and ρ(D) = 3.

3.2.14 Let C be an [n, k] code. Let H be a parity check matrix of C. Show that ρ(C) is the minimal number ρ such that x^T is a linear combination of at most ρ columns of H for every x ∈ F_q^{n−k}. Show the redundancy bound: ρ(C) ≤ n − k.

3.2.15 Give an estimate of the complexity of finding a code satisfying (3.1) of Theorem 3.2.34 by the greedy algorithm.
3.3
Asymptotically good codes
***
3.3.1
Asymptotic Gilbert-Varshamov bound
In practical applications, long codes are sometimes preferred. For an infinite family of codes, a measure of the goodness of the family is whether it contains so-called asymptotically good codes.
Definition 3.3.1 An infinite sequence $\mathcal{C} = \{C_i\}_{i=1}^\infty$ of codes $C_i$ with parameters $[n_i, k_i, d_i]$ is called asymptotically good if $\lim_{i\to\infty} n_i = \infty$,
$$R(\mathcal{C}) = \liminf_{i\to\infty} \frac{k_i}{n_i} > 0 \quad\text{and}\quad \delta(\mathcal{C}) = \liminf_{i\to\infty} \frac{d_i}{n_i} > 0.$$
Using the bounds that we introduced in the previous subsection, we will prove the existence of asymptotically good codes.

Definition 3.3.2 Define the $q$-ary entropy function $H_q$ on $[0, (q-1)/q]$ by
$$H_q(x) = \begin{cases} x\log_q(q-1) - x\log_q x - (1-x)\log_q(1-x), & \text{if } 0 < x \leq \frac{q-1}{q},\\ 0, & \text{if } x = 0.\end{cases}$$
The function $H_q(x)$ is increasing on $[0, (q-1)/q]$. The function $H_2(x)$ is the entropy function.

Lemma 3.3.3 Let $q \geq 2$ and $0 \leq \theta \leq (q-1)/q$. Then
$$\lim_{n\to\infty} \frac{1}{n}\log_q V_q(n, \lfloor\theta n\rfloor) = H_q(\theta).$$

Proof. Since $\theta n - 1 < \lfloor\theta n\rfloor \leq \theta n$, we have
$$\lim_{n\to\infty} \frac{1}{n}\lfloor\theta n\rfloor = \theta \quad\text{and}\quad \lim_{n\to\infty} \frac{1}{n}\log_q(1 + \lfloor\theta n\rfloor) = 0. \tag{3.3}$$
Now we are going to prove the following equality:
$$\lim_{n\to\infty} \frac{1}{n}\log_q \binom{n}{\lfloor\theta n\rfloor} = -\theta\log_q\theta - (1-\theta)\log_q(1-\theta). \tag{3.4}$$
To this end we introduce the little-o notation and use the Stirling Formula
$$\log n! = \left(n + \tfrac{1}{2}\right)\log n - n + \tfrac{1}{2}\log(2\pi) + o(1), \quad (n\to\infty).$$
For two functions $f(n)$ and $g(n)$, $f(n) = o(g(n))$ means that for all $c > 0$ there exists some $k > 0$ such that $0 \leq f(n) < cg(n)$ for all $n \geq k$. The value of $k$ must not depend on $n$, but may depend on $c$. Thus $o(1)$ is a function of $n$ which tends to $0$ as $n \to \infty$. By the Stirling Formula, we have
$$\frac{1}{n}\log_q\binom{n}{\lfloor\theta n\rfloor} = \frac{1}{n}\left(\log_q n! - \log_q\lfloor\theta n\rfloor ! - \log_q(n - \lfloor\theta n\rfloor)!\right)$$
$$= \log_q n - \theta\log_q\lfloor\theta n\rfloor - (1-\theta)\log_q(n - \lfloor\theta n\rfloor) + o(1)$$
$$= -\theta\log_q\theta - (1-\theta)\log_q(1-\theta) + o(1).$$
Thus (3.4) follows. From the definition we have
$$\binom{n}{\lfloor\theta n\rfloor}(q-1)^{\lfloor\theta n\rfloor} \leq V_q(n, \lfloor\theta n\rfloor) \leq (1 + \lfloor\theta n\rfloor)\binom{n}{\lfloor\theta n\rfloor}(q-1)^{\lfloor\theta n\rfloor}. \tag{3.5}$$
From the right-hand part of (3.5) we have
$$\log_q V_q(n, \lfloor\theta n\rfloor) \leq \log_q(1 + \lfloor\theta n\rfloor) + \log_q\binom{n}{\lfloor\theta n\rfloor} + \lfloor\theta n\rfloor\log_q(q-1).$$
By (3.3) and (3.4), we have
$$\lim_{n\to\infty}\frac{1}{n}\log_q V_q(n, \lfloor\theta n\rfloor) \leq \theta\log_q(q-1) - \theta\log_q\theta - (1-\theta)\log_q(1-\theta). \tag{3.6}$$
The right-hand side is equal to $H_q(\theta)$ by definition. Similarly, using the left-hand part of (3.5) we prove
$$\lim_{n\to\infty}\frac{1}{n}\log_q V_q(n, \lfloor\theta n\rfloor) \geq H_q(\theta). \tag{3.7}$$
Combining (3.6) and (3.7), we obtain the result. □
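Lemma 3.3.3 can be illustrated numerically: as $n$ grows, the normalized logarithm of the ball size $V_q(n, \lfloor\theta n\rfloor)$ approaches $H_q(\theta)$. A sketch in Python (the helper names `entropy` and `ball` are ours):

```python
from math import comb, log

def entropy(q, x):
    # q-ary entropy function H_q(x) of Definition 3.3.2, for 0 <= x <= (q-1)/q
    if x == 0:
        return 0.0
    return x * log(q - 1, q) - x * log(x, q) - (1 - x) * log(1 - x, q)

def ball(q, n, r):
    # V_q(n, r): number of words within Hamming distance r of a fixed word in F_q^n
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

q, theta = 2, 0.25
for n in (100, 1000, 4000):
    print(n, log(ball(q, n, int(theta * n)), q) / n)
print("H_q(theta) =", entropy(q, theta))
```

The printed values creep up towards $H_2(0.25) \approx 0.811$; the gap of order $(\log n)/n$ matches the Stirling correction terms absorbed in the $o(1)$ above.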
Now we are ready to prove the existence of asymptotically good codes. Specifically, we have the following stronger result.

Theorem 3.3.4 Let $0 < \theta < (q-1)/q$. Then there exists an asymptotically good sequence $\mathcal{C}$ of codes such that $\delta(\mathcal{C}) = \theta$ and $R(\mathcal{C}) = 1 - H_q(\theta)$.

Proof. Let $0 < \theta < (q-1)/q$. Let $\{n_i\}_{i=1}^\infty$ be a sequence of positive integers with $\lim_{i\to\infty} n_i = \infty$; for example, we can take $n_i = i$. Let $d_i = \lfloor\theta n_i\rfloor$ and
$$k_i = n_i - \left\lceil \log_q\left(1 + V_q(n_i - 1, d_i - 2)\right)\right\rceil.$$
By Theorem 3.2.34 and the Varshamov bound, there exists a sequence $\mathcal{C} = \{C_i\}_{i=1}^\infty$ of $[n_i, k_i, d_i]$ codes $C_i$ over $\mathbb{F}_q$. Clearly $\delta(\mathcal{C}) = \theta > 0$ for this sequence of $q$-ary codes. We now prove $R(\mathcal{C}) = 1 - H_q(\theta)$. To this end, we first use Lemma 3.3.3 to prove the following equation:
$$\lim_{i\to\infty} \frac{1}{n_i}\log_q\left(1 + V_q(n_i - 1, d_i - 2)\right) = H_q(\theta). \tag{3.8}$$
First, we have $1 + V_q(n_i - 1, d_i - 2) \leq V_q(n_i, d_i)$. By Lemma 3.3.3, we have
$$\limsup_{i\to\infty} \frac{1}{n_i}\log_q\left(1 + V_q(n_i - 1, d_i - 2)\right) \leq \lim_{i\to\infty}\frac{1}{n_i}\log_q V_q(n_i, d_i) = H_q(\theta). \tag{3.9}$$
Let $\delta = \max\{1, \lceil 3/\theta\rceil\}$, $m_i = n_i - \delta$ and $e_i = \lfloor\theta m_i\rfloor$. Then
$$d_i - 2 = \lfloor\theta n_i\rfloor - 2 > \theta n_i - 3 \geq \theta(n_i - \delta) = \theta m_i \geq e_i$$
and $n_i - 1 \geq n_i - \delta = m_i$. Therefore
$$\frac{1}{n_i}\log_q\left(1 + V_q(n_i - 1, d_i - 2)\right) \geq \frac{1}{m_i + \delta}\log_q V_q(m_i, e_i) = \frac{1}{m_i}\log_q V_q(m_i, e_i)\cdot\frac{m_i}{m_i + \delta}.$$
Since $\delta$ is a constant and $m_i \to \infty$, we have $\lim_{i\to\infty} m_i/(m_i + \delta) = 1$. Again by Lemma 3.3.3, we have that the right-hand side of the above inequality tends to $H_q(\theta)$. It follows that
$$\liminf_{i\to\infty} \frac{1}{n_i}\log_q\left(1 + V_q(n_i - 1, d_i - 2)\right) \geq H_q(\theta). \tag{3.10}$$
By inequalities (3.9) and (3.10), we obtain (3.8). Now by (3.8), we have
$$R(\mathcal{C}) = \lim_{i\to\infty}\frac{k_i}{n_i} = 1 - \lim_{i\to\infty}\frac{1}{n_i}\log_q\left(1 + V_q(n_i - 1, d_i - 2)\right) = 1 - H_q(\theta),$$
and $1 - H_q(\theta) > 0$, since $\theta < (q-1)/q$. □
So the sequence $\mathcal{C}$ of codes satisfying Theorem 3.3.4 is asymptotically good. However, asymptotically good codes need not satisfy the conditions in Theorem 3.3.4. The number of codewords increases exponentially with the code length. So for large $n$, instead of $A_q(n, d)$ the following parameter is used:
$$\alpha(\theta) = \limsup_{n\to\infty} \frac{\log_q A_q(n, \theta n)}{n}.$$
Since $A_q(n, \theta n) \geq B_q(n, \theta n)$ and for a linear code $C$ the dimension $k = \log_q |C|$, a straightforward consequence of Theorem 3.3.4 is the following asymptotic bound.

Corollary 3.3.5 (Asymptotic Gilbert-Varshamov bound) Let $0 \leq \theta \leq (q-1)/q$. Then $\alpha(\theta) \geq 1 - H_q(\theta)$.

Note that both the Gilbert and the Varshamov bound that we introduced in the previous subsection imply the asymptotic Gilbert-Varshamov bound. ***Manin αq (δ) is a decreasing continuous function. picture ***
3.3.2
Some results for the generic case
In this section we investigate the parameters of "generic" codes. It turns out that, with the length $n$ and dimension $k = nR$, $0 < R < 1$, fixed, almost all codes have the same minimum distance and covering radius. By "almost all" we mean that as $n$ tends to infinity, the fraction of $[n, nR]$ codes that do not have the "generic" minimum distance and covering radius tends to 0.

Theorem 3.3.6 Let $0 < R < 1$. Then almost all $[n, nR]$ codes over $\mathbb{F}_q$ have
• minimum distance $d_0 := nH_q^{-1}(1 - R) + o(n)$
• covering radius $d_0(1 + o(1))$
Here $H_q$ is the $q$-ary entropy function.

Theorem 3.3.7 *** it gives a number of codewords that project on a given k-set. Handbook of Coding theory, p.691. ***
3.3.3
Exercises
***???***
3.4
Notes
Puncturing and shortening at arbitrary sets of positions and the duality theorem are from Simonis [?]. Golay code, Turyn [?] construction, Pless handbook [?]. MacWilliams. In 1973 J. H. van Lint and A. Tietavainen proved the theorem on perfect codes.

***–puncturing gives the binary [23,12,7] Golay code, which is cyclic. –automorphism group of (extended) Golay code. –(extended) ternary Golay code. –designs and Golay codes. –lattices and Golay codes.***

***repeated decoding of product code (HoeholdtJustesen).***

***Singleton defect $s(C) = n + 1 - k - d$. $s(C) \geq 0$ and equality holds if and only if $C$ is MDS. $s(C) = 0$ if and only if $s(C^\perp) = 0$. Example where $s(C) = 1$ and $s(C^\perp) > 1$. Almost MDS and near MDS. Genus $g = \max\{s(C), s(C^\perp)\}$ in 4.1. If $k \geq 2$, then $d \leq q(s + 1)$. If $k \geq 3$ and $d = q(s + 1)$, then $s + 1 \leq q$. FaldumWillems, de Boer, DodunekovLangev, relation with Griesmer bound***
Chapter 4
Weight enumerator

Relinde Jurrius, Ruud Pellikaan and Xin-Wen Wu

*** The weight enumerator of a code is introduced and a random coding argument gives a proof of Shannon's theorem.
4.1
Weight enumerator
Apart from the minimum Hamming weight, a code has other important invariants. In this section, we will introduce the weight spectrum and the generalized weight spectrum of a code. ***applications***
4.1.1
Weight spectrum
The weight spectrum of a code is an important invariant, which provides useful information for both the code structure and practical applications of the code.

Definition 4.1.1 Let $C$ be a code of length $n$. The weight spectrum, also called the weight distribution, is the set
$$\{(w, A_w) \mid w = 0, 1, \ldots, n\}$$
where $A_w$ denotes the number of codewords in $C$ of weight $w$.

The so-called weight enumerator is a convenient representation of the weight spectrum.

Definition 4.1.2 The weight enumerator of $C$ is defined as the polynomial
$$W_C(Z) = \sum_{w=0}^n A_w Z^w.$$
The homogeneous weight enumerator of $C$ is defined as
$$W_C(X, Y) = \sum_{w=0}^n A_w X^{n-w} Y^w.$$
Remark 4.1.3 Note that $W_C(Z)$ and $W_C(X, Y)$ are equivalent in representing the weight spectrum. They determine each other uniquely by the following equations:
$$W_C(Z) = W_C(1, Z) \quad\text{and}\quad W_C(X, Y) = X^n W_C(X^{-1}Y).$$
Given the weight enumerator or the homogeneous weight enumerator, the weight spectrum is determined completely by the coefficients. Clearly, the weight enumerator and homogeneous weight enumerator can be written in another form, namely
$$W_C(Z) = \sum_{c\in C} Z^{\mathrm{wt}(c)} \tag{4.1}$$
and
$$W_C(X, Y) = \sum_{c\in C} X^{n-\mathrm{wt}(c)} Y^{\mathrm{wt}(c)}. \tag{4.2}$$

Example 4.1.4 The zero code has one codeword, and its weight is zero. Hence the homogeneous weight enumerator of this code is $W_{\{0\}}(X, Y) = X^n$. The number of words of weight $w$ in the trivial code $\mathbb{F}_q^n$ is $A_w = \binom{n}{w}(q-1)^w$. So
$$W_{\mathbb{F}_q^n}(X, Y) = \sum_{w=0}^n \binom{n}{w}(q-1)^w X^{n-w} Y^w = (X + (q-1)Y)^n.$$

Example 4.1.5 The $n$-fold repetition code $C$ has homogeneous weight enumerator
$$W_C(X, Y) = X^n + (q-1)Y^n.$$
In the binary case its dual is the even weight code. Hence it has homogeneous weight enumerator
$$W_{C^\perp}(X, Y) = \sum_{t=0}^{\lfloor n/2\rfloor} \binom{n}{2t} X^{n-2t} Y^{2t} = \frac{1}{2}\left((X+Y)^n + (X-Y)^n\right).$$

Example 4.1.6 The nonzero entries of the weight distribution of the $[7,4,3]$ binary Hamming code are given by $A_0 = 1$, $A_3 = 7$, $A_4 = 7$, $A_7 = 1$, as is seen by inspecting the weights of all 16 codewords. Hence its homogeneous weight enumerator is
$$X^7 + 7X^4Y^3 + 7X^3Y^4 + Y^7.$$

Example 4.1.7 The simplex code $S_r(q)$ is a constant weight code by Proposition 2.3.16 with parameters $[(q^r-1)/(q-1), r, q^{r-1}]$. Hence its homogeneous weight enumerator is
$$W_{S_r(q)}(X, Y) = X^n + (q^r - 1)X^{n-q^{r-1}} Y^{q^{r-1}}.$$
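The weight distributions in the examples above are easy to verify by brute force for small codes. A Python sketch for Example 4.1.6; the systematic generator matrix below is our choice of representative of the binary $[7,4,3]$ Hamming code (any equivalent generator matrix gives the same distribution):

```python
from itertools import product

# systematic generator matrix (I | A) of a binary [7,4,3] Hamming code;
# the columns of the corresponding parity check matrix (A^T | I) run through
# all 7 nonzero binary vectors of length 3
G = [[1,0,0,0,0,1,1],
     [0,1,0,0,1,0,1],
     [0,0,1,0,1,1,0],
     [0,0,0,1,1,1,1]]

def weight_distribution(G, q=2):
    # encode all q^k messages and tally the Hamming weights A_0, ..., A_n
    k, n = len(G), len(G[0])
    A = [0] * (n + 1)
    for m in product(range(q), repeat=k):
        c = [sum(m[i] * G[i][j] for i in range(k)) % q for j in range(n)]
        A[sum(1 for x in c if x != 0)] += 1
    return A

print(weight_distribution(G))  # [1, 0, 0, 7, 7, 0, 0, 1]
```

The output matches the distribution $A_0 = 1$, $A_3 = 7$, $A_4 = 7$, $A_7 = 1$ of Example 4.1.6.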
Remark 4.1.8 Let $C$ be a linear code. Then $A_0 = 1$ and the minimum distance $d(C)$, which is equal to the minimum weight, is determined by the weight enumerator as follows:
$$d(C) = \min\{\, i \mid A_i \neq 0,\ i > 0 \,\}.$$
It also determines the dimension $k(C)$, since
$$W_C(1, 1) = \sum_{w=0}^n A_w = q^{k(C)}.$$
Example 4.1.9 The Hamming code over $\mathbb{F}_q$ of length $n = (q^r - 1)/(q - 1)$ has parameters $[n, n-r, 3]$ and is perfect with covering radius 1 by Proposition 3.2.28. The following recurrence relation holds for the weight distribution $(A_0, A_1, \ldots, A_n)$ of these codes:
$$\binom{n}{w}(q-1)^w = A_{w-1}(n-w+1)(q-1) + A_w(1 + w(q-2)) + A_{w+1}(w+1)$$
for all $w$. This is seen as follows. Every word $y$ of weight $w$ is at distance at most 1 to a unique codeword $c$, and such a codeword has possible weights $w-1$, $w$ or $w+1$. Let $c$ be a codeword of weight $w-1$; then there are $n-w+1$ possible positions $j$ in the complement of the support of $c$ where $c_j = 0$ could be changed into a nonzero element in order to get the word $y$ of weight $w$. Similarly, let $c$ be a codeword of weight $w$; then either $y = c$ or there are $w$ possible positions $j$ in the support of $c$ where $c_j$ could be changed into another nonzero element to get $y$. Finally, let $c$ be a codeword of weight $w+1$; then there are $w+1$ possible positions $j$ in the support of $c$ where $c_j$ could be changed into zero to get $y$.

Multiply the recurrence relation with $Z^w$ and sum over $w$. Let $W(Z) = \sum_w A_w Z^w$. Then
$$(1+(q-1)Z)^n = (q-1)nZW(Z) - (q-1)Z^2W'(Z) + W(Z) + (q-2)ZW'(Z) + W'(Z),$$
since
$$\sum_w \binom{n}{w}(q-1)^w Z^w = (1+(q-1)Z)^n,$$
$$\sum_w (w+1)A_{w+1}Z^w = W'(Z),$$
$$\sum_w wA_wZ^w = ZW'(Z),$$
$$\sum_w (w-1)A_{w-1}Z^w = Z^2W'(Z).$$
Therefore $W(Z)$ satisfies the following ordinary first order differential equation:
$$((q-1)Z^2 - (q-2)Z - 1)W'(Z) - (1+(q-1)nZ)W(Z) + (1+(q-1)Z)^n = 0.$$
The corresponding homogeneous differential equation is separable:
$$\frac{W'(Z)}{W(Z)} = \frac{1+(q-1)nZ}{(q-1)Z^2-(q-2)Z-1}$$
and has general solution
$$W_h(Z) = C(1-Z)^{q^{r-1}}((q-1)Z+1)^{n-q^{r-1}},$$
where $C$ is some constant. A particular solution is given by
$$P(Z) = \frac{1}{q^r}(1+(q-1)Z)^n.$$
Therefore the solution that satisfies $W(0) = 1$ is equal to
$$W(Z) = \frac{1}{q^r}(1+(q-1)Z)^n + \frac{q^r-1}{q^r}(1-Z)^{q^{r-1}}((q-1)Z+1)^{n-q^{r-1}}.$$
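The closed formula for $W(Z)$ can be checked against known weight distributions with exact integer arithmetic. A Python sketch (the polynomial helpers are ours):

```python
from fractions import Fraction

def mul(a, b):
    # product of two polynomials given as coefficient lists
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    return c

def power(a, e):
    r = [1]
    for _ in range(e):
        r = mul(r, a)
    return r

def hamming_W(q, r):
    # coefficients of W(Z) from the closed formula of Example 4.1.9
    n = (q**r - 1) // (q - 1)
    p1 = power([1, q - 1], n)                      # (1+(q-1)Z)^n
    p2 = mul(power([1, -1], q**(r - 1)),           # (1-Z)^{q^{r-1}}
             power([1, q - 1], n - q**(r - 1)))    # ((q-1)Z+1)^{n-q^{r-1}}
    return [Fraction(p1[i] + (q**r - 1) * p2[i], q**r) for i in range(n + 1)]

print(hamming_W(2, 3))  # [7,4,3] binary Hamming code: 1 + 7Z^3 + 7Z^4 + Z^7
print(hamming_W(3, 2))  # [4,2,3] ternary Hamming code: 1 + 8Z^3
```

All coefficients come out as nonnegative integers, as they must for a weight distribution; this is a useful sanity check on the formula.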
To prove that the weight enumerator of a perfect code is completely determined by its parameters we need the following lemma.

Lemma 4.1.10 The number $N_q(n, v, w, s)$ of words in $\mathbb{F}_q^n$ of weight $w$ that are at distance $s$ from a given word of weight $v$ does not depend on the chosen word and is equal to
$$N_q(n, v, w, s) = \sum_{\substack{i+j+k=s\\ v+k-j=w}} \binom{n-v}{k}\binom{v}{i}\binom{v-i}{j}(q-1)^k(q-2)^i.$$

Proof. Consider a given word $x$ of weight $v$. Let $y$ be a word of weight $w$ and distance $s$ to $x$. Suppose that $y$ has $k$ nonzero coordinates in the complement of the support of $x$, $j$ zero coordinates in the support of $x$, and $i$ nonzero coordinates in the support of $x$ that are distinct from the corresponding coordinates of $x$. Then $s = d(x, y) = i + j + k$ and $\mathrm{wt}(y) = w = v + k - j$. There are $\binom{n-v}{k}$ possible subsets of $k$ elements in the complement of the support of $x$, and there are $(q-1)^k$ possible choices for the nonzero symbols at the corresponding $k$ coordinates. There are $\binom{v}{i}$ possible subsets of $i$ elements in the support of $x$, and there are $(q-2)^i$ possible choices of the symbols at those $i$ positions that are nonzero and distinct from the corresponding coordinates of $x$. There are $\binom{v-i}{j}$ possible subsets of $j$ elements in the support of $x$ where $y$ is zero. Therefore
$$N_q(n, v, w, s) = \sum_{\substack{i+j+k=s\\ v+k-j=w}} \binom{n-v}{k}\binom{v}{i}\binom{v-i}{j}(q-1)^k(q-2)^i. \qquad\Box$$

Remark 4.1.11 Let us consider special values of $N_q(n, v, w, s)$. If $s = 0$, then $N_q(n, v, w, 0) = 1$ if $v = w$, and $N_q(n, v, w, 0) = 0$ otherwise. If $s = 1$, then
$$N_q(n, v, w, 1) = \begin{cases}(n-w+1)(q-1) & \text{if } v = w-1,\\ w(q-2) & \text{if } v = w,\\ w+1 & \text{if } v = w+1,\\ 0 & \text{otherwise.}\end{cases}$$

Proposition 4.1.12 Let $C$ be a perfect code of length $n$ and covering radius $\rho$ and weight distribution $(A_0, A_1, \ldots, A_n)$. Then
$$\binom{n}{w}(q-1)^w = \sum_{v=w-\rho}^{w+\rho} A_v \sum_{s=0}^{\rho} N_q(n, v, w, s) \quad\text{for all } w.$$
Proof. Define the set
$$N(w, \rho) = \{\,(y, c) \mid y \in \mathbb{F}_q^n,\ \mathrm{wt}(y) = w,\ c \in C,\ d(y, c) \leq \rho\,\}.$$
(1) For every $y$ in $\mathbb{F}_q^n$ of weight $w$ there is a unique codeword $c$ in $C$ that has distance at most $\rho$ to $y$, since $C$ is perfect with covering radius $\rho$. Hence
$$|N(w, \rho)| = \binom{n}{w}(q-1)^w.$$
(2) On the other hand, consider the fibre of the projection on the second factor:
$$N(c, w, \rho) = \{\, y \in \mathbb{F}_q^n \mid \mathrm{wt}(y) = w,\ d(y, c) \leq \rho \,\}$$
for a given codeword $c$ in $C$. If $c$ has weight $v$, then
$$|N(c, w, \rho)| = \sum_{s=0}^{\rho} N_q(n, v, w, s).$$
Hence
$$|N(w, \rho)| = \sum_{v=0}^n A_v \sum_{s=0}^{\rho} N_q(n, v, w, s).$$
Notice that $|\mathrm{wt}(x) - \mathrm{wt}(y)| \leq d(x, y)$. Hence $N_q(n, v, w, s) = 0$ if $|v - w| > s$. Combining (1) and (2) gives the desired result. □

Example 4.1.13 The ternary Golay code has parameters $[11, 6, 5]$ and is perfect with covering radius 2 by Proposition 3.2.28. We leave it as an exercise to show by means of the recursive relations of Proposition 4.1.12 that the weight enumerator of this code is given by
$$1 + 132Z^5 + 132Z^6 + 330Z^8 + 110Z^9 + 24Z^{11}.$$

Example 4.1.14 The binary Golay code has parameters $[23, 12, 7]$ and is perfect with covering radius 3 by Proposition 3.2.28. We leave it as an exercise to show by means of the recursive relations of Proposition 4.1.12 that the weight enumerator of this code is given by
$$1 + 253Z^7 + 506Z^8 + 1288Z^{11} + 1288Z^{12} + 506Z^{15} + 253Z^{16} + Z^{23}.$$
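The counting formula of Lemma 4.1.10 can be validated by exhaustive enumeration for small parameters. A Python sketch (the function names are ours):

```python
from itertools import product
from math import comb

def N(q, n, v, w, s):
    # closed formula of Lemma 4.1.10
    total = 0
    for i in range(min(v, s) + 1):
        for j in range(min(v - i, s - i) + 1):
            k = s - i - j
            if v + k - j == w and k <= n - v:
                total += (comb(n - v, k) * comb(v, i) * comb(v - i, j)
                          * (q - 1) ** k * (q - 2) ** i)
    return total

def N_brute(q, n, v, w, s):
    # count the words of weight w at distance s from a fixed word of weight v
    x = [1] * v + [0] * (n - v)
    return sum(1 for y in product(range(q), repeat=n)
               if sum(map(bool, y)) == w
               and sum(a != b for a, b in zip(x, y)) == s)

q, n = 3, 4
assert all(N(q, n, v, w, s) == N_brute(q, n, v, w, s)
           for v in range(n + 1) for w in range(n + 1) for s in range(n + 1))
print("formula of Lemma 4.1.10 agrees with brute force for q=3, n=4")
```

Since $N_q(n, v, w, s)$ does not depend on the chosen word, fixing $x = (1, \ldots, 1, 0, \ldots, 0)$ loses no generality.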
4.1.2
Average weight enumerator
Remark 4.1.15 The computation of the weight enumerator of a given code is hard in general. For perfect codes such as the Hamming codes and the binary and ternary Golay codes this is left as an exercise to the reader and can be done by using Proposition 4.1.12. In Proposition 4.4.22 the weight distribution of MDS codes is treated. The weight enumerator of only a few infinite families of codes is known. On the other hand, the average weight enumerator of a class of codes is very often easy to determine.
Definition 4.1.16 Let $\mathcal{C}$ be a nonempty class of codes over $\mathbb{F}_q$ of the same length. The average weight enumerator of $\mathcal{C}$ is defined as the average of all $W_C$ with $C \in \mathcal{C}$:
$$W_{\mathcal{C}}(Z) = \frac{1}{|\mathcal{C}|}\sum_{C\in\mathcal{C}} W_C(Z),$$
and similarly for the homogeneous average weight enumerator $W_{\mathcal{C}}(X, Y)$ of this class.

Definition 4.1.17 A class $\mathcal{C}$ of $[n, k]$ codes over $\mathbb{F}_q$ is called balanced if there is a number $N(\mathcal{C})$ such that
$$N(\mathcal{C}) = |\{\, C \in \mathcal{C} \mid y \in C \,\}| \quad\text{for every nonzero word } y \in \mathbb{F}_q^n.$$

Example 4.1.18 The prime example of a class of balanced codes is the set $\mathcal{C}[n, k]_q$ of all $[n, k]$ codes over $\mathbb{F}_q$. ***Other examples are:***

Lemma 4.1.19 Let $\mathcal{C}$ be a balanced class of $[n, k]$ codes over $\mathbb{F}_q$. Then
$$N(\mathcal{C}) = |\mathcal{C}|\,\frac{q^k - 1}{q^n - 1}.$$

Proof. Compute the number of elements of the set of pairs $\{(y, C) \mid y \neq 0,\ y \in C \in \mathcal{C}\}$ in two ways. In the first place by keeping a nonzero $y$ in $\mathbb{F}_q^n$ fixed, and letting $C$ vary in $\mathcal{C}$ such that $y \in C$. This gives the number $(q^n - 1)N(\mathcal{C})$, since $\mathcal{C}$ is balanced. Secondly by keeping $C$ in $\mathcal{C}$ fixed, and letting the nonzero $y$ in $C$ vary. This gives the number $|\mathcal{C}|(q^k - 1)$. This gives the desired result, since both numbers are the same. □

Proposition 4.1.20 Let $f$ be a function on $\mathbb{F}_q^n$ with values in a complex vector space. Let $\mathcal{C}$ be a balanced class of $[n, k]$ codes over $\mathbb{F}_q$. Then
$$\frac{1}{|\mathcal{C}|}\sum_{C\in\mathcal{C}}\sum_{c\in C^*} f(c) = \frac{q^k - 1}{q^n - 1}\sum_{v\in(\mathbb{F}_q^n)^*} f(v),$$
where $C^*$ denotes the set of all nonzero elements of $C$.

Proof. By interchanging the order of summation we get
$$\sum_{C\in\mathcal{C}}\sum_{v\in C^*} f(v) = \sum_{v\in(\mathbb{F}_q^n)^*} f(v)\sum_{v\in C\in\mathcal{C}} 1.$$
The last sum is constant and equals $N(\mathcal{C})$, by assumption. Now the result follows from the computation of $N(\mathcal{C})$ in Lemma 4.1.19. □

Corollary 4.1.21 Let $\mathcal{C}$ be a balanced class of $[n, k]$ codes over $\mathbb{F}_q$. Then
$$W_{\mathcal{C}}(Z) = 1 + \frac{q^k - 1}{q^n - 1}\sum_{w=1}^n \binom{n}{w}(q-1)^w Z^w.$$

Proof. Apply Proposition 4.1.20 to the function $f(v) = Z^{\mathrm{wt}(v)}$, and use (4.1) of Remark 4.1.3. □

***GV bound for a collection of balanced codes, Loeliger***
4.1.3
MacWilliams identity
Although there is no apparent relation between the minimum distances of a code and its dual, the weight enumerators satisfy the MacWilliams identity.

Theorem 4.1.22 Let $C$ be an $[n, k]$ code over $\mathbb{F}_q$. Then
$$W_{C^\perp}(X, Y) = q^{-k}\,W_C(X + (q-1)Y,\, X - Y).$$

The following simple result is useful in the proof of the MacWilliams identity.

Lemma 4.1.23 Let $C$ be an $[n, k]$ linear code over $\mathbb{F}_q$. Let $v$ be an element of $\mathbb{F}_q^n$, but not in $C^\perp$. Then, for every $\alpha \in \mathbb{F}_q$, there exist exactly $q^{k-1}$ codewords $c$ such that $c\cdot v = \alpha$.

Proof. Consider the map $\varphi : C \to \mathbb{F}_q$ defined by $\varphi(c) = c\cdot v$. This is a linear map. The map is not constant zero, since $v$ is not in $C^\perp$. Hence every fibre $\varphi^{-1}(\alpha)$ consists of the same number of elements, $q^{k-1}$, for all $\alpha \in \mathbb{F}_q$. □

To prove Theorem 4.1.22, we introduce the characters of Abelian groups and prove some lemmas.

Definition 4.1.24 Let $(G, +)$ be an abelian group with respect to the addition $+$. Let $(S, \cdot)$ be the multiplicative group of the complex numbers of modulus one. A character $\chi$ of $G$ is a homomorphism from $G$ to $S$. So, $\chi$ is a mapping satisfying
$$\chi(g_1 + g_2) = \chi(g_1)\cdot\chi(g_2), \quad\text{for all } g_1, g_2 \in G.$$
If $\chi(g) = 1$ for all elements $g \in G$, we call $\chi$ the principal character.

Remark 4.1.25 For any character $\chi$ we have $\chi(0) = 1$, since $\chi(0)$ is not zero and $\chi(0) = \chi(0 + 0) = \chi(0)^2$. If $G$ is a finite group of order $N$ and $\chi$ is a character of $G$, then $\chi(g)$ is an $N$th root of unity for all $g \in G$, since $1 = \chi(0) = \chi(Ng) = \chi(g)^N$.

Lemma 4.1.26 Let $\chi$ be a character of a finite group $G$. Then
$$\sum_{g\in G}\chi(g) = \begin{cases} |G| & \text{when } \chi \text{ is the principal character},\\ 0 & \text{otherwise}.\end{cases}$$

Proof. The result is trivial when $\chi$ is principal. Now suppose $\chi$ is not principal. Let $h \in G$ be such that $\chi(h) \neq 1$. We have
$$\chi(h)\sum_{g\in G}\chi(g) = \sum_{g\in G}\chi(h + g) = \sum_{g\in G}\chi(g),$$
since the map $g \mapsto h + g$ is a permutation of $G$. Hence $(\chi(h) - 1)\sum_{g\in G}\chi(g) = 0$, which implies $\sum_{g\in G}\chi(g) = 0$. □
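Lemma 4.1.26 can be illustrated for the additive group $\mathbb{Z}/q\mathbb{Z}$, whose characters are $\chi_t(a) = e^{2\pi i t a/q}$ for $t = 0, \ldots, q-1$ (a standard fact; $\chi_t$ is principal exactly when $t \equiv 0 \bmod q$). A Python sketch:

```python
import cmath

def chi(t, a, q):
    # the character chi_t of the additive group Z/qZ: a |-> exp(2*pi*i*t*a/q)
    return cmath.exp(2j * cmath.pi * t * a / q)

q = 7
for t in range(q):
    total = sum(chi(t, a, q) for a in range(q))
    expected = q if t % q == 0 else 0  # Lemma 4.1.26
    assert abs(total - expected) < 1e-9
print("character sums over Z/7Z behave as in Lemma 4.1.26")
```

The tolerance accounts for floating-point rounding of the complex exponentials; algebraically the nonprincipal sums are exactly zero.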
Definition 4.1.27 Let $V$ be a complex vector space. Let $f : \mathbb{F}_q^n \to V$ be a mapping on $\mathbb{F}_q^n$ with values in $V$. Let $\chi$ be a character of $\mathbb{F}_q$. The Hadamard transform $\hat{f}$ of $f$ is defined as
$$\hat{f}(u) = \sum_{v\in\mathbb{F}_q^n} \chi(u\cdot v) f(v).$$

Lemma 4.1.28 Let $f : \mathbb{F}_q^n \to V$ be a mapping on $\mathbb{F}_q^n$ with values in the complex vector space $V$. Let $\chi$ be a nonprincipal character of $\mathbb{F}_q$. Then
$$\sum_{c\in C} \hat{f}(c) = |C| \sum_{v\in C^\perp} f(v).$$

Proof. By definition, we have
$$\sum_{c\in C}\hat{f}(c) = \sum_{c\in C}\sum_{v\in\mathbb{F}_q^n}\chi(c\cdot v)f(v) = \sum_{v\in\mathbb{F}_q^n} f(v)\sum_{c\in C}\chi(c\cdot v)$$
$$= \sum_{v\in C^\perp} f(v)\sum_{c\in C}\chi(c\cdot v) + \sum_{v\in\mathbb{F}_q^n\setminus C^\perp} f(v)\sum_{c\in C}\chi(c\cdot v)$$
$$= |C|\sum_{v\in C^\perp} f(v) + \sum_{v\in\mathbb{F}_q^n\setminus C^\perp} f(v)\sum_{c\in C}\chi(c\cdot v).$$
The result follows, since
$$\sum_{c\in C}\chi(c\cdot v) = q^{k-1}\sum_{\alpha\in\mathbb{F}_q}\chi(\alpha) = 0$$
for any $v\in\mathbb{F}_q^n\setminus C^\perp$ and $\chi$ not principal, by Lemmas 4.1.23 and 4.1.26. □

Proof of Theorem 4.1.22. Let $\chi$ be a nonprincipal character of $\mathbb{F}_q$. Consider the following mapping
$$f(v) = X^{n-\mathrm{wt}(v)}Y^{\mathrm{wt}(v)}$$
from $\mathbb{F}_q^n$ to the vector space of polynomials in the variables $X$ and $Y$ with complex coefficients. Then
$$\sum_{v\in C^\perp} f(v) = \sum_{v\in C^\perp} X^{n-\mathrm{wt}(v)}Y^{\mathrm{wt}(v)} = W_{C^\perp}(X, Y),$$
by applying (4.2) of Remark 4.1.3 to $C^\perp$. Let $c = (c_1, \ldots, c_n)$ and $v = (v_1, \ldots, v_n)$. Define $\mathrm{wt}(0) = 0$ and $\mathrm{wt}(\alpha) = 1$ for all nonzero $\alpha\in\mathbb{F}_q$. Then $\mathrm{wt}(v) = \mathrm{wt}(v_1) + \cdots + \mathrm{wt}(v_n)$. The Hadamard transform $\hat{f}(c)$ is equal to
$$\sum_{v\in\mathbb{F}_q^n}\chi(c\cdot v)X^{n-\mathrm{wt}(v)}Y^{\mathrm{wt}(v)} = \sum_{v\in\mathbb{F}_q^n} X^{n-\mathrm{wt}(v_1)-\cdots-\mathrm{wt}(v_n)}Y^{\mathrm{wt}(v_1)+\cdots+\mathrm{wt}(v_n)}\chi(c_1v_1 + \cdots + c_nv_n)$$
$$= X^n\sum_{v\in\mathbb{F}_q^n}\prod_{i=1}^n\left(\frac{Y}{X}\right)^{\mathrm{wt}(v_i)}\chi(c_iv_i) = X^n\prod_{i=1}^n\sum_{v\in\mathbb{F}_q}\left(\frac{Y}{X}\right)^{\mathrm{wt}(v)}\chi(c_iv).$$
If $c_i\neq 0$, then
$$\sum_{v\in\mathbb{F}_q}\left(\frac{Y}{X}\right)^{\mathrm{wt}(v)}\chi(c_iv) = 1 + \frac{Y}{X}\sum_{\alpha\in\mathbb{F}_q^*}\chi(\alpha) = 1 - \frac{Y}{X},$$
by Lemma 4.1.26. Hence
$$\sum_{v\in\mathbb{F}_q}\left(\frac{Y}{X}\right)^{\mathrm{wt}(v)}\chi(c_iv) = \begin{cases} 1 + (q-1)\dfrac{Y}{X} & \text{if } c_i = 0,\\[1ex] 1 - \dfrac{Y}{X} & \text{if } c_i \neq 0.\end{cases}$$
Therefore $\hat{f}(c)$ is equal to
$$X^n\left(1-\frac{Y}{X}\right)^{\mathrm{wt}(c)}\left(1+(q-1)\frac{Y}{X}\right)^{n-\mathrm{wt}(c)} = (X-Y)^{\mathrm{wt}(c)}(X+(q-1)Y)^{n-\mathrm{wt}(c)}.$$
Hence
$$\sum_{c\in C}\hat{f}(c) = \sum_{c\in C} U^{n-\mathrm{wt}(c)}V^{\mathrm{wt}(c)} = W_C(U, V),$$
by (4.2) of Remark 4.1.3 with the substitution $U = X + (q-1)Y$ and $V = X - Y$. It is shown that on the one hand
$$\sum_{v\in C^\perp} f(v) = W_{C^\perp}(X, Y),$$
and on the other hand
$$\sum_{c\in C}\hat{f}(c) = W_C(X + (q-1)Y,\, X - Y).$$
The result follows by Lemma 4.1.28 on the Hadamard transform, since $|C| = q^k$. □
Example 4.1.29 The zero code $C$ has homogeneous weight enumerator $X^n$ and its dual $\mathbb{F}_q^n$ has homogeneous weight enumerator $(X + (q-1)Y)^n$, by Example 4.1.4, which is indeed equal to $q^0\,W_C(X + (q-1)Y, X - Y)$ and confirms the MacWilliams identity.

Example 4.1.30 The $n$-fold repetition code $C$ has homogeneous weight enumerator $X^n + (q-1)Y^n$ and the homogeneous weight enumerator of its dual code in the binary case is $\frac{1}{2}((X+Y)^n + (X-Y)^n)$, by Example 4.1.5, which is equal to $2^{-1}W_C(X+Y, X-Y)$, confirming the MacWilliams identity for $q = 2$. For arbitrary $q$ we have
$$W_{C^\perp}(X, Y) = q^{-1}W_C(X + (q-1)Y,\, X - Y) = q^{-1}\left((X + (q-1)Y)^n + (q-1)(X-Y)^n\right)$$
$$= \sum_{w=0}^n \binom{n}{w}\frac{(q-1)^w + (q-1)(-1)^w}{q}\, X^{n-w}Y^w.$$

Example 4.1.31 ***dual of a balanced class of codes, $\mathcal{C}^\perp$ balanced?***

Definition 4.1.32 An $[n, k]$ code $C$ over $\mathbb{F}_q$ is called formally self-dual if $C$ and $C^\perp$ have the same weight enumerator.

Remark 4.1.33 ***A quasi self-dual code is formally self-dual, existence of an asymp. good family of codes***
4.1.4
Exercises
4.1.1 Compute the weight spectrum of the dual of the $q$-ary $n$-fold repetition code directly, that is without using the MacWilliams identity. Compare this result with Example 4.1.30.

4.1.2 Check the MacWilliams identity for the binary $[7, 4, 3]$ Hamming code and its dual, the $[7, 3, 4]$ simplex code.

4.1.3 Compute the weight enumerator of the Hamming code $H_r(q)$ by solving the differential equation given in Example 4.1.9.

4.1.4 Compute the weight enumerator of the ternary Golay code as given in Example 4.1.13.

4.1.5 Compute the weight enumerator of the binary Golay code as given in Example 4.1.14.

4.1.6 Consider the quasi self-dual code with generator matrix $(I_k \mid I_k)$ of Exercise 2.5.8. Show that its weight enumerator is equal to $(X^2 + (q-1)Y^2)^k$. Verify that this code is formally self-dual.

4.1.7 Let $C$ be the code over $\mathbb{F}_q$, with $q$ even, with generator matrix $H$ of Example 2.2.9. For which $q$ does this code contain a word of weight 7?
4.2
Error probability
*** Some introductory results on the error probability of correct decoding up to half the minimum distance were given in Section ??. ***
4.2.1
Error probability of undetected error
***

Definition 4.2.1 Consider the $q$-ary symmetric channel where the receiver checks whether the received word $r$ is a codeword or not, for instance by computing whether $Hr^T$ is zero or not for a chosen parity check matrix $H$, and asks for retransmission in case $r$ is not a codeword. See Remark 2.3.2. Now it may occur that $r$ is again a codeword but not equal to the codeword that was sent. This is called an undetected error.

Proposition 4.2.2 Let $W_C(X, Y)$ be the weight enumerator of the code $C$. Then the probability of undetected error on a $q$-ary symmetric channel with crossover probability $p$ is given by
$$P_{ue}(p) = W_C\!\left(1-p,\ \frac{p}{q-1}\right) - (1-p)^n.$$

Proof. Every codeword has the same probability of transmission and the code is linear. So without loss of generality we may assume that the zero word is sent. Hence
$$P_{ue}(p) = \frac{1}{|C|}\sum_{x\in C}\sum_{x\neq y\in C} P(y|x) = \sum_{0\neq y\in C} P(y|0).$$
If the received codeword $y$ has weight $w$, then $w$ symbols are changed and the remaining $n-w$ symbols remained the same. So $P(y|0) = \left(\frac{p}{q-1}\right)^w(1-p)^{n-w}$ by Remark 2.4.15. Hence
$$P_{ue}(p) = \sum_{w=1}^n A_w(1-p)^{n-w}\left(\frac{p}{q-1}\right)^w.$$
Substituting $X = 1-p$ and $Y = p/(q-1)$ in $W_C(X, Y)$ gives the desired result, since $A_0 = 1$. □

Remark 4.2.3 Now $P_{retr}(p) = 1 - P_{ue}(p)$ is the probability of retransmission.

Example 4.2.4 Let $C$ be the binary triple repetition code. Then $P_{ue}(p) = p^3$, since $W_C(X, Y) = X^3 + Y^3$ by Example 4.1.5.

Example 4.2.5 Let $C$ be the $[7, 4, 3]$ Hamming code. Then
$$P_{ue}(p) = 7p^3 - 21p^4 + 21p^5 - 7p^6 + p^7$$
by Example 4.1.6.
4.2.2
Probability of decoding error
Remember that in Lemma 4.1.10 a formula was derived for $N_q(n, v, w, s)$, the number of words in $\mathbb{F}_q^n$ of weight $w$ that are at distance $s$ from a given word of weight $v$.

Proposition 4.2.6 The probability of decoding error of a decoder that corrects up to $t$ errors with $2t + 1 \leq d$ of a code $C$ of minimum distance $d$ on a $q$-ary symmetric channel with crossover probability $p$ is given by
$$P_{de}(p) = \sum_{w=0}^n \left(\frac{p}{q-1}\right)^w (1-p)^{n-w} \sum_{s=0}^t \sum_{v=1}^n A_v N_q(n, v, w, s).$$

Proof. This is left as an exercise. □
Example 4.2.7 ...........
4.2.3
Random coding
***ML (maximum likelihood) decoding = MD (minimum distance or nearest neighbor) decoding for the BSC.***

Proposition 4.2.8 ***...***
$$P_{err}(p) = W_C(\gamma) - 1, \quad\text{where}\quad \gamma = 2\sqrt{p(1-p)}.$$
Proof. ....
Theorem 4.2.9 ***Shannon’s theorem for random codes*** Proof. ***...***
4.2.4
Exercises
4.2.1 ***Give the probability of undetected error for the code ....*** 4.2.2 Give a proof of Proposition 4.2.6. 4.2.3 ***Give the probability of decoding error and decoding failure for the code .... for a decoder correcting up to ... errors.***
4.3
Finite geometry and codes
***Intro***
4.3.1
Projective space and projective systems
The notion of a linear code has a geometric equivalent in the concept of a projective system, which is a set of points in projective space.

Remark 4.3.1 The affine line $\mathbb{A}$ over a field $F$ is nothing else than the field $F$. The projective line $\mathbb{P}$ is an extension of the affine line by one point at infinity.
The elements are fractions $(x_0 : x_1)$ with $x_0, x_1$ elements of a field $F$, not both zero, and the fraction $(x_0 : x_1)$ is equal to $(y_0 : y_1)$ if and only if $(x_0, x_1) = \lambda(y_0, y_1)$ for some $\lambda \in F^*$. The point $(x_0 : x_1)$ with $x_0 \neq 0$ is equal to $(1 : x_1/x_0)$ and corresponds to the point $x_1/x_0 \in \mathbb{A}$. The point $(x_0 : x_1)$ with $x_0 = 0$ is equal to $(0 : 1)$ and is the unique point at infinity. The notation $\mathbb{P}(F)$ and $\mathbb{A}(F)$ is used to emphasize that the elements are in the field $F$.

The affine plane $\mathbb{A}^2$ over a field $F$ consists of points and lines. The points are in $F^2$ and the lines are the subsets of the form $\{\, a + \lambda v \mid \lambda \in F \,\}$ with $v \neq 0$, in a parametric explicit description. A line is alternatively given by an implicit description by means of an equation $aX + bY + c = 0$, with $a, b, c \in F$ not all zero. Every two distinct points are contained in exactly one line. Two lines are either parallel, that is they coincide or do not intersect, or they intersect in exactly one point. If $F$ is equal to the finite field $\mathbb{F}_q$, then there are $q^2$ points and $q^2 + q$ lines, every line consists of $q$ points, and the number of lines through a given point is $q + 1$.

Being parallel defines an equivalence relation on the set of lines in the affine plane, and every equivalence or parallel class of a line $l$ defines a unique point at infinity $P_l$. So $P_l = P_m$ if and only if $l$ and $m$ are parallel. In this way the affine plane is extended to the projective plane $\mathbb{P}^2$ by adding the points at infinity $P_l$. A line in the projective plane is a line $l$ in the affine plane extended with its point at infinity $P_l$, or the line at infinity, consisting of all the points at infinity. Every two distinct points in $\mathbb{P}^2$ are contained in exactly one line, and two distinct lines intersect in exactly one point. If $F$ is equal to the finite field $\mathbb{F}_q$, then there are $q^2 + q + 1$ points and the same number of lines, every line consists of $q + 1$ points, and the number of lines through a given point is $q + 1$.

***picture***

Another model of the projective plane can be obtained as follows. Consider the points of the affine plane as the plane in three space $F^3$ with coordinates $(x, y, z)$ given by the equation $Z = 1$. Every point $(x, y, 1)$ in the affine plane corresponds with a unique line in $F^3$ through the origin parameterized by $\lambda(x, y, 1)$, $\lambda \in F$. Conversely, a line in $F^3$ through the origin parameterized by $\lambda(x, y, z)$, $\lambda \in F$, intersects the affine plane in the unique point $(x/z, y/z, 1)$ if $z \neq 0$, and corresponds to the unique parallel class $P_l$ of the line $l$ in the affine plane with equation $xY = yX$ if $z = 0$. Furthermore every line in the affine plane corresponds with a unique plane through the origin in $F^3$, and conversely every plane through the origin in $F^3$ with equation $aX + bY + cZ = 0$ intersects the affine plane in the unique line with equation $aX + bY + c = 0$ if not both $a = 0$ and $b = 0$, or corresponds to the line at infinity if $a = b = 0$.

***picture***

An $F$-rational point of the projective plane is a line through the origin in $F^3$. Such a point is determined by a three-tuple $(x, y, z) \in F^3$, not all of them being zero. A scalar multiple determines the same point in the projective plane. This defines an equivalence relation $\equiv$ by $(x, y, z) \equiv (x', y', z')$ if and only if there exists a nonzero $\lambda \in F$ such that $(x, y, z) = \lambda(x', y', z')$. The equivalence class with representative $(x, y, z)$ is denoted by $(x : y : z)$, and $x$, $y$ and $z$ are called homogeneous coordinates of the point. The set of all projective points $(x : y : z)$,
with x, y, z ∈ F not all zero, is called the projective plane over F. The set of Frational projective points is denoted by P2 (F). A line in the projective plane that is defined over F is a plane through the origin in F3 . Such a line has a homogeneous equation aX + bY + cZ = 0 with a, b, c ∈ F not all zero. The affine plane is embedded in the projective plane by the map (x, y) 7→ (x : y : 1). The image is the subset of all projective points (x : y : z) such that z 6= 0. The line at infinity is the line with equation Z = 0. A point at infinity of the affine plane is a point on the line at infinity in the projective plane. Every line in the affine plane intersects the line at infinity in a unique point and all lines in the affine plane which are parallel, that is to say which do not intersect in the affine plane, intersect in the same point at infinity. The above embedding of the affine plane in the projective plane is standard, but the mappings (x, z) 7→ (x : 1 : z) and (y, z) 7→ (1 : y : z) give two alternative embeddings of the affine plane. The images are the complement of the line Y = 0 and X = 0, respectively. Thus the projective plane is covered with three copies of the affine plane. Definition 4.3.2 An affine subspace of Fr of dimension s is a subset of the form { a + λ1 v1 + · · · + λs vs  λi ∈ F, i = 1, . . . , s }, where a ∈ Fr , and v1 , . . . , vs is a linearly independent set of vectors in Fr , and r − s is called the codimension of the subspace. The affine space of dimension r over a field F, denoted by Ar (F) consists of all affine subsets of Fr . The elements of Fr are called points of the affine space. Lines and planes are the linear subspaces of dimension one and two, respectively. A hyperplane is an affine subspace of codimension 1. Definition 4.3.3 A point of the projective space over a field F of dimension r is a line through the origin in Fr+1 . A line in Pr (F) is a plane through the origin in Fr+1 . 
More generally, a projective subspace of dimension s in P^r(F) is a linear subspace of dimension s + 1 of the vector space F^{r+1}, and r - s is called the codimension of the subspace. The projective space of dimension r over a field F, denoted by P^r(F), consists of all its projective subspaces. A point of a projective space is incident with, or an element of, a projective subspace if the line corresponding to the point is contained in the linear subspace that corresponds with the projective subspace. A hyperplane in P^r(F) is a projective subspace of codimension 1.

Definition 4.3.4 A point in P^r(F) is denoted by its homogeneous coordinates (x_0 : x_1 : · · · : x_r) with x_0, x_1, . . . , x_r ∈ F not all zero, where λ(x_0, x_1, . . . , x_r), λ ∈ F, is a parametrization of the corresponding line in F^{r+1}. Let (x_0, x_1, . . . , x_r) and (y_0, y_1, . . . , y_r) be two nonzero vectors in F^{r+1}. Then (x_0 : x_1 : · · · : x_r) and (y_0 : y_1 : · · · : y_r) represent the same point in P^r(F) if and only if (x_0, x_1, . . . , x_r) = λ(y_0, y_1, . . . , y_r) for some λ ∈ F*. The standard homogeneous coordinates of a point in P^r(F) are given by (x_0 : x_1 : · · · : x_r) such that there exists a j with x_j = 1 and x_i = 0 for all i < j. The standard embedding of A^r(F) in P^r(F) is given by (x_1, . . . , x_r) ↦ (1 : x_1 : · · · : x_r).

Remark 4.3.5 Every hyperplane in P^r(F) is defined by an equation a_0 X_0 + a_1 X_1 + · · · + a_r X_r = 0,
4.3. FINITE GEOMETRY AND CODES
where a_0, a_1, . . . , a_r are r + 1 elements of F, not all zero. Furthermore a′_0 X_0 + a′_1 X_1 + · · · + a′_r X_r = 0 defines the same hyperplane if and only if there exists a nonzero λ in F such that a′_i = λ a_i for all i = 0, 1, . . . , r. Hence there is a duality between points and hyperplanes in P^r(F), where a point (a_0 : a_1 : · · · : a_r) is sent to the hyperplane with equation a_0 X_0 + a_1 X_1 + · · · + a_r X_r = 0.

Example 4.3.6 The columns of a generator matrix of a simplex code S_r(q) represent all the points of P^{r-1}(F_q).

Proposition 4.3.7 Let r and s be nonnegative integers such that s ≤ r. The number of s-dimensional projective subspaces of P^r(F_q) is equal to the Gaussian binomial

  \binom{r+1}{s+1}_q = \frac{(q^{r+1}-1)(q^{r+1}-q) \cdots (q^{r+1}-q^s)}{(q^{s+1}-1)(q^{s+1}-q) \cdots (q^{s+1}-q^s)}.

In particular, the number of points of P^r(F_q) is equal to

  \binom{r+1}{1}_q = \frac{q^{r+1}-1}{q-1} = q^r + q^{r-1} + \cdots + q + 1.
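As a sanity check, the Gaussian binomial and the resulting point count of P^r(F_q) can be computed directly; the following is a minimal sketch (the function name is ours).

```python
def gaussian_binomial(n, k, q):
    # Number of k-dimensional F_q-linear subspaces of F_q^n:
    # ((q^n - 1)(q^n - q) ... (q^n - q^(k-1))) / ((q^k - 1)(q^k - q) ... (q^k - q^(k-1)))
    num = den = 1
    for i in range(k):
        num *= q**n - q**i
        den *= q**k - q**i
    return num // den

# The s-dimensional projective subspaces of P^r(F_q) are the (s+1)-dimensional
# subspaces of F_q^(r+1); for s = 0 this counts the points of P^r(F_q).
r, q = 2, 3
points = gaussian_binomial(r + 1, 1, q)
assert points == sum(q**i for i in range(r + 1))  # q^r + ... + q + 1 = 13 for P^2(F_3)
```

The integer division at the end is exact, since the quotient counts a finite set.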
Proof. An s-dimensional projective subspace of P^r(F_q) is an (s + 1)-dimensional subspace of F_q^{r+1}, which is an [r + 1, s + 1] code over F_q. The number of the latter objects is equal to the stated Gaussian binomial, by Proposition 2.5.2.

Definition 4.3.8 Let P = (P_1, . . . , P_n) be an n-tuple of points in P^r(F_q). Then P is called a projective system in P^r(F_q) if not all these points lie in a hyperplane. This system is called simple if the n points are mutually distinct.

Definition 4.3.9 A code C is called degenerate if there is a coordinate i such that c_i = 0 for all c ∈ C.

Remark 4.3.10 A code C is nondegenerate if and only if there is no zero column in a generator matrix of the code, if and only if d(C^⊥) ≥ 2.

Example 4.3.11 Let G be a generator matrix of a nondegenerate code C of dimension k. So G has no zero columns. Take the columns of G as homogeneous coordinates of points in P^{k-1}(F_q). This gives the projective system P_G of G. Conversely, let (P_1, . . . , P_n) be an enumeration of the points of a projective system P in P^r(F_q). Let (p_{0j} : p_{1j} : · · · : p_{rj}) be homogeneous coordinates of P_j. Let G_P be the (r + 1) × n matrix with (p_{0j}, p_{1j}, . . . , p_{rj})^T as its j-th column. Then G_P is a generator matrix of a nondegenerate code of length n and dimension r + 1, since not all points lie in a hyperplane.

Proposition 4.3.12 Let C be a nondegenerate code of length n with generator matrix G. Let P_G be the projective system of G. The code has generalized Hamming weight d_r if and only if n - d_r is the maximal number of points of P_G in a linear subspace of codimension r.
Proof. Let G = (g_{ij}) and P_j = (g_{1j} : · · · : g_{kj}). Then P = (P_1, . . . , P_n). Let D be a subspace of C of dimension r of minimal weight d_r. Let c_1, . . . , c_r be a basis of D. Then c_i = (c_{i1}, . . . , c_{in}) = h_i G for a nonzero h_i = (h_{i1}, . . . , h_{ik}) ∈ F_q^k. Let H_i be the hyperplane in P^{k-1}(F_q) with equation h_{i1} X_1 + · · · + h_{ik} X_k = 0. Then c_{ij} = 0 if and only if P_j ∈ H_i, for all 1 ≤ i ≤ r and 1 ≤ j ≤ n. Let H be the intersection of H_1, . . . , H_r. Then H is a linear subspace of codimension r, since c_1, . . . , c_r are linearly independent. Furthermore P_j ∈ H if and only if c_{ij} = 0 for all 1 ≤ i ≤ r, if and only if j ∉ supp(D). Hence n - d_r points lie in a linear subspace of codimension r. The proof of the converse is left to the reader.

Definition 4.3.13 A code C is called projective if d(C^⊥) ≥ 3.

Remark 4.3.14 A code of length n is projective if and only if G has no zero column and no column of G is a scalar multiple of another column of G, if and only if the projective system P_G is simple, for every generator matrix G of the code.

Definition 4.3.15 A map ϕ : P^r(F) → P^r(F) is called a projective transformation if ϕ is given by ϕ(x_0 : x_1 : · · · : x_r) = (y_0 : y_1 : · · · : y_r), where y_i = \sum_{j=0}^{r} a_{ij} x_j for all i = 0, . . . , r, for a given invertible matrix (a_{ij}) of size r + 1 with entries in F.

Remark 4.3.16 The map ϕ is well defined by ϕ(x) = y with y_i = \sum_{j=0}^{r} a_{ij} x_j, since the equations for the y_i are homogeneous in the x_j. The diagonal matrices λI_{r+1} induce the identity map on P^r(F) for all λ ∈ F*.

Definition 4.3.17 Let P = (P_1, . . . , P_n) and Q = (Q_1, . . . , Q_n) be two projective systems in P^r(F). They are called equivalent if there exists a projective transformation ϕ of P^r(F) and a permutation σ of {1, . . . , n} such that Q = (ϕ(P_{σ(1)}), . . . , ϕ(P_{σ(n)})).
Proposition 4.3.18 There is a one-to-one correspondence between generalized equivalence classes of nondegenerate [n, k, d] codes over F_q and equivalence classes of projective systems of n points in P^{k-1}(F_q).

Proof. The correspondence between codes and projective systems is given in Example 4.3.11. Let C be a nondegenerate code over F_q with parameters [n, k, d]. Let G be a generator matrix of C. Take the columns of G as homogeneous coordinates of points in P^{k-1}(F_q). This gives the projective system P_G of G. If G′ is another generator matrix of C, then G′ = AG for some invertible k × k matrix A with entries in F_q. Furthermore A induces a projective transformation ϕ of P^{k-1}(F_q) such that P_{G′} = ϕ(P_G). So P_{G′} and P_G are equivalent. Conversely, let P = (P_1, . . . , P_n) be a projective system in P^{k-1}(F_q). This gives the k × n generator matrix G_P of a nondegenerate code. Another enumeration of the points of P and another choice of the homogeneous coordinates of the P_j gives a permutation of the columns of G_P and nonzero scalar multiples of the columns, and therefore a generalized equivalent code.

Proposition 4.3.19 Every r-tuple of points in P^r(F_q) lies in a hyperplane.
Proof. Let P_1, . . . , P_r be r points in P^r(F_q). Let (p_{0j} : p_{1j} : · · · : p_{rj}) be the standard homogeneous coordinates of P_j. The r homogeneous equations Y_0 p_{0j} + Y_1 p_{1j} + · · · + Y_r p_{rj} = 0, j = 1, . . . , r, in the r + 1 variables Y_0, . . . , Y_r have a nonzero solution (h_0, . . . , h_r). Let H be the hyperplane with equation h_0 X_0 + · · · + h_r X_r = 0. Then P_1, . . . , P_r lie in H.
4.3.2 MDS codes and points in general position
***points in general position*** A second geometric proof of the Singleton bound is given by means of projective systems.

Corollary 4.3.20 (Singleton bound) The minimum distance d of a code of length n and dimension k is at most n - k + 1.

Proof. The zero code has parameters [n, 0, n + 1] by definition, and indeed this code satisfies the Singleton bound. If C is not the zero code, we may assume without loss of generality that the code is nondegenerate, by deleting the coordinates where all the codewords are zero. Let P be the projective system in P^{k-1}(F_q) of a generator matrix of the code. Then k - 1 points of the system lie in a hyperplane, by Proposition 4.3.19. Hence n - d ≥ k - 1, by Proposition 4.3.12.

The notion for projective systems that corresponds to MDS codes is the concept of general position.

Definition 4.3.21 A projective system of n points in P^r(F_q) is called in general position, or an n-arc, if no r + 1 points lie in a hyperplane.

Example 4.3.22 Let n = q + 1 and let a_1, a_2, . . . , a_{q-1} be an enumeration of the nonzero elements of F_q. Consider the code C with generator matrix

  G = ( a_1    a_2    . . .  a_{q-1}    0  0 )
      ( a_1^2  a_2^2  . . .  a_{q-1}^2  0  1 )
      ( 1      1      . . .  1          1  0 )

Then C is a [q + 1, 3, q - 1] code by Proposition 3.2.10. Let P_j = (a_j : a_j^2 : 1) for 1 ≤ j ≤ q - 1, and P_q = (0 : 0 : 1), P_{q+1} = (0 : 1 : 0). Let P = (P_1, . . . , P_n). Then P = P_G, and P is a projective system in the projective plane in general position. Remark that P is the set of all points in the projective plane with coordinates (x : y : z) in F_q that lie on the conic with equation X^2 = YZ.

Remark 4.3.23 If q is large enough with respect to n, then almost every projective system of n points in P^r(F_q) is in general position; equivalently, a random code over F_q of length n is MDS.

The following proposition and corollary show that every F_q-linear code with parameters [n, k, d] is contained in an F_{q^m}-linear MDS code with parameters [n, n - d + 1, d] if m is large enough.
Proposition 4.3.24 Let B be a q-ary code. If q^m > max{ \binom{n}{i} | 0 ≤ i ≤ t } and d(B^⊥) > t, then there exists a sequence { B_r | 0 ≤ r ≤ t } of q^m-ary codes such that B_{r-1} ⊆ B_r, and B_r is an [n, r, n - r + 1] code contained in the F_{q^m}-linear code generated by B, for all 0 ≤ r ≤ t.

Proof. The minimum distances of B^⊥ and (B ⊗ F_{q^m})^⊥ are the same. Induction on t is used. In case t = 0 there is nothing to prove: we can take B_0 = 0. Suppose the statement is proved for t. Let B be a code such that d(B^⊥) > t + 1 and suppose q^m > max{ \binom{n}{i} | 0 ≤ i ≤ t + 1 }. By induction we may assume that there is a sequence { B_r | 0 ≤ r ≤ t } of q^m-ary codes such that B_{r-1} ⊆ B_r ⊆ B ⊗ F_{q^m} and B_r is an [n, r, n - r + 1] code for all r, 0 ≤ r ≤ t. So B ⊗ F_{q^m} has a generator matrix G with entries g_{ij} for 1 ≤ i ≤ k and 1 ≤ j ≤ n, such that the first r rows of G give a generator matrix G_r of B_r. In particular the determinants of all t × t submatrices of G_t are nonzero, by Proposition 3.2.5. Let Δ(j_1, . . . , j_t) be the determinant of G_t(j_1, . . . , j_t), which is the matrix obtained from G_t by taking the columns numbered by j_1, . . . , j_t, where 1 ≤ j_1 < · · · < j_t ≤ n. For t < i ≤ k and 1 ≤ j_1 < · · · < j_{t+1} ≤ n we define Δ(i; j_1, . . . , j_{t+1}) to be the determinant of the (t + 1) × (t + 1) submatrix of G formed by taking the columns numbered by j_1, . . . , j_{t+1} and the rows numbered by 1, . . . , t, i. Now consider, for every (t + 1)-tuple j = (j_1, . . . , j_{t+1}) such that 1 ≤ j_1 < · · · < j_{t+1} ≤ n, the linear equation in the variables X_{t+1}, . . . , X_k given by

  \sum_{s=1}^{t+1} (-1)^s Δ(j_1, . . . , ĵ_s, . . . , j_{t+1}) ( \sum_{i>t} g_{i j_s} X_i ) = 0,

where (j_1, . . . , ĵ_s, . . . , j_{t+1}) is the t-tuple obtained from j by deleting the s-th element. Rewrite this equation by interchanging the order of summation as follows:

  \sum_{i>t} Δ(i; j) X_i = 0.

If for a given j the coefficients Δ(i; j) are zero for all i > t, then all the rows of the matrix G(j), which is the submatrix of G consisting of the columns numbered by j_1, . . . , j_{t+1}, are dependent on the first t rows of G(j). Thus rank(G(j)) ≤ t, so G has t + 1 columns which are dependent. But G is a parity check matrix for (B ⊗ F_{q^m})^⊥, therefore d((B ⊗ F_{q^m})^⊥) ≤ t + 1, which contradicts the assumption d(B^⊥) > t + 1. We have therefore proved that for a given (t + 1)-tuple j, at least one of the coefficients Δ(i; j) is nonzero. Therefore the above equation defines a hyperplane H(j) in a vector space over F_{q^m} of dimension k - t. We assumed q^m > \binom{n}{t+1}, so

  (q^m)^{k-t} > \binom{n}{t+1} (q^m)^{k-t-1}.

Therefore (F_{q^m})^{k-t} has more elements than the union of all \binom{n}{t+1} hyperplanes of the form H(j). Thus there exists an element (x_{t+1}, . . . , x_k) ∈ (F_{q^m})^{k-t} which does not lie in this union. Now consider the code B_{t+1} defined by the generator matrix G_{t+1} with entries g′_{lj}, 1 ≤ l ≤ t + 1, 1 ≤ j ≤ n, where

  g′_{lj} = g_{lj} if 1 ≤ l ≤ t, and g′_{lj} = \sum_{i>t} g_{ij} x_i if l = t + 1.
Then B_{t+1} is a subcode of B ⊗ F_{q^m}, and for every (t + 1)-tuple j the determinant of the corresponding (t + 1) × (t + 1) submatrix of G_{t+1} is equal to \sum_{i>t} Δ(i; j) x_i, which is not zero, since x is not an element of H(j). Thus B_{t+1} is an [n, t + 1, n - t] code.

Corollary 4.3.25 Suppose q^m > max{ \binom{n}{i} | 1 ≤ i ≤ d - 1 }. Let C be a q-ary code of minimum distance d. Then C is contained in a q^m-ary MDS code of the same minimum distance as C.

Proof. The corollary follows from Proposition 4.3.24 by taking B = C^⊥ and t = d - 1. Indeed, we have B_0 ⊆ B_1 ⊆ · · · ⊆ B_{d-1} ⊆ (C ⊗ F_{q^m})^⊥ for some F_{q^m}-linear codes B_r, r = 0, . . . , d - 1, with parameters [n, r, n - r + 1]. Applying Exercise 2.3.5 (1) we obtain C ⊗ F_{q^m} ⊆ B_{d-1}^⊥, so also C ⊆ B_{d-1}^⊥ holds. Now B_{d-1} is an F_{q^m}-linear MDS code, thus B_{d-1}^⊥ is MDS as well, and has parameters [n, n - d + 1, d] by Corollary 3.2.14.
4.3.3 Exercises
4.3.1 Give a proof of Remarks 4.3.10 and 4.3.14.

4.3.2 Let C be the binary [7, 3, 4] simplex code. Give a parity check matrix of a [7, 4, 4] MDS code D over F_4 that contains C as a subfield subcode.

4.3.3 ....
4.4 Extended weight enumerator
***Intro***
4.4.1 Arrangements of hyperplanes
***affine/projective arrangements*** The weight spectrum can be computed by counting points in certain configurations of a set of hyperplanes.

Definition 4.4.1 Let F be a field. A hyperplane in F^k is the set of solutions in F^k of a given linear equation a_1 X_1 + · · · + a_k X_k = b, where a_1, . . . , a_k and b are elements of F such that not all the a_i are zero. The hyperplane is called homogeneous if the equation is homogeneous, that is, b = 0.

Remark 4.4.2 The equations a_1 X_1 + · · · + a_k X_k = b and a′_1 X_1 + · · · + a′_k X_k = b′ define the same hyperplane if and only if (a′_1, . . . , a′_k, b′) = λ(a_1, . . . , a_k, b) for some nonzero λ ∈ F.
Definition 4.4.3 An n-tuple (H_1, . . . , H_n) of hyperplanes in F^k is called an arrangement in F^k. The arrangement is called simple if all the n hyperplanes are mutually distinct. The arrangement is called central if all the hyperplanes are linear subspaces. A central arrangement is called essential if the intersection of all its hyperplanes is equal to {0}.

Remark 4.4.4 A central arrangement of hyperplanes in F^{r+1} gives rise to an arrangement of hyperplanes in P^r(F), since the defining equations are homogeneous. The arrangement is essential if the intersection of all its hyperplanes is empty in P^r(F). The dual notion of an arrangement in projective space is a projective system.

Definition 4.4.5 Let G = (g_{ij}) be a generator matrix of a nondegenerate code C of dimension k. So G has no zero columns. Let H_j be the linear hyperplane in F_q^k with equation g_{1j} X_1 + · · · + g_{kj} X_k = 0. The arrangement (H_1, . . . , H_n) associated with G will be denoted by A_G.

Remark 4.4.6 Let G be a generator matrix of a code C. Then the rank of G is equal to the number of rows of G. Hence the arrangement A_G is essential. A code C is projective if and only if d(C^⊥) ≥ 3, if and only if A_G is simple. Similarly as in Definition 4.3.17 on equivalent projective systems, one defines the equivalence of the dual notion, that is, of essential arrangements of hyperplanes in P^r(F). Then there is a one-to-one correspondence between generalized equivalence classes of nondegenerate [n, k, d] codes over F_q and equivalence classes of essential arrangements of n hyperplanes in P^{k-1}(F_q), as in Proposition 4.3.18.

Example 4.4.7 Consider the matrix G given by

  G = ( 1 0 0 0 1 1 1 )
      ( 0 1 0 1 0 1 1 )
      ( 0 0 1 1 1 0 1 )

Let C be the code over F_q with generator matrix G. For q = 2, this is the simplex code S_2(2). The columns of G also represent the coefficients of the lines of A_G. The projective picture of A_G is given in Figure 4.1.
Proposition 4.4.8 Let C be a nondegenerate code with generator matrix G. Let c = xG be a codeword for some x ∈ F^k. Then wt(c) = n - (number of hyperplanes in A_G through x).

Proof. Now c = xG, so c_j = g_{1j} x_1 + · · · + g_{kj} x_k. Hence c_j = 0 if and only if x lies on the hyperplane H_j. The result follows, since the weight of c is equal to n minus the number of positions j such that c_j = 0.

Remark 4.4.9 The number A_w of codewords of weight w equals the number of points that are on exactly n - w of the hyperplanes in A_G, by Proposition 4.4.8. In particular A_n is equal to the number of points in the complement of
Figure 4.1: Arrangement of G for q odd and q even

the union of these hyperplanes in F_q^k. This number can be computed by the principle of inclusion/exclusion:

  A_n = q^k - |H_1 ∪ · · · ∪ H_n|
      = q^k + \sum_{w=1}^{n} (-1)^w \sum_{i_1 < · · · < i_w} |H_{i_1} ∩ · · · ∩ H_{i_w}|.
The following notations are introduced to find a formalism as above for the computation of the weight enumerator.

Definition 4.4.10 For a subset J of {1, 2, . . . , n} define

  C(J) = { c ∈ C | c_j = 0 for all j ∈ J },  l(J) = dim C(J),

and

  B_J = q^{l(J)} - 1  and  B_t = \sum_{|J|=t} B_J.

Remark 4.4.11 The encoding map x ↦ xG = c from vectors x ∈ F_q^k to codewords gives the following isomorphism of vector spaces:

  ∩_{j∈J} H_j ≅ C(J),

by Proposition 4.4.8. Furthermore B_J is equal to the number of nonzero codewords c that are zero at all j in J, and this is equal to the number of nonzero elements of the intersection ∩_{j∈J} H_j. The following two lemmas about the determination of l(J) will become useful later.

Lemma 4.4.12 Let C be a linear code with generator matrix G. Let J ⊆ {1, . . . , n} with |J| = t. Let G_J be the k × t submatrix of G consisting of the columns of G indexed by J, and let r(J) be the rank of G_J. Then the dimension l(J) is equal to k - r(J).
Proof. The code C_J is defined in 3.1.2 by restricting the codewords of C to J. Then G_J is a generator matrix of C_J by Remark 3.1.3. Consider the projection map π_J : C → F_q^t given by π_J(c) = c_J. Then π_J is a linear map. The image of C under π_J is C_J and the kernel of π_J is C(J), by definition. It follows that dim C_J + dim C(J) = dim C. So l(J) = k - r(J).

Lemma 4.4.13 Let k be the dimension of C. Let d and d^⊥ be the minimum distances of the code C and its dual code, respectively. Then

  l(J) = k - t  for all t < d^⊥,
  l(J) = 0      for all t > n - d.

Furthermore

  k - t ≤ l(J) ≤ k - d^⊥ + 1    for all t ≥ d^⊥,
  k - t ≤ l(J) ≤ n - d - t + 1  for all t ≤ n - d.

Proof. (1) Let t > n - d, let J be a subset of {1, . . . , n} of size t and let c be a codeword such that c ∈ C(J). Then J is contained in the complement of the support of c. Hence t ≤ n - wt(c), so wt(c) ≤ n - t < d. So c = 0. Therefore C(J) = 0 and l(J) = 0.
(2) Let J be a t-subset of {1, . . . , n}. Then C(J) is defined by t homogeneous linear equations on the vector space C of dimension k. So l(J) ≥ k - t.
(3) The matrix G is a parity check matrix for the dual code, by (2) of Corollary 2.3.29. Now suppose that t < d^⊥. Then any t columns of G are independent, by Proposition 2.3.11. So l(J) = k - t for all t-subsets J of {1, . . . , n}, by Lemma 4.4.12.
(4) Assume that t ≤ n - d. Let J be a t-subset. Let t′ = n - d + 1. Choose a t′-subset J′ such that J ⊆ J′. Then C(J′) = { c ∈ C(J) | c_j = 0 for all j ∈ J′ \ J }. Now l(J′) = 0 by (1). Hence C(J′) = 0, and C(J′) is obtained from C(J) by imposing |J′ \ J| = n - d - t + 1 linear homogeneous equations. Hence l(J) = dim C(J) ≤ n - d - t + 1.
(5) Assume that d^⊥ ≤ t. Let J be a t-subset. Let t′ = d^⊥ - 1. Choose a t′-subset J′ such that J′ ⊆ J. Then l(J′) = k - d^⊥ + 1 by (3), and l(J) ≤ l(J′), since J′ ⊆ J. Hence l(J) ≤ k - d^⊥ + 1.

Remark 4.4.14 Notice that d^⊥ ≤ n - (n - k) + 1 and n - d ≤ k - 1 by the Singleton bound. So for t = k both cases of Lemma 4.4.13 apply, and both give l(J) = 0.

Proposition 4.4.15 Let k be the dimension of C. Let d and d^⊥ be the minimum distances of the code C and its dual code, respectively. Then

  B_t = \binom{n}{t} (q^{k-t} - 1)  for all t < d^⊥,
  B_t = 0                           for all t > n - d.

Furthermore

  \binom{n}{t} (q^{k-t} - 1) ≤ B_t ≤ \binom{n}{t} (q^{min{n-d-t+1, k-d^⊥+1}} - 1)

for all d^⊥ ≤ t ≤ n - d.
Proof. This is a direct consequence of Lemma 4.4.13 and the definition of B_t.

Proposition 4.4.16 The following formula holds:

  B_t = \sum_{w=d}^{n-t} \binom{n-w}{t} A_w.

Proof. This is shown by counting the elements of the set of pairs

  { (J, c) | J ⊆ {1, 2, . . . , n}, |J| = t, c ∈ C(J), c ≠ 0 }

in two different ways, as in Lemma 4.1.19. For fixed J, the number of these pairs is equal to B_J, by definition. If we fix the weight w of a nonzero codeword c in C, then the number of zero entries of c is n - w, and if c ∈ C(J), then J is contained in the complement of the support of c, so there are \binom{n-w}{t} possible choices for such a J. In this way we get the right-hand side of the formula.

Theorem 4.4.17 The homogeneous weight enumerator of C can be expressed in terms of the B_t as follows:

  W_C(X, Y) = X^n + \sum_{t=0}^{n} B_t (X - Y)^t Y^{n-t}.

Proof. Now

  X^n + \sum_{t=0}^{n} B_t (X - Y)^t Y^{n-t} = X^n + \sum_{t=0}^{n-d} B_t (X - Y)^t Y^{n-t},

since B_t = 0 for all t > n - d by Proposition 4.4.15. Substituting the formula for B_t of Proposition 4.4.16, interchanging the order of summation in the double sum and applying the binomial expansion of ((X - Y) + Y)^{n-w} gives that the above formula is equal to

  X^n + \sum_{t=0}^{n-d} \sum_{w=d}^{n-t} \binom{n-w}{t} A_w (X - Y)^t Y^{n-t}
  = X^n + \sum_{w=d}^{n} A_w ( \sum_{t=0}^{n-w} \binom{n-w}{t} (X - Y)^t Y^{n-w-t} ) Y^w
  = X^n + \sum_{w=d}^{n} A_w X^{n-w} Y^w = W_C(X, Y).

Proposition 4.4.18 Let A_0, . . . , A_n be the weight spectrum of a code of minimum distance d. Then A_0 = 1, A_w = 0 if 0 < w < d, and

  A_w = \sum_{t=n-w}^{n-d} (-1)^{n+w+t} \binom{t}{n-w} B_t  if d ≤ w ≤ n.
Proof. This identity is proved by inverting the argument of the proof of the formula of Theorem 4.4.17 and using the binomial expansion of (X - Y)^t. This is left as an exercise. An alternative proof is given by the principle of inclusion/exclusion. A third proof can be obtained by using Proposition 4.4.16. A fourth proof is obtained by showing that the transformations of the B_t's into the A_w's and vice versa, given by the linear maps of Propositions 4.4.16 and 4.4.18, are each other's inverse. See Exercise 4.4.5.

Example 4.4.19 Consider the [7, 4, 3] Hamming code as in Examples 2.2.14 and ??. Then its dual is the [7, 3, 4] simplex code. Hence d = 3 and d^⊥ = 4. So B_t = \binom{7}{t}(2^{4-t} - 1) for all t < 4, and B_t = 0 for all t > 4, by Proposition 4.4.15. Of the 35 subsets J of size 4 there are exactly 7 with l(J) = 1, and l(J) = 0 for the 28 remaining subsets, by Exercise 2.3.4. Therefore B_4 = 7(2^1 - 1) = 7. To find the A_w we apply Proposition 4.4.18:

  B_0 = 15    A_3 = B_4                          = 7
  B_1 = 49    A_4 = B_3 - 4B_4                   = 7
  B_2 = 63    A_5 = B_2 - 3B_3 + 6B_4            = 0
  B_3 = 35    A_6 = B_1 - 2B_2 + 3B_3 - 4B_4     = 0
  B_4 = 7     A_7 = B_0 - B_1 + B_2 - B_3 + B_4  = 1

This is in agreement with Example 4.1.6.
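The inversion of Proposition 4.4.18 is easy to check numerically. The following minimal sketch recomputes the table above for the [7, 4, 3] Hamming code (the variable names are ours).

```python
from math import comb

n, d = 7, 3
B = [15, 49, 63, 35, 7]  # B_0, ..., B_4; B_t = 0 for t > n - d = 4

def A(w):
    # A_w = sum_{t=n-w}^{n-d} (-1)^(n+w+t) C(t, n-w) B_t  (Proposition 4.4.18)
    return sum((-1)**(n + w + t) * comb(t, n - w) * B[t]
               for t in range(n - w, n - d + 1))

assert [A(w) for w in range(d, n + 1)] == [7, 7, 0, 0, 1]
assert 1 + sum(A(w) for w in range(d, n + 1)) == 2**4  # all 16 codewords accounted for
```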
4.4.2 Weight distribution of MDS codes
*** Definition 4.4.20 Let C be a code of length n, dimension k, minimum distance d and dual minimum distance d^⊥. The genus of C is defined by g(C) = max{n + 1 - k - d, k + 1 - d^⊥}. ***Transfer to end of 3.2.1*** ***diagram of (un)known values of Bt (T ).****

Remark 4.4.21 The B_t are known as functions of the parameters [n, k]_q of the code for all t < d^⊥ and for all t > n - d. So B_t is unknown for the n - d - d^⊥ + 1 values of t such that d^⊥ ≤ t ≤ n - d. In particular the weight enumerator of an MDS code is completely determined by the parameters [n, k]_q of the code.

Proposition 4.4.22 The weight distribution of an MDS code of length n and dimension k is given by

  A_w = \binom{n}{w} \sum_{j=0}^{w-d} (-1)^j \binom{w}{j} (q^{w-d+1-j} - 1)

for w ≥ d = n - k + 1.

Proof. Let C be an [n, k, n - k + 1] MDS code. Then its dual is also an MDS code, with parameters [n, n - k, k + 1], by Proposition 3.2.7. Then B_t = \binom{n}{t}(q^{k-t} - 1)
for all t < d^⊥ = k + 1, and B_t = 0 for all t > n - d = k - 1, by Proposition 4.4.15. Hence

  A_w = \sum_{t=n-w}^{n-d} (-1)^{n+w+t} \binom{t}{n-w} \binom{n}{t} (q^{k-t} - 1)

by Proposition 4.4.18. Make the substitution j = t - n + w. Then the summation is from j = 0 to j = w - d. Furthermore

  \binom{t}{n-w} \binom{n}{t} = \binom{n}{w} \binom{w}{j}.

This gives the formula for A_w.
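The formula just derived is straightforward to evaluate. The minimal sketch below (the helper is ours) checks it for an MDS code with parameters [5, 3, 3] over F_4, for instance a Reed-Solomon code: the minimal-weight count is \binom{n}{d}(q - 1) and the whole distribution sums to q^k.

```python
from math import comb

def mds_Aw(n, k, q, w):
    # A_w = C(n,w) * sum_{j=0}^{w-d} (-1)^j C(w,j) (q^(w-d+1-j) - 1), with d = n-k+1
    d = n - k + 1
    if w == 0:
        return 1
    if w < d:
        return 0
    return comb(n, w) * sum((-1)**j * comb(w, j) * (q**(w - d + 1 - j) - 1)
                            for j in range(w - d + 1))

n, k, q = 5, 3, 4
d = n - k + 1
assert mds_Aw(n, k, q, d) == comb(n, d) * (q - 1)             # minimal-weight codewords
assert sum(mds_Aw(n, k, q, w) for w in range(n + 1)) == q**k  # total number of codewords
```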
Remark 4.4.23 Let C be an [n, k, n - k + 1] MDS code. Then the number of nonzero codewords of minimal weight is

  A_d = \binom{n}{d} (q - 1)

according to Proposition 4.4.22. This is in agreement with Remark 3.2.15.

Remark 4.4.24 The trivial codes with parameters [n, n, 1] and [n, 0, n + 1], and the repetition code and its dual with parameters [n, 1, n] and [n, n - 1, 2], are MDS codes of arbitrary length. But the length is bounded if 2 ≤ k, according to the following proposition.

Proposition 4.4.25 Let C be an MDS code over F_q of length n and dimension k. If k ≥ 2, then n ≤ q + k - 1.

Proof. Let C be an [n, k, n - k + 1] code such that 2 ≤ k. Then d + 1 = n - k + 2 ≤ n and

  A_{d+1} = \binom{n}{d+1} ( (q^2 - 1) - (d + 1)(q - 1) ) = \binom{n}{d+1} (q - 1)(q - d)

by Proposition 4.4.22. This implies that d ≤ q, since A_{d+1} ≥ 0. Now n = d + k - 1 ≤ q + k - 1.

Remark 4.4.26 Proposition 4.4.25 also holds for nonlinear codes. That is: if there exists an (n, q^k, n - k + 1) code such that k ≥ 2, then d = n - k + 1 ≤ q. This is proved by means of orthogonal arrays by Bush, as we will see in Section 5.5.1.

Corollary 4.4.27 (Bush bound) Let C be an MDS code over F_q of length n and dimension k. If k ≥ q, then n ≤ k + 1.

Proof. If n > k + 1, then C^⊥ is an MDS code of dimension n - k ≥ 2. Hence n ≤ q + (n - k) - 1 by Proposition 4.4.25. Therefore k < q.

Remark 4.4.28 The length of the repetition code is arbitrarily long. The length n of a q-ary MDS code of dimension k is at most q + k - 1 if 2 ≤ k, by Proposition 4.4.25. In particular the maximal length of an MDS code is a function of k and q, if k ≥ 2.
Definition 4.4.29 Let k ≥ 2. Let m(k, q) be the maximal length of an MDS code over F_q of dimension k.

Remark 4.4.30 So m(k, q) ≤ k + q - 1 if 2 ≤ k, and m(k, q) ≤ k + 1 if k ≥ q, by the Bush bound. We have seen in Proposition 3.2.10 that m(k, q) is at least q + 1 for all k and q. Let C be an [n, 2, n - 1] code. Then C is systematic at the first two positions, so we may assume that its generator matrix G is of the form

  G = ( 1 0 x_3 x_4 . . . x_n )
      ( 0 1 y_3 y_4 . . . y_n ).

The weight of all nonzero codewords is at least n - 1. Hence x_j ≠ 0 and y_j ≠ 0 for all 3 ≤ j ≤ n. The code is generalized equivalent to a code with x_j = 1, after dividing the j-th coordinate by x_j for j ≥ 3. Let g_i be the i-th row of G. If 3 ≤ j < l and y_j = y_l, then g_2 - y_j g_1 is a codeword of weight at most n - 2, which is a contradiction. So the y_j are mutually distinct nonzero elements of F_q, and n - 2 ≤ q - 1. Therefore m(2, q) = q + 1. Dually we get m(q - 1, q) = q + 1. In case q is even, m(3, q) is at least q + 2 by Example 3.2.12, and dually m(q - 1, q) ≥ q + 2. Later it will be shown, in Proposition 13.5.1, that these values are in fact optimal.

Remark 4.4.31 The MDS conjecture states that for a nontrivial [n, k, n - k + 1] MDS code over F_q we have n ≤ q + 2 if q is even and k = 3 or k = q - 1, and n ≤ q + 1 in all other cases. So it is conjectured that

  m(k, q) = q + 1 if 2 ≤ k ≤ q,
  m(k, q) = k + 1 if q < k,

except when q is even and k = 3 or k = q - 1, in which case m(3, q) = m(q - 1, q) = q + 2.
4.4.3 Extended weight enumerator
Definition 4.4.32 Let F_{q^m} be the extension field of F_q of degree m. Let C be an F_q-linear code of length n. The extension by scalars of C to F_{q^m} is the F_{q^m}-linear subspace of F_{q^m}^n generated by C, and will be denoted by C ⊗ F_{q^m}.

Remark 4.4.33 Let G be a generator matrix of the code C of length n over F_q. Then G is also a generator matrix of the F_{q^m}-linear code C ⊗ F_{q^m}. The dimension l(J) is equal to k - r(J) by Lemma 4.4.12, where r(J) is the rank of the k × t submatrix G_J of G consisting of the t columns indexed by J. This rank is equal to the number of pivots of G_J, so this rank does not change under an extension of F_q to F_{q^m}. So dim_{F_{q^m}} (C ⊗ F_{q^m})(J) = dim_{F_q} C(J). Hence the numbers B_J(q^m) and B_t(q^m) of the code C ⊗ F_{q^m} are equal to

  B_J(q^m) = q^{m·l(J)} - 1  and  B_t(q^m) = \sum_{|J|=t} B_J(q^m).

This motivates considering q^m as a variable in the following definitions.
Definition 4.4.34 Let C be an F_q-linear code of length n. Define

  B_J(T) = T^{l(J)} - 1  and  B_t(T) = \sum_{|J|=t} B_J(T).

The extended weight enumerator is defined by

  W_C(X, Y, T) = X^n + \sum_{t=0}^{n-d} B_t(T) (X - Y)^t Y^{n-t}.
Proposition 4.4.35 Let d and d^⊥ be the minimum distances of the code and the dual code, respectively. Then

  B_t(T) = \binom{n}{t} (T^{k-t} - 1)  for all t < d^⊥,
  B_t(T) = 0                           for all t > n - d.

Proof. This is a direct consequence of Lemma 4.4.13 and the definition of B_t(T).

Theorem 4.4.36 The extended weight enumerator of a linear code of length n and minimum distance d can be expressed as a homogeneous polynomial in X and Y of degree n with coefficients A_w(T) that are integral polynomials in T:

  W_C(X, Y, T) = \sum_{w=0}^{n} A_w(T) X^{n-w} Y^w,

where A_0(T) = 1, A_w(T) = 0 if 0 < w < d, and

  A_w(T) = \sum_{t=n-w}^{n-d} (-1)^{n+w+t} \binom{t}{n-w} B_t(T)  if d ≤ w ≤ n.
Proof. The proof is similar to the proof of Proposition 4.4.18 and is left as an exercise.

Remark 4.4.37 The definition of A_w(T) is consistent with the fact that A_w(q^m) is the number of codewords of weight w in C ⊗ F_{q^m}, and

  W_C(X, Y, q^m) = \sum_{w=0}^{n} A_w(q^m) X^{n-w} Y^w = W_{C ⊗ F_{q^m}}(X, Y),

by Proposition 4.4.18 and Theorem 4.4.36.

Proposition 4.4.38 The following formula holds:

  B_t(T) = \sum_{w=d}^{n-t} \binom{n-w}{t} A_w(T).

Proof. This is left as an exercise.
Remark 4.4.39 Using Theorem 4.4.36 it is immediate to find the weight distribution of a code over any extension F_{q^m} if one knows l(J) over the ground field F_q for all subsets J of {1, . . . , n}. Computing C(J) and l(J) for a fixed J is just linear algebra. The large complexity of computing the weight enumerator and the minimum distance in this way stems from the exponential growth of the number of all possible subsets of {1, . . . , n}.

Example 4.4.40 Consider the [7, 4, 3] Hamming code as in Example 4.4.19, but now over all extensions of the binary field. Then B_t(T) = \binom{7}{t}(T^{4-t} - 1) for all t < 4, B_t(T) = 0 for all t > 4, by Proposition 4.4.35, and B_4(T) = 7(T - 1). To find the A_w(T) we apply Theorem 4.4.36:

  A_3(T) = B_4(T)                                  = 7(T - 1)
  A_4(T) = B_3(T) - 4B_4(T)                        = 7(T - 1)
  A_5(T) = B_2(T) - 3B_3(T) + 6B_4(T)              = 21(T - 1)(T - 2)
  A_6(T) = B_1(T) - 2B_2(T) + 3B_3(T) - 4B_4(T)    = 7(T - 1)(T - 2)(T - 3)

Hence

  A_7(T) = B_0(T) - B_1(T) + B_2(T) - B_3(T) + B_4(T) = T^4 - 7T^3 + 21T^2 - 28T + 13.
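These polynomial identities can be verified numerically by evaluating the B_t(T) and the inversion of Theorem 4.4.36 at concrete values of T; a minimal sketch:

```python
from math import comb

n, k, d = 7, 4, 3

def B(t, T):
    # [7,4,3] Hamming code: B_t(T) = C(7,t)(T^(4-t) - 1) for t < 4, B_4(T) = 7(T - 1)
    if t < 4:
        return comb(7, t) * (T**(k - t) - 1)
    return 7 * (T - 1) if t == 4 else 0

def A(w, T):
    return sum((-1)**(n + w + t) * comb(t, n - w) * B(t, T)
               for t in range(n - w, n - d + 1))

for T in (2, 4, 16):  # F_2 and the extensions F_4, F_16
    assert A(3, T) == A(4, T) == 7 * (T - 1)
    assert A(5, T) == 21 * (T - 1) * (T - 2)
    assert A(6, T) == 7 * (T - 1) * (T - 2) * (T - 3)
    assert A(7, T) == T**4 - 7*T**3 + 21*T**2 - 28*T + 13
    assert 1 + sum(A(w, T) for w in range(d, n + 1)) == T**k  # T^4 codewords in total
```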
***factorize, example 4.1.8*** The following description of the extended weight enumerator of a code will be useful.

Proposition 4.4.41 The extended weight enumerator of a code of length n can be written as

  W_C(X, Y, T) = \sum_{t=0}^{n} \sum_{|J|=t} T^{l(J)} (X - Y)^t Y^{n-t}.

Proof. By rewriting ((X - Y) + Y)^n, we get

  \sum_{t=0}^{n} \sum_{|J|=t} T^{l(J)} (X - Y)^t Y^{n-t}
  = \sum_{t=0}^{n} (X - Y)^t Y^{n-t} \sum_{|J|=t} ((T^{l(J)} - 1) + 1)
  = \sum_{t=0}^{n} \binom{n}{t} (X - Y)^t Y^{n-t} + \sum_{t=0}^{n} (X - Y)^t Y^{n-t} \sum_{|J|=t} (T^{l(J)} - 1)
  = \sum_{t=0}^{n} \binom{n}{t} (X - Y)^t Y^{n-t} + \sum_{t=0}^{n} B_t(T) (X - Y)^t Y^{n-t}
  = X^n + \sum_{t=0}^{n} B_t(T) (X - Y)^t Y^{n-t}
  = W_C(X, Y, T).

***Examples, repetition code, Hamming, simplex, Golay, MDS code*** ***MacWilliams identity***
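The brute-force computation suggested by Remark 4.4.39 and Proposition 4.4.41 — determine l(J) = k - rank(G_J) for every subset J by linear algebra — can be sketched over F_2 as follows, for the [7, 4, 3] Hamming code of Example 4.4.40. Columns are encoded as integer bitmasks, and the rank routine is our own.

```python
from itertools import combinations

def rank_gf2(cols):
    # Rank over F_2 of the matrix with the given columns, each column an integer bitmask.
    basis = {}  # highest set bit -> reduced vector
    for v in cols:
        while v:
            hb = v.bit_length() - 1
            if hb not in basis:
                basis[hb] = v
                break
            v ^= basis[hb]
    return len(basis)

# Columns of a generator matrix G = (I_4 | P) of the [7,4,3] Hamming code:
n, k = 7, 4
G_cols = [0b0001, 0b0010, 0b0100, 0b1000, 0b1011, 0b1101, 0b1110]

def B(t, T):
    # B_t(T) = sum over all t-subsets J of (T^{l(J)} - 1), with l(J) = k - rank(G_J)
    return sum(T**(k - rank_gf2([G_cols[j] for j in J])) - 1
               for J in combinations(range(n), t))

# Matches the values B_0, ..., B_4 of Example 4.4.19 at T = 2:
assert [B(t, 2) for t in range(5)] == [15, 49, 63, 35, 7]
```

The exponential cost mentioned in Remark 4.4.39 shows up here as the loop over all \binom{n}{t} subsets.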
4.4.4 Puncturing and shortening
There are several ways to get new codes from existing ones. In this section, we will focus on puncturing and shortening of codes and show how they are used in an alternative algorithm for finding the extended weight enumerator. The algorithm is based on the Tutte-Grothendieck decomposition of matrices introduced by Brylawski [31]. Greene [59] used this decomposition for the determination of the weight enumerator.

Let C be a linear [n, k] code and let J ⊆ {1, . . . , n}. Then the code C punctured by J is obtained by deleting all the coordinates indexed by J from the codewords of C. The length of this punctured code is n - |J| and the dimension is at most k.

Let C be a linear [n, k] code and let J ⊆ {1, . . . , n}. If we puncture the code C(J) by J, we get the code C shortened by J. The length of this shortened code is n - |J| and the dimension is l(J).

The operations of puncturing and shortening a code are each other's dual: puncturing a code C by J and then taking the dual gives the same code as shortening C^⊥ by J.

We have seen that we can determine the extended weight enumerator of an [n, k] code C with the use of a k × n generator matrix of C. This concept can be generalized to arbitrary matrices, not necessarily of full rank.

Definition 4.4.42 Let F be a field. Let G be a k × n matrix over F, possibly of rank smaller than k and with zero columns. Then for each J ⊆ {1, . . . , n} we define

  l(J) = l(J, G) = k - r(G_J)

as in Lemma 7.4.37. Define the extended weight enumerator W_G(X, Y, T) as in Definition 4.4.34. We can now make the following remarks about W_G(X, Y, T).
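The puncturing and shortening operations, and the duality between them, can be checked on a toy example over F_2. The [4, 2] code below is our own illustration, not one from the text.

```python
from itertools import product

def span_f2(gens, n):
    # All F_2-linear combinations of the generators (codewords as length-n tuples).
    words = {(0,) * n}
    for g in gens:
        words |= {tuple((a + b) % 2 for a, b in zip(c, g)) for c in words}
    return words

def dual(C, n):
    return {v for v in product((0, 1), repeat=n)
            if all(sum(a * b for a, b in zip(v, c)) % 2 == 0 for c in C)}

def puncture(C, J):
    return {tuple(x for i, x in enumerate(c) if i not in J) for c in C}

def shorten(C, J):
    # Shortening: keep the codewords that vanish on J, then puncture by J.
    return puncture({c for c in C if all(c[j] == 0 for j in J)}, J)

n, J = 4, {3}
C = span_f2([(1, 0, 1, 0), (0, 1, 1, 1)], n)
# Duality of the two operations: (C punctured by J)^perp = C^perp shortened by J.
assert dual(puncture(C, J), n - len(J)) == shorten(dual(C, n), J)
```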
In particular, if G is a generator matrix of a [n, k] code C, we have WG (X, Y, T ) = WC (X, Y, T ). (iii) WG (X, Y, T ) is invariant under permutation of the columns of G. (iv) WG (X, Y, T ) is invariant under multiplying a column of G with an element of F∗ . (v) If G is the direct sum of G1 and G2 , i.e. of the form G1 0 , 0 G2 then WG (X, Y, T ) = WG1 (X, Y, T ) · WG2 (X, Y, T ).
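The duality between puncturing and shortening can be checked directly by brute force. The following Python sketch does this for a hypothetical binary [6, 3] code (the matrix G below is an illustration chosen for this sketch, not an example from the book):

```python
from itertools import product

# A hypothetical binary [6, 3] code (illustration only).
G = [(1, 0, 0, 1, 1, 0),
     (0, 1, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 1)]
n = 6

def f2_span(gens):
    """All F_2-linear combinations of the generator rows."""
    words = set()
    for coeffs in product([0, 1], repeat=len(gens)):
        word = tuple(sum(c * g[i] for c, g in zip(coeffs, gens)) % 2
                     for i in range(n))
        words.add(word)
    return words

def puncture(code, J):
    """Delete the coordinates indexed by J from every codeword."""
    keep = [i for i in range(n) if i not in J]
    return {tuple(c[i] for i in keep) for c in code}

def shorten(code, J):
    """Keep the codewords that are zero on J, then delete those positions."""
    keep = [i for i in range(n) if i not in J]
    return {tuple(c[i] for i in keep) for c in code
            if all(c[i] == 0 for i in J)}

def dual(code, length):
    """Brute-force dual code: all vectors orthogonal to every codeword."""
    return {v for v in product([0, 1], repeat=length)
            if all(sum(a * b for a, b in zip(v, c)) % 2 == 0 for c in code)}

C = f2_span(G)
J = {1, 4}
# Puncturing and shortening are each other's dual:
assert dual(puncture(C, J), n - len(J)) == shorten(dual(C, n), J)
```

The assertion realizes the duality statement above: taking the dual of C punctured by J gives the same code as shortening C^⊥ by J.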
CHAPTER 4. WEIGHT ENUMERATOR
Proof. (i) If we multiply G from the left with an invertible k × k matrix, the ranks r(G_J) do not change, and therefore (i) holds.

For (ii), we may assume without loss of generality that k ≥ l. Because G and G0 have the same row space, the ranks r(G_J) and r((G0)_J) are the same. So l(J, G) = k − l + l(J, G0). Using Proposition 4.4.41 we have

W_G(X, Y, T) = \sum_{t=0}^{n} \sum_{|J|=t} T^{l(J,G)} (X − Y)^t Y^{n−t}
             = \sum_{t=0}^{n} \sum_{|J|=t} T^{k−l+l(J,G0)} (X − Y)^t Y^{n−t}
             = T^{k−l} \sum_{t=0}^{n} \sum_{|J|=t} T^{l(J,G0)} (X − Y)^t Y^{n−t}
             = T^{k−l} W_{G0}(X, Y, T).

The last part of (ii) and (iii)–(v) follow directly from the definitions.
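Part (ii) can be verified numerically. The sketch below evaluates W_G at a sample point (X, Y, T) by summing T^{l(J)} (X − Y)^{|J|} Y^{n−|J|} over all subsets J, with l(J) = k − r(G_J) as in Definition 4.4.42; the 2 × 5 binary matrix is a hypothetical example:

```python
from itertools import combinations

def rank_f2(rows):
    """Rank over F_2 of a matrix given as a list of rows (Gaussian elimination)."""
    rows = [list(r) for r in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [(a + b) % 2 for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def W(G, X, Y, T):
    """Evaluate the extended weight enumerator of the matrix G at (X, Y, T),
    using l(J) = k - r(G_J) and the subset sum of Proposition 4.4.41."""
    k, n = len(G), len(G[0])
    total = 0
    for t in range(n + 1):
        for J in combinations(range(n), t):
            GJ = [[row[j] for j in J] for row in G]  # columns indexed by J
            total += T ** (k - rank_f2(GJ)) * (X - Y) ** t * Y ** (n - t)
    return total

# A hypothetical 2 x 5 binary matrix, and a 3 x 5 matrix with the same
# row space obtained by appending a dependent row.
G2 = [(1, 0, 1, 1, 0),
      (0, 1, 0, 1, 1)]
G3 = G2 + [tuple((a + b) % 2 for a, b in zip(G2[0], G2[1]))]

# Proposition 4.4.43(ii): W_{G3} = T^{3-2} W_{G2}.
X, Y, T = 3, 2, 4
assert W(G3, X, Y, T) == T * W(G2, X, Y, T)
```

The extra dependent row raises every l(J) by one, so the whole sum picks up a uniform factor T, exactly as part (ii) predicts.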
With the use of the extended weight enumerator for general matrices, we can derive a recursive algorithm to determine the extended weight enumerator of a code. Let G be a k × n matrix with entries in F. Suppose that the jth column is not the zero vector. Then there exists a matrix row equivalent to G such that the jth column is of the form (1, 0, . . . , 0)^T. Such a matrix is called reduced at the jth column. In general, this reduction is not unique.

Let G be a matrix that is reduced at the jth column a. The matrix G \ a is the k × (n − 1) matrix obtained from G by removing the column a, and G/a is the (k − 1) × (n − 1) matrix obtained from G by removing the column a and the first row. We can view G \ a as G punctured by a, and G/a as G shortened by a. For the extended weight enumerators of these matrices, we have the following connection (we omit the (X, Y, T) part for clarity):

Proposition 4.4.44 Let G be a k × n matrix that is reduced at the jth column a. Then

W_G = (X − Y) W_{G/a} + Y W_{G\a}.

Proof. We distinguish between two cases. First, assume that G \ a and G/a have the same rank. Then we can choose a G with all zeros in the first row, except for the 1 in the column a. So G is the direct sum of 1 and G/a. By Proposition 4.4.43 parts (v) and (ii) we have

W_G = (X + (T − 1)Y) W_{G/a}   and   W_{G\a} = T W_{G/a}.

Combining the two gives

W_G = (X + (T − 1)Y) W_{G/a} = (X − Y) W_{G/a} + Y T W_{G/a} = (X − Y) W_{G/a} + Y W_{G\a}.
For the second case, assume that G \ a and G/a do not have the same rank. So r(G \ a) = r(G/a) + 1. This implies that G and G \ a do have the same rank. By Proposition 4.4.41 we have

W_G(X, Y, T) = \sum_{t=0}^{n} \sum_{|J|=t} T^{l(J,G)} (X − Y)^t Y^{n−t}.

This double sum splits into the sum of two parts by distinguishing between the cases j ∈ J and j ∉ J.

Let j ∈ J, t = |J|, J′ = J \ {j} and t′ = |J′| = t − 1. Then l(J′, G/a) = k − 1 − r((G/a)_{J′}) = k − r(G_J) = l(J, G). So the first part is equal to

\sum_{t=0}^{n} \sum_{|J|=t, j∈J} T^{l(J,G)} (X − Y)^t Y^{n−t} = \sum_{t′=0}^{n−1} \sum_{|J′|=t′} T^{l(J′,G/a)} (X − Y)^{t′+1} Y^{n−1−t′},

which is equal to (X − Y) W_{G/a}.

Let j ∉ J. Then (G \ a)_J = G_J. So l(J, G \ a) = l(J, G). Hence the second part is equal to

\sum_{t=0}^{n} \sum_{|J|=t, j∉J} T^{l(J,G)} (X − Y)^t Y^{n−t} = Y \sum_{t=0}^{n−1} \sum_{|J|=t, j∉J} T^{l(J,G\a)} (X − Y)^t Y^{n−1−t},

which is equal to Y W_{G\a}.
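The recursion of Proposition 4.4.44 can be checked numerically against the subset-sum formula of Proposition 4.4.41. The sketch below does this for one hypothetical 3 × 4 binary matrix that is already reduced at its first column:

```python
from itertools import combinations

def rank_f2(rows):
    """Rank over F_2 of a matrix given as a list of rows."""
    rows = [list(r) for r in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [(a + b) % 2 for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def W(G, X, Y, T):
    """Extended weight enumerator of G at (X, Y, T) via Proposition 4.4.41."""
    k, n = len(G), len(G[0])
    total = 0
    for t in range(n + 1):
        for J in combinations(range(n), t):
            GJ = [[row[j] for j in J] for row in G]
            total += T ** (k - rank_f2(GJ)) * (X - Y) ** t * Y ** (n - t)
    return total

# A hypothetical 3 x 4 binary matrix reduced at its first column:
# that column equals (1, 0, 0)^T.
G = [(1, 1, 0, 1),
     (0, 1, 1, 0),
     (0, 0, 1, 1)]

G_del = [row[1:] for row in G]          # G \ a : remove the reduced column
G_con = [row[1:] for row in G[1:]]      # G / a : remove the column and first row

# Proposition 4.4.44: W_G = (X - Y) W_{G/a} + Y W_{G\a}.
X, Y, T = 5, 2, 3
assert W(G, X, Y, T) == (X - Y) * W(G_con, X, Y, T) + Y * W(G_del, X, Y, T)
```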
Theorem 4.4.45 Let G be a k × n matrix over F with n > k of the form G = (I_k | P), where P is a k × (n − k) matrix over F. Let A ⊆ [k] and write P_A for the matrix formed by the rows of P indexed by A. Let W_A(X, Y, T) = W_{P_A}(X, Y, T). Then the following holds:

W_C(X, Y, T) = \sum_{l=0}^{k} \sum_{|A|=l} Y^l (X − Y)^{k−l} W_A(X, Y, T).
Proof. We use the formula of the last proposition recursively. We denote the construction of G \ a by G_1 and the construction of G/a by G_2. Repeating this procedure, we get the matrices G_{11}, G_{12}, G_{21} and G_{22}. So we get for the weight enumerator

W_G = Y^2 W_{G_{11}} + Y(X − Y) W_{G_{12}} + Y(X − Y) W_{G_{21}} + (X − Y)^2 W_{G_{22}}.

Repeating this procedure k times, we get 2^k matrices with n − k columns and 0, . . . , k rows, which are exactly the P_A. In the diagram below are the sizes of the matrices of the first two steps; note that only the k × n matrix on top has to be of full rank. The number of matrices of size (k − i) × (n − j) is given by the binomial coefficient \binom{j}{i}.

k × n
k × (n − 1)        (k − 1) × (n − 1)
k × (n − 2)        (k − 1) × (n − 2)        (k − 2) × (n − 2)

On the last line we have W_0(X, Y, T) = X^{n−k}. This proves the formula.
Example 4.4.46 Let C be the even weight code of length n = 6 over F_2. Then a generator matrix of C is the 5 × 6 matrix G = (I_5 | P) with P = (1, 1, 1, 1, 1)^T. So the matrices P_A are l × 1 matrices with all ones. We have W_0(X, Y, T) = X and W_l(X, Y, T) = T^{l−1}(X + (T − 1)Y) by part (ii) of Proposition 4.4.43. Therefore the weight enumerator of C is equal to

W_C(X, Y, T) = W_G(X, Y, T)
= X(X − Y)^5 + \sum_{l=1}^{5} \binom{5}{l} Y^l (X − Y)^{5−l} T^{l−1} (X + (T − 1)Y)
= X^6 + 15(T − 1)X^4Y^2 + 20(T^2 − 3T + 2)X^3Y^3 + 15(T^3 − 4T^2 + 6T − 3)X^2Y^4 + 6(T^4 − 5T^3 + 10T^2 − 10T + 4)XY^5 + (T^5 − 6T^4 + 15T^3 − 20T^2 + 15T − 5)Y^6.

For T = 2 we get W_C(X, Y, 2) = X^6 + 15X^4Y^2 + 15X^2Y^4 + Y^6, which we indeed recognize as the weight enumerator of the even weight code that we found in Example 4.1.5.
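The polynomial of Example 4.4.46 can be sanity-checked in a few lines: at T = 2 its coefficients must be the weight distribution of the even weight code, and at T = q^m they must sum to the size of the extension code C ⊗ F_{q^m}.

```python
from itertools import product
from collections import Counter

# The even weight code of length 6 over F_2: all words of even weight.
C = [v for v in product([0, 1], repeat=6) if sum(v) % 2 == 0]
A = Counter(sum(v) for v in C)          # weight distribution of C

def A_T(T):
    """Coefficients A_w(T) read off from Example 4.4.46."""
    return [1,
            0,
            15 * (T - 1),
            20 * (T ** 2 - 3 * T + 2),
            15 * (T ** 3 - 4 * T ** 2 + 6 * T - 3),
            6 * (T ** 4 - 5 * T ** 3 + 10 * T ** 2 - 10 * T + 4),
            T ** 5 - 6 * T ** 4 + 15 * T ** 3 - 20 * T ** 2 + 15 * T - 5]

# At T = 2 this is the ordinary weight distribution of C ...
assert A_T(2) == [A[w] for w in range(7)] == [1, 0, 15, 0, 15, 0, 1]
# ... and at T = 4 the coefficients sum to |C (x) F_4| = 4^5.
assert sum(A_T(4)) == 4 ** 5
```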
4.4.5 Exercises

4.4.1 Compute the extended weight enumerator of the binary simplex code S_3(2).

4.4.2 Compute the extended weight enumerators of the n-fold repetition code and its dual.

4.4.3 Compute the extended weight enumerator of the binary Golay code.

4.4.4 Compute the extended weight enumerator of the ternary Golay code.

4.4.5 Consider the square matrices A and B of size n + 1 with entries a_{ij} and b_{ij}, respectively, given by

a_{ij} = (−1)^{i+j} \binom{i}{j}   and   b_{ij} = \binom{i}{j}   for 0 ≤ i, j ≤ n.

Show that A and B are inverses of each other.
4.4.6 Give a proof of Theorem 4.4.36.

4.4.7 Give a proof of Proposition 4.4.38.

4.4.8 Compare the complexity of the methods "exhaustive search" and "arrangements of hyperplanes" to compute the weight enumerator as a function of q and the parameters [n, k, d] and d^⊥.
4.5 Generalized weight enumerator

***Intro***

4.5.1 Generalized Hamming weights

We recall that for a linear code C, the minimum Hamming weight is the minimum of the Hamming weights wt(c) over all nonzero codewords c. In this subsection, we generalize this parameter to a sequence of values, the so-called generalized Hamming weights, which are useful in the study of the complexity of trellis decoding and other properties of the code C. ***C nondegenerate?***

Let D be a subcode of C. Generalizing Definition 2.2.2, we define the support of D, denoted by supp(D), as the set of positions where at least one codeword in D is nonzero, i.e.,

supp(D) = {i : there exists x ∈ D such that x_i ≠ 0}.

The weight of D, wt(D), is defined as the size of supp(D). Suppose C is an [n, k] code. For any r ≤ k, the rth generalized Hamming weight (GHW) of C is defined as

d_r(C) = min{wt(D) : D is an r-dimensional subcode of C}.

The set of GHWs {d_1(C), . . . , d_k(C)} is called the weight hierarchy of C. Note that since any 1-dimensional subcode has a nonzero codeword as its basis, the first generalized Hamming weight d_1(C) is exactly the minimum weight of C.

We now state several properties of generalized Hamming weights.

Proposition 4.5.1 (Monotonicity) For an [n, k] code C, the generalized Hamming weights satisfy

1 ≤ d_1(C) < d_2(C) < . . . < d_k(C) ≤ n.

Proof. For any 1 ≤ r ≤ k − 1, it is trivial to verify 1 ≤ d_r(C) ≤ d_{r+1}(C) ≤ n. Let D be a subcode of dimension r + 1 such that wt(D) = d_{r+1}(C). We choose any index i ∈ supp(D). Consider

E = {x ∈ D : x_i = 0}.
By Definition 3.1.13 and Proposition 3.1.15, E is a shortened code of D, and r ≤ dim(E) ≤ r + 1. However, by the choice of i, there exists a codeword c ∈ D with c_i ≠ 0. Thus, c cannot be a codeword of E. This implies that E is a proper subcode of D, that is, dim(E) = r. Now, by the definition of the GHWs, we have

d_r(C) ≤ wt(E) ≤ wt(D) − 1 = d_{r+1}(C) − 1.

This proves that d_r(C) < d_{r+1}(C).
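For a small code, the weight hierarchy can be computed by exhaustively enumerating subcodes, which makes the monotonicity and the Singleton-type bound below directly checkable. The following sketch uses a hypothetical binary [6, 3] code:

```python
from itertools import combinations

n, k = 6, 3
G = [(1, 0, 0, 1, 1, 0),
     (0, 1, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 1)]     # hypothetical binary [6, 3] code

def f2_span(gens):
    """The F_2-linear span of the given words."""
    words = {(0,) * n}
    for g in gens:
        words |= {tuple((a + b) % 2 for a, b in zip(w, g)) for w in words}
    return words

C = f2_span(G)

def ghw(r):
    """d_r(C) by brute force over all r-dimensional subcodes."""
    best = n
    for gens in combinations([c for c in C if any(c)], r):
        D = f2_span(gens)
        if len(D) == 2 ** r:                     # generators are independent
            support = {i for i in range(n) if any(c[i] for c in D)}
            best = min(best, len(support))
    return best

hierarchy = [ghw(r) for r in range(1, k + 1)]

# Monotonicity (Proposition 4.5.1):
assert all(d1 < d2 for d1, d2 in zip(hierarchy, hierarchy[1:]))
# Generalized Singleton bound (Proposition 4.5.2): d_r <= n - k + r.
assert all(hierarchy[r - 1] <= n - k + r for r in range(1, k + 1))
# d_1 is the minimum weight of C.
assert hierarchy[0] == min(sum(c) for c in C if any(c))
```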
Proposition 4.5.2 (Generalized Singleton Bound) For an [n, k] code C, we have d_r(C) ≤ n − k + r.

This bound on d_r(C) is a straightforward consequence of Proposition 4.5.1. When r = 1, we get the Singleton bound (see Theorem 3.2.1).

Let H be a parity check matrix of the [n, k] code C, which is an (n − k) × n matrix of rank n − k. From Proposition 2.3.11, we know that the minimum distance of C is the smallest integer d such that d columns of H are linearly dependent. We now present a generalization of this property. Let H_i, 1 ≤ i ≤ n, be the column vectors of H. For any subset I of {1, 2, . . . , n}, let ⟨H_i : i ∈ I⟩ be the subspace of F_q^{n−k} generated by the vectors H_i, i ∈ I, which, for simplicity, is denoted by V_I.

Lemma 4.5.3 The rth generalized Hamming weight of C is

d_r(C) = min{|I| : dim(⟨H_i : i ∈ I⟩) ≤ |I| − r}.

Proof. We denote V_I^⊥ = {x : x_i = 0 for i ∉ I, and \sum_{i∈I} x_i H_i = 0}. Then it is easy to see that dim(V_I) + dim(V_I^⊥) = |I|. Also, from the definition, for any I, V_I^⊥ is a subcode of C. Let D be a subcode of C with dim(D) = r and wt(D) = d_r(C). Let I = supp(D). Then D ⊆ V_I^⊥. This implies that dim(V_I) = |I| − dim(V_I^⊥) ≤ |I| − dim(D) = |I| − r. Therefore, d_r(C) = |supp(D)| = |I| ≥ min{|I| : dim(V_I) ≤ |I| − r}.

We now prove the reverse inequality. Denote d = min{|I| : dim(V_I) ≤ |I| − r}. Let I be a subset of {1, 2, . . . , n} such that dim(V_I) ≤ |I| − r and |I| = d. Then dim(V_I^⊥) ≥ r. Therefore, d_r(C) ≤ wt(V_I^⊥) ≤ |I| = d.

Proposition 4.5.4 (Duality) Let C be an [n, k] code. Then the weight hierarchy of its dual code C^⊥ is completely determined by the weight hierarchy of C; precisely,

{d_r(C^⊥) : 1 ≤ r ≤ n − k} = {1, 2, . . . , n} \ {n + 1 − d_s(C) : 1 ≤ s ≤ k}.

Proof. Look at the two sets {d_r(C^⊥) : 1 ≤ r ≤ n − k} and {n + 1 − d_s(C) : 1 ≤ s ≤ k}. Both are subsets of {1, 2, . . . , n}, and by the Monotonicity, the first one has size n − k and the second one has size k. Thus, it is sufficient to prove that these two sets are disjoint.

We now prove the equivalent fact that for any 1 ≤ r ≤ k, the value n + 1 − d_r(C) is not a generalized Hamming weight of C^⊥. Let t = n − k + r − d_r(C). It is sufficient to prove that d_t(C^⊥) < n + 1 − d_r(C) and that for any δ ≥ 1, d_{t+δ}(C^⊥) ≠ n + 1 − d_r(C).

Let D be a subcode of C with dim(D) = r and wt(D) = d_r(C). There exists a parity check matrix G for C^⊥ (which is a generator matrix for C) whose first r rows are words in D and whose last k − r rows are not. The column vectors {G_i : i ∉ supp(D)} have their first r coordinates zero. Thus,

dim(⟨G_i : i ∉ supp(D)⟩) = column rank of the matrix (G_i : i ∉ supp(D)) ≤ row rank of the matrix (R_i : r + 1 ≤ i ≤ k) ≤ k − r,

where R_i is the ith row vector of G. Let I = {1, 2, . . . , n} \ supp(D). Then |I| = n − d_r(C), and dim(⟨G_i : i ∈ I⟩) ≤ k − r = |I| − t. Thus, by Lemma 4.5.3, we have d_t(C^⊥) ≤ |I| = n − d_r(C) < n − d_r(C) + 1.

Next, we show d_{t+δ}(C^⊥) ≠ n + 1 − d_r(C). Suppose to the contrary that d_{t+δ}(C^⊥) = n + 1 − d_r(C) holds for some δ ≥ 1. Then by the definition of generalized Hamming weight, there exists a generator matrix H for C^⊥ (which is a parity check matrix for C) and d_r(C) − 1 positions 1 ≤ i_1, . . . , i_{d_r(C)−1} ≤ n, such that the coordinates of the first t + δ rows of H are all zero at these d_r(C) − 1 positions. Without loss of generality, we assume these positions are exactly the last d_r(C) − 1 positions n − d_r(C) + 2, . . . , n, and let I = {n − d_r(C) + 2, . . . , n}. Clearly, the last |I| column vectors of H span a space of dimension at most n − k − t − δ = d_r(C) − r − δ. By Lemma 4.5.3, d_s(C) ≤ d_r(C) − 1, where s = |I| − (d_r(C) − r − δ) = r + δ − 1 ≥ r. This contradicts the Monotonicity.
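The duality of Proposition 4.5.4 can be observed concretely on the [7, 4] Hamming code and its dual, the [7, 3] simplex code, whose weight hierarchies are small enough to enumerate by brute force:

```python
from itertools import product, combinations

n = 7
G = [(1, 0, 0, 0, 0, 1, 1),
     (0, 1, 0, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 1, 0),
     (0, 0, 0, 1, 1, 1, 1)]     # a generator matrix of the [7, 4] Hamming code

def f2_span(gens):
    words = {(0,) * n}
    for g in gens:
        words |= {tuple((a + b) % 2 for a, b in zip(w, g)) for w in words}
    return words

def weight_hierarchy(code, k):
    """(d_1, ..., d_k) by brute force over all subcodes."""
    nonzero = [c for c in code if any(c)]
    hierarchy = []
    for r in range(1, k + 1):
        best = n
        for gens in combinations(nonzero, r):
            D = f2_span(gens)
            if len(D) == 2 ** r:
                best = min(best,
                           len({i for i in range(n) if any(c[i] for c in D)}))
        hierarchy.append(best)
    return hierarchy

C = f2_span(G)
C_dual = {v for v in product([0, 1], repeat=n)
          if all(sum(a * b for a, b in zip(v, c)) % 2 == 0 for c in C)}

h = weight_hierarchy(C, 4)              # hierarchy of the Hamming code
h_dual = weight_hierarchy(C_dual, 3)    # hierarchy of the simplex code

# Proposition 4.5.4: the two hierarchies determine each other.
assert set(h_dual) == set(range(1, n + 1)) - {n + 1 - d for d in h}
assert h == [3, 5, 6, 7] and h_dual == [4, 6, 7]
```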
4.5.2 Generalized weight enumerators

The weight distribution is generalized in the following way. Instead of looking at words of C, we consider all the subcodes of C of a certain dimension r.

Definition 4.5.5 Let C be a linear code of length n. The number of subcodes with a given weight w and dimension r is denoted by A_w^{(r)}, that is,

A_w^{(r)} = |{D ⊆ C : dim D = r, wt(D) = w}|.

Together they form the rth generalized weight distribution of the code. The rth generalized weight enumerator W_C^{(r)}(X, Y) of C is the polynomial with the weight distribution as coefficients, that is,

W_C^{(r)}(X, Y) = \sum_{w=0}^{n} A_w^{(r)} X^{n−w} Y^w.

Remark 4.5.6 From this definition it follows that A_0^{(0)} = 1 and A_0^{(r)} = 0 for all 0 < r ≤ k. Furthermore, every 1-dimensional subspace of C contains q − 1 nonzero codewords, so (q − 1) A_w^{(1)} = A_w for 0 < w ≤ n. This means we can recover the original weight enumerator by using

W_C(X, Y) = W_C^{(0)}(X, Y) + (q − 1) W_C^{(1)}(X, Y).

Definition 4.5.7 We introduce the following notations:

[m, r]_q = \prod_{i=0}^{r−1} (q^m − q^i),
⟨r⟩_q = [r, r]_q,
\binom{k}{r}_q = [k, r]_q / ⟨r⟩_q.
Remark 4.5.8 In Proposition 2.5.2 it is shown that the first number, [m, r]_q, is equal to the number of m × r matrices of rank r over F_q. Hence the second number, ⟨r⟩_q, is the number of bases of F_q^r. The third number is the Gaussian binomial, and it represents the number of r-dimensional subspaces of F_q^k.

Definition 4.5.9 For J ⊆ {1, . . . , n} and an integer r ≥ 0 we define

B_J^{(r)} = |{D ⊆ C(J) : D subspace of dimension r}|,
B_t^{(r)} = \sum_{|J|=t} B_J^{(r)}.

Remark 4.5.10 Note that B_J^{(r)} = \binom{l(J)}{r}_q. For r = 0 this gives B_t^{(0)} = \binom{n}{t}. So we see that in general l(J) = 0 does not imply B_J^{(0)} = 0, because \binom{0}{0}_q = 1. But if r ≠ 0, we do have that l(J) = 0 implies B_J^{(r)} = 0, and hence B_t^{(r)} = 0 whenever l(J) = 0 for all J with |J| = t.
Proposition 4.5.11 Let d_r be the rth generalized Hamming weight of C, and d^⊥ the minimum distance of the dual code C^⊥. Then we have

B_t^{(r)} = \binom{n}{t} \binom{k−t}{r}_q   for all t < d^⊥,
B_t^{(r)} = 0                               for all t > n − d_r.

Proof. The first case is a direct corollary of Lemma 4.4.13, since there are \binom{n}{t} subsets J ⊆ {1, . . . , n} with |J| = t. The proof of the second case is analogous to the proof of the same lemma: let |J| = t with t > n − d_r and suppose there is a subspace D ⊆ C(J) of dimension r. Then J is contained in the complement of supp(D), so t ≤ n − wt(D). It follows that wt(D) ≤ n − t < d_r, which is impossible, so such a D does not exist. So B_J^{(r)} = 0 for all J with |J| = t and t > n − d_r, and therefore B_t^{(r)} = 0 for t > n − d_r.

We can check that the formula is well-defined: if t < d^⊥ then l(J) = k − t. If also t > n − d_r, we have t > n − d_r ≥ k − r by the generalized Singleton bound. This implies r > k − t = l(J), so \binom{k−t}{r}_q = 0.
The relation between B_t^{(r)} and A_w^{(r)} becomes clear in the next proposition.

Proposition 4.5.12 The following formula holds:

B_t^{(r)} = \sum_{w=0}^{n} \binom{n−w}{t} A_w^{(r)}.

Proof. We will count the elements of the set

\mathcal{B}_t^{(r)} = {(D, J) : J ⊆ {1, . . . , n}, |J| = t, D ⊆ C(J) subspace of dimension r}

in two different ways. For each J with |J| = t there are B_J^{(r)} pairs (D, J) in \mathcal{B}_t^{(r)}, so the total number of elements in this set is \sum_{|J|=t} B_J^{(r)} = B_t^{(r)}. On the other hand, let D be an r-dimensional subcode of C with wt(D) = w. There are A_w^{(r)} possibilities for such a D. If we want to find a J such that D ⊆ C(J), we have to pick t coordinates from the n − w all-zero coordinates of D. Summation over all w proves the given formula.

Note that because A_w^{(r)} = 0 for all w < d_r, we can start the summation at w = d_r. We can end the summation at w = n − t because for t > n − w we have \binom{n−w}{t} = 0. So the formula can be rewritten as

B_t^{(r)} = \sum_{w=d_r}^{n−t} \binom{n−w}{t} A_w^{(r)}.
In practice, we will often prefer the summation given in the proposition.

Theorem 4.5.13 The generalized weight enumerator is given by the following formula:

W_C^{(r)}(X, Y) = \sum_{t=0}^{n} B_t^{(r)} (X − Y)^t Y^{n−t}.

Proof. The proof is similar to the one given for Theorem 4.4.17 and is left as an exercise.

It is possible to determine the A_w^{(r)} directly from the B_t^{(r)}, by using the next proposition.

Proposition 4.5.14 The following formula holds:

A_w^{(r)} = \sum_{t=n−w}^{n} (−1)^{n+w+t} \binom{t}{n−w} B_t^{(r)}.

Proof. The proof is similar to the one given for Proposition 4.4.18 and is left as an exercise.
4.5.3 Generalized weight enumerators of MDS codes

We can use the theory of Sections 4.5.2 and 4.4.3 to calculate the weight distribution, generalized weight distribution, and extended weight distribution of a linear [n, k] code C. This is done by determining the values l(J) for each J ⊆ {1, . . . , n}. In general, we have to look at all 2^n subsets J of {1, . . . , n} to find the l(J), but for the special case of MDS codes we can find the weight distributions much faster.

Proposition 4.5.15 Let C be a linear [n, k] MDS code, and let J ⊆ {1, . . . , n} with |J| = t. Then we have

l(J) = 0       for t > k,
l(J) = k − t   for t ≤ k,

so for a given t the value of l(J) is independent of the choice of J.
Proof. We know that the dual of an MDS code is also MDS, so d^⊥ = k + 1. Now use d = n − k + 1 in Lemma 7.4.39.

Now that we know all the l(J) for an MDS code, it is easy to find the weight distribution.

Theorem 4.5.16 Let C be an MDS code with parameters [n, k], so d = n − k + 1 and d_r = n − k + r. Then the generalized weight distribution is given by

A_w^{(r)} = \binom{n}{w} \sum_{j=0}^{w−d_r} (−1)^j \binom{w}{j} \binom{w−d+1−j}{r}_q.

The coefficients of the extended weight enumerator are given by

A_w(T) = \binom{n}{w} \sum_{j=0}^{w−d} (−1)^j \binom{w}{j} (T^{w−d+1−j} − 1).

Proof. We will give the construction for the generalized weight enumerator here; the case of the extended weight enumerator goes similarly and is left as an exercise. We know from Proposition 4.5.15 that for an MDS code, B_t^{(r)} depends only on the size of J, so B_t^{(r)} = \binom{n}{t} \binom{k−t}{r}_q. Using this in the formula of Proposition 4.5.14 and substituting j = t − n + w, we have

A_w^{(r)} = \sum_{t=n−w}^{n−d_r} (−1)^{n+w+t} \binom{t}{n−w} B_t^{(r)}
          = \sum_{t=n−w}^{n−d_r} (−1)^{n+w+t} \binom{t}{n−w} \binom{n}{t} \binom{k−t}{r}_q
          = \sum_{j=0}^{w−d_r} (−1)^j \binom{n}{w} \binom{w}{j} \binom{k+w−n−j}{r}_q
          = \binom{n}{w} \sum_{j=0}^{w−d_r} (−1)^j \binom{w}{j} \binom{w−d+1−j}{r}_q.

In the second step, we are using the binomial equivalence

\binom{n}{t} \binom{t}{n−w} = \binom{n}{n−w} \binom{n−(n−w)}{t−(n−w)} = \binom{n}{w} \binom{w}{n−t}.

So, for all MDS codes with given parameters [n, k] the extended and generalized weight distributions are the same. But not all such codes are equivalent. We can conclude from this that the generalized and extended weight enumerators are not enough to distinguish between codes with the same parameters. We illustrate the non-equivalence of two MDS codes by an example.
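Theorem 4.5.16 can be checked against direct enumeration on a tiny MDS code. The sketch below uses a hypothetical [4, 2, 3] MDS code over F_5 (the generator matrix is an illustration, not taken from the book) and compares the formula for r = 1 with a brute-force count of 1-dimensional subcodes:

```python
from collections import Counter
from math import comb

q, n, k, d = 5, 4, 2, 3      # a hypothetical [4, 2, 3] MDS code over F_5
G = [(1, 0, 1, 1),
     (0, 1, 1, 2)]

C = {tuple((a * G[0][i] + b * G[1][i]) % q for i in range(n))
     for a in range(q) for b in range(q)}

def bracket(m, r):           # [m, r]_q from Definition 4.5.7
    out = 1
    for i in range(r):
        out *= q ** m - q ** i
    return out

def gauss(m, r):             # Gaussian binomial
    return bracket(m, r) // bracket(r, r) if 0 <= r <= m else 0

def A_formula(w, r):
    """A_w^{(r)} from Theorem 4.5.16; for MDS codes d_r = d + r - 1."""
    if w == 0:
        return 1 if r == 0 else 0
    d_r = d + r - 1
    return comb(n, w) * sum((-1) ** j * comb(w, j) * gauss(w - d + 1 - j, r)
                            for j in range(w - d_r + 1))

# Brute force for r = 1: each 1-dimensional subcode is the set of scalar
# multiples of a nonzero codeword.
lines = {frozenset(tuple((s * ci) % q for ci in c) for s in range(q))
         for c in C if any(c)}
A1 = Counter(len({i for i in range(n) if any(c[i] for c in D)}) for D in lines)

assert all(A1[w] == A_formula(w, 1) for w in range(n + 1))
assert A_formula(4, 2) == 1      # the whole code is the only 2-dim subcode
```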
Example 4.5.17 Let C be a linear [n, 3] MDS code over F_q. It is possible to write the generator matrix G of C in the following form:

( 1   1   . . .  1  )
( x_1 x_2 . . . x_n )
( y_1 y_2 . . . y_n )

Because C is MDS we have d = n − 2. We now view the n columns of G as points in the projective plane P^2(F_q), say P_1, . . . , P_n. The MDS property that every k columns of G are independent is now equivalent to saying that no three of these points are on a line. To see that these n points do not always determine an equivalent code, consider the following construction. Through the n points there are \binom{n}{2} = N lines, forming the set N. These lines determine (the generator matrix of) an [N, 3] code Ĉ. The minimum distance of the code Ĉ is equal to the total number of lines minus the maximum number of lines of N through an arbitrary point P ∈ P^2(F_q), by Proposition 4.4.8. If P ∉ {P_1, . . . , P_n} then the maximum number of lines of N through P is at most (1/2)n, since no three of the points P_1, . . . , P_n lie on a line. If P = P_i for some i ∈ {1, . . . , n} then P lies on exactly n − 1 lines of N, namely the lines P_iP_j for j ≠ i. Therefore the minimum distance of Ĉ is d = N − n + 1.

We have now constructed an [N, 3, N − n + 1] code Ĉ from the original code C. Notice that two codes Ĉ_1 and Ĉ_2 are generalized equivalent if C_1 and C_2 are generalized equivalent. The generalized and extended weight enumerators of an MDS code of length n and dimension k are completely determined by the pair (n, k), but this is not generally true for the weight enumerator of Ĉ.

Take for example n = 6 and q = 9, so Ĉ is a [15, 3, 10] code. Look at the codes C_1 and C_2 generated by the following matrices, respectively, where α ∈ F_9 is a primitive element:

( 1 1 1 1   1   1   )        ( 1 1 1 1   1   1   )
( 0 1 0 1   α^5 α^6 )        ( 0 1 0 α^7 α^4 α^6 )
( 0 0 1 α^3 α^3 α   )        ( 0 0 1 α^5 α   1   )

Being both MDS codes, the weight distribution is (1, 0, 0, 0, 120, 240, 368).
If we now apply the above construction, we get Ĉ_1 and Ĉ_2 generated by

( 1 0 0 1   1   α^4 α^6 α^3 α^7 α   1   α^2 1   α^7 1   )
( 0 1 0 α^7 1   0   0   α^4 1   1   0   α^6 α   1   α^3 )
( 0 0 1 1   0   1   1   1   0   0   1   1   1   1   1   )

and

( 1 0 0 α^7 α^2 α^3 α   0   α^7 α^7 α^4 α^7 α   0   0   )
( 0 1 0 1   0   α^3 0   α^6 α^6 0   α^7 α   α^6 α^3 α^5 )
( 0 0 1 α   α^5 α^6 α^3 α^7 α^4 α^3 α^5 α^2 α^4 α   α^5 )

respectively. The weight distributions of Ĉ_1 and Ĉ_2 are, respectively,

(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 16, 312, 288, 64)

and

(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 32, 264, 336, 48).

So the latter two codes are not generalized equivalent, and therefore not all [6, 3, 4] MDS codes over F_9 are generalized equivalent.
Another example was given in [110, 29] showing that two [6, 3, 4] MDS codes could have distinct covering radii.
4.5.4 Connections
There is a connection between the extended weight enumerator and the generalized weight enumerators. We first prove the next proposition.

Proposition 4.5.18 Let C be a linear [n, k] code over F_q, and let C^m be the linear space consisting of the m × n matrices over F_q whose rows are in C. Then there is an isomorphism of F_q-vector spaces between C ⊗ F_{q^m} and C^m.

Proof. Let α be a primitive element of F_{q^m}. Then we can write an element of F_{q^m} in a unique way on the basis (1, α, α^2, . . . , α^{m−1}) with coefficients in F_q. If we do this for all the coordinates of a word in C ⊗ F_{q^m}, we get an m × n matrix over F_q. The rows of this matrix are words of C, because C and C ⊗ F_{q^m} have the same generator matrix. This map is clearly injective. There are (q^m)^k = q^{km} words in C ⊗ F_{q^m}, and the number of elements of C^m is (q^k)^m = q^{km}, so our map is a bijection. It is given by

( \sum_{i=0}^{m−1} c_{i1} α^i, \sum_{i=0}^{m−1} c_{i2} α^i, . . . , \sum_{i=0}^{m−1} c_{in} α^i ) ↦

( c_{01}     c_{02}     c_{03}     . . . c_{0n}     )
( c_{11}     c_{12}     c_{13}     . . . c_{1n}     )
( ...        ...        ...        ...   ...        )
( c_{(m−1)1} c_{(m−1)2} c_{(m−1)3} . . . c_{(m−1)n} )

We see that the map is F_q-linear, so it gives an isomorphism C ⊗ F_{q^m} → C^m.

Note that this isomorphism depends on the choice of a primitive element α. We also need the next subresult.

Lemma 4.5.19 Let c ∈ C ⊗ F_{q^m} and M ∈ C^m the corresponding m × n matrix under a given isomorphism. Let D ⊆ C be the subcode generated by the rows of M. Then wt(c) = wt(D).

Proof. If the jth coordinate c_j of c is zero, then the jth column of M consists of only zeros, because the representation of c_j on the basis (1, α, α^2, . . . , α^{m−1}) is unique. On the other hand, if the jth column of M consists of all zeros, then c_j is also zero. Therefore wt(c) = wt(D).

Proposition 4.5.20 Let C be a linear code over F_q. Then the weight enumerator of an extension code and the generalized weight enumerators are connected via

A_w(q^m) = \sum_{r=0}^{m} [m, r]_q A_w^{(r)}.
Proof. We count the number of words in C ⊗ F_{q^m} of weight w in two ways, using the bijection of Proposition 4.5.18. The first way is just by substituting T = q^m in A_w(T): this gives the left side of the equation. For the second way, note that every M ∈ C^m generates a subcode of C whose weight is equal to the weight of the corresponding word in C ⊗ F_{q^m}. Fix this weight w and a dimension r: there are A_w^{(r)} subcodes of C of dimension r and weight w. Every such subcode is generated by an r × n matrix whose rows are words of C. Left multiplication by an m × r matrix of rank r gives an element of C^m which generates the same subcode of C, and all such elements of C^m are obtained this way. The number of m × r matrices of rank r is [m, r]_q, so summation over all dimensions r gives

A_w(q^m) = \sum_{r=0}^{k} [m, r]_q A_w^{(r)}.

We can let the summation run to m, because A_w^{(r)} = 0 for r > k and [m, r]_q = 0 for r > m. This proves the given formula.

In general, we have the following theorem.

Theorem 4.5.21 Let C be a linear code over F_q. Then the extended weight enumerator is determined by the generalized weight enumerators:

W_C(X, Y, T) = \sum_{r=0}^{k} ( \prod_{j=0}^{r−1} (T − q^j) ) W_C^{(r)}(X, Y).

Proof. If we know A_w^{(r)} for all r, we can determine A_w(q^m) for every m. If we have k + 1 values of m for which A_w(q^m) is known, we can use Lagrange interpolation to find A_w(T), for this is a polynomial in T of degree at most k. In fact, we have

A_w(T) = \sum_{r=0}^{k} ( \prod_{j=0}^{r−1} (T − q^j) ) A_w^{(r)}.
This formula has the right degree and is correct for T = q^m for all integers m ≥ 0, so we know it must be the correct polynomial. Therefore the theorem follows.

The converse of the theorem is also true: we can write the generalized weight enumerator in terms of the extended weight enumerator. To this end the following lemma is needed.

Lemma 4.5.22

\prod_{j=0}^{r−1} (Z − q^j) = \sum_{j=0}^{r} \binom{r}{j}_q (−1)^{r−j} q^{\binom{r−j}{2}} Z^j.

Proof. This identity can be proven by induction and is left as an exercise.

Theorem 4.5.23 Let C be a linear code over F_q. Then the rth generalized weight enumerator is determined by the extended weight enumerator:

W_C^{(r)}(X, Y) = (1/⟨r⟩_q) \sum_{j=0}^{r} (−1)^{r−j} q^{\binom{r−j}{2}} \binom{r}{j}_q W_C(X, Y, q^j).
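Although the proof of Lemma 4.5.22 is left as an exercise, the identity is easy to test numerically, since both sides are integers for integer Z. A short sketch:

```python
from math import comb

def bracket(q, m, r):
    """[m, r]_q = product of (q^m - q^i) for i = 0, ..., r-1."""
    out = 1
    for i in range(r):
        out *= q ** m - q ** i
    return out

def gauss(q, m, r):
    """Gaussian binomial coefficient."""
    return bracket(q, m, r) // bracket(q, r, r)

def lhs(q, r, Z):
    out = 1
    for j in range(r):
        out *= Z - q ** j
    return out

def rhs(q, r, Z):
    return sum(gauss(q, r, j) * (-1) ** (r - j) * q ** comb(r - j, 2) * Z ** j
               for j in range(r + 1))

# Check the identity of Lemma 4.5.22 at many sample points.
for q in (2, 3, 5):
    for r in range(5):
        for Z in (-1, 0, 1, 7, q ** 3):
            assert lhs(q, r, Z) == rhs(q, r, Z)
```

Since both sides are polynomials of degree r in Z, agreement at more than r sample points already forces equality of the polynomials for each fixed q and r.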
Proof. We consider the generalized weight enumerator in terms of Theorem 4.5.13. Using Remark ?? and rewriting gives the following:

W_C^{(r)}(X, Y) = \sum_{t=0}^{n} B_t^{(r)} (X − Y)^t Y^{n−t}
= \sum_{t=0}^{n} \sum_{|J|=t} \binom{l(J)}{r}_q (X − Y)^t Y^{n−t}
= \sum_{t=0}^{n} \sum_{|J|=t} \prod_{j=0}^{r−1} \frac{q^{l(J)} − q^j}{q^r − q^j} (X − Y)^t Y^{n−t}
= \frac{1}{\prod_{v=0}^{r−1} (q^r − q^v)} \sum_{t=0}^{n} \sum_{|J|=t} \prod_{j=0}^{r−1} (q^{l(J)} − q^j) (X − Y)^t Y^{n−t}
= \frac{1}{⟨r⟩_q} \sum_{t=0}^{n} \sum_{|J|=t} \sum_{j=0}^{r} \binom{r}{j}_q (−1)^{r−j} q^{\binom{r−j}{2}} q^{j·l(J)} (X − Y)^t Y^{n−t}
= \frac{1}{⟨r⟩_q} \sum_{j=0}^{r} (−1)^{r−j} q^{\binom{r−j}{2}} \binom{r}{j}_q \sum_{t=0}^{n} \sum_{|J|=t} (q^j)^{l(J)} (X − Y)^t Y^{n−t}
= \frac{1}{⟨r⟩_q} \sum_{j=0}^{r} (−1)^{r−j} q^{\binom{r−j}{2}} \binom{r}{j}_q W_C(X, Y, q^j).

In the fourth step Lemma 4.5.22 is used (with Z = q^{l(J)}).
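Theorem 4.5.23 can be verified numerically for a small binary code by computing both sides exactly in integers (clearing the denominator ⟨r⟩_q). The sketch below computes W_C(X, Y, q^j) from the subset-rank formula and W_C^{(r)}(X, Y) by brute-force subcode enumeration, for a hypothetical binary [6, 3] code:

```python
from itertools import combinations
from math import comb

q, n, k = 2, 6, 3
G = [(1, 0, 0, 1, 1, 0),
     (0, 1, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 1)]        # hypothetical binary [6, 3] code

def f2_span(gens):
    words = {(0,) * n}
    for g in gens:
        words |= {tuple((a + b) % 2 for a, b in zip(w, g)) for w in words}
    return words

C = f2_span(G)

def rank_f2(rows):
    rows = [list(r) for r in rows]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [(a + b) % 2 for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def W_ext(X, Y, T):
    """W_C(X, Y, T) via l(J) = k - r(G_J) (Proposition 4.4.41)."""
    return sum(T ** (k - rank_f2([[row[j] for j in J] for row in G]))
               * (X - Y) ** len(J) * Y ** (n - len(J))
               for t in range(n + 1) for J in combinations(range(n), t))

def W_gen(r, X, Y):
    """W_C^{(r)}(X, Y) by brute-force enumeration of r-dimensional subcodes."""
    if r == 0:
        return X ** n
    total, seen = 0, set()
    for gens in combinations([c for c in C if any(c)], r):
        D = frozenset(f2_span(gens))
        if len(D) == 2 ** r and D not in seen:
            seen.add(D)
            w = len({i for i in range(n) if any(c[i] for c in D)})
            total += X ** (n - w) * Y ** w
    return total

def bracket(m, r):
    out = 1
    for i in range(r):
        out *= q ** m - q ** i
    return out

# Theorem 4.5.23, multiplied through by <r>_q = bracket(r, r):
X, Y = 3, 2
for r in range(k + 1):
    rhs = sum((-1) ** (r - j) * q ** comb(r - j, 2)
              * (bracket(r, j) // bracket(j, j)) * W_ext(X, Y, q ** j)
              for j in range(r + 1))
    assert bracket(r, r) * W_gen(r, X, Y) == rhs
```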
4.5.5 Exercises

4.5.1 Give a proof of Theorem 4.5.16.

4.5.2 Compute the generalized weight enumerator of the binary Golay code.

4.5.3 Compute the generalized weight enumerator of the ternary Golay code.

4.5.4 Give a proof of Lemma 4.5.22.
4.6 Notes

Puncturing and shortening at arbitrary sets of positions and the duality theorem are from Simonis [?]. Golay code, Turyn [?] construction, Pless handbook [?], MacWilliams. ***–puncturing gives the binary [23,12,7] Golay code, which is cyclic. –automorphism group of (extended) Golay code. –(extended) ternary Golay code. –designs and Golay codes. –lattices and Golay codes.*** ***repeated decoding of product code (Hoeholdt-Justesen).*** ***Singleton defect s(C) = n + 1 − k − d; s(C) ≥ 0 and equality holds if and only if C is MDS. s(C) = 0 if and only if s(C^⊥) = 0. Example where s(C) = 1 and s(C^⊥) > 1. Almost MDS and near MDS. Genus g = max{s(C), s(C^⊥)} in 4.1. If k ≥ 2, then d ≤ q(s + 1). If k ≥ 3 and d = q(s + 1), then s + 1 ≤ q. Faldum-Willems, de Boer, Dodunekov-Langev, relation with Griesmer bound*** ***Incidence structures and geometric codes***
Chapter 5

Codes and related structures

Relinde Jurrius and Ruud Pellikaan

***In this chapter seemingly unrelated topics are discussed.***
5.1 Graphs and codes

5.1.1 Colorings of a graph
Graph theory is generally regarded to have started with the paper of Euler [57] and his solution of the problem of the Königsberg bridges. For an introduction to the theory of graphs we refer to [14, 136].

Definition 5.1.1 A graph Γ is a pair (V, E) where V is a nonempty set and E is a set disjoint from V. The elements of V are called vertices, and the elements of E are called edges. Edges are incident to one or two vertices, which are called the ends of the edge. If an edge is incident with exactly one vertex, then it is called a loop. If u and v are vertices that are incident with a common edge, then they are called neighbors or adjacent. Two edges are called parallel if they are incident with the same vertices. The graph is called simple if it has no loops and no parallel edges.
Figure 5.1: A planar graph
Definition 5.1.2 A graph is called planar if there is an injective map f : V → R^2 from the set of vertices V to the real plane such that for every edge e with ends u and v there is a simple curve in the plane connecting the ends of the edge, and mutually distinct simple curves do not intersect except at the endpoints. More formally: for every edge e with ends u and v there is an injective continuous map g_e : [0, 1] → R^2 from the unit interval to the plane such that {f(u), f(v)} = {g_e(0), g_e(1)}, and g_e((0, 1)) ∩ g_{e′}((0, 1)) = ∅ for all edges e, e′ with e ≠ e′.

Example 5.1.3 Consider the following riddle: Three newly built houses have to be connected to the three nearest terminals for gas, water and electricity. For security reasons, the connections are not allowed to cross. How can this be done? The answer is that it cannot be done, because the corresponding graph (see Figure 5.3) is not planar. This riddle is very suitable to occupy kids who like puzzles, but make sure to have an easily explainable proof of the impossibility. We leave it to the reader to find one.

Definition 5.1.4 Let Γ_1 = (V_1, E_1) and Γ_2 = (V_2, E_2) be graphs. A map ϕ : V_1 → V_2 is called a morphism of graphs if ϕ(v) and ϕ(w) are adjacent in Γ_2 for all v, w ∈ V_1 that are adjacent in Γ_1. The map is called an isomorphism of graphs if it is a morphism of graphs and there exists a map ψ : V_2 → V_1 that is also a morphism of graphs and is the inverse of ϕ. The graphs are called isomorphic if there is an isomorphism of graphs between them.
Remark 5.1.6 By deleting loops and parallel edges from a graph Γ one gets a simple graph. There is a choice in the process of deleting parallel edges, but the resulting graphs are all isomorphic. We call this simple graph the simplification ¯ of the graph and it is denoted by Γ. Definition 5.1.7 Let Γ = (V, E) be a graph. Let K be a finite set and k = K. The elements of K are called colors. A kcoloring of Γ is a map γ : V → K such that γ(u) 6= γ(v) for all distinct adjacent vertices u and v in V . So vertex u has color γ(u) and all other adjacent vertices have a color distinct from γ(u). Let PΓ (k) be the number of kcolorings of Γ. Then PΓ is called the chromatic polynomial of Γ. Remark 5.1.8 If the graph Γ has no edges, then PΓ (k) = k v where V  = v and K = k, since it is equal to the number of all maps from V to K. In particular there is no map from V to an empty set in case V is nonempty. So the number of 0colorings is zero for every graph. The number of colorings of graphs was studied by Birkhoff [16], Whitney[130, 129] and Tutte[121, 124, 125, 126, 127]. Much research on the chromatic polynomial was motivated by the fourcolor problem of planar graphs.
Example 5.1.9 Let K_n be the complete graph on n vertices, in which every pair of distinct vertices is connected by exactly one edge. Then there is no k-coloring if k < n. Now let k ≥ n. Take an enumeration of the vertices. Then there are k possible choices of a color for the first vertex and k − 1 choices for the second vertex, since the first and second vertex are adjacent. Now suppose by induction that we have a coloring of the first i vertices; then there are k − i possibilities to color the next vertex, since the (i + 1)th vertex is adjacent to the first i vertices. Hence

P_{K_n}(k) = k(k − 1) · · · (k − n + 1).

So P_{K_n}(k) is a polynomial in k of degree n.

Figure 5.2: Complete graph K_5

Proposition 5.1.10 Let Γ = (V, E) be a graph. Then P_Γ(k) is a polynomial in k.

Proof. See [16]. Let γ : V → K be a k-coloring of Γ with exactly i colors. Let σ be a permutation of K. Then the composition of maps σ ◦ γ is also a k-coloring of Γ with exactly i colors. Two such colorings are called equivalent. Then k(k − 1) · · · (k − i + 1) is the number of colorings in the equivalence class of a given k-coloring of Γ with exactly i colors. Let m_i be the number of equivalence classes of colorings with exactly i colors of the set K. Let v = |V|. Then

P_Γ(k) = m_1 k + m_2 k(k − 1) + . . . + m_i k(k − 1) · · · (k − i + 1) + . . . + m_v k(k − 1) · · · (k − v + 1).

Therefore P_Γ(k) is a polynomial in k.
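The counting argument of Example 5.1.9 can be checked mechanically: count proper colorings by brute force and compare with the falling factorial. A minimal sketch:

```python
from itertools import product

def chromatic(nv, edges, k):
    """P_Gamma(k) by brute force: count the proper maps V -> {0, ..., k-1}."""
    return sum(all(col[u] != col[v] for u, v in edges)
               for col in product(range(k), repeat=nv))

def complete_graph(nv):
    """Edge list of K_n: every pair of distinct vertices is joined."""
    return [(u, v) for u in range(nv) for v in range(u + 1, nv)]

def falling(k, nv):
    """k (k - 1) ... (k - nv + 1)."""
    out = 1
    for i in range(nv):
        out *= k - i
    return out

# P_{K_n}(k) = k (k - 1) ... (k - n + 1), as derived in Example 5.1.9.
for nv in range(1, 5):
    for k in range(6):
        assert chromatic(nv, complete_graph(nv), k) == falling(k, nv)
```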
Definition 5.1.11 A graph Γ = (V, E) is called bipartite if V is the disjoint union of two nonempty sets M and N such that every edge has one end in M and one end in N. Hence no two points in M are adjacent and no two points in N are adjacent. Let m and n be integers such that 1 ≤ m ≤ n. The complete bipartite graph Km,n is the graph on a set of vertices V that is the disjoint union of two sets M and N with |M| = m and |N| = n, and such that every vertex in M is connected with every vertex in N by a unique edge.

Another tool to show that PΓ(k) is a polynomial is the deletion-contraction of graphs, a process similar to the puncturing and shortening of codes from Section ??.
CHAPTER 5. CODES AND RELATED STRUCTURES
[Figure 5.3: Complete bipartite graph K3,3]

Definition 5.1.12 Let Γ = (V, E) be a graph. Let e be an edge that is incident to the vertices u and v. Then the deletion Γ \ e is the graph with vertices V and edges E \ {e}. The contraction Γ/e is the graph obtained by identifying u and v and deleting e. Formally this is defined as follows. Let ũ = ṽ = {u, v}, and w̃ = {w} if w ≠ u and w ≠ v. Let Ṽ = {w̃ : w ∈ V}. Then Γ/e is the graph (Ṽ, E \ {e}), where an edge f ≠ e is incident with w̃ in Γ/e if f is incident with w in Γ.

Remark 5.1.13 Notice that the number of k-colorings of Γ does not change by deleting loops and parallel edges. Hence the chromatic polynomials of Γ and its simplification Γ̄ are the same.

The following proposition is due to Foster. See the concluding note in [129].

Proposition 5.1.14 Let Γ = (V, E) be a simple graph. Let e be an edge of Γ. Then the following deletion-contraction formula holds:

PΓ(k) = PΓ\e(k) − PΓ/e(k)

for all positive integers k.

Proof. Let u and v be the vertices of e. Then u ≠ v, since the graph is simple. Let γ be a k-coloring of Γ \ e. Then γ is also a coloring of Γ if and only if γ(u) ≠ γ(v). If γ(u) = γ(v), then consider the induced map γ̃ on Ṽ defined by γ̃(ũ) = γ(u) and γ̃(w̃) = γ(w) if w ≠ u and w ≠ v. The map γ̃ gives a k-coloring of Γ/e. Conversely, every k-coloring of Γ/e gives a k-coloring γ of Γ \ e such that γ(u) = γ(v). Therefore PΓ\e(k) = PΓ(k) + PΓ/e(k). This also follows from a more general deletion-contraction formula for matroids that will be treated in Section 5.2.6 and Proposition ??.
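The deletion-contraction recursion of Proposition 5.1.14 can be turned into a small program. The following Python sketch (all function names are ours, not from the text) evaluates PΓ(k) both by brute force over all colorings and by the recursion, assuming a simple graph given by vertex and edge lists.

```python
from itertools import product

def count_colorings(vertices, edges, k):
    """Count proper k-colorings of a graph by brute force."""
    total = 0
    for assignment in product(range(k), repeat=len(vertices)):
        color = dict(zip(vertices, assignment))
        if all(color[u] != color[v] for u, v in edges):
            total += 1
    return total

def chromatic(vertices, edges, k):
    """Evaluate P_Gamma(k) via P_Gamma = P_{Gamma \\ e} - P_{Gamma/e}."""
    if not edges:
        return k ** len(vertices)                # edgeless graph: k^v colorings
    u, v = edges[0]
    rest = [f for f in edges[1:] if set(f) != {u, v}]   # drop copies parallel to e
    # Contraction Gamma/e: replace v by u everywhere, discard parallel duplicates.
    contracted = []
    for a, b in rest:
        a, b = (u if a == v else a), (u if b == v else b)
        if a != b and (a, b) not in contracted and (b, a) not in contracted:
            contracted.append((a, b))
    return (chromatic(vertices, rest, k)
            - chromatic([w for w in vertices if w != v], contracted, k))

# The complete graph K4: P(k) = k(k-1)(k-2)(k-3)
V, E = [0, 1, 2, 3], [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_colorings(V, E, 5), chromatic(V, E, 5))
```

For K4 both methods return 5 · 4 · 3 · 2 = 120 colorings with 5 colors, in agreement with Example 5.1.9.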
5.1.2 Codes on graphs

Definition 5.1.15 Let Γ = (V, E) be a graph. Suppose that V′ ⊆ V and E′ ⊆ E and all the endpoints of every e′ ∈ E′ are in V′. Then Γ′ = (V′, E′) is a graph and it is called a subgraph of Γ.

Definition 5.1.16 Two vertices u and v are connected by a path from u to v if there is a t-tuple of mutually distinct vertices (v1, . . . , vt) with u = v1 and v = vt, and a (t − 1)-tuple of mutually distinct edges (e1, . . . , et−1) such that ei is incident with vi and vi+1 for all 1 ≤ i < t. If moreover et is an edge that is incident with u and v and distinct from ei for all i < t, then (e1, . . . , et−1, et) is called a cycle. The length of the smallest cycle is called the girth of the graph and is denoted by γ(Γ).
Definition 5.1.17 The graph is called connected if every two vertices are connected by a path. A maximal connected subgraph of Γ is called a connected component of Γ. The vertex set V of Γ is a disjoint union of subsets Vi and the set of edges E is a disjoint union of subsets Ei such that Γi = (Vi, Ei) is a connected component of Γ. The number of connected components of Γ is denoted by c(Γ).

Definition 5.1.18 Let Γ = (V, E) be a finite graph. Suppose that V consists of m elements enumerated by v1, . . . , vm. Suppose that E consists of n elements enumerated by e1, . . . , en. The incidence matrix I(Γ) is an m × n matrix with entries aij defined by

aij = 1 if ej is incident with vi and vk for some i < k,
aij = −1 if ej is incident with vi and vk for some i > k,
aij = 0 otherwise.

Suppose moreover that Γ is simple. Then AΓ is the arrangement (H1, . . . , Hn) of hyperplanes, where Hj is given by the equation Xi − Xk = 0 if ej is incident with vi and vk with i < k. An arrangement A is called graphic if A is isomorphic with AΓ for some graph Γ.

***characteristic polynomial det(A − λI), Matrix tree theorem
Definition 5.1.19 The graph code of Γ over Fq is the Fq-linear code that is generated by the rows of the incidence matrix I(Γ). The cycle code CΓ of Γ is the dual of the graph code of Γ.

Remark 5.1.20 Let Γ be a finite graph without loops. Then the arrangement AΓ is isomorphic with ACΓ.

Proposition 5.1.21 Let Γ be a finite graph. Then CΓ is a code with parameters [n, k, d], where n = |E|, k = |E| − |V| + c(Γ) and d = γ(Γ).

Proof. See [14, Prop. 4.3].
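Proposition 5.1.21 can be checked directly on a small graph. The sketch below (helper names are ours) computes the cycle code over F2 as the null space of the incidence matrix; note that over F2 the signs in I(Γ) disappear, so a 0/1 incidence matrix suffices.

```python
from itertools import combinations, product

def cycle_code_f2(vertices, edges):
    """Codewords of the cycle code of a graph over F_2, i.e. the null
    space of its vertex-edge incidence matrix reduced mod 2."""
    M = [[1 if v in e else 0 for e in edges] for v in vertices]
    words = []
    for x in product([0, 1], repeat=len(edges)):
        if all(sum(m * xi for m, xi in zip(row, x)) % 2 == 0 for row in M):
            words.append(x)
    return words

V = [0, 1, 2, 3]
E = list(combinations(V, 2))              # K4 has 6 edges
C = cycle_code_f2(V, E)
k = len(C).bit_length() - 1               # dimension: |C| = 2^k
d = min(sum(c) for c in C if any(c))      # minimum weight
print(len(E), k, d)                       # n, k, d
```

For K4 this gives n = 6, k = 6 − 4 + 1 = 3 and d = γ(K4) = 3, matching the proposition.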
***Sparse graph codes: Gallager or low-density parity-check codes and Tanner graph codes play an important role in current research in coding theory. See [77, 99].
5.1.3 Exercises

5.1.1 Determine the chromatic polynomial of the complete bipartite graph K3,2.

5.1.2 Determine the parameters of the cycle code of the complete graph Km. Show that the code CK4 over F2 is equivalent to the punctured binary [7, 3, 4] simplex code.

5.1.3 Determine the parameters of the cycle code of the complete bipartite graph Km,n. Let C(m) be the dual of the m-fold repetition code. Show that CKm,n is equivalent to the product code C(m) ⊗ C(n).
5.2 Matroids and codes
Matroids were introduced by Whitney [130, 131] to axiomatize and generalize the notions of independence in linear algebra and of circuits in graph theory. In the theory of arrangements one uses the notion of a geometric lattice; in graph and coding theory one refers more often to matroids.
5.2.1 Matroids

Definition 5.2.1 A matroid M is a pair (E, I) consisting of a finite set E and a collection I of subsets of E such that the following three conditions hold.
(I.1) ∅ ∈ I.
(I.2) If J ⊆ I and I ∈ I, then J ∈ I.
(I.3) If I, J ∈ I and |I| < |J|, then there exists a j ∈ J \ I such that I ∪ {j} ∈ I.
A subset I of E is called independent if I ∈ I; otherwise it is called dependent. Condition (I.3) is called the independence augmentation axiom.

Remark 5.2.2 If J is a subset of E, then J has a maximal independent subset, that is, there exists an I ∈ I such that I ⊆ J and I is maximal with respect to this property and the inclusion. If I1 and I2 are maximal independent subsets of J, then |I1| = |I2| by condition (I.3). The rank or dimension of a subset J of E is the number of elements of a maximal independent subset of J and is denoted by r(J); the rank of the matroid M is r(M) = r(E). An independent set of rank r(M) is called a basis of M. The collection of all bases of M is denoted by B.

Example 5.2.3 Let n and k be nonnegative integers such that k ≤ n. Let Un,k be a set consisting of n elements and In,k = {I ⊆ Un,k : |I| ≤ k}. Then (Un,k, In,k) is a matroid, called the uniform matroid of rank k on n elements. A subset B of Un,k is a basis if and only if |B| = k. The matroid Un,n has no dependent sets and is called free.

Definition 5.2.4 Let (E, I) be a matroid. An element x in E is called a loop if {x} is a dependent set. Let x and y in E be two distinct elements that are not loops. Then x and y are called parallel if r({x, y}) = 1. The matroid is called simple if it has no loops and no parallel elements. Now Un,r is the only simple matroid of rank r if r ≤ 2.

Remark 5.2.5 Let G be a k × n matrix with entries in a field F. Let E be the set [n] indexing the columns of G and let IG be the collection of all subsets I of E such that the columns of G at the positions of I are independent. Then MG = (E, IG) is a matroid.
Suppose that F is a finite field and G1 and G2 are generator matrices of a code C. Then (E, IG1) = (E, IG2). So the matroid MC = (E, IC) of a code C is well defined by (E, IG) for some generator matrix G of C. If C is degenerate, then there is a position i such that ci = 0 for every codeword c ∈ C, and all such positions correspond one-to-one with loops of MC. Let C be nondegenerate. Then MC has no loops, and the positions i and j with i ≠ j are parallel in MC if and only if the i-th column of G is a scalar multiple of the j-th column. The code C is projective if and only if the arrangement AG is simple if and only if the matroid MC is simple. An [n, k] code C is MDS if and only if the matroid MC is the uniform matroid Un,k.
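The matroid MG of a matrix can be computed by brute force for small parameters. The sketch below (our own helper names, prime fields only) lists the independent column sets of a generator matrix; the chosen G generates a [4, 2] MDS code over F3, so the result is the uniform matroid U4,2, with all subsets of size at most 2 independent.

```python
from itertools import combinations

def rank_mod_p(M, p):
    """Rank of a matrix with integer entries over the prime field F_p."""
    M = [row[:] for row in M]
    rank = 0
    rows = len(M)
    cols = len(M[0]) if rows else 0
    for col in range(cols):
        piv = next((i for i in range(rank, rows) if M[i][col] % p), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        inv = pow(M[rank][col], p - 2, p)          # inverse via Fermat's little theorem
        M[rank] = [x * inv % p for x in M[rank]]
        for i in range(rows):
            if i != rank and M[i][col] % p:
                f = M[i][col]
                M[i] = [(a - f * b) % p for a, b in zip(M[i], M[rank])]
        rank += 1
    return rank

def independent_sets(G, p):
    """The matroid M_G of a k x n matrix G over F_p: all column subsets
    whose columns are linearly independent."""
    n = len(G[0])
    return [J for s in range(n + 1) for J in combinations(range(n), s)
            if rank_mod_p([[row[j] for j in J] for row in G], p) == len(J)]

G = [[1, 0, 1, 1],
     [0, 1, 1, 2]]                 # generator matrix of a [4,2] MDS code over F_3
indep = independent_sets(G, 3)
print(len(indep))                  # U_{4,2}: 1 + 4 + 6 = 11 independent sets
```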
Definition 5.2.6 Let M = (E, I) be a matroid. Let B be the collection of all bases of M. Define B⊥ = E \ B for B ∈ B, and B⊥ = {B⊥ : B ∈ B}. Define I⊥ = {I ⊆ E : I ⊆ B for some B ∈ B⊥}. Then (E, I⊥) is called the dual matroid of M and is denoted by M⊥.

Remark 5.2.7 The dual matroid is indeed a matroid. Let C be a code over a finite field. Then (MC)⊥ is isomorphic with MC⊥ as matroids. Let e be a loop of the matroid M. Then e is not a member of any basis of M. Hence e is in every basis of M⊥. An element of M is called an isthmus if it is an element of every basis of M. Hence e is an isthmus of M if and only if e is a loop of M⊥.

Proposition 5.2.8 Let (E, I) be a matroid with rank function r. Then the dual matroid has rank function r⊥ given by

r⊥(J) = |J| − r(E) + r(E \ J).

Proof. The proof is based on the observation that r(J) = max_{B∈B} |B ∩ J| and B \ J = B ∩ (E \ J).

r⊥(J) = max_{B∈B⊥} |B ∩ J|
= max_{B∈B} |(E \ B) ∩ J|
= max_{B∈B} |J \ B|
= |J| − min_{B∈B} |J ∩ B|
= |J| − (|B| − max_{B∈B} |B \ J|)
= |J| − r(E) + max_{B∈B} |B ∩ (E \ J)|
= |J| − r(E) + r(E \ J).
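Proposition 5.2.8 is easy to test on a small case. The sketch below (our code) builds rank functions from the bases via the observation r(J) = max_{B∈B} |B ∩ J| used in the proof, and checks the formula for every subset of the ground set of U5,2.

```python
from itertools import chain, combinations

def subsets(E):
    E = list(E)
    return chain.from_iterable(combinations(E, s) for s in range(len(E) + 1))

def rank_fn(bases):
    """Rank function from the bases of a matroid: r(J) = max_B |B ∩ J|,
    the observation used in the proof of Proposition 5.2.8."""
    def r(J):
        J = set(J)
        return max(len(J & B) for B in bases)
    return r

E = set(range(5))
bases = [set(B) for B in combinations(E, 2)]    # bases of U_{5,2}
dual_bases = [E - B for B in bases]             # bases of the dual, U_{5,3}
r, r_dual = rank_fn(bases), rank_fn(dual_bases)

# r_perp(J) = |J| - r(E) + r(E \ J) for every subset J of E
for J in subsets(E):
    assert r_dual(J) == len(J) - r(E) + r(E - set(J))
print("checked all", 2 ** len(E), "subsets")
```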
5.2.2 Realizable matroids

Definition 5.2.9 Let M1 = (E1, I1) and M2 = (E2, I2) be matroids. A map ϕ : E1 → E2 is called a morphism of matroids if ϕ(I) ∈ I2 for all I ∈ I1. The map is called an isomorphism of matroids if it is a morphism of matroids and there exists a map ψ : E2 → E1 that is a morphism of matroids and is the inverse of ϕ. The matroids are called isomorphic if there is an isomorphism of matroids between them.

Remark 5.2.10 A matroid M is called realizable or representable over the field F if there exists a matrix G with entries in F such that M is isomorphic with MG. ***six points in a plane is realizable over every field?, ***The Fano plane is realizable over F if and only if F has characteristic two. ***Pappus, Desargues configuration.
For more on representable matroids we refer to Tutte [123] and Whittle [132, 133]. Let gn be the number of simple matroids on n points. The values of gn are determined for n ≤ 8 by [18] and are given in the following table:

n   1  2  3  4  5  6   7    8
gn  1  1  2  4  9  26  101  950

Extended tables can be found in [51]. Clearly gn ≤ 2^(2^n). Asymptotically the number gn is given in [73] as follows:

log2 log2 gn ≤ n − log2 n + O(log2 log2 n),
log2 log2 gn ≥ n − (3/2) log2 n + O(log2 log2 n).

A crude upper bound on the number of k × n matrices with k ≤ n and entries in Fq is given by (n + 1)q^(n²). Hence the vast majority of all matroids on n elements is not representable over a given finite field for n → ∞.
5.2.3 Graphs and matroids

Definition 5.2.11 Let M = (E, I) be a matroid. A subset C of E is called a circuit if it is dependent and all its proper subsets are independent. A circuit of the dual matroid of M is called a cocircuit of M.

Proposition 5.2.12 Let C be the collection of circuits of a matroid. Then
(C.0) ∅ ∉ C.
(C.1) If C1, C2 ∈ C and C1 ⊆ C2, then C1 = C2.
(C.2) If C1, C2 ∈ C and C1 ≠ C2 and x ∈ C1 ∩ C2, then there exists a C3 ∈ C such that C3 ⊆ (C1 ∪ C2) \ {x}.

Proof. See [?, Lemma 1.1.3].

Condition (C.2) is called the circuit elimination axiom. The converse of Proposition 5.2.12 holds.

Proposition 5.2.13 Let C be a collection of subsets of a finite set E that satisfies the conditions (C.0), (C.1) and (C.2). Let I be the collection of all subsets of E that contain no member of C. Then (E, I) is a matroid with C as its collection of circuits.

Proof. See [?, Theorem 1.1.4].

Proposition 5.2.14 Let Γ = (V, E) be a finite graph. Let C be the collection of all subsets {e1, . . . , et} such that (e1, . . . , et) is a cycle in Γ. Then C is the collection of circuits of a matroid MΓ on E. This matroid is called the cycle matroid of Γ.

Proof. See [?, Proposition 1.1.7].
Remark 5.2.15 Loops in Γ correspond one-to-one to loops in MΓ. Two edges that are not loops are parallel in Γ if and only if they are parallel in MΓ. So Γ is simple if and only if MΓ is simple. Let e be in E. Then e is an isthmus in the graph Γ if and only if e is an isthmus in the matroid MΓ.

Remark 5.2.16 A matroid M is called graphic if M is isomorphic with MΓ for some graph Γ, and it is called cographic if M⊥ is graphic. If Γ is a planar graph, then the matroid MΓ is graphic by definition, but it is also cographic. Let Γ be a finite graph with incidence matrix I(Γ). This is a generator matrix for CΓ over a field F. Suppose that F is the binary field. Look at all the columns indexed by the edges of a cycle of Γ. Since every vertex in a cycle is incident with exactly two edges of the cycle, the sum of these columns is zero and therefore they are dependent. Removing a column gives an independent set of vectors. Hence the cycles in the matroid MCΓ coincide with the cycles in Γ. Therefore MΓ is isomorphic with MCΓ. One can generalize this argument to any field. Hence graphic matroids are representable over any field. The matroid of the binary Hamming [7, 4, 3] code is neither graphic nor cographic. Clearly the matroids MK5 and MK3,3 are graphic by definition, but they are not cographic. Tutte [122] found a characterization of graphic matroids.
5.2.4 Tutte and Whitney polynomial of a matroid

See [7, 8, 25, 26, 28, 34, 59, 68] for references for this section.

Definition 5.2.17 Let M = (E, I) be a matroid. Then the Whitney rank generating function RM(X, Y) is defined by

RM(X, Y) = Σ_{J⊆E} X^{r(E)−r(J)} Y^{|J|−r(J)}

and the Tutte-Whitney or Tutte polynomial by

tM(X, Y) = Σ_{J⊆E} (X − 1)^{r(E)−r(J)} (Y − 1)^{|J|−r(J)}.

In other words, tM(X, Y) = RM(X − 1, Y − 1).

Remark 5.2.18 Whitney [129] defined the coefficients mij of the polynomial RM(X, Y) such that

RM(X, Y) = Σ_{i=0}^{r(M)} Σ_{j=0}^{|M|} mij X^i Y^j,

but he did not define the polynomial RM(X, Y) as such. It is clear that these coefficients are nonnegative, since they count the number of elements of certain sets. The coefficients of the Tutte polynomial are also nonnegative, but this is not a trivial fact; it follows from the counting of certain internal and external bases of a matroid. See [56].
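For small matroids the Tutte polynomial can be evaluated straight from Definition 5.2.17. The sketch below (our naming) sums over all subsets of the ground set and checks the result against the Tutte polynomial X² + X + Y of the cycle matroid of a triangle, which is the uniform matroid U3,2.

```python
from itertools import chain, combinations

def tutte(E, r, x, y):
    """t_M(x, y) = sum over all J of (x-1)^(r(E)-r(J)) * (y-1)^(|J|-r(J))."""
    E = tuple(E)
    subs = chain.from_iterable(combinations(E, s) for s in range(len(E) + 1))
    return sum((x - 1) ** (r(E) - r(J)) * (y - 1) ** (len(J) - r(J)) for J in subs)

# Cycle matroid of a triangle = U_{3,2}: any two of the three edges are
# independent, all three together form a circuit.
r = lambda J: min(len(J), 2)
print(tutte(range(3), r, 1, 1))   # t(1,1) counts the bases (spanning trees): 3
```

A useful sanity check: t(1, 1) counts the bases of the matroid, here the 3 spanning trees of the triangle.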
5.2.5 Weight enumerator and Tutte polynomial
As we have seen, we can interpret a linear [n, k] code C over Fq as a matroid via the columns of a generator matrix G.

Proposition 5.2.19 Let C be an [n, k] code over Fq. Then the Tutte polynomial tC associated with the matroid MC of the code C is

tC(X, Y) = Σ_{t=0}^{n} Σ_{|J|=t} (X − 1)^{l(J)} (Y − 1)^{l(J)−(k−t)}.

Proof. This follows from l(J) = k − r(J) by Lemma 4.4.12 and r(M) = k.

This formula and Proposition 4.4.41 suggest the following connection between the weight enumerator and the Tutte polynomial. Greene [59] was the first to notice this connection.

Theorem 5.2.20 Let C be an [n, k] code over Fq with generator matrix G. Then the following holds for the Tutte polynomial and the extended weight enumerator:

WC(X, Y, T) = (X − Y)^k Y^{n−k} tC( (X + (T − 1)Y)/(X − Y), X/Y ).

Proof. By using Proposition 5.2.19 about the Tutte polynomial, rewriting, and Proposition 4.4.41 we get

(X − Y)^k Y^{n−k} tC( (X + (T − 1)Y)/(X − Y), X/Y )
= (X − Y)^k Y^{n−k} Σ_{t=0}^{n} Σ_{|J|=t} ( TY/(X − Y) )^{l(J)} ( (X − Y)/Y )^{l(J)−(k−t)}
= (X − Y)^k Y^{n−k} Σ_{t=0}^{n} Σ_{|J|=t} T^{l(J)} Y^{k−t} (X − Y)^{−(k−t)}
= Σ_{t=0}^{n} Σ_{|J|=t} T^{l(J)} (X − Y)^t Y^{n−t}
= WC(X, Y, T).

We use the extended weight enumerator here, because extending a code does not change the generator matrix and therefore does not change the matroid MG.

The converse of this theorem is also true: the Tutte polynomial is completely determined by the extended weight enumerator.

Theorem 5.2.21 Let C be an [n, k] code over Fq. Then the following holds for the extended weight enumerator and the Tutte polynomial:

tC(X, Y) = Y^n (Y − 1)^{−k} WC(1, Y^{−1}, (X − 1)(Y − 1)).
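Theorem 5.2.20 can be tested numerically. As test data (our choice, not from the text) take the q-ary [3, 1] repetition code: its matroid is U3,1 and its extended weight enumerator is X³ + (T − 1)Y³. The sketch below (our helper names) evaluates both sides of the theorem at rational points.

```python
from fractions import Fraction
from itertools import chain, combinations

def tutte(E, r, x, y):
    """Brute-force Tutte polynomial evaluation from a rank function."""
    E = tuple(E)
    subs = chain.from_iterable(combinations(E, s) for s in range(len(E) + 1))
    return sum((x - 1) ** (r(E) - r(J)) * (y - 1) ** (len(J) - r(J)) for J in subs)

# The q-ary [3,1] repetition code: matroid U_{3,1}, extended weight
# enumerator W_C(X, Y, T) = X^3 + (T - 1) Y^3.
n, k = 3, 1
r = lambda J: min(len(J), k)

for X in range(2, 6):
    for Y in range(1, 6):
        if X == Y:
            continue
        for T in range(1, 6):
            X_, Y_ = Fraction(X), Fraction(Y)
            lhs = X_ ** 3 + (T - 1) * Y_ ** 3
            rhs = ((X_ - Y_) ** k * Y_ ** (n - k)
                   * tutte(range(n), r, (X_ + (T - 1) * Y_) / (X_ - Y_), X_ / Y_))
            assert lhs == rhs
print("Theorem 5.2.20 verified at", 4 * 4 * 5, "points")
```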
Proof. The proof of this theorem goes analogously to the proof of the previous theorem.

Y^n (Y − 1)^{−k} WC(1, Y^{−1}, (X − 1)(Y − 1))
= Y^n (Y − 1)^{−k} Σ_{t=0}^{n} Σ_{|J|=t} ((X − 1)(Y − 1))^{l(J)} (1 − Y^{−1})^t Y^{−(n−t)}
= Σ_{t=0}^{n} Σ_{|J|=t} (X − 1)^{l(J)} (Y − 1)^{l(J)} Y^{−t} (Y − 1)^t Y^{−(n−t)} Y^n (Y − 1)^{−k}
= Σ_{t=0}^{n} Σ_{|J|=t} (X − 1)^{l(J)} (Y − 1)^{l(J)−(k−t)}
= tC(X, Y).

We see that the Tutte polynomial depends on two variables, while the extended weight enumerator depends on three variables. This is no problem, because the weight enumerator is given in its homogeneous form here: we can view the extended weight enumerator as a polynomial in two variables via WC(Z, T) = WC(1, Z, T). Greene [59] already showed that the Tutte polynomial determines the weight enumerator, but not the other way round. By using the extended weight enumerator, we get a two-way equivalence and the proof reduces to rewriting.

We can also give expressions for the generalized weight enumerator in terms of the Tutte polynomial, and the other way round. The first formula was found by Britz [28] and independently by Jurrius [68].

Theorem 5.2.22 For the generalized weight enumerator of an [n, k] code C and the associated Tutte polynomial we have that W^{(r)}_C(X, Y) is equal to

(1/⟨r⟩_q) Σ_{j=0}^{r} (−1)^{r−j} \binom{r}{j}_q q^{\binom{r−j}{2}} (X − Y)^k Y^{n−k} tC( (X + (q^j − 1)Y)/(X − Y), X/Y ).

And, conversely,

tC(X, Y) = Y^n (Y − 1)^{−k} Σ_{r=0}^{k} ( Π_{j=0}^{r−1} ((X − 1)(Y − 1) − q^j) ) W^{(r)}_C(1, Y^{−1}).
Proof. For the first formula, use Theorems 4.5.23 and 5.2.20. Use Theorems 4.5.21 and 5.2.21 for the second formula.
5.2.6 Deletion and contraction of matroids

Definition 5.2.23 Let M = (E, I) be a matroid of rank k. Let e be an element of E. Then the deletion M \ e is the matroid on the set E \ {e} with independent sets of the form I \ {e}, where I is independent in M. The contraction M/e is the matroid on the set E \ {e} with independent sets of the form I \ {e}, where I is independent in M and e ∈ I.
Remark 5.2.24 Let M be a graphic matroid, say M = MΓ for some finite graph Γ. Let e be an edge of Γ; then M \ e = MΓ\e and M/e = MΓ/e.

Remark 5.2.25 Let C be a code with generator matrix G that is reduced at position e, so that a = (1, 0, . . . , 0)^T is the column of G at position e. Then M \ e = MG\a and M/e = MG/a. A puncturing-shortening formula for the extended weight enumerator is given in Proposition 4.4.44. Since the extended weight enumerator and the Tutte polynomial of a code determine each other by Theorems 5.2.20 and 5.2.21, one expects that an analogous formula for the Tutte polynomial of matroids holds.

Proposition 5.2.26 Let M = (E, I) be a matroid. Let e ∈ E be an element that is not a loop and not an isthmus. Then the following deletion-contraction formula holds:

tM(X, Y) = tM\e(X, Y) + tM/e(X, Y).

Proof. See [119, 120, 125, 31].
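The deletion-contraction formula of Proposition 5.2.26 can be verified by brute force; in the sketch below (our helper code) the rank function of the contraction is taken to be r_{M/e}(J) = r(J ∪ {e}) − r({e}), and the check runs over the uniform matroid U4,2.

```python
from itertools import chain, combinations

def tutte(E, r, x, y):
    """Brute-force Tutte polynomial evaluation from a rank function."""
    E = tuple(E)
    subs = chain.from_iterable(combinations(E, s) for s in range(len(E) + 1))
    return sum((x - 1) ** (r(E) - r(J)) * (y - 1) ** (len(J) - r(J)) for J in subs)

# M = U_{4,2}; the element e = 3 is neither a loop nor an isthmus.
E, e = (0, 1, 2, 3), 3
r = lambda J: min(len(J), 2)                       # rank function of U_{4,2}
E_minus = tuple(x for x in E if x != e)
r_del = lambda J: r(J)                             # M \ e: restriction of r
r_con = lambda J: r(tuple(J) + (e,)) - r((e,))     # M/e: r(J + {e}) - r({e})

for x in range(4):
    for y in range(4):
        assert tutte(E, r, x, y) == (tutte(E_minus, r_del, x, y)
                                     + tutte(E_minus, r_con, x, y))
```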
5.2.7 MacWilliams type property for duality
For both codes and matroids we defined the dual structure. These objects obviously completely determine their dual. But what about the various polynomials associated to a code and a matroid? We know from Example 4.5.17 that the weight enumerator is a weaker invariant for a code than the code itself: there are nonequivalent codes with the same weight enumerator. So it is a priori not clear that the weight enumerator of a code completely determines the weight enumerator of its dual code. We already saw that there is in fact such a relation, namely the MacWilliams identity in Theorem 4.1.22. We will give a proof of this relation by considering the more general question for the extended weight enumerator. We will prove the MacWilliams identities using the Tutte polynomial. We do this because of the following simple and very useful relation between the Tutte polynomial of a matroid and its dual.

Theorem 5.2.27 Let tM(X, Y) be the Tutte polynomial of a matroid M, and let M⊥ be the dual matroid. Then

tM(X, Y) = tM⊥(Y, X).

Proof. Let M be a matroid on the set E. Then M⊥ is a matroid on the same set. In Proposition 5.2.8 we proved r⊥(J) = |J| − r(E) + r(E \ J). In particular, we have r⊥(E) + r(E) = |E|. Substituting this relation into the definition of the Tutte polynomial of the dual matroid gives

tM⊥(X, Y) = Σ_{J⊆E} (X − 1)^{r⊥(E)−r⊥(J)} (Y − 1)^{|J|−r⊥(J)}
= Σ_{J⊆E} (X − 1)^{r⊥(E)−|J|−r(E\J)+r(E)} (Y − 1)^{r(E)−r(E\J)}
= Σ_{J⊆E} (X − 1)^{|E\J|−r(E\J)} (Y − 1)^{r(E)−r(E\J)}
= tM(Y, X).
In the last step, we use that the summation over all J ⊆ E is the same as a summation over all E \ J ⊆ E. This proves the theorem.
If we consider a code as a matroid, then the dual matroid is the dual code. Therefore we can use the above theorem to prove the MacWilliams relations. Greene [59] was the first to use this idea; see also Brylawski and Oxley [33].
Theorem 5.2.28 (MacWilliams) Let C be a code and let C⊥ be its dual. Then the extended weight enumerator of C completely determines the extended weight enumerator of C⊥ and vice versa, via the following formula:

W_{C⊥}(X, Y, T) = T^{−k} WC(X + (T − 1)Y, X − Y, T).
Proof. Let M be the matroid associated to the code. Using the previous theorem and the relation between the weight enumerator and the Tutte polynomial, we find

T^{−k} WC(X + (T − 1)Y, X − Y, T)
= T^{−k} (TY)^k (X − Y)^{n−k} tC( X/Y, (X + (T − 1)Y)/(X − Y) )
= Y^k (X − Y)^{n−k} t_{C⊥}( (X + (T − 1)Y)/(X − Y), X/Y )
= W_{C⊥}(X, Y, T).
Notice in the last step that dim C ⊥ = n − k, and n − (n − k) = k.
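Specializing Theorem 5.2.28 to binary codes and T = q = 2 gives the classical MacWilliams identity W_{C⊥}(X, Y) = |C|^{−1} WC(X + Y, X − Y). The sketch below (helper names and the [3, 1] repetition code as test data are ours) verifies it at integer points.

```python
from itertools import product

def span_f2(gen_rows):
    """All codewords of the binary code generated by the given rows."""
    k, n = len(gen_rows), len(gen_rows[0])
    words = set()
    for coeffs in product([0, 1], repeat=k):
        word = tuple(sum(c * g[i] for c, g in zip(coeffs, gen_rows)) % 2
                     for i in range(n))
        words.add(word)
    return words

def weight_enum(code, X, Y):
    """Homogeneous weight enumerator W_C(X, Y) = sum over codewords c of
    X^(n - wt(c)) * Y^wt(c)."""
    n = len(next(iter(code)))
    return sum(X ** (n - sum(c)) * Y ** sum(c) for c in code)

C = span_f2([[1, 1, 1]])                      # binary [3,1,3] repetition code
C_dual = span_f2([[1, 1, 0], [0, 1, 1]])      # its dual: the [3,2,2] even-weight code

# MacWilliams: |C| * W_{C^perp}(X, Y) = W_C(X + Y, X - Y)
for X in range(5):
    for Y in range(5):
        assert weight_enum(C_dual, X, Y) * len(C) == weight_enum(C, X + Y, X - Y)
```

Here WC(X, Y) = X³ + Y³ and W_{C⊥}(X, Y) = X³ + 3XY², and indeed (1/2)((X + Y)³ + (X − Y)³) = X³ + 3XY².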
We can use the relations in Theorems 4.5.21 and 4.5.23 to prove the MacWilliams identities for the generalized weight enumerator.
Theorem 5.2.29 Let C be a code and let C ⊥ be its dual. Then the generalized weight enumerators of C completely determine the generalized weight enumerators of C ⊥ and vice versa, via the following formula:
W^{(r)}_{C⊥}(X, Y) = Σ_{j=0}^{r} Σ_{l=0}^{j} (−1)^{r−j} ( q^{\binom{r−j}{2}−j(r−j)−l(j−l)−jk} / (⟨r−j⟩_q ⟨j−l⟩_q) ) W^{(l)}_C(X + (q^j − 1)Y, X − Y).
Proof. We write the generalized weight enumerator in terms of the extended weight enumerator, use the MacWilliams identity for the extended weight enumerator, and convert back to the generalized weight enumerator:

W^{(r)}_{C⊥}(X, Y)
= (1/⟨r⟩_q) Σ_{j=0}^{r} (−1)^{r−j} \binom{r}{j}_q q^{\binom{r−j}{2}} W_{C⊥}(X, Y, q^j)
= Σ_{j=0}^{r} (−1)^{r−j} ( q^{\binom{r−j}{2}−j(r−j)−jk} / (⟨j⟩_q ⟨r−j⟩_q) ) WC(X + (q^j − 1)Y, X − Y, q^j)
= Σ_{j=0}^{r} (−1)^{r−j} ( q^{\binom{r−j}{2}−j(r−j)−jk} / (⟨j⟩_q ⟨r−j⟩_q) ) Σ_{l=0}^{j} ( ⟨j⟩_q q^{−l(j−l)} / ⟨j−l⟩_q ) W^{(l)}_C(X + (q^j − 1)Y, X − Y)
= Σ_{j=0}^{r} Σ_{l=0}^{j} (−1)^{r−j} ( q^{\binom{r−j}{2}−j(r−j)−l(j−l)−jk} / (⟨r−j⟩_q ⟨j−l⟩_q) ) W^{(l)}_C(X + (q^j − 1)Y, X − Y).

This theorem was proved by Kløve [72], although the proof uses only half of the relations between the generalized weight enumerator and the extended weight enumerator. Using both makes the proof much shorter.
5.2.8 Exercises

5.2.1 Give a proof of the statements in Remark 5.2.2.

5.2.2 Give a proof of the statements in Remark 5.2.7.

5.2.3 Show that all matroids on at most 3 elements are graphic. Give an example of a matroid that is not graphic.
5.3 Geometric lattices and codes

***Intro***
5.3.1 Posets, the Möbius function and lattices

Definition 5.3.1 Let L be a set and ≤ a relation on L such that:
(PO.1) x ≤ x for all x in L (reflexive),
(PO.2) if x ≤ y and y ≤ x, then x = y, for all x, y in L (antisymmetric),
(PO.3) if x ≤ y and y ≤ z, then x ≤ z, for all x, y and z in L (transitive).
Then the pair (L, ≤), or just L, is called a poset with partial order ≤ on the set L. Define x < y if x ≤ y and x ≠ y. The elements x and y in L are comparable if x ≤ y or y ≤ x. A poset L is called a linear order if every two elements are comparable. Define L_x = {y ∈ L : x ≤ y} and L^x = {y ∈ L : y ≤ x}, and the interval between x and y by [x, y] = {z ∈ L : x ≤ z ≤ y}. Notice that [x, y] = L_x ∩ L^y.
Definition 5.3.2 Let (L, ≤) be a poset. A chain of length r from x to y in L is a sequence of elements x0, x1, . . . , xr in L such that x = x0 < x1 < · · · < xr = y. Let r be a nonnegative integer and let x, y be in L. Then cr(x, y) denotes the number of chains of length r from x to y. The number cr(x, y) is finite if L is finite. The poset is called locally finite if cr(x, y) is finite for all x, y ∈ L and every number r.

Proposition 5.3.3 Let L be a locally finite poset. Then
(C.0) cr(x, y) = 0 for all r if x and y are not comparable.
(C.1) c0(x, x) = 1, cr(x, x) = 0 for all r > 0, and c0(x, y) = 0 if x < y.
(C.2) c_{r+1}(x, y) = Σ_{x≤z<y} cr(x, z) = Σ_{x<z≤y} cr(z, y).

Definition 5.3.4 Let L be a locally finite poset. The Möbius function µ of L is defined by

µ(x, y) = Σ_{r=0}^{∞} (−1)^r cr(x, y).

Proposition 5.3.5 Let L be a locally finite poset. Then for all x, y in L:
(M.0) µ(x, y) = 0 if x and y are not comparable.
(M.1) µ(x, x) = 1.
(M.2) If x < y, then Σ_{x≤z≤y} µ(x, z) = Σ_{x≤z≤y} µ(z, y) = 0.
(M.3) If x < y, then µ(x, y) = −Σ_{x≤z<y} µ(x, z) = −Σ_{x<z≤y} µ(z, y).

Proof. (M.0) and (M.1) follow from (C.0) and (C.1) of Proposition 5.3.3, respectively. (M.2) is clearly equivalent with (M.3).
(M.3) If x < y, then c0(x, y) = 0. So

µ(x, y) = Σ_{r=1}^{∞} (−1)^r cr(x, y) = Σ_{r=0}^{∞} (−1)^{r+1} c_{r+1}(x, y)
= −Σ_{r=0}^{∞} (−1)^r Σ_{x≤z<y} cr(x, z) = −Σ_{x≤z<y} Σ_{r=0}^{∞} (−1)^r cr(x, z) = −Σ_{x≤z<y} µ(x, z).

The first and last equality use the definition of µ. The second equality starts counting at r = 0 instead of r = 1, the third uses (C.2) of Proposition 5.3.3 and in the fourth the order of summation is interchanged.
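The recursion (M.1), (M.3) gives a direct way to compute the Möbius function of any finite poset. The sketch below (our code) memoizes the recursion and checks it against the two closed forms that appear in Examples 5.3.17 and 5.3.20 further on.

```python
from functools import lru_cache
from itertools import chain, combinations

def mobius(elements, leq):
    """Moebius function of a finite poset, via mu(x, x) = 1 and
    mu(x, y) = -sum of mu(x, z) over x <= z < y   ((M.1) and (M.3))."""
    @lru_cache(maxsize=None)
    def mu(x, y):
        if x == y:
            return 1
        if not leq(x, y):
            return 0
        return -sum(mu(x, z) for z in elements if leq(x, z) and leq(z, y) and z != y)
    return mu

# Subset lattice of {0, 1, 2}: mu(I, J) = (-1)^(|J| - |I|)   (cf. Example 5.3.17)
E = tuple(frozenset(s) for s in
          chain.from_iterable(combinations(range(3), r) for r in range(4)))
mu = mobius(E, lambda a, b: a <= b)
assert all(mu(I, J) == (-1) ** (len(J) - len(I)) for I in E for J in E if I <= J)

# Divisors of 12 under divisibility: mu(1, n) is the classical Moebius
# function   (cf. Example 5.3.20)
divs = (1, 2, 3, 4, 6, 12)
mu_d = mobius(divs, lambda a, b: b % a == 0)
assert [mu_d(1, n) for n in divs] == [1, -1, -1, 0, 1, 0]
```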
Remark 5.3.6 (M.1) and (M.3) of Proposition 5.3.5 can be used as an alternative way to compute µ(x, y) by induction.

Definition 5.3.7 Let L be a poset. If L has an element 0L such that 0L is the unique minimal element of L, then 0L is called the minimum of L. Similarly 1L is called the maximum of L if 1L is the unique maximal element of L. If x, y in L and x ≤ y, then the interval [x, y] has x as minimum and y as maximum. Suppose that L has minimum 0L and maximum 1L, also denoted by 0 and 1, respectively. Then 0 ≤ x ≤ 1 for all x ∈ L. Define µ(x) = µ(0, x), and µ(L) = µ(0, 1) if L is finite.

Definition 5.3.8 Let L be a locally finite poset with a minimum element. Let A be an abelian group and f : L → A a map from L to A. The sum function fˆ of f is defined by

fˆ(x) = Σ_{y≤x} f(y).

Similarly, define the sum function fˇ of f by fˇ(x) = Σ_{x≤y} f(y) if L is a locally finite poset with a maximum element.

Remark 5.3.9 A poset L is locally finite if and only if [x, y] is finite for all x ≤ y in L. So [0, x] is finite if L is a locally finite poset with minimum element 0. Hence the sum function fˆ(x) is well-defined, since it is a finite sum of f(y) in A with y in [0, x]. In the same way fˇ(x) is well-defined, since [x, 1] is finite.

Theorem 5.3.10 (Möbius inversion formula) Let L be a locally finite poset with a minimum element. Then

f(x) = Σ_{y≤x} µ(y, x) fˆ(y).

Similarly f(x) = Σ_{x≤y} µ(x, y) fˇ(y) if L is a locally finite poset with a maximum element.

Proof. Let x be an element of L. Then

Σ_{y≤x} µ(y, x) fˆ(y) = Σ_{y≤x} Σ_{z≤y} µ(y, x) f(z) = Σ_{z≤x} f(z) Σ_{z≤y≤x} µ(y, x)
= f(x) µ(x, x) + Σ_{z<x} f(z) Σ_{z≤y≤x} µ(y, x) = f(x).

The first equality uses the definition of fˆ(y). In the second equality the order of summation is interchanged. In the third equality the first summation is split into the parts z = x and z < x, respectively. Finally µ(x, x) = 1 and the second summation is zero for all z < x, by Proposition 5.3.5. The proof of the second formula is similar.

Example 5.3.11 Let f(x) = 1 if x = 0 and f(x) = 0 otherwise. Then the sum function fˆ(x) = Σ_{y≤x} f(y) is constant 1 for all x. The Möbius inversion formula gives that

Σ_{y≤x} µ(y, x) = 1 if x = 0, and Σ_{y≤x} µ(y, x) = 0 if x > 0,

which is a special case of Proposition 5.3.5.
Remark 5.3.12 Let (L, ≤) be a poset. Let ≤R be the reverse relation on L, defined by x ≤R y if and only if y ≤ x. Then (L, ≤R) is a poset. Suppose that (L, ≤) is locally finite with Möbius function µ. Then the number of chains of length r from x to y in (L, ≤R) is the same as the number of chains of length r from y to x in (L, ≤). Hence (L, ≤R) is locally finite with Möbius function µR such that µR(x, y) = µ(y, x). If (L, ≤) has minimum 0L or maximum 1L, then (L, ≤R) has minimum 1L or maximum 0L, respectively.

Definition 5.3.13 Let L be a poset. Let x, y ∈ L. Then y is called a cover of x if x < y and there is no z such that x < z < y. The Hasse diagram of L is a directed graph that has the elements of L as vertices, and there is a directed edge from y to x if and only if y is a cover of x. ***picture***

Example 5.3.14 Let L = Z be the set of integers with the usual linear order. Let x, y ∈ L and x ≤ y. Then c0(x, x) = 1, c0(x, y) = 0 if x < y, and cr(x, y) = \binom{y−x−1}{r−1} for all r ≥ 1. So L is infinite and locally finite. Furthermore µ(x, x) = 1, µ(x, x + 1) = −1 and µ(x, y) = 0 if y > x + 1.

Definition 5.3.15 Let L be a poset. Let x, y be in L. Then x and y have a least upper bound if there is a z ∈ L such that x ≤ z and y ≤ z, and if x ≤ w and y ≤ w, then z ≤ w for all w ∈ L. If x and y have a least upper bound, then such an element is unique; it is called the join of x and y and denoted by x ∨ y. Similarly the greatest lower bound of x and y is defined. If it exists, then it is unique; it is called the meet of x and y and denoted by x ∧ y. A poset L is called a lattice if x ∨ y and x ∧ y exist for all x, y in L.

Remark 5.3.16 Let (L, ≤) be a finite poset with maximum 1 such that x ∧ y exists for all x, y ∈ L. The collection {z : x ≤ z, y ≤ z} is finite and not empty, since it contains 1. The meet of all the elements in this collection is well defined and is given by

x ∨ y = ⋀ {z : x ≤ z, y ≤ z}.

Hence L is a lattice. Similarly L is a lattice if L is a finite poset with minimum 0 such that x ∨ y exists for all x, y ∈ L, since x ∧ y = ⋁ {z : z ≤ x, z ≤ y}.

Example 5.3.17 Let L be the collection of all finite subsets of a given set X. Let ≤ be defined by inclusion, that means I ≤ J if and only if I ⊆ J. Then 0L = ∅, and L has a maximum if and only if X is finite, in which case 1L = X. Let I, J ∈ L and I ≤ J. Then |I| ≤ |J| < ∞. Let m = |J| − |I|. Then

cr(I, J) = Σ \binom{m2}{m1} \binom{m3}{m2} · · · \binom{m}{m_{r−1}},

where the sum is over all 0 < m1 < m2 < · · · < m_{r−1} < m. Hence L is locally finite. L is finite if and only if X is finite. Furthermore I ∨ J = I ∪ J and I ∧ J = I ∩ J. So L is a lattice. Using Remark 5.3.6 we see that µ(I, J) = (−1)^{|J|−|I|} if I ≤ J. This is much easier than computing µ(I, J) by means of Definition 5.3.4.
Example 5.3.18 Now suppose that X = {1, . . . , n}. Let L be the poset of subsets of X. Let A1, . . . , An be a collection of subsets of a finite set A. Define for a subset J of X

A_J = ∩_{j∈J} A_j (so A_∅ = A) and f(J) = |A_J \ (∪_{J⊊I} A_I)|.

Then A_J is the disjoint union of the subsets A_I \ (∪_{I⊊K} A_K) for all I with J ⊆ I. Hence

Σ_{J⊆I} f(I) = Σ_{J⊆I} |A_I \ (∪_{I⊊K} A_K)| = |A_J|.

Möbius inversion gives that

|A_J \ (∪_{J⊊I} A_I)| = Σ_{J⊆I} (−1)^{|I|−|J|} |A_I|,
which is called the principle of inclusion/exclusion.

Example 5.3.19 A variant of the principle of inclusion/exclusion is given as follows. Let H1, . . . , Hn be a collection of subsets of a finite set H. Let L be the poset of all intersections of the Hj with the reverse inclusion as partial order. Then H is the minimum of L and H1 ∩ · · · ∩ Hn is the maximum of L. Let x ∈ L. Define

f(x) = |x \ (∪_{x<y} y)|.

Then

fˇ(x) = Σ_{x≤y} f(y) = Σ_{x≤y} |y \ (∪_{y<z} z)| = |x|.

Hence

|x \ (∪_{x<y} y)| = Σ_{x≤y} µ(x, y) |y|.
Example 5.3.20 Let L = N be the set of positive integers with the divisibility relation as partial order. Then 0L = 1 is the minimum of L, it is locally finite and it has no maximum. Now m ∨ n = lcm(m, n) and m ∧ n = gcd(m, n). Hence L is a lattice. By Remark 5.3.6 we see that if n = 1, 1 (−1)r if n is the product of r mutually distinct primes, µ(n) = 0 if n is divisible by the square of a prime. Hence µ(n) is the classical M¨obius function. Furthermore µ(d, n) = µ( nd ) if dn. Let ϕ(n) = {i ∈ N gcd(i, n) = 1} be Euler’s ϕ function. Define Vd = {i ∈ {1, . . . , n} gcd(i, n) =
n d}
5.3. GEOMETRIC LATTICES AND CODES
141
for d|n. Then

V_d = { i · (n/d) | i ∈ {1, . . . , d}, gcd(i, d) = 1 },

so |V_d| = ϕ(d). Now {1, . . . , n} is the disjoint union of the subsets V_d with d|n. Hence the sum function of ϕ is given by

ϕ̂(n) = Σ_{d|n} ϕ(d) = n.

Therefore

ϕ(n) = Σ_{d|n} µ(d) (n/d),
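Both identities are easy to verify computationally. A minimal sketch (stdlib only; the function names are mine):

```python
from math import gcd

def moebius(n):
    """Classical Moebius function mu(n), via trial division."""
    r, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # n is divisible by the square of a prime
            r += 1
        else:
            p += 1
    if n > 1:
        r += 1                    # one remaining prime factor
    return (-1) ** r

def phi(n):
    """Euler's phi function, directly from the definition."""
    return sum(1 for i in range(1, n + 1) if gcd(i, n) == 1)

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]
```

The sum function and the Möbius inversion formula can then be checked for every n in a range.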
by Möbius inversion.

Definition 5.3.21 Let (L1, ≤1) and (L2, ≤2) be posets. A map ϕ : L1 → L2 is called monotone if ϕ(x) ≤2 ϕ(y) for all x ≤1 y in L1. The map ϕ is called strictly monotone if ϕ(x) <2 ϕ(y) for all x <1 y in L1. The map is called an isomorphism of posets if it is strictly monotone and there exists a strictly monotone map ψ : L2 → L1 that is the inverse of ϕ. The posets are called isomorphic if there is an isomorphism of posets between them.

Remark 5.3.22 If ϕ : L1 → L2 is an isomorphism between locally finite posets with a minimum, then µ2(ϕ(x), ϕ(y)) = µ1(x, y) for all x, y in L1. If (L1, ≤1) and (L2, ≤2) are isomorphic posets and L1 is a lattice, then L2 is also a lattice.

Example 5.3.23 Let n be a positive integer that is the product of r mutually distinct primes p1, . . . , pr. Let L1 be the set of all positive integers that divide n with divisibility as partial order ≤1 as in Example 5.3.20. Let L2 be the collection of all subsets of {1, . . . , r} with the inclusion as partial order ≤2 as in Example 5.3.17. Define the maps ϕ : L1 → L2 and ψ : L2 → L1 by ϕ(d) = { i | p_i divides d } and ψ(x) = Π_{i∈x} p_i. Then ϕ and ψ are strictly monotone and they are inverses of each other. Hence L1 and L2 are isomorphic lattices.
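Example 5.3.23 can be illustrated concretely. The sketch below uses n = 30 = 2 · 3 · 5 (an assumed instance) and checks that ϕ and ψ are inverse to each other and order preserving:

```python
from math import prod

primes = [2, 3, 5]            # n = 30 is squarefree
n = prod(primes)
divisors = [d for d in range(1, n + 1) if n % d == 0]

def phi_map(d):
    """Divisor of n -> set of indices i with p_i dividing d."""
    return frozenset(i for i, p in enumerate(primes) if d % p == 0)

def psi_map(x):
    """Subset of indices -> product of the corresponding primes."""
    return prod(primes[i] for i in x)
```

For squarefree n, b divides a exactly when phi_map(b) is a subset of phi_map(a).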
5.3.2
Geometric lattices
Remark 5.3.24 Let (L, ≤) be a lattice without infinite chains. Then L has a minimum and a maximum.

Definition 5.3.25 Let L be a lattice with minimum 0. An atom is an element a ∈ L that is a cover of 0. A lattice is called atomic if for every x > 0 in L there exist atoms a1, . . . , ar such that x = a1 ∨ · · · ∨ ar; the minimum possible r is called the rank of x and is denoted by r_L(x), or r(x) for short. A lattice is called semimodular if for all mutually distinct x, y in L, x ∨ y covers x and y if there exists a z such that x and y cover z. A lattice is called modular if x ∨ (y ∧ z) = (x ∨ y) ∧ z for all x, y and z in L such that x ≤ z. A lattice L is called a geometric lattice if it is atomic and semimodular and has no infinite chains. If L is a geometric lattice, then it has a minimum and a maximum, and r(1) is called the rank of L and is denoted by r(L).
Example 5.3.26 Let L be the collection of all finite subsets of a given set X as in Example 5.3.17. The atoms are the singleton sets, that is, subsets consisting of exactly one element of X. Every x ∈ L is the finite union of its singleton subsets. So L is atomic and r(x) = |x|. Now y covers x if and only if there is an element Q not in x such that y = x ∪ {Q}. If x ≠ y and x and y both cover z, then there is an element P not in z such that x = z ∪ {P}, and there is an element Q not in z such that y = z ∪ {Q}. Now P ≠ Q, since x ≠ y. Hence x ∨ y = z ∪ {P, Q} covers x and y. Hence L is semimodular. In fact L is modular. L is locally finite. L is a geometric lattice if and only if X is finite.

Example 5.3.27 Let L be the set of positive integers with the divisibility relation as in Example 5.3.20. The atoms of L are the primes. But L is not atomic, since the square of a prime is not the join of finitely many atoms. L is semimodular. The interval [1, n] in L is a geometric lattice if and only if n is square free. If n is square free and m ≤ n, then r(m) = r if and only if m is the product of r mutually distinct primes.

Proposition 5.3.28 Let L be a geometric lattice. Then for all x, y ∈ L:
(GL.1) If x < y, then r(x) < r(y) (strictly monotone).
(GL.2) r(x ∨ y) + r(x ∧ y) ≤ r(x) + r(y) (semimodular inequality).
(GL.3) All maximal chains from 0 to x have the same length r(x).

Proof. See [113, Prop. 3.3.2] and [114, Prop. 3.7].
Remark 5.3.29 Let L be an atomic lattice. Then L is semimodular if and only if the semimodular inequality (GL.2) holds for all x, y ∈ L. And L is modular if and only if the modular equality r(x ∨ y) + r(x ∧ y) = r(x) + r(y) holds for all x, y ∈ L.

Remark 5.3.30 Let L be a geometric lattice. Let x, y ∈ L and x ≤ y. The chain x = y0 < y1 < · · · < ys = y from x to y is called an extension of the chain x = x0 < x1 < · · · < xr = y if {x0, x1, . . . , xr} is a subset of {y0, y1, . . . , ys}. A chain from x to y is called maximal if there is no extension to a longer chain from x to y. Every chain from x to y can be extended to a maximal chain with the same end points, and all such maximal chains have the same length r(y) − r(x). This is called the Jordan-Hölder property.

Remark 5.3.31 Let L be a geometric lattice. Let L_j = { x ∈ L | r(x) = j }. Then L_j is called the j-th level of L. The Hasse diagram of L is a graph that has the elements of L as vertices. If x, y ∈ L, x < y and r(y) = r(x) + 1, then x and y are connected by an edge. So only elements of two consecutive levels L_j and L_{j+1} are connected by an edge. The Hasse diagram of L considered as a poset as in Definition 5.3.13 is the directed graph with an arrow from y to x if x, y ∈ L, x < y and r(y) = r(x) + 1. ***picture***
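The level structure and cover relations of Remark 5.3.31 are easy to compute directly. A sketch for the lattice of subsets of {1, 2, 3}, where y covers x exactly when r(y) = r(x) + 1 with r(x) = |x|:

```python
from itertools import combinations

X = {1, 2, 3}
L = [frozenset(s) for r in range(len(X) + 1) for s in combinations(X, r)]

def covers(x, y):
    """In the subset lattice, y covers x iff x < y and r(y) = r(x) + 1."""
    return x < y and len(y) == len(x) + 1

hasse_edges = [(tuple(sorted(x)), tuple(sorted(y)))
               for x in L for y in L if covers(x, y)]
```

The Boolean lattice on 3 points has 8 elements and 12 Hasse diagram edges.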
Remark 5.3.32 Let L be a geometric lattice and let x ∈ L. Then the interval L_x = [x, 1] is a geometric lattice with x as minimum element and of rank r_L(1) − r_L(x), and µ_{L_x}(y) = µ(x, y) and r_{L_x}(y) = r_L(y) − r_L(x) for all y ∈ L_x. Similar remarks hold for the interval [0, x] and for [x, y].
Example 5.3.33 Let L be the collection of all linear subspaces of a given finite dimensional vector space V over a field F with the inclusion as partial order. Then 0_L = {0} is the minimum and 1_L = V is the maximum of L. The poset L is locally finite if and only if L is finite if and only if the field F is finite. Let x and y be linear subspaces of V. Then x ∩ y, the intersection of x and y, is the largest linear subspace that is contained in both x and y. So x ∧ y = x ∩ y. The sum x + y of x and y is by definition the set of elements a + b with a in x and b in y. Then x + y is the smallest linear subspace containing both x and y. Hence x ∨ y = x + y. So L is a lattice. The atoms are the one dimensional linear subspaces. Let x be a subspace of dimension r over F. So x is generated by a basis g1, . . . , gr. Let a_i be the one dimensional subspace generated by g_i. Then x = a1 ∨ · · · ∨ ar. Hence L is atomic and r(x) = dim(x). Moreover L is modular, since dim(x ∩ y) + dim(x + y) = dim(x) + dim(y) for all x, y ∈ L. Furthermore L has no infinite chains, since V is finite dimensional. Therefore L is a modular geometric lattice.

Example 5.3.34 Let F be a field. Let V = (v1, . . . , vn) be an n-tuple of nonzero vectors in F^k. Let L = L(V) be the collection of all linear subspaces of F^k that are generated by subsets of V, with inclusion as partial order. So L is finite and a fortiori locally finite. By definition {0} is the linear subspace generated by the empty set. Then 0_L = {0} and 1_L is the subspace generated by all of v1, . . . , vn. Furthermore L is a lattice with x ∨ y = x + y and

x ∧ y = ∨{ z ∈ L | z ≤ x, z ≤ y }

by Remark 5.3.16. Let a_j be the linear subspace generated by v_j. Then a1, . . . , an are the atoms of L. Let x be the subspace generated by { v_j | j ∈ J }. Then x = ∨_{j∈J} a_j. If x has dimension r, then there exists a subset I of J such that |I| = r and x = ∨_{i∈I} a_i. Hence L is atomic and r(x) = dim(x).
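In Example 5.3.34 the rank r(x) is dim(x), which can be computed by Gaussian elimination. A sketch over F_2; the 5-tuple V below is an assumed example, not from the text:

```python
def rank_gf2(vectors):
    """Rank over F_2 of a list of 0/1 vectors, each packed into an int
    and reduced against a basis indexed by leading bit position."""
    basis = {}
    for v in vectors:
        m = int("".join(map(str, v)), 2)
        while m:
            lead = m.bit_length()
            if lead in basis:
                m ^= basis[lead]      # cancel the leading bit
            else:
                basis[lead] = m
                break
    return len(basis)

# Hypothetical 5-tuple of nonzero vectors in F_2^3.
V = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
```

For instance, the subspace generated by v1, v2 and v4 has rank 2, since v4 = v1 + v2.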
Now x ∧ y ⊆ x ∩ y, so r(x ∨ y) + r(x ∧ y) ≤ dim(x + y) + dim(x ∩ y) = r(x) + r(y). Hence the semimodular inequality holds and L is a geometric lattice. In most cases L is not modular. Example 5.3.35 Let F be a field. Let A = (H1 , . . . , Hn ) be an arrangement over F of hyperplanes in the vector space V = Fk . Let L = L(A) be the collection of all nonempty intersections of elements of A. By definition Fk is the empty intersection. Define the partial order ≤ by x ≤ y if and only if y ⊆ x.
Then V is the minimum element and {0} is the maximum element. Furthermore

x ∨ y = x ∩ y if x ∩ y ≠ ∅, and x ∧ y = ∩{ z ∈ L | x ∪ y ⊆ z }.

Suppose that A is a central arrangement. Then x ∩ y is nonempty for all x, y in L. So x ∨ y and x ∧ y exist for all x, y in L, and L is a lattice. Let v_j = (v_{1j}, . . . , v_{kj}) be a nonzero vector such that Σ_{i=1}^k v_{ij} X_i = 0 is a homogeneous equation of H_j. Let V = (v1, . . . , vn). Consider the map ϕ : L(V) → L(A) defined by

ϕ(x) = ∩_{j∈J} H_j if x is the subspace generated by { v_j | j ∈ J }.
Now x ⊂ y if and only if ϕ(y) ⊂ ϕ(x) for all x, y ∈ L(V). So ϕ is a strictly monotone map. Furthermore ϕ is a bijection and its inverse map is also strictly monotone. Hence L(V) and L(A) are isomorphic lattices. Therefore L(A) is also a geometric lattice.
5.3.3
Geometric lattices and matroids
The notion of a geometric lattice is "cryptomorphic", that is, almost equivalent, to that of a matroid. See [34, 38, 44, ?, 114].

Proposition 5.3.36 Let L be a finite geometric lattice. Let M(L) be the set of all atoms of L. Let I(L) be the collection of all subsets I of M(L) such that r(a1 ∨ · · · ∨ ar) = r if I = {a1, . . . , ar} is a collection of r atoms of L. Then (M(L), I(L)) is a matroid.

Proof. The proof is left as an exercise.
Proposition 5.3.37 (Rota's Crosscut Theorem) Let L be a finite geometric lattice. Let M(L) be the matroid associated with L. Then

χ_L(T) = Σ_{I⊆M(L)} (−1)^{|I|} T^{r(L)−r(I)}.
Proof. See [101] and [24, Theorem 3.1].
Definition 5.3.38 Let (M, I) be a matroid. An element x in M is called a loop if {x} is a dependent set. Let x and y in M be two distinct elements that are not loops. Then x and y are called parallel if r({x, y}) = 1. The matroid is called simple if it has no loops and no parallel elements.

Remark 5.3.39 Let G be a k × n matrix with entries in a field F. Let M_G be the set {1, . . . , n} indexing the columns of G and let I_G be the collection of all subsets I of M_G such that the columns of the submatrix G_I, consisting of the columns of G at the positions of I, are independent. Then (M_G, I_G) is a matroid. Suppose that F is a finite field and G1 and G2 are generator matrices of a code C; then (M_{G1}, I_{G1}) = (M_{G2}, I_{G2}). So the matroid (M_C, I_C) of a code C is well defined by (M_G, I_G) for some generator matrix G of C. If C is degenerate, then there is a position i such that c_i = 0 for every codeword c ∈ C, and all such positions correspond one-to-one with loops of M_C. Let C be nondegenerate. Then M_C
has no loops, and the positions i and j with i ≠ j are parallel in M_C if and only if the i-th column of G is a scalar multiple of the j-th column. The code C is projective if and only if the arrangement A_G is simple if and only if the matroid M_C is simple. An [n, k] code C is MDS if and only if the matroid M_C is the uniform matroid U_{n,k}.

Remark 5.3.40 Let C be a projective code with generator matrix G. Then A_G is an essential simple arrangement with geometric lattice L(A_G). Furthermore the matroids M(L(A_G)) and M_C are isomorphic.

Definition 5.3.41 Let (M, I) be a matroid. A k-flat of M is a maximal subset of M of rank k. Let L(M) be the collection of all flats of M; it is called the lattice of flats of M. Let J be a subset of M. Then the closure J̄ is by definition the intersection of all flats that contain J.

Remark 5.3.42 M is a k-flat with k = r(M). If F1 and F2 are flats, then F1 ∩ F2 is also a flat. Consider L(M) with the inclusion as partial order. Then M is the maximum of L(M). And F1 ∩ F2 = F1 ∧ F2 for all F1 and F2 in L(M). Hence L(M) is indeed a lattice by Remark 5.3.16. Let J be a subset of M; then J̄ is a flat, since it is a nonempty, finite intersection of flats. So ∅̄ is the minimum of L(M).

Remark 5.3.43 An element x in M is a loop if and only if x̄ = ∅̄. If x, y ∈ M are not loops, then x and y are parallel if and only if x̄ = ȳ. Let M̄ = { x̄ | x ∈ M, x̄ ≠ ∅̄ } and let Ī = { Ī | I ∈ I, ∅̄ ∉ Ī }, where Ī = { x̄ | x ∈ I }. Then (M̄, Ī) is a simple matroid.

Definition 5.3.44 Let G be a generator matrix of a code C. The reduced matrix Ḡ is the matrix obtained from G by deleting all zero columns from G and all columns that are a scalar multiple of a previous column. The reduced code C̄ of C is the code with generator matrix Ḡ.

Remark 5.3.45 Let G be a generator matrix of a code C. The definition of the reduced code C̄ by means of Ḡ does not depend on the choice of the generator matrix G of C. The matroids M̄_G and M_Ḡ are isomorphic. Let J be a subset of {1, . . . , n}. Then the closure J̄ is equal to the complement in {1, . . . , n} of the support of C(J), and C(J) = C(J̄).

Proposition 5.3.46 Let (M, I) be a matroid. Then L(M) with the inclusion as partial order is a geometric lattice and L(M) is isomorphic with L(M̄).

Proof. See [114, Theorem 3.8].
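Definition 5.3.44 is straightforward to implement. A sketch over F_2, where the only nonzero scalar is 1, so "scalar multiple of a previous column" simply means "equal to a previous column"; the matrix G_cols is an assumed example:

```python
def reduce_columns(columns):
    """Reduced matrix of Definition 5.3.44 over F_2, given by its columns:
    drop zero columns (loops of the matroid) and repeats of previous
    columns (parallel elements of the matroid)."""
    kept = []
    for c in columns:
        if any(c) and c not in kept:
            kept.append(c)
    return kept

# Hypothetical generator matrix over F_2, listed column by column.
G_cols = [(1, 0, 0), (1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)]
reduced = reduce_columns(G_cols)
```

Here the second column is parallel to the first and the third column is a loop, so both are removed.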
5.3.4
Exercises
5.3.1 Give a proof of Remark 5.3.9.
5.3.2 Give a proof of Remark 5.3.16.
5.3.3 Give a proof of the formulas for c_r(x, y) and µ(x, y) in Example 5.3.17.
5.3.4 Give a proof of the formula for µ(x) in Example 5.3.20.
5.3.5 Give a proof of the statements in Example 5.3.27.
5.3.6 Give an example of an atomic finite lattice with minimum 0 and maximum 1 that is not semimodular.
5.3.7 Give a proof of the statements in Remark 5.3.29.
5.3.8 Let L be a finite geometric lattice. Show that (M(L), I(L)) is a matroid as stated in Proposition 5.3.36. Show moreover that this matroid is simple.
5.3.9 Give a proof of the statements in Remark 5.3.39.
5.3.10 Give a proof of the statements in Remark 5.3.42.
5.3.11 Give a proof of Proposition 5.3.46.
5.3.12 Let L be a geometric lattice. Let a be an atom of L and x ∈ L. Show that r(x ∨ a) ≤ r(x) + 1, and that r(x ∨ a) = r(x) if and only if a ≤ x.
5.3.13 Let L be a geometric lattice. Show that r(y) − r(x) is the length of every maximal chain from x to y, for all x ≤ y in L.
5.3.14 Give a proof of Remark 5.3.32.
5.3.15 Give an example of a central arrangement A such that the lattice L(A) is not modular.
5.4
Characteristic polynomial
***
5.4.1
Characteristic and Möbius polynomial
Definition 5.4.1 Let L be a finite geometric lattice. The characteristic polynomial χ_L(T) and the Poincaré polynomial π_L(T) of L are defined by:

χ_L(T) = Σ_{x∈L} µ_L(x) T^{r(L)−r(x)}, and π_L(T) = Σ_{x∈L} µ_L(x) (−T)^{r(x)}.

The two variable Möbius polynomial µ_L(S, T) in S and T is defined by

µ_L(S, T) = Σ_{x∈L} Σ_{x≤y∈L} µ(x, y) S^{r(x)} T^{r(L)−r(y)}.

The two variable characteristic polynomial or coboundary polynomial is defined by

χ_L(S, T) = Σ_{x∈L} Σ_{x≤y∈L} µ(x, y) S^{a(x)} T^{r(L)−r(y)},

where a(x) is the number of atoms a in L such that a ≤ x.

Remark 5.4.2 Now µ(L) = χ_L(0), and χ_L(1) = 0 if and only if L consists of more than one element. Furthermore χ_L(T) = T^{r(L)} π_L(−T^{−1}), and µ_L(0, T) = χ_L(0, T) = χ_L(T).
5.4. CHARACTERISTIC POLYNOMIAL
147
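Definition 5.4.1 can be checked directly on a small lattice. The sketch below computes µ by the recursion of Definition 5.3.4 and evaluates χ_L for the Boolean lattice of subsets of {1, 2, 3}, for which χ_L(T) = (T − 1)^3 as computed in Example 5.4.5 below:

```python
from itertools import combinations

r = 3
L = [frozenset(s) for k in range(r + 1) for s in combinations(range(1, r + 1), k)]
bottom = frozenset()

def mu(x, y):
    """Moebius function of the subset lattice via its defining recursion:
    mu(x, x) = 1 and mu(x, y) = - sum of mu(x, z) over x <= z < y."""
    if x == y:
        return 1
    return -sum(mu(x, z) for z in L if x <= z and z < y)

def chi(t):
    """chi_L evaluated at t: sum of mu(0, x) * t^(r(L) - r(x)), r(x) = |x|."""
    return sum(mu(bottom, x) * t ** (r - len(x)) for x in L)
```

Note that χ_L(1) = 0, as in Remark 5.4.2, since the lattice has more than one element.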
Remark 5.4.3 Let r be the rank of L. Then the following relation holds for the Möbius polynomial in terms of characteristic polynomials:

µ_L(S, T) = Σ_{i=0}^r S^i µ_i(T) with µ_i(T) = Σ_{x∈L_i} χ_{L_x}(T),

where L_i = { x ∈ L | r(x) = i } and n = |L_1| is the number of atoms in L. Then similarly

χ_L(S, T) = Σ_{i=0}^n S^i χ_i(T) with χ_i(T) = Σ_{x∈L, a(x)=i} χ_{L_x}(T).
Remark 5.4.4 Let L be a geometric lattice. Then

Σ_{i=0}^{r(L)} µ_i(T) = µ_L(1, T) = Σ_{y∈L} Σ_{0≤x≤y} µ(x, y) T^{r(L)−r(y)} = T^{r(L)},

since Σ_{0≤x≤y} µ(x, y) = 0 for all 0 < y in L by Proposition 5.3.5. Similarly Σ_{i=0}^n χ_i(T) = χ_L(1, T) = T^{r(L)}. Also Σ_{i=0}^n A_i(T) = T^k for the extended weights of a code of dimension k, by Proposition 4.4.38 for t = 0.
Example 5.4.5 Let L be the lattice of all subsets of a given finite set of r elements as in Example 5.3.17. Then r(x) = a(x) and µ(x, y) = (−1)^{a(y)−a(x)} if x ≤ y. Hence

χ_L(T) = Σ_{j=0}^r (r choose j) (−1)^j T^{r−j} = (T − 1)^r and µ_i(T) = (r choose i) (T − 1)^{r−i}.

Therefore µ_L(S, T) = (S + T − 1)^r.

Example 5.4.6 Let L be the lattice of all linear subspaces of a given vector space of dimension r over the finite field F_q as in Example 5.3.33. Then r(x) is the dimension of x over F_q. The number of subspaces of dimension i is counted in Proposition 4.3.7. It is left as an exercise to show that µ(x, y) = (−1)^{j−i} q^{(j−i)(j−i−1)/2} if r(x) = i, r(y) = j and x ≤ y, and

χ_L(T) = Σ_{i=0}^r (−1)^i q^{i(i−1)/2} [r choose i]_q T^{r−i} = (T − 1)(T − q) · · · (T − q^{r−1})

and

µ_i(T) = [r choose i]_q (T − 1)(T − q) · · · (T − q^{r−i−1}),

where [r choose i]_q denotes the Gaussian binomial coefficient.
See [71].

Remark 5.4.7 Every polynomial in one variable with coefficients in a field F factorizes into linear factors over the algebraic closure F̄ of F. In Examples 5.4.5 and 5.4.6 we see that χ_L(T) factorizes into linear factors over Z. This is always the case for so-called supersolvable geometric lattices and for lattices coming from free central arrangements. See [92].
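The factorization χ_L(T) = (T − 1)(T − q) · · · (T − q^{r−1}) of Example 5.4.6 can be verified by brute force for q = 2 and r = 3: enumerate all subspaces of F_2^3 and apply the Möbius recursion. A sketch; the representation choices are mine:

```python
from itertools import combinations, product

def add(u, v):
    return tuple((a + b) % 2 for a, b in zip(u, v))

def span(gens):
    """Subspace of F_2^3 spanned by gens, as a frozenset of vectors."""
    s = {(0, 0, 0)}
    for g in gens:
        s |= {add(g, v) for v in s}
    return frozenset(s)

nonzero = [v for v in product((0, 1), repeat=3) if any(v)]
subspaces = list({span(gens) for k in range(4) for gens in combinations(nonzero, k)})

def dim(x):
    return len(x).bit_length() - 1      # |x| = 2^dim(x)

def mu(x, y):
    if x == y:
        return 1
    return -sum(mu(x, z) for z in subspaces if x <= z and z < y)

def chi(t):
    bottom = span([])
    return sum(mu(bottom, x) * t ** (3 - dim(x)) for x in subspaces)
```

There are 1 + 7 + 7 + 1 = 16 subspaces, and χ_L(t) = (t − 1)(t − 2)(t − 4) for every integer t.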
Definition 5.4.8 Let L be a finite geometric lattice. The Whitney numbers w_i and W_i of the first and second kind, respectively, are defined by

w_i = Σ_{x∈L_i} µ(x) and W_i = |L_i|.

The doubly indexed Whitney numbers w_{ij} and W_{ij} of the first and second kind, respectively, are defined by

w_{ij} = Σ_{x∈L_i} Σ_{y∈L_j} µ(x, y) and W_{ij} = |{ (x, y) | x ∈ L_i, y ∈ L_j, x ≤ y }|.

See [60], [34, §6.6.D], [?, Chapter 14] and [113, §3.11].

Remark 5.4.9 We have that

χ_L(T) = Σ_{i=0}^{r(L)} w_i T^{r(L)−i} and µ_L(S, T) = Σ_{i=0}^{r(L)} Σ_{j=0}^{r(L)} w_{ij} S^i T^{r(L)−j}.

Hence the (doubly indexed) Whitney numbers of the first kind are determined by µ_L(S, T). The leading coefficient of

µ_i(T) = Σ_{x∈L_i} Σ_{x≤y} µ(x, y) T^{r(L_x)−r_{L_x}(y)}

is equal to Σ_{x∈L_i} µ(x, x) = |L_i| = W_i. Hence the Whitney numbers of the second kind W_i are determined by µ_L(S, T). We will see in Example 5.4.32 that the Whitney numbers are not determined by χ_L(S, T). Finally, let r = r(L). Then µ_{r−1}(T) = W_{r−1}(T − 1).
5.4.2
Characteristic polynomial of an arrangement
A central arrangement A gives rise to a geometric lattice L(A) and a characteristic polynomial χ_{L(A)} that will be denoted by χ_A. Similarly π_A denotes the Poincaré polynomial of A. If A is an arrangement over the real numbers, then π_A(1) counts the number of connected components of the complement of the arrangement. See [139]. Something similar can be said about arrangements over finite fields.

Proposition 5.4.10 Let q be a prime power, and let A = (H1, . . . , Hn) be a simple and central arrangement in F_q^k. Then

χ_A(q^m) = |F_{q^m}^k \ (H1 ∪ · · · ∪ Hn)|.

Proof. See [7, Theorem 2.2], [17, Proposition 3.2], [44, Sect. 16] and [92, Theorem 2.69]. Let A = F_{q^m}^k and A_j = H_j(F_{q^m}). Let L be the poset of all intersections of the A_j. The principle of inclusion/exclusion as formulated in Example 5.3.19 gives that

|F_{q^m}^k \ (H1 ∪ · · · ∪ Hn)| = Σ_{x∈L} µ(x)|x| = Σ_{x∈L} µ(x) q^{m·dim(x)}.
The expression on the right hand side is equal to χ_A(q^m), since L is isomorphic with the reverse of the geometric lattice L(A) of the arrangement A = (H1, . . . , Hn), so dim(x) = r(L(A)) − r_{L(A)}(x) and µ_L(x) = µ_{L(A)}(x) by Remark 5.3.12.

Definition 5.4.11 Let A = (H1, . . . , Hn) be an arrangement in F^k over the field F. Let H = H_i. Then the deletion A \ H is the arrangement in F^k obtained from (H1, . . . , Hn) by deleting all the H_j such that H_j = H. Let x = ∩_{i∈I} H_i be an intersection of hyperplanes of A. Let l be the dimension of x. The restriction A^x is the arrangement in F^l of all hyperplanes x ∩ H_j in x such that x ∩ H_j ≠ ∅ and x ∩ H_j ≠ x, for a chosen isomorphism of x with F^l.

Proposition 5.4.12 (Deletion-restriction formula) Let A = (H1, . . . , Hn) be a simple and central arrangement in F^k over the field F. Let H = H_i. Then

χ_A(T) = χ_{A\H}(T) − χ_{A^H}(T).

Proof. A proof for an arbitrary field can be found in [92, Theorem 2.56]. Here the special case of a central arrangement over the finite field F_q will be treated. Without loss of generality we may assume that H = H1. Denote H_j(F_{q^m}) by H_j and F_{q^m}^k by V. Then the following set is written as the disjoint union of two others:

V \ (H2 ∪ · · · ∪ Hn) = (V \ (H1 ∪ H2 ∪ · · · ∪ Hn)) ∪ (H1 \ (H2 ∪ · · · ∪ Hn)).

The number of elements of the left hand side is equal to χ_{A\H}(q^m), and the numbers of elements of the two sets on the right hand side are equal to χ_A(q^m) and χ_{A^H}(q^m), respectively, by Proposition 5.4.10. Hence χ_{A\H}(q^m) = χ_A(q^m) + χ_{A^H}(q^m) for all positive integers m, since the union is disjoint. Therefore the identity of polynomials holds.

Definition 5.4.13 Let A = (H1, . . . , Hn) be a central simple arrangement over the field F in F^k. Let J ⊆ {1, . . . , n}. Define H_J = ∩_{j∈J} H_j. Consider the decreasing sequence

N_k ⊂ N_{k−1} ⊂ · · · ⊂ N_1 ⊂ N_0

of algebraic subsets of the affine space A^k, defined by

N_i = ∪_{J⊆{1,...,n}, r(H_J)=i} H_J.

Define M_i = N_i \ N_{i+1}.

Remark 5.4.14 N_0 = A^k, N_1 = ∪_{j=1}^n H_j, N_k = {0} and N_{k+1} = ∅. Furthermore N_i is a union of linear subspaces of A^k, all of dimension k − i. Notice that H_J is isomorphic with C(J) in case A is the arrangement of the generator matrix G of the code C, as remarked in the proof of Proposition 4.4.8.
Proposition 5.4.15 Let A = (H1, . . . , Hn) be a central simple arrangement over the field F in F^k. Let z(x) = { j ∈ {1, . . . , n} | x ∈ H_j } and r(x) = r(H_{z(x)}) the rank of x, for x ∈ A^k. Then

N_i = { x ∈ A^k | r(x) ≥ i } and M_i = { x ∈ A^k | r(x) = i }.

Proof. Let x ∈ A^k and c = xG. Let x ∈ N_i. Then there exists a J ⊆ {1, . . . , n} such that r(H_J) = i and x ∈ H_J. So c_j = 0 for all j ∈ J. So J ⊆ z(x). Hence H_{z(x)} ⊆ H_J. Therefore r(x) = r(H_{z(x)}) ≥ r(H_J) = i. The converse implication is proved similarly. The statement about M_i is a direct consequence of the one about N_i.

Proposition 5.4.16 Let A be a central simple arrangement over F_q. Let L = L(A) be the geometric lattice of A. Then

µ_i(q^m) = |M_i(F_{q^m})|.

Proof. See also [7, Theorem 6.3]. Remember that µ_i(T) = Σ_{x∈L, r(x)=i} χ_{L_x}(T) as defined in Remark 5.4.3. Let L = L(A) and x ∈ L. Then L(A^x) = L_x. Let ∪A^x be the union of the hyperplanes of A^x. Then |(x \ (∪A^x))(F_{q^m})| = χ_{L_x}(q^m) by Proposition 5.4.10. Now M_i is the disjoint union of the complements of the arrangements A^x for all x ∈ L such that r(x) = i, by Proposition 5.4.15. Hence

|M_i(F_{q^m})| = Σ_{x∈L, r(x)=i} |(x \ (∪A^x))(F_{q^m})| = Σ_{x∈L, r(x)=i} χ_{L_x}(q^m).
5.4.3
Characteristic polynomial of a code
Proposition 5.4.17 Let C be a nondegenerate F_q-linear code. Then A_n(T) = χ_C(T).

Proof. The elements in F_{q^m}^k \ (H1 ∪ · · · ∪ Hn) correspond one-to-one to codewords of weight n in C ⊗ F_{q^m} by Proposition 4.4.8. So A_n(q^m) = χ_C(q^m) for all positive integers m by Proposition 5.4.10. Hence A_n(T) = χ_C(T).

Definition 5.4.18 Let G be a generator matrix of an [n, k] code C over F_q. Define

Y_i = { x ∈ A^k | wt(xG) ≤ n − i } and X_i = { x ∈ A^k | wt(xG) = n − i }.

Remark 5.4.19 The Y_i form a decreasing sequence

Y_n ⊆ Y_{n−1} ⊆ · · · ⊆ Y_1 ⊆ Y_0

of algebraic subsets of A^k, and X_i = Y_i \ Y_{i+1}.

Proposition 5.4.20 Let C be a projective code of length n. Then

χ_i(q^m) = |X_i(F_{q^m})| = A_{n−i}(q^m).
Proof. Every x ∈ F_{q^m}^k corresponds one-to-one to a codeword in C ⊗ F_{q^m} via the map x ↦ xG. So |X_i(F_{q^m})| = A_{n−i}(q^m). And A_{n−i}(q^m) = χ_i(q^m) for all i, by Remark ??.

Corollary 5.4.21 Let C be a projective code of length n. Then χ_i(T) = A_{n−i}(T) for all i.

Remark 5.4.22 Another way to define X_i is as the collection of all points P ∈ A^k such that P is on exactly i distinct hyperplanes of the arrangement A_G. Denote the arrangement of hyperplanes in P^{k−1} also by A_G, and let P̄ be the point in P^{k−1} corresponding to P ∈ A^k. Define

X̄_i = { P̄ ∈ P^{k−1} | P̄ is on exactly i hyperplanes of A_G }.

For all i < n the polynomial χ_i(T) is divisible by T − 1. Define χ̄_i(T) = χ_i(T)/(T − 1). Then χ̄_i(q^m) = |X̄_i(F_{q^m})| for all i < n, by Proposition 5.4.20.

Theorem 5.4.23 Let G be a generator matrix of a nondegenerate code C. Let A_G be the associated central arrangement. Let d⊥ = d(C⊥). Then N_i ⊆ Y_i for all i, equality holds for all i < d⊥, and M_i = X_i for all i < d⊥ − 1. If furthermore C is projective, then µ_i(T) = χ_i(T) = A_{n−i}(T) for all i < d⊥ − 1.

Proof. Let x ∈ N_i. Then x ∈ H_J for some J ⊆ {1, . . . , n} such that r(H_J) = i. So |J| ≥ i and wt(xG) ≤ n − i by Proposition 4.4.8. Hence x ∈ Y_i. Therefore N_i ⊆ Y_i. Let i < d⊥ and x ∈ Y_i. Then wt(xG) ≤ n − i. Let J be the set of zero positions of xG, that is, the complement of supp(xG). Then |J| ≥ i. Take a subset I of J such that |I| = i. Then x ∈ H_I and r(H_I) = |I| = i by Lemma 7.4.39, since i < d⊥. Hence x ∈ N_i. Therefore Y_i ⊆ N_i. So Y_i = N_i for all i < d⊥, and M_i = X_i for all i < d⊥ − 1. The code is nondegenerate, so d(C⊥) ≥ 2. Suppose furthermore that C is projective. Then µ_i(T) = χ_i(T) = A_{n−i}(T) for all i < d⊥ − 1, by Remark ?? and Propositions 5.4.20 and 5.4.16.

The extended and generalized weight enumerators are determined by the pair (n, k) for an [n, k] MDS code by Remark ??. If C is an [n, k] code, then d(C⊥) is at most k + 1. Furthermore d(C⊥) = k + 1 if and only if C is MDS if and only if C⊥ is MDS. An [n, k, d] code is called almost MDS if d = n − k.
So d(C⊥) = k if and only if C⊥ is almost MDS. If C is almost MDS, then C⊥ is not necessarily almost MDS. The code C is called near MDS if both C and C⊥ are almost MDS. See [?].

Proposition 5.4.24 Let C be an [n, k, d] code such that C⊥ is MDS or almost MDS and k ≥ 3. Then both χ_C(S, T) and W_C(X, Y, T) determine µ_C(S, T). In particular µ_i(T) = χ_i(T) = A_{n−i}(T) for all i < k − 1,

µ_{k−1}(T) = Σ_{i=k−1}^{n−1} χ_i(T) = Σ_{i=k−1}^{n−1} A_{n−i}(T),

and µ_k(T) = 1.
Proof. Let C be a code such that d(C⊥) ≥ k ≥ 3. Then C is projective and A_{n−i} = χ_i for all i < k − 1 by Remark ??. If i < k − 1, then the expression for µ_i(T) is given by Theorem 5.4.23. Furthermore µ_k(T) = χ_n(T) = A_0(T) = 1. Finally let L = L(C). Then Σ_{i=0}^k µ_i(T) = T^k, Σ_{i=0}^n χ_i(T) = T^k and Σ_{i=0}^n A_i(T) = T^k by Remark 5.4.4. Hence the formula for µ_{k−1}(T) holds. Therefore µ_C(S, T) is determined both by W_C(X, Y, T) and by χ_C(S, T).

Projective codes of dimension 3 are examples of codes C such that C⊥ is almost MDS. In the following we will give explicit formulas for µ_C(S, T) for such codes. Let C be a projective code of length n and dimension 3 over F_q with generator matrix G. The arrangement A_G = (H1, . . . , Hn) of planes in F_q^3 is simple and essential, and the corresponding arrangement of lines in P^2(F_q) is also denoted by A_G. We defined X̄_i(F_{q^m}) = { P̄ ∈ P^2(F_{q^m}) | P̄ is on exactly i lines of A_G } and χ̄_i(q^m) = |X̄_i(F_{q^m})| in Remark 5.4.22 for all i < n.

Remark 5.4.25 Notice that for projective codes of dimension three X̄_i(F_{q^m}) = X̄_i(F_q) for all positive integers m and all 2 ≤ i < n. Abbreviate in this case χ̄_i(q^m) = χ̄_i for 2 ≤ i < n.

Proposition 5.4.26 Let C be a projective code of length n and dimension 3 over F_q. Then

µ_0(T) = (T − 1)(T^2 − (n − 1)T + Σ_{i=2}^{n−1} (i − 1)χ̄_i − n + 1),
µ_1(T) = (T − 1)(nT + n − Σ_{i=2}^{n−1} i·χ̄_i),
µ_2(T) = (T − 1) Σ_{i=2}^{n−1} χ̄_i.

Proof. A more general statement and proof is possible for [n, k] codes C such that d(C⊥) ≥ k, using Proposition 5.4.24, the fact that B_t(T) = T^{k−t} − 1 for all t < d(C⊥) by Lemma 7.4.39, and the expression of B_t(T) in terms of A_w(T) by Proposition ??. We will give a second, geometric proof for the special case of projective codes of dimension 3. It is enough to show this proposition with T = q^m for all m, by Lagrange interpolation. Notice that µ_i(q^m) is the number of elements of M_i(F_{q^m}) by Proposition 5.4.16. Let P̄ be the corresponding point in P^2(F_{q^m}) for P ∈ F_{q^m}^3 with P ≠ 0. Abbreviate M_i(F_{q^m}) by M_i. Define M̄_i = { P̄ | P ∈ M_i }. Then |M_i| = (q^m − 1)|M̄_i| for all i < 3.

(1) If P̄ ∈ M̄_2, then P̄ ∈ H_j ∩ H_k for some j ≠ k. Hence P̄ ∈ X̄_i(F_q) for some i ≥ 2, since the code is projective. So M̄_2 is the disjoint union of the X̄_i(F_q), 2 ≤ i < n. Therefore |M̄_2| = Σ_{i=2}^{n−1} χ̄_i.

(2) P̄ ∈ M̄_1 if and only if P̄ is on exactly one line H_j. There are n lines, and every line has q^m + 1 points that are defined over F_{q^m}. If i ≥ 2, then every P̄ ∈ X̄_i(F_q) is on i lines H_j. Hence |M̄_1| = n(q^m + 1) − Σ_{i=2}^{n−1} i·χ̄_i.

(3) P^2 is the disjoint union of M̄_0, M̄_1 and M̄_2. The numbers |M̄_2| and |M̄_1| are computed in (1) and (2), and |P^2(F_{q^m})| = q^{2m} + q^m + 1. From this we derive the number of elements of M̄_0.
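Proposition 5.4.26 translates directly into code. The sketch below returns µ̄_i(T) = µ_i(T)/(T − 1) as coefficient tuples (constant term first); the function name and the representation are mine:

```python
def mu_bars(n, chi_bar):
    """mu_bar_i for a projective [n, 3] code, following Proposition 5.4.26.
    chi_bar maps i (2 <= i < n) to the number of projective points lying
    on exactly i lines of the arrangement."""
    s0 = sum(chi_bar.values())                    # sum of chi_bar_i
    s1 = sum(i * c for i, c in chi_bar.items())   # sum of i * chi_bar_i
    s2 = sum((i - 1) * c for i, c in chi_bar.items())
    mu2 = (s0,)                                   # constant polynomial
    mu1 = (n - s1, n)                             # n*T + n - s1
    mu0 = (s2 - n + 1, -(n - 1), 1)               # T^2 - (n-1)*T + s2 - n + 1
    return mu0, mu1, mu2
```

For the arrangement of Example 5.4.27 below with q even (χ̄_3 = 7, n = 7) this gives µ̄_0 = T^2 − 6T + 8, µ̄_1 = 7T − 14 and µ̄_2 = 7, in agreement with the table there.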
Example 5.4.27 Consider the matrices G and P given by

G =
( 1 0 0 0 1 1 1 )
( 0 1 0 1 0 1 1 )
( 0 0 1 1 1 0 1 )

and

P =
( 1 0 0  0  1  1 −1  1  1 )
( 0 1 0  1  0 −1  1 −1  1 )
( 0 0 1 −1 −1  0  1  1 −1 )

Let C be the code over F_q with generator matrix G. The columns of G represent also the coefficients of the lines of A_G. The j-th column of P represents the homogeneous coordinates of the point P_j in the projective plane that occurs as an intersection of two lines of A_G. In case q is even, the points P7, P8 and P9 coincide.

***two pictures: q odd and q even***

If q is even, then χ̄_2 = 0 and χ̄_3 = 7. If q is odd, then χ̄_2 = 3 and χ̄_3 = 6.
              q even                              q odd
 i    χ̄_i   Ā_i            µ̄_{3−i}          χ̄_i   Ā_i            µ̄_{3−i}
 1    0     0              7                 0     0              9
 2    0     0              7T − 14           3     0              7T − 17
 3    7     0              T^2 − 6T + 8      6     0              T^2 − 6T + 9
 4    0     7                                0     6
 5    0     0                                0     3
 6    0     7T − 14                          0     7T − 17
 7    0     T^2 − 6T + 8                     0     T^2 − 6T + 9
Notice that there is a codeword of weight 7 in case q is even and q > 4, or q is odd and q > 3, since Ā_7(T) = (T − 2)(T − 4) or Ā_7(T) = (T − 3)^2, respectively.

Example 5.4.28 Let G be a 3 × n generator matrix of an MDS code. The lines of the arrangement A_G are in general position. That means that every two distinct lines meet in one point, and every three mutually distinct lines have an empty intersection. So χ̄_2 = (n choose 2) and χ̄_i = 0 for all i > 2. Hence Ā_{n−2}(T) = µ̄_2(T) = (n choose 2), Ā_{n−1}(T) = µ̄_1(T) = nT + 2n − n^2 and Ā_n(T) = µ̄_0(T) = T^2 − (n − 1)T + (n−1 choose 2), by Proposition 5.4.16 and Theorem ??, which is in agreement with Proposition 4.4.22.

Example 5.4.29 Let a and b be positive integers such that 2 < a < b. Let n = a + b. Let G be a 3 × n generator matrix of a nondegenerate code. Suppose that there are two points P and Q in the projective plane over F_q such that the a + b lines of the projective arrangement of A_G consist of a distinct lines incident with P and b distinct lines incident with Q, and there is no line incident with both P and Q. Then Ā_{n−2} = χ̄_2 = ab, Ā_b = χ̄_a = 1 and Ā_a = χ̄_b = 1, since n − a = b and n − b = a. Hence µ̄_2(T) = ab + 2. Furthermore Ā_{n−1}(T) = µ̄_1(T) = (a + b)T − 2ab, Ā_n(T) = µ̄_0(T) = T^2 − (a + b − 1)T + ab − 1 and Ā_i(T) = 0 for all i ∉ {a, b, n − 2, n − 1, n}.
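The table of Example 5.4.27 can be confirmed by brute force. For q = 5 (odd) the predicted numbers of codewords are A_4 = 6(q − 1) = 24, A_5 = 3(q − 1) = 12, A_6 = (7q − 17)(q − 1) = 72 and A_7 = (q − 3)^2 (q − 1) = 16. A sketch:

```python
from collections import Counter
from itertools import product

q = 5
G = [(1, 0, 0, 0, 1, 1, 1),
     (0, 1, 0, 1, 0, 1, 1),
     (0, 0, 1, 1, 1, 0, 1)]

def weight_distribution(G, q):
    """Count codewords of each Hamming weight in the code generated by G."""
    counts = Counter()
    for x in product(range(q), repeat=len(G)):
        c = [sum(xi * gi for xi, gi in zip(x, col)) % q for col in zip(*G)]
        counts[sum(1 for ci in c if ci)] += 1
    return counts

A = weight_distribution(G, q)
```

The weights reflect the geometry: a nonzero x gives a codeword of weight 7 − i, where i is the number of lines of A_G through the corresponding projective point.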
Example 5.4.30 Let a, b and c be positive integers such that 2 < a < b < c. Let n = a + b + c. Let G be a 3 × n generator matrix of a nondegenerate code C(a, b, c). Suppose that there are three points P, Q and R in the projective plane over F_q such that the lines of the projective arrangement of A_G consist of a distinct lines incident with P and not with Q and R, b distinct lines incident with Q and not with P and R, and c distinct lines incident with R and not with P and Q. If q is large enough, then such a configuration exists. The a lines through P intersect the b lines through Q in ab points. Similar statements hold for the lines through P and R intersecting in ac points, and the lines through Q and R intersecting in bc points. All these intersection points are on exactly two lines of the arrangement, and there are no others. Hence χ̄_2 = ab + bc + ca. Now P is the unique point on exactly a lines of the arrangement. So χ̄_a = 1. Similarly χ̄_b = χ̄_c = 1. Finally χ̄_i = 0 for all 2 ≤ i < n with i ∉ {2, a, b, c}. Now µ_i(T) is divisible by T − 1 for all 0 ≤ i < k. Define µ̄_i(T) = µ_i(T)/(T − 1). Define similarly Ā_w(T) = A_w(T)/(T − 1) for all 0 < w ≤ n. Propositions 5.4.24 and 5.4.26 imply that Ā_{n−a} = Ā_{n−b} = Ā_{n−c} = 1, Ā_{n−2} = ab + bc + ca and µ̄_2(T) = ab + bc + ca + 3. Furthermore Ā_{n−1}(T) = µ̄_1(T) = nT − 2(ab + bc + ca), Ā_n(T) = µ̄_0(T) = T^2 − (n − 1)T + ab + bc + ca − 2 and Ā_i(T) = 0 for all i ∉ {0, n − a, n − b, n − c, n − 2, n − 1, n}. Therefore W_{C(a,b,c)}(X, Y, T) = W_{C(a′,b′,c′)}(X, Y, T) if and only if (a, b, c) = (a′, b′, c′), and µ_{C(a,b,c)}(S, T) = µ_{C(a′,b′,c′)}(S, T) if and only if a + b + c = a′ + b′ + c′ and ab + bc + ca = a′b′ + b′c′ + c′a′. In particular let C1 = C(3, 9, 14) and C2 = C(5, 6, 15). Then C1 and C2 are two projective codes with the same Möbius polynomial µ_C(S, T) but distinct extended weight enumerators and coboundary polynomials χ_C(S, T).
Example 5.4.31 Consider the codes C_3 and C_4 over F_q with q > 2 with generator matrices G_3 and G_4 given by

         (  1  1  0  0  1  0  0 )            ( 1  1  0  0  1  0  0 )
   G_3 = (  0  1  1  1  0  1  0 )  and G_4 = ( 0  1  1  1  0  1  0 ),
         ( −1  0  1  1  0  0  1 )            ( 0  1  1  a  0  0  1 )

where a ∈ F_q \ {0, 1}. It was shown in [34, Exercise 6.96] that the duals of these codes have the same Tutte polynomial. So the codes C_3 and C_4 have the same Tutte polynomial

   t_C(X, Y) = 2X + 2Y + 3X^2 + 5XY + 4Y^2 + X^3 + X^2Y + 2XY^2 + 3Y^3 + Y^4.

Hence C_3 and C_4 have the same extended weight enumerator, given by

   X^7 + (2T − 2)X^4Y^3 + (3T − 3)X^3Y^4 + (T^2 − T)X^2Y^5 + (5T^2 − 15T + 10)XY^6 + (T^3 − 6T^2 + 11T − 6)Y^7.

The codes C_3 and C_4 are not projective and their reductions C̄_3 and C̄_4, respectively, have generator matrices given by

           (  1  1  0  1  0  0 )             ( 1  1  0  0  0  0 )
   Ḡ_3 =  (  0  1  1  0  1  0 )  and Ḡ_4 = ( 0  1  1  1  1  0 ).
           ( −1  0  1  0  0  1 )             ( 0  1  1  a  0  1 )
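This extended weight enumerator can be checked by brute force for a small field. The following sketch is ours: we take q = 5 (an arbitrary choice with q > 2) and the matrix G_3 of the example; the weight distribution of C_3 over F_5 should equal the coefficients of the enumerator evaluated at T = 5.

```python
from itertools import product
from collections import Counter

q = 5
G3 = [[1, 1, 0, 0, 1, 0, 0],
      [0, 1, 1, 1, 0, 1, 0],
      [-1, 0, 1, 1, 0, 0, 1]]

# Enumerate all q^3 codewords x * G3 over F_q and tally their Hamming weights.
weights = Counter()
for x in product(range(q), repeat=3):
    word = [sum(x[i] * G3[i][j] for i in range(3)) % q for j in range(7)]
    weights[sum(1 for w in word if w != 0)] += 1

# The extended weight enumerator of the example, evaluated at T = q:
T = q
expected = {0: 1, 3: 2*T - 2, 4: 3*T - 3, 5: T**2 - T,
            6: 5*T**2 - 15*T + 10, 7: T**3 - 6*T**2 + 11*T - 6}
assert dict(weights) == expected
print("weight distribution of C3 over F_5 matches the enumerator at T = 5")
```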
5.4. CHARACTERISTIC POLYNOMIAL
From the arrangements A(C̄_3) and A(C̄_4) we deduce the χ̄_i(T), given in the following table.

   code \ i       0          1      2   3   4   5
   C_3       T^2 − 5T + 6  6T − 12  3   4   0   0
   C_4       T^2 − 5T + 6  6T − 13  6   1   1   0

Therefore t_{C_3}(X, Y) = t_{C_4}(X, Y), but χ_{C_3}(S, T) ≠ χ_{C_4}(S, T) and t_{C̄_3}(X, Y) ≠ t_{C̄_4}(X, Y).

Example 5.4.32 Let C_5 = C_3^⊥ and C_6 = C_4^⊥. Then C_5 and C_6 have the same Tutte polynomial t_{C^⊥}(X, Y) = t_C(Y, X), as given by Example 5.4.31:

   2X + 2Y + 4X^2 + 5XY + 3Y^2 + 3X^3 + 2X^2Y + XY^2 + Y^3 + X^4.

Hence C_5 and C_6 have the same extended weight enumerator, given by

   X^7 + (T − 1)X^5Y^2 + (6T − 6)X^4Y^3 + (2T^2 − T − 1)X^3Y^4 + (15T^2 − 43T + 28)X^2Y^5 + (7T^3 − 36T^2 + 60T − 31)XY^6 + (T^4 − 7T^3 + 19T^2 − 23T + 10)Y^7.

The geometric lattice L(C_5) has atoms a, b, c, d, e, f, g corresponding to the first, second, etc. column of G_3. The second level of L(C_5) consists of the following 17 elements:

   abe, ac, ad, af, ag, bc, bd, bf, bg, cd, ce, cf, cg, de, df, dg, efg.

The third level consists of the following 12 elements:

   abce, abde, abefg, acdg, acf, adf, bcdf, bcg, bdg, cde, cefg, defg.

Similarly, the geometric lattice L(C_6) has atoms a, b, c, d, e, f, g corresponding to the first, second, etc. column of G_4. The second level of L(C_6) consists of the following 17 elements:

   abe, ac, ad, af, ag, bc, bd, bf, bg, cd, ce, cf, cg, de, dfg, ef, eg.

The third level consists of the following 13 elements:

   abce, abde, abef, abeg, acd, acf, acg, adfg, bcdfg, cde, cef, ceg, defg.

Proposition 5.4.24 implies that μ_0(T) and μ_1(T) are the same for both codes and equal to

   μ_0(T) = χ_0(T) = A_7(T) = (T − 1)(T − 2)(T^2 − 4T + 5),
   μ_1(T) = χ_1(T) = A_6(T) = (T − 1)(7T^2 − 29T + 31).

The polynomials μ_2(T) and μ_3(T) are given in the following table, using Remarks 5.4.9 and 5.4.4.

             C_5                C_6
   μ_2(T)  17T^2 − 49T + 32   17T^2 − 50T + 33
   μ_3(T)  12T − 12           13T − 13

This example shows that the Möbius polynomial μ_C(S, T) is not determined by the coboundary polynomial χ_C(S, T).
5.4.4

Minimal codewords and subcodes
Definition 5.4.33 A minimal codeword of a code C is a codeword whose support does not properly contain the support of another codeword.

Remark 5.4.34 The zero word is a minimal codeword. Notice that a nonzero scalar multiple of a minimal codeword is again a minimal codeword. Nonzero minimal codewords play a role in minimum distance decoding algorithms [6, 8, 9] and in secret sharing schemes and access structures [80, 117]. We can generalize this notion to subcodes instead of words.

Definition 5.4.35 A minimal subcode of dimension r of a code C is an r-dimensional subcode whose support does not properly contain the support of another r-dimensional subcode.

Remark 5.4.36 A minimal codeword generates a minimal subcode of dimension one, and all the elements of a minimal subcode of dimension one are minimal codewords. A codeword of minimal weight is a nonzero minimal codeword, but the converse is not always the case. In Example 5.4.32 it is shown that the codes C_5 and C_6 have the same Tutte polynomial, whereas the number of minimal codewords of C_5 is 12 and of C_6 is 13. Hence the number of minimal codewords and subcodes is not determined by the Tutte polynomial. However the number of minimal codewords and the number of minimal subcodes of a given dimension are given by the Möbius polynomial.

Theorem 5.4.37 Let C be a code of dimension k. Let 0 ≤ r ≤ k. Then the number of minimal subcodes of dimension r is equal to W_{k−r}, the (k − r)th Whitney number of the second kind, and it is determined by the Möbius polynomial.

Proof. Let D be a subcode of C of dimension r. Let J be the complement in [n] of the support of D. If d ∈ D and d_j ≠ 0, then j ∈ supp(D) and j ∉ J. Hence D ⊆ C(J). Now suppose moreover that D is a minimal subcode of C. Without loss of generality we may assume that D is systematic at the first r positions. So D has a generator matrix of the form (I_r A).
Let d_j be the jth row of this matrix. Let c ∈ C(J). If c − Σ_{j=1}^{r} c_j d_j is not the zero word, then the subcode D′ of C generated by c − Σ_{j=1}^{r} c_j d_j and d_2, . . . , d_r has dimension r and its support is contained in supp(D) \ {1}, while 1 ∈ supp(D). This contradicts the minimality of D. Hence c − Σ_{j=1}^{r} c_j d_j = 0 and c ∈ D. Therefore D = C(J). To find a minimal subcode of dimension r, we fix l(J) = r and minimize the support of C(J) with respect to inclusion. Because J is contained in the complement in [n] of the support of C(J), this is equivalent to maximizing J with respect to inclusion. In matroid terms this means we are maximizing J with r(J) = k − l(J) = k − r. This means J = J̄ is a flat of rank k − r by Remark 5.3.45. The flats of a matroid are the elements of the geometric lattice L = L(M). The number of (k − r)-dimensional elements in L(M) is equal to L_{k−r}, which is equal to the Whitney number of the second kind W_{k−r} and thus equal to the leading coefficient of μ_{k−r}(T) by Remark 5.4.9. Hence the Möbius polynomial determines the number of minimal subcodes of dimension r for all 0 ≤ r ≤ k.
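The definitions can be illustrated by brute force on a small code. The following sketch uses the binary [7, 4] Hamming code with a standard generator matrix (a choice of ours, not taken from the text); its nonzero minimal codewords turn out to be the 7 words of weight 3 together with the 7 words of weight 4, while the all-one word of weight 7 is not minimal.

```python
from itertools import product

G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]

# All 16 codewords of the binary [7,4] Hamming code.
code = [tuple(sum(x[i] * G[i][j] for i in range(4)) % 2 for j in range(7))
        for x in product([0, 1], repeat=4)]

def supp(c):
    return frozenset(j for j in range(7) if c[j])

# A nonzero codeword is minimal if its support does not properly contain
# the support of another nonzero codeword.
minimal = [c for c in code if any(c) and
           not any(any(d) and supp(d) < supp(c) for d in code)]

print(len(minimal))  # 14 = 7 + 7
```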
Remark 5.4.38 Note that the flats of dimension k − r in a matroid are exactly the hyperplanes of the (r − 1)th truncated matroid T^{r−1}(M). This gives another proof of the result of Britz [28, Theorem 3] that the minimal supports of dimension r are the cocircuits of the (r − 1)th truncated matroid. For r = 1 this gives the well-known equivalence between nonzero minimal codewords and cocircuits; see [?, Theorem 9.2.4] and [123, 1.21].
5.4.5
Two-variable zeta function
Generally the counting of rational points over the field extensions F_{q^m} is computed by means of the zeta function.

Definition 5.4.39 Let X be an affine variety in A^k defined over F_q, that is, the zero set of a collection of polynomials in F_q[X_1, . . . , X_k]. Then X(F_{q^m}) is the set of all points of X with coordinates in F_{q^m}, also called the set of F_{q^m}-rational points of X. The zeta function Z_X(T) of X is the formal power series in T defined by

   Z_X(T) = exp( Σ_{m=1}^{∞} |X(F_{q^m})| T^m / m ).

Theorem 5.4.40 Let A be a central simple arrangement in F_q^k. Let χ_A(T) = Σ_{j=0}^{k} c_j T^j be the characteristic polynomial of A. Let M = A^k \ (H_1 ∪ · · · ∪ H_n) be the complement of the arrangement. Then the zeta function of M is given by

   Z_M(T) = Π_{j=0}^{k} (1 − q^j T)^{−c_j}.
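Taking the logarithm of Z_M, the theorem amounts to the point count |M(F_{q^m})| = χ_A(q^m). Here is a brute-force sketch for the braid arrangement x_i = x_j in dimension 3, whose characteristic polynomial is χ(T) = T(T − 1)(T − 2); we only check the count over prime fields (m = 1), an assumption that keeps the arithmetic to integers mod p.

```python
from itertools import product

# Braid arrangement in F_p^3: hyperplanes x_i = x_j for i < j.
# The characteristic polynomial is chi(T) = T(T - 1)(T - 2), so the
# complement should have exactly p(p - 1)(p - 2) rational points over F_p.
def complement_count(p):
    # points of F_p^3 lying on none of the hyperplanes x_i = x_j
    return sum(1 for v in product(range(p), repeat=3)
               if v[0] != v[1] and v[0] != v[2] and v[1] != v[2])

for p in [2, 3, 5, 7, 11]:
    assert complement_count(p) == p * (p - 1) * (p - 2)
print("ok")
```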
Proof. See [17, Theorem 3.6].

Two-variable zeta function of Duursma
5.4.6
Overview
We have established relations between the generalized weight enumerators for 0 ≤ r ≤ k, the extended weight enumerator and the Tutte polynomial. We summarize this in the following diagram:

[Diagram: W_C(X, Y) is determined by the extended weight enumerator W_C(X, Y, T) (4.5.21); W_C(X, Y, T), the Tutte polynomial t_C(X, Y), the collection {W_C^{(r)}(X, Y)}_{r=0}^{k} of generalized weight enumerators and the collection {W_C^{(r)}(X, Y, T)}_{r=0}^{k} determine each other, via 4.5.23, 5.2.20, 5.2.21 and 5.2.22.]
We see that the Tutte polynomial, the extended weight enumerator and the collection of generalized weight enumerators all contain the same amount of information about a code, because they completely define each other. The original weight enumerator W_C(X, Y) contains less information and therefore does not determine W_C(X, Y, T) or {W_C^{(r)}(X, Y)}_{r=0}^{k}. See Simonis [109]. One may wonder if the method of generalizing and extending the weight enumerator can be continued, creating the generalized extended weight enumerator, in order to get a stronger invariant. The answer is no: the generalized extended weight enumerator can be defined, but does not contain more information than the three underlying polynomials. It was shown by Gray [29] that the matroid of a code is a stronger invariant than its Tutte polynomial.
5.4.7
Exercises
5.4.1 Give a proof of the formulas in Example 5.4.6.

5.4.2 Give a proof of Remark 5.4.25.

5.4.3 Compute the two-variable Möbius and coboundary polynomials of the simplex code S_3(q).
5.5
Combinatorics and codes
***Intro***
5.5.1
Orthogonal arrays and codes
Definition 5.5.1 Let q be a positive integer, not necessarily a power of a prime. A Latin square of order q is a q × q array with entries from a set Q of q elements, such that every column and every row is a permutation of the symbols of Q.

Example 5.5.2 An example of a Latin square of order 4 with Q = {a, b, c, d} is given by

   a d c b
   d a b c
   c b a d
   b c d a

Remark 5.5.3 An alternative way to represent a Latin square is by a map L : R × C → Q, where R, C and Q are the sets of rows, columns and values, respectively, all three of size q. Then L represents a Latin square if and only if L(x, j) = k has a unique solution x ∈ R for all j ∈ C and k ∈ Q, and L(i, y) = k has a unique solution y ∈ C for all i ∈ R and k ∈ Q. Any permutation of the rows, that is of the set R, gives another Latin square, and similarly permutations of the columns C and the entries Q give again Latin squares.

Example 5.5.4 Let (G, ·) be a group where · is the multiplication on G. Let R, C and Q all three be equal to G. Let L(x, y) = x · y. Then L defines a Latin square of order |G|.
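Remark 5.5.3 translates directly into a checking procedure. The following sketch (in the spirit of Exercise 5.5.7) tests both the order-4 example and the group construction, taking the additive group of integers mod 5 as an example group.

```python
# A q x q table is a Latin square iff every row and every column is a
# permutation of the symbol set Q.
def is_latin_square(square):
    q = len(square)
    symbols = set(square[0])
    if len(symbols) != q:
        return False
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all(set(square[i][j] for i in range(q)) == symbols
                  for j in range(q))
    return rows_ok and cols_ok

# The order-4 example of Example 5.5.2:
L = [list("adcb"), list("dabc"), list("cbad"), list("bcda")]
print(is_latin_square(L))  # True

# The group construction of Example 5.5.4, for the additive group Z/5:
M = [[(x + y) % 5 for y in range(5)] for x in range(5)]
print(is_latin_square(M))  # True
```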
Remark 5.5.5 A pair of Greek-Latin squares. Euler's problem of the 36 officers and the nonexistence of two mutually orthogonal Latin squares of order 6.

Definition 5.5.6 Two Latin squares L_1 and L_2 are called mutually orthogonal if Q^2 is equal to the set of all pairs (L_1(x, y), L_2(x, y)) with x, y ∈ Q. A collection {L_i : i ∈ J} of Latin squares L_i of order q with entries from a set Q is called a set of mutually orthogonal Latin squares (MOLS) if L_i and L_j are mutually orthogonal for all i, j ∈ J with i ≠ j.

Example 5.5.7 Consider Q = F_q where + is the addition. Let L_a(x, y) = x + ay. Then L_a defines a Latin square of order q for all a ∈ F_q*. Furthermore {L_a : a ∈ F_q*} form a collection of q − 1 MOLS of order q.

Example 5.5.8 In GAP one can construct lists of MOLS. For example for q = 7 we can construct 6 MOLS:

> M:=MOLS(7,6);;
> M[1];
[ [ 0, 1, 2, 3, 4, 5, 6 ], [ 1, 2, 3, 4, 5, 6, 0 ],
  [ 2, 3, 4, 5, 6, 0, 1 ], [ 3, 4, 5, 6, 0, 1, 2 ],
  [ 4, 5, 6, 0, 1, 2, 3 ], [ 5, 6, 0, 1, 2, 3, 4 ],
  [ 6, 0, 1, 2, 3, 4, 5 ] ]

Definition 5.5.9 Let n ≥ 2. An orthogonal array OA(q, n) of order q and depth n is a q^2 × n array whose entries are from a set Q of q elements, such that for every two columns all q^2 pairs of symbols from Q appear in exactly one row.

Remark 5.5.10 Let J = {1, 2, . . . , j}. Let {L_i : i ∈ J} be a collection of j MOLS of order q. Let n = j + 2. We can construct a q^2 × n orthogonal array as follows. Identify R and C with Q by means of bijections, so we may assume that they are equal. In the first two columns all q^2 pairs of Q^2 are tabulated. If (x, y) is in the row of the first two columns, then L_i(x, y) is in column i + 2 of the same row. Conversely an OA(q, n) gives rise to n − 2 MOLS of order q if n ≥ 3. In particular an OA(q, 3) is a Latin square and an OA(q, 4) corresponds to two mutually orthogonal Latin squares.

Example 5.5.11 Let q be a power of a prime. Then a collection of q − 1 MOLS of order q is constructed in Example 5.5.7.
Therefore there exists an OA(q, q + 1).

Remark 5.5.12 Let {L_i : i ∈ J} be a collection of n − 2 MOLS of order q and let A be the corresponding OA(q, n) array. A permutation σ of the rows R gives a collection {L_i′ : i ∈ J} of Latin squares which are again mutually orthogonal, with a corresponding array A_1. Then A_1 is obtained from A by permuting the symbols in the first column under σ and leaving the remaining columns unchanged. Similarly, a permutation of the columns C gives an array A_2 that is obtained from A by permuting the symbols in the second column. A permutation of the entries from Q of L_i gives an array A_{i+2} that is obtained from A by permuting the symbols in the (i + 2)th column.
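For a prime q these constructions can be verified by brute force (a sketch of ours; taking q prime is an assumption, so that F_q is plain arithmetic mod q): the maps L_a of Example 5.5.7 give q − 1 MOLS, and stacking them as in Remark 5.5.10 gives an OA(q, q + 1).

```python
from itertools import product, combinations

q = 5
# The q - 1 Latin squares L_a(x, y) = x + a*y of Example 5.5.7, over Z/q.
mols = {a: [[(x + a * y) % q for y in range(q)] for x in range(q)]
        for a in range(1, q)}

def orthogonal(L1, L2):
    return len({(L1[x][y], L2[x][y])
                for x, y in product(range(q), repeat=2)}) == q * q

assert all(orthogonal(mols[a], mols[b])
           for a, b in combinations(range(1, q), 2))

# Rows (x, y, L_1(x,y), ..., L_{q-1}(x,y)) form an OA(q, q + 1):
rows = [tuple([x, y] + [mols[a][x][y] for a in range(1, q)])
        for x, y in product(range(q), repeat=2)]
for j1, j2 in combinations(range(q + 1), 2):
    assert len({(r[j1], r[j2]) for r in rows}) == q * q

# Two distinct rows agreeing in two positions would repeat a pair in that
# pair of columns, so the q^2 rows pairwise differ in at least n - 1 places.
dist = min(sum(a != b for a, b in zip(r, s)) for r, s in combinations(rows, 2))
print(dist)  # 5 = (q + 1) - 1
```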
Remark 5.5.13 Let A be an OA(q, n) with entries in Q. Then any two distinct rows of A coincide in at most one position. Let C be the subset of Q^n consisting of the rows of A. Then C is a nonlinear code of length n with q^2 codewords and minimum distance n − 1. So C attains the Singleton bound of Exercise 3.2.1. Conversely any nonlinear (n, q^2, n − 1) code yields an OA(q, n). The following proposition is a generalization of Proposition 4.4.25 in case k = 2, that is, n ≤ q + 1 if there exists an [n, 2, n − 1] code over F_q.

Proposition 5.5.14 Suppose there exists an orthogonal array OA(q, n). Then n ≤ q + 1.

Proof. Let A be the array of an OA(q, n). Choose an element in Q and denote it by 0. If the symbols in the ith column of A are permuted, while the other columns remain unchanged, the new array is again an OA(q, n) by Remark 5.5.12. Therefore we may assume without loss of generality that the first row of A consists of zeros. The distance between two rows is at least n − 1 by Remark 5.5.13. Hence apart from the first row, no other row contains two zeros. Next, each element of Q occurs in every column of A exactly q times; we leave this as an exercise for the reader. Count the number of rows that contain exactly one zero. This number is n(q − 1). Indeed, each column contains q zeros, one of which lies in the first row; and since a row other than the first cannot have more than one zero, the remaining zeros all lie in different rows. So 1 + n(q − 1) is the number of rows that contain a zero, and this is at most q^2, the total number of rows. Therefore n ≤ q + 1.

Remark 5.5.15 The bound of Proposition 5.5.14 is tight if q is a power of a prime by Example 5.5.11. Consider the following generalization of an orthogonal array.
Definition 5.5.16 An orthogonal array OA(q, n, λ) is a λq^2 × n array whose entries are from a set Q of q elements, such that for every two columns each of the q^2 pairs of symbols from Q occurs in exactly λ rows. In particular OA(q, n) = OA(q, n, 1). The next result, which we present without proof, provides a lower bound on the value of λ in terms of q and n.

Theorem 5.5.17 If there exists an orthogonal array OA(q, n, λ), then

   λ ≥ (n(q − 1) + 1) / q^2.

Proof. Reference: ***...***
Definition 5.5.18 An orthogonal array OA_λ(t, n, q) is an M × n array, where M = λq^t, whose entries are from a set Q of q ≥ 2 elements, such that in every M × t subarray every possible t-tuple over Q occurs exactly λ times as a row. The parameters λ, t, n, q and M are called the index, strength, constraints, levels and size, respectively. The orthogonal array is called linear if Q = F_q and the rows of the array form an F_q-linear subspace of F_q^n.
Remark 5.5.19 An OA(q, n, λ) is an orthogonal array of strength 2, that is, OA(q, n, λ) = OA_λ(2, n, q). ***Notice that the order of n and q is interchanged according to the literature!!! should we adopt this convention too???***

Theorem 5.5.20 The following objects correspond to each other:
1) an F_q-linear [n, k, d] code;
2) a linear orthogonal array OA_{q^s}(d − 1, n, q), where s = n − k + 1 − d is the Singleton defect of C.

Proof. Let C be an F_q-linear [n, k, d] code with Singleton defect s = s(C) = n − k + 1 − d. Consider the q^{n−k} × n matrix A having as rows the codewords of C^⊥. Then A is a linear OA_{q^s}(d − 1, n, q). *** ....***

Remark 5.5.21 An OA_1(n − k, n, q) is a nonlinear generalization of an F_q-linear MDS code of length n and dimension k. Consider the following generalization of Corollary 4.4.27 on MDS codes.

Theorem 5.5.22 (Bush bound) Let A be an OA_1(k, n, q). If q ≤ k, then n ≤ k + 1.

Proof.
***... ***
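A small instance of Theorem 5.5.20 can be checked directly. The following sketch uses one standard parity check matrix for the binary [7, 4, 3] Hamming code (a choice of ours, not from the text). Here s = 7 − 4 + 1 − 3 = 1, so the 2^3 codewords of the dual, the simplex code, should form a linear OA_2(2, 7, 2).

```python
from itertools import product, combinations

# Parity check matrix of the binary [7,4,3] Hamming code; its row space is
# the dual (simplex) code with 8 codewords.
H = [[0, 1, 1, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [1, 1, 0, 1, 0, 0, 1]]

dual = [tuple(sum(x[i] * H[i][j] for i in range(3)) % 2 for j in range(7))
        for x in product([0, 1], repeat=3)]

# Strength t = d - 1 = 2, index lambda = q^s = 2: every pair of columns
# must show each of the 4 bit pairs exactly twice.
t, lam, q = 2, 2, 2
for cols in combinations(range(7), t):
    counts = {}
    for r in dual:
        key = tuple(r[j] for j in cols)
        counts[key] = counts.get(key, 0) + 1
    assert all(counts.get(p, 0) == lam for p in product(range(q), repeat=t))
print("the dual of the [7,4,3] Hamming code is an OA_2(2, 7, 2)")
```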
5.5.2
Designs and codes
5.5.3
Exercises
5.5.1 Prove that Example 5.5.7 gives a set of q − 1 mutually orthogonal Latin squares of order q.

5.5.2 Let q be a positive integer. Show that q − 1 is the maximal number of MOLS of order q.

5.5.3 Show that there exist t MOLS of order qr if there exist t MOLS of orders q and r, respectively.

5.5.4 Let n ≥ 3. Give a proof of the correspondence between an OA(q, n) and n − 2 MOLS of order q of Remark 5.5.10.

5.5.5 Let A be the array of an OA(q, n, λ) with entries from Q. Show that every symbol of Q occurs in every column of A exactly λq times.

5.5.6 Let A be the array of an OA_λ(t, n, q) with entries from Q. Let A′ be obtained from A by permuting the symbols in a given column and leaving the remaining columns unchanged. Show that A′ is the array of an OA_λ(t, n, q).

5.5.7 [CAS] Write two procedures:
• the first takes as input a q × q table and checks whether the table is a Latin square. Check your procedure against IsLatinSquare in GAP;
• the second, given a list of q × q tables, checks whether they are MOLS. Use AreMOLS from GAP to test your procedure.
5.6
Notes
Section 4.1.6: The MDS conjecture is confirmed for all q with 2 ≤ q ≤ 11, Blokhuis-Bruen-Thas, Hirschfeld-Storme.

Section 4.2: Theory of arrangements of hyperplanes [92]. The use of the isomorphism in Proposition 4.5.18 for the proof of Theorem 4.5.21 was suggested in [109] by Simonis. Proposition 4.5.20 first appears in [63, Theorem 3.2], although the term "generalized weight enumerator" was yet to be invented. The identity of Lemma 4.5.22 can be found in [5, 27, 71, 128, 113].

Section 4.3: Applications of GHWs ***dimension/length profile, Forney*** ***Wiretap channel of type II*** ***trellis complexity*** ***rth rank MDS, Kloeve, Simonis, Wei*** ***Question: does the two-variable weight enumerator determine the generalized weight enumerators?*** ***C AMDS and C⊥ AMDS iff d_2 = d_1 + 2.*** ***If d > qs(C), then ...*** ***weight enumerator of an AMDS code***

Section 4.4: Theory of lattices [38, ?]. The polynomial μ_L(S, T) is defined by Zaslavsky in [139, Section 1]. In [140, Section 2] and [?, Section 6] it is called the Whitney polynomial. The polynomial χ_L(S, T) is called the coboundary polynomial by Crapo in [42, p. 605] and [43]. See also [30, 32].

Blocking sets and codes meeting the Griesmer bound: minihypers, blocking sets and codes meeting the Griesmer bound, Belov, Hamada-Helleseth, Storme.

Section 4.4.2: Corollary 4.3.25 was first proved by Oberst and Dür [?], with the weaker assumption q^m > (n−1 choose d−1) − (n−k−1 choose d−1), where C is an [n, k, d] code. Proposition 4.3.24 was shown by Pellikaan [?] with a stronger conclusion.
(Complete) n-arcs, ovals, Segre: an oval is a (q + 1)-arc if q is odd, ***B. Segre, conic, odd curve in char 2, nucleus***. Conjectures of Segre, Hirschfeld-Thas, Hirschfeld-Korchmáros-Torres pp. 599.

Section 4.5:

Section 4.6: Literature on (mutually orthogonal) Latin squares, orthogonal arrays, codes and designs: J.H. van Lint and R.M. Wilson, A course in combinatorics, pages 158, 250, 261, 382 and 495. P. Cameron and J.H. van Lint, Designs, graphs, codes and their links, pages 14, 93, 170, 209.

Links between coding theory and statistical objects: R.C. Bose, "On some connections between the design of experiments and information theory," Bull. Inst. Internat. Statist., vol. 38, pp. 257–271, 1961.

Connection between orthogonal arrays and error-correcting codes with a given defect: R.C. Bose and K.A. Bush, "Orthogonal arrays of strength two and three," Ann. Math. Stat., vol. 23, pp. 508–524, 1952.

The construction of orthogonal arrays of maximal length and the Bush bound: K.A. Bush, "Orthogonal arrays of index unity," Ann. Math. Stat., vol. 23, pp. 426–434, 1952.

J.W.P. Hirschfeld and L. Storme, "The packing problem in statistics, coding theory and finite projective spaces," Journ. Stat. Planning and Inference, vol. 72, pp. 355–380, 1998.

The notion of an OA_λ(t, n, q) as a generalization of MOLS is from: C.R. Rao, "Factorial experiments derivable from combinatorial arrangements of arrays," Journ. Royal Stat. Soc. Suppl., vol. 9, pp. 128–139, 1947.

***Bose-Bush, Bierbrauer, Stinson*** ***t-resilient functions*** ***The design of statistical experiments*** ***Lattices and codes***
Chapter 6
Complexity and decoding

Stanislav Bulygin, Ruud Pellikaan and Xin-Wen Wu

6.1
Complexity
In this section we briefly explain the theory of complexity and introduce some hard problems which are related to the theme of this book and will be useful in the following chapters.
6.1.1
Big-Oh notation
The following definitions and notations are essential in evaluating the complexity of an algorithm.

Definition 6.1.1 Let f(n) and g(n) be functions mapping nonnegative integers to real numbers. We define
(1) f(n) = O(g(n)) for n → ∞, if there exist a real constant c > 0 and an integer constant n_0 > 0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n_0.
(2) f(n) = Ω(g(n)) for n → ∞, if there exist a real constant c > 0 and an integer constant n_0 > 0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n_0.
(3) f(n) = Θ(g(n)) for n → ∞, if there exist real constants c_1 > 0 and c_2 > 0, and an integer constant n_0 > 0 such that c_1 g(n) ≤ f(n) ≤ c_2 g(n) for all n ≥ n_0.
(4) f(n) ≈ g(n) for n → ∞, if lim_{n→∞} f(n)/g(n) = 1.
(5) f(n) = o(g(n)) for n → ∞, if for every real constant ε > 0 there exists an integer constant n_0 > 0 such that 0 ≤ f(n) < εg(n) for all n ≥ n_0.

Remark 6.1.2 The notations f(n) = O(g(n)) and f(n) = o(g(n)) of Landau are often referred to as the "big-Oh" and "little-oh" notations. Furthermore f(n) = O(g(n)) is expressed as "f(n) is of the order g(n)". Intuitively, this means that f(n) grows no faster asymptotically than g(n) up to a constant. And
f(n) ≈ g(n) is expressed as "f(n) is approximately equal to g(n)". Similarly, in the literature f(n) = Ω(g(n)) and f(n) = Θ(g(n)) are referred to as the "big-Omega" and "big-Theta" notations, respectively.

Example 6.1.3 It is easy to see that for every positive constant a, we have a = O(1) and a/n = O(1/n). Let f(n) = a_k n^k + a_{k−1} n^{k−1} + · · · + a_0, where k is an integer constant and a_k, a_{k−1}, . . . , a_0 are real constants with a_k > 0. For this polynomial in n, we have f(n) = O(n^k), f(n) = Θ(n^k), f(n) ≈ a_k n^k and f(n) = o(n^{k+1}) for n → ∞. We have 2 log n + 3 log log n = O(log n), 2 log n + 3 log log n = Θ(log n) and 2 log n + 3 log log n ≈ 2 log n for n → ∞, since 2 log n ≤ 2 log n + 3 log log n ≤ 5 log n when n ≥ 2 and lim_{n→∞} log log n / log n = 0.
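A quick numeric sanity check of the last claim (a sketch; we take log to be log base 2, an assumption under which log log n ≥ 0 from n = 2 on):

```python
import math

# Check 2 log n <= 2 log n + 3 log log n <= 5 log n for all n >= 2,
# with log = log base 2.
for n in range(2, 100000):
    log_n = math.log2(n)
    f = 2 * log_n + 3 * math.log2(log_n)
    assert 2 * log_n <= f <= 5 * log_n
print("ok")
```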
6.1.2
Boolean functions
An algorithm is a well-defined computational procedure such that every execution takes a variable input and halts with an output. The complexity of an algorithm or a computational problem comprises time complexity and storage space complexity.

Definition 6.1.4 A (binary) elementary (arithmetic) operation is an addition, a comparison or a multiplication of two elements x, y ∈ {0, 1} = F_2. Let A be an algorithm that has as input a binary word. Then the time or work complexity C_T(A, n) is the number of elementary operations in the algorithm A needed to get the output, as a function of the length n of the input, that is, the number of bits of the input. The space or memory complexity C_S(A, n) is the maximum number of bits needed for memory during the execution of the algorithm with an input of n bits. The complexity C(A, n) is the maximum of C_T(A, n) and C_S(A, n).

Example 6.1.5 Let C be a binary [n, k] code given by the generator matrix G. Then the encoding procedure (a_1, . . . , a_k) ↦ (a_1, . . . , a_k)G is an algorithm. For every execution of the encoding algorithm, the input is a vector of length k which represents a message block; the output is a codeword of length n. To compute one entry of a codeword one has to perform k multiplications and k − 1 additions. The work complexity of this encoding is therefore n(2k − 1). The memory complexity is nk + k + n: the number of bits needed to store the input vector, the matrix G and the output codeword. Thus the complexity is dominated by the work complexity and is n(2k − 1).

Example 6.1.6 In coding theory the code length is usually taken as a measure of the input size. In the case of binary codes this coincides with the above complexity measures. For q-ary codes an element of F_q has a minimal binary representation of ⌈log(q)⌉ bits. A decoding algorithm with a received word of length n as input can then be represented by a binary word of length N = n⌈log(q)⌉.
In case the finite field is fixed there is no danger of confusion, but when the efficiency of algorithms over distinct finite fields is compared, everything should be expressed in
terms of the number of binary elementary operations as a function of the length of the input as a binary string. Let us see how this works out for solving a system of linear equations over a finite field. Whereas an addition or a multiplication counts as 1 unit in the binary case, this is no longer the case in the q-ary case. An addition in F_q is equal to ⌈log(q)⌉ binary elementary operations, and a multiplication needs O(m^2 log^2(p) + m log^3(p)) = O(log^3(q)) elementary operations, where q = p^m and p is the characteristic of the finite field, see ??. The Gauss-Jordan algorithm to solve a system of n linear equations in n unknowns over a finite field F_q needs O(n^3) additions and multiplications in F_q. That means the binary complexity is O(n^3 log^3(q)) = O(N^3), where N = n⌈log(q)⌉ is the length of the binary input. The known decoding algorithms that have polynomial complexity and that will be treated in the sequel all reduce to linear algebra computations, so they have complexity O(n^3) elementary operations in F_q or O(N^3) bit operations. So we will take the code length n as a measure of the input size, and state the complexity as a function of n. These polynomial decoding algorithms apply to restricted classes of linear codes.

To study the theory of complexity, two different computational models which are both widely used in the literature are the Turing machine (TM) model and the Boolean circuit model. Between these two models the Boolean circuit model has an especially simple definition and is viewed as more amenable to combinatorial analysis. A Boolean circuit represents a Boolean function in a natural way. And Boolean functions have a lot of applications in the theory of coding. In this book we choose Boolean circuits as the computational model. *** One or two paragraphs on Boolean Circuits vs. Turing Machines (c.f. R.B. Boppana & M. Sipser, "The Complexity of Finite Functions") ***

The basic elements of a Boolean circuit are Boolean gates, namely AND, OR, NOT and XOR, which are defined by the following truth tables.

The truth table of AND (denoted by ∧):

   ∧ | F  T
   F | F  F
   T | F  T

The truth table of OR (denoted by ∨):

   ∨ | F  T
   F | F  T
   T | T  T

The truth table of NOT (denoted by ¬):

   ¬F = T,   ¬T = F

The truth table of XOR:

   XOR | F  T
    F  | F  T
    T  | T  F
It is easy to check that the XOR gate can be represented by AND, OR and NOT as follows:

   x XOR y = (x ∧ (¬y)) ∨ ((¬x) ∧ y).

The NAND operation is an AND operation followed by a NOT operation. The NOR operation is an OR operation followed by a NOT operation. In the following definition of Boolean circuits, we restrict to the operations AND, OR and NOT. Substituting F = 0 and T = 1, the Boolean gates above are actually operations on bits (called logical operations on bits). We have the ∧ operation:

   0 ∧ 0 = 0,   0 ∧ 1 = 0,   1 ∧ 0 = 0,   1 ∧ 1 = 1,

the ∨ operation:

   0 ∨ 0 = 0,   0 ∨ 1 = 1,   1 ∨ 0 = 1,   1 ∨ 1 = 1,

and the NOT operation:

   ¬0 = 1,   ¬1 = 0.
Consider the binary elementary arithmetic operations + and ·. It is easy to verify that x · y = x ∧ y,
and x + y = x XOR y = (x ∧ (¬y)) ∨ ((¬x) ∧ y).
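These two identities can be confirmed exhaustively:

```python
from itertools import product

# Check on bits: x * y agrees with x AND y, and x + y mod 2 agrees with
# x XOR y = (x AND (NOT y)) OR ((NOT x) AND y), writing NOT x as 1 - x.
for x, y in product([0, 1], repeat=2):
    assert x * y == (x & y)
    assert (x + y) % 2 == ((x & (1 - y)) | ((1 - x) & y))
print("ok")
```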
Definition 6.1.7 Given positive integers n and m, a Boolean function is a function b : {0, 1}^n → {0, 1}^m. It is also called an n-input, m-output Boolean function and the set of all such functions is denoted by B(n, m). Denote B(n, 1) by B(n).

Remark 6.1.8 The number of elements of B(n, m) is (2^m)^{2^n} = 2^{m2^n}. Identify {0, 1} with the binary field F_2. Let b_1 and b_2 be elements of B(n, m). Then the sum b_1 + b_2 is defined by (b_1 + b_2)(x) = b_1(x) + b_2(x) for x ∈ F_2^n. In this way the set of Boolean functions B(n, m) is a vector space over F_2 of dimension m2^n. Let b_1 and b_2 be elements of B(n). Then the product b_1 b_2 is defined by (b_1 b_2)(x) = b_1(x)b_2(x) for x ∈ F_2^n. In this way B(n) is an F_2-algebra with the property b^2 = b for all b in B(n).
Every polynomial f(X) in F_2[X_1, . . . , X_n] yields a Boolean function f̃ : F_2^n → F_2 by evaluation: f̃(x) = f(x) for x ∈ F_2^n. Consider the map ev : F_2[X_1, . . . , X_n] → B(n) defined by ev(f) = f̃. Then ev is an algebra homomorphism. Now X̃_i^2 = X̃_i for all i. Hence the ideal ⟨X_1^2 + X_1, . . . , X_n^2 + X_n⟩ is contained in the kernel of ev. The factor ring F_2[X_1, . . . , X_n]/⟨X_1^2 + X_1, . . . , X_n^2 + X_n⟩ and B(n) are both F_2-algebras of the same dimension 2^n. Hence ev induces an isomorphism

   ev : F_2[X_1, . . . , X_n]/⟨X_1^2 + X_1, . . . , X_n^2 + X_n⟩ → B(n).

Example 6.1.9 Let sym_k(x) be the Boolean function defined by the following polynomial in the k^2 variables x_{ij}, 1 ≤ i, j ≤ k:

   sym_k(x) = Π_{i=1}^{k} Σ_{j=1}^{k} x_{ij}.

This description needs k(k − 1) additions and k − 1 multiplications. Therefore k^2 − 1 elementary operations are needed in total. If we were to write sym_k in normal form by expanding the products, the description is of the form

   sym_k(x) = Σ_{σ ∈ K^K} Π_{i=1}^{k} x_{iσ(i)},

where K^K is the set of all functions σ : {1, . . . , k} → {1, . . . , k}. This expression has k^k terms, each a product of k factors. So this needs (k − 1)k^k multiplications and k^k − 1 additions. Therefore k^{k+1} − 1 elementary operations are needed in total. Hence this last description has exponential complexity.

Example 6.1.10 Computing the binary determinant. Let det_k(x) be the Boolean function of the k^2 variables x_{ij}, 1 ≤ i, j ≤ k, that computes the determinant over F_2 of the k × k matrix x = (x_{ij}). Hence

   det_k(x) = Σ_{σ ∈ S_k} Π_{i=1}^{k} x_{iσ(i)},

where S_k is the symmetric group on k elements. This expression has k! terms, each a product of k factors. Therefore k(k!) − 1 elementary operations are needed in total. Let x̂_{ij} be the square matrix of size k − 1 obtained by deleting the ith row and the jth column from x. Using the cofactor expansion along the ith row

   det_k(x) = Σ_{j=1}^{k} x_{ij} det_{k−1}(x̂_{ij}),

we see that the complexity of this computation is of the order O(k!). This complexity is still exponential. But det_k has complexity O(k^3) by Gaussian elimination. This translates into a description of det_k as a Boolean function with O(k^3) elementary operations. ***explicit description worked out in an example for det_3.***
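The contrast can be made concrete in code (a sketch of ours, not the explicit Boolean description intended for det_3): over F_2 the signs in the expansion over S_k disappear, and Gaussian elimination computes the same value with O(k^3) operations.

```python
from itertools import permutations
import random

# k! * k-term expansion of det over F_2 (no signs needed in characteristic 2).
def det_expansion(x):
    k = len(x)
    total = 0
    for sigma in permutations(range(k)):
        term = 1
        for i in range(k):
            term &= x[i][sigma[i]]
        total ^= term
    return total

# O(k^3) Gaussian elimination over F_2: det = 1 iff all pivots exist.
def det_gauss(x):
    a = [row[:] for row in x]
    k = len(a)
    for col in range(k):
        pivot = next((r for r in range(col, k) if a[r][col]), None)
        if pivot is None:
            return 0                      # singular over F_2
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            if a[r][col]:
                a[r] = [u ^ v for u, v in zip(a[r], a[col])]
    return 1

random.seed(1)
for _ in range(50):
    m = [[random.randint(0, 1) for _ in range(4)] for _ in range(4)]
    assert det_expansion(m) == det_gauss(m)
print("ok")
```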
Example 6.1.11 A Boolean function computing whether an integer is prime. Let prime_m(x) be the Boolean function defined by

prime_m(x_1, ..., x_m) = 1 if x_1 + x_2 · 2 + · · · + x_m · 2^{m−1} is a prime, and 0 otherwise.

So prime_2(x_1, x_2) = x_2 and prime_3(x_1, x_2, x_3) = x_2 + x_1 x_3 + x_2 x_3. Only very recently it was proved that the decision problem whether an integer is prime or not has polynomial complexity, see ??.

Example 6.1.12 ***A Boolean function computing exponentiation exp_a; see Coppersmith and Shparlinski, "A polynomial approximation of DL and DH mapping," Journ. Crypt. vol. 13, pp. 339-360, 2000.***

Remark 6.1.13 From these examples we see that the complexity of a Boolean function depends on the way we write it as a combination of elementary operations. We can formally define the complexity of a Boolean function f in terms of the size of a circuit that represents it.

Definition 6.1.14 A Boolean circuit is a directed graph containing no cycles (that is, if there is a route from one node to another node, then there is no way back), which has the following structure:
(i) Any node (also called vertex) v has indegree (that is, the number of edges entering v) equal to 0, 1 or 2, and outdegree (that is, the number of edges leaving v) equal to 0 or 1.
(ii) Each node is labeled by one of AND, OR, NOT, 0, 1, or a variable x_i.
(iii) If a node has indegree 0, then it is called an input and is labeled by 0, 1, or a variable x_i.
(iv) If a node has indegree 1 and outdegree 1, then it is labeled by NOT.
(v) If a node has indegree 2 and outdegree 1, then it is labeled by AND or OR.
In a Boolean circuit, any node with indegree greater than 0 is called a gate. Any node with outdegree 0 is called an output.

Remark 6.1.15 By the definition, we observe that:
(1) A Boolean circuit can have more than one input and more than one output.
(2) If a Boolean circuit has n variables x_1, x_2, ..., x_n and m outputs, then it represents a Boolean function f : {0, 1}^n → {0, 1}^m in a natural way.
(3) Any Boolean function f : {0, 1}^n → {0, 1}^m can be represented by a Boolean circuit.
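The closed forms in Example 6.1.11 can be checked by brute force over all inputs. The following sketch (plain Python, with trial division for primality, since the inputs are tiny) verifies the stated formula for prime_3:

```python
from itertools import product

def is_prime(n):
    # trial division; fine for the tiny inputs of this example
    return n >= 2 and all(n % p for p in range(2, n))

def prime_m(bits):
    # interpret (x1, ..., xm) as the integer x1 + x2*2 + ... + xm*2^(m-1)
    return 1 if is_prime(sum(b << i for i, b in enumerate(bits))) else 0

def prime_3_formula(x1, x2, x3):
    # the closed form x2 + x1*x3 + x2*x3 over F2
    return x2 ^ (x1 & x3) ^ (x2 & x3)

for x in product((0, 1), repeat=3):
    assert prime_m(x) == prime_3_formula(*x)
```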
Definition 6.1.16 The size of a Boolean circuit is the number of gates that it contains. The depth of a Boolean circuit is the length of the longest path from an input to an output. For a Boolean function f, the time complexity of f, denoted by C_T(f), is the smallest of the sizes of the Boolean circuits representing f. The space complexity (also called depth complexity), denoted by C_S(f), is the smallest of the depths of the Boolean circuits representing f.

Theorem 6.1.17 (Shannon) There exists a family of Boolean functions of exponential complexity.

Proof. Let us first give an upper bound on the number of circuits with n variables and size s, and then compare it with the number of Boolean functions of n variables. In a circuit of size s, each gate is assigned an AND or OR operator acting on two previous nodes. Each previous node can either be a previous gate with at most s choices, a literal (that is, a variable or its negation) with 2n choices, or a constant with 2 choices. Therefore each gate allows at most 2(s + 2n + 2)^2 choices, which implies that the number of circuits with n variables and size s is at most 2^s (s + 2n + 2)^{2s}. Now, setting s = 2^n/(10n), the upper bound 2^s (s + 2n + 2)^{2s} is approximately 2^{2^n/5}, which is much smaller than 2^{2^n}. On the other hand, the number of Boolean functions of n variables and one output is 2^{2^n}. This implies that almost every Boolean function requires circuits of size larger than 2^n/(10n).
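The counting in the proof can be sanity-checked numerically; here for n = 20 (the function name is ours, not the book's notation):

```python
import math

def log2_circuit_bound(n, s):
    # log2 of the bound 2^s * (s + 2n + 2)^(2s) on the number of circuits
    return s + 2 * s * math.log2(s + 2 * n + 2)

n = 20
s = 2 ** n / (10 * n)
# far fewer circuits of size s than the 2^(2^n) Boolean functions on n bits:
# the exponent stays below 2^n / 5, which is itself far below 2^n
assert log2_circuit_bound(n, s) < 2 ** n / 5 < 2 ** n
```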
6.1.3 Hard problems
We now look at the classification of algorithms by their complexity.

Definition 6.1.18 Let L_n(α, a) = O(exp(a n^α (ln n)^{1−α})), where a and α are constants with 0 ≤ a and 0 ≤ α ≤ 1. In particular L_n(1, a) = O(exp(an)) and L_n(0, a) = O(exp(a ln n)) = O(n^a). Let A denote an algorithm with input size n. Then A is an L(α)-algorithm if the complexity of this algorithm has an estimate of the form L_n(α, a) for some a. An L(0)-algorithm is called a polynomial algorithm and an L(1)-algorithm is called an exponential algorithm. An L(α)-algorithm with α < 1 is called a subexponential algorithm.

A problem that has either YES or NO as an answer is called a decision problem. All the computational problems that will be encountered here can be phrased as decision problems in such a way that an efficient algorithm for the decision problem yields an efficient algorithm for the computational problem, and vice versa. In the following complexity classes, we restrict our attention to decision problems.

Definition 6.1.19 The complexity class P is the set of all decision problems that are solvable with polynomial complexity.
Definition 6.1.20 The complexity class NP is the set of all decision problems for which a YES answer can be verified in polynomial time given some extra information, called a certificate. The complexity class coNP is the set of all decision problems for which a NO answer can be verified in polynomial time given an appropriate certificate.

Example 6.1.21 Consider the decision problem that has as input a generator matrix of a code C and a positive integer w, with question "d(C) ≤ w?" In case the answer is yes, there exists a codeword c of minimum weight d(C). Then c is a certificate, and the verification wt(c) ≤ w has complexity n.

Definition 6.1.22 Let D_1 and D_2 be two computational problems. Then D_1 is said to be polytime reducible to D_2, denoted D_1 ≤_P D_2, provided that there exists an algorithm A_1 that solves D_1 which uses an algorithm A_2 that solves D_2, and A_1 runs in polynomial time if A_2 does. Informally, if D_1 ≤_P D_2, we say D_1 is no harder than D_2. If D_1 ≤_P D_2 and D_2 ≤_P D_1, then D_1 and D_2 are said to be computationally equivalent.

Definition 6.1.23 A decision problem D is said to be NP-complete if
• D ∈ NP, and
• E ≤_P D for every E ∈ NP.
The class of all NP-complete problems is denoted by NPC.

Definition 6.1.24 A computational problem (not necessarily a decision problem) is NP-hard if there exists some NP-complete problem that polytime reduces to it.

Observe that every NP-complete problem is NP-hard. So the set of all NP-hard problems contains NPC as a subset. Some other relationships among the complexity classes above are illustrated as follows.
***Figure: relationships among the complexity classes P, NP, coNP and NPC***
It is natural to ask the following questions:
(1) Is P = NP?
(2) Is NP = coNP?
(3) Is P = NP ∩ coNP?
Most experts are of the opinion that the answer to each of these questions is NO. However, no mathematical proofs are available, and answering these questions is an interesting and hard problem in theoretical computer science.
6.1.4 Exercises
6.1.1 Give an explicit expression of det_3(x) as a Boolean function.

6.1.2 Give an explicit expression of prime_4(x) as a Boolean function.

6.1.3 Give an explicit expression of exp_a(x) as a Boolean function, where ....
6.2 Decoding
*** intro***
6.2.1 Decoding complexity
The known decoding algorithms that work for all linear codes have exponential complexity. We now consider some of them.

Remark 6.2.1 The brute force method compares the distance of a received word with all possible codewords and chooses a codeword of minimum distance. The time complexity of the brute force method is at most n · q^k.

Definition 6.2.2 Let r be a received word with respect to a code C of dimension k. Choose an (n − k) × n parity check matrix H of the code C. Then s = rH^T ∈ F_q^{n−k} is called the syndrome of r.

Remark 6.2.3 Let C be a code of dimension k. Let r be a received word. Then r + C is called the coset of r. Now the cosets of the received words r_1 and r_2 are the same if and only if r_1 H^T = r_2 H^T. Therefore there is a one-to-one correspondence between cosets of C and values of syndromes. Furthermore every element of F_q^{n−k} is the syndrome of some received word r, since H has rank n − k. Hence the number of cosets is q^{n−k}.

Remark 6.2.4 In Definition 2.4.10 of coset leader decoding no mention is made of how this method is implemented. Coset leader decoding can be done in two ways. Let H be a parity check matrix and G a generator matrix of C.
1) Preprocess a lookup table and store it in memory with a list of pairs (s, e), where e is a coset leader of the coset with syndrome s ∈ F_q^{n−k}. For a received word r, compute s = rH^T; look up the unique pair (s, e) in the table with s as its first entry; give r − e as output.
2) For a received word r, compute s = rH^T; compute a solution e of minimal weight of the equation eH^T = s; give r − e as output.
Now consider the complexity of the two methods for coset leader decoding:
1) The space complexity is clearly q^{n−k}, the number of elements in the table. The time complexity is O(k^2(n − k)) for finding the solution c. The preprocessing of the table has time complexity q^{n−k}: go through all possible error patterns e of nondecreasing weight and compute s = eH^T; put (s, e) in the list if s is not already a first entry of a pair in the list.
2) Go through all possible error patterns e of nondecreasing weight, compute s = eH^T and compare it with rH^T, where r is the received word. The first instance where eH^T = rH^T gives a closest codeword c = r − e. The complexity is at most B_ρ n^2 for finding a coset leader, where ρ is the covering radius, by Remark 2.4.9. ***Now B_ρ ≈ ... ***

Example 6.2.5 ***[7,4,3] Hamming code and other perfect codes, some small non-perfect codes.***

In order to compare their complexities we introduce the following definitions. ***work factor, memory factor***
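Method 1) above can be sketched for the binary [7,4,3] Hamming code. The parity check matrix below takes the columns in the order 1, ..., 7 written in binary, which is an assumption and may differ from the column order used elsewhere in the book; the table-based decoding itself is as described.

```python
from itertools import product

# columns of H are the binary expansions of 1..7 ([7,4,3] Hamming code)
H = [[(j >> i) & 1 for j in range(1, 8)] for i in range(3)]

def syndrome(v):
    return tuple(sum(h[i] * v[i] for i in range(7)) % 2 for h in H)

# preprocess: for every syndrome store a coset leader (error of minimal weight)
table = {}
for e in product((0, 1), repeat=7):
    s = syndrome(e)
    if s not in table or sum(e) < sum(table[s]):
        table[s] = e

def decode(r):
    # look up the coset leader of the coset of r and subtract it
    e = table[syndrome(r)]
    return tuple((ri - ei) % 2 for ri, ei in zip(r, e))

# one bit of the all-zero codeword flipped is corrected
assert decode((0, 0, 1, 0, 0, 0, 0)) == (0, 0, 0, 0, 0, 0, 0)
```

The table has q^{n−k} = 2^3 = 8 entries, matching the space complexity estimate above.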
Definition 6.2.6 Let the complexity of an algorithm be exponential, O(q^{en}) for n → ∞. Then e is called the complexity exponent of the algorithm.

Example 6.2.7 The complexity exponent of the brute force method is R and of coset leader decoding is 1 − R, where R is the information rate. ***Barg, van Tilburg. picture***
6.2.2 Decoding erasures
***hard/soft decision decoding, (de)modulation, signalling***

After receiving a word there is a stage at the beginning of the decoding process where a decision has to be made about which symbol has been received. In some applications it is desirable to postpone a decision and to put a question mark "?" as a new symbol at that position, as if the symbol was erased. This is called an erasure. So a word over the alphabet F_q with erasures can be viewed as a word in the alphabet F_q ∪ {?}, that is, an element of (F_q ∪ {?})^n. If only erasures occur and the number of erasures is at most d − 1, then we are sure that there is a unique codeword that agrees with the received word at all positions that are not an erasure.

Proposition 6.2.8 Let d be the minimum distance of a code. Then for every received word with t errors and s erasures such that 2t + s < d there is a unique nearest codeword. Conversely, if d ≤ 2t + s then there is a received word with at most t errors and s erasures with respect to more than one codeword.

Proof. This is left as an exercise to the reader.
Suppose that we have received a word with s erasures and no errors. Then the brute force method would fill in all the possible q^s words at the erasure positions and check whether the obtained word is a codeword. This method has complexity O(n^2 q^s), which is exponential in the number of erasures. In this section it is shown that erasures only can be corrected by solving a system of linear equations. This can be achieved by using the generator matrix or the parity check matrix. The most efficient choice depends on the rate and the minimum distance of the code.

Proposition 6.2.9 Let C be a code in F_q^n with parity check matrix H and minimum distance d. Suppose that the codeword c is transmitted and the word r is received with no errors and at most d − 1 erasures. Let J be the set of erasure positions of r. Let y ∈ F_q^n be defined by y_j = r_j if j ∉ J and y_j = 0 otherwise. Let s = yH^T be the syndrome of y. Let e = y − c. Then wt(e) < d and e is the unique solution of the following system of linear equations in x:

xH^T = s and x_j = 0 for all j ∉ J.

Proof. By the definitions we have that

s = yH^T = cH^T + eH^T = 0 + eH^T = eH^T.
The support of e is contained in J. Hence e_j = 0 for all j ∉ J. Therefore e is a solution of the system of linear equations. If x is another solution, then (x − e)H^T = 0. Therefore x − e is an element of C, and moreover it is supported at J. So its weight is at most d(C) − 1. Hence it must be zero. Therefore x = e.

The above method of correcting erasures only by means of a parity check matrix is called syndrome decoding up to the minimum distance.

Definition 6.2.10 Let the complexity of an algorithm be f(n) with f(n) ≈ c·n^e for n → ∞. Then the algorithm is called polynomial of degree e with complexity coefficient c.

Corollary 6.2.11 The complexity of correcting erasures only by means of syndrome decoding up to the minimum distance is polynomial of degree 3 with complexity coefficient (1/3)(1 − R)^2 δ for a code of length n → ∞, where R is the information rate and δ the relative minimum distance.

Proof. This is a consequence of Proposition 6.2.9, which amounts to solving a system of n − k linear equations in at most d − 1 unknowns in order to get the error vector e. Then c = y − e is the codeword sent. We may assume that the encoding is done systematically at k positions, so the message m is immediately read off from these k positions. The complexity is asymptotically of the order

(1/3)(n − k)^2 d = (1/3)(1 − R)^2 δ n^3 for n → ∞.

See Appendix ??.

Example 6.2.12 Let C be the binary [7, 4, 3] Hamming code with parity check matrix given in Example 2.2.9. Suppose that r = (1, 0, ?, ?, 0, 1, 0) is a received word with two erasures. Replace the erasures by zeros: y = (1, 0, 0, 0, 0, 1, 0). The syndrome of y is equal to s = yH^T = (0, 0, 1). Now we want to solve the system of linear equations xH^T = s and x_i = 0 for all i ≠ 3, 4. Hence x_3 = 1 and x_4 = 1, and c = (1, 0, 1, 1, 0, 1, 0) is the transmitted codeword.
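The erasure correction of Proposition 6.2.9 can be sketched over F2 by Gaussian elimination on the submatrix of H at the erasure positions. The parity check matrix below (columns 1..7 in binary) is an assumption and may be ordered differently from Example 2.2.9, so the concrete syndromes differ from the example; the method is the same.

```python
def solve_mod2(A, b):
    # Gauss-Jordan elimination over F2 on the augmented matrix [A | b]
    m, t = len(A), len(A[0]) if A else 0
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    pivots, r = [], 0
    for col in range(t):
        pivot = next((i for i in range(r, m) if M[i][col]), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(m):
            if i != r and M[i][col]:
                M[i] = [(x + y) % 2 for x, y in zip(M[i], M[r])]
        pivots.append(col)
        r += 1
    x = [0] * t
    for i, col in enumerate(pivots):
        x[col] = M[i][t]
    return x

# parity check matrix of a [7,4,3] Hamming code (columns = 1..7 in binary)
H = [[(j >> i) & 1 for j in range(1, 8)] for i in range(3)]

def correct_erasures(r):
    # r over {0, 1, None}; None marks an erasure; at most d - 1 = 2 of them
    J = [j for j, rj in enumerate(r) if rj is None]
    y = [0 if rj is None else rj for rj in r]
    s = [sum(row[j] * y[j] for j in range(7)) % 2 for row in H]
    e_J = solve_mod2([[row[j] for j in J] for row in H], s)
    e = [0] * 7
    for j, ej in zip(J, e_J):
        e[j] = ej
    return tuple((yj - ej) % 2 for yj, ej in zip(y, e))

# (0,0,1,0,1,1,0) is a codeword here: columns 3, 5 and 6 of H sum to zero
assert correct_erasures((0, 0, None, 0, None, 1, 0)) == (0, 0, 1, 0, 1, 1, 0)
```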
Example 6.2.13 Consider the MDS code C_1 over F_11 of length 11 and dimension 4 with generator matrix G_1 as given in Proposition 3.2.10, with x_i = i ∈ F_11 for i = 1, ..., 11. Let C be the dual code of C_1. Then C is an [11, 7, 5] code by Corollary 3.2.7, and H = G_1 is a parity check matrix for C by Proposition 2.3.19. Suppose that we receive the following word with 4 erasures and no errors:

r = (1, 0, ?, 2, ?, 0, 0, 3, ?, ?, 0).

What is the sent codeword? Replacing the erasures by 0 gives the word y = (1, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0). So yH^T = (6, 0, 5, 4). Consider the linear system of equations given by the 4 × 4 submatrix of H consisting of the columns corresponding to the erasure positions 3, 5, 9 and 10, together with the column Hy^T:

( 1  1  1  1 | 6 )
( 3  5  9 10 | 0 )
( 9  3  4  1 | 5 )
( 5  4  3 10 | 4 )
After Gaussian elimination we see that (0, 8, 9, 0)^T is the unique solution of this system of linear equations. Hence c = (1, 0, 0, 2, 3, 0, 0, 3, 2, 0, 0) is the codeword sent.

Remark 6.2.14 Erasures-only correction by means of syndrome decoding is efficient in case the information rate R is close to 1 and the relative minimum distance δ is small, but cumbersome if R is small and δ is close to 1. Take for instance the [n, 1, n] binary repetition code. Any received word with n − 1 erasures is readily corrected by looking at the remaining unerased position: if it is 0, then the all-zeros word was sent, and if it is 1, then the all-ones word was sent. With syndrome decoding one would have to solve a system of n − 1 linear equations in n − 1 unknowns.

The following method to correct erasures only uses a generator matrix of a code.

Proposition 6.2.15 Let G be a generator matrix of an [n, k, d] code C over F_q. Let m ∈ F_q^k be the transmitted message. Let s be an integer such that s < d. Let r be the received word with no errors and at most s erasures. Let I = {j_1, ..., j_{n−s}} be the subset of size n − s that is the complement of the erasure positions. Let y ∈ F_q^{n−s} be defined by y_i = r_{j_i} for i = 1, ..., n − s. Let G' be the k × (n − s) submatrix of G consisting of the n − s columns of G corresponding to the set I. Then xG' = y has a unique solution m, and mG is the codeword sent.

Proof. The Singleton bound 3.2.1 states that k ≤ n − d + 1. So k ≤ n − s. Now mG = c is the codeword sent and y_i = r_{j_i} = c_{j_i} for i = 1, ..., n − s. Hence mG' = y and m is a solution. Now suppose that x ∈ F_q^k satisfies xG' = y; then (m − x)G is a codeword that has a zero at n − s positions, so its weight is at most s < d. So (m − x)G is the zero codeword and xG' = mG'. Hence m − x = 0, since G has rank k.

The above method is called correcting erasures only up to the minimum distance by means of the generator matrix.
Corollary 6.2.16 The complexity of correcting erasures only up to the minimum distance by means of the generator matrix is polynomial of degree 3 with complexity coefficient R^2(1 − δ − (2/3)R) for a code of length n → ∞, where R is the information rate and δ the relative minimum distance.

Proof. This is a consequence of Proposition 6.2.15. The complexity is that of solving a system of k linear equations in at most n − d + 1 unknowns, which is asymptotically of the order

(n − d − (2/3)k) k^2 = R^2 (1 − δ − (2/3)R) n^3 for n → ∞.

See Appendix ??.

***picture, comparison of G and H method***

Example 6.2.17 Let C be the [7, 2, 6] extended Reed-Solomon code over F_7 with generator matrix

G = ( 1 1 1 1 1 1 1 )
    ( 0 1 2 3 4 5 6 )
Suppose that (?, 3, ?, ?, ?, 4, ?) is a received word with no errors and 5 erasures. By means of the generator matrix we have to solve the following linear system of equations:

x_1 + x_2 = 3
x_1 + 5x_2 = 4

which has (x_1, x_2) = (1, 2) as its solution. Hence (1, 2)G = (1, 3, 5, 0, 2, 4, 6) was the transmitted codeword. With syndrome decoding a system of 5 linear equations in 5 unknowns would have to be solved.

Remark 6.2.18 For MDS codes we have asymptotically R ≈ 1 − δ, and correcting erasures only by syndrome decoding and by a generator matrix has complexity coefficients (1/3)(1 − R)^3 and (1/3)R^3, respectively. Therefore syndrome decoding is preferred for R > 0.5 and the generator matrix method for R < 0.5.
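Example 6.2.17 can be reproduced directly. For a system this small, an exhaustive search over the q^2 = 49 candidate messages stands in for the linear solve of Proposition 6.2.15 (a sketch, not the asymptotically efficient method):

```python
# generator matrix of the [7,2,6] extended Reed-Solomon code over F7
q = 7
G = [[1] * 7, list(range(7))]

def encode(m):
    return tuple(sum(mi * G[i][j] for i, mi in enumerate(m)) % q
                 for j in range(7))

# received word with erasures everywhere except positions 1 and 5 (0-indexed)
r = (None, 3, None, None, None, 4, None)
I = [j for j, rj in enumerate(r) if rj is not None]

# exhaustive search for messages m with m * G_(I) = r_(I) over F7
sol = [m for m in ((a, b) for a in range(q) for b in range(q))
       if all(encode(m)[j] == r[j] for j in I)]
assert sol == [(1, 2)]
assert encode(sol[0]) == (1, 3, 5, 0, 2, 4, 6)
```

The solution is unique, as Proposition 6.2.15 guarantees: the 2 × 2 submatrix G'_(I) is invertible.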
6.2.3 Information and covering set decoding
The idea of this section is to decode by finding error-free positions in a received word, thus localizing the errors. Let r be a received word written as r = c + e, where c is a codeword from an [n, k, d] code C and e is an error vector with support supp(e). Note that if I is some information set (Definition 2.2.20) such that supp(e) ∩ I = ∅, then we are actually able to decode. Indeed, as supp(e) ∩ I = ∅, we have that r_(I) = c_(I) (Definition 3.1.2). Now if we denote by G the generator matrix of C, then the submatrix G_(I) can be transformed to the identity matrix Id_k. Let G' = MG, where M = G_(I)^{−1}, so that G'_(I) = Id_k, see Proposition 2.2.22. Thus the unique solution m ∈ F_q^k of mG = c can be found as m = r_(I) M, because mG = r_(I) MG = r_(I) G', and the latter restricted to the positions of I yields r_(I) = c_(I). The algorithm exploiting this idea, called information set decoding, is presented in Algorithm 6.1.

Algorithm 6.1 Information set decoding
Input: Generator matrix G of an [n, k] code C; received word r; collection I(C) of all information sets of C
Output: A codeword c ∈ C such that d(r, c) = d(r, C)
Begin
c := 0;
for I ∈ I(C) do
  G' := G_(I)^{−1} G;
  c' := r_(I) G';
  if d(c', r) < d(c, r) then c := c' end if
end for
return c
End

Theorem 6.2.19 The information set decoding algorithm performs minimum distance decoding.
Proof. Let r = c + e, where wt(e) = d(r, C). Let rH^T = eH^T = s. Then e is a coset leader with support E = supp(e) in the coset with syndrome s. It is enough to prove that there exists some information set disjoint from E, or, equivalently, some check set (Definition 2.3.9) that contains E. Consider the (n − k) × |E| submatrix H_(E) of the parity check matrix H. As e is a coset leader, for no other vector v in the same coset is supp(v) ⊊ E. Thus the subsystem of the parity check system defined by the positions in E has a unique solution e_(E); otherwise it would be possible to find a solution with support a proper subset of E. The above implies that rank(H_(E)) = |E| ≤ n − k. Thus E can be expanded to a check set.

For a practical application it is convenient to choose the sets I randomly. Namely, we choose some k-subsets randomly in the hope that after some reasonable number of trials we encounter one that is an information set and error-free.

Algorithm 6.2 Probabilistic information set decoding
Input: Generator matrix G of an [n, k] code C; received word r; number of trials N_trials(n, k)
Output: A codeword c ∈ C
Begin
c := 0; N_tr := 0;
repeat
  N_tr := N_tr + 1;
  Choose uniformly at random a subset I of {1, ..., n} of cardinality k;
  if G_(I) is invertible then
    G' := G_(I)^{−1} G;
    c' := r_(I) G';
    if d(c', r) < d(c, r) then c := c' end if
  end if
until N_tr ≥ N_trials(n, k)
return c
End

We would like to estimate the complexity of probabilistic information set decoding for generic codes. Parameters of generic codes were computed in Theorem 3.3.6. We now use this result and its notation to formulate the following complexity result.

Theorem 6.2.20 Let C be a generic [n, k, d] q-ary code, with dimension k = Rn, 0 < R < 1, and minimum distance d = d_0, so that the covering radius is d_0(1 + o(1)). If N_trials(n, k) is at least

σ · n · \binom{n}{d_0} / \binom{n−k}{d_0},
***σ is 1/Pr(an n × n matrix is invertible); add to Theorem 3.3.7***

then for large enough n the probabilistic information set decoding algorithm for the generic code C performs minimum distance decoding with negligibly small decoding error. Moreover the algorithm is exponential with complexity exponent

CC_q(R) = (log_q 2) ( H_2(δ_0) − (1 − R) H_2( δ_0/(1 − R) ) ),   (6.1)

where H_2 is the binary entropy function.

Proof. In order to succeed in the algorithm, we need that the set I chosen at a certain iteration is error-free and that the corresponding submatrix of G is invertible. The probability P(n, k, d_0) of this event is

( \binom{n − d_0}{k} / \binom{n}{k} ) / σ_q(n) = ( \binom{n − k}{d_0} / \binom{n}{d_0} ) / σ_q(n).

Therefore the probability that I fails to satisfy these properties is

1 − ( \binom{n − k}{d_0} / \binom{n}{d_0} ) / σ_q(n).

Considering the assumption on N_trials(n, k), the probability of not finding an error-free information set after N_trials(n, k) trials is

(1 − P(n, k, d_0))^{n/P(n,k,d_0)} = O(e^{−n}),

which is negligible. Next, since determining whether G_(I) is invertible and performing the operations in the if-part have polynomial time complexity, N_trials(n, k) dominates the time complexity. Our task now is to give an asymptotic estimate of the latter. First, d_0 = δ_0 n, where δ_0 = H_q^{−1}(1 − R), see Theorem 3.3.6. Then, using Stirling's approximation log_2 n! = n log_2 n − n + o(n), we have

n^{−1} log_2 \binom{n}{d_0} = n^{−1} ( n log_2 n − d_0 log_2 d_0 − (n − d_0) log_2(n − d_0) ) + o(1)
= log_2 n − δ_0 log_2(δ_0 n) − (1 − δ_0) log_2((1 − δ_0)n) + o(1) = H_2(δ_0) + o(1).

Thus

log_q \binom{n}{d_0} = (log_q 2)(n H_2(δ_0) + o(n)).

Analogously,

log_q \binom{n−k}{d_0} = (log_q 2)( n(1 − R) H_2( δ_0/(1 − R) ) + o(n) ),

where n − k = (1 − R)n. Now

log_q N_trials(n, k) = log_q n + log_q σ + log_q \binom{n}{d_0} − log_q \binom{n−k}{d_0}.

Considering that the first two summands are dominated by the last two, the claim on the complexity exponent follows.
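Algorithm 6.1 can be sketched over F2 for a small code. The deterministic version below tries every k-subset; the systematic [7,4,3] Hamming generator matrix used in the check is our choice and not necessarily the one used elsewhere in the book.

```python
from itertools import combinations

def mat_mul(A, B):
    # matrix product over F2
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]

def mat_inv(A):
    # Gauss-Jordan inverse over F2; returns None if A is singular
    k = len(A)
    M = [row[:] + [int(i == j) for j in range(k)] for i, row in enumerate(A)]
    for col in range(k):
        piv = next((i for i in range(col, k) if M[i][col]), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        for i in range(k):
            if i != col and M[i][col]:
                M[i] = [(x + y) % 2 for x, y in zip(M[i], M[col])]
    return [row[k:] for row in M]

def isd_decode(G, r):
    # information set decoding: for every k-subset I with G_(I) invertible,
    # re-encode from the restriction r_(I) and keep the closest codeword
    k, n = len(G), len(G[0])
    best = None
    for I in combinations(range(n), k):
        Minv = mat_inv([[G[i][j] for j in I] for i in range(k)])
        if Minv is None:
            continue
        c = mat_mul([[r[j] for j in I]], mat_mul(Minv, G))[0]
        if best is None or sum(a != b for a, b in zip(c, r)) < \
                           sum(a != b for a, b in zip(best, r)):
            best = c
    return best

# a systematic generator matrix of a [7,4,3] Hamming code
G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]
c = [1, 0, 0, 0, 0, 1, 1]      # first row of G is a codeword
r = [1, 1, 0, 0, 0, 1, 1]      # one error, in position 2
assert isd_decode(G, r) == c
```

Any information set avoiding the single error position is error-free, so the algorithm recovers the nearest codeword.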
Figure 6.1: Exhaustive search, syndrome decoding, and information set decoding

If we depict the complexity coefficients of exhaustive search, syndrome decoding, and probabilistic information set decoding, we see that information set decoding is strongly superior to the former two, see Figure 6.1. We may think of the above algorithms in a dual way, using check sets instead of information sets and parity check matrices instead of generator matrices. The set of all check sets is closely related to the so-called covering systems, which we consider later in this section and which give the algorithm its name.

Algorithm 6.3 Covering set decoding
Input: Parity check matrix H of an [n, k] code C; received word r; collection J(C) of all check sets of C
Output: A codeword c ∈ C such that d(r, c) = d(r, C)
Begin
c := 0; s := rH^T;
for J ∈ J(C) do
  e' := s · (H_(J)^T)^{−1};
  compute e such that e_(J) = e' and e_j = 0 for j not in J;
  c' := r − e;
  if d(c', r) < d(c, r) then c := c' end if
end for
return c
End

Theorem 6.2.21 The covering set decoding algorithm performs minimum distance decoding.

Proof. Let r = c + e as in the proof of Theorem 6.2.19. From that proof we know that there exists a check set J such that supp(e) ⊂ J. Now we have Hr^T = He^T = H_(J) e_(J)^T. Since for the check set J the matrix H_(J) is invertible, we may find e_(J) and thus e.

Similarly to Algorithm 6.1 one may define a probabilistic version of covering set decoding. As we have already mentioned, the covering set decoding algorithm is closely related to the notion of a covering system. An overview of this notion follows.

Definition 6.2.22 Let n, l and t be integers such that 0 < t ≤ l ≤ n. An (n, l, t) covering system is a collection J of subsets J of {1, ..., n} such that every J ∈ J has l elements and every subset of {1, ..., n} of size t is contained in at least one J ∈ J. The elements J of a covering system J are also called blocks. If a subset T of size t is contained in some J ∈ J, then we say that T is covered or trapped by J.

Remark 6.2.23 From the proof of Theorem 6.2.21, for almost all codes it is enough to find a collection J of subsets J of {1, ..., n} such that all J ∈ J have n − k elements and every subset of {1, ..., n} of size ρ = d_0 + o(1) is contained in at least one J ∈ J, thus obtaining an (n, n − k, d_0) covering system.

Example 6.2.24 The collection of all subsets of {1, ..., n} of size l is an (n, l, t) covering system for all 0 < t ≤ l. This collection consists of \binom{n}{l} blocks.

Example 6.2.25 Consider F_q^2, the affine plane over F_q. Let n = q^2 be the number of its points. Then every line consists of q points, and every pair of points is covered by exactly one line. Hence there exists a (q^2, q, 2) covering system. Every line that is not parallel to the y-axis is given by a unique equation y = mx + c. There are q^2 such lines. And there are q lines parallel to the y-axis. So the total number of lines is q^2 + q.

Example 6.2.26 Consider the projective plane over F_q as treated in Section 4.3.1.
Let n = q^2 + q + 1 be the number of its points. Then every line consists of q + 1 points, and every pair of points is covered by exactly one line. There are q^2 + q + 1 lines. Hence there exists a (q^2 + q + 1, q + 1, 2) covering system consisting of q^2 + q + 1 blocks.

Remark 6.2.27 The number of blocks of an (n, l, t) covering system is considerably smaller than the number of all possible t-sets. It is still at least \binom{n}{t} / \binom{l}{t}. But also this number grows exponentially in n if λ = lim_{n→∞} l/n > 0 and τ = lim_{n→∞} t/n > 0.

Definition 6.2.28 The covering coefficient b(n, l, t) is the smallest integer b such that there is an (n, l, t) covering system consisting of b blocks.

Although the exact value of the covering coefficient b(n, l, t) is an open problem, we do know its asymptotic logarithmic behavior.
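The affine plane covering system of Example 6.2.25 is easy to verify computationally; here with q = 3 for concreteness:

```python
from itertools import combinations, product

q = 3
points = list(product(range(q), repeat=2))

# lines y = m*x + c (q^2 of them) and vertical lines x = c (q of them)
lines = [frozenset((x, (m * x + c) % q) for x in range(q))
         for m in range(q) for c in range(q)]
lines += [frozenset((c, y) for y in range(q)) for c in range(q)]

assert len(lines) == q * q + q          # q^2 + q = 12 lines in AG(2, 3)
assert all(len(L) == q for L in lines)  # every line has q points

# every pair of points lies on exactly one line: a (q^2, q, 2) covering system
for p1, p2 in combinations(points, 2):
    assert sum({p1, p2} <= L for L in lines) == 1
```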
Proposition 6.2.29 Let λ and τ be constants such that 0 < τ < λ < 1. Then

lim_{n→∞} (1/n) log_2 b(n, ⌊λn⌋, ⌊τn⌋) = H_2(τ) − λ H_2(τ/λ).

Proof. *** I suggest to skip the proof *** In order to establish this asymptotic result we prove lower and upper bounds that are asymptotically identical.

First the lower bound on b(n, l, t). Note that every l-tuple traps \binom{l}{t} t-tuples. Therefore one needs at least

\binom{n}{t} / \binom{l}{t}

l-tuples. Now we use the relation from [?]:

2^{n H_2(θ) − o_+(n)} ≤ \binom{n}{θn} ≤ 2^{n H_2(θ)} for 0 < θ < 1,

where o_+(n) is a nonnegative function such that o_+(n) = o(n). Applying this bound for l = ⌊λn⌋ and t = ⌊τn⌋ we have \binom{n}{⌊τn⌋} ≥ 2^{n H_2(τ) − o_+(n)} and \binom{⌊λn⌋}{⌊τn⌋} ≤ 2^{n λ H_2(τ/λ)}. Therefore

b(n, ⌊λn⌋, ⌊τn⌋) ≥ \binom{n}{⌊τn⌋} / \binom{⌊λn⌋}{⌊τn⌋} ≥ 2^{n(H_2(τ) − λ H_2(τ/λ)) − o_+(n)}.

For a similar lower bound see Exercise ??.

Now the upper bound. Consider a set S of f(n, l, t) independently and uniformly randomly chosen l-tuples, where

f(n, l, t) = ( \binom{n}{l} / \binom{n−t}{n−l} ) · cn,

with c > ln 2. The probability that a given t-tuple is not trapped by any tuple from S is

( 1 − \binom{n−t}{n−l} / \binom{n}{l} )^{f(n,l,t)}.

Indeed, the number of all l-tuples is \binom{n}{l}, and the probability of trapping a given t-tuple T_1 by an l-tuple T_2 is the same as the probability of trapping the complement of T_2 by the complement of T_1, which is equal to \binom{n−t}{n−l} / \binom{n}{l}. The expected number of non-trapped t-tuples is then

\binom{n}{t} ( 1 − \binom{n−t}{n−l} / \binom{n}{l} )^{f(n,l,t)}.

Using the relation lim_{x→∞} (1 − 1/x)^x = e^{−1} and the expression for f(n, l, t), we have that this expected number tends to

T = 2^{n H_2(t/n) + o(n) − cn log_2 e}.

From the condition on c we have that T < 1. This implies that among all the sets of f(n, l, t) independently and uniformly randomly chosen l-tuples there exists one that traps all the t-tuples. Thus b(n, l, t) ≤ f(n, l, t). By the well-known combinatorial identities

\binom{n}{t} \binom{n−t}{n−l} = \binom{n}{n−l} \binom{l}{t} = \binom{n}{l} \binom{l}{t},

we have that for t = ⌊τn⌋ and l = ⌊λn⌋

b(n, l, t) ≤ 2^{n(H_2(τ) − λ H_2(τ/λ)) + o(n)},

which asymptotically coincides with the lower bound proven above.
Let us now turn to the case of bounded distance decoding. Here we aim at correcting some t errors, where t < ρ. The complexity result for almost all codes is obtained by substituting t/n for δ_0 in (6.1). In particular, for decoding up to half the minimum distance we have the following result for almost all codes.

Corollary 6.2.30 If N_trials(n, k) is at least

n · \binom{n}{d_0/2} / \binom{n−k}{d_0/2},

then the covering set decoding algorithm for almost all codes performs decoding up to half the minimum distance with negligibly small decoding error. Moreover the algorithm is exponential with complexity coefficient

CSB_q(R) = (log_q 2) ( H_2(δ_0/2) − (1 − R) H_2( δ_0/(2(1 − R)) ) ).   (6.2)

We are now interested in bounded decoding up to t ≤ d − 1. For almost all (long) codes the case t = d − 1 coincides with minimum distance decoding, see... . From Proposition 6.2.9 it is enough to find a collection J of subsets J of {1, ..., n} such that all J ∈ J have d − 1 elements and every subset of {1, ..., n} of size t is contained in at least one J ∈ J. Thus we need an (n, d − 1, t) covering system. Let us call this erasure set decoding.

Example 6.2.31 Consider a code of length 13, dimension 9 and minimum distance 5. The number of all 2-sets of {1, ..., 13} is equal to \binom{13}{2} = 78. In order to correct two errors one has to compute the linear combinations of two columns of a parity check matrix H, for all 78 choices of two columns, and see whether the result is equal to rH^T for the received word r. An improvement can be obtained by a covering set. Consider the projective plane over F_3 as in Example 6.2.26. This gives a (13, 4, 2) covering system. Using this covering system there are 13 subsets of 4 elements for which one has to find Hr^T as a linear combination of the corresponding columns of the parity check matrix. So we have to consider 13 systems of 4 linear equations in 4 variables instead of 78 systems of 4 linear equations in 2 variables.
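The (13, 4, 2) covering system of Example 6.2.31 comes from the projective plane over F3. A sketch (points normalized so that the first nonzero coordinate is 1, lines obtained from points by duality):

```python
from itertools import combinations, product

q = 3
# points of the projective plane PG(2, q): nonzero vectors over F_q,
# normalized so that the first nonzero coordinate is 1
points = [v for v in product(range(q), repeat=3)
          if any(v) and v[next(i for i in range(3) if v[i])] == 1]
assert len(points) == q * q + q + 1      # 13 points

# lines are also indexed by points (duality): line [a] = {p : a.p = 0}
lines = [frozenset(p for p in points
                   if sum(a * b for a, b in zip(L, p)) % q == 0)
         for L in points]
assert len(lines) == 13
assert all(len(L) == q + 1 for L in lines)   # every line has q + 1 = 4 points

# every pair of points lies on exactly one line: a (13, 4, 2) covering system
for p1, p2 in combinations(points, 2):
    assert sum({p1, p2} <= L for L in lines) == 1
```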
From Proposition 6.2.29 and Remark ?? we have the following complexity result for erasure set decoding.

Proposition 6.2.32 Erasure set decoding performs bounded distance decoding for every t = αδ_0 n, 0 < α ≤ 1. The algorithm is exponential with complexity coefficient

ES_q(R) = (log_q 2) ( H_2(αδ_0) − δ_0 H_2(α) ).   (6.3)

Proof.
The proof is left to the reader as an exercise.
It can be shown, see Exercise 6.2.7, that erasure set decoding is inferior to covering set decoding for all α. ***Permutation decoding, Huffman-Pless 10.2, ex Golay q=3, exer q=2***
6.2.4 Nearest neighbor decoding
***decoding using minimal codewords***
6.2.5 Exercises
6.2.1 Count an erasure as half an error. Use this idea to define an extension of the Hamming distance on (F_q ∪ {?})^n and show that it is a metric.

6.2.2 Give a proof of Proposition 6.2.8.

6.2.3 Consider the code C over F_11 with parameters [11, 7, 5] of Example 6.2.13. Suppose that we receive the word (7, 6, 5, 4, 3, 2, 1, ?, ?, ?, ?) with 4 erasures and no errors. Which codeword was sent?

6.2.4 Consider the code C_1 over F_11 with parameters [11, 4, 8] of Example 6.2.13. Suppose that we receive the word (4, 3, 2, 1, ?, ?, ?, ?, ?, ?, ?) with 7 erasures and no errors. Find the codeword sent.

6.2.5 Consider the covering systems of lines in the affine space F_q^m of dimension m over F_q, and in the projective space of dimension m over F_q, respectively. Show the existence of a (q^m, q, 2) and a ((q^{m+1} − 1)/(q − 1), q + 1, 2) covering system, as in Examples 6.2.25 and 6.2.26 in the case m = 2. Compute the number of lines in both cases.

6.2.6 Prove the following lower bound on b(n, l, t):

b(n, l, t) ≥ (n/l) · ((n − 1)/(l − 1)) · · · ((n − t + 1)/(l − t + 1)).

Hint: by a double counting argument prove first that l · b(n, l, t) ≥ n · b(n − 1, l − 1, t − 1), and then use b(n, l, 1) = ⌈n/l⌉.

6.2.7 By using the properties of the binary entropy function, prove that for all 0 < R < 1 and 0 < α < 1

(1 − R) H_2( α H_q^{−1}(1 − R) / (1 − R) ) > H_q^{−1}(1 − R) · H_2(α).

Conclude that covering set decoding is superior to erasure set decoding.
6.3 Difficult problems in coding theory

6.3.1 General decoding and computing minimum distance
We formulated the decoding problem in Section 6.2. As we have seen, the minimum (Hamming) distance of a linear code is an important parameter which can be used to estimate the decoding performance. However, a large minimum distance does not guarantee the existence of an efficient decoding algorithm. It is natural to ask the following computational questions. Does there exist a polynomial-time decoding algorithm for general linear codes? Does there exist a polynomial-time algorithm which finds the minimum distance of an arbitrary linear code? It has been proved that these
computational problems are both intractable.
Let C be an [n, k] binary linear code. Suppose r is the received word. According to the maximum-likelihood decoding principle, we wish to find a codeword such that the Hamming distance between r and the codeword is minimal. As we have seen in previous sections, using brute force search, correct decoding requires 2^k comparisons in the worst case, and thus has exponential-time complexity.
Consider the syndrome of the received word. Let H be a parity check matrix of C, which is an m × n matrix, where m = n − k. The syndrome of r is s = rH^T. The following two computational problems are equivalent, letting c = r − e:
(1) (Maximum-likelihood decoding problem) Find a codeword c such that d(r, c) is minimal.
(2) Find a minimum-weight solution e to the equation xH^T = s.
Clearly, an algorithm which solves the following computational problem (3) also solves the above Problem (2).
(3) For any nonnegative integer w, find a vector x of Hamming weight ≤ w such that xH^T = s.
Conversely, an algorithm which solves Problem (2) also solves Problem (3). In fact, suppose e is a minimum-weight solution to the equation xH^T = s. Then, for w < wt(e), the algorithm will return "no solution"; for w ≥ wt(e), the algorithm returns e. Thus, the maximum-likelihood decoding problem is equivalent to the above Problem (3). The decision version of the maximum-likelihood decoding problem is as follows.
Decision Problem of Decoding Linear Codes
INSTANCE: An m × n binary matrix H, a binary vector s of length m, and a nonnegative integer w.
QUESTION: Is there a binary vector x ∈ F_2^n of Hamming weight ≤ w such that xH^T = s?
Proposition 6.3.1 The decision problem of decoding linear codes is an NP-complete problem.
We will prove this proposition by reducing the three-dimensional matching problem to the decision problem of decoding linear codes. The three-dimensional matching problem is a well-known NP-complete problem. For completeness, we recall this problem as follows.
Three-Dimensional Matching Problem
INSTANCE: A set T ⊆ S1 × S2 × S3, where S1, S2 and S3 are disjoint finite sets having the same number of elements, a = |S1| = |S2| = |S3|.
CHAPTER 6. COMPLEXITY AND DECODING
QUESTION: Does T contain a matching, that is, a subset U ⊆ T such that |U| = a and no two elements of U agree in any coordinate?
We now construct a matrix M, called the incidence matrix of T, as follows. Fix an ordering of the triples of T. Let ti = (ti1, ti2, ti3) denote the ith triple of T for i = 1, . . . , |T|. The matrix M has |T| rows and 3a columns. Each row mi of M is a binary vector of length 3a and Hamming weight 3, which consists of three blocks bi1, bi2 and bi3 of the same length a, i.e., mi = (bi1, bi2, bi3). For u = 1, 2, 3, if tiu is the vth element of Su, then the vth coordinate of biu is 1 and all the other coordinates of this block are 0. Clearly, the existence of a matching of the Three-Dimensional Matching Problem is equivalent to the existence of a rows of M such that their mod 2 sum is (1, 1, . . . , 1), that is, there exists a binary vector x ∈ F_2^{|T|} of weight a such that xM = (1, 1, . . . , 1) ∈ F_2^{3a}.
Now we are ready to prove Proposition 6.3.1.
Proof of Proposition 6.3.1. Suppose we have a polynomial-time algorithm solving the Decision Problem of Decoding Linear Codes. Given an input T ⊆ S1 × S2 × S3 for the Three-Dimensional Matching Problem, set H = M^T, where M is the incidence matrix of T, s = (1, 1, . . . , 1) and w = a. Then, running the algorithm for the Decision Problem of Decoding Linear Codes, we will discover whether or not the desired matching exists. Thus, a polynomial-time algorithm for the Decision Problem of Decoding Linear Codes implies a polynomial-time algorithm for the Three-Dimensional Matching Problem. This proves that the Decision Problem of Decoding Linear Codes is NP-complete.
Next, let us consider the problem of computing the minimum distance of an [n, k] binary linear code C with a parity check matrix H. Since for any linear code the minimum distance is equal to the minimum weight, we use these two terms interchangeably. Consider the following decision problem.
Decision Problem of Computing Minimum Distance
INSTANCE: An m × n binary matrix H and a nonnegative integer w.
QUESTION: Is there a nonzero binary vector x of Hamming weight ≤ w such that xH^T = 0?
If we have an algorithm which solves the above problem, then we can run the algorithm with w = 1, 2, . . ., and the first integer w with an affirmative answer is the minimum weight d of C. On the other hand, if we have an algorithm which finds the minimum weight d of C, then we can solve the above problem by comparing w with d. Therefore, we call this problem the Decision Problem of Computing Minimum Distance, and the NP-completeness of this problem implies the NP-hardness of the problem of computing the minimum distance.
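For tiny parameters all of this can be made concrete by exhaustive search. The following Python sketch (function names are mine) solves the minimum-weight syndrome problem of (2)/(3) and computes the minimum distance from a parity check matrix; both searches run over all 2^n vectors, illustrating the exponential cost that the hardness results say cannot be avoided in general.

```python
from itertools import product

def syndrome(x, H):
    """x * H^T over F_2 for a length-n vector x and an m x n matrix H."""
    return tuple(sum(a * b for a, b in zip(x, row)) % 2 for row in H)

def min_weight_solution(H, s):
    """A minimum-weight e with e * H^T = s, by exhaustive search over all
    2^n binary vectors (exponential time)."""
    n = len(H[0])
    best = None
    for e in product((0, 1), repeat=n):
        if syndrome(e, H) == tuple(s) and (best is None or sum(e) < sum(best)):
            best = e
    return best

def minimum_distance(H):
    """Minimum weight of the code {x : x * H^T = 0}, again by brute force."""
    n, m = len(H[0]), len(H)
    return min(sum(x) for x in product((0, 1), repeat=n)
               if any(x) and syndrome(x, H) == (0,) * m)
```

With the parity check matrix of the [7, 4, 3] Hamming code (columns are all nonzero vectors of F_2^3), minimum_distance returns 3, and any single-column syndrome has a weight-1 solution.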
***Computing the minimum distance: – brute force, complexity (q^k − 1)/(q − 1), O(q^k) – minimal number of parity checks: O((n choose k) k^3)*** ***Brouwer's algorithm and variations, Zimmermann-Canteaut-Chabaud, Sala*** ***Vardy's result: computing the min. dist. is NP-hard***
6.3.2 Is decoding up to half the minimum distance hard?
Finding the minimum distance and decoding up to half the minimum distance are closely related problems.
Algorithm 6.3.2 Suppose that A is an algorithm that computes the minimum distance of an Fq-linear code C that is given by a parity check matrix H. We define an algorithm D with input y ∈ F_q^n. Let s = Hy^T be the syndrome of y with respect to H. Let H˜ = [H | s] be the parity check matrix of the code C˜ of length n + 1. Let C˜i be the code that is obtained by puncturing C˜ at the ith position. Use algorithm A to compute d(C) and d(C˜i) for i ≤ n. Let t = min{ d(C˜i) | i ≤ n }. Let I = { i | t = d(C˜i), i ≤ n }. Assume |I| = t and t < d(C). Assume furthermore that erasure decoding at the positions I finds a unique codeword c in C such that ci = yi for all i not in I. Output c in case the above assumptions are met, and output ∗ otherwise.
Proposition 6.3.3 Let A be an algorithm that computes the minimum distance. Let D be the algorithm that is defined in 6.3.2. Let y ∈ F_q^n be an input. Then D is a decoder that gives as output c in case d(C, y) < d(C) and y has c as unique nearest codeword. In particular D is a decoder of C that corrects up to half the minimum distance.
Proof. Let y be a word with t = d(C, y) < d(C) and suppose that c is a unique nearest codeword. Then y = c + e with c ∈ C and t = wt(e). Note that (e, −1) ∈ C˜, since s = Hy^T = He^T. So d(C˜) ≤ t + 1. Let z˜ be in C˜. If z˜_{n+1} = 0, then z˜ = (z, 0) with z ∈ C. Hence wt(z˜) ≥ d(C) ≥ t + 1. If z˜_{n+1} ≠ 0, then without loss of generality we may assume that z˜ = (z, −1). So H˜ z˜^T = 0. Hence Hz^T = s. So c′ = y − z ∈ C. If wt(z˜) ≤ t + 1, then wt(z) ≤ t. So d(y, c′) ≤ t. Hence c′ = c, since c is the unique nearest codeword by assumption. Therefore z = e and wt(z) = t. Hence d(C˜) = t + 1, since t + 1 ≤ d(C).
Let C˜i be the code that is obtained by puncturing C˜ at the ith position. Use the algorithm A to compute d(C˜i) for all i ≤ n.
An argument similar to the above shows that d(C˜i) = t if i is in the support of e, and d(C˜i) = t + 1 if i is not in the support of e. So t = min{ d(C˜i) | i ≤ n }, and I = { i | t = d(C˜i), i ≤ n } is the support of e and has size t. So the error positions are known. Computing the error values is a matter of linear algebra as shown in Proposition 6.2.11. In this way e and c are found.
Proposition 6.3.4 Let MD be the problem of computing the minimum distance of a code given by a parity check matrix. Let DHMD be the problem of decoding up to half the minimum distance. Then DHMD ≤P MD.
Proof. Let A be an algorithm that computes the minimum distance of an Fq-linear code C that is given by a parity check matrix H. Let D be the algorithm given in 6.3.2. Then A is used (n + 1) times in D. Suppose that the complexity of A is polynomial of degree e. We may assume that e ≥ 2. Computing the error values can be done with complexity O(n^3) by Proposition 6.2.11. Then the complexity of D is polynomial of degree e + 1. ***Sendrier and Finiasz*** ***Decoding with preprocessing, Bruck-Naor***
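To illustrate Algorithm 6.3.2 concretely, here is a simplified Python sketch (all names are mine) in which brute-force enumeration of codewords stands in for the minimum-distance algorithm A; the checks |I| = t and t < d(C) from the algorithm are omitted for brevity.

```python
from itertools import product

def codewords(H):
    """All words of the binary code with parity check matrix H (brute force,
    standing in for the minimum-distance algorithm A)."""
    n = len(H[0])
    return [x for x in product((0, 1), repeat=n)
            if all(sum(a * b for a, b in zip(x, row)) % 2 == 0 for row in H)]

def min_weight_punctured(words, i):
    """Minimum nonzero weight after deleting coordinate i from every word."""
    wts = [sum(w[:i] + w[i + 1:]) for w in words]
    return min(wt for wt in wts if wt > 0)

def decode_via_min_distance(H, y):
    """Locate the error positions as the coordinates whose punctured
    extended code has the smallest minimum weight, then erasure decode
    by matching y outside those positions."""
    n = len(H[0])
    s = [sum(a * b for a, b in zip(y, row)) % 2 for row in H]
    H_ext = [list(row) + [s[j]] for j, row in enumerate(H)]   # [H | s]
    ext_words = codewords(H_ext)
    dists = [min_weight_punctured(ext_words, i) for i in range(n)]
    t = min(dists)
    I = {i for i in range(n) if dists[i] == t}
    cands = [c for c in codewords(H)
             if all(c[j] == y[j] for j in range(n) if j not in I)]
    return cands[0] if len(cands) == 1 else None
```

With the [7, 4, 3] Hamming code a single error is found as the unique position whose punctured extended code has smaller minimum weight.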
6.3.3 Other hard problems
***worst case versus average case: the simplex method for linear programming is an example of an algorithm that almost always runs fast, that is, polynomially in its input size, but which is known to be exponential in the worst case. Ellipsoid method, Khachiyan's method*** ***approximate solutions of NP-hard problems***
6.4 Notes
In 1978, Berlekamp, McEliece and van Tilborg proved that the maximum-likelihood decoding problem is NP-hard for general binary codes. Vardy showed in 1997 that the problem of computing the minimum distance of a binary linear code is NP-hard.
Chapter 7

Cyclic codes

Ruud Pellikaan

Cyclic codes have been at the center of interest in the theory of error-correcting codes since their introduction. Cyclic codes of relatively small length have good parameters. In the list of 62 binary cyclic codes of length 63 there are 51 codes that have the largest known minimum distance for a given dimension among all linear codes of length 63. Binary cyclic codes are better than the Gilbert-Varshamov bound for lengths up to 1023. Although some negative results are known indicating that cyclic codes are asymptotically bad, this is still an open problem. Rich combinatorics is involved in the determination of the parameters of cyclic codes in terms of patterns of the defining set. ***...***
7.1 Cyclic codes

7.1.1 Definition of cyclic codes
Definition 7.1.1 The cyclic shift σ(c) of a word c = (c0, c1, . . . , cn−1) ∈ F_q^n is defined by σ(c) := (cn−1, c0, c1, . . . , cn−2). An Fq-linear code C of length n is called cyclic if σ(c) ∈ C for all c ∈ C. The subspaces {0} and F_q^n are clearly cyclic and are called the trivial cyclic codes.
Remark 7.1.2 In the context of cyclic codes it is convenient to consider the index i of a word modulo n, and the convention is that the numbering of the entries (c0, c1, . . . , cn−1) starts with 0 instead of 1. The cyclic shift defines a linear map σ : F_q^n → F_q^n. The i-fold composition σ^i = σ ◦ · · · ◦ σ is the i-fold forward shift. Now σ^n is the identity map and σ^{n−1} is the backward shift. A cyclic code is invariant under σ^i for all i.
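In code, the cyclic shift of Definition 7.1.1 is a one-line rotation. This Python sketch (the function name is mine) also illustrates that applying it n times gives the identity, as in Remark 7.1.2.

```python
def cyclic_shift(c):
    # sigma(c) = (c_{n-1}, c_0, ..., c_{n-2}); indices modulo n, numbering from 0
    return c[-1:] + c[:-1]
```

Applying cyclic_shift n times to a word of length n returns the word itself, i.e. sigma^n is the identity map.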
Proposition 7.1.3 Let G be a generator matrix of a linear code C. Then C is cyclic if and only if the cyclic shift of every row of G is in C.
Proof. If C is cyclic, then the cyclic shift of every row of G is in C, since all the rows of G are codewords. Conversely, suppose that the cyclic shift of every row of G is in C. Let g1, . . . , gk be the rows of G. Let c ∈ C. Then c = Σ_{i=1}^{k} x_i g_i for some x1, . . . , xk ∈ Fq. Now σ is a linear transformation of F_q^n. So

σ(c) = Σ_{i=1}^{k} x_i σ(g_i) ∈ C,

since C is linear and σ(gi) ∈ C for all i by assumption. Hence C is cyclic.
Example 7.1.4 Consider the [6,3] code over F7 with generator matrix G defined by

G =
1 1 1 1 1 1
1 3 2 6 4 5
1 2 4 1 2 4

Then σ(g1) = g1, σ(g2) = 5g2 and σ(g3) = 4g3. Hence the code is cyclic.
Example 7.1.5 Consider the [7, 4, 3] Hamming code C, with generator matrix G as given in Example 2.2.14. Then (0, 0, 0, 1, 0, 1, 1), the cyclic shift of the third row, is not a codeword. Hence this code is not cyclic. After a permutation of the columns and rows of G we get the generator matrix G′ of the code C′, where

G′ =
1 0 0 0 1 1 0
0 1 0 0 0 1 1
0 0 1 0 1 1 1
0 0 0 1 1 0 1

Let g′i be the ith row of G′. Then σ(g′1) = g′2, σ(g′2) = g′1 + g′3, σ(g′3) = g′1 + g′4 and σ(g′4) = g′1. Hence C′ is cyclic by Proposition 7.1.3. Therefore C is not cyclic, but equivalent to a cyclic code C′.
Proposition 7.1.6 The dual of a cyclic code is again cyclic.
Proof. Let C be a cyclic code. Then σ(c) ∈ C for all c ∈ C. So σ^{n−1}(c) = (c1, . . . , cn−1, c0) ∈ C for all c ∈ C. Let x ∈ C⊥. Then σ(x) · c = xn−1 c0 + x0 c1 + · · · + xn−2 cn−1 = x · σ^{n−1}(c) = 0 for all c ∈ C. Hence C⊥ is cyclic.
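Proposition 7.1.3 gives a finite test for cyclicity that is easy to automate. The following Python sketch (names are mine; p is assumed prime so that inverses come from Fermat's little theorem) checks whether the shift of every row of G stays in the row space, by comparing ranks over F_p.

```python
def rank_mod_p(rows, p):
    """Rank of a matrix over F_p by Gaussian elimination (p prime)."""
    M = [list(r) for r in rows]
    rank, n = 0, len(M[0])
    for col in range(n):
        piv = next((r for r in range(rank, len(M)) if M[r][col] % p), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        inv = pow(M[rank][col], p - 2, p)          # inverse of the pivot
        M[rank] = [x * inv % p for x in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][col] % p:
                f = M[r][col]
                M[r] = [(a - f * b) % p for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

def is_cyclic(G, p):
    """Proposition 7.1.3: C is cyclic iff the shift of every row of G is in C,
    i.e. iff adding the shifted rows does not increase the rank."""
    shifts = [row[-1:] + row[:-1] for row in G]
    return rank_mod_p(G, p) == rank_mod_p(G + shifts, p)
```

Applied to the matrix of Example 7.1.4 over F_7 this returns True, while a generator matrix whose rows are the four windows of 1111000 in a word of length 7 gives False.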
7.1.2 Cyclic codes as ideals
The set of all polynomials in the variable X with coefficients in Fq is denoted by Fq[X]. Two polynomials can be added and multiplied and in this way Fq[X] is a ring. One has division with rest: this means that every polynomial f(X) has after division by another nonzero polynomial g(X) a quotient q(X) with rest r(X) that is zero or of degree strictly smaller than deg g(X). In other words f(X) = q(X)g(X) + r(X) and r(X) = 0 or deg r(X) < deg g(X). In this way Fq[X] with its degree is a Euclidean domain. Using division with rest repeatedly we find the greatest common divisor gcd(f(X), g(X)) of two polynomials f(X) and g(X) by the algorithm of Euclid. ***complexity of Euclidean Algorithm*** Every nonempty subset of a ring that is closed under addition and under multiplication by an arbitrary element of the ring is called an ideal. Let g1, . . . , gm be given elements of a ring. The set of all a1g1 + · · · + amgm with a1, . . . , am in the ring forms an ideal; it is denoted by ⟨g1, . . . , gm⟩ and is called the ideal generated by g1, . . . , gm. As a consequence of division with rest, every ideal in Fq[X] is either {0} or generated by a unique monic polynomial. Furthermore ⟨f(X), g(X)⟩ = ⟨gcd(f(X), g(X))⟩. We refer for these notions and properties to Appendix ??.
Definition 7.1.7 Let R be a ring and I an ideal in R. Then R/I is the factor ring of R modulo I. If R = Fq[X] and I = ⟨X^n − 1⟩ is the ideal generated by X^n − 1, then Cq,n is the factor ring Cq,n = Fq[X]/⟨X^n − 1⟩.
Remark 7.1.8 The factor ring Cq,n has an easy description. Every polynomial f(X) has after division by X^n − 1 a rest r(X) of degree at most n − 1, that is, there exist polynomials q(X) and r(X) such that f(X) = q(X)(X^n − 1) + r(X) and deg r(X) < n or r(X) = 0. The coset of the polynomial f(X) modulo X^n − 1 is denoted by f(x). Hence f(X) and r(X) have the same coset and represent the same element in Cq,n. Now x^i denotes the coset of X^i modulo ⟨X^n − 1⟩.
Hence the cosets 1, x, . . . , x^{n−1} form a basis of Cq,n over Fq. The multiplication of the basis elements x^i and x^j in Cq,n with 0 ≤ i, j < n is given by

x^i x^j = x^{i+j} if i + j < n, and x^i x^j = x^{i+j−n} if i + j ≥ n.

Definition 7.1.9 Consider the map ϕ between F_q^n and Cq,n given by ϕ(c) = c0 + c1x + · · · + cn−1 x^{n−1}. Then ϕ(c) is also denoted by c(x).
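The multiplication rule above translates directly into code. A Python sketch (names are mine), representing an element of C_{q,n} by its coefficient list of length n:

```python
def mulmod(a, b, n, q):
    """Product in C_{q,n} = F_q[X]/<X^n - 1>; a and b are coefficient lists
    of length n, and x^i * x^j = x^((i+j) mod n) as in the rule above."""
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[(i + j) % n] = (c[(i + j) % n] + ai * bj) % q
    return c
```

Multiplying by x performs exactly the cyclic shift, and x · x^{n−1} = 1, so x is invertible in C_{q,n}.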
Proposition 7.1.10 The map ϕ is an isomorphism of vector spaces. Ideals in the ring Cq,n correspond one-to-one to cyclic codes in F_q^n.
Proof. The map ϕ is clearly linear and it maps the ith standard basis vector of F_q^n to the coset x^{i−1} in Cq,n for i = 1, . . . , n. Hence ϕ is an isomorphism of vector spaces. Let ψ be the inverse map of ϕ. Let I be an ideal in Cq,n. Let C := ψ(I). Then C is a linear code, since ψ is a linear map. Let c ∈ C. Then c(x) = ϕ(c) ∈ I and I is an ideal. So xc(x) ∈ I. But xc(x) = c0x + c1x^2 + · · · + cn−2 x^{n−1} + cn−1 x^n = cn−1 + c0x + c1x^2 + · · · + cn−2 x^{n−1}, since x^n = 1. So ψ(xc(x)) = (cn−1, c0, c1, . . . , cn−2) ∈ C. Hence C is cyclic. Conversely, let C be a cyclic code in F_q^n, and let I := ϕ(C). Then I is closed under addition of its elements, since C is a linear code and ϕ is a linear map. If a ∈ F_q^n and c ∈ C, then a(x)c(x) = ϕ(a0c + a1σ(c) + · · · + an−1 σ^{n−1}(c)) ∈ I. Hence I is an ideal in Cq,n.
In the following we will not distinguish between words and the corresponding polynomials under ϕ; we will talk about words c(x) when in fact we mean the vector c and vice versa.
Example 7.1.11 Consider the rows of the generator matrix G′ of the [7, 4, 3] Hamming code of Example 7.1.5. They correspond to g′1(x) = 1 + x^4 + x^5, g′2(x) = x + x^5 + x^6, g′3(x) = x^2 + x^4 + x^5 + x^6 and g′4(x) = x^3 + x^4 + x^6, respectively. Furthermore x · x^6 = 1, so x is invertible in the ring F2[X]/⟨X^7 − 1⟩. Now ⟨1 + x^4 + x^5⟩ = ⟨x + x^5 + x^6⟩ = ⟨x^6 + x^{10} + x^{11}⟩ = ⟨x^3 + x^4 + x^6⟩. Hence the ideals generated by g′i(x) are the same for i = 1, 2, 4 and there is no unique generating element. The third row generates the ideal ⟨x^2 + x^4 + x^5 + x^6⟩ = ⟨x^2(1 + x^2 + x^3 + x^4)⟩ = ⟨1 + x^2 + x^3 + x^4⟩ = ⟨(1 + x)(1 + x + x^3)⟩, which gives a cyclic code that is a proper subcode of dimension 3. Therefore all except the third element generate the same ideal.
7.1.3 Generator polynomial
Remark 7.1.12 The ring Fq[X] with its degree function is a Euclidean ring. Hence Fq[X] is a principal ideal domain, that means that all ideals are generated by one element. If an ideal of Fq[X] is not zero, then a generating element is unique up to a nonzero scalar multiple of Fq. So there is a unique monic polynomial generating the ideal. Now Cq,n is a factor ring of Fq[X], therefore it is also a principal ideal domain. A cyclic code C considered as an ideal in Cq,n is generated by one element, but this element is not unique, as we have seen in Example 7.1.11. The inverse image of C under the map Fq[X] → Cq,n is denoted
by I. Then I is a nonzero ideal in Fq[X] containing X^n − 1. Therefore I has a unique monic polynomial g(X) as generator. So g(X) is the monic polynomial in I of minimal degree. Hence g(X) is the monic polynomial of minimal degree such that g(x) ∈ C.
Definition 7.1.13 Let C be a cyclic code. Let g(X) be the monic polynomial of minimal degree such that g(x) ∈ C. Then g(X) is called the generator polynomial of C.
Example 7.1.14 The generator polynomial of the trivial code F_q^n is 1, and of the zero code of length n is X^n − 1. The repetition code and its dual have as generator polynomials X^{n−1} + · · · + X + 1 and X − 1, respectively.
Proposition 7.1.15 Let g(X) be a polynomial in Fq[X]. Then g(X) is a generator polynomial of a cyclic code over Fq of length n if and only if g(X) is monic and divides X^n − 1.
Proof. Suppose g(X) is the generator polynomial of a cyclic code. Then g(X) is monic and a generator of an ideal in Fq[X] that contains X^n − 1. Hence g(X) divides X^n − 1.
Conversely, suppose that g(X) is monic and divides X^n − 1. So b(X)g(X) = X^n − 1 for some b(X). Now ⟨g(x)⟩ is an ideal in Cq,n and defines a cyclic code C. Let c(X) be a monic polynomial such that c(x) ∈ C. Then c(x) = a(x)g(x). Hence there exists an h(X) such that c(X) = a(X)g(X) + h(X)(X^n − 1) = (a(X) + b(X)h(X))g(X). Hence deg g(X) ≤ deg c(X). Therefore g(X) is the monic polynomial of minimal degree such that g(x) ∈ C. Hence g(X) is the generator polynomial of C.
Example 7.1.16 The polynomial X^3 + X + 1 divides X^8 − 1 in F3[X], since (X^3 + X + 1)(X^5 − X^3 − X^2 + X − 1) = X^8 − 1. Hence 1 + X + X^3 is a generator polynomial of a ternary cyclic code of length 8.
Remark 7.1.17 Let g(X) be the generator polynomial of C. Then g(X) is a monic polynomial and g(x) generates C. Let c(X) be another polynomial such that c(x) generates C. Let d(X) be the greatest common divisor of c(X) and X^n − 1. Then d(X) is the monic polynomial such that ⟨d(X)⟩ = ⟨c(X), X^n − 1⟩ = I.
But also g(X) is the unique monic polynomial such that ⟨g(X)⟩ = I. Hence g(X) = gcd(c(X), X^n − 1).
Example 7.1.18 Consider the binary cyclic code C of length 7 generated by 1 + x^2. Then 1 + X^2 = (1 + X)^2 and 1 + X^7 is divisible by 1 + X in F2[X]. So 1 + X is the greatest common divisor of 1 + X^7 and 1 + X^2. Hence 1 + X is the generator polynomial of C.
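Over F_2 the computation g(X) = gcd(c(X), X^n − 1) can be coded compactly by encoding a polynomial as an integer bitmask (bit i holding the coefficient of X^i); the encoding and names below are mine.

```python
def gf2_mod(a, b):
    """Remainder of a modulo b in F_2[X]; bit i of the integer encodes
    the coefficient of X^i."""
    while b and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def gf2_gcd(a, b):
    """Euclidean algorithm in F_2[X], as in Remark 7.1.17."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a
```

For instance gcd(1 + X^7, 1 + X^2) = 1 + X, recovering the generator polynomial of Example 7.1.18.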
Example 7.1.19 Let C be the Hamming code of Examples 7.1.5 and 7.1.11. Then 1 + x^4 + x^5 generates C. In order to get the greatest common divisor of 1 + X^7 and 1 + X^4 + X^5 we apply the Euclidean algorithm:
1 + X^7 = (1 + X + X^2)(1 + X^4 + X^5) + (X + X^2 + X^4),
1 + X^4 + X^5 = (1 + X)(X + X^2 + X^4) + (1 + X + X^3),
X + X^2 + X^4 = X(1 + X + X^3).
Hence 1 + X + X^3 is the greatest common divisor, and therefore 1 + X + X^3 is the generator polynomial of C.
Remark 7.1.20 Let g(X) be a generator polynomial of a cyclic code of length n, then g(X) divides X^n − 1 by Proposition 7.1.15. So g(X)h(X) = X^n − 1 for some h(X). Hence g(0)h(0) = −1. Therefore the constant term of the generator polynomial of a cyclic code is not zero.
Proposition 7.1.21 Let g(X) = g0 + g1X + · · · + glX^l be a polynomial of degree l. Let n be an integer such that l ≤ n. Let k = n − l. Let G be the k × n matrix defined by

G =
g0 g1 · · · gl 0 · · · 0
0 g0 g1 · · · gl · · · 0
. . .
0 · · · 0 g0 g1 · · · gl

that is, the ith row of G is the (i − 1)-fold shift of (g0, g1, . . . , gl, 0, . . . , 0).
1. If g(X) is the generator polynomial of a cyclic code C, then the dimension of C is equal to k and a generator matrix of C is G.
2. If gl = 1 and G is the generator matrix of a code C such that (gl, 0, · · · , 0, g0, g1, · · · , gl−1) ∈ C, then C is cyclic with generator polynomial g(X).
Proof. 1) Suppose g(X) is the generator polynomial of a cyclic code C. Then the element g(x) generates C and the elements g(x), xg(x), . . . , x^{k−1} g(x) correspond to the rows of the above matrix. The generator polynomial is monic, so gl = 1 and the k × k submatrix of G consisting of the last k columns is a lower diagonal matrix with ones on the diagonal, so the rows of G are independent. Every codeword c(x) ∈ C is equal to a(x)g(x) for some a(X). Division with remainder of a(X)g(X) by X^n − 1 gives that there exist e(X) and f(X) such that a(X)g(X) = e(X)(X^n − 1) + f(X) and deg f(X) < n or f(X) = 0. But X^n − 1 is divisible by g(X) by Proposition 7.1.15.
So f(X) is divisible by g(X). Hence f(X) = b(X)g(X) and deg b(X) < n − l = k or b(X) = 0, for some polynomial b(X). Therefore c(x) = a(x)g(x) = b(x)g(x) and deg b(X) < k or b(X) = 0. So every codeword is a linear combination of g(x), xg(x), . . . , x^{k−1} g(x).
Hence k is the dimension of C and G is a generator matrix of C.
2) Suppose G is the generator matrix of a code C such that gl = 1 and (gl, 0, · · · , 0, g0, g1, · · · , gl−1) ∈ C. Then the cyclic shift of the ith row of G is the (i + 1)th row of G for all i < k, and the cyclic shift of the kth row of G is (gl, 0, · · · , 0, g0, g1, · · · , gl−1), which is also an element of C by assumption. Hence C is cyclic by Proposition 7.1.3. Now gl = 1 and the upper right corner of G consists of zeros, so G has rank k and the dimension of C is k. Now g(X) is monic, has degree l = n − k and g(x) ∈ C. The generator polynomial of C has the same degree l by (1). Hence g(X) is the generator polynomial of C.
Example 7.1.22 The ternary cyclic code of length 8 with generator polynomial 1 + X + X^3 of Example 7.1.16 has dimension 5.
Remark 7.1.23 A cyclic [n, k] code is systematic at the first k positions, since it has a generator matrix as given in Proposition 7.1.21 which is upper diagonal with nonzero entries on the diagonal at the first k positions, since g0 ≠ 0 by Remark 7.1.20. So the row reduced echelon form of a generator matrix of the code has the k × k identity matrix at the first k columns. The last row of this rref matrix is, up to the constant g0, equal to (0, · · · , 0, g0, g1, · · · , gl), giving the coefficients of the generator polynomial. This method of obtaining the generator polynomial from a given generator matrix G is more efficient than taking the greatest common divisor of g1(X), . . . , gk(X), X^n − 1, where g1, . . . , gk are the rows of G.
Example 7.1.24 Consider the generator matrix G of the [6,3] cyclic code over F7 of Example 7.1.4. The row reduced echelon form of G is equal to

1 0 0 6 1 3
0 1 0 3 3 6
0 0 1 6 4 6

The last row represents x^2 + 6x^3 + 4x^4 + 6x^5 = x^2(1 + 6x + 4x^2 + 6x^3). Hence 1 + 6x + 4x^2 + 6x^3 is a codeword. The corresponding monic polynomial 6 + X + 3X^2 + X^3 has degree 3. Hence this is the generator polynomial.
7.1.4 Encoding cyclic codes
Consider a cyclic code of length n with generator polynomial g(X) and the corresponding generator matrix G as in Proposition 7.1.21. Let the message m = (m0, . . . , mk−1) ∈ F_q^k be mapped to the codeword c = mG. In terms of polynomials that means that c(x) = m(x)g(x), where m(x) = m0 + · · · + mk−1 x^{k−1}. In this way we get an encoding of message words into codewords. The k × k submatrix of G consisting of the last k columns of G is a lower
triangular matrix with ones on its diagonal, so it is invertible. That means that we can perform row operations on this matrix until we get another matrix G2 such that its last k columns form the k × k identity matrix. The matrix G2 is another generator matrix of the same code. The encoding m ↦ c2 = mG2 by means of G2 is systematic in the last k positions, that means that there exist r0, . . . , rn−k−1 ∈ Fq such that c2 = (r0, . . . , rn−k−1, m0, . . . , mk−1). In other words the encoding has the nice property that one can read off the sent message directly from the encoded word by looking at the last k positions, in case no errors occurred during the transmission at these positions.
Now how does one translate this systematic encoding in terms of polynomials? Let m(X) be a polynomial of degree at most k − 1. Let −r(X) be the rest after dividing m(X)X^{n−k} by g(X). Now deg(g(X)) = n − k. So there is a polynomial q(X) such that m(X)X^{n−k} = q(X)g(X) − r(X) and deg(r(X)) < n − k or r(X) = 0. Hence r(x) + m(x)x^{n−k} = q(x)g(x) is a codeword of the form r0 + r1x + · · · + rn−k−1 x^{n−k−1} + m0 x^{n−k} + · · · + mk−1 x^{n−1}.
Example 7.1.25 Consider the cyclic [7,4,3] Hamming code of Example 7.1.19 with generator polynomial g(X) = 1 + X + X^3. Let m be a message with polynomial m(X) = 1 + X^2 + X^3. Then division of m(X)X^3 by g(X) gives as quotient q(X) = 1 + X + X^2 + X^3 with rest r(X) = 1. The corresponding codeword by systematic encoding is c2(x) = r(x) + m(x)x^3 = 1 + x^3 + x^5 + x^6.
Example 7.1.26 Consider the ternary cyclic code of length 8 with generator polynomial 1 + X + X^3 of Example 7.1.16. Let m be a message with polynomial m(X) = 1 + X^2 + X^3. Then division of m(X)X^3 by g(X) gives as quotient q(X) = −1 − X + X^2 + X^3 with rest −r(X) = 1 − X. The corresponding codeword by systematic encoding is c2(x) = r(x) + m(x)x^3 = −1 + x + x^3 + x^5 + x^6.
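The systematic encoding just described is a single polynomial division. A Python sketch over F_2 (the bitmask encoding and names are mine) reproduces Example 7.1.25:

```python
def gf2_divmod(a, b):
    """Quotient and remainder of a divided by b in F_2[X]; bit i of the
    integer encodes the coefficient of X^i."""
    q = 0
    while a.bit_length() >= b.bit_length():
        s = a.bit_length() - b.bit_length()
        q ^= 1 << s
        a ^= b << s
    return q, a

def encode_systematic(m, g, n, k):
    """Systematic encoding in the last k positions: divide m(X)*X^(n-k)
    by g(X) and use the remainder as the first n-k positions."""
    _, r = gf2_divmod(m << (n - k), g)
    return r ^ (m << (n - k))
```

With g(X) = 1 + X + X^3 and m(X) = 1 + X^2 + X^3 this yields the codeword 1 + x^3 + x^5 + x^6, exactly as computed by hand above.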
7.1.5 Reversible codes
Definition 7.1.27 Define the reversed word ρ(x) of x ∈ F_q^n by ρ(x0, x1, . . . , xn−2, xn−1) = (xn−1, xn−2, . . . , x1, x0). Let C be a code in F_q^n, then its reversed code ρ(C) is defined by ρ(C) = { ρ(c) | c ∈ C }. A code is called reversible if C = ρ(C).
Remark 7.1.28 The dimensions of C and ρ(C) are the same, since ρ is an automorphism of F_q^n. If a code is reversible, then ρ ∈ Aut(C).
Definition 7.1.29 Let g(X) be a polynomial of degree l given by g0 + g1X + · · · + gl−1 X^{l−1} + gl X^l. Then X^l g(X^{−1}) = gl + gl−1 X + · · · + g1 X^{l−1} + g0 X^l is called the reciprocal of g(X). If moreover g(0) ≠ 0, then X^l g(X^{−1})/g(0) is called the monic reciprocal of g(X). The polynomial g(X) is called reversible if g(0) ≠ 0 and it is equal to its monic reciprocal.
Remark 7.1.30 If g = (g0, g1, . . . , gl−1, gl) are the coefficients of the polynomial g(X), then the reversed word ρ(g) gives the coefficients of the reciprocal of g(X).
Remark 7.1.31 If α is a zero of g(X) and α ≠ 0, then the reciprocal α^{−1} is a zero of the reciprocal of g(X).
Proposition 7.1.32 Let g(X) be the generator polynomial of a cyclic code C. Then ρ(C) is cyclic with the monic reciprocal of g(X) as generator polynomial, and C is reversible if and only if g(X) is reversible.
Proof. A cyclic code is invariant under the forward shift σ and the backward shift σ^{n−1}. Now σ(ρ(c)) = ρ(σ^{n−1}(c)) for all c ∈ C. Hence ρ(C) is cyclic. Now g(0) ≠ 0 by Remark 7.1.20. Hence the monic reciprocal of g(X) is well defined and its corresponding word is an element of ρ(C) by Remark 7.1.30. The degree of g(X) and its monic reciprocal are the same, and the dimensions of C and ρ(C) are the same. Hence this monic reciprocal is the generator polynomial of ρ(C). Therefore C is reversible if and only if g(X) is reversible, by the definition of a reversible polynomial.
Remark 7.1.33 If C is a reversible cyclic code, then the group generated by σ and ρ is the dihedral group of order 2n and is contained in Aut(C).
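By Remark 7.1.30 the (monic) reciprocal is just a reversal of the coefficient list, possibly scaled by g(0)^{-1}. A Python sketch over a prime field F_p (names are mine):

```python
def reciprocal(g):
    """Reciprocal X^l * g(1/X) of g(X): reverse the coefficient list
    [g_0, ..., g_l], as in Remark 7.1.30."""
    return list(reversed(g))

def monic_reciprocal(g, p):
    """Monic reciprocal X^l * g(1/X) / g(0) over F_p, p prime; only
    defined when g(0) != 0."""
    assert g[0] % p != 0
    inv = pow(g[0], p - 2, p)              # inverse of g(0) by Fermat
    return [c * inv % p for c in reversed(g)]
```

For example, the reciprocal of 1 + X + X^3 over F_2 is 1 + X^2 + X^3, and over F_3 the monic reciprocal of −1 + X − X^2 − X^3 + X^5 is −1 + X^2 + X^3 − X^4 + X^5.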
7.1.6 Parity check polynomial
Definition 7.1.34 Let g(X) be the generator polynomial of a cyclic code C of length n. Then g(X) divides X^n − 1 by Proposition 7.1.15 and

h(X) = (X^n − 1)/g(X)

is called the parity check polynomial of C.
Proposition 7.1.35 Let h(X) be the parity check polynomial of a cyclic code C. Then c(x) ∈ C if and only if c(x)h(x) = 0.
Proof. Let c(x) ∈ C. Then c(x) = a(x)g(x) for some a(x). We have that g(X)h(X) = X^n − 1. Hence g(x)h(x) = 0. So c(x)h(x) = a(x)g(x)h(x) = 0.
Conversely, suppose that c(x)h(x) = 0. There exist polynomials a(X) and b(X) such that c(X) = a(X)g(X) + b(X) and b(X) = 0 or deg b(X) < deg g(X). Hence c(x)h(x) = a(x)g(x)h(x) + b(x)h(x) = b(x)h(x). Notice that b(x)h(x) ≠ 0 if b(X) is a nonzero polynomial, since deg b(X)h(X) is at most n − 1. Hence b(X) = 0 and c(x) = a(x)g(x) ∈ C.
Remark 7.1.36 If H is a parity check matrix for a code C, then H is a generator matrix for the dual of C. One might expect that if h(X) is the parity check polynomial for a cyclic code C, then h(X) is the generator polynomial of the dual of C. This is not the case, but something of this nature is true as the following shows.
Proposition 7.1.37 Let h(X) be the parity check polynomial of a cyclic code C. Then the monic reciprocal of h(X) is the generator polynomial of C⊥.
Proof. Let C be a cyclic code of length n and dimension k with generator polynomial g(X) and parity check polynomial h(X). If k = 0, then g(X) = X^n − 1 and h(X) = 1, and similarly if k = n, then g(X) = 1 and h(X) = X^n − 1. Hence the proposition is true in these cases. Now suppose that 0 < k < n. Then h(X) = h0 + h1X + · · · + hkX^k. Hence X^k h(X^{−1}) = hk + hk−1 X + · · · + h0 X^k. The ith position of x^k h(x^{−1}) is hk−i. Let g(X) be the generator polynomial of C. Let l = n − k. Then g(X) = g0 + g1X + · · · + glX^l and gl = 1. The elements x^t g(x) generate C. The ith position of x^t g(x) is equal to gi+t. Hence the inner product of the words x^t g(x) and x^k h(x^{−1}) is

Σ_{i=0}^{k} g_{i+t} h_{k−i},

which is the coefficient of the term X^{k+t} in X^t g(X)h(X). But X^t g(X)h(X) is equal to X^{n+t} − X^t and 0 < k < n, hence this coefficient is zero. So Σ_{i=0}^{k} g_{i+t} h_{k−i} = 0 for all t. So x^k h(x^{−1}) is an element of the dual of C. Now g(X)h(X) = X^n − 1. So g(0)h(0) = −1. Hence the monic reciprocal of h(X) is well defined, is monic, represents an element of C⊥, has degree k, and the dimension of C⊥ is n − k. Hence X^k h(X^{−1})/h(0) is the generator polynomial of C⊥ by Proposition 7.1.21.
Example 7.1.38 Consider the [6,3] cyclic code over F7 of Example 7.1.24, which has generator polynomial 6 + X + 3X^2 + X^3. Hence

h(X) = (X^6 − 1)/(X^3 + 3X^2 + X + 6) = X^3 + 4X^2 + X + 1
is the parity check polynomial of the code. The generator polynomial of the dual code is g⊥(X) = X^3 h(X^{−1}) = 1 + 4X + X^2 + X^3 by Proposition 7.1.37, since h(0) = 1.
Example 7.1.39 Consider in F2[X] the polynomial g(X) = 1 + X^4 + X^6 + X^7 + X^8. Then g(X) divides X^15 − 1 with quotient

h(X) = (X^15 − 1)/g(X) = 1 + X^4 + X^6 + X^7.

Hence g(X) is the generator polynomial of a binary cyclic code of length 15 with parity check polynomial h(X). The generator polynomial of the dual code is g⊥(X) = X^7 h(X^{−1}) = 1 + X + X^3 + X^7 by Proposition 7.1.37, since h(0) = 1.
Example 7.1.40 The generator polynomial 1 + X + X^3 of the ternary code of length 8 of Example 7.1.16 has parity check polynomial

h(X) = (X^8 − 1)/g(X) = X^5 − X^3 − X^2 + X − 1.
The generator polynomial of the dual code is g⊥(X) = X^5 h(X^{−1})/h(0) = X^5 − X^4 + X^3 + X^2 − 1 by Proposition 7.1.37.
Example 7.1.41 Let us now take a look at how cyclic codes are constructed via generator and check polynomials in GAP.
> x:=Indeterminate(GF(2));;
> f:=x^17-1;;
> F:=Factors(PolynomialRing(GF(2)),f);
[ x_1+Z(2)^0, x_1^8+x_1^5+x_1^4+x_1^3+Z(2)^0, x_1^8+x_1^7+x_1^6+\\
x_1^4+x_1^2+x_1+Z(2)^0 ]
> g:=F[2];;
> C:=GeneratorPolCode(g,17,"code from Example 6.1.41",GF(2));;
> MinimumDistance(C);;
> C;
a cyclic [17,9,5]3..4 code from Example 6.1.41 over GF(2)
> h:=F[3];;
> C2:=CheckPolCode(h,17,GF(2));;
> MinimumDistance(C2);;
> C2;
a cyclic [17,8,6]3..7 code defined by check polynomial over GF(2)
So here x is a variable with which the polynomials are built. Note that one can also define it via x:=X(GF(2)), since X is a synonym of Indeterminate. For this same reason we could not use X as a variable.
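The same check-polynomial computations can be cross-checked outside GAP with a small division routine over F_2 (the bitmask encoding and names are mine; the only assumption is that g(X) divides X^n − 1):

```python
def gf2_divmod(a, b):
    """Quotient and remainder in F_2[X]; bit i encodes the coefficient of X^i."""
    q = 0
    while a.bit_length() >= b.bit_length():
        s = a.bit_length() - b.bit_length()
        q ^= 1 << s
        a ^= b << s
    return q, a

def parity_check_poly(g, n):
    """h(X) = (X^n - 1)/g(X) over F_2 (where X^n - 1 equals X^n + 1)."""
    q, r = gf2_divmod((1 << n) | 1, g)
    assert r == 0, "g(X) does not divide X^n - 1"
    return q
```

Dividing X^15 + 1 by 1 + X^4 + X^6 + X^7 + X^8 indeed returns 1 + X^4 + X^6 + X^7, in agreement with Example 7.1.39.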
7.1.7 Exercises
7.1.1 Let C be the Fq linear code with generator matrix

        1 1 1 1 0 0 0
    G = 0 1 1 1 1 0 0
        0 0 1 1 1 1 0
        0 0 0 1 1 1 1
Show that C is not cyclic for every finite field Fq.

7.1.2 Let C be a cyclic code over Fq of length 7 such that (1, 1, 1, 0, 0, 0, 0) is an element of C. Show that C is a trivial code if q is not a power of 3.

7.1.3 Find the generator polynomial of the binary cyclic code of length 7 generated by 1 + x + x^5.

7.1.4 Show that 2 + X^2 + X^3 is the generator polynomial of a ternary cyclic code of length 13.

7.1.5 Let α be an element in F8 such that α^3 = α + 1. Let C be the F8 linear code with generator matrix G, where

        1 1   1   1   1   1   1
    G = 1 α   α^2 α^3 α^4 α^5 α^6
        1 α^2 α^4 α^6 α   α^3 α^5
1) Show that the code C is cyclic.
2) Determine the coefficients of the generator polynomial of this code.

7.1.6 Consider the binary polynomial g(X) = 1 + X^2 + X^5.
1) Show that g(X) is the generator polynomial of a binary cyclic code C of length 31 and dimension 26.
2) Give the encoding with respect to the code C of the message m with m(X) = 1 + X^10 + X^25 as message polynomial, that is systematic at the last 26 positions.
3) Find the parity check polynomial of C.
4) Give the coefficients of the generator polynomial of C⊥.

7.1.7 Give a description of the systematic encoding of an [n, k] cyclic code in the first k positions in terms of division by the generator polynomial with rest.

7.1.8 Estimate the number of additions and the number of multiplications in Fq needed to encode an [n, k] cyclic code using multiplication with the generator polynomial, and compare these with the numbers for systematic encoding in the last k positions by dividing m(X)X^{n−k} by g(X) with rest.

7.1.9 [CAS] Implement the encoding procedure from Section 7.1.4.

7.1.10 [CAS] Given a generator polynomial g, code length n, and field size q, construct a cyclic code dual to the one generated by g. Use the function ReciprocalPolynomial (both in GAP and Magma).
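Exercises 7.1.7-7.1.9 contrast the two standard encoders. As an illustration (a Python sketch rather than the GAP/Magma setting of the exercises; the helper names are ours), here are both encoders for the binary [7,4] cyclic code with g(X) = 1 + X + X^3:

```python
# Two encoders for a cyclic [n, k] code with generator polynomial g(X).
# Coefficient lists are indexed by the exponent of X.

def poly_mul(a, b, p):
    """Product of two coefficient lists over F_p."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] = (c[i + j] + ai * bj) % p
    return c

def poly_mod(num, den, p):
    """Remainder of num modulo den over F_p."""
    num = num[:]
    inv = pow(den[-1], p - 2, p)
    for shift in range(len(num) - len(den), -1, -1):
        f = (num[shift + len(den) - 1] * inv) % p
        for i, d in enumerate(den):
            num[shift + i] = (num[shift + i] - f * d) % p
    return num[: len(den) - 1]

def encode_mult(m, g, p):
    """Non-systematic encoding: c(X) = m(X) g(X)."""
    return poly_mul(m, g, p)

def encode_systematic(m, g, p):
    """Systematic in the last k positions:
    c(X) = m(X) X^{n-k} - (m(X) X^{n-k} mod g(X))."""
    shifted = [0] * (len(g) - 1) + m  # m(X) X^{deg g}
    r = poly_mod(shifted, g, p)
    return [(s - ri) % p for s, ri in zip(shifted, r + [0] * (len(shifted) - len(r)))]

g = [1, 1, 0, 1]   # 1 + X + X^3, the binary [7,4] cyclic code
m = [1, 0, 1, 1]   # message polynomial 1 + X^2 + X^3
c1 = encode_mult(m, g, 2)
c2 = encode_systematic(m, g, 2)
print(c1, c2)  # both are multiples of g; c2 carries m in its last 4 positions
```

Both codewords reduce to zero modulo g(X); only the second one reproduces the message verbatim in the last k coordinates.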
7.2 Defining zeros
*** ***
7.2.1 Structure of finite fields
The finite fields we encountered up to now were always of the form Fp = Z/⟨p⟩ with p a prime. For the notion of defining zeros of a cyclic code this does not suffice, and extensions of prime fields are needed. In this section we state basic facts on the structure of finite fields. For proofs we refer to the existing literature.
Definition 7.2.1 The smallest subfield of a field F is unique and is called the prime field of F. The only prime fields are the rational numbers Q and the finite field Fp with p a prime; the characteristic of the field is zero and p, respectively.

Remark 7.2.2 Let F be a field of characteristic p, a prime. Then (x + y)^p = x^p + y^p for all x, y ∈ F by Newton's binomial formula, since the binomial coefficient \binom{p}{i} is divisible by p for all i = 1, . . . , p − 1.
Proposition 7.2.3 If F is a finite field, then the number of elements of F is a power of a prime number.

Proof. The characteristic of a finite field is prime, and such a field is a vector space over the prime field of finite dimension. So the number of elements of a finite field is a power of a prime number.

Remark 7.2.4 The factor ring of the ring of polynomials in one variable with coefficients in a field F modulo an irreducible polynomial gives a way to construct a field extension of F. In particular, if f(X) ∈ Fp[X] is irreducible, and ⟨f(X)⟩ is the ideal generated by all the multiples of f(X), then the factor ring Fp[X]/⟨f(X)⟩ is a field with p^e elements, where e = deg f(X). The coset of X modulo ⟨f(X)⟩ is denoted by x, and the monomials 1, x, . . . , x^{e−1} form a basis over Fp. Hence every element in this field is uniquely represented by a polynomial g(X) ∈ Fp[X] of degree at most e − 1 and its coset is denoted by g(x). This is called the principal representation. The sum of two representatives is again a representative. For the product one has to divide by f(X) and take the rest as a representative.

Example 7.2.5 The irreducible polynomials of degree one in F2[X] are X and 1 + X. And 1 + X + X^2 is the only irreducible polynomial of degree two in F2[X]. There are exactly two irreducible polynomials of degree three in F2[X]. These are 1 + X + X^3 and 1 + X^2 + X^3. Consider the field F = F2[X]/⟨1 + X + X^3⟩ with 8 elements. Then 1, x, x^2 is a basis of F over F2. Now

(1 + X)(1 + X + X^2) = 1 + X^3 ≡ X mod (1 + X + X^3).
Hence (1 + x)(1 + x + x^2) = x in F. In the following table the powers x^i are written by their principal representatives.

x^3 = 1 + x
x^4 = x + x^2
x^5 = 1 + x + x^2
x^6 = 1 + x^2
x^7 = 1
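The table can be reproduced mechanically. A short Python sketch (the helper name times_x is ours) that reduces with the rule x^3 = 1 + x:

```python
# Sketch: principal representations in F8 = F2[X]/<1 + X + X^3> as in
# Example 7.2.5. An element a0 + a1*x + a2*x^2 is the tuple (a0, a1, a2).

def times_x(a):
    """Multiply (a0, a1, a2) by x and reduce modulo 1 + X + X^3 over F2."""
    a0, a1, a2 = a
    # x*(a0 + a1 x + a2 x^2) = a0 x + a1 x^2 + a2 x^3 and x^3 = 1 + x
    return (a2, (a0 + a2) % 2, a1)

powers = {0: (1, 0, 0)}  # x^0 = 1
for i in range(1, 8):
    powers[i] = times_x(powers[i - 1])

for i in range(3, 8):
    print(f"x^{i} = {powers[i]}")
# reproduces the table: x^3 = (1,1,0), ..., x^7 = (1,0,0)
```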
Therefore the nonzero elements form a cyclic group of order 7 with x as generator.

Definition 7.2.6 Let F be a field. Let f(X) = Σ_{i=0}^{n} a_i X^i in F[X]. Then f'(X) is the formal derivative of f(X) and is defined by

f'(X) = Σ_{i=1}^{n} i a_i X^{i−1}.
Remark 7.2.7 The product or Leibniz rule holds for the formal derivative: (f(X)g(X))' = f'(X)g(X) + f(X)g'(X). The following criterion gives a way to decide whether the zeros of a polynomial are simple.

Lemma 7.2.8 Let F be a field. Let f(X) ∈ F[X]. Then every zero of f(X) has multiplicity one if and only if gcd(f(X), f'(X)) = 1.

Proof. Suppose gcd(f(X), f'(X)) = 1. Let α be a zero of f(X) of multiplicity m. Then there exists a polynomial a(X) such that f(X) = (X − α)^m a(X). Differentiating this equality gives f'(X) = m(X − α)^{m−1} a(X) + (X − α)^m a'(X). If m > 1, then X − α divides f(X) and f'(X). This contradicts the assumption that gcd(f(X), f'(X)) = 1. Hence every zero of f(X) has multiplicity one. Conversely, if gcd(f(X), f'(X)) ≠ 1, then f(X) and f'(X) have a common zero a, possibly in an extension of F. Conclude that (X − a)^2 divides f(X), using the product rule again.

Remark 7.2.9 Let p be a prime and q = p^e. The formal derivative of X^q − X is −1 in Fp. Hence all zeros of X^q − X in an extension of Fp are simple by Lemma 7.2.8. For every field F and polynomial f(X) in one variable X there exists a field G that contains F as a subfield such that f(X) splits in linear factors in G[X]. The smallest field with these properties is unique up to an isomorphism of fields and is called the splitting field of f(X) over F. A field F is called algebraically closed if every nonconstant polynomial in one variable has a zero in F. So every polynomial in one variable over an algebraically closed field splits in linear factors over this field. Every field F has an extension G
that is algebraically closed such that G does not have a proper subfield that is algebraically closed. Such an extension is unique up to isomorphism and is called the algebraic closure of F and is denoted by F̄. The field C of complex numbers is the algebraic closure of the field R of real numbers.

Remark 7.2.10 If F is a field with q elements, then F* = F \ {0} is a multiplicative group of order q − 1. So x^{q−1} = 1 for all x ∈ F*. Hence x^q = x for all x ∈ F. Therefore the zeros of X^q − X are precisely the elements of F.

Theorem 7.2.11 Let p be a prime and q = p^e. There exists a field of q elements, and any field with q elements is isomorphic to the splitting field of X^q − X over Fp; it is denoted by Fq or GF(q), the Galois field of q elements.

Proof. The splitting field of X^q − X over Fp contains the zeros of X^q − X. Let Z be the set of zeros of X^q − X in the splitting field. Then |Z| = q, since X^q − X splits in linear factors and all zeros are simple by Remark 7.2.9. Now 0 and 1 are elements of Z and Z is closed under addition, subtraction, multiplication and division by nonzero elements. Hence Z is a field. Furthermore Z contains Fp since q = p^e. Hence Z is equal to the splitting field of X^q − X over Fp. Hence the splitting field has q elements. If F is a field with q elements, then all elements of F are zeros of X^q − X by Remark 7.2.10. Hence F is contained in an isomorphic copy of the splitting field of X^q − X over Fp. Therefore they are equal, since both have q elements.

The set of invertible elements of the finite field Fq is an abelian group of order q − 1. But a stronger statement is true.

Proposition 7.2.12 The multiplicative group Fq* is cyclic.

Proof. The order of an element of Fq* divides q − 1, since Fq* is a group of order q − 1. Let d be the maximal order of an element of Fq*. Then d divides q − 1. Let x be an element of order d. If y ∈ Fq*, then the order n of y divides d. Otherwise there is a prime l dividing n and l not dividing d.
So z = y^{n/l} has order l. Hence xz has order dl, contradicting the maximality of d. Therefore the order of an element of Fq* divides d. So the elements of Fq* are zeros of X^d − 1. Hence q − 1 ≤ d and d divides q − 1. We conclude that d = q − 1, x is an element of order q − 1 and Fq* is cyclic, generated by x.

Definition 7.2.13 A generator of Fq* is called a primitive element. An irreducible polynomial f(X) ∈ Fp[X] is called primitive if x is a primitive element in Fp[X]/⟨f(X)⟩, where x is the coset of X modulo f(X).

Definition 7.2.14 Choose a primitive element α of Fq. Define α^∗ = 0. Then for every element β ∈ Fq there is a unique i ∈ {∗, 0, 1, . . . , q − 2} such that β = α^i; this i is called the logarithm of β with respect to α, and α^i the exponential representation of β. For every i ∈ {∗, 0, 1, . . . , q − 2} there is a unique j ∈ {∗, 0, 1, . . . , q − 2} such that 1 + α^i = α^j; this j is called the Zech logarithm of i and is denoted by Zech(i) = j.
Remark 7.2.15 Let p be a prime and q = p^e. In a principal representation of Fq, every element is given by a polynomial of degree at most e − 1 with coefficients in Fp, and addition in Fq is easy and done coefficientwise in Fp. But for the multiplication we need to multiply two polynomials and compute a division with rest. Define the addition i + j for i, j ∈ {∗, 0, 1, . . . , q − 2}, where i + j is taken modulo q − 1 if i and j are both not equal to ∗, and i + j = ∗ if i = ∗ or j = ∗. Then multiplication in Fq is easy in the exponential representation with respect to a primitive element, since α^i α^j = α^{i+j} for i, j ∈ {∗, 0, 1, . . . , q − 2}. In the exponential representation the addition can be expressed in terms of the Zech logarithm:

α^i + α^j = α^{i+Zech(j−i)}.

Example 7.2.16 Consider the finite field F8 as given in Example 7.2.5 by the irreducible polynomial 1 + X + X^3. In the following table the elements are represented as powers of x, as polynomials a0 + a1 x + a2 x^2, and the Zech logarithm is given.

 i | x^i | (a0, a1, a2) | Zech(i)
 ∗ | x^∗ | (0, 0, 0)    | 0
 0 | x^0 | (1, 0, 0)    | ∗
 1 | x^1 | (0, 1, 0)    | 3
 2 | x^2 | (0, 0, 1)    | 6
 3 | x^3 | (1, 1, 0)    | 1
 4 | x^4 | (0, 1, 1)    | 5
 5 | x^5 | (1, 1, 1)    | 4
 6 | x^6 | (1, 0, 1)    | 2

In the principal representation we immediately see that x^3 + x^5 = x^2, since x^3 = 1 + x and x^5 = 1 + x + x^2. The exponential representation by means of the Zech logarithm gives x^3 + x^5 = x^{3+Zech(2)} = x^2.

***Applications: quasi random generators, discrete logarithm.***

Definition 7.2.17 Let Irr_q(n) be the number of irreducible monic polynomials over Fq of degree n.

Proposition 7.2.18 Let q be a power of a prime number. Then

q^n = Σ_{d|n} d · Irr_q(d).
Proof.
***...***
Proposition 7.2.19

Irr_q(n) = (1/n) Σ_{d|n} μ(n/d) q^d.
Proof. Consider the poset N of Example 5.3.20 with the divisibility as partial order. Define f(d) = d · Irr_q(d). Then the sum function f̃(n) = Σ_{d|n} f(d) is equal to q^n, by Proposition 7.2.18. The Möbius inversion formula 5.3.10 implies that n · Irr_q(n) = Σ_{d|n} μ(n/d) q^d, which gives the desired result.

Remark 7.2.20 Proposition 7.2.19 implies

Irr_q(n) ≥ (1/n)(q^n − q^{n−1} − · · · − q) = (1/n)(q^n − (q^n − q)/(q − 1)) > 0,

since μ(1) = 1 and μ(d) ≥ −1 for all d. By counting the number of irreducible polynomials over Fq we see that there exists an irreducible polynomial in Fq[X] of every degree d. Let q = p^d with p a prime. Now Z_p is a field with p elements. There exists an irreducible polynomial f(T) in Z_p[T] of degree d, and Z_p[T]/⟨f(T)⟩ is a field with p^d = q elements. This is another way to show the existence of a finite field with q elements, where q is a prime power.
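For illustration, the formula of Proposition 7.2.19 is straightforward to evaluate; the following Python sketch (the function names mobius and irr are ours) counts monic irreducible polynomials over F2:

```python
# Sketch: count monic irreducible polynomials of degree n over F_q with the
# Mobius formula of Proposition 7.2.19: Irr_q(n) = (1/n) sum_{d|n} mu(n/d) q^d.

def mobius(n):
    """Mobius function mu(n) via trial factorization."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0       # square factor
            result = -result
        d += 1
    return -result if n > 1 else result

def irr(q, n):
    return sum(mobius(n // d) * q**d for d in range(1, n + 1) if n % d == 0) // n

# Over F2: 2 irreducibles of degree 3 (Example 7.2.5), 3 of degree 4, 6 of degree 5
print(irr(2, 3), irr(2, 4), irr(2, 5))  # 2 3 6
```

As a cross-check, Proposition 7.2.18 holds: summing d · Irr_2(d) over the divisors d of 4 gives 1·2 + 2·1 + 4·3 = 16 = 2^4.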
7.2.2 Minimal polynomials
***

Remark 7.2.21 From now on we assume that n and q are relatively prime. This assumption is not necessary but it would complicate matters otherwise. Hence q has an inverse modulo n. So q^e ≡ 1 (mod n) for some positive integer e. Hence n divides q^e − 1. Let F_{q^e} be the extension of Fq of degree e. So n divides the order of F_{q^e}*, the cyclic group of units. Hence there exists an element α ∈ F_{q^e}* of order n. From now on we choose such an element α of order n.

Example 7.2.22 The order of the cyclic group F_{3^e}* is 2, 8, 26, 80 and 242 for e = 1, 2, 3, 4 and 5, respectively. Hence F_{3^5} is the smallest field extension of F3 that has an element of order 11.

Remark 7.2.23 The multiplicity of every zero of X^n − 1 is one by Lemma 7.2.8, since gcd(X^n − 1, nX^{n−1}) = 1 in Fq by the assumption that gcd(n, q) = 1. Let α be an element in some extension of Fq of order n. Then 1, α, α^2, . . . , α^{n−1} are n mutually distinct zeros of X^n − 1. Hence

X^n − 1 = ∏_{i=0}^{n−1} (X − α^i).

Definition 7.2.24 Let α be a primitive nth root of unity in the extension field F_{q^e}. For this choice of an element of order n we define mi(X) as the minimal polynomial of α^i, that is, the monic polynomial in Fq[X] of smallest degree such that mi(α^i) = 0.

Example 7.2.25 In particular m0(X) = X − 1.

Proposition 7.2.26 The minimal polynomial mi(X) is irreducible in Fq[X].
Proof. Let mi(X) = f(X)g(X) with f(X), g(X) ∈ Fq[X]. Then f(α^i)g(α^i) = mi(α^i) = 0. So f(α^i) = 0 or g(α^i) = 0. Hence deg(f(X)) ≥ deg(mi(X)) or deg(g(X)) ≥ deg(mi(X)) by the minimality of the degree of mi(X). Hence mi(X) is irreducible.

Example 7.2.27 Choose α = 3 as the primitive element in F7 of order 6. Then X^6 − 1 is the product of linear factors in F7[X]. Furthermore m1(X) = X − 3, m2(X) = X − 2, m3(X) = X − 6, and so on. But 5 is also an element of order 6 in F7*. The choice α = 5 would give m1(X) = X − 5, m2(X) = X − 4, and so on.

Example 7.2.28 There are exactly two irreducible polynomials of degree 3 in F2[X]. They are factors of 1 + X^7:

1 + X^7 = (1 + X)(1 + X + X^3)(1 + X^2 + X^3).

Let α ∈ F8 be a zero of 1 + X + X^3. Then α is a primitive element of F8 and α^2 and α^4 are the remaining zeros of 1 + X + X^3. The reciprocal of 1 + X + X^3 is X^3(1 + X^{−1} + X^{−3}) = 1 + X^2 + X^3 and has α^{−1} = α^6, α^{−2} = α^5 and α^{−4} = α^3 as zeros. So m1(X) = 1 + X + X^3 and m3(X) = 1 + X^2 + X^3.

Proposition 7.2.29 The monic reciprocal of mi(X) is equal to m−i(X).

Proof. The element α^i is a zero of mi(X). So α^{−i} is a zero of the monic reciprocal of mi(X) by Remark 7.1.30. Hence the degree of the monic reciprocal of mi(X) is at least deg(m−i(X)). So deg(mi(X)) ≥ deg(m−i(X)). Similarly deg(mi(X)) ≤ deg(m−i(X)). So deg(mi(X)) = deg(m−i(X)) is the degree of the monic reciprocal of mi(X). Hence the monic reciprocal of mi(X) is a monic polynomial of minimal degree having α^{−i} as a zero, therefore it is equal to m−i(X).
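The factorization of 1 + X^7 in Example 7.2.28 can be checked by multiplying out; a Python sketch (the helper name poly_mul is ours):

```python
# Sketch for Example 7.2.28: check that 1 + X^7 factors over F2 as
# (1 + X)(1 + X + X^3)(1 + X^2 + X^3), i.e. m0(X) m1(X) m3(X).

def poly_mul(a, b, p):
    """Product of coefficient lists (index = exponent) over F_p."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] = (c[i + j] + ai * bj) % p
    return c

m0 = [1, 1]           # 1 + X
m1 = [1, 1, 0, 1]     # 1 + X + X^3
m3 = [1, 0, 1, 1]     # 1 + X^2 + X^3
product = poly_mul(poly_mul(m0, m1, 2), m3, 2)
print(product)  # coefficients of 1 + X^7
```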
7.2.3 Cyclotomic polynomials and cosets
***

Definition 7.2.30 Let n be a nonnegative integer. Then Euler's function ϕ is given by ϕ(n) = |{i : gcd(i, n) = 1, 0 ≤ i < n}|.

Lemma 7.2.31 The following properties of Euler's function hold:
1) ϕ(mn) = ϕ(m)ϕ(n) if gcd(m, n) = 1.
2) ϕ(1) = 1.
3) ϕ(p) = p − 1 if p is a prime number.
4) ϕ(p^e) = p^{e−1}(p − 1) if p is a prime number.
Proof. The set {i : gcd(i, n) = 1, 0 ≤ i < n} is a set of representatives of Z_n*, that is, the set of all invertible elements of Z_n. Hence ϕ(n) = |Z_n*|. The Chinese remainder theorem gives that Z_m ⊕ Z_n ≅ Z_{mn} if gcd(m, n) = 1. Hence Z_m* ⊕ Z_n* ≅ Z_{mn}*. Therefore ϕ(mn) = ϕ(m)ϕ(n) if gcd(m, n) = 1. The remaining items are left as an exercise.

Proposition 7.2.32 Let p1, . . . , pk be the primes dividing n. Then

ϕ(n) = n (1 − 1/p1) · · · (1 − 1/pk).

Proof. This is a direct consequence of Lemma 7.2.31.
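Proposition 7.2.32 translates directly into an integer computation; a Python sketch (the function name euler_phi is ours) that evaluates the product formula without floating point:

```python
# Sketch: Euler's function via the product formula of Proposition 7.2.32,
# phi(n) = n * prod over primes p dividing n of (1 - 1/p), done in integers.

def euler_phi(n):
    result, d = n, 2
    while d * d <= n:
        if n % d == 0:
            result -= result // d   # multiply result by (1 - 1/d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:                        # a remaining prime factor
        result -= result // n
    return result

print(euler_phi(11), euler_phi(23), euler_phi(8))  # 10 22 4
```

The values ϕ(11) = 10 and ϕ(23) = 22 are used in Examples 7.2.46 and 7.2.47 below.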
Definition 7.2.33 Let F be a field. Let n be a positive integer that is relatively prime to the characteristic of F. Let α be an element of order n in F̄*. The cyclotomic polynomial of order n is defined by

Φn(X) = ∏_{gcd(i,n)=1, 0≤i<n} (X − α^i).

Remark 7.2.34 The degree of Φn(X) is equal to ϕ(n).

Remark 7.2.35 If x is a primitive element, then y is a primitive element if and only if y = x^i for some i such that 1 ≤ i < q − 1 and gcd(i, q − 1) = 1. Hence the number of primitive elements in Fq* is equal to ϕ(q − 1), where ϕ is Euler's function.

Theorem 7.2.36 Let F be a field. Let n be a positive integer that is relatively prime to the characteristic of F. The polynomial Φn(X) is in F[X], has as zeros all elements in F̄* of order n, and has degree ϕ(n), where ϕ is Euler's function. Furthermore

X^n − 1 = ∏_{d|n} Φd(X).

Proof. The degree of Φn(X) is equal to the number of i such that 0 ≤ i < n and gcd(i, n) = 1, which is by definition equal to ϕ(n). The power α^i has order n if and only if gcd(i, n) = 1. Conversely, if β is an element of order n in F̄*, then β is a zero of X^n − 1 and X^n − 1 = ∏_{0≤i<n} (X − α^i). So β = α^i for some i with gcd(i, n) = 1. Hence the zeros of Φn(X) are precisely the elements of order n. Every zero α^i of X^n − 1 has order d for some divisor d of n, and the elements of order d are exactly the zeros of Φd(X). Hence

X^n − 1 = ∏_{d|n} ∏_{gcd(i,d)=1, 0≤i<d} (X − (α^{n/d})^i) = ∏_{d|n} Φd(X).

The fact that Φn(X) has coefficients in F is shown by induction on n. Now Φ1(X) = X − 1 is in F[X]. Suppose that Φm(X) is in F[X] for all m < n. Then f(X) = ∏_{d|n, d≠n} Φd(X) is in F[X], and X^n − 1 = f(X)Φn(X). So X^n − 1 is divisible by f(X) in F̄[X], and X^n − 1 and f(X) are in F[X]. Hence Φn(X) is in F[X].
Remark 7.2.37 The factorization of X^n − 1 in cyclotomic polynomials gives a way to compute the Φn(X) recursively.

Remark 7.2.38 The cyclotomic polynomial Φn(X) depends on the field F in Definition 7.2.33. But Φn(X) is universal in the sense that in characteristic zero it has integer coefficients and they do not depend on the field F. By reducing the coefficients of this polynomial modulo a prime p one gets the cyclotomic polynomial over any field of characteristic p. In characteristic zero Φn(X) is irreducible in Q[X] for all n. But Φn(X) is sometimes reducible in Fp[X].

Example 7.2.39 The polynomials Φ1(X) = X − 1 and Φ2(X) = X + 1 are irreducible in any characteristic, and X^2 − 1 = Φ1(X)Φ2(X). Now X^3 − 1 = Φ1(X)Φ3(X). Hence Φ3(X) = X^2 + X + 1, and this polynomial is irreducible in Fp[X] if and only if Fp* has no element of order 3, if and only if p ≡ 2 mod 3. Furthermore X^4 − 1 = Φ1(X)Φ2(X)Φ4(X). So Φ4(X) = X^2 + 1, and this polynomial is irreducible in Fp[X] if and only if p ≡ 3 mod 4.

Proposition 7.2.40 Let f(X) be a polynomial with coefficients in Fq. If β is a zero of f(X), then β^q is also a zero of f(X).

Proof. Let f(X) = f0 + f1 X + · · · + fm X^m ∈ Fq[X]. Then fi^q = fi for all i. If β is a zero of f(X), then f(β) = 0. So

0 = f(β)^q = (f0 + f1 β + · · · + fm β^m)^q = f0^q + f1^q β^q + · · · + fm^q β^{qm} = f0 + f1 β^q + · · · + fm β^{qm} = f(β^q).

Hence β^q is a zero of f(X).
Remark 7.2.41 In particular we have that mi(X) = mqi(X). Let g(X) be a generator polynomial of a cyclic code over Fq. If α^i is a zero of g(X), then α^{qi} is also a zero of g(X).

Definition 7.2.42 The cyclotomic coset Cq(I) of the subset I of Z_n with respect to q is the subset of Z_n defined by

Cq(I) = { iq^j : i ∈ I, j ∈ N0 }.

If I = {i}, then Cq(I) is denoted by Cq(i).

Proposition 7.2.43 The cyclotomic cosets Cq(i) give a partitioning of Z_n for a given q such that gcd(q, n) = 1.

Proof. Every i ∈ Z_n is in the cyclotomic coset Cq(i). Suppose that Cq(i) and Cq(j) have an element in common. Then iq^k = jq^l for some k, l ∈ N0. We may assume that k ≤ l; then i = jq^{l−k} and l − k ∈ N0. So iq^m = jq^{l−k+m} for all m ∈ N0. Hence Cq(i) is contained in Cq(j). Now n and q are relatively prime, so q is invertible in Z_n and q^e ≡ 1 (mod n) for some positive integer e. So jq^m = iq^{(e−1)(l−k)+m} for all m ∈ N0. Hence Cq(j) is contained in Cq(i). Therefore we have shown that two cyclotomic cosets are equal or disjoint.
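Definition 7.2.42 is straightforward to compute; the following Python sketch (function names are ours) reproduces the partition of Proposition 7.2.43:

```python
# Sketch: cyclotomic cosets C_q(i) in Z_n, assuming gcd(n, q) = 1.

def cyclotomic_coset(i, q, n):
    """The coset { i q^j mod n : j >= 0 } as a sorted list."""
    coset, j = {i % n}, (i * q) % n
    while j not in coset:
        coset.add(j)
        j = (j * q) % n
    return sorted(coset)

def all_cosets(q, n):
    """The partition of Z_n into cyclotomic cosets (Proposition 7.2.43)."""
    seen, cosets = set(), []
    for i in range(n):
        if i not in seen:
            c = cyclotomic_coset(i, q, n)
            cosets.append(c)
            seen.update(c)
    return cosets

print(cyclotomic_coset(1, 3, 11))   # [1, 3, 4, 5, 9]
print(all_cosets(2, 7))             # [[0], [1, 2, 4], [3, 5, 6]]
```

The first output is the coset C3(1) of Example 7.2.46 below; the second is the partition of Z7 underlying the binary cyclic codes of length 7.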
Proposition 7.2.44

mi(X) = ∏_{j∈Cq(i)} (X − α^j).
Proof. If j ∈ Cq(i), then mi(α^j) = 0 by Proposition 7.2.40. Hence the product ∏_{j∈Cq(i)} (X − α^j) divides mi(X). Now raising to the qth power gives a permutation of the zeros α^j with j ∈ Cq(i). The coefficients of the product of the linear factors X − α^j are symmetric functions in the α^j and therefore invariant under raising to the qth power. Hence these coefficients are elements of Fq and this product is an element of Fq[X] that has α^i as a zero. Therefore equality holds by the minimality of mi(X).

Proposition 7.2.45 Let n be a positive integer such that gcd(n, q) = 1. Then the number of choices of an element of order n in an extension of Fq is equal to ϕ(n). The possible choices of the minimal polynomial m1(X) correspond to monic irreducible factors of Φn(X) and the number of these choices is ϕ(n)/d, where d = |Cq(1)|.

Proof. The number of choices of an element of order n in an extension of Fq is ϕ(n) by Theorem 7.2.36. Let i ∈ Z_n and gcd(i, n) = 1. Consider the map Cq(1) → Cq(i) defined by j ↦ ij. Then this map is well defined and has an inverse, since i is invertible in Z_n. So |Cq(1)| = |Cq(i)| and the set of elements i in Z_n such that gcd(i, n) = 1 is partitioned in cyclotomic cosets of the same size d by Proposition 7.2.43, and every choice of such a coset corresponds to a choice of m1(X) and is an irreducible monic factor of Φn(X). Hence the number of possible minimal polynomials m1(X) is ϕ(n)/d.

Example 7.2.46 Let n = 11 and q = 3. Then ϕ(11) = 10. Consider the sequence starting with 1 and obtained by multiplying repeatedly with 3 modulo 11: 1, 3, 9, 27 ≡ 5, 15 ≡ 4, 12 ≡ 1. So C3(1) = {1, 3, 4, 5, 9} consists of 5 elements. Hence Φ11(X) has two irreducible factors in F3[X] given by:
Φ11(X) = (X^11 − 1)/(X − 1) = (−1 + X^2 − X^3 + X^4 + X^5)(−1 − X + X^2 − X^3 + X^5).

Example 7.2.47 Let n = 23 and q = 2. Then ϕ(23) = 22. Consider the sequence starting with 1 and obtained by multiplying repeatedly with 2 modulo 23: 1, 2, 4, 8, 16, 32 ≡ 9, 18, 36 ≡ 13, 26 ≡ 3, 6, 12, 24 ≡ 1. So C2(1) = {1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18} consists of 11 elements. Hence Φ23(X) = (X^23 − 1)/(X − 1) is the product of two irreducible factors in F2[X] given by:

(1 + X^2 + X^4 + X^5 + X^6 + X^10 + X^11)(1 + X + X^5 + X^6 + X^7 + X^9 + X^11).

Proposition 7.2.48 Let i and j be integers such that 0 < i, j < n. Suppose ij ≡ 1 mod n. Then mi(X) = gcd(m1(X^j), X^n − 1).
Proof. Let β be a zero of gcd(m1(X^j), X^n − 1). Then β is a zero of m1(X^j) and of X^n − 1. So β = α^l for some l and m1(α^{jl}) = 0. Hence jl ∈ Cq(1) by Proposition 7.2.44. So jl ≡ q^m mod n for some m. Hence l ≡ ijl ≡ iq^m mod n. Therefore l ∈ Cq(i) and β is a zero of mi(X). Similarly, if β is a zero of mi(X), then β is a zero of gcd(m1(X^j), X^n − 1). Both polynomials are monic and have the same zeros and all zeros are simple by Remark 7.2.23. Therefore the polynomials are equal.

Proposition 7.2.49 Let gcd(i, n) = d and j = i/d. Let α be an element of order n in F_{q^e}* and β = α^d. Let mi(X) be the minimal polynomial of α^i and nj(X) the minimal polynomial of β^j. Then β is an element of order n/d in F_{q^e}* and mi(X) = nj(X).

Proof. The map jq^m ↦ jdq^m gives a well defined one-to-one correspondence between the elements of D, the cyclotomic coset of j modulo n/d, and the elements of C, the cyclotomic coset of i modulo n. Hence

mi(X) = ∏_{k∈C} (X − α^k) = ∏_{l∈D} (X − β^l) = nj(X)
by Proposition 7.2.44.
Example 7.2.50 Let α be an element of order 8 in an extension of F3. Let m1(X) be the minimal polynomial of α in F3[X]. Then m1(X) divides X^8 − 1. But X^8 − 1 = (X^4 − 1)(X^4 + 1) and the zeros of X^4 − 1 have order at most 4. The factorization of X^4 − 1 is given by X^4 − 1 = (X − 1)(X + 1)(X^2 + 1) with m0(X) = X − 1 and m4(X) = X + 1, since α^4 = −1. The cyclotomic coset of 2 is C3(2) = {2, 6} and α^2 and α^6 are the elements of order 4 in F9*. So m2(X) = m6(X) = Φ4(X) = X^2 + 1. This confirms Proposition 7.2.49 with i = d = 2 and j = 1. Now C3(1) = {1, 3} and C3(5) = {5, 7}. So m1(X) = m3(X) and m5(X) = m7(X). Notice that −1 ≡ 7 mod 8 and m7(X) is the monic reciprocal of m1(X) by Proposition 7.2.29. The degree of m1(X) is 2. Suppose m1(X) = a0 + a1 X + X^2. Then m7(X) = a0^{−1} + a0^{−1} a1 X + X^2. The polynomials m1(X) and m7(X) divide X^4 + 1. Hence

Φ8(X) = X^4 + 1 = m1(X)m7(X) = (a0 + a1 X + X^2)(a0^{−1} + a0^{−1} a1 X + X^2).
Expanding the right hand side and comparing coefficients gives that a0 = −1 and a1 = 1 or a1 = −1. Hence there are two possible choices for m1(X). Choose m1(X) = X^2 + X − 1. So X^2 − X − 1 is the alternative choice for m1(X). Furthermore α^2 = −α + 1 and m5(X) = m7(X) = (X − α^5)(X − α^7) by Proposition 7.2.44. An application of Proposition 7.2.48 with i = j = 5 gives a third way to compute m5(X), since 5 · 5 ≡ 1 mod 8, and m1(X^5) = X^10 + X^5 − 1 and gcd(X^10 + X^5 − 1, X^8 − 1) = X^2 − X − 1.
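The gcd at the end of Example 7.2.50 can be checked with the Euclidean algorithm over F3; a Python sketch (helper names are ours):

```python
# Sketch for Example 7.2.50: compute m5(X) = gcd(m1(X^5), X^8 - 1) over F3,
# with m1(X) = X^2 + X - 1. Coefficient lists are indexed by the exponent.

def poly_mod(num, den, p):
    """Remainder of num modulo den over F_p, trailing zeros stripped."""
    num = num[:]
    inv = pow(den[-1], p - 2, p)
    for shift in range(len(num) - len(den), -1, -1):
        f = (num[shift + len(den) - 1] * inv) % p
        for i, d in enumerate(den):
            num[shift + i] = (num[shift + i] - f * d) % p
    while len(num) > 1 and num[-1] == 0:
        num.pop()
    return num

def poly_gcd(a, b, p):
    """Monic gcd of two polynomials over F_p via the Euclidean algorithm."""
    while b != [0]:
        a, b = b, poly_mod(a, b, p)
    inv = pow(a[-1], p - 2, p)
    return [(c * inv) % p for c in a]

m1_of_x5 = [2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]   # X^10 + X^5 - 1, with -1 = 2 in F3
x8_minus_1 = [2, 0, 0, 0, 0, 0, 0, 0, 1]        # X^8 - 1
print(poly_gcd(m1_of_x5, x8_minus_1, 3))  # [2, 2, 1], i.e. X^2 - X - 1
```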
7.2.4 Zeros of the generator polynomial
We have seen in Proposition 7.1.15 that the generator polynomial divides X^n − 1, so its zeros are nth roots of unity if n is not divisible by the characteristic of Fq. Instead of describing a cyclic code by its generator polynomial g(X), one can describe the code alternatively by the set of zeros of g(X) in an extension of Fq.

Definition 7.2.51 Fix an element α of order n in an extension F_{q^e} of Fq. A subset I of Z_n is called a defining set of a cyclic code C if

C = {c(x) ∈ Cq,n : c(α^i) = 0 for all i ∈ I}.

The root set, the set of zeros or the complete defining set Z(C) of C is defined as

Z(C) = {i ∈ Z_n : c(α^i) = 0 for all c(x) ∈ C}.

Proposition 7.2.52 The relation between the generator polynomial g(X) of a cyclic code C and the set of zeros Z(C) is given by

g(X) = ∏_{i∈Z(C)} (X − α^i).
The dimension of C is equal to n − |Z(C)|.

Proof. The generator polynomial g(X) divides X^n − 1 by Proposition 7.1.15. The polynomial X^n − 1 has no multiple zeros by Remark 7.2.23, since n and q are relatively prime. So every zero of g(X) is of the form α^i for some i ∈ Z_n and has multiplicity one. Let Z(g) = {i ∈ Z_n : g(α^i) = 0}. Then g(X) = ∏_{i∈Z(g)} (X − α^i). Let c(x) ∈ C. Then c(x) = a(x)g(x), so c(α^i) = 0 for all i ∈ Z(g). So Z(g) ⊆ Z(C). Conversely, g(x) ∈ C, so g(α^i) = 0 for all i ∈ Z(C). Hence Z(C) ⊆ Z(g). Therefore Z(g) = Z(C) and g(X) is a product of the linear factors as claimed. Furthermore the degree of g(X) is equal to |Z(C)|. Hence the dimension of C is equal to n − |Z(C)| by Proposition 7.1.21.

Proposition 7.2.53 The complete defining set of a cyclic code is the disjoint union of cyclotomic cosets.

Proof. Let g(X) be the generator polynomial of a cyclic code C. Then g(α^i) = 0 if and only if i ∈ Z(C) by Proposition 7.2.52. If α^i is a zero of g(X), then α^{iq} is a zero of g(X) by Remark 7.2.41. So Cq(i) is contained in Z(C) if i ∈ Z(C). Therefore Z(C) is the union of cyclotomic cosets. This union is a disjoint union by Proposition 7.2.43.

Example 7.2.54 Consider the binary cyclic code C of length 7 with defining set {1}. Then Z(C) = {1, 2, 4} and m1(X) = 1 + X + X^3 is the generator polynomial of C. Hence C is the cyclic Hamming code of Example 7.1.19. The cyclic code with defining set {3} has generator polynomial m3(X) = 1 + X^2 + X^3 and complete defining set {3, 5, 6}.
Remark 7.2.55 If a cyclic code is given by its zero set, then this definition depends on the choice of an element of order n. Consider Example 7.2.54. If we would have taken α^3 as element of order 7, then the generator polynomial of the binary cyclic code with defining set {1} would have been 1 + X^2 + X^3 instead of 1 + X + X^3.

Example 7.2.56 Consider the [6,3] cyclic code over F7 of Example 7.1.24 with the generator polynomial g(X) = 6 + X + 3X^2 + X^3. Then X^3 + 3X^2 + X + 6 = (X − 2)(X − 3)(X − 6). So 2, 3 and 6 are the zeros of the generator polynomial. Choose α = 3 as the primitive element in F7 of order 6 as in Example 7.2.27. Then α, α^2 and α^3 are the zeros of g(X).

Example 7.2.57 Let α be an element of F9 such that α^2 = −α + 1 as in Example 7.2.50. Then 1, α and α^3 are the zeros of the ternary cyclic code of length 8 with generator polynomial 1 + X + X^3 of Example 7.1.16, since X^3 + X + 1 = (X^2 + X − 1)(X − 1) = m1(X)m0(X).

Proposition 7.2.58 Let C be a cyclic code of length n. Then

Z(C⊥) = Z_n \ { −i : i ∈ Z(C) }.

Proof. The power α^i is a zero of g(X) if and only if i ∈ Z(C) by Proposition 7.2.52. And h(α^i) = 0 if and only if g(α^i) ≠ 0, since h(X) = (X^n − 1)/g(X) and all zeros of X^n − 1 are simple by Remark 7.2.23. Furthermore g⊥(X) is the monic reciprocal of h(X) by Proposition 7.1.37. Finally g⊥(α^{−i}) = 0 if and only if h(α^i) = 0 by Remark 7.1.30.

Example 7.2.59 Consider the [6,3] cyclic code C over F7 of Example 7.2.56. Then α, α^2 and α^3 are the zeros of g(X). Hence Z(C) = {1, 2, 3} and Z(C⊥) = Z6 \ {−1, −2, −3} = {0, 1, 2} by Proposition 7.2.58. Therefore g⊥(X) = (X − 1)(X − α)(X − α^2) = (X − 1)(X − 3)(X − 2) = X^3 + X^2 + 4X + 1. This is in agreement with Example 7.1.38.

Example 7.2.60 Let C be the ternary cyclic code of length 8 with the generator polynomial g(X) = 1 + X + X^3 of Example 7.1.16. Then g(X) = m0(X)m1(X) and Z(C) = {0, 1, 3} by Example 7.2.57. Hence Z(C⊥) = Z8 \ {0, −1, −3} = {1, 2, 3, 4, 6}.
Proposition 7.2.61 The number of cyclic codes of length n over Fq is equal to 2^N, where N is the number of cyclotomic cosets modulo n with respect to q.

Proof. A cyclic code C of length n over Fq is uniquely determined by its set of zeros Z(C) by Proposition 7.2.52. The set of zeros is a disjoint union of cyclotomic cosets modulo n with respect to q by Proposition 7.2.53. Hence a cyclic code is uniquely determined by a choice of a subset of all N cyclotomic cosets. There are 2^N such subsets.
Example 7.2.62 There are 3 cyclotomic cosets modulo 7 with respect to 2. Hence there are 8 binary cyclic codes of length 7 with generator polynomials 1, m0 , m1 , m3 , m0 m1 , m0 m3 , m1 m3 , m0 m1 m3 .
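Proposition 7.2.61 reduces the count to counting cosets; a Python sketch (the function name is ours) that confirms Example 7.2.62:

```python
# Sketch for Proposition 7.2.61: the number of cyclic codes of length n over
# F_q is 2^N with N the number of q-cyclotomic cosets modulo n (gcd(n, q) = 1).

def num_cyclotomic_cosets(q, n):
    seen, count = set(), 0
    for i in range(n):
        if i not in seen:
            count += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = (j * q) % n
    return count

n_cosets = num_cyclotomic_cosets(2, 7)
print(n_cosets, 2**n_cosets)  # 3 cosets modulo 7, hence 8 binary cyclic codes
```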
7.2.5 Exercises
7.2.1 Show that f(X) = 1 + X^2 + X^5 is irreducible in F2[X]. Give a principal representation of the product of β = 1 + x + x^4 and γ = 1 + x^3 + x^4 in the factor ring F2[X]/⟨f(X)⟩ by division by f(X) with rest. Give a table of the principal representation and the Zech logarithm of the powers x^i for i in {∗, 0, 1, . . . , 30}. Compute the addition of β and γ by means of the exponential representation.

7.2.2 What is the smallest field extension of Fq that has an element of order 37 in case q = 2, 3 and 5? Show that the degree of the extension is always a divisor of 36 for any prime power q not divisible by 37.

7.2.3 Determine Φ6(X) in characteristic zero. Let p be an odd prime. Show that Φ6(X) is irreducible in Fp[X] if and only if p ≡ 2 mod 3.

7.2.4 Let α be an element of order 8 in an extension of F5. Give all possible choices of the minimal polynomial m1(X). Compute the coefficients of mi(X) for all i between 0 and 7.

7.2.5 Let α be an element of order n in F_{q^e}*. Let m1(X) be the minimal polynomial of α in Fq[X]. Estimate the total number of arithmetical operations in Fq needed to compute the minimal polynomial mi(X) by means of Proposition 7.2.44 if gcd(i, n) = 1, as a function of n and e. Compare this complexity with the computation by means of Proposition 7.2.48.

7.2.6 Let C be a cyclic code of length 7 over Fq. Show that {1, 2, 4} is a complete defining set if q is even.

7.2.7 Compute the zeros of the code of Example 7.1.5.

7.2.8 Show that α = 5 is an element of order 6 in F7*. Give the coefficients of the generator polynomial of the cyclic [6,3] code over F7 with α, α^2 and α^3 as zeros.

7.2.9 Consider the binary cyclic code C of length 31 and generator polynomial 1 + X^2 + X^5 of Exercise 7.1.6. Let α be a zero of this polynomial. Then α has order 31 by Exercise 7.2.1.
1) Determine the coefficients of m1(X), m3(X) and m5(X) with respect to α.
2) Determine Z(C) and Z(C⊥).

7.2.10 Let C be a cyclic code over F5 with m1(X)m2(X) as generator polynomial.
Determine Z(C) and Z(C^⊥).

7.2.11 What is the number of ternary cyclic codes of length 8?

7.2.12 [CAS] Using the function MoebiusMu from GAP and Magma, write a program that computes the number of irreducible polynomials of a given degree as in Proposition 7.2.19. Check your result with the use of the function IrreduciblePolynomialsNr in GAP.
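Exercise 7.2.1 can be checked by machine. The following Python sketch (an illustration only; the names mulmod, power, log and zech are ours, not the book's) does arithmetic in F_2[X]/⟨f(X)⟩ with polynomials stored as bitmasks (bit k corresponds to x^k), builds the principal representations of the powers x^i, and adds β and γ via the exponential representation using Zech logarithms:

```python
# Arithmetic in F_32 = F_2[X]/<f(X)> with f = 1 + X^2 + X^5 (Exercise 7.2.1).
# Polynomials over F_2 are stored as bitmasks: bit k <-> x^k.
F = 0b100101          # f(X) = 1 + X^2 + X^5

def mulmod(a, b):
    """Multiply two polynomials over F_2 and reduce modulo f (division with rest)."""
    r = 0
    for k in range(b.bit_length()):
        if (b >> k) & 1:
            r ^= a << k
    for k in range(r.bit_length() - 1, 4, -1):   # clear all degrees >= 5
        if (r >> k) & 1:
            r ^= F << (k - 5)
    return r

# principal representations of the powers x^i for i = 0, ..., 30
power = [1]
for _ in range(30):
    power.append(mulmod(power[-1], 0b10))        # multiply by x
log = {p: i for i, p in enumerate(power)}

beta  = 0b10011       # 1 + x + x^4
gamma = 0b11001       # 1 + x^3 + x^4
print(bin(mulmod(beta, gamma)))                  # -> 0b111, i.e. 1 + x + x^2

# Zech logarithms: x^Z(i) = 1 + x^i, so x^a + x^b = x^(a + Z(b - a))
zech = {i: log[1 ^ power[i]] for i in range(31) if power[i] != 1}
a, b = log[beta], log[gamma]
assert (a + zech[(b - a) % 31]) % 31 == log[beta ^ gamma]
```

The same tables answer the last part of the exercise: β + γ = x + x^3 = x · x^5 = x^6, and the Zech identity reproduces this exponent.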
CHAPTER 7. CYCLIC CODES
7.2.13 [CAS] Take the field GF(2^10) and its primitive element a. Compute the Zech logarithm of a^100 with respect to a, using the command ZechLog both in GAP and in Magma.

7.2.14 [CAS] Using the function CyclotomicCosets in GAP/GUAVA, write a function that takes as input the code length n, the field size q and a list of integers L, and computes the dimension of a q-ary cyclic code whose defining set is {a^i | i in L} for some primitive n-th root of unity a (predefined in GAP is fine).
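For readers without GAP at hand, the computation behind Exercise 7.2.14 can be sketched in plain Python (function names are ours): the dimension is n minus the size of the complete defining set, which is the union of the q-cyclotomic cosets of the elements of L.

```python
# Dimension of a q-ary cyclic code of length n with defining set L
# (a plain-Python sketch of Exercise 7.2.14, not GAP/GUAVA code).

def cyclotomic_coset(i, n, q):
    """The q-cyclotomic coset of i modulo n: {i, qi, q^2 i, ...} mod n."""
    coset, j = set(), i % n
    while j not in coset:
        coset.add(j)
        j = (j * q) % n
    return coset

def cyclic_code_dimension(n, q, L):
    complete = set()
    for i in L:
        complete |= cyclotomic_coset(i, n, q)
    return n - len(complete)

print(cyclic_code_dimension(7, 2, [1]))    # -> 4, the [7,4] binary Hamming code
print(cyclic_code_dimension(23, 2, [1]))   # -> 12, the [23,12,7] binary Golay code
```

The two printed cases agree with codes that appear later in this chapter: the cyclic Hamming code of length 7 and the binary Golay code of Proposition 7.3.19.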
7.3
Bounds on the minimum distance
***

The BCH bound is a lower bound for the minimum distance of a cyclic code. Although this bound is tight in many cases, it is not always the true minimum distance. In this section several improved lower bounds are given, but none of them gives the true minimum distance in all cases. In fact, computing the true minimum distance of a cyclic code is a hard problem.
7.3.1
BCH bound
Definition 7.3.1 Let C be an F_q-linear code. Let C̃ be an F_{q^m}-linear code in F_{q^m}^n. If C ⊆ C̃ ∩ F_q^n, then C is called a subfield subcode of C̃, and C̃ is called a super code of C. If equality holds, then C is called the restriction (by scalars) of C̃.

Remark 7.3.2 Let I be a defining set for the cyclic code C. Then
c(α^i) = c_0 + c_1 α^i + · · · + c_j α^{ij} + · · · + c_{n−1} α^{i(n−1)} = 0
for all i ∈ I. Let l = |I|. Let H̃ be the l × n matrix with entries
( α^{ij} | i ∈ I, j = 0, 1, . . . , n − 1 ).
Let C̃ be the F_{q^m}-linear code with H̃ as parity check matrix. Then C is a subfield subcode of C̃, and it is in fact its restriction by scalars. Any lower bound on the minimum distance of C̃ holds a fortiori for C. This remark will be used in the following proposition on the BCH (Bose-Chaudhuri-Hocquenghem) bound on the minimum distance for cyclic codes.

Proposition 7.3.3 Let C be a cyclic code that has at least δ − 1 consecutive elements in Z(C). Then the minimum distance of C is at least δ.

Proof. The complete defining set Z(C) contains {i | b ≤ i ≤ b + δ − 2} for a certain b. We have seen in Remark 7.3.2 that ( α^{ij} | b ≤ i ≤ b + δ − 2, 0 ≤ j < n ) is a parity check matrix of a code C̃ over F_{q^m} that has C as a subfield subcode. The j-th column has entries α^{bj} α^{ij}, 0 ≤ i ≤ δ − 2. The code C̃ is generalized equivalent with the code with parity check matrix H̃′ = ( α^{ij} ) with 0 ≤ i ≤ δ − 2, 0 ≤ j < n, by the linear isometry that divides the j-th coordinate by α^{bj} for 0 ≤ j < n. Let x_j = α^{j−1} for 1 ≤ j ≤ n. Then H̃′ = ( x_j^i | 0 ≤ i ≤ δ − 2, 1 ≤ j ≤ n ) is a
generator matrix of an MDS code over F_{q^m} with parameters [n, δ − 1, n − δ + 2] by Proposition 3.2.10. So H̃′ is a parity check matrix of a code with parameters [n, n − δ + 1, δ] by Proposition 3.2.7. Hence the minimum distance of C̃, and therefore also of C, is at least δ.

Definition 7.3.4 A cyclic code with defining set {b, b + 1, . . . , b + δ − 2} is called a BCH code with designed minimum distance δ. The BCH code is called narrow sense if b = 1, and it is called primitive if n = q^m − 1.

Example 7.3.5 Consider the binary cyclic Hamming code C of length 7 of Example 7.2.28. The complete defining set of C is {1, 2, 4} and contains two consecutive elements. So 3 is a lower bound for the minimum distance. This is equal to the minimum distance. Let D be the binary cyclic code of length 7 with defining set {0, 3}. Then the complete defining set of D is {0, 3, 5, 6}. So 5, 6, 7 are three consecutive elements in Z(D), since 7 ≡ 0 mod 7. So 4 is a lower bound for the minimum distance of D. In fact equality holds, since D is the dual of C, that is the [7, 3, 4] binary simplex code.

Example 7.3.6 Consider the [6,3] cyclic code C over F_7 of Example 7.2.56. The zeros of the generator polynomial are α, α^2 and α^3. So Z(C) = {1, 2, 3} and d(C) ≥ 4. Now g(x) = 6 + x + 3x^2 + x^3 is a codeword of weight 4. Hence the minimum distance is 4.

Remark 7.3.7 If α and β are both elements of order n, then there exist r, s in Z_n such that β = α^r and α = β^s and rs ≡ 1 mod n, by Theorem 7.2.36. If C is the cyclic code with defining set I with respect to α, and D is the cyclic code with defining set I with respect to β, then C and D are equivalent under the permutation σ of Z_n such that σ(0) = 0 and σ(i) = ir for i = 1, . . . , n − 1. Hence a cyclic code that is given by its zero set is defined up to equivalence by the choice of an element of order n.

Example 7.3.8 Consider the binary cyclic code C_1 of length 17 with m_1(X) as generator polynomial.
Then Z_1 = {−, 1, 2, −, 4, −, −, −, 8, 9, −, −, −, 13, −, 15, 16} is the complete defining set of C_1, where the spacing − indicates a gap. The BCH bound gives 3 as a lower bound for the minimum distance of C_1. The code C_3 with generator polynomial m_3(X) has complete defining set Z_3 = {−, −, −, 3, −, 5, 6, 7, −, −, 10, 11, 12, −, 14, −, −}. Hence 4 is a lower bound for d(C_3). The cyclic codes of length 17 with generator polynomial m_i(X) are equivalent if i ≠ 0, by Remark 7.3.7, since the order of α^i is 17. Hence d(C_1) = d(C_3) ≥ 4. In Example 7.4.2 it will be shown that the minimum distance is in fact 5. The following definition does not depend on the choice of an element of order n.
Definition 7.3.9 A subset of Z_n of the form {b + ia | 0 ≤ i ≤ δ − 2} for some integers a, b and δ with gcd(a, n) = 1 and δ ≤ n + 1 is called consecutive of period a. Let I be a subset of Z_n. The number δ_BCH(I) is the largest integer δ ≤ n + 1 such that there is a consecutive subset of I consisting of δ − 1 elements. Let C be a cyclic code of length n. Then δ_BCH(Z(C)) is denoted by δ_BCH(C).

Theorem 7.3.10 The minimum distance of C is at least δ_BCH(C).

Proof. Let α be an element of order n in an extension of F_q. Suppose that the complete defining set of C with respect to α contains the set {b + ia | 0 ≤ i ≤ δ − 2} of δ − 1 elements for some integers a and b with gcd(a, n) = 1. Let β = α^a. Then β is an element of order n and there is an element c ∈ Z_n such that ac = 1, since gcd(a, n) = 1. Hence {bc + i | 0 ≤ i ≤ δ − 2} is a defining set of C with respect to β containing δ − 1 consecutive elements. Hence the minimum distance of C is at least δ by Proposition 7.3.3.

Remark 7.3.11 One easily sees a consecutive set of period one in Z(C) by writing the elements in increasing order and the gaps by a spacing, as done in Example 7.3.8. Suppose gcd(a, n) = 1. Then there exists a b such that ab ≡ 1 mod n. A consecutive set of period a is seen by considering b · Z(C) and its consecutive sets of period 1. In this way one has to inspect ϕ(n) complete defining sets for their consecutive sets of period 1. The complexity of this computation is at most ϕ(n) · |Z(C)| in the worst case. But quite often it is much less, in case b · Z(C) = Z(C).

Example 7.3.12 This is a continuation of Example 7.3.8. The complete defining set Z_3 of the code C_3 has {5, 6, 7} as largest consecutive subset of period 1 in Z_17. Now 3 · 6 ≡ 1 mod 17 and 6 · {5, 6, 7} = {13, 2, 8} is a consecutive subset of period 6 in Z_17 of three elements contained in the complete defining set Z_1 of the code C_1. Now b · Z_1 is equal to Z_1 or Z_3 for all 0 < b < 17. Hence δ_BCH(C_1) = δ_BCH(C_3) = 4.
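The inspection procedure of Remark 7.3.11 is easy to automate for small n. The following brute-force Python sketch (illustrative only; the function names are ours) scans b · I for runs of consecutive elements, over all b coprime to n:

```python
# Brute-force delta_BCH(I) for a subset I of Z_n, following Remark 7.3.11:
# a consecutive set of period a in I is a period-one run in b*I with ab = 1 mod n,
# so it suffices to scan b*I for ordinary runs, for all units b of Z_n.
from math import gcd

def longest_run(S, n):
    """Length of the longest run i, i+1, ..., i+r-1 (mod n) contained in S."""
    best = 0
    for i in range(n):
        r = 0
        while r < n and (i + r) % n in S:
            r += 1
        best = max(best, r)
    return best

def delta_BCH(I, n):
    I = {i % n for i in I}
    best = max(longest_run({(b * i) % n for i in I}, n)
               for b in range(1, n) if gcd(b, n) == 1)
    return best + 1

print(delta_BCH({1, 2, 4, 8, 9, 13, 15, 16}, 17))   # -> 4, as in Example 7.3.12
```

The printed case is the complete defining set Z_1 of Example 7.3.8; the values for the length-15 codes of the next example can be checked the same way.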
Example 7.3.13 Consider the binary BCH code C_b of length 15 with defining set {b, b + 1, b + 2, b + 3} for some b. So its designed distance is 5. Take α in F_16^* with α^4 = 1 + α as primitive element. Then m_0(X) = 1 + X, m_1(X) = 1 + X + X^4, m_3(X) = 1 + X + X^2 + X^3 + X^4 and m_5(X) = 1 + X + X^2. If b = 1, then the complete defining set is {1, 2, 3, 4, 6, 8, 9, 12}, so δ_BCH(C_1) = 5. The generator polynomial is g_1(X) = m_1(X)m_3(X) = 1 + X^4 + X^6 + X^7 + X^8, as is shown in Example 7.1.39, and it has weight 5. So the minimum distance of C_1 is 5. If b = 0, then δ_BCH(C_0) = 6. The generator polynomial is g_0(X) = m_0(X)m_1(X)m_3(X) = 1 + X + X^4 + X^5 + X^6 + X^9, which has weight 6. So the minimum distance of C_0 is 6. If b = 2 or b = 3, then δ_BCH(C_2) = 7. The generator polynomial g_2(X) is equal to 1 + X + X^2 + X^4 + X^5 + X^8 + X^10 and has weight 7. So the minimum distance of C_2 is 7. If b = 4 or b = 5, then δ_BCH(C_4) = 15. The generator polynomial g_4(X) is equal to 1 + X + X^2 + · · · + X^12 + X^13 + X^14 and has weight 15. So the minimum distance of C_4 is 15.
Example 7.3.14 Consider the primitive narrow sense BCH code of length 15 over F_16 with designed distance 5. So the defining set is {1, 2, 3, 4}. Then this is also the complete defining set. Take α with α^4 = 1 + α as primitive element. Then the generator polynomial is given by
(X − α)(X − α^2)(X − α^3)(X − α^4) = α^10 + α^3 X + α^6 X^2 + α^13 X^3 + X^4.
In all these cases of the previous two examples the minimum distance is equal to the BCH bound and equal to the weight of the generator polynomial. This is not always the case, as one sees in Exercise 7.3.8.

Example 7.3.15 Although BCH codes are a special case of codes defined through roots as in Section 7.2.4, GAP and Magma have special functions for constructing these. In GAP/GUAVA we proceed as follows.
> C:=BCHCode(17,3,GF(2));
a cyclic [17,9,3..5]3..4 BCH code, delta=3, b=1 over GF(2)
> DesignedDistance(C);
3
> MinimumDistance(C);
5
Syntax is BCHCode(n,delta,F), where n is the length, delta is δ in Definition 7.3.4, and F is the ground field. So here we constructed the narrow-sense BCH code. One can give the parameter b explicitly, by the command BCHCode(n,b,delta,F). The designed distance for a BCH code is printed in its description, but can also be called separately as above. Note that the code C coincides with the code C_R from Example 12.5.17. In Magma we proceed as follows.
> C:=BCHCode(GF(2),17,3);
// [17, 9, 5] "BCH code (d = 3, b = 1)"
// Linear Code over GF(2)
> a:=RootOfUnity(17,GF(2));
> C:=CyclicCode(17,[a^3],GF(2));
> BCHBound(C);
4
We can also provide b, giving it as the last parameter in the BCHCode command. Note that there is a possibility in Magma to compute the BCH bound as above.
7.3.2
Quadratic residue codes
***
7.3.3
Hamming, simplex and Golay codes as cyclic codes
Example 7.3.16 Consider an F_q-linear cyclic code of length n = (q^r − 1)/(q − 1) with defining set {1}. Let α be an element of order n in F_{q^r}^*. The minimum distance of the code is at least 2, by the BCH bound. If gcd(r, q − 1) = i > 1, then i divides n, since
n = (q^r − 1)/(q − 1) = q^{r−1} + · · · + q + 1 ≡ r mod (q − 1).
Let j = n/i. Let c_0 = −α^j. Then c_0 ∈ F_q^*, since j(q − 1) = n(q − 1)/i is a multiple of n. So c(x) = c_0 + x^j is a codeword of weight 2 and the minimum
distance is 2. Now consider the case with q = 3 and r = 2 in particular. Then α ∈ F_9^* is an element of order 4 and c(x) = 1 + x^2 is a codeword of the ternary cyclic code of length 4 with defining set {1}. So this code has parameters [4,2,2].

Proposition 7.3.17 Let n = (q^r − 1)/(q − 1). If r is relatively prime with q − 1, then the F_q-linear cyclic code of length n with defining set {1} is a generalized [n, n − r, 3] Hamming code.

Proof. Let α be an element of order n in F_{q^r}^*. The minimum distance of the code is at least 2 by the BCH bound. Suppose there is a codeword c(x) of weight 2 with nonzero coefficients c_i and c_j with 0 ≤ i < j < n. Then c(α) = 0. So c_i α^i + c_j α^j = 0. Hence α^{j−i} = −c_i/c_j. Therefore α^{(j−i)(q−1)} = 1, since −c_i/c_j ∈ F_q^*. Now n divides (j − i)(q − 1), but since gcd(n, q − 1) = gcd(r, q − 1) = 1 by assumption, it follows that n divides j − i, which is a contradiction. Hence the minimum distance is at least 3. Therefore the parameters are [n, n − r, 3] and the code is equivalent with the Hamming code H_r(q) by Proposition 2.5.19.

Corollary 7.3.18 The simplex code S_r(q) is equivalent with a cyclic code if r is relatively prime with q − 1.

Proof. The dual of a cyclic code is cyclic by Proposition 7.1.6, and a simplex code is by definition the dual of a Hamming code. So the statement is a consequence of Proposition 7.3.17.

Proposition 7.3.19 The binary cyclic code of length 23 with defining set {1} is equivalent to the binary [23,12,7] Golay code.

Proof. ***
Proposition 7.3.20 The ternary cyclic code of length 11 with defining set {1} is equivalent to the ternary [11,6,5] Golay code. Proof. ***
*** Show that there are two generator polynomials of a ternary cyclic code of length 11 with defining set {1}, depending on the choice of an element of order 11. Give the coefficients of these generator polynomials. ***
7.3.4
Exercises
7.3.1 Let C be the binary cyclic code of length 9 and defining set {0, 1}. Give the BCH bound of this code.

7.3.2 Show that a nonzero binary cyclic code of length 11 has minimum distance 1, 2 or 11.

7.3.3 Choose the primitive element as in Exercise 7.2.9. Give the coefficients of the generator polynomial of a cyclic H_5(2) Hamming code and give a word of weight 3.
7.3.4 Choose the primitive element as in Exercise 7.2.9. Consider the binary cyclic code C of length 31 with generator polynomial m_0(X)m_1(X)m_3(X)m_5(X). Show that C has dimension 15 and δ_BCH(C) = 8. Give a word of weight 8.

7.3.5 Determine δ_BCH(C) for all the binary cyclic codes C of length 17.

7.3.6 Show the existence of a binary cyclic code of length 127, dimension 64 and minimum distance at least 21.

7.3.7 Let C be the ternary cyclic code of length 13 with complete defining set {1, 3, 4, 9, 10, 12}. Show that δ_BCH(C) = 5 and that it is the true minimum distance.

7.3.8 Consider the binary code C of length 21 and defining set {1}.
1) Show that there are exactly two binary irreducible polynomials of degree 6 that have as zeros elements of order 21.
2) Show that the BCH bound and the minimum distance are both equal to 3.
3) Conclude that the minimum distance of a cyclic code is not always equal to the minimal weight of the generator polynomials of all equivalent cyclic codes.
7.4
Improvements of the BCH bound
***.....***
7.4.1
HartmannTzeng bound
Proposition 7.4.1 Let C be a cyclic code of length n with defining set I. Let U_1 and U_2 be two consecutive sets in Z_n consisting of δ_1 − 1 and δ_2 − 1 elements, respectively. Suppose that U_1 + U_2 ⊆ I. Then the minimum distance of C is at least δ_1 + δ_2 − 2.

Proof. This is a special case of the forthcoming Theorem 7.4.19 and Proposition 7.4.20. ***direct proof***

Example 7.4.2 Consider the binary cyclic code C_3 of length 17 and defining set {3} of Example 7.3.8. Then Proposition 7.4.1 applies with U_1 = {5, 6, 7}, U_2 = {0, 5}, δ_1 = 4 and δ_2 = 3. Hence the minimum distance of C_3 is at least 5. The factorization of 1 + X^17 in F_2[X] is given by
(1 + X)(1 + X^3 + X^4 + X^5 + X^8)(1 + X + X^2 + X^4 + X^6 + X^7 + X^8).
Let α be a zero of the second factor. Then α is an element of F_{2^8} of order 17. Hence m_1(X) is the second factor and m_3(X) is the third factor. Now 1 + x^3 + x^4 + x^5 + x^8 is a codeword of C_1 of weight 5. Furthermore C_1 and C_3 are equivalent. Hence d(C_3) = 5.

Definition 7.4.3 For a subset I of Z_n, let δ_HT(I) be the largest number δ such that there exist two nonempty consecutive sets U_1 and U_2 in Z_n consisting of δ_1 − 1 and δ_2 − 1 elements, respectively, with U_1 + U_2 ⊆ I and δ = δ_1 + δ_2 − 2. Let C be a cyclic code of length n. Then δ_HT(Z(C)) is denoted by δ_HT(C).
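The factorization of 1 + X^17 in Example 7.4.2 above is easy to verify by machine. A small Python sketch (illustrative only), with polynomials over F_2 stored as bitmasks:

```python
# Check the factorization of 1 + X^17 over F_2 from Example 7.4.2.
# Polynomials over F_2 are stored as bitmasks: bit k <-> X^k.

def pmul(a, b):
    """Carry-less multiplication, i.e. multiplication in F_2[X]."""
    r, k = 0, 0
    while b >> k:
        if (b >> k) & 1:
            r ^= a << k
        k += 1
    return r

f1 = 0b11                                         # 1 + X
f2 = sum(1 << k for k in (0, 3, 4, 5, 8))         # 1 + X^3 + X^4 + X^5 + X^8
f3 = sum(1 << k for k in (0, 1, 2, 4, 6, 7, 8))   # 1 + X + X^2 + X^4 + X^6 + X^7 + X^8
assert pmul(pmul(f1, f2), f3) == (1 << 17) | 1
print("factorization of 1 + X^17 verified")
```

As a by-product, pmul(f2, f3) is the all-ones polynomial 1 + X + · · · + X^16, which is (X^17 + 1)/(X + 1) over F_2.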
Theorem 7.4.4 The Hartmann-Tzeng bound. Let I be the complete defining set of a cyclic code C. Then the minimum distance of C is at least δ_HT(I).

Proof. This is a consequence of Definition 7.4.3 and Proposition 7.4.1.
Proposition 7.4.5 Let I be a subset of Z_n. Then δ_HT(I) ≥ δ_BCH(I).

Proof. If we take U_1 = U, U_2 = {0}, δ_1 = δ and δ_2 = 2 in the HT bound, then we get the BCH bound.

Remark 7.4.6 In computing δ_HT(I) one considers all a · I with gcd(a, n) = 1, as in Remark 7.3.11. So we may assume that U_1 is a consecutive set of period one. Let S(U_1) = {i ∈ Z_n | i + U_1 ⊆ I} be the shift set of U_1. Then U_1 + S(U_1) ⊆ I. Furthermore if U_1 + U_2 ⊆ I, then U_2 ⊆ S(U_1). Take a consecutive subset U_2 of S(U_1). This gives all desired pairs (U_1, U_2) of consecutive subsets in order to compute δ_HT(I).

Example 7.4.7 Consider Example 7.4.2. Then U_1 is a consecutive subset of period one of Z_3 and U_2 = S(U_1) is a consecutive subset of period five. And U_1′ = {1, 2} is a consecutive subset of period one of Z_1 and U_2′ = S(U_1′) = {0, 7, 14} is a consecutive subset of period seven. The choices of (U_1, U_2) and (U_1′, U_2′) are both optimal. Hence δ_HT(Z_1) = 5.

Example 7.4.8 Let C be the binary cyclic code of length 21 and defining set {1, 3, 7, 9}. Then I = {−, 1, 2, 3, 4, −, 6, 7, 8, 9, −, 11, 12, −, 14, 15, 16, −, 18, −, −} is the complete defining set of C. From this we conclude that δ_BCH(I) ≥ 5 and δ_HT(I) ≥ 6. By considering 5 · I one concludes that in fact equalities hold. But we show in Example 7.4.17 that the minimum distance of C is strictly larger than 6.
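The search outlined in Remark 7.4.6 can also be sketched in a few lines of Python (a brute force illustration for small n; the function name delta_HT is ours): transform I by every unit b, take for U_1 a period-one run in b · I, compute its shift set S(U_1), and look for a consecutive subset U_2 of S(U_1) of any period.

```python
# Brute-force delta_HT(I) of Definition 7.4.3, along the lines of Remark 7.4.6.
from math import gcd

def delta_HT(I, n):
    I = {i % n for i in I}
    units = [b for b in range(1, n) if gcd(b, n) == 1]
    best = 0
    for b in units:
        J = {(b * i) % n for i in I}
        for start in range(n):
            run = 0
            while run < n and (start + run) % n in J:
                run += 1
            for l1 in range(1, run + 1):
                U1 = [(start + k) % n for k in range(l1)]
                # shift set S(U1) = {s : s + U1 inside J}
                S = {s for s in range(n)
                     if all((s + u) % n in J for u in U1)}
                for a in units:          # U2: consecutive of period a inside S
                    for s0 in S:
                        l2 = 0
                        while l2 < n and (s0 + l2 * a) % n in S:
                            l2 += 1
                        best = max(best, l1 + l2)   # delta = |U1| + |U2|
    return best

print(delta_HT({3, 5, 6, 7, 10, 11, 12, 14}, 17))   # -> 5, as in Example 7.4.7
```

The printed case is Z_3 of Example 7.4.2; the set I of Example 7.4.8 gives 6 in the same way.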
7.4.2
Roos bound
The Roos bound is first formulated for arbitrary linear codes and afterwards applied to cyclic codes.

Definition 7.4.9 Let a, b ∈ F_q^n. Define the star product a ∗ b by the coordinatewise multiplication: a ∗ b = (a_1 b_1, . . . , a_n b_n). Let A and B be subsets of F_q^n. Define A ∗ B = { a ∗ b | a ∈ A, b ∈ B }.

Remark 7.4.10 If A and B are subsets of F_q^n, then
(A ∗ B)^⊥ = { c ∈ F_q^n | (a ∗ b) · c = 0 for all a ∈ A, b ∈ B }
is a linear subspace of F_q^n. But if A and B are linear subspaces of F_q^n, then A ∗ B is not necessarily a linear subspace. See Example 9.1.3. Consider the star product combined with the inner product. Then
(a ∗ b) · c = Σ_{i=1}^{n} a_i b_i c_i.
Hence a · (b ∗ c) = (a ∗ b) · c.

Proposition 7.4.11 Let C be an F_q-linear code of length n. Let (A, B) be a pair of F_{q^m}-linear codes of length n such that C ⊆ (A ∗ B)^⊥. Assume that A is not degenerate and k(A) + d(A) + d(B^⊥) ≥ n + 3. Then d(C) ≥ k(A) + d(B^⊥) − 1.

Proof. Let a = k(A) − 1 and b = d(B^⊥) − 1. Let c be a nonzero element of C with support I. If |I| ≤ b, then take i ∈ I. There exists an a ∈ A such that a_i ≠ 0, since A is not degenerate. So a ∗ c is not zero. Now (c ∗ a) · b = c · (a ∗ b) by Remark 7.4.10, and this is equal to zero for all b in B, since C ⊆ (A ∗ B)^⊥. Hence a ∗ c is a nonzero element of B^⊥ of weight at most b. This contradicts d(B^⊥) > b. So b < |I|. If |I| ≤ a + b, then we can choose index sets I_− and I_+ such that I_− ⊆ I ⊆ I_+ and I_− has b elements and I_+ has a + b elements. Recall from Definition 4.4.10 that A(I_+ \ I_−) is defined as the space {a ∈ A | a_i = 0 for all i ∈ I_+ \ I_−}. Now k(A) > a and I_+ \ I_− has a elements. Hence A(I_+ \ I_−) is not zero. Let a be a nonzero element of A(I_+ \ I_−). The vector c ∗ a is an element of B^⊥ and has support in I_−. Furthermore |I_−| = b < d(B^⊥), hence a ∗ c = 0, so a_i = 0 for all i ∈ I_+. Therefore a is a nonzero element of A of weight at most n − |I_+| = n − (a + b), which contradicts the assumption d(A) > n − (a + b). So |I| > a + b. Therefore d(C) ≥ a + b + 1 = k(A) + d(B^⊥) − 1.

In order to apply this proposition to cyclic codes some preparations are needed.

Definition 7.4.12 Let U be a subset of Z_n. Let α be an element of order n in F_{q^m}^*. Let C_U be the code over F_{q^m} of length n generated by the elements (1, α^i, . . . , α^{i(n−1)}) for i ∈ U. Then U is called a generating set of C_U. Let d_U be the minimum distance of the code C_U^⊥.

Remark 7.4.13 Notice that C_U and its dual are codes over F_{q^m}. Every subset U of Z_n is a complete defining set with respect to q^m, since n divides q^m − 1, so q^m · U = U. Furthermore C_U has dimension |U|. The code C_U is cyclic, since σ(1, α^i, . . . , α^{i(n−1)}) = α^{−i}(1, α^i, . . . , α^{i(n−1)}). U is the complete defining set of C_U^⊥. So d_U ≥ δ_BCH(U). Beware that d_U is by definition the minimum distance of C_U^⊥ over F_{q^m}, and not of the cyclic code over F_q with defining set U.

Remark 7.4.14 Let U and V be subsets of Z_n. Let w ∈ U + V. Then w = u + v with u ∈ U and v ∈ V. So
(1, α^w, . . . , α^{w(n−1)}) = (1, α^u, . . . , α^{u(n−1)}) ∗ (1, α^v, . . . , α^{v(n−1)}).
Hence C_U ∗ C_V ⊆ C_{U+V}. Therefore C ⊆ (C_U ∗ C_V)^⊥ if C is a cyclic code with U + V in its defining set.
Remark 7.4.15 Let U be a subset of Z_n. Let Ū be a consecutive set containing U. Then U is the complete defining set of C_U^⊥. Hence Z_n \ {−i | i ∈ U} is the complete defining set of C_U by Proposition 7.2.58. Then Z_n \ {−i | i ∈ Ū} is a consecutive set of size n − |Ū| that is contained in the defining set of C_U. Hence the minimum distance of C_U is at least n − |Ū| + 1 by the BCH bound.

Proposition 7.4.16 Let U be a nonempty subset of Z_n that is contained in the consecutive set Ū. Let V be a subset of Z_n such that |Ū| ≤ |U| + d_V − 2. Let C be a cyclic code of length n such that U + V is in the set of zeros of C. Then the minimum distance of C is at least |U| + d_V − 1.
Proof. Let A and B be the cyclic codes with generating sets U and V, respectively. Then A has dimension |U| by Remark 7.4.13 and its minimum distance is at least n − |Ū| + 1 by Remark 7.4.15. A generating matrix of A has no zero column, since otherwise A would be zero, since A is cyclic; but A is not zero, since U is not empty. So A is not degenerate. Moreover d(B^⊥) = d_V, by Definition 7.4.12. Hence k(A) + d(A) + d(B^⊥) ≥ |U| + (n − |Ū| + 1) + d_V, which is at least n + 3, since |Ū| ≤ |U| + d_V − 2. Finally C ⊆ (A ∗ B)^⊥ by Remark 7.4.14. Therefore all assumptions of Proposition 7.4.11 are fulfilled. Hence d(C) ≥ k(A) + d(B^⊥) − 1 = |U| + d_V − 1.

Example 7.4.17 Let C be the binary cyclic code of Example 7.4.8. Let U = 4 · {0, 1, 3, 5} and V = {2, 3, 4}. Then Ū = 4 · {0, 1, 2, 3, 4, 5} is a consecutive set and d_V = 4. By inspection of the table

  +    0    4   12   20
  2    2    6   14    1
  3    3    7   15    2
  4    4    8   16    3
we see that U + V is contained in the complete defining set of C. Furthermore |Ū| = 6 = |U| + d_V − 2. Hence d(C) ≥ 7 by Proposition 7.4.16. The alternative choice with U′ = 4 · {0, 1, 2, 3, 5, 6}, Ū′ = 4 · {0, 1, 2, 3, 4, 5, 6} and V′ = {3, 4} gives d(C) ≥ 8 by the Roos bound. This in fact is the true minimum distance.

Definition 7.4.18 Let I be a subset of Z_n. Denote by δ_R(I) the largest number δ such that there exist nonempty subsets U and V of Z_n and a consecutive set Ū with U ⊆ Ū, U + V ⊆ I and |Ū| ≤ |U| + d_V − 2 = δ − 1. Let C be a cyclic code of length n. Then δ_R(Z(C)) is denoted by δ_R(C).

Theorem 7.4.19 The Roos bound. The minimum distance of a cyclic code C is at least δ_R(C).

Proof. This is a consequence of Proposition 7.4.16 and Definition 7.4.18.
Proposition 7.4.20 Let I be a subset of Z_n. Then δ_R(I) ≥ δ_HT(I).

Proof. Let U_1 and U_2 be nonempty consecutive subsets of Z_n of sizes δ_1 − 1 and δ_2 − 1, respectively. Let U = Ū = U_1 and V = U_2. Now d_V ≥ δ_2 ≥ 2, since V is not empty. Hence |Ū| ≤ |U| + d_V − 2. Applying Proposition 7.4.16 gives δ_R(I) ≥ |U| + d_V − 1 ≥ δ_1 + δ_2 − 2. Hence δ_R(I) ≥ δ_HT(I).

Example 7.4.21 Examples 7.4.8 and 7.4.17 give a subset I of Z_21 such that δ_BCH(I) < δ_HT(I) < δ_R(I).
7.4.3
AB bound
Remark 7.4.22 In Section 3.1.2 we defined for every subset I of {1, . . . , n} the projection map π_I : F_q^n → F_q^t by π_I(x) = (x_{i_1}, . . . , x_{i_t}), where I = {i_1, . . . , i_t} and 1 ≤ i_1 < . . . < i_t ≤ n. We denoted the image of π_I restricted to A by A_I and the kernel by A(I), that is, A(I) = {a ∈ A | a_i = 0 for all i ∈ I}. Suppose that dim A = k and |I| = t. If t < d(A^⊥), then dim A(I) = k − t by Lemma 4.4.13, and therefore dim A_I = t. The following proposition is known for cyclic codes as the AB or the van Lint-Wilson bound.

Proposition 7.4.23 Let A, B and C be linear codes of length n over F_q such that (A ∗ B) ⊥ C and d(A^⊥) > a > 0 and d(B^⊥) > b > 0. Then d(C) ≥ a + b.

Proof. Let c be a nonzero codeword in C with support I, that is to say I = {i | c_i ≠ 0}. Let t = |I|. Without loss of generality we may assume that a ≤ b. We have that
dim(A_I) + dim(B_I) ≥ 2t if t ≤ a, ≥ a + t if a < t ≤ b, and ≥ a + b if b < t,
by Remark 7.4.22. But (A ∗ B) ⊥ C, so (c ∗ A)_I ⊥ B_I. Moreover dim((c ∗ A)_I) = dim(A_I), since c_i ≠ 0 for all i ∈ I. Therefore dim(A_I) + dim(B_I) ≤ |I| = t. This is only possible in case t ≥ a + b. Hence d(C) ≥ a + b.
Example 7.4.24 Consider the binary cyclic code of length 21 and defining set {0, 1, 3, 7}. Then the complete defining set of this code is given by I = {0, 1, 2, 3, 4, −, 6, 7, 8, −, −, 11, 12, −, 14, −, 16, −, −, −, −}. We leave it as an exercise to show that δ_BCH(I) = δ_HT(I) = δ_R(I) = 6. Application of the AB bound to U = {1, 2, 3, 6} and V = {0, 1, 5} gives that the minimum distance is at least 7. The minimum distance is at least 8, since it is an even weight code.

Remark 7.4.25 Let C be an F_q-linear code of length n. Let (A, B) be a pair of F_{q^m}-linear codes of length n. Let a = k(A) − 1 and b = d(B^⊥) − 1. Then one can restate the conditions of Proposition 7.4.11 as follows. If
(1) (A ∗ B) ⊥ C, (2) k(A) > a, (3) d(B^⊥) > b, (4) d(A) + a + b > n and (5) d(A^⊥) > 1,
then d(C) ≥ a + b + 1. The original proof given by van Lint and Wilson of the Roos bound is as follows. Let A be a generator matrix of A. Let A_I be the submatrix of A consisting of the columns indexed by I. Then rank(A_I) = dim(A_I). Condition (5) implies that A has no zero column, so rank(A_I) ≥ 1 for all I with at least one element. Let I be an index set such that |I| ≤ a + b; then any two words of A differ in
at least one place of I, since d(A) > n − (a + b) ≥ n − |I|, by Condition (4). So A and A_I have the same number of codewords, so rank(A_I) ≥ k(A) ≥ a + 1. Hence for any I such that b < |I| ≤ a + b we have that rank(A_I) ≥ |I| − b + 1. Let B be a generator matrix of B. Then Condition (3) implies
rank(B_I) = |I| if |I| ≤ b, and rank(B_I) ≥ b if |I| > b,
by Remark 7.4.22. Therefore
rank(A_I) + rank(B_I) > |I| for |I| ≤ a + b.
Now let c be a nonzero element of C with support I; then rank(A_I) + rank(B_I) ≤ |I|, as we have seen in the proof of Proposition 7.4.23. Hence |I| > a + b, so d(C) > a + b.

Example 7.4.26 In this example we show that the assumption that A is nondegenerate is necessary. Let A, B^⊥ and C be the binary codes with generator matrices (011), (111) and (100), respectively. Then A ∗ C ⊆ B^⊥ and k(A) = 1, d(A) = 2, n = 3 and d(B^⊥) = 3, so k(A) + d(A) + d(B^⊥) = 6 = n + 3, but d(C) = 1.
7.4.4
Shift bound
***....***

Definition 7.4.27 Let I be a subset of Z_n. A subset A of Z_n is called independent with respect to I if it can be obtained by the following rules:
(I.1) the empty set is independent with respect to I;
(I.2) if A is independent with respect to I and A is a subset of I and b ∈ Z_n is not an element of I, then A ∪ {b} is independent with respect to I;
(I.3) if A is independent with respect to I and c ∈ Z_n, then c + A is independent with respect to I, where c + A = {c + a | a ∈ A}.

Remark 7.4.28 The name "shifting" refers to condition (I.3). A set A is independent with respect to I if and only if there exists a sequence of sets A_1, . . . , A_w and elements a_1, . . . , a_{w−1} and b_0, b_1, . . . , b_{w−1} in Z_n such that A_1 = {b_0} and A = A_w, and furthermore A_{i+1} = (a_i + A_i) ∪ {b_i}, where a_i + A_i is a subset of I and b_i is not an element of I. Then
A_i = { b_{l−1} + Σ_{j=l}^{i−1} a_j | l = 1, . . . , i },
and all A_i are independent with respect to I. Let i_1, i_2, . . . , i_w and j_1, j_2, . . . , j_w be new sequences which are obtained from the sequences a_1, . . . , a_{w−1} and b_0, b_1, . . . , b_{w−1} by:
i_w = 0, i_{w−1} = a_1, . . . , i_{w−k} = a_1 + · · · + a_k, and j_k = b_{k−1} − i_{w−k+1}.
These data can be given in the following table

    +       j_1               j_2         j_3        · · ·  j_{w−1}         j_w
a_{w−1}  A_w      i_1 + j_1   i_1 + j_2   i_1 + j_3  · · ·  i_1 + j_{w−1}   b_{w−1}  | i_1
a_{w−2}  A_{w−1}  i_2 + j_1   i_2 + j_2   i_2 + j_3  · · ·  b_{w−2}                  | i_2
  .        .         .           .           .                                       |  .
a_2      A_3      a_1 + a_2 + b_0   a_2 + b_1   b_2                                  | i_{w−2}
a_1      A_2      a_1 + b_0         b_1                                              | i_{w−1}
         A_1      b_0                                                                | i_w

with the elements of A_t as rows in the middle part. The enumeration of the A_t is from the bottom to the top, and the b_i are on the diagonal. In the first row and the last column the j_l and the i_k are tabulated, respectively. The sum i_k + j_l is given in the middle part. By this transformation it is easy to see that a set A is independent with respect to I if and only if there exist sequences i_1, i_2, . . . , i_w and j_1, j_2, . . . , j_w such that A = {i_1 + j_l | 1 ≤ l ≤ w} and i_k + j_l ∈ I for all k + l ≤ w and i_k + j_l ∉ I for all k + l = w + 1. So the entries in the table above the diagonal are elements of I, and those on the diagonal are not in I. Notice that in this formulation we did not assume that the sets {i_k | 1 ≤ k ≤ w}, {j_l | 1 ≤ l ≤ w} and A have size w, since this is a consequence of the definition. If for instance i_k = i_{k′} for some 1 ≤ k < k′ ≤ w, then i_k + j_{w+1−k′} = i_{k′} + j_{w+1−k′} ∉ I, but i_k + j_{w+1−k′} ∈ I, which is a contradiction.

Definition 7.4.29 For a subset Z of Z_n, let µ(Z) be the maximal size of a set which is independent with respect to Z. Define the shift bound for a subset I of Z_n as follows:
δ_S(I) = min{ µ(Z) | I ⊆ Z ⊆ Z_n, Z ≠ Z_n and Z a complete defining set }.

Theorem 7.4.30 The minimum distance of C(I) is at least δ_S(I).

The proof of this theorem will be given at the end of this section.

Proposition 7.4.31 The following inequality holds: δ_S(I) ≥ δ_HT(I).

Proof. There exist δ, s and a such that gcd(a, n) = 1 and δ_HT(I) = δ + s and {i + j + ka | 1 ≤ j < δ, 0 ≤ k ≤ s} ⊆ I. Suppose Z is a complete defining set which contains I and is not equal to Z_n. Then there exists a δ′ ≥ δ such that i + j ∈ Z for all 1 ≤ j < δ′ and i + δ′ ∉ Z.
The set {i + j + ka | 1 ≤ j < δ, k ∈ Z_n} is equal to Z_n, since gcd(a, n) = 1. So there exist s′ ≥ s and j′ such that i + j + ka ∈ Z for all 1 ≤ j < δ and 0 ≤ k ≤ s′, and 1 ≤ j′ < δ and i + j′ + (s′ + 1)a ∉ Z. Let w = δ + s′. Let i_k = (k − 1)a for all 1 ≤ k ≤ s′ + 1, and i_k = δ′ − δ − s′ − 1 + k for all k such that s′ + 2 ≤ k ≤ δ + s′. Let j_l = i + l for all 1 ≤ l ≤ δ − 1, and let j_l = i + j′ + (l − δ + 1)a for all l such that δ ≤ l ≤ δ + s′. Then one easily checks that i_k + j_l ∈ Z for all k + l ≤ w, and i_k + j_{w−k+1} = i + j′ + (s′ + 1)a ∉ Z for all 1 ≤ k ≤ s′ + 1, and i_k + j_{w−k+1} = i + δ′ ∉ Z for all s′ + 2 ≤ k ≤ δ + s′. So we have a set which is independent with respect to Z and has size w = δ + s′ ≥ δ + s. Hence µ(Z) ≥ δ + s for all complete defining sets Z which contain I and are not equal to Z_n. Therefore δ_S(I) ≥ δ_HT(I).

Example 7.4.32 The binary Golay code of length 23 can be defined as the cyclic code with defining set {1}, see Proposition 7.3.19. In this example we show that the shift bound is strictly greater than the HT bound and is still not equal to the minimum distance. Let Z_i be the complete defining set of the cyclotomic coset of i. Then Z_0 = {0},
Z_1 = {−, 1, 2, 3, 4, −, 6, −, 8, 9, −, −, 12, 13, −, −, 16, −, 18, −, −, −, −}, and
Z_5 = {−, −, −, −, −, 5, −, 7, −, −, 10, 11, −, −, 14, 15, −, 17, −, 19, 20, 21, 22}.
Then δ_BCH(Z_1) = δ_HT(Z_1) = 5. Let (a_1, . . . , a_5) = (−1, −3, 7, 4, 13) and (b_0, . . . , b_5) = (5, 5, 5, 14, 5, 5). Then the sets A_{t+1} = (A_t + a_t) ∪ {b_t} are given in the rows of the middle part of the following table

a_t   A_{t+1}     5    6    9   11   −2    8   | i_{t+1}
 13   A_6         2    3    6    8   18    5   |  −3
  4   A_5        12   13   16   18    5        |   7
  7   A_4         8    9   12   14             |   3
 −3   A_3         1    2    5                  |  −4
 −1   A_2         4    5                       |  −1
      A_1         5                            |   0

with the a_t in the first column and the b_t on the diagonal. The corresponding sequence (i_1, . . . , i_6) = (−3, 7, 3, −4, −1, 0) is given in the last column of the table and (j_1, . . . , j_6) = (5, 6, 9, 11, −2, 8) in the top row. So Z_1 has an independent set of size 6. In fact this is the maximal size of an independent set of Z_1. Hence µ(Z_1) = 6. The defining sets Z_0, Z_1 and Z_5 and their unions are complete, and these are the only ones. Let Z_{0,1} = Z_0 ∪ Z_1; then Z_{0,1} has an independent set of size 7, since A_6 is independent with respect to Z_1 and also with respect to Z_{0,1}, and −2 + A_6 = {0, 1, 4, 6, 16, 3} is a subset of Z_{0,1} and 5 ∉ Z_{0,1}, so A_7 = {0, 1, 4, 6, 16, 3, 5} is independent with respect to Z_{0,1}. Furthermore Z_{1,5} = Z_1 ∪ Z_5 contains a sequence of 22 consecutive elements, so µ(Z_{1,5}) ≥ 23. Therefore δ_S(Z_1) = 6. But the minimum distance of the binary Golay code is 7, since otherwise there would be a word c ∈ C(Z_1) of weight 6, so c ∈ C(Z_{0,1}), but δ_S(Z_{0,1}) ≥ 7, which is a contradiction.
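As a machine check of Example 7.4.32 (a Python sketch, not part of the book's material), one can rebuild the chain A_1, . . . , A_6 from the given a_t and b_t and verify the side conditions of rules (I.2) and (I.3) at every step:

```python
# Verify the shifting chain of Example 7.4.32: starting from A_1 = {b_0},
# repeatedly form A_{t+1} = (a_t + A_t) | {b_t} (mod 23), checking at each step
# that a_t + A_t lies inside Z_1 and that b_t does not.
n = 23

# complete defining set Z_1: the 2-cyclotomic coset of 1 modulo 23
Z1, i = set(), 1
while i not in Z1:
    Z1.add(i)
    i = (2 * i) % n

a = [-1, -3, 7, 4, 13]       # a_1, ..., a_5
b = [5, 5, 5, 14, 5, 5]      # b_0, ..., b_5

A = {b[0] % n}
assert b[0] % n not in Z1
for t in range(1, 6):
    shifted = {(a[t - 1] + x) % n for x in A}
    assert shifted <= Z1            # the shifted set lies in Z_1, so (I.2) applies
    assert b[t] % n not in Z1       # the new element lies outside Z_1
    A = shifted | {b[t] % n}

print(sorted(A))                    # -> [2, 3, 5, 6, 8, 18], the set A_6 of size 6
```

The final set agrees with the row A_6 of the table, confirming µ(Z_1) ≥ 6.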
7.4. IMPROVEMENTS OF THE BCH BOUND
Example 7.4.33 Let n = 26, F = F27 and F0 = F3. Let I = {0, 13, 14, 16, 17, 22, 23, 25}. Let U = {0, 3, 9, 12} and V = {13, 14}. Then d_V = 3 and Ū = {0, 3, 6, 9, 12}, so |Ū| = 5 ≤ 4 + 3 − 2. Moreover I contains U + V. Hence δR(I) ≥ 4 + 3 − 1 = 6, but in fact δS(I) = 5.

Example 7.4.34 ***Example of δR(I) < δS(I).***

Example 7.4.35 It is necessary to take the minimum of all µ(Z) in the definition of the shift bound. The maximal size of an independent set with respect to a complete defining set I is not a lower bound for the minimum distance of the cyclic code with I as defining set, as the following example shows. Let F be a finite field of odd characteristic. Let α be a nonzero element of F of even order n. Let I = {2, 4, . . . , n − 2} and I′ = {0, 2, 4, . . . , n − 2}. Then I and I′ are complete and µ(I) = 3, since {2, 0, 1} is independent with respect to I, but µ(I′) = 2.

***Picture of interrelations of the several bounds.***

One way to get a bound on the weight of a codeword c = (c0, . . . , c_{n−1}) is to look for a maximal nonsingular square submatrix of the matrix of syndromes (S_{ij}). For cyclic codes we get in this way a matrix with entries S_{ij} = Σ_k c_k α^{k(i+j)}, which is constant along backdiagonals. Suppose gcd(n, q) = 1. Then there is a field extension F_{q^m} of Fq such that F*_{q^m} has an element α of order n. Let a_i = (1, α^i, . . . , α^{i(n−1)}). Then { a_i | i ∈ Zn } is a basis of F^n_{q^m}. Consider the following generalization of the definition of a syndrome in Definition 6.2.2.

Definition 7.4.36 The syndrome of a word y ∈ F^n_0 with respect to a_i and a_j is defined by S_{i,j}(y) = y · (a_i ∗ a_j). Let S(y) be the syndrome matrix with entries S_{i,j}(y). Notice that a_i ∗ a_j = a_{i+j} for all i, j ∈ Zn. Hence S_{i,j} = S_{i+j}.

Lemma 7.4.37 Let y ∈ F^n_0. Let I = { i + j | i, j ∈ Zn and y · (a_i ∗ a_j) = 0 }. If A is independent with respect to I, then wt(y) ≥ |A|.

Proof. Suppose A is independent with respect to I and has w elements. Then there exist sequences i_1, . . . , i_w and j_1, . . . , j_w such that i_k + j_l ∈ I for all k + l ≤ w and i_k + j_l ∉ I for all k + l = w + 1. Consider the (w × w) matrix M with entries M_{k,l} = S_{i_k, j_l}(y). By the assumptions we have that M_{k,l} = 0 for all k + l ≤ w and M_{k,l} ≠ 0 for all k + l = w + 1, that is to say M has zeros above the backdiagonal and nonzeros on the backdiagonal, so M has rank w. Moreover M is a submatrix of the matrix S(y), which can be written as a product: S(y) = H D(y) H^T, where H is the matrix with the a_i as row vectors and D(y) is the diagonal matrix with the entries of y on the diagonal. Now the rank of H is n, since a_0, . . . , a_{n−1} is a basis of F^n_{q^m}. Hence |A| = w = rank(M) ≤ rank(S(y)) ≤ rank(D(y)) = wt(y).
CHAPTER 7. CYCLIC CODES
Remark 7.4.38 Let Ci be a code with Zi as defining set for i = 1, 2. If Z1 ⊆ Z2, then C2 ⊆ C1.

Lemma 7.4.39 Let I be a complete defining set for the cyclic code C. If y ∈ C and y ∉ D for all cyclic codes D with complete defining sets Z which contain I and are not equal to I, then wt(y) ≥ µ(I).

Proof. Define Z = { i + j | i, j ∈ Zn, y · (a_i ∗ a_j) = 0 }. ***Then Z is a complete defining set.*** Clearly I ⊆ Z, since y ∈ C and I is a defining set of C. Let D be the code with defining set Z. Then y ∈ D. If I ≠ Z, then y ∉ D by the assumption, which is a contradiction. Hence I = Z, and wt(y) ≥ µ(I) by Lemma 7.4.37.

Proof. Let y be a nonzero codeword of C. Let Z be equal to { i + j | i, j ∈ Zn, y · (a_i ∗ a_j) = 0 }. Then Z ≠ Zn, since y is not zero and the a_i's generate F^n_{q^m}. The theorem now follows from Lemma 7.4.39 and the definition of the shift bound.

Remark 7.4.40 The computation of the shift bound is quite involved, and is only feasible with the use of a computer. It makes sense if one classifies codes with respect to the minimum distance, since in computing δS(I) one gets at the same time δS(J) for all J with I ⊆ J.
7.4.5 Exercises
7.4.1 Consider the binary cyclic code of length 15 and defining set {3, 5}. Compute the complete defining set I of this code. Show that δBCH(I) = 3 and that δHT(I) = 4 is the true minimum distance.

7.4.2 Consider the binary cyclic code of length 35 and defining set {1, 5, 7}. Compute the complete defining set I of this code. Show that δBCH(I) = δHT(I) = 6 and δR(I) ≥ 7.

7.4.3 Let m be odd and n = 2^m − 1. Melas's code is the binary cyclic code of length n and defining set {1, −1}. Show that this code is reversible, has dimension k = n − 2m and that the minimum distance is at least five.

7.4.4 Let −1 be a power of q modulo n. Show that every cyclic code over Fq of length n is reversible.

7.4.5 Let n = 2^{2m} + 1 with m > 1. Zetterberg's code is the binary cyclic code of length n and defining set {1}. Show that this code is reversible, has dimension k = n − 4m and that the minimum distance is at least five.

7.4.6 Consider the ternary cyclic code of length 11 and defining set {1}. Compute the complete defining set I of this code. Show that δBCH(I) = δHT(I) = δS(I) = 4. Let I′ = {0} ∪ I. Show that δBCH(I′) = δHT(I′) = 4 and δS(I′) ≥ 5.

7.4.7 Let q be a power of a prime and n a positive integer such that gcd(n, q) = 1. Write a computer program that computes the complete defining set Z modulo n with respect to q and the bounds δBCH(Z), δHT(Z), δR(Z) and δS(Z) for a given defining set I in Zn.
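A first step towards the program of Exercise 7.4.7 is computing the complete defining set and the BCH bound; the sketch below does this for the code of Exercise 7.4.1, with helper names of our choosing.

```python
def cyclotomic_coset(i, n, q):
    # {i q^k mod n : k >= 0}
    coset, j = set(), i % n
    while j not in coset:
        coset.add(j)
        j = (j * q) % n
    return coset

def complete_defining_set(I, n, q):
    # union of the cyclotomic cosets of the elements of I
    Z = set()
    for i in I:
        Z |= cyclotomic_coset(i, n, q)
    return Z

def bch_bound(Z, n):
    # delta_BCH(Z) = 1 + length of the longest run of consecutive
    # elements b, b+1, ... (taken modulo n) contained in Z
    best = 1
    for b in range(n):
        run = 0
        while (b + run) % n in Z and run < n:
            run += 1
        best = max(best, run + 1)
    return best

# Exercise 7.4.1: binary cyclic code of length 15 with defining set {3, 5}.
Z = complete_defining_set({3, 5}, 15, 2)
print(sorted(Z))         # [3, 5, 6, 9, 10, 12]
print(bch_bound(Z, 15))  # 3, in agreement with delta_BCH(I) = 3
```

The HT, Roos and shift bounds of the exercise require more bookkeeping but can be built on top of the same coset machinery.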
7.5 Locator polynomials and decoding cyclic codes
***
7.5.1 Mattson-Solomon polynomial
Definition 7.5.1 Let α ∈ F*_{q^m} be a primitive n-th root of unity. The Mattson-Solomon (MS) polynomial A(Z) of a(x) = a_0 + a_1 x + · · · + a_{n−1} x^{n−1} is defined by

A(Z) = Σ_{i=1}^{n} A_i Z^{n−i}, where A_i = a(α^i) ∈ F_{q^m}.

Here too we adopt the convention that the index i is computed modulo n. The MS polynomial A(Z) is the discrete Fourier transform of a(x). In order to compute the inverse discrete Fourier transform, that is the coefficients of a(x) in terms of the A_i, we need the following lemma on the sum of a geometric sequence.

Lemma 7.5.2 Let β ∈ F_{q^m} be a zero of X^n − 1. Then

Σ_{i=1}^{n} β^i = n if β = 1, and 0 if β ≠ 1.

Proof. If β = 1, then Σ_{i=1}^{n} β^i = n. If β ≠ 1, then using the formula Σ_{i=1}^{n} β^i = (β^{n+1} − β)/(β − 1) for the sum of a geometric series and β^{n+1} = β gives the desired result.

Proposition 7.5.3
1) The inverse transform is given by a_i = (1/n) A(α^i).
2) A(Z) is the MS polynomial of a word a(x) coming from F^n_q if and only if A_{jq} = A_j^q for all j = 1, . . . , n.
3) A(Z) is the MS polynomial of a codeword a(x) of the cyclic code C if and only if A_j = 0 for all j ∈ Z(C) and A_{jq} = A_j^q for all j = 1, . . . , n.

Proof. 1) Expanding A(α^i) and using the definitions gives

A(α^i) = Σ_{j=1}^{n} A_j α^{i(n−j)} = Σ_{j=1}^{n} a(α^j) α^{i(n−j)} = Σ_{j=1}^{n} Σ_{k=0}^{n−1} a_k α^{jk} α^{i(n−j)}.

Using α^n = 1, interchanging the order of summation and using Lemma 7.5.2 with β = α^{k−i} gives

Σ_{k=0}^{n−1} a_k Σ_{j=1}^{n} α^{(k−i)j} = n a_i.
2) If A(Z) is the MS polynomial of a(x), then using Proposition 7.2.40 gives A_j^q = a(α^j)^q = a(α^{qj}) = A_{qj}, since the coefficients of a(x) are in Fq. Conversely, suppose that A_{jq} = A_j^q for all j = 1, . . . , n. Then using (1) gives

a_i^q = ((1/n) A(α^i))^q = (1/n) Σ_{j=1}^{n} A_j^q α^{qi(n−j)} = (1/n) Σ_{j=1}^{n} A_{qj} α^{qi(n−j)}.

Using the fact that multiplication by q is a permutation of Zn gives that the above sum is equal to

(1/n) Σ_{j=1}^{n} A_j α^{i(n−j)} = a_i.

Hence a_i^q = a_i and a_i ∈ Fq for all i. Therefore a(x) is coming from F^n_q.
3) A_j = 0 if and only if a(α^j) = 0 by (1). Together with (2) and the definition of Z(C) this gives the desired result.

Another proof of the BCH bound can be obtained with the Mattson-Solomon polynomial.

Proposition 7.5.4 Let C be a narrow sense BCH code with designed minimum distance δ. If A(Z) is the MS polynomial of a(x), a nonzero codeword of C, then the degree of A(Z) is at most n − δ and the weight of a(x) is at least δ.

Proof. Let a(x) be a nonzero codeword of C. Let A(Z) be the MS polynomial of a(x). Then A_i = a(α^i) = 0 for all i = 1, . . . , δ − 1. So the degree of A(Z) is at most n − δ. We have that a_i = A(α^i)/n by (1) of Proposition 7.5.3. The number of zero coefficients of a(x) is the number of zeros of A(Z) in F_{q^m}, which is at most n − δ. Hence the weight of a(x) is at least δ.

Example 7.5.5 Let a(x) = 6 + x + 3x^2 + x^3 be a codeword of the cyclic code of length 6 over F7 of Example 7.1.24. Choose α = 3 as primitive element. Then A(Z) = 4 + Z + 3Z^2 is the MS polynomial of a(x).
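The transform of Example 7.5.5 and the inverse formula of Proposition 7.5.3(1) can be checked directly over F7, where all arithmetic is plain modular arithmetic:

```python
# The Mattson-Solomon transform and its inverse for Example 7.5.5:
# q = 7, n = 6, alpha = 3, a(x) = 6 + x + 3x^2 + x^3.
q, n, alpha = 7, 6, 3
a = [6, 1, 3, 1, 0, 0]                 # coefficients a_0, ..., a_5

def ev(coeffs, x):                     # evaluate a polynomial at x modulo q
    return sum(c * pow(x, j, q) for j, c in enumerate(coeffs)) % q

A = [ev(a, pow(alpha, i, q)) for i in range(1, n + 1)]   # A_i = a(alpha^i)
AZ = [0] * n                           # coefficients of A(Z) = sum_i A_i Z^{n-i}
for i in range(1, n + 1):
    AZ[n - i] = A[i - 1]
print(AZ)                              # [4, 1, 3, 0, 0, 0]: A(Z) = 4 + Z + 3Z^2

# Inverse transform (Proposition 7.5.3(1)): a_i = A(alpha^i)/n.
n_inv = pow(n, q - 2, q)
a_back = [n_inv * ev(AZ, pow(alpha, i, q)) % q for i in range(n)]
assert a_back == a
```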
7.5.2 Newton identities
Definition 7.5.6 Let a(x) be a word of weight w. Then there are indices 0 ≤ i_1 < · · · < i_w < n such that a(x) = a_{i_1} x^{i_1} + · · · + a_{i_w} x^{i_w} with a_{i_j} ≠ 0 for all j. Let x_j = α^{i_j} and y_j = a_{i_j}. Then the x_j are called the locators and the y_j the corresponding values. Furthermore

A_i = a(α^i) = Σ_{j=1}^{w} y_j x_j^i.

Consider the product

σ(Z) = Π_{j=1}^{w} (1 − x_j Z).
Then σ(Z) has as zeros the reciprocals of the locators, and is sometimes called the locator polynomial. Sometimes this name is reserved for the monic polynomial that has the locators as zeros.
Proposition 7.5.7 Let σ(Z) = Σ_{i=0}^{w} σ_i Z^i be the locator polynomial of the locators x_1, . . . , x_w. Then σ_i is the i-th elementary symmetric function in these locators:

σ_t = (−1)^t Σ_{1 ≤ j_1 < j_2 < · · · < j_t ≤ w} x_{j_1} x_{j_2} · · · x_{j_t}.

Proof. This is proved by induction on w and is left to the reader as an exercise.

The following property of the MS polynomial is called the generalized Newton identity and gives the reason for these definitions.

Proposition 7.5.8 For all i it holds that

A_{i+w} + σ_1 A_{i+w−1} + · · · + σ_w A_i = 0.

Proof. Substitute Z = 1/x_j in the equation

1 + σ_1 Z + · · · + σ_w Z^w = Π_{j=1}^{w} (1 − x_j Z)

and multiply by y_j x_j^{i+w}. This gives

y_j x_j^{i+w} + σ_1 y_j x_j^{i+w−1} + · · · + σ_w y_j x_j^i = 0.

Summing over j = 1, . . . , w yields the desired result of Proposition 7.5.8.
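The identity of Proposition 7.5.8 can be checked numerically for w = 2 (compare Exercise 7.5.3); the integer locators and values below are our toy data.

```python
# A numerical check of the generalized Newton identity for w = 2, with
# arbitrarily chosen integer locators and values (our toy data).
x = [3, 5]                 # locators x_1, x_2
y = [2, 7]                 # values y_1, y_2
sigma1 = -(x[0] + x[1])    # elementary symmetric functions, with signs
sigma2 = x[0] * x[1]

def A(i):                  # A_i = y_1 x_1^i + y_2 x_2^i
    return sum(yj * xj ** i for xj, yj in zip(x, y))

for i in range(6):
    assert A(i + 2) + sigma1 * A(i + 1) + sigma2 * A(i) == 0
print("generalized Newton identities hold")
```

The check works because x_j^2 + σ_1 x_j + σ_2 = (x_j − x_1)(x_j − x_2) = 0 for each locator, exactly as in the proof above.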
Example 7.5.9 Let C be the cyclic code of length 5 over F16 with defining set {1, 2}. Then this defining set is complete. The polynomial X^4 + X^3 + X^2 + X + 1 is irreducible over F2. Let β be a zero of this polynomial in F16. Then the order of β is 5. The generator polynomial of C is (X + β)(X + β^2) = X^2 + (β + β^2)X + β^3. So (β^3, β + β^2, 1, 0, 0) ∈ C and (β + β^2 + β^3, 1 + β, 0, 1, 0) = (β + β^2)(β^3, β + β^2, 1, 0, 0) + (0, β^3, β + β^2, 1, 0) is an element of C. These codewords together with their cyclic shifts and their nonzero scalar multiples give (5 + 5) · 15 = 150 words of weight 3. In fact these are the only codewords of weight 3, since C is a [5, 3, 3] MDS code and A_3 = (5 choose 3)(16 − 1) by Remark 3.2.15. Propositions 7.5.3 and 7.5.8 give another way to prove this. Consider the set of equations:

A_4 + σ_1 A_3 + σ_2 A_2 + σ_3 A_1 = 0
A_5 + σ_1 A_4 + σ_2 A_3 + σ_3 A_2 = 0
A_1 + σ_1 A_5 + σ_2 A_4 + σ_3 A_3 = 0
A_2 + σ_1 A_1 + σ_2 A_5 + σ_3 A_4 = 0
A_3 + σ_1 A_2 + σ_2 A_1 + σ_3 A_5 = 0
If A_1, A_2, A_3, A_4 and A_5 are the coefficients of the MS polynomial of a codeword, then A_1 = A_2 = 0. If A_3 = 0, then A_i = 0 for all i. So we may assume that A_3 ≠ 0. The above equations imply A_4 = σ_1 A_3, A_5 = (σ_1^2 + σ_2)A_3 and

σ_1^3 + σ_3 = 0
σ_1^2 σ_2 + σ_2^2 + σ_1 σ_3 = 0
σ_1^2 σ_3 + σ_2 σ_3 + 1 = 0.

Substitution of σ_3 = σ_1^3 in the remaining equations yields

σ_1^4 + σ_1^2 σ_2 + σ_2^2 = 0
σ_1^5 + σ_1^3 σ_2 + 1 = 0.

Multiplying the first equation by σ_1 and adding it to the second one gives 1 + σ_1 σ_2^2 = 0. Thus σ_1 = σ_2^{−2} and

σ_2^{10} + σ_2^5 + 1 = 0.

This last equation has 10 solutions in F16, and we are free to choose A_3 from F*16. This gives in total 150 solutions.
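The count of 10 solutions can be verified by brute force. The root count of σ_2^{10} + σ_2^5 + 1 depends only on the field, so in the sketch below we represent F16 with a different irreducible polynomial than the example's β.

```python
# Counting the solutions of s^10 + s^5 + 1 = 0 in F_16.  The count only
# depends on the field, so we may represent F_16 as F_2[x]/(x^4 + x + 1)
# (a different irreducible polynomial than the one defining beta above).
def gf16_mul(u, v):
    r = 0
    for _ in range(4):
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if u & 0x10:
            u ^= 0x13          # reduce modulo x^4 + x + 1
    return r

def gf16_pow(u, e):
    r = 1
    for _ in range(e):
        r = gf16_mul(r, u)
    return r

roots = [z for z in range(16)
         if (gf16_pow(z, 10) ^ gf16_pow(z, 5) ^ 1) == 0]
print(len(roots))   # 10; with 15 choices of A_3 this gives 150 codewords
```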
7.5.3 APGZ algorithm
Let C be a cyclic code of length n such that the minimum distance of C is at least δ by the BCH bound. In this section we give a decoding algorithm for such a code which has an efficient implementation and is used in practice. This algorithm corrects errors of weight at most (δ − 1)/2, whereas the true minimum distance can be larger than δ. The notion of a syndrome was already given in the context of arbitrary codes in Definition 6.2.2. Let α be a primitive n-th root of unity. Let C be a cyclic code of length n with 1, . . . , δ − 1 in its complete defining set. Let h_i = (1, α^i, . . . , α^{i(n−1)}). Consider C as the subfield subcode of the code with parity check matrix H̃ with rows h_i for i ∈ Z(C), as in Remark 7.3.2. Let c = (c_0, . . . , c_{n−1}) ∈ C be the transmitted word, so c(x) = c_0 + · · · + c_{n−1} x^{n−1}. Let r be the received word with w errors and w ≤ (δ − 1)/2. So r(x) = c(x) + e(x) and wt(e(x)) = w. The syndrome S_i of r(x) with respect to the row h_i is equal to

S_i = r(α^i) = e(α^i) for i ∈ Z(C),

since c(α^i) = 0 for all i ∈ Z(C). The syndrome of r is s = r H̃^T. Hence s_i = S_i for all i ∈ Z(C) and these are also called the known syndromes, since the receiver knows S_i for all i ∈ Z(C). The unknown syndromes are defined by S_i = e(α^i) for i ∉ Z(C). Let A(Z) be the MS polynomial of e(x). Then S_i = r(α^i) = e(α^i) = A_i for i ∈ Z(C). The receiver knows S_1, S_2, . . . , S_{2w}, since {1, 2, . . . , δ − 1} ⊆ Z(C) and 2w ≤ δ − 1.
Let σ(Z) be the error-locator polynomial, that is the locator polynomial

σ(Z) = Π_{j=1}^{w} (1 − x_j Z)

of the error positions {x_1, . . . , x_w} = { α^i | e_i ≠ 0 }. Let σ_i be the i-th coefficient of σ(Z). Form the following set of generalized Newton identities of Proposition 7.5.8 with S_i = A_i:

S_{w+1} + σ_1 S_w + · · · + σ_w S_1 = 0
S_{w+2} + σ_1 S_{w+1} + · · · + σ_w S_2 = 0
. . .
S_{2w} + σ_1 S_{2w−1} + · · · + σ_w S_w = 0.     (7.1)

The algorithm of Arimoto-Peterson-Gorenstein-Zierler (APGZ) solves this system of linear equations in the variables σ_j by Gaussian elimination. The fact that this system has a unique solution is guaranteed by the following.

Proposition 7.5.10 The matrix (S_{i+j−1} | 1 ≤ i, j ≤ v) is nonsingular if and only if v = w, the number of errors.

Proof. ***(S_{i+j−1}) = H D(e) H^T as in the proof of Lemma 7.4.37.***

After the system of linear equations is solved, we know the error-locator polynomial σ(Z) = 1 + σ_1 Z + σ_2 Z^2 + · · · + σ_w Z^w, which has as its zeros the reciprocals of the error locations. Finding the zeros of this polynomial is done by inspecting all values of F_{q^m}.

Example 7.5.11 Let C be the binary narrow sense BCH code of length 15 and designed minimum distance 5 with generator polynomial 1 + X^4 + X^6 + X^7 + X^8 as in Example 7.3.13. Let r = (0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0) be a received word with respect to the code C with 2 errors. Then r(x) = x + x^3 + x^4 + x^7 + x^{13} and S_1 = r(α) = α^12 and S_3 = r(α^3) = α^7. Now S_2 = S_1^2 = α^9 and S_4 = S_1^4 = α^3. The system of equations becomes:

α^7 + α^9 σ_1 + α^12 σ_2 = 0
α^3 + α^7 σ_1 + α^9 σ_2 = 0,

which has the unique solution σ_1 = α^12 and σ_2 = α^13. So the error-locator polynomial is 1 + α^12 Z + α^13 Z^2, which has α^{−3} and α^{−10} as zeros. Hence e = (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0) is the error and c = (0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0) is the codeword sent.
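The whole computation of Example 7.5.11 fits in a short sketch, assuming F16 is represented as F2[x]/(x^4 + x + 1) with primitive element α = x (the usual convention for Example 7.3.13); all helper names are ours.

```python
# A sketch of APGZ decoding on the data of Example 7.5.11, assuming F_16
# is represented as F_2[x]/(x^4 + x + 1) with primitive element alpha = x.
exp = [1] * 15                         # exp[i] = alpha^i as a 4-bit integer
for i in range(1, 15):
    t = exp[i - 1] << 1
    exp[i] = t ^ 0x13 if t & 0x10 else t
log = {exp[i]: i for i in range(15)}

def mul(u, v):                         # multiplication in F_16
    return 0 if 0 in (u, v) else exp[(log[u] + log[v]) % 15]

def inv(u):                            # inverse of a nonzero element
    return exp[(-log[u]) % 15]

r = [0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]   # received word

def S(j):                              # syndrome S_j = r(alpha^j)
    s = 0
    for i, ri in enumerate(r):
        if ri:
            s ^= exp[(i * j) % 15]
    return s

S1, S3 = S(1), S(3)                    # known syndromes
S2 = mul(S1, S1)                       # S_2 = S_1^2 in the binary case
S4 = mul(S2, S2)                       # S_4 = S_1^4
# Solve S2*s1 + S1*s2 = S3 and S3*s1 + S2*s2 = S4 by Cramer's rule
# (all signs vanish in characteristic 2).
det = mul(S2, S2) ^ mul(S1, S3)
sigma1 = mul(mul(S3, S2) ^ mul(S1, S4), inv(det))
sigma2 = mul(mul(S2, S4) ^ mul(S3, S3), inv(det))
# Error positions i: alpha^{-i} is a zero of 1 + sigma1*Z + sigma2*Z^2.
errors = [i for i in range(15)
          if (1 ^ mul(sigma1, exp[-i % 15])
                ^ mul(sigma2, exp[(-2 * i) % 15])) == 0]
print(errors)   # [3, 10], in agreement with the example
```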
7.5.4 Closed formulas
Consider the system of equations (7.1) as linear in the unknowns σ_1, . . . , σ_w with coefficients in S_1, . . . , S_{2w}. Then

σ_i = ∆_i / ∆_0,

where ∆_i is the determinant of a certain w × w matrix according to Cramer's rule. Then the ∆_i are polynomials in the S_i. Conclude that

det
| 1        Z          · · ·  Z^w |
| S_{w+1}  S_w        · · ·  S_1 |
| ·        ·          · · ·  ·   |
| S_{2w}   S_{2w−1}   · · ·  S_w |
= ∆_0 + ∆_1 Z + · · · + ∆_w Z^w

is a closed formula for the generic error-locator polynomial. Notice that the constant coefficient of the generic error-locator polynomial is not 1.

Example 7.5.12 Consider the narrow sense BCH code with designed minimum distance 5. Then {1, 2, 3, 4} is the defining set, so the syndromes S_1, S_2, S_3 and S_4 of a received word are known. We have to solve the system of equations

S_2 σ_1 + S_1 σ_2 = −S_3
S_3 σ_1 + S_2 σ_2 = −S_4.

Now Cramer's rule gives that

σ_1 = det | −S_3 S_1 ; −S_4 S_2 | / det | S_2 S_1 ; S_3 S_2 | = (S_1 S_4 − S_2 S_3)/(S_2^2 − S_1 S_3)

and similarly

σ_2 = (S_3^2 − S_2 S_4)/(S_2^2 − S_1 S_3).

The generic error-locator polynomial is

det
| 1    Z    Z^2 |
| S_3  S_2  S_1 |
| S_4  S_3  S_2 |
= (S_2^2 − S_1 S_3) + (S_1 S_4 − S_2 S_3)Z + (S_3^2 − S_2 S_4)Z^2.

In the binary case we have that S_2 = S_1^2 and S_4 = S_1^4. So

S_2^2 + S_1 S_3 = S_1^4 + S_1 S_3 = S_1(S_1^3 + S_3)
S_1 S_4 + S_2 S_3 = S_1^5 + S_1^2 S_3 = S_1^2(S_1^3 + S_3)
S_3^2 + S_2 S_4 = S_3^2 + S_1^6 = (S_1^3 + S_3)^2.

Hence, after division by S_1^3 + S_3, the generic error-locator polynomial becomes

S_1 + S_1^2 Z + (S_1^3 + S_3)Z^2.
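The binary closed formula can be checked against Example 7.5.11, where S_1 = α^12 and S_3 = α^7 gave the locator 1 + α^12 Z + α^13 Z^2; the sketch below assumes the representation F16 = F2[x]/(x^4 + x + 1).

```python
# A check of the binary closed formula of Example 7.5.12 on the data of
# Example 7.5.11 (S_1 = alpha^12, S_3 = alpha^7), assuming the
# representation F_16 = F_2[x]/(x^4 + x + 1).
exp = [1] * 15
for i in range(1, 15):
    t = exp[i - 1] << 1
    exp[i] = t ^ 0x13 if t & 0x10 else t
log = {exp[i]: i for i in range(15)}

def mul(u, v):
    return 0 if 0 in (u, v) else exp[(log[u] + log[v]) % 15]

S1, S3 = exp[12], exp[7]
c0 = S1                                  # constant term: S_1
c1 = mul(S1, S1)                         # coefficient of Z: S_1^2
c2 = mul(S1, mul(S1, S1)) ^ S3           # coefficient of Z^2: S_1^3 + S_3
# Dividing by the constant term recovers the error-locator polynomial
# 1 + alpha^12 Z + alpha^13 Z^2 found in Example 7.5.11.
norm = [0, (log[c1] - log[c0]) % 15, (log[c2] - log[c0]) % 15]
print(norm)   # [0, 12, 13]: exponents of the coefficients after division
```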
Example 7.5.13 Let C be the narrow sense BCH code over F16 of length 15 and designed minimum distance 5 as in Example 7.3.14. Let r = (α^5, α^8, α^11, α^10, α^10, α^7, α^12, α^11, 1, α, α^12, α^14, α^12, α^2, 0) be a received word with respect to the code C with 2 errors. Then S_1 = α^12, S_2 = α^7, S_3 = 0 and S_4 = α^2. The formulas S_2 = S_1^2 and S_4 = S_1^4 of Example 7.5.11 are no longer valid, since this code is defined over F16 instead of F2. By the formulas in Example 7.5.12 the error-locator polynomial is 1 + Z + α^10 Z^2, which has α^{−2} and α^{−8} as zeros. In this case the error positions are known, but the error values need some extra computation, since the values are not binary. This could be done by considering the error positions as erasures along the lines of Section 6.2.2. The next section gives an alternative with Forney's formula.
7.5.5 Key equation and Forney's formula
Consider the narrow sense BCH code C with designed minimum distance δ. So the defining set is {1, . . . , δ − 1}. Let c(x) ∈ C be the transmitted codeword. Let r(x) = c(x) + e(x) be the received word with error e(x). Suppose that the number of errors w = wt(e(x)) is at most (δ − 1)/2. The support of e(x) will be denoted by I, that is, e_i ≠ 0 if and only if i ∈ I. So the error-locator polynomial is

σ(Z) = Π_{i∈I} (1 − α^i Z)

with coefficients σ_0 = 1, σ_1, . . . , σ_w.

Definition 7.5.14 The syndromes are S_j = r(α^j) for 1 ≤ j ≤ δ − 1. The syndrome polynomial S(Z) is defined by

S(Z) = Σ_{j=1}^{δ−1} S_j Z^{j−1}.

Remark 7.5.15 The syndrome S_j is equal to e(α^j), since c(α^j) = 0, for all j = 1, . . . , δ − 1. Furthermore 2w ≤ δ − 1. The Newton identities

S_k + σ_1 S_{k−1} + · · · + σ_w S_{k−w} = 0 for k = w + 1, . . . , 2w

imply that the (k − 1)-st coefficient of σ(Z)S(Z) is zero for all k = w + 1, . . . , 2w, since

σ(Z)S(Z) = Σ_k ( Σ_{i+j=k} σ_i S_j ) Z^{k−1}.

Hence there exist polynomials q(Z) and r(Z) such that σ(Z)S(Z) = r(Z) + q(Z)Z^{2w}, with deg(r(Z)) < w. In the following we will identify the remainder r(Z).
Definition 7.5.16 The error-evaluator polynomial ω(Z) is defined by

ω(Z) = Σ_{i∈I} e_i α^i Π_{j∈I, j≠i} (1 − α^j Z).
Proposition 7.5.17 Let σ′(Z) be the formal derivative of σ(Z). Then the error values are given by Forney's formula:

e_l = −ω(α^{−l}) / σ′(α^{−l})

for all error positions α^l.

Proof. Differentiating

σ(Z) = Π_{i∈I} (1 − α^i Z)

gives

σ′(Z) = Σ_{i∈I} −α^i Π_{j∈I, j≠i} (1 − α^j Z).

Hence

σ′(α^{−l}) = −α^l Π_{j∈I, j≠l} (1 − α^{j−l}),

which is not zero. Substitution of α^{−l} in ω(Z) gives ω(α^{−l}) = −e_l σ′(α^{−l}).
Remark 7.5.18 The polynomial σ(Z) has simple zeros. Hence β is not a zero of σ′(Z) if β is a zero of σ(Z), by Lemma 7.2.8. So the denominator in Proposition 7.5.17 is not zero. This proposition implies that β is not a zero of ω(Z) if β is a zero of σ(Z). Hence the greatest common divisor of σ(Z) and ω(Z) is one.

Proposition 7.5.19 The error-locator polynomial σ(Z) and the error-evaluator polynomial ω(Z) satisfy the Key equation:

σ(Z)S(Z) ≡ ω(Z) (mod Z^{δ−1}).     (7.2)

Moreover if (σ_1(Z), ω_1(Z)) is another pair of polynomials that satisfy the Key equation and such that deg ω_1(Z) < deg σ_1(Z) ≤ (δ − 1)/2, then there exists a polynomial λ(Z) such that σ_1(Z) = λ(Z)σ(Z) and ω_1(Z) = λ(Z)ω(Z).

Proof. We have that S_j = r(α^j) = e(α^j) for all j = 1, 2, . . . , δ − 1. Using the definitions, interchanging summations and the sum formula for a geometric series we get

S(Z) = Σ_{j=1}^{δ−1} e(α^j) Z^{j−1} = Σ_{j=1}^{δ−1} Σ_{i∈I} e_i α^{ij} Z^{j−1} = Σ_{i∈I} e_i α^i Σ_{j=1}^{δ−1} (α^i Z)^{j−1} = Σ_{i∈I} e_i α^i (1 − (α^i Z)^{δ−1}) / (1 − α^i Z).
Hence

σ(Z)S(Z) = Π_{j∈I} (1 − α^j Z) S(Z) = Σ_{i∈I} e_i α^i (1 − (α^i Z)^{δ−1}) Π_{j∈I, j≠i} (1 − α^j Z).

Therefore

σ(Z)S(Z) ≡ Σ_{i∈I} e_i α^i Π_{j∈I, j≠i} (1 − α^j Z) ≡ ω(Z) (mod Z^{δ−1}).
Suppose that we have another pair (σ_1(Z), ω_1(Z)) such that σ_1(Z)S(Z) ≡ ω_1(Z) (mod Z^{δ−1}) and deg ω_1(Z) < deg σ_1(Z) ≤ (δ − 1)/2. Then σ(Z)ω_1(Z) ≡ σ_1(Z)ω(Z) (mod Z^{δ−1}) and the degrees of σ(Z)ω_1(Z) and σ_1(Z)ω(Z) are strictly smaller than δ − 1. Hence σ(Z)ω_1(Z) = σ_1(Z)ω(Z). The greatest common divisor of σ(Z) and ω(Z) is one by Remark 7.5.18. Therefore there exists a polynomial λ(Z) such that σ_1(Z) = λ(Z)σ(Z) and ω_1(Z) = λ(Z)ω(Z).

Remark 7.5.20 In Remark 7.5.15 it is shown that the Newton identities give the Key equation σ(Z)S(Z) ≡ r(Z) (mod Z^{δ−1}). In Proposition 7.5.19 a new proof of the Key equation is given where the remainder r(Z) is identified as the error-evaluator polynomial ω(Z). Conversely the Newton identities can be derived from this second proof.

Example 7.5.21 Let C be the narrow sense BCH code of length 15 over F16 of designed minimum distance 5 and let r be the received word as in Example 7.5.13. The error-locator polynomial is σ(Z) = 1 + Z + α^10 Z^2, which has α^{−2} and α^{−8} as zeros. The syndrome polynomial is S(Z) = α^12 + α^7 Z + α^2 Z^3. Then σ(Z)S(Z) = α^12 + α^2 Z + α^2 Z^4 + α^12 Z^5. Proposition 7.5.19 implies

ω(Z) ≡ σ(Z)S(Z) ≡ α^12 + α^2 Z (mod Z^4).
Hence ω(Z) = α^12 + α^2 Z, since deg(ω(Z)) < deg(σ(Z)) = 2. Furthermore σ′(Z) = 1. The error values are therefore e_2 = ω(α^{−2}) = α^11 and e_8 = ω(α^{−8}) = α^8 by Proposition 7.5.17.

Remark 7.5.22 Consider the BCH code C with {b, b + 1, . . . , b + δ − 2} as defining set. The syndromes are S_j = e(α^j) for b ≤ j ≤ b + δ − 2. Adapt the above definitions as follows. The syndrome polynomial S(Z) is defined by

S(Z) = Σ_{j=b}^{b+δ−2} S_j Z^{j−b}.

The error-evaluator polynomial ω(Z) is defined by

ω(Z) = Σ_{i∈I} e_i α^{ib} Π_{j∈I, j≠i} (1 − α^j Z).

Show that the error-locator polynomial σ(Z) and the error-evaluator polynomial ω(Z) satisfy the Key equation: σ(Z)S(Z) ≡ ω(Z) (mod Z^{δ−1}). Show that the error values are given by Forney's formula:

e_i = −ω(α^{−i}) / (α^{i(b−1)} σ′(α^{−i}))

for all error positions i.
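The Key equation computation and Forney's formula of Example 7.5.21 can be reproduced in a few lines, again assuming the representation F16 = F2[x]/(x^4 + x + 1) and helper names of our choosing.

```python
# A sketch of the omega(Z) computation and Forney's formula for Example
# 7.5.21, assuming the representation F_16 = F_2[x]/(x^4 + x + 1).
exp = [1] * 15
for i in range(1, 15):
    t = exp[i - 1] << 1
    exp[i] = t ^ 0x13 if t & 0x10 else t
log = {exp[i]: i for i in range(15)}

def mul(u, v):
    return 0 if 0 in (u, v) else exp[(log[u] + log[v]) % 15]

sigma = [1, 1, exp[10]]                  # sigma(Z) = 1 + Z + alpha^10 Z^2
S = [exp[12], exp[7], 0, exp[2]]         # S_1, S_2, S_3, S_4
omega = [0] * 4                          # omega(Z) = sigma(Z)S(Z) mod Z^4
for i, si in enumerate(sigma):
    for j, Sj in enumerate(S):
        if i + j < 4:
            omega[i + j] ^= mul(si, Sj)
print([log.get(c) for c in omega])       # [12, 2, None, None]: alpha^12 + alpha^2 Z

# sigma'(Z) = 1 in characteristic 2, so e_l = omega(alpha^{-l}).
for l in (2, 8):
    e = omega[0] ^ mul(omega[1], exp[-l % 15])
    print(l, log[e])                     # e_2 = alpha^11, e_8 = alpha^8
```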
7.5.6 Exercises
7.5.1 Consider A(Z) = 2 + 6Z + 2Z^2 + 5Z^3 in F7[Z]. Show that A(Z) is the MS polynomial of a codeword a(x) of a cyclic code of length 6 over F7 with primitive element α = 3. Compute the zeros and coefficients of a(x).

7.5.2 Give a proof of Proposition 7.5.7.

7.5.3 In case w = 2 we have that σ_1 = −(x_1 + x_2), σ_2 = x_1 x_2 and A_i = y_1 x_1^i + y_2 x_2^i. Substitute these formulas in the Newton identities in order to check their validity.

7.5.4 Let C be the binary narrow sense BCH code of length 15 and designed minimum distance 5 as in Example 7.5.11. Let r = (1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1) be a received word with respect to the code C with 2 errors. Find the codeword sent.

7.5.5 Consider the narrow sense BCH code with designed minimum distance 7. Then the syndromes S_1, S_2, . . . , S_6 of a received word are known. Compute the coefficients of the generic error-locator polynomial. Show that in the binary case the generic error-locator polynomial becomes

(S_3 + S_1^3) + (S_1 S_3 + S_1^4)Z + (S_5 + S_1^2 S_3)Z^2 + (S_3^2 + S_1 S_5 + S_1^3 S_3 + S_1^6)Z^3,

by using S_2 = S_1^2, S_4 = S_1^4 and S_6 = S_3^2 and after division by the common factor S_3^2 + S_1 S_5 + S_1^3 S_3 + S_1^6.

7.5.6 Let C be the narrow sense BCH code of length 15 over F16 of designed minimum distance 5 as in Examples 7.5.13 and 7.5.21. Let r = (α^8, α^7, α, α^11, α^3, α^5, α^10, α^11, α^10, α^7, α^4, α^10, 0, 1, α^5) be a received word with respect to the code C with 2 errors. Find the error positions. Determine the error values by Forney's formula.

7.5.7 Show the validity of the Key equation and Forney's formula as claimed in Remark 7.5.22.
7.6 Notes
6.4.2: iterated HT, iterated Roos bound. 6.4.3: symmetric Roos bound. In many cases of binary codes of length at most 62 the shift bound is equal to the minimum distance, see [?]. For about 95% of all ternary codes of length at most 40 the shift bound is equal to the minimum distance, see [?]. In a discussion with B.Z. Shen we came to the above generalization of independent sets and the shift bound; see also Shen and Tzeng [?] and Augot, Charpin and Sendrier [?] on generalized Newton identities. Lemma 7.4.37 is a generalization of a theorem of van Lint and Wilson [?, Theorem 11]. Generalization of the shift bound for linear codes. Linear complexity and the pseudo-rank bound. Shift bound for generalized Hamming weights. Conjecture of (non)existence of asymptotically good cyclic codes, Assmus and Turyn 1966. ***Blahut's theorem, Massey in Festschrift on DFT and PS polynomial.*** Fundamental iterative algorithm.
Chapter 8
Polynomial codes Ruud Pellikaan ****
8.1 RS codes and their generalizations
Reed-Solomon codes will be introduced as special cyclic codes. We will show that these codes are MDS and can be obtained by evaluating certain polynomials. This gives rise to a generalization of these codes. Fractional transformations are defined and related to the automorphism group of generalized Reed-Solomon codes.
8.1.1 Reed-Solomon codes
Consider the following definition of Reed-Solomon codes over the finite field Fq.

Definition 8.1.1 Let α be a primitive element of Fq. Let n = q − 1. Let b and k be nonnegative integers such that 0 ≤ b, k ≤ n. Define the generator polynomial g_{b,k}(X) by

g_{b,k}(X) = (X − α^b) · · · (X − α^{b+n−k−1}).

The Reed-Solomon (RS) code RS_k(n, b) is by definition the q-ary cyclic code with generator polynomial g_{b,k}(X). In the literature the code is also denoted by RS_b(n, k).

Proposition 8.1.2 The code RS_k(n, b) has length n = q − 1, is cyclic, linear and MDS of dimension k. The dual of RS_k(n, b) is equal to RS_{n−k}(n, n − b + 1).

Proof. The code RS_k(n, b) is of length q − 1, cyclic and linear by definition. The degree of the generator polynomial is n − k, so the dimension of the code is k by Proposition 7.1.21. The complete defining set is {b, b + 1, . . . , b + n − k − 1}
and has n − k consecutive elements. Hence the minimum distance d is at least n − k + 1 by the BCH bound of Proposition 7.3.3. The generator polynomial g_{b,k}(X) has degree n − k, so g_{b,k}(x) is a codeword of weight at most n − k + 1. Hence d is at most n − k + 1. Also the Singleton bound gives that d is at most n − k + 1. Hence d = n − k + 1 and the code is MDS. Another proof that the parameters are [n, k, n − k + 1] will be given in Proposition 8.1.14. The complete defining set of RS_k(n, b) is the subset U consisting of n − k consecutive elements: U = {b, b + 1, . . . , b + n − k − 1}. Hence Zn \ { −i | i ∈ U } is the complete defining set of the dual of RS_k(n, b) by Proposition 7.2.58. But

Zn \ { −i | i ∈ U } = Zn \ {n − (b + n − k − 1), . . . , n − (b + 1), n − b} = {n − b + 1, n − b + 2, . . . , n − b + k}

is the complete defining set of RS_{n−k}(n, n − b + 1).
Another description of RS codes will be given by evaluating polynomials.

Definition 8.1.3 Let f(X) ∈ Fq[X]. Let ev(f(X)) be the evaluation of f(X) defined by

ev(f(X)) = (f(1), f(α), . . . , f(α^{n−1})).

Proposition 8.1.4 We have that

RS_k(n, b) = { ev(X^{n−b+1} f(X)) | f(X) ∈ Fq[X], deg(f) < k }.

Proof. The dual of RS_k(n, b) is RS_{n−k}(n, n − b + 1) by Proposition 8.1.2, which has {n − b + 1, . . . , n − b + k} as complete defining set. So RS_{n−k}(n, n − b + 1) has H = (α^{ij} | n − b + 1 ≤ i ≤ n − b + k, 0 ≤ j ≤ n − 1) as parity check matrix, by Remark 7.3.2 and the proof of Proposition 7.3.3. That means that H is a generator matrix of RS_k(n, b). The rows of H are ev(X^i) for n − b + 1 ≤ i ≤ n − b + k. So they generate the space { ev(X^{n−b+1} f(X)) | deg(f) < k }.

Example 8.1.5 Consider RS_3(7, 1). It is a cyclic code over F8 with generator polynomial

g_{1,3}(X) = (X − α)(X − α^2)(X − α^3)(X − α^4),

where α is a primitive element of F8 satisfying α^3 = α + 1. Then

g_{1,3}(X) = α^3 + αX + X^2 + α^3 X^3 + X^4.

In the second description we have that

RS_3(7, 1) = { ev(f(X)) | f(X) ∈ Fq[X], deg(f) < 3 }.

The matrix in Exercise 7.1.5 is obtained by evaluating the monomials 1, X and X^2 at α^j for j = 0, 1, . . . , 6. It is a generator matrix of RS_3(7, 1).
8.1.2 Extended and generalized RS codes
Definition 8.1.6 The extended RS code ERS_k(n, b) is the extension of the code RS_k(n, b). The code ERS_k(n, 1) also has a description by means of evaluations.

Proposition 8.1.7 We have that

ERS_k(n, 1) = { (f(1), f(α), . . . , f(α^{n−1}), f(0)) | f(X) ∈ Fq[X], deg(f) < k }.

Proof. If C is a code of length n, then by Definition 3.1.6 the extended code C^e is given by

C^e = { (c, −Σ_{i=0}^{n−1} c_i) | c ∈ C }.

So we have to show that f(0) + f(1) + f(α) + · · · + f(α^{n−1}) = 0 for all polynomials f(X) ∈ Fq[X] of degree at most k − 1. By linearity it is enough to show that this is the case for all monomials of degree at most k − 1. Let f(X) be the monomial X^i with 0 ≤ i < n. Then

Σ_{j=0}^{n−1} f(α^j) = Σ_{j=0}^{n−1} α^{ij} = n if i = 0, and 0 if 0 < i < n,

by Lemma 7.5.2. Now n = q − 1 = −1 in Fq. So in both cases we have that this sum is equal to −f(0).

Definition 8.1.8 Let F be a field. Let f(X) = f_0 + f_1 X + · · · + f_k X^k be an element of F[X] and a ∈ F. Then the evaluation of f(X) at a is given by f(a) = f_0 + f_1 a + · · · + f_k a^k. Let L_k = { f(X) ∈ Fq[X] | deg f(X) ≤ k }. The evaluation map ev_{k,a} : L_k → F is given by ev_{k,a}(f(X)) = f(a). Furthermore the evaluation at infinity is defined by ev_{k,∞}(f(X)) = f_k.

Remark 8.1.9 The evaluation map is linear. Furthermore ev_{k,∞}(f(X)) = 0 if and only if f(X) has degree at most k − 1, for all f(X) ∈ L_k. The map ev_{k,a} does not depend on k if a ∈ F. The notation f(∞) will be used instead of ev_{k,∞}(f(X)), but notice that this depends on k and the implicit assumption that f(X) has degree at most k.

Definition 8.1.10 Let n be an arbitrary integer such that 1 ≤ n ≤ q. Let a be an n-tuple of mutually distinct elements of Fq ∪ {∞}. Let b be an n-tuple of nonzero elements of Fq. Let k be an arbitrary integer such that 0 ≤ k ≤ n. The generalized RS code GRS_k(a, b) is defined by

GRS_k(a, b) = { (f(a_1)b_1, f(a_2)b_2, . . . , f(a_n)b_n) | f(X) ∈ Fq[X], deg(f) < k }.
The following two examples show that the generalized RS codes are indeed generalizations of both RS codes as extended RS codes. Example 8.1.11 Let α be a primitive element of F∗q . Let n = q − 1. Define aj = αj−1 and bj = an−b+1 for j = 1, . . . , n. Then RSk (n, b) = GRSk (a, b). j Example 8.1.12 Let α be a primitive element of F∗q . Let n = q. Let a1 = 0 and b1 = 1. Define aj = αj−2 and bj = ajn−b+1 for j = 2, . . . , n. Then ERSk (n, 1) = GRSk (a, b). Example 8.1.13 The BCH code over Fq with defining set {b, b+1, . . . , b+δ−2} and length n, can be considered as a subfield subcode over Fq of a generalized RS code over Fqm where m is such that n divides q m − 1. Proposition 8.1.14 Let 0 ≤ k ≤ n ≤ q. Then GRSk (a, b) is an Fq linear MDS code with parameters [n, k, n − k + 1]. Proof. Notice that a linear code C stays linear under the linear map c 7→ (b1 c1 , . . . , bn cn ), and that the parameters remain the same if the bi are all nonzero. Hence we may assume without loss of generality that b is the all ones vector. Consider the evaluation map evk−1,a : Lk−1 → Fnq defined by evk−1,a (f (X)) = (f (a1 ), f (a2 ), . . . , f (an )). This map is linear and Lk−1 is a linear space of dimension k. Furthermore GRSk (a, b) is the image of Lk−1 under evk−1,a . Suppose that aj ∈ Fq for all j. Let f (X) ∈ Lk−1 and evk−1,a (f (X)) = 0. Then f (X) is of degree at most k − 1 and has n zeros. But k − 1 < n by assumption. So f (X) is the zero polynomial. Hence the restriction of the map evk−1,a to Lk−1 is injective, and GRSk (a, b) has the same dimension k as Lk−1 . Let c be a nonzero codeword of GRSk (a, b) of weight d. Then there exists a nonzero polynomial f (X) of degree at most k − 1 such that evk−1,a (f (X)) = c. The zeros of f (X) among the a1 , . . . , an correspond to the zero coordinates of c. So the number of zeros of f (X) among the a1 , . . . , an is equal to the number of zero coordinates of c, which is n − d. Hence n − d ≤ deg f (X) ≤ k − 1, that is d ≥ n − k + 1. 
The evaluation of the polynomial f(X) = ∏_{i=1}^{k−1} (X − a_i) gives an explicit codeword of weight n − k + 1. Also the Singleton bound gives that d ≤ n − k + 1. Therefore the minimum distance of the generalized RS code is equal to n − k + 1 and the code is MDS. In case a_j = ∞ for some j, then a_i ∈ F_q for all i ≠ j. Now f(a_j) = 0 implies that the degree of f(X) is at most k − 2. So the above proof applies to the remaining n − 1 elements and polynomials of degree at most k − 2.

Remark 8.1.15 The monomials 1, X, . . . , X^{k−1} form a basis of L_{k−1}. Suppose that a_j ∈ F_q for all j. Then evaluating these monomials gives a generator matrix with entries a_j^{i−1} b_j of the code GRS_k(a, b). If b is the all ones vector, then the matrix G_k(a) of Proposition 3.2.10 is a generator matrix of GRS_k(a, b). If a_j = ∞, then ev_{k−1,a_j}(b_j X^{i−1}) = 0 for all i < k and ev_{k−1,a_j}(b_j X^{k−1}) = b_j. Hence (0, . . . , 0, b_j)^T is the corresponding column vector of the generator matrix.
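The MDS property of Proposition 8.1.14 is easy to check by brute force for a small instance. The following sketch is our own illustration, not part of the text: it takes q = 7, n = 7, k = 3, a = (0, 1, . . . , 6) and b the all ones vector, lists every nonzero codeword of GRS_3(a, b), and verifies that the minimum weight equals n − k + 1 = 5 and that the evaluation map is injective.

```python
# Brute-force check that GRS_3(a, 1) over F_7 is MDS: d = n - k + 1 = 5.
from itertools import product

q, k = 7, 3
a = list(range(q))               # n = q mutually distinct evaluation points
n = len(a)

def ev(f):
    """Evaluate the polynomial with coefficient list f at all points of a."""
    return [sum(c * pow(x, i, q) for i, c in enumerate(f)) % q for x in a]

weights = [sum(1 for ci in ev(f) if ci != 0)
           for f in product(range(q), repeat=k) if any(f)]
assert min(weights) == n - k + 1     # the code is MDS
assert len(weights) + 1 == q ** k    # ev is injective, so the dimension is k
```

A polynomial of degree at most 2 has at most 2 zeros among the 7 points, so every nonzero codeword has weight at least 5, and X(X − 1) attains weight exactly 5, in agreement with the proof above.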
Remark 8.1.16 A generalized RS code is MDS by Proposition 8.1.14. So any k positions can be used to encode systematically. That means that there is a generator matrix G of the form (I_k P), where I_k is the k × k identity matrix and P a k × (n − k) matrix. The next proposition gives an explicit description of P.

Proposition 8.1.17 Let b be an n-tuple of nonzero elements of F_q. Let a be an n-tuple of mutually distinct elements of F_q ∪ {∞}. Define [a_i, a_j] = a_i − a_j, [∞, a_j] = 1 and [a_i, ∞] = −1 for a_i, a_j ∈ F_q. Then GRS_k(a, b) has a generator matrix of the form (I_k P), where

p_ij = ( b_{j+k} ∏_{t=1,t≠i}^{k} [a_{j+k}, a_t] ) / ( b_i ∏_{t=1,t≠i}^{k} [a_i, a_t] )
for 1 ≤ i ≤ k and 1 ≤ j ≤ n − k.

Proof. Assume first that b is the all ones vector. Let g_i be the ith row of this generator matrix. Then this corresponds to a polynomial g_i(X) of degree at most k − 1 such that g_i(a_i) = 1 and g_i(a_t) = 0 for all 1 ≤ t ≤ k and t ≠ i. By the Lagrange Interpolation Theorem ?? there is a unique polynomial with these properties and it is given by

g_i(X) = ∏_{t=1,t≠i}^{k} (X − a_t) / ∏_{t=1,t≠i}^{k} [a_i, a_t].
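Proposition 8.1.17 can also be tested numerically. The following sketch uses our own choice of parameters (q = 7, k = 3, a = (0, 1, . . . , 6) and b the all ones vector): the matrix P computed from the displayed formula must agree with the right half of the reduced row echelon form of the Vandermonde generator matrix of Remark 8.1.15.

```python
# Check the explicit formula for P against rref of the Vandermonde matrix mod 7.
q, k = 7, 3
a = [0, 1, 2, 3, 4, 5, 6]
n = len(a)
inv = [0] + [pow(x, q - 2, q) for x in range(1, q)]   # inverses in F_7

def p(i, j):
    """p_ij from Proposition 8.1.17 with [x, y] = x - y and all b_t = 1."""
    num = den = 1
    for t in range(k):
        if t != i:
            num = num * (a[j + k] - a[t]) % q
            den = den * (a[i] - a[t]) % q
    return num * inv[den] % q

P = [[p(i, j) for j in range(n - k)] for i in range(k)]

# rref of the k x n Vandermonde generator matrix; pivots are nonzero here
G = [[pow(x, i, q) for x in a] for i in range(k)]
for i in range(k):
    G[i] = [inv[G[i][i]] * e % q for e in G[i]]
    for r in range(k):
        if r != i:
            f = G[r][i]
            G[r] = [(e - f * g) % q for e, g in zip(G[r], G[i])]

assert all(G[i][k + j] == P[i][j] for i in range(k) for j in range(n - k))
```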
Notice that if a_i = ∞, then g_i(X) also satisfies the required conditions, since [a_i, a_t] = [∞, a_t] = 1 by definition and g_i(X) is a monic polynomial of degree k − 1, so g_i(∞) = 1. Hence p_ij = g_i(a_{j+k}) is of the described form also in case a_{j+k} = ∞. For arbitrary b we have to multiply the jth column of G by b_j. In order to get the identity matrix back, the ith row is divided by b_i.

Corollary 8.1.18 Let (I_k P) be the generator matrix of the code GRS_k(a, b). Then

p_iu p_jv [a_i, a_{k+u}][a_j, a_{k+v}] = p_ju p_iv [a_j, a_{k+u}][a_i, a_{k+v}]

for all 1 ≤ i, j ≤ k and 1 ≤ u, v ≤ n − k.

Proof. This is left as an exercise.
In Section 3.2.1 both generalized Reed-Solomon and Cauchy codes were introduced as examples of MDS codes. The following corollary shows that in fact these codes are the same.

Corollary 8.1.19 Let a be an n-tuple of mutually distinct elements of F_q. Let b be an n-tuple of nonzero elements of F_q. Let

c_i = b_i ∏_{t=1,t≠i}^{k} [a_i, a_t]   if 1 ≤ i ≤ k,
c_i = b_i ∏_{t=1}^{k} [a_i, a_t]   if k + 1 ≤ i ≤ n.

Then GRS_k(a, b) = C_k(a, c).
Proof. The generator matrix of GRS_k(a, b) which is systematic at the first k positions is of the form (I_k P) with P as given in Proposition 8.1.17. Then

p_ij = c_{j+k} c_i^{−1} / [a_{j+k}, a_i]

for all 1 ≤ i ≤ k and 1 ≤ j ≤ n − k. Hence (I_k P) = (I_k A(a, c)) is the generator matrix of the generalized Cauchy code C_k(a, c).

Remark 8.1.20 A generalized RS code is tight with respect to the Singleton bound k + d ≤ n + 1, that is, it is an MDS code. Hence its dual is also MDS. In fact the next proposition shows that the dual of a generalized RS code is again a GRS code.

Proposition 8.1.21 Let b^⊥ be the vector with entries

b_j^⊥ = 1 / ( b_j ∏_{i≠j} [a_j, a_i] )
for j = 1, . . . , n. Then GRS_{n−k}(a, b^⊥) is the dual code of GRS_k(a, b).

Proof. Let G = (I_k P) be the generator matrix of GRS_k(a, b) with P as obtained in Proposition 8.1.17. In the same way GRS_{n−k}(a, b^⊥) has a generator matrix H of the form (Q I_{n−k}) with

Q_ij = ( c_j ∏_{t=k+1,t≠i+k}^{n} [a_j, a_t] ) / ( c_{i+k} ∏_{t=k+1,t≠i+k}^{n} [a_{i+k}, a_t] )

for 1 ≤ i ≤ n − k and 1 ≤ j ≤ k. After substituting the values for b_j^⊥ and canceling the same terms in numerator and denominator we see that Q = −P^T. Hence H is a parity check matrix of GRS_k(a, b) by Proposition 2.3.30.

Example 8.1.22 This is a continuation of Example 8.1.11. Let b be the all ones vector. Then RS_k(n, 1) = GRS_k(a, b) and RS_k(n, 0) = GRS_k(a, a). Furthermore the dual of RS_k(n, 1) is RS_{n−k}(n, 0) by Proposition 8.1.2. So RS_k(n, 1)^⊥ = GRS_{n−k}(a, a). Alternatively, Proposition 8.1.21 gives that the dual of GRS_k(a, b) is equal to GRS_{n−k}(a, c) with c_j = 1/∏_{i≠j}(a_j − a_i). We leave it as an exercise to show that c_j = −a_j for all j.

Example 8.1.23 Consider the code RS_3(7, 1). Let α ∈ F_8 be an element with α^3 = 1 + α. Let a, b ∈ F_8^7 with a_i = α^{i−1} and b the all ones vector. Then RS_3(7, 1) = GRS_3(a, b) by Example 8.1.11. Let (I_3 P) be a generator matrix of this code. Let g_1(X) be the quadratic polynomial such that g_1(1) = 1, g_1(α) = 0 and g_1(α^2) = 0. Then

g_1(X) = (X + α)(X + α^2) / ((1 + α)(1 + α^2)).

Hence g_1(α^3) = α^3, g_1(α^4) = α, g_1(α^5) = 1 and g_1(α^6) = α^3 are the entries of the first row of P. Continuing in this way we get

P =
( α^3  α    1  α^3 )
( α^6  α^6  1  α^2 )
( α^5  α^4  1  α^4 )
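The entries of the first row of P in Example 8.1.23 can be recomputed mechanically. The following sketch uses a small hand-rolled F_8 = F_2[x]/(x^3 + x + 1), with elements stored as the integers 0..7 read as bit vectors and addition given by XOR; the helper names are ours, not the text's.

```python
# Recompute g1 at alpha^3, ..., alpha^6 in F_8 with alpha^3 = alpha + 1.
def gf_mul(x, y):
    """Multiply in F_8 = F_2[x]/(x^3 + x + 1), elements as 3-bit integers."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        x <<= 1
        if x & 0b1000:           # reduce by x^3 = x + 1
            x ^= 0b1011
        y >>= 1
    return r

def gf_inv(x):                   # x^6 is the inverse, since x^7 = 1 in F_8*
    r = 1
    for _ in range(6):
        r = gf_mul(r, x)
    return r

alpha, pw = 0b010, [1]
for _ in range(6):
    pw.append(gf_mul(pw[-1], alpha))   # pw[i] = alpha^i

def g1(x):
    """g1(X) = (X + alpha)(X + alpha^2) / ((1 + alpha)(1 + alpha^2))."""
    num = gf_mul(x ^ pw[1], x ^ pw[2])
    den = gf_mul(1 ^ pw[1], 1 ^ pw[2])
    return gf_mul(num, gf_inv(den))

row1 = [g1(pw[i]) for i in range(3, 7)]
assert row1 == [pw[3], pw[1], 1, pw[3]]   # alpha^3, alpha, 1, alpha^3
```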
The dual of RS_3(7, 1) is RS_4(7, 0), by Proposition 8.1.2, which is equal to GRS_4(a, a). This is in agreement with Proposition 8.1.21, since c_j = a_j for all j.

Remark 8.1.24 Let b be an n-tuple of nonzero elements of F_q. Let a be an n-tuple of mutually distinct elements in F_q ∪ {∞} and a_k = ∞. Then GRS_k(a, b) has a generator matrix of the form (I_k P), where

p_ij = c_{j+k} c_i^{−1} / (a_{j+k} − a_i)

for all 1 ≤ i ≤ k − 1 and 1 ≤ j ≤ n − k, and p_kj = c_{j+k} c_k^{−1} for 1 ≤ j ≤ n − k, with

c_i = b_i ∏_{t=1,t≠i}^{k−1} (a_i − a_t)   if 1 ≤ i ≤ k − 1,
c_i = b_k   if i = k,
c_i = b_i ∏_{t=1}^{k−1} (a_i − a_t)   if k + 1 ≤ i ≤ n,

by Corollary 8.1.19.
8.1.3 GRS codes under transformations
Proposition 8.1.25 Let n ≥ 2. Let a be in F_q^n consisting of mutually distinct entries. Let b be an n-tuple of nonzero elements of F_q. Let 1 ≤ i, j ≤ n and i ≠ j. Then there exists a b′ in F_q^n with nonzero entries and an a′ in F_q^n consisting of mutually distinct entries such that a′_i = 0, a′_j = 1 and GRS_k(a, b) = GRS_k(a′, b′).

Proof. We may assume without loss of generality that b = 1. Consider the linear polynomials l(X) = (X − a_i)/(a_j − a_i) and m(X) = (a_j − a_i)X + a_i. Then l(m(X)) = X and m(l(X)) = X. Now L_{k−1} is the vector space of all polynomials in the variable X of degree at most k − 1. The maps λ, μ : L_{k−1} → L_{k−1} defined by λ(f(X)) = f(l(X)) and μ(g(X)) = g(m(X)) are both linear and inverses of each other. Hence λ and μ are automorphisms of L_{k−1}. Let a′_t = l(a_t) for all t. Then the a′_t are mutually distinct, since the a_t are mutually distinct and l(X) defines a bijection of F_q. Furthermore a′_i = l(a_i) = 0 and a′_j = l(a_j) = 1. Now ev_{k−1,a}(f(l(X))) is equal to

(f(l(a_1)), . . . , f(l(a_n))) = (f(a′_1), . . . , f(a′_n)) = ev_{k−1,a′}(f(X)).

Finally GRS_k(a′, 1) = { ev_{k−1,a′}(f(X)) | f(X) ∈ L_{k−1} } and GRS_k(a, 1) is equal to

{ ev_{k−1,a}(g(X)) | g(X) ∈ L_{k−1} } = { ev_{k−1,a}(f(l(X))) | f(X) ∈ L_{k−1} }.

Therefore GRS_k(a, b) = GRS_k(a′, b′).
Remark 8.1.26 ***Introduction of GRS with a_i = ∞ as in Remark 8.1.15. Refer to forthcoming section of AG codes on the projective line.*** We leave the proof of the fact that we may assume furthermore a′_3 = ∞ as an exercise to the reader. For this one has to consider the fractional transformations

(aX + b)/(cX + d), with ad − bc ≠ 0.

The set of fractional transformations with entries in a field F forms a group with the composition of maps as group operation; determine the product and the inverse. Consider the map from GL(2, F) to the group of fractional transformations with entries in F defined by

( a b ; c d ) ↦ (aX + b)/(cX + d).

Then this map is a morphism of groups, and the kernel of this map consists of the diagonal matrices aI_2 with a ≠ 0.

Remark 8.1.27 ***Definition of evaluation of a rational function.*** Let φ(X) be a fractional transformation, a ∈ F_q ∪ {∞} and f(X) ∈ F[X]. Then ev_{k,φ(a)}(f(X)) = ev_{k,a}(f(φ(X))). This follows straightforwardly from the definitions in case a is in F and a is not a zero of the denominator of φ(X). ***projective transformations of the projective line***

Proposition 8.1.28 Let n ≥ 3. Let a be an n-tuple of mutually distinct entries in F_q ∪ {∞}. Let b be an n-tuple of nonzero elements of F_q. Let i, j and l be three mutually distinct integers between 1 and n. Then there exists a b′ in F_q^n with nonzero entries and an n-tuple a′ consisting of mutually distinct entries in F_q ∪ {∞} such that a′_i = 0, a′_j = 1 and a′_l = ∞ and GRS_k(a, b) = GRS_k(a′, b′).

Proof. This is shown similarly to the proof of Proposition 8.1.25, using fractional transformations instead, and is left as an exercise.

Now suppose that a generator matrix of the code GRS_k(a, b) is given. Is it possible to retrieve a and b? The pair (a, b) is not unique by the action of the fractional transformations. The following proposition gives an answer to this question.

Proposition 8.1.29 Let n ≥ 3. Let a and a′ be n-tuples with mutually distinct entries in F_q ∪ {∞}.
Let b and b′ be n-tuples of nonzero elements of F_q. Let i, j and l be three mutually distinct integers between 1 and n. If a′_i = a_i, a′_j = a_j, a′_l = a_l and GRS_k(a, b) = GRS_k(a′, b′), then a′ = a and b′ = λb for some nonzero λ in F_q.

Proof. The generalized RS code is MDS, so it is systematic at the first k positions and it has a generator matrix of the form (I_k P) such that the entries of P are nonzero. Let

c = (p_11, . . . , p_k1, 1, p_k1/p_k2, . . . , p_k1/p_k(n−k)).
Let G = c ∗ (I_k P). Then G is the generator matrix of a generalized equivalent code C. Dividing the ith row of G by p_i1 gives another generator matrix G′ of the same code C such that the (k + 1)th column of G′ is the all ones vector and the kth row is of the form (0, . . . , 0, 1, 1, . . . , 1). So we may suppose without loss of generality that the generator matrix of the generalized RS code is of the form (I_k P) with p_i1 = 1 for all i = 1, . . . , k and p_kj = 1 for all j = 1, . . . , n − k. After a permutation of the positions we may suppose without loss of generality that l = k, i = k + 1 and j = k + 2. After a fractional transformation we may assume that a′_{k+1} = a_{k+1} = 0, a′_{k+2} = a_{k+2} = 1 and a′_k = a_k = ∞ by Proposition 8.1.28. Remark 8.1.24 gives that there exists an n-tuple c with nonzero entries in F_q such that
p_ij = c_{j+k} c_i^{−1} / (a_{j+k} − a_i) for all 1 ≤ i ≤ k − 1 and 1 ≤ j ≤ n − k, and
p_kj = c_{j+k} c_k^{−1} for 1 ≤ j ≤ n − k.
Hence p_kj = c_{k+j} c_k^{−1} = 1. So c_{k+j} = c_k for all j = 1, . . . , n − k. Multiplying all entries of c with a nonzero constant gives the same code. Hence we may assume without loss of generality that c_{k+j} = c_k = 1 for all j = 1, . . . , n − k. Therefore c_j = 1 for all j ≥ k. Let i < k. Then p_i1 = c_{k+1}/(c_i(a_{k+1} − a_i)) = 1, c_{k+1} = 1 and a_{k+1} = 0. So p_i1 = −1/(a_i c_i) = 1. Hence a_i c_i = −1. Likewise p_i2 = c_{k+2}/(c_i(a_{k+2} − a_i)), c_{k+2} = 1 and a_{k+2} = 1. So

p_i2 = 1/((1 − a_i)c_i) = 1/(c_i + 1), since a_i c_i = −1.

Hence

c_i = (1 − p_i2)/p_i2 and a_i = p_i2/(p_i2 − 1) for all i < k.

Finally p_ij = c_{k+j}/(c_i(a_{k+j} − a_i)) and c_{k+j} = 1. So a_{k+j} − a_i = 1/(c_i p_ij). Hence a_{k+j} = a_i − a_i/p_ij, since a_i = −1/c_i. Combining this with the expression for a_i gives

a_{j+k} = ( p_i2/(p_i2 − 1) ) · ( (p_ij − 1)/p_ij ).

Therefore a and c are uniquely determined. So also b is uniquely determined, since

b_i = c_i / ∏_{t=1,t≠i}^{k−1} (a_i − a_t)   if 1 ≤ i ≤ k − 1,
b_i = c_k   if i = k,
b_i = c_i / ∏_{t=1}^{k−1} (a_i − a_t)   if k + 1 ≤ i ≤ n,

by Remark 8.1.24.

*** | PAut(GRS_k(a, b)) = ... and MAut(GRS_k(a, b)) = ... | What is the number of GRS codes? ***
Example 8.1.30 Let G be the generator matrix of a generalized Reed-Solomon code with entries in F_7 given by

G =
( 6 1 1 6 2 2 3 )
( 3 4 1 1 5 4 3 )
( 1 0 3 3 6 0 1 )

Then rref(G) = (I_3 A) with

A =
( 1 3 3 6 )
( 4 4 6 6 )
( 3 1 6 3 )

So we want to find a vector a consisting of mutually distinct entries in F_7 ∪ {∞} and b in F_7^7 with nonzero entries such that C = GRS_3(a, b), where C is the code with generator matrix G. Now C′ = (1, 4, 3, 1, 5, 5, 6) ∗ C has a generator matrix of the form (I_3 A′) with

A′ =
( 1 1 1 1 )
( 1 5 4 2 )
( 1 4 3 6 )

We may assume without loss of generality that a_4 = 0, a_5 = 1 and a_3 = ∞ by Proposition 8.1.28. ***...............***
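The two matrices in Example 8.1.30 can be checked by machine. This sketch (helper code is ours, not the text's) computes rref(G) over F_7 and then applies the column scaling by c = (1, 4, 3, 1, 5, 5, 6) followed by row normalization, which should reproduce A and A′.

```python
# Verify rref(G) = (I_3 A) and the scaled form (I_3 A') over F_7.
q = 7
inv = [0] + [pow(x, q - 2, q) for x in range(1, q)]
G = [[6, 1, 1, 6, 2, 2, 3],
     [3, 4, 1, 1, 5, 4, 3],
     [1, 0, 3, 3, 6, 0, 1]]

def rref(M):
    """Reduced row echelon form over F_q (Gauss-Jordan with row swaps)."""
    M = [row[:] for row in M]
    piv = 0
    for col in range(len(M[0])):
        for r in range(piv, len(M)):
            if M[r][col]:
                M[piv], M[r] = M[r], M[piv]
                M[piv] = [inv[M[piv][col]] * e % q for e in M[piv]]
                for s in range(len(M)):
                    if s != piv and M[s][col]:
                        f = M[s][col]
                        M[s] = [(e - f * g) % q for e, g in zip(M[s], M[piv])]
                piv += 1
                break
    return M

R = rref(G)
A = [row[3:] for row in R]
assert A == [[1, 3, 3, 6], [4, 4, 6, 6], [3, 1, 6, 3]]

# scale column j by c_j, then divide each row by its diagonal entry
c = [1, 4, 3, 1, 5, 5, 6]
S = [[c[j] * R[i][j] % q for j in range(7)] for i in range(3)]
S = [[inv[S[i][i]] * e % q for e in row] for i, row in enumerate(S)]
assert [row[3:] for row in S] == [[1, 1, 1, 1], [1, 5, 4, 2], [1, 4, 3, 6]]
```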
8.1.4 Exercises
8.1.1 Show that in RS_3(7, 1) the generating codeword g_{1,3}(x) is equal to α ev(1) + α^5 ev(X) + α^4 ev(X^2).

8.1.2 Compute the parity check polynomial of RS_3(7, 1) and the generator polynomial of RS_3(7, 1)^⊥ by means of Proposition 7.1.37, and verify that it is equal to g_{0,4}(X) according to Proposition 8.1.2.

8.1.3 Give the generator matrix of RS_4(7, 1) of the form (I_4 P), where P is a 4 × 3 matrix.

8.1.4 Show directly, that is without the use of Proposition 8.1.4, that the code { ev(X^{n−b+1} f(X)) | deg(f) < k } is cyclic.

8.1.5 Give another proof of the fact in Proposition 8.1.2 that the dual of RS_k(n, b) is equal to RS_{n−k}(n, n − b + 1), using the description with evaluations of Proposition 8.1.4 and the fact that the inner product of codewords of the two codes is zero.

8.1.6 Let n = q − 1. Let a_1, . . . , a_n be an enumeration of the elements of F_q^*. Show that ∏_{i≠j}(a_j − a_i) = −1/a_j for all j.

8.1.7 Consider α ∈ F_8 with α^3 = 1 + α. Let a = (a_1, . . . , a_7) with a_i = α^{i−1} for 1 ≤ i ≤ 7. Let b = (1, α^2, α^4, α^2, 1, 1, α^4). Find c such that the dual of GRS_k(a, b) is equal to GRS_{7−k}(a, c) for all k.

8.1.8 Determine all values of n, k and b such that RS_k(n, b) is self dual.
8.1.9 Give a proof of Corollary 8.1.18.

8.1.10 Let n ≤ q. Let a be an n-tuple of mutually distinct elements of F_q, and r an n-tuple of nonzero elements of F_q. Let k be an integer such that 0 ≤ k ≤ n. Show that the generalized Cauchy code C_k(a, r) is equal to r ∗ C_k(a).

8.1.11 Give a proof of the statements made in Remark 8.1.26.

8.1.12 Let u, v and w be three mutually distinct elements of a field F. Show that there is a unique fractional transformation φ such that φ(u) = 0, φ(v) = 1 and φ(w) = ∞.

8.1.13 Give a proof of Proposition 8.1.28.

8.1.14 Let α ∈ F_8 be a primitive element such that α^3 = α + 1. Let G be the generator matrix of a generalized Reed-Solomon code given by

G =
( α^6 α^6 α   1   α^4 1   α^4 )
( 0   α^3 α^3 α^4 α^6 α^6 α^4 )
( α^4 α^5 α^3 1   α^2 0   α^6 )

(1) Find a in F_8^7 consisting of mutually distinct entries and b in F_8^7 with nonzero entries such that G is a generator matrix of GRS_3(a, b).
(2) Consider the 3 × 7 generator matrix G′ of the code RS_3(7, 1) with entry α^{(i−1)(j−1)} in the ith row and the jth column. Give an invertible 3 × 3 matrix S and a permutation matrix P such that G′ = SGP.
(3) What is the number of pairs (S, P) of such matrices?
8.2 Subfield and trace codes
***
8.2.1 Restriction and extension by scalars
In this section we derive bounds on the parameters of subfield subcodes. We repeat Definitions 4.4.32 and 7.3.1. Definition 8.2.1 Let D be an Fq linear code in Fnq . Let C be an Fqm linear code of length n. If D = C ∩ Fnq , then D is called the subfield subcode or the restriction (by scalars) of C, and is denoted by CFq . If D ⊆ C, then C is called a super code of D. If C is generated as an Fqm linear space by D, then C is called the extension (by scalars) of D and is denoted by D ⊗ Fqm . Proposition 8.2.2 Let G be a generator matrix with entries in Fq . Let D and C be the Fq linear and the Fqm linear code, respectively with G as generator matrix. Then (D ⊗ Fqm ) = C and (CFq ) = D.
Proof. Let G be a generator matrix of the F_q-linear code D. Then G is also a generator matrix of D ⊗ F_{q^m} by Remark 4.4.33. Hence D ⊗ F_{q^m} = C. Now D is contained in C and in F_q^n. Hence D ⊆ C|F_q. Conversely, suppose that c ∈ C|F_q. Then c ∈ F_q^n and c = xG for some x ∈ F_{q^m}^k. After a permutation of the coordinates we may assume without loss of generality that G = (I_k A) for some k × (n − k) matrix A with entries in F_q. Therefore (x, xA) = xG = c ∈ F_q^n. Hence x ∈ F_q^k and c ∈ D.

Remark 8.2.3 Similar statements hold as in Proposition 8.2.2 with a parity check matrix H instead of a generator matrix G.

Remark 8.2.4 Let D be a cyclic code of length n over F_q with defining set I. Suppose that gcd(n, q) = 1 and n divides q^m − 1. Let α in F_{q^m}^* have order n. Let D̃ be the F_{q^m}-linear cyclic code with parity check matrix H̃ = (α^{ij} | i ∈ I, j = 0, . . . , n − 1). Then D is the restriction of D̃ by Remark 7.3.2. So D ⊗ F_{q^m} ⊆ D̃ and (D ⊗ F_{q^m})|F_q = D̃|F_q = D. If α is not an element of F_q, then H̃ is not defined over F_q, the analogous statement of Proposition 8.2.2 as mentioned in Remark 8.2.3 does not hold, and D ⊗ F_{q^m} is a proper subcode of D̃.

We will see that H̃ is row equivalent over F_{q^m} with a matrix H with entries in F_q and D ⊗ F_{q^m} = D̃ if I is the complete defining set of D.
8.2.2 Parity check matrix of a restricted code
Lemma 8.2.5 Let h_1, . . . , h_n ∈ F_{q^m}. Let α_1, . . . , α_m be a basis of F_{q^m} over F_q. Then there exist unique elements h_ij ∈ F_q such that

h_j = Σ_{i=1}^{m} h_ij α_i.

Furthermore for all x ∈ F_q^n

Σ_{j=1}^{n} h_j x_j = 0

if and only if

Σ_{j=1}^{n} h_ij x_j = 0 for all i = 1, . . . , m.
Proof. The existence and uniqueness of the h_ij is a consequence of the assumption that α_1, . . . , α_m is a basis of F_{q^m} over F_q. Let x ∈ F_q^n. Then

Σ_{j=1}^{n} h_j x_j = Σ_{j=1}^{n} ( Σ_{i=1}^{m} h_ij α_i ) x_j = Σ_{i=1}^{m} ( Σ_{j=1}^{n} h_ij x_j ) α_i.
The αi form a basis over Fq and the xj are elements of Fq . This implies the statement on the equivalence of the linear equations. Proposition 8.2.6 Let E = (h1 , . . . , hn ) be a 1 × n parity check matrix of the Fqm linear code C. Let l be the dimension of the Fq linear subspace in Fqm generated by h1 , . . . , hn . Then the dimension of CFq is equal to n − l.
Proof. Let H be the m × n matrix with entries h_ij as given in Lemma 8.2.5. Then (h_1j, . . . , h_mj) are the coordinates of h_j with respect to the basis α_1, . . . , α_m of F_{q^m} over F_q. So the rank of H is equal to l. The code C|F_q is the null space of the matrix H by Lemma 8.2.5, and has dimension n − rank(H), which is n − l.

Example 8.2.7 Let α ∈ F_9 be a primitive element such that α^2 + α − 1 = 0. Choose the basis α_1 = 1, α_2 = α. Consider the parity check matrix

E = ( 1 α α^2 α^3 α^4 α^5 α^6 α^7 )

of the F_9-linear code C. Then according to Lemma 8.2.5 the parity check matrix H of C|F_3 is given by

H =
( 1 0 1 2 2 0 2 1 )
( 0 1 2 2 0 2 1 1 )

For instance α^3 = −1 − α, so α^3 has coordinates (−1, −1) with respect to the chosen basis, and the transpose of this vector is the 4th column of H. The entries of the row E generate F_9 over F_3. The rank of H is 2, so the dimension of C|F_3 is 6. This is in agreement with Proposition 8.2.6.

Lemma 8.2.5 has the following consequence.

Proposition 8.2.8 Let D be an F_q-linear code of length n and dimension k. Let m = n − k. If k < n, then D is the restriction of a code C over F_{q^m} of codimension one.

Proof. Let H be an (n − k) × n parity check matrix of D over F_q. Let m = n − k. Let k < n. Then m > 0. Let α_1, . . . , α_m be a basis of F_{q^m} over F_q. Define for j = 1, . . . , n

h_j = Σ_{i=1}^{m} h_ij α_i.
Let E = (h_1, . . . , h_n) be a 1 × n parity check matrix of the F_{q^m}-linear code C. Now E is not the zero vector, since k < n. So C has codimension one, and D is the restriction of C by Lemma 8.2.5.

Proposition 8.2.9 Let C be an F_{q^m}-linear code with parameters [n, k, d]_{q^m}. Then the dimension of C|F_q over F_q is at least n − m(n − k) and its minimum distance is at least d.

Proof. The minimum distance of C|F_q is at least the minimum distance of C, since C|F_q is a subset of C. Let E be a parity check matrix of C. Then E consists of n − k rows. Every row gives rise to m linear equations over F_q by Lemma 8.2.5. So C|F_q is the solution space of m(n − k) homogeneous linear equations over F_q. Therefore the dimension of C|F_q is at least n − m(n − k).

Remark 8.2.10 ***Lower bound of Delsarte-Sidelnikov***
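Example 8.2.7 above can be reproduced mechanically. This sketch uses our own representation of F_9 = F_3[α] with α^2 = 1 − α: each power α^{j−1} is stored as a coordinate pair (c0, c1) meaning c0 + c1·α, and these pairs, written as columns, give the parity check matrix H of C|F_3. The resulting dimension 8 − rank(H) = 6 also meets the bound n − m(n − k) = 8 − 2 of Proposition 8.2.9.

```python
# Expand E = (1, alpha, ..., alpha^7) over F_9 into the 2 x 8 matrix H over F_3.
def mul_alpha(v):
    """Multiply c0 + c1*a by a in F_9 with a^2 = 1 + 2a over F_3."""
    c0, c1 = v
    return (c1 % 3, (c0 + 2 * c1) % 3)

cols, v = [], (1, 0)            # start with alpha^0 = 1
for _ in range(8):
    cols.append(v)
    v = mul_alpha(v)

H = [[col[i] for col in cols] for i in range(2)]
assert H == [[1, 0, 1, 2, 2, 0, 2, 1],
             [0, 1, 2, 2, 0, 2, 1, 1]]
```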
8.2.3 Invariant subspaces
Remark 8.2.11 Let D be the restriction of an F_{q^m}-linear code C. Suppose that h = (h_1, . . . , h_n) ∈ F_{q^m}^n is a parity check for D. So h_1 c_1 + · · · + h_n c_n = 0 for all c ∈ D. Then

Σ_{i=1}^{n} h_i^q c_i = Σ_{i=1}^{n} h_i^q c_i^q = ( Σ_{i=1}^{n} h_i c_i )^q = 0

for all c ∈ D, since c_i^q = c_i for all i and c ∈ D. Hence (h_1^q, . . . , h_n^q) is also a parity check for the code D.

Example 8.2.12 This is a continuation of Example 8.2.7. Consider the parity check matrix

E′ =
( 1 α   α^2 α^3 α^4 α^5 α^6 α^7 )
( 1 α^3 α^6 α   α^4 α^7 α^2 α^5 )

of the F_9-linear code C′. Let D′ be the ternary restriction of C′. Then according to Proposition 8.2.6 the code D′ is the null space of the matrix H′ given by

H′ =
( 1 0 1 2 2 0 2 1 )
( 0 1 2 2 0 2 1 1 )
( 1 2 2 0 2 1 1 0 )
( 0 2 1 1 0 1 2 2 )

The second row of E′ is obtained by taking the third power of the entries of the first row. So D′ = D by Remark 8.2.11. Indeed, the last two rows of H′ are linear combinations of the first two rows. Hence H′ and H have the same rank, that is 2.

Definition 8.2.13 Extend the Frobenius map φ : F_{q^m} → F_{q^m}, defined by φ(x) = x^q, to the map φ : F_{q^m}^n → F_{q^m}^n, defined by φ(x) = (x_1^q, . . . , x_n^q). Likewise we define φ(G) of a matrix G with entries (g_ij) to be the matrix with entries (φ(g_ij)).
Remark 8.2.16 Gal(q^m, q) is a cyclic group of order m generated by φ. Hence a subspace W is invariant if and only if φ(W) ⊆ W. The following two lemmas are similar to the statements for the shift operator in connection with cyclic codes in Propositions 7.1.3 and 7.1.6, but now for the Frobenius map.

Lemma 8.2.17 Let G be a k × n generator matrix of the F_{q^m}-linear code C. Let g_i be the ith row of G. Then C is Gal(q^m, q)-invariant if and only if φ(g_i) ∈ C for all i = 1, . . . , k.

Proof. If C is invariant, then φ(g_i) ∈ C for all i, since g_i ∈ C. Conversely, suppose that φ(g_i) ∈ C for all i. Let c ∈ C. Then c = Σ_{i=1}^{k} x_i g_i for some x_i ∈ F_{q^m}. So

φ(c) = Σ_{i=1}^{k} x_i^q φ(g_i) ∈ C.
Hence C is an invariant code.
Lemma 8.2.18 Let C be an F_{q^m}-linear code. Then C^⊥ is invariant if C is invariant.

Proof. Notice that

φ(x · y) = ( Σ_{i=1}^{n} x_i y_i )^q = Σ_{i=1}^{n} x_i^q y_i^q = φ(x) · φ(y)

for all x, y ∈ F_{q^m}^n. Suppose that C is an invariant code. Let y ∈ C^⊥ and c ∈ C. Then φ^{m−1}(c) ∈ C. Hence

φ(y) · c = φ(y) · φ^m(c) = φ(y · φ^{m−1}(c)) = φ(0) = 0.

Therefore φ(y) ∈ C^⊥ for all y ∈ C^⊥, and C^⊥ is invariant.
Proposition 8.2.19 Let C be an F_{q^m}-linear code of length n. Then C is Gal(q^m, q)-invariant if and only if C has a generator matrix with entries in F_q, if and only if C has a parity check matrix with entries in F_q.

Proof. If C has a generator matrix with entries in F_q, then clearly C is invariant. Now conversely, suppose that C is invariant. Let G be a k × n generator matrix of C. We may assume without loss of generality that the first k columns are independent. So after applying the Gauss algorithm we get the row reduced echelon form G′ of G with the k × k identity matrix I_k in the first k columns. So G′ = (I_k A), where A is a k × (n − k) matrix. Let g′_i be the ith row of G′. Now C is invariant. So φ(g′_i) ∈ C and φ(g′_i) is an F_{q^m}-linear combination of the g′_j. That is, one can find elements s_ij in F_{q^m} such that

φ(g′_i) = Σ_{j=1}^{k} s_ij g′_j.
Let S be the k × k matrix with entries (s_ij). Then

(I_k φ(A)) = (φ(I_k) φ(A)) = φ(G′) = SG′ = S(I_k A) = (S SA).

Therefore I_k = S and φ(A) = SA = A. Hence the entries of A are elements of F_q. So G′ is a generator matrix of C with entries in F_q. The last equivalence is a consequence of Proposition 2.3.3.

Example 8.2.20 Let α ∈ F_8 be a primitive element such that α^3 = α + 1. Let G be the generator matrix of the F_8-linear code C with

G =
( 1 α   α^2 α^3 α^4 α^5 α^6 )
( 1 α^2 α^4 α^6 α   α^3 α^5 )
( 1 α^4 α   α^5 α^2 α^6 α^3 )

Let g_i be the ith row of G. Then φ(g_i) = g_{i+1} for i = 1, 2 and φ(g_3) = g_1. Hence C is an invariant code by Lemma 8.2.17. The proof of Proposition 8.2.19 explains how to get a generator matrix G′ with entries in F_2. Let G′ be the row reduced echelon form of G. Then

G′ =
( 1 0 0 1 0 1 1 )
( 0 1 0 1 1 1 0 )
( 0 0 1 0 1 1 1 )

is indeed a binary matrix. In fact it is the generator matrix of the binary [7, 3, 4] simplex code, the dual of the Hamming code.

Definition 8.2.21 Let C be an F_{q^m}-linear code. Define the codes C^0 and C^* by

C^0 = ∩_{i=1}^{m} φ^i(C),
C^* = Σ_{i=1}^{m} φ^i(C).

Remark 8.2.22 It is clear from the definitions that the codes C^0 and C^* are Gal(q^m, q)-invariant. Furthermore C^0 is the largest invariant code contained in C, that is, if D is an invariant code and D ⊆ C, then D ⊆ C^0. And similarly, C^* is the smallest invariant code containing C, that is, if D is an invariant code and C ⊆ D, then C^* ⊆ D.

Proposition 8.2.23 Let C be an F_{q^m}-linear code. Then C^0 = ((C^⊥)^*)^⊥.

Proof. The following inclusion holds: C^0 ⊆ C. So dually C^⊥ ⊆ (C^0)^⊥. Now C^0 is invariant. So (C^0)^⊥ is invariant by Lemma 8.2.18, and it contains C^⊥. By Remark 8.2.22, (C^⊥)^* is the smallest invariant code containing C^⊥. Hence (C^⊥)^* ⊆ (C^0)^⊥ and therefore C^0 ⊆ ((C^⊥)^*)^⊥. We have C^⊥ ⊆ (C^⊥)^*. So dually ((C^⊥)^*)^⊥ ⊆ C. The code ((C^⊥)^*)^⊥ is invariant and is contained in C. The largest code that is invariant and contained in C is equal to C^0. Hence ((C^⊥)^*)^⊥ ⊆ C^0. Both inclusions give the desired equality.
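Example 8.2.20 can be verified mechanically. This sketch uses a small hand-rolled F_8 = F_2[x]/(x^3 + x + 1) (elements 0..7 as bit vectors, addition XOR; the helper names are ours): the reduced row echelon form over F_8 of the invariant generator matrix G must have entries in F_2 only, as Proposition 8.2.19 predicts.

```python
# rref over F_8 of the invariant generator matrix of Example 8.2.20.
def gf_mul(x, y):
    """Multiply in F_8 = F_2[x]/(x^3 + x + 1)."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        x <<= 1
        if x & 0b1000:
            x ^= 0b1011          # reduce by x^3 = x + 1
        y >>= 1
    return r

def gf_inv(x):                   # x^6 = x^(-1) in F_8*
    r = 1
    for _ in range(6):
        r = gf_mul(r, x)
    return r

alpha, pw = 0b010, [1]
for _ in range(6):
    pw.append(gf_mul(pw[-1], alpha))

# rows: alpha^j, alpha^(2j), alpha^(4j), each the Frobenius image of the last
G = [[pw[(e * j) % 7] for j in range(7)] for e in (1, 2, 4)]

for i in range(3):               # Gauss-Jordan; pivots happen to be nonzero here
    G[i] = [gf_mul(gf_inv(G[i][i]), v) for v in G[i]]
    for r in range(3):
        if r != i:
            f = G[r][i]
            G[r] = [v ^ gf_mul(f, w) for v, w in zip(G[r], G[i])]

assert G == [[1, 0, 0, 1, 0, 1, 1],
             [0, 1, 0, 1, 1, 1, 0],
             [0, 0, 1, 0, 1, 1, 1]]
```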
Theorem 8.2.24 Let C be an F_{q^m}-linear code. Then C and C^0 have the same restriction. Furthermore dim_{F_q}(C|F_q) = dim_{F_{q^m}}(C^0) and d(C|F_q) = d(C^0).

Proof. The inclusion C^0 ⊆ C implies C^0|F_q ⊆ C|F_q. The code (C|F_q) ⊗ F_{q^m} is contained in C and is invariant. Hence (C|F_q) ⊗ F_{q^m} ⊆ C^0, by Remark 8.2.22. So ((C|F_q) ⊗ F_{q^m})|F_q ⊆ C^0|F_q. But C|F_q = ((C|F_q) ⊗ F_{q^m})|F_q, by Proposition 8.2.2 applied to D = C|F_q. Therefore C|F_q ⊆ C^0|F_q, and with the converse inclusion above we get the desired equality C|F_q = C^0|F_q. The code C^0 has a k × n generator matrix G with entries in F_q, by Proposition 8.2.19, since C^0 is an invariant code. Then G is also a generator matrix of C^0|F_q, by Proposition 8.2.2. Furthermore C|F_q = C^0|F_q. Therefore dim_{F_q}(C|F_q) = k = dim_{F_{q^m}}(C^0). The code C^0 has a parity check matrix H with entries in F_q, by Proposition 8.2.19. Then H is also a parity check matrix of C|F_q over F_q. The minimum distance of a code can be expressed as the minimum number of columns in a parity check matrix that are dependent, by Proposition 2.3.11. Consider an l × m matrix B with entries in F_q. Then the columns of B are dependent if and only if rank(B) < m. The rank of B is equal to the number of pivots in the row reduced echelon form of B. The row reduced echelon form of B is unique, by Remark 2.2.18, and does not change by considering B as a matrix with entries in F_{q^m}. Therefore d(C|F_q) = d(C^0).

Remark 8.2.25 Lemma 8.2.5 gives us a method to compute the parity check matrix of the restriction. Proposition 8.2.23 and Theorem 8.2.24 give us another way to compute the parity check and generator matrix of the restriction of a code. Let C be an F_{q^m}-linear code. Let H be a parity check matrix of C. Then H is a generator matrix of C^⊥. Let h_i, i = 1, . . . , n − k, be the rows of H. Let H^* be the matrix with the (n − k)m rows φ^j(h_i), i = 1, . . . , n − k, j = 1, . . . , m. Then these rows generate (C^⊥)^*.
Let H0 be the row reduced echelon form of H ∗ with the zero rows deleted. Then H0 has entries in Fq and is a generator matrix of (C ⊥ )∗ , since it is an invariant code. So H0 is the parity check matrix of ((C ⊥ )∗ )⊥ = C 0 . Hence it is also the parity check matrix of (C 0 Fq ) = (CFq ). Example 8.2.26 Consider the parity check matrix E of Example 8.2.7. Then E ∗ is equal to the matrix E 0 of Example 8.2.12. Taking the row reduced echelon form of E ∗ gives indeed the parity check matrix H obtained in Example 8.2.7.
8.2.4 Cyclic codes as subfield subcodes

***

8.2.5 Trace codes
Definition 8.2.27 The trace map Tr_{F_q}^{F_{q^m}} : F_{q^m} → F_q is defined by

Tr_{F_q}^{F_{q^m}}(x) = x + x^q + · · · + x^{q^{m−1}} for x ∈ F_{q^m}.
The notation Tr_{F_q}^{F_{q^m}} is abbreviated to Tr in case the context is clear. This map is extended coordinatewise to a map Tr : F_{q^m}^n → F_q^n.

Remark 8.2.28 Let F be a field and G a finite field extension of F of degree m. Then G is a vector space over F of dimension m. Choose a basis of G over F. Let x ∈ G. Then multiplication by x on G is an F-linear map. Let M_x be the corresponding matrix of this map with respect to the chosen basis. The sum of the diagonal elements of M_x is called the trace of x. This trace does not depend on the chosen basis and will be denoted by Tr_F^G(x), or by Tr(x) for short. Definition 8.2.27 of the trace for a finite extension of a finite field is an ad hoc definition. With the above generalization of the definition of the trace, the ad hoc definition becomes a property. The maps Tr : F_{q^m} → F_q and Tr : F_{q^m}^n → F_q^n are F_q-linear.

Proposition 8.2.29 (Delsarte-Sidelnikov) Let C be an F_{q^m}-linear code. Then

(C^⊥ ∩ F_q^n)^⊥ = Tr(C).

Proof. ***
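Definition 8.2.27 can be illustrated for q = 2, m = 4. The following sketch (our own helper code) uses a hand-rolled F_16 = F_2[x]/(x^4 + x + 1) and checks that Tr(x) = x + x^2 + x^4 + x^8 lands in F_2 = {0, 1} for every x, and that the map is additive, hence F_2-linear.

```python
# Trace from F_16 down to F_2 via repeated squaring.
def gf_mul(x, y):
    """Multiply in F_16 = F_2[x]/(x^4 + x + 1)."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        x <<= 1
        if x & 0b10000:
            x ^= 0b10011         # reduce by x^4 = x + 1
        y >>= 1
    return r

def tr(x):
    """Tr(x) = x + x^2 + x^4 + x^8, accumulated by squaring."""
    t, s = 0, x
    for _ in range(4):
        t ^= s
        s = gf_mul(s, s)
    return t

assert all(tr(x) in (0, 1) for x in range(16))        # Tr maps into F_2
assert all(tr(x ^ y) == tr(x) ^ tr(y)                 # additivity
           for x in range(16) for y in range(16))
assert any(tr(x) == 1 for x in range(16))             # Tr is surjective
```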
8.2.6 Exercises
8.2.1 Let α ∈ F_16 be a primitive element such that α^4 = α + 1. Choose α_i = α^i with i = 0, 1, 2, 3 as basis. Consider the parity check matrix

E =
( 1 α   α^2 α^3 · · · α^14 )
( 1 α^2 α^4 α^6 · · · α^13 )

of the F_16-linear code C. Let E′ be the 1 × 15 submatrix of E consisting of the first row of E. Let C′ be the F_16-linear code with E′ as parity check matrix. Determine the parity check matrices H of C|F_2 and H′ of C′|F_2, using Lemma 8.2.5 and Proposition 8.2.9. Show that H = H′.

8.2.2 Let α ∈ F_16 be a primitive element such that α^4 = α + 1. Give a binary parity check matrix of the binary restriction of the code RS_4(15, 0). Determine the dimension of the binary restriction of the code RS_k(15, 0) for all k.

8.2.3 Let α ∈ F_16 be a primitive element such that α^4 = α + 1. Let G be the 4 × 15 matrix with entry g_ij = α^{j2^i} in the ith row and the jth column. Let C be the code with generator matrix G. Show that C is Gal(16, 2)-invariant and give a binary generator matrix of C.

8.2.4 Let m be a positive integer and C an F_{q^m}-linear code. Let φ be the Frobenius map of F_{q^m} fixing F_q. Show that φ(C) is an F_{q^m}-linear code that is isometric with C. Give a counterexample of a code C that is not monomial equivalent with φ(C).

8.2.5 Give proofs of the statements made in Remark 8.2.28.
8.3 Some families of polynomial codes
***
8.3.1 Alternant codes
Definition 8.3.1 Let a = (a_1, . . . , a_n) be an n-tuple of n distinct elements of F_{q^m}. Let b = (b_1, . . . , b_n) be an n-tuple of nonzero elements of F_{q^m}. Let GRS_k(a, b) be the generalized RS code over F_{q^m} of dimension k. The alternant code ALT_r(a, b) is the F_q-linear restriction of (GRS_r(a, b))^⊥.

Proposition 8.3.2 The code ALT_r(a, b) has parameters [n, k, d]_q with k ≥ n − mr and d ≥ r + 1.

Proof. The code (GRS_r(a, b))^⊥ is equal to GRS_{n−r}(a, c) with c_j = 1/(b_j ∏_{i≠j}(a_j − a_i)) by Proposition 8.1.21, and has parameters [n, n − r, r + 1]_{q^m} by Proposition 8.1.14. So the statement is a consequence of Proposition 8.2.9.
Proposition 8.3.3 (ALT_r(a, b))^⊥ = Tr(GRS_r(a, b)).

Proof. This is a direct consequence of the definition of an alternant code and Proposition 8.2.29.

Proposition 8.3.4 Every linear code of minimum distance at least 2 is an alternant code.

Proof. Let C be a code of length n and dimension k. Then k < n, since the minimum distance of C is at least 2. Let m be a positive integer such that n − k divides m and q^m ≥ n. Let a = (a_1, . . . , a_n) be any n-tuple of n distinct elements of F_{q^m}. Let H be an (n − k) × n parity check matrix of C over F_q. Following the proof of Proposition 8.2.8, let α_1, . . . , α_{n−k} be a basis of F_{q^{n−k}} over F_q. The field F_{q^m} is an extension of F_{q^{n−k}}, since n − k divides m. Define b_j = Σ_{i=1}^{n−k} h_ij α_i for j = 1, . . . , n. The minimum distance of C is at least 2. So H does not contain a zero column by Proposition 2.3.11. Hence b_j ≠ 0 for all j. Let b = (b_1, . . . , b_n). Then C is the restriction of GRS_1(a, b)^⊥. Therefore C = ALT_1(a, b) by definition.

Remark 8.3.5 The above proposition shows that almost all linear codes are alternant, but it gives no useful information about the parameters of the code. ***Alternant codes meet the GV bound (MacWilliams & Sloane page 337). BCH codes are not asymptotically good?? ***
8.3.2 Goppa codes
A special class of alternant codes is given by the Goppa codes.

Definition 8.3.6 Let L = (a_1, . . . , a_n) be an n-tuple of n distinct elements of F_{q^m}. A polynomial g with coefficients in F_{q^m} such that g(a_j) ≠ 0 for all j is called a Goppa polynomial with respect to L. Define the F_q-linear Goppa code Γ(L, g) by

Γ(L, g) = { c ∈ F_q^n | ∑_{j=1}^n c_j/(X − a_j) ≡ 0 mod g(X) }.

Remark 8.3.7 The assumption g(a_j) ≠ 0 implies that X − a_j and g(X) are relatively prime, so their greatest common divisor is 1. Euclid's algorithm gives polynomials P_j and Q_j such that P_j(X)g(X) + Q_j(X)(X − a_j) = 1. So Q_j(X) is the inverse of X − a_j modulo g(X). We claim that

Q_j(X) = −(g(X) − g(a_j))/(X − a_j) · g(a_j)^{−1}.
Notice that g(X) − g(a_j) has a_j as a zero. So g(X) − g(a_j) is divisible by X − a_j, and the quotient is a polynomial of degree one less than the degree of g(X). With the above definition of Q_j we get

Q_j(X)(X − a_j) = −(g(X) − g(a_j))g(a_j)^{−1} = 1 − g(X)g(a_j)^{−1} ≡ 1 mod g(X).

Remark 8.3.8 Let g_1 and g_2 be two Goppa polynomials with respect to L. If g_2 divides g_1, then Γ(L, g_1) is a subcode of Γ(L, g_2).

Proposition 8.3.9 Let L = a = (a_1, . . . , a_n). Let g be a Goppa polynomial of degree r. The Goppa code Γ(L, g) is equal to the alternant code ALT_r(a, b) where b_j = 1/g(a_j).

Proof. Remark 8.3.7 implies that c ∈ Γ(L, g) if and only if

∑_{j=1}^n c_j (g(X) − g(a_j))/(X − a_j) · g(a_j)^{−1} = 0,
since the left hand side is a polynomial of degree strictly smaller than the degree of g(X), and this polynomial is 0 if and only if it is 0 modulo g(X). Let g(X) = g_0 + g_1 X + · · · + g_r X^r. Then

(g(X) − g(a_j))/(X − a_j) = ∑_{l=0}^r g_l (X^l − a_j^l)/(X − a_j) = ∑_{l=0}^r g_l ∑_{i=0}^{l−1} X^i a_j^{l−1−i} = ∑_{i=0}^{r−1} ( ∑_{l=i+1}^r g_l a_j^{l−1−i} ) X^i.

Therefore c ∈ Γ(L, g) if and only if

∑_{j=1}^n ( ∑_{l=i+1}^r g_l a_j^{l−1−i} ) g(a_j)^{−1} c_j = 0
for all i = 0, . . . , r − 1, if and only if H_1 c^T = 0, where H_1 is the r × n matrix whose j-th column is

g(a_j)^{−1} ( g_r a_j^{r−1} + g_{r−1} a_j^{r−2} + · · · + g_2 a_j + g_1 , . . . , g_r a_j^2 + g_{r−1} a_j + g_{r−2} , g_r a_j + g_{r−1} , g_r )^T.

The coefficient g_r is not zero, since g(X) has degree r. Divide the last row of H_1 by g_r. Then subtract g_{r−1} times the r-th row from row r − 1. Next divide row r − 1 by g_r. Continuing in this way, a sequence of elementary row operations shows that H_1 is row equivalent to the matrix H_2 with entry a_j^{i−1} g(a_j)^{−1} in the i-th row and the j-th column. So H_2 is a generator matrix of GRS_r(a, b), where b = (b_1, . . . , b_n) and b_j = 1/g(a_j). Hence Γ(L, g) is the restriction of GRS_r(a, b)^⊥. Therefore Γ(L, g) = ALT_r(a, b) by definition.

Proposition 8.3.10 Let g be a Goppa polynomial of degree r over F_{q^m}. Then the Goppa code Γ(L, g) is an [n, k, d] code with k ≥ n − mr and d ≥ r + 1.

Proof. This is a consequence of Proposition 8.3.9 showing that a Goppa code is an alternant code and Proposition 8.3.2 on the parameters of alternant codes.

Remark 8.3.11 Let g be a Goppa polynomial of degree r over F_{q^m}. Then the Goppa code Γ(L, g) has minimum distance d ≥ r + 1 by Proposition 8.3.10. It is an alternant code, that is, a subfield subcode of a GRS code of minimum distance r + 1 by Proposition 8.3.9. This super code has several efficient decoding algorithms that correct ⌊r/2⌋ errors. The same algorithms can be applied to the Goppa code to correct ⌊r/2⌋ errors.

Definition 8.3.12 A polynomial is called square free if all its (irreducible) factors have multiplicity one.

Remark 8.3.13 Notice that irreducible polynomials are square free Goppa polynomials. If g(X) is a square free Goppa polynomial, then g(X) and its formal derivative g′(X) have no common factor by Lemma 7.2.8.

Proposition 8.3.14 Let g be a square free Goppa polynomial with coefficients in F_{2^m}. Then the binary Goppa code Γ(L, g) is equal to Γ(L, g^2).

Proof.
(1) The code Γ(L, g^2) is a subcode of Γ(L, g), by Remark 8.3.8.
(2) Let c be a binary word. Define the polynomial f(X) by

f(X) = ∏_{j=1}^n (X − a_j)^{c_j}.

So f(X) is the reciprocal locator polynomial of c: it is the monic polynomial of degree wt(c) whose zeros are located at those a_j such that c_j ≠ 0. Now

f′(X) = ∑_{j=1}^n c_j (X − a_j)^{c_j − 1} ∏_{l=1, l≠j}^n (X − a_l)^{c_l}.

Hence

f′(X)/f(X) = ∑_{j=1}^n c_j/(X − a_j).

Let c ∈ Γ(L, g). Then f′(X)/f(X) ≡ 0 mod g(X). Now gcd(f(X), g(X)) = 1. So there exist polynomials p(X) and q(X) such that p(X)f(X) + q(X)g(X) = 1. Hence

p(X)f′(X) ≡ f′(X)/f(X) ≡ 0 mod g(X).

Therefore g(X) divides f′(X), since gcd(p(X), g(X)) = 1. Let f(X) = f_0 + f_1 X + · · · + f_n X^n. Then

f′(X) = ∑_{i=0}^{⌊n/2⌋} f_{2i+1} X^{2i} = ( ∑_{i=0}^{⌊n/2⌋} f_{2i+1}^{2^{m−1}} X^i )^2,

since the coefficients are in F_{2^m}. So f′(X) is a square that is divisible by the square free polynomial g(X). Hence f′(X) is divisible by g(X)^2, so c ∈ Γ(L, g^2). Therefore Γ(L, g) is contained in Γ(L, g^2). So they are equal by (1).

Proposition 8.3.15 Let g be a square free Goppa polynomial of degree r with coefficients in F_{2^m}. Then the binary Goppa code Γ(L, g) is an [n, k, d] code with k ≥ n − mr and d ≥ 2r + 1.

Proof. This is a consequence of Proposition 8.3.14 showing that Γ(L, g) = Γ(L, g^2) and Proposition 8.3.10 on the parameters of Goppa codes. The lower bound on the dimension uses that g(X) has degree r, and the lower bound on the minimum distance uses that g(X)^2 has degree 2r.

Example 8.3.16 Let α ∈ F_8 be a primitive element such that α^3 = α + 1. Let a_j = α^{j−1} be an enumeration of the seven elements of L = F_8^*. Let g(X) = 1 + X + X^2. Then g is a square free polynomial in F_2[X] and a Goppa polynomial with respect to L. Let a be the vector with entries a_j. Let b be defined by b_j = 1/g(a_j). Then b = (1, α^2, α^4, α^2, α, α, α^4). And Γ(L, g) = ALT_2(a, b) by Proposition 8.3.9. Let k be the dimension and d the minimum distance of Γ(L, g). Then k ≥ 1 and d ≥ 5 by Proposition 8.3.15. In fact Γ(L, g) is a one-dimensional code generated by (0, 1, 1, 1, 1, 1, 1). Hence d = 6.

Example 8.3.17 Let L = F_{2^10}. Consider the binary Goppa code Γ(L, g) with a Goppa polynomial g in F_{2^10}[X] of degree 50 with respect to L = F_{2^10}. Then the code has length 1024, dimension k ≥ 524 and minimum distance d ≥ 51. If moreover g is square free, then d ≥ 101.

***Goppa codes meet the GV bound, random argument***
8.3.3 Counting polynomials
The number of certain polynomials will be counted in order to get an idea of the number of Goppa codes.

Remark 8.3.18 (1) Irreducible polynomials are square free Goppa polynomials. The number of monic irreducible polynomials in F_q[X] of degree d is denoted by Irr_q(d), and this number is computed by means of the Möbius function as given in Proposition 7.2.19.
(2) Every monic square free polynomial f(X) over F_q of degree r has a unique factorization into monic irreducible polynomials. Let e_i be the number of irreducible factors of f(X) of degree i. Then e_1 + 2e_2 + · · · + re_r = r, and there are \binom{Irr_q(i)}{e_i} ways to choose e_i distinct factors among the Irr_q(i) monic irreducible polynomials of degree i. Hence the number S_q(r) of monic square free polynomials over F_q of degree r is equal to

S_q(r) = ∑_{e_1+2e_2+···+re_r = r} ∏_{i=1}^r \binom{Irr_q(i)}{e_i}.

(3) The formula for the number SG_q(r) of square free monic Goppa polynomials in F_q[X] of degree r with respect to L = F_q is similar, since such Goppa polynomials have no linear factors in F_q[X]. Hence

SG_q(r) = ∑_{2e_2+···+re_r = r} ∏_{i=2}^r \binom{Irr_q(i)}{e_i}.
Simpler formulas are obtained in the following.

Proposition 8.3.19 Let S_q(r) be the number of monic square free polynomials over F_q of degree r. Then S_q(0) = 1, S_q(1) = q and S_q(r) = q^r − q^{r−1} for r > 1.

Proof. Clearly S_q(0) = 1 and S_q(1) = q, since 1 is the only monic polynomial of degree zero, and { X + a | a ∈ F_q } is the set of monic polynomials of degree one, and they are all square free. If f(X) is a monic polynomial of degree r > 1 that is not square free, then we have a unique factorization f(X) = g(X)^2 h(X), where g(X) is a monic polynomial, say of degree a, and h(X) is a monic square free polynomial of degree b. So 2a + b = r and a > 0. Hence the number of monic polynomials of degree r over F_q that are not square free is q^r − S_q(r) and equal to

∑_{a=1}^{⌊r/2⌋} q^a S_q(r − 2a).

Therefore

S_q(r) = q^r − ∑_{a=1}^{⌊r/2⌋} q^a S_q(r − 2a).

This recurrence relation with starting values S_q(0) = 1 and S_q(1) = q has the unique solution S_q(r) = q^r − q^{r−1} for r > 1. This is left as an exercise.
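The exercise asks for an induction proof; the claim can at least be checked numerically by running the recurrence and comparing with the closed form. A minimal sketch:

```python
def S(q, r, memo={}):
    # S_q(r) via the recurrence S_q(r) = q^r - sum_{a=1}^{floor(r/2)} q^a * S_q(r - 2a)
    if r == 0:
        return 1
    if r == 1:
        return q
    if (q, r) not in memo:
        memo[(q, r)] = q**r - sum(q**a * S(q, r - 2 * a) for a in range(1, r // 2 + 1))
    return memo[(q, r)]

for q in (2, 3, 5, 7):
    for r in range(2, 12):
        assert S(q, r) == q**r - q**(r - 1)
print("closed form S_q(r) = q^r - q^(r-1) verified for small q, r")
```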
Proposition 8.3.20 Let r ≤ n ≤ q. The number G_q(r, n) of monic Goppa polynomials in F_q[X] of degree r with respect to an L that consists of n distinct given elements of F_q is given by

G_q(r, n) = ∑_{i=0}^r (−1)^i \binom{n}{i} q^{r−i}.

Proof. Let P_q(r) be the set of all monic polynomials in F_q[X] of degree r. Then |P_q(r)| = q^r, since the r coefficients of a monic polynomial of degree r are free to choose in F_q. Let a be a vector of length n with entries the elements of L. Let I be a subset of {1, . . . , n}. Define

P_q(r, I) = { f(X) ∈ P_q(r) | f(a_i) = 0 for all i ∈ I }.

If r ≥ |I|, then

P_q(r, I) = P_q(r − |I|) · ∏_{i∈I} (X − a_i),

since f(a_i) = 0 if and only if X − a_i divides f(X), and the a_i are mutually distinct. Hence |P_q(r, I)| = |P_q(r − |I|)| = q^{r−|I|} for all r ≥ |I|. So |P_q(r, I)| depends on q, r and only on the size of I. Furthermore P_q(r, I) is empty if r < |I|. The set of monic Goppa polynomials in F_q[X] of degree r with respect to L is equal to

∩_{i=1}^n ( P_q(r) \ P_q(r, {i}) ) = P_q(r) \ ( ∪_{i=1}^n P_q(r, {i}) ).

The principle of inclusion/exclusion gives

G_q(r, n) = ∑_I (−1)^{|I|} |P_q(r, I)| = ∑_{i=0}^r (−1)^i \binom{n}{i} q^{r−i}.
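The inclusion/exclusion formula can be compared with a brute-force count over a small prime field (q prime, so F_q is just the integers modulo q). The set L and the function names below are our own choices for illustration:

```python
from itertools import product
from math import comb

def G_formula(q, r, n):
    # Proposition 8.3.20
    return sum((-1)**i * comb(n, i) * q**(r - i) for i in range(r + 1))

def G_brute(q, r, L):
    # count monic f in F_q[X] of degree r with f(a) != 0 for all a in L (q prime)
    count = 0
    for coeffs in product(range(q), repeat=r):
        f = (1,) + coeffs                       # coefficients from X^r down to X^0
        if all(sum(c * pow(a, r - i, q) for i, c in enumerate(f)) % q for a in L):
            count += 1
    return count

q = 5
for r, n in [(1, 1), (2, 2), (2, 3), (3, 4)]:
    assert G_brute(q, r, list(range(n))) == G_formula(q, r, n)
print("inclusion/exclusion count verified over F_5")
```

For example G_5(2, 3) = 25 − 3·5 + 3 = 13, and the exhaustive count over the 25 monic quadratics modulo 5 agrees.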
Proposition 8.3.21 Let r ≤ n ≤ q. The number SG_q(r, n) of square free, monic Goppa polynomials in F_q[X] of degree r with respect to an L that consists of n distinct given elements of F_q is given by

SG_q(r, n) = (−1)^r \binom{n+r−1}{r} + ∑_{i=0}^{r−1} (−1)^i ((n+2i−1)/(n+i−1)) \binom{n+i−1}{i} q^{r−i}.

Proof. An outline of the proof is given; the details are left as an exercise.
(1) The following recurrence relation holds:

SG_q(r, n) = G_q(r, n) − ∑_{a=1}^{⌊r/2⌋} G_q(a, n) · SG_q(r − 2a, n).

(2) ****The given formula satisfies the recurrence relation and the starting values.******
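Since the recurrence in (1) determines SG_q(r, n) from G_q(r, n), the closed formula can be checked against it numerically. A sketch (exact rational arithmetic via fractions, since the summands of the formula are not individually integral; function names are ours):

```python
from fractions import Fraction
from math import comb

def G(q, r, n):
    return sum((-1)**i * comb(n, i) * q**(r - i) for i in range(r + 1))

def SG_rec(q, r, n, memo={}):
    # square free Goppa count via the recurrence in step (1) of the proof
    if r == 0:
        return 1
    key = (q, r, n)
    if key not in memo:
        memo[key] = G(q, r, n) - sum(G(q, a, n) * SG_rec(q, r - 2 * a, n)
                                     for a in range(1, r // 2 + 1))
    return memo[key]

def SG_formula(q, r, n):
    # the closed formula of Proposition 8.3.21
    s = Fraction((-1)**r * comb(n + r - 1, r))
    for i in range(r):
        s += (Fraction((-1)**i * (n + 2 * i - 1), n + i - 1)
              * comb(n + i - 1, i) * q**(r - i))
    return int(s)

for q, n in [(8, 8), (16, 16), (16, 10)]:
    for r in range(1, 8):
        assert SG_rec(q, r, n) == SG_formula(q, r, n)
print("closed formula matches the recurrence")
```

As a sanity check, SG_q(1, q) = q − q = 0: every monic linear polynomial has its root in L = F_q, so there are no Goppa polynomials of degree 1 with respect to the full field.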
Example 8.3.22 ****Consider polynomials over the finite field F_1024. Compute the following numbers.
(1) The number of monic irreducible polynomials of degree 50.
(2) The number of square free monic polynomials of degree 50.
(3) The number of monic Goppa polynomials of degree 50 with respect to L.
(4) The number of square free, monic Goppa polynomials of degree 50 with respect to L. ****

***Question: If Γ(L, g_1) = Γ(L, g_2) and ..., then g_1 = g_2 ??? ***
***the book of Berlekamp on Algebraic coding theory***
***generating functions, asymptotics***
***Goppa codes meet the GV bound.***
8.3.4 Exercises
8.3.1 Give a proof of Remark 8.3.8.

8.3.2 Let L = F_9^*. Consider the Goppa codes Γ(L, g) over F_3. Show that the only Goppa polynomials in F_3[X] of degree 2 are X^2 and 2X^2.

8.3.3 Let L be an enumeration of the eight elements of F_9^*. Describe the Goppa codes Γ(L, X) and Γ(L, X^2) over F_3 as alternant codes of the form ALT_1(a, b) and ALT_1(a, b′). Determine the parameters of these codes and compare these with the ones given in Proposition 8.3.15.

8.3.4 Let g be a square free Goppa polynomial of degree r over F_{q^m}. Then the Goppa code Γ(L, g) has minimum distance d ≥ 2r + 1 by Proposition 8.3.15. Explain how to adapt the decoding algorithm mentioned in Remark 8.3.11 to correct r errors.

8.3.5 Let L = F_{2^11}. Consider the binary Goppa code Γ(L, g) with a square free Goppa polynomial g in F_{2^11}[X] of degree 93 with respect to L = F_{2^11}. Give lower bounds on the dimension and the minimum distance of this code.

8.3.6 Give a proof of the formula S_q(r) = q^r − q^{r−1} for r > 1 by showing by induction that it satisfies the recurrence relation given in the proof of Proposition 8.3.19.

8.3.7 Give a proof of the recurrence relation given in (1) of the proof of Proposition 8.3.21, and show that the given formula for SG_q(r, n) satisfies the recurrence relation.

8.3.8 Consider polynomials over the finite field F_{2^11}. Let L = F_{2^11}. Give a numerical approximation of the following numbers.
(1) The number of monic irreducible polynomials of degree 93.
(2) The number of square free monic polynomials of degree 93.
(3) The number of monic Goppa polynomials of degree 93 with respect to L.
(4) The number of square free, monic Goppa polynomials of degree 93 with respect to L.
8.4 Reed-Muller codes
The q-ary RS code RS_k(n, 1) of length q − 1 was introduced as a cyclic code in Definition 8.1.1, and it was shown in Proposition 8.1.4 that it can also be described as the code obtained by evaluating all univariate polynomials over F_q of degree strictly smaller than k at all the nonzero elements of the finite field F_q. The extended RS codes can be considered as the codes evaluating those polynomials at all the elements of F_q, as done in 8.1.7. The multivariate generalization of this latter point of view is taken as the definition of Reed-Muller codes, and it will be shown that the punctured Reed-Muller codes are certain cyclic codes. In this section we assume n = q^m. The vector space F_q^m has n elements. Choose an enumeration F_q^m = {P_1, . . . , P_n} of its points and let P = (P_1, . . . , P_n). Define the evaluation map ev_P : F_q[X_1, . . . , X_m] → F_q^n by

ev_P(f) = (f(P_1), . . . , f(P_n)) for f ∈ F_q[X_1, . . . , X_m].

Definition 8.4.1 The q-ary Reed-Muller code RM_q(u, m) of order u in m variables is defined as

RM_q(u, m) = { ev_P(f) | f ∈ F_q[X_1, . . . , X_m], deg(f) ≤ u }.

The dual of a Reed-Muller code is again a Reed-Muller code.

Proposition 8.4.2 The dual code of RM_q(u, m) is equal to RM_q(u^⊥, m), where u^⊥ = m(q − 1) − u − 1.

Proof.
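For q = 2 the definition can be made concrete by listing codewords. The sketch below (helper names are ours) evaluates all square free monomials of degree at most u at the points of F_2^m — over F_2 every polynomial function is a combination of such monomials since X_i^2 = X_i as a function — and recovers the classical parameters [8, 4, 4] of RM_2(1, 3). By Proposition 8.4.2 with u^⊥ = 3(2 − 1) − 1 − 1 = 1 this code is self-dual, which the last line checks directly.

```python
from itertools import combinations, product

def rm_generators(u, m):
    # evaluations of the monomials prod_{i in S} X_i, |S| <= u, at all points of F_2^m
    pts = list(product((0, 1), repeat=m))
    return [tuple(int(all(p[i] for i in S)) for p in pts)
            for deg in range(u + 1) for S in combinations(range(m), deg)]

def span(gens):
    # all F_2-linear combinations of the generators
    words = {(0,) * len(gens[0])}
    for g in gens:
        words |= {tuple(x ^ y for x, y in zip(w, g)) for w in words}
    return words

C = span(rm_generators(1, 3))
k = len(C).bit_length() - 1                    # |C| = 2^k
d = min(sum(w) for w in C if any(w))
print(len(C), k, d)                            # 16 4 4: RM_2(1,3) is an [8, 4, 4] code

# self-duality: every pair of codewords is orthogonal over F_2
self_dual = all(sum(x & y for x, y in zip(w1, w2)) % 2 == 0 for w1 in C for w2 in C)
print(self_dual)                               # True
```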
8.4.1 Punctured Reed-Muller codes as cyclic codes
The field F_{q^m} can be viewed as an m-dimensional vector space over F_q. Let β_1, . . . , β_m be a basis of F_{q^m} over F_q. Then we have an isomorphism of vector spaces ϕ : F_{q^m} → F_q^m with ϕ(α) = (a_1, . . . , a_m) if and only if

α = ∑_{i=1}^m a_i β_i

for every α ∈ F_{q^m}. Choose a primitive element ζ of F_{q^m}, that is, a generator of F_{q^m}^*, which is an element of order q^m − 1. Now define the n points P = (P_1, . . . , P_n) in F_q^m by P_1 = 0 and P_i = ϕ(ζ^{i−1}) for i = 2, . . . , n. Write

P_j := (a_{1j}, a_{2j}, . . . , a_{mj}), j = 1, . . . , n,

and let α = (α_1, . . . , α_n) with

α_j := ∑_{i=1}^m a_{ij} β_i, j = 1, . . . , n.
8.4.2 Reed-Muller codes as subfield subcodes and trace codes
Alternant codes are restrictions of generalized RS codes, and it is shown in [?, Theorem 15] that Sudan's decoding algorithm can be applied in this situation. Following [?] we describe the q-ary Reed-Muller code RM_q(u, m) as a subfield subcode of RM_{q^m}(v, 1) for some v, and this last code is an RS code over F_{q^m}. In this section we assume n = q^m. The vector space F_q^m has n elements, which are often called points, i.e., F_q^m = {P_1, . . . , P_n}. Since F_q^m ≅ F_{q^m} as vector spaces over F_q, the elements of F_{q^m} correspond exactly to the points of F_q^m. Define the evaluation maps

ev_P : F_q[X_1, . . . , X_m] → F_q^n and ev_α : F_{q^m}[Y] → F_{q^m}^n

by ev_P(f) = (f(P_1), . . . , f(P_n)) for f ∈ F_q[X_1, . . . , X_m] and ev_α(g) = (g(α_1), . . . , g(α_n)) for g ∈ F_{q^m}[Y]. Recall that the q-ary Reed-Muller code RM_q(u, m) of order u is defined as

RM_q(u, m) = { ev_P(f) | f ∈ F_q[X_1, . . . , X_m], deg(f) ≤ u }.

Similarly the q^m-ary Reed-Muller code RM_{q^m}(v, 1) of order v is defined as

RM_{q^m}(v, 1) = { ev_α(g) | g ∈ F_{q^m}[Y], deg(g) ≤ v }.

The following proposition is from [?] and [?].

Proposition 8.4.3 Let ρ be the remainder after division of u^⊥ + 1 by q − 1 with quotient e, that is, u^⊥ + 1 = e(q − 1) + ρ, where ρ < q − 1. Define d = (ρ + 1)q^e. Then d is the minimum distance of RM_q(u, m).

Proposition 8.4.4 Let n = q^m. Let d be the minimum distance of RM_q(u, m). Then RM_q(u, m) is a subfield subcode of RM_{q^m}(n − d, 1).

Proof. This can be shown by using the corresponding fact for the punctured cyclic codes as shown in Theorem 1 and Corollary 2 of [?]. Here we give a direct proof.
1) Consider the map of rings ϕ : F_{q^m}[Y] → F_{q^m}[X_1, . . . , X_m] defined by ϕ(Y) = β_1 X_1 + · · · + β_m X_m. Let Tr : F_{q^m} → F_q be the trace map. This induces an F_q-linear map F_{q^m}[X_1, . . . , X_m] → F_q[X_1, . . . , X_m] that we also denote by Tr and which is defined by

Tr( ∑_i f_i X^i ) = ∑_i Tr(f_i) X^i,
where the multi-index notation X^i = X_1^{i_1} · · · X_m^{i_m} for i = (i_1, . . . , i_m) ∈ N_0^m is adopted. Define the F_q-linear map

T : F_{q^m}[Y] → F_q[X_1, . . . , X_m]

as the composition T = Tr ∘ ϕ. The trace map Tr : F_{q^m}^n → F_q^n is defined by Tr(a) = (Tr(a_1), . . . , Tr(a_n)). Consider the square of maps

              T
F_{q^m}[Y] --------> F_q[X_1, . . . , X_m]
    |                      |
 ev_α|                      |ev_P
    v                      v
 F_{q^m}^n --------> F_q^n
              Tr

We claim that this diagram commutes, that is, ev_P ∘ T = Tr ∘ ev_α. In order to show this it is sufficient that γY^h is mapped to the same element under the two maps for all γ ∈ F_{q^m} and h ∈ N_0, since the maps are F_q-linear and the γY^h generate F_{q^m}[Y] over F_q. Furthermore it is sufficient to show this for the evaluation maps ev_P : F_q[X_1, . . . , X_m] → F_q and ev_α : F_{q^m}[Y] → F_{q^m} for all points P ∈ F_q^m and elements α ∈ F_{q^m} such that P = (a_1, a_2, . . . , a_m) and α = ∑_{i=1}^m a_i β_i. Now

ev_P ∘ T(γY^h) = ev_P( Tr( γ(β_1 X_1 + · · · + β_m X_m)^h ) )
= ev_P( Tr( ∑_{i_1+···+i_m=h} \binom{h}{i_1, . . . , i_m} γ(β_1 X_1)^{i_1} · · · (β_m X_m)^{i_m} ) )
= ev_P( ∑_{i_1+···+i_m=h} \binom{h}{i_1, . . . , i_m} Tr(γβ_1^{i_1} · · · β_m^{i_m}) X_1^{i_1} · · · X_m^{i_m} )
= ∑_{i_1+···+i_m=h} \binom{h}{i_1, . . . , i_m} Tr(γβ_1^{i_1} · · · β_m^{i_m}) a_1^{i_1} · · · a_m^{i_m}
= Tr( ∑_{i_1+···+i_m=h} \binom{h}{i_1, . . . , i_m} γβ_1^{i_1} · · · β_m^{i_m} a_1^{i_1} · · · a_m^{i_m} )
= Tr( γ(β_1 a_1 + · · · + β_m a_m)^h ) = Tr(γα^h) = Tr(ev_α(γY^h)) = Tr ∘ ev_α(γY^h).

This shows the commutativity of the diagram.
2) Let h be an integer such that 0 ≤ h ≤ q^m − 1. Express h in radix-q form:

h = h_0 + h_1 q + h_2 q^2 + · · · + h_{m−1} q^{m−1}.
Define the weight of h as W(h) = h_0 + h_1 + h_2 + · · · + h_{m−1}. We show that for every f ∈ F_{q^m}[Y] there exists a polynomial g ∈ F_q[X_1, . . . , X_m] such that deg(g) ≤ W(h) and ev_P ∘ T(f) = ev_P(g). It is enough to show this for every f of the form f = γY^h, where γ ∈ F_{q^m} and h is an integer such that 0 ≤ h ≤ q^m − 1. Consider

ev_P ∘ T(γY^h) = ev_P ∘ T( γ Y^{∑_t h_t q^t} ) = ev_P ∘ T( γ ∏_{t=0}^{m−1} Y^{h_t q^t} ).

Expanding this expression, and using that a^{q^t} = a for all a ∈ F_q, gives

Tr( γ ∏_{t=0}^{m−1} ∑_{i_1+···+i_m=h_t} \binom{h_t}{i_1, . . . , i_m} (β_1^{i_1} · · · β_m^{i_m})^{q^t} a_1^{i_1} · · · a_m^{i_m} ).

Let

g = Tr( γ ∏_{t=0}^{m−1} ∑_{i_1+···+i_m=h_t} \binom{h_t}{i_1, . . . , i_m} (β_1^{i_1} · · · β_m^{i_m})^{q^t} X_1^{i_1} · · · X_m^{i_m} ).

Then this g has the desired properties: its degree is at most h_0 + h_1 + · · · + h_{m−1} = W(h).
3) A direct consequence of 1) and 2) is Tr(RM_{q^m}(h, 1)) ⊆ RM_q(W(h), m). We defined d = (ρ + 1)q^e, where ρ is the remainder after division of u^⊥ + 1 by q − 1 with quotient e, that is, u^⊥ + 1 = e(q − 1) + ρ, where ρ < q − 1. Then d − 1 is the smallest integer h such that W(h) = u^⊥ + 1, see [?, Theorem 5]. Hence W(h) ≤ u^⊥ for all integers h such that 0 ≤ h ≤ d − 2. Therefore Tr(RM_{q^m}(d − 2, 1)) ⊆ RM_q(u^⊥, m).
4) So RM_q(u, m) ⊆ (Tr(RM_{q^m}(d − 2, 1)))^⊥.
5) Let C be an F_{q^m}-linear code in F_{q^m}^n. The relation between the restriction C ∩ F_q^n and the trace code Tr(C) is given by Delsarte's theorem, see [?] and [?, Chap. 7, §8, Theorem 11]:

C ∩ F_q^n = (Tr(C^⊥))^⊥.

Applying this to 4) and using RM_{q^m}(n − d, 1) = RM_{q^m}(d − 2, 1)^⊥ gives RM_q(u, m) ⊆ RM_{q^m}(n − d, 1) ∩ F_q^n. Hence RM_q(u, m) is a subfield subcode of RM_{q^m}(n − d, 1).

***Alternative proof making use of the fact that RM is an extension of a restriction of a RS code, and use the duality properties of RS codes and dual(puncture)=shorten(dual)***
Example 8.4.5 The code RM_q(u, m) is not necessarily the restriction of RM_{q^m}(n − d, 1). The following example shows that the punctured Reed-Muller code is a proper subcode of the corresponding binary BCH code. Take q = 2, m = 6 and u = 3. Then u^⊥ = 2, e = 3 and ρ = 0. So d = 2^3 = 8. The punctured code RM_2(3, 6)^* has parameters [63, 42, 7]. The binary BCH code with zeros ζ^i for i ∈ {1, 2, 3, 4, 5, 6} has complete defining set the union of the sets

{1, 2, 4, 8, 16, 32}, {3, 6, 12, 24, 48, 33}, {5, 10, 20, 40, 17, 34}.

So the dimension of the BCH code is 63 − 3 · 6 = 45. Therefore the BCH code has parameters [63, 45, 7] and it has the punctured RM code as a subcode, but they are not equal. This is explained by the zero 9 = 1 + 2^3 having 2-weight equal to 2 ≤ u^⊥, whereas no element of the cyclotomic coset {9, 18, 36} of 9 is in the set {1, 2, 3, 4, 5, 6}. The BCH code is the binary restriction of RM_64(56, 1). Hence RM_2(3, 6)^* is a subcode of the binary restriction of RM_64(56, 1), but they are not equal.
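The numbers in this example are quick to reproduce. A sketch computing 2-cyclotomic cosets modulo 63 (the helper name is ours):

```python
def coset(s, n=63, q=2):
    # q-cyclotomic coset of s modulo n
    c, x = set(), s % n
    while x not in c:
        c.add(x)
        x = (x * q) % n
    return sorted(c)

print(coset(1), coset(3), coset(5))
defining = set(coset(1)) | set(coset(3)) | set(coset(5))
print(63 - len(defining))                           # dimension 45 of the BCH code
print(coset(9), set(coset(9)) & set(range(1, 7)))   # coset of 9 misses {1,...,6}
```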
8.4.3 Exercises
8.4.1 Show that the shift bound for RM_q(u, m)^* considered as a cyclic code is equal to the actual minimum distance.
8.5 Notes

Subfield subcodes of RS codes: McEliece-Solomon. Numerous applications of Reed-Solomon codes can be found in [135]. Twisted BCH codes by Edel. Folded RS codes by Guruswami. Stichtenoth-Wirtz. Cauchy and Srivastava codes: Roth-Seroussi and Dür. Proposition 8.3.19 is due to Carlitz [37]; see also [11, Exercise (3.3)]. Proposition 8.3.21 is a generalization of Retter [98].
Chapter 9

Algebraic decoding

Ruud Pellikaan and Xin-Wen Wu

*** intro ***
9.1 Error-correcting pairs

In this section we give an algebraic way, that is, by solving a system of linear equations, to compute the error positions of a received word with respect to Reed-Solomon codes. The complexity of this algorithm is O(n^3).
9.1.1 Decoding by error-correcting pairs

In Definition 7.4.9 we introduced the star product a ∗ b for a, b ∈ F_q^n by the coordinatewise multiplication a ∗ b = (a_1 b_1, . . . , a_n b_n).

Remark 9.1.1 Notice that multiplying polynomials first and then evaluating gives the same answer as first evaluating and then multiplying. That is, if f(X), g(X) ∈ F_q[X] and h(X) = f(X)g(X), then h(a) = f(a)g(a) for all a ∈ F_q. So ev(f(X)g(X)) = ev(f(X)) ∗ ev(g(X)) and ev_a(f(X)g(X)) = ev_a(f(X)) ∗ ev_a(g(X)) for the evaluation maps ev and ev_a.

Proposition 9.1.2 Let k + l ≤ n. Then

⟨GRS_k(a, b) ∗ GRS_l(a, c)⟩ = GRS_{k+l−1}(a, b ∗ c)

and

⟨RS_k(n, b) ∗ RS_l(n, c)⟩ = RS_{k+l−1}(n, b + c − 1) if n = q − 1.

Proof. Now GRS_k(a, b) = { ev_a(f(X)) ∗ b | f(X) ∈ F_q[X], deg f(X) < k } and similar statements hold for GRS_l(a, c) and GRS_{k+l−1}(a, b ∗ c). Furthermore

(ev_a(f(X)) ∗ b) ∗ (ev_a(g(X)) ∗ c) = ev_a(f(X)g(X)) ∗ b ∗ c;
and deg f(X)g(X) < k + l − 1 if deg f(X) < k and deg g(X) < l. Hence GRS_k(a, b) ∗ GRS_l(a, c) ⊆ GRS_{k+l−1}(a, b ∗ c). In general equality does not hold, but we have ⟨GRS_k(a, b) ∗ GRS_l(a, c)⟩ = GRS_{k+l−1}(a, b ∗ c), since on both sides the vector spaces are generated by the elements

(ev_a(X^i) ∗ b) ∗ (ev_a(X^j) ∗ c) = ev_a(X^{i+j}) ∗ b ∗ c, where 0 ≤ i < k and 0 ≤ j < l.

Let n = q − 1. Let α be a primitive element of F_q^*. Define a_j = α^{j−1} and b_j = a_j^{n−b+1} for j = 1, . . . , n. Then RS_k(n, b) = GRS_k(a, b) by Example 8.1.11. Similar statements hold for RS_l(n, c) and RS_{k+l−1}(n, b + c − 1). The statement concerning the star product of RS codes is now a consequence of the corresponding statement on the GRS codes.

Example 9.1.3 Let n = q − 1, k, l > 0 and k + l < n. Then RS_k(n, 1) is in one-to-one correspondence with polynomials of degree at most k − 1, and similar statements hold for RS_l(n, 1) and RS_{k+l−1}(n, 1). Now RS_k(n, 1) ∗ RS_l(n, 1) corresponds one-to-one with polynomials that are a product of two polynomials of degrees at most k − 1 and l − 1, respectively, that is, to reducible polynomials over F_q of degree at most k + l − 1. There exists an irreducible polynomial of degree k + l − 1, by Remark 7.2.20. Hence RS_k(n, 1) ∗ RS_l(n, 1) ≠ RS_{k+l−1}(n, 1).

Definition 9.1.4 Let A and B be linear subspaces of F_q^n. Let r ∈ F_q^n. Define the kernel

K(r) = { a ∈ A | (a ∗ b) · r = 0 for all b ∈ B }.

Definition 9.1.5 Let B^∨ be the space of all linear functions β : B → F_q. Now K(r) is a subspace of A, and it is the kernel of the linear map S_r : A → B^∨ defined by a ↦ β_a, where β_a(b) = (a ∗ b) · r. Let a_1, . . . , a_l and b_1, . . . , b_m be bases of A and B, respectively. Then the map S_r has the m × l syndrome matrix

( (b_i ∗ a_j) · r | 1 ≤ j ≤ l, 1 ≤ i ≤ m )

with respect to these bases.

Example 9.1.6 Let A = RS_{t+1}(n, 1) and B = RS_t(n, 0). Then A ∗ B is contained in RS_{2t}(n, 0) by Proposition 9.1.2. Let C = RS_{n−2t}(n, 1). Then C^⊥ = RS_{2t}(n, 0) by Proposition 8.1.2.
As g_{n,k}(X) = g_{0,k}(X) for n = q − 1, by the definition of Reed-Solomon codes we further have C^⊥ = RS_{2t}(n, 0). Hence A ∗ B ⊆ C^⊥. Let a_i = ev(X^{i−1}) for i = 1, . . . , t + 1, let b_j = ev(X^j) for j = 1, . . . , t, and let h_l = ev(X^l) for l = 1, . . . , 2t. Then a_1, . . . , a_{t+1} is a basis of A and b_1, . . . , b_t is a basis of B. The vectors h_1, . . . , h_{2t} form the rows of a parity check matrix H for C. Then a_i ∗ b_j = ev(X^{i+j−1}) = h_{i+j−1}. Let r be a received word and s = rH^T its syndrome. Then (b_i ∗ a_j) · r = s_{i+j−1}.
Hence to compute the kernel K(r) we have to compute the null space of the matrix of syndromes

( s_1    s_2      · · ·   s_t       s_{t+1} )
( s_2    s_3      · · ·   s_{t+1}   s_{t+2} )
(  ...                     ...              )
( s_t    s_{t+1}  · · ·   s_{2t−1}  s_{2t}  )

We have seen this matrix before as the coefficient matrix of the set of equations for the computation of the error-locator polynomial in the algorithm of APGZ 7.5.3.

Lemma 9.1.7 Let C be an F_q-linear code of length n. Let r be a received word with error vector e. If A ∗ B ⊆ C^⊥, then K(r) = K(e).

Proof. We have that r = c + e for some codeword c ∈ C. Now a ∗ b is a parity check for C, since A ∗ B ⊆ C^⊥. So (a ∗ b) · c = 0, and hence (a ∗ b) · r = (a ∗ b) · e for all a ∈ A and b ∈ B.

Let J be a subset of {1, . . . , n}. The subspace

A(J) = { a ∈ A | a_j = 0 for all j ∈ J }

was defined in 4.4.10.

Lemma 9.1.8 Let A ∗ B ⊆ C^⊥. Let e be the error vector of the received word r. If I = supp(e) = { i | e_i ≠ 0 }, then A(I) ⊆ K(r). If moreover d(B^⊥) > wt(e), then A(I) = K(r).

Proof. 1) Let a ∈ A(I). Then a_i = 0 for all i such that e_i ≠ 0, and therefore

(a ∗ b) · e = ∑_{e_i ≠ 0} a_i b_i e_i = 0
for all b ∈ B. So a ∈ K(e). But K(e) = K(r) by Lemma 9.1.7. Hence a ∈ K(r). Therefore A(I) ⊆ K(r).
2) Suppose moreover that d(B^⊥) > wt(e). Let a ∈ K(r); then a ∈ K(e) by Lemma 9.1.7. Hence (e ∗ a) · b = e · (a ∗ b) = 0 for all b ∈ B, giving e ∗ a ∈ B^⊥. Now wt(e ∗ a) ≤ wt(e) < d(B^⊥). So e ∗ a = 0, meaning that e_i a_i = 0 for all i. Hence a_i = 0 for all i such that e_i ≠ 0, that is, for all i ∈ I = supp(e). Hence a ∈ A(I). Therefore K(r) ⊆ A(I), and equality holds by (1).

Remark 9.1.9 Let I = supp(e) be the set of error positions. The set of zero coordinates of a ∈ A(I) contains the set of error positions by Lemma 9.1.8. For that reason the elements of A(I) are called error-locator vectors or functions. But the space A(I) is not known to the receiver. The space K(r) can be computed after receiving the word r. The equality A(I) = K(r) implies that all elements of K(r) are error-locator functions.

Let A ∗ B ⊆ C^⊥. The basic algorithm for the code C computes the kernel K(r)
for every received word r. If this kernel is nonzero, it takes a nonzero element a and determines the set J of zero positions of a. If d(B^⊥) > wt(e), where e is the error vector, then J contains the support of e by Lemma 9.1.8. If the set J is not too large, the error values are computed. Thus we have a basic algorithm for every pair (A, B) of subspaces of F_q^n such that A ∗ B ⊆ C^⊥. If A is small with respect to the number of errors, then K(r) = 0. If A is large, then B becomes small, which results in a large code B^⊥, and it will be difficult to meet the requirement d(B^⊥) > wt(e).

Definition 9.1.10 Let A, B and C be subspaces of F_q^n. Then (A, B) is called a t-error-correcting pair for C if the following conditions are satisfied:
1. A ∗ B ⊆ C^⊥,
2. dim(A) > t,
3. d(B^⊥) > t,
4. d(A) + d(C) > n.

Proposition 9.1.11 Let (A, B) be a t-error-correcting pair for C. Then the basic algorithm corrects t errors for the code C with complexity O(n^3).

Proof. The pair (A, B) is t-error-correcting for C, so A ∗ B ⊆ C^⊥ and the basic algorithm can be applied to decode C. If a received word r has at most t errors, then the error vector e with support I has size at most t and A(I) is not zero, since I imposes at most t linear conditions on A and the dimension of A is at least t + 1. Let a be a nonzero element of K(r). Let J = { j | a_j = 0 }. We assumed that d(B^⊥) > t. So K(r) = A(I) by Lemma 9.1.8. So a is an error-locator vector and J contains I. The weight of the vector a is at least d(A), so a has at most n − d(A) < d(C) zeros by (4) of Definition 9.1.10. Hence |J| < d(C), and Proposition 6.2.9 or 6.2.15 gives the error values. The complexity is that of solving systems of linear equations, that is, O(n^3).

We will show the existence of error-correcting pairs for (generalized) Reed-Solomon codes.

Proposition 9.1.12 The codes GRS_{n−2t}(a, b) and RS_{n−2t}(n, b) have t-error-correcting pairs.

Proof. Let C = GRS_{n−2t}(a, b). Then C^⊥ = GRS_{2t}(a, c) for some c by Proposition 8.1.21.
Let A = GRS_{t+1}(a, 1) and B = GRS_t(a, c). Then A ∗ B ⊆ C^⊥ by Proposition 9.1.2. The codes A, B and C have parameters [n, t + 1, n − t], [n, t, n − t + 1] and [n, n − 2t, 2t + 1], respectively, by Proposition 8.1.14. Furthermore B^⊥ has parameters [n, n − t, t + 1] by Corollary 3.2.7, and hence has minimum distance t + 1. Hence (A, B) is a t-error-correcting pair for C.
The code RS_{n−2t}(n, b) is of the form GRS_{n−2t}(a, b). Therefore the pair (RS_{t+1}(n, 1), RS_t(n, n − b + 1)) is a t-error-correcting pair for the code RS_{n−2t}(n, b).
Example 9.1.13 Choose α ∈ F_16 such that α^4 = α + 1 as primitive element of F_16. Let C = RS_11(15, 1). Let

r = (0, α^4, α^8, α^14, α, α^10, α^7, α^9, α^2, α^13, α^5, α^12, α^11, α^6, α^3)

be a received word with respect to the code C with 2 errors. We show how to find the transmitted codeword by means of the basic algorithm. The dual of C is equal to RS_4(15, 0). Hence RS_3(15, 1) ∗ RS_2(15, 0) is contained in RS_4(15, 0). Take A = RS_3(15, 1) and B = RS_2(15, 0). Then A is a [15, 3, 13] code, and the dual of B is RS_13(15, 1), which has minimum distance 3. Therefore (A, B) is a 2-error-correcting pair for C by Proposition 9.1.12. Let H = ( α^{ij} | 1 ≤ i ≤ 4, 0 ≤ j ≤ 14 ). Then H is a parity check matrix of C. The syndrome vector of r equals

(s_1, s_2, s_3, s_4) = rH^T = (α^10, 1, 1, α^10).

The space K(r) consists of the evaluations ev(a_0 + a_1 X + a_2 X^2) of all polynomials a_0 + a_1 X + a_2 X^2 such that (a_0, a_1, a_2)^T is in the null space of the matrix

( s_1 s_2 s_3 )   ( α^10  1  1    )   ( 1  0  1   )
( s_2 s_3 s_4 ) = ( 1     1  α^10 ) ∼ ( 0  1  α^5 ).

So K(r) = ⟨ev(1 + α^5 X + X^2)⟩. The polynomial 1 + α^5 X + X^2 has α^6 and α^9 as zeros. Hence the error positions are at the 7th and 10th coordinate. In order to compute the error values by Proposition 6.2.9 we have to find a linear combination of the 7th and 10th columns of H that equals the syndrome vector. The system

( α^6   α^9  | α^10 )
( α^12  α^3  | 1    )
( α^3   α^12 | 1    )
( α^9   α^6  | α^10 )

has (α^5, α^5)^T as unique solution. That is, the error vector e has e_7 = α^5, e_10 = α^5 and e_i = 0 for all i ∉ {7, 10}. Therefore the transmitted codeword is

c = r − e = (0, α^4, α^8, α^14, α, α^10, α^13, α^9, α^2, α^7, α^5, α^12, α^11, α^6, α^3).
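The computation of Example 9.1.13 can be replayed by machine. The sketch below builds F_16 with α^4 = α + 1, computes the syndromes of r, finds the kernel K(r) by brute force over F_16^3 instead of by row reduction, and reads off the error positions; all helper names are our own.

```python
from itertools import product

exp = [1] * 15                         # exp[i] = alpha^i in F_16, alpha^4 = alpha + 1
for i in range(1, 15):
    v = exp[i - 1] << 1
    if v & 0b10000:
        v ^= 0b10011                   # reduce by X^4 + X + 1
    exp[i] = v
log = {exp[i]: i for i in range(15)}

def mul(x, y):
    return 0 if 0 in (x, y) else exp[(log[x] + log[y]) % 15]

# received word of Example 9.1.13; coordinate j+1 sits at the point alpha^j
r = [0] + [exp[e] for e in (4, 8, 14, 1, 10, 7, 9, 2, 13, 5, 12, 11, 6, 3)]

# syndromes s_i = sum_j r_j alpha^(i*j), i = 1..4, from H = (alpha^(ij))
s = [None, 0, 0, 0, 0]
for i in range(1, 5):
    for j in range(15):
        s[i] ^= mul(r[j], exp[(i * j) % 15])

# K(r): null space of [[s1 s2 s3], [s2 s3 s4]], brute forced over F_16^3
kernel = [a for a in product(range(16), repeat=3) if any(a)
          and mul(s[1], a[0]) ^ mul(s[2], a[1]) ^ mul(s[3], a[2]) == 0
          and mul(s[2], a[0]) ^ mul(s[3], a[1]) ^ mul(s[4], a[2]) == 0]
a0, a1, a2 = kernel[0]

# error positions j+1 where alpha^j is a zero of a0 + a1*X + a2*X^2
positions = [j + 1 for j in range(15)
             if a0 ^ mul(a1, exp[j]) ^ mul(a2, exp[(2 * j) % 15]) == 0]
print(positions)                       # [7, 10]
```

The kernel consists of the 15 nonzero scalar multiples of one vector, confirming that K(r) is one-dimensional, and the locator polynomial vanishes exactly at the 7th and 10th coordinates.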
9.1.2 Existence of error-correcting pairs

Example 9.1.14 Let C be the binary cyclic code with defining set {1, 3, 7, 9} as in Examples 7.4.8 and 7.4.17. Then d(C) ≥ 7 by the Roos bound 7.4.16 with U = {0, 4, 12, 20} and V = {2, 3, 4}. ***This gives us an error-correcting pair***

Remark 9.1.15 The great similarity between the concept of an error-correcting pair and the techniques used by Van Lint and Wilson in the AB bound can be seen in the reformulation of the Roos bound in Remark 7.4.25. A special case of this reformulation is obtained if we take a = b = t.

Proposition 9.1.16 Let C be an F_q-linear code of length n. Let (A, B) be a pair of F_{q^m}-linear codes of length n such that the following properties hold: (1) (A ∗ B) ⊥ C, (2) k(A) > t, (3) d(B^⊥) > t, (4) d(A) + 2t > n and (5) d(A^⊥) > 1. Then d(C) ≥ 2t + 1 and (A, B) is a t-error-correcting pair for C.
CHAPTER 9. ALGEBRAIC DECODING
Proof. The conclusion on the minimum distance of C is explained in Remark 7.4.25. Conditions (1), (2) and (3) are the same as the ones in the definition of a t-error-correcting pair. Condition (4) in the proposition is stronger than in the definition, since d(A) + d(C) ≥ d(A) + 2t + 1 > d(A) + 2t > n.

Remark 9.1.17 As a consequence of this proposition there is an abundance of examples of codes C with minimum distance at least 2t + 1 that have a t-error-correcting pair. Take for instance A and B MDS codes with parameters [n, t + 1, n − t] and [n, t, n − t + 1], respectively. Then k(A) > t and d(B^⊥) > t, since B^⊥ is an [n, n − t, t + 1] code. Take C = (A ∗ B)^⊥. Then d(C) ≥ 2t + 1 and (A, B) is a t-error-correcting pair for C. The dimension of C is at least n − t(t + 1) and is most of the time equal to this lower bound.

Remark 9.1.18 For a given code C it is hard to find a t-error-correcting pair with t close to half the minimum distance. Generalized Reed-Solomon codes have this property, as we have seen in ??, and algebraic geometry codes do too, as we shall see in ??. We conjecture that if an [n, n − 2t, 2t + 1] MDS code has a t-error-correcting pair, then this code is a GRS code. This is proven in the cases t = 1 and t = 2.

Proposition 9.1.19 Let C be an F_q-linear code of length n and minimum distance d. Then C has a t-error-correcting pair if t ≤ (n − 1)/(n − d + 2).

Proof. There exists an m and an F_{q^m}-linear [n, n − d + 1, d] code D that contains C, by Corollary 4.3.25. Let t be a positive integer such that t ≤ (n − 1)/(n − d + 2). It is sufficient to show that D has a t-error-correcting pair. Let B be an [n, t, n − t + 1] code with the all-one vector in it. Such a code exists if m is sufficiently large. Then B^⊥ is an [n, n − t, t + 1] code. So d(B^⊥) > t. Take A = (B ∗ D)^⊥. Now A is contained in D^⊥, since the all-one vector is in B.
We have that D^⊥ is an [n, d − 1, n − d + 2] code, so d(A) ≥ d(D^⊥) = n − d + 2. Hence d(A) + d(D) > n. Let b_1, ..., b_t be a basis of B and d_1, ..., d_{n−d+1} be a basis of D. Then x ∈ A if and only if x · (b_i ∗ d_j) = 0 for all i = 1, ..., t and j = 1, ..., n − d + 1. This is a system of t(n − d + 1) homogeneous linear equations in n unknowns, and n − t(n − d + 1) ≥ t + 1 by assumption. Hence k(A) ≥ n − t(n − d + 1) > t. Therefore (A, B) is a t-error-correcting pair for D and a fortiori for C.
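The bound of Proposition 9.1.19 is easy to evaluate numerically. The following sketch (function name ours) computes the largest t that the proposition guarantees, together with the classical radius ⌊(d − 1)/2⌋ for comparison; it also re-checks the dimension count from the proof.

```python
# Sketch: the guarantee of Proposition 9.1.19 -- an F_q-linear code with
# parameters [n, k, d] has a t-error-correcting pair whenever
# t <= (n - 1)/(n - d + 2).

def ecp_bound(n, d):
    """Largest integer t with t <= (n - 1)/(n - d + 2)."""
    return (n - 1) // (n - d + 2)

for n, d in [(15, 5), (15, 11), (255, 33)]:
    t = ecp_bound(n, d)
    # the dimension count from the proof: k(A) >= n - t(n - d + 1) > t
    assert n - t * (n - d + 1) > t
    print(n, d, t, (d - 1) // 2)
```

The output illustrates the remark above: the guaranteed t is typically far below half the minimum distance.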
9.1.3
Exercises
9.1.1 Choose α ∈ F16 with α^4 = α + 1 as primitive element of F16. Let C = RS_11(15, 0). Let

r = (α, 0, α^11, α^10, α^5, α^13, α, α^8, α^5, α^10, α^4, α^4, α^2, 0, 0)

be a received word with respect to the code C with 2 errors. Find the transmitted codeword.

9.1.2 Consider the binary cyclic code of length 21 and defining set {0, 1, 3, 7}. This code has minimum distance 8. Give a 3-error-correcting pair for this code.

9.1.3 Consider the binary cyclic code of length 35 and defining set {1, 5, 7}. This code has minimum distance 7. Give a 3-error-correcting pair for this code.
9.2. DECODING BY KEY EQUATION
9.2
Decoding by key equation
In Section 7.5.5 we introduced the key equation. Now we introduce two algorithms which solve the key equation and thus decode cyclic codes efficiently.
9.2.1
Algorithm of Euclid-Sugiyama
In Section 7.5.5 we have seen that the decoding of a BCH code with designed minimum distance δ is reduced to the problem of finding a pair of polynomials (σ(Z), ω(Z)) satisfying the following key equation for a given syndrome polynomial S(Z) = Σ_{i=1}^{δ−1} S_i Z^{i−1},

σ(Z)S(Z) ≡ ω(Z)  (mod Z^{δ−1})
such that deg(σ(Z)) ≤ t = (δ − 1)/2 and deg(ω(Z)) ≤ deg(σ(Z)) − 1. Here σ(Z) = Σ_{i=1}^{t+1} σ_i Z^{i−1} is the error-locator polynomial and ω(Z) = Σ_{i=1}^{t} ω_i Z^{i−1} is the error-evaluator polynomial. Note that σ_1 = 1 by definition.

Given the key equation, the Euclid-Sugiyama algorithm (also called the Sugiyama algorithm in the literature) finds the error-locator and error-evaluator polynomials by an iterative procedure. This algorithm is based on the well-known Euclidean algorithm. To better understand the algorithm, we briefly review the Euclidean algorithm first. For a pair of univariate polynomials, say r_{−1}(Z) and r_0(Z), the Euclidean algorithm finds their greatest common divisor, which we denote by gcd(r_{−1}(Z), r_0(Z)). The Euclidean algorithm proceeds as follows:

r_{−1}(Z)  = q_1(Z)r_0(Z) + r_1(Z),        deg(r_1(Z)) < deg(r_0(Z))
r_0(Z)     = q_2(Z)r_1(Z) + r_2(Z),        deg(r_2(Z)) < deg(r_1(Z))
    ...
r_{s−2}(Z) = q_s(Z)r_{s−1}(Z) + r_s(Z),    deg(r_s(Z)) < deg(r_{s−1}(Z))
r_{s−1}(Z) = q_{s+1}(Z)r_s(Z).
In each iteration of the algorithm, the operation r_{j−2}(Z) = q_j(Z)r_{j−1}(Z) + r_j(Z), with deg(r_j(Z)) < deg(r_{j−1}(Z)), is implemented by division of polynomials, that is, dividing r_{j−2}(Z) by r_{j−1}(Z), with r_j(Z) being the remainder. The algorithm keeps running until it finds a remainder which is the zero polynomial. That is, the algorithm stops after it completes the s-th iteration, where s is the smallest j such that r_{j+1}(Z) = 0. It is easy to prove that r_s(Z) = gcd(r_{−1}(Z), r_0(Z)). We are now ready to present the Euclid-Sugiyama algorithm for solving the key equation.

Algorithm 9.2.1 (Euclid-Sugiyama Algorithm)
Input: r_{−1}(Z) = Z^{δ−1}, r_0(Z) = S(Z), U_{−1}(Z) = 0, and U_0(Z) = 1.
Proceed with the Euclidean algorithm for r_{−1}(Z) and r_0(Z), as presented above, until an r_s(Z) is reached such that

deg(r_{s−1}(Z)) ≥ (δ − 1)/2  and  deg(r_s(Z)) ≤ (δ − 3)/2.
Update U_j(Z) = q_j(Z)U_{j−1}(Z) + U_{j−2}(Z) in each iteration.
Output: the following pair of polynomials:

σ(Z) = ε U_s(Z),
ω(Z) = ε (−1)^s r_s(Z),

where the scalar ε is chosen such that σ_0 = σ(0) = 1. The error-locator and error-evaluator polynomials are then given by these σ(Z) and ω(Z). Note that the Euclid-Sugiyama algorithm does not have to run the Euclidean algorithm completely; it has a different stopping parameter s.

Example 9.2.2 Consider the code C given in Examples 7.5.13 and 7.5.21. It is a narrow-sense BCH code of length 15 over F16 of designed minimum distance δ = 5. Let r be the received word

r = (α^5, α^8, α^11, α^10, α^10, α^7, α^12, α^11, 1, α, α^12, α^14, α^12, α^2, 0).

Then S_1 = α^12, S_2 = α^7, S_3 = 0 and S_4 = α^2. So S(Z) = α^12 + α^7 Z + α^2 Z^3. Running the Euclid-Sugiyama algorithm with the input S(Z), the results for each iteration are given by the following table.
j | r_{j−1}(Z)             | r_j(Z)                  | U_{j−1}(Z) | U_j(Z)
0 | Z^4                    | α^2 Z^3 + α^7 Z + α^12  | 0          | 1
1 | α^2 Z^3 + α^7 Z + α^12 | α^5 Z^2 + α^10 Z        | 1          | α^13 Z
2 | α^5 Z^2 + α^10 Z       | α^2 Z + α^12            | α^13 Z     | α^10 Z^2 + Z + 1
Thus, we have found the error-locator polynomial σ(Z) = U_2(Z) = 1 + Z + α^10 Z^2, and the error-evaluator polynomial ω(Z) = r_2(Z) = α^12 + α^2 Z.
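The iteration in the table can be reproduced mechanically. The following sketch of the Euclid-Sugiyama loop uses our own integer encoding of F16 (α = 2, modulus X^4 + X + 1) and our own helper names; it runs on S(Z) = α^12 + α^7 Z + α^2 Z^3 with δ = 5.

```python
# Sketch: Euclid-Sugiyama over F16 (integers 0..15, alpha = 2, alpha^4 = alpha + 1).
# Polynomials are coefficient lists, lowest degree first.

MOD = 0b10011

def gf_mul(a, b):
    res = 0
    while b:
        if b & 1:
            res ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MOD
    return res

def gf_inv(a):
    for x in range(1, 16):
        if gf_mul(a, x) == 1:
            return x
    raise ZeroDivisionError("0 has no inverse")

def deg(p):
    d = len(p) - 1
    while d >= 0 and p[d] == 0:
        d -= 1
    return d

def poly_add(p, q):  # characteristic 2: addition and subtraction are XOR
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) ^ (q[i] if i < len(q) else 0) for i in range(n)]

def poly_mul(p, q):
    res = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            res[i + j] ^= gf_mul(a, b)
    return res

def poly_divmod(p, q):
    rem = p[:]
    quot = [0] * (max(0, deg(p) - deg(q)) + 1)
    while deg(rem) >= deg(q):
        shift = deg(rem) - deg(q)
        c = gf_mul(rem[deg(rem)], gf_inv(q[deg(q)]))
        quot[shift] = c
        rem = poly_add(rem, poly_mul([0] * shift + [c], q))
    return quot, rem

def euclid_sugiyama(S, delta):
    """Run the Euclidean algorithm on Z^(delta-1) and S(Z), stopping as soon as
    deg(r_s) <= (delta - 3)/2; returns (U_s, r_s), i.e. (sigma, omega) up to
    the normalization sigma(0) = 1."""
    r_prev, r_cur = [0] * (delta - 1) + [1], S[:]
    u_prev, u_cur = [0], [1]
    while deg(r_cur) > (delta - 3) // 2:
        q, rem = poly_divmod(r_prev, r_cur)
        r_prev, r_cur = r_cur, rem
        u_prev, u_cur = u_cur, poly_add(poly_mul(q, u_cur), u_prev)
    return u_cur, r_cur

# S(Z) = alpha^12 + alpha^7 Z + alpha^2 Z^3 from Example 9.2.2
sigma, omega = euclid_sugiyama([15, 11, 0, 4], 5)
print(sigma)      # [1, 1, 7]  =  1 + Z + alpha^10 Z^2
print(omega[:2])  # [15, 4]    =  alpha^12 + alpha^2 Z
```

The intermediate remainders agree with the r_j column of the table above, and the output matches the σ and ω found in the example.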
9.2.2
Algorithm of Berlekamp-Massey
Consider again the following key equation:

σ(Z)S(Z) ≡ ω(Z)  (mod Z^{δ−1})
such that deg(σ(Z)) ≤ t = (δ − 1)/2 and deg(ω(Z)) ≤ deg(σ(Z)) − 1, where S(Z) = Σ_{i=1}^{δ−1} S_i Z^{i−1} is given. It is easy to show that the problem of solving the key equation is equivalent to the problem of solving the following matrix equation with unknown (σ_2, ..., σ_{t+1})^T:
( S_t       S_{t−1}   ...  S_1 ) ( σ_2     )     ( S_{t+1} )
( S_{t+1}   S_t       ...  S_2 ) ( σ_3     ) = − ( S_{t+2} )
(   ...       ...          ... ) (  ...    )     (   ...   )
( S_{2t−1}  S_{2t−2}  ...  S_t ) ( σ_{t+1} )     ( S_{2t}  )

The Berlekamp-Massey algorithm, which we will introduce in this section, can solve this matrix equation by finding σ_2, ..., σ_{t+1} for the following recursion:

S_i = − Σ_{j=2}^{t+1} σ_j S_{i−j+1},   i = t + 1, ..., 2t.
We should point out that the Berlekamp-Massey algorithm actually solves a more general problem: for a given sequence E_0, E_1, E_2, ..., E_{N−1} of length N (which we denote by E in the rest of the section), it finds the recursion

E_i = − Σ_{j=1}^{L} Λ_j E_{i−j},   i = L, ..., N − 1,

for which L is smallest. If the matrix equation has no solution, the Berlekamp-Massey algorithm then finds a recursion with L > t. To make it more convenient to present the Berlekamp-Massey algorithm and to prove its correctness, we denote Λ(Z) = Σ_{i=0}^{L} Λ_i Z^i with Λ_0 = 1. The above recursion is denoted by (Λ(Z), L), and L = deg(Λ(Z)) is called the length of the recursion.

The Berlekamp-Massey algorithm is an iterative procedure for finding the shortest recursion for producing successive terms of the sequence E. The r-th iteration of the algorithm finds the shortest recursion (Λ^(r)(Z), L_r), where L_r = deg(Λ^(r)(Z)), for producing the first r terms of the sequence E, that is,

E_i = − Σ_{j=1}^{L_r} Λ_j^(r) E_{i−j},   i = L_r, ..., r − 1,

or equivalently,

Σ_{j=0}^{L_r} Λ_j^(r) E_{i−j} = 0,   i = L_r, ..., r − 1,

with Λ_0^(r) = 1.

Algorithm 9.2.3 (Berlekamp-Massey Algorithm)
(Initialization) r = 0, Λ(Z) = B(Z) = 1, L = 0, λ = 1, and b = 1.
1) If r = N, stop. Otherwise, compute ∆ = Σ_{j=0}^{L} Λ_j E_{r−j}.
2) If ∆ = 0, then λ ← λ + 1, and go to 5).
3) If ∆ ≠ 0 and 2L > r, then
   Λ(Z) ← Λ(Z) − ∆ b^{−1} Z^λ B(Z)
   λ ← λ + 1
   and go to 5).
4) If ∆ ≠ 0 and 2L ≤ r, then
   T(Z) ← Λ(Z)  (temporary storage of Λ(Z))
   Λ(Z) ← Λ(Z) − ∆ b^{−1} Z^λ B(Z)
   L ← r + 1 − L
   B(Z) ← T(Z)
   b ← ∆
   λ ← 1
5) r ← r + 1 and return to 1).
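The five steps above can be transcribed almost verbatim into code. The following sketch (our own integer encoding of F16 with α = 2 and modulus X^4 + X + 1; all names ours) runs the algorithm on sequences over F16; since the characteristic is 2, the subtraction in steps 3) and 4) is an XOR.

```python
# Sketch: Berlekamp-Massey (Algorithm 9.2.3) over F16, coded as integers 0..15
# with alpha = 2 and alpha^4 = alpha + 1. Polynomials: coefficient lists, low first.

MOD = 0b10011

def gf_mul(a, b):
    res = 0
    while b:
        if b & 1:
            res ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MOD
    return res

def gf_inv(a):
    for x in range(1, 16):
        if gf_mul(a, x) == 1:
            return x
    raise ZeroDivisionError("0 has no inverse")

def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) ^ (q[i] if i < len(q) else 0) for i in range(n)]

def berlekamp_massey(E):
    """Shortest recursion (Lambda(Z), L) with sum_{j=0}^{L} Lambda_j E_{i-j} = 0
    for i = L, ..., N-1, following the five steps of Algorithm 9.2.3."""
    Lam, B = [1], [1]
    L, lam, b = 0, 1, 1
    for r in range(len(E)):
        # step 1: discrepancy Delta = sum_{j=0}^{L} Lambda_j E_{r-j}
        delta = 0
        for j in range(min(L, len(Lam) - 1, r) + 1):
            delta ^= gf_mul(Lam[j], E[r - j])
        if delta == 0:                      # step 2
            lam += 1
            continue
        update = [0] * lam + [gf_mul(gf_mul(delta, gf_inv(b)), c) for c in B]
        if 2 * L > r:                       # step 3
            Lam = poly_add(Lam, update)
            lam += 1
        else:                               # step 4
            T = Lam[:]
            Lam = poly_add(Lam, update)
            L, B, b, lam = r + 1 - L, T, delta, 1
    return Lam, L

# Syndromes of Example 9.2.4 below: E = (alpha^12, alpha^7, 0, alpha^2)
Lam, L = berlekamp_massey([15, 11, 0, 4])
print(Lam, L)  # [1, 1, 7] 2, i.e. Lambda(Z) = 1 + Z + alpha^10 Z^2
```

Run on the syndrome sequence of the next example, this reproduces the error-locator polynomial found there.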
Example 9.2.4 Consider again the code C given in Example 9.2.2. Let r be the received word

r = (α^5, α^8, α^11, α^10, α^10, α^7, α^12, α^11, 1, α, α^12, α^14, α^12, α^2, 0).

Then S_1 = α^12, S_2 = α^7, S_3 = 0 and S_4 = α^2. Now let us compute the error-locator polynomial σ(Z) by using the Berlekamp-Massey algorithm. Letting E_i = S_{i+1} for i = 0, 1, 2, 3, we have the sequence E = {E_0, E_1, E_2, E_3} = {α^12, α^7, 0, α^2} as the input of the algorithm. The intermediate and final results of the algorithm are given in the following table.

r | ∆    | B(Z)       | Λ(Z)                 | L
0 | α^12 | 1          | 1                    | 0
1 | 1    | 1          | 1 + α^12 Z           | 1
2 | α^2  | 1          | 1 + α^10 Z           | 1
3 | α^7  | 1 + α^10 Z | 1 + α^10 Z + α^5 Z^2 | 2
4 | 0    | 1 + α^10 Z | 1 + Z + α^10 Z^2     | 2

The result of the last iteration of the Berlekamp-Massey algorithm, Λ(Z), is the error-locator polynomial. That is,

σ(Z) = σ_1 + σ_2 Z + σ_3 Z^2 = Λ(Z) = Λ_0 + Λ_1 Z + Λ_2 Z^2 = 1 + Z + α^10 Z^2.

Substituting this into the key equation, we then get ω(Z) = α^12 + α^2 Z.
9.3. LIST DECODING BY SUDAN’S ALGORITHM
9.2.3
Exercises
9.2.1 Take α ∈ F16^* with α^4 = 1 + α as primitive element. Let C be the BCH code over F16, of length 15 and designed minimum distance 5, with defining set {1, 2, 3, 4, 6, 8, 9, 12}. The generator polynomial is 1 + X^4 + X^6 + X^7 + X^8 (see Example 7.3.13). Let r = (0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0) be a received word with respect to the code C. Find the syndrome polynomial S(Z). Write the key equation.

9.2.2 Consider the same code and the same received word given in the last exercise. Using the Berlekamp-Massey algorithm, compute the error-locator polynomial. Determine the number of errors that occurred in the received word.

9.2.3 For the same code and the same received word given in the previous exercises, using the Euclid-Sugiyama algorithm, compute the error-locator and error-evaluator polynomials. Find the codeword which is closest to the received word.

9.2.4 Let α ∈ F16^* with α^4 = 1 + α as in Exercise 9.2.1. For the following sequence E over F16, using the Berlekamp-Massey algorithm, find the shortest recursion for producing successive terms of E: E = {α^12, 1, α^14, α^13, 1, α^11}.

9.2.5 Consider the [15, 9, 7] Reed-Solomon code over F16 with defining set {1, 2, 3, 4, 5, 6}. Suppose the received word is r = (0, 0, α^11, 0, 0, α^5, 0, α, 0, 0, 0, 0, 0, 0, 0). Using the Berlekamp-Massey algorithm, find the codeword which is closest to the received word.
9.3
List decoding by Sudan’s algorithm
A decoding algorithm is efficient if its complexity is bounded above by a polynomial in the code length. Brute-force decoding is not efficient, because for a received word it may need to compare q^k codewords to return the most appropriate codeword. The idea behind list decoding is that instead of returning a unique codeword, the list decoder returns a small list of codewords. A list-decoding algorithm is efficient if both the complexity and the size of the output list of the algorithm are bounded above by polynomials in the code length. List decoding was first introduced by Elias and Wozencraft in the 1950s.

We now describe a list decoder more precisely. Suppose C is a q-ary [n, k, d] code and t ≤ n is a positive integer. For any received word r = (r_1, ..., r_n) ∈ F_q^n, we refer to any codeword c in C satisfying d(c, r) ≤ t as a t-consistent codeword. Let l be a positive integer less than or equal to q^k. The code C is called (t, l)-decodable if for any word r ∈ F_q^n the number of t-consistent codewords is at most l. If for any received word a list decoder can find all the t-consistent codewords, and the output list has at most l codewords, then the decoder is called a (t, l)-list decoder.

In 1997, Sudan proposed the first efficient list-decoding algorithm for Reed-Solomon codes. Later, Sudan's list-decoding algorithm was generalized to decoding algebraic-geometric codes and Reed-Muller codes.
9.3.1
Error-correcting capacity
Suppose a decoding algorithm can find all the t-consistent codewords for any received word. We call t the error-correcting capacity or decoding radius of the decoding algorithm. As we have seen in Section ??, for any [n, k, d] code, if t ≤ ⌊(d − 1)/2⌋, then there is only one t-consistent codeword for any received word. In other words, any [n, k, d] code is (⌊(d − 1)/2⌋, 1)-decodable. The decoding algorithms in the previous sections return a unique codeword for any received word, and they achieve an error-correcting capability less than or equal to ⌊(d − 1)/2⌋. List decoding achieves a decoding radius greater than ⌊(d − 1)/2⌋, while the size of the output list must be bounded above by a polynomial in n.

It is natural to ask the following question: for an [n, k, d] linear code C over F_q, what is the maximal value t such that C is (t, l)-decodable for an l which is bounded above by a polynomial in n? In the following, we give a lower bound on the maximum t such that C is (t, l)-decodable, which is called the Johnson bound in the literature.

Proposition 9.3.1 Let C ⊆ F_q^n be any linear code of minimum distance d = (1 − 1/q)(1 − β)n for 0 < β < 1. Let t = (1 − 1/q)(1 − γ)n for 0 < γ < 1. Then for any word r ∈ F_q^n,

|B_t(r) ∩ C| ≤ min{ n(q − 1), (1 − β)/(γ^2 − β) }   when γ > √β,
|B_t(r) ∩ C| ≤ 2n(q − 1) − 1                        when γ = √β,

where B_t(r) = {x ∈ F_q^n | d(x, r) ≤ t} is the Hamming ball of radius t around r. We will prove this proposition later. We are now ready to state the Johnson bound.

Theorem 9.3.2 Any linear code C ⊆ F_q^n of relative minimum distance δ = d/n is (t, l(n))-decodable with l(n) bounded above by a linear function in n, provided that

t/n ≤ (1 − 1/q)(1 − √(1 − (q/(q − 1))δ)).

Proof. For any received word r ∈ F_q^n, the set of t-consistent codewords is {c ∈ C | d(c, r) ≤ t} = B_t(r) ∩ C. Let β be a positive real number with β < 1. Denote d = (1 − 1/q)(1 − β)n. Let t = (1 − 1/q)(1 − γ)n for some 0 < γ < 1. Suppose

t/n ≤ (1 − 1/q)(1 − √(1 − (q/(q − 1))δ)).

Then γ ≥ √(1 − (q/(q − 1)) · d/n) = √β. By Proposition 9.3.1, the number of t-consistent codewords, l(n), which is |B_t(r) ∩ C|, is bounded above by a polynomial in n; here q is viewed as a constant.

Remark 9.3.3 The classical error-correcting capability is t = ⌊(d − 1)/2⌋. For a linear [n, k] code of minimum distance d, we have d ≤ n − k + 1 (note that for Reed-Solomon codes, d = n − k + 1). Thus, the normalized capability is

τ = t/n ≤ (1/2) · (n − k)/n ≈ 1/2 − κ/2,
where κ is the code rate. Let us compare this with the Johnson bound. From Theorem 9.3.2 and d ≤ n − k + 1, the Johnson bound is

(1 − 1/q)(1 − √(1 − (q/(q − 1))δ)) ≤ (1 − 1/q)(1 − √(1 − (q/(q − 1))(1 − k/n + 1/n))) ≈ 1 − √κ

for large n and large q. A comparison is given in Figure 9.1.
Figure 9.1: Classical error-correcting capability vs. the Johnson bound.
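The comparison in Figure 9.1 can also be made numerically. The sketch below (function names ours) evaluates the classical radius ⌊(d − 1)/2⌋ and the radius permitted by Theorem 9.3.2 for a sample parameter set.

```python
import math

def classical_radius(n, d):
    """Unique-decoding radius floor((d - 1)/2)."""
    return (d - 1) // 2

def johnson_radius(n, d, q):
    """Largest integer t with t/n <= (1 - 1/q)(1 - sqrt(1 - (q/(q-1)) d/n)),
    the bound of Theorem 9.3.2."""
    delta = d / n
    bound = (1 - 1 / q) * (1 - math.sqrt(max(0.0, 1 - q / (q - 1) * delta)))
    return math.floor(n * bound)

# A [15, 9, 7] Reed-Solomon code over F16: the Johnson radius exceeds
# the classical radius by one error.
n, d, q = 15, 7, 16
print(classical_radius(n, d), johnson_radius(n, d, q))  # 3 4
```

For these parameters the Johnson radius also agrees with the Reed-Solomon list-decoding radius n − √(n(k − 1)) derived later in this section.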
To prove Proposition 9.3.1, we need the following lemma.

Lemma 9.3.4 Let v_1, ..., v_m be m nonzero vectors in the real N-dimensional space R^N satisfying v_i · v_j ≤ 0 for every pair of distinct vectors. Then we have the following upper bounds on m:
(1) m ≤ 2N.
(2) If there exists a nonzero vector u ∈ R^N such that u · v_i ≥ 0 for all i = 1, ..., m, then m ≤ 2N − 1.
(3) If there exists a u ∈ R^N such that u · v_i > 0 for all i = 1, ..., m, then m ≤ N.

Proof. It is clear that (1) follows from (2): suppose (2) is true; by viewing −v_1 as u, the conditions of (2) are all satisfied for the vectors v_2, ..., v_m. Thus, we have m − 1 ≤ 2N − 1, that is, m ≤ 2N.
To prove (2), we use induction on N. When N = 1, it is obvious that m ≤ 2N − 1 = 1: otherwise there would be nonzero real numbers u, v_1 and v_2 such that u·v_1 > 0 and u·v_2 > 0, but v_1·v_2 < 0, which is impossible. Now consider N > 1. We may assume that m ≥ N + 1 (because if m ≤ N, the result m ≤ 2N − 1 already holds). As the vectors v_1, ..., v_m are all in R^N, they must be linearly dependent. Let S ⊆ {1, 2, ..., m} be a nonempty set of minimum size for which there is a relation Σ_{i∈S} a_i v_i = 0 with all a_i ≠ 0. We claim that the a_i must all be positive or all be negative. Indeed, if not, we collect the terms with positive a_i on one side and the terms with negative a_i on the other. We then get an equation Σ_{i∈S+} a_i v_i = Σ_{j∈S−} b_j v_j (which we denote by w) with the a_i and b_j all positive, where S+ and S− are disjoint nonempty sets and S+ ∪ S− = S. By the minimality of S, w ≠ 0. Thus, the inner product w · w ≠ 0. On the other hand, w · w = (Σ_{i∈S+} a_i v_i) · (Σ_{j∈S−} b_j v_j) = Σ_{i,j} (a_i b_j)(v_i · v_j) ≤ 0, since a_i b_j > 0 and v_i · v_j ≤ 0. This contradiction shows that the a_i must all be positive or all be negative. Consequently, we may assume that a_i > 0 for all i ∈ S (otherwise, we can take a'_i = −a_i for a relation Σ_{i∈S} a'_i v_i = 0).

Without loss of generality, we assume that S = {1, 2, ..., s}. By the linear dependence Σ_{i=1}^s a_i v_i = 0 with each a_i > 0 and the minimality of S, the vectors v_1, ..., v_s must span a subspace V of R^N of dimension s − 1. Now, for l = s + 1, ..., m, we have Σ_{i=1}^s a_i (v_i · v_l) = 0, as Σ_{i=1}^s a_i v_i = 0. Since a_i > 0 for 1 ≤ i ≤ s and all v_i · v_l ≤ 0, we have that v_i is orthogonal to v_l for all i, l with 1 ≤ i ≤ s and s < l ≤ m. Similarly, we can prove that u is orthogonal to v_i for i = 1, ..., s. Therefore, the vectors v_{s+1}, ..., v_m and u are all in the orthogonal complement V^⊥, which has dimension N − s + 1.
As s > 1, applying the induction hypothesis to these vectors, we have m − s ≤ 2(N − s + 1) − 1. Thus, we have m ≤ 2N − s + 1 ≤ 2N − 1.

Now we prove (3). Suppose the result is not true, that is, m ≥ N + 1. As above, v_1, ..., v_m must be linearly dependent in R^N. Let S ⊆ {1, 2, ..., m} be a nonempty set of minimum size for which there is a relation Σ_{i∈S} a_i v_i = 0 with all a_i ≠ 0. Again, we can assume that a_i > 0 for all i ∈ S. From this, we have Σ_{i∈S} a_i (u · v_i) = 0. But this is impossible, since for each i we have a_i > 0 and u · v_i > 0. This contradiction shows m ≤ N.

Now we are ready to prove Proposition 9.3.1.

Proof of Proposition 9.3.1. We identify vectors in F_q^n with vectors in R^{qn} in the following way. First, we fix an ordering of the elements of F_q and denote the elements as α_1, α_2, ..., α_q. Denote by ord(β) the order of the element β ∈ F_q under this ordering; for example, ord(β) = i if β = α_i. Then each element α_i (1 ≤ i ≤ q) corresponds to the real unit vector of length q with a 1 in position i and 0 elsewhere. Without loss of generality, we assume that r = (α_q, α_q, ..., α_q). Denote by c_1, c_2, ..., c_m all the codewords of C that are in the Hamming ball B_t(r), where t = (1 − 1/q)(1 − γ)n for some 0 < γ < 1. We view each vector in R^{qn} as having n blocks, each having q components, where the n blocks correspond to the n positions of the vectors in F_q^n. For each l = 1, ..., q, denote by e_l the unit vector of length q with 1 in the l-th position and 0 elsewhere. For i = 1, ..., m, the vector in R^{qn} associated with
the codeword c_i, which we denote by d_i, has in its j-th block the components of the vector e_{ord(c_i[j])}, where c_i[j] is the j-th component of c_i. The vector in R^{qn} associated with the word r ∈ F_q^n, which we denote by s, is defined similarly. Let 1 ∈ R^{qn} be the all-ones vector. We define v = λs + ((1 − λ)/q)·1 for a 0 ≤ λ ≤ 1 that will be specified later. We observe that the d_i and v all lie in the intersection of the hyperplanes

P'_j = { x ∈ R^{qn} | Σ_{l=1}^{q} x_{j,l} = 1 },   j = 1, ..., n.

This fact implies that the vectors d_i − v, for i = 1, ..., m, all lie in P = ∩_{j=1}^{n} P_j, where P_j = { x ∈ R^{qn} | Σ_{l=1}^{q} x_{j,l} = 0 }. As P is an n(q − 1)-dimensional subspace of R^{qn}, the vectors d_i − v, for i = 1, ..., m, all lie in an n(q − 1)-dimensional space. We will set the parameter λ so that the n(q − 1)-dimensional vectors d_i − v, i = 1, ..., m, have all pairwise inner products less than 0.

For i = 1, ..., m, let t_i = d(c_i, r). Then t_i ≤ t for every i, and

d_i · v = λ(d_i · s) + ((1 − λ)/q)(d_i · 1) = λ(n − t_i) + (1 − λ)n/q ≥ λ(n − t) + (1 − λ)n/q,   (9.1)

v · v = λ^2 n + 2(1 − λ)λ n/q + (1 − λ)^2 n/q = n/q + λ^2 (1 − 1/q)n,   (9.2)

d_i · d_j = n − d(c_i, c_j) ≤ n − d,   (9.3)

which implies that for i ≠ j,

(d_i − v) · (d_j − v) ≤ 2λt − d + (1 − 1/q)(1 − λ)^2 n.   (9.4)

Substituting t = (1 − 1/q)(1 − γ)n and d = (1 − 1/q)(1 − β)n into the above inequality, we have

(d_i − v) · (d_j − v) ≤ (1 − 1/q) n (β + λ^2 − 2λγ).   (9.5)

Thus, if γ > (1/2)(β/λ + λ), all pairwise inner products are negative. We pick λ to minimize (β/λ + λ) by setting λ = √β. Now when γ > √β, we have (d_i − v) · (d_j − v) < 0 for i ≠ j.
9.3.2
Sudan’s algorithm
The algorithm of Sudan is applicable to Reed-Solomon codes, Reed-Muller codes, algebraic-geometric codes, and some other families of codes. In this subsection, we give a general description of the algorithm of Sudan. Consider the following linear code:

C = { (f(P_1), f(P_2), ..., f(P_n)) | f ∈ F_q[X_1, ..., X_m] and deg(f) < k },

where P_i = (x_{i1}, ..., x_{im}) ∈ F_q^m for i = 1, ..., n, and n ≤ q^m. Note that when m = 1, the code is a Reed-Solomon code or an extended Reed-Solomon code; when m ≥ 2, it is a Reed-Muller code.
In the following algorithm and discussions, to simplify the statement we denote (i_1, ..., i_m) by i, X_1^{i_1} ··· X_m^{i_m} by X^i, H(X_1 + x_1, ..., X_m + x_m, Y + y) by H(X + x, Y + y), the product of binomial coefficients C(j_1, i_1) ··· C(j_m, i_m) by C(j, i), and so on.

Algorithm 9.3.5 (The Algorithm of Sudan for List Decoding)
INPUT: The following parameters and a received word:
• Code length n and the integer k;
• n points in F_q^m, namely, P_i := (x_{i1}, ..., x_{im}) ∈ F_q^m, i = 1, ..., n;
• Received word r = (y_1, ..., y_n) ∈ F_q^n;
• Desired error-correcting radius t.
Step 0: Compute parameters r, s satisfying certain conditions that we will give for specific families of codes in the following subsections.
Step 1: Find a nonzero polynomial H(X, Y) = H(X_1, ..., X_m, Y) such that
• the (1, ..., 1, k − 1)-weighted degree of H(X_1, ..., X_m, Y) is at most s;
• for i = 1, ..., n, each point (x_i, y_i) = (x_{i1}, ..., x_{im}, y_i) ∈ F_q^{m+1} is a zero of H(X, Y) of multiplicity r.
Step 2: Find all the Y-roots of H(X, Y) of degree less than k, namely, f = f(X_1, ..., X_m) with deg(f) < k such that H(X, f) is the zero polynomial. For each such root, check whether f(P_i) = y_i for at least n − t values of i ∈ {1, ..., n}. If so, include f in the output list.

As we will see later, for an appropriately selected parameter t, the algorithm of Sudan can return a list containing all the t-consistent codewords in polynomial time, with the size of the output list bounded above by a polynomial in the code length. So far, the best known record for the error-correcting radius of list decoding by Sudan's algorithm is the Johnson bound. In order to achieve this bound, prior to the actual decoding procedure, that is, Steps 1 and 2 of the algorithm above, a pair of integers r and s should be carefully chosen; they will be used to find an appropriate polynomial H(X, Y). The parameters r and s are independent of the received word: once they are determined, they are used in the decoding procedure for every received word.

The actual decoding procedure consists of two steps: interpolation and root finding. By the interpolation procedure, a nonzero polynomial H(X, Y) is found. This polynomial contains all the polynomials which define the t-consistent codewords among its Y-roots. A Y-root of H(X, Y) is a polynomial f(X) such that H(X, f(X)) is the zero polynomial. The root-finding procedure finds and returns all these Y-roots; thus all the t-consistent codewords are found.

We now explain the terms weighted degree and multiplicity of a zero of a polynomial, which we have used in the algorithm. Given integers a_1, a_2, ..., a_l, the
(a_1, a_2, ..., a_l)-weighted degree of a monomial αX_1^{d_1} X_2^{d_2} ··· X_l^{d_l} (where α is the coefficient of the monomial) is a_1 d_1 + a_2 d_2 + ... + a_l d_l. The (a_1, a_2, ..., a_l)-weighted degree of a polynomial P(X_1, X_2, ..., X_l) is the maximal (a_1, a_2, ..., a_l)-weighted degree of its terms.

For a polynomial P(X) = α_0 + α_1 X + α_2 X^2 + ... + α_d X^d, it is clear that 0 is a zero of P(X), i.e., P(0) = 0, if and only if α_0 = 0. We say 0 is a zero of multiplicity r of P(X) provided that α_0 = α_1 = ... = α_{r−1} = 0 and α_r ≠ 0. For a nonzero value β, it is a zero of multiplicity r of P(X) provided that 0 is a zero of multiplicity r of P(X + β). Similarly, for a multivariate polynomial P(X_1, X_2, ..., X_l) = Σ α_{i_1,i_2,...,i_l} X_1^{i_1} X_2^{i_2} ··· X_l^{i_l}, the point (0, 0, ..., 0) is a zero of multiplicity r of this polynomial if and only if α_{i_1,i_2,...,i_l} = 0 for all (i_1, i_2, ..., i_l) with i_1 + i_2 + ... + i_l ≤ r − 1, and there exists (i_1, i_2, ..., i_l) with i_1 + i_2 + ... + i_l = r such that α_{i_1,i_2,...,i_l} ≠ 0. A point (β_1, β_2, ..., β_l) is a zero of multiplicity r of P(X_1, X_2, ..., X_l) provided that (0, 0, ..., 0) is a zero of multiplicity r of P(X_1 + β_1, X_2 + β_2, ..., X_l + β_l).

Now we consider what Step 1 of Algorithm 9.3.5 seeks. Suppose H(X, Y) = Σ α_{i,i_{m+1}} X^i Y^{i_{m+1}} is a nonzero polynomial in X_1, ..., X_m, Y. It is easy to prove (we leave the proof to the reader as an exercise) that, for x_i = (x_{i1}, ..., x_{im}) and y_i,

H(X + x_i, Y + y_i) = Σ_{j,j_{m+1}} β_{j,j_{m+1}} X^j Y^{j_{m+1}},

where

β_{j,j_{m+1}} = Σ_{j'_1 ≥ j_1} ··· Σ_{j'_m ≥ j_m} Σ_{j'_{m+1} ≥ j_{m+1}} C(j', j) C(j'_{m+1}, j_{m+1}) α_{j',j'_{m+1}} x_i^{j'−j} y_i^{j'_{m+1}−j_{m+1}}.

Step 1 of the algorithm seeks a nonzero polynomial H(X, Y) such that its (1, ..., 1, k − 1)-weighted degree is at most s, and for i = 1, ..., n, each (x_i, y_i) is a zero of H(X, Y) of multiplicity r. Based on the discussion above, this can be done by solving a system consisting of the following homogeneous linear equations in the unknowns α_{i,i_{m+1}} (which are the coefficients of H(X, Y)):

Σ_{j'_1 ≥ j_1} ··· Σ_{j'_m ≥ j_m} Σ_{j'_{m+1} ≥ j_{m+1}} C(j', j) C(j'_{m+1}, j_{m+1}) α_{j',j'_{m+1}} x_i^{j'−j} y_i^{j'_{m+1}−j_{m+1}} = 0

for all i = 1, ..., n and every j_1, ..., j_m, j_{m+1} ≥ 0 with j_1 + ··· + j_m + j_{m+1} ≤ r − 1; and α_{i,i_{m+1}} = 0 for every i_1, ..., i_m, i_{m+1} ≥ 0 with i_1 + ··· + i_m + (k − 1)i_{m+1} ≥ s + 1.
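Both notions used in the constraints, the weighted degree and the multiplicity of a zero, are straightforward to compute for a polynomial given by its nonzero terms. The following sketch (representation and function names ours) treats the bivariate case m = 1 relevant to Reed-Solomon codes, where the weights are (1, k − 1).

```python
# Sketch: weighted degree and multiplicity at a point, for a bivariate
# polynomial stored as a dict {(i, j): coefficient} of its nonzero terms.

def weighted_degree(terms, wx, wy):
    """(wx, wy)-weighted degree: max of wx*i + wy*j over nonzero terms X^i Y^j."""
    return max(wx * i + wy * j for (i, j), c in terms.items() if c != 0)

def multiplicity_at_origin(terms):
    """Largest r such that every nonzero term X^i Y^j has i + j >= r,
    i.e. (0, 0) is a zero of multiplicity r."""
    return min(i + j for (i, j), c in terms.items() if c != 0)

# H(X, Y) = X^2 Y - X^3: with k = 3 the (1, k - 1)-weighted degree is
# max(2 + 2*1, 3) = 4, and (0, 0) is a zero of multiplicity 3.
H = {(2, 1): 1, (3, 0): -1}
print(weighted_degree(H, 1, 2), multiplicity_at_origin(H))  # 4 3
```

For a general point (x, y) one would first shift the polynomial to H(X + x, Y + y) via the binomial expansion above and then apply `multiplicity_at_origin`.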
9.3.3
List decoding of Reed-Solomon codes
A Reed-Solomon code can be defined as a cyclic code generated by a generator polynomial (see Definition 8.1.1) or as an evaluation code (see Proposition 8.1.4). For the purpose of list decoding by Sudan's algorithm, we view Reed-Solomon
codes as evaluation codes. Note that since any nonzero element α ∈ F_q satisfies α^n = 1 for n = q − 1, we have ev(X^n f(X)) = ev(f(X)) for any f(X) ∈ F_q[X]. Therefore,

RS_k(n, 1) = { (f(x_1), f(x_2), ..., f(x_n)) | f(X) ∈ F_q[X], deg(f) < k },

where x_1, ..., x_n are the n distinct nonzero elements of F_q. In this subsection, we consider the list decoding of Reed-Solomon codes RS_k(n, 1) and extended Reed-Solomon codes ERS_k(n, 1), that is, the case m = 1 of the general algorithm, Algorithm 9.3.5. As we will discuss later, Sudan's algorithm can be adapted to list decoding of generalized Reed-Solomon codes (see Definition 8.1.10).

The correctness and error-correcting capability of the list-decoding algorithm depend on the parameters r and s. In the following, we first prove the correctness of the algorithm for an appropriate choice of r and s. Then we calculate the error-correcting capability. We can prove the correctness of the list-decoding algorithm by proving: (1) there exists a nonzero polynomial H(X, Y) satisfying the conditions given in Step 1 of Algorithm 9.3.5; and (2) all the polynomials f(X) satisfying the conditions in Step 2 are Y-roots of H(X, Y), that is, Y − f(X) divides H(X, Y).

Proposition 9.3.6 Consider a pair of parameters r and s.
(1) If r and s satisfy

n·C(r + 1, 2) < s(s + 2)/(2(k − 1)),

then a nonzero polynomial H(X, Y) as sought in Algorithm 9.3.5 does exist.
(2) If r and s satisfy r(n − t) > s, then for any polynomial f(X) of degree at most k − 1 such that f(x_i) = y_i for at least n − t values of i ∈ {1, 2, ..., n}, the polynomial H(X, Y) is divisible by Y − f(X).

Proof. We first prove (1). As discussed in the previous subsection, a nonzero polynomial H(X, Y) exists as long as we have a nonzero solution of a system of homogeneous linear equations in the unknowns α_{i_1,i_2}, i.e., the coefficients of H(X, Y). A nonzero solution of the system exists provided that the number of equations is strictly less than the number of unknowns. From the precise expression of the system (see the end of the last subsection), it is easy to calculate the number of equations, which is n·C(r + 1, 2). Next, we compute the number of unknowns. This number is equal to the number of monomials X^{i_1} Y^{i_2} of
(1, k − 1)-weighted degree at most s, which is

Σ_{i_2=0}^{⌊s/(k−1)⌋} Σ_{i_1=0}^{s−i_2(k−1)} 1 = Σ_{i_2=0}^{⌊s/(k−1)⌋} (s + 1 − i_2(k − 1))
 = (s + 1)(⌊s/(k−1)⌋ + 1) − ((k − 1)/2)·⌊s/(k−1)⌋·(⌊s/(k−1)⌋ + 1)
 ≥ (⌊s/(k−1)⌋ + 1)(s/2 + 1)
 ≥ (s/(k − 1)) · ((s + 2)/2),
where ⌊x⌋ stands for the maximal integer less than or equal to x. Thus, we have proved (1).

We now prove (2). Suppose H(X, f(X)) is not the zero polynomial. Denote h(X) = H(X, f(X)). Let I = {i | 1 ≤ i ≤ n and f(x_i) = y_i}. We have |I| ≥ n − t. For any i = 1, ..., n, as (x_i, y_i) is a zero of H(X, Y) of multiplicity r, we can express H(X, Y) = Σ_{j_1+j_2 ≥ r} γ_{j_1,j_2} (X − x_i)^{j_1} (Y − y_i)^{j_2}. Now, for any i ∈ I, we have f(X) − y_i = (X − x_i)f_1(X) for some f_1(X), because f(x_i) − y_i = 0. Thus, we have

h(X) = Σ_{j_1+j_2 ≥ r} γ_{j_1,j_2} (X − x_i)^{j_1} (f(X) − y_i)^{j_2} = Σ_{j_1+j_2 ≥ r} γ_{j_1,j_2} (X − x_i)^{j_1+j_2} (f_1(X))^{j_2}.

This implies that (X − x_i)^r divides h(X). Therefore, h(X) has a factor g(X) = Π_{i∈I} (X − x_i)^r, which is a polynomial of degree at least r(n − t). On the other hand, since H(X, Y) has (1, k − 1)-weighted degree at most s and the degree of f(X) is at most k − 1, the degree of h(X) is at most s, which is less than r(n − t). This is impossible. Therefore, H(X, f(X)) is the zero polynomial, that is, Y − f(X) divides H(X, Y).

Proposition 9.3.7 If t satisfies (n − t)^2 > n(k − 1), then there exist r and s satisfying both n·C(r + 1, 2) < s(s + 2)/(2(k − 1)) and r(n − t) > s.

Proof.
Set s = r(n − t) − 1. It suffices to prove that there exists an r satisfying

n·C(r + 1, 2) < (r(n − t) − 1)(r(n − t) + 1)/(2(k − 1)),

which is equivalent to the following inequality:

((n − t)^2 − n(k − 1))·r^2 − n(k − 1)·r − 1 > 0.

Since (n − t)^2 − n(k − 1) > 0, any integer r satisfying

r > (n(k − 1) + √(n^2(k − 1)^2 + 4(n − t)^2 − 4n(k − 1))) / (2(n − t)^2 − 2n(k − 1))

satisfies the inequality above. Therefore, for the list-decoding algorithm to be correct, it suffices to set the integers r and s as

r = ⌊(n(k − 1) + √(n^2(k − 1)^2 + 4(n − t)^2 − 4n(k − 1))) / (2(n − t)^2 − 2n(k − 1))⌋ + 1

and s = r(n − t) − 1.
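This closed-form choice of r and s is easy to evaluate. A sketch (function name ours) that computes the pair and re-checks both conditions of Proposition 9.3.6:

```python
import math

# Sketch: the parameters r and s from the proof of Proposition 9.3.7,
# for list decoding an [n, k] Reed-Solomon code up to radius t.

def sudan_parameters(n, k, t):
    """Return (r, s) as in Proposition 9.3.7; requires (n - t)^2 > n(k - 1)."""
    D = (n - t) ** 2 - n * (k - 1)
    if D <= 0:
        raise ValueError("need (n - t)^2 > n(k - 1)")
    num = n * (k - 1) + math.isqrt(n * n * (k - 1) ** 2 + 4 * D)
    r = num // (2 * D) + 1
    s = r * (n - t) - 1
    return r, s

n, k, t = 15, 3, 8          # t is below n - sqrt(n(k - 1)) = 15 - sqrt(30)
r, s = sudan_parameters(n, k, t)
# both conditions of Proposition 9.3.6 hold:
assert n * r * (r + 1) // 2 < s * (s + 2) / (2 * (k - 1))
assert r * (n - t) > s
print(r, s)  # 2 13
```

The assertions verify the two conditions directly, so any rounding in the square root is caught at run time.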
We give the following result, Theorem 9.3.8, which is a straightforward corollary of the two propositions.

Theorem 9.3.8 For an [n, k] Reed-Solomon or extended Reed-Solomon code, the list-decoding algorithm, Algorithm 9.3.5, can correctly find all the codewords c within distance t from the received word r, i.e., d(r, c) ≤ t, provided

t < n − √(n(k − 1)).

Remark 9.3.9 Note that for an [n, k] Reed-Solomon code, the minimum distance is d = n − k + 1, which implies that k − 1 = n − d. Substituting this into the bound on the error-correcting capability in the theorem above, we have

t/n < 1 − √(1 − d/n).

This shows that the list decoding of Reed-Solomon codes achieves the Johnson bound (see Theorem 9.3.2). Regarding the size of the output list of the list-decoding algorithm, we have the following theorem.

Theorem 9.3.10 Consider an [n, k] Reed-Solomon or extended Reed-Solomon code. For any t < n − √(n(k − 1)) and any received word, the number of t-consistent codewords is O(√(n^3 k)).

Proof. From Proposition 9.3.6, we have actually proved that the number N of t-consistent codewords is bounded from above by the degree deg_Y(H(X, Y)). Since the (1, k − 1)-weighted degree of H(X, Y) is at most s, we have N ≤ deg_Y(H(X, Y)) ≤ ⌊s/(k − 1)⌋. By the choices of r and s,
s (k−1)
=O
n(k−1)(n−t) k−1
=
O(n(n − t)). Corresponding to the largest permissible value of t for the tconsist p codewords, we can choose n − t = b n(k − 1)c + 1. Thus, √ p N = O(n(n − t)) = O( n3 (k − 1)) = O( n3 k). Let us analyze the complexity of the list decoding of a [n, k] ReedSolomon code. As we have seen, the decoding algorithm consists of two main steps. Step 1 is in fact reduced to a problem of solving a system of homogeneous linear equations, which can making use of Gaussian elimination be implemented 3 s(s+2) s(s+2) with time complexity O = O(n3 ) where 2(k−1) is the number of 2(k−1) unknowns of the system of homogeneous linear equations, and s is given as in Proposition 9.3.7. The second step is a problem of finding Y roots the polynomial H(X, Y ). This can be implemented by using a fast rootfinding algorithm proposed by Roth and Ruckenstein in time complexity O(nk).
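The unknowns in Step 1 are the coefficients of the monomials X^i Y^j with i + (k − 1)j ≤ s; they can be counted exactly (a small sketch; the exact count differs slightly from the s(s + 2)/(2(k − 1)) estimate used above, but is of the same order):

```python
def interpolation_unknowns(s, k):
    """Count monomials X^i Y^j whose (1, k-1)-weighted degree
    i + (k - 1)*j is at most s; these index the unknown
    coefficients of H(X, Y) in the interpolation step."""
    return sum(s - (k - 1) * j + 1 for j in range(s // (k - 1) + 1))

print(interpolation_unknowns(5, 2))  # 6 + 5 + 4 + 3 + 2 + 1 = 21
```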
9.3. LIST DECODING BY SUDAN’S ALGORITHM
9.3.4 List Decoding of Reed-Muller codes
We consider the list decoding of Reed-Muller codes in this subsection. Let n = q^m and let P1, . . . , Pn be an enumeration of the points of F_q^m. Recall that the q-ary Reed-Muller code RMq(u, m) of order u in m variables is defined as

RMq(u, m) = { (f(P1), . . . , f(Pn)) | f ∈ Fq[X1, . . . , Xm], deg(f) ≤ u }.

Note that when m = 1, the code RMq(u, 1) is actually an extended Reed-Solomon code. From Proposition 8.4.4, RMq(u, m) is a subfield subcode of RM_{q^m}(n − d, 1), where d is the minimum distance of RMq(u, m), that is,

RMq(u, m) ⊆ RM_{q^m}(n − d, 1) ∩ Fq^n.

Here RM_{q^m}(n − d, 1) is an extended Reed-Solomon code over F_{q^m} of length n and dimension k = n − d + 1. We now give a list-decoding algorithm for RMq(u, m) as follows.

Algorithm 9.3.11 (List-Decoding Algorithm for Reed-Muller Codes)
INPUT: Code length n and a received word r = (y1, . . . , yn) ∈ Fq^n.

Step 0: Do the following:
(1) Compute the minimum distance d of RMq(u, m) and a parameter t = ⌈n − √(n(n − d)) − 1⌉.
(2) Construct the extension field F_{q^m} using an irreducible polynomial of degree m over Fq.
(3) Generate the code RM_{q^m}(n − d, 1).
(4) Construct a parity check matrix H over Fq for the code RMq(u, m).

Step 1: Using the list-decoding algorithm for Reed-Solomon codes over F_{q^m}, find L(1), the set of all codewords c ∈ RM_{q^m}(n − d, 1) satisfying d(c, r) ≤ t.

Step 2: For every c ∈ L(1), check if c ∈ Fq^n; if so, append c to L(2).

Step 3: For every c ∈ L(2), check if Hc^T = 0; if so, append c to L. Output L.

From Theorems 9.3.8 and 9.3.10, we have the following theorem.

Theorem 9.3.12 Denote by d the minimum distance of the q-ary Reed-Muller code RMq(u, m). Then RMq(u, m) is (t, l)-decodable, provided that

t < n − √(n(n − d)) and l = O(√((n − d)n³)).

The algorithm above correctly finds all the t-consistent codewords for any received vector r ∈ Fq^n.
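Steps 2 and 3 of the algorithm are straightforward filters over the list returned by the Reed-Solomon decoder; a minimal sketch (the callback names `in_subfield` and `parity_check` are our own, standing in for the actual field arithmetic):

```python
def filter_subfield_codewords(candidates, in_subfield, parity_check):
    """Keep candidates from the Reed-Solomon list decoder over F_{q^m}
    whose coordinates all lie in F_q (Step 2) and whose syndrome
    vanishes (Step 3)."""
    return [c for c in candidates
            if all(in_subfield(x) for x in c) and parity_check(c)]

# Toy run: elements of F_4 coded as 0..3, subfield F_2 = {0, 1},
# "parity check": coordinates sum to 0 mod 2.
cands = [(0, 1, 1), (0, 1, 2), (1, 1, 1)]
print(filter_subfield_codewords(cands,
                                lambda x: x in (0, 1),
                                lambda c: sum(c) % 2 == 0))
# [(0, 1, 1)]
```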
Remark 9.3.13 Note that Algorithm 9.3.11 outputs a set of t-consistent codewords of the q-ary Reed-Muller code defined by the enumeration of the points of F_q^m, say P1, P2, . . . , Pn, specified in Section 7.4.2. If RMq(u, m) is defined by another enumeration of the points of F_q^m, say P1′, P2′, . . . , Pn′, we can get the correct t-consistent codewords by the following steps: (1) Find the permutation π such that Pi = P′_{π(i)}, i = 1, 2, . . . , n, and the inverse permutation π^{−1}. (2) Let r* = (r_{π(1)}, r_{π(2)}, . . . , r_{π(n)}). Then go to Steps 0-2 of Algorithm 9.3.11 with r*. (3) For every codeword c = (c1, c2, . . . , cn) ∈ L, let π^{−1}(c) = (c_{π^{−1}(1)}, c_{π^{−1}(2)}, . . . , c_{π^{−1}(n)}). Then π^{−1}(L) = { π^{−1}(c) | c ∈ L } is the set of t-consistent codewords of RMq(u, m).

Now, let us consider the complexity of Algorithm 9.3.11. In Step 0, to construct the extension field F_{q^m}, it is necessary to find an irreducible polynomial g(x) of degree m over Fq. It is well known that there are efficient algorithms for finding irreducible polynomials over finite fields. For example, a probabilistic algorithm proposed by V. Shoup in 1994 can find an irreducible polynomial of degree m over Fq with an expected number of O((m² log m + m log q) log m log log m) field operations in Fq. To generate the Reed-Solomon code GRS_{n−d+1}(a, 1) over F_{q^m}, we need to find a primitive element of F_{q^m}. With a procedure by I. E. Shparlinski from 1993, a primitive element of F_{q^m} can be found in deterministic time O((q^m)^{1/4+ε}) = O(n^{1/4+ε}), where n = q^m is the length of the code and ε denotes an arbitrary positive number. Step 1 of Algorithm 9.3.11 can be implemented using the list-decoding algorithm for the Reed-Solomon code GRS_{n−d+1}(a, 1) over F_{q^m}. From the previous subsection, it can be implemented to run in O(n³) field operations in F_{q^m}. So, the implementation of Algorithm 9.3.11 requires O(n) field operations in Fq and O(n³) field operations in F_{q^m}.
9.3.5 Exercises
9.3.1 Let P(X1, . . . , Xl) = Σ_{i1,...,il} α_{i1,...,il} X1^{i1} · · · Xl^{il} be a polynomial in variables X1, . . . , Xl with coefficients α_{i1,...,il} in a field F. Prove that for any (a1, . . . , al) ∈ F^l,

P(X1 + a1, . . . , Xl + al) = Σ_{j1,...,jl} β_{j1,...,jl} X1^{j1} · · · Xl^{jl},

where

β_{j1,...,jl} = Σ_{j1′ ≥ j1} · · · Σ_{jl′ ≥ jl} (j1′ choose j1) · · · (jl′ choose jl) α_{j1′,...,jl′} a1^{j1′−j1} · · · al^{jl′−jl}.

9.4 Notes
Many cyclic codes have error-correcting pairs; for this we refer to Duursma and Kötter [53, 54].
The algorithms of Berlekamp-Massey [11, 79] and Sugiyama [118] both have O(t²) as an estimate of the complexity, where t is the number of corrected errors. In fact the algorithms are equivalent, as shown in [50, 65]. The application of a fast computation of the gcd of two polynomials in [4, Chap. 16, §8.9] to computing a solution of the key equation gives complexity O(t log²(t)) by [69, 104].
Chapter 10
Cryptography

Stanislav Bulygin

This chapter aims at giving an overview of topics from cryptography. In particular, we cover symmetric as well as asymmetric cryptography. When talking about symmetric cryptography, we concentrate on the notion of a block cipher, as a means to implement symmetric cryptosystems in practical environments. Asymmetric cryptography is represented by the RSA and El Gamal cryptosystems, as well as code-based cryptosystems due to McEliece and Niederreiter. We also take a look at other aspects such as authentication codes, secret sharing, and linear feedback shift registers. The material of this chapter is quite basic, but we elaborate more on several topics. Especially we show connections to codes and related structures where applicable. The basic idea of algebraic attacks on block ciphers is considered in the next chapter, Section 11.3.
10.1 Symmetric cryptography and block ciphers
10.1.1 Symmetric cryptography
This section is devoted to symmetric cryptosystems. The idea behind these is quite simple and thus has basically been known for quite a long time. The task is to convey a secret between two parties, traditionally called Alice and Bob, so that figuring the secret out is not possible without knowledge of some additional information. This additional information is called a secret key and is supposed to be known only to the two communicating parties. The secrecy of the transmitted message rests entirely upon the knowledge of this secret key, and thus if an adversary or an eavesdropper, traditionally called Eve, is able to find out the key, then the whole secret communication is corrupted. Now let us take a look at the formal definition.

Definition 10.1.1 A symmetric cryptosystem is defined by the following data:
• The plaintext space P and the ciphertext space C.
• {Ee : P → C | e ∈ K} and {Dd : C → P | d ∈ K} are the sets of encryption and decryption transformations, which are bijections from P to C and from C to P, respectively.
• The above transformations are parametrized by the key space K.
• Given an associated pair (e, d), so that the property ∀p ∈ P : Dd(Ee(p)) = p holds, knowing e it is "computationally easy" to find d and vice versa.
The pair (e, d) is called the secret key. Moreover, e is called the encryption key and d is called the decryption key. Note that often the counterparts e and d coincide. This gives a reason for the name "symmetric". There exist also cryptosystems in which knowledge of an encryption key e does not reveal (i.e. it is "computationally hard" to find) an associated decryption key d. So encryption keys can be made public, and such cryptosystems are called asymmetric or public-key cryptosystems; see Section 10.2. Of course, one should specify exactly what P, C, K and the transformations are. Let us take a look at a concrete example.

Example 10.1.2 The first use of a symmetric cryptosystem is conventionally attributed to Julius Caesar. He used the following cryptosystem for communication with his generals, which is historically called the Caesar cipher. Let P and C be the sets of all strings composed of letters from the English (Latin for Caesar) alphabet A = {A, B, C, . . . , Z}. Let K = {0, 1, 2, . . . , 25}. Now an encryption transformation Ee, given a plaintext p = (p1, . . . , pn), pi ∈ A, i = 1, . . . , n, does the following. For each i = 1, . . . , n one determines the position of pi in the alphabet A ("A" being 0, "B" being 1, . . . , "Z" being 25). Next one finds the letter in A that stands e positions to the left, thus finding a letter ci; one wraps around if the beginning of A is reached. So with the enumeration of A as above, we have ci = pi − e (mod 26). In this way a ciphertext c = (c1, . . . , cn) is obtained. The decryption key is given by d = −e (mod 26), or, equivalently, for decryption one needs to shift letters e positions to the right. Julius Caesar used e = 3 for his cryptosystem. Let us consider an example. For the plaintext p = "BRUTUS IS AN ASSASSIN", the ciphertext (if we ignore spaces during the encryption) looks like c = "YORQRP FP XK XPPXPPFK". To decrypt one simply shifts each letter back 3 positions to the right.
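In plain Python (rather than Sage) the Caesar cipher can be sketched as follows, with encryption shifting e positions to the left as above:

```python
def caesar(text, e):
    """Encrypt by c_i = p_i - e (mod 26); decrypt with caesar(c, -e).
    Non-letters (e.g. spaces) are ignored, as in the example."""
    return "".join(chr((ord(ch) - ord('A') - e) % 26 + ord('A'))
                   for ch in text if ch.isalpha())

c = caesar("BRUTUS IS AN ASSASSIN", 3)
print(c)              # YORQRPFPXKXPPXPPFK
print(caesar(c, -3))  # BRUTUSISANASSASSIN
```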
10.1.2 Block ciphers. Simple examples
The above is a simple example of a so-called substitution cipher, which is in turn an instance of a block cipher. Block ciphers, among other things, provide a practical realization of symmetric cryptosystems. They can also be used for constructing other cryptographic primitives, like pseudo-random number generators, authentication codes (Section 10.3) and hash functions. The formal definition follows.

Definition 10.1.3 An n-bit block cipher is defined as a mapping E : A^n × K → A^n, where A is an alphabet set and K is the key space, such that for each k ∈ K the mapping E(·, k) =: Ek : A^n → A^n is invertible. Ek is the encryption transformation for the key k, and Ek^{−1} = Dk is the decryption transformation. If Ek(p) = c, then c is the ciphertext of the plaintext p under the key k.
It is common to work with the binary alphabet, i.e. A = {0, 1}. In that case, ideally we would like to have a block cipher that is random in the sense that it implements all (2^n)! bijections from {0, 1}^n to {0, 1}^n. In practice, though, it is quite expensive to have such a cipher. So when designing a block cipher we take care that it behaves like a random one, i.e. for a randomly chosen key k ∈ K the encryption transformation Ek should appear random. If one is able to distinguish Ek, where k is in some subset Kweak, from a random transformation, then this is evidence of a weakness of the cipher. Such a subset Kweak is called the subset of weak keys; we will come back to this later when talking about DES. Now we present several simple examples of block ciphers. We consider permutation and substitution ciphers that were used quite intensively in the past (see Notes); some fundamental ideas thereof appear also in modern ciphers.

Example 10.1.4 (Permutation or transposition cipher) The idea of this cipher is to partition the plaintext into blocks and perform a permutation of the elements in each block. More formally, partition the plaintext into blocks of the form p = p1 . . . pt and then permute: c = Ek(p) = p_{k(1)}, . . . , p_{k(t)}. The number t is called the period of the cipher. The key space K now is the set of all permutations of {1, . . . , t}: K = St. For example let the plaintext be p = "CODING AND CRYPTO", let t = 5, and k = (4, 2, 5, 3, 1). If we remove the spaces and partition p into 3 blocks we obtain c = "INCODDCGANTORYP". Used alone, the permutation cipher does not provide good security (see below), but in combination with other techniques it is also used in modern ciphers to provide diffusion in a ciphertext.

Example 10.1.5 We can use the Sage system to run the previous example. The code looks as follows.
> S = AlphabeticStrings()
> E = TranspositionCryptosystem(S,5)
> K = PermutationGroupElement('(4,2,5,3,1)')
> L = E.inverse_key(K)
> M = S("CODINGANDCRYPTO")
> e = E(K)
> c = E(L)
> e(M)
INCODDCGANTORYP
> c(e(M))
CODINGANDCRYPTO

One can also choose a random key for encryption:

> KR = E.random_key()
> KR
(1,4,2,3)
> LR = E.inverse_key(KR)
> LR
(1,3,2,4)
> eR = E(KR)
> cR = E(LR)
> eR(M)
IDCONDNGACTPRYO
> cR(eR(M))
CODINGANDCRYPTO

Example 10.1.6 (Substitution cipher) The idea behind a monoalphabetic substitution cipher is to substitute every symbol in a plaintext by some other symbol from a chosen alphabet. Formally, let A be the alphabet, so that plaintexts and ciphertexts are composed of symbols from A. For the plaintext p = p1 . . . pn the ciphertext c is obtained as c = Ek(p) = k(p1), . . . , k(pn). The key space now is the set of all permutations of A: K = SA. In Example 10.1.2 we have already seen an instance of such a cipher. There k was chosen to be k = (23, 24, 25, 0, 1, . . . , 21, 22). Again, used alone the monoalphabetic cipher is insecure, but as a basic idea it is used in modern ciphers to provide confusion in a ciphertext. There is also a polyalphabetic substitution cipher. Let the key k be defined as a sequence of permutations on A: k = (k1, . . . , kt), where t is the period. Then every t symbols of the plaintext p are mapped to t symbols of the ciphertext c as c = k1(p1), . . . , kt(pt). Simplifying ki to shifting by li symbols to the right we obtain ci = pi + li (mod |A|). Such a cipher is called the simple Vigenère cipher.

Example 10.1.7 The Sage code for a substitution cipher encryption is given below.

> S = AlphabeticStrings()
> E = SubstitutionCryptosystem(S)
> K = E.random_key()
> K
ZYNJQHLBSPEOCMDAXWVRUTIKGF
> L = E.inverse_key(K)
> M = S("CODINGANDCRYPTO")
> e = E(K)
> e(M)
NDJSMLZMJNWGARD
> c = E(L)

Here the string ZYNJQHLBSPEOCMDAXWVRUTIKGF shows the permutation of the alphabet. Namely, the letter A is mapped to Z, the letter B is mapped to Y, etc. One can also provide the permutation explicitly as follows:

> K = S('MHKENLQSCDFGBIAYOUTZXJVWPR')
> e = E(K)
> e(M)
KAECIQMIEKUPYZA

A piece of code for working with the simple Vigenère cipher is provided below.
> S = AlphabeticStrings()
> E = VigenereCryptosystem(S,15)
> K = S('XSPUDFOQLRMRDJS')
> L = E.inverse_key(K)
> M = S("CODINGANDCRYPTO")
> e = E(K)
> e(M)
ZGSCQLODOTDPSCG
> c = E(L)
> c(e(M))
Table 10.1: Frequencies of the letters in the English language

E 11.1607%   M 3.0129%
A  8.4966%   H 3.0034%
R  7.5809%   G 2.4705%
I  7.5448%   B 2.0720%
O  7.1635%   F 1.8121%
T  6.9509%   Y 1.7779%
N  6.6544%   W 1.2899%
S  5.7351%   K 1.1016%
L  5.4893%   V 1.0074%
C  4.5388%   X 0.2902%
U  3.6308%   Z 0.2722%
D  3.3844%   J 0.1965%
P  3.1671%   Q 0.1962%
CODINGANDCRYPTO

Note that here the string XSPUDFOQLRMRDJS defines 15 permutations: one per position. Namely, every letter is the image of the letter A at that position. So at the first position A is mapped to X (therefore, e.g., B is mapped to Y), at the second position A is mapped to S, and so on. The ciphers above, used alone, do not provide security, as has already been mentioned. One way to break such ciphers is to use statistical methods. For the permutation ciphers, note that they do not change the frequency of occurrence of each letter of the alphabet. Comparing frequencies obtained from a ciphertext with the frequency distribution of the language used, one can figure out that one deals with a ciphertext obtained with a permutation cipher. Moreover, for cryptanalysis one may try to look for anagrams: words in which letters are permuted. If the eavesdropper is able to find such anagrams and solve them, then he/she is pretty close to breaking such a cipher (Exercise 10.1.1). Also, if the eavesdropper has access to an encryption device and is able to produce ciphertexts for plaintexts of his/her choice (chosen-plaintext attack), then he/she can simply choose plaintexts such that figuring out the period and the permutation becomes easy. For monoalphabetic substitution ciphers one also notes that although letters are changed, the frequency with which they occur does not change. So the eavesdropper may compare frequencies in a long enough ciphertext with the frequency distribution of the language used and thus figure out how the letters of the alphabet were mapped to obtain the ciphertext. For example, for the English alphabet one may use the frequency analysis of words occurring in the "Concise Oxford Dictionary" (http://www.askoxford.com/asktheexperts/faq/aboutwords/frequency), see Table 10.1. Note that since positions of the symbols are not altered, the eavesdropper may not only look at frequencies of single symbols, but also at combinations of symbols.
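For a shift cipher the frequency attack fits in a few lines of Python (a toy sketch assuming the plaintext is English-like, so that 'E' dominates; cf. Table 10.1):

```python
from collections import Counter

def guess_shift(ciphertext):
    """Assume the most frequent ciphertext letter is the image of 'E'
    and recover e, with the convention c_i = p_i - e (mod 26)."""
    top = Counter(ciphertext).most_common(1)[0][0]
    return (ord('E') - ord(top)) % 26

pt = "DEFENDTHEEASTWALLOFTHECASTLE"
ct = "".join(chr((ord(ch) - 65 - 3) % 26 + 65) for ch in pt)
print(guess_shift(ct))  # 3
```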
In particular, one may look for pieces of a ciphertext that correspond to frequently used words like "the", "we", "in", "at", etc. For polyalphabetic ciphers one needs to find out the period first. This can be done by the so-called Kasiski method. When the period is determined, one can proceed with the
frequency analysis as above, performed separately for all sets of positions that are at distance t from each other.
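The Kasiski method can be sketched as follows (a toy sketch; on real ciphertexts one also has to filter out accidental repetitions):

```python
from functools import reduce
from math import gcd

def kasiski_period_guess(ciphertext, seq_len=3):
    """Collect distances between consecutive occurrences of repeated
    seq_len-grams; their gcd is a guess for (a multiple of) the period
    of a polyalphabetic cipher."""
    last_pos, distances = {}, []
    for i in range(len(ciphertext) - seq_len + 1):
        gram = ciphertext[i:i + seq_len]
        if gram in last_pos:
            distances.append(i - last_pos[gram])
        last_pos[gram] = i
    return reduce(gcd, distances) if distances else None

print(kasiski_period_guess("ABCDEFGHIJABCDEFGHIJ"))  # 10
```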
10.1.3 Security issues
As we have seen, block ciphers provide us with a means to convey secret messages via a symmetric scheme. It is clear that an eavesdropper tries to get insight into this secret communication. The questions that naturally arise are "What does it mean to break a cipher?" and "When is a cipher considered to be broken?". In general we consider a cipher to be totally broken if the eavesdropper is able to recover the secret key, thus compromising the whole secret communication. We consider a cipher to be partially broken if the eavesdropper is able to recover (a part of) a plaintext from a given ciphertext, thus compromising that part of the communication. In order to describe the actions of the eavesdropper more formally, different assumptions on the eavesdropper's abilities and scenarios of attacks are introduced.

Assumptions:
• The eavesdropper has access to all ciphertexts that are transmitted through the communication channel. He/she is able to extract these ciphertexts and use them further at his/her disposal.
• The eavesdropper has a full description of the block cipher itself, i.e. he/she is aware of how the encryptions constituting the cipher act.

The first assumption is natural, since communication in the modern world (e.g. via the Internet) involves a huge amount of information transmitted between an enormous variety of parties. Therefore, it is impossible to provide secure channels for all such transmissions. The second one is also quite natural, as for most block ciphers proposed in recent times there is a full description publicly available, either as a legitimate standard or as a paper/report.

Attack scenarios:
• ciphertext-only: The eavesdropper does not have any additional information, only an intercepted ciphertext.
• known-plaintext: Some amount of plaintext-ciphertext pairs encrypted with one particular yet unknown key is available to the eavesdropper.
• chosen-plaintext and chosen-ciphertext: The eavesdropper has access to plaintext-ciphertext pairs for plaintexts resp. ciphertexts of his/her specific choice.
• adaptive chosen-plaintext and adaptive chosen-ciphertext: The choice of the special plaintexts resp. ciphertexts in the previous scenario depends on some prior processing of pairs.
• related-key: The eavesdropper is able to do encryptions with unknown yet related keys, with the relations known to the eavesdropper.
Note that the last three attacks are quite hard to realize in a practical environment and sometimes even impossible. Nevertheless, studying these scenarios provides more insight into the security properties of a considered cipher. When undertaking an attack on a cipher one thinks in terms of complexity. Recall from Definition 6.1.4 that there are always time or processing as well as memory or storage complexities. Another type of complexity one deals with here is data complexity, which is the amount of pre-knowledge (e.g. plaintexts/ciphertexts) needed to mount an attack. The first thing to think of when designing a cipher is to choose the block/key length so that brute force attacks are not possible. Let us take a closer look here. If the eavesdropper is given 2^n plaintext-ciphertext pairs encrypted with one secret key, then he/she entirely knows the encryption function for that secret key. This implies that n should not be chosen too small, as then simply composing a codebook of associated plaintexts-ciphertexts is possible. For modern block ciphers, a block length of 128 bits is common. On the other side, if the eavesdropper is given just one plaintext-ciphertext pair (p, c), he/she may proceed as follows. Try every key from K (assume now that K = {0, 1}^l) until he/she finds k that maps p to c: Ek(p) = c. Validate k with another pair (or several pairs) (p′, c′), i.e. check whether Ek(p′) = c′. If validation fails, then discard k and move further in K. One expects to find a valid key after searching through half of {0, 1}^l, i.e. after 2^{l−1} trials. This observation implies that the key space should not be too small, as then an exhaustive search of this kind is possible. For modern ciphers key lengths of 128, 192, and 256 bits are used. Smaller block lengths, like 64 bits, are also employed in lightweight ciphers that are used for resource-constrained devices. Let us now discuss the two main types of security that exist for cryptosystems in general.
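The exhaustive key search just described can be sketched as follows (a toy 8-bit XOR "cipher" for illustration only, not DES or AES; real key spaces of size 2^128 make this hopeless):

```python
def exhaustive_search(encrypt, pairs, key_bits):
    """Return all keys consistent with the known
    plaintext-ciphertext pairs."""
    return [k for k in range(2 ** key_bits)
            if all(encrypt(p, k) == c for p, c in pairs)]

toy_encrypt = lambda p, k: p ^ k  # toy cipher: XOR with the key
secret = 0x5C
pairs = [(0x3A, toy_encrypt(0x3A, secret)),
         (0x77, toy_encrypt(0x77, secret))]
print(exhaustive_search(toy_encrypt, pairs, 8))  # [92]
```

The second pair here plays the role of the validation pair: for this toy cipher a single pair already determines the key, and the search confirms it.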
Definition 10.1.8
• Computational security. Here one considers a cryptosystem to be (computationally) secure if the number of operations needed to break the cryptosystem is so large that it cannot be executed in practice; similarly for memory. Usually one measures such a number by the best attacks available for a given cryptosystem, thus claiming computational security. Another similar idea is to show that breaking a given cryptosystem is equivalent to solving some problem that is believed to be hard. Such security is called provable security or sometimes reductionist security.
• Unconditional security. Here one assumes that an eavesdropper has unlimited computational power. If one is able to prove that even having this unlimited power an eavesdropper is not able to break a given cryptosystem, then it is said that the cryptosystem is unconditionally secure or that it provides perfect secrecy.

Before going to examples of block ciphers, let us take a look at the security criteria that are usually used when estimating the security capabilities of a cipher.

Security criteria:
• state-of-the-art security level: One gets more confident in a cipher's security if known up-to-date attacks, both generic and specialized, do not break the cipher faster than the exhaustive search. The more such attacks are considered, the more confidence one gets. Of course, one cannot
be absolutely confident here, as new, previously unknown attacks may appear that would pose a real threat.
• block and key size: As we have seen above, small block and key sizes make brute force attacks possible, so in this respect longer blocks and keys provide more security. On the other hand, longer blocks and keys imply more costs in implementing such a cipher, i.e. encryption time and memory consumption may rise considerably. So there is a trade-off between security and ease/speed of an implementation.
• implementation complexity: In addition to the previous point, one should also take care of efficient implementation of the encryption/decryption mappings depending on the environment. For example, different methods may be used for hardware and software implementations. Special care is to be taken when one deals with hardware units with very limited memory (e.g. smartcards).
• others: Things like data expansion and error propagation also play a role in applications and should be taken into account accordingly.
10.1.4 Modern ciphers. DES and AES
In Section 10.1.2 we considered basic ideas for block ciphers. Next, let us consider two examples of modern block ciphers. The first one, DES (Data Encryption Standard), was proposed in 1976 and was used until the late 1990s. Due to its short key length, it became possible to implement an exhaustive search attack, so DES was no longer secure. In 2001 the cipher Rijndael, proposed by the Belgian cryptographers Joan Daemen and Vincent Rijmen, was adopted as the Advanced Encryption Standard (AES) in the USA and is now widely used for protecting classified governmental documents. In commerce AES also became the de facto standard. We start with DES, which is an instance of a Feistel cipher, which is in turn an iterative cipher.

Definition 10.1.9 An iterative block cipher is a block cipher which sequentially performs a certain key-dependent transformation Fk. This transformation is called the round transformation and the number of rounds Nr is a parameter of an iterative cipher. It is also common to expand the initial private key k to subkeys ki, i = 1, . . . , Nr, where each ki is used as a key for F at round i. A procedure for obtaining the subkeys from the initial key is called a key schedule. For each ki the transformation F should be invertible to allow decryption.

DES
On Figure 10.1 one can see the scheme of Feistel cipher encryption. p l0
Figure 10.1: Feistel cipher encryption

Note that f(·, ki) need not be invertible (Exercise 10.1.5). Decryption is done analogously with the reverse order of subkeys: kNr, . . . , k1. DES is a Feistel cipher that operates on 64-bit blocks and needs a 56-bit key. Actually the key is given initially in 64 bits, of which 8 bits can be used as parity checks. DES has 16 rounds. The subkeys k1, . . . , k16 are 48 bits long. The transformation f from Definition 10.1.10 is chosen as

f(ri−1, ki) = P(S(E(ri−1) ⊕ ki)).   (10.1)
Here E : {0, 1}^32 → {0, 1}^48 is an expansion transformation that expands a 32-bit vector to a 48-bit one in order to fit the size of ki when doing bitwise addition. Next, S is a substitution transformation that acts as follows. First
divide the 48-bit vector E(ri−1) ⊕ ki into eight 6-bit blocks. For every block perform a (non-linear) substitution that takes 6 bits and outputs 4 bits. Thus at the end one has a 32-bit vector obtained by concatenation of the results of the substitution S. The substitution S is an instance of an S-box, a carefully chosen non-linear transformation that makes the relation between its input and output complex, thus adding confusion to the encryption transformation (see below for the discussion). Finally, P is a permutation of a 32-bit vector.

Algorithm 10.1.11 (DES encryption)
Input: The 64-bit plaintext p and the 64-bit key k.
Output: The 64-bit ciphertext c corresponding to p.
1. Use the parity check bits k8, k16, . . . , k64 to detect errors in the 8-bit subblocks of k. If no errors are detected, then obtain the 48-bit subkeys k1, . . . , k16 from k using the key schedule.
2. Take p and apply an initial permutation IP to p. Divide the 64-bit vector IP(p) into halves (l0, r0).
3. For i = 1, . . . , 16 do
   li := ri−1.
   f(ri−1, ki) := P(S(E(ri−1) ⊕ ki)), with S, E, P as explained after (10.1).
   ri := li−1 ⊕ f(ri−1, ki).
4. Interchange the last halves: (l16, r16) → (r16, l16) = c′.
5. Apply the permutation inverse to the initial one to c′; the result is the ciphertext c := IP^{−1}(c′).

Let us now give a brief overview of DES properties. First of all, we mention two basic features that any modern block cipher provides and that definitely should be taken into account when designing a block cipher.
• Confusion. When an encryption transformation of a block cipher makes the relations among a plaintext, a ciphertext, and a key as complex as possible, it is said that such a cipher adds confusion to the encryption process. Confusion is usually achieved by non-linear transformations realized by S-boxes.
• Diffusion.
When an encryption transformation of a block cipher makes every bit of a ciphertext dependent on every bit of a plaintext and on every bit of a key, it is said that such a cipher adds diffusion to the encryption process. Diffusion is usually achieved by permutations. See Exercise 11.3.1 for a concrete example. Empirically, DES has the above features, so in this respect it appears to be rather strong. Let us discuss some other features of DES and some attacks that exist for DES. Let DESk(·) be the encryption transformation defined by DES as per Algorithm 10.1.11 for a key k. DES has 4 weak keys; in this context these are the keys k for which DESk(DESk(·)) is the identity mapping, which,
of course, violates the criteria mentioned above. Moreover, for each of these weak keys DES has 2^32 fixed points, i.e. plaintexts p such that DESk(p) = p. There are 6 pairs of semi-weak keys (dual keys), i.e. pairs (k1, k2) such that DESk1(DESk2(·)) is the identity mapping. Similarly to weak keys, 4 out of the 12 semi-weak keys have 2^32 anti-fixed points, i.e. plaintexts p such that DESk(p) = p̄, where p̄ is the bitwise complement of p. It is also known that the DES encryptions are not closed under composition, i.e. they do not form a group. This is quite important, as otherwise using multiple DES encryptions would be less secure than is commonly believed. If the eavesdropper is able to work with huge data complexity, several known-plaintext attacks become possible. The most well-known of them related to DES are linear and differential cryptanalysis. Linear cryptanalysis was proposed by Mitsuru Matsui in the early 1990s and is based on the idea of approximating a cipher with an affine function. In order to implement this attack for DES one needs 2^43 known plaintext-ciphertext pairs. The existence of such an attack is evidence of a theoretical weakness of DES. A similar observation applies to differential cryptanalysis. The idea of this general method is to carefully explore how differences in inputs to certain parts of an encryption transformation affect the outputs of these parts. Usually the focus is on the S-boxes. An eavesdropper tries to find a bias in the distribution of differences, which would allow him/her to distinguish the cipher from a random permutation. In the DES situation the eavesdropper needs 2^55 known or 2^47 chosen plaintext-ciphertext pairs in order to mount such an attack. These attacks do not bear any practical threat to DES. Moreover, performing exhaustive search on the entire key space of size 2^56 is in practice faster than the attacks above.

AES

Next we present the basic description and properties of the Advanced Encryption Standard (AES).
AES is the successor of DES and was introduced because DES was no longer considered secure. A new cipher for the standard was required to have a larger key/block size and to be resistant to the linear and differential cryptanalysis that posed a theoretical threat to DES. The cipher Rijndael, adopted for the standard, satisfies these demands. It operates on blocks of length 128 bits and keys of length 128, 192, or 256 bits. We will concentrate on AES-128, which employs keys of length 128 bits, the most common setting used. AES is an instance of a substitution-permutation network. We give a definition next.
Definition 10.1.12 A substitution-permutation network (SP-network) is an iterative block cipher with layers of S-boxes interchanged with layers of permutations (or P-boxes), see Figure 10.2. It is required that the S-boxes are invertible.
Note that in the definition of an SP-network we demand that the S-boxes are invertible transformations, in contrast to Feistel ciphers, where S-boxes do not have to be invertible, see the discussion after Definition 10.1.10. Sometimes invertibility of S-boxes is not required, which makes the definition wider. If we recall the notions of confusion and diffusion, we see that SP-networks reflect these notions exactly: S-boxes provide local confusion and then bit permutations or affine maps provide diffusion.
CHAPTER 10. CRYPTOGRAPHY
Figure 10.2: SP-network (the plaintext passes through alternating layers of S-boxes and P-boxes down to the ciphertext)
The description of the AES follows. As has already been said, AES operates on 128-bit blocks and 128-bit keys (standard version). For convenience these 128-bit vectors are considered as 4 × 4 arrays of bytes (8 bits). AES-128 (key length 128 bits) has 10 rounds. We know that AES is an SP-network, so let us describe its substitution and diffusion (permutation) layers. The AES substitution layer is based on 16 S-boxes, each acting on a separate byte of the square representation. In AES terminology the S-box is called SubBytes. One S-box performs its substitution in three steps:
1. Inversion: Consider an input byte b_input (a {0,1}-vector of length 8) as an element of F_256. This is done via the isomorphism F_2[a]/⟨a^8 + a^4 + a^3 + a + 1⟩ ≅ F_256, so that F_256 can be regarded as an 8-dimensional vector space over F_2 *** Appendix ***. If b_input ≠ 0, then the output of this step is b_inverse = b_input^{-1}, otherwise b_inverse = 0.
2. F_2-linear mapping: Consider b_inverse again as a vector from F_2^8. The output of this step is given by b_linear = L(b_inverse), where L is an invertible F_2-linear mapping given by a prescribed circulant matrix.
3. S-box constant: The output of the entire S-box is obtained as b_output = b_linear + c, where c is an S-box constant.
Thus, in essence, each S-box applies inversion and then an affine transformation to an 8-bit input block, yielding an 8-bit output block. It is easy to see that the S-box so defined is invertible. The substitution layer acts locally on each individual byte, whereas the diffusion layer acts on the entire square array. The diffusion layer consists of two consecutive linear transformations. The first one, called ShiftRows, shifts the i-th row of the array by i − 1 positions to the left. The second one, called MixColumns, is given by a 4 × 4 matrix M over F_256 and transforms every column C of the array to the column MC. The matrix M is the parity check matrix of an MDS
code, cf. Definition 3.2.2. It was introduced to follow the so-called wide trail strategy and precludes the use of linear and differential cryptanalysis. Let us now describe the encryption process of AES.
Algorithm 10.1.13 (AES encryption)
Input: The 128-bit plaintext p and the 128-bit key k.
Output: The 128-bit ciphertext c corresponding to p.
1. Perform initial key addition: w := p ⊕ k = AddRoundKey(p, k).
2. Expand the initial key k to subkeys k_1, ..., k_10 using the key schedule.
3. For i = 1, ..., 9 do
 - Perform S-box substitution: w := SubBytes(w).
 - Shift the rows: w := ShiftRows(w).
 - Transform the columns with the MDS matrix M: w := MixColumns(w).
 - Add the round key: w := AddRoundKey(w, k_i) = w ⊕ k_i.
# The last round does not have MixColumns.
4. Perform S-box substitution: w := SubBytes(w).
5. Shift the rows: w := ShiftRows(w).
6. Add the round key: w := AddRoundKey(w, k_10) = w ⊕ k_10.
7. The ciphertext is c := w.
The key schedule is designed similarly to the encryption and is omitted here. All the details on the components, the key schedule, and the reverse cipher for decryption can be found in the literature, see Notes. The reverse cipher is quite straightforward, as it has to undo invertible affine transformations and the inversion in F_256. Let us discuss some properties of AES. First of all, we note that AES possesses the confusion and diffusion properties. The use of S-boxes provides sufficient resistance to linear and differential cryptanalysis, which was one of the major concerns when replacing DES. The use of the affine mapping in the S-box, among other things, removes fixed points. In the diffusion layer the diffusion is done separately for rows and columns. It is remarkable that, in contrast to DES, where the encryption is mainly described via table lookups, the AES description is very algebraic: all transformations are described as either field inversion or matrix multiplication.
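The three S-box steps described above translate directly into code. The following Python sketch is illustrative only (real implementations use a precomputed 256-entry lookup table); it uses the standard AES values for the circulant affine map and the constant c = 0x63, which are not spelled out in the text above:

```python
def gf_mul(a, b):
    """Multiply two bytes in F_256 = F_2[a]/(a^8 + a^4 + a^3 + a + 1)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B          # reduce by the AES polynomial
        b >>= 1
    return p

def gf_inv(x):
    """Inverse in F_256 via x^254 (the group F_256* has order 255)."""
    r, e = 1, 254
    while e:
        if e & 1:
            r = gf_mul(r, x)
        x = gf_mul(x, x)
        e >>= 1
    return r

def sbox(x):
    """AES SubBytes on one byte: inversion, F_2-linear map, S-box constant."""
    b = gf_inv(x) if x else 0   # step 1: inversion, with 0 mapped to 0
    y = 0
    for i in range(8):          # steps 2-3: affine transformation
        bit = ((b >> i) ^ (b >> ((i + 4) % 8)) ^ (b >> ((i + 5) % 8)) ^
               (b >> ((i + 6) % 8)) ^ (b >> ((i + 7) % 8)) ^ (0x63 >> i)) & 1
        y |= bit << i
    return y
```

For instance, sbox(0x00) = 0x63 (the constant itself, since 0 is mapped to 0 before the affine step), and the map is a bijection on the 256 byte values, as the text claims.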
Of course, in real-world applications some operations like the S-box are nevertheless realized as table lookups. Still, the simplicity of the AES description has been under discussion since the selection process in which the future AES, Rijndael, took part. The highly algebraic nature of the AES description boosted a new branch of cryptanalysis called algebraic cryptanalysis. We address this issue in the next chapter, see Section 11.3.
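Returning to the round transformations, the diffusion layer (ShiftRows and MixColumns) can be sketched on the 4 × 4 byte array in the same style. This is an illustrative sketch with rows indexed from 0, so row i is shifted i positions; the matrix below is the standard AES MixColumns circulant over F_256, and gf_mul is multiplication with the AES reduction polynomial as in the S-box description:

```python
M = [[2, 3, 1, 1],
     [1, 2, 3, 1],
     [1, 1, 2, 3],
     [3, 1, 1, 2]]   # the standard AES MixColumns matrix over F_256

def gf_mul(a, b):
    """Multiplication in F_256 with the AES reduction polynomial 0x11B."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p

def shift_rows(state):
    """Cyclically shift row i of the 4x4 state i positions to the left."""
    return [row[i:] + row[:i] for i, row in enumerate(state)]

def mix_columns(state):
    """Replace every column C of the state by the column MC over F_256."""
    out = [[0] * 4 for _ in range(4)]
    for c in range(4):
        for r in range(4):
            acc = 0
            for k in range(4):
                acc ^= gf_mul(M[r][k], state[k][c])
            out[r][c] = acc
    return out
```

On the column (0xd4, 0xbf, 0x5d, 0x30), mix_columns yields (0x04, 0x66, 0x81, 0xe5), the worked example from the FIPS-197 standard.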
10.1.5 Exercises
10.1.1 It is known that the following ciphertext is obtained with a permutation cipher of period 6 and contains an anagram of a famous person's name (spaces are ignored by the encryption): "AAASSNISFNOSECRSAAKIWNOSN". Find the original plaintext.
10.1.2 The sequential composition of several permutation ciphers with periods t_1, ..., t_s is called a compound permutation (compound transposition). Show that the compound permutation cipher is equivalent to a simple permutation cipher with period t = lcm(t_1, ..., t_s).
10.1.3 [CAS] The Hill cipher is defined as follows. One encodes a length-n block p = (p_1 ... p_n), which is assumed to consist of elements from Z_n, with an invertible n × n matrix H = (h_ij) as c_i = Σ_{j=1}^n h_ij p_j. Therewith one obtains the cryptogram c = (c_1 ... c_n). The decryption is done analogously using H^{-1}. Write a procedure that implements the Hill cipher. Compare your implementation with the HillCryptosystem class from Sage.
10.1.4 The following text is encrypted with a monoalphabetic substitution cipher. Decrypt it using frequency analysis and Table 10.1: AI QYWX YRHIVXEOI MQQIHMEXI EGXMSR. XLI IRIQC MW ZIVC GOSWI! Hint: decrypting small words first may be very useful.
10.1.5 Show that in the definition of a Feistel cipher the transformation f need not be invertible to ensure encryption, in the sense that the round function is invertible even if f is not. Also show that performing encryption starting at (r_{N_r}, l_{N_r}) with the reverse order of subkeys yields (l_0, r_0) at the end, thus providing a decryption.
10.1.6 It is known that the expansion transformation E of DES has the complementation property, i.e. for every input x the value E(x̄) is the bitwise complement of E(x). It is also known that the complemented key k̄ expands to the complemented subkeys k̄_1, ..., k̄_16. Knowing this, show that
a. The entire DES transformation also possesses the complementation property: for all p ∈ {0,1}^64 and k ∈ {0,1}^56, DES_k̄(p̄) is the bitwise complement of DES_k(p).
Using (a.) show
b.
It is possible to reduce the exhaustive search complexity from 2^55 (half the key space size) to 2^54.
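A minimal sketch of the procedure asked for in Exercise 10.1.3 might look as follows. For concreteness it works over Z_26 (letters A–Z) with a 2 × 2 key; the helper names are my own, the key matrix in the usage note is the classical textbook example, and Sage's HillCryptosystem class can serve as a reference implementation:

```python
def mat_vec(H, v, mod=26):
    """Multiply matrix H by vector v modulo `mod`."""
    return [sum(h * x for h, x in zip(row, v)) % mod for row in H]

def mat_inv_2x2(H, mod=26):
    """Inverse of an invertible 2x2 matrix modulo `mod`."""
    (a, b), (c, d) = H
    det_inv = pow((a * d - b * c) % mod, -1, mod)  # fails unless gcd(det, mod) = 1
    return [[(det_inv * d) % mod, (det_inv * -b) % mod],
            [(det_inv * -c) % mod, (det_inv * a) % mod]]

def hill_encrypt(H, plaintext):
    """Encrypt pairs of letters (A=0, ..., Z=25); length assumed even."""
    nums = [ord(ch) - ord('A') for ch in plaintext]
    out = []
    for i in range(0, len(nums), 2):
        out += mat_vec(H, nums[i:i + 2])
    return ''.join(chr(n + ord('A')) for n in out)

def hill_decrypt(H, ciphertext):
    """Decryption is encryption with the inverse key matrix."""
    return hill_encrypt(mat_inv_2x2(H), ciphertext)
```

With the key H = [[3, 3], [2, 5]] (determinant 9, invertible mod 26) the plaintext "HELP" encrypts to "HIAT", and decryption with H^{-1} recovers it.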
10.2 Asymmetric cryptosystems
In Section 10.1 we considered symmetric cryptosystems. As we have seen, for successful communication Alice and Bob are required to keep their encryption/decryption keys secret; only the channel itself is assumed to be eavesdropped. For Alice and Bob to set up a secret communication it is necessary to
convey the encryption/decryption keys. This can be done, e.g., by means of a trusted courier or some very secure channel (like a specially secured telephone line). This paradigm suited diplomatic and military communication well: the number of communicating parties in these scenarios was quite limited, and the parties could usually afford to send a trusted courier or to provide some highly protected channel for exchanging keys. In the 1970s, with the beginning of electronic communication, it became apparent that such an exchange mechanism is absolutely inefficient. This is mainly due to a drastic increase in the number of communicating parties. It is not only diplomats or high-ranking military officials who wish to set up secret communication, but ordinary users (e.g. companies, banks, users of social networks) who would like to be able to do business over some large distributed network. Suppose that there are n users who potentially wish to communicate with each other secretly. Then it is possible to share a secret key between every pair of users. There are n(n − 1)/2 pairs of users, so one would need this number of key exchanges in the network. Note that already for n = 1,000 we have n(n − 1)/2 = 499,500, which is of course not something we would like to do. Another option would be to set up some trusted authority in the middle who stores a secret key for every user. If Alice would like to send a plaintext p to Bob, she sends c_Alice = E_{K_Alice}(p) to the trusted authority Tim. Tim decrypts p = D_{K_Alice}(c_Alice) and sends c_Bob = E_{K_Bob}(p) to Bob, who is then able to decrypt c_Bob with his secret key K_Bob. An obvious drawback of this approach is that Tim knows all the secret keys and is thus able to read (and alter!) all the plaintexts, which is of course not desirable.
Another disadvantage is that for a large network it could be hard to implement a trusted authority of this kind, as it has to take part in every communication between users and thus can get overwhelmed. A solution to the problem above was proposed by Diffie and Hellman in 1976. This was the starting point of asymmetric cryptography. The idea is that if Alice wants to communicate with other parties, she generates an encryption/decryption key pair (e, d) in such a way that knowing e it is computationally infeasible to obtain d. This is quite different from symmetric cryptography, where e and d are (computationally) the same. The motivation for the name "asymmetric cryptosystem", as opposed to "symmetric cryptosystem", should be clear now. So Alice publishes her encryption key e in some public repository and keeps d secret. If Bob wants to send a plaintext p to Alice, he simply finds her public key e = e_Alice in the repository and uses it for encryption: c = E_e(p). Now Alice is able to decrypt with her private key d = d_Alice. Note that due to the assumptions we have on the pair (e, d), Alice is the only person who is able to decrypt c. Indeed, Eve may know c and the encryption key e used, but she is not able to get d for decryption. Remarkably, even Bob himself is not able to restore his plaintext p from c if he loses or deletes it beforehand! The formal definition follows.
Definition 10.2.1 An asymmetric cryptosystem is defined by the following data:
• The plaintext space P and the ciphertext space C.
• {E_e : P → C | e ∈ K} and {D_d : C → P | d ∈ K} are the sets of encryption and decryption transformations resp., which are bijections from P to C
and from C to P resp.
• The above transformations are parameterized by the key space K.
• Given an associated pair (e, d), so that the property ∀p ∈ P : D_d(E_e(p)) = p holds, knowing e it is "computationally hard" to find d.
Here the encryption key e is called public and the decryption key d is called private. The core issue in the above definition is the property that knowledge of e practically does not shed any light on d. The study of this issue led to the notion of a one-way function. We say that a function f : X → Y is one-way if it is "computationally easy" to compute f(x) for any x ∈ X, but for y ∈ Im(f) it is "computationally hard" to find x ∈ X such that f(x) = y. Note that one may compute Y′ = {f(x) | x ∈ Z}, where Z is some small subset of X, and then invert the elements of Y′. Still, Y′ is small compared to Im(f), so for a randomly chosen y ∈ Im(f) the above assumption should hold. Theoretically it is not known whether one-way functions exist, but in practice there are several candidates that are believed to be one-way. We discuss this a bit later. The above notion of a one-way function solves half of the problem. Namely, if Bob sends Alice an encrypted plaintext c = E(p), where E is one-way, Eve is not able to find p as she is not able to invert E. But then Alice faces the same problem! Of course we would like to provide Alice with means to invert E and find p. Here the notion of a trapdoor one-way function comes in handy. A one-way function f : X → Y is called trapdoor one-way if there is some additional information, called the trapdoor, with which it is "computationally easy" for y ∈ Im(f) to find x ∈ X such that f(x) = y. Now if Alice possesses such a trapdoor for E she is able to obtain p from c.
Example 10.2.2 We now give examples of functions that are believed to be one-way.
1. The first is f : Z_n → Z_n defined by f(x) = x^a mod n. If we take a = 3 it is easy to compute x^3 mod n, but given y ∈ Z_n it is believed to be hard to compute x such that y = x^3 mod n. For suitably chosen a and n this function is used in the RSA cryptosystem, Section 10.2.1. For a = 2 one obtains the so-called Rabin scheme.
It can be shown that for the Rabin scheme factoring n is in fact equivalent to inverting f. Since factoring integers is considered to be a hard computational problem, it is believed that f is one-way. For RSA it is believed that inverting f is as hard as factoring, although no rigorous proof is known. In both schemes it is assumed that n = pq, where p and q are (suitably chosen) primes; this fact is public knowledge, but p and q are kept secret. The one-way property relies on the hardness of factoring n, i.e. finding p and q. For Alice the knowledge of p and q is a trapdoor with which she is able to invert f. Thus f is believed to be a trapdoor one-way function.
2. The second example is g : F_q* → F_q* defined by g(x) = a^x, where a generates the multiplicative group F_q*. The problem of inverting g is called the discrete logarithm problem (DLP) in F_q. It is the basis for the El Gamal scheme, Section
10.2.2. The DLP is believed to be hard in general; thus g is believed to be one-way, since for given x computing a^x in F_q* is easy. One may also use domains different from F_q and try to solve the DLP there; for some discussion on that cf. Section 10.2.2.
3. Consider a function h : F_q^k → F_q^n, k < n, defined by m ↦ mG + e, where G is a generator matrix of an [n, k, d]_q linear code and wt(e) ≤ t ≤ (d − 1)/2. So h defines an encoding function for the code defined by G. When inverting h one faces the problem of bounded distance decoding, which is believed to be hard. The function h is the basis for the McEliece and Niederreiter cryptosystems, see Sections 10.6 and ??.
4. In the last example we consider a function z : F_q^n → F_q^m, n ≥ m, defined by x ↦ F(x) = (f_1(x), ..., f_m(x)), where the f_i's are nonlinear polynomials over F_q. Inverting z means finding a solution of a system of nonlinear equations. This problem is known to be NP-hard even if the f_i's are quadratic and q = 2. The function z is the basis of multivariate cryptosystems, see Section 10.2.3.
Before considering concrete examples of asymmetric cryptosystems, we would like to note that there is a vital need for authentication in asymmetric cryptosystems. Indeed, imagine that Eve can not only intercept and read messages, but also alter the repository where public keys are stored. Suppose Alice wishes to communicate a plaintext p to Bob. Assume that Eve is aware of this intention and is able to substitute Bob's public key e_Bob with her key e_Eve, for which she has the corresponding decryption key d_Eve. Alice, not knowing that the key was replaced, takes e_Eve and encrypts c = E_{e_Eve}(p). Eve intercepts c and decrypts p = D_{d_Eve}(c). So now Eve knows p. After that she may either encrypt p with Bob's e_Bob and send the ciphertext to him, or even replace p with some other p′.
As a result, not only does Eve get the secret message p, but Bob can be misinformed by the message p′, which, as he thinks, comes from Alice. Fortunately there are means to tackle this problem. They include the use of a trusted third party (TTP) and digital signatures. Digital signatures are the asymmetric analogues of (message) authentication codes, Section 10.3. These are outside the scope of this introductory chapter. The last remark concerns the type of security that asymmetric cryptosystems provide. Note that, as opposed to symmetric cryptosystems, some of which can be shown to be unconditionally secure, asymmetric cryptosystems can only be computationally secure. Indeed, having Bob's public key e_Bob, Eve can simply encrypt all possible plaintexts until she finds p such that E_{e_Bob}(p) coincides with the ciphertext c that she observed.
10.2.1 RSA
Now we consider an example of one of the most widely used asymmetric cryptosystems: RSA, named after its creators R. Rivest, A. Shamir, and L. Adleman. This cryptosystem was proposed in 1977, shortly after Diffie and Hellman invented asymmetric cryptography. It is based on the hardness of factoring integers and up to now has withstood cryptanalysis, although some of the attacks suggest choosing the public/private key and its size carefully. First we present RSA itself:
how one chooses a public/private key pair, how encryption/decryption is done, and why it works. Then we consider a concrete example with small numbers. Finally we discuss some security issues. In this and the following subsection we denote the plaintext by m, because historically p and q are reserved in the context of RSA.
Algorithm 10.2.3 (RSA key generation)
Output: RSA public/private key pair ((e, n), d).
1. Choose two distinct primes p and q.
2. Compute n = pq and φ = φ(n) = (p − 1)(q − 1).
3. Select a number e, 1 < e < φ, such that gcd(e, φ) = 1.
4. Using the extended Euclidean algorithm, compute d such that ed ≡ 1 (mod φ).
5. The key pair is ((e, n), d).
The integers e and d above are called the encryption and decryption exponent resp.; the integer n is called the modulus. For encryption Alice uses the following algorithm.
Algorithm 10.2.4 (RSA encryption)
Input: Plaintext m and Bob's encryption exponent e together with the modulus n.
Output: Ciphertext c.
1. Represent m as an integer 0 ≤ m < n.
2. Compute c = m^e mod n.
3. The ciphertext for sending to Bob is c.
For decryption Bob uses the following
Algorithm 10.2.5 (RSA decryption)
Input: Ciphertext c, the decryption exponent d, and the modulus n.
Output: Plaintext m.
1. Compute m = c^d mod n.
2. The plaintext is m.
Let us see why Bob recovers the initial m as a result of decryption. Since ed ≡ 1 (mod φ), there exists an integer s such that ed = 1 + sφ. For gcd(m, p) there are two possibilities: either 1 or p. If gcd(m, p) = 1, then by Fermat's little theorem we have m^{p−1} ≡ 1 (mod p). Raising both sides to the s(q−1)-th power and multiplying by m we obtain m^{1+s(p−1)(q−1)} ≡ m (mod p). Now using ed = 1 + sφ = 1 + s(p−1)(q−1) we have m^{ed} ≡ m (mod p). In the case gcd(m, p) = p the last congruence holds trivially, since both sides are divisible by p. The same argument applies to q, so we obtain analogously m^{ed} ≡ m (mod q). Using the Chinese remainder theorem we then get m^{ed} ≡ m (mod n). So indeed c^d = (m^e)^d ≡ m (mod n).
Example 10.2.6 Consider an example of RSA as described in the algorithms above with some small values. First let us choose the primes p = 5519 and q = 4651. So our modulus is n = pq = 25668869, and thus φ = (p − 1)(q − 1) = 25658700. Take e = 29 as an encryption exponent; indeed gcd(29, φ) = 1. Using the extended Euclidean algorithm we obtain e · (−3539131) + 4 · φ = 1, so take d = −3539131 mod φ = 22119569. The key pair now is ((e, n), d) = ((29, 25668869), 22119569). Suppose Alice wants to transmit the plaintext message m = 7847098 to Bob. She takes his public key e = 29 and computes c = m^e mod n = 22152327. She sends c to Bob. After obtaining c, Bob computes c^d mod n = m.
Example 10.2.7 The Magma computer algebra system (cf. Appendix ??) offers a function to compute an RSA modulus of a given bitlength. For example, if we want to construct a "random" RSA modulus of bitlength 25, we write:
> RSAModulus(25);
26827289 1658111
Here the first number is the random RSA modulus n and the second one is a number e such that gcd(e, φ(n)) = 1. We can also specify the number e explicitly (below e = 29):
> n := RSAModulus(25, 29);
> n;
19579939
One can further factorize n as follows:
> Factorization(n);
[ <3203, 1>, <6113, 1> ]
This means that p = 3203 and q = 6113 are the prime factors of n and n = pq. We can also use the extended Euclidean algorithm to recover d as follows:
> e := 29; phi := 25658700;
> ExtendedGreatestCommonDivisor(e, phi);
1 -3539131 4
So here 1 is the gcd and the coefficient −3539131 yields d = −3539131 mod φ = 22119569, as was computed in the example above.
As has already been mentioned, RSA relies on the hardness of factoring integers. Of course, if Eve is able to factor n, then she is able to produce d and thus decrypt all ciphertexts. The open question is whether breaking RSA leads to a factoring algorithm for n. The problem of breaking RSA is called the RSA problem. There is no rigorous proof, though, that breaking RSA is equivalent to factoring.
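The computations of Example 10.2.6 can be reproduced in a few lines of Python (a toy sketch only; real RSA uses primes of well over a thousand bits together with a padding scheme, and the modular inverse via pow(e, -1, phi) requires Python 3.8+):

```python
# Key generation with the primes of Example 10.2.6 (toy sizes!)
p, q, e = 5519, 4651, 29
n = p * q                  # modulus, 25668869
phi = (p - 1) * (q - 1)    # 25658700
d = pow(e, -1, phi)        # decryption exponent via the extended Euclidean algorithm

# Encryption and decryption of Alice's message
m = 7847098
c = pow(m, e, n)           # c = m^e mod n
assert pow(c, d, n) == m   # decryption c^d mod n recovers m
```

Here d comes out as 22119569, matching the example.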
Still, it can be shown that computing the decryption exponent d and factoring are equivalent. Note that in principle it might be unnecessary for an attacker to compute d in order to obtain m from c given (e, n). Nevertheless, even though there is no rigorous proof of equivalence, breaking RSA is believed to be as hard as factoring. Now we briefly discuss some other things that need to be taken into consideration when choosing parameters for RSA.
1. For fast encryption a small encryption exponent is desirable, e.g. e = 3. A possibility for an attack exists then if this exponent is used for sending the same message to different recipients with different moduli. There is also concern about small decryption exponents. For example, if the bitlength of d is approximately 1/4 of the bitlength of n, then there is an efficient way to get d from (e, n).
2. As to the primes p and q, one should take the following into account. First of all, p − 1 and q − 1 should each have a large prime factor, since otherwise factoring n with Pollard's p − 1 algorithm is possible. Then, in order to avoid elliptic curve factoring, p and q should be roughly of the same bitlength. On the other hand, if the difference p − q is too small, then techniques like Fermat factorization become feasible.
3. In order to avoid problems as in (1.), different padding schemes are proposed that add a certain amount of randomness to ciphertexts. Thus the same message will be encrypted to one of the ciphertexts from some range.
An important remark to make is that using so-called quantum computers, provided they are large enough, it is possible to solve the factorization problem in polynomial time. See Notes for references. The same problem exists for the cryptosystems based on the DLP, which are described in the next subsection. Problems (3.) and (4.) from Example 10.2.2 are not known to be susceptible to quantum computer attacks. Together with some other hard problems, they form a foundation for post-quantum cryptography, which deals with cryptosystems resistant to quantum computer attacks. See Notes for references.
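The danger of choosing p and q too close together, mentioned in point 2 above, can be made concrete with a small sketch of Fermat factorization: one writes n = a² − b² = (a − b)(a + b) and searches for a upward from ⌈√n⌉. The search succeeds after roughly (p + q)/2 − √n steps, which is tiny exactly when p − q is small (illustrative sketch, example primes hypothetical):

```python
import math

def fermat_factor(n):
    """Factor an odd n = pq by finding a, b with n = a^2 - b^2."""
    a = math.isqrt(n)
    if a * a < n:
        a += 1                     # start at the ceiling of sqrt(n)
    while True:
        b2 = a * a - n
        b = math.isqrt(b2)
        if b * b == b2:            # a^2 - n is a perfect square
            return a - b, a + b    # the factors (a - b)(a + b) = n
        a += 1
```

For n = 101 · 103 = 10403 a single iteration suffices, since 102² − 10403 = 1².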
10.2.2 Discrete logarithm problem and public-key cryptography
In the previous subsection we considered the asymmetric cryptosystem RSA, based on the hardness of factoring integers. As has already been noted in Example 10.2.2, there is also the possibility of using the hardness of finding discrete logarithms as the basis for an asymmetric cryptosystem. The general DLP is defined below.
Definition 10.2.8 Let G be a finite cyclic group of order g. Let α be a generator of this group, so that G = {α^i | 1 ≤ i ≤ g}. The discrete logarithm problem (DLP) in G is the problem of finding 1 ≤ x ≤ g from a = α^x, where a ∈ G is given.
For cryptographic purposes a group G should possess two main properties: 1.) the operation in G should be efficiently performable and 2.) the DLP in G should be difficult to solve (see Exercise 10.2.4). Cyclic groups that are widely used in cryptography include the multiplicative group F_q* of the finite field F_q (in particular the multiplicative group Z_p* for p prime) and a group of points on an elliptic curve over a finite field. Other possibilities are the group of units Z_n* for a composite n, the Jacobian of a hyperelliptic curve defined over a finite field, and the class group of an imaginary quadratic number field, see Notes. Here we consider the classical El Gamal scheme based on the DLP. As we will see, the following description works for any cyclic group with an "efficient description". Initially the multiplicative group of a finite field was used.
Algorithm 10.2.9 (El Gamal key generation)
Output: El Gamal public/private key pair ((G, α, h), a).
1. Choose some cyclic group G of order g = ord(G), where the group operation can be performed efficiently, and then choose a generator α of G.
2. Select a random integer a such that 1 ≤ a ≤ g − 2 and compute h = α^a.
3. The key pair is ((G, α, h), a).
Note that G and α can be fixed in advance for all users, so that only h constitutes the public key. For encryption Alice uses the following algorithm.
Algorithm 10.2.10 (El Gamal encryption)
Input: Plaintext m and Bob's public key h together with α and the group description of G.
Output: Ciphertext c.
1. Represent m as an element of G.
2. Select a random b such that 1 ≤ b ≤ g − 2, where g = ord(G), and compute c_1 = α^b and c_2 = m · h^b.
3. The ciphertext for sending to Bob is c = (c_1, c_2).
For decryption Bob uses the following
Algorithm 10.2.11 (El Gamal decryption)
Input: Ciphertext c, the private key a together with α and the group description of G.
Output: Plaintext m.
1. In G compute m = c_2 · c_1^{−a} = c_2 · c_1^{g−a}, where g = ord(G).
2. The plaintext is m.
Let us see why we recover the initial m upon decryption. Using h = α^a we have
c_2 · c_1^{−a} = m · h^b · α^{−ab} = m · α^{ab} · α^{−ab} = m.
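The correctness computation above is easy to check numerically. A toy sketch in the group Z_p* (all parameters below are illustrative and far too small for real use; a real implementation also draws the ephemeral b freshly at random for every encryption):

```python
p, alpha = 467, 2            # toy group Z_p* for a prime p, and a group element alpha
a = 153                      # private key
h = pow(alpha, a, p)         # public key h = alpha^a

def encrypt(m, b):
    """El Gamal encryption of m with ephemeral exponent b."""
    return pow(alpha, b, p), (m * pow(h, b, p)) % p

def decrypt(c1, c2):
    """Decryption: c2 * c1^(-a) = m * h^b * alpha^(-ab) = m in Z_p*."""
    return (c2 * pow(c1, p - 1 - a, p)) % p
```

Note that c_1^{−a} is computed as c_1^{p−1−a}, using that every element of Z_p* has order dividing p − 1.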
Example 10.2.12 For this example let us take the group Z_p* where p = 8053, with generator α = 2. Let us choose the private key to be a = 3117. Compute h = α^a mod p = 3030. So the public key is h = 3030 and the private key is a = 3117. Suppose Alice wants to encrypt the message m = 1734 for Bob. For this she chooses a random b = 6809 and computes c_1 = α^b mod p = 3540 and c_2 = m · h^b mod p = 7336. So her ciphertext is c = (3540, 7336). Upon receiving c, Bob computes c_2 · c_1^{p−1−a} mod p = 7336 · 3540^{4935} mod 8053 = 1734. Now we briefly discuss some issues connected with the El Gamal scheme.
• Message expansion: It should be noted that, as opposed to the RSA scheme, ciphertexts in El Gamal are twice as large as plaintexts. So the El Gamal scheme actually has the drawback of message expansion by a factor of 2.
• Randomization: Note that in Algorithm 10.2.10 we used randomization to compute a ciphertext. Randomization in encryption has the advantage that the same message is mapped to different ciphertexts in different encryption runs. This in turn makes a chosen-plaintext attack more difficult. We will see another example of an asymmetric scheme with randomized encryption in Section 10.6, where we discuss the McEliece scheme based on error-correcting codes.
• Security reliance: The problem of breaking the El Gamal scheme is equivalent to the so-called (generalized) Diffie-Hellman problem, which is the problem of finding α^{ab} ∈ G given α^a ∈ G and α^b ∈ G. Obviously, if one is able to solve the DLP, then one is able to solve the Diffie-Hellman problem, i.e. the Diffie-Hellman problem is polytime reducible to the DLP (cf. Definition 6.1.22). It is not known whether these two problems are computationally equivalent. Nevertheless, it is believed that breaking El Gamal is as hard as solving the DLP.
• As we have mentioned before, the El Gamal scheme is vulnerable to quantum computer attacks. See Notes.
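The Diffie-Hellman problem mentioned above comes from the original key-exchange protocol of Diffie and Hellman: Alice and Bob publish α^a and α^b and can each compute the shared value α^{ab}, while an eavesdropper sees only α^a and α^b. A toy sketch (all numbers illustrative only):

```python
p, alpha = 8053, 2            # public parameters: a toy prime and a group element
a, b = 3117, 6809             # Alice's and Bob's secret exponents

A = pow(alpha, a, p)          # Alice sends alpha^a over the open channel
B = pow(alpha, b, p)          # Bob sends alpha^b over the open channel

shared_alice = pow(B, a, p)   # (alpha^b)^a = alpha^(ab)
shared_bob = pow(A, b, p)     # (alpha^a)^b = alpha^(ab), the same value
```

Eve, knowing only p, α, A, and B, faces exactly the Diffie-Hellman problem of computing α^{ab}.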
10.2.3 Some other asymmetric cryptosystems
So far we have seen examples of asymmetric cryptosystems based on the hardness of factoring integers (Section 10.2.1) and of solving the DLP in the multiplicative group of a finite field (Section 10.2.2). Other examples that will be covered are the McEliece scheme, based on the hardness of decoding random linear codes (Section 10.6), and schemes based on the DLP in the group of points of an elliptic curve over a finite field (Section ??). In this subsection we briefly mention some other alternatives that exist. The first direction we consider here is so-called multivariate cryptography. Here cryptosystems are based on the hardness of solving the multivariate quadratic (MQ) problem. This is the problem of finding a solution x = (x_1, ..., x_n) ∈ F_q^n to the system
y_1 = f_1(X_1, ..., X_n),
...
y_m = f_m(X_1, ..., X_n),
where f_i ∈ F_q[X_1, ..., X_n], deg f_i = 2, i = 1, ..., m, and the vector y = (y_1, ..., y_m) ∈ F_q^m is given. This problem is known to be NP-hard, so it is thought to be a good source of a one-way function. The trapdoor is added by choosing the f_i's with some structure that is kept secret and allows decryption, which e.g. boils down to univariate factorization over a larger field. To an eavesdropper, though, the system above with such a trapdoor should appear random. So the idea is that the eavesdropper can do no better than solve a random quadratic system over a finite field, which is believed to be a hard problem. The cryptosystems and digital signature schemes in this category include e.g. Hidden Field Equations (HFE), SFLASH, Unbalanced Oil and Vinegar (UOV), Stepwise Triangular Schemes (STS), and some others. Some of those were broken, and several modifications were proposed to overcome the attacks (e.g. PFLASH, enSTS). At present it is not quite clear whether it is possible to design a secure multivariate cryptosystem; a lot of research in this area, though, gives a basis for optimism.
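The hardness of the MQ problem is easy to experience even at toy sizes: generically one has nothing better than searching all q^n assignments. A brute-force sketch over F_2 (the system below is a hypothetical example, not taken from any real scheme):

```python
from itertools import product

def solve_mq_f2(polys, n):
    """Brute-force an MQ system over F_2: each poly maps a 0/1 tuple to 0/1.
    The cost grows as 2^n, which is the point of using MQ as a one-way function."""
    return [x for x in product((0, 1), repeat=n)
            if all(f(x) == 0 for f in polys)]

# A tiny system in n = 3 variables, written in the form f_i(x) - y_i = 0:
polys = [
    lambda x: (x[0] * x[1] ^ x[2]) ^ 1,   # x1*x2 + x3 = 1
    lambda x: (x[0] ^ x[1] * x[2]) ^ 0,   # x1 + x2*x3 = 0
]
solutions = solve_mq_f2(polys, 3)
```

For this toy system the search finds the single solution (x_1, x_2, x_3) = (0, 0, 1); for the parameter sizes used in actual multivariate schemes the same search is hopeless.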
Another well-known example of a cryptosystem based on an NP-hard problem is the knapsack cryptosystem. This cryptosystem was the first concrete realization of an asymmetric scheme and was proposed in 1978 by Merkle and Hellman. The knapsack cryptosystem is based on the well-known NP-hard subset sum problem: given a set of positive integers A = {a_1, ..., a_n} and a positive integer s, find a subset of A such that the sum of its elements yields s. The idea of Merkle and Hellman was to make so-called superincreasing sequences, for which the above problem is easily solved, appear as
a random set A, thus providing a trapdoor. So an eavesdropper supposedly has nothing better to do than to deal with a well-known hard problem. This initial proposal was broken by Shamir, and later an improved version was broken by Brickell. These attacks are based on integer lattices and caused quite a stir in the cryptographic community at the time. There are some other types of cryptosystems out there: polynomial-based "Polly Cracker"-type, lattice-based, hash-based, and group-based. We may summarize that active research is being conducted in order to provide alternatives to the widely used cryptosystems.
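The Merkle-Hellman trapdoor rests on the fact that subset sum is trivial for a superincreasing sequence (each element exceeds the sum of all previous ones): a single greedy pass from the largest element down solves it. A sketch (example numbers hypothetical):

```python
def solve_superincreasing(a, s):
    """Subset sum for a superincreasing sequence `a` (sorted ascending):
    take each element from the largest down whenever it still fits."""
    chosen = []
    for x in reversed(a):
        if x <= s:          # x must be in the subset, since the rest sum to < x
            chosen.append(x)
            s -= x
    if s != 0:
        raise ValueError("no solution")
    return sorted(chosen)
```

For example, for a = (2, 3, 7, 15, 31) and s = 24 the greedy pass picks 15, 7, and 2. The secret modular disguise of the scheme turns such an easy instance into what looks like a general subset sum instance.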
10.2.4 Exercises
10.2.1 a. Given primes p = 5081 and q = 6829 and an encryption exponent e = 37, find the corresponding decryption exponent and encrypt the message m = 29800110.
b. Let e and m be as above. Generate (e.g. with Magma) a random RSA modulus n of bitlength 25. For these n, e, m find the corresponding decryption exponent via factorizing n; encrypt m.
10.2.2 Show that the number λ = lcm(p − 1, q − 1), called the universal exponent of n, can be used instead of φ in Algorithms 10.2.3 and 10.2.5.
10.2.3 Generate a public/private key pair for the El Gamal scheme with G = Z*_7121 and encrypt the message m = 5198 using this scheme.
10.2.4 Give an example of a finite cyclic group where the DLP is easy to solve.
10.2.5 Show that using the same b in Algorithm 10.2.10 for at least two different encryptions is insecure: namely, if c' and c'' are two ciphertexts that correspond to m' and m'' and were encrypted with the same b, then knowing one of the plaintexts yields the other.
10.3 Authentication, orthogonal arrays, and codes
10.3.1 Authentication codes
In Section 10.1 we dealt with the problem of secure communication between two parties by means of symmetric cryptosystems. In this section we address another important problem, the problem of data source authentication. So we are now interested in providing means for Bob to make sure that an (encrypted) message he received from Alice was indeed sent by her and was not altered during the transmission. In this section we consider so-called authentication codes that provide the tools necessary to ensure authentication. These codes are analyzed in terms of unconditional security (see Definition 10.1.8). For practical purposes one is more interested in computational security; the analogues of authentication codes for this purpose are message authentication codes (MACs). It is also to be noted that authentication codes are, in a sense, symmetric-based, i.e. a secretly shared key is needed to provide such authentication. There is also an asymmetric analogue (Section 10.2) called a digital signature. In this model everybody can
318
CHAPTER 10. CRYPTOGRAPHY
verify Alice's signature by a publicly available verification algorithm. Let us now go on to the formal definition of an authentication code.

Definition 10.3.1 An authentication code is defined by the following data:
• A set of source states S.
• A set of authentication tags T.
• A set of keys, the keyspace K.
• A set of authentication maps A parameterized by K: for each k ∈ K there is an authentication map a_k : S → T.

We also define a message space M = S × T. The idea of authentication is as follows. Alice and Bob secretly agree on some secret key k ∈ K for their communication. Suppose that Alice wants to transmit a message s, which by the definition above is called a source state. Note that we are now not interested in providing secrecy for s itself, but rather in providing means of authentication for s. For the transmission Alice adds an authentication tag to s by t = a_k(s). She then sends the concatenated message (s, t). Usually (s, t) is an encrypted message, maybe also encoded for error correction, but this plays no role here. Suppose Bob receives (s', t'). He separates s' and t' and checks whether t' = a_k(s'). If the check succeeds, he accepts s' as a valid message that came from Alice; otherwise he rejects it. If no intrusion occurred, we have s = s' and t = t' and the check trivially succeeds. But what if Eve wants to alter the message and make Bob believe that the message altered by her still originates from Alice? One usually considers two types of malicious actions by Eve.
• Impersonation: Eve sends some message (s, t) with the intention that Bob accepts it as Alice's message, i.e. she aims at passing the check t = a_k(s) with high probability, where the key k is unknown to Eve.
• Substitution: Eve intercepts Alice's message (s, t). Now she wants to substitute another message (s', t') with s' ≠ s, such that a_k(s') = t' for the key k unknown to Eve.
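The definition can be made concrete with a toy sketch in which the authentication maps are stored as a table indexed by keys and source states (the table, names and values below are hypothetical, not taken from the text):

```python
import random

# A toy authentication code: rows indexed by keys, columns by source states.
A = {
    1: {"s1": 0, "s2": 1},
    2: {"s1": 1, "s2": 0},
    3: {"s1": 1, "s2": 1},
}

def tag(k, s):
    """The authentication map a_k(s), looked up in the table."""
    return A[k][s]

def verify(k, s, t):
    """Bob's check: accept (s, t) iff t = a_k(s)."""
    return tag(k, s) == t

k = random.choice(list(A))       # Alice and Bob secretly share a key k
msg = ("s1", tag(k, "s1"))       # Alice sends the message (s, a_k(s))
ok = verify(k, *msg)             # Bob accepts: the check succeeds
```

Eve, not knowing k, can only guess which row of the table is in use; the next paragraphs quantify her chances.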
As has already been said, authentication codes are studied from the point of view of unconditional security, i.e. we assume that Eve has unbounded computational power. In this case we need to show that no matter how much computational power Eve has, she cannot succeed in the above attack scenarios with a large probability. Therefore, we need to estimate the probabilities of success of impersonation P_I and substitution P_S, given probability distributions p_S and p_K on the source state set and the key space respectively. The probabilities P_I and P_S are also called deception probabilities. Note that both P_I and P_S are computed under the assumption that Eve tries to maximize her chances of deception. In reality Eve might want not only to maximize her probability of passing the check, but might also have some preference as to which message she wants to substitute for Alice's. For example, intercepting Alice's message (s, t), where s = "Meeting is at seven", she would like to send something like (s', t'), where s' = "Meeting is at six". Thus P_I and P_S actually provide an upper bound on Eve's chances of success.
Let us first compute P_I. Consider the probability that some message (s, t) is validated by Bob when some private key k_0 ∈ K is used. In fact, for Eve every key k that maps s to t will do. So,

Pr(a_{k_0}(s) = t) = \sum_{k \in K : a_k(s) = t} p_K(k).

Now, in order to maximize her chances, Eve should choose (s, t) with Pr(a_{k_0}(s) = t) as large as possible, i.e.

P_I = max{ Pr(a_{k_0}(s) = t) | s ∈ S, t ∈ T }.

Note that P_I depends only on the distribution p_K and not on p_S. Computing P_S is a bit trickier. The conditional probability Pr(a_{k_0}(s') = t' | a_{k_0}(s) = t) that Eve's message (s', t'), s' ≠ s, passes the check once a valid message (s, t) is known is

Pr(a_{k_0}(s') = t' | a_{k_0}(s) = t) = \frac{Pr(a_{k_0}(s') = t', a_{k_0}(s) = t)}{Pr(a_{k_0}(s) = t)} = \frac{\sum_{k \in K : a_k(s') = t', a_k(s) = t} p_K(k)}{\sum_{k \in K : a_k(s) = t} p_K(k)}.

Having (s, t), Eve maximizes her chances by choosing (s', t'), s' ≠ s, such that the corresponding conditional probability is maximal. To reflect this, introduce

p_{s,t} := max{ Pr(a_{k_0}(s') = t' | a_{k_0}(s) = t) | s' ∈ S \ {s}, t' ∈ T }.

Now in order to get P_S we take the weighted average of the p_{s,t} according to the distribution p_M:

P_S = \sum_{(s,t) \in M} p_M(s, t) p_{s,t},
where the distribution p_M is obtained as p_M(s, t) = p_S(s) p(t|s) = p_S(s) Pr(a_{k_0}(s) = t). The value Pr(a_{k_0}(s) = t) is called the payoff of a message (s, t); we denote it by π(s, t). Likewise Pr(a_{k_0}(s') = t' | a_{k_0}(s) = t) is the payoff of a message (s', t') given a valid message (s, t); we denote it by π_{s,t}(s', t'). For convenience one may think of an authentication code as an array whose rows are indexed by K and columns by S, and whose entry (k, s) for k ∈ K, s ∈ S has the value a_k(s); see Exercise 10.3.1. We have discussed some basics of authentication codes. So the question now is: what are the important criteria for a good authentication code? These are summarized below:
1. The deception probabilities must be small, so that the eavesdropper's chances are low.
2. S should be large to facilitate authentication of a potentially large number of source states.
3. Note that since we are studying authentication codes from the point of view of unconditional security, the secret key should be used only once, and then changed for the next transmission as in the one-time pad, cf. Example ??. Thus |K| should be minimized, because key values have to be transmitted every time. E.g. if K = {0, 1}^l, then keys of length log_2 |K| = l are to be transmitted.
Let us now concentrate on item (1.); items (2.) and (3.) are considered in the next subsections, where different constructions of authentication codes are presented. We would like to see which values can be achieved by P_I and P_S and under which circumstances they achieve the minimal possible values. Basic results are collected in the following proposition.
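For small codes, the payoffs and the deception probabilities P_I and P_S can be computed directly from the array representation. The sketch below is a straightforward transcription of the formulas above; the array and both distributions are hypothetical examples of ours:

```python
from itertools import product

# Rows indexed by keys, columns by source states; entry A[k][s] = a_k(s).
A = {1: {"s1": 0, "s2": 1},
     2: {"s1": 1, "s2": 0},
     3: {"s1": 1, "s2": 1}}
pK = {1: 0.5, 2: 0.25, 3: 0.25}     # hypothetical key distribution
pS = {"s1": 0.5, "s2": 0.5}         # hypothetical source distribution
S = list(pS)
T = [0, 1]

def payoff(s, t):
    """pi(s, t) = Pr(a_k(s) = t), summing pK over keys mapping s to t."""
    return sum(pK[k] for k in A if A[k][s] == t)

def cond_payoff(s2, t2, s, t):
    """pi_{s,t}(s', t'): conditional payoff given the valid pair (s, t)."""
    num = sum(pK[k] for k in A if A[k][s2] == t2 and A[k][s] == t)
    den = payoff(s, t)
    return num / den if den else 0.0

# P_I: best single forged message.
P_I = max(payoff(s, t) for s, t in product(S, T))

# P_S: weighted average over valid messages of the best substitution.
P_S = sum(pS[s] * payoff(s, t) *
          max(cond_payoff(s2, t2, s, t) for s2 in S if s2 != s for t2 in T)
          for s, t in product(S, T) if payoff(s, t) > 0)
```

For this particular array both probabilities come out well above the lower bound 1/|T| = 1/2 established next, as expected for such an unbalanced key distribution.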
Proposition 10.3.2 Let the authentication code with the data S, T, K, A, p_S, p_K be fixed. We have:
1. P_I ≥ 1/|T|. Moreover, P_I = 1/|T| iff π(s, t) = 1/|T| for all s ∈ S, t ∈ T.
2. P_S ≥ 1/|T|. Moreover, P_S = 1/|T| iff π_{s,t}(s', t') = 1/|T| for all s, s' ∈ S, s ≠ s'; t, t' ∈ T.
3. P_I = P_S = 1/|T| iff π(s, t) π_{s,t}(s', t') = 1/|T|^2 for all s, s' ∈ S, s ≠ s'; t, t' ∈ T.

Proof.
1. For a fixed source state s ∈ S we have

\sum_{t \in T} π(s, t) = \sum_{t \in T} \sum_{k \in K : a_k(s) = t} p_K(k) = \sum_{k \in K} p_K(k) = 1.

Thus for every s ∈ S there exists an authentication tag t = t(s) ∈ T such that π(s, t(s)) ≥ 1/|T|. Now the claim follows by the computation of P_I we made above. Note that equality is possible iff π(s, t) = 1/|T| for all s ∈ S, t ∈ T.
2. For different fixed source states s, s' ∈ S and a tag t ∈ T such that (s, t) is valid we have

\sum_{t' \in T} π_{s,t}(s', t') = \sum_{t' \in T} \frac{\sum_{k \in K : a_k(s') = t', a_k(s) = t} p_K(k)}{\sum_{k \in K : a_k(s) = t} p_K(k)} = \frac{\sum_{k \in K : a_k(s) = t} p_K(k)}{\sum_{k \in K : a_k(s) = t} p_K(k)} = 1.

So for every s, s', t with s ≠ s' there exists a tag t' = t'(s') such that π_{s,t}(s', t'(s')) ≥ 1/|T|. Now the claim follows by the computation of P_S we made above. Note that equality is possible iff π_{s,t}(s', t') = 1/|T| for all s' ∈ S \ {s}, t' ∈ T, due to the definition of p_{s,t}.
3. If P_I = P_S = 1/|T|, then π(s, t) = 1/|T| for all s ∈ S, t ∈ T, and π_{s,t}(s', t') = 1/|T| for all s, s' ∈ S, s ≠ s'; t, t' ∈ T. Hence for all such s, s', t, t' we have π(s, t) π_{s,t}(s', t') = 1/|T|^2. Conversely, if π(s, t) π_{s,t}(s', t') = 1/|T|^2 for all s, s' ∈ S, s ≠ s'; t, t' ∈ T, then

π(s, t) = π(s, t) \sum_{t' \in T} π_{s,t}(s', t') = \sum_{t' \in T} π(s, t) π_{s,t}(s', t') = \sum_{t' \in T} \frac{1}{|T|^2} = \frac{1}{|T|},

so P_I = 1/|T| by (1.). Now

π_{s,t}(s', t') = \frac{1}{|T|^2 π(s, t)} = \frac{1}{|T|},

so P_S = 1/|T| by (2.).
As a straightforward consequence we have:

Corollary 10.3.3 With the notation as above and assuming that p_K is the uniform distribution (keys are equiprobable), we have P_I = P_S = 1/|T| iff

|{k ∈ K : a_k(s') = t', a_k(s) = t}| = \frac{|K|}{|T|^2}

for all s, s' ∈ S, s ≠ s'; t, t' ∈ T.
10.3.2 Authentication codes and other combinatorial objects
Authentication codes from orthogonal arrays

Now we take a look at certain combinatorial objects, called orthogonal arrays, that can be used for constructing authentication systems. A bit later we also consider a construction that uses error-correcting codes. For the definitions and basic properties of orthogonal arrays the reader is referred to Chapter 5, Section 5.5.1. What is important for us is that orthogonal arrays yield a construction of authentication codes in quite a natural way. The next proposition shows the relation between orthogonal arrays and authentication codes.

Proposition 10.3.4 If there exists an orthogonal array OA(n, l, λ) with symbols from a set N with n elements, then one can construct an authentication code with |S| = l, |K| = λn^2, T = N (thus |T| = n), for which P_I = P_S = 1/n. Conversely, if there exists an authentication code with the above parameters, then there exists an orthogonal array OA(n, l, λ).

Proof. Consider OA(n, l, λ) as the array representation of an authentication code from Section 5.5.1. Moreover, set p_K to be uniform, i.e. p_K(k) = 1/(λn^2) for every k ∈ K. The values of the parameters of such a code then easily follow. In order to obtain the values of P_I and P_S, use Corollary 10.3.3. Indeed, |{k ∈ K : a_k(s') = t', a_k(s) = t}| = λ by the definition of an orthogonal array, and λ = |K|/|T|^2. The claim now follows. The converse is proved analogously.

Let us now consider which criteria should be met by orthogonal arrays in order to produce good authentication codes. Parameter estimates for orthogonal arrays in terms of the authentication code parameters n, l, λ follow directly from the above proposition.
• If we demand that the deception probabilities be at most some value ε, i.e. P_I ≤ ε and P_S ≤ ε, then an orthogonal array should have n ≥ 1/ε.
• As we can always remove some columns from an orthogonal array and still obtain one after the removal, we demand that l ≥ |S|.
• λ should be minimized under the constraints imposed by the previous two items.
This is due to the fact that we would like to keep the key space size as low as possible, as has already been noted in the previous subsection.
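The defining property of an orthogonal array, and hence the condition of Corollary 10.3.3, can be checked mechanically. A sketch on a small OA(2, 3, 1) follows (the array is a standard small example; the helper names are ours):

```python
from itertools import combinations, product

# OA(2, 3, 1): 4 rows, 3 columns, 2 symbols; every ordered pair of symbols
# appears exactly once in every pair of columns.
oa = [(0, 0, 0),
      (0, 1, 1),
      (1, 0, 1),
      (1, 1, 0)]

def is_orthogonal_array(rows, n, lam):
    """Check the OA property: each symbol pair occurs exactly lam times
    in every pair of columns."""
    symbols = range(n)
    cols = range(len(rows[0]))
    for c1, c2 in combinations(cols, 2):
        for t1, t2 in product(symbols, repeat=2):
            count = sum(1 for r in rows if r[c1] == t1 and r[c2] == t2)
            if count != lam:
                return False
    return True

ok = is_orthogonal_array(oa, 2, 1)
# Read as an authentication code: rows are keys, columns are source states,
# entries are tags; with uniform pK this yields P_I = P_S = 1/2.
```

Here |K| = λn^2 = 4 and |T| = n = 2, so |K|/|T|^2 = λ = 1, in accordance with Corollary 10.3.3.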
Finally, we present without proof two characterization results, which say that if one wants to construct authentication codes with minimal deception probabilities, one cannot avoid using orthogonal arrays.

Theorem 10.3.5 Assume there exists an authentication code defined by S, T, K, A, p_K, p_S with |T| = n and P_I = P_S = 1/n. Then:
1. |K| ≥ n^2. Equality is achieved iff there exists an orthogonal array OA(n, l, 1) with l = |S| and p_K(k) = 1/n^2 for every k ∈ K.
2. |K| ≥ l(n − 1) + 1. Equality is achieved iff there exists an orthogonal array OA(n, l, λ) with l = |S|, λ = (l(n − 1) + 1)/n^2 and p_K(k) = 1/(l(n − 1) + 1) for every k ∈ K.
Authentication codes from error-correcting codes

As we have seen above, if one wants to keep the deception probabilities minimal, one has to deal with orthogonal arrays. A significant drawback of this approach is that the key space grows linearly in the size of the source state set. In particular, we have from Theorem 10.3.5 (2.) that |K| > l ≥ |S|. This means that the amount of information that needs to be transmitted secretly is larger than the amount that is allowed to go through a public channel. The same problem occurs in the one-time pad scheme, Example ??. Of course, this is not quite practical. In this subsection we consider so-called almost universal and almost strongly universal hash functions. By means of these functions it is possible to construct authentication codes with deception probabilities slightly larger than minimal, but whose source state set grows exponentially in the key space size. This gives an opportunity to work with much shorter keys, sacrificing the security threshold a bit. Next we give the definition of an almost universal hash function.

Definition 10.3.6 Let X and Y be sets of cardinality n and m respectively. Consider a family H of functions f : X → Y. Denote N := |H|. We call the family H ε-almost universal if for every two different x_1, x_2 ∈ X the number of functions f in H such that f(x_1) = f(x_2) is at most εN. The notation for such a family is ε-AU(N, n, m).

There is a natural connection between almost universal hash functions and error-correcting codes, as is shown next.

Proposition 10.3.7 The existence of one of the two objects below implies the existence of the other:
1. An ε-AU(N, n, m) family of almost universal hash functions.
2. An m-ary error-correcting code C of length N, cardinality n and relative minimum distance d/N ≥ 1 − ε.

Proof. Let us first describe an ε-AU(N, n, m) family as an array, similarly to how we did it for orthogonal arrays. The rows of the representation array are indexed by the functions from H and the columns by the set X.
In the cell indexed by f ∈ H and x ∈ X we write f(x) ∈ Y. Now the equivalence becomes clear.
Indeed, consider this array also as the codebook of an error-correcting code C, so that the codewords are written in columns. It is clear that the length is the number of rows, N, and the cardinality is the number of columns, n. The entries of the array take their values in Y, thus C is an m-ary code. Now the definition of H implies that for any two codewords x_1 and x_2 (columns), the number of positions where they agree is at most εN. But d(x_1, x_2) is the number of positions where they disagree, so d(x_1, x_2) ≥ (1 − ε)N, hence d/N ≥ 1 − ε. The reverse implication is proved analogously.

Next we define almost strongly universal hash functions, which are used for authentication.

Definition 10.3.8 Let X and Y be sets of cardinality n and m respectively. Consider a family H of functions f : X → Y. Denote N := |H|. We call the family H ε-almost strongly universal if the following two conditions hold:
1. For every x ∈ X and y ∈ Y the number of functions f in H such that f(x) = y is N/m.
2. For every two different x_1, x_2 ∈ X and every y_1, y_2 ∈ Y the number of functions f in H such that f(x_i) = y_i, i = 1, 2, is at most ε·N/m.
The notation for such a family is ε-ASU(N, n, m).

Almost strongly universal hash functions are nothing but authentication codes with some conditions on the deception probabilities. The following proposition is quite straightforward and is left to the reader as an exercise.

Proposition 10.3.9 If there exists a family H which is ε-ASU(N, n, m), then there exists an authentication code with K = H, S = X, T = Y, and p_K the uniform distribution, such that P_I = 1/m and P_S ≤ ε.

Note that if ε = 1/m in Definition 10.3.8, then from Propositions 10.3.9, 10.3.2 (2.), and 10.3.4 we see that a (1/m)-ASU(N, n, m) family is actually an orthogonal array. The problem with orthogonal arrays has already been mentioned above. Note that with almost strongly universal hash functions we have more freedom, as we can make ε a bit larger while gaining in other parameters, as we will see below.
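The correspondence of Proposition 10.3.7 can be illustrated with a small Reed-Solomon code: evaluating all polynomials of degree less than k at all points of F_q gives the array of an ε-AU family with ε = (k − 1)/q, since two distinct polynomials of degree ≤ k − 1 agree in at most k − 1 points. A sketch (parameters chosen by us for illustration):

```python
from itertools import product

q, k = 5, 2            # a [5, 2, 4] Reed-Solomon code over F_5
points = list(range(q))

# Columns of the array are codewords; rows are indexed by evaluation points.
# Each message (a0, a1) is hashed by f_x(a0, a1) = a0 + a1*x, one hash
# function f_x per x in F_5, i.e. N = 5 functions on n = 25 messages.
codewords = [tuple((a0 + a1 * x) % q for x in points)
             for a0, a1 in product(range(q), repeat=k)]

def agreements(c1, c2):
    """Number of positions where two codewords (columns) agree."""
    return sum(1 for u, v in zip(c1, c2) if u == v)

# AU property: distinct codewords agree in at most k - 1 = 1 position,
# so the family is a 1/5-AU(5, 25, 5).
eps_N = max(agreements(c1, c2)
            for i, c1 in enumerate(codewords)
            for c2 in codewords[i + 1:])
```

Equivalently, the relative minimum distance of the code is d/N = 4/5 = 1 − ε.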
So for us it is interesting to be able to construct good ASU families. There are two methods of doing so based on coding theory:
1. Construct AU families from codes as per Proposition 10.3.7 and then use Stinson's composition method, Theorem 10.3.10 below.
2. Construct ASU families directly from error-correcting codes.
Here we consider (1.); for (2.) see the Notes. The next result, due to Stinson, enables one to construct ASU families from AU families and some previously constructed ASU families; we omit the proof.

Theorem 10.3.10 Let X, Y, U be sets of cardinality n, m, u respectively. Let H_1 be an AU family ε_1-AU(N_1, n, u) of functions f_1 : X → U and let H_2 be an ASU family ε_2-ASU(N_2, u, m) of functions f_2 : U → Y. Consider the family H of all possible compositions thereof: H = {f | f = f_2 ∘ f_1, f_i ∈ H_i, i = 1, 2}. Then H is ε-ASU(N, n, m), where ε = ε_1 + ε_2 − ε_1 ε_2 and N = N_1 N_2.
Table 10.2: For Exercise 10.3.1

  k\s   1   2   3
   1    2   1   2
   2    3   2   1
   3    1   1   3
   4    2   3   2
One example of idea (1.) that employs Reed-Solomon codes is given in Exercise 10.3.2. Note that from Exercise 10.3.2 and Proposition 10.3.9 it follows that there exists an authentication code with |S| = |K|^{(2/5)|K|^{1/5}} (set a = 2b) and P_I = 1/|T|, P_S = 2/|T|. So by allowing the probability of substitution deception to rise to just twice the minimal value, we obtain that |S| grows exponentially in |K|, which was not possible with orthogonal arrays, where always |K| > |S|.
10.3.3 Exercises
10.3.1 An authentication code is represented by the array in Table 10.2 (cf. Sections 10.3.1, 5.5.1). The distributions p_K and p_S are given as follows: p_S(1) = p_S(3) = 1/4, p_S(2) = 1/2; p_K(1) = p_K(2) = p_K(3) = 1/6, p_K(4) = 1/2. Compute P_I and P_S. Hint: For computing the sums use the following: e.g. for the sum \sum_{k \in K : a_k(s) = t} p_K(k)
look at the column corresponding to s, note in which rows the entry t appears, and then sum up the probabilities corresponding to the marked rows (they are indexed by keys).

10.3.2 Consider a q-ary [q, k, q − k + 1] Reed-Solomon code.
• Construct the corresponding AU family using Proposition 10.3.7. What are its parameters?
It is known that for natural numbers a, b with a ≥ b and q a prime power there exists an ASU family 1/q^b-ASU(q^{a+b}, q^a, q^b). Using Stinson's composition, Theorem 10.3.10,
• prove that there exists an ASU family 2/q^b-ASU(q^{2a+b}, q^{a q^{a−b}}, q^b) with ε < 1/q^a + 1/q^b.

10.4 Secret sharing
In the model of symmetric (Section 10.1) and asymmetric (Section 10.2) cryptography a one-to-one relation between Alice and Bob is assumed, maybe with
10.4. SECRET SHARING
325
a trusted party in the middle. This means that Alice and Bob have the necessary pieces of secret information to carry on the communication between them. Sometimes it is necessary to distribute this secret information among several participants. Possible scenarios for such applications are: distributing the secret information among the participants in such a way that even if some participants lose their pieces of the secret information it is still possible to reconstruct the whole secret; also, sometimes shared responsibility is required, i.e. some action is to be triggered only when several participants combine their secret pieces of information to form the one that triggers the action. Examples of the latter could be triggering some military action (e.g. a missile launch) by several authorized persons (e.g. a president and high military officials) or opening a bank vault by several top officials of a bank. In this section we consider mathematical means to achieve this goal. The schemes providing such functionality are called secret sharing schemes. We consider in detail the first such scheme, proposed by Adi Shamir in 1979. Then we also briefly demonstrate how error-correcting codes can be used for the construction of linear secret sharing schemes. In secret sharing schemes, shares are produced from the secret to be shared. These shares are then assigned to the participants of the scheme. The idea is that if several authorized participants gather in a group that is large enough, they should be able to reconstruct the secret using the knowledge of their shares. On the contrary, if a group is too small, or some outsiders decide to find out the secret, their knowledge should not be enough to figure it out. This leads to the following definition.

Definition 10.4.1 Let S_i, i = 1, ..., n, be the shares that are produced from the secret S. Consider a collection of n participants where each participant is assigned his/her share S_i.
A (t, n) threshold scheme is a scheme where every group of t (or more) participants out of n can obtain the secret S using their shares, whereas any group of fewer than t participants cannot. We next present Shamir's secret sharing scheme, a classical example of a (t, n) threshold scheme for any n and t ≤ n.

Algorithm 10.4.2 (Shamir's secret sharing scheme)
Setup: taking n as input, prepare the scheme for n participants.
1. Choose some prime power q > n and fix a working field F_q that will be used for all operations in the scheme.
2. Assign to the n participants P_1, ..., P_n some distinct nonzero elements x_1, ..., x_n ∈ F_q*.
Input: The threshold value t, the secret information S in some form.
Output: The secret S is shared among the n participants.
Generation and distribution of shares:
1. Encode the secret to be shared as an element S ∈ F_q. If this is not possible, redo the Setup phase with a larger q.
2. Choose randomly t − 1 elements a_1, ..., a_{t−1} ∈ F_q. Assign a_0 := S and form the polynomial f(X) = \sum_{i=0}^{t−1} a_i X^i ∈ F_q[X].
3. For i = 1, ..., n compute the value y_i = f(x_i) and assign y_i to P_i.
Computing the secret from the shares:
1. Any t participants P_{i_1}, ..., P_{i_t} pull their shares y_{i_1}, ..., y_{i_t} together and then, using e.g. Lagrange interpolation with the t interpolation points (x_{i_1}, y_{i_1}), ..., (x_{i_t}, y_{i_t}), restore f and thus a_0 = S = f(0).

The part "Computing the secret from the shares" is clearly justified by the following formula of Lagrange interpolation (w.l.o.g. the first t participants pull their shares together):

f(X) = \sum_{i=1}^{t} y_i \prod_{j \neq i} \frac{X − x_j}{x_i − x_j},

so that f(x_i) = y_i, i = 1, ..., t, and f is the unique polynomial of degree ≤ t − 1 with this property. Of course the participants do not have to reconstruct the whole f; they just need to know a_0, which can be computed as

S = a_0 = \sum_{i=1}^{t} c_i y_i,   c_i = \prod_{j \neq i} \frac{x_j}{x_j − x_i}.   (10.2)
So every t or more participants can recover the secret value S = f(0). On the other hand, it is possible to show that for any t − 1 shares (w.l.o.g. the first ones) (x_i, y_i), i = 1, ..., t − 1, and any a ∈ F_q there exists a polynomial f_a whose evaluation at 0 is a. Indeed, take f_a(X) = a + X f̃_a(X), where f̃_a(X) is the Lagrange polynomial of degree ≤ t − 2 such that f̃_a(x_i) = (y_i − a)/x_i, i = 1, ..., t − 1 (recall that the x_i's are nonzero). Then deg f_a ≤ t − 1, f_a(x_i) = y_i, and f_a(0) = a. So this means that any t − 1 (or fewer) participants have no information about S: the best they can do is guess the value of S, and the probability of such a guess being correct is 1/q. This is because, to their knowledge, f can be any of the f_a's.

Example 10.4.3 Let us construct a (3, 6) Shamir threshold scheme. Take q = 8 and fix the field F_8 = F_2[α]/⟨α^3 + α + 1⟩. The element α is a generating element of F_8*. For i = 1, ..., 6 assign x_i = α^i to the participant P_i. Suppose that the secret S = α^5 is to be shared. Choose a_1 = α^3, a_2 = α^6, so that f(X) = α^5 + α^3 X + α^6 X^2. Now evaluate y_1 = f(α) = α^3, y_2 = f(α^2) = α^3, y_3 = f(α^3) = α^6, y_4 = f(α^4) = α^5, y_5 = f(α^5) = 1, y_6 = f(α^6) = α^6. For every i = 1, ..., 6 assign y_i as a share for P_i. Now suppose that the participants P_2, P_3, and P_5 decide to pull their shares together and obtain S. As in (10.2) they compute c_2 = (x_3/(x_3 − x_2))·(x_5/(x_5 − x_2)) = 1, c_3 = 1, c_5 = 1. Accordingly, c_2 y_2 + c_3 y_3 + c_5 y_5 = α^5 = S. On the other hand, due to the explanation above, any 2 participants cannot deduce S from their shares. In other words, any element of F_8 is equally likely for them to be the secret.

See Exercise 10.4.1 for a simple construction of a (t, t) threshold scheme. Next let us outline how one can use linear error-correcting codes to construct secret sharing schemes. Let us fix the finite field F_q: the secret values will be drawn from this field.
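Shamir's scheme is short in code. The sketch below works over a prime field Z_p for simplicity (an assumption of ours; the example above used F_8) and needs Python 3.8+ for the modular inverse pow(x, -1, P):

```python
import random

P = 101  # a prime; the field size must exceed the number of participants

def make_shares(secret, t, n):
    """Shamir (t, n) scheme: shares are (x_i, f(x_i)) for a random
    polynomial f of degree <= t-1 with f(0) = secret."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]   # nonzero x_i = 1..n

def reconstruct(shares):
    """Lagrange interpolation at 0, as in formula (10.2)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        ci = 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                ci = ci * xj * pow(xj - xi, -1, P) % P
        secret = (secret + ci * yi) % P
    return secret

shares = make_shares(42, t=3, n=6)
recovered = reconstruct(shares[:3])   # any 3 of the 6 shares suffice
```

Any subset of at least t = 3 shares returns the secret; fewer shares are consistent with every possible secret, as argued above.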
Also consider an [n, k]_q linear code C with a generator matrix G that has g'_0, ..., g'_{n−1} as columns (we add dashes to indicate that
these are columns and they are not to be confused with the usual notation for the rows of G). Choose some information vector a ∈ F_q^k such that S = a g'_0, where S is the secret information. Then compute s = (s_0, s_1, ..., s_{n−1}) = aG. Now s_0 = S and s_1, ..., s_{n−1} can be used as shares. The next result characterizes the situation in which the secret S can be obtained from the shares.

Proposition 10.4.4 With the notation as above, let s_{i_1}, ..., s_{i_m} be some shares, 1 ≤ m ≤ n − 1. These shares can reconstruct the secret S iff c⊥ = (1, 0, ..., 0, c_{i_1}, 0, ..., 0, c_{i_m}, 0, ..., 0) ∈ C⊥, where at least one c_{i_j} ≠ 0.

Proof. The claim follows from the fact that G · (c⊥)^T = 0 and that the secret S = a g'_0 can be obtained iff g'_0 is a linear combination of g'_{i_1}, ..., g'_{i_m}.

If we look carefully one more time at Shamir's scheme, it is no surprise that it can be seen as the above construction with a Reed-Solomon code as the code C. Indeed, choose N = q − 1 and set x_i = α^i, where α is a primitive element of F_q. It is then quite easy to see that encoding the secret and shares via the polynomial f as in Algorithm 10.4.2 is equivalent to encoding via the Reed-Solomon code RS_t(N, 1), cf. Definition 8.1.1 and Proposition 8.1.4. The only nuance is that in general we may assign some n ≤ N shares and not all N. Now we need to see that every collection of t shares reconstructs the secret. Using the above notation, let s_{i_1}, ..., s_{i_t} be the shares pulled together. According to Proposition 10.4.4 the dual of C = RS_t(N, 1) should contain a codeword with a 1 at the first position and at least one nonzero element among the positions i_1, ..., i_t. From Proposition 8.1.2 we have that RS_t(N, 1)⊥ = RS_{N−t}(N, N), and RS_{N−t}(N, N) is an MDS [N, N − t, t + 1] code. We now use Corollary 3.2.14 with the t + 1 positions 1, i_1, ..., i_t and are guaranteed to have a codeword of the prescribed form. Therefore every collection of t shares reconstructs the secret.
Having x_i = α^i is not really a restriction (Exercise 10.4.3). In general, the problem of constructing secret sharing schemes can be reduced to finding codewords of minimum weight in a dual code as per Proposition 10.4.4. There are more advanced constructions based on error-correcting codes, in particular based on AG codes; see the Notes for the references. It is clear that if a group of participants can recover the secret by combining their shares, then any group of participants containing this group can also recover the secret. We call a group of participants a minimal access set if the participants of this group can recover the secret with their shares, while no proper subgroup of participants can do so. From the preceding discussion, it is clear that there is a one-to-one correspondence between the set of minimal access sets and the set of minimal weight codewords of the dual code C⊥ whose first coordinate is 1. Therefore, for a secret sharing scheme based on a code C, the problem of determining the access structure of the scheme is reduced to the problem of determining the set of minimal weight codewords whose first coordinate is 1. It is obvious that the shares for the participants depend on the selection of the generator matrix G of the code C. However, by Proposition ??, the selection of the generator matrix does not affect the access structure of the secret sharing scheme. Note that the set of minimal weight codewords whose first coordinate is 1 is a subset of the set of all minimal weight codewords. The problem of determining the set of all minimal weight codewords of a code is known as the covering
problem. This problem is hard for an arbitrary linear code. In the following, let us discuss in some more detail the access structure of secret sharing schemes based on special classes of linear codes. It is clear that any participant must be in at least one minimal access set. This is true for any secret sharing scheme. Now, we further ask the following question: given a participant P_i, how many minimal access sets are there which contain P_i? This question is solved if the dual code of the code used by the secret sharing scheme is a constant weight code. In the following proposition, we suppose C is a q-ary [n, k] code, and G = (g'_0, g'_1, ..., g'_{n−1}) is a generator matrix of C.

Proposition 10.4.5 Suppose C is a constant weight code. Then, in the secret sharing scheme based on C⊥, there are q^{k−1} minimal access sets. Moreover, we have the following:
(1) If g'_i is a scalar multiple of g'_0, 1 ≤ i ≤ n − 1, then every minimal access set contains the participant P_i. Such a participant is called a dictatorial participant.
(2) If g'_i is not a scalar multiple of g'_0, 1 ≤ i ≤ n − 1, then there are (q − 1)q^{k−2} minimal access sets which contain the participant P_i.

Proof.
.........will be given later.........
The following is an interesting research problem: identify (or construct) linear codes which are good for secret sharing, that is, for which the covering problem can be solved, or for which the minimal weight codewords can be well characterized. Several classes of linear codes which are good for secret sharing have been identified; see the papers by C. Ding and J. Yuan.
10.4.1 Exercises
10.4.1 Suppose that some trusted party T wants to share a secret S ∈ Z_m between two participants A and B. For this, T generates a random number a ∈ Z_m and assigns it to A. T then assigns b = S − a mod m to B.
• Show that the scheme above is a (2, 2) threshold scheme. This scheme is an example of a split-knowledge scheme.
• Generalize the idea above to construct a (t, t) threshold scheme for arbitrary t.
10.4.2 Construct a (4, 7) Shamir threshold scheme and share the bitstring "1011" using it. Hint: Represent the bitstring "1011" as an element of a finite field with more than 7 elements.
10.4.3 Remove the restriction that x_i equals α^i in the Reed-Solomon construction of Shamir's scheme by using Proposition 3.2.10.
10.5. BASICS OF STREAM CIPHERS. LINEAR FEEDBACK SHIFT REGISTERS 329
10.5 Basics of stream ciphers. Linear feedback shift registers
In Section 10.1 we have seen how block ciphers are used for the construction of symmetric cryptosystems. Here we give some basics of stream ciphers, i.e. ciphers that process information bitwise as opposed to blockwise. Stream ciphers are usually faster than block ciphers and have lower implementation costs. Nevertheless, stream ciphers appear to be more susceptible to cryptanalysis, so much care should go into designing a secure cipher. In this section we concentrate on stream cipher designs that involve the linear feedback shift register (LFSR) as one of the building blocks. The difference between block and stream ciphers is quite vague, since a block cipher can be turned into a stream cipher using a special mode of operation. Nevertheless, let us see what the characterizing features of such ciphers are. A stream cipher is defined via its stream of states S, the keystream K, and the stream of outputs C. Having an input (plaintext) stream P, one would like to obtain C using S and K by operating successively on individual units of these streams. The streams C and K are obtained using some key, either secret or not. If these units are binary bits, we are dealing with a binary cipher. Consider an infinite sequence (a stream) of key bits k_1, k_2, k_3, ..., and a stream of plaintext bits p_1, p_2, p_3, .... Then we can form a ciphertext stream by simply adding the key stream and the plaintext stream bitwise: c_i = p_i ⊕ k_i, i = 1, 2, 3, .... One can stop at some moment n, thus obtaining the n-bit ciphertext from the n-bit key and the n-bit plaintext. If the k_i's are chosen uniformly at random and independently, we have the one-time pad scheme. It can be shown that in the one-time pad, if an eavesdropper only possesses the ciphertext, he/she cannot say anything about the plaintext. In other words, the knowledge of the ciphertext does not shed any additional light on the plaintext for an eavesdropper.
Moreover, an eavesdropper knowing even n key bits is completely uncertain about the (n + 1)-th bit. This is a classical example of an unconditionally secure cryptosystem, cf. Definition 10.1.8. Although the above idea yields provable guarantees for security, it has an essential drawback: the key should be at least as long as the plaintext, which is usual in unconditionally secure systems, see also Section 10.3.1. Clearly this requirement is quite impractical. That is why one usually proceeds as follows. One starts with a bitstring of a fixed size called a seed, and then, by performing some operations on this string, obtains a larger string (theoretically it can be infinite), which should “appear random” to an eavesdropper. Note that since the seed is finite we cannot talk about unconditional security anymore, only about computational security. Indeed, having a long enough key stream in the known-plaintext scenario, it is in principle possible to run an exhaustive search over all possible seeds to find the one that gives rise to the given key stream. In particular all the successive bits of the key stream will then be known. Now let us present two commonly used types of stream ciphers: synchronous and self-synchronizing. Let P = {p_0, p_1, ...} be the plaintext stream, K = {k_0, k_1, ...} be the keystream, C = {c_0, c_1, ...} be the ciphertext stream, and S = {s_0, s_1, ...} be the state stream. The synchronous
330
CHAPTER 10. CRYPTOGRAPHY
stream cipher is defined as follows: s_{i+1} = f(s_i, k), k_i = g(s_i, k), c_i = h(k_i, p_i), i = 0, 1, .... Here s_0 is the initial state and f is the state function, which generates the next state from the previous one and also depends on a key. The k_i's form the key stream via the function g. See Exercise 10.5.1 for a toy example. Finally the ciphertext is formed by applying the output function h to the bits k_i and p_i. This cipher is called synchronous, since both Alice and Bob need to use the same key stream (k_i)_i. If some (non)malicious insertions or deletions occur, the synchronization is lost, so additional means for providing synchronization are necessary. Note that usually the function h is just a bitwise addition of the streams (k_i)_i and (p_i)_i. It is also very common for stream ciphers to have an initialization phase, in which only the states s_i are updated first, and the update and output start to happen together only at some later point in time. Therewith the key stream (k_i) gets more complicated and depends on more state bits. The self-synchronizing stream cipher is defined as s_i = (c_{i−t}, ..., c_{i−1}), k_i = g(s_i, k), c_i = h(k_i, p_i), i = 0, 1, .... Here (c_{−t}, ..., c_{−1}) is a non-secret initial state. So the encryption/decryption depends only on some number of ciphertext bits, and therefore the output stream is able to recover from deletions and insertions. Observe that if h is a bitwise addition modulo 2, then the stream ciphers described above follow the idea of the one-time pad. The difference is that now one obtains the key stream (k_i)_i not fully randomly, but as a pseudorandom expansion of an initial state (seed) s_0. The LFSR is used as a building block in many stream ciphers that facilitates such a pseudorandom expansion. LFSRs have the advantage that they can be efficiently implemented in hardware. Also the outputs of LFSRs have nice statistical properties.
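The synchronous recursion s_{i+1} = f(s_i, k), k_i = g(s_i, k), c_i = h(k_i, p_i) can be sketched generically in Python. The concrete choices of f, g and h below are our own toy assumptions, purely for illustration and of course not secure:

```python
def synchronous_encrypt(plaintext_bits, s0, key, f, g, h):
    """Generic synchronous stream cipher:
    s_{i+1} = f(s_i, k),  k_i = g(s_i, k),  c_i = h(k_i, p_i)."""
    s = s0
    out = []
    for p in plaintext_bits:
        k_i = g(s, key)          # key stream bit from the current state
        out.append(h(k_i, p))    # output bit
        s = f(s, key)            # state update
    return out

# Toy instantiation (hypothetical choices of ours): the state is a 5-bit
# integer, f rotates it and XORs in the key, g extracts the low bit,
# h is addition modulo 2.
f = lambda s, k: (((s << 1) | (s >> 4)) & 0b11111) ^ k
g = lambda s, k: s & 1
h = lambda k_i, p: k_i ^ p

c = synchronous_encrypt([1, 0, 1, 1], 0b10010, 0b01110, f, g, h)
# since h is XOR, running the same key stream again decrypts
assert synchronous_encrypt(c, 0b10010, 0b01110, f, g, h) == [1, 0, 1, 1]
```

Because Alice and Bob start from the same s_0 and key, they generate identical key streams, which is exactly the synchronization requirement discussed above.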
Moreover, LFSRs are closely related to so-called linear recurring sequences, which are readily studied via algebraic methods. Schematically an LFSR can be presented as in Figure 10.3. Let us figure out what is going on in the diagram. First the notation. A square box is a delay box, sometimes called a “flip-flop”. Its task is to pass its stored value on after each unit of time, set by a synchronizing clock. A circle with the value a_i in it performs an AND operation, or multiplication modulo 2, on the input with the prescribed a_i. The plus sign in a circle means the XOR operation, or addition modulo 2. Now the square boxes are initialized with some values, namely the box D_i gets some value s_i ∈ {0, 1}, i = 0, ..., L − 1. When the first time unit comes to an end the following happens: the value s_0 becomes an output bit. Then all values s_i, i = 1, ..., L − 1 are shifted from D_i to D_{i−1}. Simultaneously, for each i = 0, ..., L − 1 the value s_i goes to an AND-circle, gets multiplied with a_i, and then all these products are summed up by means of the plus-circles, so that the sum ⊕_{i=0}^{L−1} a_i s_i is formed. This sum is written to D_{L−1} and is called s_L. The same procedure takes place at the end of the next time unit: now s_1 is the output, the remaining values are shifted,
[Figure 10.3: Diagram of an LFSR. Delay boxes D_{L−1}, ..., D_1, D_0, multipliers a_{L−1}, ..., a_1, a_0, and a chain of XORs feeding the sum back into D_{L−1}; the output is taken from D_0.]

and s_{L+1} = ⊕_{i=1}^{L} a_{i−1} s_i is written to D_{L−1}. Analogously one proceeds further. The name “Linear Feedback Shift Register” is clear now: we use only linear operations here (multiplication by the a_i's and addition), the values that appear in D_0, ..., D_{L−2} give feedback to D_{L−1} by means of a sum of the type described, and the values are shifted from D_i to D_{i−1}. Algebraically, LFSRs are studied via the notion of linear recurring sequences, which we introduce next.

Definition 10.5.1 Let L be a positive integer and let a_0, ..., a_{L−1} be some values from F_2. A sequence S whose first L elements s_0, ..., s_{L−1} are values from F_2 and whose defining rule is

s_{L+i} = a_{L−1} s_{L+i−1} + a_{L−2} s_{L+i−2} + · · · + a_0 s_i,   i ≥ 0,   (10.3)

is called the (L-th order) homogeneous linear recurring sequence in F_2. The elements s_0, ..., s_{L−1} are said to form the initial state sequence. Obviously, a homogeneous linear recurring sequence represents an output of some LFSR and vice versa, so we will use both notions interchangeably. Another important notion that comes along with linear recurring sequences is the following.

Definition 10.5.2 Let S be an L-th order homogeneous linear recurring sequence in F_2 defined by (10.3). Then the polynomial f(X) = X^L + a_{L−1} X^{L−1} + · · · + a_0 ∈ F_2[X] is called the characteristic polynomial of S.

Remark 10.5.3 The characteristic polynomial is also sometimes defined as g(X) = 1 + a_{L−1} X + · · · + a_0 X^L and is then called the connection or feedback polynomial. We have g(X) = X^L f(1/X). Everything that will be said about f(X) in the sequel remains true also for g(X).
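A minimal Python model of the recursion in Definition 10.5.1 (our own sketch, separate from the Magma/Sage examples of the text) may help to see it at work:

```python
def lfsr_sequence(taps, state, n):
    """Return the first n output bits of the LFSR with characteristic
    polynomial X^L + a_{L-1} X^{L-1} + ... + a_0 (Definition 10.5.1).

    taps  = [a_0, ..., a_{L-1}]
    state = [s_0, ..., s_{L-1}]   (the initial state sequence)
    """
    s = list(state)
    out = []
    for _ in range(n):
        out.append(s[0])              # s_0 leaves the register first
        feedback = 0
        for a, bit in zip(taps, s):   # s_L = a_0 s_0 + ... + a_{L-1} s_{L-1}
            feedback ^= a & bit
        s = s[1:] + [feedback]        # shift, and feed the sum back
    return out

# f(X) = X^2 + X + 1, i.e. a_0 = a_1 = 1, initial state (s_0, s_1) = (1, 0)
print(lfsr_sequence([1, 1], [1, 0], 9))
# -> [1, 0, 1, 1, 0, 1, 1, 0, 1], periodic with period 3
```

The period 3 observed here is taken up again in Example 10.5.4 and Theorem 10.5.8 below.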
Figure 10.4: Diagrams of the LFSRs of Example 10.5.4

Example 10.5.4 In Figure 10.4 (a), (b), and (c) the diagrams for the LFSRs with the characteristic polynomials X^2 + X + 1, X^2 + 1, and X^2 + X are depicted. We removed the circles and sometimes also the connecting lines, since we are working in F_2, so a_i ∈ {0, 1}. The table for case (a) looks like this:

D0:     1 1 0 1 1 ...
D1:     0 1 1 0 1 ...
output:   0 1 1 0 ...

So we see that the output sequence is actually periodic with period 3. The value 3 for the period is the maximum one can get for L = 2. This is due to the fact that X^2 + X + 1 is irreducible and moreover primitive, see Theorem 10.5.8 below. For case (b) we have

D0:     1 0 1 ...
D1:     0 1 0 ...
output:   0 1 ...

and the period is 2. For case (c) we have

D0:     1 1 1 ...
D1:     0 1 1 ...
output:   0 1 ...
So the output sequence here is not periodic, but ultimately periodic, i.e. periodic starting at position 2, and the period here is 1. The non-periodicity is due to the fact that for f(X) = X^2 + X we have f(0) = 0, see Theorem 10.5.8.

Example 10.5.5 Let us see how one can handle LFSRs in Magma. In Magma one works with a connection polynomial (Remark 10.5.3). For example, given the connection polynomial f = X^6 + X^4 + X^3 + X + 1 and the initial state sequence (s_0, s_1, s_2, s_3, s_4, s_5) = (0, 1, 1, 1, 0, 1), the next state (s_1, s_2, s_3, s_4, s_5, s_6) can be computed as
> P<X>:=PolynomialRing(GF(2));
> f:=X^6+X^4+X^3+X+1;
> S:=[GF(2)|0,1,1,1,0,1];
> LFSRStep(f,S);
[ 1, 1, 1, 0, 1, 1 ]
By writing
> LFSRSequence(f,S,10);
[ 0, 1, 1, 1, 0, 1, 1, 1, 1, 0 ]
we get the next 10 state values s_6, ..., s_15. In Sage one can do the same in the following way:
> con_poly=[GF(2)(i) for i in [1,0,1,1,0,1]]
> init_state=[GF(2)(i) for i in [0,1,1,1,0,1]]
> n=10
> lfsr_sequence(con_poly, init_state, n)
[0, 1, 1, 1, 0, 1, 1, 1, 1, 0]
So one has to provide the connection polynomial via its coefficients.

As we have mentioned, the characteristic polynomial plays an essential role in determining the properties of a linear recurring sequence and the associated LFSR. Next we summarize the results concerning the characteristic polynomial, but first let us make precise the notions of periodic and ultimately periodic sequences.

Definition 10.5.6 Let S = {s_i}_{i≥0} be a sequence such that there exists a positive integer P with s_{P+i} = s_i for all i = 0, 1, .... Such a sequence is called periodic and P is a period of S. If the property s_{P+i} = s_i holds for all i starting from some nonnegative P_0, then such a sequence is called ultimately periodic, also with period P. Note that a periodic sequence is also ultimately periodic.

Remark 10.5.7 Note that periodic and ultimately periodic sequences have many periods. It turns out that the least period always divides any other period. We will use the term period to mean the least period of a sequence.

Now the main result follows.

Theorem 10.5.8 Let S be an L-th order homogeneous linear recurring sequence and let f(X) ∈ F_2[X] be its characteristic polynomial. The following holds:
1. S is an ultimately periodic sequence with period P ≤ 2^L − 1.
2. If f(0) ≠ 0, then S is periodic.
3. If f(X) is irreducible over F_2, then S is periodic with period P such that P | (2^L − 1).
4. If f(X) is primitive over F_2, i.e. irreducible with a root that generates the multiplicative group of F_{2^L}, then S is periodic with period P = 2^L − 1.

Definition 10.5.9 A homogeneous linear recurring sequence S with a primitive characteristic polynomial f(X) is called a maximal period sequence in F_2, or an m-sequence.
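The statements of Theorem 10.5.8 can be checked numerically on small instances; the following Python sketch of ours computes the least period of the state sequence of an LFSR:

```python
def lfsr_period(taps, state):
    """Least eventual period of the LFSR state sequence.
    taps = [a_0, ..., a_{L-1}], state = [s_0, ..., s_{L-1}]."""
    seen = {}
    s = tuple(state)
    i = 0
    while s not in seen:          # iterate until a state repeats
        seen[s] = i
        fb = 0
        for a, bit in zip(taps, s):
            fb ^= a & bit         # feedback sum a_0 s_0 + ... + a_{L-1} s_{L-1}
        s = s[1:] + (fb,)
        i += 1
    return i - seen[s]

# f(X) = X^2 + X + 1 is primitive: period 2^2 - 1 = 3
assert lfsr_period([1, 1], [1, 0]) == 3
# f(X) = X^4 + X^3 + 1 (taps a_0 = 1, a_1 = a_2 = 0, a_3 = 1) is primitive:
# a nonzero initial state yields the maximal period 2^4 - 1 = 15
assert lfsr_period([1, 0, 0, 1], [0, 1, 1, 0]) == 15
# f(X) = X^2 + 1 = (X + 1)^2 is not primitive: the period is only 2
assert lfsr_period([1, 0], [1, 0]) == 2
```

The second polynomial, X^4 + X^3 + 1, is the one used in Exercise 10.5.2 below.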
The notions and results above can be generalized to the case of an arbitrary finite field F_q. It is notable that one can compute the characteristic polynomial of an L-th order homogeneous linear recurring sequence S from any subsequence of length at least 2L by means of an algorithm of Berlekamp and Massey, which is essentially the one from Section 9.2.2. See also Exercise 10.5.3. Naturally one is interested in obtaining sequences with a large period. Therefore m-sequences are of primary interest for applications. These sequences have nice statistical properties; for example the distribution of patterns of length ≤ L is almost uniform. The notion of linear complexity is used as a tool for investigating the statistical properties of outputs of LFSRs. Roughly speaking, the linear complexity of a sequence is the minimal L such that the sequence is an L-th order homogeneous linear recurring sequence. Because of these nice statistical properties, LFSRs can be used as pseudorandom bit generators, see Notes. An obvious cryptographic drawback of LFSRs is the fact that the whole output sequence can be reconstructed from just 2L bits of it, where L is the linear complexity of the sequence. This obstructs using LFSRs directly as cryptographic primitives, in particular as key stream generators. Nevertheless, one can use LFSRs in certain combinations, add nonlinearity, and obtain quite effective and secure key stream generators for stream ciphers. Let us briefly describe three possibilities for such combinations.
• Nonlinear combination generator. Here one feeds the outputs of l LFSRs L_1, ..., L_l into a nonlinear function f with l inputs. The output of f then becomes the key stream. The function f should be chosen to be correlation immune, i.e. there should be no correlation between the output of f and the outputs of any small subset of L_1, ..., L_l.
• Nonlinear filter generator. Here the L delay boxes at the end of every time unit give their values to a nonlinear function g with L inputs. The output of g then becomes the key stream. The function g is chosen in such a way that its algebraic representation is dense.
• Clock-controlled generator. Here the output of one LFSR controls the clocks of other LFSRs that compose the cipher. In this way nonlinearity is introduced.
For some examples of the above, see Notes.
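The reconstruction from 2L bits mentioned above amounts to solving a linear system over F_2, cf. Exercise 10.5.3. A small Python sketch of ours, assuming the coefficient matrix is invertible:

```python
def recover_taps(bits, L):
    """Recover [a_0, ..., a_{L-1}] from 2L output bits of an L-th order
    homogeneous linear recurring sequence, by solving the system
    s_{L+i} = a_0 s_i + ... + a_{L-1} s_{L+i-1}  (i = 0..L-1) over F_2.
    Assumes the L x L coefficient matrix is invertible."""
    # augmented matrix [ s_i ... s_{i+L-1} | s_{i+L} ], rows i = 0..L-1
    rows = [bits[i:i + L] + [bits[i + L]] for i in range(L)]
    # Gauss-Jordan elimination over F_2
    for col in range(L):
        piv = next(r for r in range(col, L) if rows[r][col])
        rows[col], rows[piv] = rows[piv], rows[col]
        for r in range(L):
            if r != col and rows[r][col]:
                rows[r] = [x ^ y for x, y in zip(rows[r], rows[col])]
    return [rows[i][L] for i in range(L)]

# the stream 1,0,1,1 is produced by taps [a_0, a_1] = [1, 1],
# i.e. by f(X) = X^2 + X + 1
assert recover_taps([1, 0, 1, 1], 2) == [1, 1]
```

The Berlekamp-Massey algorithm does the same job more efficiently and without the invertibility assumption, by also finding the smallest such L.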
10.5.1 Exercises
10.5.1 Consider an example of a synchronous cipher defined by the following data. The initial state is s_0 = 10010. The function f shifts its argument by 3 positions to the right and adds 01110 bitwise. The function g sums up the bits at positions 2, 4, and 5 modulo 2 to obtain a keystream bit. Compute the first 6 key stream bits of such a cipher.

10.5.2 a. The polynomial f(X) = X^4 + X^3 + 1 is primitive over F_2. Draw a diagram of an LFSR that has f as its characteristic polynomial. Let
(0, 1, 1, 0) = (s_0, s_1, s_2, s_3) be the initial state. Compute the output of such an LFSR up to the point where it is seen that the output sequence is periodic. What is the period?
b. Rewrite (a.) in terms of a connection polynomial. Take the same initial state and compute (e.g. with Magma) enough output sequence values to see the periodicity.

10.5.3 Let s_0, ..., s_{2L−1} be the first 2L bits of an L-th order homogeneous linear recurring sequence defined by (10.3). If it is known that the matrix
( s_0      s_1    ...  s_{L−1}  )
( s_1      s_2    ...  s_L      )
( ...      ...    ...  ...      )
( s_{L−1}  s_L    ...  s_{2L−2} )
is invertible, show that it is possible to compute a_0, ..., a_{L−1}, i.e. to find the structure of the underlying LFSR.

10.5.4 [CAS] The shrinking generator is an example of a clock-controlled generator. The shrinking generator is composed of two LFSRs L_1 and L_2. The output of L_1 controls the output of L_2 in the following way: if the output bit of L_1 is one, then the output bit of L_2 is taken as an output of the whole generator; if the output bit of L_1 is zero, then the output bit of L_2 is discarded. So, in other words, the output of the generator forms a subsequence of the output of L_2, and this subsequence is masked by the 1's in the output of L_1. Write a procedure that implements the shrinking generator. Then use the output of the shrinking generator as a keystream k and define a stream cipher with it, i.e. a ciphertext is formed as c_i = p_i ⊕ k_i, where p is the plaintext stream. Compare your simulation results with the ones obtained with the ShrinkingGeneratorCipher class from Sage.
10.6 PKC systems using error-correcting codes
In this section we consider the public key encryption schemes due to McEliece (Section 10.6.1) and Niederreiter (Section 10.6.2). Both of these encryption schemes rely on the hardness of decoding random linear codes, as well as on the hardness of distinguishing a code with a prescribed structure from a random one. As we have seen, the nearest codeword decoding problem is NP-hard. So the McEliece cryptosystem is one of the proposals to use an NP-hard problem as a basis; for some others see Section 10.2.3. As mentioned at the end of Section 10.2.1, quantum computer attacks pose a potential threat to classical cryptosystems like RSA (Section 10.2.1) and those based on the DLP problem (Section 10.2.2). On the other hand, no significant advantages of using a quantum computer for attacking the code-based schemes of McEliece and Niederreiter are known. Therefore, this area of cryptography has attracted quite a lot of attention in recent years. See the Notes on recent developments.
10.6.1 McEliece encryption scheme
Now let us consider the public key cryptosystem of McEliece. It was proposed in 1978 and is in fact one of the oldest public key cryptosystems. The idea of the cryptosystem is to take a class of codes C for which there is an efficient bounded distance decoding algorithm. The secret code C ∈ C is given by a k × n generator matrix G. This G is scrambled into G′ = SGP by means of a k × k invertible matrix S and an n × n permutation matrix P. Denote by C′ the code with generator matrix G′. Now C′ is equivalent to C, cf. Definition 2.5.15. The idea of the scrambling is that the code C′ should appear random to an attacker, so that it should not be possible to use the efficient decoding algorithm available for C to decrypt messages. More formally, we have the procedures that define the encryption scheme in Algorithms 10.1, 10.2, and 10.3. Note that in these algorithms when we say “choose” we mean “choose randomly from an appropriate set”.

Algorithm 10.1 McEliece key generation
Input: System parameters: length n, dimension k, alphabet size q, error-correcting capacity t, and a class C of [n, k] q-ary linear codes that have an efficient decoder correcting up to t errors.
Output: McEliece public/private key pair (PK, SK).
Begin
  Choose C ∈ C represented by a generator matrix G and equipped with an efficient decoder D_C.
  Choose an invertible q-ary k × k matrix S.
  Choose an n × n permutation matrix P.
  Compute G′ := SGP. {a generator matrix of an equivalent [n, k] code}
  PK := G′; SK := (D_C, S, P).
  return (PK, SK).
End

Let us see why the decryption procedure really yields the correct message from a ciphertext. We have c_1 = cP^{−1} = mSG + eP^{−1}. Now since wt(eP^{−1}) = wt(e) = t, we have c_2 = D_C(c_1) = mS. The last step is then trivial. Initially McEliece proposed to use the class of binary Goppa codes (cf. Section 8.3.2) as the class C. Interestingly enough, this class has turned out to be pretty much the only secure choice up to now.
See Section 10.6.3 for the discussion. As we saw in the procedures above, decryption is just decoding in the code generated by G′. So if we are successful in “masking” a code, for instance a binary Goppa code C, as a random code C′, then the adversary is faced with the problem of correcting t errors in a random code, which is assumed to be hard if t is large enough. More on that in Section 10.6.3. Let us consider a specific example.
Algorithm 10.2 McEliece encryption
Input: plaintext m, public key PK = G′.
Output: ciphertext c.
Begin
  Represent m as a vector from F_q^k.
  Choose randomly a vector e ∈ F_q^n of weight t.
  Compute c := mG′ + e. {encode and add noise; c has length n}
  return c.
End

Algorithm 10.3 McEliece decryption
Input: ciphertext c, private key SK = (D_C, S, P).
Output: plaintext m.
Begin
  Compute c_1 := cP^{−1}.
  Compute c_2 := D_C(c_1).
  Compute c_3 := c_2 S^{−1}.
  return c_3.
End
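The three algorithms above can be illustrated end-to-end in plain Python. We stress that the following is our own toy sketch, using the [7,4] Hamming code as the secret code (so n = 7, k = 4, t = 1), which is of course far too small to offer any security:

```python
import random

# --- GF(2) matrix helpers (lists of 0/1 lists) ---
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % 2
             for col in zip(*B)] for row in A]

def vec_mat(v, M):
    return mat_mul([v], M)[0]

def mat_inv(M):
    """Invert a square matrix over GF(2) by Gauss-Jordan elimination."""
    n = len(M)
    A = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c])
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c and A[r][c]:
                A[r] = [x ^ y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]

# --- the secret code: the [7,4] Hamming code, corrects t = 1 error ---
G = [[1,0,0,0,0,1,1],
     [0,1,0,0,1,0,1],
     [0,0,1,0,1,1,0],
     [0,0,0,1,1,1,1]]
H = [[0,1,1,1,1,0,0],
     [1,0,1,1,0,1,0],
     [1,1,0,1,0,0,1]]

def hamming_decode(y):
    """Correct up to one error; return the corrected codeword."""
    syn = [sum(h * b for h, b in zip(row, y)) % 2 for row in H]
    if any(syn):
        pos = next(j for j in range(7)
                   if [H[i][j] for i in range(3)] == syn)
        y = y[:]
        y[pos] ^= 1
    return y

# --- key generation: G' = S G P ---
random.seed(1)
while True:                       # a random invertible 4x4 matrix S
    S = [[random.randint(0, 1) for _ in range(4)] for _ in range(4)]
    try:
        S_inv = mat_inv(S)
        break
    except StopIteration:         # singular, try again
        continue
perm = list(range(7)); random.shuffle(perm)
P = [[int(perm[i] == j) for j in range(7)] for i in range(7)]
G_pub = mat_mul(mat_mul(S, G), P)   # the public key

# --- encryption: c = m G' + e with wt(e) = t = 1 ---
m = [1, 0, 1, 1]
e = [0] * 7; e[random.randrange(7)] = 1
c = [x ^ y for x, y in zip(vec_mat(m, G_pub), e)]

# --- decryption ---
P_inv = mat_inv(P)
c1 = vec_mat(c, P_inv)      # undo the permutation: mSG + eP^{-1}
c2 = hamming_decode(c1)     # correct the single error -> (mS)G
mS = c2[:4]                 # G is systematic, so mS is the first 4 bits
assert vec_mat(mS, S_inv) == m
```

The final assertion is exactly the correctness argument given above: c_1 = mSG + eP^{−1} has weight-t noise, so the decoder of the secret code recovers mS, and multiplying by S^{−1} gives m.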
Example 10.6.1 [CAS] We use Magma to construct a McEliece encryption scheme based on a binary Goppa code, encrypt a message with it and then decrypt. First we construct a Goppa code of length 31 and dimension 16, efficiently correcting 3 errors (see also Example 12.5.23):
> q:=2^5;
> P<x>:=PolynomialRing(GF(q));
> g:=x^3+x+1;
> a:=PrimitiveElement(GF(q));
> L:=[a^i : i in [0..q-2]];
> C:=GoppaCode(L,g); // a [31,16,7] binary Goppa code
> C2:=GoppaCode(L,g^2);
> n:=#L; k:=Dimension(C);
Note that we also defined the code C2 generated by the square of the Goppa polynomial g. Although the two codes are equal, we need the code C2 later for decoding. Now the key generation part:
> G:=GeneratorMatrix(C);
> S:=Random(GeneralLinearGroup(k,GF(2)));
> Determinant(S); // indeed an invertible map
1
> p:=Random(Sym(n)); // a random permutation of an n-set
> P:=PermutationMatrix(GF(2), p); // its matrix
> GPublic:=S*G*P; // our public generator matrix
After we have obtained the public key, we can encrypt a message:
> MessageSpace:=VectorSpace(GF(2),k);
> m:=Random(MessageSpace);
> m;
(1 1 0 0 1 1 1 0 0 0 0 1 1 1 0 0)
> m2:=m*GPublic;
> e:=C ! 0; e[10]:=1; e[20]:=1; e[25]:=1; // add 3 errors
> c:=m2+e;
Let us decrypt using the private key:
> c1:=c*P^-1;
> bool,c2:=Decode(C2,c1: Al:="Euclidean");
> IS:=InformationSet(C);
> ms:=MessageSpace ! [c2[i]: i in IS];
> m_dec:=ms*S^-1;
> m_dec;
(1 1 0 0 1 1 1 0 0 0 0 1 1 1 0 0)
We see that m_dec=m. Note that we applied the Euclidean algorithm for decoding a Goppa code, but we had to apply it to the code generated by g^2 to be able to correct all three errors. Since as a result of decoding we obtained a codeword, not the message it encodes, we had to find an information set and then extract the subvector at the positions that correspond to this set (our generator matrices are in standard form, so we simply take the subvector).
10.6.2 Niederreiter’s encryption scheme
The scheme proposed by Niederreiter in 1986 is dual to that of McEliece. Namely, instead of using generator matrices and codewords, this scheme uses parity check matrices and syndromes. Although different in terms of parameter sizes and efficiency of encryption and decryption, the two schemes of McEliece and Niederreiter can actually be shown to have equivalent security, see the end of this section. We now present how keys are generated and how encryption and decryption are performed in the Niederreiter scheme in Algorithms 10.4, 10.5, and 10.6. Note that in these algorithms we use a syndrome decoder. Recall that the notion of a syndrome decoder is equivalent to the notion of a minimum distance decoder. The correctness of the encryption and decryption procedures is shown analogously to the McEliece scheme, see Exercise 10.6.1. The only difference is that here we use a syndrome decoder, which returns a vector of smallest nonzero weight that has the input syndrome, whereas in the case of McEliece the output of the decoder is the codeword closest to the given word. Let us take a look at a specific example.

Example 10.6.2 [CAS] We work in Magma as in Example 10.6.1 and consider the same binary Goppa code from there. So the first 8 lines that define the code are the same; we just add
> t:=Degree(g);
Now the key generation part is quite similar as well:
> H:=ParityCheckMatrix(C);
> S:=Random(GeneralLinearGroup(n-k,GF(2)));
> p:=Random(Sym(n)); P:=PermutationMatrix(GF(2), p);
> HPublic:=S*H*P; // our public parity check matrix
Algorithm 10.4 Niederreiter key generation
Input: System parameters: length n, dimension k, alphabet size q, error-correcting capacity t, and a class C of [n, k] q-ary linear codes that have an efficient syndrome decoder correcting up to t errors.
Output: Niederreiter public/private key pair (PK, SK).
Begin
  Choose C ∈ C represented by a parity check matrix H and equipped with an efficient syndrome decoder D_C.
  Choose an invertible q-ary (n − k) × (n − k) matrix S.
  Choose an n × n permutation matrix P.
  Compute H′ := SHP. {a parity check matrix of an equivalent [n, k] code}
  PK := H′; SK := (D_C, S, P).
  return (PK, SK).
End

Algorithm 10.5 Niederreiter encryption
Input: plaintext m, public key PK = H′.
Output: ciphertext c.
Begin
  Represent m as a vector from F_q^n of weight t.
  Compute c := H′ m^T. {the ciphertext is a syndrome}
  return c.
End

Algorithm 10.6 Niederreiter decryption
Input: ciphertext c, private key SK = (D_C, S, P).
Output: plaintext m.
Begin
  Compute c_1 := S^{−1} c.
  Compute c_2 := D_C(c_1). {the decoder returns an error vector of weight t}
  Compute c_3 := P^{−1} c_2.
  return c_3.
End
The encryption is a bit trickier than in Example 10.6.1, since our messages are now vectors of length n and weight t.
> MessageSpace:=Subsets(Set([1..n]), t);
> mm:=Random(MessageSpace);
> mm:=[i: i in mm]; m:=C ! [0: i in [1..n]];
> // insert errors at given positions
> for i in mm do
>   m[i]:=1;
> end for;
> c:=m*Transpose(HPublic); // the ciphertext
The decryption part is also a bit trickier, because the decoding function of Magma expects a word, not a syndrome. So we have to find a solution of the parity check linear system and then pass this solution to the decoding function.
> c1:=c*Transpose(S^-1);
> c22:=Solution(Transpose(H),c1); // find any solution
> bool,c2:=Decode(C2,c22:Al:="Euclidean");
> m_dec:=(c22-c2)*Transpose(P^-1);
One may check that m=m_dec holds.

Now we will show that the Niederreiter and McEliece encryption schemes in fact have equivalent security. In order to do so, we assume that the two schemes have been generated from the same secret code C with generator matrix G and parity check matrix H. Assume further that the private key of the McEliece scheme is (S, G, P) and that of the Niederreiter scheme is (M, H, P), so that the public keys are G′ = SGP and H′ = MHP, respectively. Let z = yH′^T be the ciphertext obtained by encrypting y with the Niederreiter scheme and c = mG′ + e the ciphertext obtained from m with the McEliece scheme. Equivalence now means that if one is able to recover y from z, then one is able to recover m from c, and vice versa. Therewith we show that the two systems based on the same code and the same secret permutation provide equivalent security. Now, assume we can recover any y of weight ≤ t from z = yH′^T. We want to recover m from c = mG′ + e with wt(e) ≤ t. For y = e we have yH′^T = eH′^T = mG′H′^T + eH′^T = cH′^T =: z, with c = mG′ + e, since G′H′^T = SGP P^T H^T M^T = SGH^T M^T = 0, due to P P^T = I_n and GH^T = 0.
So if we can recover such a y from the above constructed z, we are able to recover e, and thus m, from its ciphertext c = mG′ + e. Analogously, assume that for any m and e of weight ≤ t we can recover them from c = mG′ + e. Now we want to recover y of weight ≤ t from z = yH′^T. For e = y we have z = yH′^T = cH′^T, with c = mG′ + y being any solution of the equation z = cH′^T. Now we can recover m from c, and thus y.
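Analogously to the McEliece sketch above, the Niederreiter flow can be illustrated in plain Python with the [7,4] Hamming code as the secret code. Again this is our own toy example (t = 1); syndrome decoding of the Hamming code simply looks up which column of H the syndrome equals:

```python
import random

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % 2
             for col in zip(*B)] for row in A]

def mat_inv(M):
    """Gauss-Jordan inversion over GF(2); raises StopIteration if singular."""
    n = len(M)
    A = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c])
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c and A[r][c]:
                A[r] = [x ^ y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]

# parity check matrix of the [7,4] Hamming code, t = 1
H = [[0,1,1,1,1,0,0],
     [1,0,1,1,0,1,0],
     [1,1,0,1,0,0,1]]

def syndrome_decode(s):
    """Return the weight-1 vector e with H e^T = s (Hamming code)."""
    pos = next(j for j in range(7) if [H[i][j] for i in range(3)] == s)
    return [int(j == pos) for j in range(7)]

# key generation: H' = S H P
random.seed(7)
while True:
    S = [[random.randint(0, 1) for _ in range(3)] for _ in range(3)]
    try:
        S_inv = mat_inv(S); break
    except StopIteration:
        continue
perm = list(range(7)); random.shuffle(perm)
P = [[int(perm[i] == j) for j in range(7)] for i in range(7)]
H_pub = mat_mul(mat_mul(S, H), P)

# encryption: the plaintext is a weight-t vector, the ciphertext its syndrome
m = [0, 0, 0, 0, 1, 0, 0]                        # weight 1 = t
c = [row[0] for row in mat_mul(H_pub, [[x] for x in m])]

# decryption: c_1 = S^{-1} c, then syndrome decoding, then undo P
c1 = [row[0] for row in mat_mul(S_inv, [[x] for x in c])]
e = syndrome_decode(c1)                          # H e^T = c1, wt(e) = 1
P_inv = mat_inv(P)
m_dec = [row[0] for row in mat_mul(P_inv, [[x] for x in e])]
assert m_dec == m
```

Here c_1 = S^{−1}c = H(Pm^T), so the syndrome decoder of the secret code returns e = (Pm^T)^T, and applying P^{−1} recovers the plaintext, exactly as in Algorithm 10.6.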
10.6.3 Attacks
There are two types of attacks one may think of for code-based cryptosystems:
1. Generic decoding attacks. One tries to recover m from c using the code C′.
2. Structural attacks. One tries to recover S, G, P from the code C′ given by G′ in the McEliece scheme, or S, H, P from H′ in the Niederreiter scheme.
Consider the McEliece encryption scheme. In an attack of type (1.), the attacker tries to directly decode the ciphertext c using the code C′ generated by the public generator matrix G′. Assuming C′ is a random code, one may obtain complexity estimates for this type of attack. The best results in this direction are obtained using the family of algorithms that improve on information set decoding (ISD), see Section 6.2.3. Recall that the idea of (probabilistic) ISD is to find an error-free information set I and then decode the received word r as c = r_(I) G̃. Here G̃ is a generator matrix of an equivalent code that is systematic at the positions of I; we denote it by G̃ in order to avoid confusion with the public generator matrix G′. The first improvement of ISD, due to Lee and Brickell, consists in allowing some small number p of errors to occur in the set I. So we no longer require r_(I) = c_(I), but allow r_(I) = c_(I) + e_(I) with wt(e_(I)) ≤ p. We can now modify the algorithm of Algorithm 6.2 as in Algorithm 10.7. Note that since we know the number of errors that occurred, the if-part has changed as well.
for e(I) of weight ≤ p do ˜ ˜ = (r(I) + e(I) )G c if d(˜ c, r) = t then ˜ return c end if end for end if until Ntr < Ntrials return “No solution found” End
Remark 10.6.3 In Algorithm 10.7 one may replace choosing a set I by choosing each time a random permutation matrix Π. Then one may compute rref(GΠ), therewith obtaining an information set. One must keep track of the applied permutations Π in order to “go back” after finding a solution in this way.

The probability of success in one trial of the Lee-Brickell variant is

p_LB = C(k, p) · C(n − k, t − p) / C(n, t),

compared to the original one of the probabilistic ISD,

p_ISD = C(n − k, t) / C(n, t),

where C(a, b) denotes the binomial coefficient. Since the for-loop of Algorithm 10.7 runs over all Σ_{i=0}^{p} C(k, i) patterns e_(I), p should be a small constant. In fact for a small p, like p = 2, one obtains a complexity improvement, not an asymptotic one, but one quite relevant for practice. There is a rich list of further improvements due to many researchers in the field, see Notes. The improvements basically consider different configurations of where a small number p of errors is allowed to be present, where only a block of l zeroes should be present, etc. Further, the choice of the next set I can be optimized, for example by changing just one element of the current I in a clever way. With all these techniques in mind, one obtains quite a considerable improvement of ISD in practical attacks on the McEliece cryptosystem. In fact the original proposal of McEliece to use [1024, 524] binary Goppa codes correcting 50 errors is no longer a secure choice; one has to increase the parameters of the Goppa codes used.

Example 10.6.4 [CAS] Magma contains implementations of the “vanilla” probabilistic ISD, which was also considered in the original paper of McEliece, as well as Lee-Brickell’s variant and several other improved algorithms. Let us try to attack the toy example considered in Example 10.6.1. So we copy all the instructions responsible for the code construction, key generation, and encryption.
> ... // as in Example \ref{exCASMcEliece}
Then we use the commands
> CPublic:=LinearCode(GPublic);
> McEliecesAttack(CPublic,c,3);
> LeeBrickellsAttack(CPublic,c,3,2);
to mount our toy attack. For this specific example it takes no time to execute both attacks. In both commands we first pass the code, then the received word, and then the number of errors to be corrected. In LeeBrickellsAttack the last parameter is exactly the parameter p from Algorithm 10.7; we set it to 2. These attacks can also be run on random codes.
Below is an example:
> C:=RandomLinearCode(GF(2),50,10);
> c:=Random(C); r:=c;
> r[2]:=r[2]+1; r[17]:=r[17]+1; r[26]:=r[26]+1;
> McEliecesAttack(C,r,3);
> LeeBrickellsAttack(C,r,3,2);
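For illustration, the loop of Algorithm 10.7 can also be sketched in plain Python. This is our own sketch, run on a toy instance (the [7,4] Hamming code with t = 1 and p = 1), not the Magma implementation used above:

```python
import random
from itertools import combinations

def mat_inv(M):
    """Gauss-Jordan inversion over GF(2); raises StopIteration if singular."""
    n = len(M)
    A = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c])
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c and A[r][c]:
                A[r] = [x ^ y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]

def lee_brickell(G, r, t, p, n_trials=1000, rng=random):
    n, k = len(G[0]), len(G)
    for _ in range(n_trials):
        I = rng.sample(range(n), k)             # candidate information set
        GI = [[G[i][j] for j in I] for i in range(k)]
        try:
            GI_inv = mat_inv(GI)
        except StopIteration:
            continue                            # G_(I) singular, next trial
        # Gt := G_(I)^{-1} G  is systematic at the positions of I
        Gt = [[sum(GI_inv[a][b] * G[b][j] for b in range(k)) % 2
               for j in range(n)] for a in range(k)]
        rI = [r[j] for j in I]
        # allow up to p errors inside the information set I
        for w in range(p + 1):
            for pos in combinations(range(k), w):
                x = rI[:]
                for q in pos:
                    x[q] ^= 1                   # r_(I) + e_(I)
                cand = [sum(x[a] * Gt[a][j] for a in range(k)) % 2
                        for j in range(n)]
                if sum(ci ^ ri for ci, ri in zip(cand, r)) == t:
                    return cand
    return None

# toy run on the [7,4] Hamming code with t = 1, p = 1
G = [[1,0,0,0,0,1,1],
     [0,1,0,0,1,0,1],
     [0,0,1,0,1,1,0],
     [0,0,0,1,1,1,1]]
c = [0,1,1,0,0,1,1]            # a codeword (sum of rows 2 and 3 of G)
r = c[:]; r[2] ^= 1            # one channel error
assert lee_brickell(G, r, 1, 1, rng=random.Random(5)) == c
```

With p = 1 every trial with an invertible G_(I) succeeds here, since the single error is either inside I (covered by the e_(I) enumeration) or outside it (covered by w = 0), which is why this toy attack terminates immediately.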
Apart from decoding being hard for the public code C′, it should be impossible to deduce the structure of the code C from the public C′. Structural attacks of type (2.) aim at exploiting this structure. As we have mentioned, the choice of binary Goppa codes has turned out to be pretty much the only secure choice up to now. There have been quite a few attempts to propose other classes of codes for which efficient decoding algorithms are known. Alas, all of these proposals were broken; we name just a few: generalized Reed-Solomon (GRS) codes, Reed-Muller codes, BCH codes, algebraic-geometry codes of small genus, LDPC codes, quasi-cyclic codes; see Notes. In the next section we consider in detail how a prominent attack on GRS codes works. In particular, the weakness of GRS codes implies, due to the equivalent security shown above, the weakness of the original proposal of Niederreiter, who suggested to use these codes in his scheme.
10.6.4 The attack of Sidelnikov and Shestakov
Let C be the code GRS_k(a, b), where a consists of n mutually distinct entries of F_q and b consists of nonzero entries, cf. Definition 8.1.10. If this code is used in the McEliece PKC, then the attacker knows the code C′ with generator matrix G′ = SGP, where S is an invertible k × k matrix and P = ΠD with Π an n × n permutation matrix and D an invertible diagonal matrix. The code C′ is equal to GRS_k(a′, b′), where a′ = aΠ and b′ = bP. In order to decode GRS_k(a′, b′) up to ⌊(n − k)/2⌋ errors it is enough to find a′ and b′. The matrix S is not essential in masking G, since G′ has a unique row equivalent matrix (I_k | A′) that is in reduced row echelon form. Here A′ is a generalized Cauchy matrix (Definition 3.2.17), but it is a priori not evident how to recover a′ and b′ from this. The code is MDS, hence all square submatrices of A′ are invertible by Remark 3.2.16. In particular all entries of A′ are nonzero. After multiplying the coordinates with nonzero constants we get a code which is generalized equivalent to the original one, and is again of the form GRS_k(a′′, b′′), since r ∗ GRS_k(a′, b′) = GRS_k(a′, b′ ∗ r). So without loss of generality it may be assumed that the code has a generator matrix of the form (I_k | A′) such that the last row and the first column of A′ consist of ones. Without loss of generality it may also be assumed that a_{k−1} = ∞, a_k = 0 and a_{k+1} = 1 by Proposition 8.1.25. Then according to Proposition 8.1.17 and Corollary 8.1.19 there exists a vector c with entries c_i given by

c_i = b′_i ∏_{t=1, t≠i}^{k} (a′_i − a′_t)   if 1 ≤ i ≤ k,
c_i = b′_i ∏_{t=1}^{k} (a′_i − a′_t)        if k + 1 ≤ i ≤ n,

such that A′ has entries a′_{ij} given by

a′_{ij} = (c_{j+k−1} c_i^{−1}) / (a′_{j+k−1} − a′_i)

for 1 ≤ i ≤ k − 1, 1 ≤ j ≤ n − k + 1, and a′_{kj} = c_{j+k−1} c_k^{−1} for 1 ≤ j ≤ n − k + 1.
CHAPTER 10. CRYPTOGRAPHY
Example 10.6.5 Let G' be the generator matrix of a code C' with entries in F7 given by

G' =
( 6 1 1 6 2 2 3 )
( 3 4 1 1 5 4 3 )
( 1 0 3 3 6 0 1 )

Then rref(G') = (I3 | A') with

A' =
( 1 3 3 6 )
( 4 4 6 6 )
( 3 1 6 3 )
G' is a public key and it is known to be the generator matrix of a generalized Reed-Solomon code. So we want to find a in F_7^7 consisting of mutually distinct entries and b in F_7^7 with nonzero entries such that C' = GRS3(a, b). Now C = (1, 4, 3, 1, 5, 5, 6) ∗ C' has a generator matrix of the form (I3 | A) with

A =
( 1 1 1 1 )
( 1 5 4 2 )
( 1 4 3 6 )

We may assume without loss of generality that a1 = 0 and a2 = 1 by Proposition 8.1.25. ***...............***
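The row reductions in Example 10.6.5 are easy to check with a short script. The helper rref_mod below is our own sketch of Gaussian elimination over a prime field F_p (not a routine from any particular computer algebra system); it recovers both A' and the rescaled matrix A.

```python
def rref_mod(M, p):
    """Reduced row echelon form of M over the prime field F_p (p prime)."""
    M = [[x % p for x in row] for row in M]
    nrows, ncols = len(M), len(M[0])
    r = 0
    for c in range(ncols):
        piv = next((i for i in range(r, nrows) if M[i][c]), None)
        if piv is None:
            continue                       # no pivot in this column
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], p - 2, p)       # modular inverse via Fermat
        M[r] = [x * inv % p for x in M[r]]
        for i in range(nrows):
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(a - f * b) % p for a, b in zip(M[i], M[r])]
        r += 1
        if r == nrows:
            break
    return M

# public generator matrix G' from Example 10.6.5
G = [[6, 1, 1, 6, 2, 2, 3],
     [3, 4, 1, 1, 5, 4, 3],
     [1, 0, 3, 3, 6, 0, 1]]
R = rref_mod(G, 7)                         # rref(G') = (I_3 | A')

# multiply the coordinates by r = (1,4,3,1,5,5,6) and row reduce again
r = [1, 4, 3, 1, 5, 5, 6]
Gs = [[g * ri % 7 for g, ri in zip(row, r)] for row in R]
A = [row[3:] for row in rref_mod(Gs, 7)]   # the matrix A of the example
```

Running this confirms A' = ((1,3,3,6),(4,4,6,6),(3,1,6,3)) and A = ((1,1,1,1),(1,5,4,2),(1,4,3,6)).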
10.6.5 Exercises
10.6.1 Show the correctness of the Niederreiter scheme.

10.6.2 Using the methods of Section 10.6.3, attack larger McEliece schemes. In the Goppa construction take
- m = 8, r = 16;
- m = 9, r = 5;
- m = 9, r = 7.
Make observations that would answer the following questions:
- Which attack is faster: plain ISD or Lee-Brickell's variant?
- What is the role of the parameter p? What is the optimal value of p in these experiments?
- Does the execution time differ from one run to another, or does it stay the same?
- Is there any change in execution time when the attacks are run on random codes with the same parameters as above?
Try to experiment with the other attack methods implemented in Magma: LeonsAttack, SternsAttack, CanteautChabaudsAttack. Hint: for constructing Goppa polynomials use the command PrimitivePolynomial.
10.6.3 Consider binary Goppa codes of length 1024 and Goppa polynomial of degree 50.
(1) Give an upper bound on the number of these codes.
(2) What is the fraction of the number of these codes with respect to all binary [1024, 524] codes?
(3) What is the minimum distance of a random binary [1024, 524] code according to the Gilbert-Varshamov bound?

10.6.4 Give an estimate of the complexity of decoding 50 errors of a received word with respect to a binary [1024, 524, 101] code by means of covering set decoding.

10.6.5 Let α ∈ F8 be a primitive element such that α^3 = α + 1. Let G' be the generator matrix given by

G' =
( α^6 α^6 α   1   α^4 1   α^4 )
( 0   α^3 α^3 α^4 α^6 α^6 α^4 )
( α^4 α^5 α^3 1   α^2 0   α^6 )

(1) Find a in F_8^7 consisting of mutually distinct entries and b in F_8^7 with nonzero entries such that G' is a generator matrix of GRS3(a, b).
(2) Consider the 3 × 7 generator matrix G of the code RS3(7, 1) with entry α^{(i−1)(j−1)} in the i-th row and the j-th column. Give an invertible 3 × 3 matrix S and a permutation matrix P such that G' = SGP.
(3) What is the number of pairs (S, P) of such matrices?
10.7 Notes
Some excellent references for an introduction to cryptography are [87, 117, 35].
10.7.1 Section 10.1
Computational security concerns practical attacks on cryptosystems, whereas unconditional security works with probabilistic models in which an attacker is supposed to possess unlimited computing power. A usual claim when working with unconditional security is an upper bound on the attacker's success probability. This probability is independent of the computing power of the attacker and is absolute in that sense. For instance, in the case of Shamir's secret sharing scheme (Section 10.4), no matter how much computing power a group of t − 1 participants has, it can do no better than guess the value of the secret. The probability of such a guess succeeding is 1/q. More on these issues can be found in [117].

A couple of remarks on block ciphers that were used in the past. The Jefferson cylinder, invented by Thomas Jefferson in 1795 and independently by Etienne Bazeries, is a polyalphabetic cipher that was used by the U.S. army in 1923–1942 under the name M-94. It was constructed as a rotor with 20–36 discs with letters, each of which provided a substitution at a corresponding position. For its time it had quite good cryptographic properties. Probably the best known historic cipher is the German Enigma. It had already been used for commercial purposes in the 1920s, but became famous for its use by the German military during World War II. Enigma is also a rotor-based polyalphabetic cipher. More on historical ciphers can be found in [87].
The Kasiski method aims at recovering the period of a polyalphabetic substitution cipher; it exploits the fact that repeated portions of a plaintext may be encrypted with the same part of the keyword. More details are in [87].

The National Bureau of Standards (NBS, which later became the National Institute of Standards and Technology, NIST) initiated the development of DES (Data Encryption Standard) in the early 1970s. IBM's cryptography department, and in particular its leaders Dr. Horst Feistel (recall the Feistel cipher) and Dr. W. Tuchman, contributed the most to the development. The evaluation process was also facilitated by the NSA (National Security Agency). The standard was finally approved and published in 1977 [90]. A lot of controversy has accompanied DES since its appearance. Some experts claimed that the developers could have intentionally added design trapdoors to the cipher, so that cryptanalysis would be possible for them, but not for others. The 56-bit key size also raised concern, which eventually led to the need to adopt a new standard. Historical remarks on DES and its development can be found in [112]. Differential and linear cryptanalysis turned out to be the most successful theoretical attacks on DES. For the initial papers on these attacks, see [15, 82]. The reader may also visit http://www.ciphersbyritter.com/RES/DIFFANA.HTM for more references and the history of differential cryptanalysis. We also mention that differential cryptanalysis may have been known to the developers long before Adi Shamir published his paper in 1990. Since DES encryptions do not form a group, a triple application of DES, called triple DES, was proposed [66]. Although no effective cryptanalysis of triple DES has been found, it is barely used due to its slow implementation compared to AES. In the mid-1990s it became apparent to the cryptography community that DES no longer provided a sufficient security level.
So NIST announced a competition for a cipher that would replace DES and become the AES (Advanced Encryption Standard). The main criteria imposed for the future AES were resistance to linear and differential cryptanalysis, a faster and more effective implementation (compared to DES), and the ability to work with 128-bit plaintext blocks and 128-, 192-, and 256-bit keys; the number of rounds was not specified. After five years of selection, the cipher Rijndael proposed by the Belgian researchers Joan Daemen and Vincent Rijmen won. The cipher was officially adopted for the AES in 2001 [91]. Because resistance to linear and differential cryptanalysis was one of the milestones in the design of AES, J. Daemen and V. Rijmen carefully studied this question and showed how such resistance can be achieved within Rijndael. In the design they used what they called the wide trail strategy, a method devised specifically to counter linear and differential cryptanalysis. A description of the AES together with a discussion of the underlying design decisions and theory can be found in [45]; for the wide trail strategy see [46]. As to attacks on AES, up to now there is no attack that could break AES even theoretically, i.e. faster than exhaustive search, in a scenario where the unknown key stays the same. Several attacks, though, work on non-full AES with fewer than 10 rounds. For example, Boomerang-type attacks are able to break 5–6 rounds of AES-128 much faster than exhaustive search. For 6 rounds the Boomerang attack has a data complexity of 2^71 128-bit blocks, a memory complexity of 2^33 blocks, and a time complexity of 2^71 AES encryptions. This attack is mounted under a mixture of chosen plaintext and adaptive chosen ciphertext scenarios. Some other attacks can also handle 5–7 rounds; among them are the Square attack, proposed by Daemen and Rijmen themselves, the collision attack, partial sums, and impossible differentials. For an overview of attacks on Rijndael see [40]. There are recent works on related-key attacks on AES, see [1]. It is possible to attack 12 rounds of AES-192 and 14 rounds of AES-256 in the related-key scenario. Still, it is questionable whether these should be considered a real threat.

We would like to mention several other recent block ciphers. The cipher Serpent is an instance of an SP-network and finished second in the AES competition won by Rijndael. As prescribed by the selection committee, it also operates on 128-bit blocks and keys of sizes 128, 192, and 256 bits. Serpent has a strong security margin, prescribing 32 rounds. Some information online: http://www.cl.cam.ac.uk/~rja14/serpent.html. Next, the cipher Blowfish, proposed in 1993, is an instance of a Feistel cipher; it has 16 rounds and operates on 64-bit blocks with a default key size of 128 bits. Blowfish has so far resisted cryptanalysis and its implementation is rather fast, although it has some limitations that preclude its use in some environments. Information online: http://www.schneier.com/blowfish.html. A successor of Blowfish proposed by the same author, Bruce Schneier, is Twofish, one of the five finalists in the AES competition. It has the same block and key sizes as all the AES contestants and has 16 rounds. Twofish is also a Feistel cipher and is also believed to resist cryptanalysis. Information online: http://www.schneier.com/twofish.html. Notably, all these ciphers are in the public domain and are free for use in any software or hardware implementation. The lightweight block cipher PRESENT operates on plaintext blocks of only 64 bits with key lengths of 80 and 128 bits; PRESENT has 31 rounds [2]. There exist proposals with even smaller block lengths, see http://www.ecrypt.eu.org/lightweight/index.php/Block_ciphers.
10.7.2 Section 10.2
The concept of asymmetric cryptography was introduced by Diffie and Hellman in 1976 [48]. For an introduction to the subject and a survey of results see [87, 89]. The notion of a one-way function, as well as that of a trapdoor one-way function, was also introduced by Diffie and Hellman in the same paper [48]. Rabin's scheme from Example 10.2.2 was introduced in [97] in 1979, and the ElGamal scheme was presented in [55]. The notion of a digital signature was also presented in the pioneering work [48], see also [88].

The RSA scheme was introduced in 1977 by Rivest, Shamir, and Adleman [100]. In the same paper they showed that computing the decryption exponent and factoring are equivalent. There is no known polynomial time algorithm for factoring integers. Still, there are quite a few algorithms with subexponential complexity. For a survey of existing methods, see [96]. Asymptotically the best known subexponential algorithm is the general number field sieve, which has an expected running time of O(exp(((64/9) b)^{1/3} (log b)^{2/3})), where b is the bit length of the number n to be factored [36]. The development of factoring algorithms changed the requirements on the RSA key size over time. In their original paper [100] the authors suggested the use of a 200 decimal digit modulus (664 bits). The sizes of 336 and 512 bits were also used. In 2010 the factorization of RSA-768 was announced. The 1024-bit modulus used at present raises many questions on whether it may still be considered secure. Therefore, for long-term security, key sizes of 2048 or even 3072 bits are to be used. Quite remarkable is the work of Shor [?, ?], who proposed an algorithm that solves the integer factorization problem in polynomial time on a quantum computer.

The use of Z∗n in the ElGamal scheme was proposed in [83]. For the use of the Jacobian of a hyperelliptic curve see [41]. There are several methods for solving the DLP; we name just a few: baby-step giant-step, Pollard's rho and Pollard's lambda (or kangaroo), Pohlig-Hellman, and index calculus. The fastest algorithms for solving the DLP in Z∗p and F2m are variations of the index calculus algorithm, which has subexponential complexity. None of the above algorithms, applied to the multiplicative group of a finite field, has polynomial complexity. For an introduction to these methods the reader may consult [41]. These developments in DLP-solving algorithms affected the key size of the ElGamal scheme. In practice the key size grew from 512 to 768 and finally to 1024 bits. At the moment a 1024-bit key for ElGamal is considered the standard. The mentioned work of Shor [108] also solves the DLP in polynomial time; therefore the existence of a large enough quantum computer jeopardizes the ElGamal scheme. Despite the doubts of some researchers about the possibility of constructing a large enough quantum computer, the area of post-quantum cryptography has evolved, incorporating cryptosystems that are potentially resistant to quantum computer attacks. See [12] for an overview of the area, which includes lattice-based, hash-based, coding-based, and multivariate-based cryptography. Some references to multivariate asymmetric systems, digital signature schemes, and their cryptanalysis can be found in [49]. The knapsack cryptosystem by Merkle and Hellman has an interesting history. It was one of the first asymmetric cryptosystems. Its successful cryptanalysis showed that reliance on the hardness of the underlying problem alone may be misleading. For an interesting account of the historical development, see the survey chapter of Diffie [47].
10.7.3 Section 10.3
Authentication codes were initially proposed by Gilbert, MacWilliams, and Sloane [58]. Introductory material on authentication codes is well exposed in [117]. Message authentication codes (MACs) are widely used in practice for authentication purposes. MACs are keyed hash functions. In the case of MACs one demands that a hash function provide compression (a message of arbitrary size is mapped to a fixed size vector), ease of computation (it should be easy to compute an image knowing the key), and computation-resistance (the practical impossibility of computing an image without knowing the key, even when some element–image pairs are available). More on MACs can be found in [87]. Results and a discussion of the relation between authentication codes and orthogonal arrays are in [117, 116, 115]. Proposition 10.3.7 is due to Bierbrauer, Johansson, Kabatianskii, and Smeets [13]. By adding linear structure to the source state set, key space, tag space, and authentication mappings one obtains linear authentication codes, which can be used in the study of distributed authentication systems [103].
10.7.4 Section 10.4
The notion of a secret sharing scheme was first introduced in 1979 by Shamir [106] and independently by Blakley [22]. We mention here some notions that did not appear in the main text. A secret sharing scheme is called perfect if knowledge of the shares of an unauthorized group (e.g. a group of fewer than t participants in Shamir's scheme) does not reduce the uncertainty about the secret itself. In terms of the entropy function this can be stated as follows: H(S | A) = H(S), where S is the secret and A is the set of shares of an unauthorized group; moreover H(S | B) = 0 for B the set of shares of an authorized group. In a perfect secret sharing scheme the size of each share is at least the size of the secret.
If equality holds, such a scheme is called ideal. The notion of a secret sharing scheme can be generalized via the notion of an access structure. Using access structures one prescribes which subsets of participants can reconstruct the secret (authorized subsets) and which cannot (unauthorized subsets). The notion of a distribution of shares can also be formalized. More details on these notions, and a treatment using probability theory, can be found e.g. in [117]. McEliece and Sarwate [85] were the first to point out the connection between Shamir's scheme and the Reed-Solomon code construction. Some other works on relations between coding theory and secret sharing schemes include [70, 81, 94, 138]. More recent works concern applications of AG-codes to this subject; we mention the chapter of Duursma [52] and the work of Chen and Cramer [39]. In the latter two references the reader can also find the notion of secure multiparty computation, see [137]. The idea here is that several participants wish to compute the value of some publicly known function evaluated at their private values (like the shares above), in such a way that no participant can deduce the values of the other participants from the computed value of the public function and his or her own value. We also mention that, as was the case with authentication codes, information-theoretic perfectness can be traded off to obtain a scheme in which the shares are smaller than the secret [23].
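The mechanics of Shamir's scheme (Section 10.4) can be sketched in a few lines. The prime P and the helper names below are illustrative choices of ours, not part of any standard: shares are evaluations of a random polynomial of degree t − 1 with constant term equal to the secret, and any t shares recover the secret by Lagrange interpolation at 0.

```python
import random

random.seed(1)
P = 2**31 - 1  # a Mersenne prime; secret and shares live in F_P

def make_shares(secret, t, n):
    """Split `secret` into n shares; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(1, P) for _ in range(t - 1)]
    evalpoly = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, evalpoly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over F_P."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * -xj % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = make_shares(123456789, t=3, n=5)
assert reconstruct(shares[:3]) == 123456789   # any 3 shares suffice
assert reconstruct(shares[2:]) == 123456789
```

Fewer than t shares are consistent with every possible secret, which is the perfectness property discussed above.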
10.7.5 Section 10.5
Introductory material on LFSRs with a discussion of practical issues can be found in [87]. The notion of linear complexity is treated in [102], see also the materials online at http://www.ciphersbyritter.com/RES/LINCOMPL.HTM. A thorough exposition of linear recurring sequences is in [74]. Some examples of the non-cryptographic use of LFSRs, namely randomization in digital broadcasting: the Advanced Television Systems Committee (ATSC) standard for the digital television format (http://www.atsc.org/guide_default.html), Digital Audio Broadcasting (DAB), a digital radio technology for broadcasting radio stations, and Digital Video Broadcasting - Terrestrial (DVB-T), the European standard for the broadcast transmission of digital terrestrial television (http://www.dvb.org/technology/standards/). An example of a nonlinear combination generator is E0, a stream cipher used in the Bluetooth protocol, see e.g. [64]; of a nonlinear filter generator, the knapsack generator [87]; and of clock-controlled generators, A5/1 and A5/2, the stream ciphers used to provide voice privacy in the Global System for Mobile communications (GSM) cellular telephone protocol, see http://web.archive.org/web/20040712061808/www.ausmobile.com/downloads/technical/Security+in+the+GSM+system+01052004.pdf.
10.7.6 Section 10.6

***
Chapter 11

The theory of Gröbner bases and its applications

Stanislav Bulygin

In this chapter we deal with methods in coding theory and cryptology based on polynomial system solving. As the main tool we use the theory of Gröbner bases, a well-established instrument in computational algebra. In Section 11.1 we give a brief overview of the topic of polynomial system solving. We start with the relatively easy methods of linearization and extended linearization; then we present the basics of the more involved theory of Gröbner bases.

The problem we deal with in this chapter is the problem of polynomial system solving. We formulate it as follows: let F be a field and let P = F[X1, ..., Xn] be a polynomial ring over F in n variables X = (X1, ..., Xn). Let f1, ..., fm ∈ P. We are interested in finding the set of solutions S ⊆ F^n to the polynomial system

f1(X1, ..., Xn) = 0,
...
fm(X1, ..., Xn) = 0.    (11.1)

In other words, S consists of those elements of F^n that satisfy all the equations above. In terms of algebraic geometry this problem may be formulated as follows: given an ideal I ⊆ P, find the variety V_F(I) which it defines:

V_F(I) = {x = (x1, ..., xn) ∈ F^n | f(x) = 0 for all f ∈ I}.

Since we are interested in applications to coding and cryptology, we work over finite fields, and we often want the solutions of the corresponding systems to lie in these finite fields rather than in an algebraic closure. Recall that every element α ∈ Fq satisfies α^q = α. This means that if we add the field equations X_i^q − X_i = 0, i = 1, ..., n, to the polynomial system (11.1), we are guaranteed that the solutions lie in Fq.

After introducing tools for polynomial system solving in Section 11.1, we give two concrete applications in Sections 11.2 and 11.3. In Section 11.2 we consider applications of Gröbner basis techniques to decoding linear codes, whereas Section 11.3 deals with methods of algebraic cryptanalysis of block ciphers. Due to space limitations many interesting topics related to these areas are not considered. We provide a short overview of them with references in the Notes section.
11.1 Polynomial system solving
11.1.1 Linearization techniques
We know how to solve systems of linear equations efficiently: Gaussian elimination is the standard tool for this job. If we are given a system of nonlinear equations, a natural approach is to try to reduce the problem to a linear one, which we know how to solve. This simple idea leads to a technique called linearization. The technique works as follows: we replace every monomial occurring in a nonlinear (polynomial) equation by a new variable. At the end we obtain a linear system with the same number of equations, but many new variables. The hope is that by solving this linear system we are able to get a solution to our initial nonlinear problem. It is best to illustrate this approach with a concrete example.

Example 11.1.1 Consider a quadratic system in two unknowns x and y over the field F3:

x^2 − y^2 − x + y = 0
−x^2 + x − y + 1 = 0
y^2 + y + x = 0
x^2 + x + y = 0

Introduce new variables a := x^2 and b := y^2. Therewith we have a linear system:

a − b − x + y = 0
−a + x − y + 1 = 0
b + y + x = 0
a + x + y = 0

This system has a unique solution, which may be found with Gaussian elimination: a = b = x = y = 1. Moreover, the solution for a and b is consistent with the conditions a = x^2, b = y^2. So although the system is quadratic, we are still able to solve it purely with methods of linear algebra.

It must be noted that the linearization technique works only very seldom. Usually the number of variables (i.e. monomials) in the system is much larger than the number of equations. Therefore one has to solve an underdetermined linear system, which has many solutions, among which it is hard to find a "real" one that stems from the original nonlinear system.

Example 11.1.2 Consider a system in three variables x, y, z over the field F16:

xy + yz + xz = 0
xyz + x + 1 = 0
xy + y + z = 0
It may be shown that over F16 this system has a unique solution (1, 0, 0). If we replace the monomials in this system by new variables we end up with a linear system of 3 equations in 7 variables. This system has full rank; in particular the variables x, y, z can be chosen freely, their values then determining values for other variables. So the linear system has at least 16^3 solutions, of which only one provides a legitimate solution for the initial system. The other solutions do not have any meaning. E.g. we may show that the assignment x = 1, y = 1, z = 1 implies that the "variable" xy should be 0, and of course this cannot be true, since both x and y are 1. So the linearization technique boils down to sieving the set F_16^3 for a right solution, but this is nothing more than an exhaustive search for the initial system.

So the problem with the linearization technique is that we do not have enough linearly independent equations for solving. Here is where the idea of eXtended Linearization (XL) comes in handy. The idea of XL is to multiply the initial equations by all monomials up to a given degree (hopefully not too large) to generate new equations. Of course new variables will appear, since new monomials will appear. Still, if the system is "nice" enough, we may generate the necessary number of linearly independent equations to obtain a solution. Namely, we hope that after "extending" our system with new equations and doing Gaussian elimination, we will be able to find a univariate equation. Then we can solve it, plug in the obtained values, and proceed with a simplified system.

Example 11.1.3 Consider a small system in two unknowns x, y over the field F4:

x^2 + y + 1 = 0
xy + y = 0

It is clear that the linearization technique does not work in this case, since the number of variables (3) is larger than the number of equations (2). Now multiply the two equations first with x and then with y.
Therewith we obtain four new equations, which have the same solutions as the initial ones, so we may add them to the system. The new equations are:

x^3 + xy + x = 0
x^2 y + xy = 0
x^2 y + y^2 + y = 0
xy^2 + y^2 = 0

Here again the number of equations is lower than the number of variables. Still, by ordering the monomials so that y^2 and y go leftmost in the matrix representation of the system and doing Gaussian elimination, we encounter a univariate equation y^2 = 0 (check this!). So we have a solution for y, namely y = 0. After substituting y = 0 in the first equation we have x^2 + 1 = 0, which is again a univariate equation. Over F4 it has the unique solution x = 1. So by using linear algebra and univariate equation solving, we were able to obtain the solution (1, 0) for the system.

Algorithm 11.1 explains more formally how XL works. In our example it was enough to set D = 3. Usually one has to go much higher to get a result. In the next section we consider the technique of Gröbner bases, which is a more powerful tool; in some sense, it is a refined and improved version of XL.
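Returning to Example 11.1.1, the linearization step can be replayed mechanically. The solver below is our own sketch of Gauss-Jordan elimination over F_p (not from the text); the linearized system is encoded with columns a, b, x, y and coefficients taken mod 3.

```python
def solve_mod(A, b, p):
    """Solve A x = b over F_p by Gauss-Jordan (assumes a unique solution)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for c in range(n):
        piv = next(i for i in range(c, n) if M[i][c])
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], p - 2, p)                # modular inverse (Fermat)
        M[c] = [v * inv % p for v in M[c]]
        for i in range(n):
            if i != c and M[i][c]:
                f = M[i][c]
                M[i] = [(u - f * v) % p for u, v in zip(M[i], M[c])]
    return [M[i][-1] for i in range(n)]

# Example 11.1.1 linearized; columns a, b, x, y (a = x^2, b = y^2), mod 3:
# -1 is written as 2, and "... + 1 = 0" becomes right-hand side 2.
A = [[1, 2, 2, 1],
     [2, 0, 1, 2],
     [0, 1, 1, 1],
     [1, 0, 1, 1]]
rhs = [0, 2, 0, 0]
a, bb, x, y = solve_mod(A, rhs, 3)
assert (a, bb, x, y) == (1, 1, 1, 1)
assert a == x * x % 3 and bb == y * y % 3   # consistent with a = x^2, b = y^2
```

The final assertion performs the consistency check mentioned in the example: the values found for the new variables really are the squares of x and y.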
Algorithm 11.1 XL(F, D)
Input: - a system of polynomial equations F = {f1 = 0, ..., fm = 0} of total degree d over the field F in variables x1, ..., xn;
       - a parameter D.
Output: a solution to the system F or the message "no solution found".

Begin
  Dcurrent := d; Sol := ∅;
  repeat
    Extend: multiply each equation fi ∈ F by all monomials of degree ≤ Dcurrent − d; denote the system so obtained by Sys;
    Linearize: assign each monomial appearing in Sys a new variable; order the new variables such that the powers of xi go leftmost in the matrix representation of the system, in blocks {xi, xi^2, ...};
    Sys := Gauss(Sys);
    if there exists a univariate equation f(xi) = 0 in Sys then
      solve f(xi) = 0 over F and obtain ai with f(ai) = 0;
      Sol := Sol ∪ {(i, ai)};
      if |Sol| = n then
        return Sol;
      end if
      Sys := Sys with ai substituted for xi;
    else
      Dcurrent := Dcurrent + 1;
    end if
  until Dcurrent = D + 1
  return "no solution found"
End
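To experiment with systems like the one in Example 11.1.3 one needs arithmetic in F4 = F2[t]/(t^2 + t + 1). The sketch below uses our own encoding (elements as integers 0-3, with 2 standing for t and 3 for t + 1; addition is XOR) and confirms by exhaustive search the unique solution (1, 0) that the XL run in the example produced.

```python
def f4_mul(a, b):
    """Multiply in F4 = F2[t]/(t^2 + t + 1); elements 0, 1, 2 = t, 3 = t + 1."""
    r = 0
    if b & 1:
        r ^= a           # contribution of the constant term of b
    if b & 2:
        r ^= a << 1      # contribution of the t-term of b
    if r & 4:
        r ^= 0b111       # reduce: t^2 = t + 1
    return r

# Example 11.1.3: x^2 + y + 1 = 0 and xy + y = 0 over F4 (addition is XOR)
sols = [(x, y)
        for x in range(4) for y in range(4)
        if f4_mul(x, x) ^ y ^ 1 == 0 and f4_mul(x, y) ^ y == 0]
assert sols == [(1, 0)]  # the unique solution, as found by XL
```

For a toy example like this, brute force over all 16 pairs is of course feasible; the point of XL is to reach the same answer with linear algebra when exhaustive search is out of reach.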
11.1.2 Gröbner bases
In the previous section we saw how one can solve systems of nonlinear equations using linearization techniques. Speaking the language of algebraic geometry, we want to find the elements of the variety V(f1, ..., fm), where

V(f1, ..., fm) := {a ∈ F^n : fi(a) = 0 for all 1 ≤ i ≤ m}

and F is a field. The target object of this section, the Gröbner basis technique, gives an opportunity to find this variety and also to solve many other important problems, for example ideal membership, i.e. deciding whether a given polynomial may be obtained as a polynomial combination of a given set of polynomials. As we will see, the algorithm for finding Gröbner bases generalizes Gaussian elimination for linear systems on one side and the Euclidean algorithm for finding the GCD of two univariate polynomials on the other side. We will see how this algorithm, Buchberger's algorithm, works and how Gröbner bases can be applied to finding a variety (system solving) and to some other problems. First of all, we need some definitions. *** should this go to Appendix? ***

Definition 11.1.4 Let R := F[x1, ..., xn] be a polynomial ring in n variables over the field F. An ideal in R is a subset I of R with the following properties:
- 0 ∈ I;
- for every f, g ∈ I: f + g ∈ I;
- for every h ∈ R and every f ∈ I: h · f ∈ I.

So an ideal I is a subset of R closed under addition and closed under multiplication by elements of R. Let f1, ..., fm ∈ R. It is easy to see that

⟨f1, ..., fm⟩ := {a1 f1 + · · · + am fm | ai ∈ R for all i}

is an ideal. We say that ⟨f1, ..., fm⟩ is the ideal generated by the polynomials f1, ..., fm. From commutative algebra it is known that every ideal I has a finite system of generators, i.e. I = ⟨f1, ..., fm⟩ for some f1, ..., fm ∈ I. A Gröbner basis, which we define later, is a system of generators with special properties.

A monomial in R is a product of the form x1^{a1} · ... · xn^{an} with a1, ..., an nonnegative integers. Denote X = {x1, ..., xn} and by Mon(X) the set of all monomials in R.

Definition 11.1.5 A monomial ordering on R is any relation > on Mon(X) such that
- > is a total ordering on Mon(X), i.e. any two elements of Mon(X) are comparable;
- > is multiplicative, i.e. X^α > X^β implies X^α · X^γ > X^β · X^γ for all vectors γ with nonnegative integer entries, where X^α = x1^{α1} · ... · xn^{αn};
- > is a well-ordering, i.e. every nonempty subset of Mon(X) has a minimal element.

Example 11.1.6 Here are some orderings that are frequently used in practice.

1. Lexicographic ordering induced by x1 > · · · > xn: X^α >lp X^β if and only if there exists an s such that α1 = β1, ..., α_{s−1} = β_{s−1}, α_s > β_s.
2. Degree reverse lexicographic ordering induced by x1 > · · · > xn: X^α >dp X^β if and only if |α| := α1 + · · · + αn > β1 + · · · + βn =: |β|, or |α| = |β| and there exists an s such that α_n = β_n, ..., α_{n−s+1} = β_{n−s+1}, α_{n−s} < β_{n−s}.

3. Block ordering or product ordering. Let X and Y be two ordered sets of variables, >1 a monomial ordering on F[X] and >2 a monomial ordering on F[Y]. The block ordering on F[X, Y] is the following: X^{α1} Y^{β1} > X^{α2} Y^{β2} if and only if X^{α1} >1 X^{α2}, or X^{α1} = X^{α2} and Y^{β1} >2 Y^{β2}.

Definition 11.1.7 Let > be a monomial ordering on R. Let f = Σ_α c_α X^α be a nonzero polynomial in R. Let α0 be such that c_{α0} ≠ 0 and X^{α0} > X^α for all α ≠ α0 with c_α ≠ 0. Then lc(f) := c_{α0} is called the leading coefficient of f, lm(f) := X^{α0} the leading monomial of f, and lt(f) := c_{α0} X^{α0} the leading term of f; moreover tail(f) := f − lt(f).

Having these notions we are ready to define the notion of a Gröbner basis.

Definition 11.1.8 Let I be an ideal in R. The leading ideal of I with respect to > is defined as L_>(I) := ⟨lt(f) | f ∈ I, f ≠ 0⟩. We abbreviate L_>(I) to L(I) if it is clear which ordering is meant. A finite subset G = {g1, ..., gm} of I is called a Gröbner basis for I with respect to > if L_>(I) = ⟨lt(g1), ..., lt(gm)⟩. We say that a set F = {f1, ..., fm} is a Gröbner basis if F is a Gröbner basis of the ideal ⟨F⟩.

Remark 11.1.9 Note that a Gröbner basis of an ideal is not unique. The so-called reduced Gröbner basis of an ideal is unique. By this one means a Gröbner basis G in which all elements have leading coefficient equal to 1 and no leading term of an element g ∈ G divides any of the terms of g', where g ≠ g' ∈ G.

Historically the first algorithm for computing Gröbner bases was proposed by Bruno Buchberger in 1965. In fact the very notion of a Gröbner basis was introduced by Buchberger in his Ph.D. thesis and was named after his Ph.D. advisor Wolfgang Gröbner. In order to be able to formulate the algorithm we need two more definitions.

Definition 11.1.10 Let f, g ∈ R \ {0} be two nonzero polynomials, and let lm(f) and lm(g) be the leading monomials of f and g, respectively, w.r.t. some monomial ordering. Denote m := lcm(lm(f), lm(g)). Then the s-polynomial of these two polynomials is defined as

spoly(f, g) = m/lm(f) · f − lc(f)/lc(g) · m/lm(g) · g.
Remark 11.1.11 1. If lm(f) = x1^{a1} · ... · xn^{an} and lm(g) = x1^{b1} · ... · xn^{bn}, then m = x1^{c1} · ... · xn^{cn}, where ci = max(ai, bi) for all i. Therewith m/lm(f) and m/lm(g) are monomials.

2. Note that if we write f = lc(f) · lm(f) + f' and g = lc(g) · lm(g) + g', where lm(f') < lm(f) and lm(g') < lm(g), then

spoly(f, g) = m/lm(f) · (lc(f) · lm(f) + f') − lc(f)/lc(g) · m/lm(g) · (lc(g) · lm(g) + g')
            = m · lc(f) + m/lm(f) · f' − m · lc(f) − lc(f)/lc(g) · m/lm(g) · g'
            = m/lm(f) · f' − lc(f)/lc(g) · m/lm(g) · g'.

Therewith we have "canceled out" the leading terms of both f and g.
11.1. POLYNOMIAL SYSTEM SOLVING
Example 11.1.12 In order to understand this notion better, let us see what the s-polynomials look like in the case of linear and univariate polynomials.

linear: Let R = Q[x, y, z] with the lexicographic ordering with x > y > z. Let f = 3x + 2y − 10z and g = x + 5y − 5z; then lm(f) = lm(g) = x and m = x, so

spoly(f, g) = f − 3/1 · g = 3x + 2y − 10z − 3x − 15y + 15z = −13y + 5z,

and this is exactly what one would do to cancel out the variable x during Gaussian elimination.

univariate: Let R = Q[x]. Let f = 2x^5 − x^3 and g = x^2 − 10x + 1; then m = x^5 and

spoly(f, g) = f − 2/1 · x^3 · g = 2x^5 − x^3 − 2x^5 + 20x^4 − 2x^3 = 20x^4 − 3x^3,

which is the first step of the polynomial division algorithm, used in the Euclidean algorithm for finding gcd(f, g).

To define the next notion we need for Buchberger's algorithm, we use the following result.

Theorem 11.1.13 Let f1, . . . , fm ∈ R \ {0} be nonzero polynomials in the ring R endowed with a monomial ordering, and let f ∈ R be some polynomial. Then there exist polynomials a1, . . . , am, h ∈ R with the following properties:

1. f = a1 · f1 + · · · + am · fm + h;
2. lm(f) ≥ lm(ai · fi) for f ≠ 0 and every i such that ai · fi ≠ 0;
3. if h ≠ 0, then lm(h) is not divisible by any of the lm(fi).

Moreover, if G = {f1, . . . , fm} is a Gröbner basis, then the polynomial h is unique.

Definition 11.1.14 Let F = {f1, . . . , fm} ⊂ R and f ∈ R. We define the normal form of f w.r.t. F to be any h as in Theorem 11.1.13. Notation: NF(f | F) := h.

Remark 11.1.15 1. If R = F[x] and f1 := g ∈ R, then NF(f | ⟨g⟩) is exactly the remainder of division of the univariate polynomial f by the polynomial g. So the notion of a normal form generalizes the notion of a remainder to the case of a multivariate polynomial ring.

2. The notion of a normal form is uniquely defined only if f1, . . . , fm is a Gröbner basis.

3. The normal form has a very important property: f ∈ I ⟺ NF(f | G) = 0, where G is a Gröbner basis of I.
So by computing a Gröbner basis of a given ideal I and then computing the normal form of a given polynomial f, we may decide whether f belongs to I or not. The computation of a normal form proceeds as in Algorithm 11.2. There, the function Exists_LT_Divisor(F,h) returns an index i such that lt(fi) divides lt(h) if such an index exists, and 0 otherwise. Note that the algorithm may also be adapted so that it returns the polynomial combination of the fi's that together with h satisfies conditions (1)–(3) of Theorem 11.1.13.
Algorithm 11.2 NF(f | F)
Input: Polynomial ring R with monomial ordering <
       Set of polynomials F = {f1, . . . , fm} ⊂ R
       Polynomial f ∈ R
Output: a polynomial h which satisfies (1)–(3) of Theorem 11.1.13 for the set F and the polynomial f, with some ai's from R
Begin
  h := f;
  i := Exists_LT_Divisor(F, h);
  while h ≠ 0 and i > 0 do
    h := h − lt(h)/lt(fi) · fi;
    i := Exists_LT_Divisor(F, h);
  end while
  return h
End

Example 11.1.16 Let R = Q[x, y] with the lexicographic ordering with x > y. Let f = x^2 − y^3 and F = {f1, f2} = {x^2 + x + y, x^3 + xy + y^3}. At the beginning of Algorithm 11.2 we have h = f. Now Exists_LT_Divisor(F,h) = 1, so we enter the while loop, where the following assignment is made:

h := h − lt(h)/lt(f1) · f1 = x^2 − y^3 − x^2/x^2 · (x^2 + x + y) = −x − y^3 − y.

We compute Exists_LT_Divisor(F,h) again and obtain 0. So we do not enter the loop a second time, and h = −x − y^3 − y is the normal form of f we looked for.

Now we are ready to formulate Buchberger's algorithm for finding a Gröbner basis of an ideal: Algorithm 11.3. The main idea of the algorithm is: if after "canceling" the leading terms of the current pair (also called a critical pair) we cannot "divide" the result by the current set, then we add the result to this set and add all new pairs to the set of critical pairs. The next example shows the algorithm in action.

Example 11.1.17 We build on Example 11.1.16. So R = Q[x, y] with the lexicographic ordering with x > y, and we have f1 = x^2 + x + y, f2 = x^3 + xy + y^3. The initialization phase yields G := {f1, f2} and Pairs := {(f1, f2)}. Now we enter the while loop. We have to compute h = NF(spoly(f1, f2) | G). First,

spoly(f1, f2) = x · f1 − f2 = x^3 + x^2 + xy − x^3 − xy − y^3 = x^2 − y^3.

As we know from Example 11.1.16, NF(x^2 − y^3 | G) = −x − y^3 − y, which is nonzero. Therefore we add f3 := h to G and add the pairs (f3, f1) and (f3, f2) to Pairs.
Recall that the pair (f1, f2) is no longer in Pairs, so now there are two elements in it. In the second run of the loop we take the pair (f3, f1) and remove it from Pairs. Now spoly(f3, f1) = −xy^3 − xy + x + y and NF(−xy^3 − xy + x + y | G) = y^6 + 2y^4 − y^3 + y^2 =: f4. We update the sets G and Pairs: now Pairs = {(f3, f2), (f4, f1), (f4, f2), (f4, f3)} and G = {f1, f2, f3, f4}. Next take the pair (f3, f2). For this pair spoly(f3, f2) = −x^2 y^3 − x^2 y + xy + y^3 and NF(spoly(f3, f2) | G) = 0. It may be shown that likewise all the other pairs from the set Pairs reduce to 0 w.r.t. G. Therefore, the algorithm outputs G = {f1, f2, f3, f4} as a Gröbner basis of ⟨f1, f2⟩ w.r.t. the lexicographic ordering.
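Algorithm 11.2 is short enough to prototype directly. The sketch below follows the pseudocode literally, reducing only the leading term of h; it is written in Python with SymPy as a stand-in for the Singular procedures used in this chapter, and reproduces Example 11.1.16.

```python
from sympy import symbols, expand, div, LT

def normal_form(f, F, gens, order='lex'):
    # Algorithm 11.2: while some lt(fi) divides lt(h),
    # cancel the leading term of h against that fi
    h = expand(f)
    while h != 0:
        lt_h = LT(h, *gens, order=order)
        for fi in F:
            q, r = div(lt_h, LT(fi, *gens, order=order), *gens)
            if r == 0:                 # lt(fi) divides lt(h)
                h = expand(h - q * fi)
                break
        else:                          # no leading-term divisor: h is a normal form
            return h
    return h

x, y = symbols('x y')
F = [x**2 + x + y, x**3 + x*y + y**3]
h = normal_form(x**2 - y**3, F, (x, y))
print(h)   # equals -x - y**3 - y, as in Example 11.1.16
```

Running it with F extended by f3 = −x − y^3 − y on the input −xy^3 − xy + x + y reproduces the polynomial f4 of Example 11.1.17.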
Algorithm 11.3 Buchberger(F)
Input: Polynomial ring R with monomial ordering <
       Normal form procedure NF
       Set of polynomials F = {f1, . . . , fm} ⊂ R
Output: Set of polynomials G ⊂ R such that G is a Gröbner basis of the ideal generated by the set F w.r.t. the monomial ordering <
Begin
  G := {f1, . . . , fm};
  Pairs := {(fi, fj) : 1 ≤ i < j ≤ m};
  while Pairs ≠ ∅ do
    Select a pair (f, g) ∈ Pairs;
    Remove the pair (f, g) from Pairs;
    h := NF(spoly(f, g) | G);
    if h ≠ 0 then
      for all p ∈ G do
        Add the pair (h, p) to Pairs;
      end for
      Add h to G;
    end if
  end while
  return G
End
Example 11.1.18 [CAS] Now we show how to compute the above examples in Singular and Magma. In Singular one executes the following code:

> ring r=0,(x,y),lp;
> poly f1=x2+x+y;
> poly f2=x3+xy+y3;
> ideal I=f1,f2;
> ideal GBI=std(I);
> GBI;
GBI[1]=y6+2y4-y3+y2
GBI[2]=x+y3+y

One may request computation of the reduced Gröbner basis by switching on the option option(redSB). In the above example GBI is already reduced. Now if we compute the normal form of f1-f2 w.r.t. GBI it should be zero, since f1-f2 belongs to I:

> NF(f1-f2,GBI);
0

It is also possible to track computations for small examples using LIB "teachstd.lib";. One should add this line at the beginning of the above piece of code, together with the line printlevel=1;, which makes program comments visible. Then one should use standard(I) instead of std(I) to see the run in detail. Similarly, NFMora(f1-f2,I) should be used instead of NF(f1-f2,I). In Magma the following piece of code does the job:

> P<x,y> := PolynomialRing(Rationals(), 2, "lex");
> I:=[x^2+x+y,x^3+x*y+y^3];
> G:=GroebnerBasis(I);
> NormalForm(I[1]-I[2],G);
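The same computation can be cross-checked outside Singular and Magma; for instance SymPy's groebner routine (an assumption of this sketch, not a tool used by the book) returns the same reduced lexicographic basis.

```python
from sympy import symbols, groebner, reduced

x, y = symbols('x y')
f1, f2 = x**2 + x + y, x**3 + x*y + y**3

# reduced lexicographic Groebner basis, matching Singular's std output
G = groebner([f1, f2], x, y, order='lex')
basis = list(G.exprs)
print(basis)   # contains x + y**3 + y and y**6 + 2*y**4 - y**3 + y**2

# f1 - f2 lies in the ideal, so its normal form w.r.t. G is 0
_, r = reduced(f1 - f2, basis, x, y, order='lex')
print(r)       # 0
```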
Now that we have introduced the techniques necessary to compute Gröbner bases, let us demonstrate one of their main applications, namely polynomial system solving. The following result shows how one can solve a polynomial system of equations, provided one can compute a Gröbner basis w.r.t. the lexicographic ordering.

Theorem 11.1.19 Let f1(X) = · · · = fm(X) = 0 be a system of polynomial equations defined over F[X] with X = (x1, . . . , xn), such that it has finitely many solutions.¹ Let I = ⟨f1, . . . , fm⟩ be the ideal defined by the polynomials in the system and let G be a Gröbner basis for I with respect to >lp induced by xn < · · · < x1. Then there are elements g1, . . . , gn ∈ G such that

gn ∈ F[xn], lt(gn) = cn · xn^{mn},
gn−1 ∈ F[xn−1, xn], lt(gn−1) = cn−1 · xn−1^{mn−1},
. . .
g1 ∈ F[x1, . . . , xn], lt(g1) = c1 · x1^{m1},

for some positive integers mi, i = 1, . . . , n, and elements ci ∈ F \ {0}, i = 1, . . . , n.
It is now clear how to solve the system. After computing G, first solve the univariate equation gn(xn) = 0; let a1^(n), . . . , a_{ln}^(n) be its roots. For every root ai^(n), solve gn−1(xn−1, ai^(n)) = 0 to find the possible values for xn−1. Repeat this process until all the coordinates of all candidate solutions are found. The candidates form a finite set Can ⊆ F^n. Test all other elements of G on whether they vanish at the elements of Can; if some g ∈ G does not vanish at some can ∈ Can, then discard can from Can. Since the number of solutions is finite, the above procedure terminates.

Example 11.1.20 Let us be more specific and give a concrete example of how Theorem 11.1.19 can be applied. Turn back to Example 11.1.17. Suppose we want to solve the system of equations x^2 + x + y = 0, x^3 + xy + y^3 = 0 over the rationals. We compute a Gröbner basis of the corresponding ideal and obtain that the elements f4 = y^6 + 2y^4 − y^3 + y^2 and f3 = −x − y^3 − y belong to the Gröbner basis. Since f4 = 0 has finitely many solutions (at most 6) and for every fixed value of y the equation f3 = 0 has exactly one solution in x, we know that our system has finitely many solutions, both over the rationals and over the algebraic closure. In order to find the solutions, we solve the univariate equation y^6 + 2y^4 − y^3 + y^2 = 0 for y. Factorizing, we obtain f4 = y^2(y^4 + 2y^2 − y + 1), where y^4 + 2y^2 − y + 1 is irreducible over Q. So from the equation f4 = 0 we only get y = 0 as a solution. Then for y = 0 the equation f3 = 0 yields x = 0. Therefore, over the rationals the given system has the unique solution (0, 0).

Example 11.1.21 Let us consider another example. Consider the following

¹ Rigorously speaking, we require the system to have finitely many solutions in F̄. Such systems (ideals) are called zero-dimensional.
system over F2:

xy + x + y + z = 0,
xz + yz + y = 0,
x + yz + z = 0,
x^2 + x = 0,
y^2 + y = 0,
z^2 + z = 0.

Note that the field equations x^2 + x = 0, y^2 + y = 0, z^2 + z = 0 make sure that any solution of the first three equations actually lies in F2. Since F2 is a finite field, we automatically get that the system above has finitely many solutions (in fact not more than 2^3 = 8). One can show that the reduced Gröbner basis (see Remark 11.1.9) of the corresponding ideal w.r.t. the lexicographic ordering with x > y > z is G = {z^2 + z, y + z, x}. From this we obtain that the system in question has two solutions: (0, 0, 0) and (0, 1, 1). In Sections 11.2 and 11.3 we will see many more situations where Gröbner bases are applied in the solving context. Gröbner basis techniques are also used for answering many other important questions. To end this section, we give one such application. *** should this go to Section 11.3? ***

Example 11.1.22 Sometimes one needs to obtain explicit equations relating certain variables from given implicit ones. The following example is quite typical in algebraic cryptanalysis of block ciphers. One of the main building blocks of modern block ciphers are the so-called S-Boxes, local nonlinear transformations that in composition with other, often linear, mappings compose a secure block cipher. Suppose we have an S-Box S that transforms a 4-bit vector into a 4-bit vector in a nonlinear way as follows. Consider a nonzero binary vector x as an element of F16 via an identification of F2^4 with F16 = F2[a]/⟨a^4 + a + 1⟩ done in the usual way, so that e.g. the vector (0, 1, 0, 0) is mapped to the primitive element a, and (0, 1, 0, 1) is mapped to a + a^3. Now if x is considered as an element of F16, the S-Box S maps it to y = x^{−1}, which is then considered again as a vector via the above identification. The zero vector is mapped to the zero vector.
Without going deeply into details, we just state that such a transformation can be represented over F2 as a system of quadratic equations that implicitly relate the input variables x with the output variables y. The equations are

y0 x0 + y3 x1 + y2 x2 + y1 x3 + 1 = 0,
y1 x0 + y0 x1 + y3 x1 + y2 x2 + y3 x2 + y1 x3 + y2 x3 = 0,
y2 x0 + y1 x1 + y0 x2 + y3 x2 + y2 x3 + y3 x3 = 0,
y3 x0 + y2 x1 + y1 x2 + y0 x3 + y3 x3 = 0,

together with the field equations xi^2 + xi = 0 and yi^2 + yi = 0 for i = 0, . . . , 3. The equations do not describe the case where 0 is mapped to 0, so only the inversion is modeled. In certain situations it is preferable to have explicit relations that show how the output variables y depend on the input variables x. For this the following technique is used. Consider the above equations as polynomials in the polynomial ring F2[y0, . . . , y3, x0, . . . , x3] with y0 > · · · > y3 > x0 > · · · > x3, w.r.t. the block ordering with blocks being the y- and x-variables, and with the degree reverse lexicographic ordering in each block (see Example 11.1.6). In
this ordering, each monomial in the y-variables is larger than any monomial in the x-variables, regardless of their degrees. Such an ordering is a so-called elimination ordering for the y-variables. The reduced Gröbner basis of the ideal generated by the above equations consists of

xi^2 + xi, the field equations on x;

(x0 + 1) · (x1 + 1) · (x2 + 1) · (x3 + 1), which ensures that x is not the all-zero vector; and

y3 + x1 x2 x3 + x0 x3 + x1 x3 + x2 x3 + x1 + x2 + x3,
y2 + x0 x2 x3 + x0 x1 + x0 x2 + x0 x3 + x2 + x3,
y1 + x0 x1 x3 + x0 x1 + x0 x2 + x1 x2 + x1 x3 + x3,
y0 + x0 x1 x2 + x1 x2 x3 + x0 x2 + x1 x2 + x0 + x1 + x2 + x3,

which give explicit relations for y in terms of x. Interestingly enough, the field equations together with the latter explicit equations describe the entire S-Box transformation, so the case 0 ↦ 0 is also covered. Using similar techniques one may obtain other interesting properties of ideals, which come in handy in different applications.
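The triangular shape promised by Theorem 11.1.19, and the back-substitution carried out by hand in Example 11.1.20, can be sketched in a few lines. The snippet below is a SymPy illustration (not part of the book's Singular workflow) and only looks for rational solutions, as in that example.

```python
from sympy import symbols, groebner, factor, roots, Poly

x, y = symbols('x y')
G = groebner([x**2 + x + y, x**3 + x*y + y**3], x, y, order='lex')

# the last basis element is univariate in y (the elimination property)
g_y = G.exprs[-1]
print(factor(g_y))   # factors as y**2*(y**4 + 2*y**2 - y + 1)

solutions = []
for y0 in roots(Poly(g_y, y), filter='Q'):          # rational roots: only y = 0
    # back-substitute into the remaining basis element and solve for x
    for x0 in roots(Poly(G.exprs[0].subs(y, y0), x), filter='Q'):
        solutions.append((x0, y0))
print(solutions)     # [(0, 0)], the unique rational solution
```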
11.1.3 Exercises
11.1.1 Let R = Q[x, y, z] and let F = {f1, f2} with f1 = x^2 + xy + z^2, f2 = y^2 + z, and let f = x^3 + 2y^3 − z^3. The monomial ordering is degree lexicographic. Compute NF(f | F). Use the procedure NFMora from Singular's teachstd.lib to check your result.²

11.1.2 Let R = F2[x, y, z] and let F = {f1, f2} with f1 = x^2 + xy + z^2, f2 = xy + z^2. The monomial ordering is lexicographic. Compute a Gröbner basis of ⟨F⟩. Use the procedure standard from Singular's teachstd.lib to check your result.

11.1.3 [CAS] Recall that in Example 11.1.20 we came to the conclusion that the only solution over the rationals for the system is (0, 0). Use Singular's library solve.lib, and in particular the command solve, to find also the complex solutions of this system.

11.1.4 Upgrade Algorithm 11.2 so that it also returns the ai's from Theorem 11.1.13.

11.1.5 Prove the so-called product criterion: if polynomials f and g are such that lm(f) and lm(g) are coprime, then NF(spoly(f, g) | {f, g}) = 0.

11.1.6 Do the following sets constitute a Gröbner basis:

1. F1 := {xy + 1, yz + x + y + 2} ⊂ Q[x, y, z] with the monomial ordering being degree lexicographic?

2. F2 := {x + 20, y + 10, z + 12, u + 1} ⊂ F23[x, y, z, u] with the monomial ordering being the block ordering with blocks (x, y) and (z, u) and degree reverse lexicographic ordering inside the blocks?

² By setting the printing level appropriately, the procedures of teachstd.lib enable tracking their run. Therewith one may see exactly what the corresponding algorithm is doing.
11.2 Decoding codes with Gröbner bases
As a first application of the Gröbner basis method we consider decoding linear codes. For clarity of presentation we emphasize cyclic codes. We consider Cooper's philosophy, or the power sums method, in Section 11.2.1 and the method of generalized Newton identities in Section 11.2.2. In Section 11.2.3 we provide a brief overview of methods for decoding general linear codes.
11.2.1 Cooper's philosophy
Now we will give an introduction to the so-called Cooper's philosophy, or the power sums method. This method uses the special form of a parity-check matrix of a cyclic code. The main idea is to write the parity-check equations with unknowns for the error positions and error values, and then to solve for these unknowns after adding some natural restrictions on them. Let F = F_{q^m} be the splitting field of X^n − 1 over Fq. Let a be a primitive n-th root of unity in F. If i is in the defining set of a cyclic code C (Definition ??), then

(1, a^i, . . . , a^{(n−1)i}) · c^T = c0 + c1 a^i + · · · + c_{n−1} a^{(n−1)i} = c(a^i) = 0

for every codeword c ∈ C. Hence (1, a^i, . . . , a^{(n−1)i}) is a parity check of C. Let {i1, . . . , ir} be a defining set of C. Then a parity-check matrix H of C can be represented as a matrix with entries in F (see also Section 7.5.3):

    ( 1  a^{i1}  a^{2 i1}  . . .  a^{(n−1) i1} )
    ( 1  a^{i2}  a^{2 i2}  . . .  a^{(n−1) i2} )
H = ( .    .        .      . . .       .       )
    ( 1  a^{ir}  a^{2 ir}  . . .  a^{(n−1) ir} )

Let c, r and e be the transmitted codeword, the received word and the error vector, respectively, so that r = c + e. Denote the corresponding polynomials by c(x), r(x) and e(x). If we apply the parity-check matrix to r, we obtain

s^T := H r^T = H(c^T + e^T) = H c^T + H e^T = H e^T,

since H c^T = 0; here s is the syndrome vector. Define si = r(a^i) for all i = 1, . . . , n. Then si = e(a^i) for all i in the complete defining set, and these si are called the known syndromes. The remaining si are called the unknown syndromes. The vector s above has entries s = (s_{i1}, . . . , s_{ir}). Let t be the number of errors that occurred while transmitting c over a noisy channel. If the error vector is of weight t, then it is of the form

e = (0, . . . , 0, e_{j1}, 0, . . . , 0, e_{jl}, 0, . . . , 0, e_{jt}, 0, . . . , 0).
More precisely, there are t indices jl with 1 ≤ j1 < · · · < jt ≤ n such that e_{jl} ≠ 0 for all l = 1, . . . , t and ej = 0 for all j not in {j1, . . . , jt}. We obtain

s_{iu} = r(a^{iu}) = e(a^{iu}) = Σ_{l=1}^{t} e_{jl} (a^{iu})^{jl},  1 ≤ u ≤ r.   (11.2)
The a^{j1}, . . . , a^{jt}, but also the j1, . . . , jt, are the error locations, and the e_{j1}, . . . , e_{jt} are the error values. Define zl = a^{jl} and yl = e_{jl}. Then z1, . . . , zt are the error locations, y1, . . . , yt are the error values, and the syndromes in (11.2) become generalized power sum functions

s_{iu} = Σ_{l=1}^{t} yl zl^{iu},  1 ≤ u ≤ r.   (11.3)
In the binary case the error values are yi = 1, and the syndromes are the ordinary power sums. Now we give a description of Cooper's philosophy. As the receiver does not know how many errors occurred, the upper bound t is replaced by the error-correcting capacity e, and some zl's are allowed to be zero, while assuming that the number of errors is at most e. The following variables are introduced: X1, . . . , Xr, Z1, . . . , Ze and Y1, . . . , Ye, where Xu stands for the syndrome s_{iu}, 1 ≤ u ≤ r; Zl stands for the error location zl for 1 ≤ l ≤ t, and for 0 when t < l ≤ e; finally, Yl stands for the error value yl for 1 ≤ l ≤ t, and for an arbitrary element of Fq \ {0} when t < l ≤ e. The syndrome equations (11.2) are rewritten in terms of these variables as power sums:

fu := Σ_{l=1}^{e} Yl Zl^{iu} − Xu = 0,  1 ≤ u ≤ r.

We also add some other equations in order to specify the range of values that can be achieved by our variables, namely

ϵu := Xu^{q^m} − Xu = 0,  1 ≤ u ≤ r,

since s_{iu} ∈ F; *** add field equations in the Appendix ***

ηl := Zl^{n+1} − Zl = 0,  1 ≤ l ≤ e,

since the a^{jl} are either n-th roots of unity or zero; and

λl := Yl^{q−1} − 1 = 0,  1 ≤ l ≤ e,

since yl ∈ Fq \ {0}. We obtain the following set of polynomials in the variables X = (X1, . . . , Xr), Z = (Z1, . . . , Ze) and Y = (Y1, . . . , Ye):

FC = {fu, ϵu, ηl, λl : 1 ≤ u ≤ r, 1 ≤ l ≤ e} ⊂ Fq[X, Z, Y].   (11.4)
The zero-dimensional ideal IC generated by FC is called the CRHT-syndrome ideal associated to the code C, and the variety V(FC) defined by FC is called the CRHT-syndrome variety, after Chen, Reed, Helleseth and Truong. We have
V(FC) = V(IC). Initially, decoding of cyclic codes was essentially reduced to finding the reduced Gröbner basis of the CRHT ideal. Unfortunately, the CRHT variety has many spurious elements, i.e. elements that do not correspond to error positions and values. It turns out that adding more polynomials to the CRHT ideal makes it possible to eliminate these spurious elements. By adding the polynomials

χ_{l,m} := Zl Zm p(n, Zl, Zm) = 0,  1 ≤ l < m ≤ e,

to FC, where

p(n, X, Y) = (X^n − Y^n)/(X − Y) = Σ_{i=0}^{n−1} X^i Y^{n−1−i},   (11.5)
we ensure that for all l and m either Zl and Zm are distinct or at least one of them is zero. The resulting set of polynomials is

FC′ := {fu, ϵu, ηi, λi, χ_{l,m} : 1 ≤ u ≤ r, 1 ≤ i ≤ e, 1 ≤ l < m ≤ e} ⊂ Fq[X, Z, Y].   (11.6)

The ideal generated by FC′ is denoted by IC′. By investigating the structure of IC′ and its reduced Gröbner basis with respect to the lexicographic order induced by X1 < · · · < Xr < Ze < · · · < Z1 < Y1 < · · · < Ye, the following result may be proven.

Theorem 11.2.1 Every cyclic code C possesses a general error-locator polynomial LC. This means that there exists a unique polynomial LC ∈ Fq[X1, . . . , Xr, Z] that satisfies the following two properties:

• LC = Z^e + a_{e−1} Z^{e−1} + · · · + a0 with aj ∈ Fq[X1, . . . , Xr], 0 ≤ j ≤ e − 1;

• given a syndrome s = (s_{i1}, . . . , s_{ir}) ∈ F^r corresponding to an error of weight t ≤ e and error locations {k1, . . . , kt}, if we evaluate Xu = s_{iu} for all 1 ≤ u ≤ r, then the roots of LC(s, Z) are exactly a^{k1}, . . . , a^{kt}, together with 0 of multiplicity e − t; in other words,

LC(s, Z) = Z^{e−t} ∏_{i=1}^{t} (Z − a^{ki}).

Moreover, LC belongs to the reduced Gröbner basis of the ideal IC′, where it is the unique element of degree e in the variable Ze. *** check this ***

Having this polynomial, decoding of the cyclic code C reduces to univariate factorization. The main effort here is finding the reduced Gröbner basis of IC′. In general this is already infeasible for codes of moderate size; for small codes, though, the technique can be applied successfully.

Example 11.2.2 As an example we consider finding the general error-locator polynomial for a binary cyclic BCH code C with parameters [15, 7, 5] that corrects 2 errors. This code has {1, 3} as a defining set. So here q = 2, m = 4, and n = 15. The field F16 is the splitting field of X^15 − 1 over F2. In the above description we have to write equations for all syndromes that correspond to elements in
the complete defining set. Note that we may write the equations only for the elements of the defining set {1, 3}, as all the others are consequences of those. Following the description above we write the generators FC′ of the ideal IC′ in the ring F2[X1, X2, Z1, Z2]:

Z1^3 + Z2^3 − X2,
Z1 + Z2 − X1,
X1^16 − X1,
X2^16 − X2,
Z1^16 − Z1,
Z2^16 − Z2,
Z1 Z2 p(15, Z1, Z2).

We suppress the equations λ1 and λ2, as the error values are over F2. In order to find the general error-locator polynomial we compute the reduced Gröbner basis G of the ideal IC′ with respect to the lexicographic order induced by X1 < X2 < Z2 < Z1. The elements of G are:

X1^16 + X1,
X2 X1^15 + X2,
X2^8 + X2^4 X1^12 + X2^2 X1^3 + X2 X1^6,
Z2 X1^15 + Z2,
Z2^2 + Z2 X1 + X2 X1^14 + X1^2,
Z1 + Z2 + X1.

According to Theorem 11.2.1 the general error-locator polynomial LC is the unique element of G of degree 2 with respect to Z2. So LC ∈ F2[X1, X2, Z] is

LC(X1, X2, Z) = Z^2 + Z X1 + X2 X1^14 + X1^2.

Let us see how decoding using LC works. Let r = (1,1,0,1,0,0,0,0,0,0,1,1,1,0,1) be a received word with 2 errors. In the field F16 with a primitive element a such that a^4 + a + 1 = 0, a is also a 15-th root of unity. The syndromes are s1 = a^2 and s3 = a^14. Plugging them into LC in place of X1 and X2, we obtain

LC(Z) = Z^2 + a^2 Z + a^6.

Factorizing yields LC = (Z + a)(Z + a^5). According to Theorem 11.2.1, the exponents 1 and 5 are exactly the error locations minus 1, so the errors occurred at positions 2 and 6.

Example 11.2.3 [CAS] All the computations in the previous example may be carried out using the library decodegb.lib of Singular. The following Singular code yields the CRHT ideal and its reduced Gröbner basis.
> LIB "decodegb.lib"; > // binary cyclic [15,7,5] code with a defining set (1,3) > list defset=1,3; // defining set > int n=15; // length > int e=2; // errorcorrecting capacity > int q=2; // base field size > int m=4; // degree extension of the splitting field > int sala=1; // indicator to add additional equations as in (11.5)
> def A=sysCRHT(n,defset,e,q,m,sala);
> setring A; // set the polynomial ring for the system 'crht'
> option(redSB); // compute reduced Groebner bases
> ideal red_crht=std(crht);

Now, inspecting the ideal red_crht, we see which polynomial we should take as the general error-locator polynomial according to Theorem 11.2.1.

> poly gen_err_loc_poly=red_crht[5];

At this point we have to change to the splitting field in order to do our further computations.

> list l=ringlist(basering);
> l[1][4]=ideal(a4+a+1);
> def B=ring(l);
> setring B;
> poly gen_err_loc_poly=imap(A,gen_err_loc_poly);

We can now process our received vector and compute the syndromes:

> matrix rec[1][n]=1,1,0,1,0,0,0,0,0,0,1,1,1,0,1;
> matrix checkrow1[1][n];
> matrix checkrow3[1][n];
> int i;
> number work=a;
> for (i=0; i<=n-1; i++) {
>   checkrow1[1,i+1]=work^i;
> }
> work=a^3;
> for (i=0; i<=n-1; i++) {
>   checkrow3[1,i+1]=work^i;
> }
> // compute syndromes
> matrix s1mat=checkrow1*transpose(rec);
> matrix s3mat=checkrow3*transpose(rec);
> number s1=number(s1mat[1,1]);
> number s3=number(s3mat[1,1]);

One can now substitute and solve:

> poly specialized_gen=substitute(gen_err_loc_poly,X(1),s1,X(2),s3);
> factorize(specialized_gen);
[1]:
   _[1]=1
   _[2]=Z(2)+(a)
   _[3]=Z(2)+(a^2+a)
[2]:
   1,1,1

One can also check that a^5=a^2+a. So we have seen that it is theoretically possible to encode all the information needed for decoding a cyclic code in one polynomial. Finding this polynomial, though, is quite a challenging task. Moreover, note that the polynomial coefficients aj ∈ Fq[X1, . . . , Xr] may be quite dense, so it may be a problem even just to store the polynomial LC. Nevertheless, the method provides efficient closed formulas for small codes that are relevant in practice. This method can be adapted to correct erasures and to find the minimum distance of a code.
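The field arithmetic of Example 11.2.2 can also be cross-checked without any computer algebra system. The sketch below is a hand-rolled GF(16) computation in plain Python (the helper functions are our own, not part of decodegb.lib): it recomputes the syndromes of the received word and finds the roots of the specialized error-locator polynomial by exhaustive search.

```python
# GF(16) = F2[a]/(a^4 + a + 1), elements encoded as 4-bit integers
def gf_mul(u, v):
    r = 0
    for _ in range(4):
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if u & 0x10:
            u ^= 0b10011          # reduce modulo a^4 + a + 1
    return r

def gf_pow(u, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, u)
    return r

a = 0b0010                         # the primitive element a
r_word = [1,1,0,1,0,0,0,0,0,0,1,1,1,0,1]

# known syndromes s1 = r(a) and s3 = r(a^3)
s1 = s3 = 0
for i, ri in enumerate(r_word):
    if ri:
        s1 ^= gf_pow(a, i)
        s3 ^= gf_pow(a, 3 * i)
print(s1 == gf_pow(a, 2), s3 == gf_pow(a, 14))   # True True

# L_C(Z) = Z^2 + s1*Z + (s3*s1^14 + s1^2), specialized at the syndromes
const = gf_mul(s3, gf_pow(s1, 14)) ^ gf_mul(s1, s1)

def L_C(zv):
    return gf_mul(zv, zv) ^ gf_mul(s1, zv) ^ const

error_positions = [k + 1 for k in range(15) if L_C(gf_pow(a, k)) == 0]
print(error_positions)                            # [2, 6]
```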
More information on these issues is in the Notes.
11.2.2 Newton identities based method
In Sections 7.5.2 and 7.5.3 we have seen how Newton identities can be used for efficient decoding of cyclic codes up to half the BCH bound. Now we want to generalize this method and be able to decode up to half the minimum distance. In order to correct more errors we have to pay a price: the systems we have to solve are no longer linear, but quadratic. This is exactly where Gröbner basis techniques come into play. Let us recall the necessary notions; note that we change the notation a bit, as this will be convenient for the generalization. The error-locator polynomial is defined by

σ(Z) = ∏_{l=1}^{t} (Z − zl).

If this product is expanded,

σ(Z) = Z^t + σ1 Z^{t−1} + · · · + σ_{t−1} Z + σt,

then the coefficients σi are the elementary symmetric functions in the error locations z1, . . . , zt:

σi = (−1)^i Σ_{1 ≤ j1 < j2 < · · · < ji ≤ t} z_{j1} z_{j2} · · · z_{ji},  1 ≤ i ≤ t.

The syndromes si and the coefficients σi satisfy the following generalized Newton identities, see Proposition 7.5.8:

si + Σ_{j=1}^{t} σj s_{i−j} = 0,  for all i ∈ Zn.   (11.7)
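As a quick sanity check, the identities (11.7) can be verified numerically for a toy binary case: two errors with error values 1 at locations z1 = a and z2 = a^5 in GF(16). The field helpers below are hand-rolled for illustration and are not part of the book's code.

```python
# GF(16) = F2[a]/(a^4 + a + 1), elements encoded as 4-bit integers
def gf_mul(u, v):
    r = 0
    for _ in range(4):
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if u & 0x10:
            u ^= 0b10011      # reduce modulo a^4 + a + 1
    return r

def gf_pow(u, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, u)
    return r

n, a = 15, 0b0010
z1, z2 = gf_pow(a, 1), gf_pow(a, 5)        # two error locations, values y = 1
sigma1, sigma2 = z1 ^ z2, gf_mul(z1, z2)   # elementary symmetric functions

# power-sum syndromes s_i = z1^i + z2^i, indexed over Z_15
s = [gf_pow(z1, i) ^ gf_pow(z2, i) for i in range(n)]

# identities (11.7): s_i + sigma1*s_{i-1} + sigma2*s_{i-2} = 0 for all i in Z_n
ok = all(s[i] ^ gf_mul(sigma1, s[(i - 1) % n]) ^ gf_mul(sigma2, s[(i - 2) % n]) == 0
         for i in range(n))
print(ok)    # True
```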
Now suppose that the complete defining set of the cyclic code contains the 2t consecutive elements b, . . . , b + 2t − 1 for some b. Then d ≥ 2t + 1 by the BCH bound. Furthermore, the set of equations (11.7) for i = b + t, . . . , b + 2t − 1 is a system of t linear equations in the unknowns σ1, . . . , σt with the known syndromes sb, . . . , s_{b+2t−1} as coefficients. Gaussian elimination solves this system with complexity O(t^3). In this way we obtain the APGZ decoding algorithm, see Section 7.5.3, and Example 7.5.11 for the algorithm in action on a small example. One may go further and obtain closed formulas, or solve the decoding problem via the key equation, see Section ??. All the above-mentioned algorithms from Chapter 7 decode up to the BCH error-correcting capacity, which is often strictly smaller than the true capacity. A general method was outlined by Berlekamp, Tzeng, Hartmann, Chien, and Stevens, where the unknown syndromes were treated as variables. We have

s_{i+n} = si,  for all i ∈ Zn,

since s_{i+n} = r(a^{i+n}) = r(a^i). Furthermore,

si^q = (e(a^i))^q = e(a^{iq}) = s_{qi},  for all i ∈ Zn,
and

σi^{q^m} = σi,  for all 1 ≤ i ≤ t.
So the zeros of the following set of polynomials Newton_t in the variables S1, . . . , Sn and σ1, . . . , σt are considered:

           σi^{q^m} − σi,                for all 1 ≤ i ≤ t,
           S_{i+n} − Si,                 for all i ∈ Zn,
Newton_t   Si^q − S_{qi},                for all i ∈ Zn,
           Si + Σ_{j=1}^{t} σj S_{i−j},  for all i ∈ Zn.   (11.8)
Solutions of Newton_t are called generic, formal or one-step. Computing these solutions is considered a preprocessing phase, which has to be performed only once. In the actual decoder, for every received word r the variables Si are specialized to the actual values si(r) for i ∈ SC. Alternatively, one can solve Newton_t together with the polynomials Si − si(r) for i ∈ SC; this is called online decoding. Note that obtaining the general error-locator polynomial as in the previous subsection is an example of formal decoding: that polynomial has to be found only once.

Example 11.2.4 Let us consider an example of decoding using Newton identities in a situation where the APGZ algorithm is not applicable. We consider a 3-error-correcting cyclic code of length 31 with defining set {1, 5, 7}. Note that the BCH error-correcting capacity of this code is 2; we are aiming at correcting 3 errors. Let us write the corresponding ideal:

σ1 S31 + σ2 S30 + σ3 S29 + S1,
σ1 S1 + σ2 S31 + σ3 S30 + S2,
σ1 S2 + σ2 S1 + σ3 S31 + S3,
σ1 S_{i−1} + σ2 S_{i−2} + σ3 S_{i−3} + Si,  4 ≤ i ≤ 31,
σi^32 + σi,  i = 1, 2, 3,
S_{i+31} + Si,  for all i ∈ Z31,
Si^2 + S_{2i},  for all i ∈ Z31.

Note that the equations S_{i+31} = Si and Si^2 = S_{2i} imply

S1^2 + S2,   S1^4 + S4,    S1^8 + S8,    S1^16 + S16,
S3^2 + S6,   S3^4 + S12,   S3^8 + S24,   S3^16 + S17,
S5^2 + S10,  S5^4 + S20,   S5^8 + S9,    S5^16 + S18,
S7^2 + S14,  S7^4 + S28,   S7^8 + S25,   S7^16 + S19,
S11^2 + S22, S11^4 + S13,  S11^8 + S26,  S11^16 + S21,
S15^2 + S30, S15^4 + S29,  S15^8 + S27,  S15^16 + S23,
S31^2 + S31.
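The relations just listed are nothing but the cyclotomic cosets of 2 modulo 31: every syndrome is a 2-power of one of S1, S3, S5, S7, S11, S15, S31. A quick sketch to enumerate them:

```python
def cyclotomic_coset(r, q, n):
    # orbit of r under repeated multiplication by q modulo n
    coset, x = [], r % n
    while x not in coset:
        coset.append(x)
        x = (x * q) % n
    return sorted(coset)

cosets = {r: cyclotomic_coset(r, 2, 31) for r in (1, 3, 5, 7, 11, 15)}
for r, c in cosets.items():
    print(r, c)

# together with {0} these cosets partition Z_31
covered = {0}.union(*cosets.values())
print(sorted(covered) == list(range(31)))   # True
```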
Our intent is to write σ1, σ2, σ3 in terms of the known syndromes S1, S5, S7. The next step would be to compute the reduced Gröbner basis of this system with respect to an elimination order induced by S31 > · · · > S8 > S6 > S4 > · · · > S2 > σ1 > σ2 > σ3 > S7 > S5 > S1. Unfortunately, this computation is quite time consuming and the result is too large to illustrate the idea. Instead, we
do online decoding, i.e. for a concrete received word r we compute the syndromes s1, s5, s7, plug the values into the system, and then find the σ's. Let

r = (0,0,1,0,0,1,1,1,1,0,1,1,0,0,1,1,0,1,0,0,0,0,0,1,0,0,1,1,0,0,1)

be a received word with three errors. The known syndromes we need are s1 = a^7, s5 = a^25 and s7 = a^29. Substitute these values into the system above and compute the reduced Gröbner basis. The reduced Gröbner basis with respect to the degree reverse lexicographic order (here it is possible to go without an elimination order, see Remark ??), restricted to the variables σ1, σ2, σ3, is

σ3 + a^5, σ2 + a^3, σ1 + a^7.

The corresponding values for the σ's give rise to the error-locator polynomial

σ(Z) = Z^3 + a^7 Z^2 + a^3 Z + a^5.

Factoring this polynomial yields three roots, a^4, a^10, a^22, which indicate the error positions 5, 11, and 23. Note also that we could have worked only with the equations for S1, S5, S7, S3, S11, S15, S31, but the Gröbner basis computation is then harder.

Example 11.2.5 [CAS] The following program performs the above computation using decodegb.lib from Singular.

> LIB "decodegb.lib";
> int n=31; // length
> list defset=1,5,7; // defining set
> int t=3; // number of errors
> int q=2; // base field size
> int m=5; // degree extension of the splitting field
> def A=sysNewton(n,defset,t,q,m);
> setring A;
> // change the ring to work in the splitting field
> list l=ringlist(basering);
> l[1][4]=ideal(a5+a2+1);
> def B=ring(l);
> setring B;
> ideal newton=imap(A,newton);
> matrix rec[1][n]=0,0,1,0,0,1,1,1,1,0,1,1,0,0,1,1,0,1,0,0,0,0,0,1,0,0,1,1,0,0,1;
> // compute the parity-check rows for the defining set (1,5,7)
> // similarly to the example with CRHT
> ...
> // compute syndromes s1,s5,s7
> // analogously to the CRHT example
> ...
> // substitute the known syndromes in the system > ideal specialize_newton; > for (i=1; i<=size(newton); i++) { > specialize_newton[i]=substitute(newton[i],S(1),s1,S(5),s5,S(7),s7); > }
> option(redSB);
> // find the sigmas
> ideal red_spec_newt=std(specialize_newton);
> // identify the values of sigma_1, sigma_2, and sigma_3
> // find the roots of the error-locator polynomial
> ring solve=(2,a),Z,lp; minpoly=a^5+a^2+1;
> poly error_loc=Z^3+(a^4+a^2)*Z^2+(a^3)*Z+(a^2+1); // sigma's plugged in
> factorize(error_loc);
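As a cross-check of the factorization step, the roots of σ(Z) can also be found by brute force over F32. The following sketch (our own, not from the book) builds F32 with the same minimal polynomial a^5 + a^2 + 1 used for the ring B above and evaluates σ at every power of a:

```python
# Brute-force root search for sigma(Z) = Z^3 + a^7 Z^2 + a^3 Z + a^5 over
# F32 = F2[a]/<a^5 + a^2 + 1>; elements are 5-bit integers, a = 0b00010.
MOD = 0b100101  # a^5 + a^2 + 1

def gf_mul(x, y):
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x & 0b100000:
            x ^= MOD  # reduce by the minimal polynomial
    return r

def gf_pow(x, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, x)
    return r

a = 0b00010
sig1, sig2, sig3 = gf_pow(a, 7), gf_pow(a, 3), gf_pow(a, 5)

roots = []
for i in range(31):            # every nonzero element of F32 is a power of a
    z = gf_pow(a, i)
    z2 = gf_mul(z, z)
    v = gf_mul(z2, z) ^ gf_mul(sig1, z2) ^ gf_mul(sig2, z) ^ sig3
    if v == 0:
        roots.append(i)

print(roots)                   # [4, 10, 22]: the roots a^4, a^10, a^22
print([i + 1 for i in roots])  # the error positions 5, 11, 23
```

The exponents of the roots, shifted by one, reproduce exactly the error positions found by Singular above.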
So, as we see, by using Gröbner bases it is possible to go beyond the BCH error-correcting capacity. The price paid is the complexity of solving quadratic, as opposed to linear, systems. *** more stuff in notes ***
11.2.3
Decoding arbitrary linear codes
Now we will outline a couple of ideas that may be used for decoding arbitrary linear codes up to the full error-correcting capacity.

Decoding affine variety codes with Fitzgerald-Lax

The following method generalizes the ideas of Cooper's philosophy to arbitrary linear codes. The main notion in this approach is that of an affine variety code. Let P1, . . . , Pn be points in F_q^s. It is possible to compute a Gröbner basis of an ideal I ⊆ Fq[U1, . . . , Us] of polynomials that vanish exactly at these points. Define Iq := I + ⟨U1^q − U1, . . . , Us^q − Us⟩. Then Iq is a 0-dimensional ideal and V(Iq) = {P1, . . . , Pn}. An affine variety code C(I, L) = φ(L) is an image of the evaluation map

φ : R → Fq^n, f̄ ↦ (f(P1), . . . , f(Pn)),

where R := Fq[U1, . . . , Us]/Iq, L is a vector subspace of R, and f̄ is the coset of f in Fq[U1, . . . , Us] modulo Iq. It is possible to show that every q-ary linear [n, k] code, or equivalently its dual, can be represented as an affine variety code for a certain choice of parameters. See Exercise 11.2.2 for such a construction in the case of cyclic codes. In order to write a system of polynomial equations similar to the one in Section 11.2.1 one needs to generalize the CRHT approach to affine codes. As in the CRHT method, the system of equations (or equivalently the ideal) is composed of a "parity check" part and a "constraints" part. The parity check part is constructed according to the evaluation map φ. Now, as can be seen from Exercise 11.2.2, the points P1, . . . , Pn encode positions in a vector, similarly to how the powers a^i encode positions in the case of a cyclic code, a being a primitive n-th root of unity. Therefore, one needs to add the polynomials gl(Xk1, . . . , Xks), l = 1, . . . , m; k = 1, . . . , t, for every error position.
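To make the evaluation map φ concrete, here is a minimal toy illustration (our own choice of parameters, not from the text): the points P1, . . . , P4 are all of F_2^2 and L is spanned by the cosets of 1, U1, U2, so applying φ to this basis yields a generator matrix of the affine variety code C(I, L):

```python
# Toy illustration of phi : R -> F_q^n for an affine variety code.
# Here q = 2, s = 2, the points are all of F_2^2, and L = <1, U1, U2>.
points = [(u1, u2) for u1 in range(2) for u2 in range(2)]  # P1, ..., P4

basis_of_L = [
    lambda u1, u2: 1,   # the constant function 1
    lambda u1, u2: u1,  # the coordinate function U1
    lambda u1, u2: u2,  # the coordinate function U2
]

# phi(f) = (f(P1), ..., f(Pn)); evaluating a basis of L on the points
# gives a generator matrix of C(I, L), here a binary [4, 3] code.
G = [[f(*P) % 2 for P in points] for f in basis_of_L]
print(G)  # [[1, 1, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
```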
Adding other natural constraints, like field equations on the error values, and then computing a Gröbner basis of the combined ideal IC w.r.t. a certain elimination ordering, it is possible to recover both the error positions (i.e. the values of the "error points") and the error values. In general, finding I and L is quite technical, and it turns out that for random codes this method performs quite poorly, because of the complicated structure of IC. The method may be quite efficient, though, if a code has more structure, as in the case of geometric codes (e.g. Hermitian codes). We mention also that there
are improvements of the approach of Fitzgerald and Lax, which follow the same idea as the improvements for the CRHT method. Namely, one adds polynomials that ensure that the error locations are different. It can be proven that affine variety codes possess the so-called multi-dimensional general error-locator polynomial, which is a generalization of the general error-locator polynomial from Theorem 11.2.1.

Decoding by embedding in an MDS code

Now we briefly outline a method that provides a system for decoding that is composed of at most quadratic equations. The main feature of the method is that we do not need field equations for the solution to lie in the correct domain. Let C be an Fq-linear [n, k] code with error-correcting capacity e. Choose a parity check matrix H of C and let h1, . . . , hr be the rows of H. Let b1, . . . , bn be a basis of Fq^n and let Bs be the s × n matrix with b1, . . . , bs as rows, so that B = Bn. We say that b1, . . . , bn is an ordered MDS basis and B an MDS matrix if all the s × s submatrices of Bs have rank s, for all s = 1, . . . , n. Note that an MDS basis for Fq^n always exists if n ≤ q. By extending the initial field to a sufficiently large degree, we may assume that an MDS basis exists there. Since the parameters of a code do not change when going to a scalar extension, we may assume that our code C is defined over this sufficiently large Fq with q ≥ n. Each row hi is then a linear combination of the basis b1, . . . , bn, that is, there are constants aij ∈ Fq such that hi = ∑_{j=1}^n aij bj. In other words, H = AB, where A is the r × n matrix with entries aij. For every i and j, bi ∗ bj is a linear combination of the basis vectors b1, . . . , bn, so there are constants μ_l^{ij} ∈ Fq such that bi ∗ bj = ∑_{l=1}^n μ_l^{ij} bl. The elements μ_l^{ij} ∈ Fq are called the structure constants of the basis b1, . . . , bn. Linear functions Uij in the variables U1, . . .
, Un are defined as Uij = ∑_{l=1}^n μ_l^{ij} Ul.

Definition 11.2.6 For the received vector r, the ideal J(r) in the ring Fq[U1, . . . , Un] is generated by the elements

∑_{l=1}^n ajl Ul − sj(r) for j = 1, . . . , r,

where s(r) is the syndrome of r. The ideal I(t, U, V) in the ring Fq[U1, . . . , Un, V1, . . . , Vt] is generated by the elements

∑_{j=1}^t Uij Vj − U_{i,t+1} for i = 1, . . . , n.

Let J(t, r) be the ideal in Fq[U1, . . . , Un, V1, . . . , Vt] generated by J(r) and I(t, U, V). Now we are ready to state the main result of the method.

Theorem 11.2.7 Let B be an MDS matrix with structure constants μ_l^{ij} and linear functions Uij. Let H be a parity check matrix of the code C such that H = AB as above. Let r = c + e be a received word with c in C the codeword sent and e the error vector. Suppose that the weight of e is nonzero and at most the error-correcting capacity. Let t be the smallest positive integer such that J(t, r) has a solution (u, v) over F̄q satisfying u = Be. Then wt(e) = t and the solution is unique. The error vector is recovered as e = B^{-1}u.
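The rank condition in the definition can be checked mechanically. The sketch below (an assumed Vandermonde-type construction with q = 7 and n = 4, chosen for illustration only) takes b_i = (x_j^{i-1})_j for distinct x_j and verifies that every s × s submatrix of B_s has rank s:

```python
# Verify that a Vandermonde construction b_i = (x_j^{i-1})_j over F_7 yields
# an ordered MDS basis: every s x s submatrix of B_s has rank s.
from itertools import combinations

q, n = 7, 4
xs = [1, 2, 3, 4]  # distinct elements of F_7
B = [[pow(x, i, q) for x in xs] for i in range(n)]  # rows b_1, ..., b_n

def rank_mod(M, p):
    # Gaussian elimination over the prime field F_p
    M = [row[:] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] % p), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], p - 2, p)      # inverse via Fermat's little theorem
        M[r] = [v * inv % p for v in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] % p:
                f = M[i][c]
                M[i] = [(a - f * b) % p for a, b in zip(M[i], M[r])]
        r += 1
    return r

ok = all(rank_mod([[B[i][j] for j in cols] for i in range(s)], q) == s
         for s in range(1, n + 1)
         for cols in combinations(range(n), s))
print(ok)  # True: B is an MDS matrix
```

This works because every square Vandermonde matrix with distinct nodes is nonsingular, which is why an MDS basis exists whenever n ≤ q.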
So, as we see, although we did not impose field equations on either the U- or the V-variables, we are still able to obtain a correct solution. For the case of cyclic codes, by going to a certain field extension of Fq it may be shown that the system I(t, U, V) actually defines the generalized Newton identities. Therefore one corollary of the above theorem is that it is actually possible to work without the field equations in the method of Newton identities.

Decoding by normal form computations

Another method for arbitrary linear codes takes a different approach to how one represents code-related information. Below we outline the idea for binary codes. Let [X] be the commutative monoid generated by X = {X1, . . . , Xn}. The following mapping associates a vector of reduced exponents to a monomial:

ψ : [X] → F2^n, ∏_{i=1}^n Xi^{ai} ↦ (a1 mod 2, . . . , an mod 2).

Now let w1, . . . , wk be the rows of a generator matrix G of the binary [n, k] code C with error-correcting capacity e. Consider the ideal IC ⊆ K[X1, . . . , Xn], where K is an arbitrary field:

IC := ⟨X^{w1} − 1, . . . , X^{wk} − 1, X1^2 − 1, . . . , Xn^2 − 1⟩.

So the ideal IC encodes the information about the code C. The next theorem shows how one decodes using IC.

Theorem 11.2.8 Let GB be the reduced Gröbner basis of IC w.r.t. some degree compatible monomial ordering <. If wt(ψ(NF(X^a, GB))) ≤ e, then ψ(NF(X^a, GB)) is the error vector corresponding to the received word ψ(X^a), i.e. ψ(X^a) − ψ(NF(X^a, GB)) is the codeword of C closest to ψ(X^a).

Note that IC is a binomial ideal, and therefore GB consists of binomials. For binomial ideals the normal form of a monomial is again a monomial, so the computations in the theorem above are well-defined. Using the special structure of IC it is possible to improve on the usual Gröbner basis techniques for obtaining GB.
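As a tiny worked instance of this theorem (our own toy choice, not from the text): take the binary [3, 1] repetition code, so IC = ⟨X1X2X3 − 1, X1^2 − 1, X2^2 − 1, X3^2 − 1⟩, and decode the received word (1, 1, 0) using sympy's Gröbner basis routines (assuming sympy is available):

```python
# Decoding by normal forms for the binary [3,1] repetition code C = {000, 111}.
from sympy import symbols, groebner, Poly

x1, x2, x3 = symbols('x1 x2 x3')
# I_C for the single generator row w1 = (1,1,1), plus X_i^2 - 1
I = [x1*x2*x3 - 1, x1**2 - 1, x2**2 - 1, x3**2 - 1]
GB = groebner(I, x1, x2, x3, modulus=2, order='grevlex')

received = x1*x2                 # psi(x1*x2) = (1,1,0), one error from 111
nf = GB.reduce(received)[1]      # the normal form is again a monomial
err = Poly(nf, x1, x2, x3, modulus=2).monoms()[0]
print(err)  # (0, 0, 1): weight 1 <= e = 1, so the error is in position 3
# corrected codeword: (1,1,0) + (0,0,1) = (1,1,1)
```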
It is remarkable that the code-related information, as well as the solution to the decoding problem, is represented by exponents of monomials, whereas in all the methods we considered before these data are encoded as values of certain variables.
11.2.4
Exercises
11.2.1 [CAS] Consider a binary cyclic code of length 21 with defining set (1, 3, 7, 9). This code has parameters [21, 7, 8], see Example 7.4.8 and Example 7.4.17. The BCH bound is 5, so we cannot correct more than 2 errors with the methods from Chapter 7. Use the full error-correction capacity and correct 3 errors in some random codeword using the methods from Section 11.2.1, Section 11.2.2, and decodegb.lib from Singular. Note that finding the general error-locator polynomial is computationally very intense; therefore use online decoding in the CRHT method: plug in concrete values of the syndromes before computing a Gröbner basis.

11.2.2 Show how a cyclic code may be considered as an affine variety code from Section 11.2.3.
11.2.3 Using the method of normal forms, decode one error in a random codeword of the Hamming code (Example 2.2.14). Try different coefficient fields, as well as different monomial orderings. Do you always get the same result?
11.3
Algebraic cryptanalysis
In the previous section we have seen how polynomial system solving (via Gröbner bases) is used in the problem of decoding linear codes. In this section we briefly highlight another interesting application of polynomial system solving. Namely, we will be talking about algebraic cryptanalysis of block ciphers. Block ciphers were introduced in Chapter 10 as one of the main tools for providing secure symmetric communication. There we also mentioned that there exist methods for cryptanalyzing block ciphers, i.e. distinguishing them from random permutations and using this for recovering the secret key used for the encryption. Traditional methods of cryptanalysis are statistical in nature. A cryptanalyst or attacker queries a cipher, seen as a black box set up with an unknown key, with (possibly chosen) plaintexts and receives the corresponding ciphertexts. By collecting many such pairs a cryptanalyst hopes to find statistical patterns that distinguish the cipher in question from a random permutation. Algebraic cryptanalysis takes another approach. Here a cryptanalyst writes down a system of polynomial equations over a finite field (usually F2) which corresponds to the cipher in question, by modeling the operations done by the cipher during the encryption process (and also the key schedule) as algebraic (polynomial) equations. The obtained system of equations thus reflects the encryption process: plaintext and ciphertext are parameters of the system, and the key is the unknown, represented e.g. by bit variables. After plugging in an actual plaintext/ciphertext pair the system should yield the unknown secret key as a solution. In theory, provided that the plaintext and key lengths coincide, an attacker needs only one pair of plaintext/ciphertext to recover the key (a few pairs may be needed if the sizes of the plaintext and the key do not coincide). This feature distinguishes the algebraic approach from the statistical one, where an attacker usually needs many pairs to observe some statistical pattern. We proceed as follows.
In Section 11.3.1 we describe a toy cipher, which will then be used to illustrate the idea outlined above. We will see how to write equations for the toy cipher in Section 11.3.2. We will also see that it may be possible to write the equations in different ways, which can be important for the actual solving. In Section 11.3.3 we address the question of writing equations for an arbitrary S-box.
11.3.1
Toy example
As a toy block cipher we will take an iterative (Definition 10.1.9) block cipher (Definition 10.1.3) with text/key length of 16 bits and a two-round encryption. Our toy cipher is an SP-network (Definition 10.1.12). Namely, in every round we have a layer of local substitutions (S-boxes) followed by a permutation layer. Specifically, the encryption algorithm proceeds as in Algorithm 11.4. In this algorithm SBox inherits the main idea of the S-box in the AES, see Section 10.1.4. Namely, we divide the state vector w := (w0, . . . , w15) into four blocks of 4 consecutive bits. Each block of four bits is considered as an element of the field F16 ≅ F2[x]/⟨x^4 + x + 1⟩. The S-box then takes this element and outputs its inverse in F16 for nonzero inputs, or 0 ∈ F16 otherwise. The so-obtained element is then interpreted again as a vector over F2 of length 4. The permutation layer, represented by Perm, acts on the entire 16-bit state vector. The bit at position i, 0 ≤ i ≤ 15, is moved to position Pos(i), where

Pos(i) = 4 · i mod 15 for 0 ≤ i ≤ 14, and Pos(15) = 15.   (11.9)

So Perm(w) = (w_{Pos(0)}, . . . , w_{Pos(15)}). Interestingly enough, this permutation provides optimal diffusion in the sense that full dependency is achieved already after 2 rounds, see Exercise 11.3.1. Schematically the encryption process of our toy cipher is depicted on Figure ... . *** add figure ***

Algorithm 11.4 Toy cipher encryption
Input: A 16-bit plaintext p and a 16-bit key k.
Output: A 16-bit ciphertext c.
Begin
  Perform the initial key addition: w := AddKey(p, k) = p ⊕ k.
  for i = 1, 2 do
    Perform the S-box substitution: w := SBox(w).
    Perform the permutation: w := Perm(w).
    Add the key: w := AddKey(w, k) = w ⊕ k.
  end for
  The ciphertext is c := w.
  return c
End
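The toy cipher is small enough to implement directly. The sketch below follows Algorithm 11.4; the convention that bit i of a 4-bit block is the coefficient of x^i in F16 is our own assumption, made for concreteness:

```python
# A runnable sketch of the toy cipher of Algorithm 11.4 (16-bit block, 2 rounds).
# The state is a list of 16 bits; each nibble is an element of
# F16 = F2[x]/<x^4 + x + 1>, bit i being the coefficient of x^i (assumed).

def gf16_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0b10011  # reduce by x^4 + x + 1
    return r

def gf16_inv(a):
    if a == 0:
        return 0          # the S-box maps 0 to 0
    r = 1
    for _ in range(14):   # a^{-1} = a^14, since a^15 = 1 in F16*
        r = gf16_mul(r, a)
    return r

def sbox_layer(w):
    # four parallel 4-bit S-boxes, each inverting a nibble in F16
    out = []
    for i in range(0, 16, 4):
        n = sum(w[i + j] << j for j in range(4))
        m = gf16_inv(n)
        out += [(m >> j) & 1 for j in range(4)]
    return out

def pos(i):
    return (4 * i) % 15 if i < 15 else 15  # the permutation (11.9)

def perm_layer(w):
    # Pos is its own inverse here, since 4 * 4 = 16 = 1 mod 15
    return [w[pos(j)] for j in range(16)]

def encrypt(p, k):
    w = [a ^ b for a, b in zip(p, k)]      # initial key addition
    for _ in range(2):                     # two rounds
        w = perm_layer(sbox_layer(w))
        w = [a ^ b for a, b in zip(w, k)]  # round key addition
    return w
```

Since inversion with 0 ↦ 0 and Pos are both involutions, decryption simply applies the same layers in the reverse order.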
11.3.2
Writing down equations
Now let us turn to the question of how to write a system of equations that describes the encryption algorithm of Algorithm 11.4. We would like to write equations on the bit level, i.e. over F2. Denote by p = (p0, . . . , p15) and c = (c0, . . . , c15) the plaintext and ciphertext variables that appear as parameters in our system. Then k = (k0, . . . , k15) are the unknown key variables. Let x_i = (x_{i,0}, . . . , x_{i,15}), i = 0, 1, be the variables representing the results of the bitwise key additions, y_i = (y_{i,0}, . . . , y_{i,15}), i = 1, 2, the variables representing the outcomes of the S-boxes, and z_i = (z_{i,0}, . . . , z_{i,15}), i = 1, 2, the results of the permutation layer. Now we can write the encryption process as the following system:

x0 = p + k,
y_i = SBox(x_{i−1}), i = 1, 2,
z_i = Perm(y_i), i = 1, 2,
x1 = z1 + k,
c = z2 + k.
(11.10)
Here SBox and Perm are polynomial functions that act on the variable vectors according to Algorithm 11.4.
There are three operations performed in the algorithm: the bitwise key addition, the substitution via four 4-bit S-boxes, and the permutation. The key addition is represented trivially as above; on the bit level one writes, e.g. for the initial key addition, x_{0,j} = p_j + k_j, 0 ≤ j ≤ 15. The permutation Perm also does not pose any problem. According to (11.9), the blocks z_i = Perm(y_i), i = 1, 2, above are written as z_{i,j} = y_{i,Pos^{−1}(j)}, 0 ≤ j ≤ 15, where Pos^{−1}(j) may be easily computed; in fact in this case we have Pos^{−1} = Pos. An interesting question is how to write equations over F2 that describe the S-box transformation SBox. Since SBox is composed of four parallel S-boxes that perform inversion in F16, we may concentrate on writing equations for one S-box. Let a = (a0, a1, a2, a3) be the input bits of the S-box and b = (b0, b1, b2, b3) the output bits. The way we defined SBox, we should consider a ≠ 0 as an element of F16 and then compute b = a^{−1} in F16. Afterwards we regard b as a vector in F2^4. The all-zero vector is mapped to the all-zero vector. The equation describing inversion over F16 in the case a ≠ 0 is obviously simply a · b = 1 or, incorporating the case a = 0, b = a^{14}. Let us concentrate on the case a ≠ 0. We would like to rewrite the equation a · b = 1 over F16 into a system of equations over F2 which involves the bit variables ai and bj. In Example 11.1.22 we have seen what these equations are. But how can we obtain them? Using the identification F16 ≅ F2[x]/⟨x^4 + x + 1⟩ we identify the vectors (a0, a1, a2, a3) and (b0, b1, b2, b3) from F2^4 with a = a0 + a1x + a2x^2 + a3x^3 and b = b0 + b1x + b2x^2 + b3x^3. Now, keeping the rule x^4 + x + 1 = 0 in mind, we have to perform the multiplication a · b and collect the coefficients of the powers of x.
We have (considering that x^4 = x + 1, x^5 = x^2 + x, x^6 = x^3 + x^2):

a · b = (a0 + a1x + a2x^2 + a3x^3) · (b0 + b1x + b2x^2 + b3x^3)
      = a0b0 + (a0b1 + a1b0)x + (a0b2 + a1b1 + a2b0)x^2 + (a0b3 + a1b2 + a2b1 + a3b0)x^3
        + (a1b3 + a2b2 + a3b1)x^4 + (a2b3 + a3b2)x^5 + a3b3x^6
      = a0b0 + (a0b1 + a1b0)x + (a0b2 + a1b1 + a2b0)x^2 + (a0b3 + a1b2 + a2b1 + a3b0)x^3
        + (a1b3 + a2b2 + a3b1)(x + 1) + (a2b3 + a3b2)(x^2 + x) + a3b3(x^3 + x^2)
      = (a0b0 + a1b3 + a2b2 + a3b1) + (a0b1 + a1b0 + a1b3 + a2b2 + a2b3 + a3b1 + a3b2)x
        + (a0b2 + a1b1 + a2b0 + a2b3 + a3b2 + a3b3)x^2 + (a0b3 + a1b2 + a2b1 + a3b0 + a3b3)x^3.

So the vector representation of the product a · b is (a0b0 + a1b3 + a2b2 + a3b1, a0b1 + a1b0 + a1b3 + a2b2 + a2b3 + a3b1 + a3b2, a0b2 + a1b1 + a2b0 + a2b3 + a3b2 + a3b3, a0b3 + a1b2 + a2b1 + a3b0 + a3b3). The vector representation of 1 ∈ F16 is (1, 0, 0, 0). By comparing the corresponding vector entries we obtain the following system over F2 that describes the S-box:

a0b0 + a1b3 + a2b2 + a3b1 = 1,
a0b1 + a1b0 + a1b3 + a2b2 + a2b3 + a3b1 + a3b2 = 0,
a0b2 + a1b1 + a2b0 + a2b3 + a3b2 + a3b3 = 0,
a0b3 + a1b2 + a2b1 + a3b0 + a3b3 = 0.

In order to fully describe the S-box we must recall that our bit variables ai and bj live in F2. Therefore the field equations ai^2 + ai = 0 and bi^2 + bi = 0
for 0 ≤ i ≤ 3 have to be added. So now we have obtained exactly the implicit equations of Example 11.1.22. By adding field equations for all participating variables to the equations we introduced above, we obtain a full description of the toy cipher, under the assumption that no zero-inversion occurs in the S-boxes; the probability of this event is computed in Exercise 11.3.2. Having a pair (p, c) encrypted with an unknown key k, we may plug the values of p and c into the system (11.10) and try to solve for the unknowns, in particular for the key variables k. Work out Exercise 11.3.3 to see the details. Going back to Example 11.1.22, we recall that it is possible to obtain explicit relations between the inputs and outputs. Note also that these relations include the case 0 ↦ 0 if we remove the equation (a0 + 1)(a1 + 1)(a2 + 1)(a3 + 1) = 0. These explicit equations are:

b0 = a0a1a2 + a1a2a3 + a0a2 + a1a2 + a0 + a1 + a2 + a3,
b1 = a0a1a3 + a0a1 + a0a2 + a1a2 + a1a3 + a3,
b2 = a0a2a3 + a0a1 + a0a2 + a0a3 + a2 + a3,
b3 = a1a2a3 + a0a3 + a1a3 + a2a3 + a1 + a2 + a3.
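These four cubic expressions can be verified exhaustively against inversion in F16. The following sketch (our own check, with bit i of a nibble taken as the coefficient of x^i) confirms that they compute b = a^{−1} for a ≠ 0 and map 0 to 0:

```python
# Exhaustive check of the explicit cubic S-box equations over F16.
def gf16_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0b10011  # reduce by x^4 + x + 1
    return r

def sbox_explicit(a):
    a0, a1, a2, a3 = [(a >> i) & 1 for i in range(4)]
    b0 = (a0*a1*a2 + a1*a2*a3 + a0*a2 + a1*a2 + a0 + a1 + a2 + a3) % 2
    b1 = (a0*a1*a3 + a0*a1 + a0*a2 + a1*a2 + a1*a3 + a3) % 2
    b2 = (a0*a2*a3 + a0*a1 + a0*a2 + a0*a3 + a2 + a3) % 2
    b3 = (a1*a2*a3 + a0*a3 + a1*a3 + a2*a3 + a1 + a2 + a3) % 2
    return b0 | (b1 << 1) | (b2 << 2) | (b3 << 3)

assert sbox_explicit(0) == 0                   # the case 0 -> 0
for a in range(1, 16):
    assert gf16_mul(a, sbox_explicit(a)) == 1  # b is indeed a^{-1}
print("explicit S-box equations verified")
```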
These equations may be useful in the following approach. By having explicit equations of degree three that describe the S-boxes, one may obtain equations of degree 3 · 3 = 9 in the key variables only. Indeed, one should do consecutive substitutions from equation to equation in the system (11.10). One proceeds by substituting the corresponding bit variables from x0 = p + k into y1 = SBox(x0), therewith obtaining relations of the form y1 = f(p, k) of degree three in k (p is assumed to be known, as usual). Then substitute y1 = f(p, k) into z1 = Perm(y1) and then these into x1 = z1 + k. One obtains relations of the form x1 = g(p, k), again of degree three in k. Now the next substitution, of x1 = g(p, k) into y2 = SBox(x1), increases the degree. Namely, because g is of degree three and SBox is of degree three, we obtain equations y2 = h(p, k) of degree 3 · 3 = 9. The following substitutions do not increase the degree, since all the remaining equations are linear. The reason for wanting equations in the key variables only is the possibility to use more than one pair of plaintext/ciphertext encrypted with the same unknown key. By doing the above process for each such pair, we obtain each time 16 equations of degree 9 in the key variables k (the key stays the same). Note that if we used the implicit representation we could not eliminate the "intermediate" variables, such as x0, y1, z1, etc. Moreover, these intermediate variables depend on the parameters p (and c), so they are all different for different plaintext/ciphertext pairs. The idea of the latter approach is to keep the number of variables as small as possible, but increase the number of equations that relate them. In the theory and practice of solving polynomial systems it has been noted that solving more overdetermined systems (i.e. with more equations than variables) has a positive effect on complexity and thus on the success of solving the system in question. Still, degree-9 equations are too hard to attack.
We would like to reduce the degree of our equations. Below we outline a general principle, known as the "meet-in-the-middle" principle, to reduce the degree. As the name suggests, we would like to obtain relations between variables in the middle, rather than at the end, of the encryption. For this we need to invert the second half of the cipher in question; in our case this means inverting the second round. We have
already noted that Perm = Perm^{−1}. Also, since the S-box transformation is inversion in F16 with 0 ↦ 0, we have SBox = SBox^{−1}. Now, similarly to the above substitution procedure, we do "forward" substitutions

x0 = p + k → y1 = SBox(x0) → z1 = Perm(y1),

obtaining at the end equations z1 = F(p, k) of degree 3, and then "backward" substitutions

z2 = c + k → y2 = Perm(z2) → x1 = SBox(y2) → z1 = x1 + k,

obtaining equations z1 = G(c, k), also of degree 3. Equating the two, one obtains a system of 16 equations F(p, k) = G(c, k) of degree 3 in the key variables k only. Repeating this process for each plaintext/ciphertext pair, one may obtain as many equations (each time a multiple of 16) as one wants. One should not forget, of course, to include the field equations each time to make sure that the values of the variables stay in F2. Exercise 11.3.4 elaborates on solving with this approach.
11.3.3
General S-boxes
In the previous section we have seen how to write equations for the S-box given by the inversion function in the field F16. Although this idea is employed in the AES, a widely used cipher, cf. Section 10.1.4, this is not the standard way to define S-boxes in block ciphers. Usually S-boxes are defined via so-called lookup tables, i.e. tables which explicitly prescribe an output value for a given input value. Whereas we used the algebraic structure of the toy cipher in Section 11.3.1 to derive equations, it is not clear from that exposition how to write S-box equations in the more general case of lookup table definitions. As an illustrating example we will use a 3-bit S-box. This S-box is even smaller than the one employed in our toy cipher; still, it has been proposed in one of the so-called lightweight block ciphers, PRINTcipher. The lookup table for this S-box, call it S, is as follows:

x    | 0 1 2 3 4 5 6 7
S(x) | 0 1 3 6 7 4 5 2
Here we used the decimal representation for length-3 binary vectors. For example, the S-box maps the vector 2 = (0, 1, 0) to the vector 3 = (1, 1, 0). One method to obtain explicit relations for the output values is as follows. The S-box S is a function S : F2^3 → F2^3, which can be seen as a collection of functions S_i : F2^3 → F2, i = 0, 1, 2, mapping input vectors to the bits at positions 0, 1, and 2 respectively. It is known (*** recall?! ***) that each function defined over a finite field is actually a polynomial function. Let us find a polynomial describing the function S0. The lookup table in this case is as follows:

x     | 0 1 2 3 4 5 6 7
S0(x) | 0 1 1 0 1 0 1 0

Denote by x0, x1, x2 the input bits. We have

S0(x0, x1, x2) = S0(0, 0, 0) · (x0 − 1)(x1 − 1)(x2 − 1) + S0(1, 0, 0) · x0(x1 − 1)(x2 − 1)
 + S0(0, 1, 0) · (x0 − 1)x1(x2 − 1) + S0(1, 1, 0) · x0x1(x2 − 1)
 + S0(0, 0, 1) · (x0 − 1)(x1 − 1)x2 + S0(1, 0, 1) · x0(x1 − 1)x2
 + S0(0, 1, 1) · (x0 − 1)x1x2 + S0(1, 1, 1) · x0x1x2.
Indeed, by assigning concrete values (v0, v1, v2) to (x0, x1, x2) we obtain S0(v0, v1, v2) = S0(v0, v1, v2) · 1 · 1 · 1, and in each other summand at least one factor evaluates to zero, canceling that summand. Substituting the concrete values of S0 from the lookup table, we obtain:

S0(x0, x1, x2) = x0(x1 − 1)(x2 − 1) + (x0 − 1)x1(x2 − 1) + (x0 − 1)(x1 − 1)x2 + (x0 − 1)x1x2
              = x1x2 + x0 + x1 + x2.

Analogously we obtain polynomial expressions for S1 and S2:

S1(x0, x1, x2) = x0x2 + x1 + x2,
S2(x0, x1, x2) = x0x1 + x2.

Another technique, based on linear algebra, gives an opportunity to obtain different relations between the input and output variables. We are interested in relations of as low degree as possible, and usually these are quadratic relations. Let us demonstrate how to obtain bilinear relations for the S-box S. Denote y_i = S_i, i = 0, 1, 2. So we are interested in finding relations of the form ∑_{0≤i,j≤2} a_ij x_i y_j = 0. In order to do this, we treat the coefficients a_ij as variables. Each assignment of values to (x0, x1, x2) yields a unique assignment of values to (y0, y1, y2) according to the lookup table. Each assignment of (x0, x1, x2), and thus of (y0, y1, y2), provides us with a linear equation in the a_ij, obtained by plugging the assigned values into the relation ∑_{0≤i,j≤2} a_ij x_i y_j = 0, which should hold for every assignment. We may use the 2^3 = 8 assignments of the x-variables to get 8 linear equations in the 3 · 3 = 9 variables a_ij. Each nontrivial solution of this homogeneous linear system provides us with a nontrivial bilinear relation between the x- and y-variables. Exercise 11.3.5 works out the details of this approach for the example of S. We just mention that, e.g., x0y2 + x1y0 + x1y1 + x2y1 + x2y2 = 0 is one such bilinear relation, and that the space of such relations has dimension 2. Using exactly the same idea one may find other relations, e.g. general quadratic relations of the form ∑_{0≤i,j≤2} a_ij x_i y_j + ∑_{0≤i<j≤2} b_ij x_i x_j + ∑_{0≤i<j≤2} c_ij y_i y_j + ∑_i d_i x_i + ∑_i e_i y_i + f = 0.
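Both the interpolated coordinate functions and the dimension of the space of bilinear relations can be checked by a short script (our own, with x0 taken as the least significant bit, matching the vector notation above):

```python
# Checks for the 3-bit S-box S = [0,1,3,6,7,4,5,2] discussed above.
S = [0, 1, 3, 6, 7, 4, 5, 2]
bits = lambda v: [(v >> i) & 1 for i in range(3)]  # x0 = least significant bit

# (1) the coordinate functions obtained by Lagrange interpolation
for x in range(8):
    x0, x1, x2 = bits(x)
    y0, y1, y2 = bits(S[x])
    assert y0 == (x1 * x2 + x0 + x1 + x2) % 2
    assert y1 == (x0 * x2 + x1 + x2) % 2
    assert y2 == (x0 * x1 + x2) % 2

# (2) the 8 x 9 matrix of the bilinear-relation system: rows indexed by
# the assignments of x, columns by the pairs (i, j) for a_ij * x_i * y_j
rows = []
for x in range(8):
    xv, yv = bits(x), bits(S[x])
    rows.append([xv[i] * yv[j] for i in range(3) for j in range(3)])

def rank_gf2(M):
    M = [row[:] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c]:
                M[i] = [a ^ b for a, b in zip(M[i], M[r])]
        r += 1
    return r

print(rank_gf2(rows))  # 7, so the solution space has dimension 9 - 7 = 2

# the relation x0*y2 + x1*y0 + x1*y1 + x2*y1 + x2*y2 = 0 lies in that space
a = [0, 0, 1, 1, 1, 0, 0, 1, 1]  # coefficients a_ij in the column order above
assert all(sum(r * c for r, c in zip(row, a)) % 2 == 0 for row in rows)
```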
11.3.4
Exercises
11.3.1 Prove that in the toy cipher of Section 11.3.1 every ciphertext bit depends on every plaintext bit.

11.3.2 Considering that the inputs to the S-boxes of the toy cipher are all uniformly distributed and independent random values, what is the probability that no zero-inversion occurs during the encryption?

11.3.3 [CAS] Using Magma and/or Singular and/or SAGE/PolyBoRi, write an equation system representing the toy cipher from Section 11.3.1. When defining a base ring for your Gröbner basis computations, think about and experiment with the following questions:

• which ordering of variables works better?
• which monomial ordering is better? Try e.g. lexicographic and degree reverse lexicographic;
• does the result of the computation change when changing the ordering? Why?
• what happens if you remove the field equations?
• try explicit vs. implicit representations for the S-box.

11.3.4 Work out the meet-in-the-middle approach of Section 11.3.2. For the substitutions use the command subst in Singular.

11.3.5 Find bilinear relations for the S-box S using the linear algebra approach from Section 11.3.3. Compose a matrix for the homogeneous system as described in the text. The rows are indexed by the assignments of (x0, x1, x2) and the columns by the index pairs (i, j) of the variables a_ij that are the coefficients of x_i y_j. Show that the rank of this matrix is 7 and thus you can get 2 linearly independent solutions. Write down 2 linearly independent bilinear relations for S.

11.3.6 An S-box in the block cipher PRESENT is a nonlinear transformation of 4-bit vectors. Its lookup table is as follows:

x       | 0  1 2 3  4 5 6  7  8 9  10 11 12 13 14 15
SBox(x) | 12 5 6 11 9 0 10 13 3 14 15 8  4  7  1  2

• Write down equations that relate the input bits explicitly to the output bits. What is the degree of these equations?
• Find all linearly independent bilinear relations and general quadratic relations between inputs and outputs.

11.4

Notes
Chapter 12
Coding theory with computer algebra packages

Stanislav Bulygin

In this chapter we give a brief overview of three computer algebra systems: Singular, Magma, and GAP. We concentrate our attention on things that are useful for this book. For other topics, as well as language semantics and syntax, the reader is referred to the corresponding websites.
12.1
Singular
As cited at www.singular.uni-kl.de: "SINGULAR is a Computer Algebra System for polynomial computations with special emphasis on the needs of commutative algebra, algebraic geometry, and singularity theory". In the context of this book, we use some functionality provided for AG codes (brnoeth.lib), decoding linear codes via polynomial system solving (decodegb.lib), teaching cryptography (crypto.lib, atkins.lib) and Gröbner bases (teachstd.lib). Singular can be downloaded free of charge from http://www.singular.uni-kl.de/download.html for different platforms (Linux, Windows, Mac OS). The current version is 3-0-4 *** to change at the end ***. The website provides an online manual at http://www.singular.uni-kl.de/Manual/latest/index.htm. Below we provide a list of commands that can be used to work with objects presented in this book, together with short descriptions. Examples of use can be found via the links given below; more examples occur throughout the book at the corresponding places. The functionality mentioned above is provided via libraries, not kernel functions; to load a library in Singular one has to type (brnoeth.lib as an example):

> LIB "brnoeth.lib";

brnoeth.lib: Brill-Noether algorithm, Weierstrass semigroups and AG codes, by J. I. Farran Martin and C. Lossen (http://www.singular.uni-kl.de/Manual/latest/sing_1238.htm#SEC1297)

Description: Implementation of the Brill-Noether algorithm for solving the
Riemann-Roch problem and applications to algebraic geometry codes. The computation of Weierstrass semigroups is also implemented. The procedures are intended only for plane (singular) curves defined over a prime field of positive characteristic. For more information about the library see the end of the file brnoeth.lib.

Selected procedures:
- NSplaces: computes non-singular places with given degrees
- BrillNoether: computes a vector space basis of the linear system L(D)
- Weierstrass: computes the Weierstrass semigroup of C at P up to m
- AGcode_L: computes the evaluation AG code with divisors G and D
- AGcode_Omega: computes the residual AG code with divisors G and D
- decodeSV: decoding of a word with the basic decoding algorithm
- dual_code: computes the dual code
decodegb.lib: Decoding and minimum distance of linear codes with Gröbner bases, by S. Bulygin (...)

Description: In this library we generate several systems used for decoding cyclic codes and finding their minimum distance. Namely, we work with Cooper's philosophy and the generalized Newton identities. The original method of quadratic equations is worked out here as well. We also (for comparison) enable working with the system of Fitzgerald-Lax. We provide some auxiliary functions for further manipulations and decoding. For an overview of the methods mentioned above, see the "Decoding codes with GB" section of the manual. For the vanishing ideal computation the algorithm of Farr and Gao is implemented.

Selected procedures:
- sysCRHT: generates the CRHT ideal as in Cooper's philosophy
- sysNewton: generates the ideal with the generalized Newton identities
- syndrome: computes a syndrome w.r.t. the given check matrix
- sysQE: generates the system of quadratic equations for decoding
- errorRand: inserts random errors in a word
- randomCheck: generates a random check matrix
- mindist: computes the minimum distance of a code
- decode: decoding of a word using the system of quadratic equations
- decodeRandom: a procedure for manipulation with random codes
- decodeCode: a procedure for manipulation with the given code
- vanishId: computes the vanishing ideal for the given set of points
crypto.lib: Procedures for teaching cryptography, by G. Pfister (http://www.singular.uni-kl.de/Manual/late
Description: The library contains procedures to compute the discrete logarithm, primality tests, and factorization, including elliptic curve methods. The library is intended to be used for teaching purposes but not for serious computations. Sufficiently high printlevel allows to control each step, thus illustrating the algorithms at work.

atkins.lib: Procedures for teaching elliptic curve cryptography (primality test), by S. Steidel (http://www.singular.uni-kl.de/Manual/latest/sing_1281.htm#SEC1340)
Description: The library contains auxiliary procedures to compute the elliptic curve primality test of Atkin, and Atkin's test itself. The library is intended to be used for teaching purposes but not for serious computations. Sufficiently high printlevel allows to control each step, thus illustrating the algorithms at work.

teachstd.lib: Procedures for teaching standard bases, by G.-M. Greuel (http://www.singular.uni-kl.de/Manual/latest/sing_1344.htm#SEC1403)
Description: The library is intended to be used for teaching purposes, but not for serious computations. Sufficiently high printlevel allows to control each step, thus illustrating the algorithms at work. The procedures are implemented exactly as described in the book 'A SINGULAR Introduction to Commutative Algebra' by G.-M. Greuel and G. Pfister (Springer 2002) [].
Selected procedures:
- tail: tail of f
- leadmonomial: leading monomial as poly (also for vectors)
- monomialLcm: lcm of monomials m and n as poly (also for vectors)
- spoly: s-polynomial of f [symmetric form]
- NFMora: normal form of i w.r.t. Mora algorithm
- prodcrit: test for product criterion
- chaincrit: test for chain criterion
- standard: standard basis of ideal/module
12.2 Magma
“Magma is a large, well-supported software package designed to solve computationally hard problems in algebra, number theory, geometry and combinatorics” – this is the formulation given at the official website http://magma.maths.usyd.edu.au/magma/. The current version is 2.15-7 *** to change at the end ***. In this book we use illustrations with Magma for different coding constructions: general ones as well as more specific ones, such as AG codes, together with some machinery for working with algebraic curves and a few procedures for cryptography. Although Magma is a non-commercial system, it is still not free of charge: one has to purchase a license to work with it. Details can be found at http://magma.maths.usyd.edu.au/magma/Ordering/ordering.shtml. Still, one can try to run simple Magma code in the so-called Magma Calculator
(http://magma.maths.usyd.edu.au/calc/). *** All examples and exercises run successfully in this calculator. *** The online help system for Magma can be found at http://magma.maths.usyd.edu.au/magma/htmlhelp/MAGMA.htm. Next we briefly describe some procedures that come in handy while dealing with objects from this book. We list only a few commands to give a flavor of the functionality. One can get a lot more from the manual.
12.2.1 Linear codes
The full list of commands with descriptions can be found at http://magma.maths.usyd.edu.au/magma/htmlhelp/text1667.htm
- LinearCode: constructs a linear code as a vector subspace
- PermutationCode: permutes positions in a code
- RepetitionCode: constructs a repetition code
- RandomLinearCode: constructs a random linear code
- CyclicCode: constructs a cyclic code
- ReedMullerCode: constructs a Reed-Muller code
- HammingCode: constructs a Hamming code
- BCHCode: constructs a BCH code
- ReedSolomonCode: constructs a Reed-Solomon code
- GeneratorMatrix: yields the generator matrix
- ParityCheckMatrix: yields the parity check matrix
- Dual: constructs the dual code
- GeneratorPolynomial: yields the generator polynomial of the given cyclic code
- CheckPolynomial: yields the check polynomial of the given cyclic code
- Random: yields a random codeword
- Syndrome: yields the syndrome of a word
- Distance: yields the distance between words
- MinimumDistance: computes the minimum distance of a code
- WeightEnumerator: computes the weight enumerator of a code
- ProductCode: constructs a product code from the given two codes
- SubfieldSubcode: constructs a subfield subcode
- McEliecesAttack: runs a basic attack on the McEliece cryptosystem
- GriesmerBound: provides the Griesmer bound for the given parameters
- SpherePackingBound: provides the sphere packing bound for the given parameters
- BCHBound: provides the BCH bound for the given cyclic code
- Decode: decodes a word with standard methods
- MattsonSolomonTransform: computes the Mattson-Solomon transform
- AutomorphismGroup: computes the automorphism group of the given code
12.2.2 AG codes
The full list of commands with descriptions can be found at http://magma.maths.usyd.edu.au/magma/htmlhelp/text1686.htm
- AGCode: constructs an AG code
- AGDualCode: constructs a dual AG code
- HermitianCode: constructs a Hermitian code
- GoppaDesignedDistance: returns the Goppa designed distance
- AGDecode: basic algorithm for decoding an AG code
12.2.3 Algebraic curves
The full list of commands with descriptions can be found at http://magma.maths.usyd.edu.au/magma/htmlhelp/text1686.htm
- Curve: constructs a curve
- CoordinateRing: computes the coordinate ring of the given curve with Gröbner basis techniques
- JacobianMatrix: computes the Jacobian matrix
- IsSingular: tests whether the given curve has singularities
- Genus: computes the genus of a curve
- EllipticCurve: constructs an elliptic curve
- AutomorphismGroup: computes the automorphism group of the given curve
- FunctionField: computes the function field of the given curve
- Valuation: computes the valuation of the given function w.r.t. the given place
- GapNumbers: yields gap numbers
- Places: computes places of the given curve
- RiemannRochSpace: computes the Riemann-Roch space
- Basis: computes a sequence containing a basis of the Riemann-Roch space L(D) of the divisor D
- CryptographicCurve: given a finite field, computes an elliptic curve E over that field together with a point P on E such that the order of P is a large prime and the pair (E, P) satisfies the standard security conditions for being resistant to MOV and anomalous attacks
12.3 GAP
In this subsection we consider GAP, "a system for computational discrete algebra, with particular emphasis on Computational Group Theory". GAP stands for Groups, Algorithms, Programming, http://www.gap-system.org. Although the primary concern of GAP is computation with groups, it also provides coding-oriented functionality via the GUAVA package, http://www.gap-system.org/Packages/guava.html. GAP can be downloaded for free from http://www.gap-system.org/Download/index.html. The current GAP version is 4.4.12, the current GUAVA version is 3.9. *** to change at the end *** As before, we only list some procedures here to give an idea of what can be done with GUAVA/GAP. The package GUAVA is loaded as follows:
> LoadPackage("guava");
The online manual for GUAVA can be found at http://www.gap-system.org/Manuals/pkg/guava3.9/htm/chap0.html
Selected procedures:
- RandomLinearCode: constructs a random linear code
- GeneratorMatCode: constructs a linear code via its generator matrix
- CheckMatCode: constructs a linear code via its parity check matrix
- HammingCode: constructs a Hamming code
- ReedMullerCode: constructs a Reed-Muller code
- GeneratorPolCode: constructs a cyclic code via its generator polynomial
- CheckPolCode: constructs a cyclic code via its check polynomial
- RootsCode: constructs a cyclic code via the roots of the generator polynomial
- BCHCode: constructs a BCH code
- ReedSolomonCode: constructs a Reed-Solomon code
- CyclicCodes: returns all cyclic codes of given length
- EvaluationCode: constructs an evaluation code
- AffineCurve: sets up a framework for working with an affine curve
- GoppaCodeClassical: constructs a classical geometric Goppa code
- OnePointAGCode: constructs a one-point AG code
- PuncturedCode: constructs a punctured code from the given code
- DualCode: constructs the dual of the given code
- UUVCode: constructs a code via the (u|u+v)-construction
- LowerBoundMinimumDistance: yields the best available lower bound on the minimum distance
- UpperBoundMinimumDistance: yields the best available upper bound on the minimum distance
- MinimumDistance: yields the minimum distance of the given code
- WeightDistribution: yields the weight distribution of the given code
- Decode: general decoding procedure
12.4 Sage
The Sage framework provides an opportunity to use the strengths of many open-source computer algebra systems (among them Singular and GAP) for developing effective code for solving different mathematical problems. The general framework is made possible through the Python interface. Sage is intended as an open-source alternative to commercial systems such as Magma, Maple, Mathematica, and Matlab. Among other things, Sage provides tools for a wide variety of algebraic and combinatorial objects. For example, functionality for coding theory and cryptography is present, as well as functionality for working with algebraic curves. The webpage of the project is http://www.sagemath.org/. One can download Sage from http://www.sagemath.org/download.html. The reference manual for Sage is available at http://www.sagemath.org/doc/reference/. Now we briefly describe some commands that come in handy while working with this book.
12.4.1 Coding Theory
The manual is available at http://www.sagemath.org/doc/reference/coding.html
The coding functionality of Sage has a lot in common with that of GAP/GUAVA. In fact, for many commands Sage uses implementations available from GAP.
Selected procedures:
- LinearCodeFromCheckMatrix: constructs a linear code via its parity check matrix
- RandomLinearCode: constructs a random linear code
- CyclicCodeFromGeneratingPolynomial: constructs a cyclic code via its generator polynomial
- QuadraticResidueCode: constructs a quadratic residue cyclic code
- ReedSolomonCode: constructs a Reed-Solomon code
- gilbert_lower_bound: computes the lower bound due to Gilbert
- permutation_automorphism_group: computes the permutation automorphism group of the given code
- weight_distribution: computes the weight distribution of a code
12.4.2 Cryptography
The manual is available at http://www.sagemath.org/doc/reference/cryptography.html
Selected procedures/classes:
- SubstitutionCryptosystem: defines a substitution cryptosystem/cipher
- VigenereCryptosystem: defines the Vigenère cryptosystem/cipher
- lfsr_sequence: produces an output sequence of the given LFSR
- SR: returns a small-scale variant of the AES
12.4.3 Algebraic curves
The manual is available at http://www.sagemath.org/doc/reference/plane_curves.html
Selected procedures/classes:
- EllipticCurve_finite_field: constructs an elliptic curve over a finite field
- trace_of_frobenius: computes the trace of Frobenius of an elliptic curve
- cardinality: computes the number of rational points of an elliptic curve
- HyperellipticCurve_finite_field: constructs a hyperelliptic curve over a finite field
12.5 Coding with computer algebra
12.5.1 Introduction
..............
12.5.2 Error-correcting codes
Example 12.5.1 Let us construct the code of Example 2.1.6 for n = 5 using GAP/GUAVA. First, we need to define the list of codewords
> M := Z(2)^0 * [
> [1,1,0,0,0],[1,0,1,0,0],[1,0,0,1,0],[1,0,0,0,1],[0,1,1,0,0],
> [0,1,0,1,0],[0,1,0,0,1],[0,0,1,1,0],[0,0,1,0,1],[0,0,0,1,1]
> ];
In GAP Z(q) is a primitive element of the field GF(q). So by multiplying the list M by Z(2)^0 we make sure that the elements belong to GF(2). Now construct the code:
> C:=ElementsCode(M,"Example 2.1.6 for n=5",GF(2));
a (5,10,1..5)1..5 Example 2.1.6 for n=5 over GF(2)
We can compute the minimum distance and the size of C as follows
> MinimumDistance(C);
2
> Size(C);
10
Now the information on the code is updated
> Print(C);
a (5,10,2)1..5 Example 2.1.6 for n=5 over GF(2)
The block 1..5 gives a range for the covering radius of C. We treat it later in 3.2.24.

Example 12.5.2 Let us construct the [7, 4, 3] Hamming code in GAP/GUAVA and Magma. Both systems have a built-in command for this. In GAP
> C:=HammingCode(3,GF(2));
a linear [7,4,3]1 Hamming (3,2) code over GF(2)
Here the syntax is HammingCode(r,GF(q)), where r is the redundancy and GF(q) is the defining alphabet. We can extract a generator matrix as follows
> M:=GeneratorMat(C);;
> Display(M);
 1 1 1 . . . .
 1 . . 1 1 . .
 . 1 . 1 . 1 .
 1 1 . 1 . . 1
Two semicolons indicate that we do not want the output of a command to be printed on the screen. Display provides a nice way to represent objects. In Magma we do it like this
> C:=HammingCode(GF(2),3);
> C;
[7, 4, 3] "Hamming code (r = 3)" Linear Code over GF(2)
Generator matrix:
[1 0 0 0 1 1 0]
[0 1 0 0 0 1 1]
[0 0 1 0 1 1 1]
[0 0 0 1 1 0 1]
So here the syntax is reversed.

Example 12.5.3 Let us construct the [7, 4, 3] binary Hamming code via its parity check matrix. In GAP/GUAVA we proceed as follows
> H1:=Z(2)^0*[[1,0,0],[0,1,0],[0,0,1],[1,1,0],[1,0,1],[0,1,1],[1,1,1]];;
> H:=TransposedMat(H1);;
> C:=CheckMatCode(H,GF(2));
a linear [7,4,1..3]1 code defined by check matrix over GF(2)
We can now check the defining property of the check matrix:
> G:=GeneratorMat(C);;
> Display(G*H1);
 . . .
 . . .
 . . .
 . . .
We can also compute syndromes:
> c:=CodewordNr(C,7);
[ 1 1 0 0 1 1 0 ]
> Syndrome(C,c);
[ 0 0 0 ]
> e:=Codeword("1000000");;
> Syndrome(C,c+e);
[ 1 0 0 ]
So we have taken the 7th codeword in the list of codewords of C and shown that its syndrome is 0. Then we introduced an error at the first position: the syndrome is nonzero. In Magma one can generate codes only by vector subspace generators, so the way to generate a code via its parity check matrix is to use the Dual command, see Example 12.5.4. Here we construct the Hamming code as in Example 12.5.2 and then proceed as above.
> C:=HammingCode(GF(2),3);
> H:=ParityCheckMatrix(C);
> H;
[1 0 0 1 0 1 1]
[0 1 0 1 1 1 0]
[0 0 1 0 1 1 1]
> G:=GeneratorMatrix(C);
> G*Transpose(H);
[0 0 0]
[0 0 0]
[0 0 0]
[0 0 0]
Syndromes are handled as follows:
> c:=Random(C);
> Syndrome(c,C);
(0 0 0)
> V:=AmbientSpace(C);
> e:=V![1,0,0,0,0,0,0];
> r:=c+e;
> Syndrome(r,C);
(1 0 0)
Here we have taken a random codeword of C and computed its syndrome. Now, V is the space where C is defined, so the error vector e lives there, which is indicated by the coercion prefix V!.

Example 12.5.4 Let us start again with the binary Hamming code and see how dual codes are constructed in GAP and Magma. In GAP we have
> C:=HammingCode(3,GF(2));;
> CS:=DualCode(C);
a linear [7,3,4]2..3 dual code
> G:=GeneratorMat(C);;
> H:=GeneratorMat(CS);;
> Display(G*TransposedMat(H));
 . . .
 . . .
12.5. CODING WITH COMPUTER ALGEBRA
391
 . . .
 . . .
The same can be done in Magma. Moreover, we can check that the dual of the Hamming code is the predefined simplex code:
> C:=HammingCode(GF(2),3);
> CS:=Dual(C);
> G:=GeneratorMatrix(CS);
> S:=SimplexCode(3);
> H:=ParityCheckMatrix(S);
> G*Transpose(H);
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
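The same orthogonality check can also be replicated outside any computer algebra system. The following plain-Python sketch (our own illustration, not GUAVA or Magma code) computes the dual of the [7, 4, 3] Hamming code by brute force from the generator matrix shown above and confirms that it is the [7, 3, 4] simplex code:

```python
from itertools import product

# Generator matrix of the [7,4,3] Hamming code in the systematic form
# printed by Magma above.
G = [[1,0,0,0,1,1,0],
     [0,1,0,0,0,1,1],
     [0,0,1,0,1,1,1],
     [0,0,0,1,1,0,1]]

def dot(u, v):
    return sum(a*b for a, b in zip(u, v)) % 2

# The dual code is the set of all vectors orthogonal to every row of G.
dual = [v for v in product([0,1], repeat=7) if all(dot(v, g) == 0 for g in G)]

print(len(dual))                            # 2^(7-4) = 8 codewords
print(min(sum(v) for v in dual if any(v)))  # minimum weight 4
```

Enumerating all 2^7 vectors is of course only feasible for toy lengths; the computer algebra systems use linear algebra instead.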
Example 12.5.5 Let us work out some examples in GAP and Magma that illustrate the notions of permutation equivalence and the permutation automorphism group. As a model example we take, as usual, the binary Hamming code. We first show how equivalence can be checked in GAP/GUAVA:
> C:=HammingCode(3,GF(2));;
> p:=(1,2,3)(4,5,6,7);;
> CP:=PermutedCode(C,p);
a linear [7,4,3]1 permuted code
> IsEquivalent(C,CP);
true
So the codes C and CP are equivalent. We may compute the permutation that brings C to CP:
> CodeIsomorphism( C, CP );
(4,5)
Interestingly, we find that CP can be obtained from C just by the transposition (4,5). Let us check whether this is indeed true:
> CP2:=PermutedCode(C,(4,5));;
> Display(GeneratorMat(CP)*TransposedMat(CheckMat(CP2)));
 . . .
 . . .
 . . .
 . . .
So indeed the codes CP and CP2 are the same. The permutation automorphism group can be computed via:
> AG:=AutomorphismGroup(C);
Group([ (1,2)(5,6), (2,4)(3,5), (2,3)(4,6,5,7), (4,5)(6,7), (4,6)(5,7) ])
> Size(AG);
168
So the permutation automorphism group of C has 5 generators and 168 elements. In Magma there is no immediate way to define permuted codes. We can still compute the permutation automorphism group, which is called the permutation group there:
> C:=HammingCode(GF(2),3);
> PermutationGroup(C);
Permutation group acting on a set of cardinality 7
Order = 168 = 2^3 * 3 * 7
(3, 6)(5, 7)
(1, 3)(4, 5)
(2, 3)(4, 7)
(3, 7)(5, 6)

12.5.3 Code constructions and bounds
Example 12.5.6 In this example we go through the above constructions in GAP and Magma. As a model code we consider the [15, 11, 3] binary Hamming code.
> C:=HammingCode(4,GF(2));;
> CP:=PuncturedCode(C);
a linear [14,11,2]1 punctured code
> CP5:=PuncturedCode(C,[11,12,13,14,15]);
a linear [10,10,1]0 punctured code
So PuncturedCode(C) punctures C at the last position, and there is also a possibility to give the positions explicitly. The syntax for the shortening construction is the same.
> CS:=ShortenedCode(C);
a linear [14,10,3]2 shortened code
> CS5:=ShortenedCode(C,[11,12,13,14,15]);
a linear [10,6,3]2..3 shortened code
Next we extend a code and check the property described in Proposition 3.1.11.
> CE:=ExtendedCode(C);
a linear [16,11,4]2 extended code
> CEP:=PuncturedCode(CE);;
> C=CEP;
true
A code C can be extended i times via ExtendedCode(C,i). Next we take the shortened code, augment it, and lengthen it.
> CSA:=AugmentedCode(CS);;
> d:=MinimumDistance(CSA);;
> CSA;
a linear [14,11,2]1 code, augmented with 1 word(s)
> CSL:=LengthenedCode(CS);
a linear [15,11,2]1..3 code, lengthened with 1 column(s)
By default the augmentation is done by the all-one vector. One can specify the vector v to augment with explicitly by AugmentedCode(C,v). One can also do the extension in the lengthening construction i times by LengthenedCode(C,i). Now we do the same operations in Magma.
> C:=HammingCode(GF(2),4);
> CP:=PunctureCode(C, 15);
> CP5:=PunctureCode(C, {11..15});
> CS:=ShortenCode(C, 15);
> CS5:=ShortenCode(C, {11..15});
> CE:=ExtendCode(C);
> CEP:=PunctureCode(CE,16);
> C eq CEP;
true
> CSA:=AugmentCode(CS);
> CSL:=LengthenCode(CS);
One can also expurgate a code as follows.
> CExp:=ExpurgateCode(C);
> CExp;
[15, 10, 4] Cyclic Linear Code over GF(2)
Generator matrix:
[1 0 0 0 0 0 0 0 0 0 1 0 1 0 1]
[0 1 0 0 0 0 0 0 0 0 1 1 1 1 1]
[0 0 1 0 0 0 0 0 0 0 1 1 0 1 0]
[0 0 0 1 0 0 0 0 0 0 0 1 1 0 1]
[0 0 0 0 1 0 0 0 0 0 1 0 0 1 1]
[0 0 0 0 0 1 0 0 0 0 1 1 1 0 0]
[0 0 0 0 0 0 1 0 0 0 0 1 1 1 0]
[0 0 0 0 0 0 0 1 0 0 0 0 1 1 1]
[0 0 0 0 0 0 0 0 1 0 1 0 1 1 0]
[0 0 0 0 0 0 0 0 0 1 0 1 0 1 1]
We see that the code CExp in fact has more structure: it is cyclic, i.e. a cyclic shift of every codeword is again a codeword, cf. Chapter 7. One can also expurgate the codewords from a given list L by ExpurgateCode(C,L). In GAP this is done via ExpurgatedCode(C,L).

Example 12.5.7 Let us demonstrate how the direct product is constructed in GAP and Magma. We construct the direct product of the binary [15, 11, 3] Hamming code with itself. In GAP we do
> C:=HammingCode(4,GF(2));;
> CProd:=DirectProductCode(C,C);
a linear [225,121,9]15..97 direct product code
In Magma:
> C:=HammingCode(GF(2),4);
> CProd:=DirectProduct(C,C);

Example 12.5.8 Now we go through some of the above constructions using GAP and Magma. As model codes for the summands we take the binary [7, 4, 3] and [15, 11, 3] Hamming codes. In GAP the direct sum and the (u|u+v)-construction are implemented.
> C1:=HammingCode(3,GF(2));;
> C2:=HammingCode(4,GF(2));;
> C:=DirectSumCode(C1,C2);
a linear [22,15,3]2 direct sum code
> CUV:=UUVCode(C1,C2);
a linear [22,15,3]2..3 UU+V construction code
In Magma, along with the above commands, a command for the juxtaposition is defined. The syntax of the commands is as follows:
> C1:=HammingCode(GF(2),3);
> C2:=HammingCode(GF(2),4);
> C:=DirectSum(C1,C2);
> CJ:=Juxtaposition(C2,C2); // [30, 11, 6] Cyclic Linear Code over GF(2)
> CPl:=PlotkinSum(C1,C2);
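As a sanity check on the (u|u+v)-construction, here is a plain-Python sketch (an illustration of ours, with codes chosen small enough to enumerate, and with both summands of equal length); it confirms the parameter formula d = min(2d1, d2):

```python
from itertools import product

def min_weight(code):
    return min(sum(c) for c in code if any(c))

# C1: all even-weight words of length 4, a [4,3,2] code
C1 = [v for v in product([0,1], repeat=4) if sum(v) % 2 == 0]
# C2: the [4,1,4] repetition code
C2 = [(0,0,0,0), (1,1,1,1)]

# (u | u+v)-construction: codewords (u, u+v) with u in C1, v in C2
C = [tuple(u) + tuple((a+b) % 2 for a, b in zip(u, v)) for u in C1 for v in C2]

print(len(C))         # 8 * 2 = 16 codewords, dimension 3 + 1 = 4
print(min_weight(C))  # min(2*2, 4) = 4
```

The result is the [8, 4, 4] extended Hamming code, i.e. the first step of the Reed-Muller recursion RM(1,3) = (RM(1,2) | RM(1,2) + RM(0,2)).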
Example 12.5.9 Let us construct a concatenated code in GAP and Magma. We concatenate a Hamming [17, 15, 3] code over F16 and the binary [7, 4, 3] Hamming code. In GAP we do the following
> O:=[HammingCode(2,GF(16))];;
> I:=[HammingCode(3,GF(2))];;
> C:=BZCodeNC(O,I);
a linear [119,60,9]0..119 Blokh Zyablov concatenated code
In GAP there is a possibility to perform a generalized construction using many outer and inner codes, therefore the syntax uses square brackets to define lists. In Magma we proceed as below
> O:=HammingCode(GF(16),2);
> I:=HammingCode(GF(2),3);
> C:=ConcatenatedCode(O,I);

Example 12.5.10 Magma provides a way to construct an MDS code with parameters [q + 1, k, q − k + 2] over Fq, given the prime power q and a positive integer k. An example follows
> C:=MDSCode(GF(16),10); //[17, 10, 8] Cyclic Linear Code over GF(2^4)

Example 12.5.11 GAP and Magma provide commands that give an opportunity to compute some lower and upper bounds on the size and minimum distance of codes, as well as stored tables for the best known codes. Let us take a look at how this functionality is handled in GAP first. The command UpperBoundSingleton(n,d,q) gives an upper bound on the size of codes of length n and minimum distance d defined over Fq. This applies also to nonlinear codes. E.g.:
> UpperBoundSingleton(25,10,2);
65536
In the same way one can compute the Hamming, Plotkin, and Griesmer bounds:
> UpperBoundHamming(25,10,2);
2196
> UpperBoundPlotkin(25,10,2);
1280
> UpperBoundGriesmer(25,10,2);
512
Note that GAP does not require qd > (q − 1)n as in Theorem 3.2.29. If qd > (q − 1)n does not hold, shortening is applied. One can compute an upper bound which combines several bounds implemented in GAP
> UpperBound(25,10,2);
1280
Since the Griesmer bound is not in the list of bounds with which UpperBound works, we obtain a larger value.
Analogously one can compute lower bounds
> LowerBoundGilbertVarshamov(25,10,2);
16
Here 16 = 2^4 is a lower bound on the size of a binary code of length 25 with minimum distance at least 10. One can access the built-in tables (although somewhat outdated) as follows:
> Display(BoundsMinimumDistance(50,25,GF(2)));
rec( n := 50,
k := 25, q := 2,
references := rec(
  EB3 := [ "%A Y. Edel & J. Bierbrauer", "%T Inverting Construction Y1", "%R preprint", "%D 1997" ],
  Ja := [ "%A D.B. Jaffe", "%T Binary linear codes: new results on nonexistence", "%D 1996", "%O http://www.math.unl.edu/~djaffe/codes/code.ps.gz" ] ),
construction := false,
lowerBound := 10,
lowerBoundExplanation := [ "Lb(50,25)=10, by taking subcode of:", "Lb(50,27)=10, by extending:", "Lb(49,27)=9, reference: EB3" ],
upperBound := 12,
upperBoundExplanation := [ "Ub(50,25)=12, by a one-step Griesmer bound from:", "Ub(37,24)=6, by considering shortening to:", "Ub(28,15)=6, otherwise extending would contradict:", "Ub(29,15)=7, reference: Ja" ] )
In Magma one can compute the bounds in the following way
> GriesmerBound(GF(2),25,10):
> PlotkinBound(GF(2),25,10);
>> PlotkinBound(GF(2),25,10);
Runtime error in 'PlotkinBound': Require n <= 2*d for even weight binary case
> PlotkinBound(GF(2),100,51);
34
> SingletonBound(GF(2),25,10):
> SpherePackingBound(GF(2),25,10):
> GilbertVarshamovBound(GF(2),25,10);
9
> GilbertVarshamovLinearBound(GF(2),25,10);
16
Note that the result for the Plotkin bound differs from the one computed by GAP, since Magma implements an improved bound treated in Remark 3.2.32. The colon at the end of a line suppresses the output. Access to the built-in database for given n and d is done as follows:
> BDLCLowerBound(GF(2),50,10);
27
> BDLCUpperBound(GF(2),50,10);
29
The corresponding commands for given n, k and k, d start with the prefixes BKLC and BLLC respectively.
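The classical bounds are simple enough to recompute directly. The following plain-Python sketch (our own, mirroring the textbook formulas rather than any GAP or Magma internals) reproduces the Singleton, Hamming and Griesmer values for n = 25, d = 10, q = 2:

```python
from math import comb

def singleton_bound(n, d, q=2):
    # |C| <= q^(n-d+1)
    return q ** (n - d + 1)

def hamming_bound(n, d, q=2):
    # sphere packing: |C| <= q^n / V_q(n, t) with t = floor((d-1)/2)
    t = (d - 1) // 2
    ball = sum(comb(n, i) * (q - 1) ** i for i in range(t + 1))
    return q ** n // ball

def griesmer_bound(n, d, q=2):
    # largest k with sum_{i=0}^{k-1} ceil(d/q^i) <= n; returns q^k
    k, total = 0, 0
    while True:
        term = -(-d // q ** k)  # ceiling division
        if total + term > n:
            return q ** k
        total += term
        k += 1

print(singleton_bound(25, 10))  # 65536
print(hamming_bound(25, 10))    # 2196
print(griesmer_bound(25, 10))   # 512
```

The three printed values agree with the GAP outputs UpperBoundSingleton, UpperBoundHamming and UpperBoundGriesmer shown above.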
12.5.4 Weight enumerator
Example 12.5.12 This example illustrates some functionality available for weight distribution computations in GAP and Magma. In GAP one can compute the weight enumerator of a code, as well as the weight enumerator of its dual via the MacWilliams identity.
> C:=HammingCode(4,GF(2));;
> CodeWeightEnumerator(C);
x_1^15+35*x_1^12+105*x_1^11+168*x_1^10+280*x_1^9+435*x_1^8+435*x_1^7+280*x_1^6+168*x_1^5+105*x_1^4+35*x_1^3+1
> CodeMacWilliamsTransform(C);
15*x_1^8+1
One interesting feature available in GAP is drawing weight histograms. It works as follows:
> WeightDistribution(C);
[ 1, 0, 0, 35, 105, 168, 280, 435, 435, 280, 168, 105, 35, 0, 0, 1 ]
> WeightHistogram(C);
(GAP prints a bar chart of the weight distribution, one column per weight 0..15, with bars of height up to 435.)
In Magma the analogous functionality looks as follows:
> C:=HammingCode(GF(2),4);
> WeightEnumerator(C);
$.1^15 + 35*$.1^12*$.2^3 + 105*$.1^11*$.2^4 + 168*$.1^10*$.2^5 + 280*$.1^9*$.2^6 + 435*$.1^8*$.2^7 + 435*$.1^7*$.2^8 + 280*$.1^6*$.2^9 + 168*$.1^5*$.2^10 + 105*$.1^4*$.2^11 + 35*$.1^3*$.2^12 + $.2^15
> W:=WeightDistribution(C);
> MacWilliamsTransform(15,11,2,W);
[ <0, 1>, <8, 15> ]
So WeightEnumerator(C) actually returns the homogeneous weight enumerator with $.1 and $.2 as variables.
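The MacWilliams transform itself is a short computation with Krawtchouk polynomials. The plain-Python sketch below (our own illustration, not tied to either system) applies it to the weight distribution of the [15, 11, 3] Hamming code listed above and recovers the simplex weight enumerator 1 + 15z^8:

```python
from math import comb

def macwilliams(A, n, q=2):
    """Weight distribution of the dual code via the MacWilliams identity:
    B_j = (1/|C|) * sum_i A_i * K_j(i), with K_j the Krawtchouk polynomial."""
    size = sum(A)  # |C|
    B = []
    for j in range(n + 1):
        s = sum(A[i] * sum((-1) ** t * (q - 1) ** (j - t)
                           * comb(i, t) * comb(n - i, j - t)
                           for t in range(j + 1))
                for i in range(n + 1))
        B.append(s // size)
    return B

A = [1,0,0,35,105,168,280,435,435,280,168,105,35,0,0,1]  # Hamming [15,11,3]
B = macwilliams(A, 15)
print(B)  # dual = simplex [15,4]: one word of weight 0 and 15 of weight 8
```

This matches both GAP's 15*x_1^8+1 and Magma's [ <0, 1>, <8, 15> ] above.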
12.5.5 Codes and related structures
12.5.6 Complexity and decoding
Example 12.5.13 In GAP/GUAVA and Magma, for a general linear code the idea of Definition 2.4.10 (1) is employed. In GAP such a decoding goes as follows
> C:=RandomLinearCode(15,5,GF(2));;
> MinimumDistance(C);
5
> # can correct 2 errors
> c:="11101"*C; # encoding
> c in C;
true
> r:=c+Codeword("010000001000000");
> c1:=Decodeword(C,r);;
> c1 = c;
true
> m:=Decode(C,r); # obtain initial message word
[ 1 1 1 0 1 ]
One can also obtain the syndrome table, i.e. a table of pairs coset leader / syndrome, by SyndromeTable(C). The same idea is realized in Magma as follows.
> C:=RandomLinearCode(GF(2),15,5); // can be [15,5,5] code
> // can correct 2 errors
> c:=Random(C);
> e:=AmbientSpace(C) ! [0,1,0,0,0,0,0,0,1,0,0,0,0,0,0];
> r:=c+e;
> result,c1:=Decode(C,r);
> result; // does decoding succeed?
true
> c1 eq c;
true
There are more advanced decoding methods for general linear codes. More on that in Section 10.6.
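The coset-leader idea behind Definition 2.4.10 can be sketched in a few lines of plain Python (an illustration of ours, using the parity check matrix of the [7, 4, 3] Hamming code from Example 12.5.3); the syndrome table maps each correctable-error syndrome to its coset leader:

```python
# Parity check matrix of a [7,4,3] Hamming code: its columns are all
# nonzero vectors of GF(2)^3, so single-error syndromes are distinct.
H = [[1,0,0,1,0,1,1],
     [0,1,0,1,1,1,0],
     [0,0,1,0,1,1,1]]

def syndrome(v):
    return tuple(sum(h[i] * v[i] for i in range(7)) % 2 for h in H)

# Syndrome table: each single-error syndrome -> its coset leader.
table = {syndrome((0,)*7): (0,)*7}
for pos in range(7):
    e = [0]*7
    e[pos] = 1
    table[syndrome(e)] = tuple(e)

def decode(r):
    e = table[syndrome(r)]  # look up the coset leader
    return tuple((a + b) % 2 for a, b in zip(r, e))

r = (0, 1, 0, 0, 0, 0, 0)  # zero codeword plus one error
print(decode(r))           # recovers the zero codeword
```

For a 1-error-correcting code the table has 1 + 7 = 8 entries; in general there is one coset leader per syndrome, which is why syndrome decoding is exponential in the redundancy.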
12.5.7 Cyclic codes
Example 12.5.14 We have already constructed finite fields and worked with them in GAP and Magma. Let us take one more look at these notions and show some new ones. In GAP we handle finite fields as follows.
> G:=GF(2^5);;
> a:=PrimitiveRoot(G);
Z(2^5)
> DefiningPolynomial(G);
x_1^5+x_1^2+Z(2)^0
> a^5+a^2+Z(2)^0; # check
0*Z(2)
Pretty much the same functionality is provided in Magma
> G:=GF(2^5);
> a:=PrimitiveElement(G);
> DefiningPolynomial(G);
$.1^5+$.1^2+1
> b:=G.1;
> a eq b;
true
> // define explicitly
> P<x>:=PolynomialRing(GF(2));
> p:=x^5+x^2+1;
> F:=ext<GF(2) | p>;
> F;
Finite field of size 2^5

Example 12.5.15 Minimal polynomials are computed in GAP as follows:
> a:=PrimitiveUnityRoot(2,17);;
> MinimalPolynomial(GF(2),a);
x_1^8+x_1^7+x_1^6+x_1^4+x_1^2+x_1+Z(2)^0
In Magma it is done analogously
> a:=RootOfUnity(17,GF(2));
> MinimalPolynomial(a,GF(2));
x^8 + x^7 + x^6 + x^4 + x^2 + x + 1

Example 12.5.16 Some examples of how to compute cyclotomic polynomials in GAP and Magma follow. In GAP
> CyclotomicPolynomial(GF(2),10);
x_1^4+x_1^3+x_1^2+x_1+Z(2)^0
In Magma it is done as follows
> CyclotomicPolynomial(10);
$.1^4 - $.1^3 + $.1^2 - $.1 + 1
Note that in Magma the cyclotomic polynomial is always defined over Q.

Example 12.5.17 Let us construct cyclic codes via roots in GAP and Magma. In GAP/GUAVA we proceed as follows.
> C:=GeneratorPolCode(h,17,GF(2));; # h is from Example 6.1.41
> CR:=RootsCode(17,[1],2);;
> MinimumDistance(CR);;
> CR;
a cyclic [17,9,5]3..4 code defined by roots over GF(2)
> C=CR;
true
> C2:=GeneratorPolCode(g,17,GF(2));; # g is from Example 6.1.41
> CR2:=RootsCode(17,[3],2);;
> C2=CR2;
true
So we first generated a cyclic code whose generator polynomial has a (predefined) primitive root of unity as a root. Then we took the first element that is not in the cyclotomic class of 1, namely 3, and constructed a cyclic code having the cube of the primitive root of unity as a root of its generator polynomial. Note that these results are in accordance with Example 12.5.15. We can also compute the number of all cyclic codes of a given length, e.g.
> NrCyclicCodes(17,GF(2));
8
In Magma we do the construction as follows
> a:=RootOfUnity(17,GF(2));
> C:=CyclicCode(17,[a],GF(2));

Example 12.5.18 We can compute the Mattson-Solomon transform in Magma. This is done as follows:
> F:=PolynomialRing(SplittingField(x^17-1));
> f:=x^15+x^3+x;
> A:=MattsonSolomonTransform(f,17);
> A;
$.1^216*x^16 + $.1^177*x^15 + $.1^214*x^14 + $.1^99*x^13 + $.1^181*x^12 + $.1^173*x^11 + $.1^182*x^10 + $.1^198*x^9 + $.1^108*x^8 + $.1^107*x^7 + $.1^218*x^6 + $.1^91*x^5 + $.1^54*x^4 + $.1^109*x^3 + $.1^27*x^2 + $.1^141*x + 1
> InverseMattsonSolomonTransform(A,17) eq f;
true
So for the construction we need a field that contains a primitive nth root of unity. We can also compute the inverse transform.
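The cyclotomic classes used in the last few examples are easy to compute directly. The plain-Python sketch below (our own) partitions Z_17 into 2-cyclotomic cosets; the class of 1 has size 8, matching the degree-8 minimal polynomial of Example 12.5.15, and the number of binary cyclic codes of length 17 is 2 raised to the number of cosets:

```python
def cyclotomic_cosets(n, q=2):
    """Partition {0, ..., n-1} into q-cyclotomic cosets {i, qi, q^2 i, ...} mod n."""
    cosets, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        coset, j = [], i
        while j not in coset:
            coset.append(j)
            j = (j * q) % n
        cosets.append(sorted(coset))
        seen.update(coset)
    return cosets

cosets = cyclotomic_cosets(17)
print(cosets)            # {0}, the class of 1, the class of 3 (sizes 1, 8, 8)
print(2 ** len(cosets))  # number of binary cyclic codes of length 17: 8
```

Each coset corresponds to one irreducible factor of x^17 - 1 over GF(2), and every divisor of x^17 - 1 is a product of a subset of these factors, which explains the count 2^3 = 8 returned by NrCyclicCodes above.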
12.5.8 Polynomial codes
Example 12.5.19 Now we describe constructions of Reed-Solomon codes in GAP/GUAVA and Magma. In GAP we proceed as follows:
> C:=ReedSolomonCode(31,5);
a cyclic [31,27,5]3..4 Reed-Solomon code over GF(32)
The construction of the extended code is somewhat different from the one defined in Definition 8.1.6. ExtendedReedSolomonCode(n,d) first constructs ReedSolomonCode(n-1,d-1) and then extends it. The code is defined over GF(n), so n should be a prime power.
> CE:=ExtendedReedSolomonCode(31,5);
a linear [31,27,5]3..4 extended Reed Solomon code over GF(31)
The generalized Reed-Solomon codes are handled as follows.
> R:=PolynomialRing(GF(2^5));;
> a:=Z(2^5);;
> L:=List([1,2,3,6,7,10,12,16,20,24,25,29],i->Z(2^5)^i);;
> CG:=GeneralizedReedSolomonCode(L,4,R);;
So we define the polynomial ring R and the list of points L. Note that this construction corresponds to the construction from Definition 8.1.10 with b = 1. In Magma we proceed as follows.
> C:=ReedSolomonCode(31,5);
> a:=PrimitiveElement(GF(2^5));
> A:=[a^i:i in [1,2,3,6,7,10,12,16,20,24,25,29]];
> B:=[a^i:i in [1,2,1,2,1,2,1,2,1,2,1,2]];
> CG:=GRSCode(A,B,4);
So Magma gives an opportunity to construct generalized Reed-Solomon codes with an arbitrary vector b whose entries are nonzero.

Example 12.5.20 In Magma one can compute subfield subcodes. This is done as follows:
> a:=RootOfUnity(17,GF(2));
> C:=CyclicCode(17,[a],GF(2^8)); // splitting field size 2^8
> CSS:=SubfieldSubcode(C);
> C2:=CyclicCode(17,[a],GF(2));
> C2 eq CSS;
true
> CSS_4:=SubfieldSubcode(C,GF(4)); // [17, 13, 4] code over GF(2^2)
By default the prime subfield is taken for the construction.

Example 12.5.21 *** GUAVA slow!!! *** In Magma we can compute a trace code as shown below:
> C:=HammingCode(GF(16),3);
> CT:=Trace(C);
> CT:Minimal;
[273, 272] Linear Code over GF(2)
We can also specify a subfield to restrict to by giving it as a second parameter of Trace.

Example 12.5.22 In GAP/GUAVA, in order to construct an alternant code we proceed as follows
> a:=Z(2^5);;
> P:=List([1,2,3,6,7,10,12,16,20,24,25,29],i->a^i);;
> B:=List([1,2,1,2,1,2,1,2,1,2,1,2],i->a^i);;
> CA:=AlternantCode(2,B,P,GF(2));
a linear [12,5,3..4]3..6 alternant code over GF(2)
By providing an extension field as the last parameter of AlternantCode, one constructs an extension code (as per Definition 8.2.1) of the one defined over the base field (in our example GF(2)), rather than the restriction construction as in Definition 8.3.1. In Magma one proceeds as follows.
> a:=PrimitiveElement(GF(2^5));
> A:=[a^i:i in [1,2,3,6,7,10,12,16,20,24,25,29]];
> B:=[a^i:i in [1,2,1,2,1,2,1,2,1,2,1,2]];
> CA:=AlternantCode(A,B,2);
> CG:=GRSCode(A,B,2);
> CGS:=SubfieldSubcode(Dual(CG));
> CA eq CGS;
true
Here one can add a desired subfield for the restriction as in Definition 8.3.1 by giving it as an extra parameter at the end of the parameter list of AlternantCode.

Example 12.5.23 In GAP/GUAVA one can construct a Goppa code as follows.
> x:=Indeterminate(GF(2),"x");;
> g:=x^3+x+1;
> C:=GoppaCode(g,15);
a linear [15,3,7]6..8 classical Goppa code over GF(2)
So the Goppa code C is constructed over the field where the polynomial g is defined. There is also a possibility to provide the list of non-roots L explicitly via GoppaCode(g,L).
In Magma one needs to provide the list L explicitly.
12.5. CODING WITH COMPUTER ALGEBRA
> P<x>:=PolynomialRing(GF(2^5));
> G:=x^3+x+1;
> a:=PrimitiveElement(GF(2^5));
> L:=[a^i : i in [0..30]];
> C:=GoppaCode(L,G);
> C:Minimal;
[31, 16, 7] "Goppa code (r = 3)" Linear Code over GF(2)

The polynomial G should be defined in the polynomial ring over the extension. The command C:Minimal only displays the description of C; no generator matrix is displayed.

Example 12.5.24 Now we show how a binary Reed-Muller code can be constructed in GAP/GUAVA, and we also check the duality property from the previous proposition.

> u:=5;;
> m:=7;;
> C:=ReedMullerCode(u,m);;
> C2:=ReedMullerCode(m-u-1,m);;
> CD:=DualCode(C);
> CD = C2;
true

In Magma one can do the above analogously:

> u:=5;
> m:=7;
> C:=ReedMullerCode(u,m);
> C2:=ReedMullerCode(m-u-1,m);
> CD:=Dual(C);
> CD eq C2;
true
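The duality RM(u, m)^perp = RM(m-u-1, m) that is checked above in GAP and Magma can also be verified directly for small parameters in a few lines of Python. This is an illustrative sketch, not part of either package; the generator matrices are built by evaluating the square-free monomials of degree at most u at all points of GF(2)^m.

```python
# Brute-force check of the Reed-Muller duality RM(u,m)^perp = RM(m-u-1,m)
# over GF(2) for small parameters; an illustrative sketch only.
from itertools import product

def prod_eval(point, exps):
    """Evaluate the monomial with exponent vector exps at a point of GF(2)^m."""
    v = 1
    for x, e in zip(point, exps):
        v *= x ** e
    return v % 2

def rm_generator(u, m):
    """Rows: evaluations of all square-free monomials of degree <= u
    (these suffice over GF(2)) at all points of GF(2)^m."""
    points = list(product([0, 1], repeat=m))
    return [[prod_eval(pt, exps) for pt in points]
            for exps in product([0, 1], repeat=m) if sum(exps) <= u]

def is_orthogonal(g1, g2):
    return all(sum(a * b for a, b in zip(r1, r2)) % 2 == 0
               for r1 in g1 for r2 in g2)

u, m = 1, 3
G = rm_generator(u, m)
Gd = rm_generator(m - u - 1, m)
# The dimensions add up to 2^m and the two codes are orthogonal, so
# RM(m-u-1,m) is indeed the dual of RM(u,m).
print(len(G) + len(Gd), is_orthogonal(G, Gd))   # prints: 8 True
```

For u = 1, m = 3 this is the [8, 4, 4] extended Hamming code, which is self-dual.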
12.5.9
Algebraic decoding
Chapter 13
Bézout's theorem and codes on plane curves

Ruud Pellikaan

In this section affine and projective plane curves are defined. Bézout's theorem on the number of points in the intersection of two plane curves is proved. A class of codes from plane curves is introduced and the parameters of these codes are determined. Divisors and rational functions on plane curves will be discussed.
13.1
Affine and projective space
lines, planes, quadrics, coordinate transformations, pictures
13.2
Plane curves
Let F be a field and F̄ its algebraic closure. By an affine plane curve over F we mean the set of points (x, y) ∈ F̄^2 such that F(x, y) = 0, where F ∈ F[X, Y]. Here F = 0 is called the defining equation of the curve. The F-rational points of the curve with defining equation F = 0 are the points (x, y) ∈ F^2 such that F(x, y) = 0. The degree of the curve is the degree of F. Two plane curves with defining equations F = 0 and G = 0 have a component in common with defining equation H = 0 if F and G have a nontrivial factor H in common, that is, F = BH and G = AH for some A, B ∈ F[X, Y], and the degree of H is not zero. A curve with defining equation F = 0, F ∈ F[X, Y], is called irreducible if F is not divisible by any G ∈ F[X, Y] such that 0 < deg(G) < deg(F), and absolutely irreducible if F is irreducible when considered as a polynomial in F̄[X, Y]. The partial derivative with respect to X of a polynomial F = ∑ f_ij X^i Y^j is defined by

F_X = ∑ i f_ij X^(i−1) Y^j.
The partial derivative with respect to Y is defined similarly. A point (x, y) on an affine curve with equation F = 0 is singular if F_X(x, y) = F_Y(x, y) = 0, where F_X and F_Y are the partial derivatives of F with respect to X and Y, respectively. A regular point of a curve is a nonsingular point of the curve. A regular point (x, y) on the curve has a well-defined tangent line to the curve with equation

F_X(x, y)(X − x) + F_Y(x, y)(Y − y) = 0.

Example 13.2.1 The curve with defining equation X^2 + Y^2 = 0 can be considered over any field. The polynomial X^2 + Y^2 is irreducible in F3[X, Y] but reducible in F9[X, Y] and F5[X, Y]. The point (0, 0) is an F-rational point of the curve over any field F, and it is the only singular point of this curve if the characteristic of F is not two.

A projective plane curve of degree d with defining equation F = 0 over F is the set of points (x : y : z) ∈ P2(F̄) such that F(x, y, z) = 0, where F ∈ F[X, Y, Z] is a homogeneous polynomial of degree d. Let F = ∑ f_ij X^i Y^j ∈ F[X, Y] be a polynomial of degree d. The homogenization F* of F is an element of F[X, Y, Z] and is defined by

F* = ∑ f_ij X^i Y^j Z^(d−i−j).

Then F*(X, Y, Z) = Z^d F(X/Z, Y/Z). If F = 0 defines an affine plane curve of degree d, then F* = 0 is the equation of the corresponding projective curve. A point at infinity of the affine curve with equation F = 0 is a point of the projective plane in the intersection of the line at infinity and the projective curve with equation F* = 0. So the points at infinity on the curve are all points (x : y : 0) ∈ P2(F̄) such that F*(x, y, 0) = 0. A projective plane curve is irreducible, respectively absolutely irreducible, if its defining homogeneous polynomial is irreducible, respectively absolutely irreducible. A point (x : y : z) on a projective curve with equation F = 0 is singular if F_X(x, y, z) = F_Y(x, y, z) = F_Z(x, y, z) = 0, and regular otherwise.
Through a regular point (x : y : z) on the curve passes the tangent line with equation

F_X(x, y, z)X + F_Y(x, y, z)Y + F_Z(x, y, z)Z = 0.

If F ∈ F[X, Y, Z] is a homogeneous polynomial of degree d, then Euler's equation

X F_X + Y F_Y + Z F_Z = dF

holds. So the two definitions of the tangent line to a curve in the affine and projective plane are consistent with each other. A curve is called regular or nonsingular if all its points are regular. In Corollary 13.3.13 it will be shown that a regular projective plane curve is absolutely irreducible.
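The singularity criterion F_X(x, y) = F_Y(x, y) = 0 can be checked by brute force for the curve of Example 13.2.1 over small prime fields. The following Python fragment is an illustrative sketch only:

```python
# Brute-force search for the singular GF(p)-rational points of the affine
# curve X^2 + Y^2 = 0 of Example 13.2.1, i.e. the points where F, F_X and
# F_Y vanish simultaneously. Illustrative sketch only.
def singular_points(p):
    pts = []
    for x in range(p):
        for y in range(p):
            F = (x * x + y * y) % p
            FX = 2 * x % p           # partial derivative with respect to X
            FY = 2 * y % p           # partial derivative with respect to Y
            if F == FX == FY == 0:
                pts.append((x, y))
    return pts

print(singular_points(3), singular_points(2))
# characteristic 3: only the origin; characteristic 2: every rational point
```

For p = 3 (and any odd p) only the origin is singular, in accordance with the example; for p = 2 the polynomial equals (X + Y)^2 and every rational point of the curve is singular.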
Remark 13.2.2 Let F be a polynomial in F[X, Y] of degree d. Suppose that the field F has at least d + 1 elements. Then there exists an affine change of coordinates
such that the coefficients of U^d and V^d in F(U, V) are 1. This is seen as follows. The projective curve with the defining equation F* = 0 intersects the line at infinity in at most d points. Then there exist two F-rational points P and Q on the line at infinity and not on the curve. Choose a projective transformation of coordinates which transforms P and Q into (1 : 0 : 0) and (0 : 1 : 0), respectively. This change of coordinates leaves the line at infinity invariant and gives a polynomial F(U, V) such that the coefficients of U^d and V^d are not zero. An affine transformation can now transform these coefficients into 1. If for instance F = X^2 Y + X Y^2 ∈ F4[X, Y] and α is a primitive element of F4, then X = U + αV and Y = αU + V gives F(U, V) = U^3 + V^3. Similarly, for all polynomials F, G ∈ F[X, Y] of degrees l and m there exists an affine change of coordinates such that the coefficients of V^l and V^m in F(U, V) and G(U, V), respectively, are 1.

Example 13.2.3 The Fermat curve F_m is a projective plane curve with defining equation X^m + Y^m + Z^m = 0. The partial derivatives of X^m + Y^m + Z^m are mX^(m−1), mY^(m−1) and mZ^(m−1). So considered as a curve over the finite field Fq, it is regular if m is relatively prime to q.

Example 13.2.4 Suppose q = r^2. The Hermitian curve H_r over Fq is defined by the equation

U^(r+1) + V^(r+1) + 1 = 0.

The corresponding homogeneous equation is U^(r+1) + V^(r+1) + W^(r+1) = 0. Hence it has r + 1 points at infinity and it is the Fermat curve F_m over Fq with r = m − 1. The conjugate of a ∈ Fq over Fr is given by ā = a^r. So the equation can also be written as

U Ū + V V̄ + W W̄ = 0.

This looks like equating a Hermitian form over the complex numbers to zero, which explains the terminology. We will see in Section 3 that for certain constructions of codes on curves it is convenient to have exactly one point at infinity. We will give a transformation such that the new equation of the Hermitian curve has this property.
Choose an element b ∈ Fq such that b^(r+1) = −1. There are exactly r + 1 of these, since q = r^2. Let P = (1 : b : 0). Then P is a point of the Hermitian curve. The tangent line at P has equation U + b^r V = 0. Multiplying by b gives the equation V = bU. Substituting V = bU in the defining equation of the curve gives W^(r+1) = 0. So P is the only intersection point of the Hermitian curve and the tangent line at P. New homogeneous coordinates are chosen such that this tangent line becomes the line at infinity. Let X1 = W, Y1 = U and Z1 = bU − V. Then the curve has homogeneous equation

X1^(r+1) = b^r Y1^r Z1 + b Y1 Z1^r − Z1^(r+1)
in the coordinates X1, Y1 and Z1. Choose an element a ∈ Fq such that a^r + a = −1. There are r of these. Let X = X1, Y = bY1 + aZ1 and Z = Z1. Then the curve has homogeneous equation

X^(r+1) = Y^r Z + Y Z^r

with respect to X, Y and Z. Hence the Hermitian curve has affine equation

X^(r+1) = Y^r + Y

with respect to X and Y. This last equation has (0 : 1 : 0) as the only point at infinity. To see that the number of affine Fq-rational points is r + (r + 1)(r^2 − r) = r^3, one argues as follows. The right-hand side of the equation X^(r+1) = Y^r + Y is the trace from Fq to Fr. The first r in the formula for the number of points corresponds to the r elements y ∈ Fq with zero trace; for these, x = 0 is the only solution. The remaining term corresponds to the elements of Fq with nonzero trace, since the equation X^(r+1) = β, β ∈ Fr*, has exactly r + 1 solutions in Fq.

Example 13.2.5 The Klein curve has homogeneous equation X^3 Y + Y^3 Z + Z^3 X = 0. More generally we define the curve K_m by the equation X^m Y + Y^m Z + Z^m X = 0. Suppose that m^2 − m + 1 is relatively prime to q. The partial derivatives of the left-hand side of the equation are m X^(m−1) Y + Z^m, m Y^(m−1) Z + X^m and m Z^(m−1) X + Y^m. Let (x : y : z) be a singular point of the curve K_m. If m is divisible by the characteristic, then x^m = y^m = z^m = 0. So x = y = z = 0, a contradiction. If m and q are relatively prime, then x^m y = −m y^m z = m^2 z^m x. So

(m^2 − m + 1) z^m x = x^m y + y^m z + z^m x = 0.

Therefore z = 0 or x = 0, since m^2 − m + 1 is relatively prime to the characteristic. But z = 0 implies x^m = −m y^(m−1) z = 0. Furthermore y^m = −m z^(m−1) x = 0. So x = y = z = 0, which is a contradiction. Similarly x = 0 leads to a contradiction. Hence K_m is nonsingular if gcd(m^2 − m + 1, q) = 1.
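The point count r + (r + 1)(r^2 − r) = r^3 for the Hermitian curve can be confirmed by brute force for the smallest case r = 2. The following Python sketch realizes GF(4) explicitly; it is illustrative and not part of the text's formal development:

```python
# Brute-force check of the affine point count r^3 for the Hermitian curve:
# for r = 2 the curve X^{r+1} = Y^r + Y becomes X^3 = Y^2 + Y over GF(4),
# and the text predicts 2^3 = 8 rational points. GF(4) is realized as
# GF(2)[a]/(a^2 + a + 1), with elements encoded as coefficient pairs.
def add(u, v):                       # characteristic-2 addition
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)

def mul(u, v):                       # multiply modulo a^2 = a + 1
    c0, c1, c2 = u[0] * v[0], u[0] * v[1] + u[1] * v[0], u[1] * v[1]
    return ((c0 + c2) % 2, (c1 + c2) % 2)

field = [(0, 0), (1, 0), (0, 1), (1, 1)]     # 0, 1, a, a + 1
points = [(x, y) for x in field for y in field
          if mul(x, mul(x, x)) == add(mul(y, y), y)]
print(len(points))   # 8 = r^3
```

The two trace-zero values of y each give the single solution x = 0, and the two trace-one values each give the three cube roots of 1, in accordance with the argument above.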
13.3
Bézout's theorem
The fundamental theorem of algebra says that a polynomial of degree m in one variable with coefficients in a field has at most m zeros. If the field is algebraically closed and the zeros are counted with multiplicities, then the total number of zeros is equal to m. Bézout's theorem is a generalization of this fact from one to several variables. It can be stated and proved in any number of variables, but only the two-variable case will be treated, that is to say, the case of plane curves. First we recall some well-known notions from commutative algebra.
Let R be a commutative ring with a unit. An ideal I in R is called a prime ideal if I ≠ R and for all f, g ∈ R: if fg ∈ I, then f ∈ I or g ∈ I. Let F be a field. Let F be a polynomial in F[X, Y] which is not a constant, and let I be the ideal in F[X, Y] generated by F. Then I is a prime ideal if and only if F is irreducible. Let R be a commutative ring with a unit. A nonzero element f of R is called a zero divisor if fg = 0 for some g ∈ R, g ≠ 0. The ring R is called an integral domain if it has no zero divisors. Let S be a commutative ring with a unit. Let I be an ideal in S. The factor ring of S modulo I is denoted by S/I. Then I is a prime ideal if and only if S/I is an integral domain. Let R be an integral domain. Define the relation ∼ on the set of pairs {(f, g) : f, g ∈ R, g ≠ 0} by (f1, g1) ∼ (f2, g2) if and only if there exists an h ∈ R, h ≠ 0, such that f1 g2 h = g1 f2 h. This is an equivalence relation. Its classes are called fractions. The class of (f, g) is denoted by f/g, and f is called the numerator and g the denominator. The field of fractions or quotient field of R consists of all fractions f/g where f, g ∈ R and g ≠ 0 and is denoted by Q(R). This indeed is a field with addition and multiplication defined by

f1/g1 + f2/g2 = (f1 g2 + f2 g1)/(g1 g2)

and

f1/g1 · f2/g2 = (f1 f2)/(g1 g2).
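Python's standard fractions module implements precisely this construction for R = Z: a Fraction is an equivalence class of pairs, and addition and multiplication follow the two rules above. A small illustration:

```python
# The quotient field Q(Z) = Q as implemented by Python's fractions module:
# equivalent pairs give the same fraction, and the arithmetic follows the
# addition and multiplication rules for fractions.
from fractions import Fraction

f = Fraction(1, 3)    # the class of the pair (1, 3)
g = Fraction(2, 6)    # (2, 6) ~ (1, 3), so the same fraction
print(f == g, f + Fraction(1, 6), f * Fraction(3, 4))
```

Here f + 1/6 = (1·6 + 1·3)/(3·6) = 9/18 = 1/2, exactly as the addition rule prescribes.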
Example 13.3.1 The quotient field of the integers Z is the rationals Q. The quotient field of the ring of polynomials F[X1, ..., Xm] is called the field of rational functions (in m variables) and is denoted by F(X1, ..., Xm).

Remark 13.3.2 If R is a commutative ring with a unit, then matrices with entries in R and the determinant of a square matrix can be defined as when R is a field. The usual properties for matrix addition and multiplication hold. If moreover R is an integral domain, then a square matrix M of size n has determinant zero if and only if there exists a nonzero r ∈ R^n such that rM = 0. This is seen by considering the same statement over the quotient field Q(R), where it is true, and clearing denominators. Furthermore we define an algebraic construction, called the resultant of two polynomials, that measures whether they have a factor in common.

Definition 13.3.3 Let R be a commutative ring with a unit. Then R[Y] is the ring of polynomials in one variable Y with coefficients in R. Let F and G be two polynomials in R[Y] of degree l and m, respectively. Then F = ∑_{i=0}^{l} f_i Y^i and G = ∑_{j=0}^{m} g_j Y^j, where f_i, g_j ∈ R for all i, j. Define the Sylvester matrix Sylv(F, G) of F and G as the square matrix of size l + m:

              ( f0  f1  ...  fl  0   ...  0  )
              ( 0   f0  f1  ...  fl  ...  0  )
              ( ...                          )
              ( 0   ...  0   f0  f1  ...  fl )
Sylv(F, G) =  ( g0  g1  ...  gm  0   ...  0  )
              ( 0   g0  g1  ...  gm  ...  0  )
              ( ...                          )
              ( 0   ...  0   g0  g1  ...  gm )
The first m rows consist of the cyclic shifts of the first row (f0 f1 ... fl 0 ... 0) and the last l rows consist of the cyclic shifts of row m + 1, (g0 g1 ... gm 0 ... 0). The determinant of Sylv(F, G) is called the resultant of F and G and is denoted by Res(F, G).

Proposition 13.3.4 If R is an integral domain and F and G are elements of R[Y], then Res(F, G) = 0 if and only if F and G have a nontrivial common factor.

Proof. If F and G have a nontrivial common factor, then F = BH and G = AH for some A, B and H in R[Y], where H has nonzero degree. So AF = BG for some A and B, where deg(A) < m = deg(G) and deg(B) < l = deg(F). Write A = ∑ a_i Y^i, F = ∑ f_j Y^j, B = ∑ b_r Y^r and G = ∑ g_s Y^s. Rewrite the equation AF − BG = 0 as a system of equations

∑_{i+j=k} a_i f_j − ∑_{r+s=k} b_r g_s = 0 for k = 0, 1, ..., l + m − 1,

or as a matrix equation

(a, −b) Sylv(F, G) = 0,

where a = (a0, a1, ..., a_{m−1}) and b = (b0, b1, ..., b_{l−1}). Hence the rows of the matrix Sylv(F, G) are dependent in case F and G have a common factor, and so its determinant is zero. Thus we have shown that if F and G have a nontrivial common factor, then Res(F, G) = 0. The converse is also true. This is proved by reversing the argument.

Corollary 13.3.5 If F is an algebraically closed field and F and G are elements of F[Y], then Res(F, G) = 0 if and only if F and G have a common zero in F.

After this introduction on the resultant, we are in a position to prove a weak form of Bézout's theorem.

Proposition 13.3.6 Two plane curves of degree l and m that do not have a component in common intersect in at most lm points.

Proof. A special case of Bézout is m = 1. A line which is not a component of a curve of degree l intersects this curve in at most l points. Stated differently, suppose that F is a polynomial in X and Y with coefficients in a field F and has degree l, and L is a nonconstant linear form. If F and L have more than l common zeros, then L divides F in F[X, Y]. A more general special case is if F is a product of linear terms. So if one of the curves is a union of lines and the other curve does not contain any of these lines as a component, then the number of points in the intersection is at most lm. This follows from the above special case. The third special case is G = XY − 1 with F arbitrary. Then
the first curve can be parameterized by X = T, Y = 1/T; substituting this in F gives a polynomial in T and 1/T of degree at most l; multiplying by T^l gives a polynomial of degree at most 2l, and therefore the intersection of these two curves has at most 2l points. It is not possible to continue like this, that is to say, by parameterizing the second curve by rational functions in T: X = X(T) and Y = Y(T). The proof of the general case uses elimination theory. Suppose that we have two equations in two variables of degree l and m, respectively, and we eliminate one variable. Then we get a polynomial in one variable of degree at most lm having as zeros the first coordinates of common zeros of the two original polynomials. In geometric terms, we have two curves of degree l and m, respectively, in the affine plane, and we project the points of the intersection to a line. If we can show that we get at most lm points on this line, and we can choose the projection in such a way that no two points of the intersection project to one point on the line, then we are done. We may assume that the field is algebraically closed, since by a common zero (x, y) of F and G we mean a pair (x, y) ∈ F̄^2 such that F(x, y) = G(x, y) = 0. Let F and G be polynomials in the variables X and Y of degree l and m, respectively, with coefficients in a field F, which do not have a common factor in F[X, Y]. Then they do not have a nontrivial common factor in R[Y], where R = F[X], so Res(F, G) is not zero by Proposition 13.3.4. By Remark 13.2.2 we may assume that, after an affine change of coordinates, F and G are monic of degree l and m, respectively, as polynomials in Y with coefficients in F[X]. Hence F = ∑_{i=0}^{l} f_i Y^i and G = ∑_{j=0}^{m} g_j Y^j, where f_i and g_j are elements of F[X] of degree at most l − i and m − j, respectively, and f_l = g_m = 1. The square matrix Sylv(F, G) of size l + m has entries in F[X].
Taking the determinant gives the resultant Res(F, G), which is an element of R = F[X], that is to say, a polynomial in X with coefficients in F. Its degree is at most lm. This can be seen by homogenizing F and G. Then F* = ∑_{i=0}^{l} f_i' Y^i, where f_i' is a homogeneous polynomial in X and Z of degree l − i, and similarly for G*. The determinant D(X, Z) of the corresponding Sylvester matrix is homogeneous of degree lm, since D(TX, TZ) = T^lm D(X, Z). This is seen by dividing the rows and columns of the matrix by appropriate powers of T. We claim that the zeros of the polynomial Res(F, G) are exactly the projections of the points in the intersection of the curves defined by the equations F = 0 and G = 0. Thus we claim that x is a zero of Res(F, G) if and only if there exists an element y ∈ F such that (x, y) is a common zero of F and G. Let F(x) and G(x) be the polynomials in F[Y] which are obtained from F and G by substituting x for X. The polynomials F(x) and G(x) again have degrees l and m in Y, since we assumed that F and G are monic polynomials in Y of degrees l and m, respectively. Now

Res(F(x), G(x)) = Res(F, G)(x),
that is to say, it does not make a difference whether we substitute x for X first and take the resultant afterwards, or take the resultant first and make the substitution afterwards. The degrees of F and G have not diminished after the substitution. Let (x, y) be a common zero of F and G. Then y is a common zero of F(x) and G(x), so Res(F(x), G(x)) = 0 by Corollary 13.3.5, and therefore Res(F, G)(x) = 0. For the proof of the converse statement, one reads the above proof backwards. Now we know that Res(F, G) is not identically zero and has degree at most lm, and therefore Res(F, G) has at most lm zeros. There is still a slight problem: it may happen that for a fixed zero x of Res(F, G) there exists more than one y such that (x, y) is a zero of F and G. This occasionally does happen. We will show that after a suitable coordinate change this does not occur. For every zero x of Res(F, G) there are at most min{l, m} elements y such that (x, y) is a zero of F and G. Therefore F and G have at most min{l^2 m, l m^2} zeros in common, hence the collection of lines which are incident with two distinct points of these zeros is finite. Hence we can find a point P that is not in the union of this finite collection of lines. Furthermore there exists a line L incident with P and not incident with any of the common zeros of F and G. In fact almost every point P' and line L' incident with P' have the above mentioned properties. Choose homogeneous coordinates such that P = (0 : 1 : 0) and L is the line at infinity. If P1 = (x, y1) and P2 = (x, y2) are distinct zeros of F and G, then the line with equation X − xZ = 0 through the corresponding points (x : y1 : 1) and (x : y2 : 1) in the projective plane also contains P. This contradicts the choice made for P. So for every zero x of Res(F, G) there exists exactly one y such that (x, y) is a zero of F and G. Hence F and G have at most lm common zeros.
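Both Definition 13.3.3 and the projection argument just completed can be made concrete in a few lines of Python: build Sylv(F, G) from the coefficient sequences, take its determinant by cofactor expansion, and use Res(F, G)(x) = Res(F(x), G(x)) to find the X-coordinates of the common zeros. The curves F = Y − X^2 and G = X + Y − 2 over GF(7) are our own choice of example; this is an illustrative sketch only.

```python
# Sylvester matrix and resultant as in Definition 13.3.3, plus the
# projection step of the proof: substitute x first, then take the
# resultant of the univariate polynomials F(x) and G(x).
def sylvester(f, g):
    """f = [f0,...,fl], g = [g0,...,gm] with fl and gm nonzero."""
    l, m = len(f) - 1, len(g) - 1
    rows = [[0] * i + f + [0] * (m - 1 - i) for i in range(m)]
    rows += [[0] * j + g + [0] * (l - 1 - j) for j in range(l)]
    return rows

def det(a):                                   # cofactor expansion
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([r[:j] + r[j + 1:] for r in a[1:]])
               for j in range(len(a)))

def resultant(f, g):
    return det(sylvester(f, g))

# F = Y^2 - 1 and G = Y - 1 share the root 1; F = Y^2 + 1 and G = Y - 1 do not.
assert resultant([-1, 0, 1], [-1, 1]) == 0
assert resultant([1, 0, 1], [-1, 1]) == 2

# Projection over GF(7): F = Y - X^2 and G = X + Y - 2, both monic in Y,
# give F(x) = -x^2 + Y and G(x) = (x - 2) + Y.
p = 7
zeros = [x for x in range(p)
         if resultant([-x * x % p, 1], [(x - 2) % p, 1]) % p == 0]
print(zeros)   # X-coordinates of the common zeros, i.e. roots of X^2 + X - 2
```

Here Res(F, G) = −X^2 − X + 2 = −(X − 1)(X + 2), so the zeros 1 and 5 = −2 mod 7 are exactly the X-coordinates of the intersection of the parabola and the line.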
This finishes the proof of the weak form of Bézout's theorem. There are several reasons why the number of points in the intersection could be less than lm: the field F may not be algebraically closed; points of the intersection may lie at infinity; and multiplicities may occur. Take for instance F = X^2 − Y^2 + 1, G = Y and F = F3. Then the two points of the intersection have coordinates in F9 and not in F3. Let H = Y − 1. Then the two lines defined by G and H have no intersection in the affine plane. The homogenized polynomials G* = G and H* = Y − Z define curves in the projective plane which have exactly (1 : 0 : 0) in their intersection. Finally, the line with equation H = 0 is the tangent line to the conic defined by F at the point (0, 1), and this point has to be counted with multiplicity 2. In order to define the multiplicity of a point of intersection we have to localize the ring of polynomials.

Definition 13.3.7 Let P = (x, y) ∈ F^2. Let F[X, Y]_P be the subring of the field of fractions F(X, Y) consisting of all fractions A/B such that A, B ∈ F[X, Y] and B(P) ≠ 0. The ring F[X, Y]_P is called the localization of F[X, Y] at P. We explain the use of localization for the definition of the multiplicity by analogy to the multiplicity of a zero of a polynomial in one variable. Let F = (X − a)^e G, where a ∈ F, F, G ∈ F[X] and G(a) ≠ 0. Then a is a zero of F with multiplicity e. The dimension of F[X]/(F) as a vector space over F is equal to the degree
of F. But the element G is invertible in the localization F[X]_a of F[X] at a. So the ideal generated by F in F[X]_a is equal to the ideal generated by (X − a)^e. Hence the dimension of F[X]_a/(F) over F is equal to e.

Definition 13.3.8 Let P be a point in the intersection of two affine curves X and Y defined by F and G, respectively. The intersection multiplicity I(P; X, Y) of X and Y at P is defined by

I(P; X, Y) = dim F[X, Y]_P/(F, G).

Without proof we state several properties of the intersection multiplicity. After a projective change of coordinates it may be assumed that the point P = (0, 0) is the origin of the affine plane. There is a unique way to write F as the sum of its homogeneous parts

F = F_d + F_{d+1} + · · · + F_l,

where F_i is homogeneous of degree i, and F_d ≠ 0 and F_l ≠ 0. The homogeneous polynomial F_d defines a union of lines over F̄, which are called the tangent lines of X at P. The point P is a regular point if and only if d = 1. The tangent line to X at P is defined by F_1 = 0 if d = 1. Similarly G = G_e + G_{e+1} + · · · + G_m. If the tangent lines of X at P are distinct from the tangent lines of Y at P, then the intersection multiplicity at P is equal to de. In particular, if P is a regular point of both curves and the tangent lines are distinct, then d = e = 1 and the intersection multiplicity is 1. The Hermitian curve over Fq, with q = r^2, has the property that every line in the projective plane with coefficients in Fq intersects the Hermitian curve in r + 1 distinct points or in exactly one point with multiplicity r + 1.

Definition 13.3.9 A cycle is a formal sum ∑ m_P P of points of the projective plane P2(F̄) with integer coefficients m_P such that m_P ≠ 0 for only finitely many P. The degree of a cycle is defined by deg(∑ m_P P) = ∑ m_P. If the projective plane curves X and Y are defined by the equations F = 0 and G = 0, respectively, then the intersection cycle X · Y is defined by

X · Y = ∑ I(P; X, Y) P.
Proposition 13.3.6 implies that this indeed is a cycle, that is to say, there are only finitely many points P such that I(P; X, Y) is not zero.

Example 13.3.10 Consider the curve X with homogeneous equation

X^a Y^c + Y^(b+c) Z^(a−b) + X^d Z^(a+c−d) = 0

with d < b < a. Let L be the line with equation X = 0. The intersection of L with X consists of the points P = (0 : 0 : 1) and Q = (0 : 1 : 0). The origin of the affine plane is mapped to P under the mapping (x, y) ↦ (x : y : 1). The affine equation of the curve is

X^a Y^c + Y^(b+c) + X^d = 0.
The intersection multiplicity at P of X and L is equal to the dimension of F[X, Y]_0/(X, X^a Y^c + Y^(b+c) + X^d), which is b + c. The origin of the affine plane is mapped to Q under the mapping (x, z) ↦ (x : 1 : z). The affine equation of the curve now becomes

X^a + Z^(a−b) + X^d Z^(a+c−d) = 0.

The intersection multiplicity at Q of X and L is equal to the dimension of F[X, Z]_0/(X, X^a + Z^(a−b) + X^d Z^(a+c−d)), which is a − b. Therefore

X · L = (b + c)P + (a − b)Q.

Let M be the line with equation Y = 0. Let N be the line with equation Z = 0. Let R = (1 : 0 : 0). One shows similarly that

X · M = dP + (a + c − d)R and X · N = aQ + cR.
We state now as a fact the following strong version of Bézout's theorem.

Theorem 13.3.11 If X and Y are projective plane curves of degrees l and m, respectively, that do not have a component in common, then deg(X · Y) = lm.

Corollary 13.3.12 Two projective plane curves of positive degree have a point in common.

Corollary 13.3.13 A regular projective plane curve is absolutely irreducible.

Proof. If F = GH is a factorization of F with factors of positive degree, we get F_X = G_X H + G H_X by the product or Leibniz rule for the partial derivative. So F_X is an element of the ideal generated by G and H, and similarly for the other two partial derivatives. Hence the set of common zeros of F_X, F_Y, F_Z and F contains the set of common zeros of G and H. The intersection of the curves with equations G = 0 and H = 0 is not empty by Corollary 13.3.12, since G and H have positive degrees. Therefore the curve has a singular point.
Remark 13.3.14 Notice that the assumption that the curve is a projective plane curve is essential. The equation X^2 Y − X = 0 defines a regular affine plane curve, but it is clearly reducible. However, one gets immediately from Corollary 13.3.13 that if F = 0 is an affine plane curve and the homogenization F* defines a regular projective curve, then F is absolutely irreducible. The affine curve with equation X^2 Y − X = 0 has the points (1 : 0 : 0) and (0 : 1 : 0) at infinity, and (0 : 1 : 0) is a singular point.
13.3.1
Another proof of Bézout's theorem by the footprint
13.4
Codes on plane curves
Let G be an irreducible element of Fq[X, Y] of degree m. Let P1, ..., Pn be n distinct points in the affine plane over Fq which lie on the plane curve defined by the equation G = 0. So G(Pj) = 0 for all j = 1, ..., n. Consider the code

E(l) = {(F(P1), ..., F(Pn)) : F ∈ Fq[X, Y], deg(F) ≤ l}.

Let V_l be the vector space of all polynomials in the two variables X, Y with coefficients in Fq and of degree at most l. Let P = {P1, ..., Pn}. Consider the evaluation map

ev_P : Fq[X, Y] → Fq^n

defined by ev_P(F) = (F(P1), ..., F(Pn)). Then this is a linear map that has E(l) as the image of V_l.

Proposition 13.4.1 Let k be the dimension and d the minimum distance of the code E(l). Suppose lm < n. Then d ≥ n − lm and

k = (l+2 choose 2) if l < m,
k = lm + 1 − (m−1 choose 2) if l ≥ m.
if l ≥ m. The same argument with B´ezout gives that a nonzero codeword has at most lm zeros, and therefore has weight at least n − lm. This shows that d ≥ n − lm. Example 13.4.2 Conics, reducible and irreducible............................. Remark 13.4.3 If F1 , . . . , Fk is a basis for Vl modulo GVl−m , then (Fi (Pj )  1 ≤ i ≤ k, 1 ≤ j ≤ n) is a generator matrix of E(l). So it is a parity check matrix for C(l), the dual of E(l). The minimum distance d⊥ of C(l) is equal to the minimal number of dependent columns of this matrix. Hence for all t < d⊥ and every subset Q of
P consisting of t distinct points P_{i1}, ..., P_{it}, the corresponding k × t submatrix must have maximal rank t. Let L_l = V_l/G·V_{l−m}. Then the evaluation map ev_Q induces a surjective map from L_l to Fq^t. The kernel is the space of all functions F ∈ V_l which are zero at the points of Q, modulo G·V_{l−m}; we denote it by L_l(Q). So dim(L_l(Q)) = k − t. Conversely, the dimension of L_l(Q) is at least k − t for all t-subsets Q of P. But in order to get a bound for d⊥, we have to know that dim(L_l(Q)) = k − t for all t < d⊥. The theory developed so far is not sufficient to get such a bound. The theorem of Riemann-Roch in the theory of algebraic curves gives an answer to this question. See Section ??. Section ?? gives another, more elementary, solution to this problem. Notice that the following inequality holds for the codes E(l):

k + d ≥ n + 1 − g, where g = (m − 1)(m − 2)/2.

In Section 7 we will see that g is the (arithmetic) genus. In Sections 3-6 the role of g will be played by the number of gaps of the (Weierstrass) semigroup of a point at infinity.
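The parameters of Proposition 13.4.1 can be verified by brute force on a small instance: take G = Y − X^2 over GF(7), the n = 7 points (x, x^2), and l = 1, so that m = 2, k = 3 and d ≥ n − lm = 5 are predicted. This choice of curve and field is our own; the sketch below confirms d = 5:

```python
# The code E(1) on the parabola G = Y - X^2 over GF(7), with P all seven
# affine rational points (x, x^2). Proposition 13.4.1 predicts k = 3 and
# d >= n - lm = 5; brute force over all nonzero messages confirms d = 5.
from itertools import product

p = 7
points = [(x, x * x % p) for x in range(p)]
basis = [[1] * p,                       # evaluations of 1, X, Y on P
         [x for x, _ in points],
         [y for _, y in points]]
weights = [sum(c != 0 for c in
               [sum(a * b for a, b in zip(coef, col)) % p
                for col in zip(*basis)])
           for coef in product(range(p), repeat=3) if any(coef)]
print(len(basis), min(weights))   # prints: 3 5
```

On the parabola the evaluations of 1, X, Y are those of 1, X, X^2, so E(1) is in fact a [7, 3, 5] Reed-Solomon code, attaining the bound d ≥ n − lm with equality.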
13.5
Conics, arcs and Segre
Proposition 13.5.1

m(3, q) = q + 1 if q is odd, and m(3, q) = q + 2 if q is even.

Proof. We have seen that m(3, q) is at least q + 1 for all q in Example ??. If q is even, then m(3, q) is at least q + 2 by Example 3.2.12. ***Segre*** ***Finite geometry and the Problems of Segre***
13.6
Cubic plane curves
13.6.1
Elliptic curves
13.6.2
The addition law on elliptic curves
13.6.3
Number of rational points on an elliptic curve
Manin’s proof, Chahal
13.6.4
The discrete logarithm on elliptic curves
13.7
Quartic plane curves
13.7.1
Flexes and bitangents
13.7.2
The Klein quartic
13.8
Divisors
In the following, X is an irreducible smooth projective curve over an algebraically closed field F.
Definition 13.8.1 A divisor is a formal sum D = ∑_{P∈X} n_P P, with n_P ∈ Z and n_P = 0 for all but a finite number of points P. The support of a divisor is the set of points with nonzero coefficient. A divisor D is called effective if all coefficients n_P are nonnegative (notation: D ≥ 0). The degree deg(D) of the divisor D is ∑ n_P.

Definition 13.8.2 Let X and Y be projective plane curves defined by the equations F = 0 and G = 0, respectively. Then the intersection divisor X · Y is defined by

X · Y = ∑ I(P; X, Y) P,

where I(P; X, Y) is the intersection multiplicity of Definition ??. Bézout's theorem tells us that X · Y is indeed a divisor and that its degree is lm if the degrees of X and Y are l and m, respectively. Let v_P = ord_P be the discrete valuation defined for functions on X in Definition ??.

Definition 13.8.3 If f is a rational function on X, not identically 0, we define the divisor of f to be

(f) = ∑_{P∈X} v_P(f) P.
So, in a sense, the divisor of f is a bookkeeping device that tells us where the zeros and poles of f are, and what their multiplicities and orders are.

Theorem 13.8.4 The degree of a divisor of a rational function is 0.

Proof. Let X be a projective curve of degree l. Let f be a rational function on the curve X. Then f is represented by a quotient A/B of two homogeneous polynomials of the same degree, say m. Let Y and Z be the curves defined by the equations A = 0 and B = 0, respectively. Then

v_P(f) = I(P; X, Y) − I(P; X, Z),

since f = A/B = (A/H^m)(B/H^m)^(−1), where H is a homogeneous linear form such that H(P) ≠ 0. Hence (f) = X · Y − X · Z. So (f) is indeed a divisor and its degree is zero, since it is the difference of two intersection divisors of the same degree lm.

Example 13.8.5 Look at the curve of Example ??. We saw that f = x/(y + z) has a pole of order 2 in Q = (0 : 1 : 1). The line L with equation X = 0 intersects the curve in three points, namely P1 = (0 : α : 1), P2 = (0 : 1 + α : 1) and Q. So X · L = P1 + P2 + Q. The line M with equation Y = 0 intersects the curve in three points, namely P3 = (1 : 0 : 1), P4 = (α : 0 : 1) and P5 = (1 + α : 0 : 1). So X · M = P3 + P4 + P5. The line N with equation Y + Z = 0 intersects the curve only in Q. So X · N = 3Q. Hence

(x/(y + z)) = P1 + P2 − 2Q and (y/(y + z)) = P3 + P4 + P5 − 3Q.

In this example it is not necessary to compute the intersection multiplicities, since they are a consequence of Bézout's theorem.
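The bookkeeping in Example 13.8.5 can be mirrored with divisors stored as dictionaries from points to coefficients; by Theorem 13.8.4 both principal divisors must have degree 0. An illustrative sketch:

```python
# Divisor bookkeeping for Example 13.8.5: divisors as point -> coefficient
# dictionaries, with the principal divisors obtained as differences of
# intersection divisors. The point labels are those of the example.
from collections import Counter

def sub(d1, d2):
    """Formal difference of two divisors, dropping zero coefficients."""
    d = Counter(d1)
    d.subtract(d2)
    return {pt: c for pt, c in d.items() if c != 0}

def deg(d):
    return sum(d.values())

XL = {"P1": 1, "P2": 1, "Q": 1}       # X . L = P1 + P2 + Q
XM = {"P3": 1, "P4": 1, "P5": 1}      # X . M = P3 + P4 + P5
XN = {"Q": 3}                         # X . N = 3Q
div1 = sub(XL, XN)                    # (x/(y+z)) = P1 + P2 - 2Q
div2 = sub(XM, XN)                    # (y/(y+z)) = P3 + P4 + P5 - 3Q
print(div1, deg(div1), deg(div2))
```

Both degrees come out as 0, in accordance with Theorem 13.8.4.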
416 CHAPTER 13. BÉZOUT'S THEOREM AND CODES ON PLANE CURVES

Example 13.8.6 Let X be the Klein quartic with equation X³Y + Y³Z + Z³X = 0 of Example 13.2.5. Let P1 = (0 : 0 : 1), P2 = (1 : 0 : 0) and Q = (0 : 1 : 0). Let L be the line with equation X = 0. Then L intersects X in the points P1 and Q. Since L is not tangent in Q, we see that I(Q; X, L) = 1. So the intersection multiplicity of X and L in P1 is 3, since the multiplicities add up to 4. Hence X·L = 3P1 + Q. Similarly we get for the lines M and N with equations Y = 0 and Z = 0, respectively, X·M = 3P2 + P1 and X·N = 3Q + P2. Therefore (x/z) = 3P1 − P2 − 2Q and (y/z) = P1 + 2P2 − 3Q.

Definition 13.8.7 The divisor of a rational function is called a principal divisor. We call two divisors D and D′ linearly equivalent if and only if D − D′ is a principal divisor; notation D ≡ D′. This is indeed an equivalence relation.

Definition 13.8.8 Let D be a divisor on a curve X. We define a vector space L(D) over F by

L(D) = {f ∈ F(X)* | (f) + D ≥ 0} ∪ {0}.

The dimension of L(D) over F is denoted by l(D).

Note that if D = ∑_{i=1}^r n_i P_i − ∑_{j=1}^s m_j Q_j with all n_i, m_j > 0, then L(D) consists of 0 and the functions in the function field that have zeros of multiplicity at least m_j at Q_j (1 ≤ j ≤ s) and that have no poles except possibly at the points P_i, with order at most n_i (1 ≤ i ≤ r). We shall show that this vector space has finite dimension. First we note that if D ≡ D′ and g is a rational function with (g) = D − D′, then the map f ↦ fg shows that L(D) and L(D′) are isomorphic.

Theorem 13.8.9 (i) l(D) = 0 if deg(D) < 0; (ii) l(D) ≤ 1 + deg(D).

Proof. (i) If deg(D) < 0, then for any function f ∈ F(X)*, we have deg((f) + D) < 0, that is to say, f ∉ L(D). (ii) If f is not 0 and f ∈ L(D), then D′ = D + (f) is an effective divisor for which L(D′) has the same dimension as L(D) by our observation above.
So without loss of generality D is effective, say D = ∑_{i=1}^r n_i P_i with n_i ≥ 0 for 1 ≤ i ≤ r. Again, assume that f is not 0 and f ∈ L(D). In the point P_i, we map f onto the corresponding element of the n_i-dimensional vector space (t_i^{−n_i} O_{P_i})/O_{P_i}, where t_i is a local parameter at P_i. We thus obtain a mapping of f into the direct sum of these vector spaces (the zero function is mapped onto 0). This is a linear mapping. Suppose that f is in the kernel. This means that f does not have a pole in any of the points P_i, that is to say, f is a constant function. It follows that

l(D) ≤ 1 + ∑_{i=1}^r n_i = 1 + deg(D). □
Example 13.8.10 Look at the curve of Examples ?? and 13.8.5. We saw that f = x/(y + z) and g = y/(y + z) are regular outside Q and have a pole of order 2 and 3, respectively, in Q = (0 : 1 : 1). So the functions 1, f and g have mutually distinct pole orders and are elements of L(3Q). Hence the dimension of L(3Q) is at least 3. We will see in Example 13.10.3 that it is exactly 3.
13.9 Differentials on a curve
Let X be an irreducible smooth curve with function field F(X).

Definition 13.9.1 Let V be a vector space over F(X). An F-linear map D : F(X) → V is called a derivation if it satisfies the product rule D(fg) = f D(g) + g D(f).

Example 13.9.2 Let X be the projective line with function field F(X). Define D(F) = ∑ i a_i X^{i−1} for a polynomial F = ∑ a_i X^i ∈ F[X], and extend this definition to quotients by

D(F/G) = (G D(F) − F D(G)) / G².

Then D : F(X) → F(X) is a derivation.

Definition 13.9.3 The set of all derivations D : F(X) → V will be denoted by Der(X, V). We denote Der(X, V) by Der(X) if V = F(X). The sum of two derivations D1, D2 ∈ Der(X, V) is defined by (D1 + D2)(f) = D1(f) + D2(f). The product of D ∈ Der(X, V) with f ∈ F(X) is defined by (fD)(g) = f D(g). In this way Der(X, V) becomes a vector space over F(X).

Theorem 13.9.4 Let t be a local parameter at a point P. Then there exists a unique derivation D_t : F(X) → F(X) such that D_t(t) = 1. Furthermore Der(X) is one-dimensional over F(X) and D_t is a basis element for every local parameter t.

Definition 13.9.5 A rational differential form or differential on X is an F(X)-linear map from Der(X) to F(X). The set of all rational differential forms on X is denoted by Ω(X). Again Ω(X) becomes a vector space over F(X) in the obvious way. Consider the map d : F(X) → Ω(X), where for f ∈ F(X) the differential df : Der(X) → F(X) is defined by df(D) = D(f) for all D ∈ Der(X). Then d is a derivation.

Theorem 13.9.6 The space Ω(X) has dimension 1 over F(X) and dt is a basis for every point P with local parameter t.

So for every point P and local parameter t_P, a differential ω can be represented in a unique way as ω = f_P dt_P, where f_P is a rational function. The obvious definition of "the value" of ω in P by ω(P) = f_P(P) has no meaning, since it depends on the choice of t_P. Despite this negative result it is possible to say whether ω has a pole or a zero at P of a certain order.
Definition 13.9.7 Let ω be a differential on X. The order or valuation of ω in P is defined by ord_P(ω) = v_P(ω) = v_P(f_P). The differential form ω is called regular if it has no poles. The regular differentials on X form an F[X]-module, which we denote by Ω[X]. This definition does not depend on the choices made.

If X is an affine plane curve defined by the equation F = 0 with F ∈ F[X, Y], then Ω[X] is generated by dx and dy as an F[X]-module with the relation f_x dx + f_y dy = 0.

Example 13.9.8 We again look at the curve X in P² given by X³ + Y³ + Z³ = 0 in characteristic unequal to three. We define the sets U_x by U_x = {(x : y : z) ∈ X | y ≠ 0, z ≠ 0} and similarly U_y and U_z. Then U_x, U_y and U_z cover X, since there is no point on X where two coordinates are zero. It is easy to check that the three representations

ω = (y/z)² d(x/y) on U_x,
η = (z/x)² d(y/z) on U_y,
ζ = (x/y)² d(z/x) on U_z

define one differential on X. For instance, to show that η and ζ agree on U_y ∩ U_z one takes the equation (x/z)³ + (y/z)³ + 1 = 0, differentiates, and applies the formula d(f⁻¹) = −f⁻² df to f = z/x. The only regular functions on X are constants, so one cannot represent this differential as g df with f and g regular functions on X. Now the divisor of a differential is defined as for functions.

Definition 13.9.9 The divisor (ω) of the differential ω is defined by

(ω) = ∑_{P∈X} v_P(ω) P.
Of course, one must show that only finitely many coefficients in (ω) are not 0. Let ω be a differential and W = (ω). Then W is called a canonical divisor. If ω′ is another nonzero differential, then ω′ = fω for some rational function f. So (ω′) = W′ ≡ W and therefore the canonical divisors form one equivalence class. This class is also denoted by W. Now consider the space L(W). This space of rational functions can be mapped onto an isomorphic space of differential forms by f ↦ fω. By the definition of L(W), the image of f under this mapping is a regular differential form; that is to say, L(W) is isomorphic to Ω[X].

Definition 13.9.10 Let X be a smooth projective curve over F. We define the genus g of X by g = l(W).

Example 13.9.11 Consider the differential dx on the projective line. Then dx is regular at all points P_a = (a : 1), since x − a is a local parameter in P_a and dx = d(x − a). Let Q = (1 : 0) be the point at infinity. Then t = 1/x is a local parameter in Q and dx = −t^{−2} dt. So v_Q(dx) = −2. Hence (dx) = −2Q and l(−2Q) = 0. Therefore the projective line has genus zero.
The genus of a curve will play an important role in the following sections. For methods with which one can determine the genus of a curve, we must refer to textbooks on algebraic geometry. We mention one formula without proof, the so-called Plücker formula.

Theorem 13.9.12 If X is a nonsingular projective curve of degree m in P², then

g = (m − 1)(m − 2)/2.

Example 13.9.13 The genus of a line and of a nonsingular conic are zero by Theorem 13.9.12. In fact a curve of genus zero is isomorphic to the projective line. For example the curve X with equation XZ − Y² = 0 of Example ?? is isomorphic to P¹, where the isomorphism is given by (x : y : z) ↦ (x : y) = (y : z) for (x : y : z) ∈ X. The inverse map is given by (u : v) ↦ (u² : uv : v²).

Example 13.9.14 So the curve of Examples ??, 13.8.5 and 13.9.8 has genus 1, and by the definition of genus, L(W) = F, so regular differentials on X are scalar multiples of the differential ω of Example 13.9.8.

For the construction of codes over algebraic curves that generalize Goppa codes, we shall need the concept of the residue of a differential at a point P. This is defined in accordance with our treatment of the local behavior of a differential ω.

Definition 13.9.15 Let P be a point on X, t a local parameter at P and ω = f dt the representation of ω. The function f can be written as ∑_i a_i t^i. We define the residue Res_P(ω) of ω in the point P to be a_{−1}.

One can show that this algebraic definition of the residue does not depend on the choice of the local parameter t. One of the basic results in the theory of algebraic curves is known as the residue theorem. We only state the theorem.

Theorem 13.9.16 If ω is a differential on a smooth projective curve X, then

∑_{P∈X} Res_P(ω) = 0.
13.10 The Riemann-Roch theorem
The following theorem, known as the Riemann-Roch theorem, is not only a central result in algebraic geometry with applications in other areas, but it is also the key to the new results in coding theory.

Theorem 13.10.1 Let D be a divisor on a smooth projective curve of genus g. Then, for any canonical divisor W,

l(D) − l(W − D) = deg(D) − g + 1.

We do not give the proof. The theorem allows us to determine the degree of canonical divisors.
Corollary 13.10.2 For a canonical divisor W, we have deg(W) = 2g − 2.

Proof. Everywhere regular functions on a projective curve are constant, that is to say, L(0) = F, so l(0) = 1. Substitute D = W in Theorem 13.10.1 and the result follows from Definition 13.9.10. □

Example 13.10.3 It is now clear why in Example 13.8.10 the space L(3Q) has dimension 3. By Example 13.9.14 the curve X has genus 1, the degree of W − 3Q is negative, so l(W − 3Q) = 0. By Theorem 13.10.1 we have l(3Q) = 3.

At first, Theorem 13.10.1 does not look too useful. However, Corollary 13.10.2 provides us with a means to use it successfully.

Corollary 13.10.4 Let D be a divisor on a smooth projective curve of genus g and let deg(D) > 2g − 2. Then l(D) = deg(D) − g + 1.

Proof. By Corollary 13.10.2, deg(W − D) < 0, so by Theorem 13.8.9(i), l(W − D) = 0. □

Example 13.10.5 Consider the code of Theorem ??. We embed the affine plane in a projective plane and consider the rational functions on the curve defined by G. By Bézout's theorem, this curve intersects the line at infinity, that is to say, the line defined by Z = 0, in m points. These are the possible poles of our rational functions, each with order at most l. So, in the terminology of Definition 13.8.8, we have a space of rational functions defined by a divisor D of degree lm. Then Corollary 13.10.4 and Theorem ?? imply that the curve defined by G has genus at most (m − 1)(m − 2)/2. This is exactly what we find from the Plücker formula 13.9.12.

Let m be a nonnegative integer. Then l(mP) ≤ l((m − 1)P) + 1, by the same argument as in the proof of Theorem 13.8.9.

Definition 13.10.6 If l(mP) = l((m − 1)P), then m is called a (Weierstrass) gap of P. A nonnegative integer that is not a gap is called a nongap of P.

The number of gaps of P is equal to the genus g of the curve, since l(iP) = i + 1 − g if i > 2g − 2, by Corollary 13.10.4, and

1 = l(0) ≤ l(P) ≤ · · · ≤ l((2g − 1)P) = g.
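For divisors of large degree, Corollary 13.10.4 reduces l(D) to arithmetic. A minimal sketch (plain Python; the helper names are ours) combines it with the Plücker formula of Theorem 13.9.12:

```python
def genus_plane(m):
    """Genus of a nonsingular plane curve of degree m (Pluecker formula)."""
    return (m - 1) * (m - 2) // 2

def dim_L(deg_D, g):
    """l(D) for deg(D) > 2g - 2, by Corollary 13.10.4."""
    assert deg_D > 2 * g - 2
    return deg_D - g + 1

g_cubic = genus_plane(3)            # the cubic of Example 13.10.3: genus 1
print(g_cubic, dim_L(3, g_cubic))   # l(3Q) = 3 - 1 + 1 = 3
print(genus_plane(4))               # Klein quartic: genus 3
```

For the Klein quartic this also gives deg(W) = 2g − 2 = 4, in line with Corollary 13.10.2.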
If m ∈ ℕ₀, then m is a nongap of P if and only if there exists a rational function which has a pole of order m in P and no other poles. Hence, if m_1 and m_2 are nongaps of P, then m_1 + m_2 is also a nongap of P. The nongaps form the Weierstrass semigroup in ℕ₀. Let (ρ_i | i ∈ ℕ) be an enumeration of all the nongaps of P in increasing order, so ρ_1 = 0. Let f_i ∈ L(ρ_i P) be such that v_P(f_i) = −ρ_i for i ∈ ℕ. Then f_1, …, f_i provide a basis for the space L(ρ_i P). This will be the approach of Sections 3-7.

The term l(W − D) in Theorem 13.10.1 can be interpreted in terms of differentials. We introduce a generalization of Definition 13.8.8 for differentials.
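The statement "the number of gaps equals the genus" can be watched in action for the semigroups generated by q and q + 1, which occur later as Weierstrass semigroups of Hermitian-type curves of degree q + 1, so of Plücker genus q(q − 1)/2. The sketch is plain Python and the helper name is ours:

```python
def semigroup(gens, bound):
    """Elements of the numerical semigroup generated by gens, up to bound."""
    s = {0}
    for n in range(1, bound + 1):
        if any(n - g in s for g in gens):
            s.add(n)
    return s

for q in [2, 3, 4, 5]:
    genus = q * (q - 1) // 2
    s = semigroup({q, q + 1}, 2 * genus + 1)
    # Every integer >= 2g is a nongap, so all gaps lie in 1 .. 2g.
    gaps = [n for n in range(1, 2 * genus + 1) if n not in s]
    print(q, gaps)
    assert len(gaps) == genus       # number of gaps = genus
```

For q = 2 the semigroup is ⟨2, 3⟩ with the single gap 1, matching the genus-1 cubic of the earlier examples.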
Definition 13.10.7 Let D be a divisor on a curve X. We define

Ω(D) = {ω ∈ Ω(X) | (ω) − D ≥ 0}

and we denote the dimension of Ω(D) over F by δ(D), called the index of speciality of D. The connection with functions is established by the following theorem.

Theorem 13.10.8 δ(D) = l(W − D).

Proof. If W = (ω), we define a linear map φ : L(W − D) → Ω(D) by φ(f) = fω. This is clearly an isomorphism. □

Example 13.10.9 If we take D = 0, then by Definition 13.9.10 there are exactly g linearly independent regular differentials on a curve X. So the differential of Example 13.9.8 is the only regular differential on X (up to a constant factor), as was already observed after Theorem 13.9.12.
13.11 Codes from algebraic curves
We now come to the applications to coding theory. Our alphabet will be F_q. Let F be the algebraic closure of F_q. We shall apply the theorems of the previous sections. A few adaptations are necessary, since, for example, we consider for functions in the coordinate ring only those that have coefficients in F_q. If the affine curve X over F_q is defined by a prime ideal I in F_q[X_1, …, X_n], then its coordinate ring F_q[X] is by definition equal to F_q[X_1, …, X_n]/I and its function field F_q(X) is the quotient field of F_q[X]. It is always assumed that the curve is absolutely irreducible; this means that the defining ideal is also prime in F[X_1, …, X_n]. Similar adaptations are made for projective curves.

Notice that F(x_1, …, x_n)^q = F(x_1^q, …, x_n^q) for all F ∈ F_q[X_1, …, X_n]. So if (x_1, …, x_n) is a zero of F and F is defined over F_q, then (x_1^q, …, x_n^q) is also a zero of F. Let Fr : F → F be the Frobenius map defined by Fr(x) = x^q. We can extend this map coordinatewise to points in affine and projective space. If X is a curve defined over F_q and P is a point of X, then Fr(P) is also a point of X, by the above remark. A divisor D on X is called rational if the coefficients of P and Fr(P) in D are the same for any point P of X. The space L(D) will only be considered for rational divisors and is defined as before, but with the restriction of the rational functions to F_q(X). With these changes the stated theorems remain true over F_q, in particular the Riemann-Roch theorem 13.10.1.

Let X be an absolutely irreducible nonsingular projective curve over F_q. We shall define two kinds of algebraic geometry codes from X. The first kind generalizes Reed-Solomon codes, the second kind generalizes Goppa codes. In the following, P_1, P_2, …, P_n are rational points on X and D is the divisor P_1 + P_2 + · · · + P_n. Furthermore G is some other divisor that has support disjoint from D.
Although it is not necessary to do so, we shall make more restrictions on G, namely 2g − 2 < deg(G) < n.
Definition 13.11.1 The linear code C(D, G) of length n over F_q is the image of the linear map α : L(G) → F_q^n defined by α(f) = (f(P_1), f(P_2), …, f(P_n)). Codes of this kind are called geometric Reed-Solomon codes.

Theorem 13.11.2 The code C(D, G) has dimension k = deg(G) − g + 1 and minimum distance d ≥ n − deg(G).

Proof. (i) If f belongs to the kernel of α, then f ∈ L(G − D) and, by Theorem 13.8.9(i), this implies f = 0. The result follows from the assumption 2g − 2 < deg(G) < n and Corollary 13.10.4. (ii) If α(f) has weight d, then there are n − d points P_i, say P_{i_1}, P_{i_2}, …, P_{i_{n−d}}, for which f(P_{i_j}) = 0. Therefore f ∈ L(G − E), where E = P_{i_1} + · · · + P_{i_{n−d}}. Hence deg(G) − (n − d) ≥ 0. □

Note the analogy with the proof of Theorem ??.

Example 13.11.3 Let X be the projective line over F_{q^m}. Let n = q^m − 1. We define P_0 = (0 : 1), P_∞ = (1 : 0) and we define the divisor D as ∑_{j=1}^n P_j, where P_j = (β^j : 1), 1 ≤ j ≤ n. (Here β is a primitive n-th root of unity.) We define G = aP_0 + bP_∞, a ≥ 0, b ≥ 0. By Theorem 13.10.1, L(G) has dimension a + b + 1 and one immediately sees that the functions (x/y)^i, −a ≤ i ≤ b, form a basis of L(G). Consider the code C(D, G). A generator matrix for this code has as rows (β^i, β^{2i}, …, β^{ni}) with −a ≤ i ≤ b. One easily checks that (c_1, c_2, …, c_n) is a codeword in C(D, G) if and only if ∑_{j=1}^n c_j (β^l)^j = 0 for all l with a < l < n − b. It follows that C(D, G) is a Reed-Solomon code. The subfield subcode with coordinates in F_q is a BCH code.

Example 13.11.4 Let X be the curve of Examples ??, 13.8.5, 13.8.10 and 13.10.3. Let G = 3Q, where Q = (0 : 1 : 1). We take n = 8, so D is the sum of the remaining rational points. The coordinates are given by

      Q   P1  P2  P3  P4  P5  P6  P7  P8
  x   0   0   0   1   α   ᾱ   1   α   ᾱ
  y   1   α   ᾱ   0   0   0   1   1   1
  z   1   1   1   1   1   1   0   0   0

where ᾱ = α² = 1 + α. We saw in Examples 13.8.10 and 13.10.3 that 1, x/(y+z) and y/(y+z) are a basis of L(3Q) over F and hence also over F_4. This leads to the following generator matrix for C(D, G):

  ( 1  1  1  1  1  1  1  1 )
  ( 0  0  1  α  ᾱ  1  α  ᾱ )
  ( ᾱ  α  0  0  0  1  1  1 )

By Theorem 13.11.2, the minimum distance is at least 5 and of course, one immediately sees from the generator matrix that d = 5.

We now come to the second class of algebraic geometry codes. We shall call these codes geometric Goppa codes.

Definition 13.11.5 The linear code C*(D, G) of length n over F_q is the image of the linear map α* : Ω(G − D) → F_q^n defined by α*(η) = (Res_{P_1}(η), Res_{P_2}(η), …, Res_{P_n}(η)).

The parameters are given by the following theorem.
Theorem 13.11.6 The code C*(D, G) has dimension k* = n − deg(G) + g − 1 and minimum distance d* ≥ deg(G) − 2g + 2.

Proof. Just as in Theorem 13.11.2, these assertions are direct consequences of Theorem 13.10.1 (Riemann-Roch), using Theorem 13.10.8 (making the connection between the dimension of Ω(G) and l(W − G)) and Corollary 13.10.2 (stating that the degree of a canonical divisor is 2g − 2). □

Example 13.11.7 Let L = {α_1, …, α_n} be a set of n distinct elements of F_{q^m}. Let g be a polynomial in F_{q^m}[X] which is not zero at α_i for all i. The (classical) Goppa code Γ(L, g) is defined by

Γ(L, g) = {c ∈ F_q^n | ∑_i c_i/(X − α_i) ≡ 0 (mod g)}.

Let P_i = (α_i : 1), Q = (1 : 0) and D = P_1 + · · · + P_n. If we take for E the divisor of zeros of g on the projective line, then Γ(L, g) = C*(D, E − Q) and

c ∈ Γ(L, g) if and only if ∑_i c_i/(X − α_i) dX ∈ Ω(E − Q − D).

This is the reason that some authors extend the definition of geometric Goppa codes to subfield subcodes of codes of the form C*(D, G). It is a well-known fact that the parity check matrix of the Goppa code Γ(L, g) is equal to the following generator matrix of a generalized RS code:

  (    g(α_1)^{−1}          . . .       g(α_n)^{−1}       )
  (    α_1 g(α_1)^{−1}      . . .     α_n g(α_n)^{−1}     )
  (         ⋮                ⋱              ⋮             )
  ( α_1^{r−1} g(α_1)^{−1}   . . .   α_n^{r−1} g(α_n)^{−1} )

where r is the degree of the Goppa polynomial g. So Γ(L, g) is the subfield subcode of the dual of a generalized RS code. This is a special case of the following theorem.

Theorem 13.11.8 The codes C(D, G) and C*(D, G) are dual codes.

Proof. From Theorem 13.11.2 and Theorem 13.11.6 we know that k + k* = n. So it suffices to take a word from each code and show that the inner product of the two words is 0. Let f ∈ L(G), η ∈ Ω(G − D). By Definitions 13.11.1 and 13.11.5, the differential fη has no poles except possibly poles of order 1 in the points P_1, P_2, …, P_n. The residue of fη in P_i is equal to f(P_i) Res_{P_i}(η).
By Theorem 13.9.16, the sum of the residues of fη over all the poles, that is to say, over the points P_i, is equal to zero. Hence we have

0 = ∑_{i=1}^n f(P_i) Res_{P_i}(η) = ⟨α(f), α*(η)⟩. □
Several authors prefer the codes C*(D, G) over geometric RS codes, but the non-experts in algebraic geometry probably feel more at home with polynomials than with differentials. That this is possible without loss of generality is stated in the following theorem.

Theorem 13.11.9 Let X be a curve defined over F_q. Let P_1, …, P_n be n rational points on X. Let D = P_1 + · · · + P_n. Then there exists a differential form ω with simple poles at the P_i such that Res_{P_i}(ω) = 1 for all i. Furthermore C*(D, G) = C(D, W + D − G) for all divisors G that have a support disjoint from the support of D, where W is the divisor of ω.

So one can do without differentials and the codes C*(D, G). However, it is useful to have both classes when treating decoding methods. These use parity checks, so one needs a generator matrix for the dual code. In the next paragraph we treat several examples of algebraic geometry codes. It is already clear that we find some good codes. For example, from Theorem 13.11.2 we see that such codes over a curve of genus 0 (the projective line) are MDS codes. In fact, Theorem 13.11.2 says that d ≥ n − k + 1 − g, so if g is small, we are close to the Singleton bound.
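The genus-zero claim is easy to test numerically: over the projective line, evaluating the polynomials of degree less than k gives a Reed-Solomon code. A sketch (plain Python; the parameters n = 7, k = 3 over F₈ are our choice) finds d = n − k + 1:

```python
from itertools import product

# GF(8) with modulus x^3 + x + 1; elements 0..7, addition is XOR.
def mul(x, y):
    p = 0
    while y:
        if y & 1:
            p ^= x
        x <<= 1
        if x & 8:
            x ^= 0b1011
        y >>= 1
    return p

pts = list(range(1, 8))        # the n = 7 nonzero elements of GF(8)
n, k = len(pts), 3

wts = []
for coeffs in product(range(8), repeat=k):      # c0 + c1*x + c2*x^2
    if not any(coeffs):
        continue
    word = []
    for x in pts:
        y, xp = 0, 1
        for c in coeffs:                        # Horner-free evaluation
            y ^= mul(c, xp)
            xp = mul(xp, x)
        word.append(y)
    wts.append(sum(1 for y in word if y))

print(min(wts))                # 5 = n - k + 1, so the code is MDS
```

The minimum weight is 5 because a nonzero polynomial of degree at most 2 has at most two zeros among the seven evaluation points.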
13.12 Rational functions and divisors on plane curves
This section will be finished together with the correction of Section 7: rational cycles, Frobenius, divisors, rational functions, discrete valuation, discrete valuation ring.

Example 13.12.1 Consider the curve X with homogeneous equation X^a Y^c + Y^{b+c} Z^{a−b} + X^d Z^{a+c−d} = 0 with d < b < a as in Example 13.3.10. The divisor of the rational function x/z is

(x/z) = (X·L) − (X·N) = (b + c)P − bQ − cR.

The divisor of the rational function y/z is

(y/z) = (X·M) − (X·N) = dP − aQ − (a − d)R.

Hence the divisor of (x/z)^α (y/z)^β is

((b + c)α + dβ)P + (−bα − aβ)Q + (−cα + (a − d)β)R.

It has only a pole at Q if and only if cα ≤ (a − d)β. (This will serve as a motivation for the choice of the basis of R in Proposition ??.)
13.13 Resolution or normalization of curves
13.14 Newton polygon of plane curves
[?]
13.15 Notes
Goppa submitted his seminal paper [?] in June 1975 and it was published in 1977. Goppa also published three more papers in the eighties [?, ?, ?] and a book [?] in 1991. Most of this section is standard textbook material. See for instance [?, ?, ?, ?] to mention a few. Section 13.4 is a special case of Goppa’s construction and comes from [?]. The Hermitian curves in Example 13.2.4 and their codes have been studied by many authors. See [?, ?, ?, ?]. The Klein curve goes back to F. Klein [?] and has been studied thoroughly, also over finite fields in connection with codes. See [?, ?, ?, ?, ?, ?, ?, ?].
´ 426CHAPTER 13. BEZOUT’S THEOREM AND CODES ON PLANE CURVES
Chapter 14
Curves
14.1 Algebraic varieties
14.2 Curves
14.3 Curves and function fields
14.4 Normal rational curves and Segre's problems
14.5 The number of rational points
14.5.1 Zeta function
14.5.2 Hasse-Weil bound
14.5.3 Serre's bound
14.5.4 Ihara's bound
14.5.5 Drinfeld-Vlăduţ bound
14.5.6 Explicit formulas
14.5.7 Oesterlé's bound
14.6 Trace codes and curves
14.7 Good curves
14.7.1 Maximal curves
14.7.2 Shimura modular curves
14.7.3 Drinfeld modular curves
14.7.4 Tsfasman-Vlăduţ-Zink bound
14.7.5 Towers of Garcia-Stichtenoth
14.8 Applications of AG codes
14.8.1 McEliece crypto system with AG codes
14.8.2 Authentication codes
Here we consider an application of AG-codes to authentication. Recall that in Chapter 10, Section 10.3.1 we started to consider authentication codes that are constructed via almost universal and almost strongly universal hash functions. They, in turn, can be constructed using error-correcting codes. We recall two methods of constructing authentication codes (almost strongly universal hash families, to be precise) from error-correcting codes:

1. Construct AU-families from codes as per Proposition 10.3.7 and then use Stinson's composition method, Theorem 10.3.10.
2. Construct ASU-families directly from error-correcting codes.

As an example we mentioned ASU-families constructed as in (1.) using Reed-Solomon codes, Exercise 10.3.2. Now we would like to move on and present a general construction of almost universal hash functions that employs AG-codes. The following proposition formulates the result we need.

Proposition 14.8.1 Let C be an algebraic curve over F_q with N + 1 rational points P_0, P_1, …, P_N. Fix P = P_i for some i = 0, …, N and let WS(P) = {0, w_1, w_2, …} be the Weierstraß semigroup of P. Then for each j ≥ 1 one can construct an ε-almost universal hash family ε-U(N, q^j, q), where ε ≤ w_j/N.

Proof. Indeed, construct an AG-code C = C_L(D, w_j P), where the divisor D is defined as D = ∑_{k≠i} P_k and P = P_i. So C is obtained as an image of the evaluation map for the functions that have a pole only at P, with order bounded by w_j. From ?? we have that the length of C is N, dim C = dim L(w_j P) = j, and d(C) ≥ N − deg(w_j P) = N − w_j. So 1 − d(C)/N ≤ w_j/N and now the claim easily follows. □

As an example of this proposition, we show next how one can obtain AU-families from Hermitian curves.

Proposition 14.8.2 For every prime power q and every i ≤ q, the Hermitian curve y^q + y = x^{q+1} over F_{q²} yields an (i/q²)-U(q³, q^{i²+i}, q²).
Proof. Recall from ?? that the Hermitian curve over F_{q²} has q³ + 1 rational points P_1, …, P_{q³}, P_∞. Construct C = C_L(D, w_j P), where P = P_∞ is the place at infinity, D = ∑_{k=1}^{q³} P_k, and WS(P) = {0, w_1, w_2, …}. It is known that the Weierstraß semigroup WS(P) is generated by q and q + 1.

Let us show that w_{i(i+1)/2} = iq for all i ≤ q. We proceed by induction. For i = 1 we have w_1 = q, which is obviously true. Then suppose that for some i ≥ 1 we have w_{i(i−1)/2} = (i − 1)q and want to prove w_{i(i+1)/2} = iq. Clearly, for this we need to show that there are exactly i − 1 nongaps strictly between (i − 1)q and iq (these numbers themselves are not included in the count). So for the nongaps aq + b(q + 1) that lie between (i − 1)q and iq we have (i − 1)q < aq + b(q + 1) < iq. Thus, automatically, a < i. We then have

(i − a − 1) q/(q + 1) < b < (i − a) q/(q + 1).   (14.1)

So from here we see that 0 ≤ a ≤ i − 2, because for a = i − 1 we would have b < q/(q + 1), which is not possible. So there are i − 1 values of a, namely 0, …, i − 2, which could give rise to a nongap. The interval of (14.1) has length q/(q + 1) < 1, so it contains at most one integer. If i − a < q + 1, then (i − a − 1)q/(q + 1) < i − a − 1 < (i − a)q/(q + 1), and thus the integer i − a − 1 lies in that interval whenever i − a < q + 1. But for 0 ≤ a ≤ i − 2, the condition i − a < q + 1 is always fulfilled, since i ≤ q by assumption. Thus for every 0 ≤ a ≤ i − 2 there exists exactly one b = i − a − 1 such that aq + b(q + 1) lies between (i − 1)q and iq. It is also easily seen that all these nongaps are different. So, indeed, w_{i(i+1)/2} = iq for all i ≤ q. Now the claim follows from Proposition 14.8.1 with j = i(i + 1)/2: then ε ≤ w_j/N = iq/q³ = i/q², and over the alphabet F_{q²} the message space has size (q²)^j = q^{i²+i}. □

As a consequence we have

Corollary 14.8.3 Let a, b be positive integers such that b ≤ a ≤ 2b and q^a is a square. Then there exists a (2/q^b)-ASU(q^{5a/2+b}, q^{a q^{2(a−b)}/2}, q^b).
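The two counting claims in the proof, that there are i − 1 nongaps strictly between (i − 1)q and iq and that the nongap of index i(i + 1)/2 equals iq, can be verified mechanically. A sketch in plain Python (helper names ours):

```python
def semigroup(q, bound):
    """Elements of the numerical semigroup <q, q+1>, up to bound."""
    s = {0}
    for n in range(1, bound + 1):
        if (n - q) in s or (n - q - 1) in s:
            s.add(n)
    return s

for q in [3, 4, 5, 7]:
    s = semigroup(q, q * q + q)
    w = sorted(x for x in s if x > 0)            # w_1 = q, w_2 = q + 1, ...
    for i in range(1, q + 1):
        between = [x for x in s if (i - 1) * q < x < i * q]
        assert len(between) == i - 1             # i - 1 nongaps strictly between
        assert w[i * (i + 1) // 2 - 1] == i * q  # the i(i+1)/2-th nongap is iq
print("verified")
```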
Proof. Do the "Hermitian" construction from the previous proposition over F_{q^a} with i = q^{a−b}. Then the claim follows from Theorem 10.3.10 and Exercise 10.3.2. □

*** Suzuki curves? ***

To get some feeling for all this, the reader is advised to solve Exercise 14.8.1. Now we move to (2.). We would like to show the direct construction of Xing et al. ?? that uses AG-codes.

Theorem 14.8.4 Let C be an algebraic curve over F_q of genus g and let R be some set of rational points of C. Let G be a positive divisor such that |R| > deg(G) ≥ 2g + 1 and R ∩ supp(G) = ∅. Then there exists an ε-ASU(N, n, m) with N = q|R|, n = q^{deg(G)−g+1}, m = q, and ε = deg(G)/|R|.

Proof. Consider the set

H = {h_{(P,α)} : L(G) → F_q | h_{(P,α)}(f) = f(P) + α, (P, α) ∈ R × F_q}.

Take H as the functions in the definition of an ASU-family; set X = L(G), Y = F_q. Then |X| = q^{dim L(G)} = q^{deg(G)−g+1}, because deg(G) ≥ 2g + 1 > 2g − 1. It can be shown (see Exercise 14.8.2) that if deg(G) ≥ 2g + 1, then |H| = q|R|. It is also easy to see that for any a ∈ L(G) and any b ∈ F_q there exist exactly |R| = |H|/q functions from H that map a to b. This proves the first part of being ASU. As to the second part, consider

m = max_{a_1≠a_2 ∈ L(G); b_1,b_2 ∈ F_q} |{h_{(P,α)} ∈ H : h_{(P,α)}(a_1) = b_1, h_{(P,α)}(a_2) = b_2}|
  = max_{a_1≠a_2 ∈ L(G); b_1,b_2 ∈ F_q} |{(P, α) ∈ R × F_q : (a_1 − a_2 − b_1 + b_2)(P) = 0, a_2(P) + α = b_2}|.
As a_1 − a_2 ∈ L(G)\{0} and b_1 − b_2 ∈ F_q, we see that a_1 − a_2 − b_1 + b_2 ∈ L(G)\{0}. Note that there cannot be more than deg(G) zeros of a_1 − a_2 − b_1 + b_2 among the points in R (cf. ??). Since α in (P, α) is uniquely determined by P ∈ R, we see that there are at most deg(G) pairs (P, α) ∈ R × F_q that satisfy both (a_1 − a_2 − b_1 + b_2)(P) = 0 and a_2(P) + α = b_2. In other words,

m ≤ deg(G) = (deg(G) · |H|) / (|R| · q).

We can now take ε = deg(G)/|R| in Definition 10.3.8. □
Again we present here a concrete result coming from Hermitian codes.

Corollary 14.8.5 Let q be a prime power and let an integer q³ > d ≥ q(q − 1) + 1 be given. Then there exists a (d/q³)-ASU(q⁵, (q²)^{d−q(q−1)/2+1}, q²).

Proof. Consider again the Hermitian curve over F_{q²}. Take any rational point P and construct C = C_L(D, G), where D = ∑_{P′≠P} P′ is the sum of all remaining rational points (there are q³ of them), and G = dP. Then the claim follows directly from the previous theorem. □

For a numerical example we refer again to Exercise 14.8.1.
14.8.3 Fast multiplication in finite fields
14.8.4 Correlation sequences and pseudo random sequences
14.8.5 Quantum codes
14.8.6 Exercises
14.8.1 Suppose we would like to obtain an authentication code with P_S = 2^{−20} ≥ P_I and log |S| ≥ 2^{34}. Give the parameters of such an authentication code using the following constructions. Compare the results.
• OA-construction as per Theorem 10.3.5.
• RS-construction as per Exercise 10.3.2.
• Hermitian construction as per Corollary 14.8.3.
• Hermitian construction as per Corollary 14.8.5.

14.8.2 Let H = {h_{(P,α)} : L(G) → F_q | h_{(P,α)}(f) = f(P) + α, (P, α) ∈ R × F_q} as in the proof of Theorem 14.8.4. Prove that if deg(G) ≥ 2g + 1, then |H| = q|R|.
14.9 Notes
Bibliography

[1]
[2]
[3] N. Abramson. Information theory and coding. McGraw-Hill, New York, 1963.
[4] A.V. Aho, J.E. Hopcroft, and J.D. Ullman. The design and analysis of computer algorithms. Addison-Wesley, Reading, 1979.
[5] M. Aigner. Combinatorial theory. Springer, New York, 1979.
[6] A. Ashikhmin and A. Barg. Minimal vectors in linear codes. IEEE Transactions on Information Theory, 44(5):2010–2017, 1998.
[7] C.A. Athanasiadis. Characteristic polynomials of subspace arrangements and finite fields. Advances in Mathematics, 122:193–233, 1996.
[8] A. Barg. The matroid of supports of a linear code. AAECC, 8:165–172, 1997.
[9] A. Barg. Complexity issues in coding theory. In V.S. Pless and W.C. Huffman, editors, Handbook of coding theory, volume 1, pages 649–754. North-Holland, Amsterdam, 1998.
[10] E.R. Berlekamp. Key papers in the development of coding theory. IEEE Press, New York, 1974.
[11] E.R. Berlekamp. Algebraic coding theory. Aegean Park Press, Laguna Hills, 1984.
[12] D.J. Bernstein, J. Buchmann, and E. Dahmen, editors. Post-Quantum Cryptography. Springer-Verlag, Berlin Heidelberg, 2009.
[13] J. Bierbrauer, T. Johansson, G. Kabatianskii, and B. Smeets. On families of hash functions via geometric codes and concatenation. In Advances in Cryptology – CRYPTO '93. Lecture Notes in Computer Science, volume 773, pages 331–342, 1994.
[14] N. Biggs. Algebraic graph theory. Cambridge University Press, Cambridge, 1993.
[15] E. Biham and A. Shamir. Differential cryptanalysis of DES-like cryptosystems. In Advances in Cryptology – CRYPTO '90. Lecture Notes in Computer Science, volume 537, pages 2–21, 1990.
[16] G. Birkhoff. On the number of ways of coloring a map. Proc. Edinburgh Math. Soc., 2:83–91, 1930.
[17] A. Björner and T. Ekedahl. Subarrangements over finite fields: Cohomological and enumerative aspects. Advances in Mathematics, 129:159–187, 1997.
[18] J.E. Blackburn, N.H. Crapo, and D.A. Higgs. A catalogue of combinatorial geometries. Math. Comp., 27:155–166, 1973.
[19] R.E. Blahut. Theory and practice of error control codes. Addison-Wesley, Reading, 1983.
[20] R.E. Blahut. Algebraic codes for data transmission. Cambridge University Press, Cambridge, 2003.
[21] I.F. Blake. Algebraic coding theory: History and development. Dowden, Hutchinson and Ross, Stroudsburg, 1973.
[22] G.R. Blakley. Safeguarding cryptographic keys. In Proceedings of the 1979 National Computer Conference, pages 313–317, New York, 1979.
[23] G.R. Blakley and C. Meadows. Security of ramp schemes. In Advances in Cryptology – CRYPTO '84. Lecture Notes in Computer Science, volume 196, pages 242–268, 1985.
[24] A. Blass and B.E. Sagan. Möbius functions of lattices. Advances in Mathematics, 129:94–123, 1997.
[25] T. Britz. MacWilliams identities and matroid polynomials. The Electronic Journal of Combinatorics, 9:R19, 2002.
[26] T. Britz. Relations, matroids and codes. PhD thesis, Univ. Aarhus, 2002.
[27] T. Britz. Extensions of the critical theorem. Discrete Mathematics, 305:55–73, 2005.
[28] T. Britz. Higher support matroids. Discrete Mathematics, 307:2300–2308, 2007.
[29] T. Britz and C.G. Rutherford. Covering radii are not matroid invariants. Discrete Mathematics, 296:117–120, 2005.
[30] T. Britz and K. Shiromoto. A MacWilliams type identity for matroids. Discrete Mathematics, 308:4551–4559, 2008.
[31] T. Brylawski. A decomposition for combinatorial geometries. Transactions of the American Mathematical Society, 171:235–282, 1972.
[32] T. Brylawski and J. Oxley. Intersection theory for embeddings of matroids into uniform geometries. Stud. Appl. Math., 61:211–244, 1979.
[33] T. Brylawski and J. Oxley. Several identities for the characteristic polynomial of a combinatorial geometry. Discrete Mathematics, 31(2):161–170, 1980.
[34] T.H. Brylawski and J.G. Oxley. The Tutte polynomial and its applications. In N. White, editor, Matroid Applications. Cambridge University Press, Cambridge, 1992.
[35] J. Buchmann. Introduction to Cryptography. Springer, Berlin, 2004.
[36] J.P. Buhler, H.W. Lenstra Jr., and C. Pomerance. Factoring integers with the number field sieve. In A.K. Lenstra and H.W. Lenstra Jr., editors, The development of the number field sieve. Lecture Notes in Mathematics, volume 1554, pages 50–94. Springer, Berlin, 1993.
[37] L. Carlitz. The arithmetic of polynomials in a Galois field. American Journal of Mathematics, 54:39–50, 1932.
[38] P. Cartier. Les arrangements d'hyperplans: un chapitre de géométrie combinatoire. Séminaire N. Bourbaki, 561:1–22, 1981.
[39] H. Chen and R. Cramer. Algebraic geometric secret sharing schemes and secure multi-party computations over small fields. In C. Dwork, editor, Advances in Cryptology – CRYPTO 2006. Lecture Notes in Computer Science, volume 4117, pages 521–536. Springer, Berlin, 2006.
[40] C. Cid and H. Gilbert. AES security report, ECRYPT, IST-2002-507932. Available online at http://www.ecrypt.eu.org/ecrypt1/documents/D.STVL.21.0.pdf.
[41] H. Cohen and G. Frey. Handbook of elliptic and hyperelliptic curve cryptography. CRC Press, Boca Raton, 2006.
[42] H. Crapo. Möbius inversion in lattices. Archiv der Mathematik, 19:595–607, 1968.
[43] H. Crapo. The Tutte polynomial. Aequationes Math., 3:211–229, 1969.
[44] H. Crapo and G.C. Rota. On the foundations of combinatorial theory: Combinatorial geometries. MIT Press, Cambridge MA, 1970.
[45] J. Daemen and V. Rijmen. The design of Rijndael. Springer, Berlin, 2002.
[46] J. Daemen and V. Rijmen. The wide trail design strategy. In B. Honary, editor, Cryptography and Coding 2001. Lecture Notes in Computer Science, volume 2260, pages 222–238. Springer, Berlin, 2001.
[47] W. Diffie. The first ten years of public key cryptography. In G.J. Simmons, editor, Contemporary Cryptology: The Science of Information Integrity, pages 135–176. IEEE Press, Piscataway, 1992.
[48] W. Diffie and M.E. Hellman. New directions in cryptography. IEEE Trans. Inform. Theory, 22:644–654, 1976.
[49] J. Ding, J.E. Gower, and D.S. Schmidt. Multivariate Public Key Cryptosystems. Advances in Information Security. Springer Science+Business Media, LLC, 2006.
[50] J.L. Dornstetter. On the equivalence of the Berlekamp-Massey and the Euclidean algorithms. IEEE Trans. Inform. Theory, 33:428–431, 1987.
[51] W.M.B. Dukes. On the number of matroids on a finite set. Séminaire Lotharingien de Combinatoire, 51, 2004.
[52] I. Duursma. Algebraic geometry codes: general theory. In E. Martínez-Moro, C. Munuera, and D. Ruano, editors, Advances in algebraic geometry codes, pages 1–48. World Scientific, New Jersey, 2008.
[53] I.M. Duursma. Decoding codes from curves and cyclic codes. PhD thesis, Eindhoven University of Technology, 1993.
[54] I.M. Duursma and R. Kötter. Error-locating pairs for cyclic codes. IEEE Trans. Inform. Theory, 40:1108–1121, 1994.
[55] T. ElGamal. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Trans. Inform. Theory, 31:469–472, 1985.
[56] G. Etienne and M. Las Vergnas. Computing the Tutte polynomial of a hyperplane arrangement. Advances in Applied Mathematics, 32:198–211, 2004.
[57] L. Euler. Solutio problematis ad geometriam situs pertinentis. Commentarii Academiae Scientiarum Imperialis Petropolitanae, 8:128–140, 1736.
[58] E.N. Gilbert, F.J. MacWilliams, and N.J.A. Sloane. Codes which detect deception. Bell Syst. Tech. J., 53(3):405–424, 1974.
[59] C. Greene. Weight enumeration and the geometry of linear codes. Studies in Applied Mathematics, 55:119–128, 1976.
[60] C. Greene and T. Zaslavsky. On the interpretation of Whitney numbers through arrangements of hyperplanes, zonotopes, non-Radon partitions and orientations of graphs. Trans. Amer. Math. Soc., 280:97–126, 1983.
[61] R.W. Hamming. Error detecting and error correcting codes. Bell System Techn. Journal, 29:147–160, 1950.
[62] R.W. Hamming. Coding and Information Theory. Prentice-Hall, New Jersey, 1980.
[63] T. Helleseth, T. Kløve, and J. Mykkeltveit. The weight distribution of irreducible cyclic codes with block lengths n1((q^l − 1)/N). Discrete Mathematics, 18:179–211, 1977.
[64] M. Hermelin and K. Nyberg. Correlation properties of the Bluetooth combiner generator. In Dan Boneh, editor, Information Security and Cryptology – ICISC 1999. Lecture Notes in Computer Science, volume 1787, pages 17–29. Springer, Berlin, 2000.
[65] A.E. Heydtmann and J.M. Jensen. On the equivalence of the Berlekamp-Massey and the Euclidean algorithm for decoding. IEEE Trans. Inform. Theory, 46:2614–2624, 2000.
[66] L.J. Hoff