Upgrade one-true-awk to 20240311

This project was upgraded with external_updater, moving from upstream
version 20231124 to 20240311.
Usage: tools/external_updater/updater.sh update external/one-true-awk
For more info, check https://cs.android.com/android/platform/superproject/+/main:tools/external_updater/README.md

Test: TreeHugger
Change-Id: Ieee9fe2c62ac2df29085eff57a792f6d0c71e32f
diff --git a/FIXES b/FIXES
index 52f49e3..33a36fc 100644
--- a/FIXES
+++ b/FIXES
@@ -25,10 +25,35 @@
 This file lists all bug fixes, changes, etc., made since the 
 second edition of the AWK book was published in September 2023.
 
+Mar 10, 2024:
+	fixed use-after-free bug in fnematch due to adjbuf invalidating
+	the pointers to buf. thanks to github user caffe3 for spotting
+	the issue and providing a fix, and to Miguel Pineiro Jr.
+	for the alternative fix.
+	MAX_UTF_BYTES in fnematch has been replaced with awk_mb_cur_max.
+	thanks to Miguel Pineiro Jr.
+
+Jan 22, 2024:
+	Restore the ability to compile with g++. Thanks to
+	Arnold Robbins.
+
+Dec 24, 2023:
+	Matchop dereference after free problem fix when the first
+	argument is a function call. Thanks to Oguz Ismail Uysal.
+	Fix inconsistent handling of --csv and FS set in the
+	command line. Thanks to Wilbert van der Poel.
+	Casting changes to int for is* functions. 
+
+Nov 27, 2023:
+	Fix exit status of system on MacOS. Update to REGRESS.
+	Thanks to Arnold Robbins. 
+	Fix inconsistent handling of -F and --csv, and loss of csv
+	mode when FS is set. 
+	
 Nov 24, 2023:
         Fix issue #199: gototab improvements to dynamically resize the
         table, qsort and bsearch to improve the lookup speed as the
-        table gets larger for multibyte input. thanks to Arnold Robbins.
+        table gets larger for multibyte input. Thanks to Arnold Robbins.
 
 Nov 23, 2023:
 	Fix Issue #169, related to escape sequences in strings.
@@ -37,29 +62,29 @@
 	by Miguel Pineiro Jr.
 
 Nov 20, 2023:
-	rewrite of fnematch to fix a number of issues, including
+	Rewrite of fnematch to fix a number of issues, including
 	extraneous output, out-of-bounds access, number of bytes
 	to push back after a failed match etc.
-	thanks to Miguel Pineiro Jr.
+	Thanks to Miguel Pineiro Jr.
 
 Nov 15, 2023:
-	Man page edit, regression test fixes. thanks to Arnold Robbins
-	consolidation of sub and gsub into dosub, removing duplicate
-	code. thanks to Miguel Pineiro Jr.
+	Man page edit, regression test fixes. Thanks to Arnold Robbins
+	Consolidation of sub and gsub into dosub, removing duplicate
+	code. Thanks to Miguel Pineiro Jr.
 	gcc replaced with cc everywhere.
 
 Oct 30, 2023:
-	multiple fixes and a minor code cleanup.
-	disabled utf-8 for non-multibyte locales, such as C or POSIX.
-	fixed a bad char * cast that causes incorrect results on big-endian
-	systems. also fixed an out-of-bounds read for empty CCL.
-	fixed a buffer overflow in substr with utf-8 strings.
-	many thanks to Todd C Miller.
+	Multiple fixes and a minor code cleanup.
+	Disabled utf-8 for non-multibyte locales, such as C or POSIX.
+	Fixed a bad char * cast that causes incorrect results on big-endian
+	systems. Also fixed an out-of-bounds read for empty CCL.
+	Fixed a buffer overflow in substr with utf-8 strings.
+	Many thanks to Todd C Miller.
 
 Sep 24, 2023:
 	fnematch and getrune have been overhauled to solve issues around
-	unicode FS and RS. also fixed gsub null match issue with unicode.
-	big thanks to Arnold Robbins.
+	unicode FS and RS. Also fixed gsub null match issue with unicode.
+	Big thanks to Arnold Robbins.
 
 Sep 12, 2023:
 	Fixed a length error in u8_byte2char that set RSTART to
@@ -84,9 +109,8 @@
 	of a string of 3 emojis is 3, not 12 as it would be if bytes
 	were counted.
 
-	Regular expressions are processes as UTF-8.
+	Regular expressions are processed as UTF-8.
 
 	Unicode literals can be written as \u followed by one
 	to eight hexadecimal digits.  These may appear in strings and
 	regular expressions.
-
diff --git a/METADATA b/METADATA
index 2b83084..2c9a96a 100644
--- a/METADATA
+++ b/METADATA
@@ -1,19 +1,19 @@
 # This project was upgraded with external_updater.
-# Usage: tools/external_updater/updater.sh update one-true-awk
+# Usage: tools/external_updater/updater.sh update external/one-true-awk
 # For more info, check https://cs.android.com/android/platform/superproject/+/main:tools/external_updater/README.md
 
 name: "one-true-awk"
 description: "This is the version of awk described in \'The AWK Programming Language\', by Al Aho, Brian Kernighan, and Peter Weinberger (Addison-Wesley, 1988, ISBN 0-201-07981-X)."
 third_party {
-  url {
-    type: GIT
-    value: "https://github.com/onetrueawk/awk.git"
-  }
-  version: "fbd1d5b712e27a9bb527e39ed6e9bf3b9afbb1df"
   license_type: NOTICE
   last_upgrade_date {
-    year: 2023
-    month: 11
-    day: 27
+    year: 2024
+    month: 4
+    day: 9
+  }
+  identifier {
+    type: "Git"
+    value: "https://github.com/onetrueawk/awk.git"
+    version: "20240311"
   }
 }
diff --git a/README.md b/README.md
index 84fb06e..a41fb3c 100644
--- a/README.md
+++ b/README.md
@@ -27,6 +27,7 @@
 The option `--csv` turns on CSV processing of input:
 fields are separated by commas, fields may be quoted with
 double-quote (`"`) characters, quoted fields may contain embedded newlines.
+Double-quotes in fields have to be doubled and enclosed in quoted fields.
 In CSV mode, `FS` is ignored.
 
 If no explicit separator argument is provided,
@@ -117,6 +118,8 @@
 
 If your system does not have `yacc` or `bison` (the GNU
 equivalent), you need to install one of them first.
+The default in the `makefile` is `bison`; you will have
+to edit the `makefile` to use `yacc`.
 
 NOTE: This version uses ISO/IEC C99, as you should also.  We have
 compiled this without any changes using `gcc -Wall` and/or local C
@@ -143,4 +146,4 @@
 
 #### Last Updated
 
-Mon 16 Oct 2023 11:23:08 IDT
+Mon 05 Feb 2024 08:46:55 IST
diff --git a/b.c b/b.c
index 881c052..870eecf 100644
--- a/b.c
+++ b/b.c
@@ -116,7 +116,7 @@
 static int get_gototab(fa*, int, int);
 static int set_gototab(fa*, int, int, int);
 static void clear_gototab(fa*, int);
-extern int u8_rune(int *, const uschar *);
+extern int u8_rune(int *, const char *);
 
 static int *
 intalloc(size_t n, const char *f)
@@ -346,7 +346,7 @@
 	int i;
 
 	for (i = 0, p = *pp; i < max && isxdigit(*p); i++, p++) {
-		if (isdigit(*p))
+		if (isdigit((int) *p))
 			n = 16 * n + *p - '0';
 		else if (*p >= 'a' && *p <= 'f')
 			n = 16 * n + *p - 'a' + 10;
@@ -416,7 +416,7 @@
 		FATAL("out of space for character class [%.10s...] 1", p);
 	bp = buf;
 	for (i = 0; *p != 0; ) {
-		n = u8_rune(&c, p);
+		n = u8_rune(&c, (const char *) p);
 		p += n;
 		if (c == '\\') {
 			c = quoted(&p);
@@ -424,7 +424,7 @@
 			if (*p != 0) {
 				c = bp[-1];
 				/* c2 = *p++; */
-				n = u8_rune(&c2, p);
+				n = u8_rune(&c2, (const char *) p);
 				p += n;
 				if (c2 == '\\')
 					c2 = quoted(&p); /* BUG: sets p, has to be u8 size */
@@ -607,18 +607,18 @@
 	size_t orig_size = f->gototab[state].allocated;		// 2nd half of new mem is this size
 	memset(p + orig_size, 0, orig_size * sizeof(gtte));	// clean it out
 
-	f->gototab[state].allocated = new_size;			// update gotottab info
+	f->gototab[state].allocated = new_size;			// update gototab info
 	f->gototab[state].entries = p;
 }
 
-static int get_gototab(fa *f, int state, int ch) /* hide gototab inplementation */
+static int get_gototab(fa *f, int state, int ch) /* hide gototab implementation */
 {
 	gtte key;
 	gtte *item;
 
 	key.ch = ch;
 	key.state = 0;	/* irrelevant */
-	item = bsearch(& key, f->gototab[state].entries,
+	item = (gtte *) bsearch(& key, f->gototab[state].entries,
 			f->gototab[state].inuse, sizeof(gtte),
 			entry_cmp);
 
@@ -638,7 +638,7 @@
 	return left->ch - right->ch;
 }
 
-static int set_gototab(fa *f, int state, int ch, int val) /* hide gototab inplementation */
+static int set_gototab(fa *f, int state, int ch, int val) /* hide gototab implementation */
 {
 	if (f->gototab[state].inuse == 0) {
 		f->gototab[state].entries[0].ch = ch;
@@ -662,7 +662,7 @@
 
 		key.ch = ch;
 		key.state = 0;	/* irrelevant */
-		item = bsearch(& key, f->gototab[state].entries,
+		item = (gtte *) bsearch(& key, f->gototab[state].entries,
 				f->gototab[state].inuse, sizeof(gtte),
 				entry_cmp);
 
@@ -710,7 +710,7 @@
 		return(1);
 	do {
 		/* assert(*p < NCHARS); */
-		n = u8_rune(&rune, p);
+		n = u8_rune(&rune, (const char *) p);
 		if ((ns = get_gototab(f, s, rune)) != 0)
 			s = ns;
 		else
@@ -743,7 +743,7 @@
 			if (f->out[s])		/* final state */
 				patlen = q-p;
 			/* assert(*q < NCHARS); */
-			n = u8_rune(&rune, q);
+			n = u8_rune(&rune, (const char *) q);
 			if ((ns = get_gototab(f, s, rune)) != 0)
 				s = ns;
 			else
@@ -774,7 +774,7 @@
 		s = 2;
 		if (*p == 0)
 			break;
-		n = u8_rune(&rune, p);
+		n = u8_rune(&rune, (const char *) p);
 		p += n;
 	} while (1); /* was *p++ */
 	return (0);
@@ -799,7 +799,7 @@
 			if (f->out[s])		/* final state */
 				patlen = q-p;
 			/* assert(*q < NCHARS); */
-			n = u8_rune(&rune, q);
+			n = u8_rune(&rune, (const char *) q);
 			if ((ns = get_gototab(f, s, rune)) != 0)
 				s = ns;
 			else
@@ -830,8 +830,6 @@
 }
 
 
-#define MAX_UTF_BYTES	4	// UTF-8 is up to 4 bytes long
-
 /*
  * NAME
  *     fnematch
@@ -868,16 +866,28 @@
 
 	do {
 		/*
-		 * Call u8_rune with at least MAX_UTF_BYTES ahead in
+		 * Call u8_rune with at least awk_mb_cur_max ahead in
 		 * the buffer until EOF interferes.
 		 */
-		if (k - j < MAX_UTF_BYTES) {
-			if (k + MAX_UTF_BYTES > buf + bufsize) {
+		if (k - j < awk_mb_cur_max) {
+			if (k + awk_mb_cur_max > buf + bufsize) {
+				char *obuf = buf;
 				adjbuf((char **) &buf, &bufsize,
-				    bufsize + MAX_UTF_BYTES,
+				    bufsize + awk_mb_cur_max,
 				    quantum, 0, "fnematch");
+
+				/* buf resized, maybe moved. update pointers */
+				*pbufsize = bufsize;
+				if (obuf != buf) {
+					i = buf + (i - obuf);
+					j = buf + (j - obuf);
+					k = buf + (k - obuf);
+					*pbuf = buf;
+					if (patlen)
+						patbeg = buf + (patbeg - obuf);
+				}
 			}
-			for (n = MAX_UTF_BYTES ; n > 0; n--) {
+			for (n = awk_mb_cur_max ; n > 0; n--) {
 				*k++ = (c = getc(f)) != EOF ? c : 0;
 				if (c == EOF) {
 					if (ferror(f))
@@ -887,7 +897,7 @@
 			}
 		}
 
-		j += u8_rune(&c, (uschar *)j);
+		j += u8_rune(&c, j);
 
 		if ((ns = get_gototab(pfa, s, c)) != 0)
 			s = ns;
@@ -907,17 +917,13 @@
 			break;     /* best match found */
 
 		/* no match at origin i, next i and start over */
-		i += u8_rune(&c, (uschar *)i);
+		i += u8_rune(&c, i);
 		if (c == 0)
 			break;    /* no match */
 		j = i;
 		s = 2;
 	} while (1);
 
-	/* adjbuf() may have relocated a resized buffer. Inform the world. */
-	*pbuf = buf;
-	*pbufsize = bufsize;
-
 	if (patlen) {
 		/*
 		 * Under no circumstances is the last character fed to
@@ -1229,8 +1235,6 @@
 	return 0;
 }
 
-extern int u8_rune(int *, const uschar *); /* run.c; should be in header file */
-
 int relex(void)		/* lexical analyzer for reparse */
 {
 	int c, n;
@@ -1248,7 +1252,7 @@
 rescan:
 	starttok = prestr;
 
-	if ((n = u8_rune(&rlxval, prestr)) > 1) {
+	if ((n = u8_rune(&rlxval, (const char *) prestr)) > 1) {
 		prestr += n;
 		starttok = prestr;
 		return CHAR;
@@ -1295,7 +1299,7 @@
 		if (!adjbuf((char **) &buf, &bufsz, n, n, (char **) &bp, "relex1"))
 			FATAL("out of space for reg expr %.10s...", lastre);
 		for (; ; ) {
-			if ((n = u8_rune(&rlxval, prestr)) > 1) {
+			if ((n = u8_rune(&rlxval, (const char *) prestr)) > 1) {
 				for (i = 0; i < n; i++)
 					*bp++ = *prestr++;
 				continue;
@@ -1389,7 +1393,7 @@
 		}
 		break;
 	case '{':
-		if (isdigit(*(prestr))) {
+		if (isdigit((int) *(prestr))) {
 			num = 0;	/* Process as a repetition */
 			n = -1; m = -1;
 			commafound = false;
diff --git a/bugs-fixed/REGRESS b/bugs-fixed/REGRESS
index 98d578a..30bdc7c 100755
--- a/bugs-fixed/REGRESS
+++ b/bugs-fixed/REGRESS
@@ -11,6 +11,7 @@
 	echo === $i
 	OUT=${i%.awk}.OUT
 	OK=${i%.awk}.ok
+	OK2=${i%.awk}.ok2
 	IN=${i%.awk}.in
 	input=
 	if [ -f $IN ]
@@ -22,7 +23,10 @@
 	if cmp -s $OK $OUT
 	then
 		rm -f $OUT
+	elif [ -f $OK2 ] && cmp -s $OK2 $OUT
+	then
+		rm -f $OUT
 	else
-		echo ++++ $i failed!
+		echo "+++ $i failed!"
 	fi
 done
diff --git a/bugs-fixed/matchop-deref.awk b/bugs-fixed/matchop-deref.awk
new file mode 100644
index 0000000..6c066aa
--- /dev/null
+++ b/bugs-fixed/matchop-deref.awk
@@ -0,0 +1,11 @@
+function foo() {
+	return "aaaaaab"
+}
+
+BEGIN { 
+	print match(foo(), "b")
+}
+
+{
+	print match(substr($0, 1), "b")     
+}
diff --git a/bugs-fixed/matchop-deref.bad b/bugs-fixed/matchop-deref.bad
new file mode 100644
index 0000000..343ee5c
--- /dev/null
+++ b/bugs-fixed/matchop-deref.bad
@@ -0,0 +1,2 @@
+-1
+-1
diff --git a/bugs-fixed/matchop-deref.in b/bugs-fixed/matchop-deref.in
new file mode 100644
index 0000000..0d197e1
--- /dev/null
+++ b/bugs-fixed/matchop-deref.in
@@ -0,0 +1 @@
+aaaaaab
diff --git a/bugs-fixed/matchop-deref.ok b/bugs-fixed/matchop-deref.ok
new file mode 100644
index 0000000..49019db
--- /dev/null
+++ b/bugs-fixed/matchop-deref.ok
@@ -0,0 +1,2 @@
+7
+7
diff --git a/bugs-fixed/system-status.ok2 b/bugs-fixed/system-status.ok2
new file mode 100644
index 0000000..f1f631e
--- /dev/null
+++ b/bugs-fixed/system-status.ok2
@@ -0,0 +1,3 @@
+normal status 42
+death by signal status 257
+death by signal with core dump status 262
diff --git a/lib.c b/lib.c
index b5b83f8..0dac1f9 100644
--- a/lib.c
+++ b/lib.c
@@ -399,7 +399,7 @@
 	i = 0;	/* number of fields accumulated here */
 	if (inputFS == NULL)	/* make sure we have a copy of FS */
 		savefs();
-	if (strlen(inputFS) > 1) {	/* it's a regular expression */
+	if (!CSV && strlen(inputFS) > 1) {	/* it's a regular expression */
 		i = refldbld(r, inputFS);
 	} else if (!CSV && (sep = *inputFS) == ' ') {	/* default whitespace */
 		for (i = 0; ; ) {
@@ -845,10 +845,10 @@
 {
 	const char *os = s;
 
-	if (!isalpha((uschar) *s) && *s != '_')
+	if (!isalpha((int) *s) && *s != '_')
 		return 0;
 	for ( ; *s; s++)
-		if (!(isalnum((uschar) *s) || *s == '_'))
+		if (!(isalnum((int) *s) || *s == '_'))
 			break;
 	return *s == '=' && s > os;
 }
@@ -883,7 +883,7 @@
 	if (no_trailing)
 		*no_trailing = false;
 
-	while (isspace(*s))
+	while (isspace((int) *s))
 		s++;
 
 	/* no hex floating point, sorry */
@@ -895,7 +895,7 @@
 		is_nan = (strncasecmp(s+1, "nan", 3) == 0);
 		is_inf = (strncasecmp(s+1, "inf", 3) == 0);
 		if ((is_nan || is_inf)
-		    && (isspace(s[4]) || s[4] == '\0'))
+		    && (isspace((int) s[4]) || s[4] == '\0'))
 			goto convert;
 		else if (! isdigit(s[1]) && s[1] != '.')
 			return false;
@@ -918,7 +918,7 @@
 	/*
 	 * check for trailing stuff
 	 */
-	while (isspace(*ep))
+	while (isspace((int) *ep))
 		ep++;
 
 	if (no_trailing != NULL)
diff --git a/main.c b/main.c
index c478e32..5bc1272 100644
--- a/main.c
+++ b/main.c
@@ -22,7 +22,7 @@
 THIS SOFTWARE.
 ****************************************************************/
 
-const char	*version = "version 20231124";
+const char	*version = "version 20240311";
 
 #define DEBUG
 #include <stdio.h>
@@ -199,6 +199,10 @@
 		argc--;
 		argv++;
 	}
+
+	if (CSV && (fs != NULL || lookup("FS", symtab) != NULL))
+		WARNING("danger: don't set FS when --csv is in effect");
+
 	/* argv[1] is now the first argument */
 	if (npfile == 0) {	/* no -f; first argument is program */
 		if (argc <= 1) {
diff --git a/run.c b/run.c
index 7462c38..799e998 100644
--- a/run.c
+++ b/run.c
@@ -795,7 +795,7 @@
 
 Cell *matchop(Node **a, int n)	/* ~ and match() */
 {
-	Cell *x, *y;
+	Cell *x, *y, *z;
 	char *s, *t;
 	int i;
 	int cstart, cpatlen, len;
@@ -817,7 +817,7 @@
 		i = (*mf)(pfa, s);
 		tempfree(y);
 	}
-	tempfree(x);
+	z = x;
 	if (n == MATCHFCN) {
 		int start = patbeg - s + 1; /* origin 1 */
 		if (patlen < 0) {
@@ -839,11 +839,13 @@
 		x = gettemp();
 		x->tval = NUM;
 		x->fval = start;
-		return x;
 	} else if ((n == MATCH && i == 1) || (n == NOTMATCH && i == 0))
-		return(True);
+		x = True;
 	else
-		return(False);
+		x = False;
+
+	tempfree(z);
+	return x;
 }
 
 
@@ -1298,7 +1300,8 @@
 
 						if (bs == NULL)	{ // invalid character
 							// use unicode invalid character, 0xFFFD
-							bs = "\357\277\275";
+							static char invalid_char[] = "\357\277\275";
+							bs = invalid_char;
 							count = 3;
 						}
 						t = bs;
@@ -2065,6 +2068,7 @@
 	Node *nextarg;
 	FILE *fp;
 	int status = 0;
+	int estatus = 0;
 
 	t = ptoi(a[0]);
 	x = execute(a[1]);
@@ -2107,20 +2111,21 @@
 		break;
 	case FSYSTEM:
 		fflush(stdout);		/* in case something is buffered already */
-		status = system(getsval(x));
-		u = status;
+		estatus = status = system(getsval(x));
 		if (status != -1) {
 			if (WIFEXITED(status)) {
-				u = WEXITSTATUS(status);
+				estatus = WEXITSTATUS(status);
 			} else if (WIFSIGNALED(status)) {
-				u = WTERMSIG(status) + 256;
+				estatus = WTERMSIG(status) + 256;
 #ifdef WCOREDUMP
 				if (WCOREDUMP(status))
-					u += 256;
+					estatus += 256;
 #endif
 			} else	/* something else?!? */
-				u = 0;
+				estatus = 0;
 		}
+		/* else estatus was set to -1 */
+		u = estatus;
 		break;
 	case FRAND:
 		/* random() returns numbers in [0..2^31-1]
@@ -2444,7 +2449,7 @@
 	start = getsval(x);
 	while (pmatch(pfa, start)) {
 		if (buf == NULL) {
-			if ((pb = buf = malloc(bufsz)) == NULL)
+			if ((pb = buf = (char *) malloc(bufsz)) == NULL)
 				FATAL("out of memory in dosub");
 			tempstat = pfa->initstat;
 			pfa->initstat = 2;
diff --git a/testdir/T.csv b/testdir/T.csv
index 79c1510..e0f3d70 100755
--- a/testdir/T.csv
+++ b/testdir/T.csv
@@ -17,7 +17,7 @@
 	sub(/try /, "")
 	prog = $0
 	printf("%3d  %s\n", nt, prog)
-	prog = sprintf("%s -F\"\\t\" '"'"'%s'"'"'", awk, prog)
+	prog = sprintf("%s '"'"'%s'"'"'", awk, prog)
 	# print "prog is", prog
 	nt2 = 0
 	while (getline > 0) {
diff --git a/testdir/T.overflow b/testdir/T.overflow
index d3d97d4..ac9c0bd 100755
--- a/testdir/T.overflow
+++ b/testdir/T.overflow
@@ -84,3 +84,5 @@
 rm -rf /tmp/awktestfoo*
 $awk 'BEGIN { for (i=1; i <= 1000; i++) print i >("/tmp/awktestfoo" i) }'
 ls /tmp/awktestfoo* | grep '1000' >/dev/null || echo 1>&2 "BAD: T.overflow openfiles"
+rm -rf /tmp/awktestfoo*
+exit 0
diff --git a/testdir/T.split b/testdir/T.split
index f7b24ba..d938404 100755
--- a/testdir/T.split
+++ b/testdir/T.split
@@ -220,5 +220,6 @@
 echo 'cat dog' > $TEMP2
 diff $TEMP1 $TEMP2 || fail 'BAD: T.split(a, b, "[\r\n]+")'
 
+rm -rf $WORKDIR
 
 exit $RESULT