Previous: Sorting lists and vectors | Next: Sorting lists and vectors

(This section, as the libraries it describes, was written mostly by Olin Shivers for the draft of SRFI 32.)

The sort libraries in Scheme 48 include

- vector insert sort (stable)
- vector heap sort
- vector merge sort (stable)
- pure and destructive list merge sort (stable)
- stable vector and list merge
- miscellaneous sort-related procedures: vector and list merging, sorted predicates, vector binary search, vector and list delete-equal-neighbor procedures.
- a general, non-algorithmic set of procedure names for general sorting and merging

There are two different interfaces: "what" (simple) and "how" (detailed).

**Simple**- you specify semantics: datatype (list or vector), mutability, and stability.
**Detailed**- you specify the actual algorithm (quick, heap,
insert, merge). Different algorithms have different properties,
both semantic and pragmatic, so these exports are necessary.
It is necessarily the case that the specifications of these procedures make statements about execution "pragmatics." For example, the sole distinction between heap sort and quick sort--both of which are provided by this library---is one of execution time, which is not a "semantic" distinction. Similar resource-use statements are made about "iterative" procedures, meaning that they can execute on input of arbitrary size in a constant number of stack frames.

The two interfaces share common procedure signatures wherever possible, to facilitate switching a given call from one procedure to another.

These procedures uniformly observe the following parameter order: the data to be sorted comes after the comparison procedure. That is, we write

(sort<list)

not

(sortlist<)

These routines take a *<* comparison procedure, not a * <= * comparison
procedure, and they sort into increasing order. The difference between
a *<* spec and a * <= * spec comes up in two places:

- the definition of an ordered or sorted data set, and
- the definition of a stable sorting algorithm.

In other words, scanning across the data never takes a "downwards" step.

If you use a * <= * procedure where these algorithms expect a *<*
procedure, you may not get the answers you expect. For example,
the `list-sorted?`

procedure will return false if you pass it a * <= * comparison
procedure and an ordered list containing adjacent equal elements.

A "stable" sort is one that preserves the pre-existing order of equal elements. Suppose, for example, that we sort a list of numbers by comparing their absolute values, i.e., using comparison procedure

(lambda (x y) (< (abs x) (abs y)))If we sort a list that contains both 3 and -3:

then a stable sort is an algorithm that will not swap the order of these two elements, that is, the answer is guaranteed to to look like...3, ..., -3 ...

not...3, -3 ...

Choosing...-3, 3 ...

```
(<
```*y* *x*)

means "This is due to the definition of equality, given a *<* comparator:

(and (not (< x y)) (not (< y x)))The definition is rather different, given a

(and (<= x y) (<= y x))A "stable" merge is one that reliably favors one of its data sets when equal items appear in both data sets.

So, if we are merging two lists of numbers ordered by absolute value,
the stable merge operation `list-merge`

(list-merge (lambda (x y) (< (abs x) (abs y))) '(0 -2 4 8 -10) '(-1 3 -4 7))reliably places the 4 of the first list before the equal-comparing -4 of the second list:

(0 -1 -2 4 -4 7 8 -10)Some sort algorithms will

In short, if your comparison procedure *f* answers true to `(`

, then
*f* x x)

- using a stable sorting or merging algorithm will not give you a stable sort or merge,
`list-sorted?`

may surprise you.

(lambda (x y) (not (<= y x)))if need be.

Precise definitions give sharp edges to tools, but require care in use. "Measure twice, cut once."

The vector operations specified below all take optional
`start`

/`end`

arguments indicating a selected subrange
of a vector's elements. If a `start`

parameter or
`start`

/`end`

parameter pair is given to such a
procedure, they must be exact, non-negative integers, such that

where0 <=start<=end<=`(vector-length`

vector)

`List-sort!`

and `List-stable-sort!`

are allowed, but
not required, to alter their arguments' cons cells to construct the
result list. This is consistent with the what-not-how character of the
group of procedures to which they belong (the `sorting`

structure).

The `list-delete-neighbor-dups!`

, `list-merge!`

and
`list-merge-sort!`

procedures, on the other hand, provide
specific algorithms, and, as such, explicitly commit to the use of
side-effects on their input lists in order to guarantee their key
algorithmic properties (e.g., linear-time operation).

Note that there is no "list insert sort" package, as you might as well always use list merge sort. The reference implementation's destructive list merge sort will do fewer

Structure name Functionality `sorting`

General sorting for lists and vectors `sorted`

Sorted predicates for lists and vectors `list-merge-sort`

List merge sort `vector-merge-sort`

Vector merge sort `vector-heap-sort`

Vector heap sort `vector-insert-sort`

Vector insertion sort `delete-neighbor-duplicates`

List and vector delete neighbor duplicates `binary-searches`

Vector binary search

`set-cdr!`

s than a destructive insert sort.
Almost all of the procedures described below are variants of two basic operations: sorting and merging. These procedures are consistently named by composing a set of basic lexemes to indicate what they do.

Lexeme Meaning `sort`

The procedure sorts its input data set by some <comparison procedure.`merge`

The procedure merges two ordered data sets into a single ordered result. `stable`

This lexeme indicates that the sort is a stable one. `vector`

The procedure operates upon vectors. `list`

The procedure operates upon lists. `!`

Procedures that end in `!`

are allowed, and sometimes required, to reuse their input storage to construct their answer.

In the procedures specified below,

- A
`<`

or`=`

parameter is a procedure accepting two arguments taken from the specified procedure's data set(s), and returning a boolean; `Start`

and`end`

parameters are exact, non-negative integers that serve as vector indices selecting a subrange of some associated vector. When specified, they must satisfy the relation

where*0 <=**start*<=*end*<=`(vector-length`

*vector*)*vector*is the associated vector.

If a procedure is said to return "unspecified," this means that
nothing at all is said about what the procedure returns, not even the
number of return values. Such a procedure is not even required to be
consistent from call to call in the nature or number of its return
values. It is simply required to return a value (or values) that may
be passed to a command continuation, e.g. as the value of an
expression appearing as a non-terminal subform of a `begin`

expression. Note that in R* ^{5}*RS, this restricts such a procedure to
returning a single value; non-R

`sorting`

--general sorting packageThis library provides basic sorting and merging functionality suitable for general programming. The procedures are named by their semantic properties, i.e., what they do to the data (sort, stable sort, merge, and so forth).

`(list-sorted?`

*<*list*boolean*`(list-merge`

*<*listlist_{1}_{2}*list*`(list-merge!`

*<*listlist_{1}_{2}*list*`(list-sort`

*<*lis*list*`(list-sort!`

*<*lis*list*`(list-stable-sort`

*<*list*list*`(list-stable-sort!`

*<*list*list*`(list-delete-neighbor-dups`

*=*list*list*`(vector-sorted?`

*<*v [start [end]]*boolean*`(vector-merge`

*<*vv_{1}[start_{2}*1*[end*1*[start*2*[end*2*]]]]*vector*`(vector-merge!`

*<*v vv_{1}[start [start_{2}*1*[end*1*[start*2*[end*2*]]]]]`(vector-sort`

*<*v [start [end]]*vector*`(vector-sort!`

*<*v [start [end]]`(vector-stable-sort`

*<*v [start [end]]*vector*`(vector-stable-sort!`

*<*v [start [end]]`(vector-delete-neighbor-dups`

*=*v [start [end]]*vector*

Procedure Suggested algorithm `list-sort`

vector heap or quick `list-sort!`

list merge sort `list-stable-sort`

vector merge sort `list-stable-sort!`

list merge sort `vector-sort`

heap or quick sort `vector-sort!`

or quick sort`vector-stable-sort`

vector merge sort `vector-stable-sort!`

merge sort

`List-Sorted?`

and `vector-sorted?`

return true if their
input list or vector is in sorted order, as determined by their All four merge operations are stable: an element of the initial list
*list_{1}* or vector

The procedures

`list-merge`

`list-sort`

`list-stable-sort`

`list-delete-neighbor-dups`

The procedure

`list-sort!`

`list-stable-sort!`

On the other hand, the `list-merge!`

procedure
make only a single, iterative, linear-time pass over its argument
list, using `set-cdr!`

s to rearrange the cells of the list
into the final result --it works "in place." Hence, any cons cell
appearing in the result must have originally appeared in an input. The
intent of this iterative-algorithm commitment is to allow the
programmer to be sure that if, for example, `list-merge!`

is asked to
merge two ten-million-element lists, the operation will complete
without performing some extremely (possibly twenty-million) deep
recursion.

The vector procedures

`vector-sort`

`vector-stable-sort`

`vector-delete-neighbor-dups`

The vector procedures

`vector-sort!`

`vector-stable-sort!`

`vector-stable-sort!`

may allocate temporary storage proportional to the size of the
input
.)
`Vector-merge`

returns a vector of length *( end_1-start_1+(end_2-start_2)*.

`Vector-merge!`

writes its result into vector *v*,
beginning at index *start*, for indices less than * end =
start + (end_1-start_1) +
(end_2-start_2)*. The target subvector

The `...-delete-neighbor-dups-...`

procedures:
These procedures delete adjacent duplicate elements from a list or a
vector, using a given element-equality procedure. The first/leftmost
element of a run of equal elements is the one that survives. The list or
vector is not otherwise disordered.

These procedures are linear time--much faster than the *O(n ^{2})* general
duplicate-element deletors that do not assume any "bunching" of elements
(such as the ones provided by SRFI 1). If you want to delete duplicate
elements from a large list or vector, you can sort the elements to bring
equal items together, then use one of these procedures, for a total time
of

The comparison procedure *=* passed to these procedures is always
applied
`(`

where *=* *x* *y*)*x* comes before *y* in the containing list or vector.

`List-delete-neighbor-dups`

does not alter its input list; its answer may share storage with the input list.`Vector-delete-neighbor-dups`

does not alter its input vector, but rather allocates a fresh vector to hold the result.

(list-delete-neighbor-dups = '(1 1 2 7 7 7 0 -2 -2))==>(1 2 7 0 -2) (vector-delete-neighbor-dups = '#(1 1 2 7 7 7 0 -2 -2))==>#(1 2 7 0 -2) (vector-delete-neighbor-dups = '#(1 1 2 7 7 7 0 -2 -2) 3 7)==>#(7 0 -2)

These packages provide more specific sorting functionality, that is, specific committment to particular algorithms that have particular pragmatic consequences (such as memory locality, asymptotic running time) beyond their semantic behaviour (sorting, stable sorting, merging, etc.). Programmers that need a particular algorithm can use one of these packages.

`sorted`

--sorted predicates`(list-sorted?`

*<*list*boolean*`(vector-sorted?`

*<*vector*boolean*`(vector-sorted?`

*<*vector start*boolean*`(vector-sorted?`

*<*vector start end*boolean*

Return `#f`

iff there is an adjacent pair *...x, y ...* in the input
list or vector such that *y < x*. The optional *start*/*end* range
arguments restrict `vector-sorted?`

to the indicated subvector.

`list-merge-sort`

--list merge sort`(list-merge-sort`

*<*list*list*`(list-merge-sort!`

*<*list*list*`(list-merge`

*list*) ->_{1}*<*list_{2}*list*`(list-merge!`

*list*) ->_{1}*<*list_{2}*list*

The `!`

procedures are destructive--they use `set-cdr!`

s to
rearrange the cells of the lists into the proper order. As such, they
do not allocate any extra cons cells--they are "in place" sorts.
The merge operations are stable: an element of *list_{1}* will
come before an equal-comparing element in

`vector-merge-sort`

--vector merge sort`(vector-merge-sort`

*<*vector [start [end [temp]]]*vector*`(vector-merge-sort!`

*<*vector [start [end [temp]]]`(vector-merge`

*<*vectorvector_{1}[start_{2}[end_{1}[start_{1}[end_{2}]]]]_{2}*vector*`(vector-merge!`

*<*vector vectorvector_{1}[start [start_{2}[end_{1}[start_{1}[end_{2}]]]]]_{2}

The optional *start*/*end* arguments provide for sorting of subranges, and
default to 0 and the length of the corresponding vector.

Merge-sorting a vector requires the allocation of a temporary
"scratch" work vector for the duration of the sort. This scratch
vector can be passed in by the client as the optional *temp*
argument; if so, the supplied vector must be of size * <= end*,
and will not be altered outside the range [start,end). If not
supplied, the sort routines allocate one themselves.

The merge operations are stable: an element of *vector_{1}* will
come before an equal-comparing element in

`Vector-merge-sort!`

leaves its result in.*vector*[*start*,*end*)`Vector-merge-sort`

returns a vector of length.*end*-*start*`Vector-merge`

returns a vector of length*(*.*end*_1-*start*_1)+(*end*_2-*start*_2)`Vector-merge!`

writes its result into*vector*, beginning at index*start*, for indices less than. The target subvector*end*=*start*+ (*end*_1-*start*_1) + (*end*_2-*start*_2)

may not overlap either source subvector*vector*[*start*,*end*)*vector*_1[*start*_1,*end*_1), or*vector*_2[*start*_2,*end*_2).

`vector-heap-sort`

--vector heap sort`Vector-heap-sort`

returns a vector of length * end-start*.

`Vector-heap-sort!`

is in-place, leaving its result in
`vector-insert-sort`

--vector insertion sort`Vector-insert-sort`

returns a vector of length.*end*-*start*`Vector-insert-sort!`

is in-place, leaving its result in.*vector*[*start*,*end*)

`delete-neighbor-duplicates`

--list and vector
delete neighbor duplicates`(list-delete-neighbor-dups`

*=*list*list*`(list-delete-neighbor-dups!`

*=*list*list*`(vector-delete-neighbor-dups`

*=*vector [start [end]]*vector*`(vector-delete-neighbor-dups!`

*=*vector [start [end]]*end**'*

These procedures are linear time--much faster than the *O(n ^{2})* general
duplicate-element deletors that do not assume any "bunching" of elements
(such as the ones provided by SRFI 1). If you want to delete duplicate
elements from a large list or vector, you can sort the elements to bring
equal items together, then use one of these procedures, for a total time
of

The comparison procedure = passed to these procedures is always applied

(=xy)

where *x* comes before *y* in the containing list or vector.

`List-delete-neighbor-dups`

does not alter its input list; its answer may share storage with the input list.`Vector-delete-neighbor-dups`

does not alter its input vector, but rather allocates a fresh vector to hold the result.`List-delete-neighbor-dups!`

is permitted, but not required, to mutate its input list in order to construct its answer.`Vector-delete-neighbor-dups!`

reuses its input vector to hold the answer, packing its answer into the index range*[*, where*start*,*end*')*end*is the non-negative exact integer returned as its value. It returns*'**end*as its result. The vector is not altered outside the range*'**[*.*start*,*end*')

(list-delete-neighbor-dups = '(1 1 2 7 7 7 0 -2 -2))==>(1 2 7 0 -2) (vector-delete-neighbor-dups = '#(1 1 2 7 7 7 0 -2 -2))==>#(1 2 7 0 -2) (vector-delete-neighbor-dups = '#(1 1 2 7 7 7 0 -2 -2) 3 7)==>#(7 0 -2) ;; Result left in v[3,9): (let ((v (vector 0 0 0 1 1 2 2 3 3 4 4 5 5 6 6))) (cons (vector-delete-neighbor-dups! = v 3) v))==>(9 . #(0 0 0 1 2 3 4 5 6 4 4 5 5 6 6))

`binary-searches`

--vector binary search`(vector-binary-search`

*<*elt-*>*key key vector [start [end]]*integer or*`#f``(vector-binary-search3`

*compare-proc vector [start [end]]*) ->*integer or*`#f`

`vector-binary-search`

searches *vector* in range
*[ start,end)* (which default to 0 and the length of

(vector-sorted? (lambda (x y) (<(elt-$>$keyx) (elt-$>$keyy)))vectorstartend)==>true

An element *e* of *vector* is a match for *key* if it's
neither less nor greater than the key:

(and (not (<(elt-$>$keye)key)) (not (<key(elt-$>$keye))))

If there is such an element, the procedure returns its index in the vector as an exact integer. If there is no such element in the searched range, the procedure returns false.

(vector-binary-search < car 4 '#((1 . one) (3 . three) (4 . four) (25 . twenty-five)))==>2 (vector-binary-search < car 7 '#((1 . one) (3 . three) (4 . four) (25 . twenty-five)))==>#f

`Vector-binary-search3`

is a variant that uses a three-way comparison
procedure *compare-proc*. *Compare-proc* compares its
parameter to the search key, and returns an
exact integer whose sign indicates its relationship to the search key.

( compare-procx)< 0 => x < search-key( compare-procx)= 0 => x = search-key( compare-procx)> 0 => x > search-key

(vector-binary-search3 (lambda (elt) (- (car elt) 4)) '#((1 . one) (3 . three) (4 . four) (25 . twenty-five)))==>2

Different sort and merge algorithms have different properties. Choose the algorithm that matches your needs:

**Vector insert sort**-
Stable, but only suitable for small vectors--
*O(n*.^{2}) **Vector heap sort**-
Not stable. Guaranteed fast--
*O(nlog(n))**worst*case. Poor locality on large vectors. A very reliable workhorse. **Vector merge sort**-
Stable. Not in-place--requires a temporary buffer of equal size.
Fast--
*O(nlog(n))*--and has good memory locality for large vectors.The implementation of vector merge sort provided by this implementation is, additionally, a "natural" sort, meaning that it exploits existing order in the input data, providing

*O(n)*best case. **Destructive list merge sort**-
Stable, fast and in-place (i.e., allocates no new cons cells). "Fast"
means
*O(nlog(n))*worse-case, and substantially better if the data is already mostly ordered, all the way down to linear time for a completely-ordered input list (i.e., it is a "natural" sort).Note that sorting lists involves chasing pointers through memory, which can be a loser on modern machine architectures because of poor cache and page locality. Sorting vectors has inherently better locality.

This implementation's destructive list merge and merge sort implementations are opportunistic--they avoid redundant

`set-cdr!`

s, and try to take long already-ordered runs of list structure as-is when doing the merges. **Pure list merge sort**-
Stable and fast--
*O(nlog(n))*worst-case, and possibly*O(n)*, depending upon the input list (see discussion above).

Algorithm Stable? Worst case Average case In-place Vector insert Yes O(n^{2})O(n^{2})Yes Vector quick No O(n^{2})O(nlog(n))Yes Vector heap No O(nlog(n))O(nlog(n))Yes Vector merge Yes O(nlog(n))O(nlog(n))No List merge Yes O(nlog(n))O(nlog(n))Either

Previous: Sorting lists and vectors | Next: Sorting lists and vectors