Maintain Your Iterations – The Iterators Secret – Part 1

The old style of iterations known in many languages might be hard to maintain sometimes and won’t always do the same operation with the same efficiency. Depends on your collection, the efficiency of some basic operation might be different. In order to make code style more maintainable and consistent, C++ brought us the Iterators concept. But what is their secret?

Next article: Maintain Your Iterations ā€“ Insecure Iterations ā€“ Part 2


Story Time!

Once upon a time, in a far far code kingdom, there were huge pyramids. The pyramids built by an old and high-level language professional developer, with array-based structures. Those huge nested loops looked with a similar code design:

for (size_t i = 0; i < i_vec.size(); i++) {
    i_vec[i] = i;
    // ...
    for (size_t j = i; j < i_vec.size(); j++) {
        i_vec[j] *= i;
        // ...
    }
    // ...
}

For years everything worked as expected, and as ages of developers replaced in the kingdom, the code got bigger and more complex.

However, things got really slower as the data flowed into the dynamic arrays and forced re-allocation each time and the kingdom emperor called an architect to solve this issue. The architect thought of two solutions:

  1. Use reserved memory to make the process slower only once in some required allocations.
  2. Replace the data structure to non-straight memory allocation structure.

The first one would cause a serious increase in memory usage, and it might lead to a case which non-required memory won’t be available for other architectures in the kingdom. The second, on the other hand, will cause a changes in the legacy, scary, huge nested loops pyramids.

After a little thinking, the architect came up with the best idea he could, that won’t take so long to implement and yet won’t require more non-required memory allocations. He chose the seconds option and created a new class, that inherit from a list-style structure and implemented the operator[] of it:

template <typename T>
class my_list : public std::list<T> {
public:
    T& operator[](size_t index) {
        auto current = this->begin();
        for (; current != this->end() && index--; current = std::next(current));
        return *current;
    }
};

Using the existing generic functions, he could simply replace the std::vector container with my_list without any necessary code modifications:

template <typename T, template <typename...> typename Container>
void init_container(Container<T> &cont, size_t size, T val = T()) {
    while (size--) {
        cont.push_back(val);
    }
}

template <typename T, template <typename...> typename Container>
void print_container(Container<T> &cont) {
    for (size_t i = 0; i < cont.size(); i++) {
        std::cout << cont[i] << " ";
    }
    std::cout << std::endl;
}

int main() {
    my_list<int> i_list; // std::vector<int> i_vec;
    init_container(i_list, 20, 1);
    for (size_t i = 0; i < i_list.size(); i++) {
        i_list[i] = i + 1;
        // ...
        for (size_t j = i; j < i_list.size(); j++) {
            i_list[j] *= i + 1;
            // ...
        }
        // ...
    }
    print_container(i_list);
    return EXIT_SUCCESS;
}

In theory everything was fine. The insertion of a new element to this list is O(1) operation and it doesn’t require straight memory allocation, so there won’t be reallocation for the whole collection if this condition doesn’t meet. However, in reality, the program run even slower. The access to an element in a specific element, unlike straight memory allocation structures, is O(n). Which means that the above code have O(n^2) in case of vector, but O(n^3) in case of list.

The architect needed help, and he decided to come to us with this one. How can we design architecture that no matter what kind of collection we deal with, it’ll act in the most optimized way it can?


Iterators

This C++ feature is the solution we are looking for. In order to use iterators on a specific container, all we need is that the container will implement some methods and functionality for them. Lucky us, all of the STL containers does it, so no matter which one of them we decide to use, it’ll always work with iterators (Next in this series we’ll see how to create our own collection which supports iterators).

Before we answer to the architect, let’s first examine the containers that support iterators functionality.

cont.begin()Returns an iterator to the beginning of a container.
cont.end()Returns an iterator to the end of a container.
cont.rbegin()Returns a reverse iterator to a container.
cont.rend()Returns a reverse end iterator for a container.
cppreference – iterator

An iterator should implement all possible access operators of the desired structure. There are five (six since C++20) legacy types of iterators:

Input Iterator

Guarantee validity of the data only until an iterator passed it. Once an iterator being incremented from a specific address, all iterators for the same address might be invalid (For example: std::istreambuf_iterator).

Required Operators

operator==, operator!=, operator*, operator->, operator++(after this operation, any copies of the previous value are no longer validity guarantee), and operator++(int).

Output Iterator

Guarantee validity of the data only until an iterator passed it or write to it. Once an iterator being incremented from a specific address or assigned a new value, all iterators for the same address might be invalid (for example: std::ostream_iterator).

Forward iterator, Bidirectional iterator, and Random Access iterator satisfy Output iterator requirements when they described as mutable.

Required Operators

operator*().operator=() (The only valid use of operator* with an output iterator is on the left of an assignment: operator* may return a proxy object, which defines a member operator= [which may be a template]), operator++(), and operator++(int).

Forward Iterator

For collections that can’t go backwards from every point (e.g. single directional linked list).

Requirements

  • Must satisfy legacy input iterator.
  • Default constructible.
  • Must be Multipass Guarantee – Changes to another copy of an iterator like incrementing/assignment (if mutable), can’t change the value read from other copies.
    * Unlike Input/Output iterators.
  • Operators: operator++(int) – returns iterator copy, and operator*() returns a reference to the iterator value.

Note

Incrementing end() iterator will lead to an UB.

Bidirectional Iterator

For collections that can go both forward and backward from every point.

Requirements

  • Must satisfy legacy forward iterator.
  • Operators: operator--(), and operator--(int).

Note

Decrementing begin() iterator will lead to an UB.

Random Access Iterator

For collections that can return a collection value in a specific target point in a constant time (e.g. an array).

Requirements

  • Must satisfy legacy bidirectional iterator.
  • Given:
    • value_type, the type denoted by std::iterator_traits::value_type.
    • difference_type, the type denoted by std::iterator_traits::difference_type.
    • reference, the type denoted by std::iterator_traits::reference.
  • Operators:
    • iterator.operator+=(difference_type)
    • iterator.operator+(difference_type)
    • difference_type.operator+(iterator)
    • iterator.operator-=(difference_type)
    • iterator.operator-(difference_type)
    • iterator.operator-(iterator)
    • iterator.operator[](difference_type)
    • iterator.operator<(iterator)
    • iterator.operator>(iterator)
    • iterator.operator>=(iterator)
    • iterator.operator<=(iterator)

Contiguous Iterator (Since C++20)

For collections that are contiguous in the memory.

Requirements

  • Must satisfy Random Access iterator.
  • The following condition must be true: *(iterator + difference_type) == *(std::addressof(*iterator) + difference_type).

STD Iterators Algorithms

Based on these operations, the standard created functions which help us build a maintainable code. A lot of algorithms you can find on <algorithm> header (see cppreference – algorithm), one of them is std::find:

#include <iostream>
#include <algorithm>
// template<class InputIt, class T> constexpr InputIt std::find(InputIt first, InputIt last, const T& value);

int main() {
    std::vector<int> i_vec = {1, 2, 3, 4, 6};
    if (std::find(i_vec.begin(), i_vec.end(), 5) != i_vec.end()) {
        std::cout << "Found!\n";
    } else {
        std::cout << "Not Found!\n";
    }
    return EXIT_SUCCESS;
}

STD Optimized Iterators Actions

The std library also gives us functions which based on the iterator tag, chooses the most optimized operators for a desired action. But we know that different iterators implement different operators. Some of them implementing the operator+= while some of them not. How can the function know which operators exist, and which one is the most optimized way?

Secret Revealed: Tag Dispatching

For every iterator type there is a unique tag the std uses to match operation with the best iterator implementation.

struct input_iterator_tag {};
struct output_iterator_tag{};
struct forward_iterator_tag : public input_iterator_tag {};
struct bidirectional_iterator_tag : public forward_iterator_tag {};
struct random_access_iterator_tag : public bidirectional_iterator_tag {};
struct contiguous_iterator_tag: public random_access_iterator_tag {};
 // Since C++20

For example std::advanced:

template <typename _InputIterator, typename _Distance>
inline constexpr void __advance(_InputIterator& __i, _Distance __n, input_iterator_tag) { 
      __glibcxx_assert(__n >= 0);
      while (__n--)
	++__i;
}

template <typename _BidirectionalIterator, typename _Distance>
inline constexpr void __advance(_BidirectionalIterator& __i, _Distance __n, bidirectional_iterator_tag) {
    if (__n > 0)
        while (__n--)
	    ++__i;
    else
        while (__n++)
            --__i;
}

template<typename _RandomAccessIterator, typename _Distance>
inline constexpr void __advance(_RandomAccessIterator& __i, _Distance __n, random_access_iterator_tag) {
    if (__builtin_constant_p(__n) && __n == 1)
	++__i;
    else if (__builtin_constant_p(__n) && __n == -1)
	--__i;
    else
	__i += __n;
}

/**
 *  @brief A generalization of pointer arithmetic.
 *  @param  __i  An input iterator.
 *  @param  __n  The @a delta by which to change @p __i.
 *  @return  Nothing.
 *
 *  This increments @p i by @p n.  For bidirectional and random access
 *  iterators, @p __n may be negative, in which case @p __i is decremented.
 *
 *  For random access iterators, this uses their @c + and @c - operations
 *  and are constant time.  For other %iterator classes they are linear time.
 */
template<typename _InputIterator, typename _Distance>
inline constexpr void advance(_InputIterator& __i, _Distance __n) {
    typename iterator_traits<_InputIterator>::difference_type __d = __n;
    std::__advance(__i, __d, std::__iterator_category(__i));
}

Our usage is like so:

std::vector<int> i_vec;
std::list<int> i_list;
// ...
auto vec_it = i_vec.begin();
auto list_it = i_list.begin();
// ...
std::advance(vec_it, 5); // call __advance with random_access_iterator_tag
std::advance(list_it, 5); // call __advance with bidirectional_iterator_tag

Back To The Architect

In order to fix the architecture in a way that will fit any collection type, he understands now that he should use the iterators. So after some changes in the legacy code, he finally got it working again, and in a better way:

template <typename T, template <typename...> typename Container>
void init_container(Container<T> &cont, size_t size, T val = T()) {
    while (size--) {
        cont.emplace_back(val);
    }
}

template <typename Iterator>
void print_container(const Iterator &first, const Iterator &last) {
    for (Iterator current = first; current != last; current++) {
        std::cout << *current << " ";
    }
    std::cout << std::endl;
}

int main() {
    std::list<int> i_list;
    init_container(i_list, 20, 1);
    size_t counter = 1;
    for (auto current = i_list.begin(); current != i_list.end(); current++) {
        *current = counter;
        // ...
        for (auto inner_current = current; inner_current != i_list.end(); inner_current++) {
            *inner_current *= counter;
            // ...
        }
        // ...
        counter++;
    }
    print_container(i_list.begin(), i_list.end());
    return EXIT_SUCCESS;
}

Conclusion

Iterators are here to help us writing a more generic and optimized code when handling collections. It is better to get use to them rather skip them as long as you are fine with your operator[], because you can never know when you’ll have to move to another collection, and it’ll help you to use std algorithms.

On the next articles in this series, we’ll see how to avoid critical mistakes with iterators, and how to create our own iterators collections.

GitHub Repository with the examples in this article.

Posts in this series:

> Maintain Your Iterations ā€“ The Iterators Secret ā€“ Part 1
Maintain Your Iterations ā€“ Insecure Iterations ā€“ Part 2
Maintain Your Iterations – Iterators Customization – Part 3

Related Topics

Templates Infinity Theory ā€“ From C++11 to C++20 ā€“ Part 1

Substitution Failure is Not an Error ā€“ SFINAE

4 thoughts on “Maintain Your Iterations – The Iterators Secret – Part 1

Leave a comment