 //! Implementation of C++11-consistent weak memory emulation using store buffers
 //! based on Dynamic Race Detection for C++ ("the paper"):
 //! <https://www.doc.ic.ac.uk/~afd/homepages/papers/pdfs/2017/POPL.pdf>
 //!
 //! This implementation will never generate weak memory behaviours forbidden by the C++11 model,
 //! but it is incapable of producing all possible weak behaviours allowed by the model. Some weak
 //! behaviours that are observable on real hardware will therefore never be observed under this emulation.
 //!
 //! Note that this implementation does not fully take into account C++20's memory model revision to SC accesses
 //! and fences introduced by P0668 (<https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0668r5.html>).
 //! This implementation is not fully correct under the revised C++20 model and may generate behaviours C++20
 //! disallows (<https://github.com/rust-lang/miri/issues/2301>).
 //!
 //! A modification is made to the paper's model to partially address C++20 changes.
 //! Specifically, if an SC load reads from an atomic store of any ordering, then a later SC load cannot read from
 //! an earlier store in the location's modification order. This is to prevent creating a backwards S edge from the second
 //! load to the first, as a result of C++20's coherence-ordered before rules.
 //!
 //! Rust follows the C++20 memory model (except for the Consume ordering and some operations not performable through C++'s
 //! `std::atomic<T>` API). It is therefore possible for this implementation to generate behaviours never observable when the
 //! same program is compiled and run natively. Unfortunately, no literature exists at the time of writing which proposes
 //! an implementable and C++20-compatible relaxed memory model that supports all atomic operations existing in Rust. The closest one is
 //! A Promising Semantics for Relaxed-Memory Concurrency by Jeehoon Kang et al. (<https://www.cs.tau.ac.il/~orilahav/papers/popl17.pdf>).
 //! However, this model lacks SC accesses and is therefore unusable by Miri (SC accesses are everywhere in library code).
 //!
 //! If you find anything that proposes a relaxed memory model that is C++20-consistent, supports all orderings Rust's atomic accesses
 //! and fences accept, and is implementable (with operational semantics), please open a GitHub issue!
 //!
 //! One characteristic of this implementation, in contrast to some other notable operational models such as ones proposed in
 //! Taming Release-Acquire Consistency by Ori Lahav et al. (<https://plv.mpi-sws.org/sra/paper.pdf>) or Promising Semantics noted above,
 //! is that this implementation does not require each thread to hold an isolated view of the entire memory. Here, store buffers are per-location
 //! and shared across all threads. This is more memory efficient but does require store elements (representing writes to a location) to record
 //! information about reads, whereas in the other two models it is the other way round: reads point to the write they got their value from.
 //! Additionally, writes in our implementation do not have globally unique timestamps attached. In the other two models this timestamp is
 //! used to make sure a value in a thread's view is not overwritten by a write that occurred earlier than the one in the existing view.
 //! In our implementation, this is detected using read information attached to store elements, as there is no data structure representing reads.
 //!
 //! The C++ memory model is built around the notion of an 'atomic object', so it would be natural
 //! to attach store buffers to atomic objects. However, Rust follows LLVM in that it only has
 //! 'atomic accesses'. Therefore Miri cannot know when and where atomic 'objects' are
 //! created or destroyed in order to manage its store buffers. Instead, we lazily create an
 //! atomic object on the first atomic access to a given region, and we destroy that object
 //! on the next non-atomic or imperfectly overlapping atomic access to that region.
 //! These lazy (de)allocations happen in memory_accessed() on non-atomic accesses, and
 //! get_or_create_store_buffer() on atomic accesses. This mostly works well, but it does
 //! lead to some issues (<https://github.com/rust-lang/miri/issues/2164>).
 //!
 //! One consequence of this difference is that safe/sound Rust allows for more operations on atomic locations
 //! than the C++20 atomic API was intended to allow, such as non-atomically accessing
 //! a previously atomically accessed location, or accessing previously atomically accessed locations with a differently sized operation
 //! (such as accessing the top 16 bits of an AtomicU32). These scenarios are generally not discussed in formalisations of the C++ memory model.
 //! In Rust, these operations can only be done through a `&mut AtomicFoo` reference or one derived from it, therefore these operations
 //! can only happen after all previous accesses on the same locations. This implementation is adapted to allow these operations.
 //! A mixed-atomicity read that races with writes, or a write that races with reads or writes, will still be reported as UB.
 //! Mixed-size atomic accesses must not race with any other atomic access, whether read or write, or UB will be reported.
 //! You can refer to test cases in weak_memory/extra_cpp.rs and weak_memory/extra_cpp_unsafe.rs for examples of these operations.
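 // A minimal sketch of the access classification described above, assuming atomic objects are
 // tracked as disjoint byte ranges sorted by offset. `AccessKind` and `classify` are hypothetical
 // names that only mirror Miri's `AccessType`; this is not the `RangeObjectMap` implementation.
```rust
use std::ops::Range;

/// How an access range relates to the existing atomic objects (sketch only).
#[derive(Debug, PartialEq, Eq)]
enum AccessKind {
    /// No existing object overlaps; a new one could be inserted at this index.
    Empty(usize),
    /// Exactly one existing object covers exactly this range.
    PerfectlyOverlapping(usize),
    /// One or more existing objects overlap without matching exactly.
    ImperfectlyOverlapping(Range<usize>),
}

/// `objects` holds disjoint, non-empty byte ranges sorted by start offset.
fn classify(objects: &[Range<u64>], access: Range<u64>) -> AccessKind {
    // Index of the first object that ends after the access starts.
    let first = objects.partition_point(|r| r.end <= access.start);
    // Index of the first object that starts at or after the access ends.
    let last = objects.partition_point(|r| r.start < access.end);
    if first == last {
        AccessKind::Empty(first)
    } else if last - first == 1 && objects[first] == access {
        AccessKind::PerfectlyOverlapping(first)
    } else {
        AccessKind::ImperfectlyOverlapping(first..last)
    }
}
```
 // Under this model, a non-atomic access discards every object in the overlapping range, while an
 // imperfectly overlapping atomic access replaces them with one fresh store buffer.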

 // Our implementation and the author's own implementation (tsan11) of the paper deviate from the operational semantics provided in §5.3:
 // 1. In the operational semantics, store elements keep a copy of the atomic object's vector clock (AtomicCellClocks::sync_vector in miri),
 // but this is not used anywhere so it's omitted here.
 //
 // 2. In the operational semantics, each store element keeps the timestamp of a thread when it loads from the store.
 // If the same thread loads from the same store element multiple times, then the timestamps at all loads are saved in a list of load elements.
 // This is not necessary as later loads by the same thread will always have greater timestamp values, so we only need to record the timestamp of the first
 // load by each thread. This optimisation is done in tsan11
 // (https://github.com/ChrisLidbury/tsan11/blob/ecbd6b81e9b9454e01cba78eb9d88684168132c7/lib/tsan/rtl/tsan_relaxed.h#L35-L37)
 // and here.
 //
 // 3. §4.5 of the paper requires an SC store to mark all existing stores in the buffer that happen before it
 // as SC. This is not done in the operational semantics but implemented correctly in tsan11
 // (https://github.com/ChrisLidbury/tsan11/blob/ecbd6b81e9b9454e01cba78eb9d88684168132c7/lib/tsan/rtl/tsan_relaxed.cc#L160-L167)
 // and here.
 //
 // 4. The W_SC ; R_SC case requires the SC load to ignore all but the last store marked SC (stores not marked SC are not
 // affected). However, the paper's ReadsFromSet (last two lines of code) applies this rule to all loads, not just SC loads.
 // This is implemented correctly in tsan11
 // (https://github.com/ChrisLidbury/tsan11/blob/ecbd6b81e9b9454e01cba78eb9d88684168132c7/lib/tsan/rtl/tsan_relaxed.cc#L295)
 // and here.
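 // A minimal sketch of deviations 3 and 4, assuming a plain `Vec` in modification order and a
 // vector clock given as one timestamp per thread index. `Store`, `sc_store` and `sc_load_filter`
 // are hypothetical; the real logic lives in `store_impl` and `fetch_store` below, where the filter
 // only applies to SC loads and runs after the coherence cut-off.
```rust
/// Minimal store element: storing thread, its timestamp, and the SC mark.
#[derive(Debug)]
struct Store {
    thread: usize,
    timestamp: u64,
    is_seqcst: bool,
}

/// Deviation 3: an SC store by `thread` (whose vector clock is `clock`) marks
/// every buffered store that happens-before it as SC, then is appended itself.
fn sc_store(buffer: &mut Vec<Store>, thread: usize, clock: &[u64]) {
    for elem in buffer.iter_mut() {
        if elem.timestamp <= clock[elem.thread] {
            elem.is_seqcst = true;
        }
    }
    buffer.push(Store { thread, timestamp: clock[thread], is_seqcst: true });
}

/// Deviation 4: an SC load keeps only the newest SC-marked store; older
/// SC-marked stores are dropped and non-SC stores are left alone.
/// Returns the surviving indices, newest first.
fn sc_load_filter(buffer: &[Store]) -> Vec<usize> {
    let mut found_sc = false;
    (0..buffer.len())
        .rev()
        .filter(|&i| {
            if buffer[i].is_seqcst {
                let keep = !found_sc;
                found_sc = true;
                keep
            } else {
                true
            }
        })
        .collect()
}
```
 // Here thread 0's SC store has seen its own earlier store but not thread 1's, so only the former
 // is marked SC and an SC load can no longer pick it.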

 use std::{
     cell::{Ref, RefCell},
     collections::VecDeque,
 };

 use rustc_const_eval::interpret::{alloc_range, AllocRange, InterpResult, MPlaceTy, Scalar};
 use rustc_data_structures::fx::FxHashMap;

 use crate::*;

 use super::{
     data_race::{GlobalState as DataRaceState, ThreadClockSet},
     range_object_map::{AccessType, RangeObjectMap},
     vector_clock::{VClock, VTimestamp, VectorIdx},
 };

 pub type AllocState = StoreBufferAlloc;

 // Each store buffer must be bounded, otherwise it would grow indefinitely.
 // However, bounding the store buffer means restricting the amount of weak
 // behaviours observable. The author picked 128 as a good tradeoff
 // so we follow them here.
 const STORE_BUFFER_LIMIT: usize = 128;
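 // A minimal sketch of the bound, using a hypothetical `LIMIT` of 4 instead of 128 so the eviction
 // is easy to see: new stores are appended in modification order and, once the limit is exceeded,
 // the oldest store is dropped and can no longer be returned to an outdated load.
```rust
use std::collections::VecDeque;

/// Hypothetical limit for the sketch (Miri uses STORE_BUFFER_LIMIT = 128).
const LIMIT: usize = 4;

/// Appends a store and evicts the oldest one if the buffer grew past `LIMIT`,
/// mirroring the push_back/pop_front pattern in `store_impl`.
fn push_bounded<T>(buffer: &mut VecDeque<T>, val: T) {
    buffer.push_back(val);
    if buffer.len() > LIMIT {
        buffer.pop_front();
    }
}
```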

 #[derive(Debug, Clone)]
 pub struct StoreBufferAlloc {
     /// Store buffer of each atomic object in this allocation
     // Behind a RefCell because we need to allocate/remove on read access
     store_buffers: RefCell<RangeObjectMap<StoreBuffer>>,
 }

 impl VisitTags for StoreBufferAlloc {
     fn visit_tags(&self, visit: &mut dyn FnMut(BorTag)) {
         let Self { store_buffers } = self;
         for val in store_buffers
             .borrow()
             .iter()
             .flat_map(|buf| buf.buffer.iter().map(|element| &element.val))
         {
             val.visit_tags(visit);
         }
     }
 }

 #[derive(Debug, Clone, PartialEq, Eq)]
 pub(super) struct StoreBuffer {
     // Stores to this location in modification order
     buffer: VecDeque<StoreElement>,
 }

 /// Whether a load returned the latest value or not.
 #[derive(PartialEq, Eq)]
 enum LoadRecency {
     Latest,
     Outdated,
 }

 #[derive(Debug, Clone, PartialEq, Eq)]
 struct StoreElement {
     /// The identifier of the vector index, corresponding to a thread
     /// that performed the store.
     store_index: VectorIdx,

     /// Whether this store is SC.
     is_seqcst: bool,

     /// The timestamp of the storing thread when it performed the store
     timestamp: VTimestamp,
     /// The value of this store
     // FIXME: this means the store must be fully initialized;
     // we will have to change this if we want to support atomics on
     // (partially) uninitialized data.
     val: Scalar<Provenance>,

     /// Metadata about loads from this store element,
     /// behind a RefCell so that load operations can take &self
     load_info: RefCell<LoadInfo>,
 }

 #[derive(Debug, Clone, PartialEq, Eq, Default)]
 struct LoadInfo {
     /// Timestamp of the first load from this store element by each thread
     timestamps: FxHashMap<VectorIdx, VTimestamp>,
     /// Whether this store element has been read by an SC load
     sc_loaded: bool,
 }
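 // A minimal sketch of how `store_index`/`timestamp` pairs (in both `StoreElement` and
 // `LoadInfo::timestamps`) are compared against a thread's vector clock, assuming the clock is a
 // plain slice indexed by thread and that missing entries count as 0. The function names are
 // hypothetical; Miri performs these comparisons inline in `fetch_store`.
```rust
use std::collections::HashMap;

/// True if timestamp `timestamp` of thread `thread` happens-before the current
/// point of a thread whose vector clock is `clock` (missing entries count as 0).
fn happens_before(thread: usize, timestamp: u64, clock: &[u64]) -> bool {
    timestamp <= clock.get(thread).copied().unwrap_or(0)
}

/// CoRR cut-off: true if any recorded first load from a store element
/// (thread index -> timestamp) happens-before the current point.
fn some_load_happens_before(first_loads: &HashMap<usize, u64>, clock: &[u64]) -> bool {
    first_loads.iter().any(|(&thread, &ts)| happens_before(thread, ts, clock))
}
```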

 impl StoreBufferAlloc {
     pub fn new_allocation() -> Self {
         Self { store_buffers: RefCell::new(RangeObjectMap::new()) }
     }

     /// When a non-atomic access happens on a location that has been atomically accessed
     /// before without a data race, we can determine that the non-atomic access fully happens
     /// after all the prior atomic accesses so the location no longer needs to exhibit
     /// any weak memory behaviours until further atomic accesses.
     pub fn memory_accessed(&self, range: AllocRange, global: &DataRaceState) {
         if !global.ongoing_action_data_race_free() {
             let mut buffers = self.store_buffers.borrow_mut();
             let access_type = buffers.access_type(range);
             match access_type {
                 AccessType::PerfectlyOverlapping(pos) => {
                     buffers.remove_from_pos(pos);
                 }
                 AccessType::ImperfectlyOverlapping(pos_range) => {
                     // We rely on the data-race check making sure this is synchronized.
                     // Therefore we can forget about the old data here.
                     buffers.remove_pos_range(pos_range);
                 }
                 AccessType::Empty(_) => {
                     // The range had no weak behaviours attached, do nothing
                 }
             }
         }
     }

     /// Gets a store buffer associated with an atomic object in this allocation,
     /// or creates one with the specified initial value if no atomic object exists yet.
     fn get_or_create_store_buffer<'tcx>(
         &self,
         range: AllocRange,
         init: Scalar<Provenance>,
     ) -> InterpResult<'tcx, Ref<'_, StoreBuffer>> {
         let access_type = self.store_buffers.borrow().access_type(range);
         let pos = match access_type {
             AccessType::PerfectlyOverlapping(pos) => pos,
             AccessType::Empty(pos) => {
                 let mut buffers = self.store_buffers.borrow_mut();
                 buffers.insert_at_pos(pos, range, StoreBuffer::new(init));
                 pos
             }
             AccessType::ImperfectlyOverlapping(pos_range) => {
                 // Once we reach here we would've already checked that this access is not racy.
                 let mut buffers = self.store_buffers.borrow_mut();
                 buffers.remove_pos_range(pos_range.clone());
                 buffers.insert_at_pos(pos_range.start, range, StoreBuffer::new(init));
                 pos_range.start
             }
         };
         Ok(Ref::map(self.store_buffers.borrow(), |buffer| &buffer[pos]))
     }

     /// Gets a mutable store buffer associated with an atomic object in this allocation,
     /// or creates one with the specified initial value if no atomic object exists yet.
     fn get_or_create_store_buffer_mut<'tcx>(
         &mut self,
         range: AllocRange,
         init: Scalar<Provenance>,
     ) -> InterpResult<'tcx, &mut StoreBuffer> {
         let buffers = self.store_buffers.get_mut();
         let access_type = buffers.access_type(range);
         let pos = match access_type {
             AccessType::PerfectlyOverlapping(pos) => pos,
             AccessType::Empty(pos) => {
                 buffers.insert_at_pos(pos, range, StoreBuffer::new(init));
                 pos
             }
             AccessType::ImperfectlyOverlapping(pos_range) => {
                 // Once we reach here we would've already checked that this access is not racy.
                 buffers.remove_pos_range(pos_range.clone());
                 buffers.insert_at_pos(pos_range.start, range, StoreBuffer::new(init));
                 pos_range.start
             }
         };
         Ok(&mut buffers[pos])
     }
 }

 impl<'mir, 'tcx: 'mir> StoreBuffer {
     fn new(init: Scalar<Provenance>) -> Self {
         let mut buffer = VecDeque::new();
         buffer.reserve(STORE_BUFFER_LIMIT);
         let mut ret = Self { buffer };
         let store_elem = StoreElement {
             // The thread index and timestamp of the initialisation write
             // are never meaningfully used, so it's fine to leave them as 0
             store_index: VectorIdx::from(0),
             timestamp: VTimestamp::ZERO,
             val: init,
             is_seqcst: false,
             load_info: RefCell::new(LoadInfo::default()),
         };
         ret.buffer.push_back(store_elem);
         ret
     }

     /// Reads from the last store in modification order
     fn read_from_last_store(
         &self,
         global: &DataRaceState,
         thread_mgr: &ThreadManager<'_, '_>,
         is_seqcst: bool,
     ) {
         let store_elem = self.buffer.back();
         if let Some(store_elem) = store_elem {
             let (index, clocks) = global.current_thread_state(thread_mgr);
             store_elem.load_impl(index, &clocks, is_seqcst);
         }
     }

     fn buffered_read(
         &self,
         global: &DataRaceState,
         thread_mgr: &ThreadManager<'_, '_>,
         is_seqcst: bool,
         rng: &mut (impl rand::Rng + ?Sized),
         validate: impl FnOnce() -> InterpResult<'tcx>,
     ) -> InterpResult<'tcx, (Scalar<Provenance>, LoadRecency)> {
         // Having a live borrow to store_buffer while calling validate_atomic_load is fine
         // because the race detector doesn't touch store_buffer

         let (store_elem, recency) = {
             // The `clocks` we got here must be dropped before calling validate_atomic_load
             // as the race detector will update it
             let (.., clocks) = global.current_thread_state(thread_mgr);
             // Load from a valid entry in the store buffer
             self.fetch_store(is_seqcst, &clocks, &mut *rng)
         };

         // Unlike in buffered_atomic_write, thread clock updates have to be done
         // after we've picked a store element from the store buffer, as presented
         // in ATOMIC LOAD rule of the paper. This is because fetch_store
         // requires access to ThreadClockSet.clock, which is updated by the race detector
         validate()?;

         let (index, clocks) = global.current_thread_state(thread_mgr);
         let loaded = store_elem.load_impl(index, &clocks, is_seqcst);
         Ok((loaded, recency))
     }

     fn buffered_write(
         &mut self,
         val: Scalar<Provenance>,
         global: &DataRaceState,
         thread_mgr: &ThreadManager<'_, '_>,
         is_seqcst: bool,
     ) -> InterpResult<'tcx> {
         let (index, clocks) = global.current_thread_state(thread_mgr);

         self.store_impl(val, index, &clocks.clock, is_seqcst);
         Ok(())
     }

     #[allow(clippy::if_same_then_else, clippy::needless_bool)]
     /// Selects a valid store element in the buffer.
     fn fetch_store<R: rand::Rng + ?Sized>(
         &self,
         is_seqcst: bool,
         clocks: &ThreadClockSet,
         rng: &mut R,
     ) -> (&StoreElement, LoadRecency) {
         use rand::seq::IteratorRandom;
         let mut found_sc = false;
         // FIXME: we want an inclusive take_while (stops after a false predicate, but
         // includes the element that gave the false), but such a function doesn't yet
         // exist in the standard library https://github.com/rust-lang/rust/issues/62208
         // so we have to hack around it with keep_searching
         let mut keep_searching = true;
         let candidates = self
             .buffer
             .iter()
             .rev()
             .take_while(move |&store_elem| {
                 if !keep_searching {
                     return false;
                 }

                 keep_searching = if store_elem.timestamp <= clocks.clock[store_elem.store_index] {
                     // CoWR: if a store happens-before the current load,
                     // then we can't read-from anything earlier in modification order.
                     // C++20 §6.9.2.2 [intro.races] paragraph 18
                     false
                 } else if store_elem.load_info.borrow().timestamps.iter().any(
                     |(&load_index, &load_timestamp)| load_timestamp <= clocks.clock[load_index],
                 ) {
                     // CoRR: if there was a load from this store which happened-before the current load,
                     // then we cannot read-from anything earlier in modification order.
                     // C++20 §6.9.2.2 [intro.races] paragraph 16
                     false
                 } else if store_elem.timestamp <= clocks.fence_seqcst[store_elem.store_index] {
                     // The current load, which may be sequenced-after an SC fence, cannot read-before
                     // the last store sequenced-before an SC fence in another thread.
                     // C++17 §32.4 [atomics.order] paragraph 6
                     false
                 } else if store_elem.timestamp <= clocks.write_seqcst[store_elem.store_index]
                     && store_elem.is_seqcst
                 {
                     // The current non-SC load, which may be sequenced-after an SC fence,
                     // cannot read-before the last SC store executed before the fence.
                     // C++17 §32.4 [atomics.order] paragraph 4
                     false
                 } else if is_seqcst
                     && store_elem.timestamp <= clocks.read_seqcst[store_elem.store_index]
                 {
                     // The current SC load cannot read-before the last store sequenced-before
                     // the last SC fence.
                     // C++17 §32.4 [atomics.order] paragraph 5
                     false
                 } else if is_seqcst && store_elem.load_info.borrow().sc_loaded {
                     // The current SC load cannot read-before a store that an earlier SC load has observed.
                     // See https://github.com/rust-lang/miri/issues/2301#issuecomment-1222720427
                     // Consequences of C++20 §31.4 [atomics.order] paragraphs 3.1 and 3.3 (coherence-ordered before)
                     // and 4.1 (coherence order between SC operations must agree with the global total order S)
                     false
                 } else {
                     true
                 };

                 true
             })
             .filter(|&store_elem| {
                 if is_seqcst && store_elem.is_seqcst {
                     // An SC load needs to ignore all but the last store marked SC (stores not marked SC are not
                     // affected)
                     let include = !found_sc;
                     found_sc = true;
                     include
                 } else {
                     true
                 }
             });

         let chosen = candidates.choose(rng).expect("store buffer cannot be empty");
         if std::ptr::eq(chosen, self.buffer.back().expect("store buffer cannot be empty")) {
             (chosen, LoadRecency::Latest)
         } else {
             (chosen, LoadRecency::Outdated)
         }
     }

     /// ATOMIC STORE IMPL in the paper (except we don't need the location's vector clock)
     fn store_impl(
         &mut self,
         val: Scalar<Provenance>,
         index: VectorIdx,
         thread_clock: &VClock,
         is_seqcst: bool,
     ) {
         let store_elem = StoreElement {
             store_index: index,
             timestamp: thread_clock[index],
             // In the language provided in the paper, an atomic store takes the value from a
             // non-atomic memory location.
             // But we already have the immediate value here so we don't need to do the memory
             // access
             val,
             is_seqcst,
             load_info: RefCell::new(LoadInfo::default()),
         };
         self.buffer.push_back(store_elem);
         if self.buffer.len() > STORE_BUFFER_LIMIT {
             self.buffer.pop_front();
         }
         if is_seqcst {
             // Every store that happens before this needs to be marked as SC
             // so that in a later SC load, only the last SC store (i.e. this one) or stores that
             // aren't ordered by happens-before with the last SC store are picked.
             self.buffer.iter_mut().rev().for_each(|elem| {
                 if elem.timestamp <= thread_clock[elem.store_index] {
                     elem.is_seqcst = true;
                 }
             })
         }
     }
 }
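 // A minimal sketch of the candidate search in `fetch_store`, modelling only the CoWR and CoRR
 // cut-offs (the SC fence, SC store and `sc_loaded` rules are omitted). It emulates the inclusive
 // `take_while` with a `keep_searching` flag, as the FIXME in `fetch_store` describes, and returns
 // all candidate indices instead of picking one at random with `IteratorRandom::choose`.
 // `Elem` and `candidates` are hypothetical names.
```rust
use std::collections::HashMap;

/// Simplified store element: storing thread, its timestamp, and the first
/// load timestamp recorded per loading thread.
struct Elem {
    store_index: usize,
    timestamp: u64,
    first_loads: HashMap<usize, u64>,
}

/// Indices (newest first) of the stores that a load by a thread with vector
/// clock `clock` may read from.
fn candidates(buffer: &[Elem], clock: &[u64]) -> Vec<usize> {
    // Timestamp `ts` of thread `t` happens-before the load if the clock has seen it.
    let hb = |t: usize, ts: u64| ts <= clock.get(t).copied().unwrap_or(0);
    let mut keep_searching = true;
    (0..buffer.len())
        .rev()
        .take_while(|&i| {
            if !keep_searching {
                return false;
            }
            let e = &buffer[i];
            // CoWR: the store itself happens-before the load.
            let cowr = hb(e.store_index, e.timestamp);
            // CoRR: an earlier load from this store happens-before the load.
            let corr = e.first_loads.iter().any(|(&t, &ts)| hb(t, ts));
            // Keep this store as a candidate, but stop looking further back.
            keep_searching = !(cowr || corr);
            true
        })
        .collect()
}
```
 // The store that triggers the cut-off stays a candidate, which is why a plain `take_while`
 // (which would drop it) is not enough.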

 impl StoreElement {
     /// ATOMIC LOAD IMPL in the paper
     /// Unlike the operational semantics in the paper, we don't need to keep track
     /// of the thread timestamp for every single load. Keeping track of the first (smallest)
     /// timestamp of each thread that has loaded from a store is sufficient: if the earliest
     /// load of another thread happens before the current one, then we must stop searching the store
     /// buffer regardless of subsequent loads by the same thread; if the earliest load of another
     /// thread doesn't happen before the current one, then no subsequent load by the other thread
     /// can happen before the current one.
     fn load_impl(
         &self,
         index: VectorIdx,
         clocks: &ThreadClockSet,
         is_seqcst: bool,
     ) -> Scalar<Provenance> {
         let mut load_info = self.load_info.borrow_mut();
         load_info.sc_loaded |= is_seqcst;
         let _ = load_info.timestamps.try_insert(index, clocks.clock[index]);
         self.val
     }
 }
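 // A minimal sketch of the first-load bookkeeping in `load_impl`, assuming thread indices and
 // timestamps are plain integers. It uses the stable `HashMap::entry(..).or_insert(..)` to get the
 // same keep-the-first-value behaviour as the `try_insert` call above; `FirstLoads` is a
 // hypothetical, simplified stand-in for Miri's `LoadInfo`.
```rust
use std::collections::HashMap;

/// Simplified per-store-element load metadata.
#[derive(Default)]
struct FirstLoads {
    /// First load timestamp of each thread that loaded from the store.
    timestamps: HashMap<usize, u64>,
    /// Whether any SC load has read from the store.
    sc_loaded: bool,
}

impl FirstLoads {
    /// Records a load by `thread` at its current `timestamp`, keeping only the
    /// first one: a later load by the same thread has a larger timestamp, so if
    /// the first load does not happen-before another thread's load, the later
    /// one cannot either.
    fn record_load(&mut self, thread: usize, timestamp: u64, is_seqcst: bool) {
        self.sc_loaded |= is_seqcst;
        self.timestamps.entry(thread).or_insert(timestamp);
    }
}
```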

 impl<'mir, 'tcx: 'mir> EvalContextExt<'mir, 'tcx> for crate::MiriInterpCx<'mir, 'tcx> {}
 pub(super) trait EvalContextExt<'mir, 'tcx: 'mir>:
     crate::MiriInterpCxExt<'mir, 'tcx>
 {
     fn buffered_atomic_rmw(
         &mut self,
         new_val: Scalar<Provenance>,
         place: &MPlaceTy<'tcx, Provenance>,
         atomic: AtomicRwOrd,
         init: Scalar<Provenance>,
     ) -> InterpResult<'tcx> {
         let this = self.eval_context_mut();
         let (alloc_id, base_offset, ..) = this.ptr_get_alloc_id(place.ptr())?;
         if let (
             crate::AllocExtra { weak_memory: Some(alloc_buffers), .. },
             crate::MiriMachine { data_race: Some(global), threads, .. },
         ) = this.get_alloc_extra_mut(alloc_id)?
         {
             if atomic == AtomicRwOrd::SeqCst {
                 global.sc_read(threads);
                 global.sc_write(threads);
             }
             let range = alloc_range(base_offset, place.layout.size);
             let buffer = alloc_buffers.get_or_create_store_buffer_mut(range, init)?;
             buffer.read_from_last_store(global, threads, atomic == AtomicRwOrd::SeqCst);
             buffer.buffered_write(new_val, global, threads, atomic == AtomicRwOrd::SeqCst)?;
         }
         Ok(())
     }

     fn buffered_atomic_read(
         &self,
         place: &MPlaceTy<'tcx, Provenance>,
         atomic: AtomicReadOrd,
         latest_in_mo: Scalar<Provenance>,
         validate: impl FnOnce() -> InterpResult<'tcx>,
     ) -> InterpResult<'tcx, Scalar<Provenance>> {
         let this = self.eval_context_ref();
         if let Some(global) = &this.machine.data_race {
             let (alloc_id, base_offset, ..) = this.ptr_get_alloc_id(place.ptr())?;
             if let Some(alloc_buffers) = this.get_alloc_extra(alloc_id)?.weak_memory.as_ref() {
                 if atomic == AtomicReadOrd::SeqCst {
                     global.sc_read(&this.machine.threads);
                 }
                 let mut rng = this.machine.rng.borrow_mut();
                 let buffer = alloc_buffers.get_or_create_store_buffer(
                     alloc_range(base_offset, place.layout.size),
                     latest_in_mo,
                 )?;
                 let (loaded, recency) = buffer.buffered_read(
                     global,
                     &this.machine.threads,
                     atomic == AtomicReadOrd::SeqCst,
                     &mut *rng,
                     validate,
                 )?;
                 if global.track_outdated_loads && recency == LoadRecency::Outdated {
                     this.emit_diagnostic(NonHaltingDiagnostic::WeakMemoryOutdatedLoad);
                 }

                 return Ok(loaded);
             }
         }

         // Race detector or weak memory disabled, simply read the latest value
         validate()?;
         Ok(latest_in_mo)
     }

     fn buffered_atomic_write(
         &mut self,
         val: Scalar<Provenance>,
         dest: &MPlaceTy<'tcx, Provenance>,
         atomic: AtomicWriteOrd,
         init: Scalar<Provenance>,
     ) -> InterpResult<'tcx> {
         let this = self.eval_context_mut();
         let (alloc_id, base_offset, ..) = this.ptr_get_alloc_id(dest.ptr())?;
         if let (
             crate::AllocExtra { weak_memory: Some(alloc_buffers), .. },
             crate::MiriMachine { data_race: Some(global), threads, .. },
         ) = this.get_alloc_extra_mut(alloc_id)?
         {
             if atomic == AtomicWriteOrd::SeqCst {
                 global.sc_write(threads);
             }

             // UGLY HACK: in write_scalar_atomic() we don't know the value before our write,
             // so init == val always. If the buffer is fresh then we would've duplicated an entry,
             // so we need to remove it.
             // See https://github.com/rust-lang/miri/issues/2164
             let was_empty = matches!(
                 alloc_buffers
                     .store_buffers
                     .borrow()
                     .access_type(alloc_range(base_offset, dest.layout.size)),
                 AccessType::Empty(_)
             );
             let buffer = alloc_buffers
                 .get_or_create_store_buffer_mut(alloc_range(base_offset, dest.layout.size), init)?;
             if was_empty {
                 buffer.buffer.pop_front();
             }

             buffer.buffered_write(val, global, threads, atomic == AtomicWriteOrd::SeqCst)?;
         }

         // Caller should've written to dest with the vanilla scalar write, we do nothing here
         Ok(())
     }

     /// Caller should never need to consult the store buffer for the latest value.
     /// This function is used exclusively for failed atomic_compare_exchange_scalar
     /// to perform load_impl on the latest store element
     fn perform_read_on_buffered_latest(
         &self,
         place: &MPlaceTy<'tcx, Provenance>,
         atomic: AtomicReadOrd,
         init: Scalar<Provenance>,
     ) -> InterpResult<'tcx> {
         let this = self.eval_context_ref();

         if let Some(global) = &this.machine.data_race {
             if atomic == AtomicReadOrd::SeqCst {
                 global.sc_read(&this.machine.threads);
             }
             let size = place.layout.size;
             let (alloc_id, base_offset, ..) = this.ptr_get_alloc_id(place.ptr())?;
             if let Some(alloc_buffers) = this.get_alloc_extra(alloc_id)?.weak_memory.as_ref() {
                 let buffer = alloc_buffers
                     .get_or_create_store_buffer(alloc_range(base_offset, size), init)?;
                 buffer.read_from_last_store(
                     global,
                     &this.machine.threads,
                     atomic == AtomicReadOrd::SeqCst,
                 );
             }
         }
         Ok(())
     }
 }
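 // A minimal sketch of the workaround in `buffered_atomic_write`, assuming a location's store buffer
 // is modelled as an `Option<VecDeque<u32>>` that is `None` until the first atomic access. Because the
 // caller has already written the new value, the initial entry equals the value being stored and is
 // popped so the buffer does not start with a duplicate. `atomic_write` is a hypothetical name.
```rust
use std::collections::VecDeque;

/// Records an atomic write of `val` into a lazily created store buffer.
fn atomic_write(buffer: &mut Option<VecDeque<u32>>, val: u32) {
    let was_empty = buffer.is_none();
    // Lazily create the buffer; its initial value is already `val`.
    let buf = buffer.get_or_insert_with(|| VecDeque::from([val]));
    if was_empty {
        // Drop the duplicated initial entry before recording the write.
        buf.pop_front();
    }
    buf.push_back(val);
}
```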