| //! Implementation of C++11-consistent weak memory emulation using store buffers |
| //! based on Dynamic Race Detection for C++ ("the paper"): |
| //! <https://www.doc.ic.ac.uk/~afd/homepages/papers/pdfs/2017/POPL.pdf> |
| //! |
//! This implementation will never generate weak memory behaviours forbidden by the C++11 model,
//! but it is incapable of producing all weak behaviours allowed by the model: certain weak
//! behaviours that are observable on real hardware will never be observed under this emulation.
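//!
//! For illustration, here is the classic "store buffering" litmus test, one weak behaviour this
//! emulation can produce (a minimal sketch of our own, not taken from the paper):
//!
//! ```rust
//! use std::sync::atomic::{AtomicU32, Ordering::Relaxed};
//! use std::thread;
//!
//! static X: AtomicU32 = AtomicU32::new(0);
//! static Y: AtomicU32 = AtomicU32::new(0);
//!
//! let t1 = thread::spawn(|| {
//!     X.store(1, Relaxed);
//!     Y.load(Relaxed)
//! });
//! let t2 = thread::spawn(|| {
//!     Y.store(1, Relaxed);
//!     X.load(Relaxed)
//! });
//! // Under sequential consistency at least one of the loads must return 1, but
//! // with per-location store buffers (as on real hardware) both may return 0.
//! let _weak_outcome = (t1.join().unwrap(), t2.join().unwrap());
//! ```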
| //! |
//! Note that this implementation does not fully take into account C++20's revision of the memory model for SC accesses
| //! and fences introduced by P0668 (<https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0668r5.html>). |
| //! This implementation is not fully correct under the revised C++20 model and may generate behaviours C++20 |
| //! disallows (<https://github.com/rust-lang/miri/issues/2301>). |
| //! |
| //! A modification is made to the paper's model to partially address C++20 changes. |
| //! Specifically, if an SC load reads from an atomic store of any ordering, then a later SC load cannot read from |
| //! an earlier store in the location's modification order. This is to prevent creating a backwards S edge from the second |
| //! load to the first, as a result of C++20's coherence-ordered before rules. |
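//!
//! A litmus-style sketch of that restriction (our own illustration, not taken from the paper or
//! the issue thread; `X` starts at 0):
//!
//! ```rust
//! use std::sync::atomic::{AtomicU32, Ordering::{Relaxed, SeqCst}};
//! use std::thread;
//!
//! static X: AtomicU32 = AtomicU32::new(0);
//!
//! let writer = thread::spawn(|| X.store(1, Relaxed));
//! let t1 = thread::spawn(|| X.load(SeqCst));
//! let t2 = thread::spawn(|| X.load(SeqCst));
//! // Suppose t1's load is executed first and hence comes first in the total order S.
//! // If it reads 1, then t2's load must not read the initial 0: reading a store
//! // earlier in modification order would coherence-order the second SC load before
//! // the first, i.e. create a backwards S edge.
//! let (_r1, _r2) = (t1.join().unwrap(), t2.join().unwrap());
//! writer.join().unwrap();
//! ```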
| //! |
| //! Rust follows the C++20 memory model (except for the Consume ordering and some operations not performable through C++'s |
| //! `std::atomic<T>` API). It is therefore possible for this implementation to generate behaviours never observable when the |
//! same program is compiled and run natively. Unfortunately, no literature exists at the time of writing that proposes
//! an implementable, C++20-compatible relaxed memory model supporting all the atomic operations that exist in Rust. The closest
//! is A Promising Semantics for Relaxed-Memory Concurrency by Jeehoon Kang et al. (<https://www.cs.tau.ac.il/~orilahav/papers/popl17.pdf>).
//! However, that model lacks SC accesses and is therefore unusable by Miri (SC accesses are everywhere in library code).
| //! |
//! If you know of any work that proposes a relaxed memory model that is C++20-consistent, supports all orderings Rust's atomic accesses
//! and fences accept, and is implementable (with operational semantics), please open a GitHub issue!
| //! |
//! One characteristic of this implementation, in contrast to some other notable operational models such as the ones proposed in
//! Taming Release-Acquire Consistency by Ori Lahav et al. (<https://plv.mpi-sws.org/sra/paper.pdf>) or the Promising semantics noted above,
//! is that this implementation does not require each thread to hold an isolated view of the entire memory. Here, store buffers are per-location
//! and shared across all threads. This is more memory efficient but does require store elements (representing writes to a location) to record
//! information about reads, whereas in the other two models it is the other way round: reads point to the write they got their value from.
//! Additionally, writes in our implementation do not have globally unique timestamps attached. In the other two models this timestamp is
//! used to make sure a value in a thread's view is not overwritten by a write that occurred earlier than the one in the existing view.
//! In our implementation, this is detected using the read information attached to store elements, as there is no data structure representing reads.
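//!
//! Schematically (`ThreadView` and `SharedBuffer` below are hypothetical sketches for comparison
//! only; the actual per-location types are `StoreBuffer`, `StoreElement` and `LoadInfo`, defined
//! later in this file):
//!
//! ```rust
//! use std::collections::HashMap;
//! type Location = u64;
//! type Timestamp = u64;
//! type Value = u64;
//! type ThreadId = u64;
//!
//! // Per-thread-view models (SRA, Promising): each thread privately maps every
//! // location to the (timestamped) write it currently observes.
//! struct ThreadView { view: HashMap<Location, (Timestamp, Value)> }
//!
//! // This implementation: one buffer per location, shared by all threads; each
//! // write instead records which threads have read it and when.
//! struct SharedBuffer { stores: Vec<(Value, HashMap<ThreadId, Timestamp>)> }
//! ```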
| //! |
| //! The C++ memory model is built around the notion of an 'atomic object', so it would be natural |
| //! to attach store buffers to atomic objects. However, Rust follows LLVM in that it only has |
| //! 'atomic accesses'. Therefore Miri cannot know when and where atomic 'objects' are being |
//! created or destroyed, to manage its store buffers. Instead, we lazily create an
| //! atomic object on the first atomic access to a given region, and we destroy that object |
| //! on the next non-atomic or imperfectly overlapping atomic access to that region. |
//! These lazy (de)allocations happen in `memory_accessed()` on non-atomic accesses, and in
//! `get_or_create_store_buffer()` on atomic accesses. This mostly works well, but it does
| //! lead to some issues (<https://github.com/rust-lang/miri/issues/2164>). |
| //! |
//! One consequence of this difference is that safe/sound Rust allows for more operations on atomic locations
//! than the C++20 atomic API was intended to allow, such as non-atomically accessing
//! a previously atomically accessed location, or accessing previously atomically accessed locations with a differently sized operation
//! (such as accessing the top 16 bits of an `AtomicU32`). These scenarios are generally not discussed in formalisations of the C++ memory model.
//! In Rust, these operations can only be done through a `&mut AtomicFoo` reference or one derived from it; therefore these operations
//! can only happen after all previous accesses to the same location. This implementation is adapted to allow these operations.
//! A mixed-atomicity read that races with writes, or a write that races with reads or writes, will still be reported as UB.
//! Mixed-size atomic accesses must not race with any other atomic access, whether read or write, otherwise UB will be reported.
//! You can refer to the test cases in weak_memory/extra_cpp.rs and weak_memory/extra_cpp_unsafe.rs for examples of these operations.
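//!
//! A minimal sketch of the safe pattern (our own example; the mixed-size case additionally
//! requires unsafe pointer casts, see weak_memory/extra_cpp_unsafe.rs):
//!
//! ```rust
//! use std::sync::atomic::{AtomicU32, Ordering::Relaxed};
//!
//! let mut a = AtomicU32::new(0);
//! a.store(1, Relaxed); // atomic access: a store buffer is created lazily
//! // A non-atomic access to the previously atomically accessed location. This is
//! // fine because `get_mut` requires `&mut AtomicU32`, proving that all earlier
//! // accesses happen-before it; Miri destroys the store buffer at this point.
//! *a.get_mut() = 2;
//! ```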
| |
// Our implementation and the paper authors' own implementation of it (tsan11) both deviate from the operational semantics provided in §5.3:
| // 1. In the operational semantics, store elements keep a copy of the atomic object's vector clock (AtomicCellClocks::sync_vector in miri), |
| // but this is not used anywhere so it's omitted here. |
| // |
// 2. In the operational semantics, each store element keeps the timestamp of every thread that loads from it.
| // If the same thread loads from the same store element multiple times, then the timestamps at all loads are saved in a list of load elements. |
| // This is not necessary as later loads by the same thread will always have greater timestamp values, so we only need to record the timestamp of the first |
| // load by each thread. This optimisation is done in tsan11 |
| // (https://github.com/ChrisLidbury/tsan11/blob/ecbd6b81e9b9454e01cba78eb9d88684168132c7/lib/tsan/rtl/tsan_relaxed.h#L35-L37) |
| // and here. |
| // |
// 3. §4.5 of the paper wants an SC store to mark all existing stores in the buffer that happen before it
| // as SC. This is not done in the operational semantics but implemented correctly in tsan11 |
| // (https://github.com/ChrisLidbury/tsan11/blob/ecbd6b81e9b9454e01cba78eb9d88684168132c7/lib/tsan/rtl/tsan_relaxed.cc#L160-L167) |
| // and here. |
| // |
// 4. The W_SC ; R_SC case requires the SC load to ignore all but the last store marked SC (stores not marked SC are not
// affected). However, the paper applies this rule to all loads in ReadsFromSet (its last two lines of code), not just SC loads.
| // This is implemented correctly in tsan11 |
| // (https://github.com/ChrisLidbury/tsan11/blob/ecbd6b81e9b9454e01cba78eb9d88684168132c7/lib/tsan/rtl/tsan_relaxed.cc#L295) |
| // and here. |
| |
| use std::{ |
| cell::{Ref, RefCell}, |
| collections::VecDeque, |
| }; |
| |
| use rustc_const_eval::interpret::{alloc_range, AllocRange, InterpResult, MPlaceTy, Scalar}; |
| use rustc_data_structures::fx::FxHashMap; |
| |
| use crate::*; |
| |
| use super::{ |
| data_race::{GlobalState as DataRaceState, ThreadClockSet}, |
| range_object_map::{AccessType, RangeObjectMap}, |
| vector_clock::{VClock, VTimestamp, VectorIdx}, |
| }; |
| |
| pub type AllocState = StoreBufferAlloc; |
| |
// Each store buffer must be bounded, otherwise it will grow indefinitely.
// However, bounding the store buffer means restricting the number of weak
// behaviours observable. The author picked 128 as a good tradeoff
| // so we follow them here. |
| const STORE_BUFFER_LIMIT: usize = 128; |
| |
| #[derive(Debug, Clone)] |
| pub struct StoreBufferAlloc { |
| /// Store buffer of each atomic object in this allocation |
| // Behind a RefCell because we need to allocate/remove on read access |
| store_buffers: RefCell<RangeObjectMap<StoreBuffer>>, |
| } |
| |
| impl VisitTags for StoreBufferAlloc { |
| fn visit_tags(&self, visit: &mut dyn FnMut(BorTag)) { |
| let Self { store_buffers } = self; |
| for val in store_buffers |
| .borrow() |
| .iter() |
| .flat_map(|buf| buf.buffer.iter().map(|element| &element.val)) |
| { |
| val.visit_tags(visit); |
| } |
| } |
| } |
| |
| #[derive(Debug, Clone, PartialEq, Eq)] |
| pub(super) struct StoreBuffer { |
| // Stores to this location in modification order |
| buffer: VecDeque<StoreElement>, |
| } |
| |
| /// Whether a load returned the latest value or not. |
| #[derive(PartialEq, Eq)] |
| enum LoadRecency { |
| Latest, |
| Outdated, |
| } |
| |
| #[derive(Debug, Clone, PartialEq, Eq)] |
| struct StoreElement { |
| /// The identifier of the vector index, corresponding to a thread |
| /// that performed the store. |
| store_index: VectorIdx, |
| |
| /// Whether this store is SC. |
| is_seqcst: bool, |
| |
| /// The timestamp of the storing thread when it performed the store |
| timestamp: VTimestamp, |
| /// The value of this store |
| // FIXME: this means the store must be fully initialized; |
| // we will have to change this if we want to support atomics on |
| // (partially) uninitialized data. |
| val: Scalar<Provenance>, |
| |
| /// Metadata about loads from this store element, |
    /// behind a RefCell so that load operations can take `&self`
| load_info: RefCell<LoadInfo>, |
| } |
| |
| #[derive(Debug, Clone, PartialEq, Eq, Default)] |
| struct LoadInfo { |
    /// Timestamp of the first load from this store element by each thread
| timestamps: FxHashMap<VectorIdx, VTimestamp>, |
| /// Whether this store element has been read by an SC load |
| sc_loaded: bool, |
| } |
| |
| impl StoreBufferAlloc { |
| pub fn new_allocation() -> Self { |
| Self { store_buffers: RefCell::new(RangeObjectMap::new()) } |
| } |
| |
| /// When a non-atomic access happens on a location that has been atomically accessed |
| /// before without data race, we can determine that the non-atomic access fully happens |
    /// after all the prior atomic accesses, so the location no longer needs to exhibit
| /// any weak memory behaviours until further atomic accesses. |
| pub fn memory_accessed(&self, range: AllocRange, global: &DataRaceState) { |
| if !global.ongoing_action_data_race_free() { |
| let mut buffers = self.store_buffers.borrow_mut(); |
| let access_type = buffers.access_type(range); |
| match access_type { |
| AccessType::PerfectlyOverlapping(pos) => { |
| buffers.remove_from_pos(pos); |
| } |
| AccessType::ImperfectlyOverlapping(pos_range) => { |
| // We rely on the data-race check making sure this is synchronized. |
| // Therefore we can forget about the old data here. |
| buffers.remove_pos_range(pos_range); |
| } |
| AccessType::Empty(_) => { |
| // The range had no weak behaviours attached, do nothing |
| } |
| } |
| } |
| } |
| |
| /// Gets a store buffer associated with an atomic object in this allocation, |
| /// or creates one with the specified initial value if no atomic object exists yet. |
| fn get_or_create_store_buffer<'tcx>( |
| &self, |
| range: AllocRange, |
| init: Scalar<Provenance>, |
| ) -> InterpResult<'tcx, Ref<'_, StoreBuffer>> { |
| let access_type = self.store_buffers.borrow().access_type(range); |
| let pos = match access_type { |
| AccessType::PerfectlyOverlapping(pos) => pos, |
| AccessType::Empty(pos) => { |
| let mut buffers = self.store_buffers.borrow_mut(); |
| buffers.insert_at_pos(pos, range, StoreBuffer::new(init)); |
| pos |
| } |
| AccessType::ImperfectlyOverlapping(pos_range) => { |
| // Once we reach here we would've already checked that this access is not racy. |
| let mut buffers = self.store_buffers.borrow_mut(); |
| buffers.remove_pos_range(pos_range.clone()); |
| buffers.insert_at_pos(pos_range.start, range, StoreBuffer::new(init)); |
| pos_range.start |
| } |
| }; |
| Ok(Ref::map(self.store_buffers.borrow(), |buffer| &buffer[pos])) |
| } |
| |
| /// Gets a mutable store buffer associated with an atomic object in this allocation |
| fn get_or_create_store_buffer_mut<'tcx>( |
| &mut self, |
| range: AllocRange, |
| init: Scalar<Provenance>, |
| ) -> InterpResult<'tcx, &mut StoreBuffer> { |
| let buffers = self.store_buffers.get_mut(); |
| let access_type = buffers.access_type(range); |
| let pos = match access_type { |
| AccessType::PerfectlyOverlapping(pos) => pos, |
| AccessType::Empty(pos) => { |
| buffers.insert_at_pos(pos, range, StoreBuffer::new(init)); |
| pos |
| } |
| AccessType::ImperfectlyOverlapping(pos_range) => { |
| // Once we reach here we would've already checked that this access is not racy. |
| buffers.remove_pos_range(pos_range.clone()); |
| buffers.insert_at_pos(pos_range.start, range, StoreBuffer::new(init)); |
| pos_range.start |
| } |
| }; |
| Ok(&mut buffers[pos]) |
| } |
| } |
| |
| impl<'mir, 'tcx: 'mir> StoreBuffer { |
    fn new(init: Scalar<Provenance>) -> Self {
        let mut buffer = VecDeque::with_capacity(STORE_BUFFER_LIMIT);
        let store_elem = StoreElement {
            // The thread index and timestamp of the initialisation write
            // are never meaningfully used, so it's fine to leave them as 0
            store_index: VectorIdx::from(0),
            timestamp: VTimestamp::ZERO,
            val: init,
            is_seqcst: false,
            load_info: RefCell::new(LoadInfo::default()),
        };
        buffer.push_back(store_elem);
        Self { buffer }
    }
| |
| /// Reads from the last store in modification order |
| fn read_from_last_store( |
| &self, |
| global: &DataRaceState, |
| thread_mgr: &ThreadManager<'_, '_>, |
| is_seqcst: bool, |
| ) { |
| let store_elem = self.buffer.back(); |
| if let Some(store_elem) = store_elem { |
| let (index, clocks) = global.current_thread_state(thread_mgr); |
| store_elem.load_impl(index, &clocks, is_seqcst); |
| } |
| } |
| |
| fn buffered_read( |
| &self, |
| global: &DataRaceState, |
| thread_mgr: &ThreadManager<'_, '_>, |
| is_seqcst: bool, |
| rng: &mut (impl rand::Rng + ?Sized), |
| validate: impl FnOnce() -> InterpResult<'tcx>, |
| ) -> InterpResult<'tcx, (Scalar<Provenance>, LoadRecency)> { |
| // Having a live borrow to store_buffer while calling validate_atomic_load is fine |
| // because the race detector doesn't touch store_buffer |
| |
| let (store_elem, recency) = { |
| // The `clocks` we got here must be dropped before calling validate_atomic_load |
| // as the race detector will update it |
| let (.., clocks) = global.current_thread_state(thread_mgr); |
| // Load from a valid entry in the store buffer |
| self.fetch_store(is_seqcst, &clocks, &mut *rng) |
| }; |
| |
| // Unlike in buffered_atomic_write, thread clock updates have to be done |
| // after we've picked a store element from the store buffer, as presented |
        // in the ATOMIC LOAD rule of the paper. This is because fetch_store
| // requires access to ThreadClockSet.clock, which is updated by the race detector |
| validate()?; |
| |
| let (index, clocks) = global.current_thread_state(thread_mgr); |
| let loaded = store_elem.load_impl(index, &clocks, is_seqcst); |
| Ok((loaded, recency)) |
| } |
| |
| fn buffered_write( |
| &mut self, |
| val: Scalar<Provenance>, |
| global: &DataRaceState, |
| thread_mgr: &ThreadManager<'_, '_>, |
| is_seqcst: bool, |
| ) -> InterpResult<'tcx> { |
| let (index, clocks) = global.current_thread_state(thread_mgr); |
| |
| self.store_impl(val, index, &clocks.clock, is_seqcst); |
| Ok(()) |
| } |
| |
| #[allow(clippy::if_same_then_else, clippy::needless_bool)] |
| /// Selects a valid store element in the buffer. |
| fn fetch_store<R: rand::Rng + ?Sized>( |
| &self, |
| is_seqcst: bool, |
| clocks: &ThreadClockSet, |
| rng: &mut R, |
| ) -> (&StoreElement, LoadRecency) { |
| use rand::seq::IteratorRandom; |
| let mut found_sc = false; |
| // FIXME: we want an inclusive take_while (stops after a false predicate, but |
| // includes the element that gave the false), but such function doesn't yet |
| // exist in the standard library https://github.com/rust-lang/rust/issues/62208 |
| // so we have to hack around it with keep_searching |
| let mut keep_searching = true; |
| let candidates = self |
| .buffer |
| .iter() |
| .rev() |
| .take_while(move |&store_elem| { |
| if !keep_searching { |
| return false; |
| } |
| |
| keep_searching = if store_elem.timestamp <= clocks.clock[store_elem.store_index] { |
| // CoWR: if a store happens-before the current load, |
| // then we can't read-from anything earlier in modification order. |
| // C++20 §6.9.2.2 [intro.races] paragraph 18 |
| false |
| } else if store_elem.load_info.borrow().timestamps.iter().any( |
| |(&load_index, &load_timestamp)| load_timestamp <= clocks.clock[load_index], |
| ) { |
| // CoRR: if there was a load from this store which happened-before the current load, |
| // then we cannot read-from anything earlier in modification order. |
| // C++20 §6.9.2.2 [intro.races] paragraph 16 |
| false |
| } else if store_elem.timestamp <= clocks.fence_seqcst[store_elem.store_index] { |
| // The current load, which may be sequenced-after an SC fence, cannot read-before |
| // the last store sequenced-before an SC fence in another thread. |
| // C++17 §32.4 [atomics.order] paragraph 6 |
| false |
| } else if store_elem.timestamp <= clocks.write_seqcst[store_elem.store_index] |
| && store_elem.is_seqcst |
| { |
| // The current non-SC load, which may be sequenced-after an SC fence, |
| // cannot read-before the last SC store executed before the fence. |
| // C++17 §32.4 [atomics.order] paragraph 4 |
| false |
| } else if is_seqcst |
| && store_elem.timestamp <= clocks.read_seqcst[store_elem.store_index] |
| { |
| // The current SC load cannot read-before the last store sequenced-before |
| // the last SC fence. |
| // C++17 §32.4 [atomics.order] paragraph 5 |
| false |
| } else if is_seqcst && store_elem.load_info.borrow().sc_loaded { |
| // The current SC load cannot read-before a store that an earlier SC load has observed. |
| // See https://github.com/rust-lang/miri/issues/2301#issuecomment-1222720427 |
| // Consequences of C++20 §31.4 [atomics.order] paragraph 3.1, 3.3 (coherence-ordered before) |
| // and 4.1 (coherence-ordered before between SC makes global total order S) |
| false |
| } else { |
| true |
| }; |
| |
| true |
| }) |
| .filter(|&store_elem| { |
| if is_seqcst && store_elem.is_seqcst { |
                    // An SC load needs to ignore all but the last store marked SC (stores not marked SC are not
| // affected) |
| let include = !found_sc; |
| found_sc = true; |
| include |
| } else { |
| true |
| } |
| }); |
| |
| let chosen = candidates.choose(rng).expect("store buffer cannot be empty"); |
| if std::ptr::eq(chosen, self.buffer.back().expect("store buffer cannot be empty")) { |
| (chosen, LoadRecency::Latest) |
| } else { |
| (chosen, LoadRecency::Outdated) |
| } |
| } |
| |
| /// ATOMIC STORE IMPL in the paper (except we don't need the location's vector clock) |
| fn store_impl( |
| &mut self, |
| val: Scalar<Provenance>, |
| index: VectorIdx, |
| thread_clock: &VClock, |
| is_seqcst: bool, |
| ) { |
| let store_elem = StoreElement { |
| store_index: index, |
| timestamp: thread_clock[index], |
| // In the language provided in the paper, an atomic store takes the value from a |
| // non-atomic memory location. |
| // But we already have the immediate value here so we don't need to do the memory |
| // access |
| val, |
| is_seqcst, |
| load_info: RefCell::new(LoadInfo::default()), |
| }; |
| self.buffer.push_back(store_elem); |
| if self.buffer.len() > STORE_BUFFER_LIMIT { |
| self.buffer.pop_front(); |
| } |
| if is_seqcst { |
            // Every store that happens before this one needs to be marked as SC
            // so that a later SC load will pick either the last SC store (i.e. this one)
            // or a store that is not hb-ordered with the last SC store.
| self.buffer.iter_mut().rev().for_each(|elem| { |
| if elem.timestamp <= thread_clock[elem.store_index] { |
| elem.is_seqcst = true; |
| } |
| }) |
| } |
| } |
| } |
| |
| impl StoreElement { |
| /// ATOMIC LOAD IMPL in the paper |
| /// Unlike the operational semantics in the paper, we don't need to keep track |
| /// of the thread timestamp for every single load. Keeping track of the first (smallest) |
| /// timestamp of each thread that has loaded from a store is sufficient: if the earliest |
| /// load of another thread happens before the current one, then we must stop searching the store |
| /// buffer regardless of subsequent loads by the same thread; if the earliest load of another |
| /// thread doesn't happen before the current one, then no subsequent load by the other thread |
| /// can happen before the current one. |
| fn load_impl( |
| &self, |
| index: VectorIdx, |
| clocks: &ThreadClockSet, |
| is_seqcst: bool, |
| ) -> Scalar<Provenance> { |
| let mut load_info = self.load_info.borrow_mut(); |
| load_info.sc_loaded |= is_seqcst; |
| let _ = load_info.timestamps.try_insert(index, clocks.clock[index]); |
| self.val |
| } |
| } |
| |
| impl<'mir, 'tcx: 'mir> EvalContextExt<'mir, 'tcx> for crate::MiriInterpCx<'mir, 'tcx> {} |
| pub(super) trait EvalContextExt<'mir, 'tcx: 'mir>: |
| crate::MiriInterpCxExt<'mir, 'tcx> |
| { |
| fn buffered_atomic_rmw( |
| &mut self, |
| new_val: Scalar<Provenance>, |
| place: &MPlaceTy<'tcx, Provenance>, |
| atomic: AtomicRwOrd, |
| init: Scalar<Provenance>, |
| ) -> InterpResult<'tcx> { |
| let this = self.eval_context_mut(); |
| let (alloc_id, base_offset, ..) = this.ptr_get_alloc_id(place.ptr())?; |
| if let ( |
| crate::AllocExtra { weak_memory: Some(alloc_buffers), .. }, |
| crate::MiriMachine { data_race: Some(global), threads, .. }, |
| ) = this.get_alloc_extra_mut(alloc_id)? |
| { |
| if atomic == AtomicRwOrd::SeqCst { |
| global.sc_read(threads); |
| global.sc_write(threads); |
| } |
| let range = alloc_range(base_offset, place.layout.size); |
| let buffer = alloc_buffers.get_or_create_store_buffer_mut(range, init)?; |
| buffer.read_from_last_store(global, threads, atomic == AtomicRwOrd::SeqCst); |
| buffer.buffered_write(new_val, global, threads, atomic == AtomicRwOrd::SeqCst)?; |
| } |
| Ok(()) |
| } |
| |
| fn buffered_atomic_read( |
| &self, |
| place: &MPlaceTy<'tcx, Provenance>, |
| atomic: AtomicReadOrd, |
| latest_in_mo: Scalar<Provenance>, |
| validate: impl FnOnce() -> InterpResult<'tcx>, |
| ) -> InterpResult<'tcx, Scalar<Provenance>> { |
| let this = self.eval_context_ref(); |
| if let Some(global) = &this.machine.data_race { |
| let (alloc_id, base_offset, ..) = this.ptr_get_alloc_id(place.ptr())?; |
| if let Some(alloc_buffers) = this.get_alloc_extra(alloc_id)?.weak_memory.as_ref() { |
| if atomic == AtomicReadOrd::SeqCst { |
| global.sc_read(&this.machine.threads); |
| } |
| let mut rng = this.machine.rng.borrow_mut(); |
| let buffer = alloc_buffers.get_or_create_store_buffer( |
| alloc_range(base_offset, place.layout.size), |
| latest_in_mo, |
| )?; |
| let (loaded, recency) = buffer.buffered_read( |
| global, |
| &this.machine.threads, |
| atomic == AtomicReadOrd::SeqCst, |
| &mut *rng, |
| validate, |
| )?; |
| if global.track_outdated_loads && recency == LoadRecency::Outdated { |
| this.emit_diagnostic(NonHaltingDiagnostic::WeakMemoryOutdatedLoad); |
| } |
| |
| return Ok(loaded); |
| } |
| } |
| |
| // Race detector or weak memory disabled, simply read the latest value |
| validate()?; |
| Ok(latest_in_mo) |
| } |
| |
| fn buffered_atomic_write( |
| &mut self, |
| val: Scalar<Provenance>, |
| dest: &MPlaceTy<'tcx, Provenance>, |
| atomic: AtomicWriteOrd, |
| init: Scalar<Provenance>, |
| ) -> InterpResult<'tcx> { |
| let this = self.eval_context_mut(); |
| let (alloc_id, base_offset, ..) = this.ptr_get_alloc_id(dest.ptr())?; |
| if let ( |
| crate::AllocExtra { weak_memory: Some(alloc_buffers), .. }, |
| crate::MiriMachine { data_race: Some(global), threads, .. }, |
| ) = this.get_alloc_extra_mut(alloc_id)? |
| { |
| if atomic == AtomicWriteOrd::SeqCst { |
| global.sc_write(threads); |
| } |
| |
| // UGLY HACK: in write_scalar_atomic() we don't know the value before our write, |
| // so init == val always. If the buffer is fresh then we would've duplicated an entry, |
| // so we need to remove it. |
| // See https://github.com/rust-lang/miri/issues/2164 |
| let was_empty = matches!( |
| alloc_buffers |
| .store_buffers |
| .borrow() |
| .access_type(alloc_range(base_offset, dest.layout.size)), |
| AccessType::Empty(_) |
| ); |
| let buffer = alloc_buffers |
| .get_or_create_store_buffer_mut(alloc_range(base_offset, dest.layout.size), init)?; |
| if was_empty { |
| buffer.buffer.pop_front(); |
| } |
| |
| buffer.buffered_write(val, global, threads, atomic == AtomicWriteOrd::SeqCst)?; |
| } |
| |
| // Caller should've written to dest with the vanilla scalar write, we do nothing here |
| Ok(()) |
| } |
| |
    /// Caller should never need to consult the store buffer for the latest value.
    /// This function is used exclusively by failed `atomic_compare_exchange_scalar`s
    /// to perform `load_impl` on the latest store element.
| fn perform_read_on_buffered_latest( |
| &self, |
| place: &MPlaceTy<'tcx, Provenance>, |
| atomic: AtomicReadOrd, |
| init: Scalar<Provenance>, |
| ) -> InterpResult<'tcx> { |
| let this = self.eval_context_ref(); |
| |
| if let Some(global) = &this.machine.data_race { |
| if atomic == AtomicReadOrd::SeqCst { |
| global.sc_read(&this.machine.threads); |
| } |
| let size = place.layout.size; |
| let (alloc_id, base_offset, ..) = this.ptr_get_alloc_id(place.ptr())?; |
| if let Some(alloc_buffers) = this.get_alloc_extra(alloc_id)?.weak_memory.as_ref() { |
| let buffer = alloc_buffers |
| .get_or_create_store_buffer(alloc_range(base_offset, size), init)?; |
| buffer.read_from_last_store( |
| global, |
| &this.machine.threads, |
| atomic == AtomicReadOrd::SeqCst, |
| ); |
| } |
| } |
| Ok(()) |
| } |
| } |