Python in-memory file guide

In-memory databases usually do not support a memory paging option (for the whole database or for certain tables); that is, the total size of the database should be smaller than the available physical memory or the maximum shared memory size.

Main contents

  • In-memory database systems and technologies
  • Hybrid buffer pool
  • In-memory OLTP
  • Configuring persistent memory support for Linux
  • Persisted log buffer

Depending on your application, data-access pattern, size of database and available system memory for database, you have a few choices:

a. Pickled Python Data in File System
This approach stores structured Python data (such as a list of dictionaries, lists, tuples, or sets, or a dictionary of lists, pandas DataFrames, or NumPy arrays) in pickled format so that it can be used immediately and conveniently once unpickled. AFAIK, Python does not implicitly use the file system as a backing store for in-memory Python objects, but the host operating system may swap out Python processes in favor of higher-priority processes. This approach suits static data that is small compared to the available system memory. The pickled files can be copied to other computers and read by multiple dependent or independent processes on the same computer. The pickled file on disk and the unpickled objects in memory both carry overhead beyond the size of the raw data. It is the fastest way to access the data, because the data lives in the memory of the Python process itself and no query-parsing step is involved.
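As a minimal sketch of this approach, the following round-trips a small list of dictionaries through a pickle file. The sample records, the file name, and the helper names save_pickle and load_pickle are illustrative assumptions, not part of any library.

```python
import os
import pickle
import tempfile

# Hypothetical sample data: a list of dictionaries, as described above.
records = [
    {"id": 1, "name": "alice", "tags": ("a", "b")},
    {"id": 2, "name": "bob", "tags": ()},
]


def save_pickle(obj, path):
    """Serialize obj to path using the highest available pickle protocol."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Load a pickled object. Only unpickle files from a trusted source."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Write the data once, then load it back as ordinary Python objects.
path = os.path.join(tempfile.mkdtemp(), "records.pkl")
save_pickle(records, path)
restored = load_pickle(path)
```

After loading, restored is a plain Python list that can be indexed and iterated directly, with no query layer in between. Note that unpickling can execute arbitrary code, so this only fits files whose origin is trusted.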

b. In-memory Database
It stores dynamic or static data in memory. In-memory database libraries with Python API bindings include Redis, sqlite3, Berkeley DB, rqlite, etc. Different in-memory databases offer different features:

  • The database may be locked in physical memory so that the host operating system does not swap it out to the memory backing store. However, the actual implementation of the same library may vary across operating systems.
  • The database may be served by a database server process.
  • The in-memory database may be accessed by multiple dependent or independent processes.
  • Support for the full ACID model, a partial one, or none.
  • The in-memory database can be persisted to physical files so that it is available after the host operating system is restarted.
  • Support for snapshots and/or separate database copies for backup or database management.
  • Support for distributed databases using master-slave or cluster models.
  • Support ranging from simple key-value lookup to advanced query, filter, and group functions (such as SQL, NoSQL).
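To make a few of these features concrete, here is a small sketch using the standard-library sqlite3 module with the special ":memory:" path. It covers only SQL querying and persisting a copy to disk; the table name kv and the sample rows are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile

# ":memory:" asks SQLite for a private database that lives only in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value INTEGER)")

# Using the connection as a context manager commits on success
# and rolls back if an exception is raised.
with conn:
    conn.executemany(
        "INSERT INTO kv (key, value) VALUES (?, ?)",
        [("a", 1), ("b", 2), ("c", 3)],
    )

# An SQL query with a filter and an aggregate, run entirely in memory.
total = conn.execute(
    "SELECT SUM(value) FROM kv WHERE key != ?", ("a",)
).fetchone()[0]

# Persist a copy of the in-memory database to a file on disk.
disk_path = os.path.join(tempfile.mkdtemp(), "snapshot.db")
disk = sqlite3.connect(disk_path)
conn.backup(disk)
disk.close()
conn.close()

# Reopen the file to confirm that the data survived the in-memory connection.
check = sqlite3.connect(disk_path)
row_count = check.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
check.close()
```

A ":memory:" database is private to the connection that created it, so sharing between independent processes needs either a file-backed copy like the one above or a server-based store such as Redis.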

c. Memory-map Database/Data Structure
It stores static or dynamic data that can be larger than the physical memory of the host. Python developers can use APIs such as mmap.mmap() or numpy.memmap() to map files into the process's memory space. The files can be organized into an index and data, so that data is located and accessed via an index lookup. This is in fact the mechanism used by various database libraries. Python developers can implement custom techniques to access and update the data efficiently.
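The idea can be sketched with the standard-library mmap module and fixed-size records, where the record number acts as a trivial index. The record layout (a 32-bit integer key followed by a 64-bit float), the file name, and the helper read_record are assumptions chosen for illustration; real database libraries use far more elaborate index structures.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size record: little-endian int32 key + float64 value.
RECORD = struct.Struct("<id")

# Build a data file of 1000 records with ordinary file I/O.
path = os.path.join(tempfile.mkdtemp(), "records.bin")
with open(path, "wb") as fh:
    for key in range(1000):
        fh.write(RECORD.pack(key, key * 0.5))


def read_record(mm, index):
    """Return the (key, value) tuple stored at the given record number."""
    return RECORD.unpack_from(mm, index * RECORD.size)


# Map the file into memory; the OS pages data in and out on demand,
# so only the pages actually touched need to be resident.
with open(path, "r+b") as fh:
    with mmap.mmap(fh.fileno(), 0) as mm:
        before = read_record(mm, 42)
        # Update record 42 in place, then flush the change to the file.
        RECORD.pack_into(mm, 42 * RECORD.size, 42, 99.0)
        mm.flush()
        after = read_record(mm, 42)

# Re-read through normal file I/O to confirm the update reached the file.
with open(path, "rb") as fh:
    fh.seek(42 * RECORD.size)
    on_disk = RECORD.unpack(fh.read(RECORD.size))
```

Because every record has the same size, a lookup is a single offset calculation. Variable-length data would need a separate index file that maps keys to offsets, as the paragraph above suggests.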

In-memory database systems and technologies

  • Article
  • 03/30/2022
  • 2 minutes to read

Applies to: SQL Server (all supported versions)

This page is intended to serve as a reference page for in-memory features and technologies within SQL Server. The concept of an in-memory database system refers to a database system that has been designed to take advantage of larger memory capacities available on modern database systems. An in-memory database may be relational or non-relational in nature.

It is often assumed that the performance advantages of an in-memory database system come mostly from the fact that data resident in memory is faster to access (by several orders of magnitude) than data sitting on even the fastest available disk subsystems. However, many SQL Server workloads can fit their entire working set in available memory. Many in-memory database systems can persist data to disk and may not always be able to fit the entire data set in available memory.

A fast volatile cache that fronts a considerably slower but durable medium has been the predominant design for relational database workloads, and it necessitates particular approaches to workload management. The opportunities presented by faster memory transfer rates, greater capacity, or even persistent memory facilitate the development of new features and technologies that can spur new approaches to relational database workload management.

Hybrid buffer pool

Applies to: SQL Server (all supported versions)

Hybrid buffer pool expands the buffer pool for database files residing on byte-addressable persistent memory storage devices for both Windows and Linux platforms with SQL Server 2019 (15.x).

Memory-optimized tempdb metadata

Applies to: SQL Server (all supported versions)

SQL Server 2019 (15.x) introduces memory-optimized tempdb metadata, a feature that effectively removes some contention bottlenecks and unlocks a new level of scalability for tempdb-heavy workloads.

For more information on recent tempdb improvements, including memory-optimized metadata in SQL Server 2019 (15.x) and newer features, review System Page Latch Concurrency Enhancements (Ep. 6) | Data Exposed.

In-memory OLTP

Applies to: SQL Server (all supported versions)

In-memory OLTP is a database technology available in SQL Server and SQL Database for optimizing performance of transaction processing, data ingestion, data load, and transient data scenarios.

Configuring persistent memory support for Linux

Applies to: SQL Server (all supported versions) - Linux

The SQL Server 2019 (15.x) documentation describes how to configure persistent memory (PMEM) using the ndctl utility.

Persisted log buffer

Service Pack 1 of SQL Server 2016 (13.x) introduced a performance optimization for write-intensive workloads that were bound by WRITELOG waits. Persistent memory is used to store the log buffer. This buffer, which is small (20 MB per user database), has to be flushed to disk in order for the transactions written to the transaction log to be hardened. For write-intensive OLTP workloads, this flushing mechanism can become a bottleneck. With the log buffer on persistent memory, the number of operations required to harden the log is reduced, improving overall transaction times and increasing workload performance. This process was introduced as Tail of Log Caching. However, there was a perceived conflict with Tail Log Backups and the traditional understanding that the tail of the log was the portion of the transaction log hardened but not yet backed up. Since the official feature name is Persisted Log Buffer, this is the name used here.

See Add persisted log buffer to a database.
