Geometric Dumping

Generating geometrically spaced trajectory data with geo-rev11.py and geometricdump.com, and analyzing it with 2_time_geom.py and s4_tb_fft_ww_cycframes.py.

← / → or space · d toggles dark mode · click the slide counter to jump

The problem

2_time_std.py and the linear S4 scripts work on linearly spaced dumps. That is fine until the dynamics spread over many decades: a correlation function can only be evaluated at time separations that exist in the saved trajectory, and a single linear spacing cannot cover all the decades at once.

Numbers for the production region used throughout these slides (10⁶ steps ≈ 100 τ_α, 100k particles, ~1.2 MB per frame):

Dump schedule	Frames / run	Size / run	Lag coverage
linear, every step	10⁶	~1.2 TB	everything
linear, every 1000 steps	1,000	1.2 GB	nothing below 1 LJ; the first 3 decades are missing
geometric (25 dumps/cycle)	2,775	3.3 GB	1 step to full span, uniform in log t

Why repeated short sequences

A single geometric series from step 1 to the end of the run would solve storage. It fails for a different reason. Correlation functions are averaged over initial times t₀: the same separation t is measured from many starting frames. The translational overlap, for example:

w_j(t₀+t, t₀; a) ≡ Θ(a − |r_j(t₀+t) − r_j(t₀)|)
F_o(t) ≡ (1/N_t₀) N_t₀Σt₀=1 (1/N) NΣj=1 ⟨w_j(t₀+t, t₀; a)⟩

A single geometric series has only one usable initial time, the first frame.
To include a separation t in the average, that separation has to be reachable from every initial time. So the dump pattern following each initial time must be the same.

The schedule is therefore built in two levels:

Within a cycle: dumps at t_n = t₀ rⁿ, log-uniform coverage of separations.
Between cycles: cycle starts linearly spaced, giving many equivalent initial times (time-translation invariance).

Cost: separations between the cycle length Λ and its multiples are coarser. In practice the set of available separations stays dense enough (slide 13 shows it exactly).

The schedule

Generated live with the geo-rev11.py --small --old algorithm. Each tick is one frame. The time axis is fixed at 160 LJ, so changing -t or the cycle count changes the coverage you see.

-n dumps/cycle goal 27 -t cycle length (LJ) 9 cycles 4

Top: the dumps in real (linear) time. Bottom: one row per cycle, each showing the time of its dumps measured from that cycle's own start, on a log axis. The rows are identical: every cycle repeats the same pattern.

geo-rev11.py

Prints three text files that a LAMMPS loop reads line by line. One frame per DCD file.

File	Contents
`run_times.txt`	steps to run before each dump (argument to `run`)
`dump_times.txt`	cumulative timestep of each dump (dump trigger / bookkeeping)
`file_names.txt`	output filename for each frame

Option	Meaning
`-a [float]`	(LJ time units) first geometric term. Without `--old`, this is the first runtime; with `--old`, this is the first dump offset. Values below one timestep floor to one timestep.
`-t [float]`	(LJ time units) endpoint used with `-a` and `-n` to set the geometric ratio. With `--old`, it is the sequence span; without `--old`, it is the largest single runtime.
`-n [int]`	(count) target number of geometric terms. Terms whose runtime floors to zero timesteps are dropped, so the true dump count can be lower than this target.
`-d [float]`	(LJ time units) simulation timestep, used to convert LJ times to integer timesteps.
`-r [int]`	(count) number of times to repeat the generated sequence back to back.
`-c [int]`	(file index) initial filename counter before each generated file increments it. Set this to the last already-used index; the first generated file is `-c + 1`.
`-m [int]`	(timesteps) offset added to every cumulative dump time, used to continue the timestep counter after a previous sequence.
`--small`	(mode flag) generate the simple geometric-series schedule used by this workflow. Without it, geo-rev11.py uses the older long-series mode.
`--old`	(mode flag) treat the geometric terms as cumulative dump offsets and compute runtimes as their differences. This makes every repeat have identical dump offsets.
`--test`	(diagnostic flag) print diagnostics instead of loop-file output. Do not use this output as LAMMPS schedule files.

Output files are opened in append mode. Delete them before re-running, or two schedules get concatenated.

The LAMMPS loop

# --- Looping Parameters Setup --- variable loopvar loop 420 variable eruntime file run_times.txt variable filename file file_names.txt variable dumpint file dump_times.txt

loopvar sets the number of loop iterations, which should equal the number of lines in the generated files.
Each variable ... file reads one line at a time from run_times.txt, file_names.txt, and dump_times.txt.

reset_timestep 0 # initial run/dump to initialize output dump 1 all dcd 2 traj1.dcd dump_modify 1 unwrap yes run 1 undump 1

Resetting the timestep gives the schedule a known origin.
The one-step initialization writes the step-0 frame before the generated loop begins. That file consumes the first filename index, so the generated file counter starts past it.
dump_modify unwrap yes writes unwrapped coordinates, which the downstream analysis expects.

# --- Main Production Loop --- label loopstart dump 1 all dcd ${dumpint} ${filename} dump_modify 1 unwrap yes run ${eruntime} undump 1

For the current schedule line, LAMMPS opens one DCD dump using the absolute dump timestep and filename from the files.
run ${eruntime} advances exactly to that dump time, so the file receives one frame, then undump closes it before the next iteration.

next loopvar next eruntime next filename next dumpint jump SELF loopstart

next advances the loop counter and each file-backed variable to the next schedule line.
jump SELF loopstart repeats until loopvar is exhausted, producing one DCD file per generated schedule entry.

--old vs default: what is geometric?

default: geometric runtimes

runtime_k = ⌊t₀ r^k⌋; dump time = running sum. -t sets the largest single gap. The sequence ends wherever the sum lands.

Used for the long sequence.

--old: geometric dump times

dump offset_k = ⌊t₀ r^k⌋ exactly; runtime = difference from the previous dump. The sequence ends exactly at -t, and every repeat reproduces identical offsets.

Used for the short sequences. The analysis scripts verify that all cycles are identical, and this mode guarantees it.

Top: -a 0.001 -t 9 -n 27, one sequence. Bottom: the same flags with --old and -r 3, on the same linear axis. Both production commands use --small; the non---small path is the legacy long-series mode and predates the current workflow.

Two ways to use the schedule

A: short sequences only

System is equilibrated before recording. Every cycle start is an equivalent initial time. Analyze with 2_time_geom.py and s4_tb_fft_ww_cycframes.py.

B: long sequence, then short sequences

During aging, correlations depend on the waiting time t_w, and the evolution is slow in log t_w, so one geometric series samples it with a few hundred frames. Analyze the long region with 2_time_aging.py (explicit initial frames); analyze the short region with the cyclic scripts, starting at the right file index (-z).

Don't point a cyclic script at the stitched long+short set: the long region doesn't repeat, and the cycle check fails (correctly).

geometricdump.com

You can hand-pick geo-rev11.py flags, but they interact: the ratio depends on -a, -t, -n, and the stitch values -c and -m depend on the long sequence's output. The site takes system-level inputs and derives the flags.

You provide

dt
τ_α
τ_equil
expressions like 100*tau_alpha work

→

It derives

long: -t = τ_α/2, -n searched so the last dump lands on τ_equil
short: length 0.9 τ_α, -n 27, -r = total/length
stitch: -c, -m from the long output (+ the step-0 file)

→

You get

both commands with per-flag explanations
consistency checks
DCD size estimate
plots
downloadable schedule files
LAMMPS loop snippet

The reason for the specific defaults is dump density at τ_α: next slide.

Why cycle length ≈ 0.9 τ_α

A schedule can look reasonable and still measure τ_α badly. The available separations get sparse toward the end of a cycle and dense again right after the cycle boundary. The cycle length decides which region τ_α falls in.

cycle length Λ / τ_α 0.90 τ_α = 10 LJ, dt = 0.001, -n 27

At the default Λ = 0.9 τ_α, τ_α sits just past the cycle boundary, in dense spacing. Push Λ above τ_α and τ_α lands in the sparse tail of the cycle, where the nearest available separations are far apart.

Worked example

dt = 0.001, τ_α = 10 LJ, τ_equil = 200 LJ, production 100 τ_α.

python geo-rev11.py -a 0 -t 5 -n 338 -d 0.001 -c 1 --small -r 1 -m 0

Long sequence: one sequence of geometrically growing runtimes, largest runtime τ_α/2 = 5 LJ, with 338 terms (the website searches -n for the value whose summed runtimes land closest to τ_equil; here the sum is 200,148 steps ≈ 200 LJ), first runtime as small as possible, filename counter starting at 1 (the step-0 dump took short1.dcd), timestep counter starting at 0:

python geo-rev11.py -a 0.001 -t 9 -n 27 -d 0.001 -c 339 --small --old -r 111 -m 200148

Short sequences: 111 repeats of a sequence whose dump times are geometric (--old), each spanning 9 LJ = 0.9 τ_α with a goal of 27 dumps and the first dump 1 step in, continuing the filename counter from the long sequence and the timestep counter from its last dump:

	Long	Short (production)
dumps	338	111 cycles × 25 = 2,775
files	`short2.dcd` – `short339.dcd`	`short340.dcd` – `short3114.dcd`
timesteps	1 – 200,148 (≈ 200 LJ)	200,149 – 1,199,148 (≈ 1,000 LJ)

One short sequence (production cycle), in steps. Dump times, measured from the start of the sequence (the last is 9 LJ / 0.001 = 9,000):

1 2 4 5 8 11 16 23 33 47 66 94 134 191 271 385 546 775 1100 1562 2217 3147 4467 6340 9000

and the corresponding runtimes (the gaps between dumps):

1 1 2 1 3 3 5 7 10 14 19 28 40 57 80 114 161 229 325 462 655 930 1320 1873 2660

What the cyclic scripts do before computing

The analysis scripts don't read dump_times.txt directly. The schedule is recovered from the DCD metadata. 2_time_geom.py and s4_tb_fft_ww_cycframes.py both start with these steps (L = frames per cycle, the -c / -G argument):

Measure the cycle. The times of the first L+1 frames give the L time gaps that make up one cycle.
Verify it. Every frame time in the whole trajectory has to match that repeating pattern. If one frame doesn't, the script stops and reports which frame in which file.
Rotate. The starting point within the cycle is shifted so analysis begins at the smallest gap.
List the available separations. Adding the repeating gaps end to end gives every time separation that exists in the data. Requested lags are then moved to the closest available separation (compared on a log scale).

Step 2 is why --small --old matters at generation time, and why a wrong -c, -G, or -z fails immediately instead of producing wrong numbers.

Lag selection: -g and -U

-g N asks for N lag values, geometrically spaced between a first lag and the longest separation in the data. The data only contains specific separations, so each requested value is moved to the closest one available. When two requests land on the same separation it is kept once, which is why fewer values come out than went in.

-U sets the first requested lag. -U n (positive) starts n frames in. -U 0, the old default, starts at 1.0 LJ, so no separations shorter than 1 LJ get requested. Try it below.

-g 50 -U worked-example data: 2,775 frames, 25-frame cycle (actual progression.py algorithm)

2_time_std vs 2_time_geom

Same observables, same output format. The difference is the dump schedule each one expects.

2_time_std.py

reads traj<n>.dcd, count -n
evenly spaced dumps
initial times every -d frames
lags used directly as frame counts

shared

MSD, overlap (-a), F_s (-q), orientational correlations (--polyatomic*)
run averages and across-run standard deviations (-r runs)
frame range -s/-m, last initial time -k
lag request flags -f / -g / -w, --low/high-interval
output column 1 is the real lag time, so results plot together

2_time_geom.py

reads short<n>.dcd, count -t, first file -z
requires -c, the true frames per cycle
verifies the cycle from DCD metadata; file tests first*
initial times at cycle starts
requested lags moved to available separations; -U

python 2_time_geom.py -r 5 -t 2775 -z 340 -c 25 -g 50 -U 1 > out.txt

For the worked example: average over 5 runs, reading 2,775 short files per run starting at short340.dcd, with 25 frames per cycle, requesting 50 geometric lags with the first lag one frame in:

* file tests: before computing, the script checks that every expected DCD file exists and opens readably (--skip-file-tests skips this part; the cycle verification always runs). Needs at least 2 full cycles of frames. --negvals is forced off in the geom version.

S4 scripts

S₄(q, t_b): spatial correlations of mobility. For each interval length t_b, particles get a mobility weight w(t₀, t₀+t_b), the weights are binned on an FFT grid, and the structure factor of that field is computed. Output per t_b: total, self, and distinct parts, with errors from across-run scatter. At least 2 runs required.

w function: --W1-theta 0.25 (step function on displacement, same form as the overlap on slide 3) is the usual choice. Also available: --W1-gauss, --W1-exp, --W1-dir-theta (orientational, needs --polyatomic).
t_a (-a) and t_c (-c) offset the two intervals or make them unequal. We usually leave both at 0, which compares an interval with itself. Nonzero values require explicit --W2-* options.
q output: all q vectors by default, or sorted into magnitude bins with the qshell flags (-q discrete region, -v shell region, -l shell count).
Required: -x FFT grid size, -y box size.

s4_tb_fft_ww_linframes vs _cycframes

Same S4 computation. cycframes adds the cycle steps from slide 12, plus one extra time check.

…linframes.py

reads traj<n>.dcd via -n
t_b values in frame units
-d initial-time spacing in frames (default 10)
-M margin in multiples of -d

shared

S₄ total / self / distinct with across-run errors
-x FFT grid, -y box size (both required)
t_a (-a), t_c (-c), usually 0; W1/W2 flags
qshell flags -q / -v / -l
-k last initial time; -i one output file per t_b; -r ≥ 2

…cycframes.py

prefix -N (default short), count -n, first file -z
requires -G, the true frames per cycle
t_b values moved to available separations; -U, --custom-geom-base
-d in cycles (default 1); -M in cycles
realized (t_a, t_b, t_c) at each initial time must match the first occurrence within -S; mismatches are skipped

python s4_tb_fft_ww_cycframes.py -r 5 -n 2775 -N short -z 340 -G 25 -g 30 -x 24 -y 47.0 --W1-theta 0.25

For the worked example: average over 5 runs, reading 2,775 files named short*.dcd starting at index 340, with 25 frames per cycle, 30 requested t_b values, a 24³ FFT grid for a box of side 47.0, and a Heaviside mobility function (translational overlap) with threshold radius 0.25:

Things to be careful about

Issue	Behavior / fix
re-running geo-rev11 without deleting outputs	schedules concatenate silently (append mode)
set_len is the true dumps per cycle, not the goal	goal `-n 27` produces 25 surviving dumps, so `-c 25` / `-G 25`. The wrong value fails the cycle check
forgetting the step-0 dump	it consumes the first filename; first generated file is index 2, and the first production file in the example is `short340.dcd` (1 + 338 + 1)
wrong `-z` in long+short data	long-sequence frames enter the cycle check and it fails
`--skip-file-tests`	skips only the existence/readability tests; the cycle check always runs
`-U < 0`	unsupported; the script warns that this path is incorrect, so leave `-U` nonnegative
S4 with 1 run	hard error; errors come from across-run scatter
t_a or t_c ≠ 0 without `--W2-*`	hard error
too few frames	`2_time_geom` needs at least 2 cycles; cycframes warns if `-M` leaves too little room and may fail at the last t_b

Command quick reference

The worked example end to end: dt = 0.001, τ_α = 10 LJ, τ_equil = 200 LJ, production 100 τ_α, 5 runs.

# 1. schedule files (or use geometricdump.com) python geo-rev11.py -a 0 -t 5 -n 338 -d 0.001 -c 1 --small -r 1 -m 0 python geo-rev11.py -a 0.001 -t 9 -n 27 -d 0.001 -c 339 --small --old -r 111 -m 200148

# 2. LAMMPS: loop over run_times.txt / dump_times.txt / file_names.txt # use the loop snippet from slide 6

# 3. two-time correlations on the production region python 2_time_geom.py -r 5 -t 2775 -z 340 -c 25 -g 50 -U 1 > out.txt

# 4. S4 on the production region python s4_tb_fft_ww_cycframes.py -r 5 -n 2775 -z 340 -N short -G 25 -g 30 -x 24 -y 47.0 --W1-theta 0.25

-c / -G take the true dumps per sequence (25), not the -n goal (27). -z 340 = 1 step-0 file + 338 long files + 1.

Summary

`geo-rev11.py`	generates the schedule files (`run_times`, `dump_times`, `file_names`) for the LAMMPS loop
geometricdump.com	derives the geo-rev11 flags from dt, τ_α, τ_equil; emits both commands and the files
`2_time_geom.py`	MSD, overlap, F_s, orientational correlations on cyclic dumps
`s4_tb_fft_ww_cycframes.py`	S₄(q, t_b) on cyclic dumps
`2_time_aging.py`	two-time correlations vs (t_w, t_w+t) for the long-sequence region

Repo docs: files_readme/, geometric-dumping/README.md.

Geometric Dumping

The problem

Why repeated short sequences

The schedule

geo-rev11.py

The LAMMPS loop

--old vs default: what is geometric?

default: geometric runtimes

--old: geometric dump times

Two ways to use the schedule

A: short sequences only

B: long sequence, then short sequences

geometricdump.com

Why cycle length ≈ 0.9 τα

Worked example

What the cyclic scripts do before computing

Lag selection: -g and -U

2_time_std vs 2_time_geom

2_time_std.py

shared

2_time_geom.py

S4 scripts

s4_tb_fft_ww_linframes vs _cycframes

…linframes.py

shared

…cycframes.py

Things to be careful about

Command quick reference

Summary

Why cycle length ≈ 0.9 τ_α