Geometric Dumping

Generating geometrically spaced trajectory data with geo-rev11.py and geometricdump.com, and analyzing it with 2_time_geom.py and s4_tb_fft_ww_cycframes.py.

← / → or space · d toggles dark mode · click the slide counter to jump

The problem

2_time_std.py and the linear S4 scripts work on linearly spaced dumps. That is fine until the dynamics spread over many decades: a correlation function can only be evaluated at time separations that exist in the saved trajectory, and a single linear spacing cannot cover all the decades at once.

Numbers for the production region used throughout these slides (106 steps ≈ 100 τα, 100k particles, ~1.2 MB per frame):

Dump scheduleFrames / runSize / runLag coverage
linear, every step106~1.2 TBeverything
linear, every 1000 steps1,0001.2 GBnothing below 1 LJ; the first 3 decades are missing
geometric (25 dumps/cycle)2,7753.3 GB1 step to full span, uniform in log t

Why repeated short sequences

A single geometric series from step 1 to the end of the run would solve storage. It fails for a different reason. Correlation functions are averaged over initial times t0: the same separation t is measured from many starting frames. The translational overlap, for example:

wj(t0+t, t0; a) ≡ Θ(a − |rj(t0+t) − rj(t0)|)
Fo(t) ≡ (1/Nt₀) Nt₀Σt₀=1 (1/N) NΣj=1 ⟨wj(t0+t, t0; a)⟩

The schedule is therefore built in two levels:

Cost: separations between the cycle length Λ and its multiples are coarser. In practice the set of available separations stays dense enough (slide 13 shows it exactly).

The schedule

Generated live with the geo-rev11.py --small --old algorithm. Each tick is one frame. The time axis is fixed at 160 LJ, so changing -t or the cycle count changes the coverage you see.

Top: the dumps in real (linear) time. Bottom: one row per cycle, each showing the time of its dumps measured from that cycle's own start, on a log axis. The rows are identical: every cycle repeats the same pattern.

geo-rev11.py

Prints three text files that a LAMMPS loop reads line by line. One frame per DCD file.

FileContents
run_times.txtsteps to run before each dump (argument to run)
dump_times.txtcumulative timestep of each dump (dump trigger / bookkeeping)
file_names.txtoutput filename for each frame
OptionMeaning
-a [float](LJ time units) first geometric term. Without --old, this is the first runtime; with --old, this is the first dump offset. Values below one timestep floor to one timestep.
-t [float](LJ time units) endpoint used with -a and -n to set the geometric ratio. With --old, it is the sequence span; without --old, it is the largest single runtime.
-n [int](count) target number of geometric terms. Terms whose runtime floors to zero timesteps are dropped, so the true dump count can be lower than this target.
-d [float](LJ time units) simulation timestep, used to convert LJ times to integer timesteps.
-r [int](count) number of times to repeat the generated sequence back to back.
-c [int](file index) initial filename counter before each generated file increments it. Set this to the last already-used index; the first generated file is -c + 1.
-m [int](timesteps) offset added to every cumulative dump time, used to continue the timestep counter after a previous sequence.
--small(mode flag) generate the simple geometric-series schedule used by this workflow. Without it, geo-rev11.py uses the older long-series mode.
--old(mode flag) treat the geometric terms as cumulative dump offsets and compute runtimes as their differences. This makes every repeat have identical dump offsets.
--test(diagnostic flag) print diagnostics instead of loop-file output. Do not use this output as LAMMPS schedule files.
Output files are opened in append mode. Delete them before re-running, or two schedules get concatenated.

The LAMMPS loop

# --- Looping Parameters Setup --- variable loopvar loop 420 variable eruntime file run_times.txt variable filename file file_names.txt variable dumpint file dump_times.txt
reset_timestep 0 # initial run/dump to initialize output dump 1 all dcd 2 traj1.dcd dump_modify 1 unwrap yes run 1 undump 1
# --- Main Production Loop --- label loopstart dump 1 all dcd ${dumpint} ${filename} dump_modify 1 unwrap yes run ${eruntime} undump 1
next loopvar next eruntime next filename next dumpint jump SELF loopstart

--old vs default: what is geometric?

default: geometric runtimes

runtimek = ⌊t0 rk⌋; dump time = running sum. -t sets the largest single gap. The sequence ends wherever the sum lands.

Used for the long sequence.

--old: geometric dump times

dump offsetk = ⌊t0 rk⌋ exactly; runtime = difference from the previous dump. The sequence ends exactly at -t, and every repeat reproduces identical offsets.

Used for the short sequences. The analysis scripts verify that all cycles are identical, and this mode guarantees it.

Top: -a 0.001 -t 9 -n 27, one sequence. Bottom: the same flags with --old and -r 3, on the same linear axis. Both production commands use --small; the non---small path is the legacy long-series mode and predates the current workflow.

Two ways to use the schedule

A: short sequences only

System is equilibrated before recording. Every cycle start is an equivalent initial time. Analyze with 2_time_geom.py and s4_tb_fft_ww_cycframes.py.

B: long sequence, then short sequences

During aging, correlations depend on the waiting time tw, and the evolution is slow in log tw, so one geometric series samples it with a few hundred frames. Analyze the long region with 2_time_aging.py (explicit initial frames); analyze the short region with the cyclic scripts, starting at the right file index (-z).

Don't point a cyclic script at the stitched long+short set: the long region doesn't repeat, and the cycle check fails (correctly).

geometricdump.com

You can hand-pick geo-rev11.py flags, but they interact: the ratio depends on -a, -t, -n, and the stitch values -c and -m depend on the long sequence's output. The site takes system-level inputs and derives the flags.

You provide
  • dt
  • τα
  • τequil
  • expressions like 100*tau_alpha work
It derives
  • long: -t = τα/2, -n searched so the last dump lands on τequil
  • short: length 0.9 τα, -n 27, -r = total/length
  • stitch: -c, -m from the long output (+ the step-0 file)
You get
  • both commands with per-flag explanations
  • consistency checks
  • DCD size estimate
  • plots
  • downloadable schedule files
  • LAMMPS loop snippet

The reason for the specific defaults is dump density at τα: next slide.

Why cycle length ≈ 0.9 τα

A schedule can look reasonable and still measure τα badly. The available separations get sparse toward the end of a cycle and dense again right after the cycle boundary. The cycle length decides which region τα falls in.

τα = 10 LJ, dt = 0.001, -n 27

At the default Λ = 0.9 τα, τα sits just past the cycle boundary, in dense spacing. Push Λ above τα and τα lands in the sparse tail of the cycle, where the nearest available separations are far apart.

Worked example

dt = 0.001, τα = 10 LJ, τequil = 200 LJ, production 100 τα.



python geo-rev11.py -a 0 -t 5 -n 338 -d 0.001 -c 1 --small -r 1 -m 0

Long sequence: one sequence of geometrically growing runtimes, largest runtime τα/2 = 5 LJ, with 338 terms (the website searches -n for the value whose summed runtimes land closest to τequil; here the sum is 200,148 steps ≈ 200 LJ), first runtime as small as possible, filename counter starting at 1 (the step-0 dump took short1.dcd), timestep counter starting at 0:



python geo-rev11.py -a 0.001 -t 9 -n 27 -d 0.001 -c 339 --small --old -r 111 -m 200148

Short sequences: 111 repeats of a sequence whose dump times are geometric (--old), each spanning 9 LJ = 0.9 τα with a goal of 27 dumps and the first dump 1 step in, continuing the filename counter from the long sequence and the timestep counter from its last dump:




LongShort (production)
dumps338111 cycles × 25 = 2,775
filesshort2.dcdshort339.dcdshort340.dcdshort3114.dcd
timesteps1 – 200,148 (≈ 200 LJ)200,149 – 1,199,148 (≈ 1,000 LJ)

One short sequence (production cycle), in steps. Dump times, measured from the start of the sequence (the last is 9 LJ / 0.001 = 9,000):

1 2 4 5 8 11 16 23 33 47 66 94 134 191 271 385 546 775 1100 1562 2217 3147 4467 6340 9000

and the corresponding runtimes (the gaps between dumps):

1 1 2 1 3 3 5 7 10 14 19 28 40 57 80 114 161 229 325 462 655 930 1320 1873 2660

What the cyclic scripts do before computing

The analysis scripts don't read dump_times.txt directly. The schedule is recovered from the DCD metadata. 2_time_geom.py and s4_tb_fft_ww_cycframes.py both start with these steps (L = frames per cycle, the -c / -G argument):

  1. Measure the cycle. The times of the first L+1 frames give the L time gaps that make up one cycle.
  2. Verify it. Every frame time in the whole trajectory has to match that repeating pattern. If one frame doesn't, the script stops and reports which frame in which file.
  3. Rotate. The starting point within the cycle is shifted so analysis begins at the smallest gap.
  4. List the available separations. Adding the repeating gaps end to end gives every time separation that exists in the data. Requested lags are then moved to the closest available separation (compared on a log scale).

Step 2 is why --small --old matters at generation time, and why a wrong -c, -G, or -z fails immediately instead of producing wrong numbers.

Lag selection: -g and -U

-g N asks for N lag values, geometrically spaced between a first lag and the longest separation in the data. The data only contains specific separations, so each requested value is moved to the closest one available. When two requests land on the same separation it is kept once, which is why fewer values come out than went in.

-U sets the first requested lag. -U n (positive) starts n frames in. -U 0, the old default, starts at 1.0 LJ, so no separations shorter than 1 LJ get requested. Try it below.



worked-example data: 2,775 frames, 25-frame cycle (actual progression.py algorithm)

2_time_std vs 2_time_geom

Same observables, same output format. The difference is the dump schedule each one expects.

2_time_std.py

  • reads traj<n>.dcd, count -n
  • evenly spaced dumps
  • initial times every -d frames
  • lags used directly as frame counts

shared

  • MSD, overlap (-a), Fs (-q), orientational correlations (--polyatomic*)
  • run averages and across-run standard deviations (-r runs)
  • frame range -s/-m, last initial time -k
  • lag request flags -f / -g / -w, --low/high-interval
  • output column 1 is the real lag time, so results plot together

2_time_geom.py

  • reads short<n>.dcd, count -t, first file -z
  • requires -c, the true frames per cycle
  • verifies the cycle from DCD metadata; file tests first*
  • initial times at cycle starts
  • requested lags moved to available separations; -U


python 2_time_geom.py -r 5 -t 2775 -z 340 -c 25 -g 50 -U 1 > out.txt

For the worked example: average over 5 runs, reading 2,775 short files per run starting at short340.dcd, with 25 frames per cycle, requesting 50 geometric lags with the first lag one frame in:

* file tests: before computing, the script checks that every expected DCD file exists and opens readably (--skip-file-tests skips this part; the cycle verification always runs). Needs at least 2 full cycles of frames. --negvals is forced off in the geom version.

S4 scripts

S4(q, tb): spatial correlations of mobility. For each interval length tb, particles get a mobility weight w(t0, t0+tb), the weights are binned on an FFT grid, and the structure factor of that field is computed. Output per tb: total, self, and distinct parts, with errors from across-run scatter. At least 2 runs required.

s4_tb_fft_ww_linframes vs _cycframes

Same S4 computation. cycframes adds the cycle steps from slide 12, plus one extra time check.

…linframes.py

  • reads traj<n>.dcd via -n
  • tb values in frame units
  • -d initial-time spacing in frames (default 10)
  • -M margin in multiples of -d

shared

  • S4 total / self / distinct with across-run errors
  • -x FFT grid, -y box size (both required)
  • ta (-a), tc (-c), usually 0; W1/W2 flags
  • qshell flags -q / -v / -l
  • -k last initial time; -i one output file per tb; -r ≥ 2

…cycframes.py

  • prefix -N (default short), count -n, first file -z
  • requires -G, the true frames per cycle
  • tb values moved to available separations; -U, --custom-geom-base
  • -d in cycles (default 1); -M in cycles
  • realized (ta, tb, tc) at each initial time must match the first occurrence within -S; mismatches are skipped


python s4_tb_fft_ww_cycframes.py -r 5 -n 2775 -N short -z 340 -G 25 -g 30 -x 24 -y 47.0 --W1-theta 0.25

For the worked example: average over 5 runs, reading 2,775 files named short*.dcd starting at index 340, with 25 frames per cycle, 30 requested tb values, a 24³ FFT grid for a box of side 47.0, and a Heaviside mobility function (translational overlap) with threshold radius 0.25:

Things to be careful about

IssueBehavior / fix
re-running geo-rev11 without deleting outputsschedules concatenate silently (append mode)
set_len is the true dumps per cycle, not the goalgoal -n 27 produces 25 surviving dumps, so -c 25 / -G 25. The wrong value fails the cycle check
forgetting the step-0 dumpit consumes the first filename; first generated file is index 2, and the first production file in the example is short340.dcd (1 + 338 + 1)
wrong -z in long+short datalong-sequence frames enter the cycle check and it fails
--skip-file-testsskips only the existence/readability tests; the cycle check always runs
-U < 0unsupported; the script warns that this path is incorrect, so leave -U nonnegative
S4 with 1 runhard error; errors come from across-run scatter
ta or tc ≠ 0 without --W2-*hard error
too few frames2_time_geom needs at least 2 cycles; cycframes warns if -M leaves too little room and may fail at the last tb

Command quick reference

The worked example end to end: dt = 0.001, τα = 10 LJ, τequil = 200 LJ, production 100 τα, 5 runs.

# 1. schedule files (or use geometricdump.com) python geo-rev11.py -a 0 -t 5 -n 338 -d 0.001 -c 1 --small -r 1 -m 0 python geo-rev11.py -a 0.001 -t 9 -n 27 -d 0.001 -c 339 --small --old -r 111 -m 200148
# 2. LAMMPS: loop over run_times.txt / dump_times.txt / file_names.txt # use the loop snippet from slide 6
# 3. two-time correlations on the production region python 2_time_geom.py -r 5 -t 2775 -z 340 -c 25 -g 50 -U 1 > out.txt
# 4. S4 on the production region python s4_tb_fft_ww_cycframes.py -r 5 -n 2775 -z 340 -N short -G 25 -g 30 -x 24 -y 47.0 --W1-theta 0.25

-c / -G take the true dumps per sequence (25), not the -n goal (27). -z 340 = 1 step-0 file + 338 long files + 1.

Summary

geo-rev11.pygenerates the schedule files (run_times, dump_times, file_names) for the LAMMPS loop
geometricdump.comderives the geo-rev11 flags from dt, τα, τequil; emits both commands and the files
2_time_geom.pyMSD, overlap, Fs, orientational correlations on cyclic dumps
s4_tb_fft_ww_cycframes.pyS4(q, tb) on cyclic dumps
2_time_aging.pytwo-time correlations vs (tw, tw+t) for the long-sequence region

Repo docs: files_readme/, geometric-dumping/README.md.

slide 1