Last active
September 12, 2022 17:29
-
-
Save KingCode/ee836557dd8d140b6dba46312bfaae7c to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
See | |
https://gist.github.com/ericnormand/7174aaccc71025de86ddac77553f8595 | |
How many digits? | |
Imagine you took all the integers between n and m (exclusive, n < | |
m) and concatenated them together. How many digits would you have? | |
Write a function that takes two numbers and returns how many | |
digits. Note that the numbers can get very big, so it is not | |
possible to build the string in the general case. | |
Examples: | |
(num-digits 0 1) ;=> 0 (there are no integers between 0 and 1) | |
(num-digits 0 10) ;=> 9 (1, 2, 3, 4, 5, 6, 7, 8, 9) | |
(num-digits 9 100) ;=> 180 | |
_____________________________________________________ | |
;; Model ;;;;;;;;;;;; | |
; | |
Since the width (number of digits) of an integer is derived from the | |
sum of powers of the base, and that all integers with the same width | |
follow each other (negatives and positivies are further categorized, | |
see below), we can partition all numbers according to their width. | |
DEFINITIONS | |
The WIDTH of a number is the number of digits of its absolute value; | |
the width of a slice is the width of all numbers for that slice. | |
A SLICE is a tuple [a, w] depicting the sequence of unsigned integers | |
(or the absolute value of signed integers) in some base `b` having | |
the same width, starting with `a` (the anchor) having width `w`, | |
e.g. [10, 2] [100, 3],[0, 1]: | |
-infinity________lo_______..._______hi________+infinity | |
| | | | ... | | | |
slices' boundaries | |
^ ^ | |
slo shi | |
(slo => slice containing lo etc) | |
A slice CONTAINS value x if x has the width equal to the slice's width | |
attribute. | |
The ANCHOR of a slice is its lowest unsigned integer, e.g. 10 for slice | |
of width 2 for both negative and positive ranges. | |
The CEILING of a slice with width W is the lowest unsigned integer in the | |
slice with width W + 1, or b^W. | |
The SIZE of a slice is the number of elements having its width. | |
The COUNT of a slice is the SIZE, minus those elements outside of the | |
boundary (lo or hi) when applicable. | |
The LENGTH of a slice is the COUNT x WIDTH | |
An END slice is a slice containing a boundary point. | |
A MIDDLE, or FULL, slice is a slice which is not an END slice. | |
The BODY of slice is all middle slices. | |
The LOW SLICE is the slice containing the `lo` boundary; the HIGH SLICE | |
contains the `hi` boundary. | |
The BOTTOM slice represents numbers within the boundaries having | |
the smallest width. This is an end slice when both boundaries are | |
on the same side of the origin. | |
The TOP, or TIP slice represents numbers within the boundaries having | |
the largest width. This is always one of the end slices. | |
A slice is GREATER/SMALLER than another when it has a larger/smaller | |
width. | |
A SHARED slice is one which contains values on both sides of the origin, | |
e.g. [10,2] within the range (-120,200) | |
A slice COVER is the collection of both end and body slices. | |
A BLOCK of slices represents an ordered sequence of slices for which either | |
a single or the same, computation can be used to the sum of the length of | |
each slice. The block can be represented by a simple interval between | |
the first of its slice's anchor and the last slice's ceiling. | |
By definition, each end slice must have its own block. | |
A SYSTEM is collection of all blocks necessary to compute the number of digits | |
between the endpoints. Blocks don't need to be ordered. | |
PROPERTIES | |
A full slice has a count of (b - 1) x b^(width - 1) elements, each with width | |
<slice-width> digits, and the slice length is count X width: | |
length( slice[10,2]) = 9 x 10^1 x 2 = 180, | |
length( slice[100,3]) = 9 x 10^2 x 3 = 2700, etc. | |
Since num-digits(x, y) = num-digits(-y, -x), there are some interesting | |
properties: | |
- when boundaries are on both sides, the end slice counts are: | |
ceiling - lo for the low slice, | |
hi - anchor for the high slice | |
-- for boundaries both <= 0, use their absolute values and reverse them | |
to obtain the same cover | |
Shared slices offer a speedup opportunity: | |
- Within a range bounded on both sides of the origin, some widths | |
have their slice counted twice, e.g. interval (-15,111) has two occurrences | |
of slice [0,1], and therefore can be counted twice (with the caveat of | |
value 0, see below). Also, the beginning of the next slice [10 2] can also | |
be counted twice because both (-15,-10) and [10,14] are part of that slice. | |
- Once the middle slices are isolated they can be processed at the block | |
level, making room for performance improvements. | |
SYSTEMS | |
- Depending on the position of endpoints around the origin there are four | |
possible block setups, or systems: | |
S1) Both endpoints on the same side of the origin (if on the negative | |
side, the system can be converted using absolute values and | |
swapping the endpoints): | |
<---------------------0--lo------hi------------> | |
S1-1) If on the same slice, system is a single block | |
containing the slice for both endpoints | |
=> <Low + Hi Slice Block> | |
S1-2) Otherwise, system is three blocks: | |
=> <Low Slice Block> <Middle Block> <Hi Slice Block> | |
S2) Endpoints are on different sides of the origin. In this scenario, | |
the interval between the lesser width of the two endpoints, and | |
zero, can be counted twice: | |
(here lo, hi on different slices, neither on [0 1]) | |
<--- lo ------------- 0 ------ hi -----------> | |
| | | | | | | | slices counted twice | |
| | | | | slices counted once | |
^ | |
(one slice) | |
S2-0) `lo` and `hi` are on the [0 1] slice: | |
=> <Low + Hi Slice Block> | |
S2-1) `lo` and `hi` are on the same slice other than [0 1]: | |
=> <Low + Hi Slice Block> <Middle Block counted 2x except zero> | |
S2-2) `lo` and `hi` are on different slices: | |
=> <lo Slice Block> | |
<Middle Block A counted 2x except zero> | |
<Middle Block B counted once> | |
<hi Slice Block> | |
Block A above spans the interval [0,minS), and | |
Block B spans (minS, maxS) (could be empty if | |
where minS and maxS are the lesser and greater slices | |
of lo slice and hi slice, resp. | |
COUNTING THE END SLICES | |
There are a few cases for which end slices are counted. Fortunately, | |
they follow closely the system categories. However for S2-2 there | |
are several subtleties when one endpoint is in slice [0 1]: | |
S1-1) unique end slice count is => hi - lo - 1 | |
S1-2-A) for lo slice, count is => ceiling - lo - 1 | |
S1-2-B) for hi slice, count is => hi - anchor | |
S2-0) unique slice count is => hi - lo -1 | |
S2-1) unique end-slice count is | |
=> (hi - anchor) + (-low - anchor) | |
S2-2-A) end slice not in [0 1], and is maxS (see above): | |
=> abs(endpoint) - anchor. | |
S2-2-B) end slice not in [0 1], and is minS: | |
=> abs(endpoint) - anchor | |
+ ceiling - anchor | |
(the full slice is also counted once) | |
*Fake* S2-2-C) end slice is [0 1] and is maxS: | |
=> then this is the same as S2-0 and thus ignored | |
S2-2-D) end slice is [0 1] and is minS: | |
=> abs(endpoint) + (ceiling - 1) | |
(taking care of not repeating the zero) | |
Thus all types of end slices are accounted for, and other slices | |
can have their counts/lenghts deducted within their blocks | |
STRATEGY | |
- Within each middle block interval [a0, ceil), we deduce from a0 | |
the next anchor and ceiling etc, until we reach the interval ceiling, | |
aggregating the total length along the way, with at at least O(log N) | |
performance on the largest width N. | |
Some additional insights might even lead to a formula processing | |
the block in a single computation, achieving O(1) performance, e.g | |
if a formula for Summation(b^(w - 1)w) over all the widths, can be found. | |
- The S2-2 type blocks can provide up to 2x speedup. | |
- To avoid having to write cluttered logic branching, we can dispatch | |
computations to short functions based on the categories tested above. | |
Enter multimethods... |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Here is the code (clojure 1.11+ required for
clojure.math
):