Skip to content

Instantly share code, notes, and snippets.

@KingCode
Last active September 12, 2022 17:29
Show Gist options
  • Save KingCode/ee836557dd8d140b6dba46312bfaae7c to your computer and use it in GitHub Desktop.
Save KingCode/ee836557dd8d140b6dba46312bfaae7c to your computer and use it in GitHub Desktop.
See
https://gist.github.com/ericnormand/7174aaccc71025de86ddac77553f8595
How many digits?
Imagine you took all the integers between n and m (exclusive, n <
m) and concatenated them together. How many digits would you have?
Write a function that takes two numbers and returns how many
digits. Note that the numbers can get very big, so it is not
possible to build the string in the general case.
Examples:
(num-digits 0 1) ;=> 0 (there are no integers between 0 and 1)
(num-digits 0 10) ;=> 9 (1, 2, 3, 4, 5, 6, 7, 8, 9)
(num-digits 9 100) ;=> 180
_____________________________________________________
;; Model ;;;;;;;;;;;;
;
Since the width (number of digits) of an integer is derived from the
sum of powers of the base, and that all integers with the same width
follow each other (negatives and positivies are further categorized,
see below), we can partition all numbers according to their width.
DEFINITIONS
The WIDTH of a number is the number of digits of its absolute value;
the width of a slice is the width of all numbers for that slice.
A SLICE is a tuple [a, w] depicting the sequence of unsigned integers
(or the absolute value of signed integers) in some base `b` having
the same width, starting with `a` (the anchor) having width `w`,
e.g. [10, 2] [100, 3],[0, 1]:
-infinity________lo_______..._______hi________+infinity
| | | | ... | |
slices' boundaries
^ ^
slo shi
(slo => slice containing lo etc)
A slice CONTAINS value x if x has the width equal to the slice's width
attribute.
The ANCHOR of a slice is its lowest unsigned integer, e.g. 10 for slice
of width 2 for both negative and positive ranges.
The CEILING of a slice with width W is the lowest unsigned integer in the
slice with width W + 1, or b^W.
The SIZE of a slice is the number of elements having its width.
The COUNT of a slice is the SIZE, minus those elements outside of the
boundary (lo or hi) when applicable.
The LENGTH of a slice is the COUNT x WIDTH
An END slice is a slice containing a boundary point.
A MIDDLE, or FULL, slice is a slice which is not an END slice.
The BODY of slice is all middle slices.
The LOW SLICE is the slice containing the `lo` boundary; the HIGH SLICE
contains the `hi` boundary.
The BOTTOM slice represents numbers within the boundaries having
the smallest width. This is an end slice when both boundaries are
on the same side of the origin.
The TOP, or TIP slice represents numbers within the boundaries having
the largest width. This is always one of the end slices.
A slice is GREATER/SMALLER than another when it has a larger/smaller
width.
A SHARED slice is one which contains values on both sides of the origin,
e.g. [10,2] within the range (-120,200)
A slice COVER is the collection of both end and body slices.
A BLOCK of slices represents an ordered sequence of slices for which either
a single or the same, computation can be used to the sum of the length of
each slice. The block can be represented by a simple interval between
the first of its slice's anchor and the last slice's ceiling.
By definition, each end slice must have its own block.
A SYSTEM is collection of all blocks necessary to compute the number of digits
between the endpoints. Blocks don't need to be ordered.
PROPERTIES
A full slice has a count of (b - 1) x b^(width - 1) elements, each with width
<slice-width> digits, and the slice length is count X width:
length( slice[10,2]) = 9 x 10^1 x 2 = 180,
length( slice[100,3]) = 9 x 10^2 x 3 = 2700, etc.
Since num-digits(x, y) = num-digits(-y, -x), there are some interesting
properties:
- when boundaries are on both sides, the end slice counts are:
ceiling - lo for the low slice,
hi - anchor for the high slice
-- for boundaries both <= 0, use their absolute values and reverse them
to obtain the same cover
Shared slices offer a speedup opportunity:
- Within a range bounded on both sides of the origin, some widths
have their slice counted twice, e.g. interval (-15,111) has two occurrences
of slice [0,1], and therefore can be counted twice (with the caveat of
value 0, see below). Also, the beginning of the next slice [10 2] can also
be counted twice because both (-15,-10) and [10,14] are part of that slice.
- Once the middle slices are isolated they can be processed at the block
level, making room for performance improvements.
SYSTEMS
- Depending on the position of endpoints around the origin there are four
possible block setups, or systems:
S1) Both endpoints on the same side of the origin (if on the negative
side, the system can be converted using absolute values and
swapping the endpoints):
<---------------------0--lo------hi------------>
S1-1) If on the same slice, system is a single block
containing the slice for both endpoints
=> <Low + Hi Slice Block>
S1-2) Otherwise, system is three blocks:
=> <Low Slice Block> <Middle Block> <Hi Slice Block>
S2) Endpoints are on different sides of the origin. In this scenario,
the interval between the lesser width of the two endpoints, and
zero, can be counted twice:
(here lo, hi on different slices, neither on [0 1])
<--- lo ------------- 0 ------ hi ----------->
| | | | | | | | slices counted twice
| | | | | slices counted once
^
(one slice)
S2-0) `lo` and `hi` are on the [0 1] slice:
=> <Low + Hi Slice Block>
S2-1) `lo` and `hi` are on the same slice other than [0 1]:
=> <Low + Hi Slice Block> <Middle Block counted 2x except zero>
S2-2) `lo` and `hi` are on different slices:
=> <lo Slice Block>
<Middle Block A counted 2x except zero>
<Middle Block B counted once>
<hi Slice Block>
Block A above spans the interval [0,minS), and
Block B spans (minS, maxS) (could be empty if
where minS and maxS are the lesser and greater slices
of lo slice and hi slice, resp.
COUNTING THE END SLICES
There are a few cases for which end slices are counted. Fortunately,
they follow closely the system categories. However for S2-2 there
are several subtleties when one endpoint is in slice [0 1]:
S1-1) unique end slice count is => hi - lo - 1
S1-2-A) for lo slice, count is => ceiling - lo - 1
S1-2-B) for hi slice, count is => hi - anchor
S2-0) unique slice count is => hi - lo -1
S2-1) unique end-slice count is
=> (hi - anchor) + (-low - anchor)
S2-2-A) end slice not in [0 1], and is maxS (see above):
=> abs(endpoint) - anchor.
S2-2-B) end slice not in [0 1], and is minS:
=> abs(endpoint) - anchor
+ ceiling - anchor
(the full slice is also counted once)
*Fake* S2-2-C) end slice is [0 1] and is maxS:
=> then this is the same as S2-0 and thus ignored
S2-2-D) end slice is [0 1] and is minS:
=> abs(endpoint) + (ceiling - 1)
(taking care of not repeating the zero)
Thus all types of end slices are accounted for, and other slices
can have their counts/lenghts deducted within their blocks
STRATEGY
- Within each middle block interval [a0, ceil), we deduce from a0
the next anchor and ceiling etc, until we reach the interval ceiling,
aggregating the total length along the way, with at at least O(log N)
performance on the largest width N.
Some additional insights might even lead to a formula processing
the block in a single computation, achieving O(1) performance, e.g
if a formula for Summation(b^(w - 1)w) over all the widths, can be found.
- The S2-2 type blocks can provide up to 2x speedup.
- To avoid having to write cluttered logic branching, we can dispatch
computations to short functions based on the categories tested above.
Enter multimethods...
@KingCode
Copy link
Author

KingCode commented Sep 12, 2022

Here is the code (clojure 1.11+ required for clojure.math):

(require '[clojure.math :as m])

(defn pow [base p]
  (-> (m/pow (bigint base) (bigint p)) bigint))

(defn width [n base]
  (if (zero? n) 
    1
    (->> [0 (abs n)] (iterate (fn [[p x]] [(inc p) (quot x base)])) 
         (drop-while (fn [[_ x]] (< 0 x)))
         ffirst)))

(defn anchor [width base]
  (if (= 1 width)
    0
    (pow base (dec width))))

(defn block-length 
  "Computes the length of a block of full slices in the width interval 
  [wfrom wto).
  "
[[afrom wfrom cfrom :as from-slice] wto base]
  (let [cfrom (or cfrom (pow base wfrom))]
    (loop [a afrom w wfrom c cfrom acc 0]
      ;; (println :A a :W w :ACC acc)
      (if (= wto w)
        acc
        (recur (if (zero? a) 10 (*' a base)) 
               (inc' w)
               (*' c base)
               (-> c (-' a) (*' w) (+' acc)))))))

(defn w "Yields the slice width" [slice]
  (second slice))

(defn a "Yields the slice anchor" [slice]
  (first slice))

(defn c "Yields the slice ceiling" [slice base]
  (if (< 2 (count slice))
    (or (nth slice 2) (pow base (w slice)))
    (pow base (w slice))))

(defn end? [low hiw slice]
  (or (= hiw (w slice)) (= low (w slice))))

(defn tag-pos
  "Yields an indicator of lo and hi positions relative to zero."
[lo, hi]
  (let [lo<0 (< lo 0) lo=0 (= 0 lo)
        hi<0 (< hi 0) hi=0 (= 0 hi)]
    (case [lo<0 lo=0 hi<0 hi=0]
      ([false false false false]
       [false true false false]
       [false false false true]) ::++   ;; both lo, hi >= 0
      ([true false true false]
       [false true true false]
       [true false false true]) ::--    ;; both lo, hi <= 0
      [true false false false] ::-+     ;; lo < 0, hi >= 0
      [false true false true] ::00)))   ;; lo = hi = 0 

(defn tag-spread
  "Yields an indicator of widths of slices for lo and hi."
[lo low hi hiw]
  (cond
    (= low hiw) ::s=
    (< low hiw) ::s<
    :else ::s>))

(defn minmax-slices 
  "Yields los and his in sorted order of width"
  [los his]
  (->> [los his] (sort-by w)))

(defn tag-minmax
  "Yields an indicator of whether slice is for the endpoint
  with smallest or greatest width, or none if not an end slice."
[los his slice]
  (let [min-w (min (w los) (w his))
        max-w (max (w los) (w his))]
    (condp = (w slice)
      min-w ::min
      max-w ::max
      ::none)))

(defn tag?-origin
  "Yields a predicate indicator of whether slice is [0 1]"
[slice]
  (if (= [0 1] [(a slice) (w slice)])
    ::true ::false))

(defn tag?-end-on-slice0
  "Yields a predicate indicator of whether one of low, hiw is 1,
  i.e. for slice [0 1]"
[low hiw]
  (if (some #{1} [low hiw]) ::true ::false))

(defn tag?-end
  "Yields a predicate indicator of whether slice contains an endpoint."
[low hiw slice]
  (if (end? low hiw slice)
    ::true ::false))

(defn tag-end
  "Yields an indicator of which type of endpoint slice is for, if any;
  ::none otherwise."
[low hiw slice]
  (cond 
    (= low (w slice)) ::los
    (= hiw (w slice)) ::his
    :else ::none))

(defn derive-from! [parent & tags] (doseq [t tags] (derive t parent)))
(derive-from! ::same-side, ::++ ::--)
(derive-from! ::not-s=, ::s< ::s>)
(derive-from! ::any,
              ::true ::false ::same-side ::-+ ::00 ::s< ::s=  ::s>
              ::los ::his ::none ::min ::max)

(defn ->end-slices [lo hi base]
  (let [[low hiw] [(width lo base) (width hi base)]
        [loa hia] [(anchor low base) (anchor hiw base)]]
    [[loa low] [hia hiw]]))

(defn make-blocks-dispatch
  "Yields a dispatch value for `make-blocks`"
[lo los hi his base]
  (let [low (w los)
        hiw (w his)]
    [(tag-pos lo hi)
     (tag-spread lo low hi hiw)
     (tag?-end-on-slice0 low hiw)]))

(def make-blocks-dispatch-table
  ;;    eps-vs-0,  ep1-vs-ep2-width   some-end-on-slice0? 
  {:S1-1 [::++      ::s=                    ::any]
   :S1-2 [::++      ::s<                    ::any]
   :S1-3 [::--      ::any                   ::any]
   :S2-0 [::-+      ::s=                    ::true]
   :S2-1 [::-+      ::s=                    ::false]
   :S2-2 [::-+      ::not-s=                ::any]})

(ns-unmap *ns* 'make-blocks)
(defmulti make-blocks 
  "Given a context deduced from the arguments, yields 
  a sequence of thunks representing blocks Each block's invocation 
  yields a partial result for the total number-of-digits."
make-blocks-dispatch)

(defn count-end-slice-dispatch
  "Yields a dispatch value for `count-end-slice`"
[lo los hi his slice _]
  (let [low (w los) 
        hiw (w his)]
    [(tag-pos lo hi)
     (tag-spread lo low hi hiw)
     (tag-end low hiw slice)
     (tag?-origin slice)
     (tag-minmax los his slice)
     ;;for safety:
     (tag?-end low hiw slice)]))

(def count-end-slice-dispatch-table
  ;;        eps-vs-0, eps-slices-width, los or his,in [0 1] slice?, min/maxS,  end?
  {:S1-1    [::++      ::s=               ::any      ::any         ::any  ::true]
   :S1-2-A  [::++      ::not-s=           ::los      ::any         ::any  ::true]
   :S1-2-B  [::++      ::not-s=           ::his      ::any         ::any  ::true]
   :S1-3
            [::--      ::any              ::any      ::any         ::any  ::true]
   :S2-0    [::-+      ::s=               ::any      ::true        ::any  ::true]
   :S2-1    [::-+      ::s=               ::any      ::false       ::any  ::true]
   :S2-2-A  [::-+      ::not-s=           ::any      ::false       ::max  ::true]
   :S2-2-B  [::-+      ::not-s=           ::any      ::false       ::min  ::true]
   :S2-2-D  [::-+      ::not-s=           ::any      ::true        ::min  ::true]})

(ns-unmap *ns* 'count-end-slice)
(defmulti count-end-slice 
  "Returns the number of elements contained in an endpoint slice, based on 
  the context deduced from the arguments: endpoints' positions relative 
  to the origin; the relationship between endpoints' widths; the endpoint
  `slice` is for; whether the endpoint is in the [0 1] slice; if `lo` and `hi`
  are on both sides of the origin, which of minS/maxS `slice` is for.
  `slice` must contain an endpoint or an exception is thrown.  
"
count-end-slice-dispatch)

(defn endpoint [lo los hi his slice]                        
  (if (= (w los) (w slice))
    lo hi))

;; Utility to reduce defmethod boiler plate below.
(defmacro defmethods-with [mm-name dispatch-m params & k+bodies]
  (let [m (eval dispatch-m)
        k+bodies (partition 2 k+bodies)]
    (cons 'do
          (map (fn [[k body]]
                 `(defmethod ~mm-name ~(get m k) ~params ~@body))
               k+bodies))))

(defmethods-with count-end-slice count-end-slice-dispatch-table 
  [lo los hi his slice base]
  :S1-1
  [(- hi lo 1)]

  :S1-2-A
  [(- (c slice base) lo 1)]

  :S1-2-B
  [(- hi (a slice))]

  :S1-3 ;; may be redundant if done at block level
  [(count-end-slice (- hi) his (- lo) los slice base)]

  :S2-0
  [(- hi lo 1)]

  :S2-1
  [(+ (- hi (a slice)) 
      (- (- lo) (a slice)))]

  :S2-2-A
  [(- (abs (endpoint lo los hi his slice)) (a slice))]

  :S2-2-B
  [(+ (- (abs (endpoint lo los hi his slice)) (a slice))
      (- (c slice base) (a slice)))]

  :S2-2-D
  [(+ (abs (endpoint lo los his slice)) (- (c slice base) 1))])

(defmethod count-end-slice :default [lo los hi his slice base]
  (throw (ex-info "No matching count-end-slice dispatch!"
                  {:lo lo :los los :hi hi 
                   :his his :slice slice
                   :dispatch-value
                   (count-end-slice-dispatch lo los hi his slice)})))

(defmethods-with make-blocks make-blocks-dispatch-table
  [lo los hi his base]
  :S1-1 
  [[#(-> (count-end-slice lo los hi his los base) (* (w los)))]] 
 
  :S1-3 ;; S1-1, reversed
  [(make-blocks (- hi) his (- lo) los base)]

  :S1-2
  [[#(-> (count-end-slice lo los hi his los base) (* (w los)))
    #(let [anchor (c los base), width (inc (w los))]
       (block-length [anchor width] (w his) base))
    #(-> (count-end-slice lo los hi his his base) (* (w his)))]]

  :S2-0 
  [[#(-> (count-end-slice lo los hi his los base) (* (w los)))]]

  :S2-1
  [[#(-> (count-end-slice lo los hi his los base) (* (w los)))
    #(-> (block-length [0 1] (w los) base)
         (* 2) (- 1))]]
  
  :S2-2
  [(let [[minS maxS] (minmax-slices los his)
         minS+1 [(-> minS a (* base)) (-> minS w inc)]]
     [#(-> (count-end-slice lo los hi his los base) (* (w los)))
      #(-> (block-length [0 1] (w minS) base) (* 2) (- 1)) ;; Block A
      #(-> (block-length minS+1 (w maxS) base))            ;; Block B
      #(-> (count-end-slice lo los hi his his base) (* (w his)))])])


(defmethod make-blocks :default [lo los hi his base]
  (ex-info "No matching system dispatch!"
             {:lo lo :los los :hi hi 
              :his his :base base 
              :dispatch-value 
              (make-blocks-dispatch lo los hi his base)}))

(defn num-digits
  ([lo hi]
   (num-digits lo hi 10))
  ([lo hi base]
   (let [ [lo hi] (map bigint [lo hi])
         [los his] (->end-slices lo hi base)
         blocks (make-blocks lo los hi his base)]
     (->> blocks
          (reduce (fn [acc f] (+ acc (f)))
                  0)))))

(num-digits 0 1) ;=> 0 (there are no integers between 0 and 1)
(num-digits 0 10) ;=> 9 (1, 2, 3, 4, 5, 6, 7, 8, 9)
(num-digits 9 100) ;=> 180
(num-digits -10 0) ;=> 9

(num-digits -10e180 10e200) ;=> 200888888888888911211685432098765430077777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777779N

(time (dotimes [_ 25](num-digits -10e180 10e200)))
;;=> "Elapsed time: 11.012467 msecs"

;; Non-decimal bases?
(num-digits -2r10 2r101 2) ;;=> 10N
(num-digits -36rZ 36rA 36) ;;=> 44N

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment