8000 Issue in `perf/mapv` · Issue #66961 · metabase/metabase · GitHub
[go: up one dir, main page]

Skip to content

Issue in perf/mapv #66961

@dpsutton

Description

@dpsutton

Describe the bug

(mapv (comp first vector)
      [1 2]
      (concat (range 1000)
              (for [_ (range 10)]
                (throw (ex-info "you should not be here" {})))))
[1 2]
performance-test=> 
(perf/mapv (comp first vector)
           [1 2]
           (concat (range 1000)
                   (for [_ (range 10)]
                     (throw (ex-info "you should not be here" {})))))
Execution error (ExceptionInfo) at metabase.util.performance-test/eval2368416$iter$fn$fn (REPL:604).
you should not be here

This is a succinct demo of the bug. And the following will be an infinite loop:

(mapv (comp first vector)
      [1 2]
      (range))
[1 2]

if you use perf/mapv

Issue: it uses count to determine if it's under chunking limits at <= 32 for optimizations. But will try to count infinite sequences or large sequences.

A trivial adaptation to use bounded-count has to update a helper

#?(:clj
   (defn- smallest-count
     (^long [c1 c2] (min (count c1) (count c2)))
     (^long [c1 c2 c3] (min (count c1) (count c2) (count c3)))
     (^long [c1 c2 c3 c4] (min (count c1) (count c2) (count c3) (count c4)))))

and then if you put a bound parameter to each of these you run into one more issue:

performance-test=> (:clj
                    (defn- smallest-count
                      (^long [bound c1 c2] (min (bounded-count bound c1) (bounded-count bound c2)))
                      (^long [bound c1 c2 c3] (min (bounded-count bound c1) (bounded-count bound c2) (bounded-count bound c3)))
                      (^long [bound c1 c2 c3 c4] (min (bounded-count bound c1) (bounded-count bound c2) (bounded-count bound c3) (bounded-count bound c4)))))
Syntax error (IllegalArgumentException) compiling fn* at (REPL:609:21).
fns taking primitives support only 4 or fewer args

At this point i kinda backed away.

diff --git a/src/metabase/util/performance.cljc b/src/metabase/util/performance.cljc
index 7e837d40a97..a201d8ff7a2 100644
--- a/src/metabase/util/performance.cljc
+++ b/src/metabase/util/performance.cljc
@@ -193,22 +193,22 @@
 
 #?(:clj
    (defn- smallest-count
-     (^long [c1 c2] (min (count c1) (count c2)))
-     (^long [c1 c2 c3] (min (count c1) (count c2) (count c3)))
-     (^long [c1 c2 c3 c4] (min (count c1) (count c2) (count c3) (count c4)))))
+     (^long [bound c1 c2] (min (bounded-count bound c1) (bounded-count bound c2)))
+     (^long [bound c1 c2 c3] (min (bounded-count bound c1) (bounded-count bound c2) (bounded-count bound c3)))
+     (^long [bound c1 c2 c3 c4] (min (bounded-count bound c1) (bounded-count bound c2) (bounded-count bound c3) (bounded-count bound c4)))))
 
 (defn mapv
   "Drop-in replacement for `clojure.core/mapv`.
   Iterates multiple collections more efficiently and uses Java iterators under the hood (the CLJ version). CLJS
   version is only optimized for a single collection arity."
   ([f coll1]
-   (let [n (count coll1)]
+   (let [n (bounded-count 33 coll1)]
      (cond (= n 0) []
            (<= n 32) (small-persistent! (reduce apply-and-small-conj! (small-transient n f) coll1))
            :else (persistent! (reduce #(conj! %1 (f %2)) (transient []) coll1)))))
   ([f coll1 coll2]
    #?(:clj
-      (let [n (smallest-count coll1 coll2)]
+      (let [n (smallest-count 33 coll1 coll2)]
         (cond (= n 0) []
               (<= n 32) (small-persistent! (reduce apply-and-small-conj! (small-transient n f) coll1 coll2))
               :else (persistent! (reduce #(conj! %1 (f %2 %3)) (transient []) coll1 coll2))))
@@ -216,7 +216,7 @@
       (core/mapv f coll1 coll2)))
   ([f coll1 coll2 coll3]
    #?(:clj
-      (let [n (smallest-count coll1 coll2 coll3)]
+      (let [n (smallest-count 33 coll1 coll2 coll3)]
         (cond (= n 0) []
               (<= n 32) (small-persistent! (reduce apply-and-small-conj! (small-transient n f) coll1 coll2 coll3))
               :else (persistent! (reduce #(conj! %1 (f %2 %3 %4)) (transient []) coll1 coll2 coll3))))
@@ -224,7 +224,7 @@
       (core/mapv f coll1 coll2 coll3)))
   ([f coll1 coll2 coll3 coll4]
    #?(:clj
-      (let [n (smallest-count coll1 coll2 coll3 coll4)]
+      (let [n (smallest-count 33 coll1 coll2 coll3 coll4)]
         (cond (= n 0) []
               (<= n 32) (small-persistent! (reduce apply-and-small-conj! (small-transient n f) coll1 coll2 coll3 coll4))
               :else (persistent! (reduce #(conj! %1 (f %2 %3 %4 %5)) (transient []) coll1 coll2 coll3 coll4))))
diff --git a/test/metabase/util/performance_test.cljc b/test/metabase/util/performance_test.cljc
index f438f2e58fe..5c2df7dff00 100644
--- a/test/metabase/util/performance_test.cljc
+++ b/test/metabase/util/performance_test.cljc
@@ -27,6 +27,15 @@
                                      (= r 2) vec)))))]
       (is (= (apply concat inputs) (apply perf/concat inputs))))))
 
+(deftest mapv-test
+  (testing "Does not realize long or infinite sequences"
+    (is (= [1 2]
+           (perf/mapv (comp first vector)
+                      [1 2]
+                      (concat (range 1000)
+                              (for [_ (range 10)]
+                                (throw (ex-info "you should not be here" {})))))))))
+
 (defn- mapv-via-run! [f coll]
   (let [v (volatile! [])]
     (perf/run! #(vswap! v conj (f %)) coll)

To Reproduce

see above

Expected behavior

No response

Logs

No response

Information about your Metabase installation

master

Severity

p2

Additional context

bad footgun

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions

    0