Rdb index background (preliminary) by graetzer · Pull Request #7644 · arangodb/arangodb · GitHub

Rdb index background (preliminary) #7644


Merged · 39 commits · Dec 21, 2018
Changes from all commits

Commits (39)
3f6967c  Initial commit (graetzer, Dec 3, 2018)
38b3500  make sure index is hidden (graetzer, Dec 3, 2018)
0ddb321  Merge branch 'devel' of github.com:arangodb/arangodb into feature/rdb… (graetzer, Dec 4, 2018)
9bcc737  last changes (graetzer, Dec 4, 2018)
c57de4f  fix a bug (graetzer, Dec 4, 2018)
c798357  reduce conflicts (graetzer, Dec 5, 2018)
4766900  fix background indexing (graetzer, Dec 6, 2018)
b525712  remove unused code (graetzer, Dec 6, 2018)
0515650  fix link creation (graetzer, Dec 6, 2018)
70ef2f1  fix unique constraint violations (graetzer, Dec 6, 2018)
e6d23c1  fixed arangosearch cluster reporting (graetzer, Dec 6, 2018)
13bcd44  added test (graetzer, Dec 6, 2018)
9e5f960  fix test (graetzer, Dec 6, 2018)
6bfd840  make noncluster for now (graetzer, Dec 6, 2018)
3d20521  fix jslint (graetzer, Dec 7, 2018)
38005e0  Some test adjustments. (Dec 10, 2018)
dc041fb  Merge branch 'devel' into feature/rdb-index-background (Dec 10, 2018)
a7ae28a  Fix merge error. (Dec 10, 2018)
db46a40  changes (graetzer, Dec 10, 2018)
9db5709  Merge branch 'feature/rdb-index-background' of github.com:arangodb/ar… (graetzer, Dec 10, 2018)
ef4bb05  adding inBackground flag (graetzer, Dec 11, 2018)
5546745  Merge branch 'devel' into feature/rdb-index-background (Dec 11, 2018)
2ebf591  Fix merge errors. (Dec 11, 2018)
2082f78  adding some docs (graetzer, Dec 12, 2018)
3efc632  Merge branch 'devel' into feature/rdb-index-background (Dec 12, 2018)
98ddd16  Some small changes. (Dec 12, 2018)
ee51fa5  Fixed removal bug and added test. (Dec 12, 2018)
422e0ce  Added update test. (Dec 17, 2018)
31d9d7b  Merge branch 'devel' of github.com:arangodb/arangodb into feature/rdb… (graetzer, Dec 18, 2018)
a34f928  forgot to comment out docs (graetzer, Dec 18, 2018)
021453c  fixing some code (graetzer, Dec 19, 2018)
f61b5e9  fix jslint (graetzer, Dec 19, 2018)
b013b15  remove some code (graetzer, Dec 19, 2018)
82f4b6e  fix reporting of unfinished indexes (graetzer, Dec 20, 2018)
b7d6a9d  Merge branch 'devel' of github.com:arangodb/arangodb into feature/rdb… (graetzer, Dec 20, 2018)
2f1cc80  fixing fillIndex for iresearch (graetzer, Dec 21, 2018)
371c738  revert a change (graetzer, Dec 21, 2018)
7f4e2e7  Merge branch 'devel' of github.com:arangodb/arangodb into feature/rdb… (graetzer, Dec 21, 2018)
3f46816  fixng a deadlock (graetzer, Dec 21, 2018)
24 changes: 24 additions & 0 deletions Documentation/Books/Manual/Indexing/Hash.md
@@ -118,6 +118,30 @@ details, including the index-identifier, is returned.
@endDocuBlock ensureHashIndexArray


<!-- Creating Hash Index in Background
---------------------------------

{% hint 'info' %}
This section only applies to the *rocksdb* storage engine
{% endhint %}

Creating new indexes is by default done under an exclusive collection lock. This means
that the collection (or the respective shards) is not available as long as the index
is being created. This "foreground" index creation can be undesirable if you have to
perform it on a live system without a dedicated maintenance window.

Indexes can also be created in "background", without using an exclusive lock during the creation.
The collection remains available, and other CRUD operations can run on it while the index is created.
This can be achieved by using the *inBackground* option.

To create a hash index in the background in *arangosh*, just specify `inBackground: true`:

```js
db.collection.ensureIndex({ type: "hash", fields: [ "value" ], inBackground: true });
``` -->

For more information, see [Creating Indexes in Background](IndexBasics.md#creating-indexes-in-background).

Ensure uniqueness of relations in edge collections
--------------------------------------------------

124 changes: 99 additions & 25 deletions Documentation/Books/Manual/Indexing/IndexBasics.md
@@ -22,6 +22,14 @@ are covered by an edge collection's edge index automatically.
Using the system attribute `_id` in user-defined indexes is not possible, but
indexing `_key`, `_rev`, `_from`, and `_to` is.

<!-- Creating new indexes is usually done under an exclusive collection lock. The collection is not
available as long as the index is created. This "foreground" index creation can be undesirable
if you have to perform it on a live system without a dedicated maintenance window.

For potentially long-running index creation operations the _rocksdb_ storage engine also supports
creating indexes in "background". The collection remains available during the index creation;
see the section [Creating Indexes in Background](#creating-indexes-in-background) for more information. -->

ArangoDB provides the following index types:

Primary Index
@@ -243,31 +251,6 @@ Skiplist indexes support [indexing array values](#indexing-array-values) if the
attribute name is extended with a <i>[\*]</i>`.


Persistent Index
----------------

The persistent index is a sorted index with persistence. The index entries are written to
disk when documents are stored or updated. That means the index entries do not need to be
rebuilt from the collection data when the server is restarted or the indexed collection
is initially loaded. Thus using persistent indexes may reduce collection loading times.

The persistent index type can be used for secondary indexes at the moment. That means the
persistent index currently cannot be made the only index for a collection, because there
will always be the in-memory primary index for the collection in addition, and potentially
more indexes (such as the edges index for an edge collection).

The index implementation is using the RocksDB engine, and it provides logarithmic complexity
for insert, update, and remove operations. As the persistent index is not an in-memory
index, it does not store pointers into the primary index as all the in-memory indexes do,
but instead it stores a document's primary key. To retrieve a document via a persistent
index via an index value lookup, there will therefore be an additional O(1) lookup into
the primary index to fetch the actual document.

As the persistent index is sorted, it can be used for point lookups, range queries and sorting
operations, but only if either all index attributes are provided in a query, or if a leftmost
prefix of the index attributes is specified.


Geo Index
---------

@@ -307,6 +290,37 @@ minimum length will be included in the index.
The fulltext index is used via dedicated functions in AQL or the simple queries, but will
not be enabled for other types of queries or conditions.
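
For instance, a minimal sketch of such a dedicated function call from *arangosh*, assuming a
fulltext index on the `text` attribute of a collection simply named `collection`:

```js
// a sketch: fulltext lookup via the dedicated AQL function;
// assumes a fulltext index on the `text` attribute
db.collection.ensureIndex({ type: "fulltext", fields: [ "text" ], minLength: 3 });
db._query('FOR doc IN FULLTEXT(collection, "text", "prefix:index") RETURN doc');
```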


Persistent Index
----------------

{% hint 'warning' %}
This index type should not be used anymore. Instead use the RocksDB storage
engine with either the *skiplist* or *hash* index.
{% endhint %}

The persistent index is a sorted index with persistence. The index entries are written to
disk when documents are stored or updated. That means the index entries do not need to be
rebuilt from the collection data when the server is restarted or the indexed collection
is initially loaded. Thus using persistent indexes may reduce collection loading times.

The persistent index type can be used for secondary indexes at the moment. That means the
persistent index currently cannot be made the only index for a collection, because there
will always be the in-memory primary index for the collection in addition, and potentially
more indexes (such as the edges index for an edge collection).

The index implementation is using the RocksDB engine, and it provides logarithmic complexity
for insert, update, and remove operations. As the persistent index is not an in-memory
index, it does not store pointers into the primary index as all the in-memory indexes do,
but instead it stores a document's primary key. To retrieve a document via a persistent
index via an index value lookup, there will therefore be an additional O(1) lookup into
the primary index to fetch the actual document.

As the persistent index is sorted, it can be used for point lookups, range queries and sorting
operations, but only if either all index attributes are provided in a query, or if a leftmost
prefix of the index attributes is specified.
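
As an illustration, a minimal sketch in *arangosh* (collection and attribute names are placeholders):

```js
// sorted persistent index over two attributes
db.collection.ensureIndex({ type: "persistent", fields: [ "name", "rank" ] });

// usable when both index attributes are constrained ...
db._query('FOR doc IN collection FILTER doc.name == "x" && doc.rank > 3 RETURN doc');
// ... or with the leftmost prefix `name` alone, including for sorting
db._query('FOR doc IN collection FILTER doc.name == "x" SORT doc.name RETURN doc');
// a condition on `rank` alone skips the leftmost attribute and cannot use this index
```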


Indexing attributes and sub-attributes
--------------------------------------

@@ -534,3 +548,63 @@ optimizer may prefer the default edge index over vertex centric indexes
based on the costs it estimates, even if a vertex centric index might
in fact be faster. Vertex centric indexes are more likely to be chosen
for highly connected graphs and with RocksDB storage engine.
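
Such an index is, for example, created by indexing the special edge attribute `_from` together
with a regular edge attribute. A minimal sketch in *arangosh* (the edge collection name `edges`
and the `weight` attribute are placeholders):

```js
// a sketch: vertex centric index over the edge source and an edge attribute
db.edges.ensureIndex({ type: "hash", fields: [ "_from", "weight" ] });
```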

<!--
Creating Indexes in Background
------------------------------

{% hint 'info' %}
This section only applies to the *rocksdb* storage engine
{% endhint %}

Creating new indexes is by default done under an exclusive collection lock. This means
that the collection (or the respective shards) is not available as long as the index
is being created. This "foreground" index creation can be undesirable if you have to
perform it on a live system without a dedicated maintenance window.

Indexes can also be created in "background", without using an exclusive lock during the creation.
The collection remains available, and other CRUD operations can run on it while the index is created.
This can be achieved by using the *inBackground* option.

To create indexes in the background in *arangosh*, just specify `inBackground: true`,
as in the following examples:

```js
// create the hash index in the background
db.collection.ensureIndex({ type: "hash", fields: [ "value" ], unique: false, inBackground: true });
db.collection.ensureIndex({ type: "hash", fields: [ "email" ], unique: true, inBackground: true });

// skiplist indexes work as well
db.collection.ensureIndex({ type: "skiplist", fields: ["abc", "cdef"], unique: true, inBackground: true });
db.collection.ensureIndex({ type: "skiplist", fields: ["abc", "cdef"], sparse: true, inBackground: true });

// geo and fulltext indexes are also supported
db.collection.ensureIndex({ type: "geo", fields: [ "latitude", "longitude" ], inBackground: true });
db.collection.ensureIndex({ type: "fulltext", fields: [ "text" ], minLength: 4, inBackground: true });
```

### Behaviour

Indexes that are still in the build process are not visible via the ArangoDB API. Nevertheless it is
not possible to create the same index twice via the *ensureIndex* API while the build is in progress.
AQL queries will not use these indexes either until the indexes report back as finished. Note that the
initial *ensureIndex* call or HTTP request will still block until the index is completely ready.
Existing single-threaded client programs can therefore safely set the *inBackground* option to *true*
and continue to work as before.
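
The same option can also be passed in the body of an index creation request against the HTTP API.
A minimal sketch from *arangosh*, assuming the collection is simply named `collection`:

```js
// a sketch: create a hash index in the background via the HTTP index API;
// the collection name is a placeholder
arango.POST("/_api/index?collection=collection", JSON.stringify({
  type: "hash",
  fields: [ "value" ],
  inBackground: true
}));
```

As with *ensureIndex*, the call only returns once the index is completely ready.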

{% hint 'info' %}
While an index is being built in the background, the collection cannot be renamed or dropped.
These operations will block until the index creation is finished.
{% endhint %}

An interrupted index build (e.g. due to a server crash) will remove the partially built index.
In an ArangoDB cluster the index might then be automatically recreated on the affected shards.

### Performance

Background index creation might be slower than "foreground" index creation and require more RAM.
Under a write-heavy load (specifically many remove, update, or replace operations),
the background index creation needs to keep a list of removed documents in RAM. This might become
unsustainable if this list grows to tens of millions of entries.

Building an index is always a write-heavy operation internally, so it is always a good idea to build
indexes during times with less load. -->
25 changes: 25 additions & 0 deletions Documentation/Books/Manual/Indexing/Skiplist.md
@@ -185,3 +185,28 @@ and
{ "a" : { "c" : 1, "b" : 1 } }
```
will match.


<!-- Creating Skiplist Index in Background
---------------------------------

{% hint 'info' %}
This section only applies to the *rocksdb* storage engine
{% endhint %}

Creating new indexes is by default done under an exclusive collection lock. This means
that the collection (or the respective shards) is not available as long as the index
is being created. This "foreground" index creation can be undesirable if you have to
perform it on a live system without a dedicated maintenance window.

Indexes can also be created in "background", without using an exclusive lock during the creation.
The collection remains available, and other CRUD operations can run on it while the index is created.
This can be achieved by using the *inBackground* option.

To create a Skiplist index in the background in *arangosh*, just specify `inBackground: true`:

```js
db.collection.ensureIndex({ type: "skiplist", fields: [ "value" ], inBackground: true });
```

For more information, see [Creating Indexes in Background](IndexBasics.md#creating-indexes-in-background). -->
4 changes: 2 additions & 2 deletions arangod/Agency/Supervision.cpp
@@ -1066,7 +1066,7 @@ void Supervision::readyOrphanedIndexCreations() {
indexes = collection("indexes").getArray();
if (indexes.length() > 0) {
for (auto const& planIndex : VPackArrayIterator(indexes)) {
if (planIndex.hasKey("isBuilding") && collection.has("shards")) {
if (planIndex.hasKey(StaticStrings::IndexIsBuilding) && collection.has("shards")) {
auto const& planId = planIndex.get("id");
auto const& shards = collection("shards");
if (collection.has("numberOfShards") &&
@@ -1121,7 +1121,7 @@
{ VPackObjectBuilder props(envelope.get());
for (auto const& prop : VPackObjectIterator(planIndex)) {
auto const& key = prop.key.copyString();
if (key != "isBuilding") {
if (key != StaticStrings::IndexIsBuilding) {
envelope->add(key, prop.value);
}
}}
16 changes: 8 additions & 8 deletions arangod/Aql/OptimizerRules.cpp
@@ -6642,15 +6642,15 @@ static bool geoFuncArgCheck(ExecutionPlan* plan, AstNode const* args,
info.collectionNodeToReplace = collNode;
info.collectionNodeOutVar = collNode->outVariable();
info.collection = collNode->collection();
std::shared_ptr<LogicalCollection> coll =
collNode->collection()->getCollection();

// check for suitable indexes
for (std::shared_ptr<arangodb::Index> idx : coll->getIndexes()) {

// we should not access the LogicalCollection directly
Query* query = plan->getAst()->query();
auto indexes = query->trx()->indexesForCollection(info.collection->name());
// check for suitable indexes
for (std::shared_ptr<arangodb::Index> idx : indexes) {
// check if current index is a geo-index
bool isGeo =
idx->type() == arangodb::Index::IndexType::TRI_IDX_TYPE_GEO_INDEX;
if (isGeo && idx->fields().size() == 1) { // individual fields
bool isGeo = idx->type() == arangodb::Index::IndexType::TRI_IDX_TYPE_GEO_INDEX;
if (isGeo && idx->fields().size() == 1) { // individual fields
// check access paths of attributes in ast and those in index match
if (idx->fields()[0] == attributeAccess.second) {
if (info.index != nullptr && info.index != idx) {
1 change: 0 additions & 1 deletion arangod/Aql/OptimizerRulesReplaceFunctions.cpp
@@ -195,7 +195,6 @@ std::pair<AstNode*, AstNode*> getAttributeAccessFromIndex(Ast* ast, AstNode* doc
for(auto& idx : indexes){
if(Index::isGeoIndex(idx->type())) {
// we take the first index that is found

bool isGeo1 = idx->type() == Index::IndexType::TRI_IDX_TYPE_GEO1_INDEX;
bool isGeo2 = idx->type() == Index::IndexType::TRI_IDX_TYPE_GEO2_INDEX;
bool isGeo = idx->type() == Index::IndexType::TRI_IDX_TYPE_GEO_INDEX;
13 changes: 5 additions & 8 deletions arangod/Cluster/ClusterInfo.cpp
@@ -2433,14 +2433,11 @@ int ClusterInfo::ensureIndexCoordinator(

// check index id
uint64_t iid = 0;

VPackSlice const idSlice = slice.get(StaticStrings::IndexId);
if (idSlice.isString()) {
// use predefined index id
if (idSlice.isString()) { // use predefined index id
iid = arangodb::basics::StringUtils::uint64(idSlice.copyString());
}
if (iid == 0) {
// no id set, create a new one!
if (iid == 0) { // no id set, create a new one!
iid = uniqid();
}
std::string const idString = arangodb::basics::StringUtils::itoa(iid);
@@ -2629,14 +2626,14 @@ int ClusterInfo::ensureIndexCoordinatorInner(
for (auto const& e : VPackObjectIterator(slice)) {
TRI_ASSERT(e.key.isString());
std::string const& key = e.key.copyString();
if (key != StaticStrings::IndexId && key != "isBuilding") {
if (key != StaticStrings::IndexId && key != StaticStrings::IndexIsBuilding) {
ob->add(e.key);
ob->add(e.value);
}
}
if (numberOfShards > 0 &&
!slice.get(StaticStrings::IndexType).isEqualString("arangosearch")) {
ob->add("isBuilding", VPackValue(true));
ob->add(StaticStrings::IndexIsBuilding, VPackValue(true));
}
ob->add(StaticStrings::IndexId, VPackValue(idString));
}
@@ -2709,7 +2706,7 @@ int ClusterInfo::ensureIndexCoordinatorInner(
{ VPackObjectBuilder o(&finishedPlanIndex);
for (auto const& entry : VPackObjectIterator(newIndexBuilder.slice())) {
auto const key = entry.key.copyString();
if (key != "isBuilding" && key != "isNewlyCreated") {
if (key != StaticStrings::IndexIsBuilding && key != "isNewlyCreated") {
finishedPlanIndex.add(entry.key.copyString(), entry.value);
}
}
54 changes: 7 additions & 47 deletions arangod/ClusterEngine/ClusterCollection.cpp
@@ -360,42 +360,6 @@ void ClusterCollection::prepareIndexes(
TRI_ASSERT(!_indexes.empty());
}

static std::shared_ptr<Index> findIndex(
velocypack::Slice const& info,
std::vector<std::shared_ptr<Index>> const& indexes) {
TRI_ASSERT(info.isObject());

// extract type
VPackSlice value = info.get("type");

if (!value.isString()) {
// Compatibility with old v8-vocindex.
THROW_ARANGO_EXCEPTION_MESSAGE(TRI_ERROR_INTERNAL,
"invalid index type definition");
}

std::string tmp = value.copyString();
arangodb::Index::IndexType const type = arangodb::Index::type(tmp.c_str());

for (auto const& idx : indexes) {
if (idx->type() == type) {
// Only check relevant indexes
if (idx->matchesDefinition(info)) {
// We found an index for this definition.
return idx;
}
}
}
return nullptr;
}

/// @brief Find index by definition
std::shared_ptr<Index> ClusterCollection::lookupIndex(
velocypack::Slice const& info) const {
READ_LOCKER(guard, _indexesLock);
return findIndex(info, _indexes);
}

std::shared_ptr<Index> ClusterCollection::createIndex(
arangodb::velocypack::Slice const& info, bool restore,
bool& created) {
@@ -404,23 +368,19 @@ std::shared_ptr<Index> ClusterCollection::createIndex(
WRITE_LOCKER(guard, _exclusiveLock);
std::shared_ptr<Index> idx;

{
WRITE_LOCKER(guard, _indexesLock);
idx = findIndex(info, _indexes);
if (idx) {
created = false;
// We already have this index.
return idx;
}
WRITE_LOCKER(guard2, _indexesLock);
idx = lookupIndex(info);
if (idx) {
created = false;
// We already have this index.
return idx;
}

StorageEngine* engine = EngineSelectorFeature::ENGINE;
TRI_ASSERT(engine != nullptr);

// We are sure that we do not have an index of this type.
// We also hold the lock.
// Create it

// We also hold the lock. Create it
idx = engine->indexFactory().prepareIndexFromSlice(
info, true, _logicalCollection, false
);
3 changes: 0 additions & 3 deletions arangod/ClusterEngine/ClusterCollection.h
@@ -104,9 +104,6 @@ class ClusterCollection final : public PhysicalCollection {

void prepareIndexes(arangodb::velocypack::Slice indexesSlice) override;

/// @brief Find index by definition
std::shared_ptr<Index> lookupIndex(velocypack::Slice const&) const override;

std::shared_ptr<Index> createIndex(arangodb::velocypack::Slice const& info,
bool restore, bool& created) override;
