MongoDB queries

  1. Mongo find explained better.
    Mongo queries to show why validation is a good thing.
  2. Mongo queries compared to SQL.
  3. Aggregation you can't do with normal queries.
  4. Explain and Atlas.
  5. Finding slow queries: currentOp and the oplog.
  6. Concurrency and isolation.

Mongo find explained better

  1. The most important thing to remember is that the ":" operator is overloaded. Sometimes it means "equals" and sometimes it means "contains", as with arrays. For example, db.col1.find({field1 : "123"}) returns documents in the collection "col1" where the field "field1" equals "123" OR (and it is a big OR) where "field1" is an array and one of its elements equals "123". Truly annoying if you don't know that, but powerful.
  2. Another thing with documents is that they may not have the field you are searching for. For example, if you want to find documents with a bad value, a missing value might itself be bad. So you have to test whether the field exists, what type it is, and whether the value is bad.
  3. You can force structure on your documents with document validation in MongoDB 3.2 and later. The setup script below uses a validator.
  4. The aggregation pipeline is nothing magical. It is just a series of stages, each working on the output of the stage before it. The second and later stages can remove fields, add fields, do distinct and group operations, and so on. The biggest thing with the aggregation pipeline is to minimize the number of documents examined early, both for speed and so that later stages have fewer documents to analyze. The first stage produces a temporary set of documents that each following stage can filter, modify, or extend.
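
To make the overloading in item 1 and the missing-field checks in item 2 concrete, here is a small sketch in the mongo shell. The collection "people" and its documents are made up for illustration:

```javascript
// Hypothetical data to illustrate the overloaded ":".
db.people.drop();
db.people.insert({name: "a", tags: "red"});
db.people.insert({name: "b", tags: ["red", "blue"]});
db.people.insert({name: "c"});                  // no "tags" field at all

// Matches BOTH "a" and "b": equality on a scalar, or membership in an array.
db.people.find({tags: "red"});

// Item 2: test existence and type before trusting a value.
db.people.find({tags: {$exists: false}});       // documents missing the field
db.people.find({tags: {$type: "string"}});      // matches "a" AND "b": $type also checks array elements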

Execute this setup in the mongo shell, or save it to a file called "mark1.js" and run load("mark1.js") in the mongo shell.

Note that the entire thing has to be JavaScript for load() to work. This script demonstrates validation, TTL, and JavaScript in the shell, and shows when the purge happens. When you run the count loop later, it also prints the time, so you can see that the TTL monitor purges expired documents about once a minute.


db = db.getSiblingDB('markNielsen');
db.overloaded.drop();

db.createCollection( "overloaded",  {
  validator: { i: {$type: "long"}, f: {$type: "double"}, s: {$type: "string"}, t: {$type: "date"} } 
} )

function fillData() {

  for( var i = 0; i < 100000; i++ ) {
    var f = i + 0.0001;
    var s = i.toString();
    var t = new Date();

    var u = i;	 	       
    var l = 0;
    if ( i % 15 == 0 ) { u = i.toString(); l = 1;}
    if ( i % 15 == 1 ) { u = i + 0.002;  l = 1;}
    if ( i % 15 == 2 ) { u = new Date();  l = 1;}
    if ( i % 15 == 3 ) { u = [i, i+1, i+2, i+3];  l = 1;}
    if ( i % 15 == 4 ) { u = {Blah1: i, Blah2: i+1, Blah3: i+2};  l = 1;}
    if ( i % 15 == 5 ) { l = 2;}
    if ( i % 15 == 6 ) { u = null;}		      
    if ( i % 15 == 7 ) { u = true;}
    if ( i % 15 == 8 ) { u = false;}		      
    if ( i % 15 == 9 ) { u = i+1;}
    if ( i % 15 == 10 ) { u = i+2;}
    if ( i % 15 == 11 ) { u = "removeme";}
		      
    var rm = new Date(); 
		      
    if ( i % 20 == 0 ) {
      if ( i % 100 == 0 ) {
        db.overloaded.insert( {i : NumberLong(i), f: f, s: s, t: t, u: u, lopsided: l, sometimes: i, removeme: rm } );
      }
      else { db.overloaded.insert( {i : NumberLong(i), f: f, s: s, t: t, u: u, lopsided: l, sometimes: i} ); }
    }
    else { db.overloaded.insert( {i : NumberLong(i), f: f, s: s, t: t, u: u, lopsided: l } ); }

    // print progress every 10000 documents
    if (i % 10000 == 0) {print("Count:" + i );}

    }
}

fillData()

db.overloaded.createIndexes([ {i: 1}, {f: 1},{s :1},{t:1},{u:1}], {background:true});
db.overloaded.createIndex({partial:1}, {partialFilterExpression: { sometimes: { $exists: true }}});
db.overloaded.createIndex({lopsided:1}, {partialFilterExpression: { lopsided: { $gte: 1}}});

// Certain entries will be removed starting 10 seconds after insertion.
// The partial filter keeps the TTL index small, so there is less to scan when deleting.
db.overloaded.createIndex( {removeme: 1}, { partialFilterExpression: { removeme: {$exists: true}}, expireAfterSeconds: 10, background: true});
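
With the validator in place, a quick negative test shows why validation helps. This insert is made up for illustration and should be rejected:

```javascript
// "i" is a string here, violating the validator, so the insert
// should fail with a "Document failed validation" write error.
db.overloaded.insert({ i: "oops", f: 1.5, s: "1", t: new Date() });
```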
		      

After you have done that, check the indexes and the types of the fields. Do this quickly so you can watch the TTL in action.

// Let's check indexes
db.overloaded.getIndexes();
// Let's check the fields		      
db.overloaded.aggregate([   { "$project": {"i":{"$type":"$i"},"f":{"$type":"$f"},"s":{"$type":"$s"},"t":{"$type":"$t"}} }, {$limit : 3}    ]);

print("Watch TTL in action ...... this could take up to two minutes if it missed its once-a-minute window.");
print("See that the count says the documents are still there.");

for( var i = 0; i < 120; i++ ) {
  sleep(1000);
  var v = db.overloaded.find({removeme: {$exists: 1}}).count();
  var d = new Date();
  print(i + " " + v + " " + d);
}

Now I will make a table of different queries with different expected results. You should always have some sort of validation, and use MongoDB 3.4 or later.

Mongo queries to show why validation is a good thing.

Notes: Validation makes "i" always a long, "s" a string,
"t" a date, and "f" a double.
Query:
  db.overloaded.find({i: {$lt : 4}}, {_id : 0, i: 1, s:1, f:1})
Result:
  { "i" : NumberLong(1), "f" : 1.0001, "s" : "1" }
  { "i" : NumberLong(2), "f" : 2.0001, "s" : "2" }
  { "i" : NumberLong(3), "f" : 3.0001, "s" : "3" }

Notes: Because there is no validation on "u", the overloaded ":" can return
multiple values of different types and break your program. "u" is unknown
and unvalidated.
Query:
  db.overloaded.find({u: 5}, {_id : 0, s:1, u: 1})
Result:
  { "s" : "3", "u" : [ 3, 4, 5, 6 ] }
  { "s" : "5", "u" : 5 }

Notes: Aggregate to show what types the matches are.
Query:
  db.overloaded.aggregate([
    {$match: {u: 5}},
    {$project: {_id:0, u:1, u_type : { $type: "$u" } } }
  ])
Result:
  { "u" : [ 3, 4, 5, 6 ], "u_type" : "array" }
  { "u" : 5, "u_type" : "double" }

Notes: Since you don't have validation, use {u : {$not : {$gte: []}}} to
exclude arrays.
Query:
  db.overloaded.find(
    { $and : [ {u : {$not : {$gte: []}}}, {u: 5}]},
    {_id : 0, u: 1}
  );
Result:
  { "u" : 5 }

Notes: We used "$and" because otherwise find() only keeps the last condition
given for a field.
Query:
  db.overloaded.find({u: 14, u:13}, {_id : 0, u: 1})
Result:
  { "u" : 13 }
Query:
  db.overloaded.find({u: 13, u:14}, {_id : 0, u: 1})
Result:
  { "u" : 14 }

Notes: And our aggregate to do the same thing. We can use multiple "$match"
stages instead of "$and", but it is probably more efficient to use "$and"
inside one "$match".
Query:
  db.overloaded.aggregate([
    {$match: {u: 5}},
    {$match: {u: {$not : {$gte: []}}} },
    {$project: {_id:0, u:1 } }
  ])
Result:
  { "u" : 5 }
Query:
  db.overloaded.aggregate([
    {$match: {$and : [{u: 5},{u: {$not : {$gte: []}}}] }},
    {$project: {_id:0, u:1 } }
  ])
Result:
  { "u" : 5 }

Notes: But array operators like $elemMatch only match arrays.
Query:
  db.overloaded.find({u: { $elemMatch: { $eq: 5 } }}, {_id : 0, u: 1})
Result:
  { "u" : [ 3, 4, 5, 6 ] }

Notes: The aggregation version of the same query.
Query:
  db.overloaded.aggregate([ {$match : {u: { $elemMatch: { $eq: 5 } }}}, { $project : {_id : 0, u: 1}}])
Result:
  { "u" : [ 3, 4, 5, 6 ] }

Mongo queries compared to SQL.

Links
  1. sql to mongo map: https://docs.mongodb.com/manual/reference/sql-aggregation-comparison/
  2. Quick reference cards: https://www.mongodb.com/collateral/quick-reference-cards
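
As a quick taste of the mapping in the first link, here is a SQL GROUP BY and its aggregation equivalent. The "orders" collection and its "status" field are made up for illustration:

```javascript
// SQL:   SELECT status, COUNT(*) AS total FROM orders GROUP BY status;
// Mongo equivalent:
db.orders.aggregate([
  { $group: { _id: "$status", total: { $sum: 1 } } }
]);
```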


Aggregation you can't do with normal queries.
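
For example, a plain find() cannot group and count, but the pipeline can. A sketch using the "overloaded" collection from the setup above:

```javascript
// Count documents per "lopsided" value -- grouping is something find() cannot express.
db.overloaded.aggregate([
  { $group: { _id: "$lopsided", n: { $sum: 1 } } },
  { $sort: { _id: 1 } }
]);
```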


Aggregation optimization

Explain and Atlas
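
A minimal explain() sketch on the collection from the setup above; the exact output fields vary by server version:

```javascript
// "executionStats" reports documents examined vs. returned
// and which index, if any, was used.
db.overloaded.find({u: 5}).explain("executionStats");
```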


Finding slow queries: currentOp and the oplog
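
A starting point for spotting slow operations; the 3-second and 100 ms thresholds are arbitrary choices for illustration:

```javascript
// In-flight operations that have been running for more than 3 seconds.
db.currentOp({ active: true, secs_running: { $gt: 3 } });

// The profiler is another option: level 1 records operations slower
// than 100 ms into the system.profile collection.
db.setProfilingLevel(1, 100);
```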


Concurrency and isolation

We are using MongoDB 3.4 on Ubuntu, 64-bit. We create a collection and show that updates can produce inconsistent results from a lock perspective, because locks yield. This is expected in MongoDB.

Even though this is the way MongoDB works, it yields locks during long-running operations, which can lead to inconsistent results. Do this:


wget http://www.menprojects.com/public/mongo/inconsistent.txt
mongo inconsistent.txt
mongo markNielsen --quiet --eval 'db.inconsistent.update({}, { $set: { v: 1} }, {multi: true})' &
mongo markNielsen --quiet --eval 'db.inconsistent.update({$or :  [ {_id : 1} ,{_id: 100000} ]}, {$set : {v:1}}, {multi: true})' &
mongo markNielsen --quiet --eval 'db.inconsistent.update({$or :  [ {_id : 2} ,{_id: 200000} ]}, {$set : {v:2}}, {multi: true})' &
mongo markNielsen --quiet --eval 'db.inconsistent.update({$or :  [ {_id : 3} ,{_id: 300000} ]}, {$set : {v:3}}, {multi: true})' &
mongo markNielsen --quiet --eval 'db.inconsistent.update({$or :  [ {_id : 4} ,{_id: 400000} ]}, {$set : {v:4}}, {multi: true})' &
mongo markNielsen --quiet --eval 'db.inconsistent.update({$or :  [ {_id : 5} ,{_id: 500000} ]}, {$set : {v:5}}, {multi: true})' &
mongo markNielsen --quiet --eval 'db.inconsistent.update({$or :  [ {_id : 6} ,{_id: 600000} ]}, {$set : {v:6}}, {multi: true})' &

mongo markNielsen --quiet --eval 'db.inconsistent.find({v:1}).count()'
mongo markNielsen --quiet --eval 'db.inconsistent.find({v:2}).count()'
mongo markNielsen --quiet --eval 'db.inconsistent.find({v:3}).count()'
mongo markNielsen --quiet --eval 'db.inconsistent.find({v:4}).count()'
mongo markNielsen --quiet --eval 'db.inconsistent.find({v:5}).count()'
mongo markNielsen --quiet --eval 'db.inconsistent.find({v:6}).count()'
    

All the counts should be "2" if the writes happen sequentially (except v:1, since the first big update sets v:1 on every document). However, if you see some 0 or 1 counts, the big update had not finished when the small ones wrote, and the big update eventually overwrote them. A count of "1" is especially telling.

My results may differ from yours.

    $ mongo markNielsen --quiet --eval 'db.inconsistent.find({v:1}).count()'
    1357932
    $ mongo markNielsen --quiet --eval 'db.inconsistent.find({v:2}).count()'
    0
    $ mongo markNielsen --quiet --eval 'db.inconsistent.find({v:3}).count()'
    1
    $ mongo markNielsen --quiet --eval 'db.inconsistent.find({v:4}).count()'
    2
    $ mongo markNielsen --quiet --eval 'db.inconsistent.find({v:5}).count()'
    1
    $ mongo markNielsen --quiet --eval 'db.inconsistent.find({v:6}).count()'
    0