MongoDB M101P Homework 5

[Solution] Week 4 : Performance : M101P: MongoDB for Developers

Homework 4.1 : 

Suppose you have a collection with the following indexes:

> db.products.getIndexes()
[
  { "v" : 1, "key" : { "_id" : 1 }, "ns" : "store.products", "name" : "_id_" },
  { "v" : 1, "key" : { "sku" : 1 }, "unique" : true, "ns" : "store.products", "name" : "sku_1" },
  { "v" : 1, "key" : { "price" : -1 }, "ns" : "store.products", "name" : "price_-1" },
  { "v" : 1, "key" : { "description" : 1 }, "ns" : "store.products", "name" : "description_1" },
  { "v" : 1, "key" : { "category" : 1, "brand" : 1 }, "ns" : "store.products", "name" : "category_1_brand_1" },
  { "v" : 1, "key" : { "reviews.author" : 1 }, "ns" : "store.products", "name" : "reviews.author_1" }
]

Which of the following queries can utilize at least one index to find all matching documents, or to sort? Check all that apply.

Note: the text for some answers may wrap; you can ignore the wrapping.
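One way to reason about the choices is the index-prefix rule: a query can use a compound index only when the fields it constrains include the index's leading key(s). A rough model of that rule in Python, using the key patterns above (a simplified sketch: it ignores sort direction, ranges, and multikey behavior):

```python
def can_use_index(query_fields, index_keys):
    """True if the queried fields can use this index: the index's
    leading key must be constrained by the query (prefix rule).
    Simplified model; ignores sort direction, ranges, multikey."""
    if not index_keys:
        return False
    return index_keys[0] in query_fields

# Key patterns from db.products.getIndexes() above.
indexes = [
    ["_id"], ["sku"], ["price"], ["description"],
    ["category", "brand"], ["reviews.author"],
]

def query_is_indexed(query_fields):
    """True if any index can serve a query on these fields."""
    return any(can_use_index(query_fields, ix) for ix in indexes)

print(query_is_indexed({"category"}))        # leading key of category_1_brand_1
print(query_is_indexed({"brand"}))           # not a prefix of any index
print(query_is_indexed({"reviews.author"}))  # matches reviews.author_1
```

This is why a query on category alone can use the compound index, while a query on brand alone cannot: brand is not the leading key of any index.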



Homework 4.2 : 

Suppose you have a collection called tweets whose documents contain information about the created_at time of the tweet and the user's followers_count at the time they issued the tweet. What can you infer from the following explain output?

> db.tweets.explain("executionStats").find(
    { "user.followers_count" : { $gt : 1000 } }
  ).limit(10).skip(5000).sort( { created_at : 1 } )
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "twitter.tweets",
    "indexFilterSet" : false,
    "parsedQuery" : { "user.followers_count" : { "$gt" : 1000 } },
    "winningPlan" : {
      "stage" : "LIMIT", "limitAmount" : 0,
      "inputStage" : {
        "stage" : "SKIP", "skipAmount" : 0,
        "inputStage" : {
          "stage" : "FETCH",
          "filter" : { "user.followers_count" : { "$gt" : 1000 } },
          "inputStage" : {
            "stage" : "IXSCAN",
            "keyPattern" : { "created_at" : -1 },
            "indexName" : "created_at_-1",
            "isMultiKey" : false,
            "direction" : "backward",
            "indexBounds" : { "created_at" : [ "[MinKey, MaxKey]" ] }
          }
        }
      }
    },
    "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 10,
    "executionTimeMillis" : 563,
    "totalKeysExamined" : 251120,
    "totalDocsExamined" : 251120,
    "executionStages" : {
      "stage" : "LIMIT", "nReturned" : 10, "executionTimeMillisEstimate" : 500,
      "works" : 251121, "advanced" : 10, "needTime" : 251110, "needFetch" : 0,
      "saveState" : 1961, "restoreState" : 1961, "isEOF" : 1, "invalidates" : 0,
      "limitAmount" : 0,
      "inputStage" : {
        "stage" : "SKIP", "nReturned" : 10, "executionTimeMillisEstimate" : 500,
        "works" : 251120, "advanced" : 10, "needTime" : 251110, "needFetch" : 0,
        "saveState" : 1961, "restoreState" : 1961, "isEOF" : 0, "invalidates" : 0,
        "skipAmount" : 0,
        "inputStage" : {
          "stage" : "FETCH",
          "filter" : { "user.followers_count" : { "$gt" : 1000 } },
          "nReturned" : 5010, "executionTimeMillisEstimate" : 490,
          "works" : 251120, "advanced" : 5010, "needTime" : 246110, "needFetch" : 0,
          "saveState" : 1961, "restoreState" : 1961, "isEOF" : 0, "invalidates" : 0,
          "docsExamined" : 251120, "alreadyHasObj" : 0,
          "inputStage" : {
            "stage" : "IXSCAN", "nReturned" : 251120, "executionTimeMillisEstimate" : 100,
            "works" : 251120, "advanced" : 251120, "needTime" : 0, "needFetch" : 0,
            "saveState" : 1961, "restoreState" : 1961, "isEOF" : 0, "invalidates" : 0,
            "keyPattern" : { "created_at" : -1 },
            "indexName" : "created_at_-1",
            "isMultiKey" : false,
            "direction" : "backward",
            "indexBounds" : { "created_at" : [ "[MinKey, MaxKey]" ] },
            "keysExamined" : 251120, "dupsTested" : 0, "dupsDropped" : 0,
            "seenInvalidated" : 0, "matchTested" : 0
          }
        }
      }
    }
  },
  "serverInfo" : {
    "host" : "generic-name.local", "port" : 27017,
    "version" : "3.0.1", "gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952"
  },
  "ok" : 1
}

Homework 4.3 :

Making the Blog fast

Please download hw4-3.zip from the Download Handout link to get started. This assignment requires MongoDB 3.0 or above.

In this homework assignment you will be adding some indexes to the posts collection to make the blog fast.

We have provided the full code for the blog application, and you don't need to make any changes, or even run the blog. But you can, for fun.

We are also providing a patriotic (if you are an American) data set for the blog. There are 1000 entries with lots of comments and tags. You must load this data set to complete the problem.

From the mongo shell:

use blog
db.posts.drop()

From the Mac or PC terminal window:

mongoimport --drop -d blog -c posts posts.json

The blog has been enhanced so that it can also display the top 10 most recent posts by tag. There are hyperlinks from the post tags to the page that displays the 10 most recent blog entries for that tag. (Run the blog and it will be obvious.)

Your assignment is to make the following blog pages fast:

The blog home page
The page that displays blog posts by tag (http://localhost:8082/tag/whatever)
The page that displays a blog entry by permalink (http://localhost:8082/post/permalink)

By fast, we mean that indexes should be in place to satisfy these queries such that we only need to scan the number of documents we are going to return.

To figure out which queries you need to optimize, you can read the blog.py code and see what it does to display those pages. Isolate those queries and use explain to explore them.

Once you have added the indexes to make those pages fast, run the following:

python validate.py

(Note that for folks who are using MongoLab or MongoHQ there are some command line options to validate.py to make it possible to use those services.) Now enter the validation code below.

Solution : 893jfns29f728fn29f20f2
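For reference, assuming the handout's usual schema (posts carry date, tags, and permalink fields), index key patterns along these lines satisfy the three pages. They are shown here as pymongo-style key specifications only, not live createIndex calls against a server:

```python
# Index key patterns (pymongo-style key specs) that cover the three pages.
# With a live connection you would create them with e.g.
#   db.posts.create_index(tag_page_index)
home_page_index = [("date", -1)]                # find().sort by date desc
tag_page_index  = [("tags", 1), ("date", -1)]   # find by tag, sort by date desc
permalink_index = [("permalink", 1)]            # exact lookup by permalink
print(tag_page_index)
```

The home page needs its own {date: -1} index because tags leads the compound index, so that index cannot serve an unfiltered sort by date.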


Homework 4.4 :

In this problem you will analyze a profile log taken from a different MongoDB instance. To start, please download sysprofile.json from the Download Handout link and import it into the profile collection of the m101 database with the following command:

mongoimport --drop -d m101 -c profile sysprofile.json

Now query the profile data, looking for all queries to the students collection in the database school2, sorted in order of decreasing latency. What is the latency of the longest-running operation against that collection, in milliseconds?
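In the shell, one natural query shape is db.profile.find({ns: "school2.students"}).sort({millis: -1}).limit(1), then read the millis field of the top document. The same filter-and-sort logic in plain Python, over hypothetical sample profile documents (the real ones come from sysprofile.json):

```python
# Hypothetical sample profile documents for illustration only.
profile_docs = [
    {"ns": "school2.students", "op": "query", "millis": 300},
    {"ns": "school2.grades",   "op": "query", "millis": 900},
    {"ns": "school2.students", "op": "query", "millis": 750},
]

# Keep only operations against school2.students, sorted by
# decreasing latency (the millis field).
students_ops = sorted(
    (d for d in profile_docs if d["ns"] == "school2.students"),
    key=lambda d: d["millis"],
    reverse=True,
)
print(students_ops[0]["millis"])  # 750 for this sample data
```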

Solution :



Enjoy!

Feel free to comment below about your experience with the above approach. If you still find any problem with the above steps, let me know; I would love to help you resolve it.

If you want to take your technological knowledge to the next level, and for more technological information, stay tuned to Visionfortech.


[Solution] Week 5 : Aggregation Framework : M101P: MongoDB for Developers

Homework 5.1 : 

Finding the most frequent author of comments on your blog

In this assignment you will use the aggregation framework to find the most frequent author of comments on your blog. We will be using a data set similar to ones we've used before.

Start by downloading the handout zip file for this problem. Then import it into your blog database as follows:

mongoimport --drop -d blog -c posts posts.json


Now use the aggregation framework to calculate the author with the greatest number of comments.

To help you verify your work before submitting, the author with the fewest comments is Mariela Sherer and she commented 387 times.

Please choose your answer below for the most prolific comment author:
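A typical pipeline shape for this is $unwind on the comments array, $group by "$comments.author" with {$sum: 1}, then $sort descending and $limit 1. The same logic in plain Python, over a couple of hypothetical sample posts (the real data comes from posts.json):

```python
from collections import Counter

# Hypothetical sample posts for illustration only.
posts = [
    {"comments": [{"author": "alice"}, {"author": "bob"}]},
    {"comments": [{"author": "alice"}]},
]

# "$unwind" the comments array, then count per author
# ("$group" by author with "$sum": 1).
counts = Counter(c["author"] for p in posts for c in p["comments"])

# "$sort" descending and "$limit" 1.
top_author, top_count = counts.most_common(1)[0]
print(top_author, top_count)  # alice 2
```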


Homework 5.2 : 


Crunching the Zipcode dataset
Please calculate the average population of cities in California (abbreviation CA) and New York (NY) (taken together) with populations over 25,000.
For this problem, assume that a city name that appears in more than one state represents two separate cities.
Please round the answer to a whole number.
Hint: The answer for CT and NJ (using this data set) is 38177.
Please note:
Different states might have the same city name.
A city might have multiple zip codes.

For this problem, we have used a subset of the data you previously used in zips.json, not the full set. For this set, there are only 200 documents (and 200 zip codes), and all of them are in New York, Connecticut, New Jersey, and California.
You can download the handout and perform your analysis on your machine with

mongoimport --drop -d test -c zips small_zips.json

Note: this is raw data from the United States Postal Service. If you notice a few duplicates while importing, fear not; this is expected and will not affect your answer.

Once you've generated your aggregation query and found your answer, select it from the choices below.
Please use the Aggregation pipeline to solve this problem.
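One pipeline shape that fits: $match on the two states, $group by state and city with a $sum of pop (a city can span multiple zip codes), a second $match for populations over 25,000, then a final $group computing the $avg. The equivalent arithmetic in plain Python, over hypothetical sample documents (the real data is small_zips.json):

```python
from collections import defaultdict

# Hypothetical sample zip documents for illustration only.
zips = [
    {"state": "CA", "city": "BIGTOWN",   "pop": 20000},
    {"state": "CA", "city": "BIGTOWN",   "pop": 10000},  # second zip, same city
    {"state": "NY", "city": "METRO",     "pop": 50000},
    {"state": "NY", "city": "HAMLET",    "pop": 4000},
    {"state": "NJ", "city": "ELSEWHERE", "pop": 99999},  # filtered out by $match
]

# $match on state, then $group by (state, city) with a $sum of pop.
# Keying on (state, city) is what makes same-named cities in
# different states count as separate cities.
city_pops = defaultdict(int)
for z in zips:
    if z["state"] in ("CA", "NY"):
        city_pops[(z["state"], z["city"])] += z["pop"]

# Second $match (pop > 25000), then $group computing the $avg.
big = [p for p in city_pops.values() if p > 25000]
print(round(sum(big) / len(big)))  # 40000 for this sample data
```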

Homework 5.3 :

Who's the easiest grader on campus? (The full problem statement for this one appears at the end of this post.)

Solution : 1


Homework 5.4 :

Removing Rural Residents

In this problem you will calculate the number of people who live in a zip code in the US where the city starts with a digit. We will take that to mean they don't really live in a city. Once again, you will be using the zip code collection, which you will find via the 'handouts' link on this page. Import it into your mongod using the following command from the command line:

mongoimport --drop -d test -c zips zips.json

This is raw data from the United States Postal Service. If you notice a few duplicates while importing, fear not; this is expected and will not affect your answer.

If you imported it correctly, you can go to the test database in the mongo shell and confirm that

> db.zips.count()

yields 29,467 documents.

The $project operator can extract the first digit from any field. For example, to extract the first digit of the city field, you could write this query:

db.zips.aggregate([
  { $project: { first_char: { $substr: ["$city", 0, 1] } } }
])

Using the aggregation framework, calculate the sum total of people who live in a zip code where the city starts with a digit. Choose the answer below.

You will probably need to change your projection to send more information through than just that first character. You will also need a filtering step to get rid of all documents where the city does not start with a digit (0-9).
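Putting the hint together: project the first character alongside pop, match first_char against the digits, then group with a $sum of pop. The same filtering and summing in plain Python, over hypothetical sample documents (the real data is zips.json):

```python
# Hypothetical sample zip documents for illustration only.
zips = [
    {"city": "98101 RURAL AREA", "pop": 1200},
    {"city": "SPRINGFIELD",      "pop": 30000},  # excluded: starts with a letter
    {"city": "4 CORNERS",        "pop": 800},
]

# "$project" the first character, "$match" on digits (0-9),
# then "$group" with a "$sum" of pop.
total = sum(z["pop"] for z in zips if z["city"][0].isdigit())
print(total)  # 2000 for this sample data
```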

Solution : 298016



Related Posts:

- Week 1 : Introduction : M101P: MongoDB for Developers


- Week 2 : CRUD : M101P: MongoDB for Developers


- Week 3 : Schema Design : M101P: MongoDB for Developers


- Week 4 : Performance : M101P: MongoDB for Developers

Homework 5.3 : Who's the easiest grader on campus?

Download the handout and import it with mongoimport:

mongoimport --drop -d test -c grades grades.json

The documents look like this:

{
  "_id" : ObjectId("50b59cd75bed76f46522c392"),
  "student_id" : 10,
  "class_id" : 5,
  "scores" : [
    { "type" : "exam", "score" : 69.17634380939022 },
    { "type" : "quiz", "score" : 61.20182926719762 },
    { "type" : "homework", "score" : 73.3293624199466 },
    { "type" : "homework", "score" : 15.206314042622903 },
    { "type" : "homework", "score" : 36.75297723087603 },
    { "type" : "homework", "score" : 64.42913107330241 }
  ]
}

There are documents for each student (student_id) across a variety of classes (class_id). Note that not all students in the same class have the same number of assessments; some students have three homework assignments, etc.

Your task is to calculate the class with the best average student performance. This involves calculating an average for each student in each class of all non-quiz assessments, and then averaging those numbers to get a class average. To be clear, each student's average includes only exam and homework grades; don't include quiz scores in the calculation.

What is the class_id with the highest average student performance?

Hint/Strategy: You need to group twice to solve this problem. You must figure out the GPA that each student has achieved in each class, and then average those numbers to get a class average. After that, you just need to sort. As a sanity check: the class with the lowest average is class_id=2, whose students achieved a class average of 37.6.

Below, choose the class_id with the highest average student average.
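The two-group strategy from the hint can be sketched in plain Python over hypothetical sample documents (the real data comes from grades.json): first average each student's non-quiz scores per class, then average those student averages per class:

```python
from collections import defaultdict

# Hypothetical sample grade documents for illustration only.
grades = [
    {"student_id": 1, "class_id": 5,
     "scores": [{"type": "exam", "score": 80.0},
                {"type": "quiz", "score": 10.0},      # quizzes are excluded
                {"type": "homework", "score": 60.0}]},
    {"student_id": 2, "class_id": 5,
     "scores": [{"type": "exam", "score": 90.0},
                {"type": "homework", "score": 70.0}]},
]

# First $group: per (class_id, student_id), average the non-quiz scores.
student_avgs = {}
for doc in grades:
    kept = [s["score"] for s in doc["scores"] if s["type"] != "quiz"]
    student_avgs[(doc["class_id"], doc["student_id"])] = sum(kept) / len(kept)

# Second $group: per class_id, average the student averages.
class_scores = defaultdict(list)
for (class_id, _student_id), avg in student_avgs.items():
    class_scores[class_id].append(avg)
class_avgs = {c: sum(v) / len(v) for c, v in class_scores.items()}
print(class_avgs)  # {5: 75.0} for this sample data
```

In the real pipeline, a final $sort on the class average (descending) surfaces the answer.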
