Category «70-773»

Exam 70-773: Analyzing Big Data with Microsoft R

Which three actions should you perform?

You have a dataset that has multiple blocks and only numeric variables. You are computing in a local compute context. You plan to lag a variable named x to create a new variable named x_lagged by using a transform function. You will create a new element in the output of the function. You need to minimize the number of …
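Here is a minimal sketch of such a transform function. It assumes the RevoScaleR `rxDataStep()` mechanism, in which state is carried from one chunk (block) to the next with `.rxGet()`/`.rxSet()`, seeded through `transformObjects`, and in which `.rxIsTestChunk` flags the schema-probing test chunk. The helper `lagChunk()`, the object name `lastValue`, and the `input.xdf`/`lagged.xdf` paths are hypothetical. The sketch illustrates the technique and does not reproduce the exam's answer choices.

```r
# Hypothetical helper: lag one chunk of x by one row. The first row of the
# chunk receives the last x value carried over from the previous chunk
# (NA for the first chunk of the dataset).
lagChunk <- function(x, lastValue) {
  n <- length(x)
  lagged <- c(lastValue, x[-n])[seq_len(n)]
  list(lagged = lagged, lastValue = if (n > 0) x[n] else lastValue)
}

# Transform function in the shape rxDataStep() expects: it receives a named
# list of columns for the current chunk and returns the modified list.
# .rxGet(), .rxSet() and .rxIsTestChunk are assumed to be supplied by
# RevoScaleR while the transform runs.
lagTransform <- function(dataList) {
  res <- lagChunk(dataList$x, .rxGet("lastValue"))
  # New element in the output list
  dataList$x_lagged <- res$lagged
  # Carry state forward only for real chunks, not the test chunk
  if (!.rxIsTestChunk) .rxSet("lastValue", res$lastValue)
  dataList
}

# Usage sketch: runs only if RevoScaleR and the hypothetical input exist
if (requireNamespace("RevoScaleR", quietly = TRUE) && file.exists("input.xdf")) {
  RevoScaleR::rxDataStep(
    inData = "input.xdf", outFile = "lagged.xdf",
    transformFunc = lagTransform, transformVars = "x",
    transformObjects = list(lastValue = NA, lagChunk = lagChunk),
    overwrite = TRUE
  )
}
```

Keeping the lag arithmetic in a plain function lets it be checked chunk by chunk without RevoScaleR. Sequential chunk order in the local compute context is what makes carrying `lastValue` between blocks valid.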

You need to reduce the amount of time required to estimate each model without losing any information in the predictors.

You are running a large logistic regression for 1,000 feature variables by using the logisticRegression() function in the MicrosoftML package. All of the predictor variables are numeric. Currently, you specify the input variables separately by using the following formula:
Outcome ~ Feature000 + Feature001 + Feature002 + … + Feature999
You discover that it takes 20 minutes to estimate …
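Writing out 1,000 terms by hand is error-prone. Here is a minimal base-R sketch that generates the same formula programmatically. It assumes the predictor columns follow the `Feature000` to `Feature999` naming shown above, and `trainData` in the comment is a hypothetical data set. The sketch only builds the formula object. It does not by itself change how long `logisticRegression()` takes to estimate the model.

```r
# Assumed column names: Feature000 ... Feature999, as in the formula above
featureNames <- sprintf("Feature%03d", 0:999)

# Build Outcome ~ Feature000 + Feature001 + ... + Feature999
fullFormula <- reformulate(featureNames, response = "Outcome")

# The formula could then be passed to MicrosoftML (not run here;
# trainData is hypothetical):
# MicrosoftML::logisticRegression(formula = fullFormula, data = trainData)
```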

Which statement should you use for each environment?

DRAG DROP
You need to set the compute context for three different target environments. Which statement should you use for each environment? To answer, drag the appropriate statements to the correct execution contexts. Each statement may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to …

Which function should you use?

You have a dataset that has a character variable. You need to create a bag of counts of n-grams. Which function should you use?
A. featurizeText()
B. categoricalHash()
C. concat()
D. selectFeatures()
E. categorical()
Explanation: https://docs.microsoft.com/en-us/machine-learning-server/python-reference/microsoftml/featurizetext
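Here is a minimal sketch that produces word n-gram counts with MicrosoftML. It assumes the `featurizeText()` transform with an `ngramCount()` word extractor, applied through `rxFeaturize()`. The `reviews` data frame, its `review` column, and the `ReviewNgrams` output name are hypothetical. The featurization only runs where the MicrosoftML package is installed.

```r
# Hypothetical data with one character variable
reviews <- data.frame(
  review = c("great product", "not great", "great great value"),
  stringsAsFactors = FALSE
)

ngramFeatures <- NULL
if (requireNamespace("MicrosoftML", quietly = TRUE)) {
  ngramFeatures <- MicrosoftML::rxFeaturize(
    data = reviews,
    mlTransforms = list(
      MicrosoftML::featurizeText(
        vars = c(ReviewNgrams = "review"),
        # Word n-gram counts (ngramLength = 2), term-frequency weighting
        wordFeatureExtractor = MicrosoftML::ngramCount(ngramLength = 2, weighting = "tf"),
        # Keep raw counts instead of normalizing the feature vector
        vectorNormalizer = "none"
      )
    )
  )
}
```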

Which R code segment should you use?

You plan to analyze data on a local computer. To improve performance, you plan to alternate the operation between a Microsoft SQL Server and the local computer. You need to run complex code on the SQL Server, and then revert to the local compute context. Which R code segment should you use?
A. sqlCompute <- RxInSqlServer(connectionString = "Driver=SQL …
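Here is a minimal sketch of the switch-and-revert pattern. It assumes RevoScaleR's `RxInSqlServer()`, `rxGetComputeContext()`, and `rxSetComputeContext()`. The helper name `runInComputeContext()` and the `connStr` variable are hypothetical. Restoring the previous context with `on.exit()` returns the session to the local context even if the remote code fails.

```r
# Hypothetical helper: run `fun` in `context`, then restore whatever compute
# context was active before, even if `fun` throws an error.
runInComputeContext <- function(context, fun) {
  previous <- RevoScaleR::rxGetComputeContext()
  on.exit(RevoScaleR::rxSetComputeContext(previous), add = TRUE)
  RevoScaleR::rxSetComputeContext(context)
  fun()
}

# Usage sketch (not run here; connStr is a hypothetical connection string):
# sqlCompute <- RevoScaleR::RxInSqlServer(connectionString = connStr)
# result <- runInComputeContext(sqlCompute, function() {
#   # complex rx* calls that should execute on SQL Server
# })
# RevoScaleR::rxGetComputeContext()  # local context again afterward
```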

Which data source should you use?

You need to use the ScaleR distributed processing in an Apache Hadoop environment. Which data source should you use?
A. Microsoft SQL Server database
B. XDF data files
C. ODBC data
D. Teradata database
Explanation: https://docs.microsoft.com/en-us/machine-learning-server/r/how-to-revoscaler-hadoop

What are three possible compute contexts that you can use to achieve this goal?

You are planning the compute contexts for your environment. You need to execute rx-function calls in parallel. What are three possible compute contexts that you can use to achieve this goal? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
A. local parallel
B. Spark
C. local sequential
D. Map Reduce
E. SQL …
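As one illustration, here is a minimal sketch that runs an `rxExec()` call in parallel on the local machine with `RxLocalParallel()` and then returns to sequential local execution. It assumes RevoScaleR is installed and shows only the local parallel context. The squaring function is a placeholder workload.

```r
squares <- NULL
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  # Distribute rxExec tasks across local cores
  RevoScaleR::rxSetComputeContext(RevoScaleR::RxLocalParallel())
  # One task per element of elemArgs; each task receives one element as `i`
  squares <- RevoScaleR::rxExec(function(i) i^2, elemArgs = 1:4)
  # Revert to the default sequential local context
  RevoScaleR::rxSetComputeContext(RevoScaleR::RxLocalSeq())
}
```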

You need to build time series models to execute forecasting reports on the fact records.

You have cloud and on-premises resources that include Microsoft SQL Server and a big data environment in Apache Hadoop. You have 50 billion fact records. You need to build time series models to execute forecasting reports on the fact records. What should you use?
A. RxSpark on the Hadoop cluster
B. RxHadoopMR on the Hadoop cluster
C. RxLocalSeq on …