What’s an attribute? What’s a data instance?

Chapter 2 Assignment

  1. What’s an attribute? What’s a data instance?
  2. What’s noise? How can noise be reduced in a dataset?
  3. Define outlier. Describe 2 different approaches to detect outliers in a dataset.
  4. Describe 3 different techniques to deal with missing values in a dataset. Explain when each of these techniques would be most appropriate.
  5. Given a sample dataset with missing values, apply an appropriate technique to deal with them.
  6. Give 2 examples in which aggregation is useful.
  7. Given a sample dataset, apply aggregation of data values.
  8. What’s sampling?
  9. What’s simple random sampling? Is it possible to sample data instances using a distribution other than the uniform distribution? If so, give an example of a probability distribution over the data instances that is not uniform (i.e., one that does not give every instance equal probability).
  10. What’s stratified sampling?
  11. What’s “the curse of dimensionality”?
  12. Provide a brief description of what Principal Components Analysis (PCA) does. [Hint: See Appendix A and your lecture notes.] State what the input and the output of PCA are.
  13. What’s the difference between dimensionality reduction and feature selection?
  14. Describe in detail 2 different techniques for feature selection.
  15. Given a sample dataset (represented by a set of attributes, a correlation matrix, a covariance matrix, …), apply feature selection techniques to select the best attributes to keep (or equivalently, the best attributes to remove).
  16. What’s the difference between feature selection and feature extraction?
  17. Give two examples of data in which feature extraction would be useful.
  18. Given a sample dataset, apply feature extraction.
  19. What’s data discretization and when is it needed?
  20. What’s the difference between supervised and unsupervised discretization?
  21. Given a sample dataset, apply unsupervised (e.g., equal width, equal frequency) discretization, or supervised discretization (e.g., using entropy).
  22. Describe 2 approaches to handle nominal attributes with too many values.
  23. Given a dataset, apply a variable transformation: either a given simple function, normalization, or standardization.
  24. Define correlation and covariance, and explain how to use them in data pre-processing (see pp. 76-78).
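
As a point of reference for items 21 and 23, the following is a minimal Python sketch of unsupervised discretization (equal width and equal frequency) and of two common variable transformations (min-max normalization and z-score standardization). The function names and the sample values are illustrative assumptions and do not come from the assignment or the textbook. The sketch also assumes the attribute is numeric and that its values are not all identical.

```python
from statistics import mean, pstdev


def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly the same number of values.

    Ties are split by rank, so equal values may land in neighboring bins.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical sample attribute, chosen only for illustration.
sample = [2, 4, 6, 8, 10, 20]
width_bins = equal_width_bins(sample, 3)
freq_bins = equal_frequency_bins(sample, 3)
normalized = min_max_normalize(sample)
z_scores = standardize(sample)
```

On this sample, the value 20 stretches the range, so equal-width binning puts three values in the first interval and only one in the last, while equal-frequency binning places two values in each bin. The same value also dominates the min-max scale, which is one reason standardization is sometimes preferred when a dataset has extreme values.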
