Rule confidence is used to:
- Identify frequent item sets
- Measure the intuitiveness of a rule
- Determine the rule with the most items
- Prune rules by eliminating rules with low confidence
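Confidence measures how often a rule's consequent appears in the transactions that contain its antecedent: conf(X -> Y) = support(X and Y together) / support(X). The minimal sketch below computes confidence from raw counts and prunes rules under a threshold. It assumes a small hypothetical list of transactions and an arbitrary threshold of 0.7. The function names are illustrative and do not come from any library.

```python
# Hypothetical toy transactions, for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(antecedent, consequent, transactions):
    """conf(X -> Y) = count(X with Y) / count(X), computed from raw counts."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    return both / support_count(antecedent, transactions)


def prune(rules, transactions, min_conf):
    """Keep only (antecedent, consequent) rules with confidence >= min_conf."""
    return [(x, y) for x, y in rules if confidence(x, y, transactions) >= min_conf]


candidates = [({"diaper"}, {"beer"}), ({"beer"}, {"diaper"}), ({"bread"}, {"beer"})]
kept = prune(candidates, TRANSACTIONS, min_conf=0.7)
for x, y in kept:
    print(sorted(x), "->", sorted(y), confidence(x, y, TRANSACTIONS))
```

With these transactions, diaper -> beer (0.75) and beer -> diaper (1.0) are kept, and bread -> beer (0.5) is pruned.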

We did not include the minimum wind measurements in the analysis because they are highly correlated with the average wind measurements. What is the correlation between min_wind_speed and avg_wind_speed, to two decimal places? (Compute this on one-tenth of the original dataset, after dropping all rows with missing values.)
- 0.97
- -0.12
- 0.62
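The question asks for three steps: take a one-tenth subsample, drop rows with missing values, and compute a Pearson correlation. The minimal standard-library sketch below performs those steps on hypothetical rows rather than the real weather data. In those rows, every 7th row is missing its average wind speed. In PySpark, `df.stat.corr("min_wind_speed", "avg_wind_speed")` should compute the same Pearson coefficient directly on a DataFrame.

```python
import math


def every_nth(rows, n):
    """Keep rows 0, n, 2n, ... (one-tenth of the data when n = 10)."""
    return rows[::n]


def drop_missing(rows):
    """Discard any row that has a missing (None) value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical rows: min wind tracks avg wind; some avg values are missing.
rows = [
    {"min_wind_speed": 0.1 * i, "avg_wind_speed": 0.1 * i + 0.3 if i % 7 else None}
    for i in range(100)
]
sample = drop_missing(every_nth(rows, 10))
r = pearson([s["min_wind_speed"] for s in sample], [s["avg_wind_speed"] for s in sample])
print(f"{len(sample)} rows, r = {r:.2f}")
```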

If we perform clustering with 20 clusters (and seed = 1), which cluster appears to identify Santa Ana conditions (lowest humidity and highest wind speeds)?
- Cluster 12
- Cluster 1
- Cluster 16

What do clusters 7, 8, and 11 have in common?
- They capture weather patterns associated with warm and dry days
- They capture weather patterns associated with high air pressure
- They capture weather patterns associated with very strong winds

Just by looking at the values for the cluster centers, which cluster contains samples with the lowest relative humidity?
- Cluster 4
- Cluster 3
- Cluster 9
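Reading the cluster centers for this question means finding the center with the smallest value in the relative-humidity column. The minimal sketch below assumes the centers are available as a list of rows in a known feature order. In PySpark, `KMeansModel.clusterCenters()` returns one array per cluster. The feature names and center values here are hypothetical.

```python
# Hypothetical feature order and (scaled) cluster centers, for illustration only.
FEATURES = ["air_temp", "avg_wind_speed", "relative_humidity"]
CENTERS = [
    [0.1, -0.2, 0.5],
    [1.3, 0.4, -1.1],
    [0.7, 0.0, 1.2],
    [-0.9, 2.1, -0.4],
]


def cluster_with_lowest(feature, centers, features):
    """Return the 0-based index of the center with the smallest value for feature."""
    col = features.index(feature)
    return min(range(len(centers)), key=lambda i: centers[i][col])


idx = cluster_with_lowest("relative_humidity", CENTERS, FEATURES)
print(f"Cluster {idx} has the lowest relative_humidity center value")
```

The returned index is 0-based and follows the order of the centers list.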

This line of code creates a k-means model with 12 clusters:
    kmeans = KMeans(k=12, seed=1)
What is the significance of "seed=1"?
- This sets the seed to a specific value, which is necessary to reproduce the k-means results
- This specifies that the first cluster centroid is set to sample #1
- This means that this is the
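The seed fixes the random choices made during initialization, so rerunning with the same seed yields the same centroids. The minimal sketch below implements Lloyd's k-means in plain Python and seeds a `random.Random` instance to pick the initial centroids. It is a simplified stand-in: Spark's `KMeans` defaults to k-means|| initialization rather than the plain random sampling used here. The data points are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, max_iter=50):
    """Lloyd's k-means; the seeded RNG makes the initial centroids reproducible."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids depend only on the seed
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new_centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids


# Hypothetical 2-D data: two blobs, generated with its own fixed seed.
data_rng = random.Random(0)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)]
points += [(data_rng.gauss(6, 1), data_rng.gauss(6, 1)) for _ in range(50)]

run_a = kmeans(points, k=3, seed=1)
run_b = kmeans(points, k=3, seed=1)
print(run_a == run_b)  # same seed, same data: identical centroids
```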

If we wanted to create a data subset by taking every 5th sample instead of every 10th sample, how many samples would be in that subset?
- 317,452
- 1,587,257
- 158,726
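Taking every n-th sample (rows 0, n, 2n, ...) from N rows keeps ceil(N / n) of them. The minimal sketch below assumes the full dataset has 1,587,257 rows. That figure is one of the listed options, and it is consistent with the 158,726-row one-tenth subset used elsewhere in this quiz.

```python
def subset_size(n_rows, step):
    """Rows kept when sampling indices 0, step, 2*step, ... (= ceil(n_rows / step))."""
    return len(range(0, n_rows, step))


N_ROWS = 1_587_257  # assumed size of the full dataset
print(subset_size(N_ROWS, 10))  # every 10th sample -> 158726
print(subset_size(N_ROWS, 5))   # every 5th sample -> 317452
```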

Why is it necessary to scale the data (Step 4)?
- Since the values of the features are on different scales, all features need to be scaled so that all values will be positive.
- Since the values of the features are on different scales, all features need to be scaled so that no one feature dominates.
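A common choice here is z-score standardization. It rescales each feature to mean 0 and standard deviation 1, so that a feature with a wide numeric range does not dominate the Euclidean distances that k-means relies on. The minimal standard-library sketch below uses hypothetical column values. Spark's `StandardScaler` with `withMean=True, withStd=True` performs a comparable transformation.

```python
import statistics


def standardize(column):
    """Z-score scaling: subtract the mean, divide by the sample standard deviation."""
    mu = statistics.mean(column)
    sigma = statistics.stdev(column)
    return [(x - mu) / sigma for x in column]


# Hypothetical raw feature columns on very different scales.
air_pressure = [912.3, 915.1, 918.7, 920.0, 917.4]
relative_humidity = [12.0, 45.5, 80.1, 60.3, 30.2]

for name, col in [("air_pressure", air_pressure), ("relative_humidity", relative_humidity)]:
    z = standardize(col)
    print(name, [round(v, 2) for v in z])
```

After scaling, both columns span comparable ranges, so neither one controls the distance between samples on its own.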

What percentage of samples have 0 for rain_accumulation?
- 157812 / 158726 = 99.4%
- 157237 / 158726 = 99.1%
- There is not enough information to determine this
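The percentage is the number of samples whose rain_accumulation equals 0, divided by the total number of samples in the subset (158,726 here). In PySpark, this would typically be `df.filter(df.rain_accumulation == 0).count() / df.count()`. The minimal sketch below uses a hypothetical column and also shows the arithmetic for the two fractions in the options.

```python
def percent_zero(values):
    """Percentage of entries that are exactly 0."""
    return 100 * sum(1 for v in values if v == 0) / len(values)


# Hypothetical rain_accumulation column: mostly zeros.
rain = [0.0] * 995 + [0.01, 0.02, 0.0, 0.5, 1.2]
print(f"{percent_zero(rain):.1f}%")

# Arithmetic for the two candidate fractions.
for zeros in (157_812, 157_237):
    print(f"{zeros} / 158726 = {100 * zeros / 158_726:.1f}%")
```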

The support of an item set:
- Captures the frequency of that item set
- Captures the number of items in that item set
- Captures how many times that item set is used in a rule
- Captures the correlation between the items in that item set
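Support is usually reported as the fraction of transactions that contain the item set, support(X) = count(X) / N. Item sets whose support meets a minimum threshold are called frequent. The minimal sketch below works over hypothetical toy transactions and enumerates every item set of a given size with `itertools.combinations`. The 0.6 threshold is an arbitrary choice for illustration.

```python
from itertools import combinations

# Hypothetical toy transactions, for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def frequent_itemsets(transactions, size, min_support):
    """All item sets of the given size whose support is >= min_support."""
    all_items = sorted(set().union(*transactions))
    result = {}
    for combo in combinations(all_items, size):
        s = support(combo, transactions)
        if s >= min_support:
            result[combo] = s
    return result


print(frequent_itemsets(TRANSACTIONS, size=2, min_support=0.6))
```

With these transactions, the frequent pairs are {beer, diaper}, {bread, diaper}, {bread, milk}, and {diaper, milk}, each with support 0.6.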

In association analysis, an item set is:
- A transaction or set of items that occur together
- A set of items that two rules have in common
- A set of items that infrequently occur together
- A set of transactions that occur a certain number of times in the data

The goal of association analysis is:
- To find the number of outliers in the data
- To find rules to capture associations between items or events
- To find the number of clusters for cluster analysis
- To find the most complex rules to explain associations between as many items as possible in the data