- Linear Regression - Make numeric predictions for sales forecasting, price optimization, marketing optimization, and financial risk assessment.
- Logistic Regression - Predict customer churn, response versus advertising spend, and customer lifetime value, and monitor how business decisions affect predicted churn rates.
- Naive Bayes - Build a spam detector, analyze customer sentiment, or automatically categorize products, customers, or competitors.
- K-means clustering - Useful for cost modeling and customer segmentation.
- Hierarchical clustering - Useful for modeling business processes or segmenting customers based on survey responses.
- K-nearest neighbor classification - A type of instance-based learning. Use it for text document classification, financial distress prediction modeling, and competitor analysis and classification.
- Principal component analysis - A dimensionality reduction method that you can use for fraud detection, speech recognition, and spam detection.
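
As a small illustration of the first item in the list, here is a minimal sketch of simple (one-variable) linear regression used for a sales forecast. It fits a straight line by ordinary least squares in plain Scala. The object name `SalesForecast`, the helper names, and the monthly sales figures are all hypothetical and chosen only for this example; a real forecast would normally use a library and more features.

```scala
object SalesForecast {
  /** Fits y = intercept + slope * x by ordinary least squares; returns (intercept, slope). */
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    require(xs.size == ys.size && xs.size >= 2, "need at least two paired points")
    val n = xs.size.toDouble
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    // Covariance and variance sums (the common 1/n factor cancels in the ratio).
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    require(sxx != 0.0, "x values must not all be equal")
    val slope = sxy / sxx
    (meanY - slope * meanX, slope)
  }

  /** Evaluates the fitted line at x. */
  def predict(model: (Double, Double), x: Double): Double =
    model._1 + model._2 * x

  def main(args: Array[String]): Unit = {
    // Hypothetical data: month index versus units sold.
    val months = Seq(1.0, 2.0, 3.0, 4.0)
    val sales  = Seq(12.0, 14.0, 16.0, 18.0)
    val model  = fit(months, sales)
    println(predict(model, 5.0)) // prints 20.0
  }
}
```

The made-up data lies exactly on the line y = 10 + 2x, so the forecast for month 5 is 20.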
We are using Scala 2.11, Spark 2.4, and MongoDB Spark Connector 2.4.4.

Use Case 1 - We wanted to read a sharded MongoDB collection and copy its data to another MongoDB collection. After the Spark job completed successfully, we noticed that many records were missing from the output collection.

Use Case 2 - We read a MongoDB collection into a DataFrame, and calling count on it returned a different count on each execution.

Analysis - We realized that the MongoDB Spark Connector was missing data when bulk-reading a collection as a DataFrame. We tried the various partitioners listed at https://www.mongodb.com/docs/spark-connector/v2.4/configuration/, but none of them worked for us. Finally, we tried MongoShardedPartitioner, which led to a constant count on each execution. However, that count was greater than the actual number of records in the collection. This seems to be a limitation of the MongoDB Spark Connector, but MongoShardedPartitioner seemed to be the closest possible solution for this kind of situation. But, it per...
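
For reference, a read that selects MongoShardedPartitioner might look roughly like the sketch below, written against the connector 2.x data source (`com.mongodb.spark.sql.DefaultSource`). The host name, the database and collection names, the object name `ShardedCollectionCopy`, and the helper `readOptions` are hypothetical placeholders, and the shard key is assumed to be `_id`. The sketch only shows how the partitioner is configured and how the count can be checked for repeatability before copying; it does not resolve the over-count described above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ShardedCollectionCopy {
  // Data source class name used by connector 2.x.
  val MongoSource = "com.mongodb.spark.sql.DefaultSource"

  /** Read options selecting MongoShardedPartitioner (hypothetical helper).
    * shardKey should match the shard key of the source collection. */
  def readOptions(uri: String, db: String, coll: String, shardKey: String): Map[String, String] =
    Map(
      "uri" -> uri,
      "database" -> db,
      "collection" -> coll,
      "partitioner" -> "MongoShardedPartitioner",
      "partitionerOptions.shardkey" -> shardKey
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sharded-collection-copy").getOrCreate()
    val uri = "mongodb://mongos.example.com:27017" // hypothetical mongos address

    val source: DataFrame = spark.read.format(MongoSource)
      .options(readOptions(uri, "sourceDb", "sourceColl", "_id"))
      .load()

    // Count twice to check whether the read is repeatable (Use Case 2).
    val first = source.count()
    val second = source.count()
    println(s"count run 1 = $first, run 2 = $second")

    // Copy to the target collection (Use Case 1).
    source.write.format(MongoSource)
      .mode("append")
      .option("uri", uri)
      .option("database", "targetDb")
      .option("collection", "targetColl")
      .save()

    spark.stop()
  }
}
```

Comparing the two printed counts with a count taken directly on the source collection is one way to see whether a given partitioner under-reads or over-reads.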