The competition is simple: use machine learning to create a model that predicts which passengers survived the Titanic shipwreck.

Data

gender_submission.csv, test.csv, train.csv

train_data

    import pandas as pd

    train_data = pd.read_csv("/kaggle/input/titanic/train.csv")
    train_data.head()
   PassengerId  Survived  Pclass  Name                                             Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris                          male    22.0  1      0      A/5 21171         7.2500   NaN    S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th…  female  38.0  1      0      PC 17599          71.2833  C85    C
2  3            1         3       Heikkinen, Miss. Laina                           female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)     female  35.0  1      0      113803            53.1000  C123   S
4  5            0         3       Allen, Mr. William Henry                         male    35.0  0      0      373450            8.0500   NaN    S
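
The NaN entries under Cabin in the preview above mark missing values. As a minimal sketch, the snippet below rebuilds four columns of those five rows by hand (the name `preview` is only illustrative) and counts the missing cells per column. Run against the full `train_data`, the same `isna().sum()` call would presumably show which columns need cleaning before they can be used as features.

```python
import numpy as np
import pandas as pd

# Four columns of the five rows shown in the train_data preview above
# (hand-copied for illustration; the real frame has more rows and columns).
preview = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, "C123", np.nan],
    "Embarked": ["S", "C", "S", "S", "S"],
})

# isna() flags missing cells; sum() counts the flags column by column.
missing = preview.isna().sum()
print(missing)
```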

test_data

    test_data = pd.read_csv("/kaggle/input/titanic/test.csv")
    test_data.head()
   PassengerId  Pclass  Name                                          Sex     Age   SibSp  Parch  Ticket   Fare     Cabin  Embarked
0  892          3       Kelly, Mr. James                              male    34.5  0      0      330911   7.8292   NaN    Q
1  893          3       Wilkes, Mrs. James (Ellen Needs)              female  47.0  1      0      363272   7.0000   NaN    S
2  894          2       Myles, Mr. Thomas Francis                     male    62.0  0      0      240276   9.6875   NaN    Q
3  895          3       Wirz, Mr. Albert                              male    27.0  0      0      315154   8.6625   NaN    S
4  896          3       Hirvonen, Mrs. Alexander (Helga E Lindqvist)  female  22.0  1      1      3101298  12.2875  NaN    S

gender_submission

    gender_submission = pd.read_csv("/kaggle/input/titanic/gender_submission.csv")
    gender_submission.head()
   PassengerId  Survived
0  892          0
1  893          1
2  894          0
3  895          0
4  896          1
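
Kaggle describes gender_submission.csv as a sample submission that assumes all female passengers, and only them, survive. A minimal sketch of how to check this on the five rows shown above follows. The two small frames are hand-copied from the test_data and gender_submission previews, and the names `test_preview` and `submission_preview` are only illustrative.

```python
import pandas as pd

# Rows copied from the test_data and gender_submission previews above.
test_preview = pd.DataFrame({
    "PassengerId": [892, 893, 894, 895, 896],
    "Sex": ["male", "female", "male", "male", "female"],
})
submission_preview = pd.DataFrame({
    "PassengerId": [892, 893, 894, 895, 896],
    "Survived": [0, 1, 0, 0, 1],
})

# Join on PassengerId, then compare each prediction with "is female".
merged = test_preview.merge(submission_preview, on="PassengerId")
is_female = (merged["Sex"] == "female").astype(int)
matches = (merged["Survived"] == is_female).all()
print(matches)
```

The same comparison on the full files would show whether the rule holds for every test passenger.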

Code

Random forest algorithm

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    # Load the data
    train_data = pd.read_csv("/kaggle/input/titanic/train.csv")
    test_data = pd.read_csv("/kaggle/input/titanic/test.csv")
    gender_submission = pd.read_csv("/kaggle/input/titanic/gender_submission.csv")

    # Compare the survival rates of women and men.
    # The printed values are fractions between 0 and 1, not percentages.
    women = train_data.loc[train_data.Sex == 'female']["Survived"]
    rate_women = sum(women)/len(women)
    print("% of women who survived:", rate_women)
    men = train_data.loc[train_data.Sex == 'male']["Survived"]
    rate_men = sum(men)/len(men)
    print("% of men who survived:", rate_men)

    # Random forest: one-hot encode the categorical "Sex" column, then fit
    y = train_data["Survived"]
    features = ["Pclass", "Sex", "SibSp", "Parch"]
    X = pd.get_dummies(train_data[features])
    X_test = pd.get_dummies(test_data[features])
    model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
    model.fit(X, y)
    predictions = model.predict(X_test)
    print(predictions)

    # Overwrite the sample predictions with the model's predictions.
    # This relies on gender_submission.csv listing passengers in the same order as test.csv.
    gender_submission['Survived'] = predictions
    print(gender_submission)
    gender_submission.to_csv('my_submission.csv', index=False)
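
To make the encoding step more concrete, here is a minimal sketch of what `pd.get_dummies` does to the four selected features. It uses the feature values of the five rows from the train_data and test_data previews, hand-copied into small frames. The names `train_preview` and `test_preview` are illustrative, and the `reindex` call is an added safeguard that is not part of the code above: it would keep the test columns aligned with the training columns if a category happened to be absent from one of the two files.

```python
import pandas as pd

features = ["Pclass", "Sex", "SibSp", "Parch"]

# Feature values copied from the train_data and test_data previews.
train_preview = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male"],
    "SibSp": [1, 1, 0, 1, 0],
    "Parch": [0, 0, 0, 0, 0],
})
test_preview = pd.DataFrame({
    "Pclass": [3, 3, 2, 3, 3],
    "Sex": ["male", "female", "male", "male", "female"],
    "SibSp": [0, 1, 0, 0, 1],
    "Parch": [0, 0, 0, 0, 1],
})

# Numeric columns pass through unchanged; the string column "Sex"
# is replaced by the indicator columns Sex_female and Sex_male.
X = pd.get_dummies(train_preview[features])
X_test = pd.get_dummies(test_preview[features])

# Assumed safeguard (not in the original code): give the test frame
# the same columns, in the same order, as the training frame.
X_test = X_test.reindex(columns=X.columns, fill_value=0)

print(list(X.columns))
```

Because both previews contain male and female passengers, the `reindex` call changes nothing here. It only matters when the two frames would otherwise end up with different indicator columns.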