In your Python file Ex4.7-A.py, I think line 51 should read
`temp[((value_A_Changed, value_B_Changed), reward)] = temp.get(((value_A_Changed, value_B_Changed), reward), 0)`
instead of
`temp[((value_A_Changed, value_B_Changed), reward)] = temp.get((value_A_Changed, value_B_Changed), 0)`
In the current version, `temp.get` always returns the default 0. The keys in `temp` have the form `((value_A_Changed, value_B_Changed), reward)`, so the bare tuple `(value_A_Changed, value_B_Changed)` is never a key.
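A minimal reproduction of the lookup mismatch is below. It is only a sketch: the function `accumulate`, the `transitions` data, and the assumption that each entry is incremented by a transition probability `prob` are hypothetical and are not taken from Ex4.7-A.py. It shows that looking up `(a, b)` in a dict keyed by `((a, b), reward)` discards anything accumulated earlier, while looking up the full key sums correctly.

```python
def accumulate(transitions, buggy):
    """Sum probabilities per ((a, b), reward) key.

    Hypothetical helper: assumes the real code adds a probability
    to each entry, which is not shown in the original line.
    """
    temp = {}
    for (a, b), reward, prob in transitions:
        key = ((a, b), reward)
        if buggy:
            # Looks up (a, b), which is never a key in temp,
            # so get() always returns the default 0
            temp[key] = temp.get((a, b), 0) + prob
        else:
            # Looks up the same key that is written,
            # so probabilities accumulate across iterations
            temp[key] = temp.get(key, 0) + prob
    return temp


# Two hypothetical transitions to the same state with the same reward
transitions = [((1, 2), 10, 0.25), ((1, 2), 10, 0.5)]
print(accumulate(transitions, buggy=True))   # {((1, 2), 10): 0.5}
print(accumulate(transitions, buggy=False))  # {((1, 2), 10): 0.75}
```

With the buggy lookup, only the last probability survives. With the corrected lookup, both are summed.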
I reran the script with this change but still could not reproduce the book's answer. I am attaching the optimal policy map that I got.