Dacke
|
|
« Reply #220 on: September 12, 2015, 07:05:05 PM » |
|
Post the message?
programming • free software animal liberation • veganism anarcho-communism • intersectionality • feminism
|
|
|
ProgramGamer
|
|
« Reply #221 on: September 12, 2015, 07:21:46 PM » |
|
fatal: The current branch master has no upstream branch. To push the current branch and set the remote as upstream, use
git push --set-upstream www.WebSiteWhereIWantToUploadMyThings.com master
Here's the message
Dacke
|
|
« Reply #222 on: September 12, 2015, 08:33:57 PM » |
|
What command did you use? What does git remote say?

edit: My guess is that you've forgotten to connect your local repository (on your computer) to your Bitbucket repository. Did you add Bitbucket as a remote repository, as per the Bitbucket tutorial? https://confluence.atlassian.com/bitbucket/create-a-repository-221449521.html
|
|
« Last Edit: September 13, 2015, 05:00:08 AM by Dacke »
gimymblert
|
|
« Reply #223 on: September 13, 2015, 07:31:04 AM » |
|
I wrote this:

import os
import sys

File = open("C:\Users\user\Documents\#1 ConceptNet Relations\dictionary.txt","r")
line = File.readline()
print(line)

and got this:

C:\Python34\python.exe C:/Users/user/PycharmProjects/hellopython/ParseDictionaryToUnique.py
  File "C:/Users/user/PycharmProjects/hellopython/ParseDictionaryToUnique.py", line 4
    File = open("C:\Users\user\Documents\#1 ConceptNet Relations\dictionary.txt","r")
               ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

Process finished with exit code 1

What's wrong??
Layl
|
|
« Reply #224 on: September 13, 2015, 07:33:24 AM » |
|
What's wrong??
String escaping
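To spell that out a bit: in a normal Python 3 string literal a backslash starts an escape sequence, and \U expects eight hex digits after it, so "C:\Users..." is rejected before the script even runs. A minimal sketch of the usual ways around it, using a made-up path (C:\Users\user\data.txt is only an illustration, not the file from the post above):

```python
# Three equivalent ways to write a Windows path in Python 3.
# The path is a made-up example.
raw_path = r"C:\Users\user\data.txt"        # raw string: backslashes stay literal
escaped_path = "C:\\Users\\user\\data.txt"  # every backslash doubled
slash_path = "C:/Users/user/data.txt"       # forward slashes are accepted by open() on Windows

print(raw_path == escaped_path)

# Build the broken source text at run time so this example itself stays valid,
# then show that Python refuses to compile it.
backslash = chr(92)
bad_source = 'p = "C:' + backslash + 'Users' + backslash + 'user"'
try:
    compile(bad_source, "<example>", "exec")
    compile_failed = False
except SyntaxError as error:
    compile_failed = True
    print("SyntaxError:", error.msg)
```

The raw-string prefix is probably the least intrusive fix for a path that is pasted straight from Explorer.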
indie11
|
|
« Reply #225 on: September 13, 2015, 07:45:47 AM » |
|
Anyone here ever worked on a turn-based multiplayer game in Unity? If so, did you roll your own system or use some 3rd party API?
gimymblert
|
|
« Reply #226 on: September 13, 2015, 07:52:58 AM » |
|
I solved it by randomly stumbling on an unrelated Stack Overflow question about another problem. Using "r" as a prefix solves it (string as raw); apparently it's the \u that is the problem. Now I have:

import os
import sys

File = open(r"C:\Users\user\Documents\#1 ConceptNet Relations\dictionary.txt", "r") # r before "" for raw text
line = File.readline()
print(line)
l = list(File)
for line in File:
    print (line)

which leads to:

Traceback (most recent call last):
  File "C:/Users/user/PycharmProjects/hellopython/ParseDictionaryToUnique.py", line 8, in <module>
    for line in File:
  File "C:\Python34\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 4847: character maps to <undefined>

When I comment out the line, the output gets cut off there:

colombia
2009_banja_luka_challenger
bosnia_and_herzegovina
2009_barcelona_open_banco_sabadell
barcelona
2009_barcelona_open_banco_sabadell
spain
2009_bh_telecom_indoors

Looking at the same place in the file:

genoa
2009_asb_classic
auckland
2009_australian_open
victoria/n/australia
2009_bancolombia_open
colombia
2009_banja_luka_challenger
bosnia_and_herzegovina
2009_barcelona_open_banco_sabadell
barcelona
2009_barcelona_open_banco_sabadell
spain
2009_bh_telecom_indoors
bosnia_and_herzegovina
2009_bh_tennis_open_international_cup
brazil
2009_brazilian_grand_prix
autódromo_josé_carlos_pace
2009_brazilian_grand_prix
são_paulo
2009_british_grand_prix
buckinghamshire

Nothing suspect at all, the next line seems very fine ...
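The traceback goes through cp1252.py, so open() seems to have picked cp1252 as the default encoding on this machine, and byte 0x8d has no character assigned there. Since output is buffered, the place where printing stops is not necessarily the line with the bad byte. One way to find the real culprit is to read the file in binary mode and try to decode it line by line. This is only a sketch: first_undecodable_line and the sample data are made up for the illustration.

```python
import os
import tempfile


def first_undecodable_line(path, encoding):
    """Return (line_number, raw_bytes) for the first line that fails to
    decode with `encoding`, or None if every line decodes cleanly."""
    with open(path, "rb") as handle:  # binary mode: nothing is decoded here
        for number, raw in enumerate(handle, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError:
                return number, raw
    return None


# Stand-in for dictionary.txt, written as UTF-8 bytes (invented data).
# "\u010d" becomes the two bytes C4 8D in UTF-8, and 8D is undefined in cp1252.
sample = "colombia\nspain\nkova\u010d\n".encode("utf-8")
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample)
    sample_path = tmp.name

cp1252_result = first_undecodable_line(sample_path, "cp1252")
utf8_result = first_undecodable_line(sample_path, "utf-8")
os.remove(sample_path)

print(cp1252_result)
print(utf8_result)
```

If the file decodes cleanly under one encoding and not under another, that is a decent hint about what it was actually saved as.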
gimymblert
|
|
« Reply #228 on: September 13, 2015, 08:26:57 AM » |
|
It's from the former Blitz parser I wrote, so if there is a strange keycode, it should be at every line! Maybe the original data has some strange char? I'm looking at your link.

The Blitz code in question, super straightforward ...

Include "blitz 3D test parser.bb"
;http://www.blitzbasic.com/codearcs/codearcs.php?code=161
;http://www.zytrax.com/tech/codes.htm
;HT-09-9-Horizontal Tab

; Set The Graphic Mode
Graphics 600,300,0,2

; Open the file to Read
filein = ReadFile("C:\Users\user\Desktop\part_00.csv")

file$ = ""
this = 0

currentFolder$ = CurrentDir() + "#1 ConceptNet Relations"

dictionary$ = "\dictionary.txt"
Dictfile = WriteFile (currentFolder + dictionary)

If FileType(currentFolder) <> 2 Then
    Print "no folder found! - trying to create new folder"
    CreateDir currentFolder
    Print currentFolder
    If FileType(currentFolder) <> 2 Then
        Print "Creation failed!!":WaitKey()
    Else
        Print "creation succeed!":WaitKey()
    EndIf
EndIf

Print
Print "Lines of text read from file " + filein
Print

rtemp$ = "";for list of different relation
rcount = 0

;MAINLOOP: skim through each line parsing and filtering data
While Not Eof( filein ) Or KeyHit (1)

    Read1$ = ReadLine$( filein ) ; read a new line
    parse( Read1, Chr(9) ) ; parse the line into chunk

    prevcount = count
    count = 0

    ;strip the non desired data from the chunk, discard useless chunk
    For back.parsereturn=Each parsereturn ;flip through the chunk
        ;let's try to rip the member after the assertion
        ;If count = 1 Then relation$ = back\word ;store the relation
        If count = 2 Then arg1$ = back\word ;store arguments 1
        If count = 3 Then arg2$ = back\word ;store arguments 1

        count = count +1
    Next

    If Instr(arg1,"/c/en/") <> 0 And Instr(arg2,"/c/en/") <> 0 Then
        arg1=Replace (arg1,"/c/en/",""): arg2=Replace (arg2,"/c/en/","")
        ParseArg(arg1,dictfile)
        ParseArg(arg2,dictfile)
    EndIf

    ;visualization control
    If KeyHit (28) Then
        WaitKey()
        Print "pause"
    EndIf
    ;Print

Wend
;END MAINLOOP

CloseFile (Dictfile)
Print "end " + rcount
;50
CloseFile( filein )
;WaitKey()

;----------------------------------------------------------------

Function ParseArg(arg$, file)
    parse (arg, ",")
    Local count = 0

    For back.parsereturn = Each parsereturn
        count = count + 1
        WriteLine( file , back\word )
    Next

    ; writeline (file, element)

    ; argcount$ = count
    ; Print argcount + " " + arg

End Function

EDIT: Oh, it seems it's about the Latin capital letters or something like that:
http://www.i18nqa.com/debug/bug-double-conversion.html
http://www.fileformat.info/info/unicode/char/00cd/index.htm
|
|
« Last Edit: September 13, 2015, 08:32:08 AM by Jimym GIMBERT »
gimymblert
|
|
« Reply #229 on: September 13, 2015, 01:19:28 PM » |
|
:panda:

File = open(r"C:\Users\user\Documents\#1 ConceptNet Relations\dictionary.txt", "r", encoding="utf8")

Traceback (most recent call last):
  File "C:/Users/user/PycharmProjects/hellopython/ParseDictionaryToUnique.py", line 9, in <module>
    print(line)
  File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u1ed9' in position 11: character maps to <undefined>
gimymblert
|
|
« Reply #230 on: September 13, 2015, 01:35:13 PM » |
|
Okay, that's weird. I opened Notepad++ and converted the whole file to many encodings, but I still get an error no matter what, generally at a consistent but different break point. HALP!
Cheesegrater
Level 1
|
|
« Reply #231 on: September 13, 2015, 02:04:37 PM » |
|
It's probably not UTF-8. Have you tried latin-1? UTF-16?
gimymblert
|
|
« Reply #232 on: September 13, 2015, 02:12:52 PM » |
|
I'm trying everything. I haven't found a list of the encoding parameters Python accepts yet.
EDIT: Latin-1 DID IT!
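A word of caution on why latin-1 "works": it maps every possible byte to a character, so decoding with it can never raise, whatever the file really contains. If the file is actually UTF-8 (an assumption, nothing in the thread confirms it), the non-ASCII entries come out garbled instead. A small sketch using one entry from the listing above:

```python
original = "são_paulo"
utf8_bytes = original.encode("utf-8")

wrong = utf8_bytes.decode("latin-1")  # never raises, but splits the two-byte character in two
right = utf8_bytes.decode("utf-8")

# ascii() keeps the output printable on any console.
print(ascii(wrong))
print(ascii(right))

# latin-1 accepts all 256 byte values, which is why it cannot fail.
all_bytes_decoded = bytes(range(256)).decode("latin-1")
print(len(all_bytes_decoded))
```

Garbled entries like that would no longer match the same concept written correctly elsewhere, which matters if the goal is to remove duplicates.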
gimymblert
|
|
« Reply #233 on: September 13, 2015, 02:19:16 PM » |
|
OKAY, now I can print with repr(line) but not directly. Better though, I just need to kill the \n.
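For reference, repr() probably gets through because it turns non-printable characters into backslash escapes before anything reaches the console, so what is shown is an escaped form and not the text itself. The trailing newline can be dropped with rstrip. A minimal sketch with an invented line:

```python
# An invented line containing a control character and a trailing newline.
line = "2009_bh_telecom_indoors\x8d\n"

shown = repr(line)           # quotes added, "\x8d" and "\n" shown as escapes
cleaned = line.rstrip("\n")  # removes only the trailing newline

print(shown)
print(len(line), len(cleaned))
```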
gimymblert
|
|
« Reply #234 on: September 13, 2015, 08:25:35 PM » |
|
Well, it still doesn't work. The goal was to use a "set" to remove all duplicates, but as soon as I moved from repr to the actual set function it crashed because of unicode ..... Research shows that unicode is a nightmare in Python (and in general). I don't know what to do; for now I'm filing it as a failure. The problem is that there are 2 140 288 lines in the folder, and checking them all for duplicates would take n² time using brute force. Turns out I need a crash course in sorting, no way to avoid it now.
gimymblert
|
|
« Reply #235 on: September 13, 2015, 09:07:30 PM » |
Dacke
|
|
« Reply #236 on: September 13, 2015, 10:11:42 PM » |
|
If possible, try to avoid having text files in anything but utf8. Death to latin1.

Using a set for utf8 text works perfectly fine. I made this file and encoded it in utf8:

Öñü中華民族日本語
ひらがな平仮名
2009_bh_telecom_indoors
bosnia_and_herzegovina
Öñü中華民族日本語
2009_bh_telecom_indoors

Then I wrote this python3 program:

# open file
file = open("utf8lines.txt")

# create set
unique_lines = set()

# strip and add all lines in file to set
for line in file:
    unique_lines.add(line.strip())

# print all lines in set
for line in unique_lines:
    print(line)

Which correctly outputs the unique lines:

Öñü中華民族日本語
ひらがな平仮名
2009_bh_telecom_indoors
bosnia_and_herzegovina
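A variant that may travel better to Windows, where open() without an encoding argument falls back to the system code page: pass the encoding explicitly and write the result to a file instead of printing it. This is a sketch only; unique_sorted_lines and the output file name are made up, and the sorting is there because an alphabetical dictionary was mentioned as the goal.

```python
import os
import shutil
import tempfile


def unique_sorted_lines(in_path, out_path, encoding="utf-8"):
    """Write the distinct, stripped, non-empty lines of in_path to out_path
    in sorted order and return how many there were."""
    with open(in_path, encoding=encoding) as source:
        unique = {line.strip() for line in source}
    unique.discard("")  # ignore blank lines
    with open(out_path, "w", encoding=encoding) as target:
        for entry in sorted(unique):  # plain code point order
            target.write(entry + "\n")
    return len(unique)


# Recreate the small test file from this post in a temporary folder.
work_dir = tempfile.mkdtemp()
in_path = os.path.join(work_dir, "utf8lines.txt")
out_path = os.path.join(work_dir, "unique.txt")

with open(in_path, "w", encoding="utf-8") as handle:
    handle.write("Öñü中華民族日本語\nひらがな平仮名\n2009_bh_telecom_indoors\n"
                 "bosnia_and_herzegovina\nÖñü中華民族日本語\n2009_bh_telecom_indoors\n")

count = unique_sorted_lines(in_path, out_path)

with open(out_path, encoding="utf-8") as handle:
    result = handle.read().splitlines()

shutil.rmtree(work_dir)
print(count)
```

A set does the duplicate check in roughly one pass over the lines, so a couple of million entries should not need an n² comparison or a hand-written sort.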
gimymblert
|
|
« Reply #237 on: September 13, 2015, 10:38:44 PM » |
|
To be frank, I'm parsing a text derived from another text that I didn't build originally (ConceptNet).
I did fuck up some characters, which means I will have mismatches in the next step.
Basically I have 5 GB of CSV data in 5 files. I extracted all the relevant data from the first file, only for English concepts, into separate files where the semantic relation is the name of the file and the arguments are tab-separated on each line. Then I extracted all the arguments into a dictionary file and tried to remove duplicates and order them alphabetically, in the hope of building an index.
The next step would have been to replace the arguments in the relation files with the concept index from the dictionary file, and then to generalize to all concepts, in all languages, to extract the remaining data from the 5 GB database by automatically detecting other-language and cross-language concepts.
So far it's compromised, at least with my current implementation.
I'll try your stuff.
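For what it's worth, the index step described above could look roughly like this once the dictionary is deduplicated. It is a sketch under assumptions: build_index and encode_relation_line are invented names, and the sample concepts are just a few entries borrowed from the earlier listing.

```python
def build_index(concepts):
    """Map each distinct concept to its position in sorted order."""
    return {concept: position
            for position, concept in enumerate(sorted(set(concepts)))}


def encode_relation_line(line, index):
    """Replace the tab-separated arguments on one line with their indices."""
    return [index[argument] for argument in line.rstrip("\n").split("\t")]


# A handful of concepts, with one duplicate on purpose.
concepts = ["spain", "barcelona", "colombia", "spain"]
index = build_index(concepts)

# One line of a relation file: two arguments separated by a tab.
encoded = encode_relation_line("barcelona\tspain\n", index)

print(index)
print(encoded)
```

The lookup only works if the relation files and the dictionary are decoded the same way, which is where the mangled characters would bite.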
gimymblert
|
|
« Reply #238 on: September 13, 2015, 11:38:54 PM » |
|
nope

bar_aqueduct
jujubinus_poppei
yunshan_road_station
yves_duval
brinklow
donggyo-dong
fuentelsaz_de_soria
étienne_boulay

Traceback (most recent call last):
  File "C:/Users/user/PycharmProjects/hellopython/ParseDictionaryToUnique.py", line 24, in <module>
    print(line)
  File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0144' in position 5: character maps to <undefined>
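Since the failure is inside print(), one possible workaround is to escape whatever the console encoding cannot represent before printing. A sketch, assuming the console really is cp1252 as the traceback suggests; console_safe is a made-up helper and the second sample word is invented:

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent
    replaced by backslash escapes."""
    encoding = encoding or sys.stdout.encoding or "ascii"
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# cp1252 is passed explicitly here so the result does not depend on the machine.
kept = console_safe("\u00e9tienne_boulay", encoding="cp1252")  # é exists in cp1252
escaped = console_safe("gda\u0144sk", encoding="cp1252")       # ń does not

print(ascii(kept))
print(escaped)
```

Writing the lines to a UTF-8 file instead of the console avoids the problem altogether, and the set itself is unaffected either way.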
Dacke
|
|
« Reply #239 on: September 13, 2015, 11:46:29 PM » |
|
You still have to make sure to get the encoding right. But that's true no matter what programming language you use, python isn't better or worse.
edit: This isn't the issue, Boris gets it right in the next post.
|
|
« Last Edit: September 13, 2015, 11:53:55 PM by Dacke »